SGCE-Font: Skeleton Guided Channel Expansion for Chinese Font Generation
Abstract
The automatic generation of Chinese fonts is an important problem involved in many applications. The predominant methods for Chinese font generation are based on deep generative models, especially generative adversarial networks (GANs). However, existing GAN-based methods (say, CycleGAN) for Chinese font generation usually suffer from the mode collapse issue, mainly due to the lack of effective guidance information. This paper proposes a novel information guidance module called the skeleton guided channel expansion (SGCE) module for Chinese font generation, which integrates the skeleton information into the generator via channel expansion, motivated by the observation that the skeleton embodies both local and global structure information of Chinese characters. We conduct extensive experiments to show the effectiveness of the proposed module. Numerical results show that the mode collapse issue suffered by the known CycleGAN can be effectively alleviated by equipping it with the proposed SGCE module, and that CycleGAN equipped with SGCE outperforms state-of-the-art models in terms of four important evaluation metrics and visualization quality. Besides CycleGAN, we also show that the suggested SGCE module can be adapted to other Chinese font generation models as a plug-and-play module to further improve their performance.
Index Terms:
Chinese font generation, generative adversarial networks, mode collapse, skeleton, channel expansion.

I Introduction
The generation of Chinese fonts has attracted considerable attention in recent years due to its wide range of applications [1, 2, 3]. With the development of deep learning, in particular deep generative models such as generative adversarial networks (GANs) [4] and variational auto-encoders (VAEs) [5], methods based on deep generative models have become the mainstream due to their impressive performance [6, 7, 8, 9, 10, 11]. Existing deep generative models for Chinese font generation can be generally divided into two categories: supervised models [12, 13, 7, 14, 15, 16, 17] and unsupervised models [1, 11, 10, 18, 19]. Supervised models are mainly based on paired data (i.e., there is a one-to-one correspondence between characters in the source and target font domains), while unsupervised models are mainly based on unpaired data (without requiring a one-to-one correspondence between the source and target font domains).
In the early work [12] on supervised models, the authors suggested an auto-encoder guided GAN model for Chinese calligraphy synthesis based on paired data, regarding the Chinese font generation problem as an image-to-image translation problem. Motivated by this perspective, many models have been developed for Chinese font generation in recent years [7, 16, 17]. In [7], the authors applied the Pix2Pix model [20] to the generation of Chinese fonts. In [16], an effective GAN model dubbed CalliGAN was proposed for Chinese font generation by exploiting certain component information of Chinese characters. The recent paper [17] proposed a GAN-based image translation model for the synthesis of Chinese brush handwriting fonts by integrating the skeleton information, at the expense of increased model complexity of the deep neural networks. Despite the impressive performance of these supervised models [12, 7, 16, 17], collecting a large amount of paired samples is generally costly, and even unrealistic for some font generation problems such as the generation of ancient calligraphy fonts [21, 22].
To tackle this issue, many unsupervised models based on unpaired data have been suggested in the literature [23, 26, 27, 28]. In the seminal work [23] on unsupervised models, the authors applied the well-known CycleGAN model [29] to the generation of Chinese fonts based on unpaired data. In [30], the authors proposed an effective model using graph matching for calligraphy character generation. In [26], a structure-guided deep generative model dubbed SCFont was suggested for the generation of Chinese fonts by integrating domain knowledge of Chinese characters, such as writing trajectory and skeleton, into the generation. [28] proposed a novel deformable generative model called DG-Font for Chinese font generation based on unpaired data. Although the effectiveness of these unsupervised models has been demonstrated in the literature, such models (say, CycleGAN [23]) usually suffer from the well-known mode collapse issue [4], which results in poor performance, mainly due to the lack of effective guidance information. As shown in the second row of Figure 1, mode collapse happens for CycleGAN in three concerned generation tasks, and in this case, CycleGAN fails to yield the correct Chinese characters.


To reduce the mode collapse, several effective Chinese font generation models have recently been suggested in the literature [25, 24] by integrating certain kinds of component information of Chinese characters. In [25], the authors incorporated the stroke information into CycleGAN to reduce the mode collapse (dubbed StrokeGAN), where a one-bit stroke encoding was introduced to represent the stroke information, as shown in the second row of Figure 2. In [24], the authors proposed a square-block geometric transformation based self-supervised scheme for CycleGAN to capture certain spatial structure information of Chinese characters, inspired by the observation that character structures are closely related to these square-block transformations. As shown in the third row of Figure 2, these four square-block transformations were used in [24] to represent the single-component, left-right, up-bottom, and semi-encircling structures, respectively. It can be observed from the third and fourth rows of Figure 1 that both the one-bit stroke encoding and the square-block transformations are generally insufficient to thoroughly address the mode collapse issue, since some Chinese characters share both the same one-bit stroke encoding and the same character structure, as shown in Figure 2. Moreover, the guidance information used in both StrokeGAN [25] and SQ-GAN [24] was imposed on the discriminator, leading to only implicit guidance during the generation.
In this paper, we focus on developing an effective information guidance scheme for Chinese font generation. In particular, we aim to answer the following two questions:
- (Q1) What kind of guidance information is more effective for Chinese font generation?
- (Q2) How can the guidance information be used more effectively for Chinese font generation?
For (Q1), we suggest using the skeletons of Chinese characters as the guidance information for Chinese font generation, motivated by the observation that the skeleton embodies both local and global structure information of Chinese characters as well as their semantic information, as shown in Figure 2. From Figure 2, the stroke encodings used in StrokeGAN [25] and the square-block transformations used in SQ-GAN [24] can only reflect either local or global structure information of Chinese characters, and in particular cannot provide a unique representation of a Chinese character, while the skeleton provides both local and global structure information as well as a unique representation of a Chinese character. Thus, the skeleton should be more effective than both the stroke encoding and the square-block transformation for Chinese font generation. For (Q2), we directly use the skeleton information as part of the input of the generator via channel expansion, instead of imposing it on the discriminator. Thus, the skeleton should provide more information and stronger guidance for the generator than existing implicit ways that impose guidance information on the discriminator.


The major contributions of this paper can be summarized as follows:
- (1) This paper proposes a novel information guidance module called the skeleton guided channel expansion (SGCE) module for Chinese font generation, which integrates the skeleton information into the generator via channel expansion. The proposed module provides both local and global structure information as guidance for Chinese font generation without adding any model complexity. Thus, it can effectively alleviate the mode collapse, improve the quality of the generated Chinese characters, and be used as a plug-and-play module for Chinese font generation models.
- (2) Extensive experiments are conducted to demonstrate the effectiveness of the proposed module, in particular its superiority in reducing mode collapse and enhancing the generation performance. The superior performance of the proposed SGCE-Font model (i.e., the known CycleGAN model equipped with the proposed SGCE module) is also shown by numerous experiments with quantitative comparisons against state-of-the-art models. Some generated characters of the proposed model are presented in Figure 3. It can be observed that the proposed model can generate very realistic Chinese characters over these ten concerned Chinese fonts.
- (3) Besides CycleGAN, we also show that the proposed module can be easily adapted to other existing Chinese font generation models and further enhance their performance.
The rest of this paper is organized as follows. In Section II, we present some related work. In Section III, we describe the proposed module in detail. In Section IV, we provide extensive experiments to demonstrate the effectiveness of the proposed module. We conclude this paper in the final section.
II Related Work
As shown in Figure 2, the components of Chinese characters, such as strokes and skeletons, directly reflect certain structure information of Chinese characters. Motivated by this, many kinds of component information of Chinese characters have been widely used in the literature [31, 32, 33, 34, 35, 17, 26, 36, 11, 37] to improve the generation performance. In the early stage, traditional models [31, 32, 33] first decompose Chinese characters into several basic components, such as strokes and radicals, and then assemble them to yield new Chinese characters by leveraging some effective machine learning methods. It should be pointed out that the decomposition into strokes or radicals is generally hand-crafted and thus very costly.
To address this issue, several end-to-end models based on deep learning have been introduced to extract these important components of Chinese characters and incorporate them into the generation procedure [34, 35, 17, 26, 36, 11, 25, 24]. Specifically, [34] first adopted a coherent point drift algorithm to divide Chinese characters into strokes, then produced new font strokes by fusing the styles of two existing font strokes, and further yielded new fonts by assembling them. [35] suggested a stroke refinement branch that particularly focuses on the generation of thin strokes. [25] introduced a simple one-bit stroke encoding to represent the stroke information and incorporated it into the CycleGAN model as supervision information. By the use of such stroke encoding, the mode collapse of CycleGAN can be remarkably reduced [25]. In the recent paper [24], the authors proposed a square-block transformation based self-supervised scheme for the generation of Chinese characters, inspired by the fact that structures of Chinese characters are closely related to the square-block transformations. In contrast to [34, 35, 25, 24], this paper exploits the skeleton instead of the stroke, which captures only local structure information [34, 35, 25], or the square-block transformation, which captures only global structure information [24], mainly motivated by the observation that skeletons can provide both local and global structure information of Chinese characters.
Besides the stroke and character structure information, several works based on skeleton-like information have been studied in the literature [26, 36, 37, 17, 11]. [26] introduced two neural network modules to extract the strokes and writing trajectories of Chinese characters respectively, and then incorporated them into the generation of Chinese fonts. [36] proposed a three-stage GAN model including skeleton extraction, skeleton transformation and stroke rendering for multi-chirography image translation. [11] proposed a font fusion network based on GAN and disentangled representation learning to generate brand new fonts, where the disentangled representation learning was introduced to obtain the stroke style and skeleton shape. [37] incorporated the skeleton of Chinese characters into the generator and utilized it as a structural constraint to supervise the model, and a similar idea was also used in [17] for the generation of brush handwriting Chinese fonts. It should be pointed out that both [37] and [17] are based on paired data and introduce auxiliary network modules to extract skeletons, while this paper is based on unpaired data and uses simple rule-based verification conditions to extract the skeleton.
In contrast to previous studies, which increase the model complexity in order to exploit these components of Chinese characters for the generation tasks, we aim to provide much simpler and more effective supervision that lets a relatively simple GAN model achieve its full potential. As discussed before, the proposed skeleton guided channel expansion module is very simple yet effective in providing local and global supervision for GAN models to alleviate the mode collapse issue and enhance the performance, while introducing no additional model complexity. The skeleton information used in this paper provides richer spatial structure and semantic information of Chinese characters, and the suggested channel expansion way of exploiting the skeleton information provides stronger guidance for the generator. Moreover, the proposed module can be easily adapted to many existing GAN models to improve their performance.
III SGCE for Chinese Font Generation
In this section, we first describe the proposed skeleton guided channel expansion (SGCE) module as an effective information guidance module for Chinese font generation, and then take the well-known CycleGAN [23] as an example to show how to integrate the proposed module into existing Chinese font generation models.
III-A Skeleton Guided Channel Expansion Module
As discussed before, what kind of information is used as guidance and how it is used are both very important for Chinese font generation. Motivated by the superiority of the skeleton in representing the global and local structure information of Chinese characters and the advantage of channel expansion in directly exploiting this information, we propose a skeleton guided channel expansion module as an information guidance module for Chinese font generation, as depicted in Figure 4. From Figure 4, given a character image as the input, we first extract its single-channel skeleton via a skeletonization strategy (denoted Ske for short and described later), and then combine the single-channel skeleton information with the original (R, G, B) three-channel character image to yield a four-channel image as the input of the generator; this is the channel expansion.
Next, we describe the skeletonization strategy used in this paper to extract the skeleton of a Chinese character, inspired by [38]. Given an image of a Chinese character, we first binarize it and then extract the skeleton from the binarized image. Specifically, for a given pixel $P_1$ of the binarized image, we define a $3\times 3$ image patch with $P_1$ as the center, and number the other eight pixels $P_2,\ldots,P_9$ according to Table I. For pixels lying on the edge of the binarized image, we obtain the associated image patches via zero-padding. For the image patch defined in Table I, let $p_i$ be the value of $P_i$ ($i=2,\ldots,9$), $B(P_1)$ be the number of non-zero points in this image patch, and $A(P_1)$ be the number of $0\rightarrow 1$ patterns when traversing $P_2,P_3,\ldots,P_9$ in sequence; then we delete $P_1$ if the following hold:
- (a) $2\le B(P_1)\le 6$ and $A(P_1)=1$; and either
- (b) $p_2\,p_4\,p_6=0$ and $p_4\,p_6\,p_8=0$; or
- (c) $p_2\,p_4\,p_8=0$ and $p_2\,p_6\,p_8=0$.
We eventually yield the skeleton of a Chinese character by applying the above operation to all pixels.
According to the above description, the proposed SGCE module does not require any additional neural network to implement. This is very different from existing works [26, 36, 37, 17, 11], which increase the model complexity in order to exploit component information (say, the skeleton) for Chinese font generation.
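To make the module concrete, below is a minimal Python sketch of the SGCE operation. It uses scikit-image's `skeletonize` (a thinning routine in the spirit of the Zhang–Suen algorithm [38]) in place of a hand-written implementation of the rules above; the binarization threshold and the function name are illustrative assumptions, not part of the paper.

```python
import numpy as np
from skimage.morphology import skeletonize

def sgce(img_rgb: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Skeleton Guided Channel Expansion: (H, W, 3) uint8 glyph -> (H, W, 4).

    The fourth channel holds the character skeleton obtained by thinning
    the binarized glyph.
    """
    # Binarize: Chinese characters are dark strokes on a light background,
    # so the foreground consists of pixels darker than the threshold.
    gray = img_rgb.mean(axis=2)
    binary = gray < threshold

    # Thin the binary glyph to a one-pixel-wide skeleton.
    skeleton = skeletonize(binary)                      # boolean (H, W)

    # Channel expansion: stack the skeleton as a fourth channel.
    ske_channel = (skeleton.astype(np.uint8) * 255)[..., None]
    return np.concatenate([img_rgb, ske_channel], axis=2)
```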
III-B Proposed SGCE-Font Model
In the following, we describe the proposed model for Chinese font generation obtained by integrating the suggested SGCE module, called SGCE-Font, taking the well-known CycleGAN [23] as the base model. The specific model architecture is presented in Figure 5. As depicted in Figure 5, the proposed SGCE-Font is very similar to the existing model (say, CycleGAN), except that the input of the generator is replaced by the four-channel skeleton-image information yielded by the suggested SGCE module instead of the original (R, G, B) three-channel image information. The specific workflow of the proposed model is as follows. Let $X$ and $Y$ denote the source and target font domains, $G_{XY}$ and $G_{YX}$ the generators realizing the font translation from $X$ to $Y$ and from $Y$ to $X$ respectively, and $D_{X}$ and $D_{Y}$ the associated discriminators. Given a Chinese character image $x$ with (R, G, B) three channels in the source font domain, we first feed it into the SGCE module to yield the four-channel input $\tilde{x}=\mathrm{SGCE}(x)$ enriched by the skeleton information, and then input $\tilde{x}$ into the generator $G_{XY}$ to realize the style translation from the source font style to the target font style, yielding the generated character $G_{XY}(\tilde{x})$ in the target font domain. On one hand, we feed the generated characters together with the real Chinese characters in the target font domain into the discriminator $D_{Y}$ to distinguish whether they are real or fake; on the other hand, we feed the generated character into the SGCE module again and input the enriched information into the other generator $G_{YX}$ (realizing the font generation task from the target font domain to the source font domain), finally yielding a reconstructed character in the source font domain. The translation from $Y$ to $X$ proceeds symmetrically with $\tilde{y}=\mathrm{SGCE}(y)$. According to the above workflow, the training loss of the proposed model consists of four parts: two adversarial losses $\mathcal{L}_{adv}^{Y}$ and $\mathcal{L}_{adv}^{X}$ related to the pairs $(G_{XY},D_{Y})$ and $(G_{YX},D_{X})$ respectively, the cycle consistency loss $\mathcal{L}_{cyc}$ between the original Chinese character and its reconstructed character in the source font domain, and the skeleton consistency loss $\mathcal{L}_{ske}$ between the skeleton of the input and the skeleton of the reconstructed Chinese character in the source font domain, where the skeleton consistency loss is imposed to further supervise the generation procedure. Specifically, these losses are defined as follows:
$$\mathcal{L}_{adv}^{Y}=\mathbb{E}_{y\sim Y}\left[\log D_{Y}(y)\right]+\mathbb{E}_{x\sim X}\left[\log\left(1-D_{Y}(G_{XY}(\tilde{x}))\right)\right],$$
$$\mathcal{L}_{adv}^{X}=\mathbb{E}_{x\sim X}\left[\log D_{X}(x)\right]+\mathbb{E}_{y\sim Y}\left[\log\left(1-D_{X}(G_{YX}(\tilde{y}))\right)\right],$$
$$\mathcal{L}_{cyc}=\mathbb{E}_{x\sim X}\left[\left\|\hat{x}-x\right\|_{1}\right],\qquad \mathcal{L}_{ske}=\mathbb{E}_{x\sim X}\left[\left\|\mathrm{Ske}(\hat{x})-\mathrm{Ske}(x)\right\|_{1}\right],$$
where $X$ and $Y$ respectively represent the sets of real Chinese characters in the source and target font domains, $\hat{x}=G_{YX}(\mathrm{SGCE}(G_{XY}(\tilde{x})))$ represents the reconstructed Chinese character in the source font domain yielded by $G_{YX}$, and $G_{XY}(\tilde{x})$ represents the generated character in the target font domain yielded by $G_{XY}$. Thus, the training model of SGCE-Font is given as follows:
$$\min_{G_{XY},G_{YX}}\max_{D_{X},D_{Y}}\ \mathcal{L}_{adv}^{Y}+\mathcal{L}_{adv}^{X}+\lambda_{cyc}\mathcal{L}_{cyc}+\lambda_{ske}\mathcal{L}_{ske},$$
where $\lambda_{cyc}$ and $\lambda_{ske}$ are two tunable hyperparameters.
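For concreteness, the sketch below (a minimal PyTorch illustration, not the authors' released code) shows how the generator-side objective for one translation direction could be assembled. `G_xy`, `G_yx`, `D_y`, `sgce` and `ske` are hypothetical callables standing in for the two generators, the target-domain discriminator, the SGCE module and the skeletonization operator; the least-squares adversarial term is the variant used by many public CycleGAN implementations and stands in for the log-form loss above.

```python
import torch
import torch.nn.functional as F

def generator_loss_x2y(G_xy, G_yx, D_y, sgce, ske, x,
                       lambda_cyc=1.0, lambda_ske=0.001):
    """Generator-side objective for the X -> Y -> X direction of SGCE-Font.

    sgce(img): appends the skeleton channel to a 3-channel image (channel expansion).
    ske(img):  returns the single-channel skeleton of an image.
    """
    fake_y = G_xy(sgce(x))          # generated character in the target font domain
    rec_x = G_yx(sgce(fake_y))      # reconstructed character in the source font domain

    # Adversarial term: push D_y to classify the generated character as real
    # (least-squares variant commonly used in CycleGAN implementations).
    pred = D_y(fake_y)
    adv = F.mse_loss(pred, torch.ones_like(pred))

    # Cycle consistency between the input character and its reconstruction.
    cyc = F.l1_loss(rec_x, x)

    # Skeleton consistency between the skeletons of the input and the reconstruction.
    ske_cons = F.l1_loss(ske(rec_x), ske(x))

    return adv + lambda_cyc * cyc + lambda_ske * ske_cons
```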
Font | ht | fs | zk | hw | hp | st | ls | hc | zx | ss |
---|---|---|---|---|---|---|---|---|---|---|
Size | 3755 | 3755 | 2811 | 3755 | 2811 | 2811 | 2811 | 2811 | 920 | 166 |
Metric | Model | zk→fs | zk→st | fs→ht | zk→hp | zk→hw | fs→ls | ls→hc | fs→zk | st→zk | ht→fs | hp→zk | hw→zk | ls→fs | hc→ls | avg
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
FID | CycleGAN | 25.80 | 79.33 | 304.34 | 43.91 | 31.54 | 282.81 | 33.02 | 32.56 | 47.19 | 248.63 | 218.18 | 27.10 | 136.13 | 30.17 | 110.05 |
SQ-GAN | 49.87 | 71.69 | 38.90 | 57.44 | 49.53 | 29.08 | 71.22 | 99.04 | 146.29 | 37.55 | 305.93 | 24.24 | 56.90 | 50.45 | 77.72 | |
StrokeGAN | 57.85 | 71.30 | 28.97 | 92.86 | 37.78 | 28.63 | 34.60 | 67.51 | 71.30 | 39.61 | 140.23 | 23.80 | 27.35 | 26.74 | 53.47 | |
SkeGAN | 36.58 | 58.75 | 40.27 | 26.90 | 28.91 | 32.32 | 46.09 | 33.58 | 33.12 | 59.52 | 27.59 | 48.29 | 49.84 | 72.97 | 42.48 | |
SGCE-Font | 34.68 | 40.35 | 20.66 | 34.60 | 53.57 | 20.79 | 30.29 | 25.55 | 31.91 | 23.73 | 26.74 | 23.03 | 35.70 | 30.70 | 30.88 | |
MSE | CycleGAN | 0.117 | 0.214 | 0.197 | 0.212 | 0.097 | 0.187 | 0.155 | 0.135 | 0.121 | 0.166 | 0.177 | 0.201 | 0.193 | 0.148 | 0.166 |
SQ-GAN | 0.147 | 0.199 | 0.186 | 0.292 | 0.106 | 0.162 | 0.176 | 0.152 | 0.158 | 0.143 | 0.161 | 0.208 | 0.153 | 0.206 | 0.175 | |
StrokeGAN | 0.120 | 0.158 | 0.188 | 0.223 | 0.104 | 0.090 | 0.175 | 0.104 | 0.125 | 0.129 | 0.181 | 0.207 | 0.116 | 0.112 | 0.145 | |
SkeGAN | 0.112 | 0.198 | 0.106 | 0.187 | 0.092 | 0.088 | 0.150 | 0.133 | 0.128 | 0.116 | 0.121 | 0.191 | 0.111 | 0.146 | 0.134 | |
SGCE-Font | 0.111 | 0.158 | 0.117 | 0.175 | 0.090 | 0.094 | 0.163 | 0.087 | 0.122 | 0.117 | 0.112 | 0.188 | 0.123 | 0.144 | 0.129 | |
PSNR | CycleGAN | 9.51 | 7.26 | 7.07 | 6.79 | 10.21 | 7.287 | 8.16 | 8.75 | 8.32 | 7.80 | 7.55 | 7.02 | 7.17 | 5.65 | 7.75 |
SQ-GAN | 9.82 | 7.06 | 7.82 | 6.98 | 9.79 | 9.05 | 7.72 | 9.37 | 8.05 | 8.63 | 7.40 | 6.85 | 8.99 | 8.36 | 8.28 | |
StrokeGAN | 9.36 | 6.77 | 7.32 | 6.35 | 9.91 | 10.66 | 7.72 | 10.00 | 6.77 | 9.04 | 8.38 | 6.88 | 9.55 | 9.68 | 8.46 | |
SkeGAN | 9.70 | 7.08 | 9.98 | 7.37 | 10.46 | 10.81 | 8.30 | 8.83 | 9.06 | 9.47 | 9.29 | 7.25 | 9.73 | 8.42 | 8.98 | |
SGCE-Font | 9.73 | 8.11 | 9.55 | 7.72 | 10.55 | 10.47 | 7.95 | 10.85 | 9.32 | 9.69 | 9.68 | 7.35 | 9.28 | 8.49 | 9.20 | |
SSIM | CycleGAN | 0.568 | 0.495 | 0.346 | 0.386 | 0.330 | 0.461 | 0.519 | 0.502 | 0.473 | 0.146 | 0.413 | 0.459 | 0.433 | 0.008 | 0.396 |
SQ-GAN | 0.570 | 0.448 | 0.478 | 0.354 | 0.293 | 0.595 | 0.499 | 0.530 | 0.428 | 0.485 | 0.359 | 0.444 | 0.505 | 0.543 | 0.466 | |
StrokeGAN | 0.553 | 0.464 | 0.462 | 0.300 | 0.301 | 0.660 | 0.452 | 0.574 | 0.464 | 0.533 | 0.448 | 0.447 | 0.547 | 0.566 | 0.484 | |
SkeGAN | 0.579 | 0.465 | 0.605 | 0.428 | 0.341 | 0.657 | 0.535 | 0.506 | 0.527 | 0.526 | 0.543 | 0.460 | 0.538 | 0.553 | 0.519 | |
SGCE-Font | 0.582 | 0.534 | 0.613 | 0.452 | 0.349 | 0.664 | 0.539 | 0.632 | 0.542 | 0.579 | 0.566 | 0.476 | 0.549 | 0.578 | 0.547 |
IV Experiments
In this section, we conduct a series of experiments to show the effectiveness of the proposed SGCE module for Chinese font generation. All experiments were carried out in a PyTorch environment running on Linux with an AMD Ryzen 7 5800X 8-core CPU and a GeForce RTX 3090 GPU.

IV-A Experimental Settings
A. Datasets. We considered ten datasets with different fonts, including three standard printing fonts {Heiti (ht), Fangsong (fs), Zhengkai (zk)}, the handwriting (hw) font, three pseudo-handwriting fonts {Hupo (hp), Shuti (st), Lishu (ls)}, and three calligraphy fonts {Huangcao (hc), Zhuxi (zx), SuShi (ss)}. The specific sizes of the datasets are presented in Table II and some samples are shown in the left part of Figure 3. The handwriting dataset was randomly collected from the CASIA-HWDB1.1 dataset, and the other font datasets were collected by us from the Internet. The sizes of the last two calligraphy font datasets, i.e., Zhuxi and SuShi, are very small; we use them mainly to show the performance of the proposed model on the few-shot font generation task. For each sample, we resized the character image to 128×128×3. In all experiments, we used 80% and 20% of the samples as the training and test sets, respectively.
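As a small illustration of this preprocessing (a sketch under an assumed per-font folder of PNG glyphs, not the authors' pipeline), each character image is resized to 128×128 RGB and the samples are split 80/20 into training and test sets:

```python
import glob
import random
from PIL import Image

def load_font_dataset(folder, size=(128, 128), train_ratio=0.8, seed=0):
    """Load one font's character images, resize to 128x128 RGB, split 80/20."""
    paths = sorted(glob.glob(f"{folder}/*.png"))   # hypothetical per-character PNG files
    images = [Image.open(p).convert("RGB").resize(size) for p in paths]
    random.Random(seed).shuffle(images)
    split = int(train_ratio * len(images))
    return images[:split], images[split:]
```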
B. Baselines. In this paper, we considered the following six state-of-the-art models as baselines.
- CycleGAN [23]: A typical GAN-based model using unpaired data.
- SQ-GAN [24]: An improved CycleGAN model equipped with a square-block geometric transformation based self-supervised learning scheme.
- StrokeGAN [25]: A refined CycleGAN model incorporating the one-bit stroke encoding for Chinese font generation to mitigate mode collapse.
- UGATIT [45]: An effective GAN model using an unsupervised generative attentional network with adaptive layer-instance normalization, trained on unpaired data.
- FUNIT [39]: An effective GAN model based on disentangled representation learning and unpaired data.
- AttentionGAN [40]: An effective GAN model based on the attention mechanism and unpaired data.
Besides the above baselines, we also suggest another model called SkeGAN as a baseline for a better comparison with the proposed model. In SkeGAN, we impose the skeleton information on the discriminator in a way similar to StrokeGAN [25].
C. Network structures. The network structure of the generator of SGCE-Font is almost the same as that of CycleGAN [29], including 2 convolutional layers in the down-sampling module, 9 residual blocks (each consisting of 2 convolutional layers), and 2 deconvolutional layers in the up-sampling module, except that the input of the generator is modified from 3 channels to 4 channels. The network structure of the discriminator of SGCE-Font is the same as that of CycleGAN [29], with 6 hidden convolutional layers and 2 convolutional layers in the output module. Moreover, batch normalization [41] was used in all layers.
We used the popular Adam algorithm [42] as the optimizer with momentum parameters (0.5, 0.999) in both the generator and discriminator optimization subproblems. The hyperparameters of the cycle consistency loss and the skeleton consistency loss were empirically set to 1 and 0.001, respectively.
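The following PyTorch sketch illustrates the only architectural change relative to CycleGAN, namely the 4-channel input to the generator, together with the Adam setup with momentum parameters (0.5, 0.999). The layer widths, kernel sizes and learning rate are illustrative assumptions rather than values reported here.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with 2 convolutional layers, as in the CycleGAN generator."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

def make_generator(in_channels=4):   # 4 = (R, G, B) + skeleton channel from SGCE
    layers = [nn.Conv2d(in_channels, 64, 7, padding=3), nn.BatchNorm2d(64), nn.ReLU(True),
              # 2 down-sampling convolutional layers
              nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(True),
              nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(True)]
    layers += [ResBlock(256) for _ in range(9)]          # 9 residual blocks
    layers += [  # 2 deconvolutional (up-sampling) layers followed by the output layer
        nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
        nn.BatchNorm2d(128), nn.ReLU(True),
        nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
        nn.BatchNorm2d(64), nn.ReLU(True),
        nn.Conv2d(64, 3, 7, padding=3), nn.Tanh()]
    return nn.Sequential(*layers)

G_xy = make_generator()
opt_G = torch.optim.Adam(G_xy.parameters(), lr=2e-4, betas=(0.5, 0.999))
```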
D. Evaluation metrics. We used four important evaluation metrics to measure the performance. The first two are the commonly used SSIM (Structural Similarity) [43] and PSNR (Peak Signal-to-Noise Ratio), which measure how well pixel-level details are preserved. The third is the well-known MSE (Mean Squared Error). The last is the FID (Fréchet Inception Distance) [44], which measures how well the generated results of the model match the distribution of the real data. Larger SSIM and PSNR values, and smaller MSE and FID values, generally imply better generation performance.
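As a reference for how the per-image metrics could be computed (a sketch assuming a recent scikit-image; FID is typically computed over whole sets of images, e.g. with the pytorch-fid package):

```python
import numpy as np
from skimage.metrics import (mean_squared_error, peak_signal_noise_ratio,
                             structural_similarity)

def pixel_metrics(generated: np.ndarray, target: np.ndarray):
    """SSIM, PSNR and MSE between a generated character image and its ground truth.

    Both inputs are uint8 arrays of shape (H, W, 3).
    """
    ssim = structural_similarity(generated, target, channel_axis=2)
    psnr = peak_signal_noise_ratio(target, generated)
    mse = mean_squared_error(target, generated)
    return ssim, psnr, mse
```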
Metric | Model | zk→fs | zk→st | fs→ht | zk→hp | zk→hw | fs→ls | fs→zk | st→zk | ht→fs | hp→zk | hw→zk | ls→fs | avg
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
FID | FUNIT | 37.50 | 43.67 | 64.99 | 55.28 | 75.55 | 30.84 | 151.73 | 207.22 | 40.68 | 254.40 | 68.82 | 63.33 | 91.17 |
UGATIT | 42.82 | 118.10 | 37.92 | 64.37 | 77.10 | 30.32 | 34.03 | 128.10 | 28.32 | 76.55 | 29.01 | 23.00 | 58.11 | |
AttentionGAN | 32.32 | 92.03 | 32.80 | 25.20 | 26.86 | 17.77 | 57.46 | 51.84 | 37.74 | 55.69 | 25.01 | 34.33 | 40.76 | |
SGCE-Font | 34.68 | 40.35 | 20.66 | 34.60 | 53.57 | 20.79 | 25.55 | 31.91 | 23.73 | 26.73 | 23.03 | 35.70 | 30.94 | |
MSE | FUNIT | 0.127 | 0.224 | 0.212 | 0.331 | 0.105 | 0.170 | 0.138 | 0.180 | 0.178 | 0.188 | 0.216 | 0.155 | 0.185 |
UGATIT | 0.113 | 0.151 | 0.170 | 0.263 | 0.129 | 0.119 | 0.090 | 0.127 | 0.140 | 0.144 | 0.224 | 0.133 | 0.150 | |
AttentionGAN | 0.109 | 0.197 | 0.170 | 0.162 | 0.097 | 0.087 | 0.095 | 0.151 | 0.125 | 0.143 | 0.191 | 0.126 | 0.138 | |
SGCE-Font | 0.111 | 0.158 | 0.117 | 0.175 | 0.090 | 0.094 | 0.087 | 0.122 | 0.117 | 0.112 | 0.188 | 0.123 | 0.125 | |
PSNR | FUNIT | 9.07 | 6.52 | 6.77 | 4.83 | 9.86 | 7.75 | 8.66 | 7.48 | 7.52 | 7.28 | 6.68 | 8.16 | 7.56 |
UGATIT | 9.63 | 8.34 | 7.80 | 5.87 | 9.34 | 8.83 | 10.65 | 9.10 | 8.66 | 8.48 | 6.55 | 9.43 | 8.52 | |
AttentionGAN | 9.83 | 7.08 | 7.75 | 8.05 | 10.24 | 10.86 | 10.40 | 8.27 | 9.15 | 8.47 | 7.26 | 9.09 | 8.87 | |
SGCE-Font | 9.73 | 8.11 | 9.55 | 7.71 | 10.55 | 10.47 | 10.85 | 9.32 | 9.69 | 9.68 | 7.35 | 9.28 | 9.36 | |
SSIM | FUNIT | 0.525 | 0.426 | 0.388 | 0.241 | 0.295 | 0.511 | 0.485 | 0.412 | 0.389 | 0.386 | 0.373 | 0.460 | 0.408 |
UGATIT | 0.549 | 0.528 | 0.549 | 0.341 | 0.271 | 0.642 | 0.589 | 0.486 | 0.549 | 0.451 | 0.400 | 0.607 | 0.497 | |
AttentionGAN | 0.580 | 0.451 | 0.463 | 0.440 | 0.332 | 0.665 | 0.588 | 0.433 | 0.507 | 0.452 | 0.462 | 0.505 | 0.490 | |
SGCE-Font | 0.582 | 0.534 | 0.613 | 0.452 | 0.349 | 0.664 | 0.632 | 0.542 | 0.579 | 0.566 | 0.476 | 0.549 | 0.545 |
IV-B Superiority of Proposed SGCE Module
In this section, we conducted extensive experiments over fourteen generation tasks to demonstrate the superiority of the proposed SGCE module for Chinese font generation. The quantitative comparisons between the proposed SGCE-Font and the baselines CycleGAN [23], SQ-GAN [24], StrokeGAN [25] and SkeGAN are presented in Table III, and some visual comparisons of the generated characters are presented in Figure 1 and Figure 6.
From Table III, the proposed SGCE-Font model outperforms the concerned baselines on average in terms of all four evaluation metrics. Specifically, in terms of FID, the proposed SGCE-Font achieves the best results in nine font generation tasks and the second-best results in three font generation tasks, and is significantly better than the baselines. Similar conclusions can be drawn in terms of the other three evaluation metrics.
When comparing the performance of different models, the refined CycleGAN models using different guidance information, such as the square-block transformation in SQ-GAN [24], the stroke encoding in StrokeGAN [25] and the skeleton in SkeGAN, outperform the original CycleGAN model for Chinese font generation. Among these models, the performance of SkeGAN using the skeleton information is significantly better on average than that of SQ-GAN and StrokeGAN in terms of the four evaluation metrics. This shows clearly that the skeleton information is much better than the square-block transformation and the stroke encoding as guidance information for Chinese font generation. Moreover, when comparing the performance of SkeGAN and the proposed SGCE-Font, it can be observed that the proposed SGCE-Font outperforms SkeGAN in most cases. This shows that the suggested way of using the skeleton information via channel expansion, as the direct input of the generator, is much better than imposing the skeleton information on the discriminator as in SkeGAN. These claims can also be verified by the visualization results depicted in Figure 6. From Figure 6, the quality of the generated characters of the proposed SGCE-Font is better than that of SkeGAN, while the performance of SkeGAN is much better than that of StrokeGAN, SQ-GAN and CycleGAN.
In particular, as shown in Figure 1, the well-known CycleGAN suffers from the mode collapse issue when applied to three font generation tasks, i.e., {Hupo→Zhengkai, Fangsong→Heiti, Fangsong→Lishu}. These mode collapse phenomena of CycleGAN can also be observed from Table III, where the FID values of CycleGAN in these three font generation tasks are 218.18, 304.34 and 282.81, which are abnormally large. Although the existing square-block transformation and stroke encoding based guidance schemes can reduce the mode collapse to some extent, as shown in the third and fourth rows of Figure 1, SQ-GAN [24] still suffers from mode collapse in the font generation task {Hupo→Zhengkai}, and there are stroke-missing or stroke-redundancy phenomena in the generated characters of StrokeGAN. In contrast, as shown in the fifth row of Figure 1, the mode collapse issue of CycleGAN can be significantly alleviated by the suggested SGCE module, and the quality of the generated characters of the proposed SGCE-Font is much better than that of the other three baselines. These results show clearly the superiority of the suggested SGCE module in reducing mode collapse.



Metric | Model | hc→ls | ls→hc | zk→ss | zk→zx | avg
---|---|---|---|---|---|---|
FID | CycleGAN | 30.173 | 33.015 | 204.112 | 115.039 | 95.585 |
SQ-GAN | 50.451 | 71.220 | 295.611 | 87.388 | 126.168 | |
FUNIT | 54.917 | 55.576 | 213.107 | 81.275 | 101.219 | |
UGATIT | 52.689 | 48.358 | 225.062 | 224.546 | 137.664 | |
AttentionGAN | 82.824 | 83.822 | 228.433 | 92.930 | 122.002 | |
SGCE-Font | 30.695 | 30.291 | 185.330 | 70.030 | 79.087 | |
MSE | CycleGAN | 0.148 | 0.155 | 0.219 | 0.097 | 0.154 |
SQ-GAN | 0.206 | 0.176 | 0.249 | 0.095 | 0.181 | |
FUNIT | 0.170 | 0.180 | 0.234 | 0.095 | 0.170 | |
UGATIT | 0.160 | 0.198 | 0.211 | 0.179 | 0.187 | |
AttentionGAN | 0.205 | 0.203 | 0.232 | 0.094 | 0.184 | |
SGCE-Font | 0.144 | 0.163 | 0.212 | 0.094 | 0.153 | |
PSNR | CycleGAN | 5.648 | 8.161 | 6.668 | 10.239 | 7.679 |
SQ-GAN | 8.364 | 7.721 | 6.095 | 10.412 | 8.148 | |
FUNIT | 7.730 | 7.480 | 6.356 | 10.361 | 7.982 | |
UGATIT | 8.068 | 7.075 | 6.774 | 7.543 | 7.365 | |
AttentionGAN | 6.911 | 6.944 | 6.380 | 10.415 | 7.662 | |
SGCE-Font | 8.489 | 7.953 | 6.934 | 10.456 | 8.458 | |
SSIM | CycleGAN | 0.008 | 0.519 | 0.456 | 0.683 | 0.416 |
SQ-GAN | 0.543 | 0.499 | 0.544 | 0.685 | 0.568 | |
FUNIT | 0.533 | 0.519 | 0.430 | 0.684 | 0.542 | |
UGATIT | 0.531 | 0.458 | 0.426 | 0.498 | 0.478 | |
AttentionGAN | 0.445 | 0.445 | 0.429 | 0.684 | 0.501 | |
SGCE-Font | 0.578 | 0.539 | 0.456 | 0.691 | 0.566 |
IV-C Comparison with State-of-the-art Models
In this section, we conducted twelve printing or handwriting font generation tasks to demonstrate the effectiveness of the proposed model by comparing it with the state-of-the-art models FUNIT [39], UGATIT [45], and AttentionGAN [40]. The comparison results between the proposed model and these existing models for the twelve standard character generation tasks are presented in Table IV.
From Table IV, the proposed SGCE-Font achieves the best performance in seven of these twelve font generation tasks and the second-best performance in four of them in terms of FID, as presented in the fifth row of Table IV. The average FID value of the proposed model over these twelve font generation tasks is much better than those of the state-of-the-art models, as shown in the last column of Table IV. Similar conclusions can be drawn from Table IV in terms of the other three evaluation metrics.
Some generated characters of the proposed model as well as the three state-of-the-art models for some Chinese characters in the test set are shown in Figure 7. It can be observed from this figure that SGCE-Font produces Chinese characters of higher quality than the concerned existing models. Specifically, FUNIT [39] does not perform very well on the generation tasks {Fangsong→Zhengkai, Zhengkai→Hupo}; that is, the styles of the generated fonts seem closer to the source font styles than to the target font styles, as shown in the second row of Figure 7. Besides, there are some flaws (say, missing strokes) in the generated characters of the existing models for some font generation tasks such as {Zhengkai→Handwriting} and {Zhengkai→Shuti}, while these flaws can be remarkably reduced by the suggested SGCE-Font model. These results demonstrate the effectiveness of the proposed model.
We also present some generated characters of the proposed model and the state-of-the-art models for some unseen Chinese characters, i.e., characters not included in the concerned datasets, in Figure 8. It can be observed that the proposed SGCE-Font model generates more realistic Chinese characters than the existing models for these unseen characters. This shows that the proposed model has better generalization performance.
IV-D Comparison in Calligraphy Font Generation
Besides the previous twelve printing or handwriting font generation tasks, we particularly considered four calligraphy font generation tasks to show the effectiveness of the proposed model. As presented in Table II, the total sizes of the Zhuxi (zx) and SuShi (ss) datasets are only 920 and 166, which are much smaller than those of the printing or handwriting font datasets. The quantitative comparison results over these four font generation tasks are presented in Table V.
Model | Metric | Variant | zk→fs | zk→st | fs→ht | zk→hw | fs→zk | st→zk | ht→fs | hp→zk | hc→ls | zk→ss | zk→zx
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UGATIT | FID | Orig | 42.824 | 118.105 | 37.918 | 77.100 | 34.032 | 128.105 | 28.324 | 76.551 | 48.358 | 225.062 | 224.546
+SGCE | 29.516 | 33.060 | 31.177 | 71.696 | 32.290 | 48.030 | 25.291 | 67.203 | 37.943 | 103.618 | 84.286 | ||
MSE | Orig | 0.113 | 0.151 | 0.170 | 0.129 | 0.090 | 0.127 | 0.140 | 0.144 | 0.198 | 0.211 | 0.179 | |
+SGCE | 0.118 | 0.149 | 0.144 | 0.098 | 0.089 | 0.159 | 0.123 | 0.129 | 0.167 | 0.249 | 0.085 | ||
PSNR | Orig | 9.628 | 8.336 | 7.799 | 9.342 | 10.649 | 9.101 | 8.657 | 8.481 | 7.075 | 6.774 | 7.543 | |
+SGCE | 9.479 | 8.398 | 8.571 | 10.214 | 10.709 | 8.039 | 9.213 | 8.981 | 7.855 | 6.086 | 10.863 | ||
SSIM | Orig | 0.549 | 0.528 | 0.549 | 0.271 | 0.589 | 0.486 | 0.464 | 0.451 | 0.458 | 0.426 | 0.498 | |
+SGCE | 0.548 | 0.613 | 0.584 | 0.322 | 0.594 | 0.522 | 0.602 | 0.472 | 0.505 | 0.433 | 0.704 | ||
AttentionGAN | FID | Orig | 32.319 | 92.032 | 32.796 | 26.861 | 57.464 | 51.844 | 37.740 | 55.689 | 83.822 | 228.433 | 92.930
+SGCE | 19.163 | 43.024 | 28.997 | 20.053 | 46.997 | 38.718 | 25.601 | 58.478 | 24.550 | 237.649 | 72.509 | ||
MSE | Orig | 0.109 | 0.197 | 0.170 | 0.097 | 0.095 | 0.151 | 0.125 | 0.143 | 0.203 | 0.232 | 0.094 | |
+SGCE | 0.106 | 0.151 | 0.105 | 0.094 | 0.095 | 0.122 | 0.105 | 0.134 | 0.143 | 0.220 | 0.094 | ||
PSNR | Orig | 9.826 | 7.084 | 7.749 | 10.241 | 10.398 | 8.265 | 9.148 | 8.470 | 6.944 | 6.380 | 10.415 | |
+SGCE | 9.941 | 8.334 | 9.998 | 10.391 | 10.368 | 9.301 | 9.880 | 8.780 | 8.509 | 6.744 | 10.450 | ||
SSIM | Orig | 0.580 | 0.451 | 0.463 | 0.332 | 0.588 | 0.433 | 0.507 | 0.452 | 0.445 | 0.429 | 0.684 | |
+SGCE | 0.589 | 0.520 | 0.571 | 0.344 | 0.587 | 0.506 | 0.602 | 0.475 | 0.562 | 0.444 | 0.689 | ||
SQ-GAN | FID | Orig | 49.865 | 71.694 | 38.896 | 49.532 | 99.038 | 146.293 | 37.546 | 305.931 | 50.451 | 295.611 | 115.039 |
+SGCE | 45.790 | 49.004 | 35.345 | 46.152 | 40.964 | 101.882 | 84.855 | 88.694 | 30.805 | 125.620 | 77.896 | ||
MSE | Orig | 0.147 | 0.199 | 0.186 | 0.106 | 0.152 | 0.158 | 0.143 | 0.161 | 0.206 | 0.249 | 0.090 | |
+SGCE | 0.116 | 0.187 | 0.123 | 0.100 | 0.123 | 0.164 | 0.133 | 0.147 | 0.164 | 0.262 | 0.094 | ||
PSNR | Orig | 9.822 | 7.058 | 7.817 | 9.791 | 9.369 | 8.050 | 8.626 | 7.404 | 8.364 | 6.095 | 10.635 | |
+SGCE | 9.539 | 7.323 | 9.318 | 10.082 | 9.180 | 7.887 | 8.852 | 8.378 | 7.891 | 5.858 | 10.432 | ||
SSIM | Orig | 0.570 | 0.448 | 0.478 | 0.293 | 0.530 | 0.428 | 0.485 | 0.359 | 0.543 | 0.544 | 0.754 | |
+SGCE | 0.570 | 0.488 | 0.583 | 0.320 | 0.535 | 0.423 | 0.497 | 0.465 | 0.532 | 0.405 | 0.690 | ||
FUNIT | FID | Orig | 37.505 | 43.675 | 64.990 | 75.553 | 151.728 | 207.221 | 40.677 | 254.404 | 55.576 | 282.494 | 139.837 |
+SGCE | 38.683 | 41.299 | 57.760 | 77.190 | 133.800 | 205.078 | 53.910 | 132.696 | 55.480 | 111.208 | 101.170 | ||
MSE | Orig | 0.127 | 0.224 | 0.212 | 0.105 | 0.138 | 0.180 | 0.178 | 0.188 | 0.180 | 0.167 | 0.093 | |
+SGCE | 0.128 | 0.220 | 0.212 | 0.106 | 0.137 | 0.178 | 0.177 | 0.189 | 0.183 | 0.182 | 0.095 | ||
PSNR | Orig | 9.075 | 6.523 | 6.772 | 9.861 | 8.660 | 7.476 | 7.525 | 7.276 | 7.480 | 7.819 | 10.414 | |
+SGCE | 9.042 | 6.604 | 6.777 | 9.791 | 8.727 | 7.519 | 7.553 | 7.271 | 7.407 | 7.476 | 10.287 | ||
SSIM | Orig | 0.525 | 0.426 | 0.388 | 0.295 | 0.485 | 0.412 | 0.389 | 0.386 | 0.519 | 0.397 | 0.686 | |
+SGCE | 0.522 | 0.433 | 0.386 | 0.291 | 0.498 | 0.416 | 0.395 | 0.380 | 0.519 | 0.498 | 0.691 |
Model | Metric | Variant | zk→fs | zk→st | fs→ht | zk→hw | fs→zk | st→zk | ht→fs | hp→zk
---|---|---|---|---|---|---|---|---|---|---
StrokeGAN | FID | Orig | 57.851 | 71.302 | 28.972 | 37.779 | 67.507 | 71.302 | 39.610 | 140.227 |
+SGCE | 33.283 | 25.230 | 15.861 | 28.201 | 24.959 | 44.303 | 21.837 | 89.440 | ||
MSE | Orig | 0.120 | 0.158 | 0.188 | 0.104 | 0.104 | 0.125 | 0.129 | 0.181 | |
+SGCE | 0.114 | 0.189 | 0.123 | 0.095 | 0.128 | 0.169 | 0.117 | 0.168 | ||
PSNR | Orig | 9.359 | 6.774 | 7.318 | 9.906 | 10.002 | 6.774 | 9.043 | 8.379 | |
+SGCE | 9.613 | 7.311 | 9.362 | 10.336 | 8.996 | 7.763 | 9.495 | 9.010 | ||
SSIM | Orig | 0.553 | 0.464 | 0.462 | 0.301 | 0.574 | 0.464 | 0.533 | 0.448 | |
+SGCE | 0.577 | 0.495 | 0.600 | 0.448 | 0.520 | 0.432 | 0.567 | 0.503 |

From Table V, the proposed SGCE-Font achieves the best average performance in terms of FID, MSE and PSNR, and the second-best average performance in terms of SSIM. Specifically, in terms of FID, the proposed SGCE-Font model achieves the best results in the font generation tasks {ls→hc, zk→ss, zk→zx} and the second-best result in the font generation task {hc→ls}. Similar observations hold for the other three evaluation metrics. We also present some visualization comparisons between the proposed model and five state-of-the-art models over two calligraphy font generation tasks. The comparison results for characters in the test set and for unseen characters not included in the datasets are presented in Figure 9. It can be observed from Figure 9 that the quality of the generated characters of the proposed model is better than that of the baselines. These results show clearly the effectiveness of the proposed model in these calligraphy font generation tasks.
IV-E Generalization of Proposed SGCE Module
As pointed out before, the proposed SGCE module can be used as a plug-and-play module for Chinese font generation models. In the following, we adapt the proposed SGCE module to five existing models, namely UGATIT [45], AttentionGAN [40], StrokeGAN [25], SQ-GAN [24] and FUNIT [39], to further enhance their performance. We incorporated SGCE into each base model in a way similar to that shown in Figure 5. For UGATIT, AttentionGAN, SQ-GAN and FUNIT, we implemented eleven font generation tasks, including eight printing or handwriting font generation tasks and three calligraphy font generation tasks, to evaluate the effectiveness of the proposed SGCE module; for StrokeGAN, we did not test its performance on the three calligraphy font generation tasks, since it was expensive to build up the stroke encodings of the traditional Chinese characters involved in these tasks.
The quantitative comparison results are presented in Table VI and Table VII. From Table VI and Table VII, the performance of the five concerned models is substantially improved over most font generation tasks in terms of the four evaluation metrics when equipped with the proposed SGCE module. We also present some visualization comparison results in Figure 10. It can be observed from Figure 10 that the quality of the generated characters is improved by equipping the models with the proposed SGCE module. These results show clearly the effectiveness of the proposed SGCE module.
V Conclusion
What kind of information is used as guidance and how it is used to reduce the mode collapse issue are two important questions for the predominant GAN-based models for Chinese font generation. This paper proposed an effective guidance module called SGCE for Chinese font generation models, where the skeleton information is adopted and used via channel expansion, motivated by the observation that the skeleton embodies both local and global information of a Chinese character. The channel expansion way of using the skeleton directly imposes the skeleton guidance information on the generator, which is more effective than the existing way of imposing the guidance information on the discriminator. Extensive experiments were conducted to demonstrate the effectiveness of the proposed SGCE module. Experimental results show that the proposed SGCE module effectively reduces the mode collapse issue suffered by the well-known CycleGAN and can be easily adapted to existing models as a plug-and-play module to further improve their performance. One future direction is to adapt the proposed module to the generation of other languages (e.g., Korean) or to other scenarios such as few-shot Chinese font generation.
References
- [1] G. Zhang, W. Huang, R. Chen, J. Yang, and H. Peng, “Calligraphy fonts generation based on generative adversarial networks,” ICIC express letters. Part B, Applications: an international journal of research and surveys, vol. 10, no. 3, pp. 203–209, 2019.
- [2] Y. Wang, G. Pu, W. Luo, Y. Wang, P. Xiong, H. Kang, and Z. Lian, “Aesthetic text logo synthesis via content-aware layout inferring,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2436–2445, June 2022.
- [3] C. Luo, Y. Zhu, L. Jin, Z. Li, and D. Peng, “Slogan: Handwriting style synthesis for arbitrary-length and out-of-vocabulary text,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
- [4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems, vol. 27, 2014.
- [5] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in Proceedings of the 2014 International Conference on Learning Representations, 2014.
- [6] S. Xu, F. C. Lau, W. K. Cheung, and Y. Pan, “Automatic generation of artistic chinese calligraphy,” IEEE Intelligent Systems, vol. 20, no. 3, pp. 32–39, 2005.
- [7] Y. Tian, “zi2zi: Master chinese calligraphy with conditional adversarial networks.” https://github.com/kaonashi-tyc/zi2zi, 2017. Accessed:2022-08-16.
- [8] T. Yamada, M. Hosoe, K. Kato, and K. Yamamoto, “The character generation in handwriting feature extraction using variational autoencoder,” in Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017.
- [9] K. Goto and N. Inoue, “Learning vae with categorical labels for generating conditional handwritten characters,” in Proceedings of the 2021 17th International Conference on Machine Vision and Applications (MVA), 2021.
- [10] S. Park, S. Chun, J. Cha, B. Lee, and H. Shim, “Few-shot font generation with localized style representations and factorization,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 2393–2402, May 2021.
- [11] M. Qin, Z. Zhang, and X. Zhou, “Disentangled representation learning gans for generalized and stable font fusion network,” IET Image Processing, vol. 16, no. 2, pp. 393–406, 2022.
- [12] P. Lyu, X. Bai, C. Yao, Z. Zhu, T. Huang, and W. Liu, “Auto-encoder guided gan for chinese calligraphy synthesis,” in Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1095–1100, IEEE, 2017.
- [13] Y. Jiang, Z. Lian, Y. Tang, and J. Xiao, “Dcfont: an end-to-end deep chinese font generation system,” in SIGGRAPH Asia 2017 Technical Briefs, pp. 1–4, 2017.
- [14] Y. Lei, L. Zhou, T. Pan, H. Qian, and Z. Sun, “Learning and generation of personal handwriting style chinese font,” in Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 1909–1914, IEEE, 2018.
- [15] J. Chang, Y. Gu, Y. Zhang, Y.-F. Wang, and C. Innovation, “Chinese handwriting imitation with hierarchical generative adversarial network.,” in Proceedings of the BMVC, p. 290, 2018.
- [16] S.-J. Wu, C.-Y. Yang, and J. Y.-j. Hsu, “Calligan: Style and structure-aware chinese calligraphy character generator,” 2020.
- [17] S. Yuan, R. Liu, M. Chen, B. Chen, Z. Qiu, and X. He, “Se-gan: Skeleton enhanced gan-based model for brush handwriting font generation,” 2022.
- [18] S. Park, S. Chun, J. Cha, B. Lee, and H. Shim, “Multiple heads are better than one: Few-shot font generation with multiple localized experts,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13900–13909, 2021.
- [19] A. U. Hassan, H. Ahmed, and J. Choi, “Unpaired font family synthesis using conditional generative adversarial networks,” Knowledge-Based Systems, vol. 229, p. 107304, 2021.
- [20] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134, 2017.
- [21] Z. Qian and D. Fang, “Towards chinese calligraphy,” Macalester International, vol. 18, no. 1, p. 12, 2007.
- [22] T. Chen, Chinese Calligraphy. Cambridge University Press, 2011.
- [23] B. Chang, Q. Zhang, S. Pan, and L. Meng, “Generating handwritten chinese characters using cyclegan,” in Proceedings of the 2018 IEEE winter conference on applications of computer vision (WACV), pp. 199–207, IEEE, 2018.
- [24] J. Zeng, Q. Chen, and M. Wang, “Self-supervised chinese font generation based on square-block transformation (in chinese),” Sci Sin Inform, vol. 52, no. 1, p. 15, 2022.
- [25] J. Zeng, Q. Chen, Y. Liu, M. Wang, and Y. Yao, “Strokegan: Reducing mode collapse in chinese font generation via stroke encoding,” in Proceedings of the AAAI, vol. 3, 2021.
- [26] Y. Jiang, Z. Lian, Y. Tang, and J. Xiao, “Scfont: Structure-guided chinese font generation via deep stacked networks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4015–4022, Jul. 2019.
- [27] Y. Lin, H. Yuan, and L. Lin, “Chinese typography transfer model based on generative adversarial network,” in Proceedings of the 2020 Chinese Automation Congress (CAC), pp. 7005–7010, IEEE, 2020.
- [28] Y. Xie, X. Chen, L. Sun, and Y. Lu, “Dg-font: Deformable generative networks for unsupervised font generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5130–5140, 2021.
- [29] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, pp. 2223–2232, 2017.
- [30] M. Li, J. Wang, Y. Yang, W. Huang, and W. Du, “Improving gan-based calligraphy character generation using graph matching,” in Proceedings of the 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C), pp. 291–295, IEEE, 2019.
- [31] Z. Lian and J. Xiao, “Automatic shape morphing for chinese characters,” in Proceedings of the SIGGRAPH Asia 2012 Technical Briefs, pp. 1–4, 2012.
- [32] P. Liu, S. Xu, and S. Lin, “Automatic generation of personalized chinese handwriting characters,” in Proceedings of the 2012 Fourth International Conference on Digital Home, pp. 109–116, IEEE, 2012.
- [33] J.-W. Lin, C.-Y. Hong, R.-I. Chang, Y.-C. Wang, S.-Y. Lin, and J.-M. Ho, “Complete font generation of chinese characters in personal handwriting style,” in Proceedings of the 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), pp. 1–5, IEEE, 2015.
- [34] X. Lin, J. Li, H. Zeng, and R. Ji, “Font generation based on least squares conditional generative adversarial nets,” Multimedia Tools and Applications, vol. 78, no. 1, pp. 783–797, 2019.
- [35] C. Wen, Y. Pan, J. Chang, Y. Zhang, S. Chen, Y. Wang, M. Han, and Q. Tian, “Handwritten chinese font generation with collaborative stroke refinement,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3882–3891, 2021.
- [36] Y. Gao and J. Wu, “Gan-based unpaired chinese character image translation via skeleton transformation and stroke rendering,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 646–653, Apr. 2020.
- [37] S. Yu, J. Zhao, X. Ye, C. Tang, and Y. Zhen, “Calligraphic chinese characters generation based on generative adversarial networks with structural constraint,” Pattern Recognition and Artificial Intelligence, 2021.
- [38] T. Y. Zhang and C. Y. Suen, “A fast parallel algorithm for thinning digital patterns,” Communications of the ACM, vol. 27, no. 3, pp. 236–239, 1984.
- [39] M.-Y. Liu, X. Huang, A. Mallya, T. Karras, T. Aila, J. Lehtinen, and J. Kautz, “Few-shot unsupervised image-to-image translation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10551–10560, 2019.
- [40] H. Tang, H. Liu, D. Xu, P. H. Torr, and N. Sebe, “Attentiongan: Unpaired image-to-image translation using attention-guided generative adversarial networks,” IEEE Transactions on Neural Networks and Learning Systems, 2021.
- [41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the International conference on machine learning, pp. 448–456, PMLR, 2015.
- [42] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2015.
- [43] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
- [44] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” Advances in neural information processing systems, vol. 30, 2017.
- [45] J. Kim, M. Kim, H. Kang, and K. H. Lee, “U-gat-it: Unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation,” in Proceedings of the 2020 International Conference on Learning Representations, 2020.