SGCE-Font: Skeleton Guided Channel Expansion for Chinese Font Generation
Abstract
The automatic generation of Chinese fonts is an important problem involved in many applications. The predominant methods for Chinese font generation are based on deep generative models, especially generative adversarial networks (GANs). However, existing GAN-based methods (say, CycleGAN) for Chinese font generation usually suffer from the mode collapse issue, mainly due to the lack of effective guidance information. This paper proposes a novel information guidance module called the skeleton guided channel expansion (SGCE) module for Chinese font generation, which integrates the skeleton information into the generator via channel expansion, motivated by the observation that the skeleton embodies both local and global structure information of Chinese characters. We conduct extensive experiments to show the effectiveness of the proposed module. Numerical results show that the mode collapse issue suffered by the known CycleGAN can be effectively alleviated by equipping it with the proposed SGCE module, and that CycleGAN equipped with SGCE outperforms state-of-the-art models in terms of four important evaluation metrics and visualization quality. Besides CycleGAN, we also show that the suggested SGCE module can be adapted to other Chinese font generation models as a plug-and-play module to further improve their performance.
Index Terms:
Chinese font generation, generative adversarial networks, mode collapse, skeleton, channel expansion.

I Introduction
The generation of Chinese fonts has attracted considerable attention in recent years due to its wide range of applications [1, 2, 3]. With the development of deep learning, in particular deep generative models such as generative adversarial networks (GANs) [4] and variational auto-encoders (VAEs) [5], methods based on deep generative models have become the mainstream due to their impressive performance [6, 7, 8, 9, 10, 11]. Existing deep generative models for Chinese font generation can be generally divided into two categories: supervised models [12, 13, 7, 14, 15, 16, 17] and unsupervised models [1, 11, 10, 18, 19]. Supervised models are mainly based on paired data (i.e., there is a one-to-one correspondence between characters in the source and target font domains), while unsupervised models are mainly based on unpaired data (without requiring a one-to-one correspondence between the source and target font domains).
In the early work [12] on supervised models, the authors suggested an auto-encoder guided GAN model for Chinese calligraphy synthesis based on paired data, regarding the Chinese font generation problem as an image-to-image translation problem. Motivated by this perspective, many models have been developed for Chinese font generation in recent years [7, 16, 17]. In [7], the authors applied the Pix2Pix model [20] to the generation of Chinese fonts. In [16], an effective GAN model dubbed CalliGAN was proposed for Chinese font generation by exploiting certain component information of Chinese characters. The recent paper [17] proposed a GAN-based image translation model for the synthesis of Chinese brush handwriting fonts by integrating the skeleton information, at the expense of increased model complexity of the deep neural networks. Despite the impressive performance of these supervised models [12, 7, 16, 17], collecting a large amount of paired samples is generally costly, and even unrealistic for some font generation problems such as the generation of ancient calligraphy fonts [21, 22].
To tackle this issue, many unsupervised models based on unpaired data have been suggested in the literature [23, 26, 27, 28]. In the seminal work [23] on unsupervised models, the authors applied the well-known CycleGAN model [29] to the generation of Chinese fonts based on unpaired data. In [30], the authors proposed an effective model using graph matching for calligraphy character generation. In [26], a structure-guided deep generative model dubbed SCFont was suggested for the generation of Chinese fonts by integrating domain knowledge of Chinese characters, such as writing trajectory and skeleton, into the generation. [28] proposed a novel deformable generative model called DG-Font for Chinese font generation based on unpaired data. Although the effectiveness of these unsupervised models has been demonstrated in the literature, such models (say, CycleGAN [23]) usually suffer from the well-known mode collapse issue [4], which results in poor performance, mainly due to the lack of effective guidance information. As shown in the second row of Figure 1, mode collapse happens for CycleGAN in three concerned generation tasks, and in this case, CycleGAN fails to yield the correct Chinese characters.


To reduce the mode collapse, several effective Chinese font generation models have recently been suggested in the literature [25, 24] by integrating certain kinds of component information of Chinese characters. In [25], the authors incorporated the stroke information into CycleGAN to reduce the mode collapse (dubbed StrokeGAN), where a one-bit stroke encoding was introduced to represent the stroke information, as shown in the second row of Figure 2. In [24], the authors proposed a square-block geometric transformation based self-supervised scheme for CycleGAN to capture certain spatial structure information of Chinese characters, inspired by the observation that character structures are closely related to these square-block transformations. As shown in the third row of Figure 2, these four square-block transformations were used in [24] to represent the single-component, left-right, up-bottom, and semi-encircling structures, respectively. It can be observed from the third and fourth rows of Figure 1 that both the one-bit stroke encoding and the square-block transformations are generally insufficient to thoroughly address the mode collapse issue, since some Chinese characters share both the same one-bit stroke encoding and the same character structure, as shown in Figure 2. Moreover, the guidance information used in both StrokeGAN [25] and SQ-GAN [24] was imposed on the discriminator, leading to only implicit guidance during the generation.
In this paper, we focus on developing an effective information guidance scheme for Chinese font generation. In particular, we aim to answer the following two questions:
- (Q1) What kind of guidance information is more effective for Chinese font generation?
- (Q2) How can the guidance information be used more effectively for Chinese font generation?
For (Q1), we suggest using the skeletons of Chinese characters as the guidance information for Chinese font generation, motivated by the observation that the skeleton embodies both local and global structure information of Chinese characters as well as their semantic information, as shown in Figure 2. From Figure 2, the stroke encodings used in StrokeGAN [25] and the square-block transformations used in SQ-GAN [24] can only reflect either local or global structure information of Chinese characters, and in particular cannot provide a unique representation of a Chinese character, while the skeleton provides both local and global structure information as well as a unique representation of a Chinese character. Thus, the skeleton should be more effective than both the stroke encoding and the square-block transformation for Chinese font generation. For (Q2), we directly use the skeleton information as part of the input of the generator via channel expansion, instead of imposing it on the discriminator. Thus, the skeleton should provide more information and stronger guidance for the generator than existing implicit ways that impose guidance information on the discriminator.


The major contributions of this paper can be summarized as follows:
- (1) This paper proposes a novel information guidance module called the skeleton guided channel expansion (SGCE) module for Chinese font generation, which integrates the skeleton information into the generator via channel expansion. The proposed module provides both local and global structure information as guidance for Chinese font generation without adding any model complexity. Thus, it can effectively alleviate the mode collapse, improve the quality of the generated Chinese characters, and be used as a plug-and-play module for Chinese font generation models.
- (2) Extensive experiments are conducted to demonstrate the effectiveness of the proposed module, in particular its superiority in reducing mode collapse and enhancing the generation performance. The superior performance of the proposed SGCE-Font model (i.e., the known CycleGAN model equipped with the proposed SGCE module) is also shown by numerous experiments with quantitative comparisons against state-of-the-art models. Some generated characters of the proposed model are presented in Figure 3. It can be observed that the proposed model can generate very realistic Chinese characters over these ten concerned Chinese fonts.
- (3) Besides CycleGAN, we also show that the proposed module can be easily adapted to other existing Chinese font generation models and further enhance their performance.
The rest of this paper is organized as follows. In Section II, we present some related work. In Section III, we describe the proposed module in detail. In Section IV, we provide extensive experiments to demonstrate the effectiveness of the proposed module. We conclude this paper in the final section.
II Related Work
As shown in Figure 2, the components of Chinese characters, such as strokes and skeletons, directly reflect certain structure information of Chinese characters. Motivated by this, many kinds of component information of Chinese characters have been widely used in the literature [31, 32, 33, 34, 35, 17, 26, 36, 11, 37] to improve the generation performance. In the early stage, traditional models [31, 32, 33] first decompose Chinese characters into several basic components, such as strokes and radicals, and then assemble them to yield new Chinese characters by leveraging some effective machine learning methods. It should be pointed out that the decomposition into strokes or radicals is generally hand-crafted and thus very costly.
To address this issue, several end-to-end models based on deep learning have been introduced to extract these important components of Chinese characters and incorporate them into the generation procedure [34, 35, 17, 26, 36, 11, 25, 24]. Specifically, [34] first adopted a coherent point drift algorithm to divide Chinese characters into strokes, then produced new font strokes by fusing the styles of two existing font strokes, and further yielded new fonts by assembling them. [35] suggested a stroke refinement branch that particularly focuses on the generation of thin strokes. [25] introduced a simple one-bit stroke encoding to represent the stroke information and incorporated it into the CycleGAN model as supervision information. By the use of such stroke encoding, the mode collapse of CycleGAN can be remarkably reduced [25]. In the recent paper [24], the authors proposed a square-block transformation based self-supervised scheme for the generation of Chinese characters, inspired by the fact that structures of Chinese characters are closely related to the square-block transformations. In contrast to [34, 35, 25, 24], this paper exploits the skeleton instead of the stroke, which captures only local structure information [34, 35, 25], or the square-block transformation, which captures only global structure information [24], mainly motivated by the observation that skeletons can provide both local and global structure information of Chinese characters.
Besides the stroke and character structure information, several works based on skeleton-like information have been studied in the literature [26, 36, 37, 17, 11]. [26] introduced two neural network modules to extract the strokes and writing trajectories of Chinese characters respectively, and then incorporated them into the generation of Chinese fonts. [36] proposed a three-stage GAN model including skeleton extraction, skeleton transformation and stroke rendering for multi-chirography image translation. [11] proposed a font fusion network based on GAN and disentangled representation learning to generate brand new fonts, where the disentangled representation learning was introduced to obtain the stroke style and skeleton shape. [37] incorporated the skeleton of Chinese characters into the generator and utilized it as a structural constraint to supervise the model, and a similar idea was also used in [17] for the generation of brush handwriting Chinese fonts. It should be pointed out that both [37] and [17] are based on paired data and introduce auxiliary network modules to extract skeletons, while this paper is based on unpaired data and uses simple rule-based verification conditions to extract the skeleton.
In contrast to previous studies, which increase the model complexity in order to exploit these components of Chinese characters for the generation tasks, we aim to provide much simpler and more effective supervision that lets a relatively simple GAN model achieve its full potential. As discussed before, the proposed skeleton guided channel expansion module is very simple yet effective in providing local and global supervision for GAN models to alleviate the mode collapse issue and enhance the performance, while introducing no additional model complexity. The skeleton information used in this paper provides richer spatial structure and semantic information of Chinese characters, and the suggested channel expansion way of exploiting the skeleton information provides stronger guidance for the generator. Moreover, the proposed module can be easily adapted to many existing GAN models to improve their performance.
III SGCE for Chinese Font Generation
In this section, we first describe the proposed skeleton guided channel expansion (SGCE) module as an effective information guidance module for Chinese font generation, and then take the well-known CycleGAN [23] as an example to show how to integrate the proposed module into existing Chinese font generation models.
III-A Skeleton Guided Channel Expansion Module
As discussed before, what kind of information is used as guidance and how it is used are both very important for Chinese font generation. Motivated by the superiority of the skeleton in representing the global and local structure information of Chinese characters and the advantage of channel expansion in directly exploiting this information, we propose a skeleton guided channel expansion module as an information guidance module for Chinese font generation, as depicted in Figure 4. From Figure 4, given a character image as the input, we first extract its single-channel skeleton via a skeletonization strategy (denoted Ske for short and described later), and then combine the single-channel skeleton information with the original (R, G, B) three-channel character image to yield a four-channel image as the input of the generator; this is the channel expansion.
Next, we describe the skeletonization strategy used in this paper to extract the skeleton of a Chinese character, inspired by [38]. Given an image of a Chinese character, we first binarize it and then extract the skeleton from the binarized image. Specifically, for a given pixel $P_1$ of the binarized image, we define a $3\times 3$ image patch with $P_1$ as the center, and number the other eight pixels $P_2,\ldots,P_9$ according to Table I. For pixels lying on the edge of the binarized image, we obtain the associated image patches via zero-padding. For the image patch defined in Table I, let $p_i$ be the value of $P_i$ ($i=2,\ldots,9$), $B(P_1)$ be the number of non-zero points in this image patch, and $A(P_1)$ be the number of $0\rightarrow 1$ patterns when traversing $P_2,P_3,\ldots,P_9$ in sequence; then we delete $P_1$ if the following hold:
- (a) $2\le B(P_1)\le 6$ and $A(P_1)=1$; and either
- (b) $p_2\,p_4\,p_6=0$ and $p_4\,p_6\,p_8=0$; or
- (c) $p_2\,p_4\,p_8=0$ and $p_2\,p_6\,p_8=0$.
We eventually yield the skeleton of a Chinese character by applying the above operation to all pixels.
According to the above description, the proposed SGCE module does not require any additional neural network to implement. This is very different from existing works [26, 36, 37, 17, 11], which increase the model complexity in order to exploit component information (say, the skeleton) for Chinese font generation.
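To make the module concrete, below is a minimal Python sketch of the SGCE operation. It uses scikit-image's `skeletonize` (a thinning routine in the spirit of the Zhang–Suen algorithm [38]) in place of a hand-written implementation of the rules above; the binarization threshold and the function name are illustrative assumptions, not part of the paper.

```python
import numpy as np
from skimage.morphology import skeletonize

def sgce(img_rgb: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Skeleton Guided Channel Expansion: (H, W, 3) uint8 glyph -> (H, W, 4).

    The fourth channel holds the character skeleton obtained by thinning
    the binarized glyph.
    """
    # Binarize: Chinese characters are dark strokes on a light background,
    # so the foreground consists of pixels darker than the threshold.
    gray = img_rgb.mean(axis=2)
    binary = gray < threshold

    # Thin the binary glyph to a one-pixel-wide skeleton.
    skeleton = skeletonize(binary)                      # boolean (H, W)

    # Channel expansion: stack the skeleton as a fourth channel.
    ske_channel = (skeleton.astype(np.uint8) * 255)[..., None]
    return np.concatenate([img_rgb, ske_channel], axis=2)
```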
III-B Proposed SGCE-Font Model
In the following, we describe the proposed model for Chinese font generation obtained by integrating the suggested SGCE module, called SGCE-Font, taking the well-known CycleGAN [23] as the base model. The specific model architecture is presented in Figure 5. As depicted in Figure 5, the proposed SGCE-Font is very similar to the existing model (say, CycleGAN), except that the input of the generator is replaced by the four-channel skeleton-image information yielded by the suggested SGCE module instead of the original (R, G, B) three-channel image information. The specific workflow of the proposed model is as follows. Let $X$ and $Y$ denote the source and target font domains, $G_{XY}$ and $G_{YX}$ the generators realizing the font translation from $X$ to $Y$ and from $Y$ to $X$ respectively, and $D_{X}$ and $D_{Y}$ the associated discriminators. Given a Chinese character image $x$ with (R, G, B) three channels in the source font domain, we first feed it into the SGCE module to yield the four-channel input $\tilde{x}=\mathrm{SGCE}(x)$ enriched by the skeleton information, and then input $\tilde{x}$ into the generator $G_{XY}$ to realize the style translation from the source font style to the target font style, yielding the generated character $G_{XY}(\tilde{x})$ in the target font domain. On one hand, we feed the generated characters together with the real Chinese characters in the target font domain into the discriminator $D_{Y}$ to distinguish whether they are real or fake; on the other hand, we feed the generated character into the SGCE module again and input the enriched information into the other generator $G_{YX}$ (realizing the font generation task from the target font domain to the source font domain), finally yielding a reconstructed character in the source font domain. The translation from $Y$ to $X$ proceeds symmetrically with $\tilde{y}=\mathrm{SGCE}(y)$. According to the above workflow, the training loss of the proposed model consists of four parts: two adversarial losses $\mathcal{L}_{adv}^{Y}$ and $\mathcal{L}_{adv}^{X}$ related to the pairs $(G_{XY},D_{Y})$ and $(G_{YX},D_{X})$ respectively, the cycle consistency loss $\mathcal{L}_{cyc}$ between the original Chinese character and its reconstructed character in the source font domain, and the skeleton consistency loss $\mathcal{L}_{ske}$ between the skeleton of the input and the skeleton of the reconstructed Chinese character in the source font domain, where the skeleton consistency loss is imposed to further supervise the generation procedure. Specifically, these losses are defined as follows:
$$\mathcal{L}_{adv}^{Y}=\mathbb{E}_{y\sim Y}\left[\log D_{Y}(y)\right]+\mathbb{E}_{x\sim X}\left[\log\left(1-D_{Y}(G_{XY}(\tilde{x}))\right)\right],$$
$$\mathcal{L}_{adv}^{X}=\mathbb{E}_{x\sim X}\left[\log D_{X}(x)\right]+\mathbb{E}_{y\sim Y}\left[\log\left(1-D_{X}(G_{YX}(\tilde{y}))\right)\right],$$
$$\mathcal{L}_{cyc}=\mathbb{E}_{x\sim X}\left[\left\|\hat{x}-x\right\|_{1}\right],\qquad \mathcal{L}_{ske}=\mathbb{E}_{x\sim X}\left[\left\|\mathrm{Ske}(\hat{x})-\mathrm{Ske}(x)\right\|_{1}\right],$$
where $X$ and $Y$ respectively represent the sets of real Chinese characters in the source and target font domains, $\hat{x}=G_{YX}(\mathrm{SGCE}(G_{XY}(\tilde{x})))$ represents the reconstructed Chinese character in the source font domain yielded by $G_{YX}$, and $G_{XY}(\tilde{x})$ represents the generated character in the target font domain yielded by $G_{XY}$. Thus, the training model of SGCE-Font is given as follows:
$$\min_{G_{XY},G_{YX}}\max_{D_{X},D_{Y}}\ \mathcal{L}_{adv}^{Y}+\mathcal{L}_{adv}^{X}+\lambda_{cyc}\mathcal{L}_{cyc}+\lambda_{ske}\mathcal{L}_{ske},$$
where $\lambda_{cyc}$ and $\lambda_{ske}$ are two tunable hyperparameters.
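For concreteness, the sketch below (a minimal PyTorch illustration, not the authors' released code) shows how the generator-side objective for one translation direction could be assembled. `G_xy`, `G_yx`, `D_y`, `sgce` and `ske` are hypothetical callables standing in for the two generators, the target-domain discriminator, the SGCE module and the skeletonization operator; the least-squares adversarial term is the variant used by many public CycleGAN implementations and stands in for the log-form loss above.

```python
import torch
import torch.nn.functional as F

def generator_loss_x2y(G_xy, G_yx, D_y, sgce, ske, x,
                       lambda_cyc=1.0, lambda_ske=0.001):
    """Generator-side objective for the X -> Y -> X direction of SGCE-Font.

    sgce(img): appends the skeleton channel to a 3-channel image (channel expansion).
    ske(img):  returns the single-channel skeleton of an image.
    """
    fake_y = G_xy(sgce(x))          # generated character in the target font domain
    rec_x = G_yx(sgce(fake_y))      # reconstructed character in the source font domain

    # Adversarial term: push D_y to classify the generated character as real
    # (least-squares variant commonly used in CycleGAN implementations).
    pred = D_y(fake_y)
    adv = F.mse_loss(pred, torch.ones_like(pred))

    # Cycle consistency between the input character and its reconstruction.
    cyc = F.l1_loss(rec_x, x)

    # Skeleton consistency between the skeletons of the input and the reconstruction.
    ske_cons = F.l1_loss(ske(rec_x), ske(x))

    return adv + lambda_cyc * cyc + lambda_ske * ske_cons
```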
Font | ht | fs | zk | hw | hp | st | ls | hc | zx | ss |
---|---|---|---|---|---|---|---|---|---|---|
Size | 3755 | 3755 | 2811 | 3755 | 2811 | 2811 | 2811 | 2811 | 920 | 166 |
Metric | Model | zk→fs | zk→st | fs→ht | zk→hp | zk→hw | fs→ls | ls→hc | fs→zk | st→zk | ht→fs | hp→zk | hw→zk | ls→fs | hc→ls | avg
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
FID | CycleGAN | 25.80 | 79.33 | 304.34 | 43.91 | 31.54 | 282.81 | 33.02 | 32.56 | 47.19 | 248.63 | 218.18 | 27.10 | 136.13 | 30.17 | 110.05 |
SQ-GAN | 49.87 | 71.69 | 38.90 | 57.44 | 49.53 | 29.08 | 71.22 | 99.04 | 146.29 | 37.55 | 305.93 | 24.24 | 56.90 | 50.45 | 77.72 | |
StrokeGAN | 57.85 | 71.30 | 28.97 | 92.86 | 37.78 | 28.63 | 34.60 | 67.51 | 71.30 | 39.61 | 140.23 | 23.80 | 27.35 | 26.74 | 53.47 | |
SkeGAN | 36.58 | 58.75 | 40.27 | 26.90 | 28.91 | 32.32 | 46.09 | 33.58 | 33.12 | 59.52 | 27.59 | 48.29 | 49.84 | 72.97 | 42.48 | |
SGCE-Font | 34.68 | 40.35 | 20.66 | 34.60 | 53.57 | 20.79 | 30.29 | 25.55 | 31.91 | 23.73 | 26.74 | 23.03 | 35.70 | 30.70 | 30.88 | |
MSE | CycleGAN | 0.117 | 0.214 | 0.197 | 0.212 | 0.097 | 0.187 | 0.155 | 0.135 | 0.121 | 0.166 | 0.177 | 0.201 | 0.193 | 0.148 | 0.166 |
SQ-GAN | 0.147 | 0.199 | 0.186 | 0.292 | 0.106 | 0.162 | 0.176 | 0.152 | 0.158 | 0.143 | 0.161 | 0.208 | 0.153 | 0.206 | 0.175 | |
StrokeGAN | 0.120 | 0.158 | 0.188 | 0.223 | 0.104 | 0.090 | 0.175 | 0.104 | 0.125 | 0.129 | 0.181 | 0.207 | 0.116 | 0.112 | 0.145 | |
SkeGAN | 0.112 | 0.198 | 0.106 | 0.187 | 0.092 | 0.088 | 0.150 | 0.133 | 0.128 | 0.116 | 0.121 | 0.191 | 0.111 | 0.146 | 0.134 | |
SGCE-Font | 0.111 | 0.158 | 0.117 | 0.175 | 0.090 | 0.094 | 0.163 | 0.087 | 0.122 | 0.117 | 0.112 | 0.188 | 0.123 | 0.144 | 0.129 | |
PSNR | CycleGAN | 9.51 | 7.26 | 7.07 | 6.79 | 10.21 | 7.287 | 8.16 | 8.75 | 8.32 | 7.80 | 7.55 | 7.02 | 7.17 | 5.65 | 7.75 |
SQ-GAN | 9.82 | 7.06 | 7.82 | 6.98 | 9.79 | 9.05 | 7.72 | 9.37 | 8.05 | 8.63 | 7.40 | 6.85 | 8.99 | 8.36 | 8.28 | |
StrokeGAN | 9.36 | 6.77 | 7.32 | 6.35 | 9.91 | 10.66 | 7.72 | 10.00 | 6.77 | 9.04 | 8.38 | 6.88 | 9.55 | 9.68 | 8.46 | |
SkeGAN | 9.70 | 7.08 | 9.98 | 7.37 | 10.46 | 10.81 | 8.30 | 8.83 | 9.06 | 9.47 | 9.29 | 7.25 | 9.73 | 8.42 | 8.98 | |
SGCE-Font | 9.73 | 8.11 | 9.55 | 7.72 | 10.55 | 10.47 | 7.95 | 10.85 | 9.32 | 9.69 | 9.68 | 7.35 | 9.28 | 8.49 | 9.20 | |
SSIM | CycleGAN | 0.568 | 0.495 | 0.346 | 0.386 | 0.330 | 0.461 | 0.519 | 0.502 | 0.473 | 0.146 | 0.413 | 0.459 | 0.433 | 0.008 | 0.396 |
SQ-GAN | 0.570 | 0.448 | 0.478 | 0.354 | 0.293 | 0.595 | 0.499 | 0.530 | 0.428 | 0.485 | 0.359 | 0.444 | 0.505 | 0.543 | 0.466 | |
StrokeGAN | 0.553 | 0.464 | 0.462 | 0.300 | 0.301 | 0.660 | 0.452 | 0.574 | 0.464 | 0.533 | 0.448 | 0.447 | 0.547 | 0.566 | 0.484 | |
SkeGAN | 0.579 | 0.465 | 0.605 | 0.428 | 0.341 | 0.657 | 0.535 | 0.506 | 0.527 | 0.526 | 0.543 | 0.460 | 0.538 | 0.553 | 0.519 | |
SGCE-Font | 0.582 | 0.534 | 0.613 | 0.452 | 0.349 | 0.664 | 0.539 | 0.632 | 0.542 | 0.579 | 0.566 | 0.476 | 0.549 | 0.578 | 0.547 |
IV Experiments
In this section, we conduct a series of experiments to show the effectiveness of the proposed SGCE module for Chinese font generation. All experiments were carried out in a PyTorch environment running on Linux with an AMD Ryzen 7 5800X 8-core CPU and a GeForce RTX 3090 GPU.

IV-A Experimental Settings
A. Datasets. We considered ten datasets with different fonts, including three standard printing fonts {Heiti (ht), Fangsong (fs), Zhengkai (zk)}, the handwriting (hw) font, three pseudo-handwriting fonts {Hupo (hp), Shuti (st), Lishu (ls)}, and three calligraphy fonts {Huangcao (hc), Zhuxi (zx), SuShi (ss)}. The specific sizes of the datasets are presented in Table II and some samples are shown in the left part of Figure 3. The handwriting dataset was randomly collected from the CASIA-HWDB1.1 dataset, and the other font datasets were collected by us from the Internet. The sizes of the last two calligraphy font datasets, i.e., Zhuxi and SuShi, are very small; we use them mainly to show the performance of the proposed model on the few-shot font generation task. For each sample, we resized the character image to 128×128×3. In all experiments, we used 80% and 20% of the samples as the training and test sets, respectively.
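As a small illustration of this preprocessing (a sketch under an assumed per-font folder of PNG glyphs, not the authors' pipeline), each character image is resized to 128×128 RGB and the samples are split 80/20 into training and test sets:

```python
import glob
import random
from PIL import Image

def load_font_dataset(folder, size=(128, 128), train_ratio=0.8, seed=0):
    """Load one font's character images, resize to 128x128 RGB, split 80/20."""
    paths = sorted(glob.glob(f"{folder}/*.png"))   # hypothetical per-character PNG files
    images = [Image.open(p).convert("RGB").resize(size) for p in paths]
    random.Random(seed).shuffle(images)
    split = int(train_ratio * len(images))
    return images[:split], images[split:]
```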
B. Baselines. In this paper, we considered the following six state-of-the-art models as baselines.
- CycleGAN [23]: A typical GAN-based model using unpaired data.
- SQ-GAN [24]: An improved CycleGAN model equipped with a square-block geometric transformation based self-supervised learning scheme.
- StrokeGAN [25]: A refined CycleGAN model incorporating the one-bit stroke encoding for Chinese font generation to mitigate mode collapse.
- UGATIT [45]: An effective GAN model using an unsupervised generative attentional network with adaptive layer-instance normalization, trained on unpaired data.
- FUNIT [39]: An effective GAN model based on disentangled representation learning and unpaired data.
- AttentionGAN [40]: An effective GAN model based on the attention mechanism and unpaired data.
Besides the above baselines, we also suggest another model called SkeGAN as a baseline for a better comparison with the proposed model. In SkeGAN, we impose the skeleton information on the discriminator in a way similar to StrokeGAN [25].
C. Network structures. The network structure of the generator of SGCE-Font is almost the same as that of CycleGAN [29], including 2 convolutional layers in the down-sampling module, 9 residual blocks (each consisting of 2 convolutional layers), and 2 deconvolutional layers in the up-sampling module, except that the input of the generator is modified from 3 channels to 4 channels. The network structure of the discriminator of SGCE-Font is the same as that of CycleGAN [29], with 6 hidden convolutional layers and 2 convolutional layers in the output module. Moreover, batch normalization [41] was used in all layers.
We used the popular Adam algorithm [42] as the optimizer with momentum parameters (0.5, 0.999) in both the generator and discriminator optimization subproblems. The hyperparameters of the cycle consistency loss and the skeleton consistency loss were empirically set to 1 and 0.001, respectively.
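The following PyTorch sketch illustrates the only architectural change relative to CycleGAN, namely the 4-channel input to the generator, together with the Adam setup with momentum parameters (0.5, 0.999). The layer widths, kernel sizes and learning rate are illustrative assumptions rather than values reported here.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with 2 convolutional layers, as in the CycleGAN generator."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

def make_generator(in_channels=4):   # 4 = (R, G, B) + skeleton channel from SGCE
    layers = [nn.Conv2d(in_channels, 64, 7, padding=3), nn.BatchNorm2d(64), nn.ReLU(True),
              # 2 down-sampling convolutional layers
              nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(True),
              nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(True)]
    layers += [ResBlock(256) for _ in range(9)]          # 9 residual blocks
    layers += [  # 2 deconvolutional (up-sampling) layers followed by the output layer
        nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
        nn.BatchNorm2d(128), nn.ReLU(True),
        nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
        nn.BatchNorm2d(64), nn.ReLU(True),
        nn.Conv2d(64, 3, 7, padding=3), nn.Tanh()]
    return nn.Sequential(*layers)

G_xy = make_generator()
opt_G = torch.optim.Adam(G_xy.parameters(), lr=2e-4, betas=(0.5, 0.999))
```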
D. Evaluation metrics. We used four important evaluation metrics to measure the performance. The first two are the commonly used SSIM (Structural Similarity) [43] and PSNR (Peak Signal-to-Noise Ratio), which measure how well pixel-level details are preserved. The third is the well-known MSE (Mean Squared Error). The last is the FID (Fréchet Inception Distance) [44], which measures how well the generated results of the model match the distribution of the real data. Larger SSIM and PSNR values, and smaller MSE and FID values, generally imply better generation performance.
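As a reference for how the per-image metrics could be computed (a sketch assuming a recent scikit-image; FID is typically computed over whole sets of images, e.g. with the pytorch-fid package):

```python
import numpy as np
from skimage.metrics import (mean_squared_error, peak_signal_noise_ratio,
                             structural_similarity)

def pixel_metrics(generated: np.ndarray, target: np.ndarray):
    """SSIM, PSNR and MSE between a generated character image and its ground truth.

    Both inputs are uint8 arrays of shape (H, W, 3).
    """
    ssim = structural_similarity(generated, target, channel_axis=2)
    psnr = peak_signal_noise_ratio(target, generated)
    mse = mean_squared_error(target, generated)
    return ssim, psnr, mse
```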
Metric | Model | zk→fs | zk→st | fs→ht | zk→hp | zk→hw | fs→ls | fs→zk | st→zk | ht→fs | hp→zk | hw→zk | ls→fs | avg
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
FID | FUNIT | 37.50 | 43.67 | 64.99 | 55.28 | 75.55 | 30.84 | 151.73 | 207.22 | 40.68 | 254.40 | 68.82 | 63.33 | 91.17 |
UGATIT | 42.82 | 118.10 | 37.92 | 64.37 | 77.10 | 30.32 | 34.03 | 128.10 | 28.32 | 76.55 | 29.01 | 23.00 | 58.11 | |
AttentionGAN | 32.32 | 92.03 | 32.80 | 25.20 | 26.86 | 17.77 | 57.46 | 51.84 | 37.74 | 55.69 | 25.01 | 34.33 | 40.76 | |
SGCE-Font | 34.68 | 40.35 | 20.66 | 34.60 | 53.57 | 20.79 | 25.55 | 31.91 | 23.73 | 26.73 | 23.03 | 35.70 | 30.94 | |
MSE | FUNIT | 0.127 | 0.224 | 0.212 | 0.331 | 0.105 | 0.170 | 0.138 | 0.180 | 0.178 | 0.188 | 0.216 | 0.155 | 0.185 |
UGATIT | 0.113 | 0.151 | 0.170 | 0.263 | 0.129 | 0.119 | 0.090 | 0.127 | 0.140 | 0.144 | 0.224 | 0.133 | 0.150 | |
AttentionGAN | 0.109 | 0.197 | 0.170 | 0.162 | 0.097 | 0.087 | 0.095 | 0.151 | 0.125 | 0.143 | 0.191 | 0.126 | 0.138 | |
SGCE-Font | 0.111 | 0.158 | 0.117 | 0.175 | 0.090 | 0.094 | 0.087 | 0.122 | 0.117 | 0.112 | 0.188 | 0.123 | 0.125 | |
PSNR | FUNIT | 9.07 | 6.52 | 6.77 | 4.83 | 9.86 | 7.75 | 8.66 | 7.48 | 7.52 | 7.28 | 6.68 | 8.16 | 7.56 |
UGATIT | 9.63 | 8.34 | 7.80 | 5.87 | 9.34 | 8.83 | 10.65 | 9.10 | 8.66 | 8.48 | 6.55 | 9.43 | 8.52 | |
AttentionGAN | 9.83 | 7.08 | 7.75 | 8.05 | 10.24 | 10.86 | 10.40 | 8.27 | 9.15 | 8.47 | 7.26 | 9.09 | 8.87 | |
SGCE-Font | 9.73 | 8.11 | 9.55 | 7.71 | 10.55 | 10.47 | 10.85 | 9.32 | 9.69 | 9.68 | 7.35 | 9.28 | 9.36 | |
SSIM | FUNIT | 0.525 | 0.426 | 0.388 | 0.241 | 0.295 | 0.511 | 0.485 | 0.412 | 0.389 | 0.386 | 0.373 | 0.460 | 0.408 |
UGATIT | 0.549 | 0.528 | 0.549 | 0.341 | 0.271 | 0.642 | 0.589 | 0.486 | 0.549 | 0.451 | 0.400 | 0.607 | 0.497 | |
AttentionGAN | 0.580 | 0.451 | 0.463 | 0.440 | 0.332 | 0.665 | 0.588 | 0.433 | 0.507 | 0.452 | 0.462 | 0.505 | 0.490 | |
SGCE-Font | 0.582 | 0.534 | 0.613 | 0.452 | 0.349 | 0.664 | 0.632 | 0.542 | 0.579 | 0.566 | 0.476 | 0.549 | 0.545 |
IV-B Superiority of Proposed SGCE Module
In this section, we conducted extensive experiments over fourteen generation tasks to demonstrate the superiority of the proposed SGCE module for Chinese font generation. The quantitative comparisons between the proposed SGCE-Font and the baselines CycleGAN [23], SQ-GAN [24], StrokeGAN [25] and SkeGAN are presented in Table III, and some visual comparisons of the generated characters are presented in Figure 1 and Figure 6.
From Table III, the proposed SGCE-Font model outperforms the concerned baselines on average in terms of all four evaluation metrics. Specifically, in terms of FID, the proposed SGCE-Font achieves the best results in nine font generation tasks and the second-best results in three font generation tasks, and is significantly better than the baselines. Similar conclusions can be drawn in terms of the other three evaluation metrics.
When comparing the performance of different models, the refined CycleGAN models using different guidance information, such as the square-block transformation in SQ-GAN [24], the stroke encoding in StrokeGAN [25] and the skeleton in SkeGAN, outperform the original CycleGAN model for Chinese font generation. Among these models, the performance of SkeGAN using the skeleton information is significantly better on average than that of SQ-GAN and StrokeGAN in terms of the four evaluation metrics. This shows clearly that the skeleton information is much better than the square-block transformation and the stroke encoding as guidance information for Chinese font generation. Moreover, when comparing the performance of SkeGAN and the proposed SGCE-Font, it can be observed that the proposed SGCE-Font outperforms SkeGAN in most cases. This shows that the suggested way of using the skeleton information via channel expansion, as the direct input of the generator, is much better than imposing the skeleton information on the discriminator as in SkeGAN. These claims can also be verified by the visualization results depicted in Figure 6. From Figure 6, the quality of the generated characters of the proposed SGCE-Font is better than that of SkeGAN, while the performance of SkeGAN is much better than that of StrokeGAN, SQ-GAN and CycleGAN.
In particular, as shown in Figure 1, the well-known CycleGAN suffers from the mode collapse issue when applied to three font generation tasks, i.e., {Hupo→Zhengkai, Fangsong→Heiti, Fangsong→Lishu}. These mode collapse phenomena of CycleGAN can also be observed from Table III, where the FID values of CycleGAN in these three font generation tasks are 218.18, 304.34 and 282.81, which are abnormally large. Although the existing square-block transformation and stroke encoding based guidance schemes can reduce the mode collapse to some extent, as shown in the third and fourth rows of Figure 1, SQ-GAN [24] still suffers from mode collapse in the font generation task {Hupo→Zhengkai}, and there are stroke-missing or stroke-redundancy phenomena in the generated characters of StrokeGAN. In contrast, as shown in the fifth row of Figure 1, the mode collapse issue of CycleGAN can be significantly alleviated by the suggested SGCE module, and the quality of the generated characters of the proposed SGCE-Font is much better than that of the other three baselines. These results show clearly the superiority of the suggested SGCE module in reducing mode collapse.



Metric | Model | hc→ls | ls→hc | zk→ss | zk→zx | avg
---|---|---|---|---|---|---|
FID | CycleGAN | 30.173 | 33.015 | 204.112 | 115.039 | 95.585 |
SQ-GAN | 50.451 | 71.220 | 295.611 | 87.388 | 126.168 | |
FUNIT | 54.917 | 55.576 | 213.107 | 81.275 | 101.219 | |
UGATIT | 52.689 | 48.358 | 225.062 | 224.546 | 137.664 | |
AttentionGAN | 82.824 | 83.822 | 228.433 | 92.930 | 122.002 | |
SGCE-Font | 30.695 | 30.291 | 185.330 | 70.030 | 79.087 | |
MSE | CycleGAN | 0.148 | 0.155 | 0.219 | 0.097 | 0.154 |
SQ-GAN | 0.206 | 0.176 | 0.249 | 0.095 | 0.181 | |
FUNIT | 0.170 | 0.180 | 0.234 | 0.095 | 0.170 | |
UGATIT | 0.160 | 0.198 | 0.211 | 0.179 | 0.187 | |
AttentionGAN | 0.205 | 0.203 | 0.232 | 0.094 | 0.184 | |
SGCE-Font | 0.144 | 0.163 | 0.212 | 0.094 | 0.153 | |
PSNR | CycleGAN | 5.648 | 8.161 | 6.668 | 10.239 | 7.679 |
SQ-GAN | 8.364 | 7.721 | 6.095 | 10.412 | 8.148 | |
FUNIT | 7.730 | 7.480 | 6.356 | 10.361 | 7.982 | |
UGATIT | 8.068 | 7.075 | 6.774 | 7.543 | 7.365 | |
AttentionGAN | 6.911 | 6.944 | 6.380 | 10.415 | 7.662 | |
SGCE-Font | 8.489 | 7.953 | 6.934 | 10.456 | 8.458 | |
SSIM | CycleGAN | 0.008 | 0.519 | 0.456 | 0.683 | 0.416 |
SQ-GAN | 0.543 | 0.499 | 0.544 | 0.685 | 0.568 | |
FUNIT | 0.533 | 0.519 | 0.430 | 0.684 | 0.542 | |
UGATIT | 0.531 | 0.458 | 0.426 | 0.498 | 0.478 | |
AttentionGAN | 0.445 | 0.445 | 0.429 | 0.684 | 0.501 | |
SGCE-Font | 0.578 | 0.539 | 0.456 | 0.691 | 0.566 |
IV-C Comparison with State-of-the-art Models
In this section, we conducted twelve printing or handwriting font generation tasks to demonstrate the effectiveness of the proposed model by comparing it with the state-of-the-art models FUNIT [39], UGATIT [45], and AttentionGAN [40]. The comparison results between the proposed model and these existing models for the twelve standard character generation tasks are presented in Table IV.
From Table IV, the proposed SGCE-Font achieves the best performance in seven of these twelve font generation tasks and the second-best performance in four of them in terms of FID, as presented in the fifth row of Table IV. The average FID value of the proposed model over these twelve font generation tasks is much better than those of the state-of-the-art models, as shown in the last column of Table IV. Similar conclusions can be drawn from Table IV in terms of the other three evaluation metrics.
Some generated characters of the proposed model as well as the three state-of-the-art models for some Chinese characters in the test set are shown in Figure 7. It can be observed from this figure that SGCE-Font produces Chinese characters of higher quality than the concerned existing models. Specifically, FUNIT [39] does not perform very well on the generation tasks {Fangsong→Zhengkai, Zhengkai→Hupo}; that is, the styles of the generated fonts seem closer to the source font styles than to the target font styles, as shown in the second row of Figure 7. Besides, there are some flaws (say, missing strokes) in the generated characters of the existing models for some font generation tasks such as {Zhengkai→Handwriting} and {Zhengkai→Shuti}, while these flaws can be remarkably reduced by the suggested SGCE-Font model. These results demonstrate the effectiveness of the proposed model.
We also present some generated characters of the proposed model and the state-of-the-art models for some unseen Chinese characters, i.e., characters not included in the concerned datasets, in Figure 8. It can be observed that the proposed SGCE-Font model generates more realistic Chinese characters than the existing models for these unseen characters. This shows that the proposed model has better generalization performance.
IV-D Comparison in Calligraphy Font Generation
Besides the previous twelve printing or handwriting font generation tasks, we particularly considered four calligraphy font generation tasks to show the effectiveness of the proposed model. As presented in Table II, the total sizes of the Zhuxi (zx) and SuShi (ss) datasets are only 920 and 166, which are much smaller than those of the printing or handwriting font datasets. The quantitative comparison results over these four font generation tasks are presented in Table V.
Model | Metric | Variant | zk→fs | zk→st | fs→ht | zk→hw | fs→zk | st→zk | ht→fs | hp→zk | hc→ls | zk→ss | zk→zx
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UGATIT | FID | Orig | 42.824 | 118.105 | 37.918 | 77.100 | 34.032 | 128.105 | 28.324 | 76.551 | 48.358 | 225.062 | 224.546
+SGCE | 29.516 | 33.060 | 31.177 | 71.696 | 32.290 | 48.030 | 25.291 | 67.203 | 37.943 | 103.618 | 84.286 | ||
MSE | Orig | 0.113 | 0.151 | 0.170 | 0.129 | 0.090 | 0.127 | 0.140 | 0.144 | 0.198 | 0.211 | 0.179 | |
+SGCE | 0.118 | 0.149 | 0.144 | 0.098 | 0.089 | 0.159 | 0.123 | 0.129 | 0.167 | 0.249 | 0.085 | ||
PSNR | Orig | 9.628 | 8.336 | 7.799 | 9.342 | 10.649 | 9.101 | 8.657 | 8.481 | 7.075 | 6.774 | 7.543 | |
+SGCE | 9.479 | 8.398 | 8.571 | 10.214 | 10.709 | 8.039 | 9.213 | 8.981 | 7.855 | 6.086 | 10.863 | ||
SSIM | Orig | 0.549 | 0.528 | 0.549 | 0.271 | 0.589 | 0.486 | 0.464 | 0.451 | 0.458 | 0.426 | 0.498 | |
+SGCE | 0.548 | 0.613 | 0.584 | 0.322 | 0.594 | 0.522 | 0.602 | 0.472 | 0.505 | 0.433 | 0.704 | ||
AttentionGAN | FID | Orig | 32.319 | 92.032 | 32.796 | 26.861 | 57.464 | 51.844 | 37.740 | 55.689 | 83.822 | 228.433 | 92.930
+SGCE | 19.163 | 43.024 | 28.997 | 20.053 | 46.997 | 38.718 | 25.601 | 58.478 | 24.550 | 237.649 | 72.509 | ||
MSE | Orig | 0.109 | 0.197 | 0.170 | 0.097 | 0.095 | 0.151 | 0.125 | 0.143 | 0.203 | 0.232 | 0.094 | |
+SGCE | 0.106 | 0.151 | 0.105 | 0.094 | 0.095 | 0.122 | 0.105 | 0.134 | 0.143 | 0.220 | 0.094 | ||
PSNR | Orig | 9.826 | 7.084 | 7.749 | 10.241 | 10.398 | 8.265 | 9.148 | 8.470 | 6.944 | 6.380 | 10.415 | |
+SGCE | 9.941 | 8.334 | 9.998 | 10.391 | 10.368 | 9.301 | 9.880 | 8.780 | 8.509 | 6.744 | 10.450 | ||
SSIM | Orig | 0.580 | 0.451 | 0.463 | 0.332 | 0.588 | 0.433 | 0.507 | 0.452 | 0.445 | 0.429 | 0.684 | |
+SGCE | 0.589 | 0.520 | 0.571 | 0.344 | 0.587 | 0.506 | 0.602 | 0.475 | 0.562 | 0.444 | 0.689 | ||
SQ-GAN | FID | Orig | 49.865 | 71.694 | 38.896 | 49.532 | 99.038 | 146.293 | 37.546 | 305.931 | 50.451 | 295.611 | 115.039 |
+SGCE | 45.790 | 49.004 | 35.345 | 46.152 | 40.964 | 101.882 | 84.855 | 88.694 | 30.805 | 125.620 | 77.896 | ||
MSE | Orig | 0.147 | 0.199 | 0.186 | 0.106 | 0.152 | 0.158 | 0.143 | 0.161 | 0.206 | 0.249 | 0.090 | |
+SGCE | 0.116 | 0.187 | 0.123 | 0.100 | 0.123 | 0.164 | 0.133 | 0.147 | 0.164 | 0.262 | 0.094 | ||
PSNR | Orig | 9.822 | 7.058 | 7.817 | 9.791 | 9.369 | 8.050 | 8.626 | 7.404 | 8.364 | 6.095 | 10.635 | |
+SGCE | 9.539 | 7.323 | 9.318 | 10.082 | 9.180 | 7.887 | 8.852 | 8.378 | 7.891 | 5.858 | 10.432 | ||
SSIM | Orig | 0.570 | 0.448 | 0.478 | 0.293 | 0.530 | 0.428 | 0.485 | 0.359 | 0.543 | 0.544 | 0.754 | |
+SGCE | 0.570 | 0.488 | 0.583 | 0.320 | 0.535 | 0.423 | 0.497 | 0.465 | 0.532 | 0.405 | 0.690 | ||
FUNIT | FID | Orig | 37.505 | 43.675 | 64.990 | 75.553 | 151.728 | 207.221 | 40.677 | 254.404 | 55.576 | 282.494 | 139.837 |
+SGCE | 38.683 | 41.299 | 57.760 | 77.190 | 133.800 | 205.078 | 53.910 | 132.696 | 55.480 | 111.208 | 101.170 | ||
MSE | Orig | 0.127 | 0.224 | 0.212 | 0.105 | 0.138 | 0.180 | 0.178 | 0.188 | 0.180 | 0.167 | 0.093 | |
+SGCE | 0.128 | 0.220 | 0.212 | 0.106 | 0.137 | 0.178 | 0.177 | 0.189 | 0.183 | 0.182 | 0.095 | ||
PSNR | Orig | 9.075 | 6.523 | 6.772 | 9.861 | 8.660 | 7.476 | 7.525 | 7.276 | 7.480 | 7.819 | 10.414 | |
+SGCE | 9.042 | 6.604 | 6.777 | 9.791 | 8.727 | 7.519 | 7.553 | 7.271 | 7.407 | 7.476 | 10.287 | ||
SSIM | Orig | 0.525 | 0.426 | 0.388 | 0.295 | 0.485 | 0.412 | 0.389 | 0.386 | 0.519 | 0.397 | 0.686 | |
+SGCE | 0.522 | 0.433 | 0.386 | 0.291 | 0.498 | 0.416 | 0.395 | 0.380 | 0.519 | 0.498 | 0.691 |
Model | Metric | Variant | zk→fs | zk→st | fs→ht | zk→hw | fs→zk | st→zk | ht→fs | hp→zk
---|---|---|---|---|---|---|---|---|---|---
StrokeGAN | FID | Orig | 57.851 | 71.302 | 28.972 | 37.779 | 67.507 | 71.302 | 39.610 | 140.227 |
+SGCE | 33.283 | 25.230 | 15.861 | 28.201 | 24.959 | 44.303 | 21.837 | 89.440 | ||
MSE | Orig | 0.120 | 0.158 | 0.188 | 0.104 | 0.104 | 0.125 | 0.129 | 0.181 | |
+SGCE | 0.114 | 0.189 | 0.123 | 0.095 | 0.128 | 0.169 | 0.117 | 0.168 | ||
PSNR | Orig | 9.359 | 6.774 | 7.318 | 9.906 | 10.002 | 6.774 | 9.043 | 8.379 | |
+SGCE | 9.613 | 7.311 | 9.362 | 10.336 | 8.996 | 7.763 | 9.495 | 9.010 | ||
SSIM | Orig | 0.553 | 0.464 | 0.462 | 0.301 | 0.574 | 0.464 | 0.533 | 0.448 | |
+SGCE | 0.577 | 0.495 | 0.600 | 0.448 | 0.520 | 0.432 | 0.567 | 0.503 |

From Table V, the proposed SGCE-Font achieves the best average performance in terms of FID, MSE and PSNR, and the second-best average performance in terms of SSIM. Specifically, in terms of FID, the proposed SGCE-Font model achieves the best results in the font generation tasks {ls→hc, zk→ss, zk→zx} and the second-best result in the font generation task {hc→ls}. Similar observations hold for the other three evaluation metrics. We also present some visualization comparisons between the proposed model and five state-of-the-art models over two calligraphy font generation tasks. The comparison results for characters in the test set and for unseen characters not included in the datasets are presented in Figure 9. It can be observed from Figure 9 that the quality of the generated characters of the proposed model is better than that of the baselines. These results show clearly the effectiveness of the proposed model in these calligraphy font generation tasks.
IV-E Generalization of Proposed SGCE Module
As pointed out before, the proposed SGCE module can be used as a plug-and-play module for Chinese font generation models. In the following, we adapt the proposed SGCE module to five existing models, namely UGATIT [45], AttentionGAN [40], StrokeGAN [25], SQ-GAN [24] and FUNIT [39], to further enhance their performance. We incorporated SGCE into each base model in a way similar to that shown in Figure 5. For UGATIT, AttentionGAN, SQ-GAN and FUNIT, we implemented eleven font generation tasks, including eight printing or handwriting font generation tasks and three calligraphy font generation tasks, to evaluate the effectiveness of the proposed SGCE module; for StrokeGAN, we did not test its performance on the three calligraphy font generation tasks, since it was expensive to build up the stroke encodings of the traditional Chinese characters involved in these tasks.
The quantitative comparison results are presented in Table VI and Table VII. From Table VI and Table VII, the performance of the five concerned models is substantially improved over most font generation tasks in terms of the four evaluation metrics when equipped with the proposed SGCE module. We also present some visualization comparison results in Figure 10. It can be observed from Figure 10 that the quality of the generated characters is improved by equipping the models with the proposed SGCE module. These results show clearly the effectiveness of the proposed SGCE module.
V Conclusion
What kind of information is used as guidance and how it is used to reduce the mode collapse issue are two important questions for the predominant GAN-based models for Chinese font generation. This paper proposed an effective guidance module called SGCE for Chinese font generation models, where the skeleton information is adopted and used via channel expansion, motivated by the observation that the skeleton embodies both local and global information of a Chinese character. The channel expansion way of using the skeleton directly imposes the skeleton guidance information on the generator, which is more effective than the existing way of imposing the guidance information on the discriminator. Extensive experiments were conducted to demonstrate the effectiveness of the proposed SGCE module. Experimental results show that the proposed SGCE module effectively reduces the mode collapse issue suffered by the well-known CycleGAN and can be easily adapted to existing models as a plug-and-play module to further improve their performance. One future direction is to adapt the proposed module to the generation of other languages (e.g., Korean) or to other scenarios such as few-shot Chinese font generation.
References
- [1] G. Zhang, W. Huang, R. Chen, J. Yang, and H. Peng, “Calligraphy fonts generation based on generative adversarial networks,” ICIC express letters. Part B, Applications: an international journal of research and surveys, vol. 10, no. 3, pp. 203–209, 2019.
- [2] Y. Wang, G. Pu, W. Luo, Y. Wang, P. Xiong, H. Kang, and Z. Lian, “Aesthetic text logo synthesis via content-aware layout inferring,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2436–2445, June 2022.
- [3] C. Luo, Y. Zhu, L. Jin, Z. Li, and D. Peng, “Slogan: Handwriting style synthesis for arbitrary-length and out-of-vocabulary text,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
- [4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in neural information processing systems, vol. 27, 2014.
- [5] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in Proceedings of the 2014 International Conference on Learning Representations, 2014.
- [6] S. Xu, F. C. Lau, W. K. Cheung, and Y. Pan, “Automatic generation of artistic chinese calligraphy,” IEEE Intelligent Systems, vol. 20, no. 3, pp. 32–39, 2005.
- [7] Y. Tian, “zi2zi: Master chinese calligraphy with conditional adversarial networks.” https://github.com/kaonashi-tyc/zi2zi, 2017. Accessed:2022-08-16.
- [8] T. Yamada, M. Hosoe, K. Kato, and K. Yamamoto, “The character generation in handwriting feature extraction using variational autoencoder,” in Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 2017.
- [9] K. Goto and N. Inoue, “Learning vae with categorical labels for generating conditional handwritten characters,” in Proceedings of the 2021 17th International Conference on Machine Vision and Applications (MVA), 2021.
- [10] S. Park, S. Chun, J. Cha, B. Lee, and H. Shim, “Few-shot font generation with localized style representations and factorization,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 2393–2402, May 2021.
- [11] M. Qin, Z. Zhang, and X. Zhou, “Disentangled representation learning gans for generalized and stable font fusion network,” IET Image Processing, vol. 16, no. 2, pp. 393–406, 2022.
- [12] P. Lyu, X. Bai, C. Yao, Z. Zhu, T. Huang, and W. Liu, “Auto-encoder guided gan for chinese calligraphy synthesis,” in Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 1095–1100, IEEE, 2017.
- [13] Y. Jiang, Z. Lian, Y. Tang, and J. Xiao, “Dcfont: an end-to-end deep chinese font generation system,” in SIGGRAPH Asia 2017 Technical Briefs, pp. 1–4, 2017.
- [14] Y. Lei, L. Zhou, T. Pan, H. Qian, and Z. Sun, “Learning and generation of personal handwriting style chinese font,” in Proceedings of the 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 1909–1914, IEEE, 2018.
- [15] J. Chang, Y. Gu, Y. Zhang, Y.-F. Wang, and C. Innovation, “Chinese handwriting imitation with hierarchical generative adversarial network.,” in Proceedings of the BMVC, p. 290, 2018.
- [16] S.-J. Wu, C.-Y. Yang, and J. Y.-j. Hsu, “Calligan: Style and structure-aware chinese calligraphy character generator,” 2020.
- [17] S. Yuan, R. Liu, M. Chen, B. Chen, Z. Qiu, and X. He, “Se-gan: Skeleton enhanced gan-based model for brush handwriting font generation,” 2022.
- [18] S. Park, S. Chun, J. Cha, B. Lee, and H. Shim, “Multiple heads are better than one: Few-shot font generation with multiple localized experts,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 13900–13909, 2021.
- [19] A. U. Hassan, H. Ahmed, and J. Choi, “Unpaired font family synthesis using conditional generative adversarial networks,” Knowledge-Based Systems, vol. 229, p. 107304, 2021.
- [20] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134, 2017.
- [21] Z. Qian and D. Fang, “Towards chinese calligraphy,” Macalester International, vol. 18, no. 1, p. 12, 2007.
- [22] T. Chen, Chinese Calligraphy. Cambridge University Press, 2011.
- [23] B. Chang, Q. Zhang, S. Pan, and L. Meng, “Generating handwritten chinese characters using cyclegan,” in Proceedings of the 2018 IEEE winter conference on applications of computer vision (WACV), pp. 199–207, IEEE, 2018.
- [24] J. Zeng, Q. Chen, and M. Wang, “Self-supervised chinese font generation based on square-block transformation (in chinese),” Sci Sin Inform, vol. 52, no. 1, p. 15, 2022.
- [25] J. Zeng, Q. Chen, Y. Liu, M. Wang, and Y. Yao, “Strokegan: Reducing mode collapse in chinese font generation via stroke encoding,” in Proceedings of the AAAI, vol. 3, 2021.
- [26] Y. Jiang, Z. Lian, Y. Tang, and J. Xiao, “Scfont: Structure-guided chinese font generation via deep stacked networks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 4015–4022, Jul. 2019.
- [27] Y. Lin, H. Yuan, and L. Lin, “Chinese typography transfer model based on generative adversarial network,” in Proceedings of the 2020 Chinese Automation Congress (CAC), pp. 7005–7010, IEEE, 2020.
- [28] Y. Xie, X. Chen, L. Sun, and Y. Lu, “Dg-font: Deformable generative networks for unsupervised font generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5130–5140, 2021.
- [29] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, pp. 2223–2232, 2017.
- [30] M. Li, J. Wang, Y. Yang, W. Huang, and W. Du, “Improving gan-based calligraphy character generation using graph matching,” in Proceedings of the 2019 IEEE 19th International Conference on Software Quality, Reliability and Security Companion (QRS-C), pp. 291–295, IEEE, 2019.
- [31] Z. Lian and J. Xiao, “Automatic shape morphing for chinese characters,” in Proceedings of the SIGGRAPH Asia 2012 Technical Briefs, pp. 1–4, 2012.
- [32] P. Liu, S. Xu, and S. Lin, “Automatic generation of personalized chinese handwriting characters,” in Proceedings of the 2012 Fourth International Conference on Digital Home, pp. 109–116, IEEE, 2012.
- [33] J.-W. Lin, C.-Y. Hong, R.-I. Chang, Y.-C. Wang, S.-Y. Lin, and J.-M. Ho, “Complete font generation of chinese characters in personal handwriting style,” in Proceedings of the 2015 IEEE 34th International Performance Computing and Communications Conference (IPCCC), pp. 1–5, IEEE, 2015.
- [34] X. Lin, J. Li, H. Zeng, and R. Ji, “Font generation based on least squares conditional generative adversarial nets,” Multimedia Tools and Applications, vol. 78, no. 1, pp. 783–797, 2019.
- [35] C. Wen, Y. Pan, J. Chang, Y. Zhang, S. Chen, Y. Wang, M. Han, and Q. Tian, “Handwritten chinese font generation with collaborative stroke refinement,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3882–3891, 2021.
- [36] Y. Gao and J. Wu, “Gan-based unpaired chinese character image translation via skeleton transformation and stroke rendering,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 646–653, Apr. 2020.
- [37] S. Yu, J. Zhao, X. Ye, C. Tang, and Y. Zhen, “Calligraphic chinese characters generation based on generative adversarial networks with structural constraint,” Pattern Recognition and Artificial Intelligence, 2021.
- [38] T. Y. Zhang and C. Y. Suen, “A fast parallel algorithm for thinning digital patterns,” Communications of the ACM, vol. 27, no. 3, pp. 236–239, 1984.
- [39] M.-Y. Liu, X. Huang, A. Mallya, T. Karras, T. Aila, J. Lehtinen, and J. Kautz, “Few-shot unsupervised image-to-image translation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10551–10560, 2019.
- [40] H. Tang, H. Liu, D. Xu, P. H. Torr, and N. Sebe, “Attentiongan: Unpaired image-to-image translation using attention-guided generative adversarial networks,” IEEE Transactions on Neural Networks and Learning Systems, 2021.
- [41] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the International conference on machine learning, pp. 448–456, PMLR, 2015.
- [42] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2015.
- [43] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
- [44] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” Advances in neural information processing systems, vol. 30, 2017.
- [45] J. Kim, M. Kim, H. Kang, and K. H. Lee, “U-gat-it: Unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation,” in Proceedings of the 2020 International Conference on Learning Representations, 2020.