SP-BatikGAN: An Efficient Generative Adversarial Network for Symmetric Pattern Generation
Abstract
Following the recent controversy around AI art, our research focuses on bringing AI to everyone, particularly artists, to create AI art with limited data and compute settings.
We are interested in geometrically symmetric pattern generation, which appears on many artworks such as Portuguese, Moroccan tiles, and Batik, a cultural heritage in Southeast Asia.
Symmetric pattern generation is a complex problem, and prior research has produced models that are overly specific to certain patterns.
We publicly provide the first-ever 1,216 high-quality symmetric patterns, taken straight from design files, for this task (ITB-mBatik Dataset, CC-BY-NC-SA license: https://data.mendeley.com/datasets/7hzr5539ws).
We then formulate symmetric pattern enforcement (SPE) loss to leverage underlying symmetric-based structures that exist on current image distributions.
Our SPE improves and accelerates training on any GAN configuration; combined with efficient attention, SP-BatikGAN improves over FastGAN, the state-of-the-art GAN for limited settings, reducing the FID score from 110.11 to 90.76 (an 18% decrease) and raising the model diversity recall score from 0.047 to 0.204 (a 334% increase).
Index Terms— GAN, Application, Unsupervised Image Generation, Batik Generation, Symmetric Pattern Generation
1 Introduction
Batik, recognized by UNESCO as a ‘Masterpiece of Oral and Intangible Heritage of Humanity’, is a cultural icon in Southeast Asia and an ancient art of producing beautiful textiles. In recent years, research interest [1, 2, 3, 4, 5, 6] has grown in using AI to generate art with unique, complex pattern styles such as batik. However, this early, preliminary research suffers from the difficulty of training unstable GANs, the need for many data samples, the difficulty of obtaining high-quality data suitable for unsupervised image generation, and the fact that the pattern structure of each art is complex and varies greatly.
The first problem is that while a large dataset is preferable and important, so is the quality of its images. Second, even when a dataset is available, mixing different pattern types or styles together does not guarantee good results; even with thousands of images, the output is still not as high quality as it could be [1, 4]. Recent research [3] trades off generality for quality by limiting a given style with a fixed loss function designed for only a single specific pattern type.
To address these problems, we first collect and preprocess high-quality batik data. We then focus on generalizing the symmetric pattern generation task. We outline a framework that allows the generative model to efficiently train the generator on symmetric patterns, one of the key characteristics of batik and of other cultural artwork as well. From this framework, we intuitively derive the Symmetric Pattern Enforcement (SPE) loss. With SPE, we improve image quality and detail over the state-of-the-art FastGAN [7] on the symmetric pattern generation task.
In short, our contributions are the following:
• We generalize the symmetric pattern generation task to other symmetric pattern types (Section 3.2).

• We create an intuitive learning method by deriving the SPE loss from the symmetric structure existing in the dataset (Section 3.2.1).

• The SPE loss improves the SOTA FastGAN; combined with Efficient Attention, it accelerates GAN training, reduces FID, and drastically improves recall (Section 4.2).
2 RELATED WORK
Prior related batik generation research. In recent years there has been great interest in AI batik generation, such as [6], [5], and [2], which respectively use text-to-image synthesis, Conditional GANs, and neural style transfer. For unsupervised generation, early research such as [4] uses a simple DC-GAN for batik motif synthesis. Other work [1] continued this line with a more diverse dataset. However, neither study measured any metrics, and their generated images have only vague batik characteristics with too much noise. The most recent research on unsupervised batik generation was BatikGAN [3]; however, BatikGAN has too many constraints. Even with five new custom losses, it only accounts for one type of pattern, and the generated images are low resolution, only 32×32 pixels per patch.
Advancements in GANs for limited computation and data. From the first original GAN to the current SOTA at the time of writing, such as StyleGAN2 [8], the amount of data needed to train an unsupervised image generation GAN is enormous. However, recent advancements such as Adaptive Discriminator Augmentation (ADA) [9] and Differentiable Augmentation for data-efficient GANs [10] can reduce the data requirement to only hundreds of images. This leads us to use the state-of-the-art FastGAN [7] as our baseline, as it outperforms the previous SOTA StyleGAN2 in limited computation and dataset environments.
3 METHODS

To obtain good image generation results, the quality of the input images is crucial. In our case, Batik, a traditional clothing pattern, is difficult to acquire, especially in high resolution, and this is where most previous research was lacking. Previous research used the UI Batik Dataset [11], whose images were captured only with a camera and are noticeably substandard. Thus, for future research, we acquire and create our own, more modern, digital Batik dataset. We propose a new, high-quality, publicly available dataset built straight from design files, the Bandung Institute of Technology modern Batik (ITB-mBatik) dataset [12], better suited to the unsupervised image generation task, shown in Fig. 1.
3.1 Inefficiency of Whole Pattern Generation and Differing Symmetry Type
Not only are high-quality images difficult to collect; another discernible problem in previous work [1, 4] is the inefficiency of training the generator directly on a whole pattern. Fitting the generator to a full repeating pattern unnecessarily increases the complexity of the real image distribution, since the generator must account for many different arbitrary repetitions, resulting in subpar results. Let a "patch" be defined as the minimal image subset from which the original repeating pattern can be reconstructed. If a repeating pattern can be fully reconstructed by repeating this patch, then generating the patch is a more efficient approach than redundantly training the generator to learn the whole repeating pattern.
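As a minimal sketch of this patch/pattern relationship, the reconstruction of a repeating pattern from its unit patch can be expressed as plain tiling (the repetition count here is hypothetical; real patterns may repeat with different horizontal and vertical periods):

```python
import numpy as np

def reconstruct(patch, reps=(3, 3)):
    """Rebuild a larger repeating pattern from its minimal patch by tiling.

    `reps` is a hypothetical repetition count for illustration only.
    """
    return np.tile(patch, (reps[0], reps[1], 1))  # (H, W, C) image

# A 2x2 RGB patch tiled 3x3 yields a 6x6 pattern; every tile equals the patch.
patch = np.arange(12, dtype=float).reshape(2, 2, 3)
pattern = reconstruct(patch)
```

Training the generator on `patch` rather than `pattern` removes the redundant repetitions from the target distribution.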
For patch generation, the patch patterns themselves have a high degree of freedom and flexibility. To limit this and make patches sensible to work with, we add the constraint that every patch across the dataset must share a common symmetrical transformation. Training a GAN on patches with differing symmetries causes asymmetry and can damage the aesthetics of the output image; constraining the dataset to share at least one symmetrical transformation preserves symmetry and avoids this problem.
3.2 Symmetric Patch Pattern Generation Task within 2-Dimensional Basic Isometries
We define our task within the most common group of transformations, the Euclidean group in 2-D space, i.e., the isometries E(2) (keep in mind that symmetries outside the Euclidean group exist). First, let x be a patch image in our dataset of patches X. A patch pattern has symmetrical transformation t if the patch is equal to itself when transformed, i.e., t(x) = x. The common symmetrical transformation set S, which we desire, is the set of symmetrical transformations that hold for every patch across the dataset. For a symmetric pattern generation task, at least one common symmetrical transformation must exist, i.e., |S| ≥ 1. Finally, let t be a symmetrical transformation within S, so that the following property holds:
t(x) = x,  ∀x ∈ X, ∀t ∈ S    (1)
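The Eq. 1 property can be checked numerically for a given patch. The sketch below assumes (H, W, C) arrays and a particular flip convention (horizontal flip mirrors left-right), which may differ from the paper's implementation:

```python
import numpy as np

def is_symmetric(patch, transform, tol=1e-6):
    """Check the Eq. (1) property t(x) = x for a single patch."""
    return np.allclose(patch, transform(patch), atol=tol)

# The four reflections discussed in Section 3.2.2, as assumed array ops:
flips = {
    "h": lambda x: x[:, ::-1],                        # mirror left-right
    "v": lambda x: x[::-1, :],                        # mirror top-bottom
    "p": lambda x: np.swapaxes(x, 0, 1),              # positive diagonal
    "n": lambda x: np.swapaxes(x, 0, 1)[::-1, ::-1],  # negative diagonal
}
```

The common set S for a dataset is then the intersection of the transformations that pass this check on every patch.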
3.2.1 Symmetric Pattern Enforcement

Since, by Eq. 1, we know every output image must have symmetric properties, this motivates enforcing symmetry on each pattern directly at the generator, right from the beginning. We call this loss Symmetric Pattern Enforcement (SPE), denoted L_SPE, where we aggregate the similarity between the generator's raw output and what should be its equivalent geometrically symmetric counterpart. Fig. 2 shows that L_SPE uses only the raw output of the generator, before DiffAug augments both real and fake images. Lastly, to aggregate a generated image with its symmetrical counterparts, our SPE loss is based on a similarity loss L_sim. Thus, the Symmetric Pattern Enforcement loss is formally defined as:
L_SPE = (1/|S|) Σ_{t∈S} L_sim(G(z), t(G(z)))    (2)
where G(z) is the generated output image taken directly from the generator, and 1/|S| normalizes for the case of high cardinality of S.
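A minimal sketch of Eq. 2 follows, using a plain L2 (MSE) similarity as a stand-in; the actual similarity loss used in the paper may differ (e.g., a perceptual loss), and the hv-flip set below is just one configuration:

```python
import numpy as np

def spe_loss(fake, transforms, sim=None):
    """Sketch of Eq. (2): average the similarity loss between the raw
    generator output and each of its symmetric counterparts t(G(z))."""
    if sim is None:
        sim = lambda a, b: float(np.mean((a - b) ** 2))  # assumed L2 similarity
    return sum(sim(fake, t(fake)) for t in transforms) / len(transforms)

# hv-flip configuration: horizontal and vertical reflections (assumed (H, W, C)).
hv = [lambda x: x[:, ::-1], lambda x: x[::-1, :]]
```

A perfectly hv-symmetric output incurs zero loss, so the gradient only pushes the generator where symmetry is violated.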
3.2.2 Implementation of Symmetric Pattern Enforcement for ITB-mBatik

In our implementation, however, we reduce generality for the sake of practicality. The Cartan-Dieudonné theorem [13] establishes that a complex, high-dimensional isometry can be constructed as a composition of simple reflections. Therefore, to reduce loss complexity, we derive the loss from a set of observable, intuitive reflection symmetries in the ITB-mBatik Dataset.
We found four simple and most intuitive symmetrical transformations that satisfy Eq. 1: reflection across the horizontal axis, the vertical axis, the diagonal with positive gradient, and its negative counterpart, as shown in Fig. 3. Additionally, in practice, the transformation operations must be differentiable and support autograd for backpropagation. Letting B be this set of four reflections, we further optimize by searching for the best subset of operations from the power set of B, 2^B. To reduce computation, however, we experiment with four subsets for the SPE loss.
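Enumerating the candidate configurations from the power set is a one-liner; the sketch below lists every non-empty subset of the four reflections (the labels h, v, p, n follow the naming used later in Section 4.1):

```python
from itertools import combinations

def nonempty_subsets(ops):
    """All non-empty subsets of the reflection set B = {h, v, p, n},
    i.e. the candidate SPE configurations drawn from the power set 2^B."""
    return [set(c) for r in range(1, len(ops) + 1)
            for c in combinations(ops, r)]

subsets = nonempty_subsets(["h", "v", "p", "n"])  # 2^4 - 1 = 15 candidates
```

Of these 15 candidates, the experiments in Section 4.1 evaluate four.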
4 EXPERIMENTS
Dataset. We use our ITB-mBatik dataset [12]. From the raw design files, we preprocess, crop, and scale the repeating-area "patches". Additionally, we handle translational symmetry by multi-phase sampling, increasing the training set size.
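One way to realize such multi-phase sampling, sketched under the assumption that a patch cropped at any translated offset of a periodic pattern is an equally valid unit cell (the phase schedule here is hypothetical):

```python
import numpy as np

def multi_phase_patches(pattern, patch_size, n_phases=4):
    """Hypothetical multi-phase sampling: roll the periodic pattern by a
    phase offset and re-crop, yielding several distinct but equally valid
    unit-cell patches from one pattern."""
    out = []
    for k in range(n_phases):
        offset = (k * patch_size) // n_phases
        rolled = np.roll(pattern, shift=(offset, offset), axis=(0, 1))
        out.append(rolled[:patch_size, :patch_size])
    return out
```

Each sampled phase multiplies the effective dataset size without introducing content that is absent from the original pattern.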
Metric. In the evaluation, we measure five unsupervised image generation metrics: Fréchet Inception Distance (FID) [15], Improved Precision and Recall [16], and Density and Coverage [17], for better insight into generator performance.
GAN Training Summary. We use FastGAN with DiffAug as our baseline. For the GAN losses, since different losses have been shown to perform similarly [18], we use the Hinge loss [19] for its simplicity and fast computation. For optimization, we use the Adam optimizer [20]. We train each GAN five times and report the best FID result. Each GAN is trained for 200k minibatches with a batch size of 8. Lastly, every 1k minibatches, we save the model for evaluation and possible early stopping.
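For reference, the Hinge loss [19] used here has the standard form below (a sketch on raw discriminator logits; the actual FastGAN heads may add reconstruction terms on top):

```python
import numpy as np

def d_hinge(real_logits, fake_logits):
    """Discriminator hinge loss: penalize real logits below +1 and
    fake logits above -1 (margin-based, as in geometric GAN [19])."""
    return (np.mean(np.maximum(0.0, 1.0 - real_logits))
            + np.mean(np.maximum(0.0, 1.0 + fake_logits)))

def g_hinge(fake_logits):
    """Generator hinge loss: push the discriminator's fake logits upward."""
    return -np.mean(fake_logits)
```

Both terms are cheap elementwise operations, which is the "fast computation" advantage noted above.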
Environment. The experiments run on an i5-3470 CPU, 16 GB of RAM, and an RTX 3060 GPU with 12 GB of VRAM. For software, we use PyTorch for the GAN model, consisting of the Generator and Discriminator, and TensorFlow to run InceptionV3 and VGG-16 for evaluation.
4.1 Symmetric Pattern Enforcement Analysis
First, we experiment with {hv, np, hvnp, [hv, np]}, where the characters h, v, n, and p represent the horizontal, vertical, diagonal-negative, and diagonal-positive flips respectively, and [hv, np] is an operation where the hv-flip and np-flip are used interchangeably.
Table 1 shows the best result for each operation combination. We find the hv-flip combination achieves the lowest FID, dropping from 110.106 to 97.852, and doubles the baseline Recall from 0.047 to 0.094, with a slight deterioration in both fidelity metrics. Interestingly, however, with more symmetrical operations, SPE improves over the baseline on every metric. Since FID is our main metric, further experimentation was done with the SPE loss configured to hv-flip.
| SPE Loss Config | FID | Pre (Fidelity) | Den (Fidelity) | Rec (Diversity) | Cov (Diversity) |
|---|---|---|---|---|---|
| Baseline | 110.106 | 0.821 | 0.863 | 0.047 | 0.557 |
| L2 | 97.852 | 0.762 | 0.696 | 0.094 | 0.490 |
|  | 103.782 | 0.838 | 0.866 | 0.068 | 0.564 |
|  | 102.624 | 0.835 | 0.904 | 0.067 | 0.570 |
|  | 111.291 | 0.819 | 0.762 | 0.048 | 0.463 |
4.2 Accelerating Attention Based Generator with SPE
| Model | FID | Pre (Fidelity) | Den (Fidelity) | Rec (Diversity) | Cov (Diversity) |
|---|---|---|---|---|---|
| Baseline+SPE | 97.852 | 0.762 | 0.696 | 0.094 | 0.490 |
| +SAN | 122.27 | 0.797 | 0.774 | 0.011 | 0.428 |
|  | 139.78 | 0.875 | 0.773 | 0.007 | 0.298 |
| +EAttn | 99.62 | 0.772 | 0.775 | 0.087 | 0.461 |
|  | 99.62 | 0.772 | 0.763 | 0.102 | 0.480 |
|  | 90.76 | 0.737 | 0.661 | 0.204 | 0.541 |


Attention mechanisms achieve SOTA in many fields, including attention-based GANs [21], where the Self Attention Network (SAN) [22] was added to StyleGAN2 [23]. However, the original attention-based generator is computationally expensive. We therefore also experiment with Efficient Attention (EAttn) [24], a more reasonable alternative for our limited setting.
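The core trick of Efficient Attention [24] is reordering the attention product so that no n×n attention map is ever formed. A minimal sketch (single head, 2-D arrays; the actual generator applies this over feature maps):

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def efficient_attention(Q, K, V):
    """Efficient attention: normalize Q per position (over channels) and
    K per channel (over positions), then compute K^T V first, so the cost
    is O(n * d^2) in sequence length n instead of O(n^2 * d)."""
    q = softmax(Q, axis=1)   # (n, d_k)
    k = softmax(K, axis=0)   # (n, d_k)
    context = k.T @ V        # (d_k, d_v): global context, no n x n map
    return q @ context       # (n, d_v)
```

For feature maps of thousands of positions, this linear-in-n cost is what makes attention affordable on a single consumer GPU as used here.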
In earlier experiments, attention-based generators had trouble training and converging due to the increased parameter count and small training set. With SPE loss, however, Fig. 4 shows that every single model's performance improves significantly. All EAttn generators now achieve an FID below 100, with the smallest EAttn improving from 127.06 to a record low of 90.76.
This record-low FID, however, comes with a tradeoff between fidelity and diversity. Every generator configuration that uses SPE has slightly reduced fidelity metrics. This is to be expected, since FastGAN was designed for high fidelity to begin with. In return, diversity increases: comparing the Baseline with Baseline+SPE+EAttn in Fig. 5, SPE+EAttn has noticeably finer detail and better symmetry. We also find that EAttn covers more styles with better detail than the previous method.
5 CONCLUSION
With the massive strides of advancement in GANs, particularly for limited computation and data, mainstream applications of GANs will start to follow. This paper covers the batik generation task, batik being a cultural icon in Southeast Asia. While doing this research, we found symmetric pattern tasks to generalize to similar tasks, such as Portuguese or Moroccan tiles. With a new, high-quality dataset, an exploration of cheaper attention for GANs under limited computation, and the SPE loss for training acceleration, we significantly improve symmetric pattern generation over the state-of-the-art FastGAN in terms of image quality (FID) and detail (recall). We believe this research can be the start and foundation of future AI art and symmetric pattern generation applications for artists.
References
- [1] Agus Eko Minarno, Moch. Chamdani Mustaqim, Yufis Azhar, Wahyu Andhyka Kusuma, and Yuda Munarko, “Deep convolutional generative adversarial network application in batik pattern generator,” in 2021 9th International Conference on Information and Communication Technology (ICoICT), 2021, pp. 54–59.
- [2] Aditya Firman Ihsan, “A study of batik style transfer using neural network,” in 2021 9th International Conference on Information and Communication Technology (ICoICT), 2021, pp. 313–319.
- [3] Wei-Ta Chu and Lin-Yu Ko, “BatikGAN: A Generative Adversarial Network for Batik Creation,” in Proceedings of the 2020 Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia, New York, NY, USA, 2020, MMArt-ACM ’20, pp. 13–18, Association for Computing Machinery.
- [4] Miqdad Abdurrahman, Nabila Husna Shabrina, and Dareen K. Halim, “Generative adversarial network implementation for batik motif synthesis,” in 2019 5th International Conference on New Media Studies (CONMEDIA), 2019, pp. 63–67.
- [5] Syenne Ecstexela and Sung-Ho Bae, “A new batik pattern generation method for creative photography using deep neural networks,” in Proceedings of the Korean Society of Information Sciences, 06 2019.
- [6] Aifa Nur Amalia, Arief Fatchul Huda, Diena Rauda Ramdania, and Mohamad Irfan, “Making a batik dataset for text to image synthesis using generative adversarial networks,” in 2019 IEEE 5th International Conference on Wireless and Telematics (ICWT), 2019, pp. 1–7.
- [7] Bingchen Liu, Yizhe Zhu, Kunpeng Song, and Ahmed Elgammal, “Towards faster and stabilized gan training for high-fidelity few-shot image synthesis,” 2021.
- [8] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila, “Analyzing and improving the image quality of StyleGAN,” in Proc. CVPR, 2020.
- [9] Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila, “Training generative adversarial networks with limited data,” in Proc. NeurIPS, 2020.
- [10] Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, and Song Han, “Differentiable augmentation for data-efficient gan training,” 2020.
- [11] Yohanes Gultom, Aniati Murni Arymurthy, and Rian Josua Masikome, “Batik Classification using Deep Convolutional Network Transfer Learning,” Jurnal Ilmu Komputer dan Informasi, vol. 11, no. 2, 2018.
- [12] Chrystian Chrystian, “Itb-mbatik dataset,” 2023.
- [13] Jean Gallier, The Cartan-Dieudonné Theorem, pp. 197–247, Springer New York, New York, NY, 2001.
- [14] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in CVPR, 2018.
- [15] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. 2017, vol. 30, Curran Associates, Inc.
- [16] Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, and Timo Aila, “Improved precision and recall metric for assessing generative models,” CoRR, vol. abs/1904.06991, 2019.
- [17] Muhammad Ferjad Naeem, Seong Joon Oh, Youngjung Uh, Yunjey Choi, and Jaejun Yoo, “Reliable fidelity and diversity metrics for generative models,” 2020.
- [18] Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, and Olivier Bousquet, “Are gans created equal? a large-scale study,” 2018.
- [19] Jae Hyun Lim and Jong Chul Ye, “Geometric gan,” 2017.
- [20] Diederik P. Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” 2017.
- [21] Ning Yu, Guilin Liu, Aysegul Dundar, Andrew Tao, Bryan Catanzaro, Larry Davis, and Mario Fritz, “Dual contrastive loss and attention for gans,” in IEEE International Conference on Computer Vision (ICCV), 2021.
- [22] Hengshuang Zhao, Jiaya Jia, and Vladlen Koltun, “Exploring self-attention for image recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- [23] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila, “Analyzing and Improving the Image Quality of StyleGAN,” CoRR, vol. abs/1912.0, 2019.
- [24] Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, and Hongsheng Li, “Efficient attention: Attention with linear complexities,” in WACV, 2021.