
Do Not Escape From the Manifold: Discovering the Local Coordinates on the Latent Space of GANs

Jaewoong Choi*,   Junho Lee*,   Changyeon Yoon,   Jung Ho Park,
Geonho Hwang,   Myungjoo Kang†
Seoul National University
{chjw1475,joon2003,shinypond,jhpark009,hgh2134,mkang}@snu.ac.kr
* Equal contribution.   † Corresponding author.
Abstract

The discovery of the disentanglement properties of the latent space in GANs has motivated much research into finding semantically meaningful directions in it. In this paper, we suggest that the disentanglement property is closely related to the geometry of the latent space. In this regard, we propose an unsupervised method for finding semantic-factorizing directions on the intermediate latent space of GANs based on its local geometry. Intuitively, our proposed method, called Local Basis, finds the principal variation of the latent space in the neighborhood of the base latent variable. Experimental results show that the local principal variation corresponds to semantic factorization, and that traversing along it provides strong robustness to image traversal. Moreover, we suggest an explanation for the limited success in finding global traversal directions in the latent space, especially the $\mathcal{W}$-space of StyleGAN2. We show that the $\mathcal{W}$-space is warped globally by comparing the local geometry, discovered by Local Basis, through metrics on the Grassmannian manifold. The global warpage implies that the latent space is not well-aligned globally, and therefore global traversal directions are bound to show limited success on it.

1 Introduction

Generative Adversarial Networks (GANs, Goodfellow et al. (2014)), such as ProGAN (Karras et al., 2018), BigGAN (Brock et al., 2018), and StyleGANs (Karras et al., 2019; 2020b; 2020a), have shown tremendous performance in generating high-resolution photo-realistic images that are often indistinguishable from natural images. However, despite several recent efforts (Goetschalckx et al., 2019; Jahanian et al., 2019; Plumerault et al., 2020; Shen et al., 2020) to investigate the disentanglement properties (Bengio et al., 2013) of the latent space in GANs, it is still challenging to find meaningful traversal directions in the latent space corresponding to the semantic variation of an image.

The previous approaches to finding semantic-factorizing directions are categorized into local and global methods. The local methods (e.g. Ramesh et al. (2018), the Latent Mapper in StyleCLIP (Patashnik et al., 2021), and the attribute-conditioned normalizing flow in StyleFlow (Abdal et al., 2021)) suggest a sample-wise traversal direction. By contrast, the global methods, such as GANSpace (Härkönen et al., 2020) and SeFa (Shen & Zhou, 2021), propose a global direction for a particular semantic (e.g. glasses, age, or gender) that works on the entire latent space. Throughout this paper, we refer to these global methods as the global basis. These global methods showed promising results. However, they succeed only in a limited region of the latent space, and the image quality is sensitive to the perturbation intensity. In fact, if a latent space does not satisfy the global disentanglement property itself, all global methods are bound to show limited performance on it. Nevertheless, to the best of our knowledge, the global disentanglement property of a latent space has not been investigated except through the empirical observation of generated samples. In this regard, we need a local method that describes the local disentanglement property and an evaluation scheme for the global disentanglement property based on the collected local information.

In this paper, we suggest that the semantic property of the latent space in GANs (i.e. disentanglement of semantics and image collapse) is closely related to its geometry, because of the sample-wise optimization nature of GANs. In this respect, we propose an unsupervised method, called Local Basis, for finding a traversal direction based on the local structure of the intermediate latent space $\mathcal{W}$ (Fig 1(a)). We approximate $\mathcal{W}$ with a submanifold representing its local principal variation, discovered in terms of the tangent space $T_{\mathbf{w}}\mathcal{W}$. Local Basis is defined as an ordered basis of $T_{\mathbf{w}}\mathcal{W}$ corresponding to the approximating submanifold. Moreover, we show that Local Basis is obtained from a simple closed-form computation, namely the singular vectors of the Jacobian matrix of the subnetwork. The geometric interpretation of Local Basis provides an evaluation scheme for the global disentanglement property through the global warpage of the latent manifold. Our contributions are as follows:

  1.

     We propose Local Basis, a set of traversal directions that can be traversed reliably without escaping from the latent space, preventing image collapse. The latent traversal along Local Basis corresponds to the local coordinate mesh of the local-geometry-describing submanifold.

  2.

    We show that Local Basis leads to stable variation and better semantic factorization than global approaches. This result verifies our hypothesis on the close relationship between the semantic and geometric properties of the latent space in GANs.

  3.

     We propose the Iterative Curve-Traversal method, a way to trace the latent space along a curved trajectory. The trajectory of images under this method shows a more stable variation compared to linear traversal.

  4.

     We introduce metrics on the Grassmannian manifold to analyze the global geometry of the latent space through Local Basis. Quantitative analysis demonstrates that the $\mathcal{W}$-space of StyleGAN2 is still globally warped. This result provides an explanation for the limited success of the global basis and proves the importance of local approaches.

2 Related Work

Style-based Generators.

In recent years, GANs equipped with style-based generators (Karras et al., 2019; 2020b) have shown state-of-the-art performance in high-fidelity image synthesis. The style-based generator consists of two parts: a mapping network and a synthesis network. The mapping network encodes the isotropic Gaussian noise $\mathbf{z}\in\mathcal{Z}$ into an intermediate latent vector $\mathbf{w}\in\mathcal{W}$. The synthesis network takes $\mathbf{w}$ and generates an image while controlling the style of the image through $\mathbf{w}$. The $\mathcal{W}$-space is well known for providing a better disentanglement property than $\mathcal{Z}$ (Karras et al., 2019). However, there is still a lack of understanding of the effect of latent perturbation in a specific direction on the output image.

Latent Traversal for Image Manipulation.

The impressive success of GANs in producing high-quality images has led to various attempts to understand their latent space. Early approaches (Radford et al., 2016; Upchurch et al., 2017) showed that vector arithmetic on the latent space holds for semantics, and StyleGAN (Karras et al., 2019) showed that mixing two latent codes can achieve style transfer. Some studies have investigated the supervised learning of latent directions while assuming access to the semantic attributes of images (Goetschalckx et al., 2019; Jahanian et al., 2019; Shen et al., 2020; Yang et al., 2021; Abdal et al., 2021). In contrast to these supervised methods, some recent studies have suggested novel approaches that do not use prior knowledge of the training dataset, such as labels of human facial attributes. In Voynov & Babenko (2020), an unsupervised optimization method is proposed to jointly learn a candidate matrix and a corresponding reconstructor, which identifies the semantic directions in the matrix. GANSpace (Härkönen et al., 2020) finds a global basis for $\mathcal{W}$ in StyleGAN using PCA, enabling fast image manipulation. SeFa (Shen & Zhou, 2021) focuses on the first weight parameter right after the latent code, suggesting that it contains essential knowledge of image variation, and proposes the singular vectors of this weight parameter as meaningful global latent directions. StyleCLIP (Patashnik et al., 2021) achieves state-of-the-art performance in text-driven image manipulation of StyleGAN by introducing additional training to minimize the CLIP loss (Radford et al., 2021).

Jacobian Decomposition.

Some works use the Jacobian matrix to analyze the latent space of GANs (Zhu et al., 2021; Wang & Ponce, 2021; Chiu et al., 2020; Ramesh et al., 2018). However, these methods focus on the Jacobian of the entire model, from the input noise $\mathbf{z}$ to the output image. Ramesh et al. (2018) suggested the right singular vectors of the Jacobian as local disentangled directions in the $\mathcal{Z}$-space. Zhu et al. (2021) proposed a latent perturbation vector that changes only a particular area of the image; the perturbation vector is discovered by taking the principal vector of the Jacobian to the target area and projecting it into the null space of the Jacobian to the complementary region. On the other hand, our Local Basis utilizes the Jacobian matrix of the partial network, from the input noise $\mathbf{z}$ to the intermediate latent code $\mathbf{w}$, and investigates the black-box intermediate latent space through it. The top-$k$ Local Basis corresponds to the best local-geometry-describing submanifolds. This intuition leads to exploiting Local Basis to assess the global geometry of the intermediate latent space.

Refer to caption
(a) Concept Diagram of Local Basis
Refer to caption
(b) Latent Traversal Methods
Figure 1: (a) Concept diagram of Local Basis. The global basis reflects the global variation of the latent space; hence, traversing along the global basis may result in escaping from the latent space (shaded region). On the other hand, Local Basis closely follows the latent space. (b) Comparison of latent traversal methods (global methods: GANSpace (Härkönen et al., 2020) and SeFa (Shen & Zhou, 2021); local methods: Ramesh et al. (2018) and Local Basis (ours)).

3 Traversing a curved latent space

In this section, we introduce a method for finding a local-geometry-aware traversal direction in the intermediate latent space $\mathcal{W}$. The traversal direction is referred to as the Local Basis at $\mathbf{w}\in\mathcal{W}$. In addition, we evaluate the proposed Local Basis by observing how the generated image changes as we traverse the intermediate latent variable. Throughout this paper, we assess Local Basis on the $\mathcal{W}$-space of StyleGAN2 (Karras et al., 2020b). However, our methodology is not limited to StyleGAN2; see the appendix for results on StyleGAN (Karras et al., 2019) and BigGAN (Brock et al., 2018).

3.1 Finding a Local Basis

Given a pretrained GAN model $M:\mathcal{Z}\rightarrow\mathcal{X}$, from the input noise space $\mathcal{Z}$ to the image space $\mathcal{X}$, we choose an intermediate layer $\tilde{\mathcal{W}}$ on which to discover Local Basis. We refer to the former part of the GAN model as the mapping network $f:\mathcal{Z}\rightarrow\tilde{\mathcal{W}}$. The image of the mapping network is denoted as $\mathcal{W}=f(\mathcal{Z})\subset\tilde{\mathcal{W}}$. The latter part, a non-linear mapping from $\tilde{\mathcal{W}}$ to the image space $\mathcal{X}$, is denoted by $G:\tilde{\mathcal{W}}\rightarrow\mathcal{X}$. Local Basis at $\mathbf{w}\in\mathcal{W}$ is defined as a basis of the tangent space $T_{\mathbf{w}}\mathcal{W}$. This basis can be interpreted as a local-geometry-aware linear traversal direction starting from $\mathbf{w}$.

To define the tangent space of the intermediate latent space $\mathcal{W}$ properly, we assume that $\mathcal{W}$ is a differentiable manifold. Note that the support of the isotropic Gaussian prior $\mathcal{Z}=\mathbb{R}^{d_{\mathcal{Z}}}$ and the ambient space $\tilde{\mathcal{W}}=\mathbb{R}^{d_{\tilde{\mathcal{W}}}}$ are already differentiable manifolds. The tangent space at $\mathbf{w}$, denoted by $T_{\mathbf{w}}\mathcal{W}$, is the vector space of tangent vectors of curves passing through the point $\mathbf{w}$. Explicitly,

$$T_{\mathbf{w}}\mathcal{W}=\{\,\dot{\gamma}(0)\mid\gamma:(-\epsilon,\epsilon)\rightarrow\mathcal{W},\,\gamma(0)=\mathbf{w},\text{ for }\epsilon>0\}.\quad(1)$$

Then, the differentiable mapping network $f$ gives a linear map $df_{\mathbf{z}}$ between the two tangent spaces $T_{\mathbf{z}}\mathcal{Z}$ and $T_{\mathbf{w}}\tilde{\mathcal{W}}$, where $\mathbf{w}=f(\mathbf{z})$:

$$df_{\mathbf{z}}:T_{\mathbf{z}}\mathcal{Z}\longrightarrow T_{\mathbf{w}}\mathcal{W}\hookrightarrow T_{\mathbf{w}}\tilde{\mathcal{W}},\quad\dot{\gamma}(0)\longmapsto\dot{(f\circ\gamma)}(0)\quad(2)$$

We utilize the linear map $df_{\mathbf{z}}$, called the differential of $f$ at $\mathbf{z}$, to find the basis of $T_{\mathbf{w}}\mathcal{W}$. Based on the manifold hypothesis in representation learning, we posit that the latent space of the image space $\mathcal{X}$ in $\tilde{\mathcal{W}}$ is a lower-dimensional manifold embedded in $\mathcal{W}$. In this approach, we estimate the latent manifold as a lower-dimensional approximation of $\mathcal{W}$ describing its principal variations. The approximating manifold can be obtained by solving the low-rank approximation problem of $df_{\mathbf{z}}$. The manifold hypothesis is supported by the empirical distribution of the singular values $\sigma^{\mathbf{z}}_{i}$; the analysis is provided in Fig 9 in the appendix.

The low-rank approximation problem has an analytic solution given by Singular Value Decomposition (SVD). Because the matrix representation of $df_{\mathbf{z}}$ is the Jacobian matrix $(\nabla_{\mathbf{z}}f)(\mathbf{z})\in\mathbb{R}^{d_{\tilde{\mathcal{W}}}\times d_{\mathcal{Z}}}$, Local Basis is obtained as follows: for the $i$-th right singular vector $\mathbf{u}^{\mathbf{z}}_{i}\in\mathbb{R}^{d_{\mathcal{Z}}}$, the $i$-th left singular vector $\mathbf{v}^{\mathbf{w}}_{i}\in\mathbb{R}^{d_{\tilde{\mathcal{W}}}}$, and the $i$-th singular value $\sigma^{\mathbf{z}}_{i}\in\mathbb{R}$ of $(\nabla_{\mathbf{z}}f)(\mathbf{z})$ with $\sigma^{\mathbf{z}}_{1}\geq\cdots\geq\sigma^{\mathbf{z}}_{n}$,

$$df_{\mathbf{z}}(\mathbf{u}^{\mathbf{z}}_{i})=\sigma^{\mathbf{z}}_{i}\cdot\mathbf{v}^{\mathbf{w}}_{i}\quad\text{for all }i,\quad(3)$$
$$\text{Local Basis}(\mathbf{w}=f(\mathbf{z}))=\{\mathbf{v}^{\mathbf{w}}_{i}\}_{1\leq i\leq n}.\quad(4)$$

Then, the $k$-dimensional approximation of $\mathcal{W}$ around $\mathbf{w}$ is described as follows because $\mathcal{Z}=\mathbb{R}^{d_{\mathcal{Z}}}$ (if $\sigma^{\mathbf{z}}_{k}>0$). Note that $\mathcal{W}^{k}_{\mathbf{w}}$ is a submanifold of $\mathcal{W}$ corresponding to the $k$ components of Local Basis, i.e. $T_{\mathbf{w}}\mathcal{W}^{k}_{\mathbf{w}}=\operatorname{span}\{\mathbf{v}^{\mathbf{w}}_{i}:1\leq i\leq k\}$. (Strictly speaking, $\mathcal{W}^{k}_{\mathbf{w}}$ may not satisfy the conditions of a submanifold. The injectivity of $df_{\mathbf{z}}$ on the domain $\{\mathbf{z}+\sum_{i}t_{i}\cdot\mathbf{u}^{\mathbf{z}}_{i}\mid t_{i}\in(-\epsilon_{i},\epsilon_{i}),\text{ for }1\leq i\leq k\}$ is a sufficient condition. As described below, this sufficient condition is satisfied under a locally affine mapping network $f$ with $\sigma^{\mathbf{z}}_{k}>0$.)

$$\mathcal{W}^{k}_{\mathbf{w}}=\left\{f\Big(\mathbf{z}+\sum_{i}t_{i}\cdot\mathbf{u}^{\mathbf{z}}_{i}\Big)\mid t_{i}\in(-\epsilon_{i},\epsilon_{i}),\text{ for }1\leq i\leq k\right\}\quad(5)$$
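In code, Eqs 3 and 4 amount to one Jacobian evaluation and one SVD. Below is a minimal PyTorch sketch of this computation; the 8-layer leaky-ReLU MLP `f` is our own illustrative stand-in for a pretrained mapping network, not the released implementation.

```python
import torch
import torch.nn as nn

d_z = d_w = 512  # latent dimensions, as in StyleGAN2
# Toy stand-in for the pretrained mapping network f: Z -> W~ (assumption).
f = nn.Sequential(*[m for _ in range(8)
                    for m in (nn.Linear(d_z, d_w), nn.LeakyReLU(0.2))])

def local_basis(f, z):
    """Local Basis at w = f(z) via SVD of the Jacobian (Eqs 3-4).

    Returns (U_z, sigma, V_w): right singular vectors u_i^z as columns
    of U_z, singular values sigma_i in descending order, and left
    singular vectors v_i^w (the Local Basis of T_w W) as columns of V_w.
    """
    J = torch.autograd.functional.jacobian(f, z)  # shape (d_w, d_z)
    V_w, sigma, Uh = torch.linalg.svd(J)          # J = V_w diag(sigma) Uh
    return Uh.T, sigma, V_w

z = torch.randn(d_z)                 # sample from the Gaussian prior
U_z, sigma, V_w = local_basis(f, z)  # Local Basis at w = f(z)
```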
Locally affine mapping network

In this paragraph, we focus on the locally affine mapping network $f$, which covers one of the most widely adopted GAN structures: MLP or CNN layers with ReLU or leaky-ReLU activation functions. This type of mapping network has several properties well-suited to Local Basis.

$$f(\mathbf{z})=\sum_{p\in\Omega}\mathbf{1}_{\mathbf{z}\in p}\left(\mathbf{A}_{p}\mathbf{z}+\mathbf{b}_{p}\right)\quad(6)$$

where $\Omega$ denotes a partition of $\mathcal{Z}$, and $\mathbf{A}_{p}$ and $\mathbf{b}_{p}$ are the parameters of the local affine map. With this type of mapping network $f$, the intermediate latent space $\mathcal{W}$ clearly satisfies the differentiable manifold property at least locally, on the interior of each $p\in\Omega$. The region where the property may not hold, the intersection of the closures of several $p$'s in $\Omega$, has measure zero in $\mathcal{Z}$.

Moreover, the Jacobian matrix $(\nabla_{\mathbf{z}}f)(\mathbf{z})$ becomes a locally constant matrix. Then, the approximating manifold $\mathcal{W}^{k}_{\mathbf{w}}$ (Eq 5) satisfies the submanifold condition and is locally consistent for each $p$, avoiding being redefined for each $\mathbf{w}$. In addition, the linear traversal of the latent variable $\mathbf{w}$ along $\mathbf{v}^{\mathbf{w}}_{i}$ can be described as a curve on $\mathcal{W}$ (Eq 7). Most importantly, these curves on $\mathcal{W}$, starting from $\mathbf{w}$ in the directions of Local Basis, correspond to the local coordinate mesh of $\mathcal{W}^{k}_{\mathbf{w}}$.

$$\text{Traversal}(\mathbf{w}=f(\mathbf{z}),\mathbf{v}^{\mathbf{w}}_{i}):(-\epsilon,\epsilon)\longrightarrow\mathcal{Z}\xrightarrow{\,f\,}\mathcal{W},\quad t\mapsto\Big(\mathbf{z}+\frac{t}{\sigma^{\mathbf{z}}_{i}}\cdot\mathbf{u}^{\mathbf{z}}_{i}\Big)\mapsto\left(\mathbf{w}+t\cdot\mathbf{v}^{\mathbf{w}}_{i}\right)\quad(7)$$
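Eq 7 then reads as a one-step perturbation in $\mathcal{Z}$. Continuing the hypothetical `local_basis` sketch above, a linear traversal along the $i$-th direction could look like:

```python
def traverse(f, z, i, t):
    """Linear traversal of Eq 7 along the i-th Local Basis direction.

    A step of t / sigma_i along u_i^z in Z moves the latent code in W
    by approximately t * v_i^w, i.e. a path of length ~|t| on W.
    """
    U_z, sigma, V_w = local_basis(f, z)
    return f(z + (t / sigma[i]) * U_z[:, i])
```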
Equivalence to Local PCA

To provide additional intuition about Local Basis, we prove the following proposition. It shows that Local Basis is equivalent to applying PCA to samples on $\mathcal{W}$ around $\mathbf{w}$.

Proposition 1 (Equivalence to Local PCA).

Consider the Local PCA problem around the base latent variable $\mathbf{w}_{b}=f(\mathbf{z}_{b})$ on $\mathcal{W}$, i.e. PCA of the latent variable samples $\mathbf{w}^{\prime}$ around $\mathbf{w}_{b}$:

$$\mathbf{w}^{\prime}=T_{1}f(\mathbf{z}_{b}+c\cdot\epsilon)\quad\text{with }\epsilon\sim N(0,I)\text{ and for some small }c>0,\quad(8)$$

where $T_{1}f(\mathbf{z})=\mathbf{w}_{b}+(\nabla_{\mathbf{z}_{b}}f)(\mathbf{z}-\mathbf{z}_{b})$ is the linear approximation of $f$ around $\mathbf{z}_{b}$. Then, the principal components discovered in the Local PCA problem are equivalent to Local Basis at $\mathbf{w}_{b}$.
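Proposition 1 is easy to sanity-check numerically. Under the toy mapping network `f` assumed above, the following sketch draws samples from the linearization $T_{1}f$ of Eq 8 and compares their principal components with the left singular vectors of the Jacobian; the absolute inner products should be close to the identity.

```python
z_b = torch.randn(d_z)
J = torch.autograd.functional.jacobian(f, z_b)  # (nabla_{z_b} f)

# Samples w' = T_1 f(z_b + c * eps) from Eq 8, with a small c.
c, n = 1e-2, 10_000
eps = torch.randn(n, d_z)
w_prime = f(z_b) + c * eps @ J.T                # shape (n, d_w)

# Principal components of the centered samples (rows of Vh) vs. the
# left singular vectors of J (columns of U_J).
_, _, Vh = torch.linalg.svd(w_prime - w_prime.mean(0), full_matrices=False)
U_J, _, _ = torch.linalg.svd(J)
print((Vh[:3] @ U_J[:, :3]).abs())              # approx. the 3x3 identity
```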

Refer to caption
Figure 2: Illustration of Iterative Curve-Traversal (for $N=2$).

3.2 Iterative Curve-Traversal

We suggest a natural curve-traversal that keeps track of the $\mathcal{W}$-manifold and an iterative method to implement it. We divide the long curved trajectory into small pieces and approximate each piece by local curves using Local Basis. We call this the Iterative Curve-Traversal method. Consistent with the linear traversal method, we consider an Iterative Curve-Traversal $\gamma$ departing in the direction of a Local Basis component. Explicitly, for a sufficiently large $c>0$,

$$\gamma:(-c,c)\longrightarrow\mathcal{W},\quad\gamma(0)=\mathbf{w},\ \dot{\gamma}(0)=\mathbf{v}_{k}^{\mathbf{w}}\quad\text{for some }1\leq k\leq d_{\mathcal{W}}\quad(9)$$

where $\{\mathbf{v}_{i}^{\mathbf{w}}\}_{i}=\text{Local Basis}(\mathbf{w})$. We split the curve-traversal $\gamma$ into $N$ pieces $\gamma_{n}$ and denote the $n$-th iterate in $\mathcal{W}$ and $\mathcal{Z}$ as $\mathbf{w}_{n}$ and $\mathbf{z}_{n}$ for $1\leq n\leq N$. The starting point of the traversal is denoted as the $0$-th iterate: $\mathbf{w}=\mathbf{w}_{0}$, $\mathbf{z}=\mathbf{z}_{0}$, and $\mathbf{w}_{0}=f(\mathbf{z}_{0})$ (Fig 2). Note that to find Local Basis at $\mathbf{w}_{n}$, we need a corresponding $\mathbf{z}_{n}\in\mathcal{Z}$ such that $\mathbf{w}_{n}=f(\mathbf{z}_{n})$.

Below, we describe the positive part $\gamma^{+}=\gamma|_{[0,c)}$ of Iterative Curve-Traversal. For the negative part, we repeat the same procedure using the reversed tangent vector $-\mathbf{v}_{k}^{\mathbf{w}}$. The first step $\gamma_{1}^{+}$ of the Iterative Curve-Traversal method with perturbation intensity $I$ is as follows:

$$\gamma_{1}^{+}:\left[0,\,I/(N\cdot\sigma_{k}^{\mathbf{z}_{0}})\right]\longrightarrow\mathcal{Z}\xrightarrow{\,f\,}\mathcal{W},\quad t\longmapsto\left(\mathbf{z}_{0}+t\cdot\mathbf{u}_{k}^{\mathbf{z}_{0}}\right)\longmapsto f(\mathbf{z}_{0}+t\cdot\mathbf{u}_{k}^{\mathbf{z}_{0}})\quad(10)$$
$$\mathbf{z}_{1}=\mathbf{z}_{0}+\frac{I}{N\cdot\sigma_{k}^{\mathbf{z}_{0}}}\,\mathbf{u}_{k}^{\mathbf{z}_{0}},\quad\mathbf{w}_{1}=f(\mathbf{z}_{1})\quad(11)$$

Note that $\mathbf{w}_{1}$ is the endpoint of the curve $\gamma_{1}^{+}$ and $\dot{\gamma}_{1}^{+}(0)=\mathbf{v}_{k}^{\mathbf{w}}$. We scale the step size in $\mathcal{Z}$ by $1/\sigma_{k}^{\mathbf{z}_{0}}$ to ensure that each piece of the curve has a similar length of $I/N$. To preserve the variation in semantics during the traversal, the departure direction of $\gamma_{2}^{+}$ is determined by comparing the similarity between the previous departure direction $\mathbf{v}_{k}^{\mathbf{w}_{0}}$ and Local Basis at $\mathbf{w}_{1}$ (Eq 12). This process is repeated $N$ times. (The full algorithm for Iterative Curve-Traversal can be found in the appendix; a condensed code sketch follows Eq 12.)

$$\dot{\gamma}_{2}^{+}(0)=\mathbf{v}_{j}^{\mathbf{w}_{1}}\quad\text{where}\quad j=\operatorname*{argmax}_{1\leq i\leq d_{\mathcal{W}}}\left|\langle\mathbf{v}_{k}^{\mathbf{w}_{0}},\mathbf{v}_{i}^{\mathbf{w}_{1}}\rangle\right|\quad(12)$$
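The following PyTorch sketch condenses Eqs 10 to 12 into a single loop, reusing the hypothetical `local_basis` from Sec 3.1 (Algorithm 2 in the appendix gives the full pseudocode):

```python
def iterative_traversal(f, z0, k, I, N):
    """Positive part of Iterative Curve-Traversal (Eqs 10-12).

    Splits a traversal of total intensity I into N pieces; each piece
    departs along the Local Basis direction most similar (in absolute
    inner product) to the previous departure direction.
    """
    zs, z, v_prev = [z0], z0, None
    for _ in range(N):
        U_z, sigma, V_w = local_basis(f, z)
        sign = 1.0
        if v_prev is not None:
            sims = V_w.T @ v_prev          # Eq 12: compare to previous step
            k = sims.abs().argmax().item()
            sign = torch.sign(sims[k])
        v_prev = sign * V_w[:, k]
        # Eq 11: step of length ~ I/N on W, scaled by 1/sigma_k in Z.
        z = z + sign * (I / (N * sigma[k])) * U_z[:, k]
        zs.append(z)
    return zs
```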

3.3 Results of Local Basis traversal

We evaluate Local Basis by observing how the generated image changes as we traverse the $\mathcal{W}$-space of StyleGAN2 and by measuring the FID score for each perturbation intensity. The evaluation is based on two criteria: Robustness and Semantic Factorization.

Robustness

Fig 3 and 4 present the Robustness Test results. (Ramesh et al. (2018) is not compared because it takes hours to obtain a traversal direction for one image; see the appendix for its qualitative Robustness Test results.) In Fig 3, the traversal images of Local Basis are compared with those of the global methods (GANSpace (Härkönen et al., 2020) and SeFa (Shen & Zhou, 2021)) under a strong perturbation intensity of 12 along the 1st and 2nd directions of each method. The perturbation intensity is defined as the traversal path length in $\mathcal{W}$. The two global methods show severe degradation of the image compared to Local Basis. Moreover, we perform a quantitative assessment of robustness: we measure the FID score of 10,000 traversed images for each perturbation intensity. In Fig 4, the global methods show relatively small FID under small perturbations. However, as we impose stronger perturbations, their FID scores increase sharply, consistent with the image collapse in Fig 3. By contrast, Local Basis achieves much smaller FID scores both with and without Iterative Curve-Traversal.

Refer to caption
(a) GANSpace
Refer to caption
(b) SeFa
Refer to caption
(c) Local Basis (Ours)
Refer to caption
(d) Iterative Curve-Traversal (Ours)
Figure 3: Qualitative Robustness Test on the $\mathcal{W}$-space of StyleGAN2 (Karras et al., 2020b) trained on FFHQ. Each traversal image is generated by linear traversal on $\mathcal{W}$, except for (d), under a strong perturbation intensity $I$ of up to 12. The intensity is linearly increased from 0 to 12 across the columns. We infer that the deterioration of the traversal images along the global methods is due to the escape of the latent traversal from the latent manifold. (See the appendix for additional Robustness Test results along the first 10 components of Local Basis.)
Refer to caption
Refer to caption
Figure 4: Quantitative Robustness Test on the $\mathcal{W}$-space of StyleGAN2 (Karras et al., 2020b) trained on FFHQ. Fréchet Inception Distance (FID) (Heusel et al., 2017) is measured over 10,000 traversed images for each perturbation intensity. Left: 1st direction, Right: 2nd direction.

We interpret the degradation of the image as due to the deviation of the trajectory from $\mathcal{W}$. The theoretical interpretation shows that linear traversal along Local Basis corresponds to a local coordinate axis on $\mathcal{W}$, at least locally. Therefore, the traversal along Local Basis is guaranteed to stay close to $\mathcal{W}$ even under longer traversals. We cannot expect the same property from a global basis because it is based on the global geometry. Iterative Curve-Traversal shows an even more stable traversal because it traces the latent manifold more closely, which further supports our interpretation.

Semantic Factorization

Local Basis is discovered as the singular vectors of $df_{\mathbf{z}}$. The disentangled correspondence between Local Basis and the corresponding singular vectors in the prior space induces a semantic factorization in Local Basis. Fig 5 and 6 present the semantics of the image discovered by Local Basis. In Fig 5, we compare the semantic factorizations of Local Basis and GANSpace (Härkönen et al., 2020) for the particular semantics discovered by GANSpace. For each interpretable traversal direction of GANSpace provided by the authors, the corresponding Local Basis component is chosen as the one with the highest cosine similarity. For a fair comparison, each traversal is applied to the specific subset of layers in the synthesis network (Karras et al., 2020b) provided by the authors of GANSpace, with the same perturbation intensity. In particular, as we impose stronger perturbations (from left to right), GANSpace shows image collapse in Fig 5(a) and entanglement of semantics (Glasses + Head Raising) in Fig 5(d). Local Basis shows neither of these problems. Fig 6 provides additional examples of semantic factorization where the latent traversal is applied to a subset of layers predefined in StyleGAN. The subset of layers is selected as one of four groups, i.e. coarse, middle, fine, or all styles. Local Basis shows decent factorization of semantics such as Body Length of a car and Age of a cat in LSUN (Yu et al., 2015) in Fig 6.

Refer to caption
(a) Face length
Refer to caption
(b) Open mouth
Refer to caption
(c) Sunlight on face
Refer to caption
(d) Glasses
Figure 5: Comparison of Semantic Factorization between Local Basis and GANSpace on pretrained StyleGAN2-FFHQ. We compare the semantic-factorizing directions of GANSpace provided by the authors (Härkönen et al., 2020) with the Local Basis of the highest cosine similarity. Local Basis factorizes the semantics of the image better, notably without collapsing, compared to GANSpace.
Refer to caption
(a) StyleGAN2 LSUN Car
Refer to caption
(b) StyleGAN2 LSUN Cat
Figure 6: Additional Semantic Factorization examples of Local Basis. The examples are discovered by manual inspection due to the unsupervised nature while applying the latent traversal to a subset of the layers predefined in StyleGAN (Karras et al., 2019): coarse, middle, fine, and all styles. (See the appendix for the additional examples of Semantic Factorization without layer restriction.)

3.4 Exploration inside Abstract Semantics

Abstract semantics of an image often consist of several lower-level semantics. For instance, Old can be represented as a correlated distribution of Hair color, Wrinkle, Face length, etc. In this section, we show that an adaptation of Iterative Curve-Traversal can explore the variation of an abstract semantic, which is represented by a cone-shaped region of the generative factors (Träuble et al., 2021).

Because of its text-driven nature, we utilize the global basis from StyleCLIP (Patashnik et al., 2021) corresponding to the abstract semantics of Old. (We use the global basis defined on $\mathcal{W}^{+}$ (Tov et al., 2021); see the appendix for details.) Then, we consider a modification of Iterative Curve-Traversal following the given global basis $\mathbf{v}_{global}$: the departure direction of each piece of curve $\gamma_{i}$ in Eq 12 is chosen by similarity to $\mathbf{v}_{global}$, not by similarity to the previous departure direction. The results for Old are provided in Fig 7 (see the appendix for other examples). Step size denotes the length of each piece of curve, i.e. $I/N$ in Sec 3.2. For a fair comparison, the overall perturbation intensity $I$ is fixed to 4 by adjusting the number of steps $N$. The linear traversal along the global basis adds only wrinkles to the image, and the image collapses shortly. By contrast, both Iterative Curve-Traversal methods obtain diverse and high-fidelity image manipulations for the target semantics Old. In particular, the diversity is greatly increased as we add stochasticity to the step size. We interpret this diversity as the result of the increased exploration area afforded by the stochastic step size, while exploiting the high fidelity of Iterative Curve-Traversal.

Refer to caption
Figure 7: Iterative Curve-Traversal guided by the global basis from StyleCLIP for the semantics of Old. Left: Linear traversal along the global basis. Middle: Iterative Curve-Traversal with fixed step size (step size = 0.02, 0.04, 0.08, 0.16). Right: Stochastic Iterative Curve-Traversal (step size sampled uniformly from $[0.05,0.15]$).

4 Evaluating warpage of the $\mathcal{W}$-Manifold

In this section, we provide an explanation for the limited success of the global basis in the $\mathcal{W}$-space of StyleGAN2. In Sec 3, we showed that Local Basis corresponds to the generative factors of the data. Hence, the linear subspace spanned by Local Basis, which is the tangent space $T_{\mathbf{w}}\mathcal{W}^{k}_{\mathbf{w}}$ in Eq 5, describes the local principal variation of the image. In this regard, we assess the global disentanglement property by evaluating the consistency of the tangent space at each $\mathbf{w}\in\mathcal{W}$. We refer to the inconsistency of the tangent space as the warpage of the latent manifold. Our evaluation proves that the $\mathcal{W}$-manifold is warped globally. In this section, we present the quantitative evaluation of the global warpage by introducing metrics on the Grassmannian manifold. The qualitative evaluation, by observing the subspace traversal, is provided in the appendix. The subspace traversal denotes a simultaneous traversal in multiple directions.

Grassmannian Manifold

Let $V$ be a vector space. The Grassmannian manifold $\text{Gr}(k,V)$ (Boothby, 1986) is defined as the set of all $k$-dimensional linear subspaces of $V$. We evaluate the global warpage of the $\mathcal{W}$-manifold by measuring the Grassmannian distance between the linear subspaces spanned by the top-$k$ Local Basis of each $\mathbf{w}\in\mathcal{W}$. The reason for measuring the distance between top-$k$ Local Basis subspaces is the manifold hypothesis: the linear subspace spanned by the top-$k$ Local Basis corresponds to the tangent space of the $k$-dimensional approximation of $\mathcal{W}$ (Eq 5). From this perspective, a large Grassmannian distance means that the $k$-dimensional local approximation of $\mathcal{W}$ changes severely. Likewise, we consider the subspace spanned by the top-$k$ components of the global basis. In this study, two metrics on the Grassmannian manifold are adopted: the Projection metric and the Geodesic metric.

Grassmannian Metric

First, for two subspaces $W,W^{\prime}\in\text{Gr}(k,V)$, let the orthogonal projections onto each subspace be $P_{W}$ and $P_{W^{\prime}}$, respectively. Then the Projection Metric (Karrasch, 2017) on $\text{Gr}(k,V)$ is defined as follows:

$$d_{\mathrm{proj}}\left(W,W^{\prime}\right)=\left\|P_{W}-P_{W^{\prime}}\right\|\quad(13)$$

where $\|\cdot\|$ denotes the operator norm.

Second, let $M_{W},M_{W^{\prime}}\in\mathbb{R}^{d_{V}\times k}$ be column-wise orthonormal matrices whose columns span $W$ and $W^{\prime}\in\text{Gr}(k,V)$, respectively. Then the Geodesic Metric (Ye & Lim, 2016) on $\text{Gr}(k,V)$, induced by the canonical Riemannian structure, is formulated as follows:

$$d_{\mathrm{geo}}(W,W^{\prime})=\left(\sum^{k}_{i=1}\theta^{2}_{i}\right)^{1/2}\quad(14)$$

where $\theta_{i}=\cos^{-1}(\sigma_{i}(M_{W}^{\top}M_{W^{\prime}}))$ denotes the $i$-th principal angle between $W$ and $W^{\prime}$.
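Both metrics reduce to standard linear algebra on orthonormal bases. The following is a minimal sketch, our own illustration rather than the exact evaluation code, assuming `Mw` and `Mw2` are top-$k$ Local Basis matrices (e.g. `V_w[:, :k]` from the sketch in Sec 3.1):

```python
def grassmannian_metrics(Mw, Mw2):
    """Projection (Eq 13) and Geodesic (Eq 14) metrics between the
    subspaces spanned by the orthonormal columns of Mw and Mw2 (d x k)."""
    # Projection metric: operator norm of the difference of projections.
    P, P2 = Mw @ Mw.T, Mw2 @ Mw2.T
    d_proj = torch.linalg.matrix_norm(P - P2, ord=2)
    # Geodesic metric: l2 norm of the principal angles theta_i, where
    # cos(theta_i) are the singular values of Mw^T Mw2.
    cos_theta = torch.linalg.svdvals(Mw.T @ Mw2).clamp(max=1.0)
    d_geo = torch.linalg.vector_norm(torch.acos(cos_theta))
    return d_proj, d_geo
```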

Evaluation

We evaluate the global warpage of the 𝒲\mathcal{W}-manifold by comparing the five distances as we vary the subspace dimension.

  1.

     Random $O(d)$: Between two random bases of $\mathbb{R}^{d_{\mathcal{W}}}$ uniformly sampled from $O(d_{\mathcal{W}})$

  2.

     Random $\mathbf{w}$: Between two Local Basis from two random $\mathbf{w}\in\mathcal{W}$

  3.

     Close $\mathbf{w}$: Between two Local Basis from two close $\mathbf{w}^{\prime},\mathbf{w}\in\mathcal{W}$ (see the appendix for the Grassmannian metric with various $\epsilon=|\mathbf{z}^{\prime}-\mathbf{z}|$):

     $$\mathbf{w}^{\prime}=f(\mathbf{z}^{\prime}),\quad\mathbf{w}=f(\mathbf{z})\qquad\text{where }|\mathbf{z}^{\prime}-\mathbf{z}|=0.1\quad(15)$$

  4.

     To global GANSpace: Between Local Basis and the global basis from GANSpace

  5.

     To global SeFa: Between Local Basis and the global basis from SeFa

Refer to caption
(a) Projection Metric
Refer to caption
(b) Geodesic Metric
Figure 8: Grassmannian metrics. The shaded region illustrates the (mean $\pm$ standard deviation) interval of each score. Above all, the Random $\mathbf{w}$ metric is much larger than Close $\mathbf{w}$. This means a large variation of Local Basis over $\mathcal{W}$, which demonstrates that the $\mathcal{W}$-space is globally warped. Moreover, the metric results show the local consistency of Local Basis and the existence of a limited global alignment on $\mathcal{W}$. (See Sec 4 for details.)

Fig 8 shows the above five Grassmannian metrics. We report the metric results from 100 samples for Random $O(d_{\mathcal{W}})$ and 1,000 samples for the others. The Projection metric increases in the order of Close $\mathbf{w}$, To global GANSpace, Random $\mathbf{w}$, To global SeFa, and Random $O(d_{\mathcal{W}})$. For the Geodesic metric, the order of Random $\mathbf{w}$ and To global SeFa is reversed. Most importantly, the Random $\mathbf{w}$ metric is much larger than Close $\mathbf{w}$. This shows that there is a large variation of Local Basis over $\mathcal{W}$, which proves that the $\mathcal{W}$-space is globally warped. In addition, the Close $\mathbf{w}$ metric is always significantly smaller than the others, which implies the local consistency of Local Basis on $\mathcal{W}$. Finally, the metric results prove the existence of a limited global disentanglement on $\mathcal{W}$: Random $\mathbf{w}$ is smaller than Random $O(d)$, which shows that Local Basis on $\mathcal{W}$ is not completely random and implies the existence of a global alignment. In this regard, both To global results prove that the global basis finds the global alignment to a certain degree. To global GANSpace lies between Close $\mathbf{w}$ and Random $\mathbf{w}$; To global SeFa does so on the Geodesic metric and is similar to Random $\mathbf{w}$ on the Projection metric. However, the large gap between Close $\mathbf{w}$ and both To global results implies that the discovered global alignment is limited.

5 Conclusion

In this work, we proposed Local Basis, a method for finding meaningful traversal directions based on the local geometry of the intermediate latent space of GANs. Motivated by the theoretical interpretation of Local Basis, we suggested experiments to evaluate the global geometry of the latent space and an iterative traversal method that can trace the latent space. The experimental results demonstrate that Local Basis factorizes the semantics of images and provides a more stable transformation of images, both with and without the proposed iterative traversal. Moreover, the suggested evaluation of the $\mathcal{W}$-space in StyleGAN2 proves that the $\mathcal{W}$-space is globally distorted; therefore, a global method can find only a limited global consistency in the $\mathcal{W}$-space.

Acknowledgement

This work was supported by the NRF grant [2021R1A2C3010887], the ICT R&D program of MSIT/IITP [2021-0-00077] and MOTIE [P0014715].

Ethics Statement

A limitation and potential negative societal impact of our work is that Local Basis would reflect the bias of the data. GANs learn the probability distribution of data through samples from it. Thus, unlike likelihood-based methods such as Variational Autoencoders (Kingma & Welling, 2014) and flow-based models (Kingma & Dhariwal, 2018), GANs are more likely to amplify the dependence between the semantics of data, even its bias. Because Local Basis finds meaningful traversal directions based on the local geometry of the latent space, it would show the bias of the data as it is. Moreover, if Local Basis is applied to real-world problems such as editing images, it may amplify the bias of society. However, in order to fix a problem, we first need a method to analyze it; in this respect, Local Basis can serve as a tool to analyze such bias.

Reproducibility Statement

To ensure the reproducibility of this study, we attached the entire source code in the supplementary material. Every figure can be reproduced by running the jupyter notebooks in notebooks/*. In addition, the proof of Proposition 1 is included in the appendix.

References

  • Abdal et al. (2019) Rameen Abdal, Yipeng Qin, and Peter Wonka. Image2stylegan: How to embed images into the stylegan latent space? In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  4432–4441, 2019.
  • Abdal et al. (2021) Rameen Abdal, Peihao Zhu, Niloy J Mitra, and Peter Wonka. Styleflow: Attribute-conditioned exploration of stylegan-generated images using conditional continuous normalizing flows. ACM Transactions on Graphics (TOG), 40(3):1–21, 2021.
  • Bengio et al. (2013) Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE transactions on pattern analysis and machine intelligence, 35(8):1798–1828, 2013.
  • Boothby (1986) William M Boothby. An introduction to differentiable manifolds and Riemannian geometry. Academic press, 1986.
  • Brock et al. (2018) Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale gan training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2018.
  • Chiu et al. (2020) Chia-Hsing Chiu, Yuki Koyama, Yu-Chi Lai, Takeo Igarashi, and Yonghao Yue. Human-in-the-loop differential subspace search in high-dimensional latent space. ACM Transactions on Graphics (TOG), 39(4):85–1, 2020.
  • Goetschalckx et al. (2019) Lore Goetschalckx, Alex Andonian, Aude Oliva, and Phillip Isola. Ganalyze: Toward visual definitions of cognitive image properties. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.  5744–5753, 2019.
  • Goodfellow et al. (2014) Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C Courville, and Yoshua Bengio. Generative adversarial nets. In NIPS, 2014.
  • Härkönen et al. (2020) Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, and Sylvain Paris. Ganspace: Discovering interpretable gan controls. Advances in Neural Information Processing Systems, 33, 2020.
  • Heusel et al. (2017) Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
  • Jahanian et al. (2019) Ali Jahanian, Lucy Chai, and Phillip Isola. On the "steerability" of generative adversarial networks. In International Conference on Learning Representations, 2019.
  • Karras et al. (2018) Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. In International Conference on Learning Representations, 2018.
  • Karras et al. (2019) Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  4401–4410, 2019.
  • Karras et al. (2020a) Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Training generative adversarial networks with limited data. arXiv preprint arXiv:2006.06676, 2020a.
  • Karras et al. (2020b) Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  8110–8119, 2020b.
  • Karrasch (2017) Daniel Karrasch. An introduction to grassmann manifolds and their matrix representation. 2017.
  • Kingma & Dhariwal (2018) Diederik P Kingma and Prafulla Dhariwal. Glow: Generative flow with invertible 1×1 convolutions. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 10236–10245, 2018.
  • Kingma & Welling (2014) Diederik P Kingma and Max Welling. Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • Patashnik et al. (2021) Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, and Dani Lischinski. Styleclip: Text-driven manipulation of stylegan imagery. arXiv preprint arXiv:2103.17249, 2021.
  • Pfau et al. (2020) David Pfau, Irina Higgins, Aleksandar Botev, and Sébastien Racanière. Disentangling by subspace diffusion. arXiv preprint arXiv:2006.12982, 2020.
  • Plumerault et al. (2020) Antoine Plumerault, Hervé Le Borgne, and Céline Hudelot. Controlling generative models with continuous factors of variations. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=H1laeJrKDB.
  • Radford et al. (2016) Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • Radford et al. (2021) Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 2021.
  • Ramesh et al. (2018) Aditya Ramesh, Youngduck Choi, and Yann LeCun. A spectral regularizer for unsupervised disentanglement. arXiv preprint arXiv:1812.01161, 2018.
  • Shen & Zhou (2021) Yujun Shen and Bolei Zhou. Closed-form factorization of latent semantics in gans. In CVPR, 2021.
  • Shen et al. (2020) Yujun Shen, Jinjin Gu, Xiaoou Tang, and Bolei Zhou. Interpreting the latent space of gans for semantic face editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.  9243–9252, 2020.
  • Tov et al. (2021) Omer Tov, Yuval Alaluf, Yotam Nitzan, Or Patashnik, and Daniel Cohen-Or. Designing an encoder for stylegan image manipulation. ACM Transactions on Graphics (TOG), 40(4):1–14, 2021.
  • Träuble et al. (2021) Frederik Träuble, Elliot Creager, Niki Kilbertus, Francesco Locatello, Andrea Dittadi, Anirudh Goyal, Bernhard Schölkopf, and Stefan Bauer. On disentangled representations learned from correlated data. In International Conference on Machine Learning, pp. 10401–10412. PMLR, 2021.
  • Upchurch et al. (2017) Paul Upchurch, Jacob Gardner, Geoff Pleiss, Robert Pless, Noah Snavely, Kavita Bala, and Kilian Weinberger. Deep feature interpolation for image content changes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp.  7064–7073, 2017.
  • Voynov & Babenko (2020) Andrey Voynov and Artem Babenko. Unsupervised discovery of interpretable directions in the gan latent space. In International Conference on Machine Learning, pp. 9786–9796. PMLR, 2020.
  • Wang & Ponce (2021) Binxu Wang and Carlos R Ponce. The geometry of deep generative image models and its applications. arXiv preprint arXiv:2101.06006, 2021.
  • Yang et al. (2021) Ceyuan Yang, Yujun Shen, and Bolei Zhou. Semantic hierarchy emerges in deep generative representations for scene synthesis. International Journal of Computer Vision, pp.  1–16, 2021.
  • Ye & Lim (2016) Ke Ye and Lek-Heng Lim. Schubert varieties and distances between subspaces of different dimensions. SIAM Journal on Matrix Analysis and Applications, 37(3):1176–1197, 2016.
  • Yu et al. (2015) Fisher Yu, Yinda Zhang, Shuran Song, Ari Seff, and Jianxiong Xiao. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
  • Zhu et al. (2021) Jiapeng Zhu, Ruili Feng, Yujun Shen, Deli Zhao, Zhengjun Zha, Jingren Zhou, and Qifeng Chen. Low-rank subspaces in gans. arXiv preprint arXiv:2106.04488, 2021.

Appendix A Proposition proof

Denote $(\nabla_{\mathbf{z}_{b}}f)$ by $J$. Then, from $\mathbf{w}^{\prime}=T_{1}f(\mathbf{z}_{b}+c\cdot\epsilon)$,

$$\mathbf{w}^{\prime}=\mathbf{w}_{b}+c\cdot J\epsilon\ \sim\ N(\mathbf{w}_{b},\,c^{2}\cdot JJ^{\intercal}).\quad(16)$$

The first principal component $\mathbf{v}_{1}$ is the vector such that $\mathbf{v}_{1}^{\intercal}(c\cdot J\epsilon)$ has the maximum variance:

$$\textrm{Var}(\mathbf{v}_{1}^{\intercal}(c\cdot J\epsilon))=c^{2}\cdot\|\mathbf{v}_{1}^{\intercal}J\|_{2}^{2}\quad(17)$$

Therefore,

$$\mathbf{v}_{1}=\underset{\|\mathbf{v}\|_{2}=1}{\operatorname{argmax}}\ \textrm{Var}(\mathbf{v}^{\intercal}(c\cdot J\epsilon))=\underset{\|\mathbf{v}\|_{2}=1}{\operatorname{argmax}}\ \|J^{\intercal}\mathbf{v}\|_{2}\quad(18)$$

Clearly, $\mathbf{v}_{1}$ corresponds to the first right singular vector of $J^{\intercal}$, i.e. the first left singular vector of $J$, by the operator-norm-maximizing property of singular vectors. Inductively, the $k$-th principal component $\mathbf{v}_{k}$ is the vector such that

$$\mathbf{v}_{k}=\underset{\|\mathbf{v}\|_{2}=1}{\operatorname{argmax}}\ \textrm{Var}(\mathbf{v}^{\intercal}(c\cdot J\epsilon))\quad\text{where}\quad\mathbf{v}_{k}\perp\{\mathbf{v}_{1},\mathbf{v}_{2},\cdots,\mathbf{v}_{k-1}\}\quad(19)$$

Thus, $\mathbf{v}_{k}$ is the $k$-th left singular vector of $J$. Therefore, the principal components from the Local PCA problem are equivalent to Local Basis at $\mathbf{w}_{b}$.

Appendix B Algorithm

Algorithm 1 Local Basis

Require: $z\in\mathbb{R}^{d_{\mathcal{Z}}}$ is the input code.
Require: $f:\mathbb{R}^{d_{\mathcal{Z}}}\rightarrow\mathbb{R}^{d_{\mathcal{W}}}$ is the mapping network.

function LocalBasis($z$, $f$)
    $w\leftarrow f(z)$
    $J\in\mathbb{R}^{d_{\mathcal{W}}\times d_{\mathcal{Z}}}\leftarrow$ Jacobian($z$, $w$)
    $U,S,V\leftarrow$ SVD($J$)
    return $\{U,S,V\}$

Algorithm 2 Iterative Curve-Traversal along the positive direction

Require: $z\in\mathbb{R}^{d_{\mathcal{Z}}}$ is the input code.
Require: $f:\mathbb{R}^{d_{\mathcal{Z}}}\rightarrow\mathbb{R}^{d_{\mathcal{W}}}$ is the mapping network.
Require: $k\in[1,\min\{d_{\mathcal{Z}},d_{\mathcal{W}}\}]$ is the ordinal number of the direction to traverse.
Require: $I$ is the total perturbation intensity.
Require: $N\geq 1$ is the number of iterations.

function IterativeTraversal($z$, $f$, $k$, $I$, $N$)
    $z_{0}\leftarrow z$
    $c\leftarrow\mathrm{ones}(d_{\mathcal{Z}},1)$
    for $i\in[0,N)$ do
        $U,S,V\leftarrow$ LocalBasis($z_{i}$, $f$)
        if $i>0$ then
            $c\leftarrow U^{T}\cdot u_{i-1}$
            $k\leftarrow\operatorname{argmax}(|c|)$  ▷ the row most similar to the previously selected basis
        end if
        $u_{i},v_{i}\leftarrow\operatorname{sign}(c_{k})\,U_{k},\ \operatorname{sign}(c_{k})\,V_{k}$  ▷ align with the previous orientation
        $s_{i}\leftarrow S_{kk}$
        $z_{i+1}\leftarrow z_{i}+\frac{I}{s_{i}\cdot N}\,v_{i}$
    end for
    return $\{z_{0},\ldots,z_{N}\}$

Appendix C Model and Computation Resource Details

Model

We evaluate GANSpace (Härkönen et al., 2020), SeFa (Shen & Zhou, 2021), and Local Basis on StyleGAN2 models for FFHQ (Karras et al., 2019) and LSUN (Yu et al., 2015) provided by the authors (Karras et al., 2020b).

Computation Resource

We generated the latent traversal results on a TITAN RTX with an Intel(R) Xeon(R) Gold 5220 CPU @ 2.20GHz. However, computing a Local Basis requires little computation; for example, on a GTX 1660 with a Ryzen 5 2600, it takes about 0.05 seconds.

Appendix D Code License

The files models/wrappers.py, notebooks/ganspace_utils.py and notebooks/notebook_utils.py are a derivative of the GANSpace, and are provided under the Apache 2.0 license. The directory netdissect is a derivative of the GAN Dissection project, and is provided under the MIT license. The directories models/biggan and models/stylegan2 are provided under the MIT license.

Appendix E Distribution of Singular Values of the Jacobian

Refer to caption
(a) Singular values $\sigma_{i}^{\mathbf{z}}$ with $0\leq\sigma_{i}^{\mathbf{z}}\leq 2$
Refer to caption
(b) All singular values $\sigma_{i}^{\mathbf{z}}$
Figure 9: Histogram of the singular values $\sigma_{i}^{\mathbf{z}}$ of $df_{\mathbf{z}}$ for three random $\mathbf{z}$ and a random matrix. The random matrix is sampled from the Gaussian distribution, then transformed to have the mean and standard deviation of 100 Jacobian matrices. The sharp peak around zero demonstrates that most linear perturbations from $\mathbf{z}$ collapse. This observation supports our manifold hypothesis. To better represent the sparsity of the singular values, we provide the histogram of singular values $\sigma_{i}^{\mathbf{z}}$ with $0\leq\sigma_{i}^{\mathbf{z}}\leq 2$ separately.

Appendix F Grassmannian Metric

Refer to caption
(a) Projection Metric
Refer to caption
(b) Geodesic Metric
Figure 10: Grassmannian metric between two close $\mathbf{w},\mathbf{w}^{\prime}\in\mathcal{W}$ as we vary $\epsilon$. We denote $\epsilon=|\mathbf{z}^{\prime}-\mathbf{z}|$, where $\mathbf{w}^{\prime}=f(\mathbf{z}^{\prime})$ and $\mathbf{w}=f(\mathbf{z})$. As expected, the Grassmannian metric monotonically increases with $\epsilon$. However, even for $\epsilon=0.5$, the evaluated metric is much smaller than To global GANSpace. Therefore, regardless of $\epsilon$, every metric for Close $\mathbf{w}$ supports our claim of the global warpage of the $\mathcal{W}$-space. In the main text, we present only the case of $\epsilon=0.1$. The reported Grassmannian metrics, Fig 10 in the supplementary material and Fig 8 in the main text, are evaluated on the StyleGAN2 model trained on FFHQ.

Appendix G More Latent Traversal Examples

Refer to caption
(a) GANSpace
Refer to caption
(b) SeFa
Refer to caption
(c) Ramesh et al. (2018)
Refer to caption
(d) Local Basis (Ours)
Refer to caption
(e) Iterative Curve-Traversal (Ours)
Figure 11: Enlarged figure for Fig 3. Each row represents a latent traversal on the $\mathcal{W}$-space of StyleGAN2-FFHQ, except for (c); Ramesh et al. (2018) provides local traversal directions on $\mathcal{Z}$. Except for (e), each traversal image is generated by linear traversal. The latent code $\mathbf{w}$ is perturbed up to 12 along the 1st and 2nd directions of the corresponding method. The perturbation intensity is linearly increased from 0 to 12 across the columns. Since Ramesh et al. (2018) is defined on $\mathcal{Z}$, we downscale its perturbation intensity by the singular values from Local Basis for a fair comparison. For the existing methods, the quality of the image is severely degraded as the perturbation grows. On the other hand, Local Basis shows a relatively stable traversal.
Refer to caption
(a) Local basis (Ours)
Refer to caption
(b) Iterative Curve-Traversal (Ours)
Refer to caption
(c) GANSpace
Refer to caption
(d) SeFa
Figure 12: Additional Robustness Test results of each latent traversal method along the first 10 components. For each traversal method, each row corresponds to a latent traversal of perturbation up to 12. Compared to the global methods, GANSpace and SeFa, even Local Basis with linear traversal (Fig 12(a)) shows a more stable traversal of images. Moreover, Local Basis with Iterative Curve-Traversal (Fig 12(b)) rarely shows any collapse under a latent traversal of 12 along the curve.
Refer to caption
(a) StyleGAN2 FFHQ
Refer to caption
(b) StyleGAN2 FFHQ
Refer to caption
(c) StyleGAN2 LSUN Cat
Refer to caption
(d) StyleGAN2 LSUN Car
Figure 13: Additional examples of Semantic Factorization without layer restriction, i.e. Latent traversal along the first 10 components of Local Basis with a moderate perturbation of up to 5. Local Basis finds diverse and natural-looking semantic variations on each dataset.

Appendix H Implementation Details for Sec 3.4

In Sec 3.4, we utilize the global basis from StyleCLIP (Patashnik et al., 2021) defined on $\mathcal{W}^{+}$, the layer-wise extension of $\mathcal{W}$ introduced in (Abdal et al., 2019; Patashnik et al., 2021). To be more specific, since the synthesis network in StyleGAN has 18 layers, we obtain an extended latent code $\mathbf{w}^{+}\in\mathcal{W}^{+}$ defined by the concatenation of latent codes $\mathbf{w}_{i}\in\mathcal{W}$ of dimension 512, one for each $i$-th layer:

$$\mathbf{w}^{+}=(\mathbf{w}_{1},\mathbf{w}_{2},\cdots,\mathbf{w}_{18})\in\mathbb{R}^{512\times 18}.\quad(20)$$

Note that our Iterative Curve-Traversal, originally defined on $\mathcal{W}$, has a canonical extension to $\mathcal{W}^{+}$ without additional changes in structure or methodology.

To implement the stochastic Iterative Curve-Traversal introduced in Sec 3.4, we first find a global basis on $\mathcal{W}^{+}$ using StyleCLIP (Patashnik et al., 2021) that encodes a given semantic attribute in the form of text (e.g. Old). We denote this global basis by $\mathbf{v}_{global}^{+}$, which can be represented as follows:

$$\mathbf{v}_{global}^{+}=(\mathbf{v}_{1}^{global},\mathbf{v}_{2}^{global},\cdots,\mathbf{v}_{18}^{global}).\quad(21)$$

Then we perform the (extended) Iterative Curve-Traversal following $\mathbf{v}_{global}^{+}$, equipped with a stochastic step size. In practice, we consider two options for choosing the traversal direction at each step. First, follow the direction most similar to the previously selected basis (as in Algorithm 2), except for the first iteration; here, the similarity between Local Basis and the global basis is computed only once, when choosing the first traversal direction. Second, follow the direction most similar to the given global basis at every step. This differs slightly from Algorithm 2, but we empirically verify that setting the exploration this way leads to more desirable image changes, as sketched below.
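For the second option, the re-selection rule of Eq 12 is replaced by a comparison against the fixed global direction. A sketch of one stochastic step follows, reusing the hypothetical `local_basis` from Sec 3.1; `v_global` stands for the StyleCLIP direction (restricted to the current latent code's space) and is an assumption of this sketch:

```python
def guided_stochastic_step(f, z, v_global, lo=0.05, hi=0.15):
    """One step of the stochastic guided Iterative Curve-Traversal.

    Follows the Local Basis direction most similar to v_global and
    takes a step of random size sampled uniformly from [lo, hi].
    """
    U_z, sigma, V_w = local_basis(f, z)
    sims = V_w.T @ v_global
    k = sims.abs().argmax().item()
    step = lo + (hi - lo) * torch.rand(()).item()
    return z + torch.sign(sims[k]) * (step / sigma[k]) * U_z[:, k]
```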

Fig 14 shows that the first option still preserves the image quality well, but it does not guarantee the desired direction of image change, namely Old. We speculate that this occurs because most of the information contained in the meaningful global basis disappears after the first step (the single, direct comparison to the global basis), even though our methodology guarantees that the latent code does not escape from the manifold and maintains high image quality. In contrast, Fig 15 shows that the second option for the stochastic Iterative Curve-Traversal can change a given facial image in high-quality and diverse ways.

Refer to caption
Figure 14: Iterative Curve-Traversal guided by the global basis from StyleCLIP only at departure, for the semantics of Old. Contrary to Fig 7, Iterative Curve-Traversal follows the global basis only at departure; after that, the departure direction is chosen by similarity to the previous departure direction. Left: Linear traversal along the global basis. Middle: Iterative Curve-Traversal with fixed step size (step size = 0.02, 0.04, 0.08, 0.16). Right: Stochastic Iterative Curve-Traversal (step size sampled uniformly from $[0.05,0.15]$).
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 15: Additional Examples of Stochastic Iterative Curve Traversal guided by the global basis from StyleCLIP for the semantics of Old. Left: Linear traversal along global basis. Right: Stochastic Iterative Curve-Traversal

Appendix I Subspace Traversal

In Section 4, we proved that the $\mathcal{W}$-space in StyleGAN2 is warped globally. Specifically, the subspace of traversal directions generating the principal variation of the image changes severely as we vary the starting latent variable $\mathbf{w}$. To verify this claim further, we visualize the subspace traversal on the latent space $\mathcal{W}$. The subspace traversal denotes a simultaneous traversal in multiple directions. In this paper, we visualize the two-dimensional traversal

$$\textit{Subspace Traversal}_{(i,j)}^{\mathbf{w}}(x,y)=G\left(\mathbf{w}+\frac{x}{N}\mathbf{v}_{i}^{\mathbf{w}}+\frac{y}{N}\mathbf{v}_{j}^{\mathbf{w}}\right)\quad(22)$$

where $\mathbf{w}=f(\mathbf{z})$ and $G$ denotes the subnetwork of the given GAN model from $\mathcal{W}$ to the image space $\mathcal{X}$. Since disentanglement into a linear subspace implies the commutativity of transformations (Pfau et al., 2020), the subspace traversal can be seen as a more challenging version of the linear traversal experiments.
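In code, Eq 22 is a grid of two-direction perturbations. A sketch, where `G` is the synthesis network from $\mathcal{W}$ to images and `v_i`, `v_j` are two Local Basis directions at `w` (all assumed given):

```python
def subspace_traversal(G, w, v_i, v_j, N, steps=4):
    """2-D subspace traversal of Eq 22: a (2*steps+1) x (2*steps+1)
    grid of images generated around the latent code w."""
    return [[G(w + (x / N) * v_i + (y / N) * v_j)
             for y in range(-steps, steps + 1)]
            for x in range(-steps, steps + 1)]
```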

Fig 16 and Fig 17 show the results of the subspace traversal for the global basis and Local Basis. Starting from the center, the horizontal and vertical traversals correspond to the 1st and 2nd directions of each method. The same perturbation intensity per step is applied in both directions. When restricted to linear traversal (red and green boxes), GANSpace shows relatively stable traversals. However, the traversal image deteriorates at the corners of the subspace traversal. By contrast, Local Basis shows a stable variation over the entire subspace traversal. This result proves that the global basis is not well-aligned with the local geometry of the $\mathcal{W}$-manifold.

Refer to caption
(a) GANSpace
Refer to caption
(b) SeFa
Refer to caption
(c) Local Basis (Ours)
Figure 16: Subspace traversal with two directions on the $\mathcal{W}$-space of StyleGAN2. The horizontal (red box) and vertical (green box) axes correspond to the 1st and 2nd directions of each method.
Refer to caption
(a) GANSpace
Refer to caption
(b) SeFa
Refer to caption
(c) Local Basis (Ours)
Figure 17: Subspace traversal with two directions on the $\mathcal{W}$-space of StyleGAN2. The horizontal (red box) and vertical (green box) axes correspond to the 1st and 2nd directions of each method.

Appendix J Local Basis on Other models

Refer to caption
(a) GANSpace
Refer to caption
(b) Local Basis (Ours)
Refer to caption
(c) GANSpace
Refer to caption
(d) Local Basis (Ours)
Figure 18: Comparison of GANSpace and Local Basis on StyleGAN-FFHQ (Karras et al., 2019). Each traversal image is generated along the first 10 components of each method with a perturbation of up to 5.
Refer to caption
(a) GANSpace
Refer to caption
(b) Local Basis (Ours)
Refer to caption
(c) GANSpace
Refer to caption
(d) Local Basis (Ours)
Figure 19: Comparison of GANSpace and Local Basis on BigGAN-512 (Brock et al., 2018). Each traversal image is generated along the first 10 components of each method with a perturbation of up to 3.