Sketch-guided Cage-based 3D Gaussian Splatting Deformation
Abstract
3D Gaussian Splatting (GS) is one of the most promising novel 3D representations that has received great interest in computer graphics and computer vision. While various systems have introduced editing capabilities for 3D GS, such as those guided by text prompts, fine-grained control over deformation remains an open challenge. In this work, we present a novel sketch-guided 3D GS deformation system that allows users to intuitively modify the geometry of a 3D GS model by drawing a silhouette sketch from a single viewpoint. Our approach introduces a new deformation method that combines cage-based deformations with a variant of Neural Jacobian Fields, enabling precise, fine-grained control. Additionally, it leverages large-scale 2D diffusion priors and ControlNet to ensure the generated deformations are semantically plausible. Through a series of experiments, we demonstrate the effectiveness of our method and showcase its ability to animate static 3D GS models as one of its key applications.
![Teaser figure](https://cdn.awesomepapers.org/papers/50263140-2b95-4e73-ab40-4e8573d76503/teaser.png)
1 Introduction

Editing of 3D models and shapes often arises in computer graphics, computer animation, and geometric modeling. This editing is usually carried out by deforming the 3D models. In this work, we aim to provide such editing-through-deformation capabilities for one of the most promising novel representations of 3D geometry - 3D Gaussian Splatting (GS) [16], which offers real-time novel-view rendering and more photorealistic reconstruction than previous methods.
As with other geometric representations such as NeRFs [24] and triangle meshes, various control mechanisms have been proposed for editing GS, such as text prompts [5, 34, 4, 35] and video priors [28, 20]. Unfortunately, these controls are designed for broad, high-level edits (within the capabilities of a novice user) and do not enable fine-grained control over the deformation. On the other hand, some investigation has been made into more direct, geometric editing, e.g., via physics-based simulation [37], which again offers limited editing capabilities.
The difficulty in providing fine-grained geometric control lies in the GS representation, which is made up of an unstructured array of 3D Gaussians whose aggregation forms the visuals when splatted onto a 2D canvas. This often leads to a global dependence between different Gaussians: change the position of one, and the plausibility of the scene is ruined. Hence, it is difficult to provide the ability to perform local edits while maintaining the integrity of the resulting visuals.
To tackle these issues, in this work we introduce the first sketch-guided 3D GS deformation system, which enables the user to interact intuitively with a simple 2D sketch of the object and thereby induce a 3D deformation of the Gaussians. To achieve this, we propose several technical contributions: 1) geometrically, to ensure the produced deformations are regulated, we propose a novel deformation framework for GS based on cage-based deformations, which are in turn controlled by deformation Jacobians [1]; 2) semantically, we leverage ControlNet [39] and Score Distillation Sampling (SDS) [27] to ensure a semantically meaningful, plausible 3D GS deformation. Together, these two contributions enable the user to deform the shape freely while preserving its integrity; see Figure 5.
Our experiments verify our method’s ability to provide deformation of Gaussian splats.
2 Related Work
2.1 Sketch-based 3D shape Editing
Sketching is a widely used modeling paradigm in geometric modeling and computer graphics. Early methods such as Teddy [13] and FiberMesh [26] construct smooth shapes guided by user-specified 2D sketches. After generating the initial shape, deformations can be performed by drawing reference strokes or by deforming the generated curves.
Some works [25, 17, 42, 19] focus on sketch-based deformation that uses individual drawn strokes as deformation cues. (For a thorough review of 3D shape modeling, we refer to [6].) All these methods combine the user constraints from sketches with a geometric regularizer such as the Laplacian. However, these geometric energies are designed to preserve certain properties, such as smoothness, and cannot take the semantics of the object into account. This can lead to unnatural deformations, e.g., bending a part that should stay straight.
More recently, data-driven methods have been applied to sketch-based 3D shape modeling [9, 23, 21, 3]. Some works [9, 3] train neural networks to generate 3D meshes conditioned on an input sketch. These methods require large-scale datasets to train the networks, and editing can only be performed on generated shapes. For more general sketch-based editing, [23, 21] edit shapes (represented by Neural Radiance Fields) via 2D sketch matching and employ Score Distillation Sampling (SDS) [27] to produce natural-looking edits that satisfy the semantics of the text prompt.
2.2 SDS in 3D shape Editing
Score Distillation Sampling (SDS) was first introduced in [27] as a 3D shape generation method based on a 2D diffusion prior. The core idea of SDS is to make the renderings of the generated 3D shape look natural from any random viewpoint. Since it is an image-based score, it can be applied to 3D editing of any representation, e.g., triangle meshes [36, 38], NeRF [23], and 3D GS [20]. However, the original SDS relied on a 2D image diffusion model with only 2D knowledge, leading to view-inconsistency problems. Recently, by training the diffusion model on 3D data, multi-view diffusion [29, 22, 12] was introduced to generate 3D shapes with better geometric consistency. Replacing the image diffusion model in SDS with a multi-view diffusion model improves cross-view consistency in 3D shape editing [28, 21].
2.3 Editing 3D Gaussian Splatting
Neural Radiance Fields (NeRF) [24] and 3D Gaussian Splatting (3D GS) [16] are 3D representations designed for novel-view synthesis, in which a 3D scene is reconstructed from a set of images. Some work has been done on NeRF editing [23, 10, 30, 41]. Since our work focuses on sketch-based deformation of 3D GS, we discuss editing of 3D GS in more detail. [34] introduces a method that edits a 3D GS scene with text instructions, powered by an LLM and a 2D image diffusion prior, which can achieve object texture editing and environment changes. [5] also uses a 2D image diffusion prior to guide the editing but introduces a hierarchical GS to improve editing quality; it enables object removal and addition by employing inpainting techniques.
Instead of guiding the editing with a 2D image diffusion prior, [4, 35] introduce methods that edit rendered images of the original 3D GS from multiple views with consistency control and fit the changes in the edited images back to the 3D GS directly, which significantly improves editing efficiency and quality.
Some works focus on the deformation of 3D GS. Align-Your-Gaussians (AYG) [20] and DreamGaussian4D [28] introduce methods that animate a static 3D GS object into a 4D GS sequence. AYG [20] employs a video diffusion prior to drive the temporal deformation between frames and a 2D diffusion prior on every frame to keep each deformation valid. Instead of a video diffusion prior, DreamGaussian4D [28] uses a reference video from one specific view to animate the static 3D GS and applies 3D-aware Score Distillation Sampling (SDS) to propagate the deformation to every frame. The reference video is generated by an image-to-video diffusion model from the rendered image of the static 3D GS in that specific view. Compared to AYG, DreamGaussian4D is more efficient but limited by the generation quality and generality of the reference video.
PhysGaussian [37] seamlessly integrates physically grounded Newtonian dynamics within 3D GS to achieve high-quality novel motion synthesis. It adapts the Material Point Method (MPM) to 3D GS, enriching 3D GS with meaningful kinematic deformation and mechanical stress attributes.
SuGaR [8] adds new training energies on top of the original 3D GS [16] that produce 3D GS with better surface alignment and a more even density distribution. These properties make it possible to extract a mesh from the 3D GS using traditional methods such as Poisson reconstruction [15]. Given the extracted mesh, the 3D GS can be bound to it and deformed with mesh deformation algorithms such as ARAP [31]. Similarly to SuGaR, [7] also proposes to bind the 3D GS kernels to a mesh, which is reconstructed by existing methods directly from the input multi-view images. However, the results depend significantly on the extracted mesh, and this can fail in scenes with complex geometry or transparent components.

3 Method
We next detail the various components of our framework, starting with an overview of the representation of 3D Gaussians [16], moving on to our cage-based deformation through jacobians, and concluding with applying this deformation technique using sketches and Score Distillation Sampling [27].

3.1 3D Gaussians
A Gaussian is defined by a full covariance matrix $\Sigma$ and a centroid $\mu$, defining the density of the Gaussian:

$$G(x) = e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)} \qquad (1)$$

Given a diagonal scaling matrix $S$ and a rotation matrix $R$, the corresponding $\Sigma$ is constructed as:

$$\Sigma = R\,S\,S^{T}R^{T} \qquad (2)$$

with $S$ and $R$ being the variables that are optimized during the reconstruction of the scene using Gaussians. We consider a collection of $n$ such Gaussians, $\{(\mu_i, \Sigma_i)\}_{i=1}^{n}$.
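For concreteness, the following is a minimal NumPy sketch of Equations 1 and 2. The function names and the $(w, x, y, z)$ quaternion convention are illustrative assumptions, not the reference implementation of [16].

```python
import numpy as np

def covariance_from_scaling_rotation(scales, quat):
    """Build the full covariance Sigma = R S S^T R^T (Eq. 2) from a diagonal
    scaling (3-vector) and a unit quaternion in (w, x, y, z) order."""
    w, x, y, z = quat / np.linalg.norm(quat)
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    S = np.diag(scales)
    return R @ S @ S.T @ R.T

def gaussian_density(x, mu, Sigma):
    """Unnormalized density exp(-0.5 (x - mu)^T Sigma^{-1} (x - mu)) (Eq. 1)."""
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
```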
3.2 A Regularized Framework for Deforming 3D Gaussians
Deforming 3D Gaussians entails assigning a new centroid position and a new covariance matrix to each Gaussian. However, we wish to regulate the space of possible deformations, to avoid Gaussians floating apart and to expose only meaningful deformations of the object the Gaussians represent. Towards this goal, we design a novel, tailor-made deformation scheme incorporating two components: 1) a cage-based deformation [2], tailored to Gaussian splats; 2) a method to control this cage deformation, inspired by Neural Jacobian Fields [1]. We detail these two components next.
3.2.1 Cage-Based Deformation of Gaussians
A cage-based deformation uses a triangular mesh $C$ with vertices $V$ and triangles $T$. The mesh is deformed into another state, $C' = (V', T)$, by moving its vertices. By that, the mesh defines a deformation $\Phi: \mathbb{R}^3 \to \mathbb{R}^3$, mapping every point in $\mathbb{R}^3$ as a function of the positions of the vertices of $C'$. This exposes a more meaningful, low-dimensional deformation space controlled by the cage's vertices. There are many different approaches to define this function; we choose to use [2] - see the supplementary material for the full details.
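As a rough illustration of the interface such a map exposes, the sketch below evaluates a generic coordinate-based cage map $\Phi(p) = \sum_j w_j(p)\, v'_j$, where the weights are cage coordinates precomputed at the rest pose. This generic form is a simplification used only for illustration; it is not the variational harmonic map of [2], whose details are in the supplementary material.

```python
import torch

def cage_deform(point_weights, cage_verts_deformed):
    """Evaluate Phi at a set of points given their precomputed cage coordinates.
    point_weights: (P, V) weights w_j(p) computed on the rest cage.
    cage_verts_deformed: (V, 3) deformed cage vertex positions V'.
    Returns the (P, 3) deformed point positions."""
    return point_weights @ cage_verts_deformed
```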
We next define the deformation of the Gaussians w.r.t. the cage's deformation: the deformed position of each centroid $\mu_i$ is defined by mapping it through the cage deformation:

$$\mu_i' = \Phi(\mu_i) \qquad (3)$$
Similarly, to modify the covariance matrix $\Sigma_i$, as is standard, we use a local linear approximation of the deformation via the Jacobian matrix $J_\Phi$ evaluated at the centroid $\mu_i$. The deformed covariance matrix can then be expressed as:

$$\Sigma_i' = J_\Phi(\mu_i)\,\Sigma_i\,J_\Phi(\mu_i)^{T} \qquad (4)$$
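A small PyTorch sketch of Equations 3 and 4, assuming `phi` is any differentiable implementation of the cage map acting on a single 3D point; the per-centroid Jacobian is taken with autograd here, standing in for the analytic Jacobian of the cage coordinates.

```python
import torch
from torch.func import jacrev, vmap

def deform_gaussians(mu, Sigma, phi):
    """mu: (N, 3) centroids, Sigma: (N, 3, 3) covariances,
    phi: differentiable map R^3 -> R^3 (the cage deformation, one point at a time)."""
    mu_def = vmap(phi)(mu)                        # Eq. 3: mu'_i = Phi(mu_i)
    J = vmap(jacrev(phi))(mu)                     # (N, 3, 3) Jacobian of Phi at each centroid
    Sigma_def = J @ Sigma @ J.transpose(1, 2)     # Eq. 4: Sigma'_i = J Sigma_i J^T
    return mu_def, Sigma_def

# Toy usage with an affine map standing in for the cage deformation.
A = torch.tensor([[1.0, 0.3, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.2]])
mu_d, Sigma_d = deform_gaussians(torch.randn(100, 3),
                                 torch.eye(3).repeat(100, 1, 1),
                                 lambda p: A @ p)
```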
3.2.2 Controlling Cages via Neural Jacobian Fields

While the cage-based deformation already regularizes the deformation of the Gaussians, we found that optimizing directly on the cage vertices tends to produce an entangled, unsmooth cage, which can lead to artifacts in the rendering of the GS, as in Figure 8. We thus control the cage vertices' positions using Neural Jacobian Fields (NJF) [1]. In short, NJF positions a mesh's vertices from given per-triangle matrices $M_t$ by minimizing the squared error between those matrices and the mesh's per-face Jacobians $J_t = \nabla_t(V')$:

$$V' = \operatorname*{arg\,min}_{V'} \sum_{t \in T} \left\| \nabla_t(V') - M_t \right\|_{2}^{2} \qquad (5)$$

where $\nabla_t$ is the gradient operator of triangle $t$. The solution to this least-squares problem is obtained via Poisson's equation, amounting to solving a single sparse linear system, which is easily implementable in a differentiable pipeline.
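A minimal sketch of this Poisson solve, assuming the libigl Python bindings (`igl.grad`, `igl.doublearea`) and SciPy; one vertex is pinned to remove the global-translation ambiguity. The paper performs the same solve inside a differentiable pipeline, which this NumPy version does not attempt.

```python
import numpy as np
import igl
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_vertices_from_jacobians(V, F, M_target, anchor=0):
    """Recover deformed vertices V' whose per-face Jacobians best match the
    targets M_target (|F|, 3, 3) in the area-weighted least-squares sense (Eq. 5)."""
    n = V.shape[0]
    G = igl.grad(V, F)                        # (3|F|, n) gradient operator
    area = 0.5 * igl.doublearea(V, F)         # per-face areas
    W = sp.diags(np.tile(area, 3))            # weights for the coordinate-stacked layout
    L = (G.T @ W @ G).tocsr()                 # Poisson (cotangent) Laplacian
    # libigl stacks gradients by coordinate (all x-rows, then y, then z),
    # so the targets are stacked the same way: block k holds M_t[:, :, k].
    D = np.vstack([M_target[:, :, 0], M_target[:, :, 1], M_target[:, :, 2]])
    b = G.T @ W @ D                           # (n, 3) right-hand side
    free = np.flatnonzero(np.arange(n) != anchor)
    b_free = b[free] - L[free][:, [anchor]] @ V[[anchor]]
    V_def = np.zeros_like(V)
    V_def[anchor] = V[anchor]                 # pin one vertex at its rest position
    V_def[free] = spla.splu(L[free][:, free].tocsc()).solve(b_free)
    return V_def
```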
We represent the Jacobians in a manner better suited for optimization: suppose the initial per-face Jacobian is $J_t$ for face $t$, and the deformed per-face Jacobian is $J_t'$. There is a linear transformation $A_t$ such that $J_t' = A_t J_t$. By polar decomposition, this transformation matrix can be decomposed as

$$A_t = R_t\,S_t \qquad (6)$$

where $R_t$ is an orthogonal matrix and $S_t$ is the stretching component (a symmetric positive semi-definite matrix) of the transformation.
As shown in Figure 4, we express the deformation of the cage as a per-triangle rotation and stretching in Jacobian space. After the cage is deformed, the deformation of the GS is computed by Equations 3 and 4.
We represent the rotational component using the smooth 6-DoF representation of [40], and the stretching component as a symmetric 3-by-3 matrix with 6 degrees of freedom. Since the decomposed Jacobian field gives the optimizer more degrees of freedom, we found that it produces lower energy values than optimizing the Jacobian field directly.
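Below is a small PyTorch sketch of this parameterization: the 6-DoF vector is mapped to a rotation via Gram-Schmidt as in [40], and six free entries populate the symmetric stretch. The names are illustrative; positive-definiteness of the stretch is not enforced here (in practice it would be initialized at the identity).

```python
import torch
import torch.nn.functional as F

def rotation_from_6d(r6):
    """Map a 6-DoF vector to a rotation matrix via Gram-Schmidt [40]."""
    a1, a2 = r6[..., :3], r6[..., 3:]
    b1 = F.normalize(a1, dim=-1)
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)
    return torch.stack([b1, b2, b3], dim=-1)      # b1, b2, b3 as columns

def stretch_from_6(s6):
    """Build a symmetric 3x3 stretch matrix from its 6 free entries."""
    xx, yy, zz, xy, xz, yz = s6.unbind(-1)
    return torch.stack([torch.stack([xx, xy, xz], -1),
                        torch.stack([xy, yy, yz], -1),
                        torch.stack([xz, yz, zz], -1)], dim=-2)

def target_jacobians(r6, s6, J_rest):
    """Per-face NJF targets M_t = R_t S_t J_t, i.e. Eq. 6 applied to the rest Jacobians."""
    return rotation_from_6d(r6) @ stretch_from_6(s6) @ J_rest
```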
3.3 3D-aware Score Distillation Sampling
We use Score Distillation Sampling (SDS) [27] to guide our deformation to be plausible. In short, SDS renders the 3D model from different viewpoints, leverages a pretrained 2D image diffusion model, and backpropagates the diffusion score through the differentiable renderer to the degrees of freedom of the 3D model. We further make use of a 3D-aware image diffusion model [22] to achieve more accurate 3D consistency of the generated images. In short, this model can be conditioned on a specific viewing direction and produces more consistent images of the same object from different viewpoints. See the supplementary material for the full details.
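To make the mechanism concrete, here is a model-agnostic sketch of one SDS step: `eps_pred` stands for any pretrained noise predictor (a text-conditioned model for plain SDS, or a viewpoint-conditioned one such as [22] for the 3D-aware variant), and the noise schedule and weighting are placeholder choices, not the ones used in the paper.

```python
import torch

def sds_loss(render, eps_pred, cond, t_range=(0.02, 0.98)):
    """One Score Distillation Sampling step [27] on a differentiable rendering.
    eps_pred(x_t, t, cond) is an assumed noise-prediction interface; `cond` can
    carry the viewpoint for a 3D-aware diffusion model."""
    t = torch.empty(1).uniform_(*t_range)
    alpha_bar = torch.cos(t * torch.pi / 2) ** 2             # placeholder noise schedule
    noise = torch.randn_like(render)
    x_t = alpha_bar.sqrt() * render + (1 - alpha_bar).sqrt() * noise
    with torch.no_grad():
        eps = eps_pred(x_t, t, cond)
    grad = eps - noise                                        # SDS gradient w.r.t. the rendering
    # Surrogate loss whose gradient w.r.t. `render` equals `grad`.
    return (grad.detach() * render).sum()
```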
3.4 Sketch-guided 3D GS Deformer
The overview of the editing pipeline is shown in Figure 2. The user first selects a viewpoint $v$ (blue camera), which is used to extract a silhouette of the object: we render the GS from viewpoint $v$ to obtain an image $I$ (Figure 2 B) and extract its silhouette $S$. The user then deforms the silhouette into $S'$ (Figure 2 C). To obtain the deformed reference image $I'$ (Figure 2 D) guided by the user's sketch, the rendering $I$ is fed into an image-to-image diffusion model conditioned on $S'$ via ControlNet [39]. Our loss measures the silhouette difference between $I'$ and $\hat{I}$ (Figure 2 E), the rendering of the deformed 3D GS from viewpoint $v$:

$$\mathcal{L}_{\mathrm{silhouette}} = \left\| \mathcal{S}(I') - \mathcal{S}(\hat{I}) \right\|^{2} \qquad (7)$$

where $\mathcal{S}(\cdot)$ extracts the silhouette of an image.
We chose to penalize only the silhouette and not the full RGB deformed image, as our experiments showed that the texture of the object could otherwise change drastically.
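As a deliberately simplified illustration of Equation 7, the sketch below compares the rendered opacity of the deformed GS with a thresholded foreground mask of the ControlNet reference image; the silhouette extractor $\mathcal{S}$ used in practice can be any mask extraction, and the intensity threshold here is an assumption made only to keep the example self-contained.

```python
import torch

def silhouette_loss(rendered_alpha, reference_image, thresh=0.5):
    """Eq. 7 (sketch): rendered_alpha is the (H, W) opacity of the deformed GS
    rendered from the sketch view; reference_image is the (3, H, W) ControlNet
    output. A crude intensity threshold stands in for a proper foreground mask."""
    reference_mask = (reference_image.mean(dim=0) > thresh).float()
    return ((rendered_alpha - reference_mask) ** 2).mean()
```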
To keep the deformation natural in all views, we apply 3D-aware SDS on randomly sampled views (red camera) in every iteration. Thus, the final gradient used during optimization is:

$$\nabla \mathcal{L} = \nabla \mathcal{L}_{\mathrm{silhouette}} + \lambda\, \nabla \mathcal{L}_{\mathrm{SDS}} \qquad (8)$$

The weight $\lambda$ is set to the same value for all examples. When optimizing the 3D GS with the objective of Equation 8, the 3D GS is deformed through the differentiable cage-based block described in Section 3.2.
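Putting the two terms together, one optimization step might look like the sketch below, reusing the `silhouette_loss` and `sds_loss` sketches above; the renderer callables, the view sampler, and the optimizer are placeholders for the differentiable GS rasterizer and the cage parameters of Section 3.2.

```python
def optimization_step(render_silhouette, render_rgb, reference_image,
                      sample_view, eps_pred, optimizer, lam):
    """One step of Eq. 8: silhouette loss on the reference view plus a 3D-aware
    SDS term on a randomly sampled view. Gradients flow back through the
    differentiable cage-based deformation block into the per-face rotations
    and stretches."""
    optimizer.zero_grad()
    loss = silhouette_loss(render_silhouette(), reference_image)
    view = sample_view()                                   # red camera in Figure 2
    loss = loss + lam * sds_loss(render_rgb(view), eps_pred, cond=view)
    loss.backward()
    optimizer.step()
```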
3.5 Implementation details
The cage is generated automatically by first extracting a mesh from the GS model using the coarse stage of SuGaR [8]. This mesh is very large, is sometimes not a closed manifold, and contains many fold-overs, so it is not suited to serve as a cage. Therefore, we compute a triangulated offset surface using a function from libigl [14] that applies marching cubes to a grid of signed distance values computed from the input triangle mesh. The resulting mesh is a closed, manifold triangle mesh with an adjustable number of vertices, depending on the model size and the desired level of detail. We used Stable Diffusion XL as the diffusion model for ControlNet. For each deformation, we optimized for a fixed number of iterations with the Adam optimizer [18]. Apart from the reference view, random views were sampled for the 3D-aware SDS. The diffusion model used in the 3D-aware SDS is Zero-1-to-3 XL [22]. All experiments were run on a single Nvidia RTX A6000 GPU. The running time depends on the number of Gaussian splats in the scene; for scenes with fewer than 100k Gaussian splats, it is under 10 minutes. More efficient diffusion models are a growing area of research, and our method would be directly accelerated by the many methods being developed [32, 11].
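The cage construction can be sketched as below, assuming the libigl Python bindings expose `igl.signed_distance` and using scikit-image's marching cubes; the paper calls a libigl offset-surface routine directly, and the grid resolution and bounding-box padding here are illustrative.

```python
import numpy as np
import igl
from skimage import measure

def offset_cage(V, F, offset, res=64):
    """Build a coarse cage as an offset surface of the SuGaR-extracted mesh:
    sample signed distances (libigl convention: negative inside) on a grid and
    run marching cubes at the `offset` iso-level."""
    lo = V.min(axis=0) - 2.0 * offset
    hi = V.max(axis=0) + 2.0 * offset
    axes = [np.linspace(lo[d], hi[d], res) for d in range(3)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    P = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
    sdf, _, _ = igl.signed_distance(P, V, F)
    verts, faces, _, _ = measure.marching_cubes(sdf.reshape(res, res, res), level=offset)
    verts = lo + verts / (res - 1) * (hi - lo)   # voxel indices -> world coordinates
    return verts, faces
```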
4 Experiments
4.1 Results



We tested our method on various 3D objects, from human-made objects to animals and humans, as shown in Figures 1, 5, and 9. The results show that our method can deform the objects precisely under the guidance of the sketch and produce natural-looking results. These 3D shapes were collected from Sketchfab and TurboSquid and converted into 3D GS scenes: each shape was originally a mesh and was converted into 3D GS by training on 100 images sampled from random views of the shape.
Moreover, we also explored the ability of our method to deform large-scale, real-world captured data. We used UAV-captured datasets of a large oratory and a clock tower to reconstruct the corresponding 3D GS scenes, as shown in Figure 6.
As animation is one of the important applications of 3D deformation, we also conducted experiments animating static 3D GS with our method. As shown in Figure 1 (C), Figure 7, and the accompanying video, the user can provide multiple input sketches as key frames of an animation sequence. Then, by running our method for each key frame and interpolating between the deformed cages of the key frames, we obtain an animation of the static 3D GS.
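A simple way to realize this interpolation, assuming linear blending of the cage vertex positions between consecutive key frames (any smoother blend would work equally well), is sketched below.

```python
import numpy as np

def interpolate_cages(keyframe_cages, frames_per_segment):
    """Linearly interpolate deformed cage vertices between consecutive key frames;
    each in-between cage then drives the GS through Eqs. 3 and 4."""
    frames = []
    for a, b in zip(keyframe_cages[:-1], keyframe_cages[1:]):
        for t in np.linspace(0.0, 1.0, frames_per_segment, endpoint=False):
            frames.append((1.0 - t) * a + t * b)
    frames.append(keyframe_cages[-1])
    return frames
```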
| | MLP+HexPlane | Ours |
| --- | --- | --- |
| Relative CLIP-IQA | 75.32% | 99.03% |

We compared our cage-based deformation for 3D GS with the state-of-the-art MLP+HexPlane deformation [28]. For the comparison, we deform the source 3D GS using our pipeline and render the sketch view of the deformed 3D GS as the reference image for MLP+HexPlane. Thus, apart from the difference in the deformation method, MLP+HexPlane also receives extra RGB guidance in the sketch view, whereas our pipeline is guided only by the silhouette. Even so, our pipeline deforms the 3D GS with much higher fidelity than MLP+HexPlane. The qualitative comparison is shown in Figure 5: because of the good spatial continuity of the cage deformation, local detail features are fully preserved, e.g., the pattern on the statue and the teeth of the dinosaur. Although HexPlane acts as a geometric regularizer, detailed features can still be destroyed, causing fuzzy renderings when using MLP+HexPlane. We also quantitatively compare the visual quality of the deformed shapes. We use CLIP-IQA [33], a metric measuring image quality based on the CLIP score, to evaluate 8 deformed results. For every deformed 3D GS, 8 views were rendered as the images to be evaluated. Since our task is to measure how the image quality changes after the deformation process, we report the relative CLIP-IQA score, calculated as the CLIP-IQA score of the deformed renderings divided by that of the undeformed renderings. As shown in Table 1, the relative CLIP-IQA score for MLP+HexPlane drops to 75.32%, whereas our method almost keeps the same CLIP-IQA score as the original renderings, with a decrease of less than 1%.
4.2 Ablation
We tested the effects of two important components of our method: the decomposed NJF and the 3D-aware SDS. We first explored the difference between optimizing directly on the cage vertices, via NJF, and via our proposed decomposed NJF. As shown in Figure 8, optimizing directly on the cage vertices can lead to undesired fuzzy rendering, caused by entanglement of the cage during the optimization. As shown in Figure 8 (B), the detailed feathers of the statue and the door on the side of the clock tower were destroyed. Applying NJF solves this problem, since it acts as a geometric regularizer for the cage. However, this regularization can be too strong to fit the sketched silhouette precisely: as shown in Figure 8 (C), the tip of the statue's wing is not fitted and the clock tower is not bent enough. When using the decomposed NJF described in Section 3.2.2, we obtain a more efficient optimization that reaches a lower final loss in these two examples, with the same rendering quality as NJF (Figure 8 A).
We also explored the effect of the 3D-aware SDS in our pipeline. As shown in Figure 9, the 3D-aware SDS plays a critical role in preventing undesirable deformations in views other than the sketch view. For instance, when the pistol was deformed using only the silhouette loss, the barrel was bent; this is fixed by adding the 3D-aware SDS.
5 Conclusion
In this work, we propose a novel sketch-guided 3D Gaussian Splatting deformation framework, which enables intuitive, fine-grained control over the geometry of a 3D GS model via a single-view silhouette sketch. Geometrically, we designed a novel cage-based deformation tailored to Gaussian splats and optimized the cage using a modified Neural Jacobian Fields formulation. As the rendering of Gaussian splats overlays intersecting splats on top of each other, deforming these splats can lead to visual artifacts due to misalignment; our method keeps the splats accurately aligned in the deformed pose, yielding crisp renderings that adhere closely to the input sketch. To ensure semantically meaningful deformation from any viewpoint, our method leverages ControlNet as well as Score Distillation Sampling.
6 Limitation and Future work
First, our method relies on ControlNet and an image-to-image diffusion model to translate the user-drawn silhouette sketch into a usable constraint for the deformation system. However, the ability of ControlNet to generate a reference image of the deformed object is sometimes limited. Second, although the cage for 3D GS is generated automatically, the process of training the SuGaR model to extract a mesh from the 3D GS still takes several minutes, which limits the system’s efficiency.
In future work, we will focus on improving the reliability of the sketch-guided deformation, particularly by enhancing the quality and consistency of the ControlNet-generated reference images. Additionally, making the cage generation process more efficient could enhance the practicality of our method. Exploring these avenues would contribute to both the robustness and efficiency of the system.
References
- Aigerman et al. [2022] Noam Aigerman, Kunal Gupta, Vladimir G Kim, Siddhartha Chaudhuri, Jun Saito, and Thibault Groueix. Neural jacobian fields: Learning intrinsic mappings of arbitrary meshes. arXiv preprint arXiv:2205.02904, 2022.
- Ben-Chen et al. [2009] Mirela Ben-Chen, Ofir Weber, and Craig Gotsman. Variational harmonic maps for space deformation. ACM Transactions on Graphics (TOG), 28(3):1–11, 2009.
- Binninger et al. [2024] Alexandre Binninger, Amir Hertz, Olga Sorkine-Hornung, Daniel Cohen-Or, and Raja Giryes. Sens: Part-aware sketch-based implicit neural shape modeling. In Computer Graphics Forum, page e15015. Wiley Online Library, 2024.
- Chen et al. [2024a] Minghao Chen, Iro Laina, and Andrea Vedaldi. Dge: Direct gaussian 3d editing by consistent multi-view editing. ECCV, 2024a.
- Chen et al. [2024b] Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21476–21485, 2024b.
- Ding and Liu [2016] Chao Ding and Ligang Liu. A survey of sketch based modeling systems. Frontiers of Computer Science, 10:985–999, 2016.
- Gao et al. [2024] Lin Gao, Jie Yang, Bo-Tao Zhang, Jia-Mu Sun, Yu-Jie Yuan, Hongbo Fu, and Yu-Kun Lai. Mesh-based gaussian splatting for real-time large-scale deformation. arXiv preprint arXiv:2402.04796, 2024.
- Guédon and Lepetit [2024] Antoine Guédon and Vincent Lepetit. Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5354–5363, 2024.
- Guillard et al. [2021] Benoit Guillard, Edoardo Remelli, Pierre Yvernay, and Pascal Fua. Sketch2mesh: Reconstructing and editing 3d shapes from sketches. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13023–13032, 2021.
- Haque et al. [2023] Ayaan Haque, Matthew Tancik, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa. Instruct-nerf2nerf: Editing 3d scenes with instructions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19740–19750, 2023.
- Hong et al. [2024] Chi Hong, Jiyue Huang, Robert Birke, Dick Epema, Stefanie Roos, and Lydia Y Chen. Sfddm: Single-fold distillation for diffusion models. arXiv preprint arXiv:2405.14961, 2024.
- Huang et al. [2024] Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, et al. Epidiff: Enhancing multi-view synthesis via localized epipolar-constrained diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9784–9794, 2024.
- Igarashi et al. [2006] Takeo Igarashi, Satoshi Matsuoka, and Hidehiko Tanaka. Teddy: a sketching interface for 3d freeform design. In ACM SIGGRAPH 2006 Courses, pages 11–es. 2006.
- Jacobson et al. [2013] Alec Jacobson, Daniele Panozzo, C Schüller, Olga Diamanti, Qingnan Zhou, N Pietroni, et al. libigl: A simple c++ geometry processing library. Google Scholar, 2013.
- Kazhdan et al. [2006] Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. Poisson surface reconstruction. In Proceedings of the fourth Eurographics symposium on Geometry processing, 2006.
- Kerbl et al. [2023] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph., 42(4):139–1, 2023.
- Kho and Garland [2005] Youngihn Kho and Michael Garland. Sketching mesh deformations. In Proceedings of the 2005 symposium on Interactive 3D graphics and games, pages 147–154, 2005.
- Kingma [2014] Diederik P Kingma. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Kraevoy et al. [2009] Vladislav Kraevoy, Alla Sheffer, and Michiel van de Panne. Modeling from contour drawings. In Proceedings of the 6th Eurographics Symposium on Sketch-Based Interfaces and Modeling, page 37–44, New York, NY, USA, 2009. Association for Computing Machinery.
- Ling et al. [2024] Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, and Karsten Kreis. Align your gaussians: Text-to-4d with dynamic 3d gaussians and composed diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8576–8588, 2024.
- Liu et al. [2024] Feng-Lin Liu, Hongbo Fu, Yu-Kun Lai, and Lin Gao. Sketchdream: Sketch-based text-to-3d generation and editing. ACM Transactions on Graphics (TOG), 43(4):1–13, 2024.
- Liu et al. [2023] Ruoshi Liu, Rundi Wu, Basile Van Hoorick, Pavel Tokmakov, Sergey Zakharov, and Carl Vondrick. Zero-1-to-3: Zero-shot one image to 3d object. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9298–9309, 2023.
- Mikaeili et al. [2023] Aryan Mikaeili, Or Perel, Mehdi Safaee, Daniel Cohen-Or, and Ali Mahdavi-Amiri. Sked: Sketch-guided text-based 3d editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14607–14619, 2023.
- Mildenhall et al. [2021] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
- Nealen et al. [2005] Andrew Nealen, Olga Sorkine, Marc Alexa, and Daniel Cohen-Or. A sketch-based interface for detail-preserving mesh editing. In ACM SIGGRAPH 2005 Papers, pages 1142–1147. 2005.
- Nealen et al. [2007] Andrew Nealen, Takeo Igarashi, Olga Sorkine, and Marc Alexa. Fibermesh: designing freeform surfaces with 3d curves. In ACM SIGGRAPH 2007 papers, pages 41–es. 2007.
- Poole et al. [2022] Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
- Ren et al. [2023] Jiawei Ren, Liang Pan, Jiaxiang Tang, Chi Zhang, Ang Cao, Gang Zeng, and Ziwei Liu. Dreamgaussian4d: Generative 4d gaussian splatting. arXiv preprint arXiv:2312.17142, 2023.
- Shi et al. [2023] Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, and Xiao Yang. Mvdream: Multi-view diffusion for 3d generation. arXiv preprint arXiv:2308.16512, 2023.
- Song et al. [2023] Hyeonseop Song, Seokhun Choi, Hoseok Do, Chul Lee, and Taehyeong Kim. Blending-nerf: Text-driven localized editing in neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14383–14393, 2023.
- Sorkine and Alexa [2007] Olga Sorkine and Marc Alexa. As-rigid-as-possible surface modeling. In Symposium on Geometry processing, pages 109–116. Citeseer, 2007.
- Starodubcev et al. [2024] Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov, and Artem Babenko. Your student is better than expected: Adaptive teacher-student collaboration for text-conditional diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9275–9285, 2024.
- Wang et al. [2023] Jianyi Wang, Kelvin CK Chan, and Chen Change Loy. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2555–2563, 2023.
- Wang et al. [2024] Junjie Wang, Jiemin Fang, Xiaopeng Zhang, Lingxi Xie, and Qi Tian. Gaussianeditor: Editing 3d gaussians delicately with text instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20902–20911, 2024.
- Wu et al. [2024] Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, and Victor Adrian Prisacariu. Gaussctrl: multi-view consistent text-driven 3d gaussian splatting editing. arXiv preprint arXiv:2403.08733, 2024.
- Xie et al. [2023] Tianhao Xie, Eugene Belilovsky, Sudhir Mudur, and Tiberiu Popa. Dragd3d: Vertex-based editing for realistic mesh deformations using 2d diffusion priors. arXiv preprint arXiv:2310.04561, 2023.
- Xie et al. [2024] Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, and Chenfanfu Jiang. Physgaussian: Physics-integrated 3d gaussians for generative dynamics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4389–4398, 2024.
- Yoo et al. [2024] Seungwoo Yoo, Kunho Kim, Vladimir G Kim, and Minhyuk Sung. As-plausible-as-possible: Plausibility-aware mesh deformation using 2d diffusion priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4315–4324, 2024.
- Zhang et al. [2023] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3836–3847, 2023.
- Zhou et al. [2019] Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. On the continuity of rotation representations in neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5745–5753, 2019.
- Zhuang et al. [2023] Jingyu Zhuang, Chen Wang, Liang Lin, Lingjie Liu, and Guanbin Li. Dreameditor: Text-driven 3d scene editing with neural fields. In SIGGRAPH Asia 2023 Conference Papers, pages 1–10, 2023.
- Zimmermann et al. [2008] Johannes Zimmermann, Andrew Nealen, and Marc Alexa. Sketching contours. Computers & Graphics, 32(5):486–499, 2008.