
D2CSG: Unsupervised Learning of Compact CSG Trees with Dual Complements and Dropouts

Fenggen Yu   Qimin Chen   Maham Tanveer   Ali Mahdavi Amiri   Hao Zhang
Simon Fraser University
Abstract

We present D2CSG, a neural model composed of two dual and complementary network branches, with dropouts, for unsupervised learning of compact constructive solid geometry (CSG) representations of 3D CAD shapes. Our network is trained to reconstruct a 3D shape by a fixed-order assembly of quadric primitives, with both branches producing a union of primitive intersections or inverses. A key difference between D2CSG and all prior neural CSG models is its dedicated residual branch to assemble the potentially complex shape complement, which is subtracted from an overall shape modeled by the cover branch. With the shape complements, our network is provably general, while the weight dropout further improves compactness of the CSG tree by removing redundant primitives. We demonstrate both quantitatively and qualitatively that D2CSG produces compact CSG reconstructions with superior quality and more natural primitives than all existing alternatives, especially over complex and high-genus CAD shapes.

1 Introduction

CAD shapes have played a central role in the advancement of geometric deep learning, with most neural models to date trained on datasets such as ModelNet [48], ShapeNet [2], and PartNet [29] for classification, reconstruction, and generation tasks. These shape collections all possess well-defined category or class labels, and more often than not, the effectiveness of the data-driven methods is tied to how well the class-specific shape features can be learned. Recently, the emergence of datasets of CAD parts and assemblies such as ABC [19] and Fusion360 [45] has fueled the need for learning shape representations that are agnostic to class labels, without any reliance on class priors. Case in point, the ABC dataset does not provide any category labels, while another challenge to the ensuing representation learning problem is the rich topological varieties exhibited by the CAD shapes.

Constructive solid geometry (CSG) is a classical CAD representation; it models a 3D shape as a recursive assembly of solid primitives, e.g., cuboids, cylinders, etc., through Boolean operations including union, intersection, and difference. Of particular note is the indispensable role the difference operation plays when modeling holes and high shape genera, which are common in CAD. Recently, there has been increased interest in 3D representation learning using CSG [9, 46, 37, 36, 17, 54, 6, 3], striving for generality, compactness, and reconstruction quality of the learned models.

Figure 1: Comparing CSG trees and shape reconstructions by our network, D2CSG, and CAPRI-Net, the current state of the art. To reproduce the GT shape, a natural CSG assembly necessitates a difference operation involving a complex residual shape, which D2CSG can predict with compactness (only three intermediate, general shapes) and quality. CAPRI-Net can only build it using a union of convexes, requiring unnecessarily many primitives and leading to poor reconstruction.

In terms of primitive counts, a direct indication of compactness of the CSG trees, and reconstruction quality, CAPRI-Net [54] represents the state of the art. However, it is not a general neural model, e.g., it is unable to represent CAD shapes whose assembly necessitates nested difference operations (i.e., needing to subtract a part that requires primitive differences to build itself, e.g., see the CAD model in Fig. 2). Since both operands of the (single) difference operation in CAPRI-Net can only model intersections and unions, their network cannot produce a natural and compact CSG assembly for relatively complex CAD shapes with intricate concavities and topological details; see Fig. 1.

In this paper, we present D2CSG, a novel neural network composed of two dual and complementary branches for unsupervised learning of compact CSG tree representations of 3D CAD shapes. As shown in Fig. 2, our network follows a fixed-order CSG assembly, like most previous unsupervised CSG representation learning models [54, 6, 3]. However, one key difference from all of them is the residual branch that is dedicated to the assembly of the potentially complex complement or residual shape. In turn, the shape complement is subtracted from an overall shape that is learned by the cover branch. Architecturally, the two branches are identical, both constructing a union of intersections of quadric primitives and primitive inverses, but they are modeled by independent network parameters and operate on different primitive sets. With both operands of the final difference operation capable of learning general CAD shape assemblies, our network excels at compactly representing complex and high-genus CAD shapes with higher reconstruction quality, compared to the current state of the art, as shown in Fig. 1. To improve compactness further, we implement a dropout strategy over network weights to remove redundant primitives, based on an importance metric.

Given the challenge of unsupervised learning of CSG trees amid significant structural diversity among CAD shapes, our network is not designed to learn a unified model over a shape collection. Rather, it overfits to a given 3D CAD shape by optimizing a compact CSG assembly of quadric surface primitives to approximate the shape. The learning problem is still challenging since the number, selection, and assembly of the primitives are unknown, inducing a complex search space.

In contrast to CAPRI-Net, our method is provably general, meaning that any CSG tree can be converted into an equivalent D2CSG representation. Our dual branch network is fully differentiable and can be trained end-to-end with only the conventional occupancy loss for neural implicit models [4, 3, 54]. We demonstrate both quantitatively and qualitatively that our network, when trained on ABC [19] or ShapeNet [2], produces CSG reconstructions with superior quality, more natural trees, and better quality-compactness trade-off than all existing alternatives [3, 6, 54].

Figure 2: For a given 3D shape $S$ (shown at the right end), our network D2CSG is trained to optimize both its network parameters and a feature code to reconstruct $S$ by optimizing an occupancy loss. The network parameters define a CSG assembly over a set of quadric primitives and primitive inverses. The assembly is built using two identical branches: a cover branch (top), producing shape $S_C$, and a residual branch (bottom), producing shape $S_R$. After applying intersections and a union to obtain the cover shape $S_C$ and residual shape $S_R$, by optimizing their respective occupancy losses, the recovered shape is obtained via a difference operation. A dropout is further applied to the parameters in the intersection and union layers to remove redundant primitives and intermediate shapes.

2 Related Work

In general, a 3D shape can be represented as a set of primitives or parts assembled together. Primitive fitting to point clouds has been extensively studied [25, 23, 21]. For shape abstraction, cuboids [40, 51] and superquadrics [34] have been employed, while 3D Gaussians were adopted for template fitting [12]. From a single image, cuboids have also been used to estimate object parts and their relations using a convolutional-recursive auto-encoder [30]. More complex sub-shapes have been learned for shape assembly such as elementary 3D structures [8], implicit convexes [7], and neural star components [18], as well as parts in the form of learnable parametric patches [37], moving or deformable primitives [26, 55, 52, 33], point clouds [24], or a part-aware latent space [10]. However, none of these techniques directly addresses reconstructing a CSG tree for a given 3D shape.

Deep CAD Models.

Synthesizing and editing CAD models is challenging due to their sharp features and non-trivial topologies. Learning-based shape programs have been designed to perform these tasks by providing easy-to-use editing tools [39, 11, 16, 1]. Boundary Representations (B-Reps) are quite common for modeling CAD shapes and there are previous attempts to reverse engineer such representations given an input mesh or point cloud [49]. For example, BRepNet [20], UV-Net [14], and SBGCN [15] offer networks to work with B-Reps and their topological information through message passing. NH-Rep [13] proposed several hand-crafted rules to infer CSG trees from B-Reps. However, the leaf nodes of their CSG trees are not parameterized primitives but implicit surfaces. Extrusion-based techniques [47, 50, 35] have also been used in deep-learning-based CAD shape generation, but extrusions do not provide general shape representations and the ensuing supervised learning methods [44, 47] are restricted to specific datasets.

Learning CSG.

Learning CSG representations, e.g., primitive assembly [43] and sketch analysis [31, 22], has become an emerging topic of geometric deep learning. While most approaches are supervised, e.g., CSG-Net [36], SPFN [23], ParseNet [37], DeepCAD [46], and Point2Cyl [41], there have been several recent attempts at unsupervised CSG tree reconstruction, especially under the class-agnostic setting, resorting to neural implicit representations [4, 27, 32, 3]. InverseCSG [9] is another related method, which first detects a collection of primitives over the surface of an input shape using RANSAC, then performs a search to find CSG sequences to reconstruct the desired shape. However, RANSAC requires meticulous parameter tuning and the initially detected primitives can be noisy. Additionally, the space of CSG sequences is vast, leading to a lengthy search process. UCSG-Net [17] learns to reconstruct CSG trees with arbitrary assembly orders. The learning task is difficult due to the order flexibility, but can be made feasible by limiting the primitives to boxes and spheres only. More success in terms of reconstruction quality and compactness of the CSG trees has been obtained by learning fixed-order assemblies, including D2CSG.

BSP-Net [3] learns plane primitives whose half-spaces are assembled via intersections to obtain convexes, followed by a union operation, to reconstruct concave shapes. CAPRI-Net [54] extends BSP-Net by adding quadric surface primitives and a difference operation after primitive unions. Like D2CSG, CAPRI-Net [54] performs a final shape difference operation by predicting a set of primitives that undergo a fixed order of intersection, union, and difference operations to obtain the target shape. However, it constrains all primitives to be convex and its fixed sequence is not general enough to support all shapes. To attain generality, the set of primitives in D2CSG includes both convex primitives and complementary primitives (see Section 3.1). This enables the integration of difference operations during the initial phases of the CSG sequence. See the supplementary material for a formal proof of the generality of D2CSG. CSG-Stump [6] also follows a fixed CSG assembly order while including an inverse layer to model shape complements. The complement operation helps attain generality of their CSG reconstructions in theory, but the inverse layer is non-differentiable and the difference operations can only be applied to basic primitives, which can severely compromise the compactness and quality of the reconstruction. ExtrudeNet [35] follows the same CSG order as CSG-Stump but replaces the basic 3D primitives with extrusions to ease shape editability, at the cost of generality for sphere-like shapes.

Overfit models.

Learning an overfit model for a single shape is not uncommon in applications such as compression [5], reconstruction [42], and level-of-detail modeling [38] due to the high-quality results it can produce. The underlying geometry of a NeRF is essentially an overfit (i.e., fixed) to the shape/scene, although the primary task of NeRF is novel view synthesis [53, 28]. Following a similar principle, to replicate fine geometric details, D2CSG constructs a CSG tree for a given object by optimizing a small neural network along with a randomly initialized feature code (Fig. 2). We follow this optimization/overfitting procedure because we did not find a learned prior on CAD shapes to be useful or generalizable, given the structural and topological diversity of CAD shapes.

3 Method

A CSG representation performs operations on a set of primitives (the leaves of a CSG tree), creating intermediate shapes that are then merged to form the final shape. In this work, the input to D2CSG is a feature vector sampled from a Gaussian distribution, which is optimized along with the network's parameters to reconstruct a given 3D shape $S$; the output is a set of occupancy values optimized to fit $S$ (similar to the auto-decoder in DeepSDF [32]). We avoid encoding a shape into a latent code with an encoder since doing so does not lead to more accurate or faster convergence (see supplementary material). Our design is efficient, as D2CSG uses a light network that converges quickly to each shape. Starting from the feature vector, we pass it to the primitive prediction network to generate two matrices that hold the primitives' parameters. Each matrix is used to determine the approximated signed distance (ASD) of a set of query points sampled in the shape's vicinity. These two sets of ASD values are separately passed to the cover and residual branches, each of which has an intersection and a union layer. The cover branch (Fig. 2, top) and the residual branch (Fig. 2, bottom) generate point occupancy values indicating whether a point is inside or outside $S_C$ and $S_R$, respectively. The difference between $S_C$ and $S_R$ forms the final shape. We adopt the multi-stage training of CAPRI-Net [54] to achieve accurate CSG reconstructions. Furthermore, since no explicit compactness constraint is used in previous CSG-learning methods, many redundant primitives and intermediate shapes exist in the predicted CSG sequences. We introduce a novel dropout operation on the weights of each CSG layer to iteratively reduce redundancy and achieve higher compactness.
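To make the per-shape auto-decoder optimization concrete, below is a minimal PyTorch sketch of the overfitting loop, assuming a stand-in occupancy network; the `OccupancyNet` MLP, the toy sphere supervision, and the hyperparameters are illustrative placeholders rather than our actual architecture or settings.

```python
import torch
import torch.nn as nn

# Stand-in for the D2CSG network: the real model maps the code to quadric primitive
# parameters and applies the intersection/union layers of Sections 3.1-3.2; a plain
# MLP over (code, point) pairs is used here only so the overfitting loop is runnable.
class OccupancyNet(nn.Module):
    def __init__(self, code_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(code_dim + 3, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, code, pts):
        feat = code.expand(pts.shape[0], -1)
        return self.mlp(torch.cat([feat, pts], dim=-1)).squeeze(-1)

net = OccupancyNet()
code = torch.randn(1, 256, requires_grad=True)     # free latent code, optimized per shape
opt = torch.optim.Adam(list(net.parameters()) + [code], lr=1e-4)

# Toy supervision: query points and inside/outside labels for a centered sphere.
pts = torch.rand(4096, 3) - 0.5
occ_gt = (pts.norm(dim=-1) < 0.25).float()

for it in range(1000):                             # one (shortened) training stage
    occ = net(code, pts)
    loss = ((occ - occ_gt) ** 2).mean()            # occupancy reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```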

3.1 Primitive Representation

The implicit quadric equation from CAPRI-Net [54] can cover frequently used convex primitives, e.g., planes, spheres, etc. However, its fixed CSG order (intersection, union, and difference) is not general enough to combine the convexes to produce any shape; see proof in the supplementary material. With D2CSG, we introduce a more general primitive form that also covers inverse convex primitives, so that the difference operation can be utilized as the last, as well as the first, operation of the learned CSG sequence. This is one of the key differences between our method and CAPRI-Net [54], which only produces convex shapes from the intersection layer. Our primitive prediction network (an MLP) receives a code of size 256 and outputs two separate matrices $\mathbf{P}_{C}\in\mathbb{R}^{p\times 7}$ and $\mathbf{P}_{R}\in\mathbb{R}^{p\times 7}$ (for the cover and residual branches), each containing the parameters of $p$ primitives (see Fig. 2). We discuss the importance of separating $\mathbf{P}_{C}$ and $\mathbf{P}_{R}$ in the next section. Half of the primitives in $\mathbf{P}_{C}$ and $\mathbf{P}_{R}$ are convex, represented by a quadric equation: $|a|x^{2}+|b|y^{2}+|c|z^{2}+dx+ey+fz+g=0$.

Complementary Primitives. In D2CSG, the other half of the primitives in $\mathbf{P}_{C}$ and $\mathbf{P}_{R}$ are inverse convex primitives, with the first three coefficients constrained to be negative:

$-|a|x^{2}-|b|y^{2}-|c|z^{2}+dx+ey+fz+g=0.$   (1)

We avoid using a general quadric form without any non-negativity constraints since it would produce complex shapes not frequently used in CAD design. For reconstruction, $n$ points near the shape's surface are sampled and their ASD to all primitives is calculated similarly to CAPRI-Net [54]. For each point $\mathbf{q}_{j}=(x_{j},y_{j},z_{j})$, its ASD is captured in matrices $\mathbf{D}_{C},\mathbf{D}_{R}\in\mathbb{R}^{n\times p}$ as $\mathbf{D}_{C}(j,:)=\mathbf{Q}(j,:)\mathbf{P}_{C}^{T}$ and $\mathbf{D}_{R}(j,:)=\mathbf{Q}(j,:)\mathbf{P}_{R}^{T}$, where $\mathbf{Q}(j,:)=(x_{j}^{2},y_{j}^{2},z_{j}^{2},x_{j},y_{j},z_{j},1)$ is the $j$th row of $\mathbf{Q}$.
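The following is a minimal NumPy sketch of this ASD computation, $\mathbf{D}=\mathbf{Q}\mathbf{P}^{T}$, together with the sign constraints on the convex and complementary halves of the primitive set; the function names and the random example are illustrative only.

```python
import numpy as np

def asd_matrix(points, params):
    """Approximate signed distances D = Q P^T (Sec. 3.1).

    points: (n, 3) query points; params: (p, 7) quadric coefficients
    (a, b, c, d, e, f, g) per primitive, as produced by the prediction MLP.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    Q = np.stack([x**2, y**2, z**2, x, y, z, np.ones_like(x)], axis=1)  # (n, 7)
    return Q @ params.T                                                 # (n, p)

def constrain_primitives(raw_params):
    """Apply the sign constraints: first half convex (|a|, |b|, |c|),
    second half complementary (-|a|, -|b|, -|c|), as in Eq. (1)."""
    p = raw_params.copy()
    half = p.shape[0] // 2
    p[:half, :3] = np.abs(p[:half, :3])
    p[half:, :3] = -np.abs(p[half:, :3])
    return p

# Example: p = 8 raw primitives, n = 5 query points.
rng = np.random.default_rng(0)
P_C = constrain_primitives(rng.normal(size=(8, 7)))
pts = rng.uniform(-0.5, 0.5, size=(5, 3))
D_C = asd_matrix(pts, P_C)   # (5, 8): > 0 roughly outside, <= 0 roughly inside a primitive
```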

3.2 Dual CSG Branches

Previous CSG learning approaches employ a fixed CSG order but, unlike CAPRI-Net [54] and D2CSG, do not apply the difference as the last operation. Applying the difference as the last operation helps produce complex concavities in the reconstructed shape [54]. However, CAPRI-Net utilizes a shared primitive set for the two shapes that undergo the final subtraction operation, and this sharing between the two operands has adverse effects on the final result. To backpropagate gradients through all the primitives, the values in the selection matrix $\mathbf{T}$ of the intersection layer are float in the early training stages (Equation 2 and Table 1). Therefore, many primitives simultaneously belong to both the cover and residual shapes even though their associated weights in $\mathbf{T}$ might be small. Thus, the primitives utilized by the residual shape may impact the cover shape and result in the removal of details, or vice versa. Our dual-branch design in D2CSG, in contrast, allows the cover and residual shapes to utilize distinct primitives, enabling each branch to better fit its respective target shape.

We now briefly explain our dual CSG branches using notation similar to BSP-Net [3]. We input the ASD matrix $\mathbf{D}_{C}$ into the cover branch and $\mathbf{D}_{R}$ into the residual branch, and output vectors $\mathbf{a}_{C}$ and $\mathbf{a}_{R}$, indicating whether query points are inside/outside the cover shape $S_{C}$ and residual shape $S_{R}$. Each branch contains an intersection layer and a union layer adopted from BSP-Net [3].

During training, the inputs to the intersection layers are the two ASD matrices $\mathbf{D}_{C}\in\mathbb{R}^{n\times p}$ and $\mathbf{D}_{R}\in\mathbb{R}^{n\times p}$. Primitives involved in forming intersected shapes are selected by two learnable matrices $\mathbf{T}_{C}\in\mathbb{R}^{p\times c}$ and $\mathbf{T}_{R}\in\mathbb{R}^{p\times c}$, where $c$ is the number of intermediate shapes. The intersection layer produces $\mathbf{Con}_{C}\in\mathbb{R}^{n\times c}$, and query point $\mathbf{q}_{j}$ is inside intermediate shape $i$ only when $\mathbf{Con}_{C}(j,i)=0$ ($\mathbf{Con}_{R}$ is defined analogously, with subscripts $R$):

$\mathbf{Con}_{C}=\text{relu}(\mathbf{D}_{C})\,\mathbf{T}_{C}\quad\begin{cases}0&\text{in,}\\ >0&\text{out.}\end{cases}$   (2)

Then all the shapes obtained by the intersection operation are combined by two union layers to find the cover shape $S_{C}$ and residual shape $S_{R}$. The inside/outside indicators of the combined shapes are stored in vectors $\mathbf{a}_{C},\mathbf{a}_{R}\in\mathbb{R}^{n\times 1}$, indicating whether a point is inside/outside the cover and residual shapes. We compute $\mathbf{a}_{C}$ and $\mathbf{a}_{R}$ in a multi-stage fashion ($\mathbf{a}^{+}$ and $\mathbf{a}^{*}$ for the early and final stages), as in [54]. Specifically, $\mathbf{a}^{*}_{C}$ and $\mathbf{a}^{*}_{R}$ are obtained by taking the min of each row of $\mathbf{Con}_{C}$ and $\mathbf{Con}_{R}$:

$\mathbf{a}^{*}_{C}(j)=\min_{1\leq i\leq c}(\mathbf{Con}_{C}(j,i))\quad\begin{cases}0&\text{in,}\\ >0&\text{out.}\end{cases}$   (3)

Since gradients can only be backpropagated to the minimum value in the min operation, we additionally introduce $\mathbf{a}^{+}_{C}$ at the early training stage to facilitate learning:

$\mathbf{a}^{+}_{C}(j)=\mathscr{C}\Big(\sum_{1\leq i\leq c}\mathbf{W}_{C}(i)\,\mathscr{C}(1-\mathbf{Con}_{C}(j,i))\Big)\quad\begin{cases}1&\approx\text{in,}\\ <1&\approx\text{out.}\end{cases}$   (4)

$\mathbf{W}_{C}\in\mathbb{R}^{c}$ is a learned weighting vector and $\mathscr{C}$ is a clip operation to $[0,1]$. $\mathbf{a}^{+}_{R}$ is defined similarly.
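To make Equations (2)-(4) concrete, here is a minimal PyTorch sketch of a single branch (intersection followed by union), plus the final difference between the two branches as described in Section 3.3; the tensor sizes and random inputs are illustrative, not our trained weights.

```python
import torch

def csg_branch(D, T, W, stage="final"):
    """One branch (cover or residual): intersection layer followed by a union.

    D: (n, p) approximate signed distances; T: (p, c) intersection selection
    matrix; W: (c,) union weights.  Returns per-point in/out indicators.
    """
    Con = torch.relu(D) @ T                      # Eq. (2): 0 => inside intermediate shape i
    if stage == "early":                         # Eq. (4): soft weighted union a+
        return torch.clamp((W * torch.clamp(1.0 - Con, 0.0, 1.0)).sum(dim=1), 0.0, 1.0)
    return Con.min(dim=1).values                 # Eq. (3): a*, 0 => inside, > 0 => outside

# Example with random tensors: n = 6 points, p = 8 primitives, c = 4 intermediate shapes.
torch.manual_seed(0)
D_C, D_R = torch.randn(6, 8), torch.randn(6, 8)
T_C, T_R = torch.rand(8, 4), torch.rand(8, 4)
W_C, W_R = torch.ones(4), torch.ones(4)

a_C = csg_branch(D_C, T_C, W_C)                  # cover shape S_C
a_R = csg_branch(D_R, T_R, W_R)                  # residual shape S_R

# The final shape is S_C minus S_R; Section 3.3 (Eq. 5) realizes this as
# s*(j) = max(a*_C(j), alpha - a*_R(j)) with a small positive alpha.
alpha = 0.2
s_star = torch.maximum(a_C, alpha - a_R)         # 0 => inside the reconstructed shape
```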

3.3 Loss Functions and Dropout in Training

Our training is similar to CAPRI-Net [54] in terms of staging, to facilitate better gradient propagation for learning the CSG layer weights. While the early stages (0 and 1) of our training and loss functions are the same as in CAPRI-Net, Stage 2 is different and is a crucial part of our training due to the injection of a novel dropout. In Table 1, we provide an overview of each training stage and its loss function, with more details presented below and in the supplementary material.

Table 1: Settings for multi-stage training.
Stage   Intersection ($\mathbf{T}$)   Union ($\mathbf{W}$)   Difference Op ($\mathbf{a}$)   Dropout   Loss
0   float   float   $\mathbf{a}^{+}$   -   $L^{+}_{rec}+L_{\mathbf{T}}+L_{\mathbf{W}}$
1   float   -   $\mathbf{a}^{*}$   -   $L^{*}_{rec}+L_{\mathbf{T}}$
2   binary   binary   $\mathbf{a}^{*}$   ✓   $L^{*}_{rec}$

Stage 0. At stage 0, $\mathbf{T}$ in the intersection layer and $\mathbf{W}$ in the union layer are float. The loss function is $L^{+}=L^{+}_{rec}+L_{\mathbf{T}}+L_{\mathbf{W}}$. $L^{+}_{rec}$ is the reconstruction loss applied to $\mathbf{a}^{+}_{C}$ and $\mathbf{a}^{+}_{R}$; it forces the result of subtracting the residual shape from the cover shape to be close to the input shape. $L_{\mathbf{T}}$ and $L_{\mathbf{W}}$ are losses applied to the weights of the intersection and union layers to keep the entries of $\mathbf{T}$ within $[0,1]$ and the entries of $\mathbf{W}$ close to one [3]. Note that we have separate network weights for the dual CSG branches, $\mathbf{T}=[\mathbf{T}_{C},\mathbf{T}_{R}]$ and $\mathbf{W}=[\mathbf{W}_{C},\mathbf{W}_{R}]$, and they are trained separately. Please refer to the supplementary material for more details on the losses.

Stage 1. Here, the vector $\mathbf{a}^{+}$ is substituted with $\mathbf{a}^{*}$, eliminating the use of $\mathbf{W}$ from Stage 0. This adopts $\min$ as the union operation for the cover and residual shapes and restricts the back-propagation of gradients solely to the shapes that underwent a union operation. The loss function is $L^{*}=L^{*}_{rec}+L_{\mathbf{T}}$, where $L^{*}_{rec}$ is the reconstruction loss applied to $\mathbf{a}^{*}_{C}$ and $\mathbf{a}^{*}_{R}$, and $L_{\mathbf{T}}$ is the same as in Stage 0.

Stage 2. In Stage 2, the entries $t$ of $\mathbf{T}$ are quantized into binary values by a given threshold: $t_{Binary}=(t>\eta)\,?\,1:0$, where $\eta=0.01$. This way, the network performs interpretable intersection operations on primitives. Due to the non-differentiable nature of the $t_{Binary}$ values, the loss term $L_{\mathbf{T}}$ is no longer utilized, reducing the loss function to $L^{*}_{rec}$ alone (see Table 1).
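A small sketch of this Stage-2 quantization is shown below; the tensor sizes are illustrative and the float selection matrix would come from the earlier training stages.

```python
import torch

eta = 0.01
T = torch.rand(512, 32)             # float intersection weights from Stages 0-1 (p x c)
T_binary = (T > eta).float()        # t_Binary = 1 if t > eta else 0

# After binarization, the intersection layer (Eq. 2) selects primitives explicitly,
# so every intermediate shape is an interpretable intersection of chosen primitives.
D = torch.randn(1024, 512)          # ASD values for 1,024 query points
Con = torch.relu(D) @ T_binary      # 0 => point is inside the intermediate shape
```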

Dropout in Stage 2. Without an explicit compactness constraint or loss, redundancies in our CSG reconstruction are inevitable. To alleviate this, we make two crucial observations: first, the CSG sequence learned at Stage 2 is entirely interpretable, allowing us to track the primitives and CSG operations involved in creating the final shape; second, altering the primitives or CSG operations will consequently modify the final shape's implicit field. Therefore, we can identify less significant primitives and intermediate shapes as those whose removal does not substantially impact the final result. To this end, we introduce the importance metric $\Delta\mathbf{S}$ to measure the impact of removing primitives or intermediate shapes on the outcome. We first define the implicit field $\mathbf{s}^{*}$ of the final shape as:

$\mathbf{s}^{*}(j)=\max(\mathbf{a}^{*}_{C}(j),\ \alpha-\mathbf{a}^{*}_{R}(j))\quad\begin{cases}0&\text{in,}\\ >0&\text{out,}\end{cases}$   (5)

where $\mathbf{s}^{*}(j)$ signifies whether a query point $\mathbf{q}_{j}$ is inside or outside the reconstructed shape. It is important to note that $\alpha$ should be small and positive ($\approx 0.2$), as $\mathbf{s}^{*}$ approaches 0 when points are within the shape. The importance metric $\Delta\mathbf{S}$ is then defined as:

$\Delta\mathbf{S}=\sum_{j=1}^{n}\Delta\mathbbm{1}\big(\mathbf{s}^{*}(j)<\tfrac{\alpha}{2}\big).$   (6)

Here, $\mathbf{s}^{*}(j)<\frac{\alpha}{2}$ quantizes the implicit field values $\mathbf{s}^{*}(j)$ from float to Boolean, where inside values are 1 and outside values are 0. The function $\mathbbm{1}$ maps Boolean values to integers for the sum operation, and $\Delta$ captures the change in the count when a primitive or intermediate shape is eliminated.

Note that the union layer parameters $\mathbf{W}$ are set close to 1 by $L_{\mathbf{W}}$ in Stage 0. During Stage 2, every 4,000 iterations, if removing intermediate shape $i$ from the CSG sequence makes $\Delta\mathbf{S}$ smaller than a threshold $\sigma$, we discard intermediate shape $i$ by setting $\mathbf{W}_{i}=0$. Consequently, Equation (3) is modified to incorporate the updated weights in the union layer:

$\mathbf{a}^{*}_{C}(j)=\min_{1\leq i\leq c}\big(\mathbf{Con}_{C}(j,i)+(1-\mathbf{W}_{i})\,\theta\big)\quad\begin{cases}0&\text{in,}\\ >0&\text{out.}\end{cases}$   (7)

This way, the removed shape $i$ will not affect $\mathbf{a}^{*}$ as long as $\theta$ is large enough. In our experiments, we found that $\theta$ only needs to be larger than $\alpha$ for the removed shape $i$ not to affect $\mathbf{s}^{*}$; we thus set $\theta=100$ in all experiments. Furthermore, we apply dropout to the parameters of the intersection layer by setting $\mathbf{T}(k,:)=0$ if $\Delta\mathbf{S}$ falls below $\sigma$ after removing primitive $k$ from the CSG sequence. We set $\sigma=3$ in all experiments and examine the impact of $\sigma$ in the supplementary material.
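The sketch below illustrates the union-layer dropout using Equations (5)-(7); it is a simplified, non-vectorized version for the cover branch only, with random tensors standing in for trained values, and the residual branch and the per-primitive dropout on rows of $\mathbf{T}$ would follow the same pattern.

```python
import torch

def inside_count(a_C, a_R, alpha=0.2):
    """Number of query points inside the final shape, using Eqs. (5)-(6)."""
    s = torch.maximum(a_C, alpha - a_R)                 # Eq. (5)
    return int((s < alpha / 2).sum())                   # quantize to Boolean and count

def dropout_union_weights(Con_C, W_C, a_R, sigma=3, theta=100.0, alpha=0.2):
    """Zero out W_C(i) whenever removing intermediate shape i changes the occupancy
    of fewer than sigma query points (cover branch shown)."""
    for i in range(W_C.shape[0]):
        if W_C[i] == 0:                                  # already removed
            continue
        a_cur = (Con_C + (1.0 - W_C) * theta).min(dim=1).values        # Eq. (7)
        W_try = W_C.clone()
        W_try[i] = 0.0
        a_try = (Con_C + (1.0 - W_try) * theta).min(dim=1).values
        delta_S = abs(inside_count(a_cur, a_R, alpha) - inside_count(a_try, a_R, alpha))
        if delta_S < sigma:                              # removal barely changes the shape
            W_C[i] = 0.0                                 # drop intermediate shape i
    return W_C

# Example with random tensors: n = 2,048 points, c = 32 intermediate shapes.
torch.manual_seed(0)
Con_C = torch.rand(2048, 32)
a_R = torch.rand(2048)
W_C = dropout_union_weights(Con_C, torch.ones(32), a_R)
print(int(W_C.sum()), "intermediate shapes kept")
```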

4 Results

In the experiments presented in this section, we set the maximum number of primitives to $p=512$ and the maximum number of intersections to $c=32$ for each branch to support complex shapes. The size of our latent code is 256 and a two-layer MLP is used to predict the parameters of the primitives from the input feature code. We train D2CSG per shape by optimizing the latent code, primitive prediction network, intersection layer, and union layer. To evaluate D2CSG against prior methods, which all require an additional time-consuming optimization at test time to achieve satisfactory results (e.g., 30 min per shape for CSG-Stump), we randomly selected a moderately sized subset of shapes as the test set: 500 shapes from ABC and 50 from each of the 13 categories of ShapeNet (650 shapes in total). In addition, we ensured that 80% of the selected shapes from ABC have genus larger than two and more than 10K vertices, to include complex structures. All experiments were performed on an Nvidia GeForce RTX 2080 Ti GPU.

Figure 3: Comparing CSG representation learning and reconstruction from 3D meshes in ABC dataset (columns 1-6) and ShapeNet (columns 7-12). Results are best viewed when zooming in.

4.1 Mesh to CSG Representation

Given a 3D shape, the task is to learn an accurate and compact CSG representation for it. To do so, we first sample 24,576 points around the shape's surface (i.e., with a distance up to $1/64$) and 4,096 random points in 3D space. All 28,672 points are then scaled into the range $[-0.5,0.5]$; these points, along with their occupancy values, are used to optimize the network.
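A minimal sketch of this data preparation is given below, assuming a watertight input mesh and the trimesh library; the unit box, the uniform near-surface jitter, and the bounding-box normalization are illustrative stand-ins, and the exact sampling distribution we use may differ.

```python
import numpy as np
import trimesh

# In practice the input is a CAD mesh (e.g., from ABC); a unit box is used here
# so the snippet runs on its own.  Inside tests assume a watertight mesh.
mesh = trimesh.creation.box(extents=(1.0, 1.0, 1.0))

# 24,576 near-surface samples (surface points jittered by up to ~1/64) plus
# 4,096 uniform samples inside the bounding box.
rng = np.random.default_rng(0)
surface = mesh.sample(24_576) + rng.uniform(-1 / 64, 1 / 64, size=(24_576, 3))
uniform = rng.uniform(mesh.bounds[0], mesh.bounds[1], size=(4_096, 3))
points = np.concatenate([surface, uniform], axis=0)          # (28_672, 3)

occupancy = mesh.contains(points).astype(np.float32)         # 1 = inside, 0 = outside

# Normalize query points into [-0.5, 0.5] around the shape's bounding box.
center = mesh.bounds.mean(axis=0)
scale = (mesh.bounds[1] - mesh.bounds[0]).max()
points = (points - center) / scale
```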

We compare D2CSG with InverseCSG [9] (I-CSG), BSP-Net [3], CSG-Stump [6], and CAPRI-Net [54], all of which output structured parametric primitives. For a fair comparison, we optimize all of these networks with the same number of iterations. BSP-Net, CSG-Stump, and CAPRI-Net are pre-trained on the training set provided by CAPRI-Net before optimization to achieve better initialization. Note that CSG-Stump uses different network settings for shapes from ABC (with shape differences) and ShapeNet (without shape differences); we therefore follow the same settings in our comparisons. For InverseCSG, we adopt the RANSAC parameters used for most of the shapes in its CAD shape dataset. For each shape reconstruction, BSP-Net takes about 15 min, I-CSG about 17 min, CSG-Stump about 30 min, and CAPRI-Net about 3 min to converge. In contrast, the training process for D2CSG runs 12,000 iterations per stage, taking about 5 minutes per shape reconstruction.

Evaluation Metrics.

Quantitative metrics for shape reconstruction are symmetric Chamfer Distance (CD), Normal Consistency (NC), and Edge Chamfer Distance (ECD) [3]. For ECD, we set the threshold on normal cross products to 0.1 for extracting points close to edges. CD and ECD are computed on 8K sample points on the surface and multiplied by 1,000. In addition, we compare the number of primitives #P to evaluate the compactness of shapes, since all CSG-based modeling methods predict a set of primitives that are combined via intersection operations.
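For reference, one plausible implementation of the symmetric Chamfer Distance under these conventions is sketched below; squared- versus plain-distance and averaging conventions vary between papers, so this is an assumption rather than our exact evaluation code.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(A, B):
    """Symmetric Chamfer Distance between two (n, 3) point sets, scaled by 1,000
    as in the tables; uses squared nearest-neighbour distances."""
    d_ab, _ = cKDTree(B).query(A)      # nearest neighbour in B for every point in A
    d_ba, _ = cKDTree(A).query(B)
    return 1000.0 * 0.5 * ((d_ab ** 2).mean() + (d_ba ** 2).mean())

# Example: 8K surface samples from the prediction and the ground truth.
rng = np.random.default_rng(0)
pred, gt = rng.random((8000, 3)), rng.random((8000, 3))
print(chamfer_distance(pred, gt))
```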

Evaluation and Comparison.

We provide visual comparisons on representative examples from the ABC dataset and the ShapeNet dataset in Fig. 3. Since InverseCSG only accepts high-quality CAD meshes as input for its primitive detection algorithm and fails on ShapeNet, we only compare with it on the ABC dataset; more visual results can be found in the supplementary material. Our method consistently reconstructs more accurate shapes with geometric details and concavities. The RANSAC-based primitive detection algorithm in InverseCSG can easily produce noisy and inaccurate primitives, resulting in many degenerate planes after CSG modeling. BSP-Net simply assembles convex shapes to fit the target shape and, without a difference operation, obtains less compact results. CSG-Stump tends to use considerably more difference operations to reconstruct shapes, which also causes the shapes' surfaces to be carved by many redundant primitives (i.e., a lack of compactness). In addition, since CSG-Stump does not support difference operations between complex shapes, it fails to reconstruct small holes or intricate concavities. As the intermediate shapes subtracted in CAPRI-Net's fixed CSG sequence are only unions of convex shapes, it fails to reproduce target shapes with complex concavities. D2CSG achieves the best reconstruction quality and compactness in all metrics on the ABC dataset and ShapeNet compared to the other methods, except for NC on ShapeNet, where D2CSG underperforms ever so slightly ($\Delta=0.003$); see Table 2.

Table 2: Comparing CSG rep learning from 3D meshes in ABC and ShapeNet.
ABC ShapeNet
Methods I-CSG BSP STUMP CAPRI Ours BSP STUMP CAPRI Ours
CD \downarrow 0.576 0.115 0.383 0.177 0.069 0.164 2.214 0.124 0.119
NC \uparrow 0.877 0.921 0.850 0.903 0.928 0.882 0.794 0.890 0.887
ECD \downarrow 6.330 4.047 8.881 3.990 3.091 3.899 6.101 2.035 1.722
#P \downarrow 43.62 359.38 83.42 66.26 28.62 694.21 228.58 50.94 43.78
Table 3: Ablation study on key components of D2CSG: complementary primitives (CP), dual branches (DB), and dropout (DO), on three quality metrics and three compactness metrics, the number of CSG primitives (#P), intermediate shapes (#IS), and surface segments (#Seg) resulting from a decomposition induced by the CSG tree. Winner in boldface and second place in blue.
Row ID CP DB DO CD \downarrow NC \uparrow ECD \downarrow #P \downarrow #IS \downarrow #Seg \downarrow
1   -   -   -   0.177   0.903   3.99   66   8.3   82
2   ✓   -   -   0.073   0.935   3.12   38   5.8   55
3   ✓   -   ✓   0.088   0.926   3.48   27   5.3   40
4   ✓   ✓   -   0.069   0.936   2.98   53   6.8   57
5   ✓   ✓   ✓   0.069   0.928   3.09   29   5.7   42

Ablation Study.

We conducted an ablation study (see Table 3) to assess the efficacy of, and tradeoff between, the three key features of D2CSG: complementary primitives, dual branch design, and dropout. As we already have three metrics for reconstruction quality, we complement #P, the only compactness measure, with two more: the number of intermediate shapes and the number of surface segments as a result of shape decomposition by the CSG tree; see more details on their definitions in the supplementary material. We observe that incorporating complementary primitives (row 2) enables better generalization to shapes with complex structures and higher accuracy than the CAPRI-Net [54] baseline (row 1). The dual branch design further enhances reconstruction accuracy (row 4 vs. 2), as primitives from the residual shape do not erase details of the cover shape, and each branch can better fit its target shape without impacting the other branch. However, the dual branch design did compromise compactness. Replacing dual branching by dropout in the final stage (row 3 vs. 4) decreases the primitive count by about 50% and improves on the other two compactness measures, while marginally compromising accuracy. The best trade-off is evidently achieved by combining all three components, as shown in row 5, where adding the dual branches has rather minimal effects on all three compactness metrics with dropouts. In the supplementary material, we also examine the impact of employing basic primitives and modifying the maximum primitive count.

Table 4: Comparing CSG rep learning from 3D point cloud in ABC and ShapeNet.
ABC ShapeNet
Methods BSP STUMP CAPRI Ours BSP STUMP CAPRI Ours
CD \downarrow 0.133 0.695 0.225 0.085 0.268 1.177 0.242 0.224
NC \uparrow 0.919 0.841 0.894 0.924 0.889 0.840 0.888 0.886
ECD \downarrow 3.899 7.303 3.308 3.029 1.854 3.482 1.971 1.815
#P \downarrow 360.87 67.436 68.57 35.28 553.76 211.36 55.21 51.26
Figure 4: CSG reconstruction from 3D point cloud in ABC (Col. 1-6) and ShapeNet (Col. 7-12).

4.2 Applications

Point Clouds to CSG.

We reconstruct CAD shapes from point clouds, each containing 8,192 points. For each input point, we sample 8 points along its normal with perturbations drawn from a Gaussian distribution $(\mu=0,\sigma=1/64)$. If a sampled point lies in the direction opposite to the normal, its occupancy value is 1; otherwise it is 0. This way, we obtain 65,536 points to fit the network to each shape, with all other settings the same as in the mesh-to-CSG experiment. Quantitative comparisons in Table 4 and visual comparisons in Fig. 4 show that our network outperforms BSP-Net, CSG-Stump, and CAPRI-Net in the various reconstruction similarity and compactness metrics on the ABC dataset and ShapeNet. More results can be found in the supplementary material.
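A minimal sketch of this occupancy-sample generation from an oriented point cloud is given below; the synthetic point cloud and the function name are illustrative only.

```python
import numpy as np

def occupancy_from_point_cloud(points, normals, k=8, sigma=1 / 64, seed=0):
    """Turn an oriented point cloud (e.g., 8,192 points) into k samples per point
    (8,192 * 8 = 65,536) by perturbing each point along its normal.  Offsets against
    the normal direction are labelled inside (occupancy 1), the rest outside (0)."""
    rng = np.random.default_rng(seed)
    offsets = rng.normal(0.0, sigma, size=(points.shape[0], k))            # (n, k)
    samples = points[:, None, :] + offsets[..., None] * normals[:, None, :]
    occupancy = (offsets < 0).astype(np.float32)    # behind the surface => inside
    return samples.reshape(-1, 3), occupancy.reshape(-1)

# Example with a synthetic oriented point cloud.
rng = np.random.default_rng(1)
pts = rng.uniform(-0.5, 0.5, size=(8192, 3))
nrm = rng.normal(size=(8192, 3))
nrm /= np.linalg.norm(nrm, axis=1, keepdims=True)
q, occ = occupancy_from_point_cloud(pts, nrm)       # q: (65536, 3), occ: (65536,)
```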

Figure 5: D2CSG learns OpenSCAD scripts for a given shape and supports editability.

Shape Editing.

Once we have acquired the primitives and CSG assembly operations, we can edit the shape by altering primitive parameters, adjusting the CSG sequence, or transforming intermediate shapes. To facilitate shape editing in popular CAD software, we further convert quadric primitives into basic primitives (e.g., cubes, spheres, cylinders, etc.) by identifying the best-fitting basic primitive for each quadric primitive predicted by our method. Subsequently, we can export the primitive parameters and CSG sequence into an OpenSCAD script. Fig. 5 shows shape editing results.
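As a rough illustration of this export step, the sketch below writes a cover-minus-residual CSG sequence of basic primitives as an OpenSCAD script; the primitive strings, file name, and helper function are hypothetical stand-ins, not our actual exporter.

```python
# A minimal sketch of exporting a recovered CSG sequence to OpenSCAD.
def to_openscad(cover, residual):
    """cover/residual: lists of intermediate shapes, each a list of OpenSCAD
    primitive strings, e.g. 'sphere(r=0.2)' or
    'translate([0,0,0.1]) cube([0.4,0.4,0.2], center=true)'."""
    def union_of_intersections(shapes):
        blocks = ["intersection() { %s }" % " ".join(p + ";" for p in prims)
                  for prims in shapes]
        return "union() { %s }" % " ".join(blocks)

    # Final shape = cover minus residual.
    return "difference() {\n  %s\n  %s\n}\n" % (
        union_of_intersections(cover), union_of_intersections(residual))

script = to_openscad(
    cover=[["cube([0.5,0.5,0.5], center=true)"]],
    residual=[["sphere(r=0.3)"]])
with open("shape.scad", "w") as f:
    f.write(script)
```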

5 Discussion, Limitation, and Future Work

We present D2CSG, a simple yet effective idea for unsupervised learning of general and compact CSG tree representations of 3D CAD objects. Extensive experiments on the ABC and ShapeNet datasets demonstrate that our network outperforms state-of-the-art methods both in reconstruction quality and compactness. We also have ample visual evidence that the CSG trees obtained by our method tend to be more natural than those produced by prior approaches.

Our network does not generalize over a shape collection; it “overfits” to a single input shape and is in essence an optimization to find a CSG assembly. While arguably limiting, this is not entirely unjustified since the CAD shapes we seek to handle, i.e., those from ABC, do not appear to possess sufficient generalizability in their primitive assemblies. Another limitation is that the quadric primitive representation cannot exactly represent, but only approximate, complex surfaces such as tori and NURBS patches. In addition, incorporating interpretable CSG operations into the network tends to cause gradient back-propagation issues and limits the reconstruction accuracy of small details such as decorative curves on chair legs. We would also like to extend our method to structured CAD shape reconstruction from images and free-form sketches. Another interesting direction for future work is to scale the primitive assembly optimization from CAD parts to indoor scenes.

References

  • [1] Dan Cascaval, Mira Shalah, Phillip Quinn, Rastislav Bodik, Maneesh Agrawala, and Adriana Schulz. Differentiable 3d cad programs for bidirectional editing. Computer Graphics Forum, 41(2):309–323, 2022.
  • [2] Angel X. Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, Jianxiong Xiao, Li Yi, and Fisher Yu. ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012, 2015.
  • [3] Zhiqin Chen, Andrea Tagliasacchi, and Hao Zhang. Bsp-net: Generating compact meshes via binary space partitioning. In CVPR, pages 45–54, 2020.
  • [4] Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. In CVPR, pages 5939–5948, 2019.
  • [5] Thomas Davies, Derek Nowrouzezahrai, and Alec Jacobson. Overfit neural networks as a compact shape representation. arXiv preprint arXiv:2009.09808, 2020.
  • [6] Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Haiyong Jiang, Zhongang Cai, Junzhe Zhang, Liang Pan, Mingyuan Zhang, Haiyu Zhao, and Shuai Yi. CSG-Stump: A learning friendly csg-like representation for interpretable shape parsing. In ICCV, pages 12458–12467, 2021.
  • [7] Boyang Deng, Kyle Genova, Soroosh Yazdani, Sofien Bouaziz, Geoffrey Hinton, and Andrea Tagliasacchi. Cvxnet: Learnable convex decomposition. In CVPR, pages 31–44, 2020.
  • [8] Theo Deprelle, Thibault Groueix, Matthew Fisher, Vladimir Kim, Bryan Russell, and Mathieu Aubry. Learning elementary structures for 3d shape generation and matching. Advances in Neural Information Processing Systems, 32, 2019.
  • [9] Tao Du, Jeevana Priya Inala, Yewen Pu, Andrew Spielberg, Adriana Schulz, Daniela Rus, Armando Solar-Lezama, and Wojciech Matusik. Inversecsg: Automatic conversion of 3d models to csg trees. ACM Trans. on Graphics (TOG), 37(6):1–16, 2018.
  • [10] Anastasia Dubrovina, Fei Xia, Panos Achlioptas, Mira Shalah, Raphaël Groscot, and Leonidas J Guibas. Composite shape modeling via latent space factorization. In ICCV, pages 8140–8149, 2019.
  • [11] Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a repl. Advances in Neural Information Processing Systems, 32, 2019.
  • [12] Kyle Genova, Forrester Cole, Daniel Vlasic, Aaron Sarna, William T Freeman, and Thomas Funkhouser. Learning shape templates with structured implicit functions. In CVPR, pages 7154–7164, 2019.
  • [13] Hao-Xiang Guo, Liu Yang, Hao Pan, and Baining Guo. Nh-rep: Neural halfspace representations for implicit conversion of b-rep solids. ACM Transactions on Graphics (TOG), 2022.
  • [14] Pradeep Kumar Jayaraman, Aditya Sanghi, Joseph G Lambourne, Karl DD Willis, Thomas Davies, Hooman Shayani, and Nigel Morris. Uv-net: Learning from boundary representations. In CVPR, pages 11703–11712, 2021.
  • [15] Benjamin Jones, Dalton Hildreth, Duowen Chen, Ilya Baran, Vladimir G Kim, and Adriana Schulz. Automate: A dataset and learning approach for automatic mating of cad assemblies. ACM Trans. on Graphics (TOG), 40(6):1–18, 2021.
  • [16] R. Kenny Jones, Theresa Barton, Xianghao Xu, Kai Wang, Ellen Jiang, Paul Guerrero, Niloy Mitra, and Daniel Ritchie. Shapeassembly: Learning to generate programs for 3d shape structure synthesis. ACM Transactions on Graphics (TOG), Siggraph Asia 2020, 39(6):Article 234, 2020.
  • [17] Kacper Kania, Maciej Zieba, and Tomasz Kajdanowicz. Ucsg-net-unsupervised discovering of constructive solid geometry tree. Advances in Neural Information Processing Systems, 33:8776–8786, 2020.
  • [18] Yuki Kawana, Yusuke Mukuta, and Tatsuya Harada. Neural star domain as primitive representation. Advances in Neural Information Processing Systems, 33:7875–7886, 2020.
  • [19] Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. ABC: A big cad model dataset for geometric deep learning. In CVPR, pages 9601–9611, June 2019.
  • [20] Joseph G Lambourne, Karl DD Willis, Pradeep Kumar Jayaraman, Aditya Sanghi, Peter Meltzer, and Hooman Shayani. Brepnet: A topological message passing system for solid models. In CVPR, pages 12773–12782, 2021.
  • [21] Eric-Tuan Lê, Minhyuk Sung, Duygu Ceylan, Radomir Mech, Tamy Boubekeur, and Niloy J Mitra. Cpfn: Cascaded primitive fitting networks for high-resolution point clouds. In CVPR, pages 7457–7466, 2021.
  • [22] Changjian Li, Hao Pan, Adrien Bousseau, and Niloy J. Mitra. Free2cad: Parsing freehand drawings into cad commands. ACM Trans. on Graphics (TOG), 41(4), 2022.
  • [23] Lingxiao Li, Minhyuk Sung, Anastasia Dubrovina, Li Yi, and Leonidas J Guibas. Supervised fitting of geometric primitives to 3d point clouds. In CVPR, pages 2652–2660, 2019.
  • [24] Yichen Li, Kaichun Mo, Lin Shao, Minhyuk Sung, and Leonidas Guibas. Learning 3d part assembly from a single image. In ECCV, pages 664–682. Springer, 2020.
  • [25] Yangyan Li, Xiaokun Wu, Yiorgos Chrysathou, Andrei Sharf, Daniel Cohen-Or, and Niloy J. Mitra. Globfit: Consistently fitting primitives by discovering global relations. ACM Trans. on Graphics (TOG), 30(4), 2011.
  • [26] Zhijian Liu, William T Freeman, Joshua B Tenenbaum, and Jiajun Wu. Physical primitive decomposition. In ECCV, pages 3–19, 2018.
  • [27] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3D reconstruction in function space. In CVPR, pages 4460–4470, 2019.
  • [28] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  • [29] Kaichun Mo, Shilin Zhu, Angel X. Chang, Li Yi, Subarna Tripathi, Leonidas J. Guibas, and Hao Su. PartNet: A large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In CVPR, pages 909–918, 2019.
  • [30] Chengjie Niu, Jun Li, and Kai Xu. Im2struct: Recovering 3d shape structure from a single RGB image. In CVPR, 2018.
  • [31] Wamiq Para, Shariq Bhat, Paul Guerrero, Tom Kelly, Niloy Mitra, Leonidas J Guibas, and Peter Wonka. Sketchgen: Generating constrained cad sketches. Advances in Neural Information Processing Systems, 34:5077–5088, 2021.
  • [32] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. DeepSDF: Learning continuous signed distance functions for shape representation. In CVPR, pages 165–174, 2019.
  • [33] Despoina Paschalidou, Angelos Katharopoulos, Andreas Geiger, and Sanja Fidler. Neural parts: Learning expressive 3d shape abstractions with invertible neural networks. In CVPR, pages 3204–3215, 2021.
  • [34] Despoina Paschalidou, Ali Osman Ulusoy, and Andreas Geiger. Superquadrics revisited: Learning 3d shape parsing beyond cuboids. In CVPR, pages 10344–10353, 2019.
  • [35] Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, and Junzhe Zhang. Extrudenet: Unsupervised inverse sketch-and-extrude for shape parsing. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part II, pages 482–498. Springer, 2022.
  • [36] Gopal Sharma, Rishabh Goyal, Difan Liu, Evangelos Kalogerakis, and Subhransu Maji. CSGNet: neural shape parser for constructive solid geometry. In CVPR, pages 5515–5523, 2018.
  • [37] Gopal Sharma, Difan Liu, Subhransu Maji, Evangelos Kalogerakis, Siddhartha Chaudhuri, and Radomír Měch. Parsenet: A parametric surface fitting network for 3d point clouds. In ECCV, pages 261–276, 2020.
  • [38] Towaki Takikawa, Joey Litalien, Kangxue Yin, Karsten Kreis, Charles Loop, Derek Nowrouzezahrai, Alec Jacobson, Morgan McGuire, and Sanja Fidler. Neural geometric level of detail: Real-time rendering with implicit 3d shapes. In CVPR, pages 11358–11367, 2021.
  • [39] Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T Freeman, Joshua B Tenenbaum, and Jiajun Wu. Learning to infer and execute 3d shape programs. arXiv preprint arXiv:1901.02875, 2019.
  • [40] Shubham Tulsiani, Hao Su, Leonidas J. Guibas, Alexei A. Efros, and Jitendra Malik. Learning shape abstractions by assembling volumetric primitives. In CVPR, pages 2635–2643, 2017.
  • [41] Mikaela Angelina Uy, Yen yu Chang, Minhyuk Sung, Purvi Goel, Joseph Lambourne, Tolga Birdal, and Leonidas Guibas. Point2Cyl: Reverse engineering 3D objects from point clouds to extrusion cylinders. In CVPR, pages 11850–11860, 2022.
  • [42] Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, and Daniele Panozzo. Deep geometric prior for surface reconstruction. In CVPR, pages 10130–10139, 2019.
  • [43] Karl D.D. Willis, Pradeep Kumar Jayaraman, Hang Chu, Yunsheng Tian, Yifei Li, Daniele Grandi, Aditya Sanghi, Linh Tran, Joseph G. Lambourne, Armando Solar-Lezama, and Wojciech Matusik. JoinABLe: Learning bottom-up assembly of parametric CAD joints. In CVPR, pages 15849–15860, 2022.
  • [44] Karl DD Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences. ACM Transactions on Graphics (TOG), 40(4):1–24, 2021.
  • [45] Karl D. D. Willis, Yewen Pu, Jieliang Luo, Hang Chu, Tao Du, Joseph G. Lambourne, Armando Solar-Lezama, and Wojciech Matusik. Fusion 360 gallery: A dataset and environment for programmatic cad construction from human design sequences. ACM Trans. on Graphics (TOG), 40(4), 2021.
  • [46] Rundi Wu, Chang Xiao, and Changxi Zheng. DeepCAD: A deep generative network for computer-aided design models. In ICCV, pages 6772–6782, 2021.
  • [47] Rundi Wu, Chang Xiao, and Changxi Zheng. Deepcad: A deep generative network for computer-aided design models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6772–6782, 2021.
  • [48] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In CVPR, pages 1912–1920, 2015.
  • [49] Xianghao Xu, Wenzhe Peng, Chin-Yi Cheng, Karl DD Willis, and Daniel Ritchie. Inferring cad modeling sequences using zone graphs. In CVPR, pages 6062–6070, 2021.
  • [50] Xiang Xu, Karl DD Willis, Joseph G Lambourne, Chin-Yi Cheng, Pradeep Kumar Jayaraman, and Yasutaka Furukawa. Skexgen: Autoregressive generation of cad construction sequences with disentangled codebooks. arXiv preprint arXiv:2207.04632, 2022.
  • [51] Kaizhi Yang and Xuejin Chen. Unsupervised learning for cuboid shape abstraction via joint segmentation from point clouds. ACM Transactions on Graphics (TOG), 40(4):1–11, 2021.
  • [52] Chun-Han Yao, Wei-Chih Hung, Varun Jampani, and Ming-Hsuan Yang. Discovering 3d parts from image collections. In ICCV, pages 12981–12990, 2021.
  • [53] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. Advances in Neural Information Processing Systems, 34:4805–4815, 2021.
  • [54] Fenggen Yu, Zhiqin Chen, Manyi Li, Aditya Sanghi, Hooman Shayani, Ali Mahdavi-Amiri, and Hao Zhang. Capri-net: Learning compact cad shapes with adaptive primitive assembly. In CVPR, pages 11768–11778, 2022.
  • [55] Chuhang Zou, Ersin Yumer, Jimei Yang, Duygu Ceylan, and Derek Hoiem. 3d-prnn: Generating shape primitives with recurrent neural networks. In ICCV, pages 900–909, 2017.