
Discovering Design Concepts for CAD Sketches

Yuezhi Yang
The University of Hong Kong
Microsoft Research Asia
[email protected]
Hao Pan
Microsoft Research Asia
[email protected]
Work done during internship at Microsoft Research Asia.
Abstract

Sketch design concepts are recurring patterns found in parametric CAD sketches. Though rarely explicitly formalized by CAD designers, these concepts are implicitly used in design for modularity and regularity. In this paper, we propose a learning-based approach that discovers the modular concepts by induction over raw sketches. We propose the dual implicit-explicit representation of concept structures, which allows implicit detection and explicit generation, and the separation of structure generation from parameter instantiation for parameterized concept generation, to learn modular concepts by end-to-end training. We demonstrate design concept learning on a large-scale CAD sketch dataset and show its applications to design intent interpretation and auto-completion.

1 Introduction

Parametric CAD modeling is a standard paradigm for mechanical CAD design nowadays. In parametric modeling, CAD sketches are fundamental 2D shapes used for various 3D construction operations. As shown in Fig. 1, a CAD sketch is made of primitive geometric elements (e.g. lines, arcs, points) which are constrained by different relationships (e.g. coincident, parallel, tangent); the sketch graph of primitive elements and constraints captures design intents, and allows adaptation and reuse of designed parts by changing parameters and updating all related elements automatically [1]. Designers are therefore tasked with the meticulous design of such sketch graphs, so that the inherent high-level design intents are easy to interpret and disentangle. To this end, meta-structures (Fig. 1), which we call sketch concepts in this paper, capture repetitive design patterns and regulate the design process with more efficient intent construction and communication [9, 12]. Concretely, each sketch concept is a structure that encapsulates specific primitive elements and their compositional constraints, and the interactions of its internal elements with outside only go through the interface of the concept.

How to discover these modular concepts automatically from raw sketch graphs? In this paper, we cast this task as a program library induction problem by formulating a domain specific language (DSL) for sketch generation, where a sketch graph is formalized as a program, and sketch concepts are modular functions that abstract primitive elements and compose the program (Fig. 1). Discovering sketch concepts thus becomes the induction of library functions from sketch programs. While previous works address the general library induction problem via expensive combinatorial search [20, 5, 7], we present a simple end-to-end deep learning solution for sketch concepts. Specifically, we bridge the implicit and explicit representations of sketch concepts, and separate concept structure generation from parameter instantiation, so that a powerful deep network can detect and generate sketch concepts, by training with the inductive objective of reconstructing sketch with modular concepts.

We conduct experiments on a large-scale sketch dataset [17]. The learned sketch concepts provide modular interpretations of design sketches. The network can also be trained on incomplete input sketches and learns to auto-complete them. Comparisons with state-of-the-art approaches that solve sketch graph generation through autoregressive models show that the modular sketch concepts learned by our approach enable more accurate and interpretable completion results.

To summarize, we make the following contributions in this paper:

  • We formulate the task of discovering modular CAD sketch concepts as program library induction for a declarative DSL modeling sketch graphs.

  • We propose a self-supervised deep learning framework that discovers modular libraries for the DSL with simple end-to-end training.

  • We show that the framework learns, from large-scale datasets, sketch concepts that capture intuitive and reusable components, enabling structured sketch interpretation and auto-completion.

[Figure 1(a): the raw sketch and its restructured version, with concept instances labeled $S.t_0{:}\mathbf{T}_0^1$, $S.t_1{:}\mathbf{T}_0^1$, $S.t_2{:}\mathbf{T}_1^1$.]

(b) Learned program that restructures the sketch:

$$\begin{aligned}
\mathbf{T}_{0}^{1}\rightarrow{}&\lambda(\alpha_{0}^{o},\alpha_{1}^{o}).\{t_{0}{:}\textrm{Arc},\ t_{1}{:}\textrm{Tang.},\ t_{2},t_{3}{:}\textrm{Coinc.},\ R{=}\{t_{1}(t_{0},\alpha_{0}^{o}),t_{2}(t_{0},\alpha_{0}^{o}),t_{3}(t_{0},\alpha_{1}^{o})\}\}\\
\mathbf{T}_{1}^{1}\rightarrow{}&\lambda(\alpha_{0}^{i},\alpha_{1}^{i}).\{t_{0},t_{1},t_{2},t_{3}{:}\textrm{Line},\ t_{4},t_{5},t_{6},t_{7}{:}\textrm{Coinc.},\ t_{8}{:}\textrm{Perpend.},\ t_{9}{:}\textrm{Parallel},\ t_{10}{:}\textrm{Distance},\ t_{11}{:}\textrm{Horizon},\\
&R{=}\{t_{4}(t_{0},t_{3}),t_{5}(t_{0},t_{2}),t_{6}(t_{1},t_{2}),t_{7}(t_{1},t_{3}),t_{8}(t_{1},t_{2}),t_{9}(t_{2},t_{3}),t_{10}(t_{2},t_{3}),t_{11}(t_{3}),\alpha_{0}^{i}(t_{1}),\alpha_{1}^{i}(t_{0})\}\}\\
S\rightarrow{}&\{{t_{0}},{t_{1}}{:}\mathbf{T}_{0}^{1},\ {t_{2}}{:}\mathbf{T}_{1}^{1},\ R{=}\{t_{0}(t_{2}.\alpha_{1}^{i},t_{2}.\alpha_{0}^{i}),t_{1}(t_{2}.\alpha_{0}^{i},t_{2}.\alpha_{1}^{i})\}\}
\end{aligned}$$
Figure 1: Concept learning from sketch graphs. (a) In black are the raw sketch and its constraint graph, with nodes showing primitives and edges depicting constraints. Colored are the restructured sketch and its modular constraint graph, where each module box represents a concept; primitives and constraint edges are colored according to the modular concepts. (b) The restructured sketch graph in our DSL program representation (List 1), where the whole sketch $S$ is compactly constructed with three instances of two learned $\mathbb{L}^1$ types. We simplify notation super/sub-scripts for readability.

2 Related work

Concept discovery for CAD sketch It is well acknowledged in the CAD design community that design intents are inherent to, and implicitly encoded by, the combinations of geometric primitives and constraints [12, 10]. However, there is generally no easy way to discover the intents and make them explicit, except through the manual design of meta-templates guided by expert knowledge [9, 10]. We propose an automatic approach to discover such intents, by formulating the intents as modular structures with self-contained references, and learning them through self-supervised inductive training with simple objectives on large raw sketch datasets. Therefore, we provide an automatic way of discovering combinatorially complex structures through end-to-end neural network learning.

Generative models for CAD sketch A series of recent works [6, 24, 13, 18, 25] use autoregressive models [22] to generate CAD sketches and constraints, with constraint references modeled through pointer networks [23]. These works focus on learning from large datasets [17] to generate plausible layouts of geometric primitives and their constraints, which can then be fine-tuned with a constraint solver for more regular sketches. Different from these works, our aim is to discover modular structures (i.e. sketch concepts) from concrete sketches. Therefore, our framework provides higher-level interpretation of raw sketches and more transparent auto-completion than these works (cf. Sec. 6).

Program library induction for CAD modeling Program library induction has been studied in the shape modeling domain [7]. General program synthesis assisted by deep learning is a research topic of increasing popularity [20, 3, 4, 19, 5]. The library induction task specifically involves combinatorial search, which has been handled by neural-guided search [20, 5] or by pure stochastic sampling [7]. We instead present an end-to-end learning algorithm for sketch concept induction. In particular, based on key observations about sketch concepts, we present implicit-explicit dual representations of concept library functions, and separate concept structure generation from parameter instantiation, to enable self-supervised training with induction objectives.

3 CAD sketch concept formulation

To capture the notion of sketch concepts precisely, we formulate a domain specific language (DSL) (syntax given in List 1; an exhaustive list of data types is given in the supplementary). In the DSL, we first define the basic data types, including length, angle, coordinate, and the reference type, where a reference binds to another reference or a primitive for modeling the constraint relationships. Second, we define the $\mathbb{L}^0$ collection of primitive and constraint types as given in raw sketches. In particular, we regard the constraints as functions whose arguments are the references to bind with primitives, e.g. a coincident constraint $c=\lambda(r_1,r_2:\textrm{Ref}).\{\}$, where a function is represented in the lambda calculus style (one may refer to [14] for introductory lambda calculus formality). Some constraints have parameters other than mere references, which are treated as variables inside, e.g. parallel distance in List 1 (while other works [13, 18] have skipped such constraints, we preserve them but omit generating the parameter values that can be reliably deduced from primitives; see more discussion in the supplementary). Third, we define the sketch concepts as $\mathbb{L}^1$ types composed of $\mathbb{L}^0$ types. To be specific, a composite type $\mathbf{T}_i^1\in\mathbb{L}^1$ is a function with arguments $[\alpha_k]$ and members $t^0_{i,j}:\mathbf{T}_j^0\in\mathbb{L}^0$, which are connected through a composition operator $R_{\mathbf{T}^1_i}=\{p(q)\,|\,p,q\in[t^0_{i,j}]\cup[\alpha_k]\}$ that specifies how each pair of primitive elements binds together. For example, a coincident constraint $p=\lambda(r_1,r_2).\{\}$ may take a line primitive $q$ as its first argument and bind to an argument $\alpha_k$ of the composite type as its second argument, i.e. $p(q,\alpha_k)\in R_{\mathbf{T}^1_i}$; on the other hand, an argument $\alpha_k$ may bind to a primitive $q$, which is specified by $\alpha_k(q)\in R_{\mathbf{T}^1_i}$. Finally, an input sketch $S$ is restructured as a collection of composite type instances $t^1_i:\mathbf{T}_i^1\in\mathbb{L}^1$, together with their connections specified by a corresponding composition operator $R_S$. $R_S$ records how different concepts bind through their arguments, which further transfers to the $\mathbb{L}^0$-typed elements inside the concepts and translates into the raw constraint relationships of the sketch graph. Fig. 1(b) shows an example DSL program encoding sketches and concepts.

// Basic data types
Length, Angle, Coord, Ref
// $\mathbb{L}^0$ primitive types
Line $\rightarrow$ $c_{start\_x}, c_{start\_y}, c_{end\_x}, c_{end\_y}$ : Coord
Circle $\rightarrow$ $c_{center\_x}, c_{center\_y}$ : Coord, $l_{radius}$ : Length
$\cdots$
// $\mathbb{L}^0$ constraint types
Coincident $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{\}$
Parallel Distance $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{l_{dist} : \textrm{Length}\}$
$\cdots$
// $\mathbb{L}^1$ composite types
$\mathbf{T}^1_i$ $\rightarrow$ $\lambda([\alpha_k:\textrm{Ref}]).\{t^0_{i,j} : \mathbf{T}_j^0\in\mathbb{L}^0,\ R_{\mathbf{T}_i^1}([t^0_{i,j}]\cup[\alpha_k])\}$
// Sketch decomposition
$S\rightarrow\{t^1_i : \mathbf{T}_i^1\in\mathbb{L}^1,\ R_S([t^1_i])\}$
List 1: A domain-specific language formulating CAD sketch concepts
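To make the DSL concrete, the following is a minimal Python rendering of its types; this is our own illustrative sketch with hypothetical names, not the paper's implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# An L0 instance: a type tag plus scalar parameters; constraints also
# carry reference slots to be bound by a composition operator.
@dataclass
class L0Instance:
    type_name: str                      # e.g. "Line", "Coincident"
    params: Dict[str, float] = field(default_factory=dict)
    n_refs: int = 0                     # constraint arity (0 for primitives)

# An L1 concept: a fixed structure of L0 members plus inward/outward
# arguments; R maps each (member index, reference slot) to a target,
# here named by a string such as "t3" or "alpha0_out".
@dataclass
class L1Concept:
    members: List[L0Instance]
    n_in_args: int
    n_out_args: int
    R: Dict[Tuple[int, int], str] = field(default_factory=dict)

# A sketch: concept instances plus a cross-concept composition R_S that
# passes an outward argument of one instance to an inward argument of another.
@dataclass
class Sketch:
    instances: List[L1Concept]
    R_S: Dict[Tuple[int, int], Tuple[int, int]] = field(default_factory=dict)
```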

Given the explicit formulation of CAD sketches through a DSL, the discovery of sketch concepts becomes the task of learning program libraries $\mathbb{L}^1$ by induction on many sketch samples. Therefore, our task resembles shape program synthesis that aims at building modular programs for generating shapes [5, 7], and differs from works that use autoregressive language models to generate CAD sketch programs one token at a time [6, 13, 18]. In Sec. 6.2, we show that the structured learning of CAD sketches enables more robust auto-completion than unstructured language modeling.

The search for structured concepts is clearly a combinatorial problem with exponential complexity, which is intractable unless we can exploit the inherent patterns in large-scale sketch datasets. However, to enable deep-learning-based detection and search of structured concepts, we need to bridge implicit deep representations and explicit, interpretable structures, which we build on the following two key observations:

  • A concept has dual representations: implicit and explicit. The implicit representation as embeddings in latent spaces is compatible with deep learning, while the explicit representation provides structures on which desired properties (e.g. modularity) can be imposed.

  • A concept is a parameterized structure. A concept is a composite type with fixed modular structure for interpretability, but the structure is always instantiated by assigning parameters to its component primitives when the concept is found in a sketch.

3.1 Method overview

According to the two observations, we design an end-to-end sketch concept learning framework by self-supervised induction on sketch graphs. As shown in Fig. 2, the framework has two main steps before loss computation: a detection step that generates implicit representations of concepts making up the input sketch, and an explicit generation step that expands the implicit concepts into concrete structures on which self-supervision targets like reconstruction and modularity are applied.

Building on a state-of-the-art detection architecture [2], the detection module $D$ takes a sketch $S$ as input and detects the modular concepts within it, i.e. $\{\mathbf{q}_i\}=D(S,\{\overline{\mathbf{q}}_i\})$, where the concepts are represented implicitly as latent codes $\{\mathbf{q}_i\}$, and $\{\overline{\mathbf{q}}_i\}$ is a learnable set of concept instance queries. Notably, we apply vector quantization to the latent codes and obtain $\{\mathbf{q}'_i=\arg\min_{\mathbf{p}\in\mathbf{L}^1}\|\mathbf{p}-\mathbf{q}_i\|_2\}$, which ensures that each concept is selected from the common collection of learnable concepts $\mathbf{L}^1$ used for restructuring all sketches.
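A minimal PyTorch sketch of this quantization step, assuming a learnable codebook tensor `lib` for $\mathbf{L}^1$ (names are ours, not the released code):

```python
import torch

def quantize(q: torch.Tensor, lib: torch.Tensor) -> torch.Tensor:
    """Replace each concept code q_i with its nearest library prototype.

    q:   (k_qry, d) detected concept codes
    lib: (n_codes, d) learnable concept library L^1
    """
    d2 = torch.cdist(q, lib)          # (k_qry, n_codes) pairwise L2 distances
    idx = d2.argmin(dim=1)            # nearest prototype per code
    q_quant = lib[idx]                # (k_qry, d) selected prototypes
    # Straight-through estimator: forward uses q_quant, gradients flow to q.
    return q + (q_quant - q).detach()
```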

The explicit generation module is separated into two sub-steps, structure generation and parameter instantiation, which ensures that the modular concept structures are explicit and reused throughout different sketch instances. Specifically, the structure network takes each quantized concept code $\mathbf{q}'_i$ and generates its explicit form $\mathbf{T}_i^1$ in terms of primitives and constraints of $\mathbb{L}^0$ types, along with the composition operator $R_{\mathbf{T}_i^1}$. Subsequently, the parameter network instantiates the concept structure by assigning parameter values to each component of $\mathbf{T}_i^1$, conditioned on $\mathbf{q}_i$ and the input sketch, to obtain $t^1_i$.

The composition operator $R_S$ for combining $\{t^1_i\}$ is generated from a special latent code $\mathbf{q}_R$, transformed by $D$ from a learnable token $\overline{\mathbf{q}}_R$ appended to $\{\overline{\mathbf{q}}_i\}$.

The entire model is trained end-to-end with reconstruction and modularity objectives. In particular, we design loss functions that measure differences between the generated and ground-truth sketch graphs, in terms of both per-element attributes and pairwise references. Given our explicit modeling of the encapsulated structures of the learned concepts, we can further enhance the modularity of the generation by introducing a bias loss that encourages in-concept references.

[Figure 2 diagram: encoder-decoder detection module mapping the input sketch $S{=}[t^0_i]$ to $[\mathbf{q}_i]$ and $\mathbf{q}_R$; quantization against the concept library $\mathbf{L}^1$; structure and parameter networks producing $\mathbf{T}^1$ and $t^1$; assembly of $[t^1_i]$ by $R_S$ and loss computation against the target.]
Figure 2: Framework illustration. (a) The detection module is a transformer network that detects, from the sketch sequence $[t^0_i]$, the implicitly encoded concepts $[\mathbf{q}_i]$ and their composition $\mathbf{q}_R$. (b) Each $\mathbf{q}$ is quantized against the concept library $\mathbf{L}^1$ to obtain a prototype $\mathbf{q}'$, which is expanded by the structure network into an explicit structure $\mathbf{T}^1$ and further instantiated by the parameter network into $t^1$. (c) The collection $[t^1_i]$ is assembled by the composition operator $R_S$ generated from $\mathbf{q}_R$ to obtain the final generated sketch graph, which is compared with the input sketch for loss computation.

4 End-to-end sketch concept induction

4.1 Implicit concept detection

Sketch encoding A raw sketch $S$ can be serialized into a sequence of $\mathbb{L}^0$ primitives and constraints. Previous works have adopted slightly different schemes to encode the sequence [6, 13, 18, 24, 25]. In this paper, we build on these works and take a simple strategy akin to [13, 25] for input sketch encoding. Specifically, we split each $\mathbb{L}^0$-typed instance $t^0$ into several tokens: type, parameter, and a list of references. For each token category, we use a specific embedding module. For example, parameters as scalars are quantized into finite bins before being embedded as vectors (see supplementary for quantization details), and since there are at most five parameters for each primitive, we pack all parameter embeddings into a single code. On the other hand, each constraint reference, as a primitive index, is directly embedded as a code. Therefore, each token of an $\mathbb{L}^0$-typed instance is encoded as

$$\mathbf{e}_{t^0.x}=\textrm{enc}_{type}(t^0)+\textrm{enc}_{pos}(t^0.x)+\left[\textrm{enc}_{param}(t^0.x)\,|\,\textrm{enc}_{ref}(t^0.x)\right],\qquad(1)$$

where $t^0.x$ iterates over the split tokens (i.e., type, parameters, references), the type embedding is shared by all tokens of the instance, the position embedding counts the token index in the whole split-tokenized sequence of $S$, and the parameter or reference embeddings are applied where applicable.
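As an illustration of Eq. (1), a simplified token embedding module might look as follows (hypothetical sizes; the actual implementation packs all parameter embeddings of an instance into one code, as described above):

```python
import torch.nn as nn

class TokenEmbed(nn.Module):
    """Embed one split token of an L0 instance following Eq. (1)."""
    def __init__(self, n_types=20, n_pos=512, n_bins=80, d=256):
        super().__init__()
        self.enc_type = nn.Embedding(n_types, d)   # shared over the instance's tokens
        self.enc_pos = nn.Embedding(n_pos, d)      # token index in the whole sequence
        self.enc_param = nn.Embedding(n_bins, d)   # quantized scalar parameter
        self.enc_ref = nn.Embedding(n_pos, d)      # referenced primitive index

    def forward(self, type_id, pos, param_bin=None, ref_idx=None):
        e = self.enc_type(type_id) + self.enc_pos(pos)
        if param_bin is not None:                  # parameter token
            e = e + self.enc_param(param_bin)
        elif ref_idx is not None:                  # reference token
            e = e + self.enc_ref(ref_idx)
        return e
```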

Concept detection We build the detection network as an encoder-decoder transformer following [2]. The transformer encoder operates on the encoded sketch sequence $[\mathbf{e}_{t_i^0\in S}]$ and produces the contextualized sequence $[\mathbf{e}'_{t_i^0\in S}]$ through layers of self-attention and feed-forward. The transformer decoder takes a learnable set of concept queries $[\overline{\mathbf{q}}_i]$ of size $k_{qry}$, plus a special query $\overline{\mathbf{q}}_R$ for composition generation, and applies interleaved self-attention, cross-attention to $[\mathbf{e}'_{t_i^0}]$, and feed-forward layers to obtain the implicit concept codes $[\mathbf{q}_i]$ and $\mathbf{q}_R$. The concept codes are further quantized into $[\mathbf{q}'_i]$ by selecting concept prototypes from a library $\mathbf{L}^1$ implicitly encoding $\mathbb{L}^1$, before being expanded into explicit forms.

4.2 Explicit concept structure generation

Concept structure expansion Given a library code $\mathbf{q}'\in\mathbf{L}^1$ representing a type $\mathbf{T}^1\in\mathbb{L}^1$, through an MLP we expand its explicit structure as a collection of codes $[\mathbf{t}_i^0]$ representing the $\mathbb{L}^0$ type instances $[t_i^0]$, and a matrix representing the composition $R_{\mathbf{T}^1}$ of $[t_i^0]$ and arguments (cf. List 1).

[Inset: argument passing between two concepts A and B via $R_S$; legend: primitive, constraint, inward argument, outward argument.]

We fix the maximum number of $\mathbb{L}^0$ type instances to $k_{L^0}$ (12 by default), and split the arguments into two groups, inward arguments and outward arguments, each of maximum number $k_{arg}$ (2 by default). Each type code $\mathbf{t}_i^0$ is decoded into discrete probabilities over $\mathbb{L}^0$, with an additional probability for the null type $\phi$ to indicate the emptiness of this element (cf. Sec. 5.1), by $\textrm{dec}_{type}(\cdot)$ as the inverse of $\textrm{enc}_{type}(\cdot)$ in Sec. 4.1. An inward argument only points to a primitive inside the concept structure and originates from a constraint outside; conversely, an outward argument only points to primitives outside and originates from a constraint inside the concept (see inset for illustration); the split into two groups eases composition computation, as discussed below.

The composition operator $R_{\mathbf{T}^1}$ is implemented as an assignment matrix $\mathbf{R}_{\mathbf{T}^1}$ of shape $(2k_{L^0}{+}k_{arg})\times(k_{L^0}{+}k_{arg})$, where each row corresponds to a constraint reference or an inward argument, and each column to a primitive or an outward argument. The two-fold coefficient of constraint references comes from the fact that any constraint we consider in the dataset [17] has at most two arguments. Each row is a discrete probability distribution such that $\sum_j \mathbf{R}_{\mathbf{T}^1}[i,j]=1$, with the maximum entry signifying that the $i$-th constraint reference/inward argument refers to the $j$-th primitive/outward argument. We compute $\mathbf{R}_{\mathbf{T}^1}$ by first mapping the concept code $\mathbf{q}'$ to a matrix of logits in the shape of $\mathbf{R}_{\mathbf{T}^1}$, and then applying a softmax transform to each row. Notably, we avoid the meaningless loops of an element referring back to itself, and of inward arguments referring to outward arguments, by masking the diagonal blocks $\mathbf{R}_{\mathbf{T}^1}[2i{:}2i{+}2,\,i], i\in[k_{L^0}]$ and the argument block $\mathbf{R}_{\mathbf{T}^1}[2k_{L^0}{:},\,k_{L^0}{:}]$, setting their logits to $-\infty$.
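The following sketch shows how such a row-stochastic, masked assignment matrix could be produced from the concept code; this is our own minimal version, where `mlp` is assumed to map the 256-dim code to the flattened logits:

```python
import torch

def composition_matrix(q, mlp, k_l0=12, k_arg=2):
    """Map a concept code to the row-stochastic assignment matrix R_T1.

    Rows: 2*k_l0 constraint reference slots + k_arg inward arguments.
    Cols: k_l0 primitives + k_arg outward arguments.
    """
    n_rows, n_cols = 2 * k_l0 + k_arg, k_l0 + k_arg
    mask = torch.zeros(n_rows, n_cols, dtype=torch.bool)
    for i in range(k_l0):
        mask[2 * i:2 * i + 2, i] = True    # element i cannot refer to itself
    mask[2 * k_l0:, k_l0:] = True          # inward args cannot refer to outward args
    logits = mlp(q).view(n_rows, n_cols).masked_fill(mask, float("-inf"))
    return torch.softmax(logits, dim=-1)   # each row sums to 1
```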

Cross-concept composition Aside from references inside a concept, references across concepts are generated to complete the whole sketch graph. We achieve cross-concept references by argument passing (see the inset above for illustration). In particular, we implement the cross-concept composition operator $R_S$ as an assignment matrix $\mathbf{R}_S$ of shape $(k_{qry}{\cdot}k_{arg})\times(k_{qry}{\cdot}k_{arg})$, directly mapped from $\mathbf{q}_R$ through an MLP. Similar to the in-concept composition matrix, each row of the cross-concept matrix is a discrete distribution such that $\sum_j \mathbf{R}_S[i,j]=1$, with the maximum entry signifying that the $(i\bmod k_{arg})$-th outward argument of the $\lfloor i/k_{arg}\rfloor$-th concept instance refers to the $(j\bmod k_{arg})$-th inward argument of the $\lfloor j/k_{arg}\rfloor$-th concept instance.

The complete cross-concept reference is therefore the product of three transport matrices:

$$\mathbf{R}_{cref}[t_i^1,t_j^1]=\mathbf{R}_{t^1_i}[{:}2k_{L^0},\,k_{L^0}{:}]\times\mathbf{R}_S[i{\cdot}k_{arg}{:}(i{+}1){\cdot}k_{arg},\,j{\cdot}k_{arg}{:}(j{+}1){\cdot}k_{arg}]\times\mathbf{R}_{t^1_j}[2k_{L^0}{:},\,{:}k_{L^0}],\qquad(2)$$

where $\mathbf{R}_{cref}[t_i^1,t_j^1]$ is a block assignment matrix of shape $2k_{L^0}\times k_{L^0}$. Intuitively, $\mathbf{R}_{cref}[t_i^1,t_j^1]$ specifies how constraints inside $t_i^1$ refer to primitives of $t_j^1$ through all possible paths crossing the arguments of the two concepts.
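Eq. (2) is literally a chain of three matrix products over sliced blocks; in code (shapes as in the text; a sketch under our naming, not the released implementation):

```python
import torch

def cross_refs(R_ti, R_tj, R_S, i, j, k_l0=12, k_arg=2):
    """Cross-concept references from constraints in t_i to primitives in t_j.

    R_ti, R_tj: (2*k_l0 + k_arg, k_l0 + k_arg) in-concept assignment matrices
    R_S:        (k_qry*k_arg, k_qry*k_arg) cross-concept argument passing
    """
    refs_to_out = R_ti[:2 * k_l0, k_l0:]                  # (2*k_l0, k_arg)
    arg_pass = R_S[i * k_arg:(i + 1) * k_arg,
                   j * k_arg:(j + 1) * k_arg]             # (k_arg, k_arg)
    in_to_prim = R_tj[2 * k_l0:, :k_l0]                   # (k_arg, k_l0)
    return refs_to_out @ arg_pass @ in_to_prim            # (2*k_l0, k_l0)
```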

Collectively, we denote the complete reference matrix for all pairs of generated $\mathbb{L}^0$ elements as $\mathbf{R}$, of shape $(2k_{qry}{\cdot}k_{L^0})\times(k_{qry}{\cdot}k_{L^0})$, which includes both in-concept and cross-concept references.

4.3 Concept instantiation by parameter generation

Instantiating a concept structure requires assigning parameters to the components where applicable. Therefore, as shown in Fig. 2, the parameter generation network takes a concept structure $\mathbf{T}^1$ and its implicit instance encoding $\mathbf{q}$ as input, and produces the specific parameters for each $\mathbb{L}^0$-typed instance inside the concept. In addition, as the parameters of generated instances are directly related to the input parameters of the raw sketch $S$, we find that allowing the parameter network to attend to the input tokens improves convergence and accuracy.

We implement the parameter network as a transformer decoder in a similar way to [2]. The instance code $\mathbf{q}$ is first expanded to $k_{L^0}$ tokens by a small MLP, which are summed with $[\mathbf{t}_i^0]$ token-wise to obtain the query codes. The parameter network then transforms the query codes through interleaved layers of self-attention, cross-attention to the contextualized input sequence $[\mathbf{e}'_{t^0}]$, and feed-forward, before they are finally mapped to explicit parameters in the form of probabilities over quantized bins, through a decoding layer $\textrm{dec}_{param}(\cdot)$ that is the inverse of $\textrm{enc}_{param}(\cdot)$ in Sec. 4.1.

5 Induction objectives

Without any given labels of concepts, we use the following objectives to supervise the inductive network training: sketch reconstruction, concept quantization, and modularity enhancement.

5.1 Reconstruction loss

As discussed in Sec. 4, an input sketch is restructured into a set of sketch concepts which are expanded into a graph of primitives and constraints; the generated sketch graph $\widetilde{S}$ is compared with the input sketch $S$ for reconstruction supervision.

The comparison of generated and target graphs requires a one-to-one correspondence between elements of the two graphs, on which the graph differences can be measured. However, it is nontrivial to find such a matching, because not only are there variable numbers of elements in the two graphs, but also both elements and references between elements must be taken into account for matching. To this end, we build a cost matrix that measures the difference for each pair of generated and target elements, in terms of their attributes and references, and apply linear assignment matching on the cost matrix [11, 8] to establish the optimal correspondence between two graphs.

Cost matrix construction To compare each pair of generated element and target element, we measure their type differences, and further use type casting to interpret the generated element as the target type, so that their parameters can be compared. To account for reference differences between the two elements, we compare the reference arrows by the differences of their pointed primitives.

[Inset: binary cost between a target constraint $p$ with reference $p.r$ and a generated element $q$ with reference probabilities $q.r\sim\mathbf{R}[2q{+}r,:]$; in the cost matrix, green marks primitives and orange marks constraints.]

Specifically, given the target graph $S$ of $k_{tgt}$ elements and the generated graph $\widetilde{S}$ of $k_{qry}{\cdot}k_{L^0}$ elements, we build the cost matrix $\mathbf{C}$ of shape $k_{tgt}\times(k_{qry}{\cdot}k_{L^0})$ in two steps. First, for a pair of target element $p$ and generated element $q$, we compare their type and parameter differences by cross-entropy. We denote the cost matrix at this stage as $\mathbf{C}_{ury}$, as it accounts for the element-wise unary distances between the two graphs. Second, to measure the binary distances of references, for each target constraint element $p$ and its $r$-th referenced primitive $p.r$, its distance from the generated references of element $q$ is computed as (also illustrated by the inset figure):

$$\mathbf{C}_{bry}[p,q]=\sum_{r\in\{0,1\}}\sum_{j\in\widetilde{S}}\mathbf{R}[2q+r,\,j]\times\mathbf{C}_{ury}[p.r,\,j],\qquad(3)$$

where $\mathbf{R}[2q+r,j]$ is the probability of $q$ taking $j$ as its $r$-th reference, as predicted by the composition operation (Sec. 4.2). Intuitively, the binary cost is a summation of unary costs weighted by predicted reference probabilities, where the unary costs measure how different a generated pointed primitive is from the target pointed primitive. The complete cost matrix is $\mathbf{C}=w_{ury}\mathbf{C}_{ury}+w_{bry}\mathbf{C}_{bry}$, with $w_{ury}=50, w_{bry}=1$; we give a larger weight to the unary costs because meaningful binary costs depend on reliable unary costs in the first place, as is evident in Eq. (3).

Matching and reconstruction loss Given the cost matrix $\mathbf{C}$, we apply linear assignment to obtain a matching $\sigma:\widetilde{S}\rightarrow S\cup\{\phi\}$ between $\widetilde{S}$ and $S$. Note that the numbers of elements of the two graphs can differ, but we have chosen $k_{qry}, k_{L^0}$ such that the generated elements always cover the target elements. Therefore, $\sigma(q)=p$ assigns a matched generation $q\in M\subset\widetilde{S}$ to a target $p\in S$, and assigns the remaining unmatched generations $M'=\widetilde{S}\setminus M$ to the empty target $\phi$, i.e. $\sigma(M')=\phi$. The loss terms for matched generations are simply the corresponding cost terms $\mathbf{C}[\sigma(q),q], q\in M$; for unmatched generations, we supervise their types to be the empty type $\phi$ and neglect their parameters and references. We denote the average loss over all generated terms as $\mathcal{L}_{recon}$.
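A sketch of the cost assembly and matching, combining Eq. (3) with the linear assignment step; we use SciPy's Hungarian solver, and `tgt_refs` is our hypothetical per-element list of target reference indices:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_graphs(C_ury, R, tgt_refs, w_ury=50.0, w_bry=1.0):
    """Optimal matching of generated elements to target elements.

    C_ury:    (k_tgt, n_gen) unary type/parameter costs
    R:        (2*n_gen, n_gen) predicted reference probabilities
    tgt_refs: tgt_refs[p] lists the target indices p.r referenced by p
              (empty for primitives)
    """
    k_tgt, n_gen = C_ury.shape
    C_bry = np.zeros_like(C_ury)
    for p in range(k_tgt):
        for r, pr in enumerate(tgt_refs[p]):
            # Eq. (3): unary costs of pointed primitives, weighted by the
            # probability that generation q's r-th reference points there.
            C_bry[p] += R[2 * np.arange(n_gen) + r] @ C_ury[pr]
    C = w_ury * C_ury + w_bry * C_bry
    tgt_idx, gen_idx = linear_sum_assignment(C)   # Hungarian matching
    return tgt_idx, gen_idx
```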

Besides the matching cost, we also use an additional reference loss to encourage the generated references to be sharp (i.e., $\mathbf{R}$ being close to binary). This loss complements the binary costs mentioned above by making sure that, even if the generated primitives are similar, a generated constraint refers to only one primitive sharply.

We define the sharp reference loss as

$$\mathcal{L}_{sharp}=-\frac{1}{|S_c|}\sum_{p\in S_c,\,r}\log\mathbf{R}[2\sigma^{-1}(p)+r,\,\sigma^{-1}(p.r)],\qquad(4)$$

where $p$ iterates over the target constraints $S_c$, $\sigma^{-1}(\cdot)$ is the inverse mapping from target elements to generations, and we skip a term if $p.r$ does not exist, as for constraints with a single reference.
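In the same notation, Eq. (4) can be written as follows (a sketch; `sigma_inv` maps a target index to its matched generation, and `tgt_refs` is the same hypothetical reference list as above):

```python
import torch

def sharp_loss(R, sigma_inv, tgt_refs, eps=1e-9):
    """Eq. (4): encourage near-binary reference probabilities.

    R:         (2*n_gen, n_gen) reference probabilities
    sigma_inv: dict mapping a target index to its matched generation index
    tgt_refs:  dict mapping each target constraint p to its references p.r
    """
    terms = []
    for p, refs in tgt_refs.items():
        q = sigma_inv[p]
        for r, pr in enumerate(refs):
            terms.append(-torch.log(R[2 * q + r, sigma_inv[pr]] + eps))
    return torch.stack(terms).mean()
```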

5.2 Concept quantization loss

Following [21, 16], we optimize the concept code quantization against the library $\mathbf{L}^1$ with:

$$\mathcal{L}_{vq}=\frac{1}{k_{qry}}\sum_{i\in[k_{qry}]}\|\textrm{sg}(\mathbf{q}_i)-\mathbf{q}'_i\|+\beta\,\|\mathbf{q}_i-\textrm{sg}(\mathbf{q}'_i)\|,\qquad(5)$$

where $\textrm{sg}(\cdot)$ is the stop-gradient operation. For training stability, we follow [16] and replace the first term with EMA updates of $\mathbf{q}'\in\mathbf{L}^1$. Furthermore, we improve code usage by periodically reviving unused codes in $\mathbf{L}^1$ [16] (please refer to the supplementary for details).
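Before the EMA replacement, the quantization loss of Eq. (5) amounts to the following (a minimal sketch):

```python
import torch

def vq_loss(q, q_quant, beta=1.0):
    """Eq. (5): codebook and commitment terms.

    q:       (k_qry, d) detected concept codes
    q_quant: (k_qry, d) selected library prototypes
    """
    codebook = (q.detach() - q_quant).norm(dim=-1).mean()  # moves prototypes
    commit = (q - q_quant.detach()).norm(dim=-1).mean()    # moves the encoder
    return codebook + beta * commit
```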

5.3 Modularity enhancement loss

We look for modular $\mathbb{L}^1$ concepts that have rich and meaningful encapsulated structures, rather than arbitrary groups of $\mathbb{L}^0$ elements that rely on cross-group references to recover the graph structures. This modularity can be enhanced by limiting the use of arguments in sketch concepts. Instead of allocating very few arguments as a hard constraint, we introduce a soft bias loss to encourage the restrictive use of arguments, which can still cover cases where more arguments are needed for accurate reconstruction. Specifically, we penalize the accumulated probability of elements pointing to outward arguments:

$$\mathcal{L}_{bias}=\frac{1}{|S_c|}\sum_{p\in S_c,\,r}\ \sum_{i\in[k_{arg}]}\mathbf{R}_{\mathbf{T}^1\ni\sigma^{-1}(p)}[2\sigma^{-1}(p)+r,\,i+k_{L^0}],\qquad(6)$$

where again $p$ iterates over the target constraints $S_c$, $\sigma^{-1}(\cdot)$ is the inverse mapping from target elements to generations, and $i+k_{L^0}$ slices the reference probabilities to the outward arguments.

5.4 Total loss

The training objective sums up losses of reconstruction, concept quantization and modularity bias:

$$\mathcal{L}_{total}=w_{recon}\mathcal{L}_{recon}+w_{sharp}\mathcal{L}_{sharp}+w_{vq}\mathcal{L}_{vq}+w_{bias}\mathcal{L}_{bias},\qquad(7)$$

where we empirically use the weights $w_{recon}=1, w_{sharp}=20, w_{vq}=1, w_{bias}=25$ throughout all experiments, unless otherwise specified in the ablation studies.

6 Results

Dataset and implementation Following previous works [6, 13, 18], we adopt the SketchGraphs dataset [17], which contains millions of real-world CAD sketches, for training and evaluation. We filter the data by removing trivially simple sketches and duplicates, and limit the sketch complexity such that the number of primitives and constraints is within $[20, 50]$. As a result, we obtain around 1 million sketches, randomly split into 950k for training and 50k for testing. We defer network details to the supplementary, and open-source the code and data to facilitate future research (https://github.com/yyuezhi/SketchConcept).

Evaluation metrics We evaluate the generated sketches in terms of reconstruction accuracy and sketch concept modularity, which are the two major objectives of our task. We measure reconstruction accuracy by the F-scores of generated primitives and constraints, where the F-score is simply the harmonic mean of precision and recall. A generated primitive is considered a correct match with the ground-truth if its type and parameters are correct, where for the scalar parameters we allow a threshold of $10\%$ of the quantization levels. A constraint is correct if and only if its type, parameters and references match the ground-truth, i.e., the generated $q$ is correct w.r.t. target $p$ iff $q$ has the same type and parameters as $p$ and the primitives $q.r$ and $p.r$ are correctly matched. Modularity is measured by the percentage of constraints whose references lie entirely within the encapsulating concepts, among all correct constraints.
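For reference, the F-score used here reduces to a small helper (our own sketch):

```python
def f_score(n_correct, n_generated, n_target):
    """Harmonic mean of precision and recall over matched elements."""
    precision = n_correct / max(n_generated, 1)
    recall = n_correct / max(n_target, 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```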

Figure 3: Design intent parsing. Left: input raw sketches and sketches restructured with concepts. Right: raw constraint graphs and modular constraint graphs. Primitives and constraints in the restructured sketches and graphs are colored according to concepts.
Figure 4: Instances of concepts. Left: two learned rectangle concepts with subtly different structures. Right: four sketches containing instances of these two concepts.
Figure 5: Auto-completion. Each example shows the input partial sketch (black) and ground-truth completion (red), the result of the autoregressive baseline, and our result (colored by concepts).
Figure 6: Auto-completion comparison. Plotted are F-scores of (a) primitives and (b) constraints at different ratios of partial input.
Config | Primitive | Constraint | Modular (%)
$-\mathbf{C}_{bry}$ | 0.993 | 0.290 | 34.2
$-\mathcal{L}_{sharp}$ | 0.983 | 0.104 | 100
$-\mathcal{L}_{bias}$ | 0.992 | 0.743 | 13.3
Ours | 0.994 | 0.766 | 50.8
Table 1: Loss ablation. F-scores are reported for primitives and constraints.

6.1 Design intent interpretation

By training our model on raw sketches with the self-supervised induction losses, we obtain a library of sketch concepts and a model for design intent parsing that interprets a given sketch into modular concepts and their combination. Indeed, we find that the automatically discovered concepts capture natural design intents and modular structures. For example, through the restructured sketches and constraint graphs in Figs. 1 and 3, we find that our network decomposes sketches into modular structures like rectangles, line-arcs and parallel lines that align symmetrically, even though no such prior knowledge is applied during training except for concept modularity. Fig. 4 shows that a given concept can be used repetitively in different sketches, and that structures with subtle differences in constraint relations can be detected and distinguished into different concepts of the library. Note that these subtle structural differences are subsumed in the input sketch graph, which makes them more difficult to detect. We refer to the supplementary for more examples of design intent parsing and instantiations of learned concepts, as well as a quantitative analysis of the learned library.

6.2 Auto completion

Auto-completion is a critical feature of CAD modeling software for assisting designers. Given a partial sketch of primitives and their constraints, auto-completion aims at completing them with the remaining primitives and constraints to form regular and well-structured designs. Our concept detection and generation approach therefore naturally enhances the auto-completion task with better regularity. For training and evaluation, following previous work [18], we synthesize the partial input by removing a suffix of random length (up to $50\%$) from the sketch sequence, along with the constraints that refer to the removed primitives, and train the model to generate the full sketches.

State-of-the-art methods [6, 13, 18, 24] formulate auto-completion through a combination of primitive and constraint generation models, both operating in an autoregressive fashion, with the constraint model conditioned on and referring (by pointers [23]) to the generated primitives. Since these works use diverse sketch encodings and had no publicly released code at submission time, for a fair comparison we implement the autoregressive baseline with our sketch encoding (Sec. 4.1).

Fig. 6 compares our method with the autoregressive baseline under various primitive mask ratios: our method achieves higher primitive and constraint accuracy than the autoregressive baseline at almost all mask ratios. This difference confirms that since our method completes sketches concept-by-concept instead of primitive-by-primitive, more meaningful structures are likely to be generated. Our model also gains an advantage by taking primitives and constraints together as input and generating primitives and constraints simultaneously, whereas the autoregressive baseline separates generation into two steps (primitives followed by constraints). Indeed, in practice CAD designers rarely finish all primitives before supplementing the constraints, but rather apply constraints to partial primitives immediately whenever they form a design intent. Fig. 5 shows how our approach interprets partial inputs and completes them with modular concepts (see supplementary for more examples); in comparison, the autoregressive baseline does not provide such interpretable or regular completions.

6.3 Ablation study

To evaluate the impact of the different loss terms of the induction objective (Sec. 5), we train several models, each with one of these losses removed, on the auto-encoding task. The results are shown in Table 1. We see that removing the binary costs $\mathbf{C}_{bry}$ from the reconstruction loss results in a significant drop in constraint reconstruction, showing its necessity for constraint reference learning. Removing the sharp reference loss $\mathcal{L}_{sharp}$ similarly fails constraint reference learning, although the modularity enhancement bias loss forces all constraint references inside concepts. Removing the modularity enhancement bias loss $\mathcal{L}_{bias}$ results in only a slight drop in reconstruction quality but a significant drop in modularity, since without it cross-concept references through arguments are more likely and modularity therefore suffers. We provide more ablation tests on hyper-parameters like the numbers of concept queries $k_{qry}$ and arguments $k_{arg}$ in the supplementary.

7 Conclusion

CAD sketch concepts are meta-structures containing primitives and constraints that define modular sub-graphs and capture design intents. By formulating the sketch concepts as program libraries of a DSL, we present an end-to-end approach for discovering CAD sketch concepts through library induction learning. Key to our approach are the implicit-explicit representation of concepts and the separation of structure generation from parameter instantiation for concept generation, which together enable end-to-end training under self-supervised induction objectives. By training on a large-scale sketch dataset, our approach enables the discovery of repetitive and modular concepts from raw sketches, and more structured and interpretable auto-completion than baseline autoregressive models.

Limitations and future work Design intents can be hierarchical [10], meaning that higher order meta-structures can be built out of lower order ones. In this sense, our framework only addresses the first order library induction, and should be extended for higher order library learning; toward this goal, we believe a progressive approach like [5] can be used with our framework as the one-step induction. In addition, similar strategies of end-to-end induction learning can be applied to constraint graphs involving 3D CAD operations or even more general programs in other domains, as long as they have similar declarative and parametric structures as sketch graphs.

References

  • Camba et al. [2016] Jorge D Camba, Manuel Contero, and Pedro Company. Parametric CAD modeling: An analysis of strategies for design reusability. Computer-Aided Design, 74, 2016.
  • Carion et al. [2020] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV, 2020.
  • Ellis et al. [2018] Kevin Ellis, Daniel Ritchie, Armando Solar-Lezama, and Josh Tenenbaum. Learning to infer graphics programs from hand-drawn images. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
  • Ellis et al. [2019] Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, and Armando Solar-Lezama. Write, execute, assess: Program synthesis with a REPL. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • Ellis et al. [2021] Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sablé-Meyer, Lucas Morales, Luke Hewitt, Luc Cary, Armando Solar-Lezama, and Joshua B. Tenenbaum. DreamCoder: Bootstrapping inductive program synthesis with wake-sleep library learning. In Proceedings of the ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021, 2021.
  • Ganin et al. [2021] Yaroslav Ganin, Sergey Bartunov, Yujia Li, Ethan Keller, and Stefano Saliceti. Computer-aided design as language. In Advances in Neural Information Processing Systems, volume 34, 2021.
  • Jones et al. [2021] R. Kenny Jones, David Charatan, Paul Guerrero, Niloy J. Mitra, and Daniel Ritchie. ShapeMod: Macro operation discovery for 3D shape programs. ACM Transactions on Graphics (SIGGRAPH), 40(4), 2021.
  • Kuhn [1955] Harold W. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2:83–97, 1955.
  • Kyratzi and Azariadis [2021] Sofia Kyratzi and Philip Azariadis. A constraint-based framework to recognize design intent during sketching in parametric environments. Computer-Aided Design and Applications, 18(3), 2021.
  • Kyratzi and Azariadis [2022] Sofia Kyratzi and Philip Azariadis. Integrated design intent of 3D parametric models. Computer-Aided Design, 146, 2022.
  • Munkres [1957] James Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32–38, 1957.
  • Otey et al. [2018] Jeffrey Otey, Pedro Company, Manuel Contero, and Jorge D. Camba. Revisiting the design intent concept in the context of mechanical cad education. Computer-Aided Design and Applications, 15(1), 2018.
  • Para et al. [2021] Wamiq Para, Shariq Bhat, Paul Guerrero, Tom Kelly, Niloy Mitra, Leonidas J Guibas, and Peter Wonka. SketchGen: Generating constrained CAD sketches. In Advances in Neural Information Processing Systems, volume 34, 2021.
  • Pierce [2002] Benjamin C Pierce. Types and programming languages. MIT press, 2002.
  • PTC Inc. [2022] PTC Inc. Onshape, 2022. URL https://www.onshape.com/en/. Accessed Apr. 20, 2022.
  • Razavi et al. [2019] Ali Razavi, Aaron van den Oord, and Oriol Vinyals. Generating diverse high-fidelity images with VQ-VAE-2. In Advances in Neural Information Processing Systems, volume 32, 2019.
  • Seff et al. [2020] Ari Seff, Yaniv Ovadia, Wenda Zhou, and Ryan P. Adams. SketchGraphs: A large-scale dataset for modeling relational geometry in computer-aided design. In ICML 2020 Workshop on Object-Oriented Learning, 2020.
  • Seff et al. [2022] Ari Seff, Wenda Zhou, Nick Richardson, and Ryan P Adams. Vitruvion: A generative model of parametric CAD sketches. In International Conference on Learning Representations, 2022.
  • Tian et al. [2019] Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu. Learning to infer and execute 3D shape programs. In International Conference on Learning Representations, 2019.
  • Valkov et al. [2018] Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton, and Swarat Chaudhuri. HOUDINI: Lifelong learning as program synthesis. In International Conference on Neural Information Processing Systems, 2018.
  • van den Oord et al. [2017] Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. Neural discrete representation learning. In Advances in Neural Information Processing Systems, volume 30, 2017.
  • Vaswani et al. [2017] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30, 2017.
  • Vinyals et al. [2015] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In Advances in Neural Information Processing Systems, volume 28, 2015.
  • Willis et al. [2021] Karl D.D. Willis, Pradeep Kumar Jayaraman, Joseph G. Lambourne, Hang Chu, and Yewen Pu. Engineering sketch generation for computer-aided design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2021.
  • Wu et al. [2021] Rundi Wu, Chang Xiao, and Changxi Zheng. DeepCAD: A deep generative network for computer-aided design models. In IEEE International Conference on Computer Vision (ICCV), 2021.

Appendix A Supplementary for “Discovering Design Concepts for CAD Sketches”

A.1 The complete list of 𝕃0\mathbb{L}^{0} types

We provide the complete list of $\mathbb{L}^0$ types in List 2. These types are constructed based on the given data types from the SketchGraphs dataset [17]. Note that in the current implementation we do not distinguish sub-primitive references that point to different parts of a primitive, but rely on the predicted geometric closeness of primitive parts to tell them apart in post-processing, as we find the geometric predictions are generally accurate enough for this purpose. On the other hand, we note that extending references to primitive parts can be trivially achieved by turning primitives into functions and augmenting them with arguments (similar to how we model constraints), such that each argument corresponds to a primitive part; the constraint references can then pinpoint primitive parts through argument passing (Sec. 4.2).

// Basic data types
Construction, Length, Angle, Coord, Ref
// $\mathbb{L}^0$ primitive types
Line $\rightarrow$ $b_{dash}$ : Construction, $c_{start\_x}, c_{start\_y}, c_{end\_x}, c_{end\_y}$ : Coord
Circle $\rightarrow$ $b_{dash}$ : Construction, $c_{center\_x}, c_{center\_y}$ : Coord, $l_{radius}$ : Length
Point $\rightarrow$ $b_{dash}$ : Construction, $c_x, c_y$ : Coord
Arc $\rightarrow$ $b_{dash}$ : Construction, $c_{center\_x}, c_{center\_y}$ : Coord, $l_{radius}$ : Length, $a_{start}, a_{end}$ : Angle
// $\mathbb{L}^0$ constraint types
Coincident $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{\}$
Distance $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{l_{dist} : \textrm{Length}\}$
Horizontal $\rightarrow$ $\lambda(r_1:\textrm{Ref}).\{\}$
Parallel $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{\}$
Vertical $\rightarrow$ $\lambda(r_1:\textrm{Ref}).\{\}$
Tangent $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{\}$
Length $\rightarrow$ $\lambda(r_1:\textrm{Ref}).\{l_{dist} : \textrm{Length}\}$
Perpendicular $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{\}$
Equal $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{\}$
Diameter $\rightarrow$ $\lambda(r_1:\textrm{Ref}).\{l_{dist} : \textrm{Length}\}$
Radius $\rightarrow$ $\lambda(r_1:\textrm{Ref}).\{l_{dist} : \textrm{Length}\}$
Angle $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{a_{ang} : \textrm{Angle}\}$
Concentric $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{\}$
Normal $\rightarrow$ $\lambda(r_1, r_2:\textrm{Ref}).\{\}$
List 2: The complete list of $\mathbb{L}^0$ types considered in this work.

A.2 Implementation details

Sketch encoding format In Sec. 4 we described how sketches are encoded to allow network learning; here we present more implementation details.

We encode the input sketch $S$ as a series of primitive tokens followed by a series of constraint tokens, with these tokens supplemented by learned positional encodings according to their indices in the sequence (Sec. 4.1). We additionally insert learnable START, END and NEW tokens at the front of the sequence, at the end of the sequence, and between every pair of encoded primitives/constraints, respectively, to produce the complete sequence.

Each primitive is represented by two consecutive tokens: an $\mathbb{L}^0$ type token and a parameter token. The $\mathbb{L}^0$ type of a primitive is encoded by a 256-dim embedding, obtained by an embedding layer denoted as $\textrm{enc}_{type}$. The parameters of a primitive are encoded in the parameter token; compared with using different numbers of tokens for different primitive types, as previous autoregressive baselines do [6, 13, 18], our one-token parameter encoding allows straightforward matching with a target primitive even if the predicted primitive type does not match the target, which simplifies training. In particular, we use the schema shown in Fig. 7 to encode the parameter values, where each basic data type is represented by a 14-dim code obtained by embedding the quantized parameter value; all slots of a specific primitive type are used while the remaining slots are set to zero. To indicate that the resultant parameter token belongs to a specific primitive type, we augment the parameter token with the type token by summing the two vectors to produce the final parameter token (Sec. 4.1).

Each constraint is represented by a type token and several reference tokens. The constraint type token is obtained through the same embedding layer $\textrm{enc}_{type}$ as for primitives. To encode a reference, we use a 256-dim embedding of the primitive index in the expanded sequence, obtained through the embedding layer $\textrm{enc}_{ref}$. We omit constraint parameters in the current implementation, and defer their inference to post-processing according to the positions of predicted primitives; in comparison, most previous works [13, 18] have simply skipped constraint types with parameters.

[Figure 7 illustration: parameter slots per primitive type; Line: $b_{dash}, c_{sx}, c_{sy}, c_{ex}, c_{ey}$; Point: $b_{dash}, c_x, c_y$; Circle: $b_{dash}, c_x, c_y, l_{rad}$; Arc: $b_{dash}, c_x, c_y, l_{rad}, a_{start}, a_{end}$.]

Figure 7: Parameter code schema. A parameter code contains 18 tokens, each of 14 dims, which are concatenated and zero-padded to 256 dims. For a particular primitive type, only the tokens corresponding to that type are used in the parameter code; the rest are set to zero. We do not allocate slots for constraint parameters in the current implementation; in comparison, previous works [13, 18] simply omit constraints with parameters.

Sketch parameter decoding $\textrm{dec}_{param}(\cdot)$ has a structure mirroring $\textrm{enc}_{param}(\cdot)$. It takes a latent parameter code as input and decodes it into a 256-dim code (Fig. 7), which contains several segments corresponding to different primitive types. When performing type casting (Sec. 5), the segment corresponding to the target type is taken for parameter decoding. Each primitive property is represented by a 14-dim embedding code, from which a quantized property value is recovered by an inverse-embedding layer; during this inverse-embedding process, the logits are processed by argmax to retrieve the quantized value. Following previous works [6, 13, 18], we always work with quantized attribute values as categorical variables during network training and inference.

Normalization, augmentation and quantization We normalize all sketches inside a $2\times 2$ square centered at the origin, and remove duplicate sketches by rasterizing them into $128\times 128$ binary images and removing those with identical images. We apply random shrinking augmentation with scaling factors of $0.5\sim 0.8$. The continuous basic data types (List 2) are uniformly quantized; in particular, we assign 30 bins for angle, 20 bins for length and 80 bins for coordinate.
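A minimal sketch of the uniform quantization of a continuous attribute, assuming the normalized value ranges described above (e.g. coordinates in $[-1, 1]$ after normalization to the $2\times 2$ square):

```python
import numpy as np

def quantize_param(value, lo, hi, n_bins):
    """Uniformly quantize a continuous parameter into one of n_bins bins,
    e.g. n_bins=80 for coordinates with lo=-1.0, hi=1.0."""
    t = np.clip((value - lo) / (hi - lo), 0.0, 1.0 - 1e-9)
    return int(t * n_bins)
```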

Network and training details The detection network is a transformer encoder-decoder network, with the encoder/decoder having 12 layers, 8 attention heads and latent dimension of 256.

The structure generation network takes a library code $\mathbf{q}'\in\mathbf{L}^1$ and generates the $\mathbb{L}^0$ type elements $[t_i^0]$ within, and a matrix representing the composition $R_{\mathbf{T}^1}$ of $[t_i^0]$ and arguments. Specifically, the 256-dim library code $\mathbf{q}'$ first passes through an MLP of 3 layers (unless otherwise specified, all MLPs used in this paper have hidden dimensions equal to the input dimension and ReLU activation after each hidden linear layer) to expand to $k_{L^0}\times 256$ dims, i.e. $k_{L^0}$ codes representing the $[t_i^0]$ elements. Then each code passes through another MLP of 3 layers (i.e. $\textrm{dec}_{type}$) to output the discrete probabilities over the $\mathbb{L}^0$ types that $t_i^0$ assumes. To generate the composition matrix $\mathbf{R}_{\mathbf{T}^1}$, we use another MLP of 5 layers to expand the library code to a $(2k_{L^0}{+}k_{arg})\times(k_{L^0}{+}k_{arg})$ matrix and apply softmax on each row, as detailed in Sec. 4.2.

The parameter network generates parameters to instantiate concepts. It first expands each of the $k_{qry}$ concept instance codes $[\mathbf{q}_i]$ into $k_{L^0}$ parameter latent codes, which are added to the corresponding parameter type embeddings obtained from the structure generation network and fed into a transformer decoder to generate explicit parameters. The transformer decoder here has the same hyperparameters as the concept detection decoder (i.e. 12 layers, 8 attention heads, 256 latent dimensions). The decoder transforms each group of $k_{L^0}$ latent parameter codes by cross-attending to the contextualized input sequence $[\mathbf{e}'_{t_i^0}]$, and finally maps them to parameter tokens as described in Fig. 7 through $\textrm{dec}_{param}$, which are further decoded into probabilities over quantized basic data types by the corresponding inverse-embedding layers.
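A hedged sketch of this data flow (the module wiring is our reading of the text; the expansion MLP depth and the context length are assumptions):

```python
import torch
import torch.nn as nn

K_QRY, K_L0, D = 5, 4, 256  # illustrative k_{qry}, k_{L^0}; D = 256

expand = nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, K_L0 * D))
layer = nn.TransformerDecoderLayer(d_model=D, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=12)

concept_codes = torch.randn(1, K_QRY, D)       # detected instance codes [q_i]
type_embeds = torch.randn(1, K_QRY * K_L0, D)  # from the structure network
context = torch.randn(1, 30, D)                # contextualized inputs [e'_{t_i^0}]

latents = expand(concept_codes).view(1, K_QRY * K_L0, D) + type_embeds
param_tokens = decoder(tgt=latents, memory=context)  # cross-attention
# param_tokens then go through dec_param and the inverse-embedding layers.
```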

We implement all modules in PyTorch, and use the Adam optimizer with a learning rate of $10^{-4}$ to train the network for 160 epochs on 4 V100 GPUs, which takes 2 days to complete.

Library size, EMA code update and dead code revival In our experiments, we use a library of 1000 candidate concepts for $\mathbf{L}^1$. We follow [16] to replace the first term of the concept quantization loss (i.e. $||\textrm{sg}(\mathbf{q}_i)-\mathbf{q}'_i||$) with exponential moving average (EMA) updates of $\mathbf{q}'\in\mathbf{L}^1$. Specifically, for each code $\mathbf{q}'_i$, we define two accumulated variables $n_i\geq 0$ and $\mathbf{m}_i\in\mathbf{R}^d$, initialized as 1 and a random unit vector, respectively. They are updated in each gradient descent iteration following the rules:

\begin{align}
n_i &:= \gamma\, n_i + (1-\gamma)\, N_i, \tag{8}\\
\mathbf{m}_i &:= \gamma\, \mathbf{m}_i + (1-\gamma) \sum_j \mathbf{q}_{i,j}, \tag{9}\\
\mathbf{q}'_i &:= \frac{\mathbf{m}_i}{n_i}, \tag{10}
\end{align}

where $\{\mathbf{q}_{i,j}\}$ are the $N_i$ detection queries that select $\mathbf{q}'_i$ as their closest concept prototype in this iteration. We set the decay rate $\gamma=0.99$ and the commitment cost coefficient $\beta=1$ in all our experiments.
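A minimal PyTorch implementation of the EMA updates in Eqs. (8)-(10); tensor shapes ($K$ codes of dimension $d$, $B$ queries per iteration) are illustrative:

```python
import torch
import torch.nn.functional as F

gamma = 0.99

def ema_update(n, m, codebook, queries, assignments):
    """n: (K,), m: (K, d), codebook: (K, d); queries: (B, d);
    assignments: (B,) indices of the codes selected by the queries."""
    K = codebook.shape[0]
    one_hot = F.one_hot(assignments, K).float()  # (B, K)
    N = one_hot.sum(dim=0)                       # N_i: queries per code
    summed = one_hot.t() @ queries               # sum_j q_{i,j} per code
    n.mul_(gamma).add_((1 - gamma) * N)          # Eq. (8)
    m.mul_(gamma).add_((1 - gamma) * summed)     # Eq. (9)
    codebook.copy_(m / n.unsqueeze(-1))          # Eq. (10)
```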

In addition, we find that the concept quantization process may suffer from codebook collapse, where all $[\mathbf{q}_i]$ select only a few codes of $\mathbf{L}^1$, which impairs the capability of the model. To address this problem, during training we use dead-code revival [16] to periodically (every 100 mini-batches) find an unused code in $\mathbf{L}^1$ and replace it with the detection query $\mathbf{q}$ that is farthest from its closest code $\mathbf{q}'$.
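A hedged sketch of the revival step; the usage-count bookkeeping is an assumption about how "unused" is tracked:

```python
import torch

def revive_dead_codes(codebook, usage_counts, queries):
    """Replace one unused code with the query farthest from its closest
    prototype. codebook: (K, d); usage_counts: (K,); queries: (B, d)."""
    dead = (usage_counts == 0).nonzero(as_tuple=True)[0]
    if dead.numel() == 0:
        return
    dists = torch.cdist(queries, codebook).min(dim=1).values  # per query
    worst = dists.argmax()               # query worst served by the library
    codebook[dead[0]] = queries[worst]   # revive one dead code with it
    usage_counts[dead[0]] = 1
```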

A.3 Autoregressive baseline implementation details

Following [24, 13, 18], the autoregressive baseline contains two modules: a primitive model that generates primitives sequentially, and a constraint model that takes primitives as input and generates constraints sequentially. The primitive model is an autoregressive transformer decoder with 12 layers, 8 attention heads and latent dimension 256. The constraint model is a transformer encoder-decoder, where the encoder contextualizes the input primitives and the decoder autoregressively generates constraints. Constraint references to primitives are implemented by computing the dot-product correlation between the generated reference token and the contextualized primitive tokens produced by the encoder, following the Pointer Network design [23]. The constraint model encoder/decoder have the same hyperparameters as the primitive model.
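The reference mechanism can be summarized by the following minimal sketch, where reference logits are dot products between a generated reference token and the encoder's primitive tokens; shapes are illustrative:

```python
import torch

def reference_logits(ref_token, primitive_tokens):
    # ref_token: (B, D); primitive_tokens: (B, N, D) from the encoder
    return torch.einsum('bd,bnd->bn', ref_token, primitive_tokens)

B, N, D = 2, 10, 256  # illustrative batch size, primitive count, model dim
logits = reference_logits(torch.randn(B, D), torch.randn(B, N, D))
referenced = logits.argmax(dim=-1)  # index of the referenced primitive
```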

A.4 More results

We present more results on design intent interpretation and auto-completion. In Fig. 8, we show more examples of how raw sketches are parsed with learned concepts, where primitives are colored according to their encapsulating concepts and constraint graphs are visualized to show the modular concepts. In Fig. 9 we show more such design intent interpretation results without constraint graphs. Fig. 10 presents more auto-completion results, where we again compare with the autoregressive baseline and demonstrate better interpretability and more regular completions.

Figure 8: Design intent parsing. Each example shows the input raw sketch and corresponding constraint graph (in black), as well as our interpreted sketch made of modular concepts and corresponding modular constraint graph, where primitives and constraints are colored according to their encapsulating concepts.
Figure 9: Design intent parsing without showing the constraint graphs. In each sketch example, 𝕃0\mathbb{L}^{0} primitives of the same color belong to the same sketch concept.
Figure 10: More results of auto-completion. Each example shows the input partial sketch (black) and the ground-truth completion (red), the result of the autoregressive baseline, and our result (colored by concepts). Our completions show better interpretability and regularity.

A.5 Concept library analysis

Figure 11: Statistics of the learned library concepts. (a) Library frequency distribution. The horizontal axis shows the 1000 $\mathbb{L}^1$ library concepts, sorted by their frequencies in the test dataset; the vertical axis shows the frequency in log scale. The majority of learned concepts have stable but not very high occurrence frequencies, i.e. they follow a long-tail distribution as expected. Concepts marked by red points on the curve are visualized in Fig. 12. (b) Library complexity histogram. The horizontal axis is the number of $\mathbb{L}^0$ instances contained in a concept, and the vertical axis is the number of concepts with that complexity. Degenerate (i.e. empty) or trivial (i.e. size-1) concepts are rare in the learned library.
Figure 12: Examples of learned library concepts and their corresponding sketch instances. The concepts are sorted by their occurrence frequency (high to low) in the test set, and correspond to the red dots marked on the distribution curve of Fig. 11.

Fig. 11(a) shows how frequently the learned library concepts are used in the test dataset. The distribution shows a long-tail pattern, which is expected because the diversity of sketches demands a wealth of modular sketch concepts that individually may not be used extensively. We show concrete concepts and corresponding sketches containing them in Fig. 12, arranged by occurrence frequency (from high to low) as marked with red points in Fig. 11(a). The most frequently used concepts are rectangles with different constraint variants, owing to their abundance in regular sketches. Moreover, concepts with simple structures, e.g. a few lines connected by coincidence constraints, are generally used more frequently than those with complex structures, as simple structures are more flexible and fit into diverse sketches.

Fig. 11(b) shows the complexity of learned library concepts in terms of how many $\mathbb{L}^0$ instances a concept contains. There are a small number of degenerate concepts with no $\mathbb{L}^0$ instances and trivial concepts with only one $\mathbb{L}^0$ instance. The empty concepts serve as placeholders that fill the gap between small sketches and the maximal graph of $k_{qry}$ concepts. The trivial concepts exist because we always convert a raw sketch into a set of $\mathbb{L}^1$ concepts; $\mathbb{L}^0$ elements of the raw sketch that do not fit into any modular concept are encapsulated by such trivial $\mathbb{L}^1$ concepts for the sake of complete reconstruction.

A.6 Parameter refinement with constraint solver

The errors in generated primitive parameters (e.g. due to quantization of basic data types) can be mitigated by enforcing the constraints with the constraint solver provided by OnShape [15]. In Fig. 13, we show examples of sketches before and after refining primitive parameters with the constraint solver.

Figure 13: Parameter refinement with the constraint solver. Generated sketches contain parameter inaccuracies (upper row); the constraint solver refines the sketches by applying the generated constraints and fitting the primitives together properly (lower row).

A.7 More ablation results

$k_{qry}$   Primitive   Constraint   Modular (%)
5           0.994       0.766        50.8
6           0.994       0.808        36.0
8           0.991       0.845        32.6
10          0.998       0.894        15.9
12          0.999       0.918        14.9
Table 2: Query number $k_{qry}$ ablation. F-scores are reported for primitives and constraints.
$k_{arg}$   Primitive   Constraint   Modular (%)
1           0.993       0.666        52.6
2           0.993       0.766        50.8
3           0.990       0.7577       18.5
4           0.994       0.776        7.1
Table 3: Argument number $k_{arg}$ ablation. F-scores are reported for primitives and constraints.
$\mathbb{L}^1$ size   Primitive   Constraint   Modular (%)
50                    0.970       0.581        48.1
100                   0.980       0.600        48.8
500                   0.989       0.735        49.2
1000                  0.994       0.766        50.8
2000                  0.995       0.779        50.6
Table 4: $\mathbb{L}^1$ library size ablation. F-scores are reported for primitives and constraints.
Figure 14: Design intent interpretation trained with different concept query numbers $k_{qry}$ (rows, top to bottom: $k_{qry}=5, 6, 8, 10$). Larger $k_{qry}$ leads to less modular concepts.
Figure 15: Design intent interpretation trained with different argument numbers $k_{arg}$ (rows, top to bottom: $k_{arg}=2, 3, 4$). Larger $k_{arg}$ leads to less modular concepts.

To evaluate the impact of hyperparameters, namely the number of concept queries $k_{qry}$, the number of arguments $k_{arg}$ and the $\mathbb{L}^1$ library size, we train our model under different settings of $k_{qry}$, $k_{arg}$ and $\mathbb{L}^1$ size on the auto-encoding design intent interpretation task.

When changing $k_{qry}$, we also adjust $k_{L^0}$, the number of $\mathbb{L}^0$ elements each concept contains, so that the total number of generated $\mathbb{L}^0$ elements, i.e. $k_{qry}\times k_{L^0}$, is unchanged. The evaluation results for adjusting $k_{qry}$, $k_{arg}$ and $\mathbb{L}^1$ size are shown in Tables 2, 3 and 4, respectively.

From Table 2 we see that adding more concept queries makes the model more expressive and flexible, as demonstrated by the increasing constraint F-scores; on the other hand, this comes at the cost of modularity, as a sketch is decomposed into more granular components.

From Table 3 we see that using more arguments than the default $k_{arg}=2$ does not significantly improve the constraint F-score, but leads to a significant decrease in modularity, suggesting the default argument number is sufficient. Conversely, decreasing the number of arguments leads to a significant drop in constraint F-score without obviously higher modularity.

We visualize examples of design intent interpretation for different $k_{qry}$ and $k_{arg}$ in Fig. 14 and Fig. 15, respectively, to give an intuitive sense of the above numerical results, especially regarding modularity.

From Table 4 we see that enlarging the $\mathbb{L}^1$ library makes the model more expressive and flexible, shown by the increasing constraint F-scores. However, the improvement becomes marginal as the library grows, since the newly introduced concepts mainly capture structures that rarely appear and have little impact on the overall results (intuitively, they mainly extend the fall-off trend of the far-right tail of the frequency distribution in Fig. 11(a)). On the other hand, modularity remains at roughly the same level across library sizes.

A.8 Image-conditioned generation

We extend our model to image-conditioned generation, where the goal is to accurately recover a parametric sketch from an image of a hand-drawn sketch. For comparison, we also extend the autoregressive baseline to this image-conditioned setting.

Following [13, 18], we use a ViT-style encoder to condition the generation on images. Specifically, the input sketch image of size $128\times 128$ is partitioned into non-overlapping square patches of size $16\times 16$. The image patches are flattened and passed through a 3-layer MLP to produce a sequence of 64 image tokens (each of dimension 256), which are then fed into a transformer encoder to produce contextualized image embeddings that the detection decoder cross-attends to. For the autoregressive baseline, we similarly augment the primitive model with such an image encoder and cross-attend to the image tokens in the autoregressive primitive decoder. The ViT-style image encoder has the same hyperparameters as the other transformer modules discussed above (i.e. 12 layers, 8 attention heads, 256 latent dimensions).
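A minimal sketch of the patch tokenization, assuming single-channel binary images and MLP hidden dims equal to the flattened patch dimension:

```python
import torch
import torch.nn as nn

D, P = 256, 16  # token dim and patch size

patch_mlp = nn.Sequential(           # 3-layer MLP over flattened patches
    nn.Linear(P * P, P * P), nn.ReLU(),
    nn.Linear(P * P, P * P), nn.ReLU(),
    nn.Linear(P * P, D),
)

img = torch.rand(1, 1, 128, 128)               # a binary sketch image
patches = img.unfold(2, P, P).unfold(3, P, P)  # (1, 1, 8, 8, 16, 16)
patches = patches.reshape(1, 64, P * P)        # 64 flattened patches
tokens = patch_mlp(patches)                    # (1, 64, 256) image tokens
# tokens then pass through a 12-layer transformer encoder (not shown).
```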

We train our model with a learning rate of $3\times 10^{-4}$ for 200 epochs, and the autoregressive baseline with the same learning rate for 400 epochs until convergence. We use the xkcd mode of matplotlib to simulate sketches of hand-drawn style.
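For reference, the hand-drawn-style rasterization can be simulated as below; the figure size, dpi and the drawn shape are illustrative choices, not our exact rendering settings:

```python
import matplotlib.pyplot as plt

with plt.xkcd():                       # matplotlib's hand-drawn style
    fig, ax = plt.subplots(figsize=(2, 2))
    ax.plot([0, 1, 1, 0, 0], [0, 0, 1, 1, 0])  # a simple rectangle sketch
    ax.axis('off')
    fig.savefig('sketch.png', dpi=64)  # 2 in x 64 dpi = 128 x 128 pixels
```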

We provide quantitative evaluation in Table 5 and visual comparison in Fig. 16, both showing that our model outperforms the autoregressive baseline, which again can be attributed to the more regular generation through sketch concept composition.

Figure 16: Image conditioned generation. Each example shows the input sketch image, the reconstruction result by autoregressive baseline, and our result.
Config Primitive Constraint
Autoregressive 0.575 0.301
Ours 0.711 0.368
Table 5: Image-conditioned generation. F-scores are reported for primitives and constraints.

A.9 Experiment on “CAD as Language” dataset

We also conduct preliminary experiments of our method on the dataset of [6], which comprises millions of CAD sketches retrieved from OnShape [15]; in comparison, the SketchGraphs dataset [17] used in our other experiments is also collected from OnShape but at a smaller scale. We filter the dataset by removing trivial or semantically ambiguous sketches and confine the sketch complexity such that the total number of primitives and constraints is within $[20, 90]$. We end up with about 2.5 million sketches, of which 2.3 million are used for training and the rest for testing. In comparison, about 1 million samples from the SketchGraphs dataset are used in the other experiments, where the maximum sketch graph size is 50 (Sec. 6).

To accommodate the increased complexity of this dataset, we increase the query number $k_{qry}$ to 6 and the $\mathbb{L}^1$ library size to 15, and leave the remaining hyperparameters unchanged. Examples of learned library concepts and corresponding sketches are presented in Fig. 17, and examples of design intent parsing results are given in Fig. 18. Our method discovers new modular concepts and parses more complex sketches, showing that the framework works similarly on this new dataset.

Figure 17: Examples of library concepts learned from the CAD as Language dataset [6] and their corresponding sketch instances. Different instances of the same library concept are highlighted in different colors.
Figure 18: Design intent parsing learned on the CAD as Language dataset [6]. Each example shows the input raw sketch and corresponding constraint graph (in black), as well as our interpreted sketch made of modular concepts and corresponding modular constraint graph, where primitives and constraints are colored according to their encapsulating concepts.

A.10 Broader impact

This work potentially improves the efficiency of CAD sketch design, but it does not replace other critical procedures of CAD. For example, the discovered concepts do not necessarily meet structural safety constraints, and should be subject to checking and validation procedures specific to each application. The general methodology of program library induction presented in this work facilitates more structured and interpretable machine learning, which may enhance human-AI interaction but has no direct negative social impacts.