
Neural Octahedral Field: Octahedral prior for simultaneous smoothing and sharp edge regularization

Ruichen Zheng, Tsinghua University, Haidian District, Beijing, China ([email protected]) and Tao Yu, Tsinghua University, Haidian District, Beijing, China ([email protected])
Abstract.

Neural implicit representation, the parameterization of a distance function as a coordinate neural field, has emerged as a promising approach to surface reconstruction from unoriented point clouds. To enforce consistent orientation, existing methods focus on regularizing the gradient of the distance function, such as constraining it to be of unit norm, minimizing its divergence, or aligning it with the eigenvector of the Hessian that corresponds to a zero eigenvalue. However, in the presence of large scanning noise, they tend to either overfit the noisy input or produce an excessively smooth reconstruction. In this work, we propose to guide surface reconstruction with a new variant of neural field, the octahedral field, leveraging the spherical harmonics representation of octahedral frames originating in hexahedral meshing. Such a field automatically snaps to geometric features when constrained to be smooth, and naturally preserves sharp angles when interpolated over creases. By simultaneously fitting and smoothing the octahedral field alongside the implicit geometry, it behaves analogously to bilateral filtering, yielding smooth reconstructions with preserved sharp edges. Despite operating purely pointwise, our method outperforms various traditional and neural approaches across extensive experiments, and is very competitive with methods that require normal or data priors. Our full implementation is available at: https://github.com/Ankbzpx/frame-field.

Figure 1. Given an unoriented point cloud and starting from a faithful initialization (e.g., with NSH (Wang et al., 2023)), our method aligns a smooth neural octahedral field with the jointly learned implicit geometry, leveraging its octahedral prior to simultaneously smooth the reconstruction and regularize its sharp edges.

1. Introduction

As the predominant means of acquiring real-world geometry, 3D scanning technologies have advanced rapidly and become increasingly accessible over the past several decades. Early procedures relied on triangulation-based solutions that required a strictly controlled environment, but handheld scanning using time-of-flight sensors in smartphones is now no longer a rare sight. Despite the varying methodologies, the intermediate noisy point cloud remains the most common data consumed by the reconstruction step, making its study of great importance and practical value.

The goal of most surface reconstruction tasks is to remove as much noise as possible while preserving geometric features such as sharp edges. In fact, the denoising and sharpening tasks are often interconnected. For instance, Fleishman et al. (2005) employ a median filter to remove the initial noise, then adopt a forward-search scheme that progressively fits a subset and filters outliers to determine a set of patches, which can be further sampled towards edges. Huang et al. (2013) first apply bilateral filters to denoise the noisy observation so that off-edge normals can be reliably estimated, then progressively sample towards the edge with the newly updated normals to improve edge quality. Similarly, Wei et al. (2023a) draw on line processes that favor smoothness when a similarity measurement is under a tunable threshold, hence smoothing without compromising sharp features. All of the work above requires fitting local tangent planes over neighborhood patches with KNN connectivity.

However, when it comes to neural implicit representation, which is purely pointwise and has no connectivity or local neighborhood in the traditional sense, enforcing smoothness while preserving sharp geometric features becomes an exceedingly challenging task. Processing such a representation requires slow and costly evaluation of high-order energies over densely sampled queries (Yang et al., 2021). Even though connectivity can be created using differentiable surface extraction (Remelli et al., 2020), its backpropagation violates the level set equation and hence is inherently biased (Mehta et al., 2022). Meanwhile, as the popularity of neural implicit representation increases, it is gradually becoming the direct output when acquiring new geometries (Mildenhall et al., 2021), rather than an intermediate format for storing existing ones, so its denoising and sharpening holds practical value.

In this work, we tackle this challenge through the lens of surface reconstruction from unoriented points–newly proposed regularizations (Ben-Shabat et al., 2023; Wang et al., 2023) can produce a faithful initialization from positional constraints alone, but cannot handle noise or enforce sharp features unless they are given by the positional input. To this end, we introduce a guiding octahedral field, which is edge-aware by nature. By alternating between aligning the octahedral field with the implicit geometry's normals and regularizing the normals to match the guide field, the optimization converges to a smooth reconstruction with exaggerated sharp edges. More importantly, our method stays pointwise throughout the process.

Our contribution can be summarized as:

  • Propose a novel method to simultaneously smooth and emphasize the sharp edges of neural implicit representations for unoriented point cloud surface reconstruction.

  • Introduce a new variant of neural field, the octahedral field, that can be jointly learned with other neural implicit representations. Moreover, we design an effective alignment loss as well as an efficient smoothness loss to make learning a zero-level-set-aligned smooth octahedral field feasible.

  • Design a novel sharp edge regularization loss that utilizes the cubic symmetry of the octahedral frame to regularize the gradient of the implicit geometry, encouraging it to preserve sharp turning angles near edges.

2. Background and Related work

Figure 2. NSH and DiGS are powerful methods for fitting unoriented point clouds. However, when facing large scanning noise, they tend to overfit the positional constraints after the annealing of the regularization weights (Top). Careful tuning can help balance noise removal and smoothness, but cannot fully resolve the challenge (Middle). We propose to ”snapshot” their initialization with a jointly trained octahedral field. Such a field is sharp-edge aware and can be used to regularize the implicit geometry to improve reconstruction fidelity.

2.1. Surface reconstruction from noisy point cloud

Surface reconstruction has been extensively researched in recent decades and continues to evolve rapidly with the advancement of deep learning. We focus on reviewing the approaches most related to ours and refer interested readers to more detailed reviews (Huang et al., 2022; Sulzer et al., 2024).

Point Set Surfaces (PSS) (Alexa et al., 2001) fit polynomials to approximate the surface locally. Query points are evaluated using Moving Least Squares (MLS) (Levin, 2004), with the gradient of the polynomials indicating the surface normals (Alexa and Adamson, 2004). Algebraic Point Set Surfaces (APSS) (Guennebaud and Gross, 2007) fit an algebraic sphere to improve the stability of PSS over regions of high curvature. Robust Implicit MLS (Öztireli et al., 2009) integrates robust statistics with local kernel regression to handle outliers. Fleishman et al. (2005) apply forward search (Atkinson and Riani, 2012) to filter outliers that cannot be identified by robust statistics. Variational Implicit Point Set Surfaces (VIPSS) (Huang et al., 2019) leverage the eikonal constraint to ease the need for input normals, but at the cost of cubic complexity. In contrast to local fitting, Poisson Surface Reconstruction (PSR) (Kazhdan et al., 2006) represents the surface globally as the level set of an implicit indicator function whose gradient matches the input normals. Screened Poisson Surface Reconstruction (SPSR) (Kazhdan and Hoppe, 2013) integrates positional constraints to balance precision and smoothness. Recently, Neural Kernel Surface Reconstruction (NKSR) (Huang et al., 2023) integrates aspects from both ends–it models the surface as the zero level set of a global implicit function, but with its value determined by learned proximity (a neural kernel) with respect to input samples.

2.2. Point cloud resampling and denoising

Edge-Aware Point Set Resampling (EAR) (Huang et al., 2013) applies bilateral filtering to reliably fit normals away from edges, then progressively samples towards the edge to capture sharp features. Wei et al. (2023b) predict a cross field and leverage its crease-aligned nature (Jakob et al., 2015; Huang and Ju, 2016) to tessellate space for convolution. Sarkar et al. (2018) randomly sample square patches, fit local reference planes with RANSAC (Fischler and Bolles, 1981), store local heights in square matrices, and denoise via low-rank matrix factorization. Zeng et al. (2019) sample overlapping patches, connect samples by projection distance, and perform graph smoothing. Wei et al. (2023a) use line processes to perform smoothing up to an optimizable threshold. Deep Geometric Prior (DGP) (Williams et al., 2019) parameterizes local patches with MLPs and relies on their smoothness prior to denoise.

2.3. Fitting-based neural implicit representation

Neural implicit representation encodes the surface as the zero-level set of a distance function parameterized by a coordinate MLP. We primarily review its application to fitting-based surface reconstruction, and refer readers interested in neural fields in general to the comprehensive survey by Xie et al. (2021).

DeepSDF (Park et al., 2019) and Occupancy Networks (Mescheder et al., 2019) parameterize the signed distance function (SDF) or occupancy classification over spatial locations, capable of encoding geometry at arbitrary resolution. SAL (Atzmon and Lipman, 2020) samples in the vicinity of the input point cloud and learns an unsigned distance function. SALD (Atzmon and Lipman, 2021) further improves the reconstruction quality with normal supervision. IGR (Gropp et al., 2020) introduces the eikonal regularization term, under which, with stochastic optimization, the MLP converges to a faithful SDF. When normals are not available, the eikonal regularization is insufficient to guarantee a unique solution, resulting in ambiguities and artifacts. Park et al. (2023) model the SDF gradient as the unique solution to the $p$-Poisson equation and additionally minimize the surface area to improve hole filling. DiGS (Ben-Shabat et al., 2023) initializes the MLP with a geometric sphere and minimizes the divergence of its gradient to preserve orientation consistency. Neural Singular Hessian (NSH) (Wang et al., 2023) encourages the Hessian of the MLP to have zero determinant and produces a topologically faithful initialization. However, both DiGS and NSH require annealing the regularization weights to fit finer details, which makes them prone to noise corruption. Our idea is to ”snapshot” the initialization as a prior, to compensate for the emergence of noise (Figure 2).

2.4. Functional representation of the octahedral field

In this section, we give a brief summary of the Spherical Harmonics (SH) representation of the octahedral frame by Huang et al. (2011), Ray et al. (2016), and Palmer et al. (2020). A full review is available in Section A of the supplementary.


2.4.1. Definition

An octahedral frame can be represented by three mutually orthogonal unit vectors $\{\mathbf{v}_1,\mathbf{v}_2,\mathbf{v}_3\}$, or equivalently, a rotation matrix $\mathbf{R}=[\mathbf{v}_1|\mathbf{v}_2|\mathbf{v}_3]\in SO(3)$ associated with the canonical frame of standard basis vectors $\{\mathbf{e}_x,\mathbf{e}_y,\mathbf{e}_z\}$. To avoid matching representation vectors, it is convenient to describe the canonical frame as a spherical polynomial $F:S^2\to\mathbb{R}$, so it can be projected onto bands 0 and 4 of the SH basis:

F(\mathbf{s}) = (\mathbf{s}\cdot\mathbf{e}_x)^4 + (\mathbf{s}\cdot\mathbf{e}_y)^4 + (\mathbf{s}\cdot\mathbf{e}_z)^4
             = c_0\big(c_1 Y_0^0(\mathbf{s}) + \sqrt{\tfrac{7}{12}}\,Y_4^0(\mathbf{s}) + \sqrt{\tfrac{5}{12}}\,Y_4^4(\mathbf{s})\big)
             = c_0\big(c_1 Y_0^0(\mathbf{s}) + \mathbf{Y}_4(\mathbf{s})^T\mathbf{q}_0\big)
\mathbf{q}_0 = \big[0,0,0,0,\sqrt{\tfrac{7}{12}},0,0,0,\sqrt{\tfrac{5}{12}}\big]^T \in \mathbb{R}^9,\quad \mathbf{s}\in S^2

where $Y_l^m$ is the SH basis function of band $l$, order $m$, $\mathbf{Y}_4$ is the vector form of the band-4 basis, and $c_0$, $c_1$ are constants.

Since $Y_0^0$ is also constant, $F(\mathbf{s})$ can be fully characterized by the SH band-4 coefficient vector $\mathbf{q}_0$ alone, and so can any general octahedral frame $F(\mathbf{R}^T\mathbf{s})$:

F(\mathbf{R}^T\mathbf{s}):\quad \mathbf{q} = \tilde{\mathbf{R}}\mathbf{q}_0 \in \mathbb{R}^9,\quad \tilde{\mathbf{R}}\in SO(9)

where $\tilde{\mathbf{R}}$ is the Wigner D-matrix induced by $\mathbf{R}$. Due to the orthogonality of the SH basis, $\|\mathbf{q}\|_2=\|\tilde{\mathbf{R}}\mathbf{q}_0\|_2=\|\mathbf{q}_0\|_2=1$, so $\mathbf{q}$ is also a valid SH band-4 coefficient vector. We refer to this set of unit-norm coefficient vectors in $\mathbb{R}^9$ associated with octahedral frames as the octahedral variety.

With the SH-parameterized functional representation, the difference between two general frames associated with $\mathbf{R}_a$ and $\mathbf{R}_b$ can be measured by a spherical integral, which further reduces to the difference between their coefficient vectors:

(1) \int_{S^2}\big(F(\mathbf{R}_a^T\mathbf{s})-F(\mathbf{R}_b^T\mathbf{s})\big)^2 d\mathbf{s} = \int_{S^2}\big(\mathbf{Y}_4(\mathbf{s})^T\mathbf{q}_a-\mathbf{Y}_4(\mathbf{s})^T\mathbf{q}_b\big)^2 d\mathbf{s} = \|\mathbf{q}_a-\mathbf{q}_b\|_2^2

The association of octahedral frames with spatial coordinates, known as the octahedral field, can be described by a function $u:\mathbf{p}\in\mathbb{R}^3\to\mathbf{q}\in\mathbb{R}^9$, whose smoothness can be measured using the Dirichlet energy and discretized with (1) given connectivity:

(2) \int_V \|\nabla u\|^2 d\mathbf{p} = \sum_{e_{ij}\in\mathcal{E}} w_{ij}\,\|\mathbf{q}_i-\mathbf{q}_j\|_2^2

where $\mathcal{E}$ is the set of edges and $w_{ij}$ is the harmonic weight of edge $e_{ij}$.
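For concreteness, this reduction can be sketched in a few lines of JAX (our implementation language); the names below are illustrative rather than taken from our released code:

```python
import jax.numpy as jnp

# Canonical frame as its SH band-4 coefficient vector (Sec. 2.4.1).
Q0 = jnp.array([0., 0., 0., 0., jnp.sqrt(7. / 12.), 0., 0., 0., jnp.sqrt(5. / 12.)])

def frame_distance_sq(qa, qb):
    """Eq. (1): by orthonormality of the SH basis, the spherical integral of
    the squared frame difference reduces to a Euclidean distance in R^9."""
    return jnp.sum((qa - qb) ** 2)

def dirichlet_energy(q, edges, weights):
    """Eq. (2): discrete Dirichlet energy of an octahedral field.
    q: (V, 9) per-vertex coefficients; edges: (E, 2) vertex indices;
    weights: (E,) harmonic edge weights w_ij."""
    diff = q[edges[:, 0]] - q[edges[:, 1]]
    return jnp.sum(weights * jnp.sum(diff ** 2, axis=-1))
```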

2.4.2. Normal alignment

The octahedral field is classically designed over a volume enclosed by its boundary surface, with boundary frames having one of their axes aligned with the surface normal, while being as smooth as possible everywhere else. The normal-aligned frames $\mathbf{q}_n$ can be modelled as the set of $z$-axis-aligned frames $\mathbf{q}_z$ rotated towards the normal direction, parameterized by twisting about the $z$ axis:

(3) \mathbf{q}_z = \big[\sqrt{\tfrac{5}{12}}\cos 4\theta,\,0,0,0,\,\sqrt{\tfrac{7}{12}},\,0,0,0,\,\sqrt{\tfrac{5}{12}}\sin 4\theta\big]^T
    \mathbf{q}_n = \tilde{\mathbf{R}}_{z\to n}\mathbf{q}_z \in \mathbb{R}^9,\quad \tilde{\mathbf{R}}_{z\to n}\in SO(9)

where $z\to n$ denotes the rotation from the $z$ axis to the normal $\mathbf{n}$, and $\theta\in\mathbb{R}$ is the twisting angle that parameterizes the quadratic equality:

c_0^2 + c_8^2 = 1,\quad c_0=\cos 4\theta,\quad c_8=\sin 4\theta

For a coefficient vector $\mathbf{q}\in\mathbb{R}^9$, Palmer et al. (2020) observe that its closest $z$-axis-aligned frame has the form:

\Pi_z(\mathbf{q}) = \Big[\frac{\sqrt{5/12}\,\mathbf{q}[0]}{\sqrt{\mathbf{q}[0]^2+\mathbf{q}[8]^2}},\,0,0,0,\,\sqrt{\tfrac{7}{12}},\,0,0,0,\,\frac{\sqrt{5/12}\,\mathbf{q}[8]}{\sqrt{\mathbf{q}[0]^2+\mathbf{q}[8]^2}}\Big]

where $[\cdot]$ denotes 0-based array indexing. Notice how normalizing the first and last coefficients incorporates the quadratic constraint without expressing it as an explicit equality. They further propose the projection onto the closest $\mathbf{n}$-aligned frame as:

(4) \Pi_n(\mathbf{q}) = \tilde{\mathbf{R}}_{z\to n}\,\Pi_z(\tilde{\mathbf{R}}_{z\to n}^T\mathbf{q})

We employ this projection in our normal alignment constraint.
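As a reference, the two projections can be sketched in JAX as follows; the band-4 Wigner D-matrix $\tilde{\mathbf{R}}_{z\to n}$ is assumed to be constructed elsewhere (e.g., from closed-form band-4 rotation formulas) and is passed in as `R_tilde`:

```python
import jax.numpy as jnp

def project_z(q, eps=1e-8):
    """Closest z-axis-aligned frame to q (Palmer et al., 2020). Normalizing
    the first and last entries realizes cos^2(4t) + sin^2(4t) = 1 of Eq. (3)
    without an explicit equality constraint."""
    s = jnp.sqrt(q[0] ** 2 + q[8] ** 2) + eps
    c = jnp.sqrt(5. / 12.)
    out = jnp.zeros(9).at[4].set(jnp.sqrt(7. / 12.))
    out = out.at[0].set(c * q[0] / s)
    return out.at[8].set(c * q[8] / s)

def project_n(q, R_tilde):
    """Eq. (4): closest n-aligned frame, with R_tilde the 9x9 band-4 Wigner
    D-matrix of the rotation taking the z axis to the normal n."""
    return R_tilde @ project_z(R_tilde.T @ q)
```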

Figure 3. Visualization of how octahedral frames interpolate over sharp creases. We use the Gauss map to illustrate angle coverage.

Figure 4. The illustration of the proposed alternating optimization procedure.

2.4.3. Interpolation

Our key observation concerns how normal-aligned octahedral frames interpolate across creases–they behave similarly to normals, which are ambiguous at discrete vertices (Solomon et al., 2017) but interpolate smoothly in continuous settings. However, such interpolation exhibits cubic symmetry–it favors sharp turning when the dihedral angle is large and converges to vertex normal interpolation in smooth regions (Figure 3). This property makes the octahedral field edge-aware, an ideal property when handling scanning noise. Moreover, the SH representation fully parameterizes the octahedral field as an $\mathbb{R}^9$ coefficient vector field over the ambient space $\mathbb{R}^3$, allowing it to be implicitly encoded to pair with the neural implicit representation.

3. Method

Given an unoriented point cloud $\{\mathbf{p}_i\}_{i=1}^N$ and a neural implicit representation $f_\omega:\mathbb{R}^3\to\mathbb{R}$, we introduce a jointly trained octahedral field parameterized by a dedicated coordinate MLP:

u_\theta:\mathbf{x}\in\mathbb{R}^3\to\mathbf{q}\in\mathbb{R}^9

where $\omega$ and $\theta$ are the MLP weights.

As illustrated in Figure 4, our goal is to pair a smooth octahedral field with an initial implicit surface. By alternately aligning the octahedral field with the fixed surface normal, and regularizing the surface normal to match one of the fixed octahedral frame's representation vectors, the implicit geometry is encouraged to be smooth while its sharp edges are emphasized.

3.1. Normal alignment

For each sample on the surface, we want to minimize the deviation between the predicted octahedral frame and its closest normal-aligned projection. Although the surface normal is not provided, $f_\omega$ is always differentiable, so we utilize the gradient at its zero level set as the boundary condition defining our octahedral field:

\sum_{i=1}^N \phi\Big(u_\theta(\mathbf{p}_i'),\ \Pi\Big(u_\theta(\mathbf{p}_i'),\ \frac{\nabla f_\omega(\mathbf{p}_i')}{\|\nabla f_\omega(\mathbf{p}_i')\|_2}\Big)\Big)

where $\mathbf{p}_i'=\mathbf{p}_i-f_\omega(\mathbf{p}_i)\cdot\frac{\nabla f_\omega(\mathbf{p}_i)}{\|\nabla f_\omega(\mathbf{p}_i)\|_2}$ is the projection of $\mathbf{p}_i$ onto the zero level set, $\Pi(\mathbf{q},\mathbf{n})$ rewrites (4) by treating both the SH coefficient vector and the normal as arguments, and $\phi$ is a similarity measure.

In practice, to avoid the costly projection step, we adapt the weighting scheme from Ma et al. (2023): we use $\nabla f_\omega(\mathbf{p}_i)$ as a surrogate and weight it by the distance to the zero level set. The alignment loss then simplifies to:

(5) \mathcal{L}_{\text{align}} = \sum_{i=1}^N \beta(\mathbf{p}_i)\cdot\phi\Big(u_\theta(\mathbf{p}_i),\ \Pi\Big(u_\theta(\mathbf{p}_i),\ \frac{\nabla f_\omega(\mathbf{p}_i)}{\|\nabla f_\omega(\mathbf{p}_i)\|_2}\Big)\Big)

where $\beta(\mathbf{p}_i)=\exp(-100\cdot|f_\omega(\mathbf{p}_i)|)$, and we choose the cosine similarity $\phi(\cdot,\cdot)=(1-\langle\cdot,\cdot\rangle)$. We observe only a minor difference compared to direct projection. It is worth pointing out that the weighting is important–frames distant from the surface may deviate significantly due to the existence of singularities. Note that (5) is a function of both MLPs, so we stop the gradient propagation for $f_\omega$ and focus on aligning $u_\theta$.

The design of (5) yields two benefits. First, as discussed in Section 2.4.2, the projection (4) by Palmer et al. (2020) alleviates the need for additional parameterization or quadratic equalities, especially when such equalities are defined element-wise. Second, the choice of cosine similarity relaxes the unit-norm constraint, which is topologically unsatisfiable for a smooth octahedral field due to the existence of singularities (Solomon et al., 2017). This also makes the output of $u_\theta$ a Euclidean vector space in $\mathbb{R}^9$, whose importance we elaborate on in the next section.

However, unlike the boundary condition in the discrete setting, (5) is inherently a soft constraint. Compared to exact imposition (Sukumar and Srivastava, 2022), it is notoriously difficult to train and requires dense boundary samples to fulfill (Raissi et al., 2019; Barschkis, 2023). This is especially challenging in our case, as small hole-like structures are a common occurrence in real-world scans. Such holes resemble cylindrical volumes that require multiple singularity curves to satisfy. Compounded by the difficulty of capturing the interior of those holes with range sensors, our octahedral field cannot fully align with those regions. We discuss the implications in Section 3.3.
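For illustration, a minimal JAX sketch of (5); `u_fn` and `sdf_fn` stand for the two coordinate MLPs applied to a single point, `build_wigner_d` is an assumed helper returning $\tilde{\mathbf{R}}_{z\to n}$, and `project_n` is the projection sketched in Section 2.4.2:

```python
import jax
import jax.numpy as jnp

def align_loss(u_fn, sdf_fn, points, build_wigner_d):
    """Alignment loss, Eq. (5): fit the octahedral field to the frozen
    geometry. u_fn: R^3 -> R^9; sdf_fn: R^3 -> R."""

    def per_point(p):
        f_val, f_grad = jax.value_and_grad(sdf_fn)(p)
        # Freeze the geometry: only u_theta receives gradients in this phase.
        f_val = jax.lax.stop_gradient(f_val)
        n = jax.lax.stop_gradient(f_grad)
        n = n / (jnp.linalg.norm(n) + 1e-8)

        q = u_fn(p)
        q_proj = project_n(q, build_wigner_d(n))   # Eq. (4), unit norm
        beta = jnp.exp(-100.0 * jnp.abs(f_val))    # distance weighting
        cos_sim = jnp.dot(q / (jnp.linalg.norm(q) + 1e-8), q_proj)
        return beta * (1.0 - cos_sim)              # phi = 1 - <.,.>

    return jnp.sum(jax.vmap(per_point)(points))
```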

3.2. Smoothness with Lipschitz continuity

The next step is to encourage the octahedral field to be smooth. Since $u_\theta$ directly outputs the SH coefficient vector, its MLP output space has Euclidean topology and is also where the smoothness of octahedral frames is measured. This allows the use of efficient Lipschitz regularization (Liu et al., 2022) as a smoothness proxy.

Under a 1-Lipschitz activation function (ReLU, ELU, $\sin$, tanh, etc.), the norm of each layer's output variation with respect to its input is bounded by that layer's Lipschitz constant $c_i$:

\|u_{\theta_i}(\mathbf{x}_0)-u_{\theta_i}(\mathbf{x}_1)\|_p \leq c_i\|\mathbf{x}_0-\mathbf{x}_1\|_p,\quad c_i=\|\mathbf{W}_i\|_p,\quad i=1\dots l

where $u_{\theta_i}$ denotes the $i$th layer of $u_\theta$, $\mathbf{x}$ is its input, $\mathbf{W}_i$ is its weight matrix, and $l$ is the number of layers. LipMLP (Liu et al., 2022) initializes the Lipschitz constant per layer as the $L_\infty$ norm of its randomly initialized weight matrix, $c_i=\|\mathbf{W}_i\|_\infty$. During training, $c_i$ is used to rescale the maximum absolute row-wise sum of the weight matrix $\mathbf{W}_i$, so that $c_i=\|\mathbf{W}_i\|_\infty$ still holds. By shrinking all Lipschitz constants, we minimize the MLP's output variation with respect to its input, hence encouraging it to be smooth:

(6) \mathcal{L}_{\text{lip}} = \prod_{i=1}^l \text{softplus}(c_i)

where softplus is a reparameterization that forces the bound to be positive ($\text{softplus}(c_i)\approx c_i$ for $c_i\gg 1$), and $p=\infty$ is chosen purely for computational efficiency. This loss is highly efficient because it only needs first-order derivatives, and all weight matrices can be updated in parallel.

In our case, minimizing the Lipschitz bound also has a geometric meaning. Denote the outputs for any two inputs $\mathbf{p}_0$, $\mathbf{p}_1$ as $\mathbf{q}_0=u_\theta(\mathbf{p}_0)$, $\mathbf{q}_1=u_\theta(\mathbf{p}_1)$ respectively; we have:

\|\mathbf{q}_0-\mathbf{q}_1\|_2 \leq 3\,\|\mathbf{q}_0-\mathbf{q}_1\|_\infty \leq 3\prod_{i=1}^l c_i\,\|\mathbf{p}_0-\mathbf{p}_1\|_\infty

The Lipschitz constants bound the variation of the SH coefficient vector (1) with respect to spatial variation of the input, hence minimizing them is analogous to minimizing the discrete Dirichlet energy (2). Note that this is only possible because we output SH coefficients directly–the output spaces of most rotation representations are not Euclidean (Zhou et al., 2019), and for those that are, they are not where the smoothness of octahedral frames is measured. Thus, they require more expensive gradient norm minimization or finite-difference equivalents (Huang et al., 2021; Zhang et al., 2023).
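A sketch of the corresponding weight normalization and loss (6), following the LipMLP formulation (Liu et al., 2022); the forward pass shown is illustrative, not our exact architecture:

```python
import jax
import jax.numpy as jnp

def lipschitz_normalize(W, c):
    """Rescale rows of W so its L-inf operator norm (maximum absolute
    row-wise sum) stays bounded by softplus(c)."""
    row_sums = jnp.sum(jnp.abs(W), axis=1)
    scale = jnp.minimum(1.0, jax.nn.softplus(c) / (row_sums + 1e-8))
    return W * scale[:, None]

def lipschitz_loss(cs):
    """Eq. (6): product of per-layer Lipschitz bounds; shrinking it bounds
    the field's output variation with respect to its input."""
    return jnp.prod(jax.nn.softplus(jnp.asarray(cs)))

def lip_mlp_forward(params, cs, x):
    """params: list of (W, b) pairs; cs: matching per-layer Lipschitz
    parameters. tanh is 1-Lipschitz, keeping the layer-wise bound valid."""
    for (W, b), c in zip(params[:-1], cs[:-1]):
        x = jnp.tanh(lipschitz_normalize(W, c) @ x + b)
    W, b = params[-1]
    return lipschitz_normalize(W, cs[-1]) @ x + b
```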

Figure 5. Visualization of the DiGS and NSH fitting procedures. DiGS minimizes the divergence of the SDF gradient field, or equivalently its Laplacian. The resulting network takes much longer to fit its positional constraints, even after the complete lifting of the divergence regularization.

3.3. Octahedral prior for sharp edge regularization

The core of our method is to exploit the interpolation property of octahedral frames (Section 2.4.3) to simultaneously smooth and accentuate the sharp edges of the companion implicit geometry (Figure 4). We do so by matching the gradient of the SDF to one of the six representation vectors of the octahedral frame.

Figure 6. We sample directions on the unit sphere and evaluate their losses against the canonical octahedral frame coefficients to visualize the loss manifold. We further demonstrate the difference using a toy example–the white points are unoriented point clouds sampled on creases with dihedral angles of $30^\circ$, $90^\circ$, and $120^\circ$ respectively. Notice how the steepness of the manifold at the representation vector directions helps sharpen the crease. We use the $L_1$ loss throughout our experiments.

Recall that the alignment loss (5) is a function of both MLPs–we can simply flip the gradient propagation by fixing $u_\theta$ while updating $f_\omega$, which resembles alternating optimization. In examining the loss manifold (Figure 6), we find that the $L_1$ or $L_2$ loss works better than cosine similarity, so we adjust the loss slightly:

(7) \mathcal{L}_{\text{regularize}} = \sum_{i=1}^N \phi\Big(\frac{u_\theta(\mathbf{p}_i)}{\|u_\theta(\mathbf{p}_i)\|_2},\ \Pi\Big(u_\theta(\mathbf{p}_i),\ \frac{\nabla f_\omega(\mathbf{p}_i)}{\|\nabla f_\omega(\mathbf{p}_i)\|_2}\Big)\Big)

The normalization is employed because the projection (4) always yields a valid SH band-4 coefficient vector, hence of unit norm. Given that we work on the same samples as in the alignment phase, the normalized $u_\theta(\mathbf{p}_i)$ is unlikely to deviate from the octahedral variety. We remove the distance weighting, as we empirically find it contributes little here, possibly due to the inherent ambiguity of the regularization process–the cubic symmetry of the octahedral frame means the matched SDF gradient is not unique. This is also why our method does not improve the normal consistency of unoriented point cloud reconstruction. Conversely, we require a good initialization, such as DiGS or NSH, to realize our method's full potential.
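Sketched in JAX under the same assumed helpers as before, the only changes from the alignment phase are the flipped stop-gradient and the $L_1$ measure:

```python
import jax
import jax.numpy as jnp

def regularize_loss(u_fn, sdf_fn, points, build_wigner_d):
    """Sharp-edge regularization, Eq. (7): fix the octahedral field and pull
    the SDF gradient towards the closest frame axis."""

    def per_point(p):
        n = jax.grad(sdf_fn)(p)                    # gradients reach f_omega
        n = n / (jnp.linalg.norm(n) + 1e-8)

        q = jax.lax.stop_gradient(u_fn(p))         # freeze the guide field
        q = q / (jnp.linalg.norm(q) + 1e-8)
        q_proj = project_n(q, build_wigner_d(n))   # Eq. (4), unit norm
        return jnp.sum(jnp.abs(q - q_proj))        # phi = L1 distance

    return jnp.sum(jax.vmap(per_point)(points))
```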

3.4. Practical concerns

Note that smoothing comes at the cost of reducing the model's capacity to represent high-frequency information (Figure 5). We therefore use NSH as our backbone and, following their practice, also sample close-surface and off-surface points, denoted $\mathbf{p}_{\text{close}}$ and $\mathbf{p}_{\text{off}}$ respectively. The off-surface points are sampled uniformly in the unit bounding box, while close-surface samples are drawn from a normal distribution centered at the scan input $\mathbf{p}$, with sigma set to the maximum KNN distance for $k=51$. Our final losses are:

\mathcal{L}_{\text{total}} = \lambda_{\text{align}}\cdot\mathcal{L}_{\text{align}}(\mathbf{p}) + \lambda_{\text{regularize}}\cdot\mathcal{L}_{\text{regularize}}(\mathbf{p}) + \lambda_{\text{lip}}\cdot\mathcal{L}_{\text{lip}}
 + \lambda_{\text{NSH}}\cdot\sum\big|\det\big(H(f_\omega(\mathbf{p}_{\text{close}}))\big)\big|
 + \lambda_{\text{eikonal}}\cdot\sum\big|\,\|\nabla f_\omega(\mathbf{p}\cup\mathbf{p}_{\text{off}})\|-1\,\big|
 + \lambda_{\text{positional}}\cdot\sum|f_\omega(\mathbf{p})|
 + \lambda_{\text{off}}\cdot\sum\exp\big(-\alpha\,|f_\omega(\mathbf{p}_{\text{off}})|\big)

where the $\lambda$s are balancing weights. Note that evaluating our losses only requires the input point cloud.
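As a rough sketch (scheduling omitted, reductions illustrative), the total objective assembles the pieces sketched earlier with the weights listed below; `cs` are the LipMLP Lipschitz parameters and `alpha` is the $\alpha$ of the off-surface term:

```python
import jax
import jax.numpy as jnp

def total_loss(u_fn, sdf_fn, cs, p, p_close, p_off, lam, build_wigner_d,
               alpha=100.0):
    """Assembly of L_total (Sec. 3.4); `lam` is a dict of balancing weights."""
    grads = jax.vmap(jax.grad(sdf_fn))(jnp.concatenate([p, p_off]))
    eik = jnp.sum(jnp.abs(jnp.linalg.norm(grads, axis=-1) - 1.0))
    nsh = jnp.sum(jnp.abs(jnp.linalg.det(jax.vmap(jax.hessian(sdf_fn))(p_close))))
    pos = jnp.sum(jnp.abs(jax.vmap(sdf_fn)(p)))
    off = jnp.sum(jnp.exp(-alpha * jnp.abs(jax.vmap(sdf_fn)(p_off))))

    return (lam["align"] * align_loss(u_fn, sdf_fn, p, build_wigner_d)
            + lam["regularize"] * regularize_loss(u_fn, sdf_fn, p, build_wigner_d)
            + lam["lip"] * lipschitz_loss(cs)
            + lam["NSH"] * nsh + lam["eikonal"] * eik
            + lam["positional"] * pos + lam["off"] * off)
```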

We fix $\lambda_{\text{NSH}}=3$, $\lambda_{\text{eikonal}}=50$, $\lambda_{\text{off}}=100$, $\lambda_{\text{align}}=100$, $\lambda_{\text{regularize}}=10$, $\lambda_{\text{lip}}=10^{-6}$, and adopt two scheduling schemes:

  • Low-noise: Set $\lambda_{\text{positional}}=7000$, anneal $\lambda_{\text{NSH}}$ to $3\times10^{-4}$ after $10\%$ of the training steps, start alignment and smoothing at $40\%$, and regularization at $60\%$.

  • High-noise: Set $\lambda_{\text{positional}}=3500$, anneal $\lambda_{\text{NSH}}$ to $3\times10^{-3}$ after $10\%$ of the training steps, start alignment and smoothing at $20\%$, and regularization at $40\%$.

This might seem counterintuitive because we change the schedule of the losses rather than adjusting their weights. The rationale is that we want our octahedral field to capture the geometry before it is fully corrupted, while not overfitting the overly smooth initialization. Given that small holes are often insufficiently observed during scanning and emerge late in the fitting process (Figure 5), we postpone the schedule to capture those small details. However, when the noise level is high, the geometry is corrupted so quickly that we cannot afford to wait, so we start our alignment shortly after the annealing stage.

Note that $\lambda_{\text{positional}}$ and $\lambda_{\text{NSH}}$ in the low-noise scheme are the default values of the original implementation; the tuning in our high-noise scheme aims to slow down the noise corruption. The resulting effect is illustrated as the fine-tuned variant in Figure 2.

Following the practice of LipMLP, we spatially scale the input by $100$–equivalent to premultiplying the first weight matrix by the same amount (Sitzmann et al., 2020)–to speed up convergence, so that our octahedral field can take its ”snapshot” as quickly as possible: our method cannot fix an already corrupted geometry.
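For reference, the close- and off-surface sampling of this section can be sketched as follows (one reading of the sigma choice: per point, the distance to its $k$-th nearest neighbor; brute-force KNN is shown for clarity, a KD-tree is preferable at scale):

```python
import jax
import jax.numpy as jnp

def sample_training_points(key, scan, n_samples, k=51):
    """Close-surface samples: Gaussian-perturbed scan points with sigma set
    by the k-NN distance; off-surface samples: uniform in the bounding box
    (assumed [-1, 1]^3 here, following the scan normalization)."""
    k1, k2, k3 = jax.random.split(key, 3)

    # Per-point sigma: distance to the k-th nearest neighbor (column 0 is self).
    d2 = jnp.sum((scan[:, None, :] - scan[None, :, :]) ** 2, axis=-1)
    sigma = jnp.sqrt(jnp.sort(d2, axis=1)[:, k])

    idx = jax.random.randint(k1, (n_samples,), 0, scan.shape[0])
    p_close = scan[idx] + sigma[idx, None] * jax.random.normal(k2, (n_samples, 3))
    p_off = jax.random.uniform(k3, (n_samples, 3), minval=-1.0, maxval=1.0)
    return p_close, p_off
```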

4. Experiment

We implement our method using JAX (Bradbury et al., 2018) (Equinox (Kidger and Garcia, 2021) for networks, optax (DeepMind et al., 2020) for optimizers). We use NSH (Wang et al., 2023) as our backbone method and follow their training setup: a 4-layer SIREN with 256 units, each scan normalized to the unit box, $15000$ samples each for the input, off-surface, and close-surface point clouds, and fitting over $10000$ iterations.

We use the Adam optimizer with a learning rate of $5\times10^{-5}$ for both the SIREN and the LipMLP, and extract the surface using Marching Cubes (MC) on a $512^3$ voxel grid. The full implementation is available at: https://github.com/Ankbzpx/frame-field.

4.1. Metrics

For quantitative evaluation, we follow DiGS (Ben-Shabat et al., 2023) and report the Chamfer distance $(\times10^3)$ (Fan et al., 2016) and Hausdorff distance $(\times10^2)$ over 1M points randomly sampled on the surface or point cloud, depending on the method's output. Given that neural implicit representations are susceptible to floating artifacts (Figure 8), we additionally report the F-score (Knapitsch et al., 2017). It clamps the closest matching distance at a threshold and reports a percentage, and is hence less sensitive to mismatches from surplus parts. Following Tatarchenko et al. (2019), we use a distance threshold of $0.5\%$. Note that we do not postprocess the output of any method and leave the extracted meshes or point clouds as is.
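A brute-force sketch of the F-score computation (chunking or a KD-tree is needed at the 1M-point scale used here):

```python
import jax.numpy as jnp

def f_score(pred, gt, threshold=0.005):
    """F-score (Knapitsch et al., 2017) at threshold 0.5% of the (normalized)
    extent: harmonic mean of precision (pred -> gt) and recall (gt -> pred),
    reported as a percentage."""
    d = jnp.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = jnp.mean(jnp.min(d, axis=1) < threshold)
    recall = jnp.mean(jnp.min(d, axis=0) < threshold)
    return 100.0 * 2.0 * precision * recall / (precision + recall + 1e-8)
```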

4.2. Surface Reconstruction Benchmark

We first compare our method with existing approaches based on neural implicit fitting, namely DGP (Williams et al., 2019), SIREN (Sitzmann et al., 2020), DiGS (Ben-Shabat et al., 2023), and NSH (Wang et al., 2023). We use the Surface Reconstruction Benchmark (SRB) (Berger et al., 2013) dataset, which consists of 5 scans that exhibit triangulation-based scanning patterns. We fit the full point cloud for all methods and use our low-noise scheme. DGP (Williams et al., 2019) outputs a dense denoised point cloud, which we mesh with SPSR for visualization purposes.

Our method achieves the best F-score and the second-best Chamfer distance with the smallest standard deviation (Table 1), in line with our cleaner reconstruction quality (Figure 7).

Figure 7. Qualitative results on SRB.

Table 1. Quantitative results on SRB (Berger et al., 2013). Bold indicates the best score, underline the second best. Each metric is reported as mean / std.

Chamfer ↓ (mean / std)  Hausdorff ↓ (mean / std)  F-score ↑ (mean / std)
DGP 0.227 / 0.073  5.194 / 2.291  91.744 / 2.926
SIREN 0.316 / 0.262  7.121 / 7.823  89.603 / 8.003
DiGS 0.195 / 0.071  3.843 / 2.274  93.057 / 4.040
NSH 0.200 / 0.075  3.867 / 2.885  92.725 / 3.730
Ours 0.196 / 0.069  3.923 / 2.676  93.088 / 4.028

4.3. ABC and Thingi10k

We further evaluate our method on two widely used datasets, ABC (Koch et al., 2019) and Thingi10k (Zhou and Jacobson, 2016). The former consists of CAD models with sharp edges, whereas the latter contains more general shapes. We use the 100-model test split from Points2Surf (Erler et al., 2020) for both datasets and use Blensor (Gschwandtner et al., 2011) to simulate the time-of-flight (ToF) scanning process. Specifically, we set the sensor resolution to $176\times144$ and the focal length to 10mm, and scan each object spherically from 30 views, resulting in point clouds ranging from 20k to 100k points. We generate scanning data at two noise levels, $\mathcal{N}(0,0.002L)$ and $\mathcal{N}(0,0.01L)$ (so $400$ scans in total), where $L$ is the length of the maximum edge of the model's bounding box. Our two schemes in Section 3.4 are tuned for these two noise levels respectively, although we have shown that the low-noise scheme works for SRB as well. For reference, data-driven methods (Erler et al., 2020; Huang et al., 2023) typically randomize noise with sigma between $0.01L$ and $0.05L$, so our higher noise level is their lower bound.

We compare with 9 methods: 2 axiomatic (APSS (Guennebaud and Gross, 2007), SPSR (Kazhdan and Hoppe, 2013)), 1 resampling (EAR (Huang et al., 2013)), 2 patch-based denoising (GLR (Zeng et al., 2019), LP (Wei et al., 2023a)), 3 implicit fitting (SIREN (Sitzmann et al., 2020), DiGS (Ben-Shabat et al., 2023), NSH (Wang et al., 2023)), and 1 with a data prior (NKSR (Huang et al., 2023)). For point cloud visualization, we use SPSR if the method outputs normals; otherwise we use Advancing Front (Zienkiewicz et al., 2013). The full results are shown in Figures 10 and 11, and the quantitative evaluation is provided in the supplementary. Although our method shines the most on CAD models, it also succeeds in denoising more general shapes such as sculptures. However, at noise sigma $0.01L$, our method sometimes outputs a low-poly-like reconstruction–the likely cause is that, when the noise level is large, the implicit geometry evolves quickly, and the constantly changing surface normals make the octahedral field hard to converge.

In our experiments, we find that NSH, and consequently our method, fails to reconstruct three cases (Figure 8), which give erroneously large closest matching distances ($50\times$ larger than the second-worst case). We therefore report both the original metrics and those with the failure cases removed.

5. Limitations and future work

The main limitation of our method is the ambiguity of its regularization–the gradient of the SDF is constrained to match any one of the three axes of the octahedral frame, so it has a tendency to close or distort open holes (Figure 9). This can be alleviated with a lower regularization weight, but at the cost of less effective optimization. Our method requires careful tuning with respect to the noise level and relies on a faithful normal initialization; when the backbone method fails, our regularization fails as well (Figure 8). Although our method has the potential to scale up to larger scenes, the choice of SDF prohibits its scalability–such scans contain thin and one-sided structures, which our method already struggles against (Figure 8).

Figure 8. Our method relies on a backbone method for normal initialization. Thus, we are prone to the same limitations as the backbone: surplus geometry (left) and difficulty with noisy thin structures.

Figure 9. Our method requires sufficient samples to fit a normal-aligned octahedral field. Notice the difference in octahedral frame distribution at the cylinder handle versus at the holes. When the scanning noise is large, our method can distort the shape of the hole.

6. Conclusion

We introduce the neural octahedral field, a guiding field that, when paired with a neural implicit representation, simultaneously smooths it while emphasizing its sharp edges. We design an effective normal alignment loss and utilize Lipschitz regularization to encourage field smoothness. We further examine the loss manifold with respect to normal directions, draw its connection to regularization quality, and propose the $L_1$ distance to improve sharp edge awareness. We extensively compare our method with existing baselines to demonstrate its effectiveness in simultaneous smoothing and sharp edge regularization. In future work, we would like to explore the potential of the neural octahedral field for hexahedral meshing.

References

  • Alexa and Adamson (2004) Marc Alexa and Anders Adamson. 2004. On normals and projection operators for surfaces defined by point sets. In Proceedings of the First Eurographics Conference on Point-Based Graphics (Switzerland) (SPBG’04). Eurographics Association, Goslar, DEU, 149–155.
  • Alexa et al. (2001) M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C.T. Silva. 2001. Point set surfaces. In Proceedings Visualization, 2001. VIS ’01. 21–29, 537. https://doi.org/10.1109/VISUAL.2001.964489
  • Anandkumar et al. (2012) Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. 2012. Tensor decompositions for learning latent variable models. arXiv:1210.7559 [cs.LG]
  • Atkinson and Riani (2012) A. Atkinson and M. Riani. 2012. Robust Diagnostic Regression Analysis. Springer New York. https://books.google.com.hk/books?id=sZ3SBwAAQBAJ
  • Atzmon and Lipman (2020) Matan Atzmon and Yaron Lipman. 2020. SAL: Sign Agnostic Learning of Shapes from Raw Data. arXiv:1911.10414 [cs.CV]
  • Atzmon and Lipman (2021) Matan Atzmon and Yaron Lipman. 2021. SALD: Sign Agnostic Learning with Derivatives. In 9th International Conference on Learning Representations, ICLR 2021.
  • Barschkis (2023) Sebastian Barschkis. 2023. Exact and soft boundary conditions in Physics-Informed Neural Networks for the Variable Coefficient Poisson equation. arXiv:2310.02548 [cs.LG]
  • Ben-Shabat et al. (2023) Yizhak Ben-Shabat, Chamin Hewa Koneputugodage, and Stephen Gould. 2023. DiGS : Divergence guided shape implicit neural representation for unoriented point clouds. arXiv:2106.10811 [cs.CV]
  • Berger et al. (2013) Matthew Berger, Joshua A Levine, Luis Gustavo Nonato, Gabriel Taubin, and Claudio T Silva. 2013. A benchmark for surface reconstruction. ACM Transactions on Graphics (TOG) 32, 2 (2013), 1–17.
  • Boralevi et al. (2017) Ada Boralevi, Jan Draisma, Emil Horobeţ, and Elina Robeva. 2017. Orthogonal and unitary tensor decomposition from an algebraic perspective. Israel Journal of Mathematics 222, 1 (Oct. 2017), 223–260. https://doi.org/10.1007/s11856-017-1588-6
  • Bradbury et al. (2018) James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018. JAX: composable transformations of Python+NumPy programs. http://github.com/google/jax
  • Chemin et al. (2019) Alexandre Chemin, François Henrotte, Jean-François Remacle, and Jean Van Schaftingen. 2019. Representing Three-Dimensional Cross Fields Using Fourth Order Tensors. Springer International Publishing, 89–108. https://doi.org/10.1007/978-3-030-13992-6_6
  • DeepMind et al. (2020) DeepMind, Igor Babuschkin, Kate Baumli, Alison Bell, Surya Bhupatiraju, Jake Bruce, Peter Buchlovsky, David Budden, Trevor Cai, Aidan Clark, Ivo Danihelka, Antoine Dedieu, Claudio Fantacci, Jonathan Godwin, Chris Jones, Ross Hemsley, Tom Hennigan, Matteo Hessel, Shaobo Hou, Steven Kapturowski, Thomas Keck, Iurii Kemaev, Michael King, Markus Kunesch, Lena Martens, Hamza Merzic, Vladimir Mikulik, Tamara Norman, George Papamakarios, John Quan, Roman Ring, Francisco Ruiz, Alvaro Sanchez, Laurent Sartran, Rosalia Schneider, Eren Sezener, Stephen Spencer, Srivatsan Srinivasan, Miloš Stanojević, Wojciech Stokowiec, Luyu Wang, Guangyao Zhou, and Fabio Viola. 2020. The DeepMind JAX Ecosystem. http://github.com/google-deepmind
  • Desobry et al. (2021) David Desobry, Yoann Coudert-Osmont, Etienne Corman, Nicolas Ray, and Dmitry Sokolov. 2021. Designing 2D and 3D Non-Orthogonal Frame Fields. Computer-Aided Design 139 (2021), 103081. https://doi.org/10.1016/j.cad.2021.103081
  • Erler et al. (2020) Philipp Erler, Paul Guerrero, Stefan Ohrhallinger, Niloy J. Mitra, and Michael Wimmer. 2020. Points2Surf Learning Implicit Surfaces from Point Clouds. Springer International Publishing, 108–124. https://doi.org/10.1007/978-3-030-58558-7_7
  • Fan et al. (2016) Haoqiang Fan, Hao Su, and Leonidas Guibas. 2016. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. arXiv:1612.00603 [cs.CV]
  • Fischler and Bolles (1981) Martin A. Fischler and Robert C. Bolles. 1981. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6 (jun 1981), 381–395. https://doi.org/10.1145/358669.358692
  • Fleishman et al. (2005) Shachar Fleishman, Daniel Cohen-Or, and Cláudio T. Silva. 2005. Robust moving least-squares fitting with sharp features. ACM Trans. Graph. 24, 3 (jul 2005), 544–552. https://doi.org/10.1145/1073204.1073227
  • Green (2003) Robin Green. 2003. Spherical harmonic lighting: The gritty details. In Archives of the game developers conference, Vol. 56. 4.
  • Gropp et al. (2020) Amos Gropp, Lior Yariv, Niv Haim, Matan Atzmon, and Yaron Lipman. 2020. Implicit Geometric Regularization for Learning Shapes. In Proceedings of Machine Learning and Systems 2020. 3569–3579.
  • Gschwandtner et al. (2011) Michael Gschwandtner, Roland Kwitt, Andreas Uhl, and Wolfgang Pree. 2011. Blensor: Blender sensor simulation toolbox. In Advances in Visual Computing: 7th International Symposium, ISVC 2011, Las Vegas, NV, USA, September 26-28, 2011. Proceedings, Part II 7. Springer, 199–208.
  • Guennebaud and Gross (2007) Gaël Guennebaud and Markus Gross. 2007. Algebraic point set surfaces. ACM Trans. Graph. 26, 3 (jul 2007), 23–es. https://doi.org/10.1145/1276377.1276406
  • Huang et al. (2013) Hui Huang, Shihao Wu, Minglun Gong, Daniel Cohen-Or, Uri Ascher, and Hao (Richard) Zhang. 2013. Edge-aware point set resampling. ACM Trans. Graph. 32, 1, Article 9 (feb 2013), 12 pages. https://doi.org/10.1145/2421636.2421645
  • Huang et al. (2023) Jiahui Huang, Zan Gojcic, Matan Atzmon, Or Litany, Sanja Fidler, and Francis Williams. 2023. Neural Kernel Surface Reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4369–4379.
  • Huang et al. (2011) Jin Huang, Yiying Tong, Hongyu Wei, and Hujun Bao. 2011. Boundary Aligned Smooth 3D Cross-Frame Field. ACM Trans. Graph. 30, 6 (dec 2011), 1–8. https://doi.org/10.1145/2070781.2024177
  • Huang et al. (2021) Qixing Huang, Xiangru Huang, Bo Sun, Zaiwei Zhang, Junfeng Jiang, and Chandrajit Bajaj. 2021. ARAPReg: An As-Rigid-As Possible Regularization Loss for Learning Deformable Shape Generators. arXiv:2108.09432 [cs.CV]
  • Huang et al. (2019) Zhiyang Huang, Nathan Carr, and Tao Ju. 2019. Variational implicit point set surfaces. ACM Trans. Graph. 38, 4, Article 124 (jul 2019), 13 pages. https://doi.org/10.1145/3306346.3322994
  • Huang and Ju (2016) Zhiyang Huang and Tao Ju. 2016. Extrinsically smooth direction fields. Computers & Graphics 58 (2016), 109–117.
  • Huang et al. (2022) Zhangjin Huang, Yuxin Wen, Zihao Wang, Jinjuan Ren, and Kui Jia. 2022. Surface Reconstruction from Point Clouds: A Survey and a Benchmark. arXiv:2205.02413 [cs.CV]
  • Jakob et al. (2015) Wenzel Jakob, Marco Tarini, Daniele Panozzo, Olga Sorkine-Hornung, et al. 2015. Instant field-aligned meshes. ACM Trans. Graph. 34, 6 (2015), 189–1.
  • Kazhdan et al. (2006) Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe. 2006. Poisson surface reconstruction. In Proceedings of the fourth Eurographics symposium on Geometry processing, Vol. 7.
  • Kazhdan and Hoppe (2013) Michael Kazhdan and Hugues Hoppe. 2013. Screened poisson surface reconstruction. ACM Transactions on Graphics (ToG) 32, 3 (2013), 1–13.
  • Kidger and Garcia (2021) Patrick Kidger and Cristian Garcia. 2021. Equinox: neural networks in JAX via callable PyTrees and filtered transformations. Differentiable Programming workshop at Neural Information Processing Systems 2021 (2021).
  • Knapitsch et al. (2017) Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. 2017. Tanks and Temples: Benchmarking Large-Scale Scene Reconstruction. ACM Transactions on Graphics 36, 4 (2017).
  • Knöppel et al. (2013) Felix Knöppel, Keenan Crane, Ulrich Pinkall, and Peter Schröder. 2013. Globally optimal direction fields. ACM Transactions on Graphics (ToG) 32, 4 (2013), 1–10.
  • Koch et al. (2019) Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. 2019. ABC: A Big CAD Model Dataset For Geometric Deep Learning. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Lathauwer et al. (1995) Lieven De Lathauwer, Pierre Comon, Bart De Moor, and Joos Vandewalle. 1995. Higher-order power method - application in independent component analysis. https://api.semanticscholar.org/CorpusID:115691434
  • Levin (2004) David Levin. 2004. Mesh-independent surface interpolation. In Geometric modeling for scientific visualization. Springer, 37–49.
  • Liu et al. (2022) Hsueh-Ti Derek Liu, Francis Williams, Alec Jacobson, Sanja Fidler, and Or Litany. 2022. Learning Smooth Neural Functions via Lipschitz Regularization. https://doi.org/10.48550/ARXIV.2202.08345
  • Ma et al. (2023) Baorui Ma, Junsheng Zhou, Yu-Shen Liu, and Zhizhong Han. 2023. Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment. arXiv:2305.11601 [cs.CV]
  • Mehta et al. (2022) Ishit Mehta, Manmohan Chandraker, and Ravi Ramamoorthi. 2022. A Level Set Theory for Neural Implicit Evolution under Explicit Flows. arXiv:2204.07159 [cs.CV]
  • Mescheder et al. (2019) Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. 2019. Occupancy Networks: Learning 3D Reconstruction in Function Space. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
  • Mildenhall et al. (2021) Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. 2021. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 65, 1 (2021), 99–106.
  • Palmer et al. (2020) David Palmer, David Bommes, and Justin Solomon. 2020. Algebraic Representations for Volumetric Frame Fields. ACM Trans. Graph. 39, 2, Article 16 (apr 2020), 17 pages. https://doi.org/10.1145/3366786
  • Park et al. (2019) Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. 2019. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 165–174.
  • Park et al. (2023) Yesom Park, Taekyung Lee, Jooyoung Hahn, and Myungjoo Kang. 2023. pp-Poisson surface reconstruction in curl-free flow from point clouds. arXiv:2310.20095 [cs.CV]
  • Raissi et al. (2019) Maziar Raissi, Paris Perdikaris, and George E Karniadakis. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378 (2019), 686–707.
  • Ray et al. (2016) Nicolas Ray, Dmitry Sokolov, and Bruno Lévy. 2016. Practical 3D frame field generation. ACM Transactions on Graphics (TOG) 35, 6 (2016), 1–9.
  • Remelli et al. (2020) Edoardo Remelli, Artem Lukoianov, Stephan R. Richter, Benoît Guillard, Timur Bagautdinov, Pierre Baque, and Pascal Fua. 2020. MeshSDF: Differentiable Iso-Surface Extraction. arXiv:2006.03997 [cs.CV]
  • Robeva (2016) Elina Robeva. 2016. Orthogonal Decomposition of Symmetric Tensors. SIAM J. Matrix Anal. Appl. 37, 1 (Jan. 2016), 86–102. https://doi.org/10.1137/140989340
  • Sarkar et al. (2018) Kripasindhu Sarkar, Florian Bernard, Kiran Varanasi, Christian Theobalt, and Didier Stricker. 2018. Structured Low-Rank Matrix Factorization for Point-Cloud Denoising. In 2018 International Conference on 3D Vision (3DV). 444–453. https://doi.org/10.1109/3DV.2018.00058
  • Sitzmann et al. (2020) Vincent Sitzmann, Julien N.P. Martel, Alexander W. Bergman, David B. Lindell, and Gordon Wetzstein. 2020. Implicit Neural Representations with Periodic Activation Functions. In arXiv.
  • Sloan (2008) Peter-Pike Sloan. 2008. Stupid spherical harmonics (sh) tricks. In Game developers conference, Vol. 9. 42.
  • Solomon et al. (2017) Justin Solomon, Amir Vaxman, and David Bommes. 2017. Boundary Element Octahedral Fields in Volumes. ACM Trans. Graph. 36, 4, Article 114b (jul 2017), 16 pages. https://doi.org/10.1145/3072959.3065254
  • Sukumar and Srivastava (2022) N. Sukumar and Ankit Srivastava. 2022. Exact imposition of boundary conditions with distance functions in physics-informed deep neural networks. Computer Methods in Applied Mechanics and Engineering 389 (Feb. 2022), 114333. https://doi.org/10.1016/j.cma.2021.114333
  • Sulzer et al. (2024) Raphael Sulzer, Renaud Marlet, Bruno Vallet, and Loic Landrieu. 2024. A Survey and Benchmark of Automatic Surface Reconstruction from Point Clouds. arXiv:2301.13656 [cs.CV]
  • Tatarchenko et al. (2019) Maxim Tatarchenko, Stephan R. Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, and Thomas Brox. 2019. What Do Single-view 3D Reconstruction Networks Learn? CVPR.
  • Wang et al. (2023) Zixiong Wang, Yunxiao Zhang, Rui Xu, Fan Zhang, Pengshuai Wang, Shuangmin Chen, Shiqing Xin, Wenping Wang, and Changhe Tu. 2023. Neural-Singular-Hessian: Implicit Neural Representation of Unoriented Point Clouds by Enforcing Singular Hessian. arXiv:2309.01793 [cs.CV]
  • Wei et al. (2023b) Guangshun Wei, Hao Pan, Shaojie Zhuang, Yuanfeng Zhou, and Changjian Li. 2023b. iPUNet:Iterative Cross Field Guided Point Cloud Upsampling. arXiv:2310.09092 [cs.CV]
  • Wei et al. (2023a) Jiayi Wei, Jiong Chen, Damien Rohmer, Pooran Memari, and Mathieu Desbrun. 2023a. Robust Pointset Denoising of Piecewise-Smooth Surfaces through Line Processes. Computer Graphics Forum 42, 2 (2023), 175–189. https://doi.org/10.1111/cgf.14752 arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/cgf.14752
  • Williams et al. (2019) Francis Williams, Teseo Schneider, Claudio Silva, Denis Zorin, Joan Bruna, and Daniele Panozzo. 2019. Deep Geometric Prior for Surface Reconstruction. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr.2019.01037
  • Xie et al. (2021) Yiheng Xie, Towaki Takikawa, Shunsuke Saito, Or Litany, Shiqin Yan, Numair Khan, Federico Tombari, James Tompkin, Vincent Sitzmann, and Srinath Sridhar. 2021. Neural Fields in Visual Computing and Beyond. CoRR abs/2111.11426 (2021). arXiv:2111.11426 https://arxiv.org/abs/2111.11426
  • Yang et al. (2021) Guandao Yang, Serge Belongie, Bharath Hariharan, and Vladlen Koltun. 2021. Geometry Processing with Neural Fields. In Thirty-Fifth Conference on Neural Information Processing Systems.
  • Zeng et al. (2019) Jin Zeng, Gene Cheung, Michael Ng, Jiahao Pang, and Cheng Yang. 2019. 3D Point Cloud Denoising using Graph Laplacian Regularization of a Low Dimensional Manifold Model. arXiv:1803.07252 [cs.CV]
  • Zhang et al. (2023) Baowen Zhang, Jiahe Li, Xiaoming Deng, Yinda Zhang, Cuixia Ma, and Hongan Wang. 2023. Self-supervised Learning of Implicit Shape Representation with Dense Correspondence for Deformable Objects. arXiv:2308.12590 [cs.CV]
  • Zhang et al. (2020) Paul Zhang, Josh Vekhter, Edward Chien, David Bommes, Etienne Vouga, and Justin Solomon. 2020. Octahedral frames for feature-aligned cross fields. ACM Transactions on Graphics (TOG) 39, 3 (2020), 1–13.
  • Zhou and Jacobson (2016) Qingnan Zhou and Alec Jacobson. 2016. Thingi10K: A Dataset of 10,000 3D-Printing Models. arXiv:1605.04797 [cs.GR]
  • Zhou et al. (2019) Yi Zhou, Connelly Barnes, Jingwan Lu, Jimei Yang, and Hao Li. 2019. On the Continuity of Rotation Representations in Neural Networks. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr.2019.00589
  • Zienkiewicz et al. (2013) O.C. Zienkiewicz, R.L. Taylor, and J.Z. Zhu. 2013. Chapter 17 - Automatic Mesh Generation. In The Finite Element Method: its Basis and Fundamentals (Seventh Edition) (seventh edition ed.), O.C. Zienkiewicz, R.L. Taylor, and J.Z. Zhu (Eds.). Butterworth-Heinemann, Oxford, 573–640. https://doi.org/10.1016/B978-1-85617-633-0.00017-4
  • Öztireli et al. (2009) A. C. Öztireli, G. Guennebaud, and M. Gross. 2009. Feature Preserving Point Set Surfaces based on Non-Linear Kernel Regression. Computer Graphics Forum 28, 2 (2009), 493–501. https://doi.org/10.1111/j.1467-8659.2009.01388.x arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8659.2009.01388.x
Figure 10. Qualitative comparison with noise level 0.002L.

Figure 11. Qualitative comparison with noise level 0.01L.

Table 2. Results for the two noise levels are separated by a slash, with the left indicating noise $\sigma=0.002L$ and the right $\sigma=0.01L$. Methods marked with * require normal input (PCA normals with KNN $k=16$, filtered by scanning view directions), and methods marked with † have failure cases removed. Bold indicates the best scores, while underline indicates the best scores among methods that do not need normal input.
|  | ABC (Koch et al., 2019) |  |  | Thingi10k (Zhou and Jacobson, 2016) |  |  |
| Method | Chamfer ↓ | Hausdorff ↓ | F-score ↑ | Chamfer ↓ | Hausdorff ↓ | F-score ↑ |
| APSS* (Guennebaud and Gross, 2007) | 2.333 / 4.863 | 5.043 / 9.078 | 94.693 / 70.838 | 1.346 / 4.333 | 3.367 / 7.960 | 97.542 / 68.216 |
| SPSR* (Kazhdan and Hoppe, 2013) | 3.306 / 3.999 | 6.091 / 6.162 | 91.756 / 89.185 | 1.892 / 2.765 | 3.939 / 4.338 | 96.437 / 93.299 |
| EAR* (Huang et al., 2013) | 4.066 / 4.375 | 5.785 / 6.133 | 84.405 / 81.057 | 3.590 / 3.843 | 3.541 / 4.393 | 80.505 / 78.195 |
| NKSR* (Huang et al., 2023) | 2.929 / 3.600 | 6.579 / 7.152 | 93.636 / 90.184 | 1.594 / 2.338 | 5.127 / 6.696 | 97.082 / 93.307 |
| GLR (Zeng et al., 2019) | 4.026 / 4.965 | 5.768 / 6.233 | 82.602 / 75.153 | 2.774 / 3.603 | 3.236 / 3.756 | 87.909 / 79.139 |
| LP (Wei et al., 2023a) | 4.601 / 5.917 | 5.634 / 6.132 | 76.228 / 60.472 | 3.464 / 4.514 | 3.173 / 3.810 | 78.731 / 61.982 |
| SIREN (Sitzmann et al., 2020) | 8.835 / 6.427 | 14.805 / 8.978 | 87.537 / 58.945 | 11.019 / 5.710 | 18.854 / 9.071 | 85.593 / 55.810 |
| DiGS (Ben-Shabat et al., 2023) | 3.734 / 6.590 | 10.640 / 11.484 | 93.885 / 58.463 | 2.232 / 6.046 | 9.124 / 8.042 | 96.775 / 51.056 |
| NSH (Wang et al., 2023) | 5.755 / 5.420 | 8.847 / 7.516 | 92.391 / 64.614 | 3.792 / 5.119 | 6.356 / 7.386 | 96.064 / 59.365 |
| Ours | 5.393 / 7.062 | 7.861 / 9.247 | 93.496 / 87.831 | 3.804 / 3.798 | 6.051 / 5.330 | 96.419 / 90.663 |
| NSH† (Wang et al., 2023) | 4.426 / 5.464 | 7.893 / 7.598 | 93.173 / 64.227 | 3.407 / 5.093 | 6.074 / 7.236 | 96.604 / 59.387 |
| Ours† | 3.274 / 4.124 | 6.540 / 7.542 | 94.416 / 89.133 | 2.650 / 3.012 | 5.422 / 4.792 | 97.229 / 91.217 |

Appendix A Octahedral frame background

This section reviews the SH-parameterized octahedral field and its functional and explicit vector representations. Experienced readers may refer to the next section for implementation details.

An octahedral frame can be represented by 3 orthogonal unit vectors and their opposites:

\{\mathbf{v}_1, -\mathbf{v}_1, \mathbf{v}_2, -\mathbf{v}_2, \mathbf{v}_3, -\mathbf{v}_3\} \subset \mathbb{R}^3

Such a frame exhibits cubic symmetry: permuting or flipping the three basis vectors yields an equivalent frame. Measuring the difference between two frames therefore requires matching their representation vectors, which turns optimizing the smoothness of an octahedral field into a mixed-integer programming problem.
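To make this concrete, the following self-contained NumPy sketch (with illustrative helper names of our own) measures the distance between two frames, given as rotation matrices whose columns are the frame axes, by brute-force matching over all 48 signed axis permutations. It is this discrete inner minimization that makes naive smoothness optimization combinatorial:

```python
import itertools
import numpy as np

def signed_permutations():
    """All 48 signed 3x3 permutation matrices, i.e. the symmetries
    that map an octahedral frame to an equivalent one."""
    mats = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product([1.0, -1.0], repeat=3):
            m = np.zeros((3, 3))
            for row, (col, s) in enumerate(zip(perm, signs)):
                m[row, col] = s
            mats.append(m)
    return mats

def frame_distance(Ra, Rb):
    """Distance between two frames, minimized over all 48 matchings."""
    return min(np.linalg.norm(Ra - Rb @ P) for P in signed_permutations())
```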

A.1. SH functional representation

Huang et al. (2011) observe that any octahedral frame can be equivalently represented as a rotation matrix $\mathbf{R} \in SO(3)$ associated with the canonical frame of standard basis vectors $\{\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z\}$, where $\mathbf{e}_x = [1, 0, 0]^T$, $\mathbf{e}_y = [0, 1, 0]^T$, $\mathbf{e}_z = [0, 0, 1]^T \in \mathbb{R}^3$.

To avoid explicit matching, Huang et al. propose to measure the difference between two octahedral frames by the spherical integral of a descriptor function that is invariant to cubic symmetry. Specifically, for the canonical frame, they design a function over the unit sphere, $F: \mathcal{S}^2 \to \mathbb{R}$, that takes the same value under sign changes or re-orderings of its representation vectors $\{\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z\}$:

F(\mathbf{s}) = (\mathbf{s} \cdot \mathbf{e}_x)^2 (\mathbf{s} \cdot \mathbf{e}_y)^2 + (\mathbf{s} \cdot \mathbf{e}_y)^2 (\mathbf{s} \cdot \mathbf{e}_z)^2 + (\mathbf{s} \cdot \mathbf{e}_z)^2 (\mathbf{s} \cdot \mathbf{e}_x)^2

or in Cartesian coordinates:

(8) F(\mathbf{s}) = x^2 y^2 + y^2 z^2 + z^2 x^2, \quad \mathbf{s} = [x, y, z]^T \in \mathcal{S}^2

A general frame can then be described by $F(\mathbf{R}^T \mathbf{s})$, obtained by rotating either the input or the representation vectors. Both interpretations give the same result, since $(\mathbf{R}^T \mathbf{s}) \cdot \mathbf{e}_i = \mathbf{s} \cdot (\mathbf{R} \mathbf{e}_i)$.
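A quick numerical sanity check of this invariance (a self-contained NumPy sketch; `descriptor` is our own illustrative name):

```python
import numpy as np

def descriptor(R, s):
    """F(R^T s) = x^2 y^2 + y^2 z^2 + z^2 x^2 in the frame's coordinates."""
    x, y, z = R.T @ s
    return x**2 * y**2 + y**2 * z**2 + z**2 * x**2

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal frame
s = rng.normal(size=3)
s /= np.linalg.norm(s)                         # random unit direction

# Swapping two axes or flipping one leaves the descriptor unchanged.
swap_xy = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 1.]])
flip_x = np.diag([-1., 1., 1.])
assert np.isclose(descriptor(Q, s), descriptor(Q @ swap_xy, s))
assert np.isclose(descriptor(Q, s), descriptor(Q @ flip_x, s))
```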

A.1.1. Smoothness

An efficient way to evaluate the functional integral over the unit sphere is to project the function onto the spherical harmonics (SH) basis. Let the projection of the reference descriptor function be:

F(\mathbf{s}) = \sum_{l, m} c_l^m Y_l^m(\mathbf{s}) = \mathbf{y}(\mathbf{s})^T \mathbf{c}

where $Y_l^m$ is the SH basis function of band $l$ and order $m$, $c_l^m$ are the corresponding coefficients, and $\mathbf{y}$, $\mathbf{c}$ are their vector forms. The rotational invariance of SH (Sloan, 2008) guarantees that the rotated version, obtained by rotating the input, is projected onto the same bands. More importantly, its coefficients can be obtained as a linear combination of the coefficients of its unrotated counterpart (Green, 2003):

F(\mathbf{R}^T \mathbf{s}) = \mathbf{y}(\mathbf{R}^T \mathbf{s})^T \mathbf{c} = \mathbf{y}(\mathbf{s})^T (\tilde{\mathbf{R}} \mathbf{c})

where $\tilde{\mathbf{R}}$ is the Wigner D-matrix induced by $\mathbf{R}$, of the same dimension as $\mathbf{c}$.

Thus, for two octahedral frames with rotation matrices $\mathbf{R}_a$ and $\mathbf{R}_b$, let $\mathbf{c}_a = \tilde{\mathbf{R}}_a \mathbf{c}$ and $\mathbf{c}_b = \tilde{\mathbf{R}}_b \mathbf{c}$. Their functional difference can then be efficiently measured by the $L_2$ norm of the difference of their SH coefficients, thanks to the orthonormality of the SH basis (Green, 2003):

\begin{align*}
\int_{\mathcal{S}^2} \left(F(\mathbf{R}_a^T \mathbf{s}) - F(\mathbf{R}_b^T \mathbf{s})\right)^2 d\mathbf{s} &= \int_{\mathcal{S}^2} \left(\mathbf{y}(\mathbf{s})^T (\mathbf{c}_a - \mathbf{c}_b)\right)^2 d\mathbf{s} \\
&= \operatorname{tr}\left((\mathbf{c}_a - \mathbf{c}_b)^T \left(\int_{\mathcal{S}^2} \mathbf{y}(\mathbf{s}) \mathbf{y}(\mathbf{s})^T d\mathbf{s}\right) (\mathbf{c}_a - \mathbf{c}_b)\right) \\
&= \operatorname{tr}\left((\mathbf{c}_a - \mathbf{c}_b)^T \mathbf{I} (\mathbf{c}_a - \mathbf{c}_b)\right) \\
&= \|\mathbf{c}_a - \mathbf{c}_b\|_2^2
\end{align*}

In practice, Huang et al. find that $F(\mathbf{s})$ can be losslessly projected onto bands 0 and 4 of the SH basis as:

F(\mathbf{s}) = c_0 (\sqrt{7} Y_4^0 + \sqrt{5} Y_4^4 + c_1 Y_0^0)

where $c_0$, $c_1$ are scaling constants. Since $Y_0^0$ is constant and a global constant scaling does not change the pairwise measurement, $F(\mathbf{s})$ can be parameterized by the SH band 4 coefficient vector $\mathbf{q}_0$ alone:

(9) \mathbf{q}_0 = [0, 0, 0, 0, \sqrt{7}, 0, 0, 0, \sqrt{5}]^T, \quad \mathbf{q} = \tilde{\mathbf{R}} \mathbf{q}_0 \in \mathbb{R}^9, \quad \tilde{\mathbf{R}} \in SO(9)

where $\mathbf{q}$ represents a general frame described by $F(\mathbf{R}^T \mathbf{s})$.

A.1.2. Alignment

Huang et al. further observe that any $z$-axis-aligned octahedral frame has its $Y_4^0$ coefficient equal to $\sqrt{7}$:

(10) \mathbf{q}_z(4) = \sqrt{7}

Thus, aligning a frame with a normal $\mathbf{n}$ can be enforced similarly, by applying the rotation $\tilde{\mathbf{R}}_{\mathbf{n} \to \mathbf{z}}$ from $\mathbf{n}$ to $\mathbf{z}$, then constraining the $Y_4^0$ coefficient to be $\sqrt{7}$:

(11) (\tilde{\mathbf{R}}_{\mathbf{n} \to \mathbf{z}} \mathbf{q}_n)(4) = \sqrt{7}

A.2. Improved functional representation

Ray et al. (2016) propose a simpler form of (8):

(12) F(\mathbf{s}) = x^4 + y^4 + z^4

A quick derivation shows that this descriptor function is a globally scaled and shifted version of the original one:

x^4 + y^4 + z^4 = 1 - 2(x^2 y^2 + y^2 z^2 + z^2 x^2)

Hence it exhibits the same invariance to cubic symmetry, but its maxima coincide with the directions of the representation vectors.
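The identity follows from expanding $(x^2 + y^2 + z^2)^2$, which equals one on the unit sphere; a short symbolic check (a SymPy sketch):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', real=True)
# x^4 + y^4 + z^4 = (x^2 + y^2 + z^2)^2 - 2(x^2 y^2 + y^2 z^2 + z^2 x^2),
# which reduces to 1 - 2(...) on the unit sphere.
lhs = x**4 + y**4 + z**4
cross = x**2 * y**2 + y**2 * z**2 + z**2 * x**2
assert sp.expand(lhs + 2 * cross - (x**2 + y**2 + z**2)**2) == 0
```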

Similarly, it can be losslessly projected onto the SH basis as:

F(\mathbf{s}) = \left(\frac{8\sqrt{\pi}}{5\sqrt{21}} \|\mathbf{s}\|^4\right) \left(\frac{3\sqrt{21}}{4} Y_0^0(\mathbf{s}) + \sqrt{\frac{7}{12}} Y_4^0(\mathbf{s}) + \sqrt{\frac{5}{12}} Y_4^4(\mathbf{s})\right)

Given $\|\mathbf{s}\| = 1$, and ignoring the $Y_0^0$ coefficient and the global scaling, the octahedral frame can likewise be parameterized by the SH band 4 coefficients as:

(13) \mathbf{q}_0 = [0, 0, 0, 0, \sqrt{\tfrac{7}{12}}, 0, 0, 0, \sqrt{\tfrac{5}{12}}]^T, \quad \mathbf{q} = \tilde{\mathbf{R}} \mathbf{q}_0 \in \mathbb{R}^9, \quad \tilde{\mathbf{R}} \in \mathbb{R}^{9 \times 9}

It is equivalent to (9), but scaled such that $\|\mathbf{q}\| = 1$.

A.2.1. Alignment

Ray et al. apply rotations about the $z$ axis to the coefficients of the reference frame and find that any $z$-axis-aligned frame can be expressed as:

\mathbf{q}_z = \tilde{\mathbf{R}}_z(\theta) \mathbf{q}_0 = [\sqrt{\tfrac{5}{12}} \cos 4\theta, 0, 0, 0, \sqrt{\tfrac{7}{12}}, 0, 0, 0, \sqrt{\tfrac{5}{12}} \sin 4\theta]^T

where $\theta$ is the angle of tangential twist and $\tilde{\mathbf{R}}_z(\theta)$ is the corresponding Wigner D-matrix (Section 1.1 of their supplementary).

Therefore, the $z$-axis alignment can be extended with additional constraints:

\begin{align*}
\mathbf{q}_z[4] &= \sqrt{\tfrac{7}{12}} \\
\mathbf{q}_z[0]^2 + \mathbf{q}_z[8]^2 &= \tfrac{5}{12}
\end{align*}

where $[\cdot]$ denotes 0-based array indexing.
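A small NumPy sketch (helper name ours) that builds a $z$-aligned coefficient vector from a twist angle and verifies both constraints, along with the unit norm of the parameterization (13):

```python
import numpy as np

def q_z_aligned(theta):
    """Band-4 SH coefficients of a z-aligned octahedral frame with twist theta."""
    q = np.zeros(9)
    q[0] = np.sqrt(5 / 12) * np.cos(4 * theta)
    q[4] = np.sqrt(7 / 12)
    q[8] = np.sqrt(5 / 12) * np.sin(4 * theta)
    return q

q = q_z_aligned(0.3)
assert np.isclose(q[4], np.sqrt(7 / 12))        # fixed Y_4^0 coefficient
assert np.isclose(q[0]**2 + q[8]**2, 5 / 12)    # the twist lives on a circle
assert np.isclose(np.linalg.norm(q), 1.0)       # normalized parameterization
```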

The same holds for normal alignment:

(14)
\begin{align*}
\mathbf{q}_n &= \sqrt{\tfrac{7}{12}} \, \tilde{\mathbf{R}}_{\mathbf{z} \to \mathbf{n}} [0, 0, 0, 0, 1, 0, 0, 0, 0]^T \\
&\quad + c_0 \sqrt{\tfrac{5}{12}} \, \tilde{\mathbf{R}}_{\mathbf{z} \to \mathbf{n}} [1, 0, 0, 0, 0, 0, 0, 0, 0]^T \\
&\quad + c_8 \sqrt{\tfrac{5}{12}} \, \tilde{\mathbf{R}}_{\mathbf{z} \to \mathbf{n}} [0, 0, 0, 0, 0, 0, 0, 0, 1]^T \\
&\text{subject to } c_0^2 + c_8^2 = 1
\end{align*}

Compared to (10) and (11), these additional constraints guarantee that $\mathbf{q}_z$ and $\mathbf{q}_n$ are obtainable by rotating $\mathbf{q}_0$.

A.3. Improved normal alignment constraints

Solomon et al. (2017) observe that a smooth normal-aligned octahedral field commonly contains singularities, at which the evaluated Dirichlet energy is unbounded (Knöppel et al., 2013), undermining the effectiveness of gradient steps. Solomon et al. therefore propose to scale the tangential axes of the normal-aligned frames, or equivalently the $xy$ axes when rotated to be $z$-axis aligned, to satisfy the topological restrictions. Numerically, instead of enforcing the constraint $c_0^2 + c_8^2 = 1$ of (14) everywhere, they relax it so that only its average over the boundary surface equals one:

\int_{\partial V} (c_0^2 + c_8^2) \, d\mathbf{p} = A

where $A$ is the area of the boundary surface $\partial V$.
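In a discretized setting with uniform area sampling of the boundary, this integral constraint amounts to requiring the sample mean of $c_0^2 + c_8^2$ to be one. A minimal sketch of a corresponding soft penalty (the penalty form and names are our own illustrative choices, not the authors' exact formulation):

```python
import numpy as np

def boundary_relaxation_penalty(c0, c8):
    """Penalize deviation of the *average* of c0^2 + c8^2 over boundary
    samples from 1, instead of enforcing the constraint pointwise."""
    mean_energy = np.mean(c0**2 + c8**2)
    return (mean_energy - 1.0)**2
```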

A.4. Improved rotation representation

In early work (Huang et al., 2011; Ray et al., 2016), the Wigner D-matrix is parameterized with the ZYZ Euler angle representation, as rotation about the $z$ axis is trivial to evaluate in spherical coordinates:

\begin{align*}
\mathbf{R} &= \mathbf{R}_x(\alpha) \mathbf{R}_y(\beta) \mathbf{R}_z(\gamma) \in SO(3) \\
\tilde{\mathbf{R}} &= \tilde{\mathbf{R}}_x(\alpha) \tilde{\mathbf{R}}_y(\beta) \tilde{\mathbf{R}}_z(\gamma) \in SO(9) \\
&= \tilde{\mathbf{R}}_y(\pi/2)^T \tilde{\mathbf{R}}_z(\alpha) \tilde{\mathbf{R}}_y(\pi/2)^T \tilde{\mathbf{R}}_x(\pi/2) \tilde{\mathbf{R}}_z(\beta) \tilde{\mathbf{R}}_x(\pi/2)^T \tilde{\mathbf{R}}_z(\gamma)
\end{align*}

However, this nonlinearity makes it difficult to analyze the behavior of the gradient of the SH coefficients over spatial locations. Palmer et al. (2020) propose an elegant rotation-vector-based representation that makes this analysis feasible (Zhang et al., 2020).

Any rotation matrix $\mathbf{R} \in SO(3)$ can be equivalently converted to the axis-angle representation $(\mathbf{e}, \theta)$ of the same rotation. It denotes a right-handed rotation by $\theta$ radians about the unit vector $\mathbf{e}$, and can be compactly written as the vector $\mathbf{v} = \theta \mathbf{e} \in \mathbb{R}^3$, known as the rotation vector. The skew-symmetric matrix form of the rotation vector, $[\mathbf{v}]_\times \in \mathbb{R}^{3 \times 3}$, is an element of the Lie algebra $\mathfrak{so}(3)$ that is associated with the same rotation $\mathbf{R} \in SO(3)$:

\begin{align*}
\mathbf{v} = [v_x, v_y, v_z]^T &\in \mathbb{R}^3 \\
[\mathbf{v}]_\times = \mathbf{v} \cdot \mathbf{L} = v_x \mathbf{L}_x + v_y \mathbf{L}_y + v_z \mathbf{L}_z &\in \mathfrak{so}(3) \\
\exp([\mathbf{v}]_\times) = \exp(\mathbf{v} \cdot \mathbf{L}) = \mathbf{R} &\in SO(3)
\end{align*}

where $\mathbf{L}_x = [\mathbf{e}_x]_\times$, $\mathbf{L}_y = [\mathbf{e}_y]_\times$, $\mathbf{L}_z = [\mathbf{e}_z]_\times$ are the bases of $\mathfrak{so}(3)$, and $\exp(\cdot)$ denotes the matrix exponential.
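A short NumPy/SciPy sketch of this correspondence, cross-checked against SciPy's built-in rotation-vector conversion:

```python
import numpy as np
from scipy.linalg import expm
from scipy.spatial.transform import Rotation

def skew(v):
    """[v]_x, the skew-symmetric matrix such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

v = np.array([0.3, -0.5, 0.2])              # rotation vector theta * e
R_exp = expm(skew(v))                        # exp: so(3) -> SO(3)
R_ref = Rotation.from_rotvec(v).as_matrix()  # SciPy's equivalent conversion
assert np.allclose(R_exp, R_ref)
```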

The rotation vector representation also applies to the Wigner D-matrix $\tilde{\mathbf{R}}$. However, as it is induced by $\mathbf{R} \in SO(3)$, $\tilde{\mathbf{R}}$ only spans a subspace of $SO(9)$. The relations, as noted by Palmer et al., are as follows:

\begin{align*}
\mathbf{v} = [v_x, v_y, v_z]^T &\in \mathbb{R}^3 \\
\mathbf{v} \cdot \tilde{\mathbf{L}} = v_x \tilde{\mathbf{L}}_x + v_y \tilde{\mathbf{L}}_y + v_z \tilde{\mathbf{L}}_z &\in \mathfrak{so}(9) \\
\exp(\mathbf{v} \cdot \tilde{\mathbf{L}}) = \tilde{\mathbf{R}} &\in SO(9)
\end{align*}

where $\tilde{\mathbf{L}}_x$, $\tilde{\mathbf{L}}_y$, $\tilde{\mathbf{L}}_z$ are the bases of $\mathfrak{so}(9)$ induced by $SO(3)$ (Section 2 of their supplementary). (13) can then be reformulated as:

\mathbf{q} = \exp(\mathbf{v} \cdot \tilde{\mathbf{L}}) \mathbf{q}_0 \in \mathbb{R}^9, \quad \mathbf{v} \in \mathbb{R}^3

We characterize the set of 𝐪\mathbf{q} as the octahedral variety.
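A sketch of this parameterization, assuming the three $9 \times 9$ generators are available as precomputed matrices (e.g., transcribed from Palmer et al.'s supplementary; `L_tilde` below is such an assumed input):

```python
import numpy as np
from scipy.linalg import expm

Q0 = np.zeros(9)
Q0[4] = np.sqrt(7 / 12)
Q0[8] = np.sqrt(5 / 12)

def octahedral_coeffs(v, L_tilde):
    """Map a rotation vector v in R^3 to a point q on the octahedral variety.
    L_tilde is a (3, 9, 9) array stacking the so(9) generators induced by SO(3)."""
    A = v[0] * L_tilde[0] + v[1] * L_tilde[1] + v[2] * L_tilde[2]
    return expm(A) @ Q0
```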

Note that the matrix exponential does not commute in general, so its derivative behaves differently from the scalar exponential and can be quite involved. However, it does commute at $\mathbf{v} = [0, 0, 0]^T$, where the $v_i \tilde{\mathbf{L}}_i$ are zero matrices. Zhang et al. (2020) leverage this observation to write a closed-form expression for the spatial gradient of the SH coefficients and draw its connection to surface curvature.

A.5. Projection onto the octahedral variety

When optimizing the octahedral field, the quadratic constraint $c_0^2 + c_8^2 = 1$ of (14) is often omitted during initialization (Huang et al., 2011; Ray et al., 2016), so normal-aligned frames are likely to deviate from the octahedral variety. Interior frames often deviate even further, as they are mostly unconstrained except for smoothness.

For a general vector $\mathbf{f} \in \mathbb{R}^9$ that may not be obtainable by applying a rotation induced from $SO(3)$ to $\mathbf{q}_0$, projecting it back onto the octahedral variety means finding the $\mathbf{q}$ that minimizes their Euclidean distance:

\operatorname*{arg\,min}_{\mathbf{q}} \|\mathbf{q} - \mathbf{f}\|_2^2

Huang et al. (2011), Ray et al. (2016) and Solomon et al. (2017) leverage (9) or (13) and apply gradient descent over the rotation parameterization to find the projection. The resulting rotation can also be used to recover the representation vectors of the octahedral frame. However, the Euler angle representation suffers from gimbal lock, and the nonconvex objective easily gets stuck in local minima. Palmer et al. (2020) draw a connection between octahedral frames and fourth-order tensors (Chemin et al., 2019), and propose an exact projection method that circumvents this limitation.

For a general rotation matrix $\mathbf{R} = [\mathbf{v}_1 | \mathbf{v}_2 | \mathbf{v}_3]$, its descriptor function (12) can be written as:

(15) F(\mathbf{R}^T \mathbf{s}) = \sum_{i=1}^{3} (\mathbf{v}_i \cdot \mathbf{s})^4 = \left(\frac{8\sqrt{\pi}}{5\sqrt{21}}\right) \left(\frac{3\sqrt{21}}{4} Y_0^0(\mathbf{s}) + \mathbf{q}^T \mathbf{y}_4(\mathbf{s})\right)

where $\mathbf{y}_4$ is the vector form of the SH band 4 basis functions.

It is the homogeneous polynomial (the generalization of the quadratic form $\mathbf{s}^T \mathbf{M} \mathbf{s}$) of the fourth-order symmetric tensor with $\lambda_i = 1$ and the $\mathbf{v}_i$ as its orthogonal decomposition:

\mathbf{T} = \sum_{i=1}^{3} \lambda_i \mathbf{v}_i \otimes \mathbf{v}_i \otimes \mathbf{v}_i \otimes \mathbf{v}_i := \sum_{i=1}^{3} \lambda_i \mathbf{v}_i^{\otimes 4} \in \mathbb{R}^{3 \times 3 \times 3 \times 3} := \otimes^4 \mathbb{R}^3

where \otimes denotes tensor power. We use the notation suggested by Anandkumar et al. (2012).

For a fourth-order tensor to be orthogonally decomposable, the coefficients of its homogeneous polynomial (15) must satisfy a set of quadratic constraints (Boralevi et al., 2017). Palmer et al. propagate these constraints to $\mathbf{q}$ as 15 quadratic equations:

\begin{bmatrix} 1 \\ \mathbf{q} \end{bmatrix}^T \mathbf{P}_i \begin{bmatrix} 1 \\ \mathbf{q} \end{bmatrix} = 0, \quad i = 1 \dots 15

where the $\mathbf{P}_i$ are provided in Section 4 of their supplementary. Due to the nonconvexity of these constraints, they leverage a semidefinite relaxation to perform the projection.

When $\mathbf{q}$ lies on the octahedral variety, its representation vectors $\mathbf{v}_i$ are the eigenvectors of its orthogonally decomposable tensor $\mathbf{T}$, which are also the fixed points of its homogeneous polynomial (Robeva, 2016). They can be recovered iteratively using the tensor power method (Lathauwer et al., 1995):

\mathbf{v}_{t+1} = \frac{\nabla F_{\mathbf{q}}(\mathbf{v}_t)}{\|\nabla F_{\mathbf{q}}(\mathbf{v}_t)\|}

where $F_{\mathbf{q}}: \mathbb{R}^3 \to \mathbb{R}$ is (15) parameterized by the fixed $\mathbf{q}$.
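A self-contained NumPy sketch of this power iteration, run directly on an orthogonally decomposable tensor built from a known frame so that convergence can be checked:

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # ground-truth orthonormal axes
T = sum(np.einsum('i,j,k,l->ijkl', v, v, v, v) for v in Q.T)

v = rng.normal(size=3)
v /= np.linalg.norm(v)
for _ in range(50):
    # The gradient of T(v, v, v, v) is proportional to T(., v, v, v);
    # the constant factor is absorbed by the normalization.
    g = np.einsum('ijkl,j,k,l->i', T, v, v, v)
    v = g / np.linalg.norm(g)

# The iterate converges, up to sign, to one of the frame's axes.
assert np.isclose(np.max(np.abs(v @ Q)), 1.0)
```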

A.6. Explicit vector representation

In studying non-orthogonal frame fields, Desobry et al. (2021) propose an equivalent SH representation using zonal harmonics.

Zonal harmonics are the SH projection of a function that exhibits rotational symmetry about one axis. If that axis is the $z$ axis, representing such a function requires only one coefficient $z_l^0$ per band $l$ (Sloan, 2008):

f(\mathbf{s}) = \sum_l z_l^0 Y_l^0(\mathbf{s})

More prominently, the rotated version of this function towards a new direction $\mathbf{d}$ can be evaluated as:

(16)
\begin{align*}
f(\mathbf{s}) &= \sum_l z_l^0 \sqrt{\frac{4\pi}{2l+1}} \sum_m Y_l^m(\mathbf{d}) Y_l^m(\mathbf{s}) \\
&= \sum_l \sum_m \underbrace{z_l^0 \sqrt{\frac{4\pi}{2l+1}} Y_l^m(\mathbf{d})}_{c_l^m} Y_l^m(\mathbf{s})
\end{align*}

In our case, the $z^4$ component of the polynomial (12) is clearly invariant under rotations about the $z$ axis, and thus can be projected as:

\begin{align*}
F_z(\mathbf{s}) &= z^4 = (\mathbf{e}_z \cdot \mathbf{s})^4 \\
&= \left(\frac{16\sqrt{\pi} \|\mathbf{s}\|^4}{105}\right) \left(\frac{21}{8} Y_0^0(\mathbf{s}) + \frac{3\sqrt{5}}{2} Y_2^0(\mathbf{s}) + Y_4^0(\mathbf{s})\right) \\
&= z_0^0 Y_0^0(\mathbf{s}) + z_2^0 Y_2^0(\mathbf{s}) + z_4^0 Y_4^0(\mathbf{s})
\end{align*}

With (16), for any direction $\mathbf{v}_i$, we have:

(17)
\begin{align*}
F_{\mathbf{v}_i}(\mathbf{s}) &= (\mathbf{v}_i \cdot \mathbf{s})^4 \\
&= 2\sqrt{\pi} z_0^0 Y_0^0(\mathbf{v}_i) Y_0^0(\mathbf{s}) + \sqrt{\frac{4\pi}{5}} z_2^0 \sum_{m=-2}^{2} Y_2^m(\mathbf{v}_i) Y_2^m(\mathbf{s}) \\
&\quad + \frac{2\sqrt{\pi}}{3} z_4^0 \sum_{m=-4}^{4} Y_4^m(\mathbf{v}_i) Y_4^m(\mathbf{s})
\end{align*}

Thus, for a frame with any three representation vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, let $\mathbf{R} = [\mathbf{v}_1 | \mathbf{v}_2 | \mathbf{v}_3]$; we have:

F(\mathbf{R}^T \mathbf{s}) = \sum_{i=1}^{3} (\mathbf{v}_i \cdot \mathbf{s})^4 = c_0 + \sum_{i=1}^{3} \mathbf{c}_2^i \cdot \mathbf{y}_2(\mathbf{s}) + \sum_{i=1}^{3} \mathbf{c}_4^i \cdot \mathbf{y}_4(\mathbf{s})

where $c_0 \in \mathbb{R}$, $\mathbf{c}_2^i \in \mathbb{R}^5$, and $\mathbf{c}_4^i \in \mathbb{R}^9$ are the coefficients from (17).

When the three representation vectors are orthogonal, the frame is an ordinary octahedral frame: the band 2 coefficients $\sum_{i} \mathbf{c}_2^i$ become the $\mathbf{0}$ vector, $\mathbf{R} \in SO(3)$ is a rotation matrix, and the expression above is equivalent to (15) up to a constant scaling.

The advantage of this expression is that it makes the SH band 4 coefficient representation directly differentiable with respect to the representation vectors, without the need for an iterative matrix-logarithm conversion to the rotation vector.

Appendix B Numerical results

The quantitative results on the ABC and Thingi10k datasets are listed in Table 2. We additionally ablate the choice of regularization weight (Table 3) and the use of DiGS as the backbone (Table 4), and provide visualizations of both cases (Figure 12).

Table 3. Ablation study of the regularization weight, reported as "weight / noise σ". Metrics are averaged across the two datasets.
| Weight / noise σ | Chamfer ↓ | Hausdorff ↓ | F-score ↑ |
| 5 / 0.002L | 3.140 | 6.363 | 95.779 |
| 10 / 0.002L | 2.962 | 5.981 | 95.822 |
| 5 / 0.01L | 4.174 | 6.822 | 89.934 |
| 10 / 0.01L | 3.568 | 6.167 | 90.175 |
Figure 12. We ablate our results with half the regularization weight (left) and with DiGS as the backbone (right). A larger regularization weight helps suppress noise, but at a higher cost of filling holes. Our method is less effective on DiGS, as its smoothness prior on the SDF limits our regularization of sharp edges.

Table 4. Ablation study of different backbones, reported as "backbone / noise σ". Metrics are averaged across the two datasets.
| Backbone / noise σ | Chamfer ↓ | Hausdorff ↓ | F-score ↑ |
| DiGS / 0.002L | 3.384 | 8.054 | 94.244 |
| NSH / 0.002L | 2.962 | 5.981 | 95.822 |
| DiGS / 0.01L | 4.744 | 8.715 | 88.113 |
| NSH / 0.01L | 3.568 | 6.167 | 90.175 |