This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Score-Based Point Cloud Denoising

Shitong Luo, Wei Hu
Wangxuan Institute of Computer Technology
Peking University
{luost, forhuwei}@pku.edu.cn
Corresponding author: Wei Hu ([email protected]). This work was supported by National Natural Science Foundation of China (61972009).
Abstract

Point clouds acquired from scanning devices are often perturbed by noise, which affects downstream tasks such as surface reconstruction and analysis. The distribution of a noisy point cloud can be viewed as the distribution of a set of noise-free samples p(𝒙)p({\bm{x}}) convolved with some noise model nn, leading to (pn)(𝒙)(p*n)({\bm{x}}) whose mode is the underlying clean surface. To denoise a noisy point cloud, we propose to increase the log-likelihood of each point from pnp*n via gradient ascent—iteratively updating each point’s position. Since pnp*n is unknown at test-time, and we only need the score (i.e., the gradient of the log-probability function) to perform gradient ascent, we propose a neural network architecture to estimate the score of pnp*n given only noisy point clouds as input. We derive objective functions for training the network and develop a denoising algorithm leveraging on the estimated scores. Experiments demonstrate that the proposed model outperforms state-of-the-art methods under a variety of noise models, and shows the potential to be applied in other tasks such as point cloud upsampling. The code is available at https://github.com/luost26/score-denoise.

1 Introduction

Point clouds consist of discrete 3D points irregularly sampled from continuous surfaces. It is an increasingly popular representation widely applied in autonomous driving, robotics and immersive tele-presence. However, point clouds are often perturbed by noise due to the inherent limitations of acquisition equipments or matching ambiguities in the reconstruction from images. Noise in point clouds significantly affects downstream tasks such as rendering, reconstruction and analysis since the underlying structures are deformed. Hence, point cloud denoising is crucial to relevant 3D vision applications. Nevertheless, point cloud denoising is challenging due to the irregular and unordered characteristics of point clouds.

Refer to caption
Figure 1: An illustration of the proposed point cloud denoising method. We first estimate the score of the noise-convolved distribution 𝒙log[(pn)(𝒙)]\nabla_{\bm{x}}\log[(p*n)({\bm{x}})] from the noisy point cloud. Then, we perform gradient ascent using the estimated score to denoise the point cloud.

Early point cloud denoising methods are optimization-based [6, 16, 4, 1, 2, 22, 31, 36], which rely heavily on geometric priors and are sometimes challenging to strike a balance between the detail preservation and denoising effectiveness. Recently, deep-learning-based approaches have emerged and achieved promising denoising performance thanks to the advent of neural network architectures crafted for point clouds [24, 25, 32]. The majority of deep-learning-based denoising models predict the displacement of noisy points from the underlying surface and then apply the inverse displacement to the noisy point clouds [7, 26, 11, 23]. This class of methods mainly suffer from two types of artifacts: shrinkage and outliers, which arise from over-estimation or under-estimation of the displacement. Instead, Luo et al. [21] proposed to learn the underlying manifold of a noisy point cloud for reconstruction in a downsample-upsample architecture, which alleviates the issue of outliers by learning to filter out high-noise points in the downsampling stage. However, the downsampling stage inevitably causes detail loss especially at low noise levels.

In this paper, we propose a novel paradigm of point cloud denoising motivated by the distributional properties of noisy point clouds. Point clouds consist of points sampled from the surface of 3D objects. Therefore, a noise-free point cloud can be modeled as a set of samples from some 3D distribution p(𝒙)p({\bm{x}}) supported by 2D manifolds. If the point cloud is corrupted by noise, the distribution about the noisy point cloud can be modeled as the convolution between the original distribution pp and some noise model nn (e.g., Gaussian noise), expressed as (pn)(𝒙)(p*n)({\bm{x}}). Under some mild assumptions about the noise model nn (see Section 4 for details), the mode of pnp*n is the underlying clean surface, having higher probability than its ambient space. According to this observation, denoising a noisy point cloud naturally amounts to moving noisy points towards the mode, which can be realized by performing gradient ascent on the log-probability function log[(pn)(𝒙)]\log[(p*n)({\bm{x}})], as illustrated in Figure 1. As the points are expected to converge to the mode of distribution after sufficient iterations of gradient ascent, our method is more robust against artifacts such as shrinkage and outliers, while previous methods have no awareness of the mode.

However, there is a major challenge to address in order to implement this method—pnp*n is unknown at test-time, which has to be estimated from the input noisy point cloud only. To tackle this challenge, we propose a detail-preserving neural network architecture to estimate the score of the distribution underlying an input noisy point cloud 𝒙log[(pn)(𝒙)]\nabla_{\bm{x}}\log[(p*n)({\bm{x}})], i.e., the gradient of the log-probability function. We also formulate the objective function for training the score estimation network and develop a denoising algorithm. Further, we provide an analysis of the model from the perspective of probability, revealing the principle behind the model formally. Extensive experiments demonstrate that the proposed model outperforms state-of-the-art methods, and has the potential to be applied to other tasks such as point cloud upsampling.

To summarize, the contributions of this work include:

  • We propose a novel paradigm of point cloud denoising, exploiting the distribution model of noisy point clouds and leveraging the score of the distribution.

  • To implement the method, we propose a neural network architecture for score-estimation from noisy point clouds, formulate objective functions for training the network, and develop the denoising algorithm.

  • Extensive experiments demonstrate the capability of the proposed method under a variety of noise models.

2 Related Work

Refer to caption
Figure 2: Illustration of different classes of deep-learning-based point cloud denoising methods.

2.1 Optimization-based denoising

Prior to the emergence of deep-learning-based denoising, the point cloud denoising problem is often formulated as an optimization problem constrained by geometric priors. We classify them into four categories: (1) Density-based methods are most relevant to ours as they also involve modeling the distribution of points. [36] uses the kernel density estimation technique to approximate the density of noisy point clouds. Then, it removes outlying points in low-density regions. To finally obtain a clean point cloud, it relies on the bilateral filter [9] to reduce the noise of the outlier-free point cloud. Therefore, this method focuses on outlier removal. (2) Local-surface-fitting-based methods approximate the point cloud with a smooth surface using simple-form function approximators and then project points onto the surface [1]. [9, 4, 16, 6] proposed jet fitting and bilateral filtering that take into account both point coordinates and normals. (3) Sparsity-based methods first reconstruct normals by solving an optimization problem with sparse regularization and then update the coordinates of points based on the reconstructed normals [2, 31, 33]. The recently proposed MRPCA [22] is a sparsity-based denoiser which has achieved promising denoising performance. (4) Graph-based methods represent point clouds on graphs and perform denoising using graph filters such as the graph Laplacian [28, 37, 12, 14, 13]. Recently, [37] proposed graph Laplacian regularization (GLR) of a low-dimensional manifold model for point cloud denoising, while [12] proposed a paradigm of feature graph learning to infer the underlying graph structure of point clouds for denoising. To summarize, optimization-based point cloud denoising methods rely heavily on geometric priors. Also, there is sometimes a trade-off between detail preservation and denoising effectiveness.

2.2 Deep-learning-based denoising

The advent of point-based neural networks [24, 25, 32] has made deep point cloud denoising possible. The majority of existing deep learning based methods predict the displacement of each point in noisy point clouds using neural networks, and apply the inverse displacement to each point as illustrated in Figure 2(a). PointCleanNet (PCNet) [26] is the pioneer of this class of approaches, which employs a variant of PointNet as its backbone network. [23] proposed GPDNet, which uses graph convolutional networks to enhance the robustness of the neural denoiser. [11] proposed an unsupervised point cloud denoising framework—Total Denoising (TotalDn). In TotalDn, an unsupervised loss function is derived for training deep-learning-based denoisers, based on the assumption that points with denser surroundings are closer to the underlying surface. The aforementioned displacement-prediction methods generally suffer from two types of artifacts: shrinkage and outliers, as a result of inaccurate estimation of noise displacement. Instead, [21] proposed to learn the underlying manifold (surface) of a noisy point cloud for reconstruction in a downsample-upsample architecture as illustrated in Figure 2(b). However, although the downsampling stage discards outliers in the input, it may also discard some informative details, leading to over-smoothing.

In this work, we propose a novel framework that distinguishes significantly from the aforementioned methods. Our method is motivated by the distribution model of noisy point clouds. It denoises point clouds via gradient ascent guided by the estimated gradient of the noisy point cloud’s log-density as illustrated in Figure 2(c). Our method is shown to alleviate the artifacts of shrinkage and outliers, and achieve significantly better denoising performance.

2.3 Score matching

Score matching is a technique for training energy-based models—a family of non-normalized probability distributions [18]. It deals with matching the model-predicted gradients and the data log-density gradients by minimizing the squared distance between them [17, 30]. Our proposed training objectives are similar to the score matching technique. The score matching technique in generative modeling aims at approximating unconditional distributions about data (e.g., images), while our model estimates the noise-convolved distribution of points.

Score matching has been applied to developing generative models for 3D shapes. [3] proposed an auto-encoder architecture ShapeGF that also has a score-estimation network which served as a decoder. However, ShapeGF is different from our model in at least the following three aspects. First, ShapeGF is designed for 3D point cloud generation and models the noise-free 3D distribution pp, while our method models the noise-convolved distribution pnp*n and aims at denoising the point cloud based on the score of pnp*n. Second, since ShapeGF is a general auto-encoder for 3D shapes, it does not have the generalizability to out-of-distribution shapes. For instance, when trained on the ShapeNet dataset [5], it can hardly generalize to shapes beyond the categories in ShapeNet. In contrast, our model is generalizable to arbitrary 3D shapes because our score function is defined on a local basis, which learns the building blocks of 3D shapes rather than the entire shapes themselves. This way narrows down the latent space of 3D geometry representation and makes it possible for the network to learn and reconstruct 3D details. Third, to recover 3D shapes, ShapeGF requires a latent code of the shape obtained from the encoder, but their encoder is not meant to learn representations for denoising or other detail-demanding tasks.

3 Method

Refer to caption
Figure 3: Illustration of the proposed network architecture.

We first provide an overview of the proposed method. Then, we elaborate on the score estimation network, propose the training objective for the network, and develop a score-based denoising algorithm.

3.1 Overview

Given a noisy point cloud 𝑿={𝒙i}i=1N{\bm{X}}=\{{\bm{x}}_{i}\}_{i=1}^{N} consisting of NN points as input, we model the underlying noise-free point cloud as a set of samples from a 3D distribution pp supported by 2D manifolds, and assume the noise follows a distribution nn. Then the distribution of the noisy point cloud can be modelled as the convolution between pp and nn, denoted as pnp*n.

In order to denoise the noisy input 𝑿{\bm{X}}, we propose to estimate the score of the noise-convolved distribution pnp*n, i.e., 𝒙log[(pn)(𝒙)]\nabla_{\bm{x}}\log[(p*n)({\bm{x}})]—the gradient of the log-probability function, only from 𝑿{\bm{X}}. Then, we denoise 𝑿{\bm{X}} using the estimated scores of pnp*n via gradient ascent, thus moving noisy points towards the mode of the distribution that corresponds to the underlying clean surface. The implementation of the proposed method mainly consists of the following three parts:

  1. 1.

    The score estimation network that takes noisy point clouds as input and outputs point-wise scores 𝒙log[(pn)(𝒙i)](i=1,,N)\nabla_{\bm{x}}\log[(p*n)({\bm{x}}_{i})](i=1,\ldots,N) (Section 3.2).

  2. 2.

    The objective function for training the score estimation network (Section 3.3).

  3. 3.

    The score-based denoising algorithm that leverages on the estimated scores to denoise point clouds (Section 3.4).

3.2 The Proposed Score Estimation Network

Given a noisy point cloud 𝑿={𝒙i}i=1N{\bm{X}}=\{{\bm{x}}_{i}\}_{i=1}^{N} as input, the score estimation network predicts 𝒙log[(pn)(𝒙i)]\nabla_{\bm{x}}\log[(p*n)({\bm{x}}_{i})] for each point in 𝑿{\bm{X}}. We estimate the score for each point 𝒙i{\bm{x}}_{i} on a local basis, i.e., the network aims at estimating the score function in the neighborhood space around 𝒙i{\bm{x}}_{i}, denoted as 𝒮i(𝒓){\mathcal{S}}_{i}({\bm{r}}). Localized score functions are fundamental to the model’s generalizability because in this way the model focuses on the basic fragments of 3D shapes rather than the entire shapes themselves, narrowing down the latent space of 3D geometry representation.

The estimation of 𝒮i(𝒓){\mathcal{S}}_{i}({\bm{r}}) is realized by a neural network which consists of a feature extraction unit and a score estimation unit. The feature extraction unit produces features that encode both the local and non-local geometry at each point. The extracted features are subsequently fed as parameters into the score estimation unit to construct score functions.

Feature Extraction Unit

The feature extraction unit aims to learn point-wise features from the input noisy point cloud 𝑿={𝒙i}i=1N{\bm{X}}=\{{\bm{x}}_{i}\}_{i=1}^{N}. We adopt the feature extraction network widely used in previous denoising and upsampling models [21, 34, 19]. Specifically, we construct a stack of densely connected dynamic graph convolutional layers [32]. The dynamic graph convolution is able to extract multi-scale as well as both local and non-local features for each point, while the dense connection produces features with richer contextual information [15, 20]. These properties make the architecture suitable for the denoising task, as evidenced in previous works [21, 34]. The learned feature for point 𝒙i{\bm{x}}_{i} is denoted as 𝒉i{\bm{h}}_{i}.

Score Estimation Unit

The score estimation unit is parameterized by point 𝒙i{\bm{x}}_{i}’s feature 𝒉i{\bm{h}}_{i}. It takes some 3D coordinate 𝒙3{\bm{x}}\in{\mathbb{R}}^{3} nearby 𝒙i{\bm{x}}_{i} as input and outputs the score 𝒮i(𝒙){\mathcal{S}}_{i}({\bm{x}}). Note that, here 𝒙{\bm{x}} does not necessarily correspond to a point in the input point cloud 𝑿{\bm{X}}. It might be an intermediate coordinate during the gradient ascent denoising process. Formally, the score estimation unit takes the form:

𝒮i(𝒙)=Score(𝒙𝒙i,𝒉i),{\mathcal{S}}_{i}({\bm{x}})=\operatorname{Score}({\bm{x}}-{\bm{x}}_{i},{\bm{h}}_{i}), (1)

where Score()\operatorname{Score}(\cdot) is a multi-layer perceptron (MLP). Note that we input 𝒙𝒙i{\bm{x}}-{\bm{x}}_{i} (the coordinate of 𝒙{\bm{x}} relative to 𝒙i{\bm{x}}_{i}) to the network because the score function is localized around 𝒙i{\bm{x}}_{i}.

The score estimation is trained by optimizing the proposed objective, which will be discussed next.

3.3 The Proposed Training Objective

We denote the input noisy point cloud as 𝑿={𝒙i}i=1N{\bm{X}}=\{{\bm{x}}_{i}\}_{i=1}^{N} and the ground truth noise-free point cloud as 𝒀={𝒚i}i=1N{\bm{Y}}=\{{\bm{y}}_{i}\}_{i=1}^{N}. Using the ground truth 𝒀{\bm{Y}}, we define the score for some point 𝒙3{\bm{x}}\in{\mathbb{R}}^{3} as follows:

𝒔(𝒙)=NN(𝒙,𝒀)𝒙,𝒙3,{\bm{s}}({\bm{x}})=\operatorname{NN}({\bm{x}},{\bm{Y}})-{\bm{x}},\ {\bm{x}}\in{\mathbb{R}}^{3}, (2)

where NN(𝒙,𝒀)\operatorname{NN}({\bm{x}},{\bm{Y}}) returns the point nearest to 𝒙{\bm{x}} in 𝒀{\bm{Y}}. Intuitively, 𝒔(𝒙){\bm{s}}({\bm{x}}) is a vector from 𝒙{\bm{x}} to the underlying surface.

The training objective aligns the network-predicted score to the ground truth score defined above:

(i)=𝔼𝒙𝒩(𝒙i)[𝒔(𝒙)𝒮i(𝒙)22],\mathcal{L}^{(i)}=\mathbb{E}_{{\bm{x}}\sim{\mathcal{N}}({\bm{x}}_{i})}\left[\left\|{\bm{s}}({\bm{x}})-{\mathcal{S}}_{i}({\bm{x}})\right\|_{2}^{2}\right], (3)

where 𝒩(𝒙i){\mathcal{N}}({\bm{x}}_{i}) is a distribution concentrated in the neighborhood of 𝒙i{\bm{x}}_{i} in 3{\mathbb{R}}^{3} space. Note that, this objective not only matches the predicted score on the position of 𝒙i{\bm{x}}_{i} but also matches the score on the neighboring areas of 𝒙i{\bm{x}}_{i} as illustrated in Figure 3. This is important because a point moves around during gradient ascent, which relies on the score defined on the neighborhood of its initial position. Such definition of training objective also distinguishes our method from previous displacement-based methods [26, 23], as the objectives of those methods only consider the position of each point while our proposed objective covers the neighborhood of each point.

The final training objective is simply an aggregation of the objective for each local score function:

=1Ni=1N(i).\mathcal{L}=\frac{1}{N}\sum_{i=1}^{N}\mathcal{L}^{(i)}. (4)

3.4 The Score-Based Denoising Algorithm

Given a noisy point cloud 𝑿={𝒙i}i=1N{\bm{X}}=\{{\bm{x}}_{i}\}_{i=1}^{N} as input, we first need to construct the local score function 𝒮i{\mathcal{S}}_{i} for each point 𝒙i𝑿{\bm{x}}_{i}\in{\bm{X}}. Specifically, we first feed the input point cloud 𝑿{\bm{X}} to the feature extraction unit to obtain a set of point-wise features {𝒉i}i=1N\{{\bm{h}}_{i}\}_{i=1}^{N}. Next, by substituting 𝒙i{\bm{x}}_{i}, 𝒉i{\bm{h}}_{i} and some 3D coordinate 𝒙3{\bm{x}}\in{\mathbb{R}}^{3} into Eq. 1, we obtain 𝒮i(𝒙){\mathcal{S}}_{i}({\bm{x}}) as the estimated score at 𝒙{\bm{x}}.

In principle, we can solely use 𝒮i{\mathcal{S}}_{i} to denoise 𝒙i{\bm{x}}_{i}. However, to enhance the robustness and reduce the bias of estimation, we propose the ensemble score function:

i(𝒙)=1K𝒙jkNN(𝒙i)𝒮j(𝒙),𝒙3,{\mathcal{E}}_{i}({\bm{x}})=\frac{1}{K}\sum_{{\bm{x}}_{j}\in k\operatorname{NN}({\bm{x}}_{i})}{\mathcal{S}}_{j}({\bm{x}}),\ {\bm{x}}\in{\mathbb{R}}^{3}, (5)

where kNN(𝒙i)k\operatorname{NN}({\bm{x}}_{i}) is 𝒙i{\bm{x}}_{i}’s kk-nearest neighborhood.

Finally, denoising a point cloud amounts to updating each point’s position via gradient ascent:

𝒙i(t)=𝒙i(t1)+αti(𝒙i(t1)),t=1,,T,𝒙i(0)=𝒙i,𝒙i𝑿,\begin{split}{\bm{x}}_{i}^{(t)}&={\bm{x}}_{i}^{(t-1)}+\alpha_{t}{\mathcal{E}}_{i}({\bm{x}}_{i}^{(t-1)}),\ t=1,\ldots,T,\\ {\bm{x}}_{i}^{(0)}&={\bm{x}}_{i},\ {\bm{x}}_{i}\in{\bm{X}},\end{split} (6)

where αt\alpha_{t} is the step size at the tt-th step. We suggest two criteria for choosing the step size sequence {αt}t=1T\{\alpha_{t}\}_{t=1}^{T}: (1) The sequence should be decreasing towards 0 to ensure convergence. (2) α1\alpha_{1} should be less than 1 and not be too close to 0, because according to Eq. 2, the magnitude of the score is approximately the distance from each point to the underlying surface (approximately the length of 𝒔(𝒙){\bm{s}}({\bm{x}}) in Eq. 2). Thus, performing gradient ascent for a sufficient number of steps with a proper step size less than 1 is enough and avoids over-denoising.

It is worth noting that, unlike some previous deep-learning-based denoisers such as PCNet [26] and TotalDn [11] that suffer from shape shrinkage, we do not observe any shrinkage induced by our method. Thus, we have no need to post-process the denoised point clouds by inflating them slightly as in those works. This shows that our method is more robust to shrinkage compared to previous ones.

4 Analysis

Refer to caption
Figure 4: Illustration of the denoising theory.
# Points 10K (Sparse) 50K (Dense)
Noise 1% 2% 3% 1% 2% 3%
Dataset Model CD P2M CD P2M CD P2M CD P2M CD P2M CD P2M
PU [35] Bilateral [9] 3.646 1.342 5.007 2.018 6.998 3.557 0.877 0.234 2.376 1.389 6.304 4.730
Jet [4] 2.712 0.613 4.155 1.347 6.262 2.921 0.851 0.207 2.432 1.403 5.788 4.267
MRPCA [22] 2.972 0.922 3.728 1.117 5.009 1.963 0.669 0.099 2.008 1.033 5.775 4.081
GLR [37] 2.959 1.052 3.773 1.306 4.909 2.114 0.696 0.161 1.587 0.830 3.839 2.707
PCNet [26] 3.515 1.148 7.467 3.965 13.067 8.737 1.049 0.346 1.447 0.608 2.289 1.285
DMR [21] 4.482 1.722 4.982 2.115 5.892 2.846 1.162 0.469 1.566 0.800 2.432 1.528
Ours 2.521 0.463 3.686 1.074 4.708 1.942 0.716 0.150 1.288 0.566 1.928 1.041
PC [26] Bilateral [9] 4.320 1.351 6.171 1.646 8.295 2.392 1.172 0.198 2.478 0.634 6.077 2.189
Jet [4] 3.032 0.830 5.298 1.372 7.650 2.227 1.091 0.180 2.582 0.700 5.787 2.144
MRPCA [22] 3.323 0.931 4.874 1.178 6.502 1.676 0.966 0.140 2.153 0.478 5.570 1.976
GLR [37] 3.399 0.956 5.274 1.146 7.249 1.674 0.964 0.134 2.015 0.417 4.488 1.306
PCNet [26] 3.847 1.221 8.752 3.043 14.525 5.873 1.293 0.289 1.913 0.505 3.249 1.076
DMR [21] 6.602 2.152 7.145 2.237 8.087 2.487 1.566 0.350 2.009 0.485 2.993 0.859
Ours 3.369 0.830 5.132 1.195 6.776 1.941 1.066 0.177 1.659 0.354 2.494 0.657
Table 1: Comparison among competitive denoising algorithms. CD is multiplied by 10410^{4} and P2M is multiplied by 10410^{4}.

In this section, we elaborate on the distribution model for noisy point clouds.

4.1 Points as Samples from a Distribution

To begin with, we consider the distribution of a noise-free point cloud 𝒀={𝒚i}i=1N{\bm{Y}}=\{{\bm{y}}_{i}\}_{i=1}^{N} as sampled from a 3D distribution p(𝒚)p({\bm{y}}) supported by 2D manifolds. Since p(𝒚)p({\bm{y}}) is supported on 2D manifolds, it is discontinuous and has zero support in the ambient space, i.e., p(𝒚)p({\bm{y}})\rightarrow\infty if 𝒚{\bm{y}} exactly lies on the manifold, otherwise p(𝒚)=0p({\bm{y}})=0.

Next, we consider the distribution of noisy point clouds. A noisy point cloud can be denoted as 𝑿={𝒙i=𝒚i+𝒏i}i=1N{\bm{X}}=\{{\bm{x}}_{i}={\bm{y}}_{i}+{\bm{n}}_{i}\}_{i=1}^{N}, where 𝒏i{\bm{n}}_{i} is the noise component from a distribution nn. Here, we assume that the probability density function nn is continuous and has a unique mode at 0. These assumptions are made for analysis. We will show by experiments that in some cases where the assumptions do not hold, the proposed method still achieves superior performance (see Section A in the supplementary material). Under the continuity assumption of nn, the density function of the distribution with respect to 𝒙i{\bm{x}}_{i} can be expressed as a convolution of pp and nn:

q(𝒙):=(pn)(𝒙)=𝒔3p(𝒔)n(𝒙𝒔)d𝒔.\begin{split}q({\bm{x}})&:=(p*n)({\bm{x}})\\ &=\int_{{\bm{s}}\in{\mathbb{R}}^{3}}p({\bm{s}})n({\bm{x}}-{\bm{s}})\mathrm{d}{\bm{s}}.\end{split} (7)

It can be shown by taking the derivative of both sides that the noise-free point cloud 𝒀{\bm{Y}} from the noise-free distribution pp exactly lies on the mode of qq if the mode of nn is 0. When the assumption of uni-modality holds, q(𝒙)q({\bm{x}}) reaches the maximum on the noise-free manifold.

4.2 Connection to Denoising

Suppose the density function q(𝒙)q({\bm{x}}) is known. Based on the above analysis, denoising a point cloud 𝑿={𝒙i}i=1N{\bm{X}}=\{{\bm{x}}_{i}\}_{i=1}^{N} amounts to maximizing ilogq(𝒙i)\sum_{i}\log q({\bm{x}}_{i}). This can be naturally achieved by performing gradient ascent until the points converge to the mode of q(𝒙)q({\bm{x}}). The gradient ascent relies only on the score function 𝒙logq(𝒙)\nabla_{\bm{x}}\log q({\bm{x}})—the first-order derivative of the log-density function. As shown in the previous subsection, q(𝒙)q({\bm{x}}) reaches the maximum on the underlying manifold under some mild assumptions. Hence, the vector field 𝒙logq(𝒙)\nabla_{\bm{x}}\log q({\bm{x}}) consistently heads to the clean surface as demonstrated in Figure 4.

However, the density qq is unknown during test-time. Instead of estimating qq from noisy observations, we only need the gradient of logq\log q during the denoising, which is more tractable. This motivates the proposed model—score-based denoising.

4.3 Connection to the Training Objective

The training objective defined in Eq. 3 matches the predicted score to the ground truth score function. The magnitude of the estimated score may not exactly equal to that of the real score function. However, this is not an issue in the denoising task, since as long as the directions of estimated gradients are accurate, the points will converge to the underlying surface with sufficient number of steps at a suitable step size of gradient ascent.

5 Experiments

MRPCA GLR PCNet DMR Ours
CD 2.886 2.663 3.137 2.764 2.616
P2M 1.933 1.920 2.142 1.910 1.847
Table 2: Comparison among different denoising methods tested on point clouds generated by simulated LiDAR scanning with realistic LiDAR noise, which is an unseen noise pattern to our denoiser since we train only on Gaussian noise. CD is multiplied by 10410^{4} and P2M is multiplied by 10410^{4}.

5.1 Setup

Datasets

We collect 20 meshes for training from the training set of PU-Net [35] and use Poisson disk sampling to sample points from the meshes, at resolutions ranging from 10K to 50K points. The point clouds are normalized into the unit sphere. Then, they are only perturbed by Gaussian noise with standard deviation from 0.5% to 2% of the bounding sphere’s radius. Similar to previous works [26, 21], point clouds are split into patches before being fed into the model. We set the patch size to be 1K.

For quantitative evaluation, we use two benchmarks: the test-set of PU-Net [35] (20 shapes) and the test-set of PointCleanNet (10 shapes) [26]. Similarly, we use Poisson disk sampling to sample point clouds from each shape, at resolution levels of 10K and 50K points. The performance of our model is then evaluated using a variety of noise models, including isotropic Gaussian noise, simulated LiDAR noise, non-isotropic Gaussian noise, uni-directional noise, Laplace noise, uniform noise, and discrete noise.

Furthermore, we also use the Paris-rue-Madame dataset [29] for visual evaluation, which is obtained from the real world using laser scanners.

Baselines

We compare our method to state-of-the-art deep-learning-based denoisers and optimization-based denoisers.

Deep-learning-based denoisers include: PointCleanNet (PCNet) [26], and DMRDenoise (DMR) [21]. We exclude Total Denoising (TotalDn) [11] in our main experiments as TotalDn is based on unsupervised learning and it is unfair to compare supervised and unsupervised models explicitly. However, we will present an unsupervised adaptation of our model inspired by the training objective proposed by [11] in the supplementary material, and compare our unsupervised adaptation to TotalDn.

Optimization-based denoisers include bilateral filtering [6], jet fitting [4], MRPCA [22] and GLR [37].

Metrics

We employ two metrics commonly adopted in previous works to perform quantitative evaluation of our model: Chamfer distance (CD) [8] and point-to-mesh distance (P2M) [27]. Since the size of point clouds varies, we normalize the denoised results into the unit sphere before computing the metrics.

Hyper-parameters

We use only one set of hyper-parameters to train a unique model for all experimental settings, except for ablation studies. Hyper-parameters including learning rates, denoising step sizes, network architectures, etc., are provided in the supplementary material. The code and data are available at https://github.com/luost26/score-denoise.

Refer to caption
Figure 5: Visual comparison of denoising methods under (a) Gaussian noise, (b) simulated LiDAR noise. Points colored yellower are farther away from the ground truth surface.

5.2 Quantitative Results

We first use isotropic Gaussian noise to test our models and baselines. The standard deviation of noise ranges from 1% to 3% of the shape’s bounding sphere radius. As presented in Table 1, our model significantly outperforms previous deep-learning-based methods in all settings and, surpasses optimization-based methods in the majority of cases.

Although the model is trained with only Gaussian noise, to test its generalizability, we use a different noise type—simulated LiDAR noise. Specifically, we use a virtual Velodync HDL-64E2 scanner provided by the Blensor simulation package [10] to acquire noisy point clouds. The scanning noise level is set to 1% following [21]. The results in Table 2 indicate that although our denoiser is trained using Gaussian noise, it is effective in generalizing to unseen LiDAR noise and outperforms previous methods.

Other noise models, including non-isotropic Gaussian noise, uni-directional noise, Laplace noise, uniform noise, and discrete noise are also used to evaluate our method and baselines. In most of these experimental settings, our model outperforms competing baselines. The detailed results are included in the supplementary material.

Refer to caption
Figure 6: Visual results of our denoiser on the real-world dataset Paris-rue-Madame [29].

5.3 Qualitative Results

Figure 5 shows the denoising results from the proposed method and competitive baselines under Gaussian noise and simulated LiDAR noise. The color of each point indicates its reconstruction error measured by point-to-mesh distance introduced in Section 5.1. Points closer to the underlying surface are colored darker, otherwise colored brighter. As can be observed in the figure, our results are much cleaner and more visually appealing than those of other methods. Notably, our method preserves details better than other methods and is more robust to outliers compared to other deep-learning-based methods such as PCNet and DMRDenoise.

Further, we conduct qualitative studies on the real-world dataset Paris-rue-Madame [29]. Note that, since the noise-free point cloud is unknown for real-world datasets, the error of each point cannot be computed and visualized. As demonstrated in Figure 6, our denoising result is cleaner and smoother than that of PCNet, with details preserved better than DMRDenoise.

In addition, we present a denoising trajectory in Figure 7, which reveals the gradient ascent process of our method—noise reduces as points gradually converge to the mode of pnp*n.

More visual results regarding synthetic noise and real-world noise are provided in the supplementary material.

In summary, the demonstrated qualitative results are consistent with the quantitative results in Section 5.2, which again validates the effectiveness of the proposed method.

Refer to caption
Figure 7: A gradient ascent trajectory of denoising.

5.4 Ablation Studies

We perform ablation studies to assess the contribution of the proposed method’s main designs:

(1) Score-based denoising algorithm We replace the gradient ascent rule (Eq. 6) by directly adding the predicted score to the input coordinates, which is similar to end-to-end displacement-based methods:

𝒚i=𝒙i+i(𝒙i).{\bm{y}}_{i}={\bm{x}}_{i}+{\mathcal{E}}_{i}({\bm{x}}_{i}). (8)

We also apply this update rule iteratively following previous displacement-based methods [26, 11]. The number of iterations is fine-tuned to produce the best performance.

(2) Neighborhood-covering training objective We replace the objective in Eq. 3 with:

(i)=𝒔(𝒙i)𝒮i(𝒙i)22,{\mathcal{L}}^{(i)}=\|{\bm{s}}({\bm{x}}_{i})-{\mathcal{S}}_{i}({\bm{x}}_{i})\|_{2}^{2}, (9)

which is similar to the L2 objective [26] or the Chamfer distance [21, 23] employed in previous deep-learning-based models [26], considering only the position of 𝒙i{\bm{x}}_{i}, while ours covers the neighborhood of 𝒙i{\bm{x}}_{i}.

(3) Ensemble score function We replace the ensemble score function in Eq. 5 with the single score function 𝒮i(𝒙){\mathcal{S}}_{i}({\bm{x}}).

As shown in Table 3, all the components contribute positively to the denoising performance. More results and analysis of the ablation studies can be found in the supplementary material.

Dataset: PU 10K, 1% 10K, 2% 10K, 3%
Ablation CD P2M CD P2M CD P2M
(1) 3.237 0.994 5.241 2.258 7.471 4.049
(1) + iter. 3.237* 0.994* 5.241* 2.258* 6.073 2.953
(2) 4.726 2.188 5.740 2.748 5.976 3.036
(3) 2.522 0.471 4.021 1.280 6.872 3.497
Full 2.521 0.463 3.686 1.074 4.708 1.942
Table 3: Ablation studies. CD is multiplied by 10410^{4} and P2M is multiplied by 10410^{4}. (*) The best performance is achieved after running for only 1 iteration.

5.5 Beyond Denoising: Upsampling via Denoising

Going beyond denoising, we show that the proposed method is applicable to point cloud upsampling. In particular, given a sparse point cloud with NN points as input, we perturb it with Gaussian noise independently for rr times, leading to a noisy dense point cloud consisting of rNrN points. Subsequently, we feed the noisy dense point cloud to our denoiser to acquire the final upsampled point cloud.

We compare the denoising-based upsampling method with the classical upsampling network PU-Net [35] using the test-set of PU-Net. The quantitative results are presented in Table 4 and the qualitative comparison is shown in Figure 8. We see that the denoising-based upsampling method fairly outperforms PU-Net which is specialized in upsampling. This implies that the proposed score-based method for point clouds has the potential in tasks beyond denoising, which will be further explored as our future works.

Refer to caption
Figure 8: Visual comparison between the specialized upsampling method PU-Net and our denoising-based upsampling method.
#Points 5K 10K
PU-Net [35] Ours PU-Net [35] Ours
CD 3.445 1.696 2.862 1.454
P2M 1.669 0.295 1.166 0.181
Table 4: Comparison between PU-Net and our denoising-based point cloud upsampling for the upsampling rate 4x. CD is multiplied by 10410^{4} and P2M is multiplied by 10410^{4}.

6 Conclusions

In this paper, we propose a novel paradigm of point cloud denoising, modeling noisy point clouds as samples from a noise-convolved distribution. We design a neural network architecture to estimate the score of the distribution and leverage on the score to denoise point clouds via gradient ascent. Experimental results validate the superiority of our model and further show the potential to be applied to other tasks such as point cloud upsampling.

References

  • [1] Marc Alexa, Johannes Behr, Daniel Cohen-Or, Shachar Fleishman, David Levin, and Claudio T Silva. Point set surfaces. In Proceedings Visualization, 2001. VIS’01., pages 21–29. IEEE, 2001.
  • [2] Haim Avron, Andrei Sharf, Chen Greif, and Daniel Cohen-Or. \ell1-sparse reconstruction of sharp point set surfaces. ACM Transactions on Graphics (TOG), 29(5):1–12, 2010.
  • [3] Ruojin Cai, Guandao Yang, Hadar Averbuch-Elor, Zekun Hao, Serge Belongie, Noah Snavely, and Bharath Hariharan. Learning gradient fields for shape generation. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
  • [4] Frédéric Cazals and Marc Pouget. Estimating differential quantities using polynomial fitting of osculating jets. Computer Aided Geometric Design, 22(2):121–146, 2005.
  • [5] Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
  • [6] Julie Digne and Carlo De Franchis. The bilateral filter for point clouds. Image Processing On Line, 7:278–287, 2017.
  • [7] Chaojing Duan, Siheng Chen, and Jelena Kovacevic. 3d point cloud denoising via deep neural network based local surface estimation. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8553–8557. IEEE, 2019.
  • [8] Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 605–613, 2017.
  • [9] Shachar Fleishman, Iddo Drori, and Daniel Cohen-Or. Bilateral mesh denoising. In ACM SIGGRAPH 2003 Papers, pages 950–953. 2003.
  • [10] Michael Gschwandtner, Roland Kwitt, Andreas Uhl, and Wolfgang Pree. Blensor: Blender sensor simulation toolbox. In International Symposium on Visual Computing, pages 199–208. Springer, 2011.
  • [11] Pedro Hermosilla, Tobias Ritschel, and Timo Ropinski. Total denoising: Unsupervised learning of 3d point cloud cleaning. In Proceedings of the IEEE International Conference on Computer Vision, pages 52–60, 2019.
  • [12] Wei Hu, Xiang Gao, Gene Cheung, and Zongming Guo. Feature graph learning for 3D point cloud denoising. IEEE Transactions on Signal Processing, 68:2841–2856, 2020.
  • [13] Wei Hu, Qianjiang Hu, Zehua Wang, and Xiang Gao. Dynamic point cloud denoising via manifold-to-manifold distance. arXiv preprint arXiv:2003.08355, 2020.
  • [14] Wei Hu, Jiahao Pang, Xianming Liu, Dong Tian, Chia-Wen Lin, and Anthony Vetro. Graph Signal Processing for geometric data and beyond: Theory and applications. arXiv preprint arXiv:2008.01918, 2020.
  • [15] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
  • [16] Hui Huang, Shihao Wu, Minglun Gong, Daniel Cohen-Or, Uri Ascher, and Hao Zhang. Edge-aware point set resampling. ACM transactions on graphics (TOG), 32(1):1–12, 2013.
  • [17] Aapo Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(Apr):695–709, 2005.
  • [18] Yann LeCun, Sumit Chopra, Raia Hadsell, M Ranzato, and F Huang. A tutorial on energy-based learning. Predicting structured data, 1(0), 2006.
  • [19] Ruihui Li, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, and Pheng-Ann Heng. Pu-gan: a point cloud upsampling adversarial network. In Proceedings of the IEEE International Conference on Computer Vision, pages 7203–7212, 2019.
  • [20] Yongcheng Liu, Bin Fan, Gaofeng Meng, Jiwen Lu, Shiming Xiang, and Chunhong Pan. Densepoint: Learning densely contextual representation for efficient point cloud processing. In Proceedings of the IEEE International Conference on Computer Vision, pages 5239–5248, 2019.
  • [21] Shitong Luo and Wei Hu. Differentiable manifold reconstruction for point cloud denoising. In Proceedings of the 28th ACM International Conference on Multimedia, pages 1330–1338, 2020.
  • [22] Enrico Mattei and Alexey Castrodad. Point cloud denoising via moving rpca. In Computer Graphics Forum, volume 36, pages 123–137. Wiley Online Library, 2017.
  • [23] Francesca Pistilli, Giulia Fracastoro, Diego Valsesia, and Enrico Magli. Learning graph-convolutional representations for point cloud denoising. arXiv preprint arXiv:2007.02578, 2020.
  • [24] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
  • [25] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pages 5099–5108, 2017.
  • [26] Marie-Julie Rakotosaona, Vittorio La Barbera, Paul Guerrero, Niloy J Mitra, and Maks Ovsjanikov. Pointcleannet: Learning to denoise and remove outliers from dense point clouds. In Computer Graphics Forum, volume 39, pages 185–203. Wiley Online Library, 2020.
  • [27] Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, and Georgia Gkioxari. Accelerating 3d deep learning with pytorch3d. arXiv:2007.08501, 2020.
  • [28] Yann Schoenenberger, Johan Paratte, and Pierre Vandergheynst. Graph-based denoising for time-varying point clouds. In 2015 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video (3DTV-CON), pages 1–4. IEEE, 2015.
  • [29] Andrés Serna, Beatriz Marcotegui, François Goulette, and Jean-Emmanuel Deschaud. Paris-rue-madame database: a 3d mobile laser scanner dataset for benchmarking urban detection, segmentation and classification methods. 2014.
  • [30] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, pages 11918–11930, 2019.
  • [31] Yujing Sun, Scott Schaefer, and Wenping Wang. Denoising point sets via l0 minimization. Computer Aided Geometric Design, 35:2–15, 2015.
  • [32] Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon. Dynamic graph cnn for learning on point clouds. ACM Transactions on Graphics (TOG), 38(5):1–12, 2019.
  • [33] Linlin Xu, Ruimin Wang, Juyong Zhang, Zhouwang Yang, Jiansong Deng, Falai Chen, and Ligang Liu. Survey on sparsity in geometric modeling and processing. Graphical Models, 82:160–180, 2015.
  • [34] Wang Yifan, Shihao Wu, Hui Huang, Daniel Cohen-Or, and Olga Sorkine-Hornung. Patch-based progressive 3d point set upsampling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5958–5967, 2019.
  • [35] Lequan Yu, Xianzhi Li, Chi-Wing Fu, Daniel Cohen-Or, and Pheng-Ann Heng. Pu-net: Point cloud upsampling network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2790–2799, 2018.
  • [36] Faisal Zaman, Ya Ping Wong, and Boon Yian Ng. Density-based denoising of point cloud. In 9th International Conference on Robotic, Vision, Signal Processing and Power Applications, pages 287–295. Springer, 2017.
  • [37] Jin Zeng, Gene Cheung, Michael Ng, Jiahao Pang, and Yang Cheng. 3D point cloud denoising using graph Laplacian regularization of a low dimensional manifold model. IEEE Transactions on Image Processing, 29:3474–3489, December 2019.