

1South China University of Technology  2Tencent AI Lab  3City University of Hong Kong  4School of Data Science, The Chinese University of Hong Kong, Shenzhen
Project page: https://lzhnb.github.io/project-pages/analytic-splatting/

Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration

Zhihao Liang1*    Qi Zhang2†    Wenbo Hu2    Lei Zhu3    Ying Feng2    Kui Jia4†
Abstract

3D Gaussian Splatting (3DGS) has recently gained popularity by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its renderings at varying resolutions can exhibit severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single point rather than as an area, making it insensitive to changes in pixel footprint. Consequently, this discrete sampling scheme inevitably results in aliasing, owing to the restricted sampling bandwidth. In this paper, we derive an analytical solution to address this issue. More specifically, we use a conditioned logistic function as the analytic approximation of the cumulative distribution function (CDF) of a one-dimensional Gaussian signal and calculate the Gaussian integral by subtracting the CDFs. We then introduce this approximation into two-dimensional pixel shading and present Analytic-Splatting, which analytically approximates the Gaussian integral within the 2D pixel window area to better capture the intensity response of each pixel. Moreover, we use the approximated response of the pixel window integral area in the transmittance calculation of volume rendering, making Analytic-Splatting sensitive to changes in pixel footprint at different resolutions. Experiments on various datasets validate that our approach has better anti-aliasing capability and yields more details and better fidelity.

Keywords:
3D Gaussian Splatting · Anti-Aliasing · View Synthesis · Cumulative Distribution Function (CDF) · Analytic Approximation
* Work was done during an internship at Tencent AI Lab.  † Corresponding authors.
Figure 1: For shading a pixel by a Gaussian signal, 3DGS (a) only treats the Gaussian signal value corresponding to the pixel center as the intensity response. Analytic-Splatting (b) instead considers an analytic approximation of the integral over the pixel window area as the intensity response. Compared to 3DGS, Analytic-Splatting has anti-aliasing capability and better detail fidelity.

1 Introduction

Novel view synthesis of a scene captured from multiple images has achieved great progress due to the rapid advancement of neural rendering. As a prominent representative, Neural Radiance Field (NeRF) [23] models the scene using a neural volumetric representation, enabling photorealistic rendering of novel views via ray marching. Ray marching trades off rendering efficiency against quality, and subsequent works [33, 8, 24] are proposed to achieve a better quality-efficiency balance. More recently, 3D Gaussian Splatting (3DGS) [16] proposes a GPU-friendly differentiable rasterization pipeline that incorporates an explicit point-based representation, achieving high-quality and real-time rendering for novel view synthesis. In contrast to ray marching in NeRF, which renders a pixel by accumulating the radiance of samples along the ray that intersects the image plane at the pixel, 3DGS employs a forward-mapping technique that can be rasterized very efficiently. Specifically, 3DGS represents the scene as a set of anisotropic 3D Gaussians with scene properties; when rendering a pixel, 3DGS orders and projects these 3D Gaussians onto the image plane as 2D Gaussians, then queries the values and scene properties associated with the Gaussians that overlap with the pixel, and finally shades the pixel by cumulatively compositing these queried values and properties.

3DGS works for scene representation learning and novel view synthesis at constant resolutions; however, its performance degrades greatly either when the multi-view images are captured at varying distances, or when the novel view to be rendered has a resolution different from those of the captured images. The main reason is that the footprint of the pixel (defined as the ratio between the pixel window area in screen space and the region of the covered Gaussian signals in world space) changes at different resolutions, and 3DGS is insensitive to such changes since it treats each pixel as an isolated point (i.e., merely the pixel center) when retrieving the corresponding Gaussian values; Fig. 1a gives an illustration. As a result, 3DGS can produce significant artifacts (e.g., blurring or jaggies), especially when pixel footprints change drastically (e.g., when synthesizing novel views with zooming-in and zooming-out effects).

By delving into the details, we know that 3DGS represents a continuous signal in the image space as a set of \alpha-blended 2D Gaussians, and pixel shading is a process of integrating the signal response within each pixel area; artifacts in 3DGS are caused by the limited sampling bandwidth for the Gaussians, which retrieves erroneous responses, especially when the pixel footprint changes drastically. It is possible to increase the sampling bandwidth (i.e., via super sampling) or use prefiltering techniques to alleviate this problem; for example, Mip-Splatting [36] employs the prefiltering technique and presents a hybrid filtering mechanism that regularizes the high-frequency components of 2D and 3D Gaussians to achieve anti-aliasing. While Mip-Splatting overcomes most aliasing in 3DGS, it is limited in capturing details and synthesizes over-smoothed results. Consequently, solving the integral of Gaussian signals within the pixel window area as intensity responses is crucial for both anti-aliasing and capturing details.

In this paper, we revisit pixel shading in 3DGS and introduce an analytic approximation of the window integral response of Gaussian signals for anti-aliasing. Rather than relying on the discrete sampling of 3DGS or the prefiltering of Mip-Splatting, we analytically approximate the integral within each pixel area, as shown in Fig. 1b; we term our method Analytic-Splatting. Compared with Mip-Splatting, which approximates the pixel window as a 2D Gaussian low-pass filter, our proposed method does not suppress the high-frequency components of Gaussian signals and can better preserve high-quality details. Experiments show that our method removes the aliasing present in 3DGS and other methods while synthesizing more details with better fidelity. We summarize our contributions as follows.

  • We revisit the causes of aliasing in 3D Gaussian Splatting from the perspective of signal window response and derive an analytic approximation of the window response for Gaussian signals;

  • Based on the derivation, we present Analytic-Splatting, which improves the pixel shading in 3D Gaussian Splatting to achieve anti-aliasing and better detail fidelity;

  • Our experiments on challenging datasets demonstrate the superiority of our method over other approaches in terms of anti-aliasing and synthesis quality.

2 Related Works

Neural Rendering. Recently, neural rendering techniques exemplified by Neural Radiance Field (NeRF) [23] have achieved impressive results in novel view synthesis and have further enabled several advanced tasks [30, 34, 21, 14, 22, 25]. Nevertheless, the backward-mapping volume rendering used in NeRF hinders real-time rendering, restricting the application prospects of NeRF. While several NeRF variants adopt efficient sampling strategies [35, 24, 19] or use explicit/hybrid representations [8, 29, 5, 9] with higher capacities, they still suffer from the tough sampling problem and struggle with real-time rendering. To overcome these limitations, 3DGS [16] employs a forward-mapping volume rendering technique and implements GPU-friendly tile-based rasterization to achieve real-time rendering and impressive rendering results. Due to its real-time rendering capability and impressive rendering performance, 3DGS has been widely used in advanced tasks such as human/avatar modeling [27, 12, 38, 40], surface reconstruction [10, 6], inverse rendering [20, 15, 28], physical simulation [32, 7], etc. Although rasterization allows 3DGS to avoid the tough sampling problems along rays and achieve promising results, it also introduces aliasing caused by the restricted sampling bandwidth when shading pixels using 2D Gaussians. This aliasing becomes noticeable when the pixel footprint changes drastically (e.g., zooming in and out). In this paper, we study the errors introduced by the discrete sampling scheme used in 3DGS and introduce our solution.

Anti-Aliasing. Aliasing is the phenomenon of overlapping frequency components when the discrete sampling rate is below the Nyquist rate. Anti-aliasing is critical for rendering high-fidelity results, which has been extensively explored in the computer graphics and vision community [1, 31, 18]. In the neural rendering context, MipNeRF [2] and Zip-NeRF [4] pioneer the use of prefiltering and multi-sampling to address the aliasing issue in neural radiance fields (NeRF). Recent works also explored the anti-aliased NeRF for unbounded scenes [3], efficient reconstruction [13], and surface reconstruction [39]. All these works are built upon the backward-mapping volume rendering to consider the pixel footprint, by replacing the original ray-casting with cone-casting. However, the backward-mapping volume rendering is too computationally expensive to achieve real-time rendering. On the other hand, 3DGS [16] introduced real-time forward-mapping volume rendering but suffers from aliasing artifacts due to the discrete sampling during shading pixels using projected Gaussians. To this end, Mip-Splatting [36] presents a hybrid filtering mechanism to restrict the high-frequency components of 2D and 3D Gaussians to achieve anti-aliasing. Nevertheless, this low-pass filtering strategy hinders the capability to preserve high-quality details. In contrast, our approach introduces an analytic approximation of the integral within the pixel area to better capture the intensity response of each pixel, harvesting both aliasing-free and detail-preserving rendering results.

3 Preliminary

In this section, we give the technical background necessary for the presentation of our proposed method.

3D Gaussian Splatting (3DGS) explicitly represents a 3D scene as a set of points \{\bm{p}_{i}\}^{N}_{i=1}. Given any point \bm{p}\in\{\bm{p}_{i}\}^{N}_{i=1}, 3DGS models it as a 3D Gaussian signal with mean vector \bm{\mu} and covariance matrix \bm{\Sigma}:

g^{\text{3D}}(\bm{x}|\bm{\mu},\bm{\Sigma})=\exp\left(-\frac{1}{2}(\bm{x}-\bm{\mu})^{\top}\bm{\Sigma}^{-1}(\bm{x}-\bm{\mu})\right), \quad (1)

where 𝝁3\bm{\mu}\in\mathbb{R}^{3} is the position of point 𝒑\bm{p}, and 𝚺3×3\bm{\Sigma}\in\mathbb{R}^{3\times 3} is an anisotropic covariance matrix, which is factorized into a scaling matrix 𝑺\bm{S} and a rotation matrix 𝑹\bm{R} as 𝚺=𝑹𝑺𝑺𝑹\bm{\Sigma}=\bm{R}\bm{S}\bm{S}^{\top}\bm{R}^{\top}. Note that 𝑺\bm{S} indicates a 3×33\times 3 diagonal matrix and 𝑹\bm{R} refers to a 3×33\times 3 matrix constructed from a unit quaternion 𝒒\bm{q}.
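
As a concrete illustration of this factorization, the short NumPy sketch below (the quaternion and scale values are arbitrary placeholders, and quat_to_rotmat is a small helper introduced here for illustration) builds \bm{\Sigma}=\bm{R}\bm{S}\bm{S}^{\top}\bm{R}^{\top} from a unit quaternion and per-axis scales:

```python
import numpy as np

def quat_to_rotmat(q):
    """Rotation matrix from a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

q = np.array([0.92, 0.10, 0.30, 0.20])    # quaternion (normalized inside the helper)
scales = np.array([0.5, 1.0, 2.0])        # per-axis scales on the diagonal of S
R, S = quat_to_rotmat(q), np.diag(scales)
Sigma = R @ S @ S.T @ R.T                 # covariance factorization of Eq. (1)
print(np.allclose(Sigma, Sigma.T))        # symmetric and PSD by construction
```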

Given a viewing transformation with extrinsic matrix 𝑻\bm{T} and projection matrix 𝑲\bm{K}, we get the projected position 𝝁^\hat{\bm{\mu}} and covariance matrix 𝚺^\hat{\bm{\Sigma}} in 2D screen space as:

𝝁^=𝑲𝑻[𝝁,1],𝚺^=𝑱𝑻𝚺𝑻𝑱,\hat{\bm{\mu}}=\bm{KT}[\bm{\mu},1]^{\top},\quad\hat{\bm{\Sigma}}=\bm{J}\bm{T}\bm{\Sigma}\bm{T}^{\top}\bm{J}^{\top}, (2)

where \bm{J} is the Jacobian matrix of the affine approximation of the perspective projection. Note that 3DGS retains only the first two components of \hat{\bm{\mu}} and the top-left 2\times 2 block of \hat{\bm{\Sigma}}, so that \hat{\bm{\mu}}\in\mathbb{R}^{2} and \hat{\bm{\Sigma}}\in\mathbb{R}^{2\times 2}. The projected 2D Gaussian signal for the pixel \bm{u} is given by:

g^{\text{2D}}(\bm{u}|\hat{\bm{\mu}},\hat{\bm{\Sigma}})=\exp\left(-\frac{1}{2}(\bm{u}-\hat{\bm{\mu}})^{\top}\hat{\bm{\Sigma}}^{-1}(\bm{u}-\hat{\bm{\mu}})\right). \quad (3)

Using the projected 2D Gaussian signal, 3DGS derives the volume transmittance and shades the color of pixel \bm{u} through:

\bm{C}(\bm{u})=\sum_{i\in N}T_{i}\,g^{\text{2D}}_{i}(\bm{u}|\hat{\bm{\mu}}_{i},\hat{\bm{\Sigma}}_{i})\,\alpha_{i}\bm{c}_{i},\quad T_{i}=\prod^{i-1}_{j=1}\left(1-g^{\text{2D}}_{j}(\bm{u}|\hat{\bm{\mu}}_{j},\hat{\bm{\Sigma}}_{j})\,\alpha_{j}\right), \quad (4)

where the symbols with subscript i indicate the attributes related to the point \bm{p}_{i}. Specifically, \alpha_{i} and \bm{c}_{i} denote the opacity and view-dependent color of point \bm{p}_{i}, respectively.

For better understanding, we further formulate the 2D Gaussian signal g2Dg^{\text{2D}} in Eq. 3 as a flattened expression. Considering 𝚺^\hat{\bm{\Sigma}} is a real-symmetric 2×22\times 2 matrix, we numerically express 𝚺^\hat{\bm{\Sigma}} and 𝚺^1\hat{\bm{\Sigma}}^{-1} as:

𝚺^=[Σ^11Σ^12Σ^12Σ^22]\displaystyle\hat{\bm{\Sigma}}=\begin{bmatrix}\hat{\Sigma}_{11}\ &\ \hat{\Sigma}_{12}\\ \hat{\Sigma}_{12}\ &\ \hat{\Sigma}_{22}\end{bmatrix} ,𝚺^1=1|𝚺^|[Σ^22Σ^12Σ^12Σ^11]=[abbc],\displaystyle,\quad\hat{\bm{\Sigma}}^{-1}=\frac{1}{|\hat{\bm{\Sigma}}|}\begin{bmatrix}\hat{\Sigma}_{22}\ &\ -\hat{\Sigma}_{12}\\ -\hat{\Sigma}_{12}\ &\ \hat{\Sigma}_{11}\end{bmatrix}=\begin{bmatrix}a\ &\ b\\ b\ &\ c\end{bmatrix}, (5)
a\displaystyle a =Σ^22|𝚺^|,b=Σ^12|𝚺^|,c=Σ^11|𝚺^|,\displaystyle=\frac{\hat{\Sigma}_{22}}{|\hat{\bm{\Sigma}}|},\quad b=-\frac{\hat{\Sigma}_{12}}{|\hat{\bm{\Sigma}}|},\quad c=\frac{\hat{\Sigma}_{11}}{|\hat{\bm{\Sigma}}|},

given the pixel \bm{u}=[u_{x},u_{y}]^{\top} and the projected position \hat{\bm{\mu}}=[\hat{\mu}_{x},\hat{\mu}_{y}]^{\top} in 2D screen space, Eq. 3 can be rewritten as:

g2D(𝒖\displaystyle g^{\text{2D}}(\bm{u} |𝝁^,𝚺^)=exp(a2u^x2c2u^y2bu^xu^y),\displaystyle|\hat{\bm{\mu}},\hat{\bm{\Sigma}})=\exp\left(-\frac{a}{2}\hat{u}_{x}^{2}-\frac{c}{2}\hat{u}_{y}^{2}-b\hat{u}_{x}\hat{u}_{y}\right), (6)
u^x\displaystyle\hat{u}_{x} =uxμ^x,u^y=uyμ^y.\displaystyle=u_{x}-\hat{\mu}_{x},\quad\hat{u}_{y}=u_{y}-\hat{\mu}_{y}.

Remark. It is worth noting that 3DGS treats each pixel as an isolated, single point when calculating its corresponding Gaussian value, as shown in Eq. (6). This approximation scheme works effectively when the training and testing images capture the scene content from a relatively consistent distance. However, when the pixel footprint changes due to focal length or camera distance adjustments, 3DGS renderings exhibit considerable artifacts, such as thin Gaussians observed during zooming in. As a result, it becomes crucial to define the pixel as a window area and calculate the corresponding intensity response by integrating the Gaussian signal within this domain. Rather than using the intuitive but time-consuming super sampling, it is preferable to tackle the problem analytically, given that the Gaussian signal is a continuous function.
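
To make this point-sampled shading concrete, the following minimal NumPy sketch (not the CUDA rasterizer; the Gaussians, opacities, and colors are toy values, and depth sorting and tile culling are assumed already done) evaluates each projected Gaussian of Eq. (3) only at the pixel center and composites front to back as in Eq. (4):

```python
import numpy as np

def gaussian_2d(u, mu, cov):
    """Unnormalized 2D Gaussian of Eq. (3) evaluated at the pixel center u."""
    d = u - mu
    return np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

def shade_pixel_3dgs(u, gaussians):
    """Front-to-back alpha compositing of Eq. (4); gaussians = [(mu, cov, alpha, color), ...]."""
    color, T = np.zeros(3), 1.0
    for mu, cov, alpha, c in gaussians:
        g = gaussian_2d(u, mu, cov)      # point sample at the pixel center only
        color += T * g * alpha * c
        T *= 1.0 - g * alpha             # update transmittance
        if T < 1e-4:                     # early termination once nearly opaque
            break
    return color

gaussians = [
    (np.array([10.2, 9.8]), np.array([[2.0, 0.3], [0.3, 1.0]]), 0.8, np.array([1.0, 0.0, 0.0])),
    (np.array([11.0, 10.5]), np.array([[4.0, -0.5], [-0.5, 3.0]]), 0.6, np.array([0.0, 1.0, 0.0])),
]
print(shade_pixel_3dgs(np.array([10.0, 10.0]), gaussians))
```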

4 Methods

In Sec. 3, we observe that 3DGS ignores the window area of each pixel and only considers the Gaussian value at the pixel center as its intensity response. This approach inevitably produces artifacts due to fluctuations of the pixel footprint under different resolutions. To address this problem, we derive an analytical approximation of a 2D Gaussian signal within the pixel window area to accurately describe the intensity response of the imaging pixel, and then apply this derived integral to replace g^{\text{2D}} in the 3DGS framework.

Figure 2: Example diagram of the signal integration within the window area and the approximation schemes used in different methods: (a) Integration (Ground-truth); (b) Baseline (3DGS); (c) Super Sampling (3DGS-SS); (d) Filtering (Mip-Splatting); (e) Analytic Approximation (Ours).

4.1 Revisit One-dimensional Gaussian Signal Response

Before describing our proposed method, we first revisit the integrated response of a one-dimensional Gaussian signal within a window area. Given a signal g(x) and a window area [x_{1},x_{2}], we aim to obtain the response by integrating the signal \mathcal{I}_{g}=\int^{x_{2}}_{x_{1}}g(x)\text{d}x within this domain, as shown in Fig. 2(a). For an unknown signal, Monte Carlo sampling within the window area is a feasible approach to approximate the integral, as demonstrated in Fig. 2(b) and Fig. 2(c), and the approximation \mathcal{I}_{g}\approx\frac{x_{2}-x_{1}}{N}\sum^{N}_{i=1}g(x_{i}),\ x_{i}\in[x_{1},x_{2}] becomes more accurate as the number of samples N increases. Nonetheless, increasing the number of samples (i.e., super sampling) increases the computational burden significantly.
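
The toy sketch below (the window, standard deviation, and sample counts are arbitrary) illustrates this trade-off: it estimates the window integral of an unnormalized Gaussian by Monte Carlo sampling and compares the estimate against the erf-based ground truth as N grows.

```python
import numpy as np
from math import erf, sqrt, pi

sigma, x1, x2 = 0.5, 0.2, 1.2                            # Gaussian std and window [x1, x2]
g = lambda x: np.exp(-x**2 / (2 * sigma**2))             # unnormalized Gaussian signal
Phi = lambda x: 0.5 * (1 + erf(x / (sigma * sqrt(2))))   # exact CDF with std sigma
exact = sigma * sqrt(2 * pi) * (Phi(x2) - Phi(x1))       # ground-truth window integral

rng = np.random.default_rng(0)
for n in [1, 4, 16, 64, 256]:
    xs = rng.uniform(x1, x2, size=n)                     # samples inside the window
    mc = (x2 - x1) * g(xs).mean()                        # Monte Carlo estimate
    print(f"N={n:3d}  MC={mc:.5f}  |error|={abs(mc - exact):.5f}")
print(f"exact integral = {exact:.5f}")
```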

Fortunately, our goal in the 3DGS framework is to obtain the intensity response of the Gaussian signal within the window area [x_{1},x_{2}]. Given that the Gaussian signal is a continuous real-valued function, it is natural to derive an analytical approximation to the Gaussian definite integral (Fig. 2(a)), which is more accurate than numerical integration (Fig. 2(b) and Fig. 2(c)). For instance, in Mip-Splatting, the window area is treated as a Gaussian kernel g_{\text{w}} (with standard deviation \sigma set to 0.1), and the integral is approximated as the result of sampling after convolving the Gaussian signal with the Gaussian kernel, \mathcal{I}_{g}\approx g\circledast g_{\text{w}}. Note that the result of convolving a Gaussian signal with a Gaussian kernel is still a Gaussian signal, as shown in Fig. 2(d). While this prefiltering exploits the convolution properties of Gaussian signals, the approximation introduces a large gap when the Gaussian signal g mainly consists of high-frequency components (i.e., has a small standard deviation \sigma).
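
To see why such prefiltering leaves a gap for narrow Gaussians, the 1D sketch below only mimics the spirit of this filtering: it convolves the unnormalized Gaussian with a normalized kernel of standard deviation 0.1 (the value quoted above) and samples at the window center, then compares this value with the true 1-width window integral. This is a simplification for illustration, not the exact Mip-Splatting filter.

```python
import numpy as np
from math import erf, sqrt, pi

def window_integral(u, sigma):
    """Exact integral of exp(-x^2/(2 sigma^2)) over [u - 1/2, u + 1/2]."""
    Phi = lambda x: 0.5 * (1 + erf(x / (sigma * sqrt(2))))
    return sigma * sqrt(2 * pi) * (Phi(u + 0.5) - Phi(u - 0.5))

def prefiltered_sample(u, sigma, sigma_w=0.1):
    """Gaussian-Gaussian convolution in closed form, sampled at u."""
    s2 = sigma**2 + sigma_w**2
    return sigma / sqrt(s2) * np.exp(-u**2 / (2 * s2))

for sigma in [0.3, 0.5, 1.0, 3.0]:
    print(f"sigma={sigma:.1f}  window integral={window_integral(0.0, sigma):.4f}  "
          f"prefiltered sample={prefiltered_sample(0.0, sigma):.4f}")
```

In this simplified setting the two values nearly coincide for large sigma but diverge markedly at sigma = 0.3, which is consistent with the gap discussed above.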

To overcome these shortcomings, we are motivated to calculate the integral within the window area analytically. Specifically, the problem of evaluating the definite integral within [x1,x2][x_{1},x_{2}] can be simplified to the subtraction of two improper integrals by applying the first part of the fundamental theorem of calculus. Let G(x)G(x) be the cumulative distribution function (CDF) of the standard Gaussian distribution g(x)g(x) defined by,

G(x)=xg(x)dx,g(x)\displaystyle G(x)=\int^{x}_{-\infty}g(x)\text{d}x,\quad g(x) =12πexp(x22),\displaystyle=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^{2}}{2}\right), (7)

and the definite integral of g(x)g(x) within [x1,x2][x_{1},x_{2}] can be expressed as,

g=G(x2)G(x1).\mathcal{I}_{g}=G(x_{2})-G(x_{1}). (8)

However, the CDF of the Gaussian function (expressed via the error function erf) has no closed-form expression. We therefore start by approximating the CDF G(x) directly, as stated below.

Definition 1

The logistic function S(x) is an analytic approximation of the CDF G(x) with standard deviation \sigma=1, defined as

S(x)=11+exp(1.6x0.07x3),S(x)=\frac{1}{1+\exp\left(-1.6\cdot x-0.07\cdot x^{3}\right)}, (9)

and is derivative-friendly.

This analytic approximation shares the following properties with the CDF G(x):

  1. S(x) is non-decreasing and right-continuous, satisfying

    \lim_{x\rightarrow-\infty}G(x)=\lim_{x\rightarrow-\infty}S(x)=0\quad\text{and}\quad\lim_{x\rightarrow+\infty}G(x)=\lim_{x\rightarrow+\infty}S(x)=1.

  2. The curve of S(x) has 2-fold rotational symmetry around the point (0,\nicefrac{{1}}{{2}}),

    G(x)+G(-x)=S(x)+S(-x)=1,\ \forall x\in\mathbb{R}.

For any Gaussian signal with a different standard deviation, we can approximate its CDF by scaling x in Eq. 9 by the reciprocal of its standard deviation. When x in S(x) is scaled by the reciprocal of \sigma, we denote the resulting function as S_{\sigma}(x). For more details, please refer to the supplementary material.

In summary, given a sample u and setting the window width to 1, the integral \mathcal{I}_{g}(u) of the Gaussian signal g(x) within the area [u-\frac{1}{2},u+\frac{1}{2}] is defined as:

g(u)=u12u+12g(x)dx=G(u+12)G(u12).\mathcal{I}_{g}(u)=\int^{u+\frac{1}{2}}_{u-\frac{1}{2}}g(x)\text{d}x=G(u+\frac{1}{2})-G(u-\frac{1}{2}). (10)

Applying Eq. 9, we approximate the integral in Eq. 10 as:

g(u)S(u+12)S(u12).\mathcal{I}_{g}(u)\approx S(u+\frac{1}{2})-S(u-\frac{1}{2}). (11)
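
The sketch below (example values of \sigma and u are arbitrary) implements S(x) from Eq. 9, its scaled version S_{\sigma}, and the window response of Eq. 11, and checks them against the exact erf-based values:

```python
import numpy as np
from math import erf, sqrt

def S(x):
    """Conditioned logistic approximation of the standard Gaussian CDF (Eq. 9)."""
    return 1.0 / (1.0 + np.exp(-1.6 * x - 0.07 * x**3))

def S_sigma(x, sigma):
    """CDF approximation for standard deviation sigma: scale x by 1/sigma."""
    return S(x / sigma)

def G_sigma(x, sigma):
    """Exact Gaussian CDF via the error function, for reference."""
    return 0.5 * (1.0 + erf(x / (sigma * sqrt(2.0))))

def window_response(u, sigma):
    """Approximate 1-width window integral of the normalized Gaussian (Eq. 11)."""
    return S_sigma(u + 0.5, sigma) - S_sigma(u - 0.5, sigma)

for sigma in [0.3, 1.0, 3.0]:
    for u in [0.0, 0.7, 2.0]:
        approx = window_response(u, sigma)
        exact = G_sigma(u + 0.5, sigma) - G_sigma(u - 0.5, sigma)
        print(f"sigma={sigma:.1f} u={u:.1f}  approx={approx:.5f}  exact={exact:.5f}")
```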

In the following section, we discuss how to apply the above approximation to 2D pixel shading in Analytic-Splatting.

Figure 3: Example diagram of the pixel integration domain in Eq. 12 (a, integral over the original pixel area) and the domain after rotation in Eq. 14 (b, integral over the rotated pixel area). The yellow lines in Fig. 3(a) are the coordinate axes of 2D screen space; the yellow lines in Fig. 3(b) are the eigenvectors scaled by the eigenvalues.

4.2 Analytic-Splatting

After revisiting the one-dimensional Gaussian signal integration, we now approximate the integral of projected 2D Gaussians within each pixel window area in Analytic-Splatting. Mathematically, we replace the sampled g^{\text{2D}}(\bm{u}) in Eq. 4 with the approximated integral \mathcal{I}_{g}^{\text{2D}}(\bm{u}). For the pixel \bm{u}=[u_{x},u_{y}]^{\top} in 2D screen space, which corresponds to the window area \Omega_{\bm{u}} shown in Fig. 3(a), the integral of the Gaussian signal in Eq. 3 is represented as:

g2D(𝒖)=ux12ux+12uy12uy+12exp(a2(xμ^x)2c2(yμ^y)2b(xμ^x)(yμ^y)correlation term)dxdy.\displaystyle\mathcal{I}_{g}^{\text{2D}}(\bm{u})=\int^{u_{x}+\frac{1}{2}}_{u_{x}-\frac{1}{2}}\int^{u_{y}+\frac{1}{2}}_{u_{y}-\frac{1}{2}}\exp\left(-\frac{a}{2}(x-\hat{\mu}_{x})^{2}-\frac{c}{2}(y-\hat{\mu}_{y})^{2}-\underbrace{b(x-\hat{\mu}_{x})(y-\hat{\mu}_{y})}_{\text{correlation term}}\right)\ \text{d}x\text{d}y. (12)

Notably, handling the correlation term in this integral is intractable. To unravel the correlation term and thus feasibly solve the integral, we diagonalize the covariance matrix 𝚺^\hat{\bm{\Sigma}} of the 2D Gaussian g2Dg^{\text{2D}} and slightly rotate the integration domain as shown in Fig. 3(b), thus approximating the integral by multiplying two independent 1D Gaussian integrals.

In detail, we first perform eigenvalue decomposition on the covariance matrix 𝚺^\hat{\bm{\Sigma}} (refer to Eq. 5) to obtain eigenvalues {λ1,λ2}\{\lambda_{1},\lambda_{2}\} and the corresponding eigenvectors {𝒗1,𝒗2}\{\bm{v}_{1},\bm{v}_{2}\}. After diagonalization, for better description, we take 𝝁^=[μ^x,μ^y]\hat{\bm{\mu}}=[\hat{\mu}_{x},\hat{\mu}_{y}]^{\top} (the mean vector of g2Dg^{\text{2D}}) as the origin and the eigenvectors {𝒗1,𝒗2}\{\bm{v}_{1},\bm{v}_{2}\} as the axis to construct a new coordinate system.

In this coordinate system, given a pixel \bm{u}=[u_{x},u_{y}]^{\top}, we rewrite g^{\text{2D}} in Eq. 6 as the product of two independent 1D Gaussians:

g2D(𝒖)\displaystyle g^{\text{2D}}(\bm{u}) =exp(12λ1u~x212λ2u~y2)=exp(12λ1u~x2)exp(12λ2u~y2),\displaystyle=\exp\left(-\frac{1}{2\lambda_{1}}\tilde{u}_{x}^{2}-\frac{1}{2\lambda_{2}}\tilde{u}_{y}^{2}\right)=\exp\left(-\frac{1}{2\lambda_{1}}\tilde{u}_{x}^{2}\right)\exp\left(-\frac{1}{2\lambda_{2}}\tilde{u}_{y}^{2}\right), (13)
𝒖~\displaystyle\tilde{\bm{u}} =[u~xu~y]=[𝒗1𝒗2](𝒖𝝁^)=[𝒗1𝒗2][uxμ^xuyμ^y],\displaystyle=\begin{bmatrix}\tilde{u}_{x}\\ \tilde{u}_{y}\end{bmatrix}=\begin{bmatrix}-&\bm{v}_{1}&-\\ -&\bm{v}_{2}&-\end{bmatrix}\left(\bm{u}-\hat{\bm{\mu}}\right)=\begin{bmatrix}-&\bm{v}_{1}&-\\ -&\bm{v}_{2}&-\end{bmatrix}\begin{bmatrix}u_{x}-\hat{\mu}_{x}\\ u_{y}-\hat{\mu}_{y}\end{bmatrix},

where 𝒖~=[u~x,u~y]\tilde{\bm{u}}=[\tilde{u}_{x},\tilde{u}_{y}]^{\top} denotes the diagonalized coordinate of the pixel center. After diagonalization, we further rotate the pixel integration domain Ω𝒖\Omega_{\bm{u}} along the pixel center to align it with the eigenvectors and get Ω~𝒖\tilde{\Omega}_{\bm{u}} for approximating the integral. Thus the integral in Eq. 12 can be approximated as:

g2D(𝒖)\displaystyle\mathcal{I}_{g}^{\text{2D}}(\bm{u}) Ω~𝒖g2D(𝒖)d𝒖=u~x12u~x+12exp(12λ1x2)dxu~y12u~y+12exp(12λ2y2)dy\displaystyle\approx\int_{\tilde{\Omega}_{\bm{u}}}g^{\text{2D}}(\bm{u})\text{d}\bm{u}=\int^{\tilde{u}_{x}+\frac{1}{2}}_{\tilde{u}_{x}-\frac{1}{2}}\exp\left(-\frac{1}{2\lambda_{1}}x^{2}\right)\text{d}x\int^{\tilde{u}_{y}+\frac{1}{2}}_{\tilde{u}_{y}-\frac{1}{2}}\exp\left(-\frac{1}{2\lambda_{2}}y^{2}\right)\text{d}y (14)
=2πλ1λ2u~x12u~x+1212πλ1exp(12λ1x2)dxu~y12u~y+1212πλ2exp(12λ2y2)dy\displaystyle=2\pi\sqrt{\lambda_{1}\lambda_{2}}\int^{\tilde{u}_{x}+\frac{1}{2}}_{\tilde{u}_{x}-\frac{1}{2}}\frac{1}{\sqrt{2\pi\lambda_{1}}}\exp\left(-\frac{1}{2\lambda_{1}}x^{2}\right)\text{d}x\int^{\tilde{u}_{y}+\frac{1}{2}}_{\tilde{u}_{y}-\frac{1}{2}}\frac{1}{\sqrt{2\pi\lambda_{2}}}\exp\left(-\frac{1}{2\lambda_{2}}y^{2}\right)\text{d}y
2πσ1σ2[Sσ1(u~x+12)Sσ1(u~x12)][Sσ2(u~y+12)Sσ2(u~y12)].\displaystyle\approx 2\pi\sigma_{1}\sigma_{2}\left[S_{\sigma_{1}}(\tilde{u}_{x}+\frac{1}{2})-S_{\sigma_{1}}(\tilde{u}_{x}-\frac{1}{2})\right]\left[S_{\sigma_{2}}(\tilde{u}_{y}+\frac{1}{2})-S_{\sigma_{2}}(\tilde{u}_{y}-\frac{1}{2})\right].

where the subscript \sigma_{\ast} of S indicates the corresponding Gaussian signal with standard deviation \sigma_{\ast}, and \sigma_{1}=\sqrt{\lambda_{1}} and \sigma_{2}=\sqrt{\lambda_{2}} denote the standard deviations of the independent Gaussian signals along the two eigenvectors. In summary, the volume shading in Analytic-Splatting is given by:

𝑪(𝒖)\displaystyle\bm{C}(\bm{u}) =iNTigi2D(𝒖|𝝁i^,𝚺i^)αi𝒄i,Ti=j=1i1(1gj2D(𝒖|𝝁j^,𝚺j^)αj),\displaystyle=\sum_{i\in N}T_{i}\mathcal{I}_{g-i}^{\text{2D}}(\bm{u}|\hat{\bm{\mu}_{i}},\hat{\bm{\Sigma}_{i}})\alpha_{i}\bm{c}_{i},\quad T_{i}=\prod^{i-1}_{j=1}(1-\mathcal{I}_{g-j}^{\text{2D}}(\bm{u}|\hat{\bm{\mu}_{j}},\hat{\bm{\Sigma}_{j}})\alpha_{j}), (15)
g2D(𝒖)\displaystyle\mathcal{I}_{g}^{\text{2D}}(\bm{u}) =2πσ1σ2[Sσ1(u~x+12)Sσ1(u~x12)][Sσ2(u~y+12)Sσ2(u~y12)].\displaystyle=2\pi\sigma_{1}\sigma_{2}\left[S_{\sigma_{1}}(\tilde{u}_{x}+\frac{1}{2})-S_{\sigma_{1}}(\tilde{u}_{x}-\frac{1}{2})\right]\left[S_{\sigma_{2}}(\tilde{u}_{y}+\frac{1}{2})-S_{\sigma_{2}}(\tilde{u}_{y}-\frac{1}{2})\right].
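
A minimal NumPy sketch of Eqs. (13)-(15) is given below (it uses numpy.linalg.eigh in place of the closed-form 2x2 decomposition detailed in the appendix, and the example Gaussian and pixel are arbitrary):

```python
import numpy as np

def S(x):
    """Logistic approximation of the standard Gaussian CDF (Eq. 9)."""
    return 1.0 / (1.0 + np.exp(-1.6 * x - 0.07 * x**3))

def analytic_response(u, mu, cov):
    """Approximate integral of a 2D Gaussian over the rotated 1x1 pixel window (Eq. 15)."""
    lam, V = np.linalg.eigh(cov)        # eigenvalues and (column) eigenvectors of Sigma_hat
    sigma = np.sqrt(lam)                # standard deviations along the eigenvectors
    ut = V.T @ (u - mu)                 # pixel center in the diagonalized frame (Eq. 13)
    resp = 2.0 * np.pi * sigma[0] * sigma[1]
    for k in range(2):                  # product of two independent 1D window responses
        resp *= S((ut[k] + 0.5) / sigma[k]) - S((ut[k] - 0.5) / sigma[k])
    return resp

mu = np.array([10.3, 9.9])                     # projected Gaussian center
cov = np.array([[1.5, 0.4], [0.4, 0.9]])       # projected 2D covariance
print(analytic_response(np.array([10.0, 10.0]), mu, cov))
```

This response then replaces g^{\text{2D}} inside the compositing of Eq. 4, as written in Eq. 15.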

5 Experiments

5.1 Approximation Error Analysis

In this section, we first comprehensively explore the approximation errors in our scheme and then conduct an elaborate comparison against other schemes. It is worth noting that during training, the pruning and densification schemes proposed in 3DGS tend to maintain the standard deviations of Gaussian signals within an appropriate range (i.e. σ[0.3, 6.6]\sigma\in[0.3,\ 6.6]), and each Gaussian signal only responds to pixels within the 99%99\% confidence interval (i.e. |x|<3σ|x|<3\sigma) for shading. Therefore, we only consider the Gaussian signals with standard deviations in [0.3, 6.6][0.3,\ 6.6], and merely discuss the approximation error of pixels within the 99%99\% confidence interval.

Referring to Eq. 7 and Eq. 9, we obtain the approximation error function for the CDF of the Gaussian function:

CDF(x)=|S(x)G(x)|,\small\mathcal{E}_{\text{CDF}}(x)=\left\lvert S(x)-G(x)\right\rvert, (16)

and referring to Eq. 10 and Eq. 11, the approximation error function regarding the 1-width window area integral response is:

\mathcal{E}_{\text{Int}}(x)=\left\lvert\left(S(x+\frac{1}{2})-S(x-\frac{1}{2})\right)-\left(G(x+\frac{1}{2})-G(x-\frac{1}{2})\right)\right\rvert, \quad (17)

The approximation errors \mathcal{E}_{\text{CDF}} and \mathcal{E}_{\text{Int}} for different standard deviations are shown in Fig. 4(a) and Fig. 4(b), respectively. Since Eq. 16 and Eq. 17 are even functions, we show the results for the positive semi-axis over 0\leq x\leq 6 in Fig. 4; note that the errors shown in Fig. 4 are reported in units of 1e-4.
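
These error curves can be reproduced numerically with a short sweep (the grid resolution is arbitrary; the \sigma values are drawn from the range [0.3, 6.6] stated above and x is restricted to the 99% confidence interval):

```python
import numpy as np
from math import erf, sqrt

S = lambda x: 1.0 / (1.0 + np.exp(-1.6 * x - 0.07 * x**3))       # Eq. 9
G = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))     # exact standard-normal CDF

for sigma in [0.3, 1.0, 3.0, 6.6]:
    x = np.linspace(0.0, 3.0 * sigma, 2000)   # positive semi-axis; the errors are even in x
    e_cdf = np.abs(S(x / sigma) - G(x / sigma))                                  # Eq. 16
    e_int = np.abs((S((x + 0.5) / sigma) - S((x - 0.5) / sigma))
                   - (G((x + 0.5) / sigma) - G((x - 0.5) / sigma)))              # Eq. 17
    print(f"sigma={sigma:.1f}  max E_CDF={e_cdf.max():.2e}  max E_Int={e_int.max():.2e}")
```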

Figure 4: Error analysis of using our conditioned logistic function (Eq. 9) to approximate the CDF of Gaussian signals (Eq. 7) and the window integration (Eq. 10): (a) approximation errors of the conditioned logistic function Eq. 9 to the CDF Eq. 7 for different standard deviations; (b) approximation errors of our approximated window integral Eq. 11 to the ground truth Eq. 10 for different standard deviations. Note that the errors are reported in units of 1e-4.
Figure 5: Error analysis of approximating the window integral Eq. 10 using the different schemes in Fig. 2: (a) approximation errors for different standard deviations; (b) approximation errors for different variable distributions; (c) approximation errors from rotating the integral domain by different angles as in Fig. 3.

For approximating the integral response over the 1-width window area, our scheme significantly outperforms the other schemes. Fig. 5(a) and Fig. 5(b) show the one-dimensional approximation errors under different standard deviations and variable distributions, respectively. Our scheme is robust to both conditions; in particular, our advantage becomes more pronounced as the standard deviations of the Gaussian signals become smaller, which means that our scheme is better at capturing the high-frequency signals (i.e., details) of the scene. Our subsequent experimental results also verify this.

PSNR \uparrow SSIM \uparrow LPIPS \downarrow
Full Res.  1/2 Res.  1/4 Res.  1/8 Res.  Avg.  |  Full Res.  1/2 Res.  1/4 Res.  1/8 Res.  Avg.  |  Full Res.  1/2 Res.  1/4 Res.  1/8 Res.  Avg.
NeRF w/o area\mathcal{L}_{\text{area}} 31.20 30.65 26.25 22.53 27.66 0.950 0.956 0.930 0.871 0.927 0.055 0.034 0.043 0.075 0.052
NeRF [23] 29.90 32.13 33.40 29.47 31.23 0.938 0.959 0.973 0.962 0.958 0.074 0.040 0.024 0.039 0.044
MipNeRF [2] 32.63 34.34 35.47 35.60 34.51 0.958 0.970 0.979 0.983 0.973 0.047 0.026 0.017 0.012 0.026
Plenoxels [8] 31.60 32.85 30.26 26.63 30.34 0.956 0.967 0.961 0.936 0.955 0.052 0.032 0.045 0.077 0.051
TensoRF [5] 32.11 33.03 30.45 26.80 30.60 0.956 0.966 0.962 0.939 0.956 0.056 0.038 0.047 0.076 0.054
Instant-NGP [24] 30.00 32.15 33.31 29.35 31.20 0.939 0.961 0.974 0.963 0.959 0.079 0.043 0.026 0.040 0.047
Tri-MipRF [13] 32.65 34.24 35.02 35.53 34.36 0.958 0.971 0.980 0.987 0.974 0.047 0.027 0.018 0.012 0.026
3DGS [16] 28.79 30.66 31.64 27.98 29.77 0.943 0.962 0.972 0.960 0.960 0.065 0.038 0.025 0.031 0.040
3DGS-SS [16] 32.05 33.78 33.92 31.12 32.71 0.964 0.975 0.980 0.977 0.974 0.039 0.021 0.016 0.020 0.024
Mip-Splatting [36] 32.81 34.49 35.45 35.50 34.56 0.967 0.977 0.983 0.988 0.979 0.035 0.019 0.013 0.010 0.019
Ours 33.22 34.92 35.98 36.00 35.03 0.967 0.977 0.984 0.989 0.979 0.033 0.019 0.012 0.010 0.018
Table 1: Quantitative comparison of Analytic-Splatting against several cutting-edge methods on the multi-scale Blender Synthetic dataset [2]. These methods conduct multi-scale training and testing.

Last but not least, we employ this scheme to approximate the window integral responses of two-dimensional Gaussian signals, which requires rotating the integration domain from \Omega_{\bm{u}} to \tilde{\Omega}_{\bm{u}} as shown in Fig. 3 and inevitably introduces additional errors. To study this error, we record the approximation errors caused by rotating the integration domain from 0^{\circ} to 45^{\circ} (since we always hold \sigma_{1}\geq\sigma_{2} in practice, we only need to consider rotation angles from 0^{\circ} to 45^{\circ}) under different standard deviations of Gaussian signals and distributions, as shown in Fig. 5(c). Although the approximation error of our scheme slightly increases as the rotation angle becomes larger, our scheme still surpasses the other schemes.

5.2 Comparison

To verify the anti-aliasing capability and versatility of Analytic-Splatting, we conduct experiments against state-of-the-art methods under the multi-scale training & multi-scale testing (MTMT) setting on Blender Synthetic [23, 2] and Mip-NeRF 360 [3] datasets. We further evaluate the performance of 3DGS and its variants under the super-resolution setting.

Dataset & Metric. We conduct experiments using the benchmark multi-scale Blender Synthetic [23, 2] and multi-scale Mip-NeRF 360 [3] datasets. They contain 8 objects and 9 scenes, respectively; each object and scene is compiled by downscaling the original images with factors of 2, 4, and 8 and combining them with the originals. For the Blender Synthetic dataset, each object contains 100 images for training and 200 images for testing. For the Mip-NeRF 360 dataset, we select 1 image from every 8 images for testing and use the remaining images for training. To verify the efficacy of our method, we evaluate the synthesized novel views at multiple scales on both datasets in terms of Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) [37].

Figure 6: Qualitative comparison of full-resolution and low-resolution (1/8) renderings on Multi-Scale Blender [2]. All methods are trained on images with downsampling rates covering [1, 2, 4, 8]. Our method can better overcome the artifacts in 3DGS with better fidelity of details.

Implementation. We implement Analytic-Splatting upon 3DGS [16] and customize our shading module with CUDA extensions. Following 3DGS, we train Analytic-Splatting using the same parameters, training schedule, and loss functions, ensuring the efficacy of our scheme. To achieve super sampling of Gaussian signals, we implement 3DGS-SS, which first renders an image at twice the target resolution and obtains the final image at the target resolution through average pooling. For the MTMT setting, we follow previous works [2, 13, 36] and sample full-resolution images more frequently as supervision during training. Please refer to the supplementary material for more details on the backpropagation implementation of rendering.
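
For reference, the super-sampling step of our 3DGS-SS baseline amounts to rendering at twice the target resolution and average pooling the result; a minimal sketch (with a random placeholder array standing in for the rendering) is:

```python
import numpy as np

def average_pool_2x(img):
    """Downsample a (2H, 2W, C) rendering to (H, W, C) by averaging 2x2 blocks."""
    H2, W2, C = img.shape
    return img.reshape(H2 // 2, 2, W2 // 2, 2, C).mean(axis=(1, 3))

hi_res = np.random.rand(8, 8, 3)   # stands in for an image rendered at twice the target resolution
lo_res = average_pool_2x(hi_res)   # final image at the target resolution
print(lo_res.shape)                # (4, 4, 3)
```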

PSNR \uparrow SSIM \uparrow LPIPS \downarrow
Full Res.  1/2 Res.  1/4 Res.  1/8 Res.  Avg.  |  Full Res.  1/2 Res.  1/4 Res.  1/8 Res.  Avg.  |  Full Res.  1/2 Res.  1/4 Res.  1/8 Res.  Avg.
Mip-NeRF 360 [3] 27.50 29.19 30.45 30.86 29.50 0.778 0.864 0.912 0.931 0.871 0.254 0.136 0.077 0.058 0.131
Mip-NeRF 360 + iNGP 26.46 27.92 27.67 25.58 26.91 0.773 0.855 0.866 0.804 0.824 0.253 0.142 0.117 0.159 0.167
Zip-NeRF [4] 28.25 30.01 31.56 32.52 30.58 0.822 0.891 0.933 0.955 0.900 0.198 0.099 0.056 0.038 0.098
3DGS [16] 26.55 28.00 28.51 27.45 27.63 0.779 0.854 0.891 0.888 0.853 0.274 0.162 0.102 0.087 0.156
3DGS-SS [16] 27.20 28.75 29.89 29.71 28.89 0.800 0.871 0.914 0.928 0.878 0.246 0.138 0.081 0.061 0.131
Mip-Splatting [36] 27.20 28.74 29.90 30.66 29.12 0.802 0.870 0.915 0.944 0.883 0.244 0.146 0.090 0.056 0.134
Ours 27.50 28.99 30.35 31.21 29.51 0.808 0.874 0.919 0.945 0.887 0.231 0.132 0.077 0.051 0.123
Table 2: Quantitative comparison of Analytic-Splatting against several cutting-edge methods on the multi-scale Mip-NeRF 360 dataset [3, 4]. These methods conduct multi-scale training and testing.

Evaluation on Blender Synthetic Dataset. We compare our Analytic-Splatting with several state-of-the-art methods i.e. NeRF [23], MipNeRF [2], Plenoxels [8], TensoRF [5], Instant-NGP [24], Tri-MipRF [13], 3DGS [16] and its variants (i.e. 3DGS-SS, Mip-Splatting [36]) on the Blender Synthetic dataset. The quantitative results in Tab. 1 show that Analytic-Splatting outperforms other methods in all aspects. The qualitative results in Fig. 6 demonstrate that Analytic-Splatting can better capture high-frequency details while being anti-aliased. More results can be found in the supplementary material.

Figure 7: Qualitative comparisons of full-resolution and low-resolution renderings on Multi-Scale Mip-NeRF 360 [3, 4]. All methods are trained on images with downsampling rates covering [1, 2, 4, 8]. Our method can better overcome the artifacts with better fidelity of details. Please note that the artifacts of 3DGS/3DGS-SS become obvious at low resolutions, especially on elongated shapes (e.g., wheels and flower stems), while Mip-Splatting produces over-smoothed results (e.g., lobes).

Evaluation on Mip-NeRF 360 Dataset. We compare our Analytic-Splatting with several cutting-edge methods, i.e., Mip-NeRF 360 and its iNGP encoding version [3], Zip-NeRF [4], 3DGS [16], and its variants (i.e., 3DGS-SS and Mip-Splatting [36]), on the challenging Mip-NeRF 360 dataset. The results of Zip-NeRF are reported from the available official implementation (https://github.com/jonbarron/camp_zipnerf). Please note that Mip-NeRF 360 and Zip-NeRF struggle with real-time rendering; in particular, Zip-NeRF performs supersampling in the rendering phase. The quantitative results in Tab. 2 show that our method is second only to Zip-NeRF, and we outperform the other methods with real-time rendering capability (i.e., 3DGS and its variants). The qualitative comparisons in Fig. 7 demonstrate that our method has better anti-aliasing capability and detail fidelity even in complex scenes. More results can be found in the supplementary material.

Super-Resolution Evaluation on Mip-NeRF 360 Dataset. We further evaluate our method against 3DGS and its variants under the super-resolution setting (2\times Res.) on the Mip-NeRF 360 dataset. All methods are trained on the Multi-Scale Mip-NeRF 360 dataset. The quantitative results are shown in Tab. 3 and the qualitative results are shown in Fig. 8. Both demonstrate the superior performance of our method, further supporting the capability of Analytic-Splatting in capturing details. In contrast, Mip-Splatting struggles to capture details due to its pre-filtering.

PSNR \uparrow
bicycle flowers garden stump treehill room counter kitchen bonsai Avg.
3DGS 23.14 20.28 24.62 25.44 21.90 30.27 28.08 29.51 30.34 25.95
3DGS-SS 23.98 20.84 25.48 26.24 22.12 30.90 28.69 30.53 31.34 26.68
Mip-Splatting 23.82 20.60 24.97 25.78 21.82 30.95 28.68 30.45 31.07 26.46
Ours 24.22 20.97 25.72 26.29 22.04 31.04 28.90 31.10 31.83 26.90
SSIM \uparrow
bicycle flowers garden stump treehill room counter kitchen bonsai Avg.
3DGS 0.639 0.492 0.707 0.706 0.578 0.902 0.891 0.893 0.916 0.747
3DGS-SS 0.675 0.527 0.747 0.739 0.596 0.908 0.901 0.907 0.926 0.769
Mip-Splatting 0.671 0.526 0.737 0.728 0.589 0.906 0.898 0.900 0.924 0.764
Ours 0.683 0.535 0.761 0.739 0.596 0.910 0.904 0.911 0.930 0.774
LPIPS \downarrow
bicycle flowers garden stump treehill room counter kitchen bonsai Avg.
3DGS 0.382 0.471 0.325 0.378 0.462 0.324 0.314 0.241 0.321 0.358
3DGS-SS 0.345 0.438 0.281 0.340 0.435 0.314 0.297 0.220 0.307 0.333
Mip-Splatting 0.341 0.433 0.291 0.338 0.439 0.309 0.295 0.216 0.300 0.329
Ours 0.342 0.429 0.268 0.338 0.434 0.307 0.291 0.209 0.300 0.324
Table 3: Quantitative comparison of super-resolution (2\times) on Multi-Scale Mip-NeRF 360 [3, 4]. All methods are trained on images with downsampling rates covering [1, 2, 4, 8]. The best results are marked in bold.
Figure 8: Qualitative comparison of super-resolution (2\times) on Multi-Scale Mip-NeRF 360 [3, 4]. All methods are trained on images with downsampling rates covering [1, 2, 4, 8]. Please note the high-frequency aliasing of 3DGS-SS and the over-smoothing of Mip-Splatting. Our results are closest to the ground truth.

6 Conclusion

In this paper, we first revisit the window response of one-dimensional Gaussian signals and derive an accurate analytical approximation using a conditioned logistic function. We then introduce this approximation into two-dimensional pixel shading and present Analytic-Splatting, which approximates the pixel-area integral response to achieve anti-aliasing and better detail fidelity. Our extensive experiments demonstrate the efficacy of Analytic-Splatting in achieving state-of-the-art novel view synthesis results under multi-scale and super-resolution settings.

Limitations. Compared with 3DGS and Mip-Splatting, our shading implementation introduces more square-root and exponential operations, which inevitably increases the computational burden and reduces the frame rate. Despite this, our frame rate is only 10\% lower than that of Mip-Splatting, which is also an anti-aliasing approach.

References

  • [1] Akeley, K.: Reality engine graphics. In: Conference on Computer graphics and interactive techniques (1993)
  • [2] Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5855–5864 (2021)
  • [3] Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5470–5479 (2022)
  • [4] Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Zip-nerf: Anti-aliased grid-based neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)
  • [5] Chen, A., Xu, Z., Geiger, A., Yu, J., Su, H.: Tensorf: Tensorial radiance fields. In: European Conference on Computer Vision. pp. 333–350. Springer (2022)
  • [6] Chen, H., Li, C., Lee, G.H.: Neusg: Neural implicit surface reconstruction with 3d gaussian splatting guidance. arXiv preprint arXiv:2312.00846 (2023)
  • [7] Feng, Y., Feng, X., Shang, Y., Jiang, Y., Yu, C., Zong, Z., Shao, T., Wu, H., Zhou, K., Jiang, C., et al.: Gaussian splashing: Dynamic fluid synthesis with gaussian splatting. arXiv preprint arXiv:2401.15318 (2024)
  • [8] Fridovich-Keil, S., Yu, A., Tancik, M., Chen, Q., Recht, B., Kanazawa, A.: Plenoxels: Radiance fields without neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5501–5510 (2022)
  • [9] Gao, Q., Xu, Q., Su, H., Neumann, U., Xu, Z.: Strivec: Sparse tri-vector radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 17569–17579 (2023)
  • [10] Guédon, A., Lepetit, V.: Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. arXiv preprint arXiv:2311.12775 (2023)
  • [11] Hedman, P., Philip, J., Price, T., Frahm, J.M., Drettakis, G., Brostow, G.: Deep blending for free-viewpoint image-based rendering. ACM Transactions on Graphics (ToG) 37(6), 1–15 (2018)
  • [12] Hu, L., Zhang, H., Zhang, Y., Zhou, B., Liu, B., Zhang, S., Nie, L.: Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians. arXiv preprint arXiv:2312.02134 (2023)
  • [13] Hu, W., Wang, Y., Ma, L., Yang, B., Gao, L., Liu, X., Ma, Y.: Tri-miprf: Tri-mip representation for efficient anti-aliasing neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19774–19783 (2023)
  • [14] Huang, X., Zhang, Q., Feng, Y., Li, H., Wang, X., Wang, Q.: Hdr-nerf: High dynamic range neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 18398–18408 (2022)
  • [15] Jiang, Y., Tu, J., Liu, Y., Gao, X., Long, X., Wang, W., Ma, Y.: Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces. arXiv preprint arXiv:2311.17977 (2023)
  • [16] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4) (2023)
  • [17] Knapitsch, A., Park, J., Zhou, Q.Y., Koltun, V.: Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG) 36(4), 1–13 (2017)
  • [18] Kuznetsov, A.: Neumip: Multi-resolution neural materials. ACM Transactions on Graphics (ToG) 40(4) (2021)
  • [19] Li, R., Tancik, M., Kanazawa, A.: Nerfacc: A general nerf acceleration toolbox. arXiv preprint arXiv:2210.04847 (2022)
  • [20] Liang, Z., Zhang, Q., Feng, Y., Shan, Y., Jia, K.: Gs-ir: 3d gaussian splatting for inverse rendering. arXiv preprint arXiv:2311.16473 (2023)
  • [21] Lin, C.H., Ma, W.C., Torralba, A., Lucey, S.: Barf: Bundle-adjusting neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5741–5751 (2021)
  • [22] Ma, L., Li, X., Liao, J., Zhang, Q., Wang, X., Wang, J., Sander, P.V.: Deblur-nerf: Neural radiance fields from blurry images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12861–12870 (2022)
  • [23] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: European Conference on Computer Vision. pp. 405–421 (2020)
  • [24] Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG) 41(4), 1–15 (2022)
  • [25] Park, K., Sinha, U., Barron, J.T., Bouaziz, S., Goldman, D.B., Seitz, S.M., Martin-Brualla, R.: Nerfies: Deformable neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5865–5874 (2021)
  • [26] Petersen, K.B., Pedersen, M.S., et al.: The matrix cookbook. Technical University of Denmark 7(15),  510 (2008)
  • [27] Saito, S., Schwartz, G., Simon, T., Li, J., Nam, G.: Relightable gaussian codec avatars. arXiv preprint arXiv:2312.03704 (2023)
  • [28] Shi, Y., Wu, Y., Wu, C., Liu, X., Zhao, C., Feng, H., Liu, J., Zhang, L., Zhang, J., Zhou, B., et al.: Gir: 3d gaussian inverse rendering for relightable scene factorization. arXiv preprint arXiv:2312.05133 (2023)
  • [29] Sun, C., Sun, M., Chen, H.T.: Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5459–5469 (2022)
  • [30] Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., Wang, W.: Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv preprint arXiv:2106.10689 (2021)
  • [31] Wu, L., Zhao, S., Yan, L.Q., Ramamoorthi, R.: Accurate appearance preserving prefiltering for rendering displacement-mapped surfaces. ACM Transactions on Graphics (ToG) 38(4), 1–14 (2019)
  • [32] Xie, T., Zong, Z., Qiu, Y., Li, X., Feng, Y., Yang, Y., Jiang, C.: Physgaussian: Physics-integrated 3d gaussians for generative dynamics. arXiv preprint arXiv:2311.12198 (2023)
  • [33] Xu, Q., Xu, Z., Philip, J., Bi, S., Shu, Z., Sunkavalli, K., Neumann, U.: Point-nerf: Point-based neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5438–5448 (2022)
  • [34] Yariv, L., Gu, J., Kasten, Y., Lipman, Y.: Volume rendering of neural implicit surfaces. Advances in Neural Information Processing Systems 34, 4805–4815 (2021)
  • [35] Yu, A., Li, R., Tancik, M., Li, H., Ng, R., Kanazawa, A.: Plenoctrees for real-time rendering of neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5752–5761 (2021)
  • [36] Yu, Z., Chen, A., Huang, B., Sattler, T., Geiger, A.: Mip-splatting: Alias-free 3d gaussian splatting. arXiv preprint arXiv:2311.16493 (2023)
  • [37] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)
  • [38] Zheng, S., Zhou, B., Shao, R., Liu, B., Zhang, S., Nie, L., Liu, Y.: Gps-gaussian: Generalizable pixel-wise 3d gaussian splatting for real-time human novel view synthesis. arXiv preprint arXiv:2312.02155 (2023)
  • [39] Zhuang, Y., Zhang, Q., Feng, Y., Zhu, H., Yao, Y., Li, X., Cao, Y.P., Shan, Y., Cao, X.: Anti-aliased neural implicit surfaces with encoding level of detail. In: SIGGRAPH Asia 2023 Conference Papers. pp. 1–10 (2023)
  • [40] Zielonka, W., Bagautdinov, T., Saito, S., Zollhöfer, M., Thies, J., Romero, J.: Drivable 3d gaussian avatars. arXiv preprint arXiv:2311.08581 (2023)

Appendix 0.A Shading Module

Since we greatly modified the shading module, our implementation is quite different from vanilla 3DGS, especially the backward pass. In this section, we give the forward and backward propagation step by step so that readers can follow the details and reproduce our shading module easily.

Figure 9: Example diagram of the pixel integration domain (a, integral over the original pixel area) and the domain after rotation (b, integral over the rotated pixel area). The yellow lines in Fig. 9(a) are the coordinate axes of 2D screen space; the yellow lines in Fig. 9(b) are the eigenvectors scaled by the eigenvalues.

0.A.1 Forward

Given a pixel with center \bm{u}=[u_{x},u_{y}]^{\top}, we use a 2D Gaussian signal g^{\text{2D}} to respond to the pixel and shade it. Assume that the 2D Gaussian signal g^{\text{2D}} has a mean vector \hat{\bm{\mu}}\in\mathbb{R}^{2} and a real-symmetric covariance matrix \hat{\bm{\Sigma}}\in\mathbb{R}^{2\times 2}; we express them as:

𝝁^=[μ^xμ^y],𝚺^=[Σ^11Σ^12Σ^12Σ^22].\hat{\bm{\mu}}=\begin{bmatrix}\hat{\mu}_{x}\\ \hat{\mu}_{y}\end{bmatrix},\quad\hat{\bm{\Sigma}}=\begin{bmatrix}\hat{\Sigma}_{\text{11}}&\hat{\Sigma}_{\text{12}}\\ \hat{\Sigma}_{\text{12}}&\hat{\Sigma}_{\text{22}}\\ \end{bmatrix}. (18)

In Analytic-Splatting, we first perform eigendecomposition on 𝚺^\hat{\bm{\Sigma}} to achieve diagonalization. After the decomposition, we obtain the eigenvalues {λ1,λ2}\{\lambda_{1},\lambda_{2}\} and eigenvectors {𝒗1,𝒗2}\{\bm{v}_{1},\bm{v}_{2}\} of 𝚺^\hat{\bm{\Sigma}}:

λ1=Tr(𝚺^)+Tr(𝚺^)24det(𝚺^)2,\displaystyle\lambda_{1}=\frac{\text{Tr}(\hat{\bm{\Sigma}})+\sqrt{\text{Tr}(\hat{\bm{\Sigma}})^{2}-4\det(\hat{\bm{\Sigma}})}}{2}, λ2=Tr(𝚺^)Tr(𝚺^)24det(𝚺^)2,\displaystyle\lambda_{2}=\frac{\text{Tr}(\hat{\bm{\Sigma}})-\sqrt{\text{Tr}(\hat{\bm{\Sigma}})^{2}-4\det(\hat{\bm{\Sigma}})}}{2}, (19)
𝒗^1=[Σ^12λ1Σ^11],\displaystyle\hat{\bm{v}}_{1}=\begin{bmatrix}\hat{\Sigma}_{\text{12}}\\ \lambda_{1}-\hat{\Sigma}_{\text{11}}\end{bmatrix}, 𝒗^2=[λ2Σ^22Σ^12],\displaystyle\hat{\bm{v}}_{2}=\begin{bmatrix}\lambda_{2}-\hat{\Sigma}_{\text{22}}\\ \hat{\Sigma}_{\text{12}}\end{bmatrix},
\bm{v}_{1}=\begin{bmatrix}v_{1x}\\ v_{1y}\end{bmatrix}=\frac{\hat{\bm{v}}_{1}}{\|\hat{\bm{v}}_{1}\|}=\begin{bmatrix}\frac{\hat{\Sigma}_{\text{12}}}{\|\hat{\bm{v}}_{1}\|}\\ \frac{\lambda_{1}-\hat{\Sigma}_{\text{11}}}{\|\hat{\bm{v}}_{1}\|}\end{bmatrix},\quad\bm{v}_{2}=\begin{bmatrix}v_{2x}\\ v_{2y}\end{bmatrix}=\frac{\hat{\bm{v}}_{2}}{\|\hat{\bm{v}}_{2}\|}=\begin{bmatrix}\frac{\lambda_{2}-\hat{\Sigma}_{\text{22}}}{\|\hat{\bm{v}}_{2}\|}\\ \frac{\hat{\Sigma}_{\text{12}}}{\|\hat{\bm{v}}_{2}\|}\end{bmatrix},

where Tr(𝚺^)=Σ^11+Σ^22\text{Tr}(\hat{\bm{\Sigma}})=\hat{\Sigma}_{11}+\hat{\Sigma}_{22}, det(𝚺^)=Σ^11Σ^22Σ^122\det(\hat{\bm{\Sigma}})=\hat{\Sigma}_{11}\hat{\Sigma}_{22}-\hat{\Sigma}_{12}^{2}.
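
The closed-form decomposition of Eq. 19 can be sanity-checked against the eigenvector equation, as in the sketch below (the example covariance is arbitrary, and the closed-form eigenvectors assume \hat{\Sigma}_{12}\neq 0; the isotropic case is trivial):

```python
import numpy as np

def eig2x2(cov):
    """Closed-form eigenvalues/eigenvectors of a symmetric 2x2 covariance (Eq. 19)."""
    s11, s12, s22 = cov[0, 0], cov[0, 1], cov[1, 1]
    tr, det = s11 + s22, s11 * s22 - s12 ** 2
    root = np.sqrt(max(tr ** 2 - 4.0 * det, 0.0))
    lam1, lam2 = (tr + root) / 2.0, (tr - root) / 2.0     # lam1 >= lam2
    v1 = np.array([s12, lam1 - s11])                      # unnormalized eigenvectors
    v2 = np.array([lam2 - s22, s12])
    return (lam1, lam2), (v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2))

cov = np.array([[2.0, 0.6], [0.6, 1.0]])
(lam1, lam2), (v1, v2) = eig2x2(cov)
print(np.allclose(cov @ v1, lam1 * v1))   # True: v1 is an eigenvector for lam1
print(np.allclose(cov @ v2, lam2 * v2))   # True: v2 is an eigenvector for lam2
print(abs(v1 @ v2) < 1e-10)               # True: the eigenvectors are orthogonal
```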

Then we use the eigenvectors {𝒗1,𝒗2}\{\bm{v}_{1},\bm{v}_{2}\} to construct a new coordinate system with its origin at 𝝁^=[𝝁^x,𝝁^y]\hat{\bm{\mu}}=[\hat{\bm{\mu}}_{x},\hat{\bm{\mu}}_{y}]^{\top} (refer to the yellow lines in Fig. 3(b)) to unravel the correlation in the covariance 𝚺^\hat{\bm{\Sigma}}. In this way, the coordinate of the pixel center 𝒖\bm{u} is transformed into 𝒖~\tilde{\bm{u}}:

𝒖~=[u~xu~y]=[𝒗1𝒗2](𝒖𝝁^)=[v1xv1yv2xv2y][uxμ^xuyμ^y]=[v1x(uxμ^x)+v1y(uyμ^y)v2x(uxμ^x)+v2y(uyμ^y)],\begin{aligned} \tilde{\bm{u}}&=\begin{bmatrix}\tilde{u}_{x}\\ \tilde{u}_{y}\end{bmatrix}=\begin{bmatrix}-&\bm{v}_{1}&-\\ -&\bm{v}_{2}&-\end{bmatrix}\left(\bm{u}-\hat{\bm{\mu}}\right)=\begin{bmatrix}v_{1x}&v_{1y}\\ v_{2x}&v_{2y}\end{bmatrix}\begin{bmatrix}u_{x}-\hat{\mu}_{x}\\ u_{y}-\hat{\mu}_{y}\end{bmatrix}\\ &=\begin{bmatrix}v_{1x}(u_{x}-\hat{\mu}_{x})+v_{1y}(u_{y}-\hat{\mu}_{y})\\ v_{2x}(u_{x}-\hat{\mu}_{x})+v_{2y}(u_{y}-\hat{\mu}_{y})\end{bmatrix}\end{aligned}, (20)

then the intensity response of the pixel center can be written as:

g2D(𝒖)=exp(12λ1u~x2)exp(12λ2u~y2).g^{\text{2D}}(\bm{u})=\exp\left(-\frac{1}{2\lambda_{1}}\tilde{u}_{x}^{2}\right)\exp\left(-\frac{1}{2\lambda_{2}}\tilde{u}_{y}^{2}\right). (21)

In Sec. 4.1 of the main paper, we propose to use a conditioned logistic function to approximate the cumulative distribution function (CDF) of the standard Gaussian distribution g(x) as:

G(x)=xg(u)du=x12πexp(u22)duS(x)=11+exp(1.6x0.07x3),\scriptsize\begin{aligned} G(x)=\int^{x}_{-\infty}g(u)\text{d}u=\int^{x}_{-\infty}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^{2}}{2}\right)\text{d}u\approx S(x)=\frac{1}{1+\exp(-1.6\cdot x-0.07\cdot x^{3})}\end{aligned}, (22)

and for the Gaussian distribution with standard deviation σ1\sigma\neq 1, we use the reciprocal of σ\sigma to scale xx and express the logistic function SσS_{\sigma} as:

Gσ(x)\displaystyle G_{\sigma}(x) =x1σ2πexp(u22σ2)du\displaystyle=\int^{x}_{-\infty}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{u^{2}}{2\sigma^{2}}\right)\text{d}u (23)
Sσ(x)=S(xσ)=11+exp(1.6xσ0.07(xσ)3).\displaystyle\approx S_{\sigma}(x)=S\left(\frac{x}{\sigma}\right)=\frac{1}{1+\exp\left(-1.6\cdot\frac{x}{\sigma}-0.07\cdot\left(\frac{x}{\sigma}\right)^{3}\right)}.

Given the CDF approximation in Eq. 23, we approximate the response of a 1-width window around sample x as:

g(x)=x12x+121σ2πexp(u22σ2)du=Gσ(x+12)Gσ(x12)Sσ(x+12)Sσ(x12).\begin{aligned} \mathcal{I}_{g}(x)&=\int^{x+\frac{1}{2}}_{x-\frac{1}{2}}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{u^{2}}{2\sigma^{2}}\right)\text{d}u\\ &=G_{\sigma}\left(x+\frac{1}{2}\right)-G_{\sigma}\left(x-\frac{1}{2}\right)\approx S_{\sigma}\left(x+\frac{1}{2}\right)-S_{\sigma}\left(x-\frac{1}{2}\right)\end{aligned}. (24)

In Analytic-Splatting, we calculate the intensity response \mathcal{I}^{\text{2D}}_{g} by approximating the integral of the Gaussian signal in Eq. 21 over the domain \tilde{\Omega}_{\bm{u}} in Fig. 9, and the integral over the 2D domain \tilde{\Omega}_{\bm{u}} can be represented as the product of the integrals of two one-dimensional Gaussian signals:

g2D\displaystyle\mathcal{I}^{\text{2D}}_{g} Ω~𝒖g2D(𝒖)d𝒖\displaystyle\approx\int_{\tilde{\Omega}_{\bm{u}}}g^{\text{2D}}(\bm{u})\text{d}\bm{u} (25)
=u~x12u~x+12exp(12λ1x2)dxu~y12u~y+12exp(12λ2y2)dy\displaystyle=\int^{\tilde{u}_{x}+\frac{1}{2}}_{\tilde{u}_{x}-\frac{1}{2}}\exp\left(-\frac{1}{2\lambda_{1}}x^{2}\right)\text{d}x\int^{\tilde{u}_{y}+\frac{1}{2}}_{\tilde{u}_{y}-\frac{1}{2}}\exp\left(-\frac{1}{2\lambda_{2}}y^{2}\right)\text{d}y
=2πλ1u~x12u~x+1212πλ1exp(12λ1x2)dx2πλ2u~y12u~y+1212πλ2exp(12λ2y2)dy\displaystyle=\sqrt{2\pi\lambda_{1}}\int^{\tilde{u}_{x}+\frac{1}{2}}_{\tilde{u}_{x}-\frac{1}{2}}\frac{1}{\sqrt{2\pi\lambda_{1}}}\exp\left(-\frac{1}{2\lambda_{1}}x^{2}\right)\text{d}x\cdot\sqrt{2\pi\lambda_{2}}\int^{\tilde{u}_{y}+\frac{1}{2}}_{\tilde{u}_{y}-\frac{1}{2}}\frac{1}{\sqrt{2\pi\lambda_{2}}}\exp\left(-\frac{1}{2\lambda_{2}}y^{2}\right)\text{d}y
2πσ1[Sσ1(u~x+12)Sσ1(u~x12)]σ1σ2[Sσ2(u~y+12)Sσ2(u~y12)]σ2,\displaystyle\approx 2\pi\underbrace{\sigma_{1}\left[S_{\sigma_{1}}(\tilde{u}_{x}+\frac{1}{2})-S_{\sigma_{1}}(\tilde{u}_{x}-\frac{1}{2})\right]}_{\mathcal{I}_{\sigma_{1}}}\underbrace{\sigma_{2}\left[S_{\sigma_{2}}(\tilde{u}_{y}+\frac{1}{2})-S_{\sigma_{2}}(\tilde{u}_{y}-\frac{1}{2})\right]}_{\mathcal{I}_{\sigma_{2}}},

where \sigma_{1}=\sqrt{\lambda_{1}} and \sigma_{2}=\sqrt{\lambda_{2}} denote the standard deviations of the independent Gaussian signals along the two eigenvectors. In summary, the volume shading in Analytic-Splatting is given by:

\begin{aligned}
\bm{C}(\bm{u}) &= \sum_{i\in N}T_{i}\,\mathcal{I}_{g-i}^{\text{2D}}(\bm{u}|\hat{\bm{\mu}_{i}},\hat{\bm{\Sigma}_{i}})\,\alpha_{i}\bm{c}_{i},\quad T_{i}=\prod^{i-1}_{j=1}\left(1-\mathcal{I}_{g-j}^{\text{2D}}(\bm{u}|\hat{\bm{\mu}_{j}},\hat{\bm{\Sigma}_{j}})\,\alpha_{j}\right), \\
\mathcal{I}_{g}^{\text{2D}}(\bm{u}) &= 2\pi\sigma_{1}\sigma_{2}\left[S_{\sigma_{1}}(\tilde{u}_{x}+\tfrac{1}{2})-S_{\sigma_{1}}(\tilde{u}_{x}-\tfrac{1}{2})\right]\left[S_{\sigma_{2}}(\tilde{u}_{y}+\tfrac{1}{2})-S_{\sigma_{2}}(\tilde{u}_{y}-\tfrac{1}{2})\right] \\
&= 2\pi\mathcal{I}_{\sigma_{1}}\mathcal{I}_{\sigma_{2}}.
\end{aligned} \quad (26)
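For reference, the per-pixel response of Eq. 26 for a single projected Gaussian can be sketched in NumPy as follows; this is a simplified, unoptimized illustration rather than the tile-based CUDA rasterizer, and the function and variable names are our own:

import numpy as np

def S(x):
    # conditioned logistic approximation of the standard Gaussian CDF (Eq. 22)
    return 1.0 / (1.0 + np.exp(-1.6 * x - 0.07 * x ** 3))

def pixel_response_2d(u, mu, eigvals, eigvecs):
    # Approximate integral of the projected 2D Gaussian over the 1x1 pixel
    # window centered at pixel coordinate u (Eqs. 25-26).
    #   u, mu   : (2,) pixel center and projected mean mu_hat
    #   eigvals : (lambda1, lambda2) eigenvalues of the 2D covariance Sigma_hat
    #   eigvecs : 2x2 matrix whose columns are the eigenvectors v1, v2
    sigma1, sigma2 = np.sqrt(eigvals)
    d = np.asarray(u, dtype=float) - np.asarray(mu, dtype=float)
    # pixel-center coordinates in the eigenvector-aligned frame
    ux, uy = eigvecs[:, 0] @ d, eigvecs[:, 1] @ d
    I1 = sigma1 * (S((ux + 0.5) / sigma1) - S((ux - 0.5) / sigma1))
    I2 = sigma2 * (S((uy + 0.5) / sigma2) - S((uy - 0.5) / sigma2))
    return 2.0 * np.pi * I1 * I2

# example: build the eigen frame from a 2D covariance and shade one pixel
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)  # orthonormal eigenvector columns
resp = pixel_response_2d([3.2, 1.7], [3.0, 2.0], eigvals, eigvecs)

In the full pipeline, this window response replaces the point-sampled Gaussian value of 3DGS in the alpha blending of Eq. 26.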

0.A.2 Backward

Derivative of the Conditioned Logistic Function. Before introducing the backpropagation in Analytic-Splatting, we first give the derivative of the conditioned logistic function S(x) in Eq. 22:

\begin{aligned}
\frac{\partial S(x)}{\partial x} &= \left(1.6+0.21\cdot x^{2}\right)\frac{\exp\left(-1.6\cdot x-0.07\cdot x^{3}\right)}{\left[1+\exp\left(-1.6\cdot x-0.07\cdot x^{3}\right)\right]^{2}} \\
&= \left(1.6+0.21\cdot x^{2}\right)S(x)\left[1-S(x)\right],
\end{aligned} \quad (27)

Further, for the derivatives of S_{\sigma}(x) in Eq. 23, we can obtain \frac{\partial S_{\sigma}(x)}{\partial x} and \frac{\partial S_{\sigma}(x)}{\partial\sigma} through the chain rule:

\begin{aligned}
\frac{\partial S_{\sigma}(x)}{\partial x} &= \frac{\partial S\left(\frac{x}{\sigma}\right)}{\partial\frac{x}{\sigma}}\frac{\partial\frac{x}{\sigma}}{\partial x}=\left(1.6+0.21\cdot\left(\frac{x}{\sigma}\right)^{2}\right)S_{\sigma}(x)\left[1-S_{\sigma}(x)\right]\cdot\frac{1}{\sigma}, \\
\frac{\partial S_{\sigma}(x)}{\partial\sigma} &= \frac{\partial S\left(\frac{x}{\sigma}\right)}{\partial\frac{x}{\sigma}}\frac{\partial\frac{x}{\sigma}}{\partial\sigma}=\left(1.6+0.21\cdot\left(\frac{x}{\sigma}\right)^{2}\right)S_{\sigma}(x)\left[1-S_{\sigma}(x)\right]\cdot\left(-\frac{x}{\sigma^{2}}\right).
\end{aligned} \quad (28)
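The following NumPy sketch (with helper names of our own) implements these derivatives and verifies them against central finite differences:

import numpy as np

def S(x):
    return 1.0 / (1.0 + np.exp(-1.6 * x - 0.07 * x ** 3))

def dS_dx(x):
    # derivative of the conditioned logistic function (Eq. 27)
    s = S(x)
    return (1.6 + 0.21 * x ** 2) * s * (1.0 - s)

def dS_sigma_dx(x, sigma):
    # d S_sigma(x) / d x via the chain rule (Eq. 28)
    return dS_dx(x / sigma) / sigma

def dS_sigma_dsigma(x, sigma):
    # d S_sigma(x) / d sigma via the chain rule (Eq. 28)
    return dS_dx(x / sigma) * (-x / sigma ** 2)

x, sigma, eps = 0.7, 1.4, 1e-5
num_dx = (S((x + eps) / sigma) - S((x - eps) / sigma)) / (2 * eps)
num_dsigma = (S(x / (sigma + eps)) - S(x / (sigma - eps))) / (2 * eps)
assert abs(num_dx - dS_sigma_dx(x, sigma)) < 1e-6
assert abs(num_dsigma - dS_sigma_dsigma(x, sigma)) < 1e-6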

Backpropagation in Shading. Our backpropagation aims to derive the gradients of \mathcal{I}_{g}^{\text{2D}} with respect to \hat{\bm{\mu}} and \hat{\bm{\Sigma}}, namely \frac{\partial\mathcal{I}_{g}^{\text{2D}}}{\partial\hat{\bm{\mu}}} and \frac{\partial\mathcal{I}_{g}^{\text{2D}}}{\partial\hat{\bm{\Sigma}}}:

\frac{\partial\mathcal{I}_{g}^{\text{2D}}}{\partial\hat{\bm{\mu}}}=\begin{bmatrix}\partial\mathcal{I}_{g}^{\text{2D}}/\partial\hat{\mu}_{x}\\ \partial\mathcal{I}_{g}^{\text{2D}}/\partial\hat{\mu}_{y}\end{bmatrix}\in\mathbb{R}^{2},\quad \frac{\partial\mathcal{I}_{g}^{\text{2D}}}{\partial\hat{\bm{\Sigma}}}=\begin{bmatrix}\partial\mathcal{I}_{g}^{\text{2D}}/\partial\hat{\Sigma}_{11} & \partial\mathcal{I}_{g}^{\text{2D}}/\partial\hat{\Sigma}_{12}\\ \partial\mathcal{I}_{g}^{\text{2D}}/\partial\hat{\Sigma}_{12} & \partial\mathcal{I}_{g}^{\text{2D}}/\partial\hat{\Sigma}_{22}\end{bmatrix}\in\mathbb{R}^{2\times 2}. \quad (29)

It is difficult to express the above gradients directly, but we can decompose them layer by layer with the chain rule. Our key insight is that the shading is formulated in a coordinate system constructed from the mean vector \hat{\bm{\mu}}, the eigenvalues \{\lambda_{1},\lambda_{2}\}, and the eigenvectors \{\bm{v}_{1},\bm{v}_{2}\}; we therefore use these quantities as intermediaries when applying the chain rule. For the gradient of \mathcal{I}_{g}^{\text{2D}} with respect to the mean vector \hat{\bm{\mu}}, according to Eq. 20 and Eq. 26, we have:

\begin{aligned}
\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\hat{\bm{\mu}}} &= 2\pi\left(\mathcal{I}_{\sigma_{2}}\frac{\partial\mathcal{I}_{\sigma_{1}}}{\partial\hat{\bm{\mu}}}+\mathcal{I}_{\sigma_{1}}\frac{\partial\mathcal{I}_{\sigma_{2}}}{\partial\hat{\bm{\mu}}}\right)=2\pi\left(\mathcal{I}_{\sigma_{2}}\frac{\partial\mathcal{I}_{\sigma_{1}}}{\partial\tilde{u}_{x}}\frac{\partial\tilde{u}_{x}}{\partial\hat{\bm{\mu}}}+\mathcal{I}_{\sigma_{1}}\frac{\partial\mathcal{I}_{\sigma_{2}}}{\partial\tilde{u}_{y}}\frac{\partial\tilde{u}_{y}}{\partial\hat{\bm{\mu}}}\right) \\
&= 2\pi\mathcal{I}_{\sigma_{2}}\sigma_{1}\left[\frac{\partial S_{\sigma_{1}}(\tilde{u}_{x}+\frac{1}{2})}{\partial\tilde{u}_{x}}-\frac{\partial S_{\sigma_{1}}(\tilde{u}_{x}-\frac{1}{2})}{\partial\tilde{u}_{x}}\right]\frac{\partial\tilde{u}_{x}}{\partial\hat{\bm{\mu}}} \\
&\quad + 2\pi\mathcal{I}_{\sigma_{1}}\sigma_{2}\left[\frac{\partial S_{\sigma_{2}}(\tilde{u}_{y}+\frac{1}{2})}{\partial\tilde{u}_{y}}-\frac{\partial S_{\sigma_{2}}(\tilde{u}_{y}-\frac{1}{2})}{\partial\tilde{u}_{y}}\right]\frac{\partial\tilde{u}_{y}}{\partial\hat{\bm{\mu}}},
\end{aligned} \quad (30)

where \frac{\partial\tilde{u}_{x}}{\partial\hat{\bm{\mu}}}=[-v_{1x},-v_{1y}]^{\top} and \frac{\partial\tilde{u}_{y}}{\partial\hat{\bm{\mu}}}=[-v_{2x},-v_{2y}]^{\top}, and the gradient \frac{\partial S_{\sigma}(x)}{\partial x} has been derived in Eq. 28.
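Putting Eq. 28 and Eq. 30 together, the gradient with respect to the projected mean can be sketched in NumPy as follows (an illustration with our own names, not the CUDA backward kernel; note that the \sigma_{1} factor inside \mathcal{I}_{\sigma_{1}} cancels against the 1/\sigma_{1} of \partial S_{\sigma_{1}}/\partial\tilde{u}_{x}):

import numpy as np

def S(x):
    return 1.0 / (1.0 + np.exp(-1.6 * x - 0.07 * x ** 3))

def dS_dx(x):
    s = S(x)
    return (1.6 + 0.21 * x ** 2) * s * (1.0 - s)

def grad_response_wrt_mean(u, mu, eigvals, eigvecs):
    # gradient of the 2D window response w.r.t. the projected mean mu_hat (Eq. 30)
    s1, s2 = np.sqrt(eigvals)
    d = np.asarray(u, dtype=float) - np.asarray(mu, dtype=float)
    ux, uy = eigvecs[:, 0] @ d, eigvecs[:, 1] @ d
    I1 = s1 * (S((ux + 0.5) / s1) - S((ux - 0.5) / s1))
    I2 = s2 * (S((uy + 0.5) / s2) - S((uy - 0.5) / s2))
    # d I_sigma1 / d u_tilde_x = S'((u_tilde_x + 1/2)/sigma1) - S'((u_tilde_x - 1/2)/sigma1)
    dI1_dux = dS_dx((ux + 0.5) / s1) - dS_dx((ux - 0.5) / s1)
    dI2_duy = dS_dx((uy + 0.5) / s2) - dS_dx((uy - 0.5) / s2)
    # d u_tilde_x / d mu_hat = -v1 and d u_tilde_y / d mu_hat = -v2
    return 2.0 * np.pi * (I2 * dI1_dux * (-eigvecs[:, 0]) + I1 * dI2_duy * (-eigvecs[:, 1]))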

According to Eq. 26, the gradients of \mathcal{I}_{g}^{\text{2D}} with respect to the eigenvalues \{\lambda_{1},\lambda_{2}\} are:

\begin{aligned}
\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\lambda_{1}} &= \frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\sigma_{1}}\frac{\partial\sigma_{1}}{\partial\lambda_{1}}=2\pi\mathcal{I}_{\sigma_{2}}\frac{\partial\mathcal{I}_{\sigma_{1}}}{\partial\sigma_{1}}\cdot\frac{1}{2\sqrt{\lambda_{1}}} \\
&= \frac{\pi\mathcal{I}_{\sigma_{2}}}{\sqrt{\lambda_{1}}}\left(\left[S_{\sigma_{1}}(\tilde{u}_{x}+\tfrac{1}{2})-S_{\sigma_{1}}(\tilde{u}_{x}-\tfrac{1}{2})\right]+\sigma_{1}\left[\frac{\partial S_{\sigma_{1}}(\tilde{u}_{x}+\frac{1}{2})}{\partial\sigma_{1}}-\frac{\partial S_{\sigma_{1}}(\tilde{u}_{x}-\frac{1}{2})}{\partial\sigma_{1}}\right]\right), \\
\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\lambda_{2}} &= \frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\sigma_{2}}\frac{\partial\sigma_{2}}{\partial\lambda_{2}}=2\pi\mathcal{I}_{\sigma_{1}}\frac{\partial\mathcal{I}_{\sigma_{2}}}{\partial\sigma_{2}}\cdot\frac{1}{2\sqrt{\lambda_{2}}} \\
&= \frac{\pi\mathcal{I}_{\sigma_{1}}}{\sqrt{\lambda_{2}}}\left(\left[S_{\sigma_{2}}(\tilde{u}_{y}+\tfrac{1}{2})-S_{\sigma_{2}}(\tilde{u}_{y}-\tfrac{1}{2})\right]+\sigma_{2}\left[\frac{\partial S_{\sigma_{2}}(\tilde{u}_{y}+\frac{1}{2})}{\partial\sigma_{2}}-\frac{\partial S_{\sigma_{2}}(\tilde{u}_{y}-\frac{1}{2})}{\partial\sigma_{2}}\right]\right),
\end{aligned} \quad (31)

where the gradient \frac{\partial S_{\sigma}(x)}{\partial\sigma} has been derived in Eq. 28. Given Eq. 20 and Eq. 26, the gradients of \mathcal{I}_{g}^{\text{2D}} with respect to the eigenvectors \{\bm{v}_{1},\bm{v}_{2}\} are:

\begin{aligned}
\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\bm{v}_{1}} &= \frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\tilde{u}_{x}}\frac{\partial\tilde{u}_{x}}{\partial\bm{v}_{1}}=2\pi\mathcal{I}_{\sigma_{2}}\frac{\partial\mathcal{I}_{\sigma_{1}}}{\partial\tilde{u}_{x}}\begin{bmatrix}u_{x}-\hat{\mu}_{x}\\ u_{y}-\hat{\mu}_{y}\end{bmatrix} \\
&= 2\pi\mathcal{I}_{\sigma_{2}}\sigma_{1}\left(\frac{\partial S_{\sigma_{1}}(\tilde{u}_{x}+\frac{1}{2})}{\partial\tilde{u}_{x}}-\frac{\partial S_{\sigma_{1}}(\tilde{u}_{x}-\frac{1}{2})}{\partial\tilde{u}_{x}}\right)\begin{bmatrix}u_{x}-\hat{\mu}_{x}\\ u_{y}-\hat{\mu}_{y}\end{bmatrix}\in\mathbb{R}^{2}, \\
\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\bm{v}_{2}} &= \frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\tilde{u}_{y}}\frac{\partial\tilde{u}_{y}}{\partial\bm{v}_{2}}=2\pi\mathcal{I}_{\sigma_{1}}\frac{\partial\mathcal{I}_{\sigma_{2}}}{\partial\tilde{u}_{y}}\begin{bmatrix}u_{x}-\hat{\mu}_{x}\\ u_{y}-\hat{\mu}_{y}\end{bmatrix} \\
&= 2\pi\mathcal{I}_{\sigma_{1}}\sigma_{2}\left(\frac{\partial S_{\sigma_{2}}(\tilde{u}_{y}+\frac{1}{2})}{\partial\tilde{u}_{y}}-\frac{\partial S_{\sigma_{2}}(\tilde{u}_{y}-\frac{1}{2})}{\partial\tilde{u}_{y}}\right)\begin{bmatrix}u_{x}-\hat{\mu}_{x}\\ u_{y}-\hat{\mu}_{y}\end{bmatrix}\in\mathbb{R}^{2}.
\end{aligned} \quad (32)

In addition, please refer to [26] for the gradients of the eigenvalues with respect to the covariance, \{\frac{\partial\lambda_{1}}{\partial\hat{\Sigma}_{ij}},\frac{\partial\lambda_{2}}{\partial\hat{\Sigma}_{ij}}\}, and the gradients of the eigenvectors with respect to the covariance, \{\frac{\partial\bm{v}_{1}}{\partial\hat{\Sigma}_{ij}},\frac{\partial\bm{v}_{2}}{\partial\hat{\Sigma}_{ij}}\}. Combining these with Eq. 31 and Eq. 32, we obtain the final gradients:

\begin{aligned}
\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\hat{\Sigma}_{11}} &= \frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\lambda_{1}}\frac{\partial\lambda_{1}}{\partial\hat{\Sigma}_{11}}+\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\lambda_{2}}\frac{\partial\lambda_{2}}{\partial\hat{\Sigma}_{11}}+\left(\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\bm{v}_{1}}\right)^{\top}\frac{\partial\bm{v}_{1}}{\partial\hat{\Sigma}_{11}}+\left(\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\bm{v}_{2}}\right)^{\top}\frac{\partial\bm{v}_{2}}{\partial\hat{\Sigma}_{11}}, \\
\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\hat{\Sigma}_{12}} &= \frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\lambda_{1}}\frac{\partial\lambda_{1}}{\partial\hat{\Sigma}_{12}}+\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\lambda_{2}}\frac{\partial\lambda_{2}}{\partial\hat{\Sigma}_{12}}+\left(\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\bm{v}_{1}}\right)^{\top}\frac{\partial\bm{v}_{1}}{\partial\hat{\Sigma}_{12}}+\left(\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\bm{v}_{2}}\right)^{\top}\frac{\partial\bm{v}_{2}}{\partial\hat{\Sigma}_{12}}, \\
\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\hat{\Sigma}_{22}} &= \frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\lambda_{1}}\frac{\partial\lambda_{1}}{\partial\hat{\Sigma}_{22}}+\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\lambda_{2}}\frac{\partial\lambda_{2}}{\partial\hat{\Sigma}_{22}}+\left(\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\bm{v}_{1}}\right)^{\top}\frac{\partial\bm{v}_{1}}{\partial\hat{\Sigma}_{22}}+\left(\frac{\partial\mathcal{I}^{\text{2D}}_{g}}{\partial\bm{v}_{2}}\right)^{\top}\frac{\partial\bm{v}_{2}}{\partial\hat{\Sigma}_{22}}.
\end{aligned} \quad (33)
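Since the eigen-decomposition derivatives are taken from [26], a convenient sanity check of Eq. 33 is to differentiate the response of Eq. 26, written directly as a function of \hat{\bm{\Sigma}}, by central finite differences. The sketch below (our own helper names; the off-diagonal entry is perturbed symmetrically because \hat{\bm{\Sigma}} is symmetric) can be compared against the analytic gradients:

import numpy as np

def S(x):
    return 1.0 / (1.0 + np.exp(-1.6 * x - 0.07 * x ** 3))

def response_from_cov(u, mu, Sigma):
    # 2D window response of Eq. 26 as a function of the covariance Sigma_hat
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    s1, s2 = np.sqrt(eigvals)
    d = np.asarray(u, dtype=float) - np.asarray(mu, dtype=float)
    ux, uy = eigvecs[:, 0] @ d, eigvecs[:, 1] @ d
    I1 = s1 * (S((ux + 0.5) / s1) - S((ux - 0.5) / s1))
    I2 = s2 * (S((uy + 0.5) / s2) - S((uy - 0.5) / s2))
    return 2.0 * np.pi * I1 * I2

def numeric_grad_wrt_cov(u, mu, Sigma, eps=1e-5):
    # central-difference gradient w.r.t. the symmetric entries of Sigma_hat
    grad = np.zeros((2, 2))
    for i, j in [(0, 0), (0, 1), (1, 1)]:
        dSig = np.zeros((2, 2))
        dSig[i, j] = dSig[j, i] = eps
        grad[i, j] = grad[j, i] = (response_from_cov(u, mu, Sigma + dSig)
                                   - response_from_cov(u, mu, Sigma - dSig)) / (2 * eps)
    return grad

Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
print(numeric_grad_wrt_cov([3.2, 1.7], [3.0, 2.0], Sigma))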

Appendix 0.B Additional Results

In this section, we report more detailed quantitative and qualitative results. In addition, we provide the results of our method combined with the 3D smoothing filter [36] for a more complete study. Our quantitative results show that the 3D smoothing filter slightly improves the single-scale training and testing results, but has no significant impact on the multi-scale training and testing results.

Mip-NeRF 360 Tanks&Temples Deep Blending
PSNR \uparrow SSIM \uparrow LPIPS \downarrow PSNR \uparrow SSIM \uparrow LPIPS \downarrow PSNR \uparrow SSIM \uparrow LPIPS \downarrow
Plenoxels [8] 23.08 0.625 0.463 21.07 0.721 0.379 23.06 0.795 0.510
INGP-Base [24] 25.30 0.671 0.371 21.72 0.734 0.330 23.62 0.797 0.423
INGP-Big [24] 25.59 0.699 0.331 21.92 0.752 0.305 24.96 0.817 0.390
Mip-NeRF 360 [3] 27.69 0.792 0.237 22.22 0.800 0.257 29.40 0.901 0.245
3DGS [16] 27.21 0.815 0.214 23.14 0.844 0.183 29.41 0.903 0.243
Mip-Splatting [36] 27.57 0.817 0.218 23.78 0.851 0.178 29.69 0.904 0.248
Ours 27.58 0.816 0.217 23.84 0.851 0.177 29.75 0.905 0.248
Ours + 3D filter 27.58 0.818 0.217 23.91 0.853 0.177 29.71 0.906 0.247
Table 4: Quantitative comparison of our method against previous methods over three datasets [3, 17, 11]. All methods are trained on full-resolution images and tested on the same-resolution images.
PSNR \uparrow SSIM \uparrow LPIPS \downarrow
Truck Train Johnson Playroom Truck Train Johnson Playroom Truck Train Johnson Playroom
Plenoxels [8] 23.22 18.93 23.14 22.98 0.774 0.663 0.787 0.802 0.335 0.422 0.521 0.499
INGP-Base [24] 23.26 20.17 27.75 19.48 0.779 0.666 0.839 0.754 0.274 0.386 0.381 0.465
INGP-Big [24] 23.83 20.46 28.26 21.67 0.800 0.689 0.854 0.779 0.249 0.360 0.352 0.428
Mip-NeRF 360 [3] 24.91 19.52 29.14 29.66 0.857 0.660 0.901 0.900 0.159 0.354 0.237 0.252
3DGS [16] 25.19 21.09 28.77 30.04 0.879 0.802 0.899 0.906 0.148 0.218 0.244 0.241
Mip-Splatting [36] 25.48 22.09 29.09 30.28 0.884 0.818 0.901 0.908 0.149 0.208 0.248 0.248
Ours 25.48 22.20 29.18 30.32 0.883 0.819 0.901 0.909 0.148 0.207 0.247 0.248
Ours + 3D filter 25.60 22.22 29.17 30.25 0.887 0.820 0.903 0.909 0.147 0.206 0.247 0.248
Table 5: Quantitative comparison of our method against previous methods on Tanks&Temples [17] and Deep Blending [11]. All methods are trained on full-resolution images and tested on the same-resolution images.
PSNR \uparrow
bicycle flowers garden stump treehill room counter kitchen bonsai Avg.
Plenoxels [8] 21.91 20.10 23.49 20.66 22.25 27.59 23.62 23.42 24.67 23.08
INGP-Base [24] 22.19 20.35 24.60 23.63 22.36 29.27 26.44 28.55 30.34 25.30
INGP-Big [24] 22.17 20.65 25.07 23.47 22.37 29.69 26.69 29.48 30.69 25.59
Mip-NeRF 360 [3] 24.37 21.73 26.98 26.40 22.87 31.63 29.55 32.23 33.46 27.69
3DGS [16] 25.25 21.52 27.41 26.55 22.49 30.63 28.70 30.32 31.98 27.21
Mip-Splatting [36] 25.31 21.62 27.45 26.62 22.62 31.62 29.11 31.53 32.30 27.57
Ours 25.18 21.61 27.39 26.65 22.54 31.75 29.11 31.56 32.43 27.58
Ours + 3D filter 25.32 21.64 27.51 26.68 22.59 31.66 29.04 31.50 32.30 27.58
  3DGS* [16] 25.63 21.77 27.70 26.87 22.75 31.69 29.08 31.56 32.29 27.76
Mip-Splatting* [36] 25.72 21.93 27.76 26.94 22.98 31.74 29.16 31.55 32.31 27.79
Ours* 25.63 21.92 27.73 26.92 22.79 31.89 29.24 31.66 32.60 27.82
Ours + 3D filter* 25.70 21.92 27.78 26.98 22.95 31.86 29.17 31.74 32.55 27.85
SSIM \uparrow
bicycle flowers garden stump treehill room counter kitchen bonsai Avg.
Plenoxels [8] 0.496 0.431 0.606 0.523 0.509 0.842 0.759 0.648 0.814 0.625
INGP-Base [24] 0.491 0.450 0.649 0.574 0.518 0.855 0.798 0.818 0.890 0.671
INGP-Big [24] 0.512 0.486 0.701 0.594 0.542 0.871 0.817 0.858 0.906 0.699
Mip-NeRF 360 [3] 0.685 0.583 0.813 0.744 0.632 0.913 0.894 0.920 0.941 0.792
3DGS [16] 0.771 0.605 0.868 0.775 0.638 0.914 0.905 0.922 0.938 0.815
Mip-Splatting [36] 0.767 0.608 0.868 0.776 0.636 0.920 0.909 0.928 0.943 0.817
Ours 0.763 0.606 0.866 0.772 0.633 0.921 0.910 0.928 0.943 0.816
Ours + 3D filter 0.769 0.608 0.869 0.773 0.636 0.921 0.909 0.928 0.943 0.817
  3DGS* [16] 0.777 0.620 0.871 0.784 0.655 0.927 0.916 0.933 0.948 0.825
Mip-Splatting* [36] 0.780 0.623 0.875 0.786 0.655 0.928 0.916 0.933 0.948 0.827
Ours* 0.777 0.623 0.873 0.783 0.651 0.929 0.917 0.933 0.943 0.826
Ours + 3D filter* 0.780 0.624 0.876 0.785 0.654 0.929 0.917 0.934 0.949 0.827
LPIPS \downarrow
bicycle flowers garden stump treehill room counter kitchen bonsai Avg.
Plenoxels [8] 0.506 0.521 0.386 0.503 0.540 0.419 0.441 0.447 0.398 0.463
INGP-Base [24] 0.487 0.481 0.312 0.450 0.489 0.301 0.342 0.254 0.227 0.371
INGP-Big [24] 0.446 0.441 0.257 0.421 0.450 0.261 0.306 0.195 0.205 0.331
Mip-NeRF 360 [3] 0.301 0.344 0.170 0.261 0.339 0.221 0.204 0.127 0.176 0.237
3DGS [16] 0.205 0.336 0.103 0.210 0.317 0.220 0.204 0.129 0.205 0.214
Mip-Splatting [36] 0.213 0.340 0.108 0.216 0.329 0.221 0.201 0.127 0.208 0.218
Ours 0.212 0.336 0.110 0.218 0.328 0.221 0.200 0.127 0.206 0.217
Ours + 3D filter 0.211 0.340 0.108 0.218 0.327 0.220 0.202 0.127 0.207 0.218
  3DGS* [16] 0.205 0.329 0.103 0.208 0.318 0.192 0.178 0.113 0.174 0.202
Mip-Splatting* [36] 0.206 0.331 0.103 0.209 0.320 0.192 0.179 0.113 0.173 0.203
Ours* 0.207 0.329 0.105 0.210 0.320 0.194 0.180 0.114 0.176 0.204
Ours + 3D filter* 0.206 0.333 0.104 0.210 0.321 0.194 0.181 0.114 0.177 0.204
Table 6: Quantitative results of Single-Scale Training and Single-Scale Testing on the Mip-NeRF 360 [3] dataset. All methods are trained on full-resolution images and tested on images of the same resolution.

0.B.1 Single-scale Training and Single-scale Testing on Scene Datasets.

We evaluate our Analytic-Splatting against other methods on complex scene datasets (i.e. Mip-NeRF 360 [3], Tanks&Temples [17], and Deep Blending [11]) under the single-scale training and testing setting, which is the most widely used setting. The overall results are shown in Tab. 4; our method generalizes well across different datasets and outperforms the other methods on almost all metrics.

Moreover, we report per-scene metrics for Tanks&Temples [17] and Deep Blending [11] in Tab. 5, and for Mip-NeRF 360 [3] in Tab. 6.

For Mip-NeRF 360, images from indoor and outdoor scenes are downsampled by 2\times and 4\times, respectively, to serve as full-resolution input for training and testing. The official dataset provides these downsampled images in separate folders. However, the results reported in Mip-Splatting were obtained not with the officially provided downsampled images but with bicubically downsampled images as input. The quantitative results in Tab. 6 show that these two downsampling schemes greatly affect the metrics. Therefore, for fairness, we mark the methods that use bicubically downsampled images for training with * (i.e. 3DGS*, Mip-Splatting*, and Ours*); the remaining methods without * use the officially provided downsampled images for training.

0.B.2 Multi-scale Training and Multi-scale Testing on the Multi-scale Blender Synthetic Dataset

We evaluate our Analytic-Splatting against other cutting-edge methods on the Blender Synthetic dataset under the multi-scale training and testing setting. Since per-resolution metrics are reported in the main paper, we report per-object metrics in Tab. 8 for a more comprehensive comparison. More qualitative results are shown in Fig. 10. Our method surpasses the other methods on almost all metrics and captures details with better anti-aliasing. We further provide our method's per-resolution and per-object metrics in Tab. 7 so that subsequent methods can refer to our results more easily.

PSNR \uparrow
chair drums ficus hotdog lego materials mic ship Avg.
Full Res. 35.76 26.16 35.76 37.54 35.06 29.59 35.18 30.76 33.22
1/2 Res. 38.48 27.30 36.46 39.46 36.46 31.05 37.61 32.54 34.92
1/4 Res. 39.58 28.61 36.21 40.64 36.53 32.77 39.33 34.12 35.97
1/8 Res. 39.24 29.87 36.01 40.24 34.93 33.54 39.02 35.10 35.99
All. 38.26 27.98 36.11 39.47 35.75 31.74 37.78 33.13 35.03
SSIM \uparrow
chair drums ficus hotdog lego materials mic ship Avg.
Full Res. 0.986 0.952 0.988 0.984 0.980 0.958 0.990 0.901 0.967
1/2 Res. 0.993 0.960 0.993 0.990 0.989 0.974 0.994 0.926 0.977
1/4 Res. 0.995 0.952 0.988 0.984 0.992 0.986 0.995 0.948 0.984
1/8 Res. 0.995 0.977 0.994 0.995 0.992 0.992 0.997 0.967 0.989
All. 0.992 0.964 0.992 0.991 0.988 0.977 0.994 0.936 0.979
LPIPS \downarrow
chair drums ficus hotdog lego materials mic ship Avg.
Full Res. 0.015 0.040 0.011 0.023 0.020 0.039 0.007 0.111 0.033
1/2 Res. 0.007 0.029 0.006 0.011 0.009 0.018 0.004 0.065 0.019
1/4 Res. 0.006 0.026 0.006 0.006 0.007 0.010 0.004 0.036 0.013
1/8 Res. 0.005 0.022 0.006 0.005 0.008 0.008 0.004 0.021 0.010
All. 0.008 0.029 0.007 0.011 0.011 0.018 0.005 0.058 0.018
Table 7: Per-resolution and per-object metrics of Analytic-Splatting under the multi-scale training and testing setting on the Blender Synthetic dataset [2]. Our method is trained on images with downsampling rates covering [1, 2, 4, 8].
PSNR \uparrow
chair drums ficus hotdog lego materials mic ship Avg.
NeRF w/o \mathcal{L}_{\text{area}} 29.92 23.27 27.15 32.00 27.75 26.30 28.40 26.46 27.66
NeRF [23] 33.39 25.87 30.37 35.64 31.65 30.18 32.60 30.09 31.23
MipNeRF [2] 37.14 27.02 33.19 39.31 35.74 32.56 38.04 33.08 34.51
Plenoxels [8] 32.79 25.25 30.28 34.65 31.26 28.33 31.53 28.59 30.34
TensoRF [5] 32.47 25.37 31.16 34.96 31.73 28.53 31.48 29.08 30.60
Instant-NGP [24] 32.95 26.43 30.41 35.87 31.83 29.31 32.58 30.23 31.20
Tri-MipRF [13] 37.67 27.35 33.57 38.78 35.72 31.42 37.63 32.74 34.36
3DGS [16] 32.73 25.30 29.00 35.03 29.44 27.13 31.17 28.33 29.77
3DGS-SS [16] 35.62 27.02 33.12 37.46 33.27 29.90 34.69 30.63 32.71
Mip-Splatting [36] 37.48 27.74 34.71 39.15 35.07 31.88 37.68 32.80 34.56
Ours 38.26 27.98 36.11 39.47 35.75 31.74 37.78 33.13 35.03
Ours + 3D filter 37.53 27.77 35.85 39.17 35.26 31.80 37.61 32.95 34.74
SSIM \uparrow
chair drums ficus hotdog lego materials mic ship Avg.
NeRF w/o \mathcal{L}_{\text{area}} 0.944 0.891 0.942 0.959 0.926 0.934 0.958 0.861 0.927
NeRF [23] 0.971 0.932 0.971 0.979 0.965 0.967 0.980 0.900 0.958
MipNeRF [2] 0.988 0.945 0.984 0.988 0.984 0.977 0.993 0.922 0.973
Plenoxels [8] 0.968 0.929 0.972 0.976 0.964 0.959 0.979 0.892 0.955
TensoRF [5] 0.967 0.930 0.972 0.976 0.964 0.959 0.979 0.892 0.955
Instant-NGP [24] 0.971 0.940 0.973 0.979 0.966 0.959 0.981 0.904 0.959
Tri-MipRF [13] 0.990 0.951 0.985 0.988 0.986 0.969 0.992 0.929 0.974
3DGS [16] 0.976 0.941 0.968 0.982 0.964 0.956 0.979 0.910 0.960
3DGS-SS [16] 0.988 0.958 0.985 0.988 0.982 0.973 0.990 0.928 0.974
Mip-Splatting [36] 0.991 0.963 0.990 0.990 0.987 0.978 0.994 0.936 0.979
Ours 0.992 0.964 0.992 0.991 0.988 0.977 0.994 0.936 0.979
Ours + 3D filter 0.991 0.963 0.990 0.990 0.987 0.977 0.994 0.936 0.979
LPIPS \downarrow
chair drums ficus hotdog lego materials mic ship Avg.
NeRF w/o \mathcal{L}_{\text{area}} 0.035 0.069 0.032 0.028 0.041 0.045 0.031 0.095 0.052
NeRF [23] 0.028 0.059 0.026 0.024 0.035 0.033 0.025 0.085 0.044
MipNeRF [2] 0.011 0.044 0.014 0.012 0.013 0.019 0.007 0.062 0.026
Plenoxels [8] 0.040 0.070 0.032 0.037 0.038 0.055 0.036 0.104 0.051
TensoRF [5] 0.042 0.070 0.032 0.037 0.038 0.055 0.036 0.104 0.051
Instant-NGP [24] 0.035 0.066 0.029 0.028 0.040 0.051 0.032 0.095 0.047
Tri-MipRF [13] 0.011 0.046 0.016 0.014 0.013 0.033 0.008 0.069 0.026
3DGS [16] 0.025 0.056 0.030 0.022 0.038 0.040 0.023 0.086 0.040
3DGS-SS [16] 0.013 0.036 0.014 0.014 0.017 0.023 0.008 0.068 0.024
Mip-Splatting [36] 0.010 0.031 0.009 0.011 0.012 0.018 0.005 0.059 0.019
Ours 0.008 0.029 0.007 0.011 0.011 0.018 0.005 0.058 0.018
Ours + 3D filter 0.009 0.031 0.009 0.011 0.012 0.019 0.005 0.059 0.019
Table 8: Quantitative comparison of Analytic-Splatting against several cutting-edge methods on the Multi-scale Blender Synthetic dataset [2]. We report the metrics for each object in this table, and all methods are trained on images only from the training set with downsampling rates covering [1, 2, 4, 8].
Refer to caption
Figure 10: Qualitative comparison of full-resolution and low-resolution (1/8) renderings on Multi-Scale Blender [2]. All methods are trained on images only from the training set with downsampling rates covering [1, 2, 4, 8]. Our method better suppresses the artifacts of 3DGS and preserves details with higher fidelity.

0.B.3 Multi-scale Training and Multi-scale Testing on the Mip-NeRF 360 Dataset

We evaluate our Analytic-Splatting against other methods on the Mip-NeRF 360 dataset under the multi-scale training and testing setting. As mentioned in Sec. 0.B.1, we use the officially provided downsampled images (2\times for indoor scenes and 4\times for outdoor scenes) as full-resolution images. Under the multi-scale training and testing setting, we convert each full-resolution image into a set of four training images by bicubically downsampling it by factors of [1\times, 2\times, 4\times, 8\times]. Some qualitative results are shown in Fig. 11. Our method achieves better anti-aliasing and detail fidelity.
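A minimal sketch of this multi-scale set construction is given below (assuming PIL; the down_{f}x folder layout and the function name are hypothetical and only for illustration):

from pathlib import Path
from PIL import Image

def build_multiscale_set(image_path, out_dir, factors=(1, 2, 4, 8)):
    # bicubically downsample one full-resolution image by each factor
    img = Image.open(image_path)
    w, h = img.size
    for f in factors:
        scaled = img if f == 1 else img.resize((w // f, h // f), Image.BICUBIC)
        target = Path(out_dir) / f"down_{f}x"
        target.mkdir(parents=True, exist_ok=True)
        scaled.save(target / Path(image_path).name)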

Refer to caption
Figure 11: Qualitative comparison of full-resolution and low-resolution (1/8) renderings on Multi-Scale Mip-NeRF 360 [3]. All methods are trained on images with downsampling rates covering [1, 2, 4, 8]. Our method better suppresses the artifacts of 3DGS and preserves details with higher fidelity.
Outdoors
PSNR\uparrow bicycle flowers garden stump treehill
1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times
Mip-NeRF 360 [3] 24.51 26.93 28.53 29.24 21.64 23.90 26.01 27.35 26.71 29.59 31.35 32.52 26.27 27.68 28.82 29.27 22.93 24.63 26.06 27.12
Mip-NeRF 360 + iNGP 24.61 26.98 26.69 24.50 21.93 24.14 24.90 23.19 26.48 29.06 27.54 24.85 26.41 27.63 27.62 25.64 23.19 24.86 25.55 24.81
Zip-NeRF [4] 25.57 28.25 30.20 31.37 22.37 24.91 27.51 29.50 27.71 30.53 32.60 33.83 27.17 28.62 30.30 31.73 23.63 25.47 27.27 28.84
3DGS [16] 24.19 26.23 26.46 25.83 20.96 23.07 24.54 23.93 26.16 28.54 29.13 28.66 25.84 27.24 27.96 27.64 22.50 24.13 25.31 25.35
3DGS-SS 20.96 27.16 28.25 27.95 21.51 23.07 25.89 26.11 26.81 28.54 30.71 30.91 26.56 27.24 29.36 29.69 22.67 24.13 25.81 26.62
Mip-Splatting [36] 24.90 27.24 28.81 29.10 21.42 23.75 26.13 28.19 26.69 29.37 30.92 31.66 26.49 27.94 29.58 31.17 22.52 24.36 26.22 27.84
Ours 25.20 27.39 28.97 29.81 21.76 24.05 26.52 28.57 27.13 29.53 31.20 32.19 26.74 28.15 29.90 31.50 22.70 24.41 26.13 27.67
Ours + 3D filter 25.32 27.50 29.04 29.88 21.79 24.04 26.49 28.52 27.19 29.52 31.14 32.08 26.80 28.15 29.87 31.48 22.72 24.33 25.97 27.49
SSIM\uparrow bicycle flowers garden stump treehill
1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times
Mip-NeRF 360 [3] 0.666 0.815 0.890 0.912 0.567 0.727 0.834 0.881 0.791 0.903 0.939 0.959 0.726 0.819 0.874 0.882 0.615 0.748 0.839 0.893
Mip-NeRF 360 + iNGP 0.673 0.825 0.857 0.773 0.592 0.742 0.805 0.763 0.786 0.904 0.864 0.767 0.748 0.830 0.849 0.770 0.616 0.736 0.785 0.762
Zip-NeRF [4] 0.758 0.872 0.926 0.948 0.635 0.774 0.864 0.914 0.850 0.929 0.960 0.974 0.791 0.865 0.914 0.939 0.671 0.780 0.865 0.922
3DGS [16] 0.703 0.831 0.864 0.855 0.545 0.690 0.784 0.795 0.810 0.904 0.920 0.919 0.729 0.810 0.850 0.830 0.602 0.725 0.811 0.839
3DGS-SS 0.736 0.849 0.902 0.911 0.585 0.724 0.824 0.864 0.834 0.920 0.947 0.956 0.763 0.837 0.887 0.891 0.620 0.738 0.830 0.878
Mip-Splatting [36] 0.739 0.849 0.912 0.940 0.591 0.724 0.825 0.891 0.832 0.917 0.949 0.966 0.768 0.837 0.892 0.932 0.619 0.737 0.839 0.905
Ours 0.750 0.855 0.913 0.940 0.601 0.732 0.834 0.898 0.847 0.921 0.951 0.966 0.772 0.842 0.899 0.933 0.627 0.739 0.835 0.899
Ours + 3D filter 0.754 0.858 0.915 0.941 0.602 0.733 0.835 0.897 0.848 0.922 0.951 0.965 0.773 0.842 0.898 0.933 0.628 0.739 0.834 0.898
LPIPS\downarrow bicycle flowers garden stump treehill
1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times
Mip-NeRF 360 [3] 0.322 0.177 0.089 0.066 0.367 0.215 0.114 0.071 0.194 0.079 0.045 0.029 0.279 0.171 0.114 0.107 0.362 0.236 0.144 0.096
Mip-NeRF 360 + iNGP 0.313 0.166 0.128 0.169 0.344 0.192 0.124 0.137 0.192 0.079 0.107 0.176 0.254 0.156 0.137 0.180 0.344 0.223 0.171 0.182
Zip-NeRF [4] 0.222 0.112 0.061 0.041 0.287 0.156 0.083 0.050 0.129 0.055 0.030 0.020 0.206 0.122 0.077 0.057 0.263 0.163 0.103 0.068
3DGS [16] 0.295 0.180 0.113 0.100 0.404 0.288 0.184 0.147 0.197 0.085 0.059 0.054 0.284 0.186 0.130 0.134 0.398 0.279 0.185 0.140
3DGS-SS 0.257 0.144 0.078 0.067 0.365 0.251 0.154 0.107 0.165 0.065 0.038 0.032 0.243 0.150 0.097 0.088 0.364 0.250 0.160 0.110
Mip-Splatting [36] 0.258 0.153 0.083 0.050 0.363 0.262 0.165 0.093 0.169 0.073 0.045 0.027 0.233 0.156 0.106 0.072 0.373 0.265 0.172 0.105
Ours 0.239 0.134 0.070 0.047 0.344 0.239 0.148 0.084 0.141 0.061 0.037 0.027 0.224 0.138 0.087 0.062 0.349 0.244 0.160 0.098
Ours + 3D filter 0.237 0.134 0.070 0.048 0.347 0.241 0.147 0.086 0.141 0.061 0.037 0.027 0.224 0.140 0.089 0.063 0.350 0.245 0.161 0.100
Indoors
PSNR\uparrow room counter kitchen bonsai
1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times
Mip-NeRF 360 [3] 31.44 32.53 33.17 32.96 29.30 30.12 30.81 30.52 31.90 33.39 34.69 34.92 32.85 33.97 34.63 33.80
Mip-NeRF 360 + iNGP 30.93 31.83 31.66 29.52 24.30 24.66 24.81 24.06 30.13 31.25 29.85 26.14 30.20 30.90 30.39 27.49
Zip-NeRF [4] 32.20 33.33 34.12 34.26 29.17 29.93 30.70 31.11 32.33 33.76 35.20 35.71 34.08 35.25 36.18 36.32
3DGS [16] 30.53 31.42 31.46 29.81 28.25 28.91 29.21 27.66 29.90 31.04 31.50 29.57 30.63 31.42 31.02 31.58
3DGS-SS 31.12 32.13 32.75 32.12 28.81 29.47 30.16 29.85 30.84 32.05 33.13 32.59 31.57 32.44 32.96 31.58
Mip-Splatting [36] 31.32 32.26 32.79 32.88 28.91 29.50 30.03 30.32 31.11 32.05 32.33 32.79 31.48 32.21 32.26 31.97
Ours 31.26 32.23 32.92 33.07 29.03 29.65 30.39 30.90 31.44 32.57 33.56 33.78 32.12 32.92 33.60 33.39
Ours + 3D filter 31.32 32.28 32.98 33.13 29.03 29.64 30.38 30.88 31.07 32.15 33.09 33.25 31.97 32.73 33.42 33.26
SSIM\uparrow room counter kitchen bonsai
1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times
Mip-NeRF 360 [3] 0.906 0.944 0.963 0.967 0.887 0.916 0.936 0.942 0.916 0.949 0.968 0.975 0.935 0.959 0.969 0.968
Mip-NeRF 360 + iNGP 0.904 0.941 0.950 0.932 0.816 0.837 0.843 0.819 0.903 0.938 0.904 0.773 0.920 0.941 0.937 0.874
Zip-NeRF [4] 0.921 0.955 0.971 0.977 0.899 0.926 0.944 0.955 0.926 0.956 0.975 0.982 0.947 0.968 0.978 0.980
3DGS [16] 0.903 0.936 0.952 0.947 0.886 0.912 0.928 0.920 0.907 0.941 0.956 0.950 0.924 0.948 0.954 0.938
3DGS-SS 0.911 0.943 0.962 0.965 0.898 0.922 0.941 0.947 0.919 0.949 0.968 0.938 0.933 0.955 0.967 0.966
Mip-Splatting [36] 0.913 0.944 0.962 0.969 0.899 0.921 0.936 0.949 0.920 0.948 0.960 0.974 0.935 0.954 0.960 0.966
Ours 0.914 0.946 0.964 0.971 0.902 0.924 0.942 0.955 0.924 0.951 0.967 0.974 0.939 0.958 0.969 0.972
Ours + 3D filter 0.915 0.946 0.964 0.971 0.903 0.924 0.943 0.955 0.924 0.951 0.967 0.973 0.939 0.958 0.969 0.972
LPIPS\downarrow room counter kitchen bonsai
1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times 1\times 2\times 4\times 8\times
Mip-NeRF 360 [3] 0.227 0.101 0.052 0.042 0.216 0.114 0.068 0.059 0.134 0.063 0.033 0.023 0.185 0.065 0.033 0.033
Mip-NeRF 360 + iNGP 0.220 0.105 0.072 0.095 0.275 0.195 0.163 0.182 0.145 0.074 0.084 0.188 0.190 0.082 0.063 0.126
Zip-NeRF [4] 0.199 0.084 0.041 0.028 0.189 0.095 0.055 0.039 0.117 0.055 0.028 0.018 0.173 0.052 0.023 0.017
3DGS [16] 0.254 0.127 0.066 0.053 0.235 0.128 0.078 0.067 0.159 0.081 0.046 0.041 0.234 0.104 0.056 0.051
3DGS-SS 0.241 0.114 0.054 0.039 0.217 0.112 0.066 0.050 0.142 0.068 0.034 0.024 0.220 0.092 0.044 0.032
Mip-Splatting [36] 0.235 0.115 0.062 0.040 0.213 0.116 0.077 0.043 0.138 0.075 0.047 0.027 0.214 0.095 0.058 0.037
Ours 0.234 0.111 0.052 0.035 0.208 0.109 0.065 0.045 0.134 0.065 0.037 0.028 0.210 0.088 0.042 0.029
Ours + 3D filter 0.233 0.110 0.052 0.034 0.209 0.110 0.065 0.045 0.134 0.065 0.037 0.029 0.209 0.088 0.042 0.030
Table 9: Quantitative comparison of Analytic-Splatting against several cutting-edge methods on the multi-scale Mip-NeRF 360 dataset [3, 4]. All methods conduct multi-scale training and testing.

We further provide per-resolution and per-scene metrics in Tab. 9. The results of Mip-NeRF 360 [3] and Zip-NeRF [4] are taken from the official Zip-NeRF paper [4]. Please note that Mip-NeRF 360 and Zip-NeRF struggle with real-time rendering, whereas our Analytic-Splatting, like 3DGS and its variants, is capable of real-time rendering.

0.B.4 Approximation Error Analysis

In Sec. 5.1 of the main paper, we study the approximation error produced by different schemes. In this analysis, we concentrate on standard deviations \sigma\in[0.3,6.6] and samples within the 99\% confidence interval (i.e. \|x\|<3\sigma). We plot the curve of the approximation error under different conditions.

In detail, we sample standard deviations uniformly in logarithmic coordinates over [0.3, 6.6] and plot the approximation error against the standard deviation in Fig. 5(a) of the main paper. For the approximation error caused by the rotation of the integral domain, we take the integral over the original domain (Fig. 9(a)) as the reference: we perform Monte Carlo sampling in the original domain and use the average response of 65536 samples as the integration reference.
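A simplified sketch of this protocol is given below; it is an illustrative re-implementation with parameter choices of our own (e.g. an isotropic standard deviation and a fixed rotation of \pi/6 for the original domain), not the exact script used to produce Fig. 5:

import numpy as np

rng = np.random.default_rng(0)

def S(x):
    return 1.0 / (1.0 + np.exp(-1.6 * x - 0.07 * x ** 3))

def approx_response(ux, uy, s1, s2):
    # analytic approximation over the axis-aligned window in the eigen frame (Eq. 26)
    I1 = s1 * (S((ux + 0.5) / s1) - S((ux - 0.5) / s1))
    I2 = s2 * (S((uy + 0.5) / s2) - S((uy - 0.5) / s2))
    return 2.0 * np.pi * I1 * I2

def reference_response(ux, uy, s1, s2, theta, n=65536):
    # Monte Carlo integral over the rotated pixel square (the original domain of Fig. 9(a))
    offsets = rng.uniform(-0.5, 0.5, size=(n, 2))   # samples inside the unit pixel window
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])                 # pixel frame -> eigen frame
    pts = np.array([ux, uy]) + offsets @ R.T
    vals = np.exp(-0.5 * (pts[:, 0] / s1) ** 2) * np.exp(-0.5 * (pts[:, 1] / s2) ** 2)
    return vals.mean()                              # the window area is 1

for sigma in np.exp(np.linspace(np.log(0.3), np.log(6.6), 5)):
    ux = rng.uniform(-3 * sigma, 3 * sigma)         # pixel center within the 3-sigma range
    err = abs(approx_response(ux, 0.0, sigma, sigma)
              - reference_response(ux, 0.0, sigma, sigma, theta=np.pi / 6))
    print(f"sigma={sigma:.2f}  |approx - reference| = {err:.4e}")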