
Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering

Dongting Hu
University of Melbourne
   Zhenkai Zhang
University of Melbourne
   Tingbo Hou
Google
   Tongliang Liu
University of Sydney
   Huan Fu
Alibaba Group
   Mingming Gong
University of Melbourne
Equal contribution
Abstract

The rendering scheme in the neural radiance field (NeRF) is effective in rendering a pixel by casting a ray into the scene. However, NeRF yields blurred rendering results when the training images are captured at non-uniform scales, and produces aliasing artifacts if the test images are taken from distant views. To address this issue, Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information. Nevertheless, this approach is only suitable for offline rendering since it relies on integrated positional encoding (IPE) to query a multilayer perceptron (MLP). To overcome this limitation, we propose mip voxel grids (Mip-VoG), an explicit multiscale representation with a deferred architecture for real-time anti-aliasing rendering. Our approach includes a density Mip-VoG for scene geometry and a feature Mip-VoG with a small MLP for view-dependent color. Mip-VoG represents scene scale using the level of detail (LOD) derived from ray differentials and uses quadrilinear interpolation to map a queried 3D location to its features and density from two neighboring down-sampled voxel grids. To our knowledge, our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously. We conducted experiments on a multiscale dataset, and the results show that our approach outperforms state-of-the-art real-time rendering baselines.

1 Introduction

The realm of computer vision and graphics is marked by the captivating yet formidable challenge of novel view synthesis. In recent times, neural volumetric representations, most notably the neural radiance field (NeRF) [35], have emerged as a promising breakthrough in reconstructing intricate 3D scenes from multi-view image collections. NeRF employs a coordinate-based multilayer perceptron (MLP) architecture to map a 5D input coordinate (including 3D spatial position and 2D viewing direction) to intrinsic scene attributes (namely, volume density and view-dependent emitted radiance) at that precise location. The pixel rendering process in NeRF involves casting a ray through the pixel into the scene, extracting the scene representation for points sampled along the ray, and ultimately fusing these components to produce the final color output.

While this rendering methodology excels when the training and testing images share a uniform resolution, challenges arise when the training images encompass varying resolutions. This discrepancy in resolutions leads NeRF to produce blurred rendering outputs, because pixels at different scales have different footprints. Besides, when test viewpoints deviate significantly in distance from the training views, the sample rate per pixel becomes inadequate, resulting in aliasing artifacts.

Method           | Multiscale | Real-time | Anti-aliasing Rendering
Mip-NeRF [3]     | ✓          | ✗         | ✓
SNeRG [21]       | ✗          | ✓         | ✗
MobileNeRF [10]  | ✗          | ✓         | ✗
Ours             | ✓          | ✓         | ✓
Table 1: Our method is the first to concurrently address multiscale training, real-time, and anti-aliasing rendering.

To surmount this challenge, Mip-NeRF [3] emerges as a noteworthy solution, presenting a continuously-valued scale representation for coordinate-based models. Mip-NeRF introduces a pioneering technique, known as integrated positional encoding (IPE), which endows the scene representation with knowledge of scale. Departing from the conventional ray-based approach, Mip-NeRF adopts a novel rendering strategy involving conical frustums. This representation not only enables effective multiscale training but also tangibly mitigates the persisting issue of aliasing artifacts. However, this approach relies on querying the network with the scale-variant IPE. As a consequence, pre-caching techniques that circumvent positional encoding [21] remain out of reach within Mip-NeRF's current formulation.

This paper introduces a novel multiscale representation, termed "Mip-VoG" (Mip Voxel Grids, drawing inspiration from "mipmap"), which addresses the challenge of training on images with varying scales and enables real-time anti-aliasing rendering at inference time (Tab. 1). Our approach starts from a "deferred" NeRF variant, wherein both scene geometry and color attributes are explicitly stored within the Mip-VoG framework. The task of decoding view-dependent effects is carried out by a compact multilayer perceptron (MLP) that processes each pixel only once. Instead of pre-training and baking a continuous NeRF into grids [21], we treat voxel values as parameters and directly optimize them, as in [16, 57]. Despite the "mip" nomenclature, Mip-VoG maintains only one density voxel grid and one feature voxel grid that represent the high-frequency spatial attributes; lower-frequency representations are inferred through a progressive down-sampling process, achieved via low-pass filters and interpolation algorithms [14]. Given a single 3D point sampled from a camera ray, Mip-VoG determines the level of detail (LOD) via ray differentials [22]. The LOD calculation is pivotal, as it establishes a pixel-to-voxel ratio that represents the point's footprint on the full-resolution voxel grids. Scene properties for this point are then sampled by interpolating between the two adjacent down-sampled level grids. Notably, camera rays cast from a low-resolution frame cover a wider spatial area, corresponding to a higher LOD and capturing more low-frequency information. During the inference phase, we pre-compute the voxel grids at each integer level and subsequently convert them to the sparse voxel grid data structure used by SNeRG [21] to increase rendering speed.

We conducted a comprehensive evaluation of our approach using well-established NeRF datasets, including Synthetic-NeRF [35] and Multiscale-NeRF [3], in line with the multi-scale framework outlined in Mip-NeRF [3]. Our findings underscore the validity of our multiscale representation in effectively addressing complex multiscale training scenarios, while successfully preserving both low and high-frequency components inherent in the multiscale dataset. Comparative analysis against state-of-the-art real-time techniques reveals that our proposed approach excels in mitigating multiscale challenges, yielding impressive results. Furthermore, our method proves instrumental in achieving remarkable accuracy in anti-aliasing rendering, bolstering its applicability and potential impact.

2 Related Works

Scene Representation for View Synthesis

Numerous scene representations have been proposed to tackle the intricate task of view synthesis. Approaches such as Light Field Representation [12, 26, 27, 51] and Lumigraph [18, 7] directly interpolate input images, albeit necessitating dense input data for novel view synthesis. In an effort to reduce the demand for exhaustive capture, subsequent studies represent light fields as neural networks [53, 2]. Layered Depth Images [13, 50, 52, 60] alleviate the requirement of input denseness, but their effectiveness hinges on the accuracy of depth maps for rendering photo-realistic images. Recent advancements have introduced methods to estimate Multiplane Images (MPIs) [15, 29, 34, 56, 65, 73] for scenes with forward-facing viewpoints, and voxel grids [54, 31] for inward-facing scenes. Mesh-based representations [13, 50, 52, 60, 37, 38, 19] constitute another notable category within the view synthesis realm, offering real-time rendering potential through optimized rasterization pipelines. However, these methods necessitate template meshes as priors to overcome gradient-based optimization challenges.

Recently, NeRF [35] has emerged as a popular method for novel view synthesis. By using an MLP as an implicit and continuous volumetric representation, NeRF maps a 3D coordinate to the volume density and view-dependent emission at that position. The success of NeRF has attracted considerable attention to neural volumetric rendering for view synthesis. Many follow-on works have extended NeRF to generative models [8, 59, 49, 39], generalization [61, 69], dynamic scenes [41, 28, 44, 32, 48], relighting [5, 55, 72], and editing [42, 43, 58, 70, 30, 67], etc. Rendering an image via NeRF necessitates querying an extensive neural network at multiple 3D locations per pixel, resulting in approximately a minute of rendering time per frame. Recent advancements seek to enhance NeRF's rendering efficiency by leveraging explicit representations [9, 57, 16, 65], or by segmenting the scene into sub-regions with smaller neural networks [46, 45].

Real-time Neural Rendering

A series of works has emerged to address the imperative demand for real-time rendering capabilities. PlenOctrees [68] introduces an innovative spherical harmonic representation of radiance, seamlessly transitioning it into an octree data structure. FastNeRF [17] strategically restructures NeRF through refactoring, incorporating a dense voxel grid to efficiently cache the scene of interest for accelerated rendering. iNGP [36] uses a hash table to store the feature vectors and combines it with fully-fused CUDA kernels to accelerate rendering. SNeRG [21] proposes a deferred architecture and extracts the scene properties from a pre-trained model into a sparse grid data structure. On a divergent trajectory, MobileNeRF [10] adopts a unique approach, representing the scene using textured polygons and harnessing polygon rasterization to generate pixel-level features; these features, in turn, are decoded via a compact view-dependent MLP. Nevertheless, the explicit representations harnessed by these approaches are not scale-aware, which limits their efficacy in learning from training images at multiple resolutions. Our experimental comparisons focus primarily on SNeRG and MobileNeRF, given their established performance on resource-constrained devices without CUDA access.

Reducing Aliasing in Rendering

One straightforward solution to mitigate aliasing for coordinate-based neural representations is supersampling [63], which casts multiple rays through each pixel during rendering to obtain the final result. While powerful in its anti-aliasing effect, supersampling exacerbates the already time-intensive rendering procedure of NeRF, thereby confining its utility primarily to offline rendering scenarios. To improve efficiency, Mip-NeRF [3] proposes to cast a conical frustum into the scene space and render the 3D region instead of a single point. It avoids a heavy computational burden by approximating the 3D region with a Gaussian, and queries the network with the IPE of the 3D input region to output the final density and radiance. Due to the heavy reliance on the network to decode the scale information, Mip-NeRF cannot leverage pre-caching techniques to enable real-time capability. Another common technique for reducing aliasing is pre-filtering [40, 23, 66, 6], which pre-filters the maps on a coarse mesh (e.g., color maps and normal maps) linearly and separately. Notably, pre-filtering shifts the computational load to a pre-rendering stage, making it well suited for real-time rendering scenarios. A widely adopted method in 3D rendering applications is mip mapping [14, 64, 11]: a series of images or textures, each a progressively lower-resolution representation of the previous one, is pre-computed ahead of time to increase rendering speed and reduce aliasing artifacts at real-time inference. Traditionally, mip mapping is integral to the texture mapping process for 3D meshes. Expanding upon this concept, we extend mip mapping to the realm of 3D neural volumetric rendering: our approach applies mip mapping to the voxel grid data structure rather than a mere 2D map, capitalizing on its advantages to boost rendering efficiency and reduce aliasing artifacts within neural volumetric rendering.

Relation to DVGO, iNGP and ZipNeRF

Several remarkable concurrent works study voxel representations for efficient rendering; still, there are some differences between Mip-VoG and these works. DVGO [57] progressively optimizes a higher-resolution voxel grid during training for finer details, but it does not consider a multi-scale representation. Besides, DVGO stores an implicit feature for radiance emission and predicts the final pixel color with a shallow MLP; by contrast, our Mip-VoG stores explicit values for the diffuse color and an implicit feature for the view-dependent specular radiance. iNGP [36] introduces a hash encoding approach based on a multi-resolution structure for fast, high-quality image synthesis. Their approach simultaneously trains several dense grids at different scales and concatenates the features from each for further prediction; this representation only works for single-scale images since it is scale-invariant. In contrast, our method retains only a single voxel grid during training and can progressively sample from Mip-VoG at different LODs. A concurrent work, ZipNeRF [4], operates within a similar setting: it integrates iNGP's grid pyramid using multi-sampling within the Mip-NeRF framework, whereas our work takes a distinct approach by incorporating the fundamental concept of "mip" directly into voxel grids.

3 Method

Figure 1: Rendering framework overview. We base our pipeline design on deferred NeRF [21] with explicit training of Mip-VoG. Given a point sampled from a camera ray, we query the density Mip-VoG ($V_{\text{den}}$) and color Mip-VoG ($V_{\text{rgb}}$) to predict the 1D volume density $\sigma$, 3D diffuse color $c_d$, and 4D feature vector $f_s$. We then aggregate the diffuse colors and feature vectors along the ray through the volume rendering integral (Eqs. 3 and 4), resulting in $\hat{C}_d$ and $F_s$. After that, a tiny MLP predicts a pixel-wise view-dependent specular color $\hat{C}_s$ from the accumulated feature vector $F_s$ together with the view direction $d$ (Eq. 5). The final color prediction $\hat{C}$ is the sum of the diffuse color $\hat{C}_d$ and the specular color $\hat{C}_s$ (Eq. 6).

3.1 Review of NeRF

NeRF [35] uses an MLP parameterized by $\theta$ as a continuous volumetric function to represent a scene. The network takes as input the view direction $d$ and a 3D coordinate $r(t)$ sampled from a camera ray $r(t) = o + td$, and predicts the volume density $\sigma$ at that 3D position together with the view-dependent radiance $c$ from that view direction:

\[\sigma(t),\, c(t) = \mathrm{MLP}_{\theta}\big(r(t),\, d\big).\]  (1)

A vital assumption made by NeRF is that the density $\sigma$ depends only on location, while the emitted color is conditioned on both the 3D coordinate $r(t)$ and the view direction $d$. In the rendering procedure, NeRF takes the predicted densities and emissions $\{\sigma(t_i), c(t_i)\}_{i=1}^{N}$ along the ray cast from a pixel, and approximates a volume rendering integral [33] to derive the final color of that pixel:

\[\hat{C}(r) = \sum_{i=1}^{N} T_i \Big(1 - \exp\big(-\sigma(t_i)\,\delta(t_i)\big)\Big)\, c(t_i), \quad \text{with } T_i = \exp\Big(-\sum_{j=1}^{i-1} \sigma(t_j)\,\delta(t_j)\Big),\]  (2)

where $\delta(t_i) = t_{i+1} - t_i$ is the distance between adjacent samples. Rendering a single ray for each pixel thus requires evaluating the MLP hundreds of times, resulting in significantly slow rendering speed.
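For concreteness, the quadrature in Eq. (2) can be sketched in a few lines of NumPy. This is a minimal illustration assuming the densities and colors along one ray have already been queried; the toy inputs at the end are purely illustrative.

```python
# Minimal NumPy sketch of the volume-rendering quadrature in Eq. (2).
import numpy as np

def composite_ray(sigmas, colors, t_vals):
    """sigmas: (N,), colors: (N, 3), t_vals: (N+1,) sample distances along one ray."""
    deltas = t_vals[1:] - t_vals[:-1]                                 # delta_i = t_{i+1} - t_i
    alphas = 1.0 - np.exp(-sigmas * deltas)                           # per-sample opacity
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])    # transmittance T_i
    weights = trans * alphas                                          # T_i * (1 - exp(-sigma_i * delta_i))
    return (weights[:, None] * colors).sum(axis=0)                    # final pixel color

# toy usage with 64 random samples along a ray
t = np.linspace(2.0, 6.0, 65)
rgb = composite_ray(np.random.rand(64), np.random.rand(64, 3), t)
```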

3.2 Review of Deferred NeRF

As discussed previously, real-time rendering can be achieved by pre-computing as many scene properties as possible. While the scene geometry (volume density) can be stored directly, NeRF relies on a continuous function to represent view-dependent effects. To address this problem, SNeRG [21] introduces a residual architecture that first caches the point-wise pre-trained volume density $\sigma(t)$, diffuse color $c_d(t)$, and 4-dimensional feature vector $f_s(t)$. The pixel-wise diffuse color and feature vector are obtained through volume rendering (as in NeRF):

\[\hat{C}_d(r) = \sum_{i=1}^{N} T_i \Big(1 - \exp\big(-\sigma(t_i)\,\delta(t_i)\big)\Big)\, c_d(t_i),\]  (3)
\[F_s(r) = \sum_{i=1}^{N} T_i \Big(1 - \exp\big(-\sigma(t_i)\,\delta(t_i)\big)\Big)\, f_s(t_i).\]  (4)

Then, a tiny MLP parameterized by $\phi$, which runs once per pixel on the accumulated feature $F_s(r)$ and view direction $d$, predicts the pixel-wise specular color as a view-dependent residual:

\[\hat{C}_s(r) = \mathrm{MLP}_{\phi}\big(F_s(r),\, d\big).\]  (5)

The final color of the pixel is obtained by the summation of diffuse color and specular color:

\[\hat{C}(r) = \hat{C}_d(r) + \hat{C}_s(r).\]  (6)

Vanilla SNeRG involves first pre-training a continuous representation and then caching the voxel-wise $\sigma$, $c_d$, and $f_s$ into a sparse voxel grid. In contrast, our method directly learns an explicit representation from scratch, which can be used directly as an efficient multiscale representation. We optimize $\sigma \in \mathbb{R}$ in one density Mip-VoG $V_{\text{den}}$, and $c_d \in \mathbb{R}^3$ together with $f_s \in \mathbb{R}^4$ in one color Mip-VoG $V_{\text{rgb}}$. The overview of our framework is shown in Fig. 1. In the following section, we present the query algorithm of Mip-VoG.
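The deferred path of Eqs. (3)-(6) is summarized by the hedged PyTorch sketch below. The hidden width, depth, and output sigmoid of the tiny MLP are illustrative assumptions rather than the exact SNeRG architecture.

```python
# Hedged PyTorch sketch of the deferred shading path (Eqs. 3-6).
import torch
import torch.nn as nn

class DeferredShader(nn.Module):
    def __init__(self, feat_dim=4, hidden=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))

    def forward(self, weights, c_d, f_s, view_dir):
        # weights: (R, N) volume-rendering weights; c_d: (R, N, 3); f_s: (R, N, 4); view_dir: (R, 3)
        C_d = (weights[..., None] * c_d).sum(dim=1)                         # Eq. (3)
        F_s = (weights[..., None] * f_s).sum(dim=1)                         # Eq. (4)
        C_s = torch.sigmoid(self.mlp(torch.cat([F_s, view_dir], dim=-1)))   # Eq. (5), sigmoid is an assumption
        return C_d + C_s                                                    # Eq. (6)
```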

3.3 Mip Voxel Grids

Mip-VoG harnesses a sequence of progressively down-sampled "much in little" voxel grids to answer scene queries at specific 3D coordinates. When sampling from Mip-VoG for a single point in space, the first step is to calculate the level of detail (LOD), which represents the "correct" scale with respect to the full-resolution voxel grids (section 3.3.1). Filtering and down-sampling operations are then applied to the full-resolution voxel grids, generating voxel grids at progressively lower scales. The value sampled from Mip-VoG at the computed LOD is finally obtained by interpolating between the two voxel grids at the neighboring levels (section 3.3.2).

To streamline notation, we omit the subscripts and condense $V_{\text{den}}$ and $V_{\text{rgb}}$ into a single symbol: $V^{(0)}$ denotes the original voxel grids at level 0, and $V^{(1)}$, $V^{(2)}$, $V^{(k)}$ denote the down-sampled voxel grids at levels 1, 2, and $k$, respectively.

3.3.1 Level of Detail

Given a single ray cast through a pixel, ray differentials represent a pair of differentially offset rays slightly above or to the right of the original ray [22, 1]. We extend this idea to volumetric rendering with voxel grids. Denote by $u, v, w$ the unit coordinates with respect to the voxel space $V^{(0)}$; for a pixel with image-space coordinates $x$ and $y$, the ray differentials are defined as the derivatives of the ray's footprint in voxel space with respect to the image-space coordinates $x$ and $y$:

\[\frac{\partial V^{(0)}}{\partial x} = \Big\langle \frac{\partial u}{\partial x},\ \frac{\partial v}{\partial x},\ \frac{\partial w}{\partial x} \Big\rangle, \qquad \frac{\partial V^{(0)}}{\partial y} = \Big\langle \frac{\partial u}{\partial y},\ \frac{\partial v}{\partial y},\ \frac{\partial w}{\partial y} \Big\rangle.\]  (7)

By applying a first-order Taylor approximation, we can get an expression for the extent of a pixel’s footprint in voxel space based on the voxel-to-pixel spacing:

\[\frac{\partial V^{(0)}}{\partial x} \approx \Big\langle \frac{\Delta u}{\Delta x},\ \frac{\Delta v}{\Delta x},\ \frac{\Delta w}{\Delta x} \Big\rangle, \qquad \frac{\partial V^{(0)}}{\partial y} \approx \Big\langle \frac{\Delta u}{\Delta y},\ \frac{\Delta v}{\Delta y},\ \frac{\Delta w}{\Delta y} \Big\rangle.\]  (8)

Intuitively, this can be seen as the per-axis offset on the voxel grids caused by the deviation of the ray in the image plane. As illustrated in Fig. 2, we adopt $\Delta x, \Delta y$ as half the pixel size along the $x$ and $y$ axes of the image plane, since this approximates the footprint of a pixel. Given the position of a point $r(t)$ and its neighbors $r_{\Delta x}(t), r_{\Delta y}(t)$ in world space, the offset in voxel unit coordinates is derived as the ratio between the per-axis distance and the voxel size:

\[\Delta u = \frac{T_u}{V_u}, \quad \Delta v = \frac{T_v}{V_v}, \quad \Delta w = \frac{T_w}{V_w},\]  (9)

where $T_u, T_v, T_w$ are the per-axis distances to the neighbors in world space and $V_u, V_v, V_w$ are the voxel sizes along the $u, v, w$ axes. Then, following the convention of mipmapping [20, 64], the LOD $\lambda$ is calculated as follows (since the number of voxels drops by 8x at each level, $\log_8(\rho) = \log_2(\rho)/\log_2(8) = \tfrac{1}{3}\log_2(\rho)$):

\[\lambda = \tfrac{1}{3}\log_2(\rho), \quad \text{with } \rho = \max\!\Bigg(\sqrt{\Big(\tfrac{\Delta u}{\Delta x}\Big)^{2} + \Big(\tfrac{\Delta v}{\Delta x}\Big)^{2} + \Big(\tfrac{\Delta w}{\Delta x}\Big)^{2}},\ \sqrt{\Big(\tfrac{\Delta u}{\Delta y}\Big)^{2} + \Big(\tfrac{\Delta v}{\Delta y}\Big)^{2} + \Big(\tfrac{\Delta w}{\Delta y}\Big)^{2}}\Bigg).\]  (10)
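For reference, the LOD computation of Eqs. (8)-(10) can be sketched as follows. The half-pixel screen-space offsets (dx = dy = 0.5) and the clamping of negative LODs to level 0 are our assumptions for this sketch.

```python
# Minimal sketch of the LOD computation in Eqs. (8)-(10).
import numpy as np

def compute_lod(p, p_dx, p_dy, voxel_size, dx=0.5, dy=0.5):
    """p, p_dx, p_dy: (3,) world-space point and its neighbours on the offset rays;
    voxel_size: (3,) world-space extent of one voxel of V^(0); dx, dy: screen-space offsets in pixels."""
    duvw_dx = (np.abs(p_dx - p) / voxel_size) / dx   # (Du/Dx, Dv/Dx, Dw/Dx), Eqs. (8)-(9)
    duvw_dy = (np.abs(p_dy - p) / voxel_size) / dy
    rho = max(np.linalg.norm(duvw_dx), np.linalg.norm(duvw_dy))        # Eq. (10)
    lam = np.log2(max(rho, 1e-8)) / 3.0              # lambda = (1/3) log2(rho), i.e. log8(rho)
    return max(lam, 0.0)                             # assumption: never sample sharper than level 0
```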


Figure 2: Demonstration of ray differentials. To determine the LOD of a point on the camera ray, we first cast two additional rays, each generated by introducing an offset of half the pixel size along the $x$ and $y$ axes in screen space. Then, for every point $r(t)$ along the original ray, we compute the unit distances between this point and its neighbours $r_{\Delta x}(t)$, $r_{\Delta y}(t)$ along the three axes of the full-resolution voxel grids $V^{(0)}$, denoted $\Delta u$, $\Delta v$, $\Delta w$. The LOD is finally calculated from the largest voxel-to-pixel ratio (Eq. 10).

3.3.2 Filtering and Sampling

Once the LOD has been determined, the next step is to sample the relevant information from the voxel grid at the appropriate scale, as required by the Mip-VoG framework. To achieve "much in little" voxel grids, we progressively down-sample the original voxel grid $V^{(0)}$ into a series of successively lower resolutions. Prior to down-sampling, we apply a low-pass filter, denoted $\gamma$, which suppresses high-frequency information and ensures that the down-sampled representations are suitably refined and free of artifacts. Following common mipmap techniques [25], the down-sampling is performed hierarchically: with each integer increment of the LOD, the voxel grid is down-sampled by a factor of $1/2$:

\[V^{(k+1)} = \Downarrow_{1/2}\big(\gamma(V^{(k)})\big),\]  (11)

where $\Downarrow_{1/2}(\cdot)$ denotes down-sampling to $1/2$ resolution. This process iterates, progressively generating voxel grids at lower resolutions, thereby accommodating the spatial requirements of any specific LOD while maintaining the coherence of the Mip-VoG representation. In our approach, we employ linear interpolation as the down-sampling filter. To illustrate, consider the original voxel grid $V^{(0)}$ with dimensions $D \times N_x \times N_y \times N_z$. When generating $V^{(1)}$, the dimensions are scaled to $D \times N_x/2 \times N_y/2 \times N_z/2$, where $D$ denotes the dimensionality of the stored modality. Repeating this process yields voxel grids with higher LOD and correspondingly lower resolution. The resulting structure preserves the hierarchical nature of Mip-VoG and contributes to a coherent representation of the scene, as shown in Fig. 3.
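The pyramid construction of Eq. (11) can be sketched in PyTorch as below. The use of avg_pool3d for the mean filter, the boundary handling, and the align_corners choice are our assumptions, not the authors' exact implementation.

```python
# Hedged PyTorch sketch of building the Mip-VoG pyramid (Eq. 11).
import torch
import torch.nn.functional as F

def build_mip_pyramid(grid, num_levels=4, kernel=5):
    """grid: (1, D, Nx, Ny, Nz) dense voxel grid at level 0 (D = channels, e.g. 1 or 7)."""
    levels = [grid]
    for _ in range(num_levels - 1):
        v = levels[-1]
        v = F.avg_pool3d(v, kernel_size=kernel, stride=1,
                         padding=kernel // 2, count_include_pad=False)   # low-pass filter gamma
        v = F.interpolate(v, scale_factor=0.5, mode='trilinear',
                          align_corners=True)                            # down-sampling by 1/2
        levels.append(v)
    return levels

# e.g. a 7-channel color grid at 64^3 produces levels of 64, 32, 16 voxels per axis
pyramid = build_mip_pyramid(torch.rand(1, 7, 64, 64, 64), num_levels=3)
```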


Figure 3: Structure of Mip-VoG. At each incremental increase in LOD, we first filter the previous voxel grids with a low-pass filter to remove high-frequency details. The filtered voxel grids are then down-sampled to half of the previous resolution using linear interpolation. When querying the value of a 3D point, Mip-VoG interpolates the results from the two adjacent voxel grid levels selected according to the point's LOD.

To ensure smooth continuity across non-integer LOD $\lambda$, we adopt quadrilinear interpolation to aggregate samples from the two neighboring voxel grids at the upper and lower levels. Let $f(V, p)$ denote the sampling function of the voxel grids and $\mu \in \{\sigma, c_d, f_s\}$ the stored value; the interpolation can be expressed as:

\[\mu = (\lceil\lambda\rceil - \lambda)\, f\big(V^{(\lfloor\lambda\rfloor)}, r(t)\big) + (\lambda - \lfloor\lambda\rfloor)\, f\big(V^{(\lceil\lambda\rceil)}, r(t)\big),\]  (12)

where $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ are the floor and ceiling functions. This quadrilinear interpolation mechanism ensures that the sampled values are seamlessly blended between the two neighboring voxel grids, preserving the continuity of the representation across different LOD values.
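Continuing the pyramid sketch above, Eq. (12) can be realized as follows. The use of grid_sample and the normalized [-1, 1] coordinate convention (with the last grid dimension ordered as x, y, z) are our assumptions.

```python
# Minimal sketch of Eq. (12): blend trilinear samples from two bracketing pyramid levels.
import math
import torch
import torch.nn.functional as F

def query_mip_vog(pyramid, xyz, lam):
    """pyramid: list of (1, D, *, *, *) grids from build_mip_pyramid;
    xyz: (3,) point in normalized [-1, 1]^3 coordinates; lam: scalar LOD; returns a (D,) vector."""
    lo = min(int(math.floor(lam)), len(pyramid) - 1)
    hi = min(lo + 1, len(pyramid) - 1)
    frac = lam - math.floor(lam)
    grid = xyz.view(1, 1, 1, 1, 3)                       # grid_sample expects (N, D, H, W, 3)

    def sample(v):
        # 'bilinear' on 5-D input performs trilinear interpolation
        return F.grid_sample(v, grid, mode='bilinear', align_corners=True).view(-1)

    return (1.0 - frac) * sample(pyramid[lo]) + frac * sample(pyramid[hi])
```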

3.3.3 Optimization

Similar to prior work [57], our optimization is divided into two stages: coarse and fine. In the coarse stage, we train $V_{\text{den}(c)} \in \mathbb{R}^{1 \times N_x^{(c)} \times N_y^{(c)} \times N_z^{(c)}}$ and $V_{\text{rgb}(c)} \in \mathbb{R}^{3 \times N_x^{(c)} \times N_y^{(c)} \times N_z^{(c)}}$ without Mip-VoG, aiming only to obtain a rough 3D geometry $V_{\text{den}(c)}$ that reduces the number of sampling points in the fine stage. In the fine stage, we train a density Mip-VoG $V_{\text{den}(f)} \in \mathbb{R}^{1 \times N_x^{(f)} \times N_y^{(f)} \times N_z^{(f)}}$ and a color Mip-VoG $V_{\text{rgb}(f)} \in \mathbb{R}^{7 \times N_x^{(f)} \times N_y^{(f)} \times N_z^{(f)}}$ with the tiny $\mathrm{MLP}_{\phi}$ introduced above. We use gradient descent to directly optimize the values in the voxel grids. Since the gradient of the linear interpolation used in the Mip-VoG down-sampling $\Downarrow_{1/2}$ is tractable, gradients from voxel grids at different resolutions can be naturally aggregated and propagated. The loss function is the squared error between the predicted pixel color and the ground truth:

\[\mathcal{L}_r = \sum_i \big\|\mathbf{C}(\mathbf{r}_i) - \widehat{\mathbf{C}}(\mathbf{r}_i)\big\|_2^2.\]  (13)
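A hedged sketch of one fine-stage training step under Eq. (13) is given below. The grid resolution, the mean reduction over the batch, and the render_rays placeholder are illustrative assumptions, while the learning rates mirror those reported in Sec. 4.2.

```python
# Hedged sketch of one optimization step: voxel values and the deferred MLP are plain parameters.
import torch

den_grid = torch.nn.Parameter(torch.zeros(1, 1, 160, 160, 160))   # density Mip-VoG, level 0 (illustrative size)
rgb_grid = torch.nn.Parameter(torch.zeros(1, 7, 160, 160, 160))   # diffuse color + 4-D feature
shader = DeferredShader()                                          # tiny-MLP sketch from Sec. 3.2
opt = torch.optim.Adam([{'params': [den_grid, rgb_grid], 'lr': 1e-1},
                        {'params': shader.parameters(),  'lr': 4e-3}])

def train_step(rays, gt_rgb, render_rays):
    # render_rays is a placeholder for the full Mip-VoG rendering pipeline described above
    pred = render_rays(rays, den_grid, rgb_grid, shader)
    loss = ((pred - gt_rgb) ** 2).sum(dim=-1).mean()   # squared error of Eq. (13), averaged over the batch
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```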

4 Experiments

Method                  | PSNR↑                                | SSIM↑                                | LPIPS↓
                        | Full Res  1/2 Res  1/4 Res  1/8 Res  | Full Res  1/2 Res  1/4 Res  1/8 Res  | Full Res  1/2 Res  1/4 Res  1/8 Res
Mip-NeRF [3]            | 32.629    34.336   35.471   35.602   | 0.958     0.970    0.979    0.983    | 0.047     0.026    0.017    0.012
SNeRG [21]              | 27.043    28.405   30.044   28.544   | 0.912     0.932    0.952    0.950    | 0.100     0.067    0.047    0.049
MobileNeRF [10]         | 24.115    25.127   26.633   27.930   | 0.868     0.885    0.913    0.938    | 0.141     0.112    0.078    0.050
MobileNeRF [10] w/o SS  | 23.730    24.425   25.308   25.364   | 0.861     0.875    0.898    0.910    | 0.149     0.128    0.104    0.091
Ours                    | 30.333    31.290   31.055   29.014   | 0.946     0.956    0.960    0.955    | 0.069     0.049    0.045    0.048
Table 2: Quantitative results on Multiscale-NeRF. Comparison of models trained and evaluated on the multiscale dataset. All metrics at each scale are averaged across the eight scenes. "w/o SS" removes supersampling from MobileNeRF.
Figure 4: Qualitative results on Multiscale-NeRF. We compare Mip-VoG rendering results with the other baselines on the test sets of three scenes, trained and evaluated on the multiscale dataset. We visualize a cropped region (shown in the red box) of the same image at 4 different scales as an image pyramid. MobileNeRF yields over-smoothed results at all scales, while SNeRG loses high frequencies in high-resolution images and produces aliasing in low-resolution frames. Our method surpasses the baselines by a large margin, with significantly better rendering quality.

In light of our previous discussion, we primarily compare results under the real-time rendering setting. We mainly evaluate our method on the multiscale synthetic dataset from Mip-NeRF [3], designed to validate accuracy on multi-resolution frames. We also conduct experiments on its single-resolution counterpart, the blender dataset introduced in the original NeRF paper [35], in order to probe the anti-aliasing performance of a model trained on a single-scale dataset. We report three commonly studied error metrics: PSNR, SSIM [62], and LPIPS [71], and showcase qualitative results.

4.1 Datasets

  1. Synthetic-NeRF [35], presented in the original NeRF paper, contains eight scenes. In this single-scale dataset, each scene consists of 100 training images and 200 test images at a uniform 800 × 800 resolution. A model trained on this dataset can learn all the high-frequency details from the full-resolution images, without being harmed by training images at multiple scales.

  2. Multiscale-NeRF [3] is a straightforward conversion of Synthetic-NeRF for analyzing multiscale training and aliasing. It was generated by taking each image in Synthetic-NeRF and box down-sampling it by factors of 2, 4, and 8 (and modifying the camera intrinsics accordingly). The three down-scaled sets along with the original images are then combined into a single dataset. Hence this dataset contains images at four different scales for both the training and test sets, and its size is quadrupled. The average evaluation metric is reported as the arithmetic mean of each error metric across all four scales. As suggested by Mip-NeRF [3], we adopt the area loss for all methods, which scales each pixel's loss by its footprint size in the full-resolution image to balance the influence of high- and low-resolution pixels; a sketch of the down-sampling and weighting is given below.
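Below is a minimal NumPy sketch of the box down-sampling and the area-loss weighting described above. The weight of factor² per pixel follows our reading of Mip-NeRF's suggestion (e.g., pixels from the 1/4-resolution images are scaled by 16); the exact normalization is an assumption.

```python
# Minimal sketch of multiscale dataset construction and area-loss weighting.
import numpy as np

def box_downsample(img, factor):
    """img: (H, W, 3) with H, W divisible by factor; averages factor x factor blocks."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def area_weight(factor):
    # a pixel down-sampled by `factor` covers factor^2 full-resolution pixels
    return float(factor ** 2)

# toy usage: build the four scales of one 800 x 800 image
full_res = np.random.rand(800, 800, 3)
scales = {f: box_downsample(full_res, f) for f in (1, 2, 4, 8)}
```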

4.2 Implementation Details

In our experiments, we use the same hyperparameters for the single-scale and multiscale datasets. In the coarse stage, the resolution of the voxel grid for both density and color is $128 \times 128 \times 128$, while in the fine stage it is raised to $512 \times 512 \times 512$. The low-pass filter is a mean filter with kernel size 5. We use the "shifted softplus" mentioned in Mip-NeRF [3] as the density activation. The initial value of alpha is $10^{-6}$ in the coarse training stage and $10^{-2}$ in the fine training stage. Our tiny MLP follows the architecture used in SNeRG [21]. We use the Adam optimizer [24] to train both the voxel grids and the deferred MLP; the learning rates are set to $4 \times 10^{-3}$ for the deferred MLP and $1 \times 10^{-1}$ for the voxel grids. In addition, we train 10k iterations for the coarse phase and 20k iterations for the fine phase, with a batch size of 8192. For the real-time web renderer, we convert Mip-VoG to the sparse voxel grid data structure of [21] and implement our query procedure in WebGL using the THREE.js library to increase rendering speed. For a fair comparison, all methods are trained on an 80GB A100 GPU and tested on laptop GPUs. Mip-VoG renders 800 × 800 images at 52 FPS on a Lenovo Legion 7 (laptop) with an NVIDIA RTX 2070 SUPER and at 71 FPS on an Alienware M15R6 (laptop) with an NVIDIA RTX 3080, with a 104 MB storage footprint.
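For quick reference, the hyperparameters above can be collected into a single illustrative config; the key names below are our own and do not correspond to the authors' released code.

```python
# Illustrative summary of the training configuration reported in Sec. 4.2 (key names are assumptions).
CONFIG = {
    'coarse': {'grid_res': (128, 128, 128), 'iters': 10_000, 'alpha_init': 1e-6},
    'fine':   {'grid_res': (512, 512, 512), 'iters': 20_000, 'alpha_init': 1e-2},
    'low_pass_filter': {'type': 'mean', 'kernel_size': 5},
    'density_activation': 'shifted_softplus',
    'optimizer': 'Adam',
    'lr': {'voxel_grids': 1e-1, 'deferred_mlp': 4e-3},
    'batch_size': 8192,
}
```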

4.3 Results

Multiscale-NeRF

The performance of Mip-VoG on this dataset is shown in Tab. 2. Our method outperforms the real-time baselines on all metrics across all scales. Note that this result reflects both multiscale training and anti-aliasing rendering, since the dataset contains multi-resolution images for both the training and test sets; we therefore defer the ablation to the next section. Since IPE is incompatible with current grid-based approaches that do not use positional encoding, we include the result of Mip-NeRF [3] for reference purposes only. Our approach, like the other real-time methods, sacrifices rendering quality due to the limited network size, resulting in lower performance than Mip-NeRF. We visualize some qualitative results in Fig. 4. Other real-time rendering approaches produce blurry results on high-resolution images, because multiscale training leads those models to fit the low-resolution images. In contrast, our method learns the high-frequency information in the full-resolution images and outputs low frequencies in low-resolution frames. Additionally, we visualize the computed LOD in Fig. 5; the pixel-wise result is accumulated along the ray via the volume rendering integral.


Figure 5: Visualization of LOD. We visualize the per-pixel LOD at four different scales. The value is computed via the volume rendering integral of the points' LODs along each ray. We resized the low-resolution rendered images to match the dimensions of the full-resolution images for visibility and direct comparison. Brighter colors indicate higher values.
Synthetic-NeRF
Method                  | PSNR↑                                | SSIM↑                                | LPIPS↓
                        | Full Res  1/2 Res  1/4 Res  1/8 Res  | Full Res  1/2 Res  1/4 Res  1/8 Res  | Full Res  1/2 Res  1/4 Res  1/8 Res
SNeRG [21]              | 29.333    30.065   28.355   25.373   | 0.940     0.949    0.946    0.924    | 0.134     0.091    0.097    0.144
MobileNeRF [10]         | 29.448    30.654   31.144   30.000   | 0.934     0.947    0.957    0.959    | 0.077     0.054    0.042    0.037
MobileNeRF [10] w/o SS  | 28.290    28.447   27.317   25.212   | 0.926     0.935    0.935    0.917    | 0.093     0.077    0.079    0.094
Ours                    | 30.355    30.467   28.766   26.566   | 0.949     0.956    0.951    0.935    | 0.062     0.050    0.058    0.073
Table 3: Quantitative results on Synthetic-NeRF. Performance of models trained on the single-scale Synthetic-NeRF but evaluated on Multiscale-NeRF. All metrics at each scale are averaged across the eight scenes.
Figure 6: Qualitative results on Synthetic-NeRF. We compare Mip-VoG rendering results with the other baselines on the test sets of two scenes, trained on the single-scale dataset. One can observe that our method reduces the aliasing on the edge of the drum and the mic windscreen.

Since the baseline models are not compatible with multiscale training, we eliminate the factor of non-uniform training images and examine only the effectiveness of anti-aliasing rendering. For this setting, each model is trained on single-scale images and evaluated on the multiscale version, since the test set of Multiscale-NeRF contains all the test images in Synthetic-NeRF. This inference scenario can be seen as rendering the single-scale dataset with the distance to the viewpoint increased by scale factors of 2, 4, and 8 (also known as minification). As shown in Tab. 3, even excluding the effect of a multiscale training set, our method still outperforms the baselines on all metrics on high-resolution images. MobileNeRF [10], benefiting from its super-sampling technique, performs better on lower-resolution frames. A potential reason for the large improvement of low-resolution renderings through super-sampling (MobileNeRF w/o SS vs. MobileNeRF) is that the final view is down-sampled from a higher-resolution rendered image, which improves accuracy and approximates the way the ground-truth low-resolution images are generated. When super-sampling is removed at inference time, our method outperforms the baseline models across all scales. We showcase some qualitative results in Fig. 6. Zooming in, one can see that our method produces more low-frequency detail and mitigates aliasing artifacts in the low-resolution images, while preserving sharp high-frequency details in the high-resolution images. These results verify that our method effectively learns a multiscale representation, as it improves both multi-resolution training and anti-aliasing.

4.4 Ablations

In this section, we analyze the effectiveness of Mip-VoG's contribution to the model and give some insights into the filtering algorithm. We perform all experiments on Multiscale-NeRF [3] and report the average PSNR over the eight scenes.

Mipmapping

To better examine the validity of Mip-VoG, we ablate this technique from training and testing respectively. Rendering without Mip-VoG (while training with it) reflects the ability to train on images at multiple resolutions, whereas removing it from training (while keeping it at test time) isolates the improvement in anti-aliasing. As shown in Tab. 4, training without Mip-VoG yields lower accuracy on high-resolution test images and higher quality on low-resolution frames. This result is consistent with the studies in Mip-NeRF [3], as the area loss forces the model to "overfit" to low-resolution training samples. While training with Mip-VoG helps preserve high frequencies, rendering without Mip-VoG produces aliasing in low-resolution frames, as the metrics of "Ours w/o te-mip" at lower scales are worse than the baseline. Hence, using Mip-VoG in both training and testing contributes to both multiscale training and anti-aliasing rendering.

Method                  | PSNR↑
                        | Full Res  1/2 Res  1/4 Res  1/8 Res
Ours w/o tr-mip te-mip  | 29.690    30.897   30.201   27.371
Ours w/o tr-mip         | 29.631    30.217   29.461   27.663
Ours w/o te-mip         | 30.348    31.146   29.581   26.669
Ours                    | 30.333    31.290   31.055   29.014
Table 4: Results of the mipmapping ablation. We conduct the experiments on Multiscale-NeRF, dropping Mip-VoG from training and testing respectively, denoted as "tr-mip" and "te-mip" in the table.
Low-pass Filter

One design choice in the sampling phase is that the voxel grids are pre-filtered by a low-pass filter, which helps preserve the high-frequency information during training while discarding it for lower-resolution rendering. We test our model with three options: no filter, a Gaussian filter, and a mean filter, and we also experiment with different kernel sizes. The results are shown in Tab. 5. Firstly, using a filter yields better rendering quality across all scales. Secondly, Mip-VoG is robust to the choice of filter, while the best performance is achieved with a mean filter of kernel size 5. Finally, kernel sizes that are too small or too large slightly harm the filtering outcome, as size 5 surpasses the other two sizes for both the mean and Gaussian filters.

Filter             | PSNR↑
                   | Full Res  1/2 Res  1/4 Res  1/8 Res
None               | 29.995    31.043   30.718   28.534
Mean (size 3)      | 30.211    31.179   30.997   29.005
Mean (size 5)      | 30.333    31.290   31.055   29.014
Mean (size 7)      | 30.210    31.177   30.995   29.002
Gaussian (size 3)  | 30.011    31.068   30.733   28.575
Gaussian (size 5)  | 30.052    31.078   30.759   28.683
Gaussian (size 7)  | 30.005    31.059   30.731   28.505
Table 5: Low-pass filter ablation. We demonstrate the sensitivity of the low-pass filter with respect to the filter type and the kernel size.

5 Conclusion

In this paper, we have presented a multiscale representation for real-time anti-aliasing neural rendering. We base our work on a voxel grid representation with a deferred NeRF architecture. We proposed mip voxel grids, which sample each point from voxel grids at different scales according to the level of detail computed from ray differentials. To generate the multiple levels of Mip-VoG, we apply a low-pass filter and interpolation to progressively down-sample the original full-resolution voxel grid. The final scene properties of a 3D point are sampled from the two neighboring voxel grid levels using quadrilinear interpolation. Experiments show that our method effectively learns a multiscale representation from the training images and provides higher accuracy in real-time anti-aliasing rendering. The main bottleneck of our rendering speed is the computational burden of the level-of-detail calculation in the web shader; we sacrifice some of the speed of the original SNeRG formulation to provide a multiscale representation. Nonetheless, there is room for improvement through future engineering efforts, including potential integration with state-of-the-art voxel-based real-time rendering engines such as [47]. While offering a pioneering multiscale representation for real-time applications, we hope our approach will be valuable to future research on multiscale training and real-time anti-aliasing rendering for neural rendering models.

Acknowledgements

This research was mainly undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This Facility was established with the assistance of LIEF Grant LE170100200. This research was also partially supported by the Research Computing Services NCI Access scheme at the University of Melbourne. DH was supported by Melbourne Research Scholarship from the University of Melbourne.

References

  • [1] Tomas Akenine-Möller, Jim Nilsson, Magnus Andersson, Colin Barré-Brisebois, Robert Toth, and Tero Karras. Texture level of detail strategies for real-time ray tracing. In Ray Tracing Gems, pages 321–345. Springer, 2019.
  • [2] Benjamin Attal, Jia-Bin Huang, Michael Zollhöfer, Johannes Kopf, and Changil Kim. Learning neural light fields with ray-space embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 19819–19829, 2022.
  • [3] Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5855–5864, 2021.
  • [4] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Zip-nerf: Anti-aliased grid-based neural radiance fields. ICCV, 2023.
  • [5] Sai Bi, Zexiang Xu, Pratul Srinivasan, Ben Mildenhall, Kalyan Sunkavalli, Miloš Hašan, Yannick Hold-Geoffroy, David Kriegman, and Ravi Ramamoorthi. Neural reflectance fields for appearance acquisition. arXiv preprint arXiv:2008.03824, 2020.
  • [6] Eric Bruneton and Fabrice Neyret. A survey of nonlinear prefiltering methods for efficient and accurate surface shading. IEEE Transactions on Visualization and Computer Graphics, 18(2):242–260, 2011.
  • [7] Chris Buehler, Michael Bosse, Leonard McMillan, Steven Gortler, and Michael Cohen. Unstructured lumigraph rendering. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 425–432, 2001.
  • [8] Eric R Chan, Marco Monteiro, Petr Kellnhofer, Jiajun Wu, and Gordon Wetzstein. pi-gan: Periodic implicit generative adversarial networks for 3d-aware image synthesis. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5799–5809, 2021.
  • [9] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. arXiv preprint arXiv:2203.09517, 2022.
  • [10] Zhiqin Chen, Thomas Funkhouser, Peter Hedman, and Andrea Tagliasacchi. Mobilenerf: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures. In The Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
  • [11] Cyril Crassin, Fabrice Neyret, Sylvain Lefebvre, Miguel Sainz, and Elmar Eisemann. Beyond triangles: gigavoxels effects in video games. In SIGGRAPH 2009: Talks, page 78. ACM, 2009.
  • [12] Abe Davis, Marc Levoy, and Fredo Durand. Unstructured light fields. In Computer Graphics Forum, number 2pt1, pages 305–314. Wiley Online Library, 2012.
  • [13] Helisa Dhamo, Keisuke Tateno, Iro Laina, Nassir Navab, and Federico Tombari. Peeking behind objects: Layered depth prediction from a single image. Pattern Recognition Letters, 125:333–340, 2019.
  • [14] Jon P Ewins, Marcus D Waller, Martin White, and Paul F Lister. Mip-map level selection for texture mapping. IEEE Transactions on Visualization and Computer Graphics, 4(4):317–329, 1998.
  • [15] John Flynn, Michael Broxton, Paul Debevec, Matthew DuVall, Graham Fyffe, Ryan Overbeck, Noah Snavely, and Richard Tucker. Deepview: View synthesis with learned gradient descent. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2367–2376, 2019.
  • [16] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022.
  • [17] Stephan J Garbin, Marek Kowalski, Matthew Johnson, Jamie Shotton, and Julien Valentin. Fastnerf: High-fidelity neural rendering at 200fps. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14346–14355, 2021.
  • [18] Steven J Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F Cohen. The lumigraph. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 43–54, 1996.
  • [19] Jon Hasselgren, Nikolai Hofmann, and Jacob Munkberg. Shape, light, and material decomposition from images using monte carlo rendering and denoising. Advances in Neural Information Processing Systems, 35:22856–22869, 2022.
  • [20] Paul S Heckbert et al. Texture mapping polygons in perspective. Technical report, Citeseer, 1983.
  • [21] Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, Jonathan T Barron, and Paul Debevec. Baking neural radiance fields for real-time view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5875–5884, 2021.
  • [22] Homan Igehy. Tracing ray differentials. In Proceedings of the 26th annual conference on Computer graphics and interactive techniques, pages 179–186, 1999.
  • [23] Anton S Kaplanyan, Stephan Hill, Anjul Patney, and Aaron E Lefohn. Filtering distributions of normals for shading antialiasing. In High Performance Graphics, pages 151–162, 2016.
  • [24] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [25] Sylvain Lefebvre and Hugues Hoppe. Parallel controllable texture synthesis. In ACM SIGGRAPH 2005 Papers, pages 777–786. 2005.
  • [26] Anat Levin and Fredo Durand. Linear view synthesis using a dimensionality gap light field prior. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1831–1838. IEEE, 2010.
  • [27] Marc Levoy and Pat Hanrahan. Light field rendering. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 31–42, 1996.
  • [28] Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. Neural scene flow fields for space-time view synthesis of dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6498–6508, 2021.
  • [29] Zhengqi Li, Wenqi Xian, Abe Davis, and Noah Snavely. Crowdsampling the plenoptic function. In European Conference on Computer Vision, pages 178–196. Springer, 2020.
  • [30] Steven Liu, Xiuming Zhang, Zhoutong Zhang, Richard Zhang, Jun-Yan Zhu, and Bryan Russell. Editing conditional radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5773–5783, 2021.
  • [31] Stephen Lombardi, Tomas Simon, Jason Saragih, Gabriel Schwartz, Andreas Lehrmann, and Yaser Sheikh. Neural volumes: Learning dynamic renderable volumes from images. arXiv preprint arXiv:1906.07751, 2019.
  • [32] Ricardo Martin-Brualla, Noha Radwan, Mehdi SM Sajjadi, Jonathan T Barron, Alexey Dosovitskiy, and Daniel Duckworth. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7210–7219, 2021.
  • [33] Nelson Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995.
  • [34] Ben Mildenhall, Pratul P Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics (TOG), 38(4):1–14, 2019.
  • [35] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
  • [36] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. arXiv preprint arXiv:2201.05989, 2022.
  • [37] Jacob Munkberg, Jon Hasselgren, Tianchang Shen, Jun Gao, Wenzheng Chen, Alex Evans, Thomas Müller, and Sanja Fidler. Extracting triangular 3d models, materials, and lighting from images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8280–8290, 2022.
  • [38] Baptiste Nicolet, Alec Jacobson, and Wenzel Jakob. Large steps in inverse rendering of geometry. ACM Transactions on Graphics (TOG), 40(6):1–13, 2021.
  • [39] Michael Niemeyer and Andreas Geiger. Giraffe: Representing scenes as compositional generative neural feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11453–11464, 2021.
  • [40] Marc Olano and Dan Baker. Lean mapping. In Proceedings of the 2010 ACM SIGGRAPH symposium on Interactive 3D Graphics and Games, pages 181–188, 2010.
  • [41] Julian Ost, Fahim Mannan, Nils Thuerey, Julian Knodt, and Felix Heide. Neural scene graphs for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2856–2865, 2021.
  • [42] Keunhong Park, Utkarsh Sinha, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Steven M Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5865–5874, 2021.
  • [43] Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, and Steven M Seitz. Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. arXiv preprint arXiv:2106.13228, 2021.
  • [44] Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-nerf: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10318–10327, 2021.
  • [45] Daniel Rebain, Wei Jiang, Soroosh Yazdani, Ke Li, Kwang Moo Yi, and Andrea Tagliasacchi. Derf: Decomposed radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14153–14161, 2021.
  • [46] Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14335–14345, 2021.
  • [47] Christian Reiser, Rick Szeliski, Dor Verbin, Pratul Srinivasan, Ben Mildenhall, Andreas Geiger, Jon Barron, and Peter Hedman. Merf: Memory-efficient radiance fields for real-time view synthesis in unbounded scenes. ACM Transactions on Graphics (TOG), 42(4):1–12, 2023.
  • [48] Viktor Rudnev, Mohamed Elgharib, William Smith, Lingjie Liu, Vladislav Golyanik, and Christian Theobalt. Nerf for outdoor scene relighting. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVI, pages 615–631. Springer, 2022.
  • [49] Katja Schwarz, Yiyi Liao, Michael Niemeyer, and Andreas Geiger. Graf: Generative radiance fields for 3d-aware image synthesis. Advances in Neural Information Processing Systems, 33:20154–20166, 2020.
  • [50] Jonathan Shade, Steven Gortler, Li-wei He, and Richard Szeliski. Layered depth images. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 231–242, 1998.
  • [51] Lixin Shi, Haitham Hassanieh, Abe Davis, Dina Katabi, and Fredo Durand. Light field reconstruction using sparsity in the continuous fourier domain. ACM Transactions on Graphics (TOG), 34(1):1–13, 2014.
  • [52] Meng-Li Shih, Shih-Yang Su, Johannes Kopf, and Jia-Bin Huang. 3d photography using context-aware layered depth inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8028–8038, 2020.
  • [53] Vincent Sitzmann, Semon Rezchikov, Bill Freeman, Josh Tenenbaum, and Fredo Durand. Light field networks: Neural scene representations with single-evaluation rendering. Advances in Neural Information Processing Systems, 34:19313–19325, 2021.
  • [54] Vincent Sitzmann, Justus Thies, Felix Heide, Matthias Nießner, Gordon Wetzstein, and Michael Zollhofer. Deepvoxels: Learning persistent 3d feature embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2437–2446, 2019.
  • [55] Pratul P Srinivasan, Boyang Deng, Xiuming Zhang, Matthew Tancik, Ben Mildenhall, and Jonathan T Barron. Nerv: Neural reflectance and visibility fields for relighting and view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7495–7504, 2021.
  • [56] Pratul P Srinivasan, Richard Tucker, Jonathan T Barron, Ravi Ramamoorthi, Ren Ng, and Noah Snavely. Pushing the boundaries of view extrapolation with multiplane images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 175–184, 2019.
  • [57] Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5459–5469, 2022.
  • [58] Edgar Tretschk, Ayush Tewari, Vladislav Golyanik, Michael Zollhöfer, Christoph Lassner, and Christian Theobalt. Non-rigid neural radiance fields: Reconstruction and novel view synthesis of a dynamic scene from monocular video. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12959–12970, 2021.
  • [59] Alex Trevithick and Bo Yang. Grf: Learning a general radiance field for 3d representation and rendering. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15182–15192, 2021.
  • [60] Shubham Tulsiani, Richard Tucker, and Noah Snavely. Layer-structured 3d scene inference via view synthesis. In Proceedings of the European Conference on Computer Vision (ECCV), pages 302–317, 2018.
  • [61] Qianqian Wang, Zhicheng Wang, Kyle Genova, Pratul P Srinivasan, Howard Zhou, Jonathan T Barron, Ricardo Martin-Brualla, Noah Snavely, and Thomas Funkhouser. Ibrnet: Learning multi-view image-based rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4690–4699, 2021.
  • [62] Zhou Wang, Alan C Bovik, Hamid R Sheikh, and Eero P Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing, 13(4):600–612, 2004.
  • [63] Turner Whitted. An improved illumination model for shaded display. In ACM Siggraph 2005 Courses, pages 4–es. 2005.
  • [64] Lance Williams. Pyramidal parametrics. In Proceedings of the 10th annual conference on Computer graphics and interactive techniques, pages 1–11, 1983.
  • [65] Suttisak Wizadwongsa, Pakkapon Phongthawee, Jiraphon Yenphraphai, and Supasorn Suwajanakorn. Nex: Real-time view synthesis with neural basis expansion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8534–8543, 2021.
  • [66] Lifan Wu, Shuang Zhao, Ling-Qi Yan, and Ravi Ramamoorthi. Accurate appearance preserving prefiltering for rendering displacement-mapped surfaces. ACM Transactions on Graphics (TOG), 38(4):1–14, 2019.
  • [67] Bangbang Yang, Chong Bao, Junyi Zeng, Hujun Bao, Yinda Zhang, Zhaopeng Cui, and Guofeng Zhang. Neumesh: Learning disentangled neural mesh-based implicit field for geometry and texture editing. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVI, pages 597–614. Springer, 2022.
  • [68] Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5752–5761, 2021.
  • [69] Alex Yu, Vickie Ye, Matthew Tancik, and Angjoo Kanazawa. pixelnerf: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4578–4587, 2021.
  • [70] Yu-Jie Yuan, Yang-Tian Sun, Yu-Kun Lai, Yuewen Ma, Rongfei Jia, and Lin Gao. Nerf-editing: geometry editing of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18353–18364, 2022.
  • [71] Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
  • [72] Xiuming Zhang, Pratul P Srinivasan, Boyang Deng, Paul Debevec, William T Freeman, and Jonathan T Barron. Nerfactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Transactions on Graphics (TOG), 40(6):1–18, 2021.
  • [73] Tinghui Zhou, Richard Tucker, John Flynn, Graham Fyffe, and Noah Snavely. Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817, 2018.