An Interactive Image-based Modeling System
Abstract
This paper proposes an interactive 3D modeling method, and a corresponding system, based on single or multiple uncalibrated images. The main feature of the method is that, following the modeling habits of ordinary users, the 3D model of the target is reconstructed from the images in a coarse-to-fine manner. Once the approximate shape is determined, the user adds or modifies projection constraints and spatial constraints and applies topology modifications, gradually realizing camera calibration and refining the rough model, until objects of arbitrary geometry and topology are reconstructed. During the interaction, the geometric parameters and camera projection matrices are solved in real time, and the reconstruction results are displayed in a 3D window.
1 Introduction
As an important supplement to traditional computer-aided 3D modeling, image-based modeling (IBM) has made great progress in recent years. IBM recovers the geometry, surface texture, and other attributes of objects from real images, so that the created models are more visually realistic.
In IBM there are two main approaches. 1) Geometric reconstruction based on discrete points: corresponding points across multiple images are used, following the principles of stereo vision, to recover the three-dimensional coordinates of the points, and a polygonal mesh model is then built from the topological connections between these points. Its advantage is that it can, in theory, reconstruct geometry of any shape; its disadvantage is that, in practice, automatically selecting and matching corresponding points across multiple images is unreliable for objects with complex geometry and texture, while matching corresponding points manually requires excessive interaction. 2) Modeling based on predefined voxels: a parameterized three-dimensional geometric model of the target is specified through human-computer interaction, and the precise parameters of the geometric model are then determined by optimization, using the point and line correspondences between the 3D model and the images together with the constraints of the 3D geometry itself. Its advantage is that it exploits the constraints of the model itself, so the amount of interaction is smaller than in the previous approach; its disadvantage is that a parameterized model must be built in advance, which limits the geometry types that can be recovered.
We propose a progressive interactive image-based modeling method that builds on the predefined-voxel approach while incorporating ideas from the discrete-point approach. Based on it, we developed an IBM system with high versatility, reliability, and ease of use.
2 Progressive Interactive Modeling
Image-based modeling recovers a 3D model from the information in images. Without loss of generality, we assume the model to be recovered can be represented as a mesh. A mesh model $M$ can be represented by a quadruple $(K, V, D, S)$ [9], where $V$ is the set of vertex positions $v_1, \ldots, v_n$, which defines the shape of the mesh; $K$ is the connectivity among the vertices, which defines the topology of the mesh; $D$ defines the discrete surface attributes of the model, such as surface material identifiers; and $S$ defines the continuous attributes of the model, such as color, normals, and texture coordinates. The goal of image-based reconstruction is to recover this quadruple $M$ from the images.
Hoppe proved that the conversion between two meshes $M$ and $M'$ can be realized in a finite number of steps [9]. We introduce this idea into image-based modeling and propose a progressive interactive image-based modeling method. We assume the model to be recovered from several images is $M = (K, V, D, S)$, and each vertex $v_i \in V$ must satisfy the projection constraint $\lambda_{ij} u_{ij} = P_j v_i$, where $u_{ij}$ is the image point of $v_i$ on image $I_j$, which is a known condition; $\lambda_{ij}$ is the projection coefficient; and $P_j$ is the camera projection matrix of image $I_j$ ($u_{ij}$ and $v_i$ are expressed in homogeneous coordinates). The unknowns are $v_i$ and $P_j$. Besides the projection constraints, the constraints available during recovery are: the epipolar constraints between cameras, which relate the projections of the same spatial point in different images; and the spatial constraints between vertices, such as coplanarity, collinearity, and parallelism or perpendicularity between faces.
We collect all projection constraints, epipolar constraints, and spatial constraints into a constraint group $F$. In theory, when there are enough constraints and certain conditions are met, the vertex set $V$ can be recovered by solving $F$. Moreover, if we reduce $D$ and $S$ to the texture and texture coordinates of the mesh faces, then once the topology $K$ and vertex set $V$ are known, the projection constraints give the image-plane positions of each face's vertices, and inverting the projective transformation yields the texture and texture coordinates of each face. Therefore, the mesh $(K, V, D, S)$ can be computed from the pair $(K, F)$.
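To make the pair $(K, F)$ concrete, the following minimal Python sketch shows one possible way to store the mesh quadruple and the constraint group; the names and field layouts are our own illustration, not the system's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Mesh:
    """The quadruple M = (K, V, D, S)."""
    K: list   # topology: faces as lists of vertex indices
    V: list   # vertex positions, one 3-vector per vertex
    D: dict   # discrete attributes, e.g. face index -> material id
    S: dict   # continuous attributes, e.g. vertex index -> texture coords

@dataclass
class Constraints:
    """The constraint group F from which V is solved."""
    projection: list = field(default_factory=list)  # (vertex i, image j, pixel u_ij)
    epipolar: list = field(default_factory=list)    # (image j, image k, point pairs)
    spatial: list = field(default_factory=list)     # e.g. ("coplanar", [i1, i2, ...])
```

Solving the constraints for $V$ (Section 3.2) then turns $(K, F)$ into the full mesh $(K, V, D, S)$.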
Our modeling system can be regarded as a state machine. By interacting with the system, the user applies operators $op$ that move the state machine through a sequence of states, where each state can be represented by a pair $(K_i, F_i)$. From each pair $(K_i, F_i)$ the system computes a mesh model $M_i$, and these models progressively approximate the final mesh $M$ (Figure 1).
The operators fall into the following three categories according to the objects they act on.
(1) K-op operators, i.e., operators applied to the topology $K$. It is proved in [9] that only one pair of reversible operators, vertex split and edge collapse, is needed to convert between two meshes $M$ and $M'$. Since these two operators are not conducive to intuitive control of the mesh during interaction, we instead define two equivalent pairs of mutually inverse operators, shown in Table 1.

Table 1. The two pairs of mutually inverse K-op operators.

| Operator | Description |
|---|---|
| Edge split | Add a vertex to split an edge into two edges. |
| Edge merge | The inverse of edge split. |
| Edge add | Add a new edge connecting two vertices. |
| Edge remove | The inverse of edge add. |
To construct a mesh from an initial topology quickly, we also define several initial-topology construction operators that create predefined voxels (cuboids and the like).
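As an illustration of a K-op, the sketch below performs an edge split on a minimal face-list mesh; the data layout and function name are our assumptions, not the system's implementation.

```python
def split_edge(V, F, a, b, m_pos):
    """K-op sketch: add a vertex m on edge (a, b), splitting it in two.

    V is a list of vertex positions; F is a list of faces, each a list
    of vertex indices. Every face that uses edge (a, b) gets m inserted
    between a and b. Returns the index of the new vertex.
    """
    m = len(V)
    V.append(m_pos)
    for face in F:
        n = len(face)
        for i in range(n):
            j = (i + 1) % n
            if {face[i], face[j]} == {a, b}:
                if j == 0:             # the edge wraps from last to first
                    face.append(m)
                else:
                    face.insert(j, m)
                break
    return m
```

The inverse operator deletes $m$ and rejoins the two edges; the edge add/remove pair edits the face lists analogously.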
(2) F-op operators, i.e., operators applied to the constraint group $F$. Two types are defined: A, projection-constraint operators, which specify correspondences between spatial points and image pixels to establish projection constraints; B, spatial-constraint operators, which specify spatial constraint relations between spatial points. Each type includes the corresponding inverse operator.
(3) V-op operators, i.e., operators applied to the vertex set $V$. Although the system solves for $V$ through $F$, when $F$ contains too few constraints, or when $V$ is solved by numerical iteration from a poor initial value, the user can assist the solution by directly adjusting the positions of some vertices.
We illustrate with the modeling of the top of a pavilion. First, according to the shape of the top, the user creates a cuboid voxel with an initial-topology construction operator, and then, using projection-constraint operators, drags the cuboid's vertices to the corresponding points in the different images to establish projection constraints. The system uses the perpendicular-edge information of the cuboid and the existing entities in the scene to compute the camera internal parameters of each image; it extracts point pairs from the correspondences between images to establish epipolar constraints and computes the camera external parameters. From the scene's current projection constraints, spatial constraints (implicit in the voxel), and updated camera parameters, it computes the geometry of the model. Obviously, a cuboid cannot closely approximate the real pavilion top, so the user continues with edge-split operators to split edges of the cuboid and generate new vertices, as shown in Figure 2; the new vertices are then connected with edge-add operators to further approximate the shape of the top. After each drag operation, the system updates the model. Finally, the system automatically solves for the textures and texture coordinates.

3 Computation in Modeling
3.1 Recovering Camera Parameters
We recover the camera parameters by stepwise refinement: first the internal parameters are recovered from a single image, and then the external parameters are recovered, and the internal parameters refined, through the epipolar constraints between corresponding points in pairs of images.
When recovering the camera's internal parameters from a single image, if the user has sufficient prior knowledge of the scene, inputting six known spatial points that do not all lie on the same plane is enough to recover the internal parameters; otherwise, using the projections of three mutually perpendicular clusters of parallel lines in space, the vanishing points are obtained by the method of [22], and the internal parameters are recovered from the vanishing points by the method of [21].
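As a sketch of the vanishing-point route, assuming zero skew, square pixels, and three finite orthogonal vanishing points (assumptions typical of [21]-style methods), each orthogonal pair yields one linear equation in the principal point and focal length:

```python
import numpy as np

def intrinsics_from_vanishing_points(v1, v2, v3):
    """Recover focal length f and principal point (u0, v0) from three
    mutually orthogonal, finite vanishing points given as (x, y).

    With K = [[f,0,u0],[0,f,v0],[0,0,1]], orthogonality of the three
    directions gives v_i^T w v_j = 0 (i != j) for the image of the
    absolute conic w ~ [[1,0,-u0],[0,1,-v0],[-u0,-v0,u0^2+v0^2+f^2]].
    Expanding yields, per pair:
      x1*x2 + y1*y2 - u0*(x1+x2) - v0*(y1+y2) + (u0^2+v0^2+f^2) = 0,
    which is linear in u0, v0, and c = u0^2 + v0^2 + f^2.
    """
    A, b = [], []
    for (x1, y1), (x2, y2) in [(v1, v2), (v1, v3), (v2, v3)]:
        A.append([-(x1 + x2), -(y1 + y2), 1.0])
        b.append(-(x1 * x2 + y1 * y2))
    u0, v0, c = np.linalg.solve(np.array(A), np.array(b))
    return np.sqrt(c - u0**2 - v0**2), (u0, v0)
```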
When recovering camera parameters from multiple images, the epipolar constraints between two images are obtained through the projection correspondences established by the user. Let the projections of a vertex $v$ in the two images be $u$ and $u'$. Eliminating $\lambda$, $\lambda'$, and $v$ from the equations $\lambda u = P v$ and $\lambda' u' = P' v$ (where $P$ and $P'$ are the camera projection matrices) yields the equation $u'^\top F u = 0$, the epipolar constraint on corresponding points, where $F$ is the fundamental matrix [18, 8]. There are many ways to compute $F$ [28]; we minimize the nonlinear point-to-epipolar-line distance, measured in pixels:
$$\min_{F} \sum_{i} \left[ d(u'_i, F u_i)^2 + d(u_i, F^\top u'_i)^2 \right] \qquad (1)$$
where $d(\cdot,\cdot)$ is the point-to-line distance. After recovering $F$, the reconstruction can be upgraded from projective to affine, and then, directly reusing the single-image method above for the internal parameters, from affine to metric [8]. When the number of input images is large enough, we instead use the self-calibration algorithm of [20] to compute the internal parameters; to keep self-calibration stable, the internal parameters computed from a single image serve as the initial value of the iteration.
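A minimal sketch of estimating $F$: the normalized eight-point algorithm gives a linear initialization, and the symmetric point-to-epipolar-line distance of Eq. (1) is the cost that a nonlinear refinement would then minimize (function names and data layout are our own).

```python
import numpy as np

def normalize(pts):
    # Shift to the centroid and scale so the mean distance is sqrt(2).
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (T @ np.column_stack([pts, np.ones(len(pts))]).T).T, T

def eight_point(u, u2):
    """Linear estimate of F with u2^T F u = 0 (u, u2 are Nx2, N >= 8)."""
    x, T1 = normalize(u)
    x2, T2 = normalize(u2)
    A = np.column_stack([
        x2[:, 0] * x[:, 0], x2[:, 0] * x[:, 1], x2[:, 0],
        x2[:, 1] * x[:, 0], x2[:, 1] * x[:, 1], x2[:, 1],
        x[:, 0], x[:, 1], np.ones(len(x))])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)   # null vector of A
    U, S, Vt = np.linalg.svd(F)                 # enforce rank 2
    F = U @ np.diag([S[0], S[1], 0]) @ Vt
    F = T2.T @ F @ T1                           # undo the normalization
    return F / F[2, 2]

def epipolar_cost(F, u, u2):
    """Symmetric squared point-to-epipolar-line distances, as in Eq. (1)."""
    uh = np.column_stack([u, np.ones(len(u))])
    u2h = np.column_stack([u2, np.ones(len(u2))])
    l2 = (F @ uh.T).T        # epipolar lines of u in image 2
    l1 = (F.T @ u2h.T).T     # epipolar lines of u2 in image 1
    d2 = np.sum(l2 * u2h, axis=1) / np.linalg.norm(l2[:, :2], axis=1)
    d1 = np.sum(l1 * uh, axis=1) / np.linalg.norm(l1[:, :2], axis=1)
    return d1**2 + d2**2
```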
3.2 Recovering Geometry
Given the projection constraints and spatial constraints over the input images, the vertex set $V$ of the mesh is solved for, which can be expressed in the following form:
$$\min_{V} \sum_{i,j} d(u_{ij}, P_j v_i)^2 \quad \text{subject to} \quad C(V) = 0 \qquad (2)$$
where $P_j$ is the recovered camera matrix of image $I_j$, $u_{ij}$ is the projection of vertex $v_i$, and $C$ is the set of spatial constraints; this is the most basic point-wise formulation. Solving the point-wise formulation directly is expensive, and the correspondences must be input point by point. We therefore introduce a parameterized voxel representation [6] into the system, expressing each vertex position as the voxel's local coordinates transformed by the voxel's rotation and translation. After the initial pose of the voxel is computed, vertices newly generated by the user's K-op topology modifications are switched to the local-coordinate representation and participate in the solution of Eq. (2).
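The sketch below shows the flavor of Eq. (2) for a single voxel: an axis-angle rotation, a translation, and a per-axis scale are fitted so that the voxel's vertices reproject onto the user-specified image points. The parameterization and names are our simplifications, and the spatial constraints $C$ are omitted (they could enter as extra penalty residuals).

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(P, X):
    """Project Nx3 points with a 3x4 camera matrix P; returns Nx2 pixels."""
    x = (P @ np.column_stack([X, np.ones(len(X))]).T).T
    return x[:, :2] / x[:, 2:3]

def solve_voxel(params0, local, cams, obs):
    """Fit a voxel pose so its vertices reproject onto the user's clicks.

    params = [rx, ry, rz, tx, ty, tz, sx, sy, sz], and
    vertex_i = R @ (s * local_i) + t.
    cams: list of 3x4 projection matrices P_j.
    obs:  per camera, a dict {vertex index i: observed pixel u_ij}.
    """
    def residuals(params):
        R = Rotation.from_rotvec(params[:3]).as_matrix()
        t, s = params[3:6], params[6:9]
        V = (R @ (s * local).T).T + t
        r = []
        for P, o in zip(cams, obs):
            for i, u in o.items():
                r.extend(project(P, V[i:i + 1])[0] - u)
        return np.array(r)

    return least_squares(residuals, params0)  # reprojection-error minimization
```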
3.3 Recovering Texture and Texture Coordinates
After the camera parameters and the model geometry have been recovered, the texture attributes of each face of the model can be obtained. Texture recovery is a mapping from the source space (the image) to the target space (the texture), obtained by inverting the projective transformation. Since each face has different projected textures in different images, and occlusion, shadows, and similar effects intervene, the projections of the same spatial point may have different colors in different images. We therefore weigh, for each face, its visibility in each image, the angle between its normal and each camera's line of sight, and the size of its projected area in each image, to recover high-quality view-dependent textures [3, 6].
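A heuristic sketch of the per-face weighting just described; the scoring function and its inputs are our assumptions of how the three cues might be combined.

```python
import numpy as np

def view_weight(face_normal, face_center, cam_center, proj_area, visible):
    """Score one image as a texture source for one face.

    Combines the three cues from the text: visibility of the face in the
    image, the angle between the face normal and the viewing direction,
    and the projected area of the face in pixels.
    """
    if not visible or proj_area <= 0.0:
        return 0.0
    view_dir = cam_center - face_center
    view_dir /= np.linalg.norm(view_dir)
    cos_angle = max(0.0, float(np.dot(face_normal, view_dir)))
    return cos_angle * proj_area   # larger means a better texture source

# Each face's texture is then blended from the images in proportion to
# their weights, or taken from the single best-scoring image.
```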
4 Reconstruction Results
The system consists of four modules: an interface module, a 3D modeling module, a 2D image-processing module, and an optimization module. Figure 3 shows two examples of models built by the system from multiple images.

5 Conclusion
We have implemented a progressive interactive image-based modeling approach and a corresponding system. Compared with existing IBM methods and systems, it better resolves the tension between the amount of interaction, the reliability, and the flexibility of modeling, and its progressive IBM modeling method simplifies the modeling process and matches human cognitive habits. The system has the following characteristics: a progressive interactive solution mode; a WYSIWYG interface; accurate camera calibration; and texture editing together with accurate texture acquisition. Future work includes recovering, on top of the reconstructed geometry and textures, the surface lighting attributes and the light distribution in the scene.
There are several extended works on applying physically based lighting computation in various applications:
1. Deep learning-based Monte Carlo noise reduction. By training a neural-network denoiser offline, noisy Monte Carlo renderings can be filtered into high-quality smooth output, greatly improving the usability of physically based rendering [14]. Representative work includes predicting a filtering kernel from g-buffers [2], using GANs to generate more realistic filtered results [26], applying manifold contrastive learning to path-space features to enhance the rendering of reflections [4], sharing weights to quickly predict the filtering kernel and speed up reconstruction [7], and filtering and reconstructing high-dimensional incident radiance fields to guide unbiased rendering [13].
2. The many-light rendering framework is an important rendering framework besides path tracing. Its basic idea is to simplify the simulation of full light paths after multiple refractions and reflections into direct illumination from many virtual light sources, with a unified mathematical framework to accelerate the computation [5]. Work includes efficiently processing virtual point lights and geometric data out of core [23], efficiently integrating virtual point lights using sparse matrices and compressed sensing [12], handling virtual line lights in participating media [11], and approximating indirect reflections on glossy surfaces with spherical Gaussian virtual point lights [10].
3. Automatic optimization of rendering pipelines applies high-quality rendering techniques to real-time applications by optimizing the rendering pipeline. Research includes automatic optimization trading off quality and speed [24], automatic optimization for energy saving [25, 27], LOD optimization for terrain data [17], automatic approximation and fitting of pipeline rendering signals [16], and anti-aliasing [29].
4.
References
- [1] G. An, Y. Huo, and S.-E. Yoon. Hypergraph propagation and community selection for objects retrieval. Advances in Neural Information Processing Systems, 34, 2021.
- [2] S. Bako, T. Vogels, B. McWilliams, M. Meyer, J. Novák, A. Harvill, P. Sen, T. Derose, and F. Rousselle. Kernel-predicting convolutional networks for denoising monte carlo renderings. ACM Trans. Graph., 36(4):97–1, 2017.
- [3] C. Buehler, M. Bosse, L. McMillan, S. Gortler, and M. Cohen. Unstructured lumigraph rendering. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 425–432, 2001.
- [4] I.-Y. Cho, Y. Huo, and S.-E. Yoon. Weakly-supervised contrastive learning in path manifold for monte carlo image reconstruction. ACM Transactions on Graphics (TOG), 40(4):38–1, 2021.
- [5] C. Dachsbacher, J. Křivánek, M. Hašan, A. Arbree, B. Walter, and J. Novák. Scalable realistic rendering with many-light methods. In Computer Graphics Forum, volume 33, pages 88–104. Wiley Online Library, 2014.
- [6] P. E. Debevec, C. J. Taylor, and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry-and image-based approach. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 11–20, 1996.
- [7] H. Fan, R. Wang, Y. Huo, and H. Bao. Real-time monte carlo denoising with weight sharing kernel prediction network. In Computer Graphics Forum, volume 40, pages 15–27. Wiley Online Library, 2021.
- [8] O. Faugeras. Stratification of three-dimensional vision: projective, affine, and metric representations. JOSA A, 12(3):465–484, 1995.
- [9] H. Hoppe. Progressive meshes. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 99–108, 1996.
- [10] Y. Huo, S. Jin, T. Liu, W. Hua, R. Wang, and H. Bao. Spherical gaussian-based lightcuts for glossy interreflections. In Computer Graphics Forum, volume 39, pages 192–203. Wiley Online Library, 2020.
- [11] Y. Huo, R. Wang, T. Hu, W. Hua, and H. Bao. Adaptive matrix column sampling and completion for rendering participating media. ACM Transactions on Graphics (TOG), 35(6):1–11, 2016.
- [12] Y. Huo, R. Wang, S. Jin, X. Liu, and H. Bao. A matrix sampling-and-recovery approach for many-lights rendering. ACM Transactions on Graphics (TOG), 34(6):1–12, 2015.
- [13] Y. Huo, R. Wang, R. Zheng, H. Xu, H. Bao, and S.-E. Yoon. Adaptive incident radiance field sampling and reconstruction using deep reinforcement learning. ACM Transactions on Graphics (TOG), 39(1):1–17, 2020.
- [14] Y. Huo and S.-e. Yoon. A survey on deep learning-based monte carlo denoising. Computational Visual Media, 7(2):169–185, 2021.
- [15] S. Kim, Y. Huo, and S.-E. Yoon. Single image reflection removal with physically-based training images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5164–5173, 2020.
- [16] S. Li, R. Wang, Y. Huo, W. Zheng, W. Hua, and H. Bao. Automatic band-limited approximation of shaders using mean-variance statistics in clamped domain. In Computer Graphics Forum, volume 39, pages 181–192. Wiley Online Library, 2020.
- [17] S. Li, C. Zheng, R. Wang, Y. Huo, W. Zheng, H. Lin, and H. Bao. Multi-resolution terrain rendering using summed-area tables. Computers & Graphics, 95:130–140, 2021.
- [18] Q.-T. Luong and O. D. Faugeras. Self-calibration of a moving camera from point correspondences and fundamental matrices. International Journal of computer vision, 22(3):261–289, 1997.
- [19] H. Park, Y. Huo, and S.-E. Yoon. Meshchain: Secure 3d model and intellectual property management powered by blockchain technology. In Computer Graphics International Conference, pages 519–534. Springer, 2021.
- [20] M. Pollefeys, L. Van Gool, and A. Oosterlinck. Self-calibration with the modulus constraint. 1996.
- [21] C. Rother. A new approach to vanishing point detection in architectural environments. Image and Vision Computing, 20(9-10):647–655, 2002.
- [22] J. A. Shufelt. Performance evaluation and analysis of vanishing point detection techniques. IEEE transactions on pattern analysis and machine intelligence, 21(3):282–288, 1999.
- [23] R. Wang, Y. Huo, Y. Yuan, K. Zhou, W. Hua, and H. Bao. Gpu-based out-of-core many-lights rendering. ACM Transactions on Graphics (TOG), 32(6):1–10, 2013.
- [24] R. Wang, X. Yang, Y. Yuan, W. Chen, K. Bala, and H. Bao. Automatic shader simplification using surface signal approximation. ACM Transactions on Graphics (TOG), 33(6):1–11, 2014.
- [25] R. Wang, B. Yu, J. Marco, T. Hu, D. Gutierrez, and H. Bao. Real-time rendering on a power budget. ACM Transactions on Graphics (TOG), 35(4):1–11, 2016.
- [26] B. Xu, J. Zhang, R. Wang, K. Xu, Y.-L. Yang, C. Li, and R. Tang. Adversarial monte carlo denoising with conditioned auxiliary feature modulation. ACM Trans. Graph., 38(6):224–1, 2019.
- [27] Y. Zhang, R. Wang, Y. Huo, W. Hua, and H. Bao. Powernet: Learning-based real-time power-budget rendering. IEEE Transactions on Visualization and Computer Graphics, 2021.
- [28] Z. Zhang. Determining the epipolar geometry and its uncertainty: A review. International journal of computer vision, 27(2):161–195, 1998.
- [29] Y. Zhong, Y. Huo, and R. Wang. Morphological anti-aliasing method for boundary slope prediction. arXiv preprint arXiv:2203.03870, 2022.