
Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform

Zhijian Qiao, Haoming Huang, Chuhao Liu, Shaojie Shen, Fumin Zhang and Huan Yin Zhijian Qiao and Haoming Huang contributed equally to this work. Zhijian Qiao, Chuhao Liu, Shaojie Shen, Fumin Zhang, and Huan Yin are with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong SAR. Haoming Huang is with the Division of Emerging Interdisciplinary Areas, Hong Kong University of Science and Technology, Hong Kong SAR. This work was supported in part by the Hong Kong Center for Construction Robotics (InnoHK center supported by Hong Kong ITC), in part by the HKUST Postgraduate Studentship, and in part by the HKUST-DJI Joint Innovation Laboratory. (Corresponding author: Huan Yin)
Abstract

Construction data and robotic sensing data originate from disparate sources and are associated with distinct frames of reference. The primary objective of this study is to align LiDAR point clouds with building information modeling (BIM) using a global point cloud registration approach, aimed at establishing a shared understanding between the two modalities, i.e., “speak the same language”. To achieve this, we design a cross-modality registration method that spans from the front end to the back end. At the front end, we extract descriptors by identifying walls and capturing their intersected corners. For the back-end pose estimation, we then employ the Hough transform to estimate multiple pose candidates, and the final pose is verified by wall-pixel correlation. To evaluate the effectiveness of our method, we conducted real-world multi-session experiments in a large-scale university building, involving two different types of LiDAR sensors. We also report our findings and plan to open-source our collected dataset.

Note to Practitioners

In order to effectively deploy this method, it is important to consider certain hyper-parameters such as the size of the map and the height threshold. These parameters play a crucial role in achieving accurate alignment results and should be carefully tuned. In addition, it is worth noting that there are certain limitations to our current approach. Firstly, the environment should have a well-defined structural layout for the method to work effectively. Additionally, the current version of the method may have time constraints, and efforts should be made to reduce the computational time required for alignment. Future research and development should focus on refining the method to enhance its performance in these aspects.

Index Terms:
Point cloud registration, LiDAR, Building information modeling, Hough transform

I Introduction

Three-dimensional (3D) point cloud data provides accurate modeling of the indoor environment. Its applications extend across various phases of construction, encompassing typical uses such as geometry quality inspection and real-time visualization [1]. Traditionally, obtaining precise point clouds often relied on stationary laser scanners. In the last ten years, mobile light detection and ranging (LiDAR) mapping has emerged as a well-established technique within the robotics community [2], such as centimeter-level simultaneous localization and mapping (SLAM). This technique enables the fast acquisition of 3D point clouds using mobile robotic platforms, boosting 3D point cloud-based applications in the construction industry.

The utilization of 3D point clouds commonly involves the as-designed building data, e.g., visualizing elements on building information modeling (BIM) with augmented reality (AR) techniques or comparing LiDAR to BIM for quality management. Notably, a rigid transformation exists between the as-built source frame and the as-designed reference frame. Thus, frame alignment is indispensable for aligning the distinct sourced data within the same spatial space for subsequent analysis. In commercial software, users manually set the transformation to align the data, and manual tuning is necessary to ensure alignment quality. Regarding long-term robotic operations, external infrastructures are typically required to calibrate the different frames, like employing tags and markers in the building. However, these processes entail much human labor and additional costs, restricting the adoption of LiDAR point clouds in the construction industry, especially in large-scale environments.

In this study, we propose an automatic alignment approach for LiDAR point clouds and BIM using global point cloud registration, without the requirement of external infrastructure or manual verification. Compared to conventional point cloud registration tasks, we face three main challenges: 1) the modalities of LiDAR point clouds and BIM are naturally different, which introduces difficulties in front-end feature extraction and data association; 2) indoor environments commonly contain repetitive patterns, leading to potential ambiguity for pose estimation; 3) the deviation between the as-built and as-designed data makes data association hard to obtain. To address these challenges, our method encompasses both the front end and the back end. We employ lines as the fundamental representations of BIM and LiDAR data, enabling the generation of corner points and point-level correspondence estimation (data association). Subsequently, multiple pose candidates are obtained through voting under the Hough transform scheme. These pose candidates undergo verification using wall-pixel correlation to obtain the final pose estimation. We conduct experiments using real-world data from a large-scale university building. The dataset includes multi-session LiDAR point clouds collected from mobile mapping platforms with two different LiDAR sensor types.

Overall, this study focuses on registering LiDAR data to BIM globally, and we summarize the contributions as follows,

  • We design and develop a complete registration framework that aligns LiDAR point clouds with BIM automatically and globally.

  • A comprehensive descriptor is designed for cross-modality data association, which encodes the basic lines and corners in structural environments.

  • The pose estimation is achieved by Hough transform in parameter space and verified by wall-pixel correlation to guarantee robustness and accuracy.

  • We conduct experiments in the real world using two different LiDAR sensors in a large-scale university building. We plan to open-source the LiBIM-UST dataset to benefit the community.

The structure of this paper is organized as follows: Section II presents the related work on BIM-aided robotic mapping and global point cloud registration; Section III introduces the system overview. Section IV details the feature extraction and correspondence generation, followed by the pose Hough transform and voting in Section V. We validate the proposed method in the real world in Section VI. A conclusion is summarized in Section VII.

II Related Work

In this section, we first provide related works on robotic sensing and BIM. We then dive into the specific topic of point cloud registration, emphasizing the importance of point descriptors in this context.

II-A Robots Meet BIM

BIM serves as a digital representation of buildings and other physical assets in the architectural, engineering, and construction (AEC) industry. BIM generally encompasses comprehensive 3D spatial information, making it suitable for aligning robotic sensing data with the BIM model. This alignment process can effectively reveal deviations between the as-built and as-designed states, thereby benefiting the AEC industry. Conversely, BIM could also serve as a spatial map that facilitates robot localization and planning, eliminating the need for traditional mapping approaches. By leveraging the information provided by BIM, robots can navigate and operate within the environment without relying on explicit mapping procedures. This mapping-free characteristic of BIM enhances robot navigation capabilities. The subsequent paragraphs will delve into the relevant works conducted in these two aspects, namely aligning robot sensing data to BIM for deviation analysis and utilizing BIM as a spatial map for robot navigation.

Visual data is highly informative in nature. Han et al. [3] introduce a technique where vanishing lines and points are detected in images and subsequently projected onto the BIM for the purpose of progress monitoring. Similarly, Chen et al. [4] propose a method that enables the alignment of photogrammetric point clouds with BIM, thereby achieving precise camera localization, commonly referred to as the “align-to-locate” approach. It is worth noting that a single 2D image cannot inherently provide depth information, which restricts image-to-BIM deployment. While stereo vision can offer depth perception [5], its suitability for large-scale alignment is limited. In light of these challenges, our study proposes the utilization of LiDAR point clouds as the input, facilitating direct 3D modeling of the environment.

In the robotics community, it is recognized that for long-term robots operating in stable environments, the mapping process of SLAM can be redundant, as the generated map remains largely unchanged over time. As an alternative, utilizing Building Information Modeling (BIM) or low-level floorplans has emerged as a potential solution for long-term robot navigation. One notable approach is FloorPlanNet, proposed by Feng et al. [6], which segments rooms from point cloud data and aligns them with a floor plan. The core architecture of FloorPlanNet is based on a graph neural network for effective feature learning. Similarly, Zimmerman et al. [7] introduce a semantic visibility model integrated into a particle filter, enhancing long-term robot indoor localization. These works leverage floorplans as 2D maps, while BIM is a good choice for providing 3D point cloud maps. Yin et al. [8] generate a semantic-metric point cloud map from a BIM and design a semantic-aided iterative closest point (ICP) algorithm for robot localization. This method effectively localizes a robot equipped with a moving sensor within the BIM environment. While the aforementioned works focus on point-level operations, there is also a promising direction in building instance-level robot navigation. In the study by Shaheer et al. [9], a novel map called S-Graph is designed based on BIM. The S-Graph consists of three-layered hierarchical representations, and a particle filter-based approach is employed to support robot navigation at the instance level.

All the works above aim to align observed data to a floorplan-based or BIM-based map, in which point cloud registration is an indispensable function. This will be discussed in the following subsection.

II-B 3D Point Cloud Registration and Advances

Point cloud registration methods like iterative closest point (ICP) [10] and normal distributions transform (NDT) [11] focus on local point-level alignment given a good initial guess. Global point cloud registration aims at solving pose estimation from scratch [12]. Global registration methods can be categorized into two primary models: correspondence-free [13] and correspondence-based. This study follows the correspondence-based paradigm. We mainly focus on point descriptors that provide correspondences and on recent advances in this task.

Traditional 3D descriptors like Fast Point Feature Histogram (FPFH) [14] compute rotation-invariant features such as distances and angles between points and normals. Correspondences are then searched on a k-d tree constructed from the FPFH features. Although FPFH performs well in environments with rich geometric features, it degenerates in scenes with ambiguous structures. With the development of 3D representation learning such as PointNet [15] and KPConv [16], various learning-based methods have been proposed to encode and match point cloud features. Deep Closest Point (DCP) [17] encodes 3D point features and incorporates self-attention and cross-attention layers on the encoded features. Point correspondences can be searched by computing all-to-all feature similarities. GeoTransformer [18] utilizes KPConv to encode hierarchical 3D point features and uses optimal transport for correspondence estimation. Although DCP and GeoTransformer have achieved promising results on small-scale benchmarks [19, 20], they face two main challenges for LiDAR global registration with BIM. Firstly, these environments contain numerous similar 3D structures, especially surface planes, making it difficult to distinguish and match corresponding elements accurately. Secondly, the differences in modalities between BIM and LiDAR data create a large domain gap, leading to ambiguous encoding of the 3D point cloud by the 3D backbone. Additionally, learning-based methods incorporating multiple attention layers consume a large amount of memory, making them impractical for use in large-scale environments with current computing platforms.

Figure 1: Schematic of our proposed method. The process begins with accumulating LiDAR scans as submaps (Section IV-A1). Feature extraction from the query submap, including 3D wall segmentation, 2D line detection, and corner detection, results in lines and corners in a LiDAR submap (Section IV-B). Feature extraction from Building Information Modeling (BIM) involves decomposition into subBIMs and extraction of lines and corners in the BIM data (Section IV-A2). These features are then used to compute triangle descriptors, which are stored in hash tables for efficient matching (Section IV-C). The back-end involves a pose Hough transform and voting procedure to identify potential transformation candidates (Section V-A). Finally, wall-pixel correlation verification is introduced to determine the optimal transformation (Section V-B).

The 3D point descriptor can also be constructed along with semantic information. Given an RGB-D sequence as input, Kimera by Rosinol et al. [21] constructs a hierarchical scene graph in a large-scale indoor environment. Based on Kimera, Hydra [22] encodes the semantic objects and rooms as hierarchical features. Some related methods reconstruct a semantic graph and encode the graph topology using random walk descriptors. Outram by Yin et al. [23] proposes a triangulated 3D scene graph to search correspondences. BoxGraph by Pramatarov et al. [24] clusters the semantic point cloud into instances and matches each pair of instances while considering their shape similarity and semantic label. In [7], the floorplan-based method extracts the semantic objects in the 2D floorplan and incorporates the objects into the Monte Carlo Localization model. These semantic-aided approaches are hard to apply to our registration task because the semantic information differs significantly between a BIM model and LiDAR data, especially when a large deviation exists between the as-designed and as-built states at the semantic level. BIM mainly involves structural elements such as walls and rooms, while the methods above detect furniture and small-size objects in LiDAR or RGB-D maps. The inconsistent semantic descriptors pose serious challenges in matching them across modalities.

III System Overview

The primary objective of this paper is to achieve robust and global registration between Building Information Modeling (BIM) and Light Detection and Ranging (LiDAR) scans. The fundamental challenge arises from the heterogeneous nature of these modalities, which exhibit significant differences in structure, density, and information content. To address this, we have designed a pipeline that constructs correspondences using common features identified in the front end, and performs pose (transformation) estimation based on these correspondences in the back end.

The architecture of our proposed pipeline is illustrated in Figure 1. The process begins with data preprocessing, as outlined in Section IV-A, where LiDAR data are aggregated into submaps and the BIM model is decomposed into manageable subBIMs. This step is crucial to ensure consistency in density and size between both data sources. The subsequent extraction of common semantic features, specifically walls and corners, from these disparate data sources is discussed in Section IV-B. These features are then associated through the construction and matching of a novel triangle descriptor, as detailed in Section IV-C.

A transformation estimation technique using the Hough transform and voting is introduced in Section V-A, which utilizes corner-wise correspondences formed by the triangle descriptor based matching. This estimator generates multiple candidate transformations through a voting mechanism. The multi-candidate approach is designed to address the challenge of potential true transformations being mistakenly discarded due to repetitive corner patterns within the building structures. The correct transformation is subsequently selected using a verification scheme, which identifies the candidate that most accurately aligns the submap with the BIM, as detailed in Section V-B.

IV Front-end: Walls, Corners and Descriptors

In this section, we establish correspondences between BIM and LiDAR submaps by extracting common semantic features such as walls and corners. The proposed pipeline consists of three main steps, as illustrated in Figure 1. First, sequential LiDAR point clouds are aggregated into a submap \mathcal{S} to address the limited field of view issue, and the BIM is decomposed into several subBIMs \{\mathcal{B}\} (Section IV-A). Second, walls and corners are detected and extracted from \mathcal{S} and \{\mathcal{B}\} to build correspondences, facilitate pose estimation, and perform verification (Section IV-B). Finally, corners are used to construct triangle descriptor dictionaries for precise point-level data associations (Section IV-C).

IV-A LiDAR Submap and SubBIM

The density and size of a single LiDAR scan and a BIM model often differ significantly due to modality differences. For effective construction monitoring, denser LiDAR data are preferable to capture detailed geometric information. Conversely, to reduce the complexity of the problem, preprocessing of LiDAR scans and BIM is necessary to align them in size and density. This involves accumulating sequential LiDAR scans to form the LiDAR submap \mathcal{S}, and decomposing the BIM into subBIMs \{\mathcal{B}\}.

IV-A1 Building LiDAR submap

Given one LiDAR scan \mathcal{P}_{k} captured in the sensor frame, with k representing the timestamp, we denote \mathcal{P}_{1},\mathcal{P}_{2},\ldots,\mathcal{P}_{n} as a sequence of LiDAR scans with consecutive timestamps. A LiDAR submap is obtained by processing the sequential LiDAR scans:

\mathcal{S}=\Gamma\left(\bigcup_{k=1}^{n}\mathbf{T}_{k}\mathcal{P}_{k},\,r_{v}\right) \qquad (1)

where \bigcup denotes the union of the point sets from each point cloud and \mathbf{T}_{k} denotes the pose of the LiDAR data from the body frame to the world frame. \Gamma(\mathcal{S},r) is the voxelization downsampling function, where r represents the voxel size. The number of accumulated scans is n=\underset{k}{\arg\min}\left\{\|\text{trans}(\mathbf{T}_{k}^{-1}\mathbf{T}_{1})\|>d_{s}\right\}, where \text{trans}(\cdot) is the translation part of \mathbf{T} and d_{s} denotes the submap distance. Note that the transformations \mathbf{T}_{k} could be obtained with SLAM or similar techniques.
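To make Eq. (1) concrete, the following is a minimal sketch, assuming the poses \mathbf{T}_{k} come from a SLAM system as noted above; Open3D is used here only as a convenient voxelization tool and is our illustrative choice, not a dependency stated in the paper.

```python
# Minimal sketch of Eq. (1): accumulate posed LiDAR scans into a submap and
# voxel-downsample the union. Poses T_k (body -> world) are assumed to come
# from SLAM; Open3D is an illustrative choice for the voxel filter Gamma.
import numpy as np
import open3d as o3d

def build_submap(scans, poses, voxel_size=0.1, submap_distance=10.0):
    """scans: list of (N_k, 3) arrays in the sensor frame.
    poses: list of 4x4 homogeneous transforms T_k, same length as scans."""
    origin = poses[0][:3, 3]
    points = []
    for P_k, T_k in zip(scans, poses):
        # Transform the scan into the world frame: R_k p + t_k.
        points.append(P_k @ T_k[:3, :3].T + T_k[:3, 3])
        # Stop once ||trans(T_k^{-1} T_1)|| exceeds the submap distance d_s.
        if np.linalg.norm(T_k[:3, 3] - origin) > submap_distance:
            break
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(np.vstack(points)))
    return pcd.voxel_down_sample(voxel_size)  # Gamma(., r_v)
```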

Figure 2: The decomposition process of BIM involves several steps. Initially, wall segments are extracted directly from the BIM using specialized software. Following this, a sliding window technique is applied to generate subBIMs. Within each subBIM, crucial features such as lines and corners are identified by utilizing the wall segments.

IV-A2 BIM decomposition

BIM data provides a digital representation of building elements, facilitating the direct extraction of semantic features such as walls using specialized software such as Revit (via Rhino.Inside.Revit, https://www.rhino3d.com/inside/revit/1.0/). After extracting these features, BIM decomposition is employed to manage the complex structure efficiently. This involves using a sliding window technique that moves along the x- and y-axes of the BIM floor plan at fixed intervals, with a predefined overlap ratio between each generated subBIM. The decomposition process is depicted in Figure 2.
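A hypothetical sketch of this sliding-window decomposition is given below; the window size and overlap ratio are illustrative placeholders, and wall_segments stands for the 2D wall segments exported from the BIM.

```python
# Illustrative sketch of subBIM decomposition: a fixed-size window slides along
# the x- and y-axes of the BIM floor plan with a predefined overlap ratio, and
# the wall segments whose midpoints fall inside a window form one subBIM.
import numpy as np

def decompose_bim(wall_segments, window=40.0, overlap=0.5):
    """wall_segments: (M, 4) array of 2D segments [x1, y1, x2, y2] in metres.
    Returns a list of index arrays, one per non-empty subBIM."""
    pts = wall_segments.reshape(-1, 2)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    step = window * (1.0 - overlap)                        # stride between windows
    mid = 0.5 * (wall_segments[:, :2] + wall_segments[:, 2:])
    sub_bims = []
    for x0 in np.arange(x_min, x_max, step):
        for y0 in np.arange(y_min, y_max, step):
            inside = ((mid[:, 0] >= x0) & (mid[:, 0] < x0 + window) &
                      (mid[:, 1] >= y0) & (mid[:, 1] < y0 + window))
            if inside.any():
                sub_bims.append(np.flatnonzero(inside))
    return sub_bims
```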

IV-B From 3D Walls to 2D Corners

In typical indoor environments, various semantic elements such as floors, columns, and walls exist [8]. Among these elements, the wall is a fundamental component of modern architecture. In this study, we propose selecting walls as the shared representations for LiDAR and BIM data. However, accurately modeling walls from LiDAR data using traditional point cloud processing techniques can be time-consuming. Additionally, learning-based wall recognition methods often require additional data to support training. These factors pose challenges in achieving large-scale global registration, especially on resource-constrained platforms.

On the other hand, the inertial measurement unit (IMU) is a standard component of modern robotic sensing platforms. An IMU can provide the gravity direction of the mapping platform; hence, the 6-degree-of-freedom (DoF) registration problem can be reduced to 3-DoF in the case of a planar floor. Thus, our basic idea is to project the 3D walls to 2D lines and extract corner points as features, thereby generating descriptors for the downstream data association and 3-DoF pose estimation (x-y coordinates and the yaw angle).

We first elaborate on the feature extraction for the LiDAR submap, as illustrated in Figure 3. With the built LiDAR submap, we perform plane detection on a single query submap to extract walls and filter out irrelevant components, notably the ground. Then the 3D walls are converted into 2D corners for the following descriptor computation and transformation estimation. Specifically, the entire point cloud is divided into voxels with predetermined dimensions s_{v}. Each voxel contains a collection of points \mathbf{p}_{i}, where i=1,\ldots,N. Next, we calculate the covariance matrix \boldsymbol{\Sigma} for the points within each voxel:

\overline{\mathbf{p}}=\frac{1}{N}\sum_{i=1}^{N}\mathbf{p}_{i};\quad\boldsymbol{\Sigma}=\frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{p}_{i}-\overline{\mathbf{p}}\right)\left(\mathbf{p}_{i}-\overline{\mathbf{p}}\right)^{T} \qquad (2)

To determine the content of each voxel, we analyze the ratio of the smallest eigenvalue \lambda_{3} to the second smallest eigenvalue \lambda_{2} of the covariance matrix \boldsymbol{\Sigma}. If this ratio falls below a predefined threshold (i.e., the smallest eigenvalue is much smaller than the second smallest), the voxel is considered to belong to a plane. To further consolidate plane voxels into coherent plane segments, we employ a region-growing technique. The criteria for merging voxels into plane segments are described in detail in our previous work, G3Reg [25].
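A minimal sketch of this voxel-wise plane test is shown below, assuming points are bucketed by integer voxel indices; the region-growing consolidation of G3Reg [25] is omitted, and the threshold value is illustrative.

```python
# Sketch of the plane test in Eq. (2): bucket points into voxels of size s_v and
# label a voxel as planar when the smallest eigenvalue of its covariance is much
# smaller than the second smallest. Region growing (G3Reg) is omitted.
import numpy as np
from collections import defaultdict

def detect_plane_voxels(points, voxel_size=1.0, ratio_thresh=0.2, min_points=10):
    """points: (N, 3) array. Returns {voxel_index: centroid} for planar voxels."""
    buckets = defaultdict(list)
    for p in points:
        buckets[tuple(np.floor(p / voxel_size).astype(int))].append(p)
    planar = {}
    for key, pts in buckets.items():
        pts = np.asarray(pts)
        if len(pts) < min_points:
            continue
        cov = np.cov(pts.T, bias=True)             # Sigma in Eq. (2)
        lam = np.linalg.eigvalsh(cov)              # eigenvalues in ascending order
        if lam[0] / max(lam[1], 1e-9) < ratio_thresh:
            planar[key] = pts.mean(axis=0)
    return planar
```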

Once plane segmentation is completed, we filter out ground points and project the remaining vertical wall points onto the floor plane. 2D corners are obtained by applying the following procedure (a simplified code sketch follows the list):

  1. We project wall points onto a 2D image at pixel scale s_{\mathrm{I}} and apply a line segment detector to identify potential wall structures; we utilize the LSD algorithm [26] for this purpose.

  2. Detected line segments are then grouped based on their orientation, and segments whose lengths are below L_{\mathrm{min}} are discarded.

  3. Within each group, the classical DBSCAN algorithm [27] is employed, using the distances between line segment endpoints to form clusters that likely correspond to individual walls.

  4. We merge close and parallel line segments within the same cluster and refit them into a single line segment to accurately represent each wall.

  5. Finally, these line segments are extended by a predefined distance, and their intersections are computed to identify corner points. These corners are crucial in subsequent data association and pose estimation.
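The sketch below puts steps 1 and 5 together using OpenCV's LSD detector: wall points are rasterized at pixel scale s_{\mathrm{I}}, segments are detected, and corners are taken as intersections of extended, non-parallel segments. The grouping, DBSCAN clustering, and merging of steps 2-4 are omitted, parameter values and helper names are illustrative, and LSD availability depends on the OpenCV build.

```python
# Simplified corner extraction (steps 1 and 5): rasterize wall points, detect
# line segments with LSD, and intersect extended, non-parallel segments.
import cv2
import numpy as np

def extract_corners(wall_xy, pixel_scale=12, extend=2.0, min_angle_deg=30.0):
    """wall_xy: (N, 2) projected wall points in metres."""
    offset = wall_xy.min(axis=0)
    xy = (wall_xy - offset) * pixel_scale
    img = np.zeros(np.ceil(xy.max(axis=0)).astype(int)[::-1] + 1, np.uint8)
    img[xy[:, 1].astype(int), xy[:, 0].astype(int)] = 255
    lsd = cv2.createLineSegmentDetector()
    segments = lsd.detect(img)[0].reshape(-1, 4) / pixel_scale  # back to metres

    corners = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            p1, p2 = segments[i, :2], segments[i, 2:]
            q1, q2 = segments[j, :2], segments[j, 2:]
            d1, d2 = p2 - p1, q2 - q1
            cos = abs(d1 @ d2) / (np.linalg.norm(d1) * np.linalg.norm(d2))
            if cos > np.cos(np.radians(min_angle_deg)):
                continue  # nearly parallel segments: skip, no reliable corner
            cross = d1[0] * d2[1] - d1[1] * d2[0]
            t = ((q1 - p1)[0] * d2[1] - (q1 - p1)[1] * d2[0]) / cross
            # Keep the intersection only if it lies on segment i extended by `extend`.
            if -extend <= t * np.linalg.norm(d1) <= np.linalg.norm(d1) + extend:
                corners.append(p1 + t * d1)
    return np.array(corners) + offset if corners else np.empty((0, 2))
```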

Following a similar pipeline used for the LiDAR submap, we estimate corners of the subBIMs from extracted walls. This consistent approach across LiDAR submaps and subBIMs facilitates subsequent data association and back-end pose estimation.

Figure 3: The workflow for extracting triangle descriptors from LiDAR submaps begins with a dense LiDAR submap. The process involves plane segmentation to identify 3D walls. These segmented walls are then projected onto the 2D ground plane, allowing for the detection of lines. Subsequently, corner points are extracted from the detected lines through the intersection of adjacent lines. Finally, triangle descriptors are computed based on the corner triplets within the submap.
Figure 4: Triangular descriptor formulation, where each vertex in the corner triplet, denoted as A, B, and C, corresponds to a wall corner. The angles \alpha, \beta, and \gamma represent the angles between the triangle sides and the local wall segments.

IV-C Triangle Descriptors

Upon preprocessing the BIM and the LiDAR submap, we employ triangle descriptors to establish correspondences between them. This method is inspired by the STD introduced in [28]. We leverage both corners and wall segments to construct these descriptors.

Corners extracted from the LiDAR submap and subBIM are used to form a neighborhood graph, where vertices represent corners and edges connect two corners if they are within a maximum distance of L_{\mathrm{max}}. Cliques in this graph with three vertices are used to form triangle descriptors, denoted by \Delta. These triangles are formed by corner triplets (A,B,C), as illustrated in Figure 4. The attributes of a triangle descriptor \Delta include:

  • Side lengths \|AB\|, \|BC\|, \|AC\|: These are ordered such that \|AB\|\leq\|BC\|\leq\|AC\|, encoding the geometric shape of the triangle.

  • Angles \alpha, \beta, \gamma: These are defined as the smaller angle between a triangle side and the wall segments intersecting at its corner. Specifically, \alpha=\min(\angle(AB,l_{m}),\angle(AB,l_{n})), where A is the intersection of wall segments l_{m} and l_{n}. Similar definitions apply for \beta and \gamma, associated with sides BC and AC, respectively.

These sorted side lengths and angles capture both the geometric and structural details of the local environment, distinguishing this approach from the original STD descriptor [28]. As shown in Figure 5, querying each triangle in the submap can yield multiple responses in the subBIM due to repetitive architectural patterns. The inclusion of the angles \alpha, \beta, \gamma helps to differentiate triangles that have identical shapes but differ in their angular relationships with adjacent wall segments, thus significantly reducing incorrect correspondences.
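For concreteness, a hedged sketch of assembling one descriptor from a corner triplet follows. Assigning the wall pair at A, B, and C to the sides AB, BC, and AC, respectively, is our reading of the definition above, and the helper names are illustrative.

```python
# Sketch of one triangle descriptor: sorted side lengths plus the smaller angle
# between each side and the two wall segments meeting at its corner.
import numpy as np

def wall_angle(side_vec, wall_dirs):
    """Smaller angle (deg) between a side and the wall directions l_m, l_n."""
    angles = []
    for d in wall_dirs:
        cos = abs(side_vec @ d) / (np.linalg.norm(side_vec) * np.linalg.norm(d))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return min(angles)

def triangle_descriptor(A, B, C, walls_A, walls_B, walls_C):
    """A, B, C: 2D corners; walls_X: the two wall directions meeting at X.
    Returns the 6-D descriptor Delta = [|AB|, |BC|, |AC|, alpha, beta, gamma]."""
    sides = [(np.linalg.norm(B - A), A, B, walls_A),   # side AB, walls at A
             (np.linalg.norm(C - B), B, C, walls_B),   # side BC, walls at B
             (np.linalg.norm(C - A), A, C, walls_C)]   # side AC, walls at C
    sides.sort(key=lambda s: s[0])                     # |AB| <= |BC| <= |AC|
    lengths = [s[0] for s in sides]
    angles = [wall_angle(s[2] - s[1], s[3]) for s in sides]
    return np.array(lengths + angles)                  # Delta in R^6
```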

For efficient data management and retrieval, we construct a hash table for both the LiDAR submap and the subBIMs. The keys are formed using the triangle descriptor \Delta=[\|AB\|,\|BC\|,\|AC\|,\alpha,\beta,\gamma]\in\mathbf{R}^{6} and are quantized based on the side length resolution r_{\mathrm{s}} and angle resolution r_{\mathrm{a}}, as detailed in Table II and analyzed in Section VI-C1. Multiple corner triplets that yield the same hash key are stored together:

\Delta\mathcal{S}=\{\Delta_{i}^{s}=\{(A,B,C)_{j}\}_{j=1}^{K_{i}^{s}}\}_{i=1}^{N^{s}},\quad\Delta\mathcal{B}=\{\Delta_{i}^{b}=\{(A,B,C)_{j}\}_{j=1}^{K_{i}^{b}}\}_{i=1}^{N^{b}},

where N^{s} and N^{b} denote the number of triangle descriptors, and K^{s} and K^{b} represent the number of corner triplets for each descriptor in the submap and subBIM, respectively.

During the subBIM retrieval stage, hash keys from the submap are used to vote for potentially matched subBIMs. The retrieval score for a subBIM is determined by the number of key intersections with the submap's keys. This process is expedited by the O(1) query complexity of the hash table. We then retain all subBIMs with a nonzero number of votes as potential matching candidates, given the presence of numerous repetitive patterns that generate significant ambiguity in indoor scenes. For each subBIM candidate, the matched triangle descriptors can naturally be transformed into a set of corner triplet correspondences by exhaustively matching all corner triplets between the two descriptors. Although the triplet-based correspondences are extremely numerous and have a high outlier ratio, they are crucial for the transformation estimation process, which will be discussed in the next section.
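The following is a minimal sketch of the hash-key quantization and vote-based subBIM retrieval described above, assuming the Table II resolutions; the data layout and names are illustrative rather than the released implementation.

```python
# Sketch of descriptor hashing and subBIM retrieval: descriptors are quantized
# into hash keys, triplets sharing a key are stored together, and subBIMs are
# scored by the number of keys they share with the query submap.
import numpy as np
from collections import defaultdict

R_S, R_A = 1.25, 3.8   # side length / angle resolutions (Table II)

def hash_key(delta):
    """delta: 6-D descriptor [|AB|, |BC|, |AC|, alpha, beta, gamma]."""
    res = np.array([R_S] * 3 + [R_A] * 3)
    return tuple(np.round(np.asarray(delta) / res).astype(int))

def build_table(descriptors):
    """descriptors: iterable of (delta, corner_triplet). Returns key -> triplets."""
    table = defaultdict(list)
    for delta, triplet in descriptors:
        table[hash_key(delta)].append(triplet)
    return table

def retrieve_subbims(submap_table, subbim_tables):
    """Keep every subBIM sharing at least one key with the submap (nonzero votes)."""
    votes = {i: len(submap_table.keys() & t.keys())
             for i, t in enumerate(subbim_tables)}
    return [i for i, v in votes.items() if v > 0]
```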

Figure 5: The figure demonstrates the effectiveness of triangle descriptors in matching results, with and without angle inclusion. On the left, a highlighted corner triplet from the LiDAR submap is shown in red, which has multiple matches from the subBIM using descriptor-based matching. Incorporating angle information significantly enhances the discriminative power of the triangle descriptor by encoding the local geometric structure around the triangle, which markedly reduces the false positives.

V Back-end: Transformation Voting and Verification

This section outlines the methodology for deriving pose transformation candidates from subBIM candidates, utilizing a combination of the Hough transform and a voting mechanism. This process is particularly critical given the high ratio of outliers among corner triplet correspondences (Section V-A). Following this, we detail a verification scheme designed to identify the optimal transformation for aligning each submap with its corresponding BIM region, ensuring reliable registration (Section V-B).

V-A Pose Hough Transformation and Voting

The Hough transform is a robust technique in computer vision for converting raw measurements into a parameter space, where optimal parameters are identified through a voting mechanism to effectively reject outliers. Our implementation includes two primary steps:

Parameter Space Construction: The parameter space, based on the \operatorname{SE}(2) Lie group, includes two degrees of freedom (DoF) for translation and one for yaw. This space is rasterized into voxels at resolutions specified by r_{\mathrm{xy}} for x-y translation and r_{\mathrm{yaw}} for yaw, as detailed in Table II and discussed in the ablation study (Section VI-C2).

Voting by Corner Triplet Correspondences: For each corner triplet correspondence [(A,B,C)_{s},(A,B,C)_{b}], the transformation parameters (x,y,\mathrm{yaw}) are computed using a closed-form solution for 2D registration [29]. A line segment overlap-based filter is applied to ensure that line segments intersecting at a submap corner overlap with those at the corresponding BIM corner. Only triplet correspondences that pass this filter contribute to the voting process.

For each subBIM, multiple corner triplet correspondences \{(\Delta_{i}^{s},\Delta_{i}^{b})\}_{i=1}^{N_{c}} are obtained through triangle descriptor-based matching. Each corner triplet from \Delta^{s} is matched against every triplet in \Delta^{b}, forming a minimum vote set in which each voxel in the parameter space receives at most one vote. All minimum vote sets from the N_{c} descriptor correspondences are then combined to form the vote set for one subBIM candidate. In cases where voxels receive votes from multiple subBIM candidates, only the voxel with the maximum votes is retained. The final step involves grouping all pose voxels by their vote counts and selecting the top J groups with the highest votes. The average of the poses voted into the same voxel represents its transformation candidate.
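The sketch below condenses this procedure: each corner triplet correspondence yields a closed-form SE(2) estimate in the spirit of [29], which is quantized at resolutions (r_{\mathrm{xy}}, r_{\mathrm{yaw}}) and accumulated. The line-overlap filter and per-subBIM bookkeeping are omitted, grouping by vote count is simplified to taking the top-J cells, and the yaw resolution is assumed to be in degrees.

```python
# Sketch of the pose Hough transform: per-correspondence closed-form SE(2)
# estimation (SVD on three 2D point pairs), quantization into (x, y, yaw)
# voxels, and selection of the best-supported cells.
import numpy as np
from collections import defaultdict

R_XY, R_YAW = 2.5, 5.0   # metres and (assumed) degrees, following Table II

def solve_se2(src, dst):
    """Closed-form 2D registration of a 3x2 source triplet onto a 3x2 target."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # enforce det(R) = +1
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return t[0], t[1], np.degrees(np.arctan2(R[1, 0], R[0, 0]))

def hough_vote(correspondences, top_j=15):
    """correspondences: list of (submap_triplet, bim_triplet) 3x2 arrays.
    Returns up to top_j averaged (x, y, yaw) candidates."""
    acc = defaultdict(list)
    for src, dst in correspondences:
        x, y, yaw = solve_se2(np.asarray(src, float), np.asarray(dst, float))
        cell = (int(np.floor(x / R_XY)), int(np.floor(y / R_XY)),
                int(np.floor(yaw / R_YAW)))
        acc[cell].append((x, y, yaw))
    best_cells = sorted(acc.values(), key=len, reverse=True)[:top_j]
    return [np.mean(poses, axis=0) for poses in best_cells]  # average per voxel
```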

The computational complexity of this stage is dominated by the Hough transform step, which is O(n), where n is the number of corner triplet correspondences. This is efficient compared with alternative transformation estimation methods such as RANSAC and TEASER++ (Section VI-C3).

V-B Transformation Verification

Given transformation candidates for the J groups, our objective is to develop a verification scheme to determine the optimal transformation. While corners serve as a common reference between the submap and subBIMs in previous sections, they do not fully capture the geometric structure, leading to potential inaccuracies in alignment determination. To address this, we leverage walls, which encompass the majority of the common geometric structure between the submap and subBIMs, to verify the transformation candidates. The verification process consists of two components: wall preprocessing and similarity score computation.

In the wall preprocessing step, we project the wall point cloud data from both the subBIM and the transformed submap into images I_{\mathcal{B}} and I_{\mathcal{S}} at a pixel scale s_{\mathrm{I}}. However, directly comparing these images might not accurately reflect the true alignment quality for two primary reasons. First, the actual construction conditions (as-built data) may differ from the BIM designs (as-designed data). To mitigate this discrepancy, we apply a dilation to the wall pixels in the submap image by a radius of 1 meter. Second, as the submap only partially overlaps with the subBIM, we define a region of interest (ROI) within the subBIM image to focus the assessment on corresponding areas where the submap image exhibits wall pixels.

The similarity between the two images is quantified using the normalized cross correlation coefficient (NCC), a standard measure for image similarity:

\mathtt{score}(I_{S},I_{B})=\frac{\sum_{u,v}\left(I_{S}(u,v)\cdot I_{B}(u,v)\right)}{\sqrt{\sum_{u,v}I_{S}(u,v)^{2}}\cdot\sqrt{\sum_{u,v}I_{B}(u,v)^{2}}} \qquad (3)

where (u,v) are the pixel coordinates within the ROI.
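A minimal sketch of this score is given below, assuming binary wall images rasterized on a common grid; OpenCV is used for the dilation, and taking the ROI as the dilated submap wall mask is our reading of the description above.

```python
# Sketch of the wall-pixel NCC in Eq. (3): dilate the submap walls to tolerate
# as-built deviations, restrict the comparison to the ROI, and correlate.
import cv2
import numpy as np

def ncc_score(submap_img, bim_img, dilate_m=1.0, pixel_scale=12):
    """submap_img, bim_img: uint8 binary wall images on the same grid."""
    k = 2 * int(dilate_m * pixel_scale) + 1
    dilated = cv2.dilate(submap_img, np.ones((k, k), np.uint8))
    roi = dilated > 0                                    # where the submap has walls
    s = (dilated[roi] > 0).astype(np.float64)
    b = (bim_img[roi] > 0).astype(np.float64)
    denom = np.sqrt((s ** 2).sum()) * np.sqrt((b ** 2).sum())
    return float((s * b).sum() / denom) if denom > 0 else 0.0
```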

A typical case study illustrating this verification process is presented in Figure 6. Despite generating multiple transformation candidates through a Hough transform and voting procedure, the wall-based transformation verification effectively resolves ambiguities from the corner correspondences by selecting the transformation that best aligns the submap with the subBIM.

(a) More votes with a lower similarity score
(b) Fewer votes with a higher similarity score
Figure 6: This figure demonstrates the voting and verification process for a submap and subBIM pair, where (a) and (b) show the number of votes and similarity scores obtained by different transformation candidates. The designed verification scheme can select the true transformation from multiple candidates based on the designed similarity score.

VI Experiments

This section is structured as follows: In Section VI-A, we introduce the dataset, evaluation metrics, and implementation details of our proposed method, as well as those of the methods against which we compare. Section VI-B presents comparative experiments that assess the registration recall of our method relative to other advanced methods. In Section VI-C, we explore how different parameters affect the performance of our proposed algorithm through an ablation study. Section VI-D provides visualizations of our algorithm applied in real-world scenarios and discusses cases where the algorithm failed.

Figure 7: The handheld sensor suite: a Livox Mid-360 LiDAR scanner, fisheye cameras for wide-angle imaging, and a built-in IMU.

VI-A Experimental Setup

VI-A1 LiBIM-UST Dataset

TABLE I: LiBIM-UST Dataset Information
Sensor Type | Sequence | Total Distance (m) | Submap Distance (m) | Number of Submaps
Ouster-128 | Building Day | 402.14 | 10 | 36
Mid-360 | 2f-office | 533.78 | 15 | 40
Mid-360 | 2f-office | 533.78 | 30 | 15
We developed the LiBIM-UST dataset using the publicly available FusionPortable dataset [30] and additional data collected by our team for evaluation purposes. This dataset features the BIM of the main academic building at the Hong Kong University of Science and Technology (HKUST). It includes 91 LiDAR submaps and the ground truth pose transformations between them, as detailed in Table I.

The “Building Day” sequence, part of the FusionPortable dataset [30], was captured in the lobby of the academic building using a mechanically rotating Ouster OS1-128 LiDAR. The 2f-office sequences were collected by our team in office corridors using a solid-state Livox Mid-360 LiDAR. The sensor configuration for the self-collected data is depicted in Figure 7.

As described in Section IV-A1, LiDAR scans are accumulated over a period based on the submap distance d_{s} to generate submaps. Given that different sequences have varying fields of view, we set distinct d_{s} values for each, as indicated in Table I. To ensure adequate differentiation between submaps, their overlap ratio is limited to approximately 50%.

To determine the ground truth of the pose transformations from the submaps to the BIM, we initially align them manually and subsequently refine the alignments using the ICP algorithm.

TABLE II: Parameters of Our Method
Module | Parameter | Value
Line Detection | Pixel scale s_{\mathrm{I}} | 12 pixels / m
Line Detection | Minimum length L_{\mathrm{min}} | 16 pixels
Triangle Descriptor | Side length resolution r_{\mathrm{s}} | 1.25 m
Triangle Descriptor | Angle resolution r_{\mathrm{a}} | 3.8
Triangle Descriptor | Maximum side length L_{\mathrm{max}} | 37.5 m
Hough Transform | x-y resolution r_{\mathrm{xy}} | 2.5 m
Hough Transform | Yaw resolution r_{\mathrm{yaw}} | 5.0
TABLE III: Global Registration Evaluation on LiBIM-UST Dataset (Registration Recall (%) / Average Time (s))
Category | Front-end | Back-end | Building Day (d_{s}=10 m) | 2f-office (d_{s}=15 m) | 2f-office (d_{s}=30 m)
Image Matching-based | SIFT [31] | RANSAC | 0.00 / 3.31 | 0.00 / 2.95 | 0.00 / 3.30
Image Matching-based | Superpoint [32] + LightGlue [33] | RANSAC | 0.00 / 2.24 | 0.00 / 2.31 | 0.00 / 2.33
Point Cloud-based | FPFH [34] | TEASER++ [35] | 0.00 / 20.89 | 0.00 / 9.71 | 0.00 / 17.07
Point Cloud-based | Voxel Map | 3D-BBS [36] | 58.33 / 105.73 | 52.50 / 748.63 | 86.67 / 1122.47
Ours | Triangle Descriptor | Hough Voting | 91.67 / 25.44 | 45.00 / 36.38 | 93.33 / 158.16

VI-A2 Evaluation Metric

To evaluate the effectiveness and efficiency of the registration algorithms, we use registration recall and average time consumption, respectively. The success of a registration, or a true positive, is determined by comparing the estimated transformation (\mathbf{R}_{est},\mathbf{t}_{est}) to the ground truth (\mathbf{R}_{gt},\mathbf{t}_{gt}). A registration is considered successful if:

\arccos\left(\frac{\operatorname{tr}(\mathbf{R}_{est}^{T}\mathbf{R}_{gt})-1}{2}\right)<5^{\circ},\quad\|\mathbf{R}_{est}^{T}(\mathbf{t}_{gt}-\mathbf{t}_{est})\|_{2}<3\,\text{m} \qquad (4)

These thresholds (5^{\circ}, 3\,\text{m}) are chosen based on common practice, allowing successful local refinement methods like ICP and NDT to follow.
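For reference, a small sketch of checking the criterion in Eq. (4) is given below, using the thresholds stated above.

```python
# Check whether an estimated (R, t) is a true positive under Eq. (4).
import numpy as np

def is_successful(R_est, t_est, R_gt, t_gt, rot_deg=5.0, trans_m=3.0):
    cos_err = np.clip((np.trace(R_est.T @ R_gt) - 1.0) / 2.0, -1.0, 1.0)
    rot_err = np.degrees(np.arccos(cos_err))               # rotation error (degrees)
    trans_err = np.linalg.norm(R_est.T @ (t_gt - t_est))   # translation error (metres)
    return rot_err < rot_deg and trans_err < trans_m
```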

VI-A3 Comparisons and Implementation

Given that LiDAR to BIM global registration is a novel task in cross-modal registration with limited open-source work and benchmarks, we selected representative methods from both image and point cloud registration domains for comparison.

Image Registration: For image registration, intermediate images from the line extraction process (Section IV-B) are used. We implemented SIFT [31] and a combination of Superpoint [32] with LightGlue [33], representing traditional and deep learning-based methods, respectively. RANSAC, with up to 4,000,000 iterations, was used for pose estimation.

Point Cloud Registration: We employed two common global point cloud registration methods: FPFH [34] combined with TEASER++ [35], and 3D-BBS [36], a branch-and-bound based method. For FPFH, we set the normal search radius to 1 m and the feature search radius to 2.5 m. For TEASER++, the noise bound is set at 0.4 m. For 3D-BBS, the parameter min_level_res is set to 1.0 m for the lobby and 0.5 m for the office corridors. Additionally, recent works STD [28] and Outram [23], which share a similar correspondence generation module with our approach but differ in transformation solvers, are also compared. Their key point extraction and descriptor construction methods were not directly applicable; hence, we integrated only their back ends into our framework and evaluated them in the ablation experiments (Section VI-C3).

Our Method: Detailed parameters of our method are listed in Table II. The decomposition of BIM and the Hash table configuration for each subBIM were predetermined. For simplicity and to expedite the verification process, our prototype was implemented in Python without multi-threading, affecting runtime efficiency. The experiments were conducted on an Intel i7-10700 CPU and an Nvidia RTX3080 graphics card. Future versions will be reimplemented using C++ with multi-threading to meet real-time performance and practical application requirements.

VI-B Performance on Registration Recall

The registration recall results on the LiBIM-UST dataset are summarized in Table III. This table compares the performance of image registration, point cloud registration, and our method.

Methods relying on local descriptors, such as SIFT [31], Superpoint [32], and FPFH [34], all recorded a 0% registration recall. Despite attempts to optimize these methods by adjusting descriptor parameters and increasing the number of correspondences (typical counts were 400 for SIFT and Superpoint, and 4000 for FPFH), none were successful. The failure of these methods can be primarily attributed to two factors:

  • Inherent Limitations of Local Descriptors: The LiDAR submap and BIM share only basic geometric structures such as walls and corners, which often exhibit repetitive patterns (like single planes or simple intersections). This leads to local descriptors that are not discriminative enough for effective registration.

  • Mismatch with BIM’s Geometric Information: For image descriptors, the lack of rich gradient information due to uniform heights of indoor floors and ceilings poses a challenge. Point cloud descriptors like FPFH struggle because they require a diverse distribution of normal vectors, which is not typical in the mostly parallel or perpendicular planes found in BIM environments.

The 3D-BBS method [36] showed potential for LiDAR to BIM registration by searching through transformation space to align submaps closely with BIM. However, it does not account for discrepancies between as-designed and as-built conditions, and the extensive size of BIM compared to submaps makes the search process for 3D-BBS quite time-consuming.

Our method outperforms the aforementioned techniques by leveraging only the geometric features common to both the LiDAR submap and BIM, such as corners and walls. This approach not only reduces computational overhead but also minimizes interference from non-shared structures. By employing a Hough transform coupled with a voting mechanism, our method ensures that the pool of transformation candidates maximally includes potential true transformations, thereby enhancing the registration recall. Additionally, unlike the exhaustive search in 3D-BBS, our use of triangle descriptor-based matching significantly narrows the search scope. We also observed that a larger submap size (d_{s}) tends to increase recall, indicating its benefit in preventing registration degeneration, particularly in corridor settings.

TABLE IV: Ablation Study on LiBIM-UST Dataset (Registration Recall (%) / Average Time (s))
Variable | Value | Building Day | 2f-office
Triangle Descriptor: side length resolution r_{\mathrm{s}} | 0.5 m | 83.33 / 15.96 | 73.33 / 30.17
Triangle Descriptor: side length resolution r_{\mathrm{s}} | 1.25 m | 91.67 / 25.44 | 93.33 / 158.16
Triangle Descriptor: side length resolution r_{\mathrm{s}} | 2.0 m | 86.11 / 39.16 | 73.33 / 236.20
Triangle Descriptor: angle resolution r_{\mathrm{a}} | 2 | 88.89 / 10.34 | 80.00 / 57.47
Triangle Descriptor: angle resolution r_{\mathrm{a}} | 3.8 | 91.67 / 25.44 | 93.33 / 158.16
Triangle Descriptor: angle resolution r_{\mathrm{a}} | 4.5 | 88.89 / 25.58 | 80.00 / 145.08
Hough Transform: x-y resolution r_{\mathrm{xy}} | 1.5 m | 91.67 / 28.95 | 86.67 / 161.41
Hough Transform: x-y resolution r_{\mathrm{xy}} | 2.5 m | 91.67 / 25.44 | 93.33 / 158.16
Hough Transform: x-y resolution r_{\mathrm{xy}} | 4.0 m | 91.67 / 18.05 | 60.00 / 121.85
Hough Transform: yaw resolution r_{\mathrm{yaw}} | 3 | 91.67 / 28.59 | 93.33 / 159.80
Hough Transform: yaw resolution r_{\mathrm{yaw}} | 5 | 91.67 / 25.44 | 93.33 / 158.16
Hough Transform: yaw resolution r_{\mathrm{yaw}} | 7 | 91.67 / 22.61 | 86.67 / 124.59
Top J in Voting | 1 | 86.11 / 14.40 | 46.67 / 156.55
Top J in Voting | 5 | 91.67 / 19.01 | 66.67 / 156.84
Top J in Voting | 10 | 91.67 / 23.75 | 80.00 / 159.82
Top J in Voting | 15 | 91.67 / 25.44 | 93.33 / 158.16
Top J in Voting | 20 | 91.67 / 27.92 | 93.33 / 160.54
Transformation Estimator | RANSAC | 38.89 / 24.69 | 20.00 / 153.42
Transformation Estimator | TEASER++ | 77.78 / NA | 26.67 / NA
Transformation Estimator | Hough Voting | 91.67 / 25.44 | 93.33 / 158.16
Figure 8: The registration recall and average time with varying parameter J in the voting process.

VI-C Ablation Study

In this section, we explore the effects of several key parameters on the overall performance of our registration system. These parameters include the side length resolution r_{\mathrm{s}} and angle resolution r_{\mathrm{a}} for triangle descriptors, the x-y resolution r_{\mathrm{xy}}, yaw resolution r_{\mathrm{yaw}}, and the parameter J in the voting and verification process, as well as different back-end systems for pose estimation. We set d_{s}=10\,\text{m} for the Building Day sequence and d_{s}=30\,\text{m} for the office sequences by default.

VI-C1 Triangle descriptor

The side length resolution r_{\mathrm{s}} and angle resolution r_{\mathrm{a}} are crucial in addressing the discrepancies between corners derived from submaps and those from BIM. These discrepancies typically arise from LiDAR odometry drift, differences between the as-designed BIM and the as-built environment, and errors in the corner extraction algorithm. As shown in Figure 9 and Table IV, both excessively high and low resolutions detrimentally affect the registration recall. Excessively fine resolutions prevent correct corner triplet correspondences from being grouped under the same triangle descriptor, leading to increased false negatives. On the other hand, excessively coarse resolutions may group incorrect correspondences together, increasing false positives and introducing incorrect transformation candidates during the voting process. This not only complicates the transformation verification process but also significantly increases computation time due to the higher number of corner triplets sharing the same descriptor, especially with an all-to-all matching strategy (see Section IV-C).

VI-C2 Transformation voting and verification

Table IV and Figure 8 illustrate how r_{\mathrm{xy}}, r_{\mathrm{yaw}}, and J influence registration recall and computation time. Incorrect corner extraction introduces noise in the parameter space, complicating the voting process. Too fine a resolution scatters votes, preventing them from aggregating around the true transformation, while too coarse a resolution reduces the precision of the estimated transformations and may lead to false positives in transformation verification. Adjusting J affects both registration success and processing time; a smaller J might overlook true transformations, particularly in structurally repetitive environments like office corridors, whereas a larger J enhances recall at the cost of increased computation.

VI-C3 Back-end

As outlined in Section VI-A3, our framework incorporates the back-end algorithms, RANSAC and TEASER++, commonly employed in global localization methods such as STD [28] and Outram [23]. Specifically, for each subBIM candidate, we apply RANSAC and TEASER++ to generate a transformation candidate from its corner-wise correspondences. Then, we utilize the proposed wall-based verification to find the optimal candidate.

Table IV presents the registration results following the integration of RANSAC and TEASER++ into our system. Given that each subBIM candidate produces only a single transformation candidate, the likelihood of overlooking the true transformation is heightened. Furthermore, with a high outlier ratio, RANSAC needs exponential time complexity to find the true estimation. TEASER++ trades off spatial complexity for speed with an O(n^{2}) spatial complexity, and constructing the compatibility graph also demands a time complexity of O(n^{2}), where n is the number of corner-wise correspondences. This complexity has occasionally led to memory overflow during execution, causing termination of the program on some submaps. Consequently, in such instances, the runtime for TEASER++ is not recorded and is denoted as NA.

Figure 9: The registration recall and average time with varying side length resolution r_{\mathrm{s}} and angle resolution r_{\mathrm{a}} in the construction of triangle descriptors.
Figure 10: Global registration results using our method. Unsuccessful cases are illustrated in Failure-1 and Failure-2. Failure-1: multiple highly similar patterns exist. Failure-2: the scenario is geometrically uninformative, like a long and narrow corridor, where only parallel lines can be extracted in the submap, as shown in the dashed box.

VI-D Case Studies

This subsection highlights specific scenarios that demonstrate the limitations of our proposed method. As depicted in Figure 10, our method struggles in environments characterized by repetitive architectural patterns which lack distinctive corner triplets. Additionally, in scenarios where the mobile platform navigates through narrow and elongated corridors, the observable effective corner points are markedly reduced. This scarcity of corner points often leads to what we term as ‘degeneration issues,’ where the lack of sufficient distinctive features compromises the algorithm’s performance.

Figure 10 also provides further visualizations of the registration outcomes across different scenarios, supplemented with visual images captured by fisheye cameras mounted on the mobile platform. These visual aids illustrate the practical challenges faced in these specific architectural settings and help in understanding the limitations of the current system in managing complex environments.

VII Conclusion

In this study, we design a global registration method that aligns LiDAR data to BIM without an initial guess. Lines and corners are extracted from BIM and LiDAR data as shared features. Then, triangle descriptors are proposed, and correspondences are formed using hash tables. In the back end, we incorporate the Hough transform to estimate multiple transformation candidates. Lastly, we verify the consistency of each candidate and select the optimal transformation as the final pose estimation. The method is evaluated on multi-session LiDAR data collected in a real-world university building, released as the LiBIM-UST dataset.

Acknowledgement

We sincerely thank Prof. Jack Chin Pang Cheng, a Professor at the Department of Civil and Environmental Engineering, Hong Kong University of Science and Technology, for kindly providing the BIM of the HKUST Academic Building.

References

  • [1] Q. Wang and M.-K. Kim, “Applications of 3d point cloud data in the construction industry: A fifteen-year review from 2004 to 2018,” Advanced engineering informatics, vol. 39, pp. 306–319, 2019.
  • [2] W. Xu, Y. Cai, D. He, J. Lin, and F. Zhang, “Fast-lio2: Fast direct lidar-inertial odometry,” IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2053–2073, 2022.
  • [3] K. Asadi, H. Ramshankar, M. Noghabaei, and K. Han, “Real-time image localization and registration with bim using perspective alignment for indoor monitoring of construction,” Journal of Computing in civil Engineering, vol. 33, no. 5, p. 04019031, 2019.
  • [4] J. Chen, S. Li, and W. Lu, “Align to locate: Registering photogrammetric point clouds to bim for robust indoor localization,” Building and Environment, vol. 209, p. 108675, 2022.
  • [5] R. Hartley and A. Zisserman, Multiple view geometry in computer vision.   Cambridge university press, 2003.
  • [6] D. Feng, Z. He, J. Hou, S. Schwertfeger, and L. Zhang, “Floorplannet: Learning topometric floorplan matching for robot localization,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 6168–6174.
  • [7] N. Zimmerman, M. Sodano, E. Marks, J. Behley, and C. Stachniss, “Constructing metric-semantic maps using floor plan priors for long-term indoor localization,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2023, pp. 1366–1372.
  • [8] H. Yin, Z. Lin, and J. K. Yeoh, “Semantic localization on bim-generated maps using a 3d lidar sensor,” Automation in Construction, vol. 146, p. 104641, 2023.
  • [9] M. Shaheer, H. Bavle, J. L. Sanchez-Lopez, and H. Voos, “Robot localization using situational graphs and building architectural plans,” arXiv preprint arXiv:2209.11575, 2022.
  • [10] P. J. Besl and N. D. McKay, “Method for registration of 3-d shapes,” in Sensor fusion IV: control paradigms and data structures, vol. 1611.   Spie, 1992, pp. 586–606.
  • [11] M. Magnusson, A. Lilienthal, and T. Duckett, “Scan registration for autonomous mining vehicles using 3d-ndt,” Journal of Field Robotics, vol. 24, no. 10, pp. 803–827, 2007.
  • [12] H. Yin, X. Xu, S. Lu, X. Chen, R. Xiong, S. Shen, C. Stachniss, and Y. Wang, “A survey on global lidar localization: Challenges, advances and open problems,” International Journal of Computer Vision, pp. 1–33, 2024.
  • [13] L. Bernreiter, L. Ott, J. Nieto, R. Siegwart, and C. Cadena, “Phaser: A robust and correspondence-free global pointcloud registration,” IEEE Robotics and Automation Letters(RA-L), vol. 6, no. 2, pp. 855–862, 2021.
  • [14] R. B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (fpfh) for 3d registration,” in 2009 IEEE international conference on robotics and automation.   IEEE, 2009, pp. 3212–3217.
  • [15] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition(CVPR), 2017, pp. 652–660.
  • [16] H. Thomas, C. R. Qi, J.-E. Deschaud, B. Marcotegui, F. Goulette, and L. J. Guibas, “Kpconv: Flexible and deformable convolution for point clouds,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 6411–6420.
  • [17] Y. Wang and J. M. Solomon, “Deep closest point: Learning representations for point cloud registration,” in Proceedings of the IEEE/CVF international conference on computer vision(ICCV), 2019, pp. 3523–3532.
  • [18] Z. Qin, H. Yu, C. Wang, Y. Guo, Y. Peng, and K. Xu, “Geometric transformer for fast and robust point cloud registration,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition(CVPR), 2022, pp. 11 143–11 152.
  • [19] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes,” in Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2015, pp. 1912–1920.
  • [20] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, and T. Funkhouser, “3dmatch: Learning local geometric descriptors from rgb-d reconstructions,” in Proceedings of the IEEE conference on computer vision and pattern recognition(CVPR), 2017, pp. 1802–1811.
  • [21] A. Rosinol, A. Violette, M. Abate, N. Hughes, Y. Chang, J. Shi, A. Gupta, and L. Carlone, “Kimera: From slam to spatial perception with 3d dynamic scene graphs,” The International Journal of Robotics Research, vol. 40, no. 12-14, pp. 1510–1546, 2021.
  • [22] N. Hughes, Y. Chang, and L. Carlone, “Hydra: A real-time spatial perception system for 3d scene graph construction and optimization,” arXiv preprint arXiv:2201.13360, 2022.
  • [23] P. Yin, H. Cao, T.-M. Nguyen, S. Yuan, S. Zhang, K. Liu, and L. Xie, “Outram: One-shot global localization via triangulated scene graph and global outlier pruning,” arXiv preprint arXiv:2309.08914, 2023.
  • [24] G. Pramatarov, D. De Martini, M. Gadd, and P. Newman, “Boxgraph: Semantic place recognition and pose estimation from 3d lidar,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2022, pp. 7004–7011.
  • [25] Z. Qiao, Z. Yu, B. Jiang, H. Yin, and S. Shen, “G3reg: Pyramid graph-based global registration using gaussian ellipsoid model,” arXiv preprint arXiv:2308.11573, 2023.
  • [26] R. G. Von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, “Lsd: A line segment detector,” Image Processing On Line, vol. 2, pp. 35–55, 2012.
  • [27] M. Ester, H.-P. Kriegel, J. Sander, X. Xu et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in kdd, vol. 96, no. 34, 1996, pp. 226–231.
  • [28] C. Yuan, J. Lin, Z. Zou, X. Hong, and F. Zhang, “Std: Stable triangle descriptor for 3d place recognition,” in 2023 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2023, pp. 1897–1903.
  • [29] K. S. Arun, T. S. Huang, and S. D. Blostein, “Least-squares fitting of two 3-d point sets,” IEEE Transactions on pattern analysis and machine intelligence, no. 5, pp. 698–700, 1987.
  • [30] J. Jiao, H. Wei, T. Hu, X. Hu, Y. Zhu, Z. He, J. Wu, J. Yu, X. Xie, H. Huang et al., “Fusionportable: A multi-sensor campus-scene dataset for evaluation of localization and mapping accuracy on diverse platforms,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).   IEEE, 2022, pp. 3851–3856.
  • [31] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the seventh IEEE international conference on computer vision, vol. 2.   Ieee, 1999, pp. 1150–1157.
  • [32] D. DeTone, T. Malisiewicz, and A. Rabinovich, “Superpoint: Self-supervised interest point detection and description,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018, pp. 224–236.
  • [33] P. Lindenberger, P.-E. Sarlin, and M. Pollefeys, “Lightglue: Local feature matching at light speed,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 17 627–17 638.
  • [34] R. B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (fpfh) for 3d registration,” in 2009 IEEE international conference on robotics and automation.   IEEE, 2009, pp. 3212–3217.
  • [35] H. Yang, J. Shi, and L. Carlone, “Teaser: Fast and certifiable point cloud registration,” IEEE Transactions on Robotics, vol. 37, no. 2, pp. 314–333, 2020.
  • [36] K. Aoki, K. Koide, S. Oishi, M. Yokozuka, A. Banno, and J. Meguro, “3d-bbs: Global localization for 3d point cloud scan matching using branch-and-bound algorithm,” arXiv preprint arXiv:2310.10023, 2023.