Xiaojing Tian [1]
[1] School of Mechanical Engineering, Dalian Jiaotong University, 794 Huanghe Road, Dalian 116028, Liaoning, China
[2] School of Mechanical Engineering, Dalian University of Technology, 2 Linggong Road, Dalian 116024, Liaoning, China
MEDPNet: Achieving High-Precision Adaptive Registration for Complex Die Castings
Abstract
Due to their complex spatial structure and diverse geometric features, achieving high-precision and robust point cloud registration for complex Die Castings has been a significant challenge in the die-casting industry. Existing point cloud registration methods primarily optimize network models using well-established high-quality datasets, often neglecting practical application in real scenarios. To address this gap, this paper proposes a high-precision adaptive registration method called Multiscale Efficient Deep Closest Point (MEDPNet) and introduces a die-casting point cloud dataset, DieCastCloud, specifically designed to tackle the challenges of point cloud registration in the die-casting industry. The MEDPNet method performs coarse die-casting point cloud data registration using the Efficient-DCP method, followed by precision registration using the Multiscale feature fusion dual-channel registration (MDR) method. We enhance the modeling capability and computational efficiency of the model by replacing the attention mechanism of the Transformer in DCP with Efficient Attention and implementing a collaborative scale mechanism through the combination of serial and parallel blocks. Additionally, we propose the MDR method, which utilizes multilayer perceptrons (MLP), Normal Distributions Transform (NDT), and Iterative Closest Point (ICP) to achieve learnable adaptive fusion, enabling high-precision, scalable, and noise-resistant global point cloud registration. Our proposed method demonstrates excellent performance compared to state-of-the-art geometric and learning-based registration methods when applied to complex die-casting point cloud data.
Keywords: complex die castings, point cloud registration, Efficient Attention, multiscale feature fusion

1 Introduction
Complex Die Castings are critical components in industries such as manufacturing, transportation, and defense, characterized by intricate structures and diverse forms. High-quality three-dimensional reconstruction of their overall surfaces through point cloud registration plays a vital role in enhancing product molding quality and ensuring safety in subsequent use. Recent work has made substantial progress in fully automatic, 3D feature-based point cloud registration. At first glance, benchmarks like 3DMatch [1] appear to be saturated, with multiple state-of-the-art (SoTA) methods reaching nearly 95% feature matching recall and successfully registering over 80% of all scan pairs [35]. However, due to the complexity of the spatial structure of Die Castings and their susceptibility to complex background interferences such as casting reflections, oil contamination, machining marks, etc., there is currently no effective method to achieve high-precision point cloud registration of die-casting data. We believe that a high-precision adaptive method is the key to addressing this issue.
Currently, representative point cloud registration methods can be broadly divided into two categories: those based on geometric properties[24][29][6][7] and those based on deep learning[35][3][36][27][8]. As shown in Fig 1, point cloud registration aims to compute the optimal transformation parameters (R, T), comprising three rotation angles in R and three translation components in T, from the parts that the two scans have in common, known as correspondences[49]. With the rapid development of deep learning[4][12] in recent years, learning-based approaches have been widely applied to point cloud registration. Deep learning-based registration algorithms, including DCP[8], PointNetLK[32], and GeoTransformer[18], have significantly improved the speed and performance of registration. However, these methods often require more computational resources, and their performance is frequently constrained by the quality of the dataset, which leads to suboptimal results in practical applications and makes stable, high-precision registration difficult to achieve.
Representative point cloud registration algorithms based on geometric properties include Iterative Closest Point (ICP)[24] and Normal Distributions Transform (NDT)[29]. Such methods have low hardware requirements, are easy to implement, exhibit strong interpretability, and do not involve time-consuming training processes. However, they face challenges such as sensitivity to local minima or poor generalization, reliance on manually crafted features to distinguish corresponding relationships, and significant impact from the designer’s experience and parameter tuning capabilities. Additionally, these methods often consume considerable time, posing potential bottlenecks in real-time applications.
To address this challenge, and given the intricate nature of die-cast components, we devise a highly efficient adaptive registration method and create the DieCastCloud point cloud dataset to validate its efficacy. The DCP method performs well in point cloud registration tasks, but it still has stability issues when faced with complex die-cast point cloud data with diverse surface features. To address this problem, we replace the Transformer attention[48] in DCP with Efficient Attention[14]. By combining serial and parallel blocks, Efficient Attention captures global feature information while improving computational efficiency. Efficient Attention differs from traditional self-attention mechanisms in its implementation. Traditional self-attention mechanisms[42][22][41] generate an attention map for each position to aggregate input values and produce outputs. In contrast, Efficient Attention does not generate a separate attention map for each position. Instead, it interprets the keys as global attention maps, with each global attention map corresponding to a semantic aspect of the entire input. Efficient Attention uses these global attention maps to aggregate the values and generate a global context vector. Each position then uses a set of coefficients to weight the global context vector and adjust its own representation. Because no similarity is computed between each pair of positions, this approach reduces both computational and storage complexity. Although the improved DCP network provides an initial registration for casting point clouds, its accuracy and stability remain limited, which restricts its practical feasibility in industrial applications.
To address this issue, we propose a multi-scale adaptive fine registration method. Building upon a favorable initial pose obtained from the DCP network, we further enhance the precise registration of point cloud data through the fusion of multi-scale feature information. In order to ensure the robustness and verifiability of fine registration, we improve and integrate ICP and NDT. Specifically, we initially perform multi-scale feature extraction on the die-casting point cloud data to avoid the impact of feature loss or noise interference on point cloud registration. Subsequently, we use a dual-channel approach to obtain transformation matrices from NDT and ICP that have undergone multi-scale feature fusion. We then apply nonlinear weighting to the obtained transformation matrices, endowing them with good adaptability and stable registration accuracy.
The main contributions of this work are as follows:
• We replace the Transformer attention in DCP with Efficient Attention and implement a collaborative scale mechanism through a combination of serial and parallel blocks, improving both the modeling capability and the computational efficiency of the model.
• We propose a Multiscale feature fusion dual-channel precision registration (MDR) method and support its design through ablation experiments under various scenarios.
• We establish the point cloud dataset DieCastCloud to address the scarcity of high-quality point cloud data in the die-casting industry.

2 Related Work
DCP (Deep Closest Point): DCP[8] is a representative learning-based method for point cloud registration[32][18][9]. The primary objective of the DCP method is to address issues encountered by traditional methods such as 4PCS(4-points congruent sets for robust pairwise surface registration)[2] and CPD(Coherent point drift)[7], which are prone to noise interference and susceptible to getting trapped in local optima. DCP utilizes deep neural networks[51] to learn representations of point clouds and employs these learned representations for point cloud registration. Specifically, the DCP[8] method comprises two sub-networks: a feature extraction network and a transformation prediction network. The feature extraction network is responsible for extracting local and global geometric features[38] from the input point clouds, while the transformation prediction network predicts the rigid transformation required to align two point clouds.
To tackle the matching problem in point cloud registration[1][19][9][2][50], DCP[8] adopts the approach of Pointer Networks[31]. Pointer Networks use attention mechanisms to select positions in the input sequence, addressing the challenge of predicting discrete labels. By predicting a position distribution at each output step, Pointer Networks can be considered as ”soft pointers” for selecting matching positions. The entire network is differentiable, allowing for end-to-end training[55]. HSGM [21][16] proposes a hierarchical similarity graph module to relieve the conflict of backbone networks and mine the discriminative features. Additionally, Transformer[48] models are utilized to learn contextual information of point clouds, enabling the model to capture global feature information. However, DCP still exhibits certain limitations when dealing with point cloud data with significant initial pose differences. In the presence of added Gaussian noise[52], although DCP[8] demonstrates better robustness compared to methods like FGR[15], it is still subject to some degree of influence.
An Adaptive Registration Method Based on Multimodal Data: The current trend in point cloud tasks is the increasing popularity of multimodal data[5][39]. Geometry-based methods[23][43][45], in the context of point cloud registration, involve utilizing geometric features such as point positions, distances, and orientations to establish correspondences and achieve alignment between point clouds. These methods typically aim to find the optimal rigid transformation to minimize geometric disparities between point clouds, ensuring accurate registration. ICP (Iterative Closest Point)[24] and NDT (Normal Distributions Transform)[29] are two of the most renowned geometry-based point cloud registration methods, particularly suitable for scenarios with local overlap and small-scale rigid transformations.
ICP (Iterative Closest Point)[24] is a classical method for point cloud registration, with the primary goal of iteratively finding the optimal rigid transformation between two point clouds to align them as closely as possible[57][58]. The core idea of ICP involves iteratively mapping points from the target point cloud to the reference point cloud and updating the rigid transformation based on the corresponding mappings. This iterative process continues until convergence is achieved, ultimately realizing the best possible alignment between the two point clouds. NDT (Normal Distributions Transform)[29] is a method used for point cloud registration, and its core idea involves describing the local structure of each point cloud by modeling the normal distribution of points. By mapping each point in the point cloud to its corresponding Gaussian distribution[53], NDT represents the point cloud as a set of probability density distributions[56]. During the registration process, the method adjusts the rigid transformation[54] by minimizing the disparity in probability density distributions between the two point clouds, thereby achieving optimal point cloud alignment[59][60].
While both geometric-based methods[24][29][2][45] and their variants[50][23] have become increasingly mature, integrating them into practical engineering applications remains a highly challenging task. The ICP method performs well in scenarios with local overlap and small-scale rigid transformations but is sensitive to noise and prone to getting stuck in local optima. The NDT method demonstrates advantages in handling large-scale[20], sparse, or point clouds with complex geometric structures; however, its performance is constrained by the choice of parameters.
3 Proposed Method
3.1 Overview
Given two point clouds $\mathcal{X}$ and $\mathcal{Y}$, the objective is to estimate the rigid transformation that aligns them. Fig 2 presents the architecture of MEDPNet. In brief, we embed the acquired die-casting point cloud data into a high-dimensional space using DGCNN[10], encode the contextual information with the Efficient Attention[14] module, and finally estimate the alignment using a differentiable SVD layer[47]. Let $x_i^{l}$ be the embedding of point $i$ in the $l$-th layer, and let $h_\theta^{l}$ be a nonlinear function in the $l$-th layer parameterized by a shared multilayer perceptron (MLP). The forward mechanism is given by:
$$x_i^{l} = h_\theta^{l}\left(x_i^{l-1}\right) \tag{1}$$
We map the collected unaligned input point clouds $\mathcal{X}$ and $\mathcal{Y}$ into the same embedding space, embedding each point of the two inputs individually. With $L$ layers and $N$ points per cloud, the per-point features of the final layer are collected as:
$$\mathcal{F}_{\mathcal{X}} = \left\{x_1^{L}, x_2^{L}, \dots, x_N^{L}\right\} \tag{2}$$
and
$$\mathcal{F}_{\mathcal{Y}} = \left\{y_1^{L}, y_2^{L}, \dots, y_N^{L}\right\} \tag{3}$$
DGCNN constructs a k-NN (k-nearest neighbor) graph $\mathcal{G}$, applies a nonlinearity to the values at edge endpoints to obtain edge-wise values, and performs per-vertex aggregation at each layer. Unlike PointNet[33], which extracts independent information from each point, DGCNN explicitly incorporates local geometric shapes into its representation. This is achieved through the forward mechanism:
$$x_i^{l} = f\left(\left\{h_\theta^{l}\left(x_i^{l-1}, x_j^{l-1}\right) : j \in \mathcal{N}_i\right\}\right) \tag{4}$$
where $\mathcal{N}_i$ represents the set of neighbors of point $i$ in the k-NN graph $\mathcal{G}$ and $f$ is the per-vertex aggregation, ensuring that local geometric features are considered during the aggregation process. In the task of die-casting part point cloud registration, DGCNN achieves higher-quality registration by leveraging this local geometric information.
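To make the neighborhood aggregation concrete, the following is a minimal NumPy sketch of one EdgeConv-style layer. It is an illustration under stated assumptions rather than the network used in this work: the edge function (a single linear map followed by ReLU on the concatenation of $x_i$ and $x_j - x_i$), the max aggregation, the random weights `W`, and the helper names `knn_indices` and `edge_conv` are all hypothetical choices.

```python
import numpy as np

def knn_indices(points, k):
    """Return the indices of the k nearest neighbors of every point (self excluded)."""
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)        # (N, N) squared distances
    np.fill_diagonal(dist2, np.inf)           # a point is not its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]   # (N, k)

def edge_conv(features, neighbors, weight):
    """One layer: h(x_i, x_j) = ReLU([x_i, x_j - x_i] W), then max over neighbors j."""
    k = neighbors.shape[1]
    center = np.repeat(features[:, None, :], k, axis=1)       # (N, k, C)
    neigh = features[neighbors]                               # (N, k, C)
    edge = np.concatenate([center, neigh - center], axis=-1)  # (N, k, 2C)
    hidden = np.maximum(edge @ weight, 0.0)                   # (N, k, C_out)
    return hidden.max(axis=1)                                 # (N, C_out)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))          # toy point cloud, not DieCastCloud data
idx = knn_indices(cloud, k=8)
W = rng.normal(scale=0.1, size=(6, 16))    # hypothetical weights; learned in practice
embedding = edge_conv(cloud, idx, W)       # per-point features with local context
```

Because each output row depends on a point and its neighbors, the embedding carries local shape information that a per-point MLP alone would not see.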


3.2 Efficient Attention
Before introducing Efficient Attention, we first review dot-product attention, a fundamental attention mechanism commonly used in models such as Transformers. For a given query $Q$, key $K$, and value $V$, dot-product attention is computed as follows:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \tag{5}$$
Here, $d_k$ is the dimensionality of the query/key vectors, and $Q K^{\top}$ represents the dot product between the queries and the keys, which is divided by $\sqrt{d_k}$ to stabilize gradient magnitudes. The softmax function[46] normalizes the dot-product results into attention weights, which are then used to weight the values and generate the final output. This mechanism allows the model to dynamically allocate attention based on the similarity between queries and keys, capturing relationships between different positions in the input sequence.
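As a reference point for the efficient variant, here is a minimal NumPy sketch of scaled dot-product attention. The toy shapes and the function names are illustrative assumptions. Note that the intermediate `weights` matrix has one row and one column per position, which is the quadratic cost discussed next.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V and the (n, n) attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (n, n) pairwise similarities
    weights = softmax(scores, axis=-1)    # each row sums to one
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 8))   # n = 6 positions, d_k = 8
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 4))   # d_v = 4
out, weights = dot_product_attention(Q, K, V)
```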
The principle of efficient attention is to optimize the traditional attention mechanism, particularly in addressing the challenges of high computational complexity and significant memory consumption when processing long sequence data. By reducing redundancies in computation and employing more efficient computational strategies, such as low-rank factorization and kernel techniques, it approximates the key operation in the standard self-attention mechanism, as shown in Fig 3.
In the standard self-attention mechanism, the input sequence $X \in \mathbb{R}^{n \times d}$ undergoes linear transformations to obtain queries $Q$, keys $K$, and values $V$, where $Q = X W_Q$, $K = X W_K$, $V = X W_V$, and $W_Q$, $W_K$, $W_V$ are the corresponding weight matrices, with $d$ being the feature dimension and $n$ the sequence length.
Traditionally, attention weights are obtained by computing $Q K^{\top}$ and applying the softmax function, i.e.,
$$A = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) \tag{6}$$
This step has a computational complexity of $O(n^2)$ in the sequence length, which for long sequences results in a significant computational burden and memory requirement.
Efficient Attention introduces an approximation technique to reduce this complexity. Specifically, it uses the form
$$\mathrm{Attention}(Q, K, V) \approx \phi(Q)\left(\phi(K)^{\top} V\right) \tag{7}$$
for approximation, where $\phi$ is a nonlinear function mapping to a lower-dimensional feature space, effectively reducing the required computational power and storage space. This mapping avoids the direct computation of $Q K^{\top}$ and, with an appropriate choice of $\phi$, lowers the computational complexity of the attention mechanism from quadratic to linear in the sequence length $n$, with a factor given by the dimension of the mapped lower-dimensional space, which is significantly smaller than $n$.
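The reordering that makes this cheaper can be sketched in a few lines of NumPy. The sketch assumes one common choice of normalizers (softmax over the channels of each query and softmax over the positions of each key channel); the exact nonlinear mapping used in this work may differ, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def efficient_attention(Q, K, V):
    """Compute q (k^T V) without ever forming an (n, n) attention map."""
    q = softmax(Q, axis=1)    # normalize each query over its d_k channels
    k = softmax(K, axis=0)    # normalize each key channel over the n positions
    context = k.T @ V         # (d_k, d_v) global context, size independent of n
    return q @ context        # (n, d_v)

rng = np.random.default_rng(0)
n, d_k, d_v = 512, 16, 32
Q = rng.normal(size=(n, d_k))
K = rng.normal(size=(n, d_k))
V = rng.normal(size=(n, d_v))
out = efficient_attention(Q, K, V)

# For comparison only: the same result via the explicit (n, n) map.
implicit_map = softmax(Q, axis=1) @ softmax(K, axis=0).T
reference = implicit_map @ V
```

By associativity of matrix multiplication the two routes agree, but the efficient route stores a `d_k × d_v` context matrix instead of an `n × n` map, which is what matters for clouds with thousands of points.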
Precisely for these reasons, Efficient attention significantly enhances computational and storage efficiency in die-cast parts point cloud registration tasks by optimizing the attention mechanism. It effectively manages long-sequence dependencies, accurately captures changes and spatial relationships between point clouds, achieves high-precision registration, and broadens the application scope in resource-constrained environments.
3.3 Adaptive Multi-scale Patch Matching for Registration
First, let us review geometric point cloud registration. Its core idea is to iteratively optimize the rigid transformation {R, t}, where $R$ is the rotation matrix and $t$ is the translation vector. The objective is to minimize the distance between the two point clouds and thereby align them. For corresponding points $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$, $i = 1, \dots, N$, this can be expressed as:
$$\left(R^{*}, t^{*}\right) = \arg\min_{R,\,t} \sum_{i=1}^{N} \left\| R x_i + t - y_i \right\|^{2} \tag{8}$$
Our approach primarily adopts the ICP (Iterative Closest Point) method and the NDT (Normal Distributions Transform) method. The core idea of ICP is to achieve point cloud registration by iteratively optimizing the rigid transformation {R, t} to minimize the distance between the point clouds, re-estimating the correspondences at every iteration. Its expression is:
$$E_{\mathrm{ICP}}(R, t) = \frac{1}{N} \sum_{i=1}^{N} \left\| R x_i + t - y_{c(i)} \right\|^{2} \tag{9}$$
Here, $y_{c(i)}$ denotes the point of $\mathcal{Y}$ currently closest to the transformed point $R x_i + t$. ICP aligns the point clouds through iterative optimization: in the process of minimizing the objective function, $R$ and $t$ are adjusted to bring the two point clouds as close together as possible in space.
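The alternation between matching and solving can be sketched as follows. This is a minimal point-to-point ICP in NumPy with brute-force nearest neighbors and an SVD (Kabsch) solve, written as an illustration under simplifying assumptions (small clouds, a good initial pose, no outlier rejection); it is not the Open3D or PCL implementation used in the experiments, and all function names are hypothetical.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t with R @ src_i + t ~ dst_i for paired points (Kabsch / SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, dst_c - R @ src_c

def mean_sq_nn_error(a, b):
    """Mean squared distance from every point of a to its nearest point in b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return float(d2.min(axis=1).mean())

def icp(src, dst, iterations=30, tol=1e-12):
    """Alternate closest-point matching and the rigid solve; return accumulated R, t."""
    R_total, t_total = np.eye(3), np.zeros(3)
    moved, prev_err = src.copy(), np.inf
    for _ in range(iterations):
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        nearest = dst[d2.argmin(axis=1)]                # current correspondences
        R, t = best_rigid_transform(moved, nearest)
        moved = moved @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = float(np.mean(np.sum((moved - nearest) ** 2, axis=1)))
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total

rng = np.random.default_rng(0)
src = rng.uniform(-0.5, 0.5, size=(200, 3))
a = np.deg2rad(2.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.01, -0.005, 0.008])
dst = src @ R_true.T + t_true
R_est, t_est = icp(src, dst)
```

With known correspondences, `best_rigid_transform` alone recovers the pose. ICP needs the loop because correspondences are unknown, which is also why it depends on a reasonable coarse registration.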

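The NDT method describes the local structure of point clouds by modeling the normal distribution of each point. Its optimization objective is formulated as:
$$E_{\mathrm{NDT}}(R, t) = \sum_{i=1}^{N} \left(R \mu_i^{\mathcal{X}} + t - \mu_i^{\mathcal{Y}}\right)^{\top} \Sigma_i^{-1} \left(R \mu_i^{\mathcal{X}} + t - \mu_i^{\mathcal{Y}}\right) \tag{10}$$
Here, $\mu_i^{\mathcal{X}}$ and $\mu_i^{\mathcal{Y}}$ denote the means of the normal distributions corresponding to points in the two point clouds, and $\Sigma_i$ represents the covariance matrix of the normal distribution. The goal is to adjust $R$ and $t$ so that the two point clouds are as consistent as possible in terms of their normal distributions, minimizing the objective function.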
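A simplified way to see how such a cost is evaluated is sketched below. The sketch assumes a voxel-grid variant of NDT: the target cloud is divided into cells, one Gaussian is fitted per occupied cell, and the cost is the mean Mahalanobis distance of the transformed source points to the Gaussian of the cell they fall into. The cell size, the minimum point count, the regularization term, and the function names are illustrative assumptions, and no optimizer is included.

```python
import numpy as np

def build_ndt_cells(target, cell_size, min_points=5, reg=1e-6):
    """Fit one Gaussian (mean, inverse covariance) per sufficiently populated voxel."""
    keys = np.floor(target / cell_size).astype(int)
    cells = {}
    for key in {tuple(k) for k in keys}:
        pts = target[np.all(keys == key, axis=1)]
        if len(pts) >= min_points:
            cov = np.cov(pts.T) + reg * np.eye(3)   # regularize near-flat cells
            cells[key] = (pts.mean(axis=0), np.linalg.inv(cov))
    return cells

def ndt_cost(source, R, t, cells, cell_size):
    """Mean Mahalanobis distance of transformed source points to their cell Gaussian."""
    moved = source @ R.T + t
    total, used = 0.0, 0
    for p in moved:
        cell = cells.get(tuple(np.floor(p / cell_size).astype(int)))
        if cell is not None:                        # skip points in empty cells
            mean, inv_cov = cell
            diff = p - mean
            total += float(diff @ inv_cov @ diff)
            used += 1
    return total / max(used, 1)

rng = np.random.default_rng(0)
target = rng.uniform(0.0, 1.0, size=(2000, 3))
cells = build_ndt_cells(target, cell_size=0.25)
cost_identity = ndt_cost(target, np.eye(3), np.zeros(3), cells, cell_size=0.25)
```

A registration routine would minimize this cost over $R$ and $t$. The dependence on `cell_size` illustrates why NDT performance is sensitive to parameter choice.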
Multi-scale Feature Fusion Module: In both ICP (Iterative Closest Point) and NDT (Normal Distributions Transform), we introduced a multiscale feature fusion module aimed at addressing key challenges in point cloud registration, such as local minima and sensitivity to initial values. Our approach first utilizes the Feature Pyramid Network (FPN) [17] method to generate multiple scales of point clouds through downsampling. At each scale, the ICP algorithm is independently applied to find the optimal rigid transformation. Subsequently, the coarse-scale registration results are propagated to finer scales, resulting in the final registration outcome and transformation matrix. We have successfully implemented feature fusion on die-cast point cloud pairs at different scales.
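The multiscale ICP method aims to minimize the registration error across all scales and corresponding points by finding the optimal rotation $R$ and translation $t$. This approach allows us to tackle registration challenges at multiple scales, thereby enhancing the algorithm's robustness in environments with diverse scales. With $S$ scales, $N_s$ points at scale $s$, and $y_{c(i)}^{(s)}$ the point closest to the transformed $x_i^{(s)}$ at that scale, the method can be expressed as:
$$\min_{R,\,t} \sum_{s=1}^{S} \sum_{i=1}^{N_s} \left\| R x_i^{(s)} + t - y_{c(i)}^{(s)} \right\|^{2} \tag{11}$$
Similarly, the multiscale Normal Distributions Transform (NDT) minimizes the cumulative error of the normal distributions for corresponding points at each scale, using the Mahalanobis distance (the difference between the distribution means, weighted by the inverse covariance matrix). This optimization seeks the optimal rotation $R$ and translation vector $t$ at each scale, which enables precise registration of the source point cloud with the target point cloud in terms of their normal distributions across multiple scales. With $\mu_i^{\mathcal{X},(s)}$ and $\mu_i^{\mathcal{Y},(s)}$ the distribution means and $\Sigma_i^{(s)}$ the covariance matrix at scale $s$, it can be expressed as:
$$\min_{R,\,t} \sum_{s=1}^{S} \sum_{i=1}^{N_s} \left(R \mu_i^{\mathcal{X},(s)} + t - \mu_i^{\mathcal{Y},(s)}\right)^{\top} \left(\Sigma_i^{(s)}\right)^{-1} \left(R \mu_i^{\mathcal{X},(s)} + t - \mu_i^{\mathcal{Y},(s)}\right) \tag{12}$$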
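The coarse-to-fine propagation described above can be sketched as follows. The sketch makes several assumptions that are not taken from the paper: scales are produced by voxel-centroid downsampling rather than a feature pyramid, the per-scale solver is a bare point-to-point ICP, and the voxel sizes are arbitrary. The function names are hypothetical.

```python
import numpy as np

def voxel_downsample(points, voxel):
    """Replace all points that share a voxel by their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse)
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)            # accumulate points per voxel
    return sums / counts[:, None]

def kabsch(src, dst):
    """Least-squares rigid transform for paired points."""
    sc, dc = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - sc).T @ (dst - dc))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dc - R @ sc

def icp(src, dst, iterations=20):
    """Bare point-to-point ICP with brute-force nearest neighbors."""
    R_total, t_total = np.eye(3), np.zeros(3)
    moved = src.copy()
    for _ in range(iterations):
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        R, t = kabsch(moved, dst[d2.argmin(axis=1)])
        moved = moved @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

def coarse_to_fine(src, dst, voxels, register=icp):
    """Register at each scale, coarsest first, passing the pose on to the next scale."""
    R, t = np.eye(3), np.zeros(3)
    for voxel in voxels:
        s = voxel_downsample(src, voxel) @ R.T + t   # start from the previous estimate
        d = voxel_downsample(dst, voxel)
        dR, dt = register(s, d)
        R, t = dR @ R, dR @ t + dt
    return R, t

rng = np.random.default_rng(0)
src = rng.uniform(-0.5, 0.5, size=(500, 3))
a = np.deg2rad(3.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
dst = src @ R_true.T + np.array([0.02, 0.0, -0.01])
R_est, t_est = coarse_to_fine(src, dst, voxels=[0.25, 0.1, 0.05])
```

Passing `register` as a parameter means the same loop could drive an NDT solver, which mirrors how the multiscale module is applied to both channels.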
Dual Channel Fusion Module: Previous studies have primarily focused on enhancing the performance of the ICP (Iterative Closest Point) and NDT (Normal Distributions Transform) methods, yet they have overlooked the importance of ensuring stability under conditions of high precision. This issue becomes particularly evident when dealing with complex die-cast point cloud data, where the ICP algorithm may perform excellently on point cloud data "a" while the NDT algorithm shows superior performance on point cloud data "b". To overcome this challenge, we propose a novel dual-channel fusion module. Through multi-scale feature fusion, this module enables the ICP and NDT methods to each obtain their own rigid transformation matrix. We input 300 pairs of rigid transformation matrices from the ICP and NDT methods into an MLP for learnable self-feedback weighting and iterate the weights of the optimal registration results by minimizing the registration error, as shown in Fig 4. The MLP consists of three fully connected layers, with neuron counts of 32, 64, and 32, respectively. Here, we opt for the Huber loss function, expressed as:
$$L_\delta(a) = \begin{cases} \frac{1}{2} a^{2}, & |a| \le \delta \\ \delta\left(|a| - \frac{1}{2}\delta\right), & |a| > \delta \end{cases} \tag{13}$$
where $a = y - \hat{y}$ represents the prediction error, i.e., the difference between the actual value $y$ and the predicted value $\hat{y}$, and $\delta$ is a threshold parameter that determines the point at which the loss function transitions from squared error to linear error. When the absolute value of the error is less than or equal to $\delta$, the loss is quadratic in the error (as in MSE), which is smooth around zero and encourages precise fitting. Conversely, when the absolute value of the error exceeds $\delta$, the loss becomes linear (as in MAE), reducing the penalty for large errors and enhancing the model's robustness to outliers.
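The piecewise behavior is easy to check numerically. The following NumPy sketch implements the element-wise Huber loss; the default threshold `delta=1.0` is an arbitrary illustrative value, not the setting used in the experiments.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Element-wise Huber loss: quadratic for |error| <= delta, linear beyond it."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_error - 0.5 * delta)
    return np.where(abs_error <= delta, quadratic, linear)

small = huber_loss(0.5, 0.0)        # inside the quadratic region
large = huber_loss(3.0, 0.0)        # inside the linear region
at_threshold = huber_loss(1.0, 0.0) # both branches agree here
```

The two branches meet at $|a| = \delta$ with the same value and slope, so the loss stays differentiable while growing only linearly for outliers.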
Although the merged matrix obtained at this point is reasonably reliable, repeated tests have shown that the model's accuracy decreases on unfamiliar samples. We hypothesize that this instability is due to the limited sample size, which makes it difficult for the model to learn the complete features of die-cast part point clouds. However, because die-cast samples cannot be replicated and the industrial production cycle cannot accommodate the training time required for a large volume of samples, we introduced an additional hyperparameter. Initially, we hoped to obtain the optimal weights for the ICP and NDT transformations directly through the MLP mechanism, but the limited training samples and the excessive number of parameters to be optimized led to unsatisfactory results. To address this, we added a self-updating filtering mechanism on top of the MLP. Once the optimal weights are determined, we use the root mean square error (RMSE) feedback from each iteration to determine the corresponding hyperparameter value, choosing the value associated with the minimum RMSE as the input for the next iteration, as shown in Fig 4. The formula for RMSE is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^{2}} \tag{14}$$
where $n$ is the number of samples, $y_i$ is the actual value of sample $i$, and $\hat{y}_i$ is the predicted value for sample $i$. Finally, we can obtain:
(15) |
In this setup, we achieved learnable adaptive registration through a multilayer perceptron and a self-updating filtering mechanism, obtaining desirable results on the die-cast part point cloud dataset DieCastCloud.
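The idea of weighting two candidate transformations and keeping the setting with the lowest RMSE can be sketched as follows. This is a deliberately simplified stand-in, not the MLP-based mechanism of this work: a single scalar weight `w` replaces the learned weights, the blended rotation is projected back onto a valid rotation with an SVD, and the candidate grid is arbitrary. All names are hypothetical.

```python
import numpy as np

def project_to_rotation(M):
    """Closest rotation matrix to M in the least-squares sense."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt

def fuse(T_a, T_b, w):
    """Blend two 4x4 rigid transforms with weight w on T_a and (1 - w) on T_b."""
    fused = np.eye(4)
    fused[:3, :3] = project_to_rotation(w * T_a[:3, :3] + (1.0 - w) * T_b[:3, :3])
    fused[:3, 3] = w * T_a[:3, 3] + (1.0 - w) * T_b[:3, 3]
    return fused

def rmse(source, target, T):
    """RMSE between transformed source points and their paired target points."""
    moved = source @ T[:3, :3].T + T[:3, 3]
    return float(np.sqrt(np.mean(np.sum((moved - target) ** 2, axis=1))))

def select_weight(source, target, T_a, T_b, candidates):
    """Return the candidate weight with the lowest RMSE, and that RMSE."""
    errors = [rmse(source, target, fuse(T_a, T_b, w)) for w in candidates]
    best = int(np.argmin(errors))
    return candidates[best], errors[best]

def rigid(angle_deg, translation):
    """Helper: 4x4 transform rotating about z by angle_deg, then translating."""
    a = np.deg2rad(angle_deg)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T

rng = np.random.default_rng(0)
source = rng.uniform(-1.0, 1.0, size=(300, 3))
T_a = rigid(10.0, [0.10, 0.00, 0.05])     # stands in for one channel's estimate
T_b = rigid(12.0, [0.12, 0.02, 0.05])     # stands in for the other channel's estimate
target = source @ T_a[:3, :3].T + T_a[:3, 3]
best_w, best_err = select_weight(source, target, T_a, T_b, np.linspace(0.0, 1.0, 11))
```

In this toy setup the target was generated with `T_a`, so the selection puts all weight on that channel. On real data neither channel is exact, and an intermediate weight can give a lower RMSE than either estimate alone.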
4 Experiment and Analysis
In this section, we conduct comparative experiments to assess the effectiveness of our approach. Section 4.1 gives the implementation details, and Section 4.2 describes the DieCastCloud dataset. In Section 4.3, we evaluate the Efficient DCP method on our die-cast dataset and perform ablation experiments to verify the method's effectiveness. In Section 4.4, we introduce the adaptive registration method based on multimodal data, providing corresponding experiments at each step. In Section 4.5, we present our overall method, MEDPNet (Multiscale Efficient Deep Closest Point), and compare it with current state-of-the-art methods.
4.1 Implementation Details
The algorithms are implemented in Python and C++ using Open3D[30] 1.2.0 and PCL 1.9.1. The experimental platform runs Ubuntu 18.04 with PyTorch[37] 1.8.1 and CUDA 11.1, on one RTX 3090 GPU (24 GB) and a 15-vCPU AMD EPYC 7642 48-Core Processor. To test the generalization of different models, we split DieCastCloud into training and testing sets.
Due to the complexity of the surface features of die cast parts, unlike the approach in PointNet[33] experiments of uniformly sampling 1024 points on the model’s outer surface, we opted to sample 4096 points. This decision was based on an understanding of the complexity of die cast surface features, aiming to more comprehensively preserve the point cloud’s feature information, thereby enhancing the accuracy and reliability of subsequent registration.
To ensure consistency and standardization in data processing, we performed a series of preprocessing steps on the collected point cloud data. Initially, the point cloud data was centered at the origin and scaled to fit within a unit sphere. Throughout this process, we only used the three-dimensional coordinates (x, y, z) of the points as input features, without introducing any additional attribute information, to accurately assess the model’s ability to recognize and process geometric shapes themselves.
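The sampling and normalization steps can be sketched in NumPy as follows. The random subsampling strategy and the synthetic input are assumptions for illustration; the actual sampling procedure on the scanned surfaces may differ.

```python
import numpy as np

def sample_points(points, n_samples, rng):
    """Randomly pick n_samples points (with replacement only if the cloud is too small)."""
    replace = len(points) < n_samples
    idx = rng.choice(len(points), size=n_samples, replace=replace)
    return points[idx]

def normalize_to_unit_sphere(points):
    """Center the cloud at the origin and scale it to fit inside the unit sphere."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / scale

rng = np.random.default_rng(0)
raw = rng.normal(loc=[50.0, -20.0, 10.0], scale=[30.0, 5.0, 12.0], size=(10000, 3))  # synthetic (x, y, z) in mm
cloud = normalize_to_unit_sphere(sample_points(raw, 4096, rng))
```

After this step every cloud has zero mean and a maximum point norm of one, so the network sees only shape, not absolute position or size.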
The initial pose has a critical impact on point cloud registration. To better quantify the performance of coarse registration, we employed multiple error metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), to ensure the reliability of the method. Moreover, considering practical applications in the die casting industry, we primarily focused on Root Mean Squared Error (RMSE) and registration time (s) during fine registration, where all angle measurements were made in degrees (°).
4.2 Datasets
The DieCastCloud dataset contains 2,000 point cloud data, including 5 different types of die-cast parts. This dataset is randomly divided into a training set and a test set, with proportions of 0.8 and 0.2, respectively. In practical applications in the die-casting industry, the purpose of point cloud registration is to enhance the completeness of the point cloud data while preserving key features, to ensure high-precision 3D reconstruction and facilitate product quality control. Unlike other datasets[28][25], here, to ensure the practical feasibility of the method, the overlap rate of point cloud data in DieCastCloud is set to be greater than .
We utilized the UR16e robotic arm equipped with the high-precision 3D laser scanner CIRRUS 3D 300 to collect point cloud data of die-cast parts, and we named the resulting dataset DieCastCloud. The point cloud data in DieCastCloud covers the main external surfaces of the die-cast parts, including complex geometric features such as pipes and holes. Additionally, we processed and filtered the collected raw point cloud data to obtain a richer sample set.
Finally, in the creation process of DieCastCloud, point cloud data was enhanced through techniques such as rotation, translation, scaling, and random erasure to increase the diversity of the data and enhance the model’s generalization capabilities. Specifically, we randomly rotated the point cloud data at random angles along any axis and translated it in any spatial direction, with the translation range controlled within [-800mm, 800mm]. The scaling ratio was constrained to between [0.95, 1.05] of the original point cloud size.
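A minimal NumPy sketch of such an augmentation pipeline is given below. The translation and scaling ranges follow the text; the rotation sampling (random axis with a uniformly random angle), the erasure strategy (dropping the points nearest to a random seed point), and the `erase_fraction` value are assumptions made for illustration.

```python
import numpy as np

def random_rotation(rng):
    """Rotation by a random angle about a random axis (Rodrigues' formula)."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(0.0, 2.0 * np.pi)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def augment(points, rng, max_shift=800.0, scale_range=(0.95, 1.05), erase_fraction=0.1):
    """Rotate, scale, translate (mm), then erase a local patch of points."""
    R = random_rotation(rng)
    scale = rng.uniform(*scale_range)
    shift = rng.uniform(-max_shift, max_shift, size=3)
    out = scale * (points @ R.T) + shift
    seed = out[rng.integers(len(out))]                      # center of the erased patch
    order = np.argsort(np.linalg.norm(out - seed, axis=1))  # nearest to the seed first
    keep = np.sort(order[int(erase_fraction * len(out)):])  # drop the nearest points
    return out[keep]

rng = np.random.default_rng(0)
points = rng.uniform(-100.0, 100.0, size=(1000, 3))   # toy cloud in mm
R_check = random_rotation(rng)
augmented = augment(points, rng)
```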
4.3 Evaluation of Efficient DCP
In this experiment, we use DCP-v2, which incorporates a Transformer, as our baseline. We compare the performance of Efficient DCP with other cutting-edge deep learning-based point cloud registration methods, including PointNetLK[32], GeoTransformer[18], PRNet[36], DeepGMR[9], and DCP[8]. We randomly split the 2000 die-cast component point clouds in DieCastCloud into training and test sets, so that different point clouds are used during training and testing. During training, we sampled the point clouds and applied a random rigid transformation along each axis, with rotations uniformly sampled within [0, 60°] and translations in the range of [-150mm, 150mm]. The source point cloud and the point cloud after the rigid transformation were used as the input to the network.
To ensure a fair comparison among these methods, we follow the convention and use performance metrics including Mean Squared Error (MSE) for rotation angles (MSE(R)), Mean Squared Error for translation directions (MSE(t)), Root Mean Squared Error for rotation angles (RMSE(R)), Root Mean Squared Error for translation directions (RMSE(t)), Mean Absolute Error for rotation angles (MAE(R)), and Mean Absolute Error for translation directions (MAE(t)), to guarantee the reliability of the experiments.
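For reference, the three error measures can be computed as in the sketch below. It assumes that rotations are compared as per-axis Euler angles in degrees and translations as per-axis components, each stored as an array with one row per test pair; the toy values are made up for illustration.

```python
import numpy as np

def registration_errors(pred, true):
    """MSE, RMSE, and MAE between predicted and ground-truth parameters."""
    diff = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    mse = float(np.mean(diff ** 2))
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "MAE": float(np.mean(np.abs(diff)))}

# Toy example: one test pair, Euler angles in degrees.
rot_pred = np.array([[1.0, 2.0, 3.0]])
rot_true = np.array([[0.0, 0.0, 0.0]])
rot_errors = registration_errors(rot_pred, rot_true)
```

By construction RMSE is the square root of MSE computed over the same set of values, which is a useful sanity check when reading result tables.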
Table 1 compares the performance of our method with that of its counterparts in this experiment. In this study, Efficient DCP adopts a structure that integrates DGCNN with an Efficient Transformer, and is trained with a learning rate of 0.001 for 200 epochs, with a training batch size of 32 and a test batch size of 10, using Stochastic Gradient Descent (SGD) as the optimizer. Efficient DCP achieves the lowest error on all evaluated metrics except RMSE(t), where GeoTransformer is lower.
Model | MSE(R) | MSE(t) | RMSE(R) | RMSE(t) | MAE(R) | MAE(t) |
---|---|---|---|---|---|---|
PointNetLK | 51.271578 | 0.114432 | 6.842565 | 0.089526 | 7.664845 | 0.045454 |
GeoTransformer | 24.352470 | 0.001422 | 5.898481 | 0.002177 | 2.974119 | 0.084997 |
PRNet | 82.665150 | 0.014432 | 12.54549 | 0.114551 | 4.859481 | 0.072361 |
DeepGMR | 29.159647 | 0.008747 | 3.861095 | 0.084411 | 3.784151 | 0.048944 |
DCP | 24.372444 | 0.009330 | 4.851226 | 0.017721 | 2.311324 | 0.027983 |
Efficient DCP (Ours) | 4.822984 | 0.000231 | 2.196129 | 0.015187 | 1.350260 | 0.008338 |
4.4 Multiscale Feature Fusion Dual-channel Precision Registration
In this section, we conducted comparative experiments to validate our choice of the Iterative Closest Point (ICP) and Normal Distributions Transform (NDT) methods. In real industrial scenarios, registration time needs to be controlled within 60 seconds. To better evaluate the feasibility of the methods, we used Root Mean Square Error (RMSE) in millimeters and registration time in seconds as performance evaluation metrics, with the results shown in Table 2. Additionally, to more intuitively understand the impact of rotation angles (°) and translation distances (mm) on various methods, we conducted comparative experiments to test the performance of each method at specific rotation angles and translation distances, with results presented in Table 3. Finally, we elucidated the advantages of our dual-channel precision registration method based on multi-scale features.
Method | Clean data: RMSE (mm) | Clean data: Time (s) | Noisy data: RMSE (mm) | Noisy data: Time (s) |
---|---|---|---|---|
SAC-IA | 0.128 | 47.64 | 0.206 | 53.76 |
4PCS | 0.168 | 13.32 | 0.273 | 16.11 |
NDT | 0.158 | 4.33 | 0.182 | 5.64 |
ICP | 0.153 | 12.47 | 0.197 | 18.12 |
MDR (Ours) | 0.092 | 25.92 | 0.148 | 29.41 |
Data Rotate (°) | Data Translation (mm) | |||||
---|---|---|---|---|---|---|
Method | 10 | 20 | 30 | 100 | 500 | 1000 |
SAC-IA | 0.004 | 0.008 | 0.015 | 0.003 | 0.007 | 0.015 |
4PCS | 0.002 | 0.007 | 0.011 | 0.010 | 0.014 | 0.022 |
NDT | 0.007 | 0.009 | 0.012 | 0.001 | 0.003 | 0.010 |
ICP | 0.002 | 0.004 | 0.008 | 0.003 | 0.008 | 0.017 |
MDR(ours) | 0.001 | 0.001 | 0.003 | 0.001 | 0.002 | 0.002 |
We first evaluated the performance of SAC-IA[45], 4PCS[2], NDT[29], ICP[24], and MDR on the DieCastCloud dataset. To assess robustness, we added noise with an intensity of 0.1 to the DieCastCloud dataset and repeated the comparison under noisy conditions. Because MDR incorporates a multi-scale feature fusion module, it has access to more comprehensive feature information than the other methods, which yields higher registration accuracy and noise resistance. As shown in Table 2, MDR achieves the lowest RMSE on both clean data (0.092 mm) and noisy data (0.148 mm), while its registration times (25.92 s and 29.41 s) remain within the 60-second limit required in practice.
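The noise-robustness protocol above can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the noise model, so "intensity 0.1" is interpreted here as the standard deviation of zero-mean Gaussian jitter applied to each coordinate, and RMSE is computed as the root mean square point-to-point distance over points with known one-to-one correspondence. The function names `add_gaussian_noise` and `rmse` are hypothetical, and the toy grid stands in for a DieCastCloud sample.

```python
import math
import random


def add_gaussian_noise(points, sigma, seed=0):
    """Return a copy of `points` with zero-mean Gaussian jitter on each coordinate.

    Assumption: `sigma` plays the role of the paper's "noise intensity".
    """
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


def rmse(points_a, points_b):
    """Root mean square point-to-point distance between paired 3D points."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point sets must be non-empty and of equal length")
    total = sum(
        sum((a - b) ** 2 for a, b in zip(pa, pb))
        for pa, pb in zip(points_a, points_b)
    )
    return math.sqrt(total / len(points_a))


# Toy stand-in for a scanned part: a 10 x 10 x 10 grid of points.
clean = [
    (float(x), float(y), float(z))
    for x in range(10)
    for y in range(10)
    for z in range(10)
]
noisy = add_gaussian_noise(clean, sigma=0.1)

print(rmse(clean, clean))  # identical clouds give zero error
print(rmse(clean, noisy))  # roughly sigma * sqrt(3) for isotropic jitter
```

With independent jitter on three coordinates, the expected RMSE against the clean cloud is close to sigma times the square root of three, which gives a quick sanity check on the noise level before running any registration method.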

Next, we tested how rotation angle and translation distance affect each algorithm. We rotated the point cloud data by 10°, 30°, and 90° around an arbitrary spatial axis and translated it by 100 mm, 500 mm, and 1000 mm in an arbitrary spatial direction. As Table 3 shows, among the baselines ICP handled rotation best, while NDT was the most robust to translation and to changes in spatial position. Our method achieved the best, or tied for the best, registration performance in every setting.
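A perturbation of this kind (a rotation about an arbitrary axis followed by a translation along an arbitrary direction) can be generated with Rodrigues' rotation formula. The sketch below is an assumption about how such test inputs might be produced, not the authors' tooling; `rotation_matrix` and `perturb` are hypothetical helper names.

```python
import math


def rotation_matrix(axis, angle_deg):
    """Rodrigues' formula: 3x3 rotation about `axis` by `angle_deg` degrees."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("axis must be non-zero")
    x, y, z = ax / norm, ay / norm, az / norm
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def perturb(points, axis, angle_deg, direction, distance_mm):
    """Rotate each point about `axis`, then shift it `distance_mm` along `direction`."""
    R = rotation_matrix(axis, angle_deg)
    dnorm = math.sqrt(sum(d * d for d in direction))
    if dnorm == 0.0:
        raise ValueError("direction must be non-zero")
    t = [distance_mm * d / dnorm for d in direction]
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


# Rotate 90 degrees about the z-axis, then translate 100 mm along the x-axis.
moved = perturb([(1.0, 0.0, 0.0)], axis=(0, 0, 1), angle_deg=90.0,
                direction=(1, 0, 0), distance_mm=100.0)
print(moved[0])  # approximately (100.0, 1.0, 0.0)
```

Normalizing both the axis and the direction lets the caller pass any non-zero vector, so the angle and the distance are the only quantities that control the size of the perturbation.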
Method | Data clean | Data noisy |
---|---|---|
PointNetLK | 6.84315 | 15.9744 |
DCP | 4.85126 | 5.87391 |
NDT | 4.27784 | 9.76239 |
Efficient DCP (Ours) | 2.19618 | 3.71646 |
ICP | 2.07729 | 6.73248 |
MDR (Ours) | 1.94877 | 4.67442 |
MEDPNet (Ours) | 1.17245 | 1.32954 |
4.5 Influence of MEDPNet
Overall, the MEDPNet method achieves high-quality registration results by first applying Efficient DCP for coarse registration of unaligned point cloud pairs and then refining the result with MDR. To assess the accuracy and robustness of our method, we conducted tests on both clean and noisy samples, with the results presented in Table 4. We selected root mean square error (RMSE) as the performance metric because it accounts for both variance and bias. MEDPNet attains the lowest RMSE in both settings (1.17245 on clean data and 1.32954 on noisy data), and its error increases less under noise than that of any other method in Table 4.
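The coarse-to-fine structure can be expressed as a composition of rigid transforms: if the coarse stage yields a homogeneous matrix T_coarse and the fine stage, run on the coarsely aligned cloud, yields T_fine, the overall transform is T_fine · T_coarse. The sketch below illustrates only this composition. The two matrices are placeholder values standing in for the outputs of Efficient DCP and MDR, and the helper names are hypothetical.

```python
def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    return [
        list(R[0]) + [t[0]],
        list(R[1]) + [t[1]],
        list(R[2]) + [t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def apply_transform(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    return tuple(
        sum(T[i][j] * p[j] for j in range(3)) + T[i][3] for i in range(3)
    )


def compose(coarse, fine):
    """Overall transform when `fine` is estimated after `coarse` has been applied."""
    return matmul4(fine, coarse)


# Placeholder stage outputs (assumed values, not results from the paper):
# the coarse stage removes a 90-degree rotation about z plus a 1 mm offset,
# and the fine stage removes a residual 0.01 mm offset.
ROT_Z_90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T_coarse = make_transform(ROT_Z_90, (1.0, 0.0, 0.0))
T_fine = make_transform(IDENTITY, (0.01, 0.0, 0.0))

point = (1.0, 0.0, 0.0)
step_by_step = apply_transform(T_fine, apply_transform(T_coarse, point))
all_at_once = apply_transform(compose(T_coarse, T_fine), point)
print(step_by_step, all_at_once)  # both approximately (1.01, 1.0, 0.0)
```

The order matters: the fine transform multiplies on the left because it is estimated in the frame produced by the coarse stage.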
4.6 Visualization
In this section, we present a visual comparison between several advanced methods and our MEDPNet approach, as illustrated in Fig. 5. We selected three typical types of die-casting samples and conducted experiments under both noise-free and noisy conditions to assess the practicality of our method. The comparison includes PointNetLK[32], DCP[8], ICP[24], NDT[29], and our improved methods Efficient DCP, MDR, and MEDPNet, arranged in descending order of root mean square error (RMSE). As can be observed in Fig. 5, MEDPNet achieves state-of-the-art performance on both clean and noisy samples.
5 Conclusion
Die castings have complex spatial structures and diverse geometric features, which makes precise and robust point cloud registration a significant challenge. Existing methods mostly optimize network models on well-established, high-quality datasets and often neglect practical deployment in real scenarios. To address this gap, this paper proposes the Multiscale Efficient Deep Closest Point (MEDPNet) method and introduces DieCastCloud, a point cloud dataset designed specifically for point cloud registration in the die-casting industry.
MEDPNet first performs coarse registration with Efficient-DCP and then performs precision registration with the Multiscale feature fusion dual-channel registration (MDR) method. Efficient-DCP replaces the attention mechanism of the Transformer in DCP with Efficient Attention, which improves modeling capability and computational efficiency. MDR combines multilayer perceptrons (MLP) with NDT and ICP to adaptively optimize the final transformation matrix and minimize registration error. The result is an adaptive, scalable, and noise-resistant global point cloud registration framework.
Although our method achieves excellent registration results for die-casting point clouds, it has two limitations. First, because die castings are large and are affected by occlusions, blind spots, and machining marks, collecting high-quality point cloud data is time-consuming, and the robot's data collection strategy must be adjusted for each type of die casting. We believe that an intelligent die-casting point cloud generator is key to solving this problem. Second, the precision registration step operates on point cloud data that has not been downsampled, where the parameter count of each point cloud reaches tens of millions. Reducing the resulting computational cost is an urgent problem for actual industrial production.
6 Acknowledgements
This work was supported by both the Unveiling the Top Technical Research Project of Dalian City (2023JB11GX001) and the Key Special Projects of the National Key R&D Program (2022YFB3706802).
References
- [1] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, and T. Funkhouser, “3dmatch: Learning local geometric descriptors from rgb-d reconstructions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1802–1811.
- [2] D. Aiger, N. J. Mitra, and D. Cohen-Or, “4-points congruent sets for robust pairwise surface registration,” in ACM SIGGRAPH 2008 papers, 2008, pp. 1–10.
- [3] J. Hu, Z. Huang, F. Shen, D. He, and Q. Xian, “A bag of tricks for fine-grained roof extraction,” in IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2023.
- [4] ——, “A rubust method for roof extraction and height estimation,” in IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2023.
- [5] F. Shen, X. Shu, X. Du, and J. Tang, “Pedestrian-specific bipartite-aware similarity learning for text-based person retrieval,” in Proceedings of the 31th ACM International Conference on Multimedia, 2023.
- [6] A. Zhang, Z. Min, Z. Zhang, X. Yang, and M. Q.-H. Meng, “Anisotropic generalized bayesian coherent point drift for point set registration,” IEEE Transactions on Automation Science and Engineering, vol. 20, no. 1, pp. 495–505, 2022.
- [7] A. Myronenko and X. Song, “Point set registration: Coherent point drift,” IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 12, pp. 2262–2275, 2010.
- [8] Y. Wang and J. M. Solomon, “Deep closest point: Learning representations for point cloud registration,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 3523–3532.
- [9] W. Yuan, B. Eckart, K. Kim, V. Jampani, D. Fox, and J. Kautz, “Deepgmr: Learning latent gaussian mixture models for registration,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16. Springer, 2020, pp. 733–750.
- [10] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” ACM Transactions on Graphics (tog), vol. 38, no. 5, pp. 1–12, 2019.
- [11] F. Shen, X. Du, L. Zhang, and J. Tang, “Triplet contrastive learning for unsupervised vehicle re-identification,” arXiv preprint arXiv:2301.09498, 2023.
- [12] F. Shen, Y. Xie, J. Zhu, X. Zhu, and H. Zeng, “Git: Graph interactive transformer for vehicle re-identification,” IEEE Transactions on Image Processing, 2023.
- [13] F. Shen, J. Zhu, X. Zhu, Y. Xie, and J. Huang, “Exploring spatial significance via hybrid pyramidal graph network for vehicle re-identification,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 8793–8804, 2021.
- [14] Z. Shen, M. Zhang, H. Zhao, S. Yi, and H. Li, “Efficient attention: Attention with linear complexities,” in Proceedings of the IEEE/CVF winter conference on applications of computer vision, 2021, pp. 3531–3539.
- [15] Q.-Y. Zhou, J. Park, and V. Koltun, “Fast global registration,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 766–782.
- [16] F. Shen, M. Wei, and J. Ren, “Hsgnet: Object re-identification with hierarchical similarity graph network,” arXiv preprint arXiv:2211.05486, 2022.
- [17] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- [18] Z. Qin, H. Yu, C. Wang, Y. Guo, Y. Peng, and K. Xu, “Geometric transformer for fast and robust point cloud registration,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 11 143–11 152.
- [19] B. Eckart, K. Kim, and J. Kautz, “Fast and accurate point cloud registration using trees of gaussian mixtures,” arXiv preprint arXiv:1807.02587, 2018.
- [20] F. Shen, J. Zhu, X. Zhu, J. Huang, H. Zeng, Z. Lei, and C. Cai, “An efficient multiresolution network for vehicle reidentification,” IEEE Internet of Things Journal, vol. 9, no. 11, pp. 9049–9059, 2021.
- [21] F. Shen, X. Peng, L. Wang, X. Zhang, M. Shu, and Y. Wang, “Hsgm: A hierarchical similarity graph module for object re-identification,” in 2022 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2022, pp. 1–6.
- [22] M. Li, M. Wei, X. He, and F. Shen, “Enhancing part features via contrastive attention module for vehicle re-identification,” in 2022 IEEE International Conference on Image Processing (ICIP). IEEE, 2022, pp. 1816–1820.
- [23] J. Yang, H. Li, D. Campbell, and Y. Jia, “Go-icp: A globally optimal solution to 3d icp point-set registration,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 11, pp. 2241–2254, 2015.
- [24] P. J. Besl and N. D. McKay, “Method for registration of 3-d shapes,” in Sensor fusion IV: control paradigms and data structures, vol. 1611. Spie, 1992, pp. 586–606.
- [25] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? the kitti vision benchmark suite,” in 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 3354–3361.
- [26] J. Liu, F. Shen, M. Wei, Y. Zhang, H. Zeng, J. Zhu, and C. Cai, “A large-scale benchmark for vehicle logo recognition,” in 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC). IEEE, 2019, pp. 479–483.
- [27] H. Wu, F. Shen, J. Zhu, H. Zeng, X. Zhu, and Z. Lei, “A sample-proxy dual triplet loss function for object re-identification,” IET Image Processing, vol. 16, no. 14, pp. 3781–3789, 2022.
- [28] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shape modeling.”
- [29] M. Magnusson, “The three-dimensional normal-distributions transform: an efficient representation for registration, surface analysis, and loop detection,” Ph.D. dissertation, Örebro universitet, 2009.
- [30] Q.-Y. Zhou, J. Park, and V. Koltun, “Open3d: A modern library for 3d data processing,” arXiv preprint arXiv:1801.09847, 2018.
- [31] O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” Advances in neural information processing systems, vol. 28, 2015.
- [32] Y. Aoki, H. Goforth, R. A. Srivatsan, and S. Lucey, “Pointnetlk: Robust & efficient point cloud registration using pointnet,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 7163–7172.
- [33] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 652–660.
- [34] H. Deng, T. Birdal, and S. Ilic, “Ppfnet: Global context aware local features for robust 3d point matching,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 195–205.
- [35] S. Huang, Z. Gojcic, M. Usvyatsov, A. Wieser, and K. Schindler, “Predator: Registration of 3d point clouds with low overlap,” in Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, 2021, pp. 4267–4276.
- [36] Y. Wang and J. M. Solomon, “Prnet: Self-supervised learning for partial-to-partial registration,” Advances in neural information processing systems, vol. 32, 2019.
- [37] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
- [38] X. Fu, F. Shen, X. Du, and Z. Li, “Bag of tricks for “vision meet alage” object detection challenge,” in 2022 6th International Conference on Universal Village (UV). IEEE, 2022, pp. 1–4.
- [39] R. Xu, F. Shen, H. Wu, J. Zhu, and H. Zeng, “Dual modal meta metric learning for attribute-image person re-identification,” in 2021 IEEE International Conference on Networking, Sensing and Control (ICNSC), vol. 1. IEEE, 2021, pp. 1–6.
- [40] Y. Xie, F. Shen, J. Zhu, and H. Zeng, “Viewpoint robust knowledge distillation for accelerating vehicle re-identification,” EURASIP Journal on Advances in Signal Processing, vol. 2021, pp. 1–13, 2021.
- [41] C. Qiao, F. Shen, X. Wang, R. Wang, F. Cao, S. Zhao, and C. Li, “A novel multi-frequency coordinated module for sar ship detection,” in 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2022, pp. 804–811.
- [42] W. Weng, W. Ling, F. Lin, J. Ren, and F. Shen, “A novel cross frequency-domain interaction learning for aerial oriented object detection,” in Chinese Conference on Pattern Recognition and Computer Vision (PRCV). Springer, 2023.
- [43] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
- [44] Z. J. Yew and G. H. Lee, “Rpm-net: Robust point matching using learned features,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 824–11 833.
- [45] R. B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (fpfh) for 3d registration,” in 2009 IEEE international conference on robotics and automation. IEEE, 2009, pp. 3212–3217.
- [46] J. Sun, Z. Shen, Y. Wang, H. Bao, and X. Zhou, “Loftr: Detector-free local feature matching with transformers,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 8922–8931.
- [47] T. Papadopoulo and M. I. Lourakis, “Estimating the jacobian of the singular value decomposition: Theory and applications,” in Computer Vision-ECCV 2000: 6th European Conference on Computer Vision Dublin, Ireland, June 26–July 1, 2000 Proceedings, Part I. Springer, 2000, pp. 554–570.
- [48] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
- [49] R. Rantoson, H. Nouira, N. Anwer, and C. Mehdi-Souzani, “Novel automated methods for coarse and fine registrations of point clouds in high precision metrology,” The International Journal of Advanced Manufacturing Technology, vol. 81, pp. 795–810, 2015.
- [50] S. Rusinkiewicz and M. Levoy, “Efficient variants of the icp algorithm,” in Proceedings third international conference on 3-D digital imaging and modeling. IEEE, 2001, pp. 145–152.
- [51] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295–2329, 2017.
- [52] L. Nataraj, A. Sarkar, and B. S. Manjunath, “Adding gaussian noise to “denoise” jpeg for detecting image resizing,” in 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE, 2009, pp. 1493–1496.
- [53] N. R. Goodman, “Statistical analysis based on a certain multivariate complex gaussian distribution (an introduction),” The Annals of mathematical statistics, vol. 34, no. 1, pp. 152–177, 1963.
- [54] Y. Liu, L. De Dominicis, B. Wei, L. Chen, and R. R. Martin, “Regularization based iterative point match weighting for accurate rigid transformation estimation,” IEEE transactions on visualization and computer graphics, vol. 21, no. 9, pp. 1058–1071, 2015.
- [55] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016.
- [56] A. Delaigle and P. Hall, “Defining probability density for a distribution of random functions,” The Annals of Statistics, pp. 1171–1193, 2010.
- [57] G. K. Tam, Z.-Q. Cheng, Y.-K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X.-F. Sun, and P. L. Rosin, “Registration of 3d point clouds and meshes: A survey from rigid to nonrigid,” IEEE transactions on visualization and computer graphics, vol. 19, no. 7, pp. 1199–1217, 2012.
- [58] N. J. Mitra, N. Gelfand, H. Pottmann, and L. Guibas, “Registration of point cloud data from a geometric optimization perspective,” in Proceedings of the 2004 Eurographics/ACM SIGGRAPH symposium on Geometry processing, 2004, pp. 22–31.
- [59] H. Hong and B. H. Lee, “Probabilistic normal distributions transform representation for accurate 3d point cloud registration,” in 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, 2017, pp. 3333–3338.
- [60] G. K. Tam, Z.-Q. Cheng, Y.-K. Lai, F. C. Langbein, Y. Liu, D. Marshall, R. R. Martin, X.-F. Sun, and P. L. Rosin, “Registration of 3d point clouds and meshes: A survey from rigid to nonrigid,” IEEE transactions on visualization and computer graphics, vol. 19, no. 7, pp. 1199–1217, 2012.