Revisiting Lesion Tracking in 3D Total Body Photography
Abstract
Melanoma is the most deadly form of skin cancer. Tracking the evolution of nevi and detecting new lesions across the body is essential for the early detection of melanoma. Despite prior work on longitudinal tracking of skin lesions in 3D total body photography, several challenges remain, including 1) low accuracy in finding correct lesion pairs across scans, 2) sensitivity to noisy lesion detection, and 3) the lack of large-scale datasets with numerous annotated lesion pairs. We propose a framework that takes as input a pair of 3D textured meshes, matches lesions in the context of total body photography, and identifies unmatchable lesions. We start by computing correspondence maps bringing the source and target meshes to a template mesh. Using these maps to define source/target signals over the template domain, we construct a flow field aligning the mapped signals. The initial correspondence maps are then refined by advecting forward/backward along the vector field. Finally, lesion assignment is performed using the refined correspondence maps. We propose the first large-scale dataset for skin lesion tracking with 25K lesion pairs across 198 subjects. The proposed method achieves a success rate of 89.9% (at the 10 mm criterion) for all pairs of annotated lesions and a matching accuracy of 98.2% for subjects with more than 200 lesions.
Keywords: Total body photography · Skin lesion longitudinal tracking · 3D correspondence
1 Introduction
Melanoma is the most deadly form of skin cancer. Tracking the evolution of existing nevi and detecting new lesions across the body is essential for the early detection of melanoma [2]. Manual evaluation of skin lesions by a dermatologist is considered the standard of care. However, in patients with numerous skin lesions, this task becomes challenging and is prone to human error.
Total body photography (TBP) captures the entire body of a patient using 2D images [20] and/or a 3D mesh [45, 17]. As such, TBP can be effective for monitoring the evolution of lesions [10, 21, 24, 19, 44, 43]. According to a systematic review, individuals with a large number (>100) of common nevi are typically included in the target population for TBP [21].
Several prior works have proposed the use of 3D meshes for longitudinal tracking of skin lesions [8, 65, 3, 22]. However, they continue to face significant challenges. First, existing methods produce low-accuracy lesion correspondences, owing to factors such as the limited positional precision of the lesion representation, regions with non-isometric deformation, and inconsistent texture across scans. The methods proposed in [65, 3, 22] represent lesions at the resolution of the mesh vertices, making them heavily dependent on the mesh resolution. In extreme cases, adjacent lesions may be mapped to the same vertex, reducing tracking accuracy. In addition, the approaches of Zhao et al. [65] and Ahmedt-Aristizabal et al. [3] rely on a coarse correspondence map between the source and target meshes to identify corresponding lesion pairs. Thus, even with an accurate representation of the lesions themselves, correspondences are necessarily imprecise. Huang et al. [22] improve correspondence accuracy by incorporating both shape and texture information. However, their method is sensitive to inconsistent texture across scans, caused by imperfect scanning (e.g., misalignment between images) or changes in clothing and hairiness.
Independently, noise in lesion detection is unavoidable in practice [65], including both false positives (detections that are not actual lesions) and false negatives (lesions that go undetected). Therefore, matching methods need not only to identify correspondences between inlier lesions (i.e., lesions that are successfully detected in both scans) but also to provide the corresponding location on the target/source for lesions that cannot be matched. This additional location information allows physicians to verify whether an unmatchable lesion represents a new growth or is a false positive resulting from detection noise.
The public dataset introduced by Zhao et al. [65] represents a pioneering effort in lesion tracking using 3D meshes. However, the size of the dataset is limited, comprising only 10 subjects. Furthermore, on average only 20 lesion pairs are annotated per subject. The sparse annotation of skin lesions makes the lesion-tracking evaluation far from representative of real-world scenarios.
We propose a framework to match lesions in the context of TBP using 3D textured meshes while providing locations for unmatchable lesions. To achieve this, we compute accurate correspondence maps relying on the signal represented by the TBP images, in addition to the geometry of the meshes themselves. Using the geometry of the meshes, we first compute coarse correspondence maps taking the source and target meshes to a template mesh. Then, using the lesion/texture signals, we solve for a flow field on the template mesh which is used to refine the correspondence maps. Finally, lesion assignment is performed using the refined correspondence maps. We also extend the annotations on the 3DBodyTex dataset [48] to a dataset of 25K lesion pairs for skin lesion tracking. To the best of our knowledge, we are the first to release a skin lesion tracking dataset at this scale.
Overall, we make three main contributions:
- We propose a novel framework for lesion tracking that automatically matches inlier lesions while providing locations for unmatchable lesions in the context of TBP using 3D textured meshes.
- We extend the 3DBodyTex dataset by annotating 25K lesion pairs over 198 subjects for skin lesion tracking in 3D TBP.
- We validate that the proposed framework outperforms state-of-the-art methods in both matching accuracy and the accuracy of the correspondence maps. The framework is also more robust to 1) inconsistency between source and target texture, 2) non-isometric deformations, and 3) errors in lesion detection.
2 Related work
2.1 Shape correspondence for humans
Shape correspondence between non-rigid surfaces represented as triangle meshes has been an active research topic in computer vision and computer graphics [58, 47, 11]. The shape correspondence problem for triangle meshes is finding a set of corresponding points between two meshes. For human shapes, priors of the human body are commonly applied, such as near-isometric deformation and local rigidity for limbs [66, 62, 6, 4].
Template-based
A line of research relies on a template model, such as SMPL [33], for establishing correspondences across shapes [23, 18, 7]. These methods usually rely on an initial estimation of the pose (body joint positions and orientation) mapping the template to the input. Groueix et al. [18] propose to deform a template mesh to various body poses and shapes with auto-encoder frameworks. Bhatnagar et al. [7] use self-supervised learning to register scans of humans to a common 3D human model.
Canonical embedding
Another family of shape correspondence methods maps vertices into a pose-invariant feature space, where correspondences between the input and template geometries are more easily established [40, 32, 36, 13]. In this category, several shape descriptors have been proposed, from traditional hand-crafted descriptors [55, 50, 38] to deep-learning-based descriptors [63, 52, 37, 26]. Furthermore, functional maps [40] are commonly used for robust regularization.
However, for matching skin lesions, correspondence maps relying on geometry alone are not sufficiently accurate. As a result, using a coarse correspondence map may fail to pair up lesions, particularly when the subject undergoes non-isometric deformation between scans and numerous lesions lie in close vicinity. We propose leveraging additional signals on the mesh to refine the correspondence map for more accurate matching.
2.2 Graph matching
Given a set of lesions detected in a mesh, we can construct a graph in which a node corresponds to a single lesion and an edge is a connection between a pair of lesions. The node attribute is the position of a lesion, and the edge attribute is the geodesic distance between a pair of lesions. Then, the problem of matching source and target lesions can be formulated as a (partial) graph-matching problem that maximizes the node-to-node and edge-to-edge affinity of the two graphs. Two-graph matching can be modeled as the quadratic assignment problem (QAP), and is known to be NP-hard.
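To make the quadratic objective concrete, the sketch below builds a Lawler-style affinity matrix for two small lesion graphs and evaluates $\mathrm{vec}(X)^{\top} K\,\mathrm{vec}(X)$ for candidate assignments. It is only an illustration of the formulation, not any specific prior method; the Gaussian kernels, bandwidths, and the Euclidean stand-in for geodesic distance are assumptions.

```python
import numpy as np

def affinity_matrix(P, Q, sigma_node=10.0, sigma_edge=5.0):
    """Lawler-style QAP affinity for two lesion graphs.

    P: (n, 3) source lesion positions, Q: (m, 3) target lesion positions.
    Diagonal entries hold node-to-node affinities (lesion positions); off-diagonal
    entries hold edge-to-edge affinities (pairwise distances, Euclidean here as a
    stand-in for geodesic). K[i*m+a, j*m+b] scores matching (i->a) together with (j->b).
    """
    n, m = len(P), len(Q)
    Dp = np.linalg.norm(P[:, None] - P[None, :], axis=-1)  # source edge lengths
    Dq = np.linalg.norm(Q[:, None] - Q[None, :], axis=-1)  # target edge lengths
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for a in range(m):
            for j in range(n):
                for b in range(m):
                    if i == j and a == b:   # node affinity on the diagonal
                        K[i * m + a, i * m + a] = np.exp(
                            -np.sum((P[i] - Q[a]) ** 2) / sigma_node ** 2)
                    elif i != j and a != b:  # edge affinity off the diagonal
                        K[i * m + a, j * m + b] = np.exp(
                            -((Dp[i, j] - Dq[a, b]) ** 2) / sigma_edge ** 2)
    return K

def qap_score(K, X):
    """Quadratic objective vec(X)^T K vec(X) for a binary assignment X (n x m)."""
    x = X.reshape(-1).astype(float)
    return x @ K @ x

# Toy example: three source lesions and a slightly perturbed copy as the target.
P = np.array([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 40.0, 0.0]])
Q = P + np.random.default_rng(0).normal(scale=1.0, size=P.shape)
K = affinity_matrix(P, Q)
X_correct = np.eye(3)
X_swapped = X_correct[[1, 0, 2]]  # swap the first two lesions
print(qap_score(K, X_correct) > qap_score(K, X_swapped))  # expected: True
```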
Traditional approaches [30, 15, 9] aim to match graphs by maximizing quadratic objective functions. While effective for simple cases, these methods often struggle with complex graph structures. Some proposed approaches utilize relaxation strategies in graph matching to mitigate the hard combinatorial problem [29, 25, 51]. More recent methods explore hyper-graph matching represented by a tensor to encode the higher-order information, offering increased expressiveness but at the cost of higher computational complexity [14, 41, 64].
Learning-based methods have been shown to improve matching accuracy [61, 46, 31]. Wang et al. [61] present a QAP network that solves the matching problem as a vertex classification task over the association graph, whose nodes represent candidate correspondences between the two graphs and whose edge weights are induced by the affinity matrix built from the two graphs. Liao et al. [31] convert the problem of hypergraph matching into a node classification problem and develop a hypergraph neural network. Despite their promise, these methods often require exhaustively annotated datasets for training and struggle to generalize across different domains or datasets. Meanwhile, to address real-world scenarios involving noisy or incomplete data, some techniques have been developed for partial graph matching and soft assignment [59, 16].
Despite the advances in graph matching, the solution itself (unlike a correspondence map) does not provide location information for unmatched lesions, an issue that is critical in clinical settings. Starting from coarse correspondence maps, we propose a framework that performs lesion assignment and maps lesions onto a template mesh, allowing clinical verification.
2.3 Skin lesion tracking in total body photography
Several works have been proposed for tackling the skin lesion tracking problem over the full body [27, 28, 53, 54]. Korotkov et al. [28] designed a TBP system with 21 high-resolution cameras and a turntable to track lesions. However, their method assumes the patient poses are the same across visits and relies heavily on calibrated camera poses for finding lesion correspondences. The work of Korotkov et al. [27] improved upon the earlier system but does not extend to multiple visits. Strzelecki et al. [54] developed a TBP system with a single digital camera rotating and moving vertically around the subject; the camera captures 32 images used for lesion detection and lesion matching [53] based on feature matching and triangulation. However, their method fails when the skin surface is inclined at an angle deviating significantly from 90° with respect to the camera viewing direction. Overall, these methods are limited to controlled environments and are sensitive to camera perspectives and changes in body pose [22].
Recently, the concept of finding lesion correspondence using a 3D representation of the human body has been explored in [8, 65, 3, 22]. Zhao et al. [65], Bogo et al. [8], and Ahmedt-Aristizabal et al. [3] proposed to use a template mesh and rely on anatomical positions defined on the template mesh for lesion matching. However, the template-based correspondence map alone is insufficient to accurately match lesions. Furthermore, these methods have limited positional precision for lesions since they "snap" the location of a lesion to the nearest vertex. Huang et al. [22] proposed to improve lesion correspondence localization using landmark-based correspondences refined by texture information. However, their method requires manual annotation of landmarks, is limited to the resolution of the mesh, and is sensitive to inconsistent texture between scans due to scanning artifacts. In this paper, we represent lesion positions using barycentric coordinates within a triangle, making our approach less sensitive to mesh resolution and allowing us to achieve higher matching accuracy. Additionally, we utilize lesion signals that are agnostic to the texture of the mesh, assuming lesions are provided separately.
3 Methods
3.1 Problem Formulation
Given a template mesh $\mathcal{T}$, source and target meshes $\mathcal{M}_s$ and $\mathcal{M}_t$, and two sets of detected lesions $P = \{p_i\}_{i=1}^{n} \subset \mathcal{M}_s$ and $Q = \{q_j\}_{j=1}^{m} \subset \mathcal{M}_t$, we would like to find correspondence maps $\Phi_s : \mathcal{M}_s \to \mathcal{T}$ and $\Phi_t : \mathcal{M}_t \to \mathcal{T}$, and a matching matrix $X$, minimizing an energy consisting of two terms:

$E(\Phi_s, \Phi_t, X) = \sum_{i=1}^{n} \sum_{j=1}^{m} X_{ij}\, d_{\mathcal{T}}\!\left( \Phi_s(p_i), \Phi_t(q_j) \right) + E_{\mathrm{ds}}(X)$, (1)

where $d_{\mathcal{T}}$ is the geodesic distance on $\mathcal{T}$ and $E_{\mathrm{ds}}$ penalizes the deviation of $X$ from being doubly stochastic. That is, we want a pair of corresponding source and target lesions to be close to each other while encouraging the correspondence matrix to be doubly stochastic. By adding a dummy lesion to each of the lesion sets ($p_{n+1}$ and $q_{m+1}$, so that $X \in \{0,1\}^{(n+1)\times(m+1)}$), we allow the matching to account for unmatchable lesions. Specifically, assuming a lesion in the source scan can be matched to at most one lesion in the target scan and vice versa, we also enforce:

$\sum_{j=1}^{m+1} X_{ij} = 1, \quad i = 1, \dots, n$, (2)

$\sum_{i=1}^{n+1} X_{ij} = 1, \quad j = 1, \dots, m$, (3)

(where $X_{i,m+1} = 1$ indicates a match between the $i$-th lesion on the source and the target's dummy lesion). Since the dummy lesions can be matched multiple times, the sums involving the dummy lesions (row $n+1$ and column $m+1$) can be greater than 1.
3.2 Coarse Correspondence Map
3.2.1 Template-based coarse correspondence
We start by constructing a coarse correspondence map between the source/target and a template mesh. We follow the approach of Marin et al. [35] to acquire a deformed template mesh registered to the source/target mesh, which allows us to construct the correspondence map. Given an input mesh, they propose a localized neural fields network in which a neural field is dedicated to a local region of the body shape to predict the vertex displacements of the template mesh (the SMPL [33] model). The parameters of the neural field are then refined using Iterative Closest Point [5] through backpropagation. The updated neural field is then used to register the SMPL model to the input, followed by a refinement that optimizes the Chamfer distance. We denote this method SMPL-NICP. Let $\mathcal{T}$ be the template mesh; for an input mesh $\mathcal{M}$, the output of SMPL-NICP is a deformed template mesh $\mathcal{T}_{\mathcal{M}}$ (i.e., with the same topology as the original template) whose geometry is registered to that of $\mathcal{M}$.
We define a correspondence map $\Phi_{\mathcal{M}} : \mathcal{M} \to \mathcal{T}$ by first deforming the template mesh to $\mathcal{T}_{\mathcal{M}}$ and then finding, for every point $p \in \mathcal{M}$, the nearest surface point on the deformed template (in particular, $\Phi_s = \Phi_{\mathcal{M}_s}$ and $\Phi_t = \Phi_{\mathcal{M}_t}$). Similarly, we construct a correspondence map $\Psi_{\mathcal{M}} : \mathcal{T} \to \mathcal{M}$ by finding the closest surface point on the input mesh for each point on the deformed template mesh. We note that $\Phi_{\mathcal{M}}$ and $\Psi_{\mathcal{M}}$ are not inverses of each other, since two different points on the source/target can have the same closest point on the deformed template. Fig. 1 illustrates the template mesh in (a), the source and target meshes in (b), the correspondence maps from the source/target to the template in (c), and the source and target lesions mapped to the template mesh in (d).
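This construction can be sketched with off-the-shelf closest-point queries. The snippet below is a minimal illustration (not the authors' implementation), assuming trimesh is available and using two icospheres as stand-ins for an input scan and its deformed template; variable names such as `canonical_template` are illustrative.

```python
import numpy as np
import trimesh

def vertex_to_surface_map(src_mesh: trimesh.Trimesh, dst_mesh: trimesh.Trimesh):
    """Map every vertex of src_mesh to its closest surface point on dst_mesh,
    encoded as a triangle index on dst_mesh plus barycentric coordinates."""
    points, _dist, tri_ids = trimesh.proximity.closest_point(dst_mesh, src_mesh.vertices)
    bary = trimesh.triangles.points_to_barycentric(dst_mesh.triangles[tri_ids], points)
    return tri_ids, bary

# Toy stand-ins for an input scan and the deformed template returned by registration.
scan = trimesh.creation.icosphere(subdivisions=3, radius=1.0)
deformed_template = trimesh.creation.icosphere(subdivisions=2, radius=1.02)
tri_ids, bary = vertex_to_surface_map(scan, deformed_template)
# Because the deformed template shares the canonical template's topology, the same
# (triangle id, barycentric) pairs locate the mapped points on the canonical template:
# mapped = (canonical_template.triangles[tri_ids] * bary[:, :, None]).sum(axis=1)
print(tri_ids.shape, bary.shape)  # one encoded surface point per scan vertex
```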
3.2.2 Surface point correspondence map
We allow lesions to be located anywhere on the surface of the mesh (i.e., not restricted to vertex positions, as in previous work [65, 3, 22]). To this end, we use barycentric coordinates to encode a point $p$ on the mesh: $p = (t, \alpha_1, \alpha_2, \alpha_3)$, where $t$ indexes the triangle containing $p$ and $(\alpha_1, \alpha_2, \alpha_3)$ are the barycentric coordinates of $p$ inside triangle $t$ ($\alpha_k \ge 0$ and $\sum_k \alpha_k = 1$).
Using this encoding, we represent mesh correspondences as vertex-to-surface-point maps, taking the vertices of one mesh to points on the second mesh: a map $\Phi : \mathcal{M} \to \mathcal{N}$ is represented by a matrix with one row per vertex of $\mathcal{M}$, where the $i$-th row encodes the image of vertex $v_i$ on $\mathcal{N}$ in the barycentric encoding (a triangle index and three barycentric coordinates).
Given a vertex-to-surface-point correspondence map $\Phi : \mathcal{M} \to \mathcal{N}$, we use the barycentric encoding to extend it to a surface-point-to-surface-point correspondence map $\bar{\Phi} : \mathcal{M} \to \mathcal{N}$. Concretely, to find the correspondence on the mesh $\mathcal{N}$ of a surface point $p \in \mathcal{M}$, we map the three vertices of the triangle containing $p$ onto $\mathcal{N}$, interpolate the positions of the imaged vertices using the barycentric coordinates of $p$, and then find the point on $\mathcal{N}$ closest to the interpolant. Formally, for a point $p = (t, \alpha_1, \alpha_2, \alpha_3)$ with $v_{t_1}, v_{t_2}, v_{t_3}$ the vertices of the triangle $t$ containing $p$, we have:

$\bar{\Phi}(p) = \mathrm{Proj}_{\mathcal{N}}\!\left( \sum_{k=1}^{3} \alpha_k\, \Phi(v_{t_k}) \right)$, (4)

where $\mathrm{Proj}_{\mathcal{N}}$ denotes the closest-point projection onto $\mathcal{N}$.
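Continuing in the same spirit, Eqn. 4 can be sketched by interpolating the images of the containing triangle's vertices and re-projecting onto the second mesh. This is again a minimal illustration with trimesh, not the authors' code; the toy meshes and variable names are assumptions.

```python
import numpy as np
import trimesh

def extend_map(src_mesh, dst_mesh, vert_tri_ids, vert_bary, p_tri_id, p_bary):
    """Surface-point-to-surface-point correspondence in the spirit of Eqn. 4:
    interpolate the images of the containing triangle's vertices with the
    barycentric coordinates of p, then project back onto dst_mesh."""
    corner_ids = src_mesh.faces[p_tri_id]                               # vertices of triangle t
    corner_images = (dst_mesh.triangles[vert_tri_ids[corner_ids]]
                     * vert_bary[corner_ids][:, :, None]).sum(axis=1)   # their images on dst_mesh
    interpolant = (np.asarray(p_bary)[:, None] * corner_images).sum(axis=0)
    closest, _, tri_id = trimesh.proximity.closest_point(dst_mesh, interpolant[None])
    return closest[0], tri_id[0]

# Toy usage, recomputing the vertex-to-surface-point map as in the previous sketch.
scan = trimesh.creation.icosphere(subdivisions=3, radius=1.0)
dst = trimesh.creation.icosphere(subdivisions=2, radius=1.02)
pts, _, v_tri = trimesh.proximity.closest_point(dst, scan.vertices)
v_bary = trimesh.triangles.points_to_barycentric(dst.triangles[v_tri], pts)
point, tri = extend_map(scan, dst, v_tri, v_bary, p_tri_id=0,
                        p_bary=[1 / 3, 1 / 3, 1 / 3])
print(point, tri)  # the image on dst of the centroid of scan's first triangle
```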
[Fig. 1: (a) template mesh; (b) source and target meshes; (c) correspondence maps from the source/target to the template; (d) source and target lesions mapped onto the template mesh.]
3.3 Vector-field-based Refinement
The template-based correspondence maps are coarse for two reasons. First, when fitting a template mesh to source/target scan, non-isometric deformation is present at locations near body joints and locations of soft tissues. Second, misalignment between the deformed template mesh and the input mesh occurs if the body pose of the input mesh is far from the canonical “T” pose. Since the coarse correspondence map relies on the nearest point on the registered template mesh to the query point, such a misalignment degrades the accuracy of the mapping. Consequently, a pair of corresponding points in the source and the target will not map to the same position on the template mesh. To refine the correspondence map, we transfer the texture and lesion signals of the source/target to the template mesh using the source/target-to-template correspondences. We then construct a vector field on the template mesh that aligns the transferred signals.
3.3.1 Signal construction on template mesh
Let $F$ be a signal on mesh $\mathcal{M}$; we transfer the signal to the template mesh using the template-to-input correspondence map $\Psi_{\mathcal{M}}$, defining a signal $\bar{F} = F \circ \Psi_{\mathcal{M}}$ on the template $\mathcal{T}$. We consider two types of input signals:
Texture signal
We construct a triplet of color signals on the template mesh using the red, green, and blue channels of the texture map acquired by the TBP.
Lesion signal
We construct lesion signals on the template mesh using lesion signals defined on the source/target meshes. The source/target lesion signals represent the likelihood of a surface point being a lesion. To create a lesion signal, we diffuse a sum of delta functions centered at the lesion positions and normalize the result by its maximum value across the surface, yielding a scalar-per-vertex signal.
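A minimal sketch of this diffusion step, assuming SciPy is available; for brevity it uses a uniform graph Laplacian and a single implicit diffusion step as a stand-in for geometry-aware diffusion on the surface, and the diffusion time is an illustrative choice.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def lesion_signal(n_vertices, faces, lesion_vertices, t=1.0):
    """Diffuse delta functions at lesion vertices and normalize to [0, 1].

    faces: (F, 3) triangle indices; lesion_vertices: indices of the vertices
    nearest to the detected lesions; t: diffusion time (illustrative value).
    """
    # Uniform graph Laplacian L = D - A built from the mesh edges.
    i = np.concatenate([faces[:, 0], faces[:, 1], faces[:, 2]])
    j = np.concatenate([faces[:, 1], faces[:, 2], faces[:, 0]])
    rows, cols = np.concatenate([i, j]), np.concatenate([j, i])
    A = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(n_vertices, n_vertices)).tocsr()
    A.data[:] = 1.0  # undirected, unweighted adjacency (duplicates collapsed)
    L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A

    # Sum of delta functions at the lesion positions.
    delta = np.zeros(n_vertices)
    delta[np.asarray(lesion_vertices)] = 1.0

    # One implicit heat step: (I + t L) u = delta, then normalize by the maximum.
    u = spla.spsolve((sp.identity(n_vertices) + t * L).tocsc(), delta)
    return u / u.max()

# Toy usage on a tetrahedron with a single "lesion" at vertex 0.
faces = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
print(np.round(lesion_signal(4, faces, lesion_vertices=[0]), 3))
```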
3.3.2 Surface optical flow
We are given a template mesh $\mathcal{T}$, source/target texture signals, and source/target lesion signals, all defined over $\mathcal{T}$. Our goal is to define a tangent vector field $\vec{V}$ on the template mesh such that advection along the field best aligns the source and target signals. To this end, we leverage the approach of Prada et al. [42], defining the flow field as the minimizer of the energy:
$E(\vec{V}) = E_{\mathrm{tex}}(\vec{V}) + \gamma\, E_{\mathrm{lesion}}(\vec{V}) + \alpha \int_{\mathcal{T}} \lVert \nabla \vec{V} \rVert^2\, dA + \beta \int_{\mathcal{T}} \lVert \vec{V} \rVert^2\, dA$, (5)
with the first and second terms ($E_{\mathrm{tex}}$ and $E_{\mathrm{lesion}}$) penalizing the failure of the vector field to explain the differences in the texture and lesion signals, respectively, the third term encouraging smoothness of the flow, and the fourth term regularizing the norm of the flow to respect the initial correspondence map. We follow the approach proposed by Prada et al., solving for the flow field hierarchically. Please refer to [42] for more details. Fig. 2 visualizes the source and target (a) texture and (b) lesion signals transferred to the template mesh. Fig. 2 (c) shows the vector field obtained by minimizing Eqn. 5.
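Because the surface formulation of Prada et al. [42] involves a tangent-vector-field discretization and a hierarchical solver, the sketch below only illustrates the structure of the energy in Eqn. 5 on a flat 2D grid: a data term, a smoothness term, and a small-magnitude regularizer, minimized by plain gradient descent. The flat-domain simplification, the single data term, and all weights are assumptions made for illustration.

```python
import numpy as np

def flow_field_2d(src, dst, alpha=0.5, beta=0.01, steps=1000, lr=0.1):
    """Estimate a 2D flow (u, v) aligning src to dst on a flat grid by minimizing
    sum((Ix*u + Iy*v + It)^2) + alpha*(|grad u|^2 + |grad v|^2) + beta*(u^2 + v^2),
    a flat-domain analogue of the data, smoothness, and magnitude terms in Eqn. 5."""
    Iy, Ix = np.gradient((src + dst) / 2.0)     # spatial derivatives
    It = dst - src                              # difference between the two signals
    u = np.zeros_like(src)
    v = np.zeros_like(src)

    def laplacian(f):                           # 5-point Laplacian with edge padding
        fp = np.pad(f, 1, mode="edge")
        return fp[:-2, 1:-1] + fp[2:, 1:-1] + fp[1:-1, :-2] + fp[1:-1, 2:] - 4.0 * f

    for _ in range(steps):                      # plain gradient descent on the energy
        residual = Ix * u + Iy * v + It
        u -= lr * (2 * residual * Ix - 2 * alpha * laplacian(u) + 2 * beta * u)
        v -= lr * (2 * residual * Iy - 2 * alpha * laplacian(v) + 2 * beta * v)
    return u, v

# Toy example: a Gaussian blob shifted one pixel to the right.
yy, xx = np.mgrid[0:32, 0:32]
src = np.exp(-((xx - 15) ** 2 + (yy - 16) ** 2) / 20.0)
dst = np.exp(-((xx - 16) ** 2 + (yy - 16) ** 2) / 20.0)
u, v = flow_field_2d(src, dst)
print(bool(u[16, 13] > 0))  # True: the flow points right, toward the shifted blob
```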
[Fig. 2: (a) texture and (b) lesion signals of the source and target transferred to the template mesh; (c) the vector field obtained by minimizing Eqn. 5.]
3.3.3 Update of correspondence map
With the vector field $\vec{V}$ defined on $\mathcal{T}$, we update the correspondence maps $\Phi_s$ and $\Phi_t$ by advecting the corresponded positions halfway forward and halfway backward along the vector field, respectively. Formally, we have:
$\Phi_s'(p) = \exp_{\Phi_s(p)}\!\left( \tfrac{1}{2}\, \vec{V}(\Phi_s(p)) \right)$ (6)
and
$\Phi_t'(q) = \exp_{\Phi_t(q)}\!\left( -\tfrac{1}{2}\, \vec{V}(\Phi_t(q)) \right)$ (7)
with $\exp_{x}$ the exponential map taking vectors in the tangent space at $x \in \mathcal{T}$ to positions on $\mathcal{T}$.
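On a triangle mesh, the halfway advection of Eqns. 6 and 7 can be approximated by small Euler steps in the vector direction followed by re-projection onto the surface, standing in for the exponential map. The sketch below assumes trimesh; the step count and toy sphere are illustrative.

```python
import numpy as np
import trimesh

def advect_on_surface(mesh: trimesh.Trimesh, points, vectors, scale=0.5, n_steps=5):
    """Move points along `vectors` by `scale`, re-projecting onto the mesh after
    each small Euler step (a simple stand-in for the exponential map)."""
    points = np.array(points, dtype=float)
    step = np.array(vectors, dtype=float) * (scale / n_steps)
    for _ in range(n_steps):
        points, _, _ = trimesh.proximity.closest_point(mesh, points + step)
    return points

# Toy example: advect a point halfway along a tangent direction on a unit sphere.
sphere = trimesh.creation.icosphere(subdivisions=3, radius=1.0)
p = np.array([[1.0, 0.0, 0.0]])
V = np.array([[0.0, 0.5, 0.0]])            # tangent vector at p
print(advect_on_surface(sphere, p, V))     # a surface point rotated toward +y
```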
3.4 Lesion Assignment
Given the refined source/target correspondence maps $\Phi_s'$ and $\Phi_t'$, we expect lesions $p_i \in P$ and $q_j \in Q$ to be in correspondence if the geodesic distance between $\Phi_s'(p_i)$ and $\Phi_t'(q_j)$ is small. Conversely, we expect $p_i$ (resp. $q_j$) to be unmatched if the geodesic distance from $\Phi_s'(p_i)$ to $\Phi_t'(q_j)$ is large for all $j$ (resp. from $\Phi_t'(q_j)$ to $\Phi_s'(p_i)$ for all $i$). We formalize these observations, expressing the assignment matrix $X$ as the minimizer of the energy:
$E(X) = \sum_{i=1}^{n} \sum_{j=1}^{m} X_{ij}\, d_{\mathcal{T}}\!\left( \Phi_s'(p_i), \Phi_t'(q_j) \right) + \tau \left( \sum_{i=1}^{n} X_{i,m+1} + \sum_{j=1}^{m} X_{n+1,j} \right)$, (8)
where $d_{\mathcal{T}}$ is the geodesic distance function on $\mathcal{T}$ and $\tau$ is a fixed cost for assigning a lesion to a dummy. The assignment problem can be solved with the Kuhn–Munkres algorithm [39]. We follow the implementation in Pygmtools [60] to compute the minimizer of Eqn. 8.
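A minimal sketch of the assignment step, using SciPy's Hungarian solver as a stand-in for the Pygmtools implementation referenced above; the dummy handling follows the standard cost-matrix augmentation, and the dummy cost value is an illustrative assumption (e.g., tied to the 10-mm criterion).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_lesions(D, dummy_cost=10.0):
    """Match source and target lesions given pairwise mapped distances D (n x m);
    lesions whose matches would be too costly are assigned to dummies instead."""
    n, m = D.shape
    BIG = 1e9  # forbids a lesion from occupying another lesion's dummy slot
    src_dummy = np.full((n, n), BIG)
    np.fill_diagonal(src_dummy, dummy_cost)        # source i may match its own dummy
    tgt_dummy = np.full((m, m), BIG)
    np.fill_diagonal(tgt_dummy, dummy_cost)        # target j may match its own dummy
    C = np.block([[D, src_dummy],
                  [tgt_dummy, np.zeros((m, n))]])  # unused dummies pair up for free
    rows, cols = linear_sum_assignment(C)
    return [(i, j) for i, j in zip(rows, cols) if i < n and j < m]

# Toy example: two close pairs and one source lesion with no counterpart.
D = np.array([[1.0, 40.0],
              [35.0, 2.0],
              [60.0, 70.0]])
print(assign_lesions(D))  # [(0, 0), (1, 1)]; source lesion 2 is left unmatched
```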
4 Evaluation
4.1 Dataset
We extend the 3DBodyTex dataset [48, 49] by annotating lesion correspondence in every pair of meshes. Following the suggestion from a medical expert reported in [57], the labeling process is inclusive of anything that could potentially be considered a skin lesion (e.g., including freckles). The dataset is labeled by four experienced annotators using the point list picking function in CloudCompare [1]. It takes an annotator approximately 15 minutes to label one subject with 100 lesion pairs in two poses (meshes). We excluded subjects for whom fewer than 10 lesions were found.
The average number of annotated lesions is 129.6 across 198 subjects, totaling 25,666 lesions. We define the density of the annotated lesions as the number of neighboring lesions within a geodesic distance of 100 mm; across all subjects, the average density is 8.5. In addition, the lesions are distributed as follows: 12,846 on the trunk, 3,761 on the upper right limb, 3,617 on the upper left limb, 2,139 on the lower right limb, 2,291 on the lower left limb, and 1,012 on the head. Overall, numerous skin lesion pairs are annotated with diversity in body shapes, sizes, poses, and anatomical variations. Therefore, the proposed dataset is suitable for evaluating skin lesion tracking in a setting that approximates real-world TBP scenarios. The distribution of the lesion annotations can be found in the supplement.
We define a “challenging-pose” subset that consists of 6 subjects in which one of the two poses is challenging. This subset is separated from the entire dataset and evaluated individually. Furthermore, since lesion tracking is most valuable for patients with numerous and dense lesions, we define another subset of 35 subjects as a “numerous-lesions” set in which subjects are annotated with more than 200 lesions.
4.2 Evaluation of Correspondence Map
To evaluate the quality of the established correspondence maps, we measure the geodesic distance between each pair of corresponding source and target lesions mapped onto the template mesh. We report 1) the average geodesic distance across all annotated lesion pairs ($d_{\mathrm{all}}$) and 2) the subject-wise geodesic distance ($d_{\mathrm{subj}}$), computed as the average geodesic distance of the annotated lesion pairs for an individual subject and then aggregated across all subjects. To interpret the geodesic distance between a pair of source and target lesions in a clinical application, a pair of lesions is considered successfully mapped if the distance between them is less than a threshold criterion. Using the threshold criterion of 10 mm (as in [22]), we measure the success rate for each subject as the percentage of correctly mapped source and target skin lesions over the total number of annotated skin lesion pairs. We report the subject-wise success rate computed on a pair of meshes (for one subject) and averaged across paired meshes.
We compare our method to two baseline methods that rely solely on geometry to provide correspondence maps, SMPL-NICP [35] and DiffusionNet [52]. We note that DiffusionNet [52] is used as a feature extractor to transform each vertex into a higher-order embedding. Therefore, shape correspondence from DiffusionNet [52] belongs to the canonical-embedding category of shape correspondence (described in §2.1), relying on functional maps to compute correspondences between shapes under the assumption of descriptor preservation over the underlying meshes. Table 1 shows the comparison of the established correspondence maps. For both $d_{\mathrm{all}}$ and $d_{\mathrm{subj}}$, the correspondence maps from our method are substantially more accurate, with smaller variance, than those of the baseline methods. As a result, the proposed method achieves a success rate of 89.9% at the 10-mm criterion, significantly higher than the baselines. We note that the success rate reported in [22] is only 57%, computed over 10 subjects totaling around 200 lesions within our dataset.
Fig. 3 (a) shows the distribution of the geodesic distance between all lesion pairs. Fig. 3 (b) shows the distribution of the subject-wise geodesic distance between lesion pairs. We observe that the proposed method effectively brings lesion pairs closer and also reduces the long tail of the distribution. To further investigate the improvement from flow field refinement, we compare correspondence maps established by the proposed method and by SMPL-NICP for an individual subject. Fig. 4 (a) shows the distribution of geodesic distances for all lesion pairs on the subject. In Fig. 4 (b), we observe that the flow field refinement is also effective in aligning lesion pairs that are originally far from each other (e.g., with a geodesic distance of more than 30 mm). Fig. 4 (c) visualizes the source and target meshes of the subject, illustrating the challenges of non-isometric deformation due to soft tissue and differing poses. Fig. 4 (d) visualizes the texture signals transferred to the template mesh. In Fig. 4 (e), we show the source and target lesions mapped onto the template mesh with (ours) and without (SMPL-NICP) flow field refinement. We observe that after the refinement, source and target lesions are mapped more closely together, facilitating the task of lesion assignment.
Table 1: Comparison of the established correspondence maps. Values in parentheses are standard deviations.

| Metric | DiffusionNet [52] | SMPL-NICP [35] | Ours |
|---|---|---|---|
| $d_{\mathrm{all}}$ (mm) | 31.6 (28.5) | 16.2 (12.2) | 4.9 (8.4) |
| $d_{\mathrm{subj}}$ (mm) | 32 (8.8) | 15.9 (4) | 4.8 (1.4) |
| Success rate (%) | 12 (7) | 30.9 (13.9) | 89.9 (4.7) |
[Fig. 3: (a) distribution of geodesic distances between all lesion pairs; (b) distribution of subject-wise geodesic distances between lesion pairs.]
[Fig. 4: (a) geodesic distance distribution for all lesion pairs of one subject; (b) alignment of initially distant lesion pairs; (c) source and target meshes; (d) texture signals on the template; (e) lesions mapped onto the template with (ours) and without (SMPL-NICP) flow field refinement.]
4.3 Evaluation of Lesion Assignment
We compare the proposed framework to the existing lesion matching methods of Zhao et al. [65] and Ahmedt-Aristizabal et al. [3]. We note that both methods essentially rely on the anatomical positions of the source and target lesions mapped onto the template mesh, while differing in two ways. First, Ahmedt-Aristizabal et al. [3] use linear assignment, whereas Zhao et al. [65] resort to quadratic assignment using the preservation of geodesic distance between lesion pairs. Second, Ahmedt-Aristizabal et al. [3] select LoopReg [7] for template registration, while Zhao et al. [65] select 3D-CODED [18]. To eliminate the effect of different registration methods, we re-implement their methods using SMPL-NICP [35] for template registration, as in our framework.
For each subject, we calculate the matching accuracy as the number of correctly matched lesions over the number of annotated lesion pairs. Table 2 shows the matching accuracy evaluated on different subsets of the dataset. The matching accuracy of the proposed method is the highest for the entire dataset and for the "numerous-lesions" subset. In particular, when evaluating only subjects within the "numerous-lesions" subset, the improvement from our method is more pronounced. In this subset, our method achieves a 98.2% matching accuracy over 35 subjects totaling 9,772 lesion pairs, compared to 92.2% and 96.0% for the methods of Zhao et al. [65] and Ahmedt-Aristizabal et al. [3], respectively. Therefore, the proposed method effectively pairs up skin lesions for subjects with numerous lesions, the population for which TBP offers the greatest benefit in skin cancer surveillance. The matching accuracy for individual subjects within the "numerous-lesions" subset is shown in the supplement. We remark that the matching accuracy reported by Zhao et al. [65] is 83%, measured on 10 subjects totaling around 200 lesion pairs. We also observe that the proposed method is more robust to differences in topology between source and target meshes; an example subject with topology changes between the two scans, along with the corresponding results, can be found in the supplement.
For the “challenging-pose” subset, the approach from Zhao et al. [65] gives the highest matching accuracy, followed by our method. We note that the template-based correspondence map fails when the poses are challenging, either unseen in the training dataset (in SMPL-NICP [35]) or far from the template “T” pose. In this case, comparing the methods from Zhao et al. [65] and Ahmedt-Aristizabal et al. [3], the preservation of the geodesic distance between pairs of lesions helps to improve the matching accuracy beyond linear assignment.
Table 2: Matching accuracy (%) on different subsets of the dataset. Values in parentheses are standard deviations.

| Subset | Zhao et al. [65] | Ahmedt-Aristizabal et al. [3] | Ours |
|---|---|---|---|
| entire | 95.9 (11.6) | 98.3 (3.6) | 98.9 (1.7) |
| numerous-lesions | 92.2 (16.9) | 96.0 (5.7) | 98.2 (1.5) |
| challenging-pose | 87.0 (8.8) | 72.2 (8.7) | 80.4 (9.6) |
4.4 Robustness to Noise in Lesion Detection
We evaluate the robustness of the proposed framework to noisy lesion detection by independently removing a given percentage of lesions from the source and target and then pairing up the remaining lesions. The evaluation is performed on the "numerous-lesions" subset. We compare the proposed framework to the approaches of Zhao et al. [65] and Ahmedt-Aristizabal et al. [3].
We compute precision, recall, and F1 scores between the source and target lesions for each subject and then report the average and standard deviation across subjects. The F1 score is the harmonic mean of precision and recall. Given a predicted matching matrix $X$ and a ground-truth matching matrix $X^{gt}$ over the real (non-dummy) lesions, with $\odot$ denoting the element-wise product, we have:
$\mathrm{precision} = \dfrac{\sum_{i,j} (X \odot X^{gt})_{ij}}{\sum_{i,j} X_{ij}}$, (9)

$\mathrm{recall} = \dfrac{\sum_{i,j} (X \odot X^{gt})_{ij}}{\sum_{i,j} X^{gt}_{ij}}$, (10)

$\mathrm{F1} = \dfrac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$. (11)
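A small sketch of Eqns. 9-11 for binary matching matrices restricted to real (non-dummy) lesions; variable names are illustrative.

```python
import numpy as np

def matching_scores(X_pred, X_gt):
    """Precision, recall, and F1 between predicted and ground-truth matches.

    X_pred, X_gt: binary (n, m) matrices over real (non-dummy) lesions only.
    """
    true_pos = float(np.sum(X_pred * X_gt))   # element-wise product (Eqns. 9-10)
    precision = true_pos / max(np.sum(X_pred), 1)
    recall = true_pos / max(np.sum(X_gt), 1)
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy example: two of three ground-truth matches recovered, plus one spurious match.
X_gt = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
X_pred = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1]])
print(matching_scores(X_pred, X_gt))  # (0.666..., 0.666..., 0.666...)
```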
Fig. 5 compares the precision, recall, and F1 scores of the three methods under different noise levels. Compared to the baseline methods, the proposed method is consistently more robust to errors in lesion detection for noise levels below 25%. We note that the recall for lesion detection reported by Zhao et al. [65] ranges from 78% to 96%, indicating that the noise range in our experiment is practical. Furthermore, while the preservation of geodesic distance helps lesion matching when the correspondence maps fail to align lesions (see §4.3), it is notoriously sensitive to noise, as corroborated by our findings.
[Fig. 5: precision, recall, and F1 scores of the three methods under different noise levels.]
4.5 Ablation Study
4.5.1 Initial correspondence map
Conceptually, the proposed flow field refinement can be applied to any shape correspondence method that maps the source and target meshes to a template mesh. However, the degree of improvement may differ, depending on the consistency of the source and target signals constructed on the template mesh. Therefore, we compare how the flow field refinement performs on different initial correspondence maps. We evaluate the average geodesic distance across all annotated lesion pairs ($d_{\mathrm{all}}$), using DiffusionNet [52] and SMPL-NICP [35] for the initial correspondence maps. The $d_{\mathrm{all}}$ for DiffusionNet and SMPL-NICP is 31.6 mm and 16.2 mm initially, and 27.4 mm and 4.9 mm after refinement, respectively. This shows that the proposed flow field refinement effectively improves the correspondence maps from both methods. Remarkably, the improvement is larger for the more precise initial correspondence map: 11.3 mm for SMPL-NICP compared to 4.2 mm for DiffusionNet. Since SMPL-NICP gives more consistent source and target signals to be aligned, in line with the assumptions of brightness constancy and small motion [34, 56], it benefits more from the flow field refinement.
4.5.2 Flow field refinement
Usage of signals
To investigate the effectiveness of different signals for refining the correspondence maps, we compare $d_{\mathrm{all}}$ using only the texture signal, only the lesion signal, and a combination of the two. On the "numerous-lesions" subset, using the texture signal alone gives 14.4 mm, while there is no significant difference between using only the lesion signal and using the combination, both giving 4.8 mm. Overall, the signals used in our method refine the correspondence maps while remaining robust to inconsistencies between source and target texture caused by scanning artifacts.
Usage of small magnitude regularization
To respect the initial correspondence map and to be robust to inconsistencies in the signals due to noise (e.g., undetected lesions or texture that does not agree between the source and target), we add a magnitude regularization term that encourages the vector field to be small. On the "numerous-lesions" subset, $d_{\mathrm{all}}$ is 15.84 mm without the regularization and 4.8 mm with it.
5 Discussion
5.1 Choice of Barycentric Coordinate Representation
The choice of the barycentric coordinate representation for lesions (and correspondence maps) addresses the inaccurate and resolution-dependent matching present in previous works [65, 3, 22]. As a comparison, on the entire dataset, the average matching accuracy across all subjects using the vertex representation is 82.9%, compared to 98.3% with our re-implementation of the approach from Ahmedt-Aristizabal et al. [3]. As expected, we observe a consistent decrease in matching accuracy as the lesion count increases due to the less precise representation. A detailed comparison of the average matching accuracy for different numbers of lesions per subject can be found in the supplement. In addition, the selected representation is robust to mesh resolution. Our evaluation is conducted on the low-resolution meshes from the 3DBodyTex dataset [48, 49] with 10K vertices on average, whereas both Zhao et al. [65] and Huang et al. [22] use high-resolution meshes with 300K vertices from the same dataset.
5.2 Computational cost
Our method achieves state-of-the-art matching accuracy without sacrificing efficiency. For each subject, our framework takes around 7 minutes, comprising the following components: 1) coarse correspondence map (4 minutes); 2) signal construction on the template mesh (20 seconds); 3) surface optical flow (150 seconds); 4) lesion assignment (1 second). In particular, for the "numerous-lesions" subset, our method performs significantly better than the approach of Ahmedt-Aristizabal et al. [3] with an acceptable computational overhead of 3 minutes. In contrast, the quadratic assignment used by Zhao et al. [65] is NP-hard.
5.3 Evaluation on Subjects with Dark Skin Tones
Our dataset includes a variety of skin tones, notably including 5 subjects with dark skin tones. We observe that the number of skin lesions is relatively small for these subjects, with 26.6 lesions on average. Our method successfully pairs up all the lesions on these 5 subjects. Example results can be found in the supplement.
5.4 Limitations
The proposed framework has several limitations. First, our method fails to map the source and target lesions onto the template mesh within the 10-mm criterion when a large non-isometric deformation exists. Fig. 6 (a) visualizes a subject with large non-isometric deformation around the chest and belly area. Fig. 6 (b) shows the source and target lesions mapped onto the template mesh. Although the proposed vector-field-based refinement brings corresponding lesions closer, the lesion pairs in the red box remain far from each other, with a geodesic distance of more than 50 mm. However, these lesions are still correctly paired up in our lesion assignment step. Second, the proposed framework fails for some challenging poses when the template mesh is incorrectly registered to the input mesh. For example, Fig. 6 (c) shows a case where template-based registration fails: the registered template mesh flips the left and right legs. As a result, our method cannot match lesions on those incorrectly registered body regions. However, such challenging poses may not be common in clinical settings, especially considering that patients must maintain the pose during the scan. Last, the proposed dataset exhibits limited annotations of distal extremities, such as lesions on fingers and toes. Due to their complex morphology, these anatomical regions are challenging for digital imaging. Consequently, the proposed framework may demonstrate reduced performance and reliability for lesions in these regions.
[Fig. 6: (a) a subject with large non-isometric deformation around the chest and belly; (b) source and target lesions mapped onto the template mesh; (c) a failure case of template registration with flipped legs.]
6 Conclusions
In this paper, we propose a framework to match lesions over the full body using 3D textured meshes while providing locations for unmatchable lesions. We propose a skin lesion tracking dataset with 25K lesion pairs over 198 subjects; to the best of our knowledge, this is the first dataset for skin lesion tracking in 3D TBP at this scale. We show that the proposed method effectively refines correspondence maps to align lesion pairs, achieving a success rate of 89.9% at the 10-mm criterion. The proposed framework achieves state-of-the-art matching accuracy, reaching 98.2% for the 35 subjects with more than 200 lesions on the body. Furthermore, our method is validated to be more robust to inconsistent texture between the source and target meshes and less sensitive to errors in lesion detection.
In the future, we would like to extend the framework to more than two TBP scans, where some false positives and false negatives may be resolved by evaluating the consistency of a lesion's life cycle [12]. Moreover, the method needs to be evaluated on longitudinal data spanning a longer duration, which may include more significant changes in skin condition.
6.0.1 Acknowledgments
The research was in part supported by the Intramural Research Program (IRP) of the NIH/NICHD, Phase I of NSF STTR grant 2127051, Phase II of NSF STTR grant 2335086, and Phase I NIH/NIBIB STTR grant R41EB032304. We thank Ryan Whittaker for the help with annotating skin lesion pairs.
References
- [1] CloudCompare (2023), http://www.cloudcompare.org/, GPL software
- [2] Abbasi, N.R., Shaw, H.M., Rigel, D.S., Friedman, R.J., McCarthy, W.H., Osman, I., Kopf, A.W., Polsky, D.: Early diagnosis of cutaneous melanoma: revisiting the abcd criteria. Jama 292(22), 2771–2776 (2004)
- [3] Ahmedt-Aristizabal, D., Nguyen, C., Tychsen-Smith, L., Stacey, A., Li, S., Pathikulangara, J., Petersson, L., Wang, D.: Monitoring of pigmented skin lesions using 3d whole body imaging. Computer Methods and Programs in Biomedicine 232, 107451 (2023)
- [4] Alldieck, T., Xu, H., Sminchisescu, C.: imghum: Implicit generative models of 3d human shape and articulated pose. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5461–5470 (2021)
- [5] Besl, P.J., McKay, N.D.: Method for registration of 3-d shapes. In: Sensor fusion IV: control paradigms and data structures. vol. 1611, pp. 586–606. Spie (1992)
- [6] Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Combining implicit function learning and parametric models for 3d human reconstruction. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16. pp. 311–329. Springer (2020)
- [7] Bhatnagar, B.L., Sminchisescu, C., Theobalt, C., Pons-Moll, G.: Loopreg: Self-supervised learning of implicit surface correspondences, pose and shape for 3d human mesh registration. Advances in Neural Information Processing Systems 33, 12909–12922 (2020)
- [8] Bogo, F., Romero, J., Peserico, E., Black, M.J.: Automated detection of new or evolving melanocytic lesions using a 3d body model. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 593–600. Springer (2014)
- [9] Cho, M., Lee, J., Lee, K.M.: Reweighted random walks for graph matching. In: Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part V 11. pp. 492–505. Springer (2010)
- [10] Deinlein, T., Michor, C., Hofmann-Wellenhof, R., Schmid-Zalaudek, K., Fink-Puches, R.: The importance of total-body photography and sequential digital dermatoscopy for monitoring patients at increased melanoma risk. JDDG: Journal der Deutschen Dermatologischen Gesellschaft 18(7), 692–697 (2020)
- [11] Deng, B., Yao, Y., Dyke, R.M., Zhang, J.: A survey of non-rigid 3d registration. In: Computer Graphics Forum. vol. 41, pp. 559–589. Wiley Online Library (2022)
- [12] Di Veroli, B., Lederman, R., Sosna, J., Joskowicz, L.: Graph-theoretic automatic lesion tracking and detection of patterns of lesion changes in longitudinal ct studies. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 106–115. Springer (2023)
- [13] Donati, N., Sharma, A., Ovsjanikov, M.: Deep geometric functional maps: Robust feature learning for shape correspondence. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8592–8601 (2020)
- [14] Duchenne, O., Bach, F., Kweon, I.S., Ponce, J.: A tensor-based algorithm for high-order graph matching. IEEE transactions on pattern analysis and machine intelligence 33(12), 2383–2395 (2011)
- [15] Enqvist, O., Josephson, K., Kahl, F.: Optimal correspondences from pairwise constraints. In: 2009 IEEE 12th international conference on computer vision. pp. 1295–1302. IEEE (2009)
- [16] Fu, K., Liu, S., Luo, X., Wang, M.: Robust point cloud registration framework based on deep graph matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8893–8902 (2021)
- [17] Grochulska, K., Betz-Stablein, B., Rutjes, C., Chiu, F.P.C., Menzies, S.W., Soyer, H.P., Janda, M.: The additive value of 3d total body imaging for sequential monitoring of skin lesions: a case series. Dermatology 238(1), 12–17 (2022)
- [18] Groueix, T., Fisher, M., Kim, V.G., Russell, B.C., Aubry, M.: 3d-coded: 3d correspondences by deep deformation. In: Proceedings of the european conference on computer vision (ECCV). pp. 230–246 (2018)
- [19] Guido, N., Hagstrom, E.L., Ibler, E., Carneiro, C., Orrell, K.A., Kelm, R.C., Rademaker, A.W., West, D.P., Nardone, B.: A novel total body digital photography smartphone application designed to detect and monitor skin lesions: a pilot study. Journal of Surgical Dermatology 6(2), 14–18 (2021)
- [20] Halpern, A.C.: Total body skin imaging as an aid to melanoma detection. In: Seminars in cutaneous medicine and surgery. vol. 22, pp. 2–8 (2003)
- [21] Hornung, A., Steeb, T., Wessely, A., Brinker, T.J., Breakell, T., Erdmann, M., Berking, C., Heppt, M.V.: The value of total body photography for the early detection of melanoma: a systematic review. International Journal of Environmental Research and Public Health 18(4), 1726 (2021)
- [22] Huang, W.L., Tashayyod, D., Kang, J., Gandjbakhche, A., Kazhdan, M., Armand, M.: Skin lesion correspondence localization in total body photography. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 260–269. Springer (2023)
- [23] Huang, X., Yang, H., Vouga, E., Huang, Q.: Dense correspondences between human bodies via learning transformation synchronization on graphs. Advances in Neural Information Processing Systems 33, 17489–17501 (2020)
- [24] Ji-Xu, A., Dinnes, J., Matin, R.: Total body photography for the diagnosis of cutaneous melanoma in adults: a systematic review and meta-analysis. British Journal of Dermatology 185(2), 302–312 (2021)
- [25] Kang, U., Hebert, M., Park, S.: Fast and scalable approximate spectral graph matching for correspondence problems. Information Sciences 220, 306–318 (2013)
- [26] Kim, H., Kim, J., Kam, J., Park, J., Lee, S.: Deep virtual markers for articulated 3d shapes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 11615–11625 (2021)
- [27] Korotkov, K., Quintana, J., Campos, R., Jesús-Silva, A., Iglesias, P., Puig, S., Malvehy, J., Garcia, R.: An improved skin lesion matching scheme in total body photography. IEEE journal of biomedical and health informatics 23(2), 586–598 (2018)
- [28] Korotkov, K., Quintana, J., Puig, S., Malvehy, J., Garcia, R.: A new total body scanning system for automatic change detection in multiple pigmented skin lesions. IEEE transactions on medical imaging 34(1), 317–338 (2014)
- [29] Leordeanu, M., Hebert, M.: A spectral technique for correspondence problems using pairwise constraints. In: Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1. vol. 2, pp. 1482–1489. IEEE (2005)
- [30] Leordeanu, M., Hebert, M., Sukthankar, R.: An integer projected fixed point method for graph matching and map inference. Advances in neural information processing systems 22 (2009)
- [31] Liao, X., Xu, Y., Ling, H.: Hypergraph neural networks for hypergraph matching. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1266–1275 (2021)
- [32] Litany, O., Remez, T., Rodola, E., Bronstein, A., Bronstein, M.: Deep functional maps: Structured prediction for dense shape correspondence. In: Proceedings of the IEEE international conference on computer vision. pp. 5659–5667 (2017)
- [33] Loper, M., Mahmood, N., Romero, J., Pons-Moll, G., Black, M.J.: Smpl: A skinned multi-person linear model. In: Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pp. 851–866 (2023)
- [34] Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: IJCAI’81: 7th international joint conference on Artificial intelligence. vol. 2, pp. 674–679 (1981)
- [35] Marin, R., Corona, E., Pons-Moll, G.: Nicp: Neural icp for 3d human registration at scale. In: European Conference on Computer Vision (2024)
- [36] Marin, R., Melzi, S., Rodola, E., Castellani, U.: Farm: Functional automatic registration method for 3d human bodies. In: Computer Graphics Forum. vol. 39, pp. 160–173. Wiley Online Library (2020)
- [37] Mitchel, T.W., Kim, V.G., Kazhdan, M.: Field convolutions for surface cnns. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10001–10011 (2021)
- [38] Mitchel, T.W., Rusinkiewicz, S., Chirikjian, G.S., Kazhdan, M.: Echo: Extended convolution histogram of orientations for local surface description. In: Computer Graphics Forum. vol. 40, pp. 180–194. Wiley Online Library (2021)
- [39] Munkres, J.: Algorithms for the assignment and transportation problems. Journal of the society for industrial and applied mathematics 5(1), 32–38 (1957)
- [40] Ovsjanikov, M., Ben-Chen, M., Solomon, J., Butscher, A., Guibas, L.: Functional maps: a flexible representation of maps between shapes. ACM Transactions on Graphics (ToG) 31(4), 1–11 (2012)
- [41] Park, S., Park, S.K., Hebert, M.: Fast and scalable approximate spectral matching for higher order graph matching. IEEE transactions on pattern analysis and machine intelligence 36(3), 479–492 (2013)
- [42] Prada, F., Kazhdan, M., Chuang, M., Collet, A., Hoppe, H.: Motion graphs for unstructured textured meshes. ACM Transactions on Graphics (TOG) 35(4), 1–14 (2016)
- [43] Primiero, C.A., Rezze, G.G., Caffery, L.J., Carrera, C., Podlipnik, S., Espinosa, N., Puig, S., Janda, M., Soyer, H.P., Malvehy, J.: A narrative review: opportunities and challenges in artificial intelligence skin image analyses using total body photography. Journal of Investigative Dermatology (2024)
- [44] Primiero, C.A., McInerney-Leo, A.M., Betz-Stablein, B., Whiteman, D.C., Gordon, L., Caffery, L., Aitken, J.F., Eakin, E., Osborne, S., Gray, L., et al.: Evaluation of the efficacy of 3d total-body photography with sequential digital dermoscopy in a high-risk melanoma cohort: protocol for a randomised controlled trial. BMJ open 9(11), e032969 (2019)
- [45] Rayner, J.E., Laino, A.M., Nufer, K.L., Adams, L., Raphael, A.P., Menzies, S.W., Soyer, H.P.: Clinical perspective of 3d total body photography for early detection and screening of melanoma. Frontiers in Medicine 5, 152 (2018)
- [46] Rolínek, M., Swoboda, P., Zietlow, D., Paulus, A., Musil, V., Martius, G.: Deep graph matching via blackbox differentiation of combinatorial solvers. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII 16. pp. 407–424. Springer (2020)
- [47] Sahillioğlu, Y.: Recent advances in shape correspondence. The Visual Computer 36(8), 1705–1721 (2020)
- [48] Saint, A., Ahmed, E., Cherenkova, K., Gusev, G., Aouada, D., Ottersten, B., et al.: 3dbodytex: Textured 3d body dataset. In: 2018 International Conference on 3D Vision (3DV). pp. 495–504. IEEE (2018)
- [49] Saint, A., Cherenkova, K., Gusev, G., Aouada, D., Ottersten, B., et al.: Bodyfitr: robust automatic 3d human body fitting. In: 2019 IEEE International Conference on Image Processing (ICIP). pp. 484–488. IEEE (2019)
- [50] Salti, S., Tombari, F., Di Stefano, L.: Shot: Unique signatures of histograms for surface and texture description. Computer Vision and Image Understanding 125, 251–264 (2014)
- [51] Schellewald, C., Schnörr, C.: Probabilistic subgraph matching based on convex relaxation. In: International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition. pp. 171–186. Springer (2005)
- [52] Sharp, N., Attaiki, S., Crane, K., Ovsjanikov, M.: Diffusionnet: Discretization agnostic learning on surfaces. ACM Transactions on Graphics (TOG) 41(3), 1–16 (2022)
- [53] Strakowska, M., Kociołek, M.: Skin lesion matching algorithm for application in full body imaging systems. In: International Conference on Information Technologies in Biomedicine. pp. 222–233. Springer (2022)
- [54] Strzelecki, M.H., Strąkowska, M., Kozłowski, M., Urbańczyk, T., Wielowieyska-Szybińska, D., Kociołek, M.: Skin lesion detection algorithms in whole body images. Sensors 21(19), 6639 (2021)
- [55] Sun, J., Ovsjanikov, M., Guibas, L.: A concise and provably informative multi-scale signature based on heat diffusion. In: Computer graphics forum. vol. 28, pp. 1383–1392. Wiley Online Library (2009)
- [56] Tomasi, C., Kanade, T.: Detection and tracking of point features. Tech. Rep. CMU-CS-91-132, Carnegie Mellon University (1991)
- [57] Useini, V., Tanadini-Lang, S., Lohmeyer, Q., Meboldt, M., Andratschke, N., Braun, R.P., Barranco García, J.: Automatized self-supervised learning for skin lesion screening. Scientific Reports 14(1), 12697 (2024)
- [58] Van Kaick, O., Zhang, H., Hamarneh, G., Cohen-Or, D.: A survey on shape correspondence. In: Computer graphics forum. vol. 30, pp. 1681–1707. Wiley Online Library (2011)
- [59] Wang, R., Guo, Z., Jiang, S., Yang, X., Yan, J.: Deep learning of partial graph matching via differentiable top-k. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 6272–6281 (2023)
- [60] Wang, R., Guo, Z., Pan, W., Ma, J., Zhang, Y., Yang, N., Liu, Q., Wei, L., Zhang, H., Liu, C., Jiang, Z., Yang, X., Yan, J.: Pygmtools: A python graph matching toolkit. Journal of Machine Learning Research 25(33), 1–7 (2024), https://jmlr.org/papers/v25/23-0572.html
- [61] Wang, R., Yan, J., Yang, X.: Neural graph matching network: Learning lawler’s quadratic assignment problem with extension to hypergraph and multiple-graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 44(9), 5261–5279 (2021)
- [62] Wang, S., Geiger, A., Tang, S.: Locally aware piecewise transformation fields for 3d human mesh registration. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7639–7648 (2021)
- [63] Wei, L., Huang, Q., Ceylan, D., Vouga, E., Li, H.: Dense human body correspondences using convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1544–1553 (2016)
- [64] Yan, J., Zhang, C., Zha, H., Liu, W., Yang, X., Chu, S.M.: Discrete hyper-graph matching. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1520–1528 (2015)
- [65] Zhao, M., Kawahara, J., Abhishek, K., Shamanian, S., Hamarneh, G.: Skin3d: Detection and longitudinal tracking of pigmented skin lesions in 3d total-body textured meshes. Medical Image Analysis 77, 102329 (2022)
- [66] Zuffi, S., Black, M.J.: The stitched puppet: A graphical model of 3d human shape and pose. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3537–3546 (2015)