A Connected Component Labeling Algorithm for Implicitly-Defined Domains
Abstract
A connected component labeling algorithm is developed for implicitly-defined domains specified by multivariate polynomials. The algorithm operates by recursively subdividing the constraint domain into hyperrectangular subcells until the topology thereon is sufficiently simple; in particular, we devise a topology test using properties of Bernstein polynomials. In many cases the algorithm produces a certificate guaranteeing its correctness, i.e., two points yield the same label if and only if they are path-connected. To robustly handle various kinds of edge cases, the algorithm may assign identical labels to distinct components, but only when they are exactly or nearly touching, relative to a user-controlled length scale. A variety of numerical experiments assess the effectiveness of the overall approach, including statistical analyses on randomly generated multi-component geometry in 2D and 3D, as well as specific examples involving cusps, self-intersections, junctions, and other kinds of singularities.
keywords:
connected components, path connectedness, implicitly-defined domains, level set methods, Bernstein polynomials, semi-algebraic sets65D99 (primary), 65D18, 14P10
1 Introduction
In this paper, we develop a connected component labeling algorithm for implicitly-defined domains specified by multivariate polynomials. In particular, we consider domains of the form , where is a given polynomial and is a given bounded -dimensional hyperrectangle, and consider the problem of implementing a labeling function such that if and only if the two points are path-connected.
Our interest in this labeling problem stems from the author’s prior work on quadrature algorithms for multi-component implicitly-defined domains [18]. For a general, piecewise-smooth integrand function , these algorithms output a quadrature scheme of the form and have the following key property: if is a connected component of and is sufficiently smooth on , then represents a high-order accurate quadrature scheme on . In other words, to build a quadrature scheme on , one may simply discard the quadrature points outside of , leaving the weights of the remaining points unmodified. Consequently, one may think of the bulk quadrature scheme over as an agglomeration of smaller quadrature schemes over the individual connected components of . The algorithms developed in [18], however, do not automatically determine this grouping; one of the aims of the present work is to design a simple and efficient algorithm for this purpose. Besides this application, connected component analysis arises in a wide variety of different settings, including in computer-aided geometric design, shape modeling and pattern matching, and in building topological feature descriptors of physical data sets [12].
A number of different approaches could be taken to solve the connected component labeling problem:
-
•
One approach is to apply the workhorse tools of real algebraic geometry, such as the cylindrical algebraic decomposition (CAD) algorithm pioneered by Collins [6, 5], or by computing roadmaps [2, 3]. In exact or arbitrary-precision arithmetic, CAD and roadmap methods can be used to identify and analyze the connected components of general, arbitrarily complex semi-algebraic sets (of which is one particular instance). However, these methods can be computationally expensive; moreover, their implementation in fixed-precision arithmetic requires considerable care, even for relatively simple geometry.
-
•
A second approach is to apply the computational methods of Morse theory, Reeb graphs, and contour trees [11, 12]. Roughly speaking, these methods compute the topology of the set of level sets of via the location of its extrema and saddle points, combined with gradient path tracing algorithms. These methods become increasingly more intricate to implement as the dimension increases.
-
•
A third approach might be to isolate individual components of by finding a polynomial which can separate them (followed by a similar procedure for ). For example, if we could find a polynomial such that is empty and for two sample points , it would follow that the zero level set of forms a kind of divider that separates the two connected components associated with and . To find this polynomial, one could use polynomial positivstellensatz or sums-of-squares methods [13, 14] to compute, if possible, a polynomial such that: (i) on ; (ii) ; and (iii) . Part (i), (ignoring conditions (ii) and (iii) and various other subtleties,) can be solved efficiently via linear or semidefinite programming techniques, see, e.g., [14, 1]; however, conditions (ii) and (iii) end up creating a quadratically-constrained quadratic program for the coefficients of , whose associated feasibility condition involves a challenging non-convex constraint. These ideas, among various others for computing separating polynomials, were investigated as part of the present work; however, no sufficiently elegant or efficient approach was found.
-
•
Finally, a fourth and considerably simpler approach is to subdivide the constraint domain into small enough cells for which the topology of is easy to determine. Depending on the tessellation method, this approach can be quite effective (fast, simple to implement, high resolution power), but in some cases the subdivision process may need to be stopped, potentially leading to some components being incorrectly merged or broken.
The preceding discussion serves to highlight that, depending on the final application, one must ultimately decide on a suitable compromise between: (i) absolute correctness/certifiability of the connected component labeling algorithm, potentially needing costly arbitrary-precision computation (as exemplified by CAD-based methods); (ii) robustness, e.g., in suitably handling fixed-precision roundoff errors or uncertainty in the input polynomial coefficients; (iii) computational complexity of constructing the labeling function as well as its subsequent evaluation; and (iv) implementation simplicity. The target application of this work concerns the development of numerical methods for multi-physics simulations involving highly complex geometry such as liquid atomization dynamics; in particular, the input polynomial represents the fluid interface and is computed dynamically. This application necessitates prioritizing aspects (ii), (iii), and (iv), which, in turn, necessitates a design choice for how to handle edge cases such as nearly touching components or interfacial self-intersections. Our choice here is to require that components are never broken, i.e., if are path-connected in and is the output of the labeling algorithm, then is an absolute certainty. It follows that the labeling algorithm must be permitted to merge or “glue” together distinct components in the (presumably rare) cases of uncertain topology. The algorithm developed in this work glues components only if they are exactly or nearly touching, relative to a user-defined gap threshold, typically orders of magnitude smaller than the length scale of .
With this design objective in mind, the connected component labeling algorithm developed here follows the fourth approach mentioned earlier: it operates by recursively subdividing the constraint hyperrectangle into hyperrectangular subcells until the topology of thereon is sufficiently simple or has reached a smallest-permitted size. This is achieved through a combination of quadtrees (in 2D) or octrees (in 3D), along with simple yet effective topology tests using properties of Bernstein polynomials. Once the tree is constructed, its leaf cells form the vertices of a graph whose edges are determined by sign attainability tests on the faces of adjacent leaf cells. The connected components of this graph ultimately decide the overall labeling: equals the label of the leaf cell containing , the latter found by fast tree traversal methods. We show in this paper that the overall approach is efficient in handling various degrees of geometric complexity. In particular, if the recursive subdivision process terminates without reaching the user-defined smallest subcell size, then the labeling is provably correct. Applied to a fixed class of randomly generated geometry, the percentage of cases in which the algorithm is uncertain about the topology decreases exponentially as a function of the maximum tree depth; moreover, only a small fraction of these cases yield artificially-glued components.
An outline for the rest of the paper is as follows. In § 2, we establish some preliminaries on Bernstein polynomials and define the simply connected topology test. § 3 presents the main connected component labeling algorithm, followed by a discussion of its features. Numerical tests are presented in § 4, analyzing the algorithm’s success and failure rates on randomly generated multi-component geometry in 2D and 3D as well as its behavior on particular examples exhibiting cusps, self-intersections, and other kinds of singularities. Concluding remarks with a brief discussion of possible extensions are given in § 5.
2 Preliminaries
A key part of the connected component labeling algorithm is an effective means for evaluating the range of attainable values of a polynomial. One of the most accurate and straightforward methods for doing so is through the use of the Bernstein basis [8, 15, 10, 4]. These methods make use of a convex hull property and guarantee that a polynomial’s value, at any point in its rectangular reference domain, is no larger (or smaller) than its maximum (or minimum) coefficient in the Bernstein basis. Especially useful in the present setting, these bounds become monotonically more accurate under the stable operations of Bernstein subdivision. In addition, owing to the hyperrectangular constraint domain and its subdivision into hyperrectangular subcells, it is particularly natural to use a tensor-product basis. Accordingly, a tensor-product Bernstein basis is adopted throughout this work.
In dimensions, let be a tensor-product Bernstein polynomial of degree , defined relative to the hyperrectangle ; takes the form
where
are the one-dimensional Bernstein basis functions of degree relative to the interval , denotes the th Bernstein coefficient of for a multi-index , and denotes the product of basis functions. This representation gives the Bernstein coefficients of relative to the given hyperrectangle ; subdivision refers to transforming these coefficients relative to a subset hyperrectangle and can be stably computed using the de Casteljau algorithm [7].
In the connected component labeling algorithm, we need a method to assess whether the topology of and is simple enough. This is achieved through the following three concepts.111Definition 2.1 has, in various guises, appeared before in the literature; the remaining two concepts (Definitions 2.2 and 2.3, along with their implications) are perhaps new, and target specifically the objectives of the present work.
Definition 2.1.
A Bernstein polynomial is called coefficient monotone increasing on (in the direction ) if there exists a coordinate direction such that for all , ; here, denotes the standard basis vector in the direction of the th coordinate. The polynomial is called coefficient monotone if or is coefficient monotone increasing.
An equivalent viewpoint comes from a convenient property of differentiation in the Bernstein basis: up to a multiplicative factor, the coefficients of the derivative are formed via first-order divided differences of the input polynomial’s coefficients. Therefore, is coefficient monotone if and only if the Bernstein coefficients of its derivative in the corresponding direction are either all non-negative or all non-positive. It follows that coefficient monotone polynomials are monotone in the conventional sense and therefore attain their extrema on the corresponding faces of the hyperrectangle. Tests of Bernstein coefficient monotonicity have a variety of applications, including, e.g., in polynomial range evaluation [16]. In the present application, an important property is that, for a coefficient monotone polynomial , for any , can be path-connected to one of the corresponding faces of the hyperrectangle without leaving the region , and analogously for points in .
We next establish a sufficiently accurate means to determine whether a polynomial attains positive or negative values on a given hyperrectangle. The following definition sets the requirements on an algorithm implementing this task.
Definition 2.2.
Given a Bernstein polynomial on the hyperrectangle , a -dimensional sign evaluation algorithm outputs the possible signs of on and must satisfy the following properties:
-
(i)
;
-
(ii)
must be a superset of the ground-truth, i.e., , where is the standard sign operator,
-
(iii)
if for all , then ; if for all , then ;
-
(iv)
if is coefficient monotone in the direction , then must hold, where and denote the restriction of to the corresponding lower and upper faces of , and is a -dimensional sign evaluation algorithm;222Here, denotes the union operation which includes zero when appropriate: if , then ; if , then .
-
(v)
finally, a zero-dimensional sign evaluation algorithm must satisfy .
Note that a method which exactly computes the minimum and maximum value of on would trivially yield a sign evaluation algorithm. However, computing the extrema of arbitrary polynomials is a non-trivial and computationally expensive task, perhaps as difficult as the connected component labeling problem itself. The conditions of a sign evaluation algorithm are weaker than an exact computation, thereby allowing for simpler and more efficient algorithms; nevertheless, some degree of certitude is required. In particular, condition (iii) requires an implementation be at least as accurate as conventional Bernstein range evaluation; this ensures it inherits at least the same level of accuracy under the action of subdivision. Condition (iv) says that if happens to be coefficient monotone, the sign evaluation must be at least as accurate as what can be determined solely from the corresponding faces where ’s extrema are known to occur. One particularly important property is that if a sign evaluation algorithm outputs either or , then we may conclude, with absolute certainty, is uniformly signed throughout . An example of a simple sign evaluation algorithm is given in the next section.
Finally, a notion of whether the topology of is “simple enough” is recursively defined as follows. This definition implicitly depends on the presence of a suitable sign evaluation algorithm , so it is assumed that one has been implemented and fixed ahead of time.
Definition 2.3.
A -dimensional Bernstein polynomial is called simply connected on if at least one of the following three conditions hold: (i) ; or (ii) ; or (iii) there is coordinate direction such that (a) is coefficient monotone in the direction and (b) is simply connected on the corresponding upper and lower faces of . By definition, a zero-dimensional Bernstein polynomial is simply connected.
A simply connected polynomial guarantees a simple topology of both and , as follows. If conditions (i) or (ii) in Definition 2.3 are met, then is non-zero throughout the hyperrectangle and so exactly one of is empty and the other the whole hyperrectangle. Otherwise, there is a coordinate axis such that any point in can be path-connected to one of the corresponding faces. These faces are themselves simply connected; inductively it follows that if (resp., ) is nonempty, then (resp., ) has exactly one connected component.333In fact, the sets or are simply connected in the conventional topological sense, but it is unnecessary to prove this fact in the present work.
To prove the correctness of the connected component labeling algorithm, it is useful to establish the following two properties of simply connected polynomials:
-
•
The first property is that if is simply connected on a hyperrectangle , then is simply connected on every face of . This can be shown via induction on the dimension; we illustrate here with a three-dimensional example. Suppose is simply connected on a rectangular 3D prism. If , then it is uniformly signed and thus trivially simply connected on every face of the prism. Otherwise, is coefficient monotone in the up direction, say, such that the restriction of to the bottom and top faces is simply connected; in particular, by induction, we have that is simply connected on every edge of the bottom and top face. Now, on each vertical face of the prism (those faces excluding the bottom and top), the restriction of is coefficient monotone in the up direction, and, as was just concluded, is simply connected on the corresponding lower and upper edges. Therefore, is simply connected on every vertical face. The preceding arguments can be extended to any dimension, and the one-dimensional base case of the inductive argument trivially holds.
-
•
The second property is that if is simply connected on a hyperrectangle , then a sign evaluation algorithm yields exact results, i.e., the output of is precisely . Similar to before, this can be shown via induction. Suppose is simply connected on . If , then is with absolute certainty uniformly signed and is exact. Otherwise, if is coefficient monotone on in the up direction, say, then is at least as accurate as applied to the restriction of on the lower and upper faces of ; by induction, the latter calculation is exact, and therefore is, too.
Combining these two properties, we observe that if is simply connected on a hyperrectangle , then a sign evaluation algorithm yields exact results on every face of .
3 Connected Component Labeling Algorithm
With these preliminaries established, we are now in a position to describe the connected component labeling algorithm. First, we describe a sufficiently-accurate Bernstein polynomial range evaluation algorithm, which approximately (and in various cases, exactly) evaluates the range of on a given hyperrectangle . Our particular implementation is given in Algorithm 1 and has the following properties:
-
•
The output is an interval such that always holds.
-
•
The algorithm is at least as accurate as conventional Bernstein range evaluation, i.e., , the latter interval bounding the minimum and maximum value of the input’s Bernstein coefficients.
-
•
For the simple cases of linear, quadratic, or cubic polynomials in dimensions, it is a particularly simple and efficient task to exactly evaluate the range of ; this is implemented on line 2 and is an easy tweak benefiting overall accuracy.
-
•
If happens to be coefficient monotone on , then the range evaluation algorithm is as least as accurate as what can be determined by evaluating the range on the corresponding faces of where the extrema of is known to occur (line 8). Further, if happens to be coefficient monotone in multiple directions, then the tightest possible range is returned (lines 9–10).
The main purpose of the range evaluation algorithm is to implement a sign evaluation algorithm. In particular, we define
In other words, is defined by the possible signs of as computed by the range evaluation algorithm. By construction, it satisfies all the required properties set out in Definition 2.2. In particular, it is exact for low-degree one-dimensional polynomials, whenever , as well as various other cases. Using the sign evaluation algorithm, it is also straightforward to implement an algorithm that tests whether is simply connected on a given hyperrectangle ; our implementation carries out the algorithmic process naturally suggested by Definition 2.3.
With these ingredients, we can now describe the connected component labeling algorithm. Given a Bernstein polynomial and a hyperrectangular constraint domain , we first test whether is simply connected on . If it is, then and are either empty or have exactly one component, in which case the labeling problem is trivial. Otherwise, we recursively subdivide into smaller and smaller subcells until either is simply connected on the subcell or a maximum recursion depth is met. In this work, we have opted for a quadtree/octree-style subdivision wherein the subcells maintain the aspect ratio of , though other subdivision algorithms are certainly possible. Upon conclusion of the subdivision process, we have a grid in which most (and in many cases, all) leaf cells have simple topology. In the second phase of the algorithm, two graphs are constructed, each representing the connectivity of the cells overlapping and . For example, if applied to a shared face of the grid yields and (resp., ), then an edge is inserted into (resp., ). These edges reflect the fact that there is likely (and in most cases, definitely) a path connecting and . Ultimately, it is the connected components of and which define the possible labels of the overall algorithm; computing this labeling can be done with negligible cost via efficient disjoint-set/merge-find data structures. In essence, these graphs establish the globally topology of , whereas each leaf cell of the tree handles the local topology. Finally, to determine which component an arbitrary point belongs, we apply standard (and fast) tree traversal algorithms to find a leaf cell containing , and then adopt its corresponding label from (if ) or (if ).
Algorithm 2 and Algorithm 3 sketch the implementation of the connected component labeling algorithm. A few complementary remarks are in order:
-
•
In Algorithm 2, whenever the restriction of to a subcell or its face is required, the corresponding Bernstein coefficients can be efficiently and accurately computed via the de Casteljau algorithm.
-
•
On line 7, specifies the user-defined maximum recursion depth. For example, if the constraint domain is the unit cube , the parameter corresponds to the smallest-possible subcell having width . The effect of on the overall labeling is studied in detail in the next section.
-
•
Algorithm 2 only assigns labels to cells that are path-connected to other cells; as such, if a cell contains a part of , say, that has not been path-connected to any other cell overlapping with , then that cell does not (yet) have a label defined. Our main motivation for this mainly concerns the scenarios for which the smallest-possible cell size is reached whereon the topology of is still uncertain: in these cases, we do not want to unnecessarily create a new label when thereon may actually be empty. In a sense, if a sign evaluation algorithm is unable to certify the presence of , then doing so will be up to the subsequent evaluation of ; indeed, Algorithm 3 creates a label only when a point is given which proves the nonemptiness of in that cell.
-
•
By caching the results of the simply connected tests on the leaf cells, as computed by the first phase of Algorithm 2, various steps in the second and last phase can be accelerated. For example, it is unnecessary to compute on a shared face if it is already known that is uniformly signed on one of the corresponding cells. On a related note, if a path connection already exists between and , it is unnecessary to execute lines 18–23 because these lines would not alter that connectivity; thus, some faces can be skipped during the second phase of the algorithm.
-
•
Our particular implementation of the subdivision tree represents each node by a bit-field of width 16 bits. One bit specifies whether the node is a leaf or non-leaf node; the remaining 15 bits of a non-leaf node points to the children of that node; the remaining 15 bits of a leaf node store the type of the leaf cell (2 bits, flagging whether it is uniformly negative, uniformly positive, mixed sign and simply connected, or not simply connected) and 6 bits each to cache the and labels created by Algorithms 2 and 3. This implies an upper bound of components for each phase, ample for our purposes. In addition, our implementation of Algorithm 2 bypasses the explicit creation of the graphs ; instead, an efficient disjoint-set/merge-find data structure is used to represent the edge formations and subsequent label creation and lookup.
-
•
Finally, our implementation also makes use of a fuzzy threshold to robustly handle roundoff error in fixed-precision arithmetic. A tolerance is incorporated into the sign evaluation algorithm as follows: if , then iff , iff , and in all other cases. This tolerance helps to prevent inconsistencies arising from roundoff error, such as what might occur when the zero level set grazes the boundary of a cell. Here, the tolerance is chosen to scale relative to the Bernstein coefficients of on ; in particular, we have set , where is machine epsilon (approximately in double-precision arithmetic).
To conclude this section, we prove the correctness of the connected component labeling algorithm in the case the subdivision process finishes without hitting the maximum-imposed recursion depth. In other words, every leaf cell of the tree satisfies the simply connected property. In the prior section we saw that the sign evaluation algorithm is exact in this case, and so two adjacent cells and are -connected if and only if is nonempty. It follows that if are path connected, then a corresponding path would go through zero or more faces overlapping with , all of which must have been assigned the same label by Algorithm 2. Conversely, if for two points the evaluation of Algorithm 3 yields , then there must be a sequence of leaf cells with and such that, for each pair, is nonempty. Therefore, there is a sequence of points , and, because each has exactly one component, there exists a path connecting , and so and are path connected. The same arguments apply analogously to the negative region, and it is trivial to see that whenever , since the negative and positive regions do not have overlapping label identifiers. In summary, when all the leaf cells satisfy the simply connected property, the output of Algorithm 3 is exact in the sense that if and only if and are path connected. In fact, we will observe in the next section that in many instances, the labeling algorithm continues to be exact even when the maximum recursion depth is reached.
4 Numerical Experiments
In this section, we assess the effectiveness of the connected component labeling algorithm on randomly generated geometry in 2D and 3D as well as on particular examples exhibiting cusps, self-intersections, and other kinds of singularities.
4.1 Randomly generated geometry
We consider here a class of randomly generated polynomials whose corresponding implicitly defined geometry exhibits a variety of characteristics such as high-curvature pieces and almost-touching components. The random polynomials are defined through the orthonormal Legendre polynomials, as follows. Let , denote the first four univariate Legendre polynomials relative to the interval , i.e.,
In dimensions, define a tensor-product degree polynomial as follows:
(1) |
where is a given set of coefficients, one for each multi-index , and is a given (fixed) parameter such that . With this set up, the class of randomly generated geometry is defined by polynomials of the form (1) such that the coefficients are randomly and independently drawn from the uniform distribution on . Depending on the value of , the factor controls the smoothness of the polynomial by damping (or magnifying) the higher-order oscillatory modes of the Legendre basis. In particular, we have set , empirically chosen to give a reasonable spread between relatively mild to more complex geometry involving multiple components of high degree curvature; examples are given in the next set of figures.
As discussed in the prior section, the connected component labeling algorithm is exact whenever the subdivision process finishes without reaching the maximum recursion depth . Consequently, we expect that as increases, the percentage of cases in which the algorithm is certifiably correct will also increase. To quantify this, we generate a fixed number of random polynomials of the form (1), convert them to the Bernstein basis, and invoke Algorithm 2 on each instance on the reference domain . Three possible outcomes are tallied by the following quantities:
-
•
Let denote the percentage of cases in which line 8 is not executed, i.e., every leaf cell is simply connected so that the labeling is certifiably exact.
-
•
Let denote the percentage of cases in which line 8 is executed (i.e., the algorithm is uncertain about the topology), yet the labeling produced is exact.
-
•
Let denote the percentage of remaining cases, i.e., those in which two or more components have been artificially-glued.
Note that the last two scenarios require a notion of the ground truth so as to determine whether the labeling is exact or not; to that end, a reference solution for each of the randomly generated cases is computed by applying the base algorithm with unlimited recursion depth.444In every computed instance, the ground truth solution terminated in finite time.


Fig. 1 plots , , and as a function .555The data shown in Fig. 1 was collected from ten million randomly generated cases in 2D and one million random examples in 3D, more than enough to stabilize the results. On this class of randomly generated geometry, it is rare for the topology to be fully resolved with just one or two levels of subdivision (e.g., is less than 10% when ); however, after a few more levels of subdivision, increases to around 99% when in 2D, and around 90% when in 3D. In general, we observe that as increases, the number of cases in which the labeling algorithm is certifiably correct increases exponentially. Furthermore, note that decays to zero at about the same exponential speed as , but is a significantly small fraction thereof; as such, even when the algorithm reaches the maximum recursion depth and is consequently uncertain about the topology, the labeling continues to be exact in more cases than not. We also note that the failure rate is somewhat similar in 2D and 3D, but the rate of certitude () in 3D is smaller than it is in the 2D setting. This can be attributed to a number of aspects, chief among which the 3D geometry is substantially more complex. Another major reason concerns the Bernstein polynomial range evaluation, which is generally less accurate in 3D versus 2D; consequently, a few more levels of subdivision are needed in order to pass the simply connected polynomial tests, and intuitively one may expect higher dimensions to require additional refinement power.



Figs. 2, 3 and 4 provide some 2D examples demonstrating the above three scenarios. Each example illustrates the quadtree and the colors have a one-to-one correspondence with computed labeling; shaded cells correspond to those whereon is not simply connected, per the requirements of Definition 2.3. Fig. 2 shows some examples in which the algorithm is certain about the topology, i.e., every cell is simply connected; observe that, as expected, progressively more complex geometry can be resolved as further subdivision steps are permitted. Fig. 3 demonstrates cases in which the maximum recursion depth is reached (hence some cells are shaded), yet the labeling is still exact; these instances arise whenever a shaded cell contains at most one component each of and , but for some reason the simply connected test did not hold. Often, this occurs when a saddle point or extrema is nearby the zero level set of . Meanwhile, Fig. 4 demonstrates cases in which the subdivision is, in a sense, forced to stop early, leaving some components glued together. This scenario occurs whenever: (i) a shaded cell contains multiple components; and/or (ii) a shaded cell is -connected or -connected to a neighbor even when no path exists between them. Situation (ii) arises whenever the sign evaluation algorithm applied to the shared face is inexact in the sense that it predicts all possible signs, yet in actuality is uniformly signed on the face.666In 2D, on the randomly generated degree 3 polynomials under consideration, scenario (ii) never occurs because our particular range evaluation algorithm Algorithm 1 on 1D edges is exact.

Similar behavior is obtained in 3D, though it is much less straightforward to visualize. Fig. 5 illustrates some examples. Note that in some cases, two cells with different labels can be touching, demonstrating that the sign evaluation algorithm was successful in disconnecting them.
Intuitively, we expect the algorithm to glue components only when they are nearly touching. Indeed, the examples in Fig. 4 illustrate how the associated gaps are generally less than the size of the smallest cell, sometimes much smaller. To quantify this, we define an error metric which assigns a “cost” to gluing multiple components together. Roughly speaking, the cost is defined by the sum of the gaps between the agglomerated components. More precisely, given a set and two points , we define the cost of connecting and by the shortest path connecting them, as measured by the arclength outside of :
where the infimum is taken over all paths connecting and such that the part of exterior to has measurable and finite length. The cost of gluing multiple connected components is then defined as
For example, the cost of gluing two connected components is the smallest straight-line gap between them. As another example, the cost of gluing the 1D sets , , and is , being the sum of the two smaller gaps.


This metric is used to analyze the failure modes of the connected component labeling algorithm, as follows. For now, suppose is fixed. For each of the above-generated random polynomials , we invoke Algorithm 2. Using the reference solution, we determine which components have been glued together and compute the associated cost;777A separate algorithm has been developed to compute the gaps between the set of connected components of . The algorithm uses a recursive subdivision approach combined with a kind of adaptive point sampling. Its output is an interval bounding the exact gap value and can be made as tight as necessary; in particular, the gap calculations being presented here are sufficiently accurate for the analysis shown in Fig. 6. if no components were glued, the example is excluded from further analysis; if there are multiple sets of glued components, the cost of each set is measured and the maximal such cost is used. The output is the “error” for each example making up the tally defined earlier; note that measures gaps of the input geometry, but its value depends on which components have been glued, which in turn depends on . We expect to scale with the smallest cell size; in the present setting, for the reference domain , the smallest cells have a diagonal length of . We therefore examine their ratio and define the overall error for polynomial by . Aggregated over all such examples, we then compute the minimum and maximum value of as well as the median, first, and third quartiles. Fig. 6 displays the results corresponding to the set of 2D and 3D randomly generated polynomials. Once is sufficiently large so as to resolve the typical geometry (and for Bernstein subdivision to become sufficiently accurate), we observe that the gap between glued components is typically 25%–50% of the smallest cell. Meanwhile, the upper dashed curves of in Fig. 6 correspond to the rare cases in which two or more components are glued across a small chain of neighboring leaf cells. Notwithstanding this critical analysis, it is worth being reminded that, as Fig. 1 shows, the number of cases in which components are artificially glued decreases rapidly as is made larger.
4.2 Additional examples
![]() |
![]() |
![]() |
(i) Deltoid, three cusps | (ii) Trifolium, self-intersection | (iii) Non-negative polynomial |
![]() |
![]() |
![]() |
(iv) Oloid, one cusp | (v) High-order junction | (vi) Line of singularities |
Fig. 7 illustrates some additional examples involving corners, cusps, self-intersections, and other kinds of singularities. These examples are more contrived in the sense they typically would not occur in a computational physics modeling problem, say; nonetheless, they serve to stress-test the connected component labeling algorithm in ways the randomly generated geometry did not. In particular, the examples demonstrate cases in which the implicitly-defined geometry contains points of singularity, i.e., points where both and its gradient are zero. In these cases, at least in comparison to the topology test of Definition 2.3, it is considerably more nuanced to determine whether the topology of is sufficiently simple on a subcell . The examples of Fig. 7 serve to illustrate the algorithm’s behavior near singularities; in particular, (i) and (iv) show cases where the algorithm produces exact results despite the presence of singularities, while the remaining examples show cases where components are glued:
-
•
Fig. 7(i) demonstrates a deltoid curve given by the zero level set of ; note how the subdivision focuses in on the singular portions of the curve situated at the deltoid’s three cusps. In this case, the maximum-defined subdivision recursion depth is always met, but the resulting labeling is always exact.
-
•
Fig. 7(ii) demonstrates a trifolium curve, given by the zero level set , containing a point of self-intersection at the origin. The subdivision process focuses in on the origin until it is forced to stop; the labeling algorithm subsequently glues the three connected components of , while the negative component is unaffected.
-
•
Besides corners, cusps, and self-intersections, another kind of edge case concerns non-negative polynomials such that contains multiple components. Fig. 7(iii) illustrates a case involving circular geometry, where . A ring of smallest-permitted subcells is obtained such that the components on either side of the interface are glued, as indicated by the solid white in the figure.
-
•
Fig. 7(iv) demonstrates a 3D case involving a single isolated cusp, given by conic-like oloid surface where . Similar to the example in Fig. 7(i), the polynomial will never be simply connected on subcells sufficiently close to the cusp, and so the algorithm continues to subdivide until the maximum recursion depth is met. The labeling, however, is exact in this case.
-
•
Fig. 7(v) is a 3D analogue of (ii), and is given by the zero level set of . The full surface on contains six globular “arms” meeting at four junction points; we focus on just one of these singularities in the figure. In fixed-precision arithmetic, roundoff errors could lead to inconsistencies in the treatment of this geometry; the design choices in the present work result in all three components being glued.
-
•
Finally, Fig. 7(vi) demonstrates a non-trivial example in which the set of singularities is one-dimensional. Here, , whose corresponding implicitly-defined geometry resembles two pinched pieces of paper, one above the other, rotated and connected via a line of singularities situated along the whole -axis. In this example, the algorithm subdivides until the maximum-permitted resolution is met, with all subcells near or containing the -axis marked as not simply connected. These leaf cells then form a bridge from one folded paper to the other, gluing all four of their respective components.
As an addendum, we point out that a tailor-made connected component labeling could be designed to specifically handle the kinds of geometry shown in Fig. 7. For example, in all of these cases the polynomial degree is small enough that it would be straightforward to apply cylindrical algebraic decomposition methods from real algebraic geometry. Our main purpose for these test problems is to demonstrate the “failure modes” of the connected component labeling algorithm in these exceptional situations.
5 Concluding Remarks
The connected component labeling algorithm developed here operates by recursively subdividing the constraint hyperrectangular domain into progressively smaller subcells until the topology of thereon is sufficiently simple. This is achieved through a “simply connected” test exploiting some useful properties of the Bernstein polynomial basis, namely its effective methods for polynomial range evaluation. If the subdivision process succeeds, then the labeling is certifiably exact, i.e., if and only if are path-connected. At its core, however, the connected component labeling problem is ill-posed in the sense that tiny changes in the coefficients of the input polynomial could alter the number and topology of the connected components. We treated that here by allowing distinct components to be glued together, but only when the topology is uncertain: gluing occurs mainly when two components are nearly touching relative to the length scale of the smallest-permitted subcell; gluing also occurs for various edge cases such as interfacial self-intersections or junctions, or grazing-type singularities such as non-negative polynomials for which has multiple components.
We have yet to discuss the speed of the algorithm. In general, the computational complexity is difficult to precisely quantify, being intricately dependent on the input polynomial . For a fixed spatial dimension and bounded polynomial degree, the construction of the subdivision tree (Algorithm 2) has a worst-case complexity ; in practice, however, it is usually significantly faster. Computation of for some (Algorithm 3) has a worst-case complexity ; in practice, however, it is usually significantly faster. Indeed, for a given fixed polynomial, as soon as the subdivision process resolves its implicitly-defined geometry, the algorithmic complexity becomes independent of .888In pathological cases, such as those illustrated in Fig. 7, the computational complexity depends on the measure of the input polynomial’s set of singularities; in particular, if its set of singularities (inside ) is nonempty and has maximal manifold dimension , the worst-case complexity of Algorithm 2 is when , or when , as . A comparison to the high-order quadrature algorithms developed in [18] is perhaps useful: solving the connected component labeling problem is about as fast as building a moderate order quadrature scheme. These quadrature algorithms are already “fast” in some sense, so the speed of the labeling algorithm is sufficient for our intended usage. As an additional indication, the cost of tree construction for the randomly generated geometry problems in § 4.1 is about microseconds per instance in 2D and milliseconds per instance in 3D; about two thirds of that time is spent in the first phase of Algorithm 2, about one third in the second phase, and negligible time in third phase. After tree construction and on the same set of test problems, the cost of evaluating for a random point is about (resp., ) microseconds per evaluation in 2D (resp., 3D).999Timing measurements obtained on an Intel Xeon E3-1535m v6 laptop, single core, operating at approximately 3.5 Ghz.
Finally, we mention here some possibilities for extending the algorithmic approach. One immediate possibility is to develop a more sophisticated subcell topology test, e.g., being star-connected or via an analysis of convexity [9]. Accompanying this change, a more sophisticated sign evaluation algorithm might also be required, so as to appropriately path-connect neighboring cells, consistent with the updated topology test. Depending on the end application, however, these alterations might come with an increased computational cost. Another possibility is to replace the quadtree/octree-style subdivision with a k-d tree subdivision, or, more generally, a binary space partitioning (BSP) tree. In this setting, the most efficient subdivision process would likely occur by orienting to the features of the polynomial, e.g., by choosing hyperplanes which are locally parallel to its level sets. One would then have to generalize the range evaluation algorithms to polytopes and contend with more complex algorithms for building the cell connectivity. On the other hand, the quadtree/octree approach allows for a considerably simpler implementation, partly due to the use of tensor-product polynomials as well as dimension reduction and recursion techniques. As a final possibility, we note that the basic idea of the subdivision algorithm could be extended to non-polynomial level set functions; for example, one approach might be to apply automatic differentiation and interval arithmetic techniques (see, e.g., [17]) to devise a suitable subcell topology test for arbitrary level set functions.
Acknowledgements
This research was supported in part by the Applied Mathematics Program of the U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research under contract number DE-AC02-05CH11231, and by a DOE Office of Science Early Career Research Program award.
References
- [1] A. A. Ahmadi and A. Majumdar, DSOS and SDSOS Optimization: More Tractable Alternatives to Sum of Squares and Semidefinite Optimization, SIAM Journal on Applied Algebra and Geometry, 3 (2019), pp. 193–230, https://doi.org/10.1137/18M118935X.
- [2] S. Basu, R. Pollack, and M.-F. Roy, Complexity of computing semi-algebraic descriptions of the connected components of a semi-algebraic set, in Proceedings of the 1998 International Symposium on Symbolic and Algebraic Computation, ISSAC ’98, 1998, pp. 25–29, https://doi.org/10.1145/281508.281533.
- [3] S. Basu, R. Pollack, and M.-F. Roy, Algorithms in Real Algebraic Geometry, vol. 10 of Algorithms and Computation in Mathematics, Springer Berlin, Heidelberg, 2 ed., 2006, https://doi.org/10.1007/3-540-33099-2.
- [4] M. Beśka, Convexity and variation diminishing property of multidimensional Bernstein polynomials, Approximation Theory and its Applications, 5 (1989), pp. 59–78, https://doi.org/10.1007/BF02888887.
- [5] B. F. Caviness and J. R. Johnson, eds., Quantifier Elimination and Cylindrical Algebraic Decomposition, Texts & Monographs in Symbolic Computation, Springer-Verlag Wien, 1998, https://doi.org/10.1007/978-3-7091-9459-1.
- [6] G. E. Collins, Quantifier elimination for real closed fields by cylindrical algebraic decomposition, in Automata Theory and Formal Languages, H. Brakhage, ed., Berlin, Heidelberg, 1975, Springer Berlin Heidelberg, pp. 134–183, https://doi.org/10.1007/3-540-07407-4_17.
- [7] R. Farouki and V. Rajan, Algorithms for polynomials in Bernstein form, Computer Aided Geometric Design, 5 (1988), pp. 1–26, https://doi.org/10.1016/0167-8396(88)90016-7.
- [8] J. Garloff, The Bernstein algorithm, Interval Computations, 2 (1993), pp. 154–168.
- [9] J. B. Lasserre, Certificates of convexity for basic semi-algebraic sets, Applied Mathematics Letters, 23 (2010), pp. 912–916, https://doi.org/10.1016/j.aml.2010.04.009.
- [10] Q. Lin and J. Rokne, Methods for bounding the range of a polynomial, Journal of Computational and Applied Mathematics, 58 (1995), pp. 193–199, https://doi.org/10.1016/0377-0427(93)E0270-V.
- [11] Y. Matsumoto, An Introduction to Morse Theory, American Mathematical Society, 2002.
- [12] D. Morozov and G. H. Weber, Distributed contour trees, in Topological Methods in Data Analysis and Visualization III, P.-T. Bremer, I. Hotz, V. Pascucci, and R. Peikert, eds., Springer International Publishing, 2014, pp. 89–102, https://doi.org/10.1007/978-3-319-04099-8_6.
- [13] V. Powers and T. Wörmann, An algorithm for sums of squares of real polynomials, Journal of Pure and Applied Algebra, 127 (1998), pp. 99–104, https://doi.org/10.1016/S0022-4049(97)83827-3.
- [14] M. Putinar and S. Sullivant, eds., Emerging Applications of Algebraic Geometry, vol. 149 of The IMA Volumes in Mathematics and its Applications, Springer Science & Business Media, 2008, https://doi.org/10.1007/978-0-387-09686-5.
- [15] J. Rajyaguru, M. E. Villanueva, B. Houska, and B. Chachuat, Chebyshev model arithmetic for factorable functions, Journal of Global Optimization, 68 (2017), pp. 413–438, https://doi.org/10.1007/s10898-016-0474-9.
- [16] S. Ray and P. S. V. Nataraj, An efficient algorithm for range computation of polynomials using the Bernstein form, Journal of Global Optimization, 45 (2009), pp. 403–426, https://doi.org/10.1007/s10898-008-9382-y.
- [17] R. I. Saye, High-Order Quadrature Methods for Implicitly Defined Surfaces and Volumes in Hyperrectangles, SIAM Journal on Scientic Computing, 37 (2015), pp. A993–A1019, https://doi.org/10.1137/140966290.
- [18] R. I. Saye, High-order quadrature on multi-component domains implicitly defined by multivariate polynomials, Journal of Computational Physics, 448 (2022), p. 110720, https://doi.org/10.1016/j.jcp.2021.110720.