University of California, Santa Cruz, United States [email protected]://orcid.org/0009-0008-1073-6173University of California, Santa Cruz, United [email protected]://orcid.org/0000-0003-2163-3555 \CopyrightDaniel Paul-Pena and C. Seshadhri \ccsdesc[500]Mathematics of computing Graph algorithms \ccsdesc[500]Theory of computation Graph algorithms analysis \fundingBoth authors are supported by NSF CCF-1740850, CCF-1839317, CCF-2402572, and DMS-2023495
A Dichotomy Theorem for Linear Time Homomorphism Orbit Counting in Bounded Degeneracy Graphs
Abstract
Counting the number of homomorphisms of a pattern graph in a large input graph is a fundamental problem in computer science. In many applications in databases, bioinformatics, and network science, we need more than just the total count. We wish to compute, for each vertex of , the number of -homomorphisms that participates in. This problem is referred to as homomorphism orbit counting, as it relates to the orbits of vertices of under its automorphisms.
Given the need for fast algorithms for this problem, we study when near-linear time algorithms are possible. A natural restriction is to assume that the input graph has bounded degeneracy, a commonly observed property in modern massive networks. Can we characterize the patterns for which homomorphism orbit counting can be done in near-linear time?
We discover a dichotomy theorem that resolves this problem. For pattern , let be the length of the longest induced path between any two vertices of the same orbit (under the automorphisms of ). If , then -homomorphism orbit counting can be done in near-linear time for bounded degeneracy graphs. If , then (assuming fine-grained complexity conjectures) there is no near-linear time algorithm for this problem. We build on existing work on dichotomy theorems for counting the total -homomorphism count. Surprisingly, there exist (and we characterize) patterns for which the total homomorphism count can be computed in near-linear time, but the corresponding orbit counting problem cannot be done in near-linear time.
keywords:
Homomorphism counting, Bounded degeneracy graphs, Fine-grained complexity, Orbit counting, Subgraph countingcategory:
\relatedversion1 Introduction
Analyzing the occurrences of a small pattern graph in a large input graph is a central problem in computer science. The theoretical study has led to a rich and immensely deep theory [39, 20, 30, 24, 40, 2, 23, 45, 51, 17, 16]. The applications of graph pattern counts occur across numerous scientific areas, including logic, biology, statistical physics, database theory, social sciences, machine learning, and network science [34, 19, 22, 18, 27, 13, 29, 42, 59, 45, 25, 44]. (Refer to the tutorial [53] for more details on applications.)
A common formalism used for graph pattern counting is homomorphism counting. The pattern graph is denoted and is assumed to have constant size. The input graph is denoted . Both graphs are simple and do not contain self-loops. An -homomorphism is a map that preserves edges. Formally, , . Let denote the number of distinct -homomorphisms in .
Given the importance of graph homomorphism counts, the study of efficient algorithms for this problem is a subfield in itself [35, 3, 18, 27, 26, 24, 13, 23, 14, 51]. The simplest version of this problem is when is a triangle, itself a problem that attracts much attention. Let and . Computing is -hard when parameterized by (even when is a -clique), so we do not expect algorithms for general [24]. Much of the algorithmic study of homomorphism counting is in understanding conditions on and when the trivial running time bound can be beaten.
Our work is inspired by the challenges of modern applications of homomorphism counting, especially in network science. Typically, is extremely large, and only near-linear time () algorithms are feasible. Inspired by a long history and recent theory on this topic, we focus on bounded degeneracy input graphs (we say bounded degeneracy graphs to refer to graphs belonging to classes of graphs with bounded degeneracy). This includes all non-trivial minor-closed graph families, such as planar graphs, bounded genus graphs, and bounded tree-width graphs. Many practical algorithms for large-scale graph pattern counting use algorithms for bounded degeneracy graphs [2, 38, 45, 43, 37, 44]. Real-world graphs typically have a small degeneracy, comparable to their average degree ([32, 37, 55, 5, 9], also Table 2 in [5]).
Secondly, many modern applications for homomorphism counting require more fine-grained statistics than just the global count . The aim is to find, for every vertex of , the number of homomorphisms that participates in. Seminal work in network analysis for bioinformatics plots the distributions of these per-vertex counts to compare graphs [36, 46]. Orbit counts can be used to generate features for vertices, sometimes called the graphlet kernel [54]. In the past few years, there have been many applications of these per-vertex counts [10, 59, 52, 57, 4, 58, 50, 60, 61].
Algorithms for this problem require considering the “roles” that could play in a homomorphism. For example, in a -path (a path of length ) there are different roles: a vertex could be in the middle, could be at the end, or at two other positions. These roles are colored in Fig. 1. The roles are called orbits (defined in the Section 3), and the problem of -homomorphism orbit counting is as follows: for every orbit in and every vertex in , output the number of homomorphisms of where participates in the orbit . This is the main question addressed by our work:
What are the pattern graphs for which the -homomorphism orbit counting problem is computable in near-linear time (when has bounded degeneracy)?
Recent work of Bressan followed by Bera-Pashanasangi-Seshadhri introduced the question of homomorphism counting for bounded degeneracy graphs, from a fine-grained complexity perspective [14, 8]. A dichotomy theorem for near-linear time counting of was provided in subsequent work [6]. Assuming fine-grained complexity conjectures, can be computed in near-linear time iff the longest induced cycle of has length at most . It is natural to ask whether these results extend to orbit counting.

The bottom graph adds a triangle at the end, breaking the symmetry, and the only vertices in the same orbit in that graph are the red ones. The LIPCO in this graph is now less than 6 so we can compute in near-linear time.
1.1 Main Result
We begin with some preliminaries. The input graph has vertices and edges. A central notion in our work is that of graph degeneracy, also called the coloring number.
Definition 1.1.
A graph is -degenerate if the minimum degree in every subgraph of is at most .
The degeneracy of is the minimum value of such that is -degenerate.
A family of graphs has bounded degeneracy if the degeneracy is constant with respect to the graph size. Bounded degeneracy graph classes are extremely rich. For example, all non-trivial minor-closed families have bounded degeneracy. This includes bounded treewidth graphs. Preferential attachment graphs also have bounded degeneracy; real-world graphs have a small value of degeneracy (often in the 10s) with respect to their size (often in the hundreds of millions) [5].
We assume the pattern graph to have a constant number of vertices. (So we suppress any dependencies on purely the size of .) Consider the group of automorphisms of . The vertices of can be partitioned into orbits, which consist of vertices that can be mapped to each other by some automorphism (defined formally in Definition 3.1). For example, in Fig. 1, the -path has four different orbits, where each orbit has the same color. The -path with a hanging triangle (in Fig. 1) has more orbits, since the pattern is no longer symmetric with respect to the “center” of the -path and hence the opposite “ends” of the -path cannot be mapped by a non-trivial automorphism.
The set of orbits of the pattern is denoted . Let be the set of homomorphisms from to (). We now define our main problem.
Definition 1.2.
Homomorphism Orbit Counts: For each orbit and vertex , define to be the number of -homomorphisms mapping a vertex of to . Formally, .
The problem of -homomorphism orbit counting is to output the values for all . (Abusing notation, refers to the list/vector of all of these values.)
Note that for a given , the size of the output is (recall ). For example, when is the -path, we will get counts, for each vertex and each of the four orbits.
Our main result is a dichotomy theorem that precisely characterizes patterns for which can be computed in near-linear time. We introduce a key definition.
Definition 1.3.
For a pattern , the Longest Induced Path Connecting Orbits of , denoted is defined as follows. It is the length of the longest induced simple path, measured in edges, between any two vertices in (where may be equal to , forming a cycle) in the same orbit.
Again refer to Fig. 1. The -path has a LIPCO of six, since the ends are in the same orbit. On the other hand, the second pattern (-path with a triangle) has a LIPCO of due to the triangle.
The Triangle Detection Conjecture was introduced by Abboud and Williams on the complexity of determining whether a graph has a triangle [1]. It is believed that this problem cannot be solved in near-linear time, and indeed, may even require time. We use this conjecture for the lower bound of our main theorem.
Conjecture 1.4 (Triangle Detection Conjecture [1]).
There exists a constant such that in the word RAM model of bits, any algorithm to detect whether an input graph on edges has a triangle requires time in expectation.
Our main theorem proves that the LIPCO determines the dichotomy. Note that because is a bounded degeneracy graph we have , we will be expressing the bounds in terms of .
Theorem 1.5.
(Main Theorem) Let be a graph with vertices, edges, and bounded degeneracy. Let denote the constant from the Triangle Detection Conjecture (Conjecture 1.4).
-
•
If : there exists a deterministic algorithm that computes in time .111The exact dependency on the degeneracy of the input graph is .
-
•
If : assume the Triangle Detection Conjecture. There is no algorithm with (expected) running time that computes .
Orbit Counting vs Total Homomorphism Counting
In the following discussion, we use “linear” to actually mean near-linear, we assume that the Triangle Detection Conjecture is true, and we assume that has bounded degeneracy.
One of the most intriguing aspects of the dichotomy of Theorem 1.5 is that it differs from the condition for getting the total homomorphism count. As mentioned earlier, the inspiration for Theorem 1.5 is the analogous result for determining . There is a near-linear time algorithm iff the length of the longest induced cycle (LICL) of is at most five. Since the definition of LIPCO considers induced cycles (induced path between a vertex to itself), if , then . This implies, not surprisingly, that the total homomorphism counting problem is easier than the orbit counting problem.
But there exist patterns for which the orbit counting problem is provably harder than total homomorphism counting, a simple example is the -path (path with vertices). There is a simple linear time dynamic program for counting the homomorphism of paths. But the endpoints are in same orbit, so the LIPCO is six, and Theorem 1.5 proves the non-existence of linear time algorithms for orbit counting. On the other hand, the LIPCO of the -path is five, so orbit counting can be done in linear time.
Consider the pattern at the bottom of Fig. 1. The LICL is three, so the total homomorphism count can be determined in linear time. Because the ends of the underlying -path lie in different orbits, the LIPCO is also three (by the triangle). Theorem 1.5 provides a linear time algorithm for orbit counting.
1.2 Main Ideas
The starting point for homomorphism counting on bounded degeneracy graphs is the seminal work of Chiba-Nishizeki on using acyclic graph orientations [20]. It is known that, in linear time, the edges of a bounded degeneracy graph can be acyclically oriented while keeping the outdegree bounded [41]. For clique counting, we can now use a brute force algorithm in all out neighborhoods, and get a linear time algorithm. Over the past decade, various researchers observed that this technique can generalize to certain other pattern graphs [21, 45, 43, 44]. Given a pattern , one can add the homomorphism counts of all acyclic orientations of for an acyclic orientation of . In certain circumstances, each acyclic orientation can be efficiently counted by a carefully tailored dynamic program that breaks the oriented into subgraphs spanned by rooted, directed trees.
Bressan gave a unified treatment of this approach through the notion of DAG-tree decompositions. [14] These decompositions give a systematic way of breaking up an oriented pattern into smaller pieces, such that homomorphism counts can be computed by a dynamic program. Bera et al. showed that if the LICL of is at most , then the DAG-treewidth of is at most one [8, 6]. This immediately implies Bressan’s algorithm runs in linear time.
Our result on orbit counting digs deeper into the mechanics of Bressan’s algorithm. To run in linear time, Bressan’s algorithm requires “compressed” data structures that store information about homomorphism counts. For example, the DAG-tree decomposition based algorithm can count -cycles in linear time for bounded degeneracy graphs (this was known from Chiba-Nishizeki as well [20]). But there could exist quadratically many -cycles in such a graph. Consider two vertices connected by disjoint paths of length ; each pair of paths yields a distinct -cycle. Any linear time algorithm for -cycle counting has to carefully index directed paths and combine these counts, without actually touching every -cycle.
By carefully looking at Bressan’s algorithm, we discover that “local” per-vertex information about -homomorphisms can be computed. Using the DAG-tree decomposition, one can combine these counts into a quantity that looks like orbit counts. Unfortunately, we cannot get exact orbit counts, but rather a weighted sum of homomorphisms.
To extract exact orbit counts, we dig deeper into the relationship between orbit counts and per-vertex homomorphism counts. This requires looking into the behavior of independent sets in the orbits of . We then design an inclusion-exclusion formula that “inverts” the per-vertex homomorpishm counts into orbit counts. The formula requires orbit counts for other patterns that are constructed by merging independent sets in the same orbit of .
Based on previous results, we can prove that if the LICL of all these patterns is at most , then can be computed in (near)linear time. This LICL condition over all is equivalent to the LIPCO of being at most . Achieving the upper bound of Theorem 1.5.
The above seemingly ad hoc algorithm optimally characterizes when orbit counting is linear time computable. To prove the matching lower bound, we use tools from the breakthrough work of Curticapean-Dell-Marx [23]. They prove that the complexity of counting linear combinations of homomorphism counts is determined by the hardest individual count (up to polynomial factors). Gishboliner-Levanzov-Shapira give a version of this tool for proving linear time hardness [31]. Consider a pattern with LIPCO at least six. We can construct a pattern with LICL at least six by merging vertices of an orbit in . We use the tools above to construct a constant number of linear sized graphs such that a linear combination of -orbit counts on these graphs yields the total -homomorphism count on . The latter problem is hard by existing bounds, and hence the hardness bounds translate to -orbit homomorphism counting.
2 Related Work
Homomorphism and subgraph counting on graphs is an immense topic with an extensive literature in theory and practice. For a detailed discussion of practical applications, we refer the reader to a tutorial [53].
Homomorphism counting is intimately connected with the treewidth of the pattern . The notion of tree decomposition and treewidth were introduced in a seminal work by Robertson and Seymour [47, 48, 49]; although it has been discovered before under different names [11, 33]. A classic result of Dalmau and Jonsson [24] proved that is polynomial time solvable if and only if has bounded treewidth, otherwise it is -complete. Díaz et al [26] gave an algorithm for homomorphism counting with runtime where is the treewidth of the pattern graph and the number of vertices of .
To improve on these bounds, recent work has focused on restrictions on the input [51]. A natural restriction is bounded degeneracy, which is a nuanced measure of sparsity introduced by early work of Szekeres-Wilf [56]. Many algorithmic results exploit low degeneracy for faster subgraph counting problems [20, 28, 2, 38, 45, 43, 37, 44].
Pioneering work of Bressan introduced the concept of DAG-treewidth for faster algorithms for homomorphism counting in bounded degeneracy graphs [14]. Bressan gave an algorithm for counting running in time essentially , where denotes the DAG-treewidth. The result also proves that (assuming ETH) there is no algorithm running in time .
Bera-Pashanasangi-Seshadhri build on Bressan’s methods to discover a dichotomy theorem for linear time homomorphism counting in bounded degeneracy graphs [7, 8]. Gishboliner, Levanzov, and Shapira independently proved the same characterization using slightly different methods [31, 6].
We give a short discussion of the Triangle Detection Conjecture. Itai and Rodeh [35] gave the first non-trivial algorithm for the triangle detection and finding problem with runtime. The current best known algorithm runs in time [3], where is the matrix multiplication exponent. Even for , the bound is and widely believed to be a lower bound. Many classic graph problems have fine-grained complexity hardness based on Triangle Detection Conjecture [1].
Homomorphism or subgraph orbit counts have found significant use in network analysis and machine learning. Przulj introduced the use of graphlet (or orbit count) degree distributions in bioinformatics [46]. The graphlet kernel of Shervashidze-Vishwanathan-Petri-Mehlhorn-Borgwardt uses vertex orbits counts to get embeddings of vertices in a network [54]. Four vertex subgraph and large cycle and clique orbit counts have been used for discovering special kinds of vertices and edges [59, 50, 60, 61]. Orbits counts have been used to design faster algorithms for finding dense subgraphs in practice [10, 52, 57, 4, 58].
3 Preliminaries
We use to denote the input graph and to denote the pattern graph, both and are simple, undirected and connected graphs. We denote and by and respectively and by .
A pattern graph is divided into orbits, we use the definition from Bondy and Murty (Chapter , Section [12]):
Definition 3.1.
Fix a graph . An automorphism is a bijection such that iff . The group of automorphisms of is denoted .
Define an equivalence relation on as follows. We say that iff there exists an automorphism that maps to . The equivalence classes of the relation are called orbits.
We refer to the set of orbits in as and to individual orbits in as . Note that every vertex belongs to exactly one orbit. We can represent an orbit by a canonical (say lexicographically least) vertex in the orbit. Somewhat abusing notation, we can think of the set of orbits as a subset of vertices of , where each vertex plays a “distinct role” in . Fig. 1 has examples of different graphs with their separate orbits.
We now define homomorphisms.
Definition 3.2.
An -homomorphism from to is a mapping such that for all , . We refer to the set of homomorphisms from to as .
We now define a series of counts.
-
•
: This is the count of -homomorphisms in . So .
-
•
: For a vertex , is the number of -homomorphisms that map any vertex in the orbit to . Formally, .
-
•
: We use to denote the list/vector of counts over all . Similarly, denotes the sequence of lists of counts over all orbits .
Our aim is to compute , which are a set of homomorphism counts. We use existing algorithmic machinery to compute homomorphism counts per vertex of , so part of our analysis will consist of figuring out how to go between these counts. As we will see, this is where the LIPCO parameter makes an appearance.
Acyclic orientations
These are a key algorithmic tool in efficient algorithms for bounded degeneracy graphs. An acyclic orientation of an undirected graph is a digraph obtained by directing the edges of such that the digraph is a DAG. We will encapsulate the application of the degeneracy in the following lemma, which holds from a classic result of Matula and Beck [41].
Lemma 3.3.
Suppose has degeneracy . Then, in time, one can compute an acyclic orientation of with the following property. The maximum outdegree of is precisely . ( is also called a degeneracy orientation.)
The set of all acyclic orientations of is denoted . Our algorithm will enumerate over all such orientations.
Note that all definitions of homomorphisms carry over to DAGs.
3.1 DAG-tree decompositions
A central part of our result is applying intermediate lemmas from an important algorithm of Bressan for homomorphism counting [14]. This subsection gives a technical overview of Bressan’s techique of DAG-tree decompositions and related lemmas. Our aim is to state the key lemmas from previous work that can be used as a blackbox.
The setting is as follows. We have an acyclic orientation and a DAG pattern (think of as a member of ; is an acyclic orientation of ). Bressan’s algorithm gives a dynamic programming approach to counting .
We introduce some notation. We use the standard notion of reachability in digraphs: vertex is reachable from if there is a directed path from to .
-
•
: The set of sources in the DAG .
-
•
: For source , is the set of vertices in reachable from .
-
•
: Let . .
-
•
: This is the subgraph of induced by .
Definition 3.4.
(DAG-tree decomposition [14]). Let be a DAG with source vertices . A DAG-tree decomposition of P is a tree with the following three properties:
-
1.
Each node (referred to as a “bag” of sources) is a subset of the source vertices : .
-
2.
The union of the nodes in is the entire set : .
-
3.
For all , if lies on the unique path between the nodes and in , then .
Definition 3.5.
Let be a DAG. For any DAG-tree decomposition to , the DAG-treewidth is defined as . The DAG-treewidth of , denoted , is the minimum value of over all DAG-tree decompositions of .
Two important lemmas
We state two critical results from previous work. Both of these are highly non-trivial and technical to prove. We will use them in a black-box manner. The first lemma, by Bera-Pashanasangi-Seshadhri, connects the Largest Induced Cycle Length (LICL) to DAG-treewidth [8].
Lemma 3.6.
(Theorem 4.1 in [8]) For a simple graph : iff .
The second lemma is an intermediate property of Bressan’s subgraph counting algorithm [15]. We begin by defining homomorphism extensions. Think of some directed pattern that we are trying to count. Fix a (rooted) DAG-tree decomposition . Let be a subgraph of , be a subgraph of . A -homomorphism extends a -homomorphism if , . Basically, agrees with wherever the latter is defined.
-
•
: Let be a homomorphism from a subgraph of to . Then is the number of -homomorphisms extending .
-
•
: Let be a node in the DAG-tree decomposition of . The set is the union of bags that are descendants of in . Furthermore is the pattern induced by .
A technical lemma in Bressan’s result shows that extension counts can be obtained efficiently. We will refer to the procedure in this lemma as “Bressan’s algorithm”.
Lemma 3.7.
(Lemma 5 in [15]): Let be a digraph with outdegree at most and be a DAG with vertices. Let be a DAG-tree decomposition for , and any element of . There is a procedure, that in time , returns a dictionary storing the following values: for every , it has .
Let us explain this lemma in words. For any bag , which is a set of sources in , consider , which is the subgraph induced by . For every -homomorphism , we wish to count the number of extensions to (the subgraph induced by vertices of reachable by any source in any descendant bag of ).
4 Obtaining Vertex-Centric Counts
We define vertex-centric homomorphism counts, which allows us to ignore orbits and symmetries in . Quite simply, for vertices and , we count the number of homomorphisms from to that map to .
Definition 4.1.
Vertex-centric Counts: For each vertex and vertex , let be the number of -homomorphisms that map to .
Let denote the list of over all and .
We can show that the vertex-centric counts can be obtained in near-linear time when :
Theorem 4.2.
There is an algorithm that takes as input a bounded degeneracy graph and a pattern with , and has the following properties. It outputs and runs in time.
Before proving this theorem we need to introduce two more lemmas. First, we invoke the following lemma from [15]:
Lemma 4.3.
(Lemma 4 in [15]): Given any , the set of homomorphisms has size and can be enumerated in time .
Second, we show how to use the output of Bressan’s algorithm to obtain the Vertex-centric counts:
Lemma 4.4.
Let be a directed pattern on vertices, be a DAG-tree decomposition of with (All nodes/bags in are singletons), and be a directed graph with vertices and max degree . Let be the root of and be any vertex in . We can compute in time .
Proof 4.5.
The algorithm of Lemma 3.7 will return a data structure/dictionary that gives the following values. For each , it provides . Note that is the root of . By the properties of a DAG-tree decomposition, contains all vertices of and . Hence, the dictionary gives the values , that is, the number of homomorphisms that extend .
Let be a vertex in . We can partition the set of homomorphisms from to , , into sets defined as follows. For each ,.
By Lemma 4.3 we can list all the homomorphisms in time, by the same lemma we know that will have size at most , hence we can iterate over the list of homomorphisms and check the value of . We can then express as follows:
We can compute all of these values by enumerating all the elements in (over all ), and making a dictionary access to get . The total running time is , where is extra overhead of accessing the dictionary.
By Lemma 3.7, the dictionary construction takes time. Since and , we can express the total complexity as .
We can now complete the proof of Theorem 4.2:
Proof 4.6 (Proof of Theorem 4.2).
The algorithm is explicitly described in Algorithm 1.
The first step of our algorithm is to construct the degeneracy orientation of . By Lemma 3.3, it can be computed in time. Since has bounded degeneracy, has bounded outdegree. When orienting as , each homomorphism from to becomes a homomorphism of exactly one of the directed patterns to . We can hence compute as the sum of for every acyclic orientation of . This is given by the following equation:
(1) |
Because , Lemma 3.6 implies that for all , . There exists a DAG-tree decomposition of with . We use the output of Bressan’s algorithm to obtain the Vertex-centric counts.
The DAG-tree decomposition can be arbitrarily rooted at any node . Moreover, for each , there must exist some source such that (meaning, is reachable from ). So, by rooting at all possible nodes (singleton bags), we can ensure that is in . We can apply Lemma 4.4 to get all counts .
We complete the proof by bounding the running time and asserting correctness.
From Lemma 3.3, we can compute in . Since has bounded degeneracy, and the outdegree is bounded. The number of acyclic orientations of , is bounded by . In each iteration, by Lemma 4.4, we will take . For constant and constant , the running time is .
Now we prove the correctness of the algorithm. Consider each . Let be the DAG-tree decomposition of . For each , we compute for all the vertices in . By looping over each singleton bag , we update counts for all vertices in . Hence, we are computing . Finally, we sum over all , which by Equation 1, gives us .
5 From Vertex-Centric to Orbit Counts
We now show how to go from vertex-centric to orbit counts, using inclusion-exclusion. Much of our insights are given by the following definitions.
Definition 5.1.
: Given a pattern graph , for every orbit we define as the collection of all non empty subsets , such that forms an independent set (i.e. there is no edge in connecting any two vertices in ).
Formally, .
Definition 5.2.
: For each set we define as the graph resulting from merging all the vertices in into a single new vertex , removing any duplicate edge.
We state two more tools in our analysis. The first lemma relates the counts obtained in the previous section () to the desired output ().
Lemma 5.3.
(Inclusion-exclusion formula:)
In order to prove this lemma, we need to define the Signature of a homomorphisms. Let be a homomorphism from to , we define to be the subset of vertices from the orbit that are mapped to in . Formally .
We prove a series of claims regarding the signature.
Claim 1.
The Signature of from to , , must form an independent set of vertices in , that is, there are no edges in connecting two vertices in .
Proof 5.4.
We prove by contradiction. Assume that is not an Independent Set of vertices of , that means that we have a pair of vertices such that there is an edge connecting them. But from the definition of signature we have that , however this is not a valid homomorphism from to as it is not preserving the edge.
The next claim allows us to relate the Signature with the Homomorphism Orbit Counts.
Claim 2.
Proof 5.5.
From the definition of Homomorphism Orbit Counts we have that . Hence, suffices to show that .
Let be a homomorphism from to such that . Let , we know that as is mapped to and from Claim 1 we know that it forms an independent set on the vertices of . Hence .
To prove the other direction of the equality, suffices to note that if a homomorphism contributes to the right side of the equation, then its signature belongs to , hence there is at least one vertex that is mapped to , and thus contributes to the left side of the equation.
Now, we will relate the Signature with the Vertex-centric Counts:
Claim 3.
For each orbit in and each vertex in we have that :
Proof 5.6.
If is mapping all the vertices in to , then the Signature of from to must be a superset of , . Hence summing over such sets will reach the equality. Note that we can add the restriction of belonging to as it is implied from Claim 1.
Let be the set of homomorphism from to . When forms an independent set there is an equivalence between the homomorphisms in that map to and the set of homomorphisms in that map all the vertices of to . In fact we can prove the following claim:
Claim 4.
If is not empty and form an independent set:
Proof 5.7.
From the definition of Vertex-centric Counts we have that . Hence it suffices to show that:
We do so by proving that there is a bijection between both sets, that is, a one to one correspondence between them. Let and . We show an invertible function :
-
•
Given a homomorphism we obtain by setting and . This is a valid homomorphism as we are mapping all the vertices in to and we are preserving the edges.
-
•
Given a homomorphism we obtain by setting and . Again this is a valid homomorphism as we are mapping all the vertices in to and we are still preserving the edges.
Additionally, we have that for all , , which completes the proof.
We will show one last claim that will be important when deriving the inclusion-exclusion formula:
Claim 5.
Given a graph , for every orbit , any subset satisfies:
Proof 5.8.
We now have all the tools required to prove Lemma 5.3:
Proof 5.9 (Proof of Lemma 5.3).
The next lemma relates the Longest Induced Path Connecting Orbits (LIPCO) defined in Definition 1.3 with the of all the graphs , for all and all orbits of .
Lemma 5.10.
For every graph , iff , .
Proof 5.11.
First, we show that if then . Consider the longest induced path in with endpoints in the same orbit , let be the two endpoints of the path. We have two cases:
-
•
: In this case the induced path is actually just an induced cycle of length or more in including the vertex . For any and for any with we have that , and hence .
-
•
: In the other case we have that are distinct vertices. Consider the set , we have that as both and there is no edge connecting them (otherwise we would have a longer induced cycle). We form by combining and into a single vertex, the induced path that we had in becomes then an induced cycle of length at least , which implies .
Now, we prove that if then . Let be the set such that . Again, we have two cases:
-
•
: We have that and hence , any vertex in that induced cycle induces a path of the same length with such vertex in both ends, which implies .
-
•
: Let be the vertex in obtained by merging the vertices of in . Consider the longest induced cycle in , if that cycle does not contain then that same cycle exists in and , which implies . Otherwise, we can obtain by splitting back into separate vertices, there will be two distinct vertices that are in the two ends of an induced path of the same length in , thus .
6 Wrapping it up
In this section we complete the proof of the main theorem for the upper bound. We also give Algorithm 2, which summarizes the entire process.
Theorem 6.1.
There is an algorithm that, given a bounded degeneracy graph and pattern with , computes in time .
Proof 6.2.
7 Lower Bound for computing Homomorphism Orbit Counts
In this section we prove the lower bound of Theorem 1.5. It will be given by the following theorem:
Theorem 7.1.
Let be a pattern graph on vertices with . Assuming the Triangle Detection Conjecture, there exists an absolute constant such that for any function , there is no (expected) algorithm for the problem, where and are the number of edges and the degeneracy of the input graph, respectively.
To prove this Theorem we will show how to express the Homomorphism Orbit Counts for some orbit as a linear combination of Homomorphism counts of non-isomorphic graphs for all in . Because we will have that the of at least one of these graphs is also greater than . We will then show that the hardness of computing Orbit counts in the original graph is the same than the hardness of computing the Homomorphisms counts. Finally we use a previous hardness result from [8] to complete the proof.
First, we introduce the following definition:
Definition 7.2.
Given a pattern graph and an input graph , for the orbit of , we define as the sum over every vertex of homomorphisms that are mapping some vertex in to , that is:
Note that if we can compute for every vertex in then we can also compute in additional linear time. Now, we state the following lemma:
Lemma 7.3.
For every pattern graph and every orbit , there is some number such that the following holds. For every graph there are some graphs , computable in time , such that and for all , and such that knowing allows one to compute for all , in time . Furthermore, if is -degenerate, then so are .
First, we can relate the Homomorphism Vertex Counts of a vertex to Homomorphism Counts from to , as given in the following claim:
Claim 6.
For all :
Proof 7.4.
(Def. of ) | |||
(Sum over whole set) | |||
() | |||
(Def. of Hom) |
We now state the following Lemma from [6]:
Lemma 7.5.
(Lemma 4.2 from [6]): Let be pairwise non-isomorphic graphs and let be non-zero constants. For every graph there are graphs , computable in time , such that and for every , and such that knowing for every allows one to compute in time . Furthermore, if is -degenerate, then so are .
We will apply the previous lemma in a similar way as it is used the proof of Lemma in [6].
Proof 7.6 (Proof of Lemma 7.3).
Let be an enumeration of all the graphs for all , up to isomorphism. This means that are pairwise non-isomorphic and .
Let be the number of sets such that is isomorphic to , with the sign being . Note that all such sets have equal and that the value of is always non-zero. We will use to denote the vertex of that correspond to the vertices of the graphs that are isomorphic to . We can express as follows:.
Before we prove Theorem 7.1, we need to state the following theorem from [8], which gives a hardness result on Homomorphism Counting:
Theorem 7.7 (Theorem 5.1 from [8]).
Let be a pattern graph on vertices with . Assuming the Triangle Detection Conjecture, there exists an absolute constant such that for any function , there is no (expected) algorithm for the problem, where and are the number of edges and the degeneracy of the input graph, respectively.
We now have all the tools required to proof Theorem 7.1:
Proof 7.8 (Proof of Theorem 7.1).
We prove by contradiction. Given a graph and a pattern with , suppose there exists an algorithm that allows us to compute in time , by Lemma 7.3 we have the existence of some graphs . We can compute for all of these graphs in time and then aggregate the results into for all and all . Using Lemma 7.3, that implies that we can compute for all for all in time .
References
- [1] Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In Proc. 55th Annual IEEE Symposium on Foundations of Computer Science, 2014. doi:10.1109/FOCS.2014.53.
- [2] Nesreen K. Ahmed, Jennifer Neville, Ryan A. Rossi, and Nick Duffield. Efficient graphlet counting for large networks. In Proceedings, SIAM International Conference on Data Mining (ICDM), 2015. doi:10.1109/ICDM.2015.141.
- [3] Noga Alon, Raphael Yuster, and Uri Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209–223, 1997. doi:10.1007/BF02523189.
- [4] A. Benson, D. F. Gleich, and J. Leskovec. Higher-order organization of complex networks. Science, 353(6295):163–166, 2016. doi:10.1126/science.aad9029.
- [5] Suman K Bera, Amit Chakrabarti, and Prantar Ghosh. Graph coloring via degeneracy in streaming and other space-conscious models. In International Colloquium on Automata, Languages and Programming, 2020. doi:10.4230/LIPIcs.ICALP.2020.11.
- [6] Suman K. Bera, Lior Gishboliner, Yevgeny Levanzov, C. Seshadhri, and Asaf Shapira. Counting subgraphs in degenerate graphs. Journal of the ACM (JACM), 69(3), 2022. doi:10.1145/3520240.
- [7] Suman K Bera, Noujan Pashanasangi, and C Seshadhri. Linear time subgraph counting, graph degeneracy, and the chasm at size six. In Proc. 11th Conference on Innovations in Theoretical Computer Science. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020. doi:10.4230/LIPIcs.ITCS.2020.38.
- [8] Suman K. Bera, Noujan Pashanasangi, and C. Seshadhri. Near-linear time homomorphism counting in bounded degeneracy graphs: The barrier of long induced cycles. In Proceedings of the Thirty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, page 2315–2332, 2021. doi:10.1137/1.9781611976465.138.
- [9] Suman K Bera and C Seshadhri. How the degeneracy helps for triangle counting in graph streams. In Principles of Database Systems, pages 457–467, 2020. doi:10.1145/3375395.3387665.
- [10] Jonathan W. Berry, Bruce Hendrickson, Randall A. LaViolette, and Cynthia A. Phillips. Tolerating the community detection resolution limit with edge weighting. Phys. Rev. E, 83:056119, 2011. doi:10.1103/PhysRevE.83.056119.
- [11] Umberto Bertele and Francesco Brioschi. On non-serial dynamic programming. J. Comb. Theory, Ser. A, 14(2):137–148, 1973. doi:10.1016/0097-3165(73)90016-2.
- [12] J.A. Bondy and U.S.R Murty. Graph Theory, volume 244. Springer, 2008. doi:10.1007/978-1-84628-970-5.
- [13] Christian Borgs, Jennifer Chayes, László Lovász, Vera T. Sós, and Katalin Vesztergombi. Counting graph homomorphisms. In Topics in discrete mathematics, pages 315–371. Springer, 2006. doi:10.1007/3-540-33700-8_18.
- [14] Marco Bressan. Faster subgraph counting in sparse graphs. In 14th International Symposium on Parameterized and Exact Computation (IPEC 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019. doi:10.4230/LIPIcs.IPEC.2019.6.
- [15] Marco Bressan. Faster algorithms for counting subgraphs in sparse graphs. Algorithmica, 83:2578–2605, 2021. doi:10.1007/s00453-021-00811-0.
- [16] Marco Bressan, Leslie Ann Goldberg, Kitty Meeks, and Marc Roth. Counting subgraphs in somewhere dense graphs. In 14th Innovations in Theoretical Computer Science Conference (ITCS 2023), pages 27:1–27:14, 2023. doi:10.4230/LIPIcs.ITCS.2023.27.
- [17] Marco Bressan and Marc Roth. Exact and approximate pattern counting in degenerate graphs: New algorithms, hardness results, and complexity dichotomies. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 276–285, 2022. doi:10.1109/FOCS52979.2021.00036.
- [18] Graham R Brightwell and Peter Winkler. Graph homomorphisms and phase transitions. Journal of combinatorial theory, series B, 77(2):221–262, 1999. doi:10.1006/jctb.1999.1899.
- [19] Ashok K Chandra and Philip M Merlin. Optimal implementation of conjunctive queries in relational data bases. In Proc. 9th Annual ACM Symposium on the Theory of Computing, pages 77–90, 1977. doi:10.1145/800105.803397.
- [20] Norishige Chiba and Takao Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing (SICOMP), 14(1):210–223, 1985. doi:10.1137/0214017.
- [21] Jonathan Cohen. Graph twiddling in a mapreduce world. Computing in Science & Engineering, 11(4):29, 2009. doi:10.1109/MCSE.2009.120.
- [22] J. Coleman. Social capital in the creation of human capital. American Journal of Sociology, 94:S95–S120, 1988. URL: http://www.jstor.org/stable/2780243, doi:10.1086/228943.
- [23] Radu Curticapean, Holger Dell, and Dániel Marx. Homomorphisms are a good basis for counting small subgraphs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 210–223, 2017. doi:10.1145/3055399.3055502.
- [24] Víctor Dalmau and Peter Jonsson. The complexity of counting homomorphisms seen from the other side. Theoretical Computer Science, 329(1-3):315–323, 2004. doi:10.1016/j.tcs.2004.08.008.
- [25] Holger Dell, Marc Roth, and Philip Wellnitz. Counting answers to existential questions. In Proc. 46th International Colloquium on Automata, Languages and Programming. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019. doi:10.4230/LIPIcs.ICALP.2019.113.
- [26] Josep Díaz, Maria Serna, and Dimitrios M Thilikos. Counting h-colorings of partial k-trees. Theoretical Computer Science, 281(1-2):291–309, 2002. doi:10.1016/S0304-3975(02)00017-8.
- [27] Martin Dyer and Catherine Greenhill. The complexity of counting graph homomorphisms. Random Structures & Algorithms, 17(3-4):260–289, 2000. doi:10.1002/1098-2418(200010/12)17:3/4\%3C260::AID-RSA5\%3E3.0.CO;2-W.
- [28] David Eppstein. Arboricity and bipartite subgraph listing algorithms. Information processing letters, 51(4):207–211, 1994. doi:10.1016/0020-0190(94)90121-X.
- [29] G. Fagiolo. Clustering in complex directed networks. Phys. Rev. E, 2007. doi:10.1103/PhysRevE.76.026107.
- [30] Jörg Flum and Martin Grohe. The parameterized complexity of counting problems. SIAM Journal on Computing (SICOMP), 33(4):892–922, 2004. doi:10.1137/S0097539703427203.
- [31] Lior Gishboliner, Yevgeny Levanzov, and Asaf Shapira. Counting subgraphs in degenerate graphs, 2020. arXiv:2010.05998, doi:10.48550/arXiv.2010.05998.
- [32] Gaurav Goel and Jens Gustedt. Bounded arboricity to determine the local structure of sparse graphs. In International Workshop on Graph-Theoretic Concepts in Computer Science, pages 159–167. Springer, 2006. doi:10.1007/11917496_15.
- [33] Rudolf Halin. S-functions for graphs. Journal of geometry, 8(1-2):171–186, 1976. doi:10.1007/BF01917434.
- [34] P. Holland and S. Leinhardt. A method for detecting structure in sociometric data. American Journal of Sociology, 76:492–513, 1970. doi:10.1016/B978-0-12-442450-0.50028-6.
- [35] Alon Itai and Michael Rodeh. Finding a minimum circuit in a graph. SIAM Journal on Computing, 7(4):413–423, 1978. doi:10.1137/0207033.
- [36] Shalev Itzkovitz, Reuven Levitt, Nadav Kashtan, Ron Milo, Michael Itzkovitz, and Uri Alon. Coarse-graining and self-dissimilarity of complex networks. Phys. Rev. E, 71(016127), January 2005. doi:10.1103/PhysRevE.71.016127.
- [37] Shweta Jain and C Seshadhri. A fast and provable method for estimating clique counts using Turán’s theorem. In Proceedings, International World Wide Web Conference (WWW), pages 441–449, 2017. doi:10.1145/3038912.3052636.
- [38] Madhav Jha, C Seshadhri, and Ali Pinar. Path sampling: A fast and provable method for estimating 4-vertex subgraph counts. In Proc. 24th Proceedings, International World Wide Web Conference (WWW), pages 495–505. International World Wide Web Conferences Steering Committee, 2015. doi:10.1145/2736277.2741101.
- [39] László Lovász. Operations with structures. Acta Mathematica Academiae Scientiarum Hungarica, 18(3-4):321–328, 1967. doi:10.1007/BF02280291.
- [40] László Lovász. Large networks and graph limits, volume 60. American Mathematical Soc., 2012.
- [41] David W Matula and Leland L Beck. Smallest-last ordering and clustering and graph coloring algorithms. Journal of the ACM (JACM), 30(3):417–427, 1983. doi:10.1145/2402.322385.
- [42] Derek O’Callaghan, Martin Harrigan, Joe Carthy, and Pádraig Cunningham. Identifying discriminating network motifs in youtube spam, 2012. URL: https://arxiv.org/abs/1202.5216, arXiv:1202.5216, doi:10.48550/arXiv.1202.5216.
- [43] Mark Ortmann and Ulrik Brandes. Efficient orbit-aware triad and quad census in directed and undirected graphs. Applied network science, 2(1), 2017. doi:10.1007/s41109-017-0027-2.
- [44] Noujan Pashanasangi and C Seshadhri. Efficiently counting vertex orbits of all 5-vertex subgraphs, by evoke. In Proc. 13th International Conference on Web Search and Data Mining (WSDM), pages 447–455, 2020. doi:10.1145/3336191.3371773.
- [45] Ali Pinar, C Seshadhri, and Vaidyanathan Vishal. Escape: Efficiently counting all 5-vertex subgraphs. In Proceedings, International World Wide Web Conference (WWW), pages 1431–1440, 2017. doi:10.1145/3038912.3052597.
- [46] Natasa Przulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23(2):177–183, 2007. doi:10.1093/bioinformatics/btl301.
- [47] Neil Robertson and Paul D. Seymour. Graph minors. i. excluding a forest. Journal of Combinatorial Theory, Series B, 35(1):39–61, 1983. doi:10.1016/0095-8956(83)90079-5.
- [48] Neil Robertson and Paul D. Seymour. Graph minors. iii. planar tree-width. Journal of Combinatorial Theory, Series B, 36(1):49–64, 1984. doi:10.1016/0095-8956(84)90013-3.
- [49] Neil Robertson and Paul D. Seymour. Graph minors. ii. algorithmic aspects of tree-width. Journal of algorithms, 7(3):309–322, 1986. doi:10.1016/0196-6774(86)90023-4.
- [50] Rahmtin Rotabi, Krishna Kamath, Jon M. Kleinberg, and Aneesh Sharma. Detecting strong ties using network motifs. In Proceedings, International World Wide Web Conference (WWW), 2017. doi:10.1145/3041021.3055139.
- [51] Marc Roth and Philip Wellnitz. Counting and finding homomorphisms is universal for parameterized complexity theory. In Proc. 31st Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2161–2180, 2020. doi:10.1137/1.9781611975994.133.
- [52] Ahmet Erdem Sariyuce, C. Seshadhri, Ali Pinar, and Umit V. Catalyurek. Finding the hierarchy of dense subgraphs using nucleus decompositions. In Proceedings, International World Wide Web Conference (WWW), pages 927–937, 2015. doi:10.1145/2736277.2741640.
- [53] C. Seshadhri and Srikanta Tirthapura. Scalable subgraph counting: The methods behind the madness: WWW 2019 tutorial. In Proceedings, International World Wide Web Conference (WWW), 2019. doi:10.1145/3308560.3320092.
- [54] Nino Shervashidze, S. V. N. Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten M. Borgwardt. Efficient graphlet kernels for large graph comparison. In AISTATS, pages 488–495, 2009. URL: http://proceedings.mlr.press/v5/shervashidze09a.html.
- [55] K. Shin, T. Eliassi-Rad, and C. Faloutsos. Patterns and anomalies in -cores of real-world graphs with applications. Knowledge and Information Systems, 54(3):677–710, 2018. doi:10.1007/s10115-017-1077-6.
- [56] George Szekeres and Herbert S Wilf. An inequality for the chromatic number of a graph. Journal of Combinatorial Theory, 4(1):1–3, 1968. doi:10.1016/S0021-9800(68)80081-X.
- [57] Charalampos E. Tsourakakis. The k-clique densest subgraph problem. In Proceedings, International World Wide Web Conference (WWW), pages 1122–1132, 2015. doi:10.1145/2736277.2741098.
- [58] Charalampos E. Tsourakakis, Jakub Pachocki, and Michael Mitzenmacher. Scalable motif-aware graph clustering. In Proceedings, International World Wide Web Conference (WWW), pages 1451–1460, 2017. doi:10.1145/3038912.3052653.
- [59] Johan Ugander, Lars Backstrom, and Jon M. Kleinberg. Subgraph frequencies: mapping the empirical and extremal geography of large graph collections. In Proceedings, International World Wide Web Conference (WWW), pages 1307–1318, 2013. doi:10.1145/2488388.2488502.
- [60] Hao Yin, Austin R. Benson, and Jure Leskovec. Higher-order clustering in networks. Phys. Rev. E, 97:052306, 2018. doi:10.1103/PhysRevE.97.052306.
- [61] Hao Yin, Austin R. Benson, and Jure Leskovec. The local closure coefficient: A new perspective on network clustering. In ACM International Conference on Web Search and Data Mining (WSDM), pages 303–311, 2019. doi:10.1145/3289600.3290991.