The cross cyclomatic complexity:
a bi-dimensional measure for program complexity on graphs
Abstract
Reduce and control complexity is an essential practice in software design. Cyclomatic complexity (CC) is one of the most popular software metrics, applied for more than 40 years. Despite CC is an interesting metric to highlight the number of branches in a program, it clearly not sufficient to represent the complexity in a piece of software. In this paper, we introduce the cross cyclomatic complexity (CCC), a new bi-dimensional complexity measure on graphs that combines the cyclomatic complexity and the weight of a minimum-weight cycle basis in as pair on the Cartesian plan to characterize program complexity using control flow graphs. Our postulates open a new venue to represent program complexity, and we discuss its implications and opportunities.
Keywords cyclomatic complexity, software metrics, software quality, cycle basis, graph
1 Introduction
Maintainability is one of an essential quality in a software system; maintain a program drives evolution and the longevity of a system. Thus, to maintain a piece of software, firstly, we need to understand it. Understandability is the quality aspect that determines the capacity to understand code easily. However, how to measure if our source code is easy to understand? Despite more than 40 years of research for measuring understandability, there is not a consensus on what metrics we should use to evaluate the understandability [1].
As understandability is hard to measure, the software engineering community uses the McCabe’s Cyclomatic Complexity [2] as a proxy of understandability. McCabe’s Cyclomatic Complexity (MCC) measures the number of linearly independent control flow paths within the method, claiming explicitly that ”… it can be used to manage and control program complexity.” [2] There is a commonsense and intuition if a program has more branches, effort to understand it is more significant than a program with fewer branches. Then, if a method has a high CC, intuitively, it should be hard to understand.
Despite its value, MCC is not enough to describe the complexity of a problem. For example, idiomatic Java structures as blocks try/catch do not contribute to increase a program MCC. Moreover, a well-design program - using structure style - can have the same McCabe’s Cyclomatic Complexity than a equivalent program written using GOTOs or ”spaghetti” code [1]. Further, Jbara et al. found hundreds of methods with MCC higher than 100 111Visser et al. propose good software system should have the majority of its methods with a MCC lower or equal 10. Then, a method with MCC of 100 should be considered as very complex., but they appreared to be well structured [3]. In fact, MCC does not reflect the complexity in all diversity of situation in a code, espcially in high-MCC programs [3].
In this position paper, we introduce the cross cyclomatic complexity (CCC), a new bi-dimensional complexity measure on graphs that combines the cyclomatic complexity and the weight of a minimum-weight cycle basis as a pair on the Cartesian plan to characterize program complexity using control flow graphs.
The remainder of this paper is organized as follows. In the Section 2, we discuss the characteristics and the main issues of MCC. In Section 3, the mathematical and graph background to define our metric. In Section 4, we define CCC metric, and Section 5, we present how to calculate CCC from 6 different cases. In the Section 6 we present related work to this study. Section 7 discusses the implications of CCC and opportunities of future work.
2 Is McCabe’s Cyclomatic Complexity a bad measure?
Cyclomatic Complexity (MCC) is a measure of program complexity that was proposed by Thomas McCabe in 1976. MCC measures the complexity of a program counting the number of linearly independent paths in a control flow graph [2]. Explicitly, McCabe’s measure goal is ”… provide a quantitative basis for modularization and allow us to identify software modules that will be difficult to test or maintain.” Indeed, MCC is a straightforward metric in terms of computation on which developers have an upper bound for the number of test cases required to obtain branch coverage of the code, and cyclomatic complexity could be applied to estimate the required effort for writing tests [4]. Also, MCC has strong mathematical support on graph theory, differently of arbitrary measures as Cognitive Complexity [5], for example. Moever, a recent study found that there is no strong linear correlation between MCC and SLOC of Java methods, so developers should not use only SLOC as a proxy of complexity measure. Thus, it is not a surprise that MCC is one of the most adopted program metrics partly because it is simple to calculate, aligned with developers intuition and there are no widely agreed alternatives [6].
Despite some advantages, MCC has very well-know limitations [6, 3, 4, 5, 1]. First, intuitively while loops are feels more complicate than ifs, as nested sequences of fors and if are more complicated than a switch/case structure. However, MCC can give the same value because it gives the same weight to all structures. We can observe this issue in the Listing 1. The method sumOfPrimes is intuitively more complex (lines 1 to 12) than the method getWords (lines 14 to 23). The first method has 2 for loops and a nested if inside of the second for, and a continue to jump outside of the loops (as GOTO). The second method has just a switch/case with 3 alternatives. Despite the intuitive differences in terms of complexity, both methods have the same MCC (MCC = 4).
Moreover, a method with high MCC yet would easy to understand, as a large state machine implemented using a switch/case structure. Thus, MCC may produce a false positive conclusion about the complexity of methods [1, 7]. Finally, Ajamiet al. [6] found that different code structures take different times to interpret, suggesting that the approach taken by MCC where all branching constructs are given the same weight, is overly simplistic.
Alternatives to MCC that use different weights to different control structures ware proposed (as Halstead metrics [8] or Cognitive Complexity [5], but they did not report empirical evidence supporting proposed weights [6]. Thus, Hummel [4] suggest that
The sad truth is that there probably is no single simple measurement that can express an abstract concept such as complexity in a single number. But this does not imply that we cannot measure and control complexity. It just has to be done with multiple metrics and checks that cover the various aspects of complexity.
Vinju and Godfrey [1] share a similar perspective. They propose a schematic diagram (Figure 1) to highlight and explain the factors for the correlation of high CC with failure. The diagram suggests the number of failures should not be determinate by a single aspect (number of branches). Thus, they highlighted the fact that there are several aspects to explain the failures.

3 Mathematical prerequisites
We now recall some important concepts from graph theory and linear algebra necessary in order to define our metric. Additional examples and details of proofs can be found in [9]. Let be a connected graph without loops or multiple edges. If the edges of are non-ordered pairs (resp. ordered pairs), we say is a simple graph (resp. a simple digraph). A -path is a sequence of adjacent edges such that is incident to , is adjacent to . A cycle is a path where . Also, is simple if it does not contain repeated vertices, except for possibly and . A spanning tree of is an acyclic subgraph such that . Spanning trees are easily computed using Breadth-First Search or Depth-First Search algorithms. Note that in the case of digraphs, those algorithms return a spanning tree only if is strongly connected, that is if there exists a path between each pair of vertices. Otherwise, the resulting tree depends on the choice of a starting vertex. A weighted graph is a graph together with a function where the image is the edge-weight of e. By extension, the weight of a subset is the sum . The weight of a graph is the weight of its edge set. For example, if for all , then corresponds to the length of the path . Figure 2 illustrates a weighted graph and one of its spanning tree.
Now, let be a spanning tree of . The relative complement of in is the graph obtained from by deleting the edges of . A fundamental cycle associated to is the cycle obtained by adding any edge of to . Existence and unicity are guaranteed since is acyclic and spans . The fundamental system of cycles of is the set of fundamental cycles associated to . It is worth noting that in the case of simple digraphs, the cycles obtained are not oriented (see Figure 3 for an example). This will be further discussed in the next section.
As its name suggest, a fundamental system of cycles depends on the choice of . The following discussion on the links between linear algebra and graph theory expands on this idea. Recall from linear algebra that (often noted ) is the Galois Field on elements. That is, the set with addition and multiplication modulo . Now, let and be subsets of for a given graph . Define the ring sum as the symmetric difference between and , that is
If we denote by the set of all subsets of , then together with is a vector space over . Finally, the cycle space of is the subspace of consisting of , all cycles of and all union of disjoint cycles. The next theorem establishes important results concerning basis of the cycle space.
Theorem 3.1.
Let be a simple connected graph. Then, any fundamental system of cycles is a basis of the cycle space of . Also, the dimension of the cycle space is equal to .
The dimension of the cycle space is called the cycle rank of (or Betti number or cyclomatic number) and is denoted by . For example, the graph of Figure 2 has cycle rank . The cycles , and form a basis and we have .
By considering the control flow graph of a computer program and adding an extra virtual arc between the end and start node, one obtains the well-known cyclomatic complexity of the program, as first introduced by McCabe in [2].
We close this section by studying cycle basis for weighted graphs. Given a weighted graph , the weight of a basis is the sum of the weight of all its cycles. A minimum-weight basis is a cycle basis such that its weight is smaller than any other basis. Figure 4 shows three possible cycle basis for a weigthed graph.
4 The cross cyclomatic complexity
As stated in Section 2, MCC is an interesting metric to measure complexity of graphs but is clearly not sufficient. Indeed, many structural properties of are osculated by . Thus, we could augment cyclomatic complexity by considering, on one axis, , and on another, the weight of a minimum-weight cycle basis for . Intuitively, it is an extension of the traditional MCC: Whereas merely gives the number of independent cycles in , describes the internal structure of those cycles. Also, as is the case for the , the weight of a minimum-weight cycle basis is well-defined for any given graph. So, we define a new bi-dimensional complexity measure on graphs and study its applications to control flow graphs. Let be a weighted simple graph or digraph. Then, the cross cyclomatic complexity222Technically, since no cross product is involved, it should be called something along the line of ”cartesian cyclomatic complexity”. We chose the term ”cross” simply because we found it slightly less boring. of G is the tuple
(1) |
where is the weight of a minimum-weight cycle basis for .
We now describe how can be applied to measure complexity of computer programs. A control flow graph is a simple digraph where all paths are oriented from a start vertex to an exit vertex together with a virtual arc . Such graphs are always strongly connected, otherwise some part of the program would be inaccessible. It is worth noting that cycles in correspond to independent paths in . We define the cross cyclomatic complexity of a computer program as with for all . It is obvious from the definition that . The computation of is a little more involved and thus is studied in the next section. Finally, our metric offers some flexibility on how to treat independent paths in control flow graphs. Indeed, by assigning different weight to arcs, one could, for instance, penalize certain paths in the control flow graph. Remark that we insist on using both and in the expression for because, although comes from , one cannot explicitly compute from the value of .
Before closing this section, we derive some useful properties of . First, since for all , corresponds to the length of a minimal-length cycle basis. Also, since any cycle contains at least one edge. Further, the equality holds if and only if is an isolated vertex or a sequence of two instructions. Finally, we can plot the complexity in a halfplane as in Figure 5. Section 5 presents examples supporting this claim.
5 Computing the CCC for control flow graphs
As mentioned in Section 3, computing a minimum-weight cycle basis is not trivial. Several algorithms have been proposed [10, 11].
We present here a algorithm due to Horton [11] (Algorithm 1). The idea is to construct a -sized set of cycles guaranteed to contain a minimum-weight cycle basis. Then, elements of this set are greedily extracted and linear independence in checked using Gaussian elimination.
Line 5 can be computed using, for example, Dijkstra’s algorithm [12]. For Line 13, one can check check linear independence as follows: Given a cycle , construct the -dimensional incidence vector where if and only if belongs to and otherwise. Then, linear independence of a set of cycle is easily checked by constructing the matrix and using Gaussian elimination over .
For small graphs, the cross cyclomatic complexity of the control structures of a program are rather easy to compute since there exists very few cycle basis (see Figure 6). For instance, the unique (and thus minimal-weight) cycle basis for the graph in Figure 6c is the set of two independent cycles and giving complexity
A more complex example is presented in Figure 4. Using Horton’s algorithm, one determine that the three lowest weight simple cycles are , and of weight , and respectively. Since they are linearly independent, we find .
The associated matrix, with edge ordering through , is
It is easily checked that this matrix reduces to
and thus , and are linearly independent.
Despite being mathematically more accurate, the computation of demands a significant effort. To simplify its computation, we now propose a simplified version of our complexity measure. The idea rises from the following simple observation: Given a graph and one of its cycle basis , then
(2) |
From Section 3, one can deduced the following simple algorithm for computing an upper bound on .
For example, the graph in Figure 4b gives the approximation whereas . In a more tangible example, consider the control flow graph of a Java implementation of Bubble sort from Rosetta Code [13] (Figure 7). Our approximation for the given spanning tree (red arcs) gives . Next, we extract from McCabe’s original paper two more graphs for which he illustrated his measure (Figures 8 and 9). The values of each graph is shown in Figure 10.
.
Analyzing Figure 10, we can easily compare the three graphs’ complexities. The three regions in the chart represent some properties of : No graph can have cross cyclomatic complexity located in the dark gray region of the plane since each cycle in a basis is of length at least . The light gray region of the plane contains the complexity of valid graphs, but not non-trivial computer programs. Finally, the white region represents the of valid non-trivial computer programs. In terms of design optimization, developers should aim to design programs with complexity near the line . Following this criterion, we observe that the Bubble sort implementation as modelled by the graph in Figure 7 is considered optimized (near to the line), whereas the graph in Figure 9 could be be refactored.
6 Related work
Vinju et al. [1] investigated the relationship between the shape of control flow patterns observed in Java methods to their CC metric values, proposing the concept of abstract control flow patterns (CFPs) and compressed control flow patterns (CCFPs), which allow us to produce statistical evidence that the CC metric indeed does not adequately model the likely complexity of control flow in Java methods.
Jbara et al. [3] discuss there are no agreement on how to measure code complexity, favouring simple general purpose metrics, such as lines of code or MCC. They claims such metrics just count syntactic features, and ignore details of the code’s global structure, which may also have an effect on understandability. To address this issue, they suggested that code regularity—where the same structures are repeated time after time—may significantly reduce complexity.
7 Final remarks
Understanding a computer program is a time consuming and complex cognitive task. To control program complexity, researchers and practitioners have proposed single metrics to address this issue. However, those single metrics can not solve the problem yet. McCabe’s Cyclomatic Complexity is one of those metrics. The MCC abstracts the control flow graph as merely the analysis of branches and the fan-outs of its nodes. Doing so, it reduces the rich structure of the underlying graphs as a single number [1].
Thus, we claim for addressing the program complexity without throw out previous useful metrics (as MCC), we introduce the cross cyclomatic complexity (CCC), a new bi-dimensional complexity measure on graphs that combines the cyclomatic complexity and the weight of a minimum-weight cycle basis as a pair on the Cartesian plane to characterize program complexity using control flow graphs. Using solid mathematical and graph backgrounds, CCC provides a combined approach to address some of the issues on MCC without losing its advantages.
Hence, we claim that Cross Cyclomatic Complexity gives a criterion for highlight refactoring opportunities to control the complexity of computer programs. Namely, we suggest that one should aim to keep the complexity as close as possible to the line . Further, CCC is agnostic of computer languages, and robust graph theorems support it. Also, CCC addresses distortions of MCC, which ignores the complexity associated with the size of programs or structures as long switch/case statements.
However, it is a positional work, lacking an in-depth empirical evaluation to observe how CCC can address complexity issues in program comprehension and support refactoring tasks. In the future, we will evaluate our metric to identify the diversity of scenarios to apply it and results on a large dataset of programs for different languages. Thus, this paper is the first step in order to provide a new way to see software metrics using graph properties.
References
- [1] J. J. Vinju and M. W. Godfrey. What does control flow really look like? eyeballing the cyclomatic complexity metric. In 2012 IEEE 12th International Working Conference on Source Code Analysis and Manipulation, pages 154–163, Sep. 2012.
- [2] T. J. McCabe. A complexity measure. IEEE Trans. Softw. Eng., 2(4):308–320, July 1976.
- [3] Ahmad Jbara and Dror G. Feitelson. On the effect of code regularity on comprehension. In Proceedings of the 22nd International Conference on Program Comprehension, ICPC 2014, page 189–200, New York, NY, USA, 2014. Association for Computing Machinery.
- [4] Benjamin Hummel. Mccabe’s cyclomatic complexity and why we don’t use it, 2014.
- [5] G. Ann Campbell. Cognitive complexity, because testability != understandability, 2016.
- [6] Shulamyt Ajami, Yonatan Woodbridge, and Dror G. Feitelson. Syntax, predicates, idioms – what really affects code complexity? Empirical Software Engineering, 24(1):287–328, 2019.
- [7] D. Landman, A. Serebrenik, and J. Vinju. Empirical analysis of the relationship between cc and sloc in a large corpus of java methods. In 2014 IEEE International Conference on Software Maintenance and Evolution, pages 221–230, Sep. 2014.
- [8] Maurice H. Halstead. Elements of software science. Elsevier, 1st edition, 1977.
- [9] Jonathan L. Gross and Jay Yellen. Graph Theory and Its Applications, Second Edition (Discrete Mathematics and Its Applications). Chapman & Hall/CRC, 2005.
- [10] Franziska Berger, Peter Gritzmann, and Sven de Vries. Minimum cycle bases and their applications. In Jürgen Lerner, Dorothea Wagner, and Katharina Anna Zweig, editors, Algorithmics of Large and Complex Networks - Design, Analysis, and Simulation [DFG priority program 1126], volume 5515 of Lecture Notes in Computer Science, pages 34–49. Springer, 2009.
- [11] J. D. Horton. A polynomial-time algorithm to find the shortest cycle basis of a graph. SIAM Journal on Computing, 16(2):358–366, 1987.
- [12] Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische mathematik, 1(1):269–271, 1959.
- [13] Rosetta Code. Sorting algorithms/bubble sort in java, 2020.