This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

The cross cyclomatic complexity:
a bi-dimensional measure for program complexity on graphs

Hugo Tremblay
Université du Québec à Chicoutimi
Chicoutimi, Québec
[email protected]
&Fabio Petrillo
Université du Québec à Chicoutimi
Chicoutimi, Québec
[email protected]
Abstract

Reduce and control complexity is an essential practice in software design. Cyclomatic complexity (CC) is one of the most popular software metrics, applied for more than 40 years. Despite CC is an interesting metric to highlight the number of branches in a program, it clearly not sufficient to represent the complexity in a piece of software. In this paper, we introduce the cross cyclomatic complexity (CCC), a new bi-dimensional complexity measure on graphs that combines the cyclomatic complexity and the weight of a minimum-weight cycle basis in as pair on the Cartesian plan to characterize program complexity using control flow graphs. Our postulates open a new venue to represent program complexity, and we discuss its implications and opportunities.

Keywords cyclomatic complexity, software metrics, software quality, cycle basis, graph

1 Introduction

Maintainability is one of an essential quality in a software system; maintain a program drives evolution and the longevity of a system. Thus, to maintain a piece of software, firstly, we need to understand it. Understandability is the quality aspect that determines the capacity to understand code easily. However, how to measure if our source code is easy to understand? Despite more than 40 years of research for measuring understandability, there is not a consensus on what metrics we should use to evaluate the understandability [1].

As understandability is hard to measure, the software engineering community uses the McCabe’s Cyclomatic Complexity [2] as a proxy of understandability. McCabe’s Cyclomatic Complexity (MCC) measures the number of linearly independent control flow paths within the method, claiming explicitly that ”… it can be used to manage and control program complexity.” [2] There is a commonsense and intuition if a program has more branches, effort to understand it is more significant than a program with fewer branches. Then, if a method has a high CC, intuitively, it should be hard to understand.

Despite its value, MCC is not enough to describe the complexity of a problem. For example, idiomatic Java structures as blocks try/catch do not contribute to increase a program MCC. Moreover, a well-design program - using structure style - can have the same McCabe’s Cyclomatic Complexity than a equivalent program written using GOTOs or ”spaghetti” code [1]. Further, Jbara et al. found hundreds of methods with MCC higher than 100 111Visser et al. propose good software system should have the majority of its methods with a MCC lower or equal 10. Then, a method with MCC of 100 should be considered as very complex., but they appreared to be well structured [3]. In fact, MCC does not reflect the complexity in all diversity of situation in a code, espcially in high-MCC programs [3].

In this position paper, we introduce the cross cyclomatic complexity (CCC), a new bi-dimensional complexity measure on graphs that combines the cyclomatic complexity and the weight of a minimum-weight cycle basis as a pair on the Cartesian plan to characterize program complexity using control flow graphs.

The remainder of this paper is organized as follows. In the Section 2, we discuss the characteristics and the main issues of MCC. In Section 3, the mathematical and graph background to define our metric. In Section 4, we define CCC metric, and Section 5, we present how to calculate CCC from 6 different cases. In the Section 6 we present related work to this study. Section 7 discusses the implications of CCC and opportunities of future work.

2 Is McCabe’s Cyclomatic Complexity a bad measure?

Cyclomatic Complexity (MCC) is a measure of program complexity that was proposed by Thomas McCabe in 1976. MCC measures the complexity of a program counting the number of linearly independent paths in a control flow graph [2]. Explicitly, McCabe’s measure goal is ”… provide a quantitative basis for modularization and allow us to identify software modules that will be difficult to test or maintain.” Indeed, MCC is a straightforward metric in terms of computation on which developers have an upper bound for the number of test cases required to obtain branch coverage of the code, and cyclomatic complexity could be applied to estimate the required effort for writing tests [4]. Also, MCC has strong mathematical support on graph theory, differently of arbitrary measures as Cognitive Complexity [5], for example. Moever, a recent study found that there is no strong linear correlation between MCC and SLOC of Java methods, so developers should not use only SLOC as a proxy of complexity measure. Thus, it is not a surprise that MCC is one of the most adopted program metrics partly because it is simple to calculate, aligned with developers intuition and there are no widely agreed alternatives [6].

Despite some advantages, MCC has very well-know limitations [6, 3, 4, 5, 1]. First, intuitively while loops are feels more complicate than ifs, as nested sequences of fors and if are more complicated than a switch/case structure. However, MCC can give the same value because it gives the same weight to all structures. We can observe this issue in the Listing 1. The method sumOfPrimes is intuitively more complex (lines 1 to 12) than the method getWords (lines 14 to 23). The first method has 2 for loops and a nested if inside of the second for, and a continue to jump outside of the loops (as GOTO). The second method has just a switch/case with 3 alternatives. Despite the intuitive differences in terms of complexity, both methods have the same MCC (MCC = 4).

Moreover, a method with high MCC yet would easy to understand, as a large state machine implemented using a switch/case structure. Thus, MCC may produce a false positive conclusion about the complexity of methods [1, 7]. Finally, Ajamiet al. [6] found that different code structures take different times to interpret, suggesting that the approach taken by MCC where all branching constructs are given the same weight, is overly simplistic.

1int sumOfPrimes(int max) {
2 int total = 0;
3 OUT: for (int i = 1; i <= max; ++i) {
4 for (int j = 2; j < i; ++j) {
5 if (i % j == 0) {
6 continue OUT;
7 }
8 }
9 total += i;
10 }
11 return total;
12} // Cyclomatic Complexity 4
13
14String getWords(int number) {
15 switch (number) {
16 case 1:
17 return "one";
18 case 2:
19 return "a couple";
20 default:
21 return "lots";
22 }
23 } // Cyclomatic Complexity 4
Listing 1: Two Java methods with the same MCC [5]

Alternatives to MCC that use different weights to different control structures ware proposed (as Halstead metrics [8] or Cognitive Complexity [5], but they did not report empirical evidence supporting proposed weights [6]. Thus, Hummel [4] suggest that

The sad truth is that there probably is no single simple measurement that can express an abstract concept such as complexity in a single number. But this does not imply that we cannot measure and control complexity. It just has to be done with multiple metrics and checks that cover the various aspects of complexity.

Vinju and Godfrey [1] share a similar perspective. They propose a schematic diagram (Figure 1) to highlight and explain the factors for the correlation of high CC with failure. The diagram suggests the number of failures should not be determinate by a single aspect (number of branches). Thus, they highlighted the fact that there are several aspects to explain the failures.

Refer to caption
Figure 1: Two comparable explanations for correlation of high CC with failure [1]

All in all, MCC is a useful measure to help developers to find methods that are hard to test, but MCC is not enough. Thus, in Section 3, we present essential concepts before introducing our metric. After, we introduce our metric in Section 4.

3 Mathematical prerequisites

We now recall some important concepts from graph theory and linear algebra necessary in order to define our metric. Additional examples and details of proofs can be found in [9]. Let G=(V,E)G=(V,E) be a connected graph without loops or multiple edges. If the edges of GG are non-ordered pairs (resp. ordered pairs), we say GG is a simple graph (resp. a simple digraph). A uvuv-path is a sequence of adjacent edges P(u,v)=(e1,e2,,en)P(u,v)=(e_{1},e_{2},\ldots,e_{n}) such that uu is incident to e1e_{1}, vv is adjacent to ene_{n}. A cycle C(u,v)C(u,v) is a path where u=vu=v. Also, P(u,v)P(u,v) is simple if it does not contain repeated vertices, except for possibly uu and vv. A spanning tree of GG is an acyclic subgraph TT such that VT=VGV_{T}=V_{G}. Spanning trees are easily computed using Breadth-First Search or Depth-First Search algorithms. Note that in the case of digraphs, those algorithms return a spanning tree only if GG is strongly connected, that is if there exists a path between each pair of vertices. Otherwise, the resulting tree depends on the choice of a starting vertex. A weighted graph is a graph G=(V,E)G=(V,E) together with a function ω:E\omega:E\longrightarrow\mathbb{R} where the image ω(e)\omega(e) is the edge-weight of e. By extension, the weight of a subset EEE^{\prime}\subseteq E is the sum ω(E)=eEω(e)\omega(E^{\prime})=\sum_{e\in E^{\prime}}\omega(e). The weight of a graph is the weight of its edge set. For example, if ω(e)=1\omega(e)=1 for all eEe\in E, then ω(P)\omega(P) corresponds to the length of the path PP. Figure 2 illustrates a weighted graph and one of its spanning tree.

aabbccddee441-101/21/2115510-10
Figure 2: A weighted graph G=(V,EG)G=(V,E_{G}) and one of its spanning tree T=(V,ET)T=(V,E_{T}). Here, ω(G)=1/2\omega(G)=-1/2 and ω(T)=10\omega(T)=10.

Now, let TT be a spanning tree of GG. The relative complement of TT in GG is the graph GTG-T obtained from GG by deleting the edges of TT. A fundamental cycle associated to TT is the cycle obtained by adding any edge of GTG-T to TT. Existence and unicity are guaranteed since TT is acyclic and spans GG. The fundamental system of cycles of TT is the set of fundamental cycles associated to TT. It is worth noting that in the case of simple digraphs, the cycles obtained are not oriented (see Figure 3 for an example). This will be further discussed in the next section.

aabbccddee
(a) A simple digraph and one of its spanning tree TT
aabbccddee
(b) The relative complement GTG-T
aabbccaaccddccddee
(c) The set of unoriented fundamental cycles associated to TT
Figure 3: Constructing a set of fundamental cycles

As its name suggest, a fundamental system of cycles depends on the choice of TT. The following discussion on the links between linear algebra and graph theory expands on this idea. Recall from linear algebra that GF(2)GF(2) (often noted /2\mathbb{Z}/2\mathbb{Z}) is the Galois Field on 22 elements. That is, the set {0,1}\{0,1\} with addition and multiplication modulo 22. Now, let E1E_{1} and E2E_{2} be subsets of EE for a given graph G=(V,E)G=(V,E). Define the ring sum E1E2E_{1}\oplus E_{2} as the symmetric difference between E1E_{1} and E2E_{2}, that is

E1E2=(E1E2)(E2E1).E_{1}\oplus E_{2}=(E_{1}-E_{2})\cup(E_{2}-E_{1}).

If we denote by WE(G)W_{E}(G) the set of all subsets of EGE_{G}, then WE(G)W_{E}(G) together with \oplus is a vector space over GF(2)GF(2). Finally, the cycle space of GG is the subspace of WE(G)W_{E}(G) consisting of \emptyset, all cycles of GG and all union of disjoint cycles. The next theorem establishes important results concerning basis of the cycle space.

Theorem 3.1.

Let GG be a simple connected graph. Then, any fundamental system of cycles is a basis of the cycle space of GG. Also, the dimension of the cycle space is equal to #E#V+1\#E-\#V+1.

The dimension of the cycle space is called the cycle rank of GG (or Betti number or cyclomatic number) and is denoted by ν(G)\nu(G). For example, the graph of Figure 2 has cycle rank 33. The cycles c1=a,e,bc_{1}=\langle a,e,b\rangle, c2=b,e,dc_{2}=\langle b,e,d\rangle and c3=b,d,cc_{3}=\langle b,d,c\rangle form a basis and we have c1c2=a,e,d,bc_{1}\oplus c_{2}=\langle a,e,d,b\rangle.

By considering the control flow graph GG of a computer program and adding an extra virtual arc between the end and start node, one obtains the well-known cyclomatic complexity of the program, as first introduced by McCabe in [2].

We close this section by studying cycle basis for weighted graphs. Given a weighted graph G=(V,E)G=(V,E), the weight of a basis is the sum of the weight of all its cycles. A minimum-weight basis is a cycle basis such that its weight is smaller than any other basis. Figure 4 shows three possible cycle basis for a weigthed graph.

aabbccddee11335544227766
(a) Basis associated to T1T_{1}: {a,e,d,a,d,c,b,a,c,b}\{\langle a,e,d\rangle,\langle a,d,c,b\rangle,\langle a,c,b\rangle\} of weight 3636
aabbccddee11335544227766
(b) Basis associated to T2T_{2}: {a,e,d,a,d,c,b,a,d,c}\{\langle a,e,d\rangle,\langle a,d,c,b\rangle,\langle a,d,c\rangle\} of weight 4545
aabbccddee11335544227766
(c) Basis associated to T3T_{3}: {a,e,d,a,d,c,a,c,b}\{\langle a,e,d\rangle,\langle a,d,c\rangle,\langle a,c,b\rangle\} of weight 3636
Figure 4: Different cycle basis and their weigth

4 The cross cyclomatic complexity

As stated in Section 2, MCC is an interesting metric to measure complexity of graphs but is clearly not sufficient. Indeed, many structural properties of GG are osculated by ν(G)\nu(G). Thus, we could augment cyclomatic complexity by considering, on one axis, ν(G)\nu(G), and on another, the weight of a minimum-weight cycle basis for GG. Intuitively, it is an extension of the traditional MCC: Whereas ν(G)\nu(G) merely gives the number of independent cycles in GG, ωmin(G)\omega_{min}(G) describes the internal structure of those cycles. Also, as is the case for the ν(G)\nu(G), the weight of a minimum-weight cycle basis is well-defined for any given graph. So, we define a new bi-dimensional complexity measure on graphs and study its applications to control flow graphs. Let G=(V,E)G=(V,E) be a weighted simple graph or digraph. Then, the cross cyclomatic complexity222Technically, since no cross product is involved, it should be called something along the line of ”cartesian cyclomatic complexity”. We chose the term ”cross” simply because we found it slightly less boring. of G is the tuple

cω(G)=(ν(G),ωmin(G)).c_{\omega}(G)=\left(\nu(G),\omega_{min}(G)\right). (1)

where ωmin(G)\omega_{min}(G) is the weight of a minimum-weight cycle basis for GG.

We now describe how cωc_{\omega} can be applied to measure complexity of computer programs. A control flow graph is a simple digraph G=(V,E)G=(V,E) where all paths are oriented from a start vertex ss to an exit vertex rr together with a virtual arc (r,s)(r,s). Such graphs are always strongly connected, otherwise some part of the program would be inaccessible. It is worth noting that cycles in GG correspond to independent paths in G{(r,s)}G-\{(r,s)\}. We define the cross cyclomatic complexity of a computer program as cωc_{\omega} with ω(e)=1\omega(e)=1 for all eGe\in G. It is obvious from the definition that ν(G)=#E#V+2\nu(G)=\#E-\#V+2. The computation of cω(G)c\omega(G) is a little more involved and thus is studied in the next section. Finally, our metric offers some flexibility on how to treat independent paths in control flow graphs. Indeed, by assigning different weight to arcs, one could, for instance, penalize certain paths in the control flow graph. Remark that we insist on using both ν(G)\nu(G) and ωmin(G)\omega_{min}(G) in the expression for cωc_{\omega} because, although ωmin\omega_{min} comes from ν\nu, one cannot explicitly compute ν\nu from the value of ωmin\omega_{min}.

Before closing this section, we derive some useful properties of cωc\omega. First, since ω(e)=1\omega(e)=1 for all eGe\in G, ωmin(G)\omega_{min}(G) corresponds to the length of a minimal-length cycle basis. Also, ωmin(G)ν(G)\omega_{min}(G)\geq\nu(G) since any cycle contains at least one edge. Further, the equality holds if and only if GG is an isolated vertex or a sequence of two instructions. Finally, we can plot the complexity in a halfplane as in Figure 5. Section 5 presents examples supporting this claim.

ν\nuωmin\omega_{min}(a)(b)(c) and (d)1111
Figure 5: The cross complexity halfplane. Dots correspond to possible values for the metric. Blue dots correspond to the complexity of the control structures in Figure 6

5 Computing the CCC for control flow graphs

As mentioned in Section 3, computing a minimum-weight cycle basis is not trivial. Several algorithms have been proposed [10, 11].

We present here a 𝒪(#E3#V)\mathcal{O}(\#E^{3}\#V) algorithm due to Horton [11] (Algorithm 1). The idea is to construct a 𝒪(#E#V)\mathcal{O}(\#E\#V)-sized set of cycles guaranteed to contain a minimum-weight cycle basis. Then, elements of this set are greedily extracted and linear independence in checked using Gaussian elimination.

Algorithm 1 Horton
1:A simple weighted digraph G=(V,E)G=(V,E)
2:A minimum-weight cycle basis \mathcal{B} for GG
3:\mathcal{B}\leftarrow\emptyset, LL\leftarrow\emptyset
4:for x,yVx,y\in V do
5:     Find the shortest xyxy-path P(x,y)P(x,y)
6:end for
7:for e=(x,y)Ee=(x,y)\in E and zVz\in V do
8:     Add C(z,e)=P(z,x)+e+P(y,z)C(z,e)=P(z,x)+e+P(y,z) to LL if it is simple
9:end for
10:Order LL by cycle weight
11:while #<ν(G)\#\mathcal{B}<\nu(G) do
12:     cc\leftarrow the first element of LL
13:     if the elements of \mathcal{B} and cc are linearly independent then
14:         Add cc to \mathcal{B}
15:     end if
16:     Remove cc from LL
17:end while
18:return \mathcal{B}

Line 5 can be computed using, for example, Dijkstra’s algorithm [12]. For Line 13, one can check check linear independence as follows: Given a cycle c=e1,e2,,enc=\langle e_{1},e_{2},\ldots,e_{n}\rangle, construct the #E\#E-dimensional incidence vector [δei]eiE[\delta_{e_{i}}]_{e_{i}\in E} where δei=1\delta_{e_{i}}=1 if and only if eie_{i} belongs to cc and 0 otherwise. Then, linear independence of a set of cycle {c1,c2,,ck}\{c_{1},c_{2},\ldots,c_{k}\} is easily checked by constructing the matrix [c1c2ck][c_{1}\;c_{2}\;\cdots c_{k}] and using Gaussian elimination over GF(2)GF(2).

For small graphs, the cross cyclomatic complexity of the control structures of a program are rather easy to compute since there exists very few cycle basis (see Figure 6). For instance, the unique (and thus minimal-weight) cycle basis for the graph in Figure 6c is the set of two independent cycles s,a,r\langle s,a,r\rangle and s,b,r\langle s,b,r\rangle giving complexity cmin(G)=(2,4)c_{min}(G)=(2,4)

ssrr11
(a) Sequence
ssaarr111111
(b) If
ssaabbrr11111111
(c) If/Else
ssaarr111111
(d) Loop
Figure 6: The control structures of a program have cross cyclomatic complexity (a) cmin=(1,1)c_{min}=(1,1), (b) cmin=(2,3)c_{min}=(2,3), (c) cmin=(2,4)c_{min}=(2,4) and (d) cmin=(2,4)c_{min}=(2,4)

A more complex example is presented in Figure 4. Using Horton’s algorithm, one determine that the three lowest weight simple cycles are c1=a,b,cc_{1}=\langle a,b,c\rangle, c2=a,e,dc_{2}=\langle a,e,d\rangle and c3=a,d,cc_{3}=\langle a,d,c\rangle of weight 66, 1515 and 1515 respectively. Since they are linearly independent, we find ωmin(G)=36\omega_{min}(G)=36.

The associated matrix, with edge ordering e1e_{1} through e7e_{7}, is

(100100101010011010001).\begin{pmatrix}1&0&0\\ 1&0&0\\ 1&0&1\\ 0&1&0\\ 0&1&1\\ 0&1&0\\ 0&0&1\end{pmatrix}.

It is easily checked that this matrix reduces to

(100010001000000000000),\begin{pmatrix}1&0&0\\ 0&1&0\\ 0&0&1\\ 0&0&0\\ 0&0&0\\ 0&0&0\\ 0&0&0\end{pmatrix},

and thus c1c_{1}, c2c_{2} and c3c_{3} are linearly independent.

Despite being mathematically more accurate, the computation of ωmin(G)\omega_{min}(G) demands a significant effort. To simplify its computation, we now propose a simplified version of our complexity measure. The idea rises from the following simple observation: Given a graph GG and one of its cycle basis \mathcal{B}, then

ωmin(G)ω().\omega_{min}(G)\leq\omega(\mathcal{B}). (2)

From Section 3, one can deduced the following simple algorithm for computing an upper bound on ωmin\omega_{min}.

Algorithm 2 Upper bound on ωmin\omega_{min}
1:A simple weighted digraph G=(V,E)G=(V,E)
2:A value ω\omega such that ωmin(G)ω\omega_{min}(G)\leq\omega
3:ω0\omega\leftarrow 0
4:Construct a spanning tree TT of GG
5:for eGTe\in G-T do
6:     C(T,e)C(T,e)\leftarrow the unique non-oriented cycle of (GT){e}(G-T)\cup{\{e\}}
7:     ωω+ω(C(T,e))\omega\leftarrow\omega+\omega(C(T,e))
8:end for
9:return ω\omega

For example, the graph in Figure 4b gives the approximation (3,45)(3,45) whereas cω=(3,36)c_{\omega}=(3,36). In a more tangible example, consider the control flow graph of a Java implementation of Bubble sort from Rosetta Code [13] (Figure 7). Our approximation for the given spanning tree (red arcs) gives cω(G)=(4,12)c_{\omega}(G)=(4,12). Next, we extract from McCabe’s original paper two more graphs for which he illustrated his measure (Figures 8 and 9). The values cω(G)c_{\omega}(G) of each graph is shown in Figure 10.

ssaabbccddeerr
Figure 7: Control flow graph for Bubble Sort
Figure 8: Control flow graph and spanning tree for graph G1G1 in [2] with cω(G1)=(6,24)c_{\omega}(G1)=(6,24)

.

11223344556677889910101111121213131414151516161717181819192020212122222323
Figure 9: Control flow graph and spanning tree for a graph in [2] with cω(G1)=(10,47)c_{\omega}(G1)=(10,47).
ν\nuωmin\omega_{min}Fig 7: (4,12)(4,12)Fig 9: (10,47)(10,47)Fig 8: (6,24)(6,24)1111
Figure 10: Cross cyclomatic complexity for a few computer programs

Analyzing Figure 10, we can easily compare the three graphs’ complexities. The three regions in the chart represent some properties of cω(G)c_{\omega}(G): No graph GG can have cross cyclomatic complexity located in the dark gray region of the plane since each cycle in a basis is of length at least 11. The light gray region of the plane contains the complexity of valid graphs, but not non-trivial computer programs. Finally, the white region represents the cω(G)c_{\omega}(G) of valid non-trivial computer programs. In terms of design optimization, developers should aim to design programs with complexity near the line ν(G)=2ωmin\nu(G)=2\omega_{min}. Following this criterion, we observe that the Bubble sort implementation as modelled by the graph in Figure 7 is considered optimized (near to the line), whereas the graph in Figure 9 could be be refactored.

6 Related work

Vinju et al. [1] investigated the relationship between the shape of control flow patterns observed in Java methods to their CC metric values, proposing the concept of abstract control flow patterns (CFPs) and compressed control flow patterns (CCFPs), which allow us to produce statistical evidence that the CC metric indeed does not adequately model the likely complexity of control flow in Java methods.

Jbara et al. [3] discuss there are no agreement on how to measure code complexity, favouring simple general purpose metrics, such as lines of code or MCC. They claims such metrics just count syntactic features, and ignore details of the code’s global structure, which may also have an effect on understandability. To address this issue, they suggested that code regularity—where the same structures are repeated time after time—may significantly reduce complexity.

7 Final remarks

Understanding a computer program is a time consuming and complex cognitive task. To control program complexity, researchers and practitioners have proposed single metrics to address this issue. However, those single metrics can not solve the problem yet. McCabe’s Cyclomatic Complexity is one of those metrics. The MCC abstracts the control flow graph as merely the analysis of branches and the fan-outs of its nodes. Doing so, it reduces the rich structure of the underlying graphs as a single number [1].

Thus, we claim for addressing the program complexity without throw out previous useful metrics (as MCC), we introduce the cross cyclomatic complexity (CCC), a new bi-dimensional complexity measure on graphs that combines the cyclomatic complexity and the weight of a minimum-weight cycle basis as a pair on the Cartesian plane to characterize program complexity using control flow graphs. Using solid mathematical and graph backgrounds, CCC provides a combined approach to address some of the issues on MCC without losing its advantages.

Hence, we claim that Cross Cyclomatic Complexity gives a criterion for highlight refactoring opportunities to control the complexity of computer programs. Namely, we suggest that one should aim to keep the complexity as close as possible to the line ν=ωmin\nu=\omega_{min}. Further, CCC is agnostic of computer languages, and robust graph theorems support it. Also, CCC addresses distortions of MCC, which ignores the complexity associated with the size of programs or structures as long switch/case statements.

However, it is a positional work, lacking an in-depth empirical evaluation to observe how CCC can address complexity issues in program comprehension and support refactoring tasks. In the future, we will evaluate our metric to identify the diversity of scenarios to apply it and results on a large dataset of programs for different languages. Thus, this paper is the first step in order to provide a new way to see software metrics using graph properties.

References

  • [1] J. J. Vinju and M. W. Godfrey. What does control flow really look like? eyeballing the cyclomatic complexity metric. In 2012 IEEE 12th International Working Conference on Source Code Analysis and Manipulation, pages 154–163, Sep. 2012.
  • [2] T. J. McCabe. A complexity measure. IEEE Trans. Softw. Eng., 2(4):308–320, July 1976.
  • [3] Ahmad Jbara and Dror G. Feitelson. On the effect of code regularity on comprehension. In Proceedings of the 22nd International Conference on Program Comprehension, ICPC 2014, page 189–200, New York, NY, USA, 2014. Association for Computing Machinery.
  • [4] Benjamin Hummel. Mccabe’s cyclomatic complexity and why we don’t use it, 2014.
  • [5] G. Ann Campbell. Cognitive complexity, because testability != understandability, 2016.
  • [6] Shulamyt Ajami, Yonatan Woodbridge, and Dror G. Feitelson. Syntax, predicates, idioms – what really affects code complexity? Empirical Software Engineering, 24(1):287–328, 2019.
  • [7] D. Landman, A. Serebrenik, and J. Vinju. Empirical analysis of the relationship between cc and sloc in a large corpus of java methods. In 2014 IEEE International Conference on Software Maintenance and Evolution, pages 221–230, Sep. 2014.
  • [8] Maurice H. Halstead. Elements of software science. Elsevier, 1st edition, 1977.
  • [9] Jonathan L. Gross and Jay Yellen. Graph Theory and Its Applications, Second Edition (Discrete Mathematics and Its Applications). Chapman & Hall/CRC, 2005.
  • [10] Franziska Berger, Peter Gritzmann, and Sven de Vries. Minimum cycle bases and their applications. In Jürgen Lerner, Dorothea Wagner, and Katharina Anna Zweig, editors, Algorithmics of Large and Complex Networks - Design, Analysis, and Simulation [DFG priority program 1126], volume 5515 of Lecture Notes in Computer Science, pages 34–49. Springer, 2009.
  • [11] J. D. Horton. A polynomial-time algorithm to find the shortest cycle basis of a graph. SIAM Journal on Computing, 16(2):358–366, 1987.
  • [12] Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische mathematik, 1(1):269–271, 1959.
  • [13] Rosetta Code. Sorting algorithms/bubble sort in java, 2020.