
A Better Approach to Track the Evolution of Static Code Warnings

Junjie Li Concordia University
Montreal, Canada
Email: [email protected]
Abstract

Static bug detection tools help developers detect code problems. However, they are known to remain underutilized for various reasons. Recent advances that incorporate static bug detectors into modern software development workflows can better motivate developers to fix the reported warnings on the fly.

In this paper, we study the effectiveness of the state-of-the-art (SOA) solution in tracking warnings reported by static bug detectors and propose a better solution based on our analysis of the insufficiencies of the SOA solution. In particular, we examined four large-scale open-source systems and crafted a data set of 3,452 static code warnings reported by two static bug detectors. We manually uncovered the ground-truth evolution status of the selected warnings: persistent, resolved, or newly introduced. Moreover, through manual analysis, we identified the critical reasons behind the insufficiencies of the SOA matching algorithm. Finally, we propose a better approach to track static warnings over software development history. Our evaluation shows that our proposed approach provides a significant improvement in tracking precision, i.e., from 66.9% to 90.0%.

Index Terms:
Tracking static code warnings, Empirical study, Static analysis

I Introduction

Static bug detection tools have been widely applied in practice to detect potential defects in software. However, they are known to be underutilized for various reasons. First, static bug detectors report an overwhelming number of warnings, which may be far beyond what available resources allow developers to resolve [1, 2]. Second, static bug detectors are known to report many false positive warnings. The existence of a large number of false positives discourages developers from actively working on resolving the reported warnings [3, 4]. As a result, a significant portion of static code warnings remain unresolved by developers and can hinder software quality. Researchers have been working on techniques to improve the performance of static bug detectors [5, 6].

Recent studies show that by integrating static bug detectors into software development workflows, such as code review, developers demonstrate a higher response rate in resolving the reported static warnings [7, 8]. Tracking the evolution of static code warnings reveals which warnings are persistent, i.e., unresolved by developers, which warnings are resolved, and which warnings are newly introduced by the latest changes. Currently, there has been little effort to examine the existing solutions for tracking the evolution of static code warnings. Avgustinov et al. [9] presented the first algorithm that combines various pieces of a warning's information. It compares two warnings in layers and eventually establishes mappings between two sets of warnings using three matching strategies (i.e., Location matching, Snippet matching, and Hash matching); we refer to this as the state-of-the-art (SOA) solution. The warning information used by their algorithm is shown in Figure 1. However, they did not present an evaluation of the performance of the SOA solution in tracking static code warnings. In other words, it remains unknown whether or not the SOA solution has an acceptable performance.

<WarningInstance>
  <WarningType>SE_BAD_FIELD</WarningType>
  <Project>jclouds</Project>
  <Class>ContextBuilderTest</Class>
  <Method></Method>
  <Field></Field>
  <FilePath>org/jclouds/ContextBuilder.java</FilePath>
  <StartLine>70</StartLine>
  <EndLine>75</EndLine>
</WarningInstance>

Figure 1: An example of the representation of one static code warning. The representation has been simplified to only show the information used by the SOA matching approach.

In this paper, we collected static code warnings and uncovered their ground-truth evolution status between consecutive commits using two widely used static bug detectors (PMD [10] and Spotbugs [11]) on two open-source software projects (JClouds and Kafka). We examined the SOA solution in tracking the evolution of static code warnings. Our investigation shows that the SOA solution achieves inadequate results, and we performed a manual analysis to uncover its insufficiencies. Based on our findings, we propose a better approach to track static code warnings. The evaluation based on the crafted data set shows that our approach significantly improves the tracking precision.

II Uncovering Ground-truth and Approach Improvement

Static Bug Detectors and Analyzed Open-source Systems. In this paper, we include two static bug detectors, i.e., PMD and Spotbugs. In terms of analyzed systems, our study includes four Java open-source systems: JClouds, Kafka, Spring-boot, and Guava. Two of the systems (i.e., JClouds and Kafka) are used to uncover the ground truth. The other two systems (i.e., Spring-boot and Guava) are selected to evaluate both approaches without bias.

A Description of the SOA Approach. For those warnings that cannot be matched by Exact matching (i.e., warnings from two revisions are matched when their metadata, as shown in Figure 1, are identical), there are three matching strategies in the SOA approach. The first one is Location matching, which is based on the diffs [12, 13] between two revisions. When Location matching fails, Snippet matching is used: given the source location defined by a start line and an end line, the code snippets in between are extracted from both revisions, and Snippet matching decides a mapping if the two snippets are identical. These two strategies can only match warnings within the same class file; they cannot handle a file that is moved to a new location (i.e., whose file path is modified). For such cases, Hash matching can be helpful. This matching strategy tries to match warnings based on the similarity of their surrounding code.
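Since we re-implemented the SOA approach ourselves, the following is a minimal Java sketch of how the two line-based strategies can be realized under our reading of the description above; the Hunk representation, the sorting assumption, and all identifiers are illustrative assumptions rather than the original implementation.

import java.util.List;

/** A sketch of Location and Snippet matching (illustrative names and types). */
final class SoaMatchingSketch {

    /** One diff hunk: preLen lines at preStart were replaced by postLen lines at postStart. */
    record Hunk(int preStart, int preLen, int postStart, int postLen) {}

    /**
     * Location matching: map a pre-commit line number through the commit's diff [12, 13].
     * Returns the corresponding post-commit line, or -1 if the line itself was edited.
     * Assumes hunks are sorted by preStart.
     */
    static int mapLine(int preLine, List<Hunk> hunks) {
        int shift = 0;
        for (Hunk h : hunks) {
            if (preLine < h.preStart()) break;                  // hunk lies after the warning
            if (preLine < h.preStart() + h.preLen()) return -1; // the warning's line was edited
            shift += h.postLen() - h.preLen();                  // accumulate the net line offset
        }
        return preLine + shift;
    }

    /**
     * Snippet matching: extract the lines between StartLine and EndLine (1-indexed,
     * inclusive, as in Figure 1) from both revisions and require identical snippets.
     */
    static boolean snippetsMatch(List<String> preFile, int preStart, int preEnd,
                                 List<String> postFile, int postStart, int postEnd) {
        if (preEnd - preStart != postEnd - postStart) return false;
        for (int k = 0; k <= preEnd - preStart; k++) {
            if (!preFile.get(preStart - 1 + k).equals(postFile.get(postStart - 1 + k))) {
                return false;
            }
        }
        return true;
    }
}

Figure 2: A minimal sketch of Location and Snippet matching. The hunk representation and all identifiers are our assumptions, not the original SOA implementation.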

Improved Approach. We re-implemented the SOA approach and applied it to the analyzed open-source systems. Overall, we crafted a dataset of 1,715 static code warnings and manually uncovered their ground-truth evolution status. In short, the overall precision of the SOA approach on the collected dataset is only 62.4%, which means that 37.6% of the warnings reported as resolved or newly introduced are actually persistent. Guided by our manual analysis results, we propose to improve the SOA approach by better handling refactoring changes. In particular, our proposed approach reuses two matching strategies of the SOA approach (i.e., Location matching and Snippet matching) and revises a few key steps to improve the inaccurate tracking.

  1. Improvement 1 - Including refactoring. From our dataset, we find that code refactoring modifies the metadata (i.e., Figure 1) of static warnings, which causes the three matching strategies to fail. To address this insufficiency, we include refactoring information detected by RefactoringMiner [14] to improve the tracking. When a refactoring is detected at the location of a static code warning, our approach establishes a correct mapping by rewriting the metadata of the pre-commit warning to match the post-commit revision, and then applies the two matching strategies (see the sketch in Figure 3 after this list). In particular, Hash matching in the SOA approach is designed to handle renamed or moved class files, which are already covered by the refactoring information; thus, we remove Hash matching.

  2. Improvement 2 - Adopting the Hungarian algorithm. In the SOA approach, when one pair of warnings is mismatched, the matching of the remaining warnings is affected as well; we indeed found such false positives caused by SOA mismatches. To address this insufficiency, we adopt the Hungarian algorithm [15], a classic approach to the assignment problem in bipartite graphs, to reduce the effect of the matching order. When a candidate pair is found by either of the two matching strategies (i.e., Location matching and Snippet matching), instead of immediately deciding it as a matched pair as the SOA approach does, we record it in a matrix of matching candidate pairs. After collecting all candidate pairs, we compute a maximum matching over them to decide the final matched pairs (see the sketch in Figure 4 after this list).
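As a concrete illustration of Improvement 1, the sketch below rewrites the metadata of a pre-commit warning whose enclosing class was moved or renamed, so that Location and Snippet matching can operate on post-commit coordinates. The Warning and ClassMove types are hypothetical simplifications; in our approach, the move/rename information comes from the refactorings detected by RefactoringMiner [14].

import java.util.List;
import java.util.Optional;

/** Sketch of Improvement 1: refactoring-aware metadata rewriting (illustrative types). */
final class RefactoringAwareMatching {

    /** Simplified warning metadata, mirroring Figure 1. */
    record Warning(String type, String filePath, String className,
                   int startLine, int endLine) {}

    /** A detected move/rename of a class (hypothetical representation of a refactoring). */
    record ClassMove(String oldPath, String oldClass, String newPath, String newClass) {}

    /**
     * If the warning's class was moved or renamed in this commit, return a copy of the
     * warning whose file path and class name already point at the post-commit location,
     * so that Location and Snippet matching proceed as if no refactoring had happened.
     */
    static Warning rewrite(Warning pre, List<ClassMove> detectedMoves) {
        Optional<ClassMove> hit = detectedMoves.stream()
                .filter(m -> m.oldPath().equals(pre.filePath())
                          && m.oldClass().equals(pre.className()))
                .findFirst();
        if (hit.isEmpty()) return pre; // no refactoring touched this warning's class
        ClassMove m = hit.get();
        return new Warning(pre.type(), m.newPath(), m.newClass(),
                           pre.startLine(), pre.endLine());
    }
}

Figure 3: A minimal sketch of Improvement 1. The types are hypothetical simplifications; the detected moves would be derived from RefactoringMiner's output.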
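For Improvement 2, since the recorded candidate pairs are unweighted, the assignment reduces to maximum bipartite matching; the sketch below uses the classic augmenting-path formulation, the unweighted special case of the assignment problem that the Hungarian method [15] solves. All identifiers are illustrative.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of Improvement 2: maximum bipartite matching over candidate warning pairs. */
final class CandidateMatcher {
    private final List<List<Integer>> candidates; // candidates.get(i): post-commit warnings that pre-commit warning i may map to
    private final int nPost;
    private final int[] matchedPre;               // matchedPre[j]: pre-commit warning matched to post-commit j, or -1

    CandidateMatcher(int nPre, int nPost) {
        this.nPost = nPost;
        this.candidates = new ArrayList<>();
        for (int i = 0; i < nPre; i++) candidates.add(new ArrayList<>());
        this.matchedPre = new int[nPost];
    }

    /** Record a candidate pair found by Location or Snippet matching. */
    void addCandidate(int preIdx, int postIdx) { candidates.get(preIdx).add(postIdx); }

    /**
     * Decide the final pairs. Post-commit warnings left at -1 are newly introduced;
     * pre-commit warnings absent from the final matching are resolved.
     */
    int[] solve() {
        Arrays.fill(matchedPre, -1);
        for (int i = 0; i < candidates.size(); i++) {
            augment(i, new boolean[nPost]);
        }
        return matchedPre;
    }

    /** Try to match pre-commit warning i, re-assigning earlier matches if needed. */
    private boolean augment(int i, boolean[] visited) {
        for (int j : candidates.get(i)) {
            if (visited[j]) continue;
            visited[j] = true;
            if (matchedPre[j] == -1 || augment(matchedPre[j], visited)) {
                matchedPre[j] = i;
                return true;
            }
        }
        return false;
    }
}

Figure 4: A minimal sketch of Improvement 2 using augmenting paths; identifiers are illustrative. Because every candidate pair is kept until the end, a contested post-commit warning can be re-assigned when another pre-commit warning claims it, which removes the order dependence of the SOA matching.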

III An Evaluation of Our Approach

We propose an improved approach based on the manual analysis of JClouds and Kafka. To avoid a biased evaluation, in addition to these two systems, we also select two other open-source software systems (i.e., Spring-boot and Guava) to evaluate our improved approach and show how much improvement it achieves compared to the SOA approach. In total, we collect 3,452 static warnings reported as resolved or newly introduced by the SOA approach.

Since tracking static code warnings is not a standalone task for each individual warning but, in fact, a mapping problem between two sets, we applied our improved approach to all static warnings of a commit, which is a superset of the 3,452 warnings in the manually labeled dataset. The remaining warnings, while not in our crafted dataset, are pre-assumed to have the status “persistent”.

TABLE I: The performance comparison between the SOA approach and our approach. Note that FP is short for false positive. A lower FP ratio is desired.
             Resolved                               Newly-Introduced
             FP (SOA)          FP (our approach)    FP (SOA)          FP (our approach)
PMD
JClouds      63.6% (178/280)   3.8% (4/106)         36.8% (57/155)    8.4% (9/107)
Kafka        45.4% (148/326)   27.0% (66/244)       8.6% (22/255)     4.1% (10/243)
Spring-boot  36.7% (80/218)    13.8% (22/160)       42.3% (80/189)    16.8% (22/131)
Guava        41.9% (124/296)   8.0% (15/187)        66.0% (124/188)   19.0% (15/79)
Spotbugs
JClouds      17.3% (18/104)    1.1% (1/87)          17.9% (14/78)     3.0% (2/66)
Kafka        37.8% (114/301)   15.8% (35/222)       43.5% (94/216)    17.0% (25/147)
Spring-boot  3.6% (7/193)      0.5% (1/187)         4.4% (7/160)      0.6% (1/154)
Guava        20.1% (58/289)    6.1% (15/246)        26.0% (53/204)    6.2% (10/161)
Total        36.2% (727/2007)  11.1% (159/1437)     31.2% (451/1445)  8.6% (94/1087)

Table I lists the comparison results between the SOA approach and our improved approach. Of the 3,452 warnings, the SOA approach assigns a wrong evolution status to 1,178. In comparison, our proposed approach reduces the false positives significantly, from 1,178 to 253, i.e., the false positive rate drops to 10.0%. Our approach correctly maps the persistent warnings that are mistakenly labeled as resolved or newly introduced by the SOA approach.

IV Conclusions

Tracking the evolution of static code warnings across software development history has become a vital problem due to the increasing interest in further utilizing static bug detectors by integrating them into developers’ workflows. Such tracking is also widely used in many downstream software engineering tasks. This study presents an investigation of the performance of the state-of-the-art approach in tracking static code warnings. In particular, a dataset of 3,452 static code warnings and their evolution status is crafted. Last, this paper presents an improved approach, which is shown to significantly outperform the SOA approach in terms of tracking precision.

References

  • [1] B. Johnson, Y. Song, E. Murphy-Hill, and R. Bowdidge, “Why don’t software developers use static analysis tools to find bugs?” in Proceedings of the 2013 International Conference on Software Engineering, ser. ICSE ’13.   IEEE Press, 2013, p. 672–681.
  • [2] M. Beller, R. Bholanath, S. McIntosh, and A. Zaidman, “Analyzing the state of static analysis: A large-scale evaluation in open source software,” in 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1, 2016, pp. 470–481.
  • [3] F. Wedyan, D. Alrmuny, and J. M. Bieman, “The effectiveness of automated static analysis tools for fault detection and refactoring prediction,” in 2009 International Conference on Software Testing Verification and Validation, 2009, pp. 141–150.
  • [4] A. Habib and M. Pradel, “How many of all bugs do we find? a study of static bug detectors,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ser. ASE 2018.   New York, NY, USA: Association for Computing Machinery, 2018, p. 317–328. [Online]. Available: https://doi.org/10.1145/3238147.3238213
  • [5] Q. Hanam, L. Tan, R. Holmes, and P. Lam, “Finding patterns in static analysis alerts: Improving actionable alert ranking,” in Proceedings of the 11th Working Conference on Mining Software Repositories, ser. MSR 2014.   New York, NY, USA: Association for Computing Machinery, 2014, p. 152–161. [Online]. Available: https://doi.org/10.1145/2597073.2597100
  • [6] S. Kim and M. D. Ernst, “Which warnings should I fix first?” in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC-FSE ’07.   New York, NY, USA: Association for Computing Machinery, 2007, p. 45–54. [Online]. Available: https://doi.org/10.1145/1287624.1287633
  • [7] C. Sadowski, E. Aftandilian, A. Eagle, L. Miller-Cushon, and C. Jaspan, “Lessons from building static analysis tools at google,” Commun. ACM, vol. 61, no. 4, p. 58–66, Mar. 2018. [Online]. Available: https://doi.org/10.1145/3188720
  • [8] C. Sadowski, J. van Gogh, C. Jaspan, E. Soederberg, and C. Winter, “Tricorder: Building a program analysis ecosystem,” in International Conference on Software Engineering (ICSE), 2015.
  • [9] P. Avgustinov, A. I. Baars, A. S. Henriksen, G. Lavender, G. Menzel, O. de Moor, M. Schäfer, and J. Tibble, “Tracking static analysis violations over time to capture developer characteristics,” in Proceedings of the 37th International Conference on Software Engineering - Volume 1, ser. ICSE ’15.   IEEE Press, 2015, p. 437–447.
  • [10] (2019) PMD latest version. [Online]. Available: https://pmd.github.io
  • [11] (2019) Spotbugs latest version. [Online]. Available: http://spotbugs.readthedocs.io
  • [12] J. W. Hunt and T. G. Szymanski, “A fast algorithm for computing longest common subsequences,” Communications of the ACM, vol. 20, no. 5, pp. 350–353, 1977.
  • [13] E. W. Myers, “An O(ND) difference algorithm and its variations,” Algorithmica, vol. 1, no. 1-4, pp. 251–266, 1986.
  • [14] N. Tsantalis, M. Mansouri, L. M. Eshkevari, D. Mazinanian, and D. Dig, “Accurate and efficient refactoring detection in commit history,” in Proceedings of the 40th International Conference on Software Engineering, ser. ICSE ’18.   New York, NY, USA: ACM, 2018, pp. 483–494. [Online]. Available: http://doi.acm.org/10.1145/3180155.3180206
  • [15] H. W. Kuhn, “The hungarian method for the assignment problem,” Naval research logistics quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.