Recommending Extract Method Refactoring
Based on Confidence of Predicted Method Name

Jinto Yamanaka ^∗
Graduate School of Science and Technology
University of Tsukuba
Tsukuba, Japan
[email protected]

Yasuhiro Hayase ^∗
Faculty of Engineering, Information and Systems
University of Tsukuba
Tsukuba, Japan
[email protected]

Toshiyuki Amagasa
Center for Computational Sciences
University of Tsukuba
Tsukuba, Japan
[email protected]

Abstract

Refactoring is an important activity that is frequently performed in software development, and Extract Method is known to be one of the most frequently performed refactorings. Existing techniques for recommending Extract Method refactoring order the recommendation candidates by metrics calculated from the source method and the code fragments to be extracted. This paper proposes a new technique for accurately recommending Extract Method refactoring that considers, in addition to the metrics used in previous studies, whether code fragments are semantically coherent chunks that can be given clear method names. As a criterion for semantic coherence, the proposed technique employs the probability (i.e., confidence) of the method names that code2seq, a state-of-the-art method name prediction technique, predicts for the code fragments. The evaluation experiment confirmed that the proposed technique achieves higher recommendation correctness than the existing techniques.

Index Terms:

Refactoring, Extract Method, Software reliability

^∗ The first two authors contributed equally to this work.

I Introduction

In software development, refactoring, which improves the design of the source code without changing its external behavior, is an essential and frequent activity for maintaining the readability and changeability of the software[1, 2]. Among the various refactorings, Extract Method, which extracts a part of the code of an existing method as a new method, is known to be one of the most frequently performed[3, 4].

On the other hand, identifying where and how to refactor gets more difficult as a software product becomes larger and more complex. To address this problem, various techniques have been proposed to support developers’ refactoring activities. As for Extract Method, there are several techniques to suggest to developers what part of which method should be extracted[5, 6, 7, 8, 9]. These existing techniques use various metrics such as complexity, coupling, and cohesion obtained from the source method and the code fragment to be extracted as a new method for recommending refactoring.

We focus on the name of the newly created method in Extract Method refactoring. Extract Method refactoring is completed by giving the extracted code a clear method name that expresses its role and meaning. Therefore, the semantic coherence of the extracted code fragment, in the sense that developers can give it a clear name, is one of the most influential factors in a developer's decision on whether to perform Extract Method. However, the existing techniques do not take the semantic coherence of the code to be extracted into account when recommending Extract Method.

This paper proposes a new technique for Extract Method recommendation that applies code2seq[10], a method name estimation technique, to a part of code to be extracted, and uses the confidence value of the estimated name in addition to the metrics used by existing methods. Because the confidence value of code2seq is correlated to the degree of correspondence between the name of the method and its content, the proposed technique is expected to reflect whether the part of the code to be extracted is semantically coherent enough to be given a clear name. In order to utilize the metrics and recommendation algorithms used by the existing methods, the proposed technique is implemented by extending GEMS[9], which is the state of the art of Extract Method recommendation.

This paper is organized as follows. Section II introduces Extract Method refactoring, method name recommendation, and existing recommendation techniques as prior knowledge of this research; Section III describes the proposed technique; Section IV shows the evaluation experiments; and finally, Section V summarizes this research.

II Related Work

II-A Extract Method Refactoring

Extract Method refactoring is a technique that separates a method by extracting some code from an existing method as a new method[1]. The purpose of this refactoring is to improve the comprehensibility of the program by splitting a method that is too long, or a method that implements multiple features, into smaller methods that each implement one feature. This refactoring consists of the following steps.

1. Determine the code to be extracted from the existing method.
2. Create a new method whose body is the determined code.
3. Give the new method a clear name that identifies the role and behavior of the code.
4. Replace the extracted code fragment with a call to the new method in the source method.

Fig. 1 is an example of Extract Method refactoring shown in [1]. The method printDetails is newly created by extracting lines 5-6 from the method printOwing in this figure. A statement calling printDetails is added instead of the extracted code fragment in the source method printOwing.
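Because Fig. 1 is a Java example taken from [1], the following is only a Python analogue of the same idea, not a reproduction of the figure. The names `print_owing_before`, `print_owing_after`, and `print_details` mirror `printOwing` and `printDetails`; the function bodies and the choice to return lines instead of printing them are assumptions made for illustration.

```python
def print_owing_before(name, amount):
    """Source method before refactoring: banner and details share one body."""
    lines = []
    lines.append("*** Customer Owes ***")
    # The next two statements are the fragment chosen for extraction (step 1).
    lines.append(f"name: {name}")
    lines.append(f"amount: {amount}")
    return lines


def print_details(name, amount):
    """New method built from the extracted fragment (steps 2 and 3)."""
    return [f"name: {name}", f"amount: {amount}"]


def print_owing_after(name, amount):
    """Source method after refactoring: the fragment became a call (step 4)."""
    lines = ["*** Customer Owes ***"]
    lines.extend(print_details(name, amount))
    return lines
```

Both versions return the same lines for the same arguments, which is the "external behavior is unchanged" property that the refactoring must preserve.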

Extract Method is one of the most frequently applied refactorings. Murphy et al.[3] found that, among the 11 major types of refactoring offered by the Eclipse refactoring function, Extract Method was used by more than 50% of the developers surveyed. Also, in the usage statistics of JDeodorant (https://users.encs.concordia.ca/~nikolaos/stats.html), a refactoring plugin for Eclipse, this refactoring accounts for almost 50% of all performed refactorings.

II-B Method Name Recommendation

Høst et al. analyzed the relationship between the behavior of methods and the verbs used in method names[11]. They examined methods whose names contain a given verb and identified the typical behavior for 40 concrete verbs. Høst et al. also proposed a technique that alerts developers to naming bugs in methods and recommends how to fix them[12]. The technique suggests a list of method names according to the semantic distance between a method name and the implementation of the method.

Kashiwabara et al. proposed a technique to recommend candidate verbs for a method name so that developers can use consistent verbs in method names[13]. They identified four meaningful groups of rules for verb recommendation. The first group recommends the same verb as that of methods called in the method. The second group recommends verbs that are conceptually related to a certain word in the method. The third group recommends verbs related to a class definition. The fourth group recommends verbs based on Java programming idioms.

Allamanis et al. proposed how to generate method names by inputting a sequence of tokens appearing in the source code into a neural convolutional attentional model that includes a convolutional network within the attention mechanism itself[14]. Also, they presented deep learning models for recommending method names by modeling the code’s graph structure and learning program representations over those graphs[15].

Code2seq, proposed by Alon et al.[10], generates a distributed representation of a method from the Abstract Syntax Tree of the method body and predicts the method name by feeding this representation into an NMT model from the seq2seq paradigm. The technique is an extension of code2vec[16], also proposed by Alon et al. It receives as input the source code of the method whose name is to be predicted and returns candidate method names in descending order of the probability that the prediction is correct (i.e., confidence). The predicted method names are output as sequences of words. As far as we know, code2seq is currently the most accurate method name prediction technique.

Figure 1: Example of Extract Method refactoring[1]

II-C Refactoring Support Techniques

Various techniques have been proposed to support developers’ refactoring activities. Terra et al.[17] and Kurbatova et al.[18] proposed techniques for recommending Move Method refactoring, and Bavota et al.[19, 20] proposed techniques to automatically identify Extract Class refactoring opportunities. In addition, detailed reviews and comparisons of existing techniques have been compiled by Pecorelli et al.[21] and Baqais et al.[22].

As for Extract Method refactoring, there are several techniques to recommend to developers what part of which method should be extracted. JDeodorant proposed by Tsantalis et al.[23, 24, 5, 25, 26] is a method that provides multiple types of refactoring support. Regarding Extract Method refactoring[5], it relies on the concept of program slicing which is a technique for extracting only the source code that affects arbitrary variables in the program from the original program. Specifically, the source code containing all the statements that change the value of a certain variable or the state of a certain object given in the body of the method is selected as the target for extraction.

Silva et al.[6, 7] proposed JExtract, which determines the need for extraction by obtaining metrics from a set of variables, types, and packages used in a method. First, the technique generates all the code fragments that can be extracted from the method as extraction candidates. The code fragments are created by eliminating the combinations that cause compilation errors when extracted after dividing the source code of the method into units of nested structures called blocks and obtaining all the combinations of consecutive statements in the same block. Next, sets of variables, types, and packages are created from the source method and each candidate code fragment, respectively. Then, the Kulczynski set similarity coefficient is obtained using the sets and the candidates with small similarities to the source method are recommended as extraction targets.
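The candidate generation and similarity ranking described above can be sketched as follows. This is a minimal illustration, not JExtract's implementation: statements are represented only by the names they use, the compilation-error filter is omitted, and each candidate is compared with the statements that remain in the method after extraction, which is one possible reading of "the source method". The Kulczynski coefficient is written in its usual set form, the mean of |A∩B|/|A| and |A∩B|/|B|.

```python
def kulczynski_similarity(set_a, set_b):
    """Kulczynski coefficient: mean of |A & B| / |A| and |A & B| / |B|."""
    if not set_a or not set_b:
        return 0.0
    shared = len(set_a & set_b)
    return 0.5 * (shared / len(set_a) + shared / len(set_b))


def consecutive_ranges(num_statements):
    """All inclusive (start, end) ranges of consecutive statements in one
    block, excluding the range that covers the whole block."""
    return [
        (start, end)
        for start in range(num_statements)
        for end in range(start, num_statements)
        if (start, end) != (0, num_statements - 1)
    ]


def rank_candidates(block):
    """Order candidates by ascending similarity to the remaining statements."""
    def similarity(candidate):
        start, end = candidate
        inside = set().union(*block[start:end + 1])
        outside = set().union(*(block[:start] + block[end + 1:]))
        return kulczynski_similarity(inside, outside)
    return sorted(consecutive_ranges(len(block)), key=similarity)


# Hypothetical block: each statement is the set of variables it uses.
block = [{"a", "b"}, {"b", "c"}, {"x"}, {"x", "y"}]
ranked = rank_candidates(block)
```

In this toy block the two halves share no variables, so the ranges covering the first two and the last two statements have similarity 0 and are ranked first.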

SEMI proposed by Charalampidou et al.[8] uses the method level $\text{LCOM}_{\text{2}}$ which measures the lack of class cohesion to determine the extraction target. The degree of cohesion is an index showing how much the functionalities of the source code are aggregated. SEMI creates candidate code fragments for extraction according to the unique algorithm and recommends extraction targets from among them according to the Benefit that is calculated using $\text{LCOM}_{\text{2}}$ obtained from the source method and each extraction candidate.

Xu et al.[9] proposed a technique called GEMS. They argued that the metrics used in existing techniques consider only specific program elements, and that an approach relying on specific metrics is not practical because actual refactoring does not aim solely at improving those metrics. They therefore built a classification model that uses 48 different metrics as features to identify the extraction targets. GEMS uses the same extraction algorithm as JExtract to create the candidate code fragments and computes the features for each candidate. It then uses the model to classify each candidate as to whether it should be extracted and recommends those classified as extraction targets in descending order of prediction probability. As far as we know, GEMS has shown the best recommendation results among the existing Extract Method recommendation techniques.

III Proposed Technique

This paper proposes a technique to improve the recommendation correctness by considering whether the candidate code fragments for extraction are semantically coherent chunks that can be given clear method names. As a criterion for semantic coherence, the proposed technique employs the probability (i.e., confidence) of the method names that code2seq[10], a state-of-the-art method name prediction technique, predicts for the code fragments. The underlying idea is that, since Extract Method refactoring is completed by giving the newly created method a clear name that expresses its role and behavior, whether the code fragment to be extracted is semantically coherent enough to be given such a name is important for the refactoring decision. The implementation of the proposed technique is based on GEMS, the state of the art among Extract Method recommendation techniques, and uses the confidence of the method name output by code2seq as an explanatory variable in addition to the features used by GEMS. Note that the code2seq implementation is taken from its GitHub (https://github.com) repository[27], and the GEMS implementation is the one distributed as a plugin.

Fig. 2 shows the overview of the proposed technique. The overall structure of the proposed method is similar to that of GEMS and consists of two stages. The first stage is the training stage, which builds a classification model to determine whether a code fragment is a good candidate for extraction or not, and is executed prior to the second stage. The second stage is the recommendation stage, which uses the constructed model to recommend a candidate of Extract Method, and is executed on request of the user.

III-A Training Stage

The training stage takes as input a set of examples of Extract Method refactorings and outputs a statistical classification model that determines whether a given part of a method is suitable to be extracted as a method. The framework of this stage is almost the same as that of the corresponding stage in GEMS; the only difference is the addition of the code2seq confidence as a feature. Each example consists of the entire source code of the original method and the part of that code that was extracted as a new method.

Due to the requirements of the classification algorithm used by the technique, parts of the method that should not be extracted (i.e., negative examples) are required to build the classification model, so negative examples are generated from the (positive) examples of the Extract Method. Specifically, from the source code of the positive example, a part of the code that is different from the actual extraction is randomly selected, and the pair of the source code and the part is used as the negative example.
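The negative-example generation described above might be sketched as follows. The representation of a fragment as a `(start_line, end_line)` pair, the uniform random choice, and the function name `make_negative_example` are assumptions for illustration; the text does not specify the sampling procedure in more detail.

```python
import random


def make_negative_example(candidates, extracted, rng):
    """Pick a random extractable fragment that differs from the real one.

    candidates: list of (start_line, end_line) fragments that could be
                extracted from the source method.
    extracted:  the (start_line, end_line) fragment that was actually
                extracted (the positive example).
    rng:        a random.Random instance, passed in for reproducibility.
    Returns None when the real extraction is the only candidate.
    """
    others = [fragment for fragment in candidates if fragment != extracted]
    return rng.choice(others) if others else None


rng = random.Random(0)
negative = make_negative_example([(3, 5), (3, 6), (8, 9)], (3, 5), rng)
no_negative = make_negative_example([(3, 5)], (3, 5), rng)
```

The `None` case corresponds to methods for which only the positive candidate exists, which is consistent with the training set in Section IV-B1 containing fewer negative examples than positive ones.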

The features are then extracted from the positive and negative examples, and the set of features is fed to a learning algorithm to build the classification model. In total, 49 features are given to the learning algorithm: the 48 features used in GEMS (Table I and Table II) and the confidence value obtained by applying code2seq to the part of the source code that is a candidate for extraction.

III-B Recommendation Stage

The recommendation stage is executed in response to a request from a user, and outputs a ranked list of the parts of the method specified by the user that are considered suitable for extraction as a method. First, the stage lists all possible parts of the user-specified method that can be extracted as a method within the constraints of the programming language. Next, for each pair that combines the specified method with each part, the 49 features including the confidence of code2seq are calculated. The features are then input to the classification model constructed in the training stage to obtain the probability that the extract method represented by each pair is likely to be performed. Finally, a list of pairs with probabilities higher than a certain threshold is created and provided to the user in order of the probability.
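The final filtering and ranking step can be sketched as below. The function `recommend`, the threshold value 0.5, and the stand-in `predict_probability` are assumptions; in the proposed technique the probability comes from the classification model built in the training stage, and the concrete threshold is not stated in the text. The default `top_k=5` follows the top-5 evaluation setting used in Section IV.

```python
def recommend(candidates, predict_probability, threshold=0.5, top_k=5):
    """Rank extraction candidates by predicted probability.

    candidates:          list of (fragment, feature_vector) pairs.
    predict_probability: stand-in for the trained classifier; maps a feature
                         vector to the probability of "should be extracted".
    Only candidates whose probability exceeds `threshold` are kept, and at
    most `top_k` of them are returned, highest probability first.
    """
    scored = [(predict_probability(features), fragment)
              for fragment, features in candidates]
    kept = [item for item in scored if item[0] > threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [(fragment, probability) for probability, fragment in kept[:top_k]]


# Toy usage: the "model" simply returns the first feature as the probability.
toy_candidates = [((1, 3), [0.2]), ((4, 9), [0.9]), ((2, 5), [0.7])]
ranking = recommend(toy_candidates, lambda features: features[0])
```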

III-C Features for classification

The confidence that the code2seq calculates when predicting the method names of code fragments is used as an indicator of whether or not it is easy to give clear method names to candidate code fragments for extraction, and this value is added to existing features to extend GEMS. An overview of the GEMS extension is shown in Fig. 3. When GEMS is applied to a certain method, all the code fragments that can be extracted are generated as extraction candidates, and 48 different metrics are created as features from the pairs of the source method and each code fragment. Then, each code fragment is converted into the form of a method by the method extraction function of Eclipse JDT and input to code2seq because it can only receive source code in method form. After that, the confidence of the method name predicted by code2seq at the top is added to the features created by GEMS, for each code fragment.
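As a small sketch of the extension step, the function below appends the top-ranked confidence to a GEMS feature vector. The function name `extend_features` and the `(predicted_name, confidence)` pair format are hypothetical; the actual output format of code2seq and the internal feature layout of GEMS are not reproduced here.

```python
def extend_features(gems_features, name_predictions):
    """Append the top-ranked code2seq confidence to the 48 GEMS features.

    gems_features:    sequence of 48 numeric feature values.
    name_predictions: list of (predicted_name, confidence) pairs for the
                      fragment; the highest confidence is used, so the list
                      does not need to be sorted.
    Returns a new list with 49 values.
    """
    if len(gems_features) != 48:
        raise ValueError("expected 48 GEMS features")
    if not name_predictions:
        raise ValueError("expected at least one predicted name")
    top_confidence = max(confidence for _, confidence in name_predictions)
    return list(gems_features) + [top_confidence]


# Toy usage with placeholder feature values and made-up predictions.
vector = extend_features([0.0] * 48, [("print details", 0.61), ("print", 0.22)])
```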

The features used in GEMS are shown in Table I and Table II. They are divided into two categories: 28 structural features and 20 functional features. The structural features shown in Table I are mainly computed from each candidate code fragment and from the remaining code obtained by removing the candidate fragment from the source method. The functional features shown in Table II are computed as follows. First, for each kind of program element, the elements are ranked by the ratio of the number of uses in the candidate code fragment to the number of uses in the source method, and the element ranked 1st (or, for kinds that have two features, the elements ranked 1st and 2nd) is identified. This ratio is the feature in the "Usage rate in extraction candidates" column of the table. Next, the ratio of the lines of code in which the identified element is used to the lines of code in the candidate code fragment is calculated. This ratio is the feature in the "Dedication to elements with high usage" column of the table.
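The two functional features can be illustrated for one kind of program element, such as local variables. This is a sketch under assumptions: each line of the method is given as a list of the element names used on it, uses are counted per occurrence, and ties are broken by name. None of these details are specified in the text, and the function name `functional_features` is hypothetical.

```python
from collections import Counter


def functional_features(method_lines, start, end):
    """Usage ratio and dedication for the 1st- and 2nd-ranked elements.

    method_lines: one list of element names (e.g. local variables) per line.
    start, end:   inclusive 0-based line range of the candidate fragment.
    Returns [(element, usage_ratio, dedication), ...] with at most two items.
    """
    fragment = method_lines[start:end + 1]
    total = Counter(name for line in method_lines for name in line)
    inside = Counter(name for line in fragment for name in line)

    # Usage rate: uses inside the fragment divided by uses in the method.
    ratios = sorted(((inside[name] / total[name], name) for name in inside),
                    reverse=True)

    features = []
    for ratio, name in ratios[:2]:
        # Dedication: share of fragment lines that use the element.
        lines_using = sum(1 for line in fragment if name in line)
        features.append((name, ratio, lines_using / len(fragment)))
    return features


# Toy method with five lines; the candidate fragment is lines 2..3.
toy_method = [["total"], ["total", "i"], ["i", "tmp"], ["tmp"], ["total"]]
result = functional_features(toy_method, 2, 3)
```

In this toy case `tmp` is used only inside the fragment and on every fragment line, so both of its values are 1.0. Under the reading above, these would correspond to `RATIO_VARIABLE_ACCESS` and `VARAC_COHESION`, and the second-ranked element `i` to `RATIO_VARIABLE_ACCESS2` and `VARAC_COHESION2`.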

TABLE I: Structural features generated by GEMS

Description of the features	The candidate code fragment	The remaining code
Lines of code	LOC_EXTRACTED_METHOD	CON_LOC
Number of local variables defined	NUM_LOCAL	CON_LOCAL
Whether literals are defined or not	NUM_LITERAL	CON_LITERAL
Number of method invocations	NUM_INVOCATION	CON_INVOCATION
Number of if statements	NUM_IF	CON_IF
Number of conditional operators	NUM_CONDITIONAL	CON_CONDITIONAL
Number of switch statements	NUM_SWITCH	CON_SWITCH
Number of variables accessed	NUM_VAR_AC	CON_VAR_ACC
Number of types accessed	NUM_TYPE_AC	CON_TYPE_ACC
Number of fields accessed	NUM_FIELD_AC	CON_FIELD_ACC
Number of assignments	NUM_ASSIGN	CON_ASSIGN
Number of typed elements	NUM_TYPED_ELE	CON_TYPED_ELE
Number of packages to reference	NUM_PACKAGE	CON_PACKAGE
Number of assert statement	CON_ASSERT	n/a
Ratio of LOC in the candidate code fragment to LOC in the source method	RATIO_LOC

TABLE II: Functional features generated by GEMS

The program element	Usage rate in extraction candidates (1st, 2nd)	Dedication to elements with high usage (1st, 2nd)
Local variable	RATIO_VARIABLE_ACCESS	VARAC_COHESION
Local variable	RATIO_VARIABLE_ACCESS2	VARAC_COHESION2
Field	RATIO_FIELD_ACCESS	FIELD_COHESION
Field	RATIO_FIELD_ACCESS2	FIELD_COHESION2
Method	RATIO_INVOCATION	INVOCATION_COHESION
Type	RATIO_TYPE_ACCESS	TYPEAC_COHESION
Type	RATIO_TYPE_ACCESS2	TYPEAC_COHESION2
Typed element	RATIO_TYPED_ELE	TYPEDELE_COHESION
Package	RATIO_PACKAGE	PACKAGE_COHESION
Package	RATIO_PACKAGE2	PACKAGE_COHESION2

IV Evaluation Experiment

An evaluation experiment is conducted to confirm whether introducing the code2seq confidence in the proposed technique improves the correctness of Extract Method recommendation. The baseline for the evaluation is GEMS, which is the state-of-the-art Extract Method recommendation technique and also the basis for the implementation of the proposed technique. The experiment measures the improvement in correctness over the baseline and the contribution of the confidence to that improvement.

IV-A Experimental Design

To evaluate the improvement in recommendation correctness and how much each feature contributes, the proposed technique is compared with GEMS as a baseline. The recommendation results on the test data obtained with the prediction model that includes the confidence and with the prediction model of GEMS are compared using three metrics. The importance of the features is also examined for each model.

As in the previous studies[8, 9], we use the top-5 recommendations for each method in the test data to evaluate the recommendation performance with three metrics: precision, recall, and F-measure. Note that every recommended extraction candidate has been classified by the model as one that should be extracted. The precision is the ratio of correctly recommended candidates to the total number of recommended extraction candidates. The recall is the ratio of refactoring targets that are actually recommended to all refactoring targets in the test data. The F-measure is the harmonic mean of the precision and recall, and is calculated by the following formula:

$$\text{F-measure}=\frac{2\times \text{precision}\times \text{recall}}{\text{precision}+\text{recall}}$$
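As a worked example of the three metrics, the sketch below computes them from raw counts. The usage line takes the totals reported in this paper (564 top-5 recommendations for the model with confidence, 155 refactoring targets). The count of 54 correct recommendations is not stated in the text: it is back-calculated from the no-tolerance GEMS + Conf row of TABLE III and should be read as an inference.

```python
def precision_recall_f(correct_recommendations, total_recommendations,
                       targets_found, total_targets):
    """Return (precision, recall, F-measure) from raw counts.

    precision = correct recommendations / all recommendations
    recall    = targets that were recommended / all targets
    F-measure = harmonic mean of precision and recall (0.0 if both are 0)
    """
    precision = correct_recommendations / total_recommendations
    recall = targets_found / total_targets
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Inferred counts for GEMS + Conf without tolerance (see the note above).
precision, recall, f_measure = precision_recall_f(54, 564, 54, 155)
```

Rounded to five decimal places, these counts give 0.09574, 0.34839, and 0.15021, which match the corresponding row of TABLE III.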

As in previous studies[8, 9], a value called tolerance is introduced as a permissible deviation in lines of code when the correctness of a recommendation is evaluated. Three tolerance values are used: 1%, 2%, and 3%. The number of lines of the source method is multiplied by the tolerance, and the result, rounded up to the nearest whole number, is used as the tolerance in lines. For example, for a method with 50 lines and a tolerance of 3%, the tolerance in lines is 2, since 50 × 0.03 = 1.5 is rounded up. In this case, extraction candidates that deviate from the correct recommendation by at most ±2 lines are also considered correct.
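The tolerance computation can be sketched as follows. The rounding-up step follows the text. How the ±n-line error is measured is not specified, so the check below assumes that both the first and the last line of the recommended fragment must lie within the tolerance of the expected fragment. Both function names are hypothetical.

```python
def tolerance_lines(method_loc, tolerance_percent):
    """Tolerance in lines: method_loc * tolerance, rounded up.

    Integer arithmetic is used so that the rounding is exact.
    """
    return (method_loc * tolerance_percent + 99) // 100


def within_tolerance(recommended, expected, allowed):
    """True if both boundaries of the recommended (start, end) range are
    at most `allowed` lines away from the expected range (an assumption)."""
    return (abs(recommended[0] - expected[0]) <= allowed
            and abs(recommended[1] - expected[1]) <= allowed)


allowed = tolerance_lines(50, 3)  # the 50-line, 3% example from the text
```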

Also, the Gini importance[28] calculated by the Gradient boosting classifier of scikit-learn is used when comparing the importance of features. The higher the Gini importance of a feature, the more important the feature is considered to be.

IV-B Setup

IV-B1 Prediction Model

A prediction model that adds the code2seq confidence to the GEMS features is built using machine learning. The real Extract Method refactoring data that Silva et al.[29] collected from Java projects available on GitHub is used to create the training data. This real data was also used in the GEMS study[9] and consists of methods that have undergone Extract Method refactoring and the code fragments extracted from those methods. Using this real data, training data is created from the features of the positive and negative examples as described in Section III-A. Because some methods yield only a single extraction candidate, which is the positive example itself, no negative example can be generated for them; as a result, a total of 479 training instances, consisting of 244 positive examples and 235 negative examples, were created from 244 methods. The training data has 49 dimensions. As in the GEMS study, the gradient boosting classifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html) implemented in scikit-learn[30], a Python machine learning library, is used as the machine learning algorithm. The hyperparameters of the model are tuned with Optuna[31], a hyperparameter optimization framework. 5-fold cross-validation is performed on the 479 training instances, and the hyperparameters are searched so that the average of the five F-measures is as high as possible.

IV-B2 Test Data

The test data consists of methods on which Extract Method refactoring should be performed, identified in five Java projects: SelfPlanner, WikiDev, JHotDraw, JUnit, and MyWebMarket. The projects are considered to be high-quality open-source software, and 155 code fragments to be extracted have been identified in 130 methods across these five projects. The data from SelfPlanner and WikiDev were created by Tsantalis et al.[5] and used to evaluate the performance of JDeodorant. The data from JHotDraw, JUnit, and MyWebMarket were created by Silva et al.[6] and used to evaluate the performance of JExtract. Charalampidou et al.[8] and Xu et al.[9] also evaluated SEMI and GEMS using the data sets from these five projects, which are currently considered the most suitable data sets for evaluating Extract Method refactoring recommendation.

TABLE III: Comparison of correctness

Metrics	Tolerance	GEMS	GEMS + Conf
Precision	None	0.08757	0.09574
	1%	0.21191	0.21986
	2%	0.21366	0.22518
	3%	0.22067	0.23227
Recall	None	0.32258	0.34839
	1%	0.45806	0.52258
	2%	0.46452	0.52903
	3%	0.48387	0.54839
F-measure	None	0.13774	0.15021
	1%	0.28977	0.30950
	2%	0.29269	0.31590
	3%	0.30310	0.32632

IV-C Results

IV-C1 Comparison of Correctness

The results of the comparison of the three metrics are shown in TABLE III. In the table, a model labeled GEMS is created from GEMS features only, while a model labeled GEMS + Conf is created by adding confidence from code2seq to GEMS features. The total number of the top-5 recommendations for the 130 methods is 571 using the model without confidence and 564 using the model with confidence.

TABLE III shows that the precision, recall, and F-measure increase for all tolerances when the confidence is used. The F-measure increases by about 1.25 percentage points with no tolerance, 1.97 points for 1%, and 2.32 points for 2% and 3%. The increase in recall is larger than that in precision: about 2.58 percentage points with no tolerance and about 6.45 points for 1% to 3%.
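The increases quoted above are differences between the two model columns of TABLE III, expressed in percentage points. The short sketch below reproduces that arithmetic from the table values; the dictionary layout and the helper name `gain_in_points` are illustrative only.

```python
# (GEMS, GEMS + Conf) pairs copied from TABLE III.
table_iii = {
    "Recall": {
        "None": (0.32258, 0.34839),
        "1%": (0.45806, 0.52258),
        "2%": (0.46452, 0.52903),
        "3%": (0.48387, 0.54839),
    },
    "F-measure": {
        "None": (0.13774, 0.15021),
        "1%": (0.28977, 0.30950),
        "2%": (0.29269, 0.31590),
        "3%": (0.30310, 0.32632),
    },
}


def gain_in_points(baseline, proposed):
    """Difference between two ratios, in percentage points (2 decimals)."""
    return round((proposed - baseline) * 100, 2)


gains = {
    metric: {tol: gain_in_points(*pair) for tol, pair in rows.items()}
    for metric, rows in table_iii.items()
}
```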

In this paper, following the previous research, the model is trained with almost equal numbers of positive and negative examples. In practice, however, most of the extraction candidates obtained from a method should not be extracted, so the ratio of positive to negative examples used in training is not realistic. It should therefore be noted that the reported precision is considered less reliable than the reported recall.

TABLE IV: Feature importance of GEMS

Rank	Feature	Importance
1	TYPEDELE_COHESION	0.15175
2	NUM_TYPED_ELE	0.12510
3	RATIO_LOC	0.10388
4	INVOCATION_COHESION	0.09432
5	NUM_PACKAGE	0.09407
6	CON_LOC	0.05041
7	NUM_TYPE_AC	0.04943
8	VARAC_COHESION	0.04165
9	CON_TYPED_ELE	0.03155
10	CON_PACKAGE	0.03067

TABLE V: Feature importance of the proposed technique

Rank	Feature	Importance
1	CODE2SEQ_CONFIDENCE	0.30011
2	TYPEDELE_COHESION	0.15584
3	INVOCATION_COHESION	0.08658
4	NUM_PACKAGE	0.07611
5	RATIO_LOC	0.05153
6	NUM_INVOCATION	0.04172
7	NUM_VAR_AC	0.03058
8	NUM_TYPE_AC	0.02505
9	RATIO_TYPE_ACCESS	0.02409
10	LOC_EXTRACTED_METHOD	0.02188

TABLE VI: Comparison of the confidence between positive and negative data

Label	Maximum	Minimum	Mean	Median
Positive	0.67956	$5.9918\times 10^{-15}$	$6.3695\times 10^{-2}$	$3.6578\times 10^{-3}$
Negative	0.76879	$3.2844\times 10^{-18}$	$1.1233\times 10^{-2}$	$4.9674\times 10^{-7}$

IV-C2 Comparison of Feature Importance

The ten most important features of each model are compared. The feature importance of the model without confidence is shown in TABLE IV, and that of the model with confidence is shown in TABLE V. In TABLE V, CODE2SEQ_CONFIDENCE represents the code2seq confidence added in the proposed technique.

TABLE IV shows that, when confidence is not used in the model, the most important feature is TYPEDELE_COHESION, with a score of 0.15175. The difference in score between this feature and the second most important feature, NUM_TYPED_ELE, is about 0.0267. In contrast, TABLE V shows that, when confidence is used in the model, CODE2SEQ_CONFIDENCE is the feature with the highest importance. Its score is 0.30011, which is about 0.1484 higher than the highest score in TABLE IV and about 0.1443 higher than that of TYPEDELE_COHESION, the second most important feature in TABLE V.

IV-D Discussion

The results in Section IV-C1 show that adding the confidence improves the recommendation correctness of Extract Method refactoring, since the F-measure increases for all tolerances. Also, since the increase in recall is larger than that in precision, the confidence appears to be particularly effective in increasing the coverage of refactoring targets in the recommendations. Furthermore, the results in Section IV-C2 indicate that, among all the features, the confidence makes a significant contribution to the prediction.

Code2seq predicts a method name by acquiring semantic information about the source code from the syntactic structure of the method. We consider that this property is the reason why the code2seq confidence is effective in identifying Extract Method targets: it makes it possible to judge whether a candidate code fragment has the semantic coherence suitable for refactoring, that is, whether its role is easy to explain. To examine whether the confidence differs clearly between code fragments that are real examples of refactoring and other code fragments, the maximum, minimum, mean, and median confidence of the extracted code fragments are compared between the positive and negative examples created from the real data in Section IV-B1. The results of the comparison are shown in TABLE VI. The table shows that the minimum, mean, and median confidence of the positive examples are higher than those of the negative examples, whereas the maximum is not. This suggests that the confidence tends to be higher for semantically coherent code fragments that should be refactored.

V Conclusion

This paper has proposed a technique that improves the recommendation correctness of Extract Method refactoring by using the confidence value of the names that code2seq, a method name prediction technique, predicts for the methods newly created by the refactoring. In addition to the confidence values, the proposed technique employs the metrics of GEMS, which has the highest correctness among the existing Extract Method recommendation techniques, and its implementation is also based on GEMS. The evaluation experiment comparing the proposed technique with GEMS confirmed that the proposed technique achieves higher correctness and also revealed that the confidence value contributes significantly to the prediction.

As future work, an evaluation experiment with a larger data set is considered essential. In addition, the correctness may be further improved by using the confidence values of the second-ranked method names as features in addition to that of the top-ranked one.

References

[1] Martin Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
[2] Tom Mens and Tom Tourwé. A survey of software refactoring. IEEE Transactions on Software Engineering, 30(2):126–139, February 2004.
[3] Gail C. Murphy, Mik Kersten, and Leah Findlater. How are Java software developers using the Eclipse IDE? IEEE Software, 23(4):76–83, 2006.
[4] Alexander Chatzigeorgiou and Anastasios Manakos. Investigating the evolution of bad smells in object-oriented code. In Proceedings of the 2010 Seventh International Conference on the Quality of Information and Communications Technology, QUATIC ’10, pages 106–115, USA, 2010. IEEE Computer Society.
[5] Nikolaos Tsantalis and Alexander Chatzigeorgiou. Identification of extract method refactoring opportunities for the decomposition of methods. Journal of Systems and Software, 84(10):1757–1782, 2011.
[6] Danilo Silva, Ricardo Terra, and Marco Tulio Valente. Recommending automated extract method refactorings. In Proceedings of the 22nd International Conference on Program Comprehension, ICPC 2014, pages 146–156, New York, NY, USA, 2014. Association for Computing Machinery.
[7] Danilo Silva, Ricardo Terra, and Marco Tulio Valente. Jextract: An eclipse plug-in for recommending automated extract method refactorings. In Brazilian Conference on Software: Theory and Practice, pages 1–8, 2015.
[8] Sofia Charalampidou, Apostolos Ampatzoglou, Alexander Chatzigeorgiou, Antonios Gkortzis, and Paris Avgeriou. Identifying extract method refactoring opportunities based on functional relevance. IEEE Transactions on Software Engineering, 43(10):954–974, 2017.
[9] Sihan Xu, Aishwarya Sivaraman, Siau-Cheng Khoo, and Jing Xu. GEMS: An extract method refactoring recommender. In 2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE), pages 24–34, 2017.
[10] Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. code2seq: Generating sequences from structured representations of code. In 7th International Conference on Learning Representations, 2019.
[11] Einar W. Høst and Bjarte M. Østvold. The programmer’s lexicon, volume i: The verbs. In Seventh IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2007), pages 193–202, 2007.
[12] Einar W. Høst and Bjarte M. Østvold. Debugging method names. In Sophia Drossopoulou, editor, ECOOP 2009 – Object-Oriented Programming, pages 294–317, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg.
[13] Yuki Kashiwabara, Yuya Onizuka, Takashi Ishio, Yasuhiro Hayase, Tetsuo Yamamoto, and Katsuro Inoue. Recommending verbs for rename method using association rule mining. In 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), pages 323–327, 2014.
[14] Miltiadis Allamanis, Hao Peng, and Charles Sutton. A convolutional attention network for extreme summarization of source code. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2091–2100, New York, New York, USA, 20–22 Jun 2016. PMLR.
[15] Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. In International Conference on Learning Representations, 2018.
[16] Uri Alon, Meital Zilberstein, Omer Levy, and Eran Yahav. code2vec: learning distributed representations of code. Proceedings of the ACM on Programming Languages, 3(POPL):40:1–40:29, 2019.
[17] Ricardo Terra, Marco Tulio Valente, Sergio Miranda, and Vitor Sales. JMove: A novel heuristic and tool to detect move method refactoring opportunities. Journal of Systems and Software, 138:19–36, 2018.
[18] Zarina Kurbatova, Ivan Veselov, Yaroslav Golubev, and Timofey Bryksin. Recommendation of move method refactoring using path-based representation of code. In Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops, ICSEW’20, pages 315–322, New York, NY, USA, 2020. Association for Computing Machinery.
[19] Gabriele Bavota, Andrea De Lucia, and Rocco Oliveto. Identifying extract class refactoring opportunities using structural and semantic cohesion measures. Journal of Systems and Software, 84(3):397–414, 2011.
[20] Gabriele Bavota, Andrea De Lucia, Andrian Marcus, and Rocco Oliveto. Automating extract class refactoring: An improved method and its evaluation. Empirical Software Engineering, 19(6):1617–1664, December 2014.
[21] Fabiano Pecorelli, Fabio Palomba, Dario Di Nucci, and Andrea De Lucia. Comparing heuristic and machine learning approaches for metric-based code smell detection. In 2019 IEEE/ACM 27th International Conference on Program Comprehension, pages 93–104, 2019.
[22] Abdulrahman Baqais and Mohammad Alshayeb. Automatic software refactoring: a systematic literature review. Software Quality Journal, 28:459–502, 2020.
[23] Nikolaos Tsantalis and Alexander Chatzigeorgiou. Identification of move method refactoring opportunities. IEEE Transactions on Software Engineering, 35(3):347–367, 2009.
[24] Nikolaos Tsantalis and Alexander Chatzigeorgiou. Identification of refactoring opportunities introducing polymorphism. Journal of Systems and Software, 83(3):391–404, 2010.
[25] Marios Fokaefs, Nikolaos Tsantalis, Eleni Stroulia, and Alexander Chatzigeorgiou. Identification and application of extract class refactorings in object-oriented systems. Journal of Systems and Software, 85(10):2241–2260, 2012. Automated Software Evolution.
[26] Nikolaos Tsantalis, Davood Mazinanian, and Shahriar Rostami. Clone refactoring with lambda expressions. In Proceedings of the 39th International Conference on Software Engineering, ICSE ’17, pages 60–70. IEEE Press, 2017.
[27] tech-srl/code2seq: Code for the model presented in the paper: “code2seq: Generating sequences from structured representations of code”, https://github.com/tech-srl/code2seq.
[28] Leo Breiman, Jerome Friedman, Richard Olshen, and Charles J. Stone. Classification and Regression Trees. CRC Press, 1984.
[29] Danilo Silva, Nikolaos Tsantalis, and Marco Tulio Valente. Why we refactor? Confessions of GitHub contributors. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, pages 858–870, New York, NY, USA, 2016. Association for Computing Machinery.
[30] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, November 2011.
[31] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, pages 2623–2631, New York, NY, USA, 2019. Association for Computing Machinery.
