\ourmodel: multi-mode translation of natural language and \python code with transformers
Appendix A
A.1 Docstring statistics
Figure 1 shows the distributions of various features of docstrings in our corpus. The top row shows the distribution of the total character-level length of the method signatures (left), docstrings (center), and code bodies (right). The blue lines are for methods possessing a docstring, and the vast majority of these methods have docstrings longer than 10 characters. The bottom row shows the distribution of line counts of the corresponding features from the top row. While the most common number of lines in a docstring is 1 (comprising 41%), the majority of docstrings span multiple lines.
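Statistics like these can be gathered directly from Python sources with the standard ast module. The sketch below is illustrative only (the file name example_module.py is a placeholder), not the exact pipeline used to build our corpus.

```python
import ast
from collections import Counter

def docstring_stats(source: str):
    """Collect character lengths and line counts of docstrings in one module."""
    char_lengths, line_counts = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if doc is not None:
                char_lengths.append(len(doc))
                line_counts.append(len(doc.splitlines()))
    return char_lengths, line_counts

# Example usage on a single (placeholder) file:
with open("example_module.py") as f:
    chars, lines = docstring_stats(f.read())
print(Counter(lines).most_common(5))  # how often docstrings have 1, 2, ... lines
```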

A.2 Pre-training details
Figure 3 shows the complete training script, using the Facebook AI Research Sequence-to-Sequence (fairseq) modeling toolkit, with which we pre-trained \ourmodel. The data was pre-noised, processed with the fairseq-preprocess command, and placed in the directory indicated by $DIR. The architecture and training hyper-parameters are set in this script. \ourmodel was trained with the same hyper-parameters, but on the data described in Sec. A.4.
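The noising procedure itself is described in the main text; purely as an illustration of the kind of span-corruption noising commonly used for sequence-to-sequence pre-training, a corruption step can be sketched as below. The mask token, mask ratio, and span length are arbitrary placeholders, not \ourmodel's actual configuration.

```python
import random

def span_mask(tokens, mask_token="<mask>", mask_ratio=0.15, mean_span=3):
    """Replace random spans of tokens with a single mask token.

    Illustrative only: the noising actually used for pre-training is
    described in the main text.
    """
    noised, i = [], 0
    while i < len(tokens):
        if random.random() < mask_ratio / mean_span:
            span = max(1, round(random.expovariate(1.0 / mean_span)))
            noised.append(mask_token)
            i += span
        else:
            noised.append(tokens[i])
            i += 1
    return noised

# The corrupted sequence is the source; the original sequence is the target.
original = "def add ( a , b ) : return a + b".split()
print(span_mask(original))
```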
Figure 3 shows learning curves for a single seq2seq model of the same architecture as \ourmodel trained only on docstrings, starting both from a random initialization and from our pre-trained model. As the figure shows, the pre-trained initialization converged to a better validation loss 25× faster than the randomly initialized model.

A.3 GPT2 training details
Our GPT2 experiments also used the fairseq library, with the OpenAI English checkpoint supplied by the HuggingFace library. Figure 4 shows the complete training script; for the English pre-trained initialization, this checkpoint was provided as the starting point. Each model was trained on 4 Tesla V100 GPUs with 16 GB of memory each for 7 days.
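The training itself ran through fairseq; as a self-contained illustration of where the English checkpoint comes from, the HuggingFace transformers API can load it directly. The sketch below is not our training script, and the example method is a placeholder.

```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load the OpenAI English GPT-2 checkpoint distributed through HuggingFace.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# One toy language-modeling step on a Python method, treated as plain text.
example = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
inputs = tokenizer(example, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()  # gradients for a single fine-tuning step
```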
A.4 Multi-mode training details
In order to better teach \ourmodel to understand the relationships between all the different features of code (signatures, docstrings, and bodies), we taught it to translate between all pairs of feature combinations that do not contain the same feature in both the source and the target. In this way, the model learns to produce method bodies from a signature, a docstring, or both. Table 1 spells out exactly which combinations were provided to the model as source and target. To each source example the comment string ‘# target <feature> (<style>)’ was added, instructing the model which feature combination to produce (e.g. signature and body); this is illustrated in the sketch following Table 1. A style imperative was added only when a docstring was in the target; the styles are defined and discussed in the main text.
Figure 5 shows the training curves for \ourmodel, where the solid black line is the training loss and the other curves are the validation losses for each of the tasks indicated in Table 1. The dashed lines indicate tasks whose targets contain docstrings, showing that these are generally less predictable than code-only targets (as their validation loss is larger). \ourmodel was trained on 16 Tesla V100 16 GB GPUs for 62 epochs, or 5 weeks of training time.
Table 1: Source/target feature combinations used for multi-mode training (✓ indicates a trained pair; pairs sharing a feature between source and target are excluded).

| Targets \ Sources | Signature | Docstring | Body | Sig + doc | Sig + body | Doc + body |
|---|---|---|---|---|---|---|
| Signature | | ✓ | ✓ | | | ✓ |
| Docstring | ✓ | | ✓ | | ✓ | |
| Body | ✓ | ✓ | | ✓ | | |
| Sig + doc | | | ✓ | | | |
| Sig + body | | ✓ | | | | |
| Doc + body | ✓ | | | | | |
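To make the source/target construction concrete, the following is a minimal sketch of how one such training pair might be assembled. The helper name make_example, the feature strings, the style label, and the placement of the control comment are illustrative placeholders; only the control-comment format ‘# target <feature> (<style>)’ comes from the description above.

```python
def make_example(features, source_keys, target_keys, style=None):
    """Build one multi-mode (source, target) training pair.

    `features` maps 'signature' / 'docstring' / 'body' to their text.  A
    control comment naming the requested target combination (and, when a
    docstring is targeted, its style) is added to the source; its placement,
    shown here at the front, is illustrative.
    """
    control = "# target " + " and ".join(target_keys)
    if style is not None and "docstring" in target_keys:
        control += f" ({style})"
    source = "\n".join([control] + [features[k] for k in source_keys])
    target = "\n".join(features[k] for k in target_keys)
    return source, target

features = {
    "signature": "def add(a, b):",
    "docstring": '"""Add two numbers."""',
    "body": "    return a + b",
}
# Placeholder style label; the actual style names are defined in the main text.
src, tgt = make_example(features, ["signature"], ["docstring", "body"], style="oneline")
```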
