
1 Université Cheikh Anta Diop, Dakar, Sénégal
[email protected], [email protected]
https://www.ucad.sn/
2 Baamtu, Dakar, Sénégal
[email protected]

Low-Resourced Machine Translation for Senegalese Wolof Language

Derguene Mbaye 1,2 (ORCID: 0000-0002-7490-2731)    Moussa Diallo 1    Thierno Ibrahima Diop 2
Abstract

Natural Language Processing (NLP) research has made great advances in recent years, with major breakthroughs that have established new benchmarks. However, these advances have mainly benefited a certain group of languages commonly referred to as resource-rich, such as English and French. The majority of other, less-resourced languages are left behind, which is the case for most African languages, including Wolof. In this work, we present a parallel Wolof/French corpus of 123,000 sentences, on which we conducted machine translation experiments with models based on Recurrent Neural Networks (RNN) in different data configurations. We noted performance gains for the models trained on subworded data, as well as for those trained on the French-English language pair compared to those trained on the French-Wolof pair under the same experimental conditions.

Keywords:
Low Resource, Machine Translation, African languages, RNN

1 Introduction

A Machine Translation (MT) system converts a textual sequence (or an audio source) in a source language into the corresponding sequence in a target language. For a long time, Statistical Machine Translation (SMT) systems [1] were the most popular approach, before Neural Machine Translation (NMT) systems [2] came along and achieved increasingly higher performance. However, the quality of such systems has always been closely tied to the amount of data used in their design [3]. State-of-the-art MT systems have thus been developed with sequence-to-sequence models using the attention mechanism [4] as well as the Transformer architecture [5]. Languages for which this data dependence is not a constraint, such as English, are said to be resource-rich and have several million sentence pairs available; most other languages fall under the concept of "Low Resource" (LR). The term "Low Resource", however, can encompass various aspects and can extend beyond the language itself to domains or tasks for which little data is available, even in an otherwise resource-rich language. This is illustrated in [6], where the concept of "Low Resource" is defined along three different aspects: availability of task-specific labels, unlabeled language text, and auxiliary data. As shown in [7], most African languages fit this description, which makes researchers' work difficult and contributes to the low representation of African languages in NLP research [8]. This is particularly the case for Wolof which, beyond the lack of data, is a language for which little NLP work has been done.

An Automatic Speech Recognition (ASR) dataset covering four African languages, including Wolof, was collected in [9] and used to design the first ASR system for this language. In [10], the design of the first collaborative online Wolof dictionary, adapted to the LMF (Lexical Markup Framework) standard, was initiated. As part of the Dictionnaires Langue Africaine-Français (DiLAF) project (http://pagesperso.ls2n.fr/~enguehard-c/DiLAF/index.php), researchers have produced dictionaries for seven African languages including Wolof; however, at the time of writing, all the dictionaries are available online except the Wolof one. The author in [11] explored the development of a finite-state morphological analyzer for Wolof, followed by the implementation and evaluation of an LFG-based parser for Wolof [12] and the creation of a Universal Dependencies (UD) treebank for Wolof [13], the first UD treebank within the Northern Atlantic branch of the Niger-Congo languages. In [14], the authors studied the design of a spellchecker for Wolof, presenting an approach that combines a dictionary used as a lexicon with a morphological analyzer of the language.

However, to the best of our knowledge, the only works specifically exploring French-Wolof machine translation systems are those in [15], where the authors presented a corpus of 70,000 Wolof-French parallel sentences with which word embedding models as well as LSTM-based translation models were developed, and in [16], where the authors extended the corpus to 83,000 sentences and trained two Transformer-based neural machine translation systems for the French→Wolof and Wolof→French directions. The results presented in [15], however, were reported in terms of accuracy, making it difficult to evaluate the actual translation quality of their systems. Multilingual neural machine translation systems including Wolof have also been developed, such as in [17], where the authors leveraged existing pre-trained models to create low-resource translation systems for 16 African languages. Meta's No Language Left Behind project (https://ai.facebook.com/research/no-language-left-behind/), which can translate between 200 languages, also covers Wolof. Nevertheless, beyond Wolof, substantial work has been done on NMT for low-resource languages (LRL-NMT) in general. The Masakhane community (https://www.masakhane.io/) addresses the challenge by targeting African languages with a participatory approach [18] that involves all relevant resource persons in the process, leading to the production of MT datasets and benchmarks for over 30 languages. A detailed study of different approaches to LRL-NMT was performed in [19], and a set of guidelines was defined to select suitable NMT techniques for a given LRL data setting. A set of experiments was performed in [20] on different translation systems, both neural and statistical, translating from English to Icelandic. Most of these works, however, are based on the Transformer architecture, which is very data-intensive. Less recent architectures such as RNNs could perform better in low-resource environments because they require fewer parameters.

In this paper, we present a work-in-progress French-Wolof parallel data collection effort that constitutes, to date, the largest corpus in this language pair, with 123,000 sentences filtered and aligned at the sentence level. We then go further than the work in [15] and explore the performance of RNN models on our corpus by evaluating them with the BiLingual Evaluation Understudy (BLEU) metric [21], which is more representative than accuracy. Since subwording, i.e., segmentation of the corpus into subword units, tends to improve the performance of translation models as shown in [22], we also experimented with the impact of this approach on our models. The paper is organized as follows:

  • In Section 2, we present a description of the Wolof language.

  • The data collection and filtering process are presented in Section 3.

  • Section 4 presents the experiments performed.

  • The results are shown in Section 5.

  • Section 6 concludes the work.

2 The Wolof Language

Wolof is a West-Atlantic language mainly spoken in Senegal and Gambia, and also used in the southern part of Mauritania. It belongs to the Atlantic group of the Niger-Congo language family and is currently spoken by over seven million people spread across three West African states. While only about 40% of the Senegalese population are ethnic Wolof, about 90% of the people speak the language as their first, second or third language (https://www.axl.cefan.ulaval.ca/afrique/senegal.htm).

There are two major geographical varieties of Wolof: one spoken in Senegal, and the other spoken in Gambia [23]. Although Wolof speakers understand each other, Senegalese Wolof and Gambian Wolof are considered two distinct languages: each has its own ISO 639-3 language code ("WOL" and "WOF", respectively). The language has a long tradition of writing in the Arabic script, known as Ajami or Wolofal, but it has also been adapted to the Roman script.

Wolof is an agglutinative language [11] whose alphabet is quite close to the French one: it contains all the letters of the French alphabet except H, V and Z [24]. It also includes the characters ŋ ("ng") and Ñ ("gn", as in Spanish). Accents are present, but in limited number (Á, É, Ã, Ó). Twenty-nine (29) Roman-based characters from the Latin script are used, and most of them are involved in digraphs standing for geminate and prenasalized stops. Unlike many other Niger-Congo languages, Wolof does not have tones. Nevertheless, Wolof syllables differ in intensity, e.g., long vowels are pronounced with more intensity than short ones. Length is represented in writing by double vowel letters, and most Wolof consonants can also be geminated (doubled). However, Wolof is not a standardized language (and some sources exclude the "H" from the alphabet), since no single variety has ever been accepted as the norm. Nonetheless, the Center of Applied Linguistics of Dakar (CLAD) coordinates the orthographic standardization of the Wolof language [9].

3 Data Collection

3.1 Corpus

The construction of a dataset is a tedious and time-consuming task, especially for languages that have yet to be standardized, like Wolof. The language is not taught in school and few people follow the spelling rules, which makes the texts available on sources such as social networks very heterogeneous and difficult to use. We therefore opted to collect data in French, the official language of Senegal since colonization, and to have it translated by competent linguists in order to build part of the dataset from scratch.

Figure 1: Data collection pipeline

The linguists used the official Wolof alphabet established by the government (http://www.jo.gouv.sn/spip.php?article4802) to perform the translation. Monolingual French data were collected from existing resources such as Opus and from text scraped from online sources, including news sites, religious sites and blogs, while translations of the Quran and the Bible were collected as parallel texts. We also collected data from offline sources such as French books that had been translated into Wolof. We were thus able to build a corpus of 123,000 parallel French-Wolof sentences, making our corpus the largest collected to date.

For experimental purposes, the overall dataset is divided into three subsets: a training set, a validation set and a test set. The validation and test sets are kept fixed and separated from the full dataset, with 16,000 sentences for validation and 7,000 for testing. Only the training set varies, from 10,000 to 100,000 sentences in steps of 10,000.

3.2 Data filtering

Before distributing the data between the different experimental configurations, we performed a set of post-processing operations. We started by performing stratified sampling to ensure that the validation and test sets were representative of the overall dataset and thus limit sampling bias.
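As an illustration, the following Python sketch performs such a stratified split; the stratification key (source-sentence length buckets) and the bucket granularity are our assumptions for illustration, as the variable actually used for stratification is not specified above.

```python
# Stratified train/valid/test split: a sketch only, assuming stratification
# by French sentence-length buckets (the actual stratification variable is
# an assumption). Rounding makes the split sizes approximate.
import random
from collections import defaultdict

def stratified_split(pairs, n_valid=16_000, n_test=7_000, seed=42):
    """Split (fr, wo) pairs so the held-out sets mirror the corpus distribution."""
    random.seed(seed)
    buckets = defaultdict(list)
    for fr, wo in pairs:
        buckets[min(len(fr.split()) // 10, 5)].append((fr, wo))  # length bucket

    train, valid, test = [], [], []
    total = sum(len(b) for b in buckets.values())
    for examples in buckets.values():
        random.shuffle(examples)
        share = len(examples) / total
        nv, nt = round(n_valid * share), round(n_test * share)
        valid.extend(examples[:nv])
        test.extend(examples[nv:nv + nt])
        train.extend(examples[nv + nt:])
    return train, valid, test
```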

Since the quality of the system depends directly on the quality of the data, we drew on the approaches proposed in [25] to filter our dataset. We removed sentence pairs written in the same language on both sides, duplicate pairs, and pairs whose two sides are identical. We also removed special characters and URLs, and filtered out sentences that were too long as well as sentences under-represented in the dataset. We consider a sentence too long when its length (in words) is greater than twice the average sentence length of the dataset considered.
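A minimal sketch of these filtering rules is shown below. The language-identification callable `lang_of` is an assumed external dependency (any off-the-shelf identifier could play this role), and the filter for under-represented sentences is omitted since its exact criterion is not detailed above.

```python
# Corpus filtering sketch: drops same-language pairs, identical sides,
# duplicates and over-long sentences; strips URLs and special characters.
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")

def clean(text):
    """Remove URLs, then any character that is not a word character,
    whitespace or basic punctuation (Unicode-aware, so Wolof letters stay)."""
    text = URL_RE.sub("", text)
    return re.sub(r"[^\w\s.,;:!?'’\-]", "", text).strip()

def filter_pairs(pairs, lang_of):
    """`lang_of` is an assumed language-identification function."""
    lengths = [len(src.split()) for src, _ in pairs]
    max_len = 2 * sum(lengths) / len(lengths)  # too long: > 2x average length
    seen = set()
    for src, tgt in pairs:
        src, tgt = clean(src), clean(tgt)
        if not src or not tgt or src == tgt:          # empty or identical sides
            continue
        if lang_of(src) == lang_of(tgt):              # same language on both sides
            continue
        if len(src.split()) > max_len or len(tgt.split()) > max_len:
            continue
        if (src, tgt) in seen:                        # duplicate pair
            continue
        seen.add((src, tgt))
        yield src, tgt
```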

4 Experiments

Despite having collected a corpus of 123,000 sentences, we are still in a low-resource configuration for NMT. We therefore opted for an architecture with moderate data requirements (between SMT and Transformer models) and exploited data manipulations to maximize the performance of the model.

We used OpenNMT [26] to reproduce an architecture similar to that of [15] in order to compare the results. The RNN model is composed of an LSTM layer [27], with 300 hidden units, at both the encoder and the decoder, together with a dropout layer. The dropout rate is set to 0.1 and the embedding size to 128. We use the Adam optimizer [28] with a learning rate of 0.001, and the batch size is set to 4,096 tokens.
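For illustration, the PyTorch sketch below mirrors these hyperparameters (one LSTM layer on each side, 300 hidden units, embedding size 128, dropout 0.1, Adam with learning rate 0.001). It is a simplified stand-in rather than the exact OpenNMT setup: in particular, the global attention mechanism is omitted for brevity, and the vocabulary sizes are illustrative.

```python
# Minimal LSTM encoder-decoder with the stated hyperparameters (a sketch,
# without attention; OpenNMT additionally handles token-based batching,
# checkpointing, beam search, etc.).
import torch
import torch.nn as nn

class Seq2SeqLSTM(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=128, hidden=300, dropout=0.1):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the source; reuse the final (h, c) state to seed the decoder.
        _, state = self.encoder(self.drop(self.src_emb(src_ids)))
        dec_out, _ = self.decoder(self.drop(self.tgt_emb(tgt_ids)), state)
        return self.out(dec_out)  # logits over the target vocabulary

model = Seq2SeqLSTM(src_vocab=32_000, tgt_vocab=32_000)  # illustrative sizes
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```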

We split our dataset into the different size configurations and, in each configuration, the model is trained in the Fr→Wo and Fr→En directions until it reaches convergence. Convergence is considered reached when no improvement is observed on the validation set after 6 checkpoints.
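This criterion amounts to early stopping with a patience of 6 validation checkpoints; the small helper below sketches the logic in isolation (OpenNMT-py provides a comparable built-in early-stopping option).

```python
# Early-stopping sketch: stop once the validation score has failed to
# improve for `patience` consecutive checkpoints.
class EarlyStopping:
    def __init__(self, patience=6):
        self.patience, self.best, self.bad = patience, float("-inf"), 0

    def step(self, valid_score):
        """Record one checkpoint; return True when training should stop."""
        if valid_score > self.best:
            self.best, self.bad = valid_score, 0
        else:
            self.bad += 1
        return self.bad >= self.patience
```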

For data subwording, we used SentencePiece [29] with Byte-Pair Encoding (BPE), which offers interesting performance gains for agglutinative languages like Wolof [30]. We then generated a vocabulary on all segments of the considered size configuration's training set and performed automatic model evaluation with BLEU [21], the most widely used metric in NMT given its fairly high correlation with human judgments. We used the SacreBLEU [31] implementation (version 2.0.0) of the BLEU metric to evaluate the models.
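The snippet below sketches these two steps with the SentencePiece and SacreBLEU Python APIs; the file names, vocabulary size and example strings are illustrative assumptions, not values reported here.

```python
# Subwording with SentencePiece (BPE) and evaluation with SacreBLEU (sketch).
import sentencepiece as spm
import sacrebleu

# Train a BPE model on the training split of the current size configuration.
spm.SentencePieceTrainer.train(
    input="train.fr-wo.txt",      # hypothetical training file
    model_prefix="bpe_frwo",
    vocab_size=8000,              # illustrative vocabulary size
    model_type="bpe")

sp = spm.SentencePieceProcessor(model_file="bpe_frwo.model")
pieces = sp.encode("Une phrase d'exemple.", out_type=str)  # subword segments

# Score detokenized model outputs against the reference translations.
hypotheses = ["model output for the first test sentence"]
references = [["reference translation for the first test sentence"]]
print(sacrebleu.corpus_bleu(hypotheses, references).score)
```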

5 Results

We compare the same architectures in the same data size configurations (i) when the data are provided to the model in raw form, i.e., without subwording, versus when they are subworded before training, and (ii) when they are trained on the different language pairs, i.e., Fr→En versus Fr→Wo. The first comparison allows us to measure the impact of subwording on translation quality, and the second to observe the influence of linguistic properties shared between languages, which can facilitate or hinder translation performance.

Tables 1 and 2 show the results of the translation experiments; all BLEU scores were computed on the test set.

Table 1: French→Wolof experiments (BLEU scores on the test set)
Training size   No subword   With subword
100k            15.22        16.71
90k             14.41        15.28
80k             15.12        16.09
70k             12.76        14.85
60k             12.11        14.23
50k             10.45        12.14
40k             9.35         11.03
30k             7.33         9.73
20k             5.58         7.45
10k             3.94         4.84

In Table 1, we observe a gain of about 1.6 BLEU points between the raw corpus and the subworded one on Fr→Wo data, which can be explained by the fact that subword units are more frequent and are therefore better learned by the model. This gain is more visible in Fig. 2, where the performance of the model on subworded data is better at each training checkpoint.

Table 2: French→English experiments (BLEU scores on the test set)
Training size   No subword   With subword
100k            18.88        22.19
90k             18.52        21.11
80k             18.05        20.79
70k             17.82        20.57
60k             16.70        19.28
50k             15.17        18.94
40k             14.18        17.52
30k             4.68         16.22
20k             10.5         14.8
10k             3.34         10.1

Figure 2: Performance evolution of Fr→Wo NMT models on subworded and non-subworded data in the same data size configurations

We observe a similar pattern in Table 2 on Fr→En data, this time with a gain of about 4 BLEU points. When we compare the results between the two language pairs, we also notice that under the same experimental conditions (corpus size and subwording), a gain of about 3.5 BLEU points is obtained on Fr→En data compared to Fr→Wo.

Fig. 3 illustrates the behavior of the Fr→En models on the different dataset formats, with a sharp drop at the 30k checkpoint. This drop is explained by the quality of the added data segment, which contains many artifacts, and illustrates the fact that not all data points are useful for training.

Figure 3: Performance evolution of Fr→En NMT models on subworded and non-subworded data in the same data size configurations

In addition to subwording, we wanted to observe whether linguistic properties shared between two languages could influence translation performance. Fig. 4 and Fig. 5 illustrate the performance of the two models in the same configurations (architecture and data) on the Fr→Wo and Fr→En language pairs.

Figure 4: Performance evolution of Fr→Wo and Fr→En NMT models on raw data in the same data size configurations

Figure 5: Performance evolution of Fr→Wo and Fr→En NMT models on subworded data in the same data size configurations

In general, whether the data are subworded or not, the model trained on the Fr→En pair outperforms the one trained on Fr→Wo at all training checkpoints, except the 30k checkpoint where the sharp drop is observed. This can be explained by the linguistic similarities between French and English which, although they belong to different families, share the same alphabet, have a lexical similarity of 27% [23], and contain many words found in, or originating from, the other language. Our assumption is that the difference in morphology between the language pairs influences the ability of the model to translate one language into the other: the Wolof alphabet has more letters than the French one and Wolof is morphologically richer, which could hinder the model's ability to capture the specificities of this language.

6 Conclusion

In this article, we presented a French-Wolof parallel corpus of 123,000 sentences. This corpus was mostly collected from scratch, as openly accessible resources for this pair are scarce. As the collection project is still in progress, the dataset is not yet open. We then conducted experiments on LSTM- and global-attention-based neural machine translation models in various data configurations and showed that these systems are more effective on subworded data. Further experiments investigated the impact of linguistic similarity within a language pair on translation performance by comparing systems on two different language pairs under the same experimental conditions: Fr→Wo and Fr→En.

To the best of our knowledge, our corpus constitutes the largest yet collected for this language pair, and this is the first work on LSTM-based machine translation systems specifically for the Fr↔Wo pair to report performance with the BLEU metric, which allows a better appreciation of the quality of NMT models.

However, the BLEU metric may induce biases and may therefore not be sufficient for a complete evaluation of the actual quality of our systems [32]. Subwording also brought significant gains, but the SentencePiece method is language-agnostic and may not be optimal for all languages. Furthermore, RNN systems suffer from an inability to handle long sequences, even when LSTM or GRU [27] cells are used. State-of-the-art systems today are mainly based on the Transformer architecture, which handles longer sequences better and allows parallelization, as it does not process tokens sequentially. Cross-lingual transfer learning approaches have also shown very promising results for low-resource machine translation and are thus a relevant direction to explore.

In future work, we plan to further extend our dataset and to explore Transformer-based models which, although data-intensive, can be optimized for limited-resource configurations [33]. We will also conduct a comparative analysis of multilingual models in order to select the one with the best transfer learning performance on Wolof and perform transfer learning with it.

References

  • Koehn [2009] Koehn P (2009) Statistical Machine Translation. Cambridge University Press, DOI 10.1017/CBO9780511815829
  • Bahdanau et al [2014] Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. ArXiv 1409
  • Koehn and Knowles [2017] Koehn P, Knowles R (2017) Six challenges for neural machine translation. In: Proceedings of the First Workshop on Neural Machine Translation, Association for Computational Linguistics, Vancouver, pp 28–39, DOI 10.18653/v1/W17-3204, URL https://aclanthology.org/W17-3204
  • Luong et al [2015] Luong T, Pham H, Manning CD (2015) Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, pp 1412–1421, DOI 10.18653/v1/D15-1166, URL https://aclanthology.org/D15-1166
  • Vaswani et al [2017] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in Neural Information Processing Systems, Curran Associates, Inc., vol 30, URL https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  • Hedderich et al [2021] Hedderich M, Lange L, Adel H, Strötgen J, Klakow D (2021) A survey on recent approaches for natural language processing in low-resource scenarios. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp 2545–2568, DOI 10.18653/v1/2021.naacl-main.201
  • Adebara and Abdul-Mageed [2022] Adebara I, Abdul-Mageed M (2022) Towards afrocentric NLP for African languages: Where we are and where we can go. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, pp 3814–3841, DOI 10.18653/v1/2022.acl-long.265, URL https://aclanthology.org/2022.acl-long.265
  • van Esch et al [2022] van Esch D, Lucassen T, Ruder S, Caswell I, Rivera CE (2022) Writing system and speaker metadata for 2,800+ language varieties. In: Proceedings of the Language Resources and Evaluation Conference, Marseille, France, pp 5035–5046, URL https://aclanthology.org/2022.lrec-1.538
  • Gauthier et al [2016] Gauthier E, Besacier L, Voisin S, Melese M, Elingui UP (2016) Collecting resources in sub-Saharan African languages for automatic speech recognition: a case study of Wolof. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), European Language Resources Association (ELRA), Portorož, Slovenia, pp 3863–3867, URL https://aclanthology.org/L16-1611
  • Nguer et al [2016] Nguer EH, Khoulé M, Thiaré O, Cissé MT, Mangeot M (2016) Dictionnaires wolof en ligne : État de l’art et perspectives, URL https://hal.archives-ouvertes.fr/hal-01311413, working paper or preprint
  • Dione [2012] Dione CMB (2012) A morphological analyzer for Wolof using finite-state techniques. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), European Language Resources Association (ELRA), Istanbul, Turkey, pp 894–901, URL http://www.lrec-conf.org/proceedings/lrec2012/pdf/572_Paper.pdf
  • Dione [2020] Dione CMB (2020) Implementation and evaluation of an LFG-based parser for Wolof. In: Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, pp 5128–5136, URL https://aclanthology.org/2020.lrec-1.631
  • Dione [2019] Dione CB (2019) Developing Universal Dependencies for Wolof. In: Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019), Association for Computational Linguistics, Paris, France, pp 12–23, DOI 10.18653/v1/W19-8003, URL https://aclanthology.org/W19-8003
  • Lo et al [2016] Lo A, Nguer EHM, Abdoulaye N, Dione CB, Mangeot M, Khoule M, Bao-Diop S, Cissé MT (2016) Correction orthographique pour la langue wolof : état de l’art et perspectives. In: JEP-TALN-RECITAL 2016: Traitement Automatique des Langues Africaines TALAF 2016, Paris, France, URL https://hal.archives-ouvertes.fr/hal-02054917
  • Nguer et al [2020] Nguer EM, Lo A, Dione CMB, Ba SO, Lo M (2020) SENCORPUS: A French-Wolof parallel corpus. In: Proceedings of the 12th Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, pp 2803–2811, URL https://aclanthology.org/2020.lrec-1.341
  • Dione et al [2022] Dione CMB, Lo A, Nguer EM, Ba S (2022) Low-resource neural machine translation: Benchmarking state-of-the-art transformer for Wolof<->French. In: Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, pp 6654–6661, URL https://aclanthology.org/2022.lrec-1.717
  • Adelani et al [2022] Adelani D, Alabi J, Fan A, Kreutzer J, Shen X, Reid M, Ruiter D, Klakow D, Nabende P, Chang E, Gwadabe T, Sackey F, Dossou BFP, Emezue C, Leong C, Beukman M, Muhammad S, Jarso G, Yousuf O, Niyongabo Rubungo A, Hacheme G, Wairagala EP, Nasir MU, Ajibade B, Ajayi T, Gitau Y, Abbott J, Ahmed M, Ochieng M, Aremu A, Ogayo P, Mukiibi J, Ouoba Kabore F, Kalipe G, Mbaye D, Tapo AA, Memdjokam Koagne V, Munkoh-Buabeng E, Wagner V, Abdulmumin I, Awokoya A, Buzaaba H, Sibanda B, Bukula A, Manthalu S (2022) A few thousand translations go a long way! leveraging pre-trained models for African news translation. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, pp 3053–3070, DOI 10.18653/v1/2022.naacl-main.223, URL https://aclanthology.org/2022.naacl-main.223
  • Nekoto et al [2020] Nekoto W, Marivate V, Matsila T, Fasubaa T, Fagbohungbe T, Akinola SO, Muhammad S, Kabongo Kabenamualu S, Osei S, Sackey F, Niyongabo RA, Macharm R, Ogayo P, Ahia O, Berhe MM, Adeyemi M, Mokgesi-Selinga M, Okegbemi L, Martinus L, Tajudeen K, Degila K, Ogueji K, Siminyu K, Kreutzer J, Webster J, Ali JT, Abbott J, Orife I, Ezeani I, Dangana IA, Kamper H, Elsahar H, Duru G, Kioko G, Espoir M, van Biljon E, Whitenack D, Onyefuluchi C, Emezue CC, Dossou BFP, Sibanda B, Bassey B, Olabiyi A, Ramkilowan A, Öktem A, Akinfaderin A, Bashir A (2020) Participatory research for low-resourced machine translation: A case study in African languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, pp 2144–2160, DOI 10.18653/v1/2020.findings-emnlp.195, URL https://aclanthology.org/2020.findings-emnlp.195
  • Ranathunga et al [2021] Ranathunga S, Lee ESA, Skenduli MP, Shekhar R, Alam M, Kaur R (2021) Neural machine translation for low-resource languages: A survey. ArXiv abs/2106.15115
  • Jónsson et al [2020] Jónsson HP, Símonarson HB, Snæbjarnarson V, Steingrímsson S, Loftsson H (2020) Experimenting with different machine translation models in medium-resource settings. In: Sojka P, Kopeček I, Pala K, Horák A (eds) Text, Speech, and Dialogue, Springer International Publishing, Cham, pp 95–103
  • Papineni et al [2002] Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp 311–318, DOI 10.3115/1073083.1073135, URL https://aclanthology.org/P02-1040
  • Domingo et al [2018] Domingo M, Garcıa-Martınez M, Helle A, Casacuberta F, Herranz M (2018) How Much Does Tokenization Affect Neural Machine Translation? arXiv e-prints arXiv:1812.08621, 1812.08621
  • Eberhard et al [2019] Eberhard D, Simons G, Fennig C (2019) Ethnologue: Languages of the World, 22nd Edition
  • Adelani et al [2021] Adelani DI, Abbott J, Neubig G, D’souza D, Kreutzer J, Lignos C, Palen-Michel C, Buzaaba H, Rijhwani S, Ruder S, Mayhew S, Azime IA, Muhammad SH, Emezue CC, Nakatumba-Nabende J, Ogayo P, Anuoluwapo A, Gitau C, Mbaye D, Alabi J, Yimam SM, Gwadabe TR, Ezeani I, Niyongabo RA, Mukiibi J, Otiende V, Orife I, David D, Ngom S, Adewumi T, Rayson P, Adeyemi M, Muriuki G, Anebi E, Chukwuneke C, Odu N, Wairagala EP, Oyerinde S, Siro C, Bateesa TS, Oloyede T, Wambui Y, Akinode V, Nabagereka D, Katusiime M, Awokoya A, MBOUP M, Gebreyohannes D, Tilaye H, Nwaike K, Wolde D, Faye A, Sibanda B, Ahia O, Dossou BFP, Ogueji K, DIOP TI, Diallo A, Akinfaderin A, Marengereke T, Osei S (2021) MasakhaNER: Named entity recognition for African languages. Transactions of the Association for Computational Linguistics 9:1116–1131, DOI 10.1162/tacl_a_00416, URL https://aclanthology.org/2021.tacl-1.66
  • Pinnis [2018] Pinnis M (2018) Tilde’s parallel corpus filtering methods for WMT 2018. In: Proceedings of the Third Conference on Machine Translation: Shared Task Papers, Association for Computational Linguistics, Belgium, Brussels, pp 939–945, DOI 10.18653/v1/W18-6486, URL https://aclanthology.org/W18-6486
  • Klein et al [2017] Klein G, Kim Y, Deng Y, Senellart J, Rush A (2017) OpenNMT: Open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, Association for Computational Linguistics, Vancouver, Canada, pp 67–72, URL https://www.aclweb.org/anthology/P17-4012
  • Hochreiter and Schmidhuber [1997] Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural computation 9:1735–80, DOI 10.1162/neco.1997.9.8.1735
  • Kingma and Ba [2015] Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. CoRR abs/1412.6980
  • Kudo and Richardson [2018] Kudo T, Richardson J (2018) SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Brussels, Belgium, pp 66–71, DOI 10.18653/v1/D18-2012, URL https://aclanthology.org/D18-2012
  • Sennrich et al [2016] Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Berlin, Germany, pp 1715–1725, DOI 10.18653/v1/P16-1162, URL https://aclanthology.org/P16-1162
  • Post [2018] Post M (2018) A call for clarity in reporting BLEU scores. In: Proceedings of the Third Conference on Machine Translation: Research Papers, Association for Computational Linguistics, Brussels, Belgium, pp 186–191, DOI 10.18653/v1/W18-6319, URL https://aclanthology.org/W18-6319
  • Wieting et al [2019] Wieting J, Berg-Kirkpatrick T, Gimpel K, Neubig G (2019) Beyond BLEU: Training neural machine translation with semantic similarity. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, pp 4344–4355, DOI 10.18653/v1/P19-1427, URL https://aclanthology.org/P19-1427
  • Araabi and Monz [2020] Araabi A, Monz C (2020) Optimizing transformer for low-resource neural machine translation. In: Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Barcelona, Spain (Online), pp 3429–3435, DOI 10.18653/v1/2020.coling-main.304, URL https://aclanthology.org/2020.coling-main.304