Waseda University, IPS
[email protected] · [email protected] · [email protected]

High-Quality Data Augmentation for Low-Resource NMT: Combining a Translation Memory, a GAN Generator, and Filtering

Hengjie Liu, Ruibo Hou, Yves Lepage
Abstract

Back-translation, as a technique for extending a dataset, is widely used by researchers in low-resource language translation tasks. It typically translates monolingual data from the target language into the source language, so that the target side of the synthetic pairs remains natural text of high quality. This paper proposes a novel way of utilizing a monolingual corpus on the source side to assist Neural Machine Translation (NMT) in low-resource settings. We realize this concept by employing a Generative Adversarial Network (GAN), which augments the training data for the discriminator while mitigating the interference of low-quality synthetic monolingual translations with the generator. Additionally, this paper integrates Translation Memory (TM) with NMT, increasing the amount of data available to the generator. Moreover, we propose a novel procedure to filter the synthetic sentence pairs during the augmentation process, ensuring the high quality of the data.

Keywords:
Back-translation · Generative Adversarial Networks · Translation Memory · High-quality filtering

1 Introduction

There are many languages around the world that are on the brink of extinction. Machine translation plays a crucial role in preserving endangered languages. However, a common challenge in this endeavor is the scarcity of parallel corpora available online, which hinders the development of effective translation systems.

In recent years, the rise of Neural Machine Translation (NMT) [18] has led to a technological leap. In the early days of NMT research, researchers used deep neural networks, especially Recurrent Neural Networks (RNN) and Long Short-Term Memory networks (LSTM), to better capture language context and complex structures. Some researchers also applied Convolutional Neural Networks (CNN) to classification tasks in Natural Language Processing (NLP) [16]. Today, however, most researchers design their models based on the Transformer [19]. The Transformer, built on attention mechanisms, can focus more flexibly on different parts of the input sentence, further improving translation accuracy. It has also set new benchmarks for most other tasks in the field of NLP.

Low-resource translation [9] refers to the situation where machine translation encounters a shortage of parallel corpora for training models. This commonly occurs for languages with limited linguistic resources or smaller speaker populations, leading to difficulties in achieving effective translation performance.

A Translation Memory (TM) is a repository that archives pairs of source sentences along with their corresponding translations. Recent studies have validated the beneficial impact of TM on enhancing NMT models. This enhancement has been demonstrated through various approaches, including concatenating both the source and target side of the TM [20] and the target side of the TM with the source input [22], encoding TM and source input separately [5], and leveraging TM in Non-autoregressive machine translation models [23]. However, research [1] indicates that integrating TM into the NMT model does not yield improvements over the baseline NMT model when applied to the same task in a low-resource setting.

In low-resource translation tasks, researchers often seek to expand parallel corpora to enhance model performance. Back-translation is a popular data augmentation technique among researchers. However, it has some drawbacks. The sentences generated through back-translation frequently differ from natural language sentences, potentially disrupting and degrading the translator’s training process. A proposed method utilizes a monolingual corpus on the target side for back-translation [15]. This approach involves back-translating target language sentences to the source language and adding these synthetic sentences to the parallel data, ensuring high-quality target-side translations. Building on this work, [2] investigated various aspects of back-translation and proposed several sampling strategy variations targeting difficult-to-predict words using prediction losses and word frequencies. Additionally, iterative back-translation to expand low-resource corpora was explored by [7], who proposed a method to generate increasingly better synthetic parallel data from monolingual data to train neural machine translation systems.

Currently, when using back-translation to handle low-resource translation tasks, researchers typically use monolingual corpora in the target language to ensure that the synthetic sentences do not interfere with the model during training. However, when the target language is a low-resource language, the available monolingual corpora are extremely limited. Therefore, this paper aims to propose a method that utilizes monolingual corpora in the source language to improve low-resource translation tasks. In devising this method, the paper leverages the structural characteristics of Generative Adversarial Networks (GAN) to achieve this goal.

GAN architectures have not been extensively applied to NMT tasks, primarily due to the inherent instability of the GAN training process, which is especially evident in NMT. A simple GAN architecture for NMT was proposed by [21], suggesting the use of a CNN as the discriminator and an NMT model as the generator. Given the exceptional performance of the Transformer model in various NLP tasks, [24] proposed employing a Transformer as the generator. To address the training instability issues of traditional GANs in NMT, [25] introduced a novel Bidirectional Generative Adversarial Network (BGAN) for NMT. However, even when researchers employ GANs to tackle NMT tasks, they primarily focus on model design and selection rather than effectively harnessing the adversarial nature of the GAN framework.

In this paper, integrating the findings of previous work, we propose a data augmentation method for low-resource NMT based on a GAN that integrates a TM. Our contributions are as follows:

  • We enable the use of a source-side monolingual corpus, whose synthetic translations do not interfere with the translator's performance.

  • We integrate a TM into the generator of the GAN, enlarging the amount of data available for training the generator.

  • We design a novel filtering procedure to ensure the high quality of the translations.

2 Methodology

2.1 Integrating TM into NMT

Figure 1: Process of integrating the TM into NMT, where d(s, s_t) is the Euclidean distance between the sentence vectors of the source input s and a source sentence s_t in the TM. The input consists of s, s_t, and t_t. The corresponding output is the translation t of s.

We employ a retrieval method to find sentences s_t in the TM that are semantically similar to the source input s, along with their corresponding target sentences t_t. Here, we introduce a threshold on the Euclidean distance between sentence vectors to ensure the quality of the retrieved TM entries. To ensure the quality of the synthetic sentences that will be introduced in Section 2.2, we discuss how to control generation in Section 2.2 and how to filter high-quality sentences in Section 2.3. Additionally, considering the limited number of sentences in the low-resource TM, we avoid setting this threshold too high or too low. In subsequent experiments, the threshold is set to 0.5 [22]:

\sqrt{\sum_{i=1}^{n} [s_i - (s_t)_i]^2} \leq 0.5    (1)

Sentence pairs (s_t, t_t) with a distance below this threshold are concatenated with s using the following format: '[SEP] s [SEP] s_t [SEP] t_t', where [SEP] denotes a separator. The process of integrating the TM into the input is presented in Figure 1.

Retrieval Method   Suppose that our retrieval dataset is a set of source and target sentence pairs (s_t, t_t), which is the same set that is used as the training dataset of the NMT system. A given source input, denoted s, is treated as the query for semantic retrieval over this dataset. The corresponding semantically similar source sentence s_t in the TM serves as the associated value. Consequently, obtaining the corresponding target sentence t_t, and hence the sentence pair (s_t, t_t), is straightforward.

Firstly, we input each sentence in the TM into a pre-trained sentence embedding model [14] to generate sentence vectors. Subsequently, we calculate the similarity score between two sentences by computing the Euclidean distance between their respective sentence vectors.

To enable rapid retrieval between the input vector representation and the vectors of the sentences in the TM, we employ the FAISS toolkit [8].
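For concreteness, here is a minimal sketch of the retrieval and concatenation steps, assuming a Sentence-BERT encoder [14] and a flat FAISS L2 index [8]; the model name, helper functions, and batching are illustrative, not the authors' exact implementation.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model name

def build_tm_index(tm_source_sents):
    """Embed the source side of the TM and index it for exact L2 search."""
    vecs = encoder.encode(tm_source_sents, convert_to_numpy=True).astype("float32")
    index = faiss.IndexFlatL2(vecs.shape[1])
    index.add(vecs)
    return index

def augment_input(s, index, tm_pairs, threshold=0.5):
    """Return '[SEP] s [SEP] s_t [SEP] t_t' if the closest TM source sentence
    lies within the Euclidean-distance threshold (Equation 1); otherwise return s."""
    q = encoder.encode([s], convert_to_numpy=True).astype("float32")
    sq_dist, idx = index.search(q, 1)          # IndexFlatL2 returns *squared* L2 distances
    if np.sqrt(sq_dist[0, 0]) <= threshold:
        s_t, t_t = tm_pairs[idx[0, 0]]
        return f"[SEP] {s} [SEP] {s_t} [SEP] {t_t}"
    return s
```

With the threshold of 0.5 used in the paper, inputs whose nearest TM neighbour is too far away are simply passed through unchanged.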

2.2 Back-Translation Leveraging Monolingual Corpus on Source Side

Figure 2: How a monolingual corpus on the source side is used to enhance the NMT task. The generator is a vanilla Transformer model. The discriminator is a fusion neural network of our own design.

In this paper, integrating the findings of previous work and the current research landscape of NMT, we use a vanilla Transformer as the generator G for the translation task, producing synthetic data in the target language, i.e., the low-resource language in our setting. To train this generator G, we design a fusion neural network as the discriminator D, composed of a BiLSTM and a CNN in parallel, handling a binary classification task. As [4, 21, 25] explain, D and G play the following two-player minimax game with value function V(D, G):

\min_G \max_D V(D,G) = \mathbb{E}_{(x,y) \sim P_{data}(x,y)}[\log D(x,y)] + \mathbb{E}_{(x_1,y_1') \sim P_G(x,y)}[\log(1 - D(x_1,y_1'))]    (2)

In Equation (2), (x, y) is a sentence pair from the bilingual corpus, and (x_1, y_1') is a source sentence from the monolingual corpus paired with the target sentence generated by G. P_data represents the real data distribution and P_G denotes the generator distribution. In this way, D continually learns to distinguish real, natural sentences from the less natural sentences generated by G. Meanwhile, G strives to produce the most natural sentences possible in order to deceive D.
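For illustration, a minimal sketch of such a fusion discriminator in PyTorch is given below. The kernel size (16 × 1) and BiLSTM hidden size (256) follow Section 3.2; the embedding layer, the pooling operations, and the classifier head are our assumptions.

```python
import torch
import torch.nn as nn

class FusionDiscriminator(nn.Module):
    """Sketch of a parallel BiLSTM + CNN discriminator over a (source, target) pair
    encoded as one token-id sequence; outputs the probability that the pair is natural."""
    def __init__(self, vocab_size, emb_dim=512, hidden=256, n_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=16, padding=8)
        self.classifier = nn.Linear(2 * hidden + n_filters, 1)

    def forward(self, pair_ids):
        # pair_ids: (batch, seq_len) token ids of the concatenated (x, y) pair
        e = self.embed(pair_ids)                               # (B, T, E)
        lstm_out, _ = self.bilstm(e)                           # (B, T, 2H)
        lstm_feat = lstm_out.mean(dim=1)                       # mean-pool over time
        conv_feat = torch.relu(self.conv(e.transpose(1, 2))).max(dim=2).values
        logit = self.classifier(torch.cat([lstm_feat, conv_feat], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)                # P(pair is natural)
```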

Training Strategy   Figure 2 illustrates our process of performing back-translation by leveraging a monolingual corpus on the source side. We employ a vanilla Transformer model as the GAN generator. The monolingual corpus, which is similar to but distinct from the bilingual corpus, is used to supply negative samples to the discriminator. In the training process, we first rewrite the monolingual corpus by integrating the TM, as described in Section 2.1, to form the input of the generator. After obtaining the translations from the generator, we label these synthetic translations of the monolingual corpus as fake sentences. We take the natural target sentences from the bilingual corpus as ground-truth sentences. Our discriminator is trained on these two batches of sentences, labeled as unnatural and natural (fake and true in GAN parlance). The output of the discriminator on fake sentences is used to compute a loss against the label 0, while the output on true sentences is used to compute a loss against the label 1. We add these loss values with equal weights and back-propagate the gradient through the discriminator to guide the classification task.

On the generator side, two loss functions are employed to improve performance. Since this is essentially still a translation task, we keep the original loss function of the translation task, which differs somewhat from a classical GAN model. This loss value is derived from training the generator on the bilingual corpus. Additionally, we feed the translations generated from the bilingual corpus into the discriminator. The discriminator's outputs are then used to compute a cross-entropy loss against the label 1, which is fed back to the generator. The generator combines this adversarial loss with the translation loss using a smaller weight. This approach aims to assist in improving the translation model. In this way, the generator continuously produces more natural sentences to deceive the discriminator.
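A minimal sketch of one discriminator step and one generator step following this description is shown below. The interface of `gen` (returning the translation cross-entropy and discriminator-ready hypothesis pairs), the adversarial weight of 0.1, and the optimizers are assumptions; note also that, in practice, propagating the adversarial signal through discrete translations requires a policy-gradient or embedding-level workaround, which this sketch glosses over.

```python
import torch

bce = torch.nn.BCELoss()

def discriminator_step(disc, d_opt, real_pairs, fake_pairs):
    """real_pairs: (x, y) from the bilingual corpus (label 1);
    fake_pairs: monolingual sources with the generator's translations (label 0)."""
    d_opt.zero_grad()
    p_real = disc(real_pairs)
    p_fake = disc(fake_pairs)
    loss = bce(p_real, torch.ones_like(p_real)) + bce(p_fake, torch.zeros_like(p_fake))
    loss.backward()
    d_opt.step()
    return loss.item()

def generator_step(gen, disc, g_opt, src_batch, tgt_batch, adv_weight=0.1):
    """Translation cross-entropy on the bilingual batch plus a down-weighted
    adversarial term that rewards fooling the discriminator."""
    g_opt.zero_grad()
    ce_loss, hyp_pairs = gen(src_batch, tgt_batch)   # assumed interface: returns the
                                                     # translation loss and (src, hyp) pairs
    p_hyp = disc(hyp_pairs)
    adv_loss = bce(p_hyp, torch.ones_like(p_hyp))    # push D's verdict toward "natural"
    loss = ce_loss + adv_weight * adv_loss
    loss.backward()
    g_opt.step()
    return loss.item()
```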

By using monolingual corpora in this manner, we observe that synthetic sentences derived from monolingual translation not only avoid interfering with the translation model but also enhance the discriminator’s capability along with real sentences. In turn, the discriminator helps improve the performance of the generator. Thus, we achieve the goal of using monolingual corpora on the source side to improve the model’s translation ability without negatively impacting the translation model.

Figure 3: The process for filtering high-quality translations. We filter on both the source sentences and the target sentences. We compute features from natural language corpora (the ratio of sentence lengths and of perplexities) to serve as filtering criteria. This allows us to retain translations that better fit natural language sentences. Finally, we validate the effectiveness of the translation results using standard data augmentation experiments.

2.3 High-quality Filtering

To demonstrate the effectiveness of the low-resource sentences translated by our generator, we utilize data augmentation experiments for verification. However, in a standard data augmentation process, the synthetic sentences may vary in quality, and we cannot ensure that all sentences in the synthetic corpus are of high quality. Based on this, this paper proposes an effective high-quality filtering process, as illustrated in Figure 3.

Similar Domain Selection   Since the source language is high-resource, we have a large amount of data to choose from. To initially ensure the quality of the translated sentences, we perform similar domain selection on the original monolingual corpus. First, we build a retrieval database based on the original monolingual corpus. For each source language sentence s in the original bilingual corpus, we select the closest sentence s_t from the retrieval database. These s_t sentences form the monolingual corpus to be translated after the similar domain selection.

High-quality Filtering Method   To filter high-quality sentences from the translations, we design a high-quality filtering method. As described by the Gale–Church alignment algorithm [3], in a natural bilingual parallel corpus, the length ratio between source and target sentences typically varies around a fixed value, and these ratios generally follow a Gaussian distribution. This paper posits that, besides length, other features of parallel sentence pairs also adhere to such a specific relationship. We use a language model toolkit [6] to train N-gram models for both the source (de) and target (hsb) languages and compute their perplexities. Preliminary experiments reveal that the distribution of perplexity ratios for parallel sentence pairs also approximates a Gaussian distribution. Hence, we use the interval [mean - standard deviation; mean + standard deviation] of the ratios of these two features (length and perplexity) in a natural parallel corpus as the filtering criterion for high quality. For each translated sentence, we calculate the length and perplexity ratios between its source sentence and the translation. Only if both ratios fall within the filtering intervals is the translation considered high-quality and passed through the filter.
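A minimal sketch of this filter is given below, assuming KenLM N-gram models trained on the German and Upper Sorbian sides and taking both ratios as German over Upper Sorbian (the ratio direction and the .arpa file names are our assumptions); the interval values are those reported in the caption of Figure 4.

```python
import kenlm

# KenLM n-gram models for source (de) and target (hsb); the .arpa paths are placeholders.
lm_de = kenlm.Model("de.arpa")
lm_hsb = kenlm.Model("hsb.arpa")

# Filtering intervals (mean, std) estimated on the original bilingual corpus (Figure 4).
LEN_MEAN, LEN_STD = 1.18, 0.17
PPL_MEAN, PPL_STD = 1.01, 0.37

def is_high_quality(src_de: str, hyp_hsb: str) -> bool:
    """Keep a synthetic pair only if both ratios fall within [mean - std, mean + std]."""
    len_ratio = len(src_de.split()) / max(len(hyp_hsb.split()), 1)
    ppl_ratio = lm_de.perplexity(src_de) / lm_hsb.perplexity(hyp_hsb)
    return (abs(len_ratio - LEN_MEAN) <= LEN_STD
            and abs(ppl_ratio - PPL_MEAN) <= PPL_STD)

def filter_pairs(pairs):
    """pairs: iterable of (German source, synthetic Upper Sorbian translation)."""
    return [(s, t) for s, t in pairs if is_high_quality(s, t)]
```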

Data Augmentation   We conduct a standard data augmentation experiment to demonstrate the effectiveness of our filtering method. We combine the filtered high-quality translations with their source sentences to form a synthetic bilingual corpus. Then, we merge this synthetic bilingual corpus with the original bilingual corpus and use it to train a vanilla Transformer model. Finally, we compute the relevant metrics to assess the effectiveness of the translations produced by our generator.

3 Setup & Evaluation

3.1 Dataset

We use the German-Upper Sorbian (de-hsb) parallel corpus from WMT20 (https://www.statmt.org/wmt20/unsup_and_very_low_res/) as the original bilingual corpus. The training set consists of 60,000 parallel sentence pairs, while the validation and test sets contain 2,000 sentences each. We choose the German monolingual corpus from WMT14 (https://www.statmt.org/wmt14/training-monolingual-news-crawl/) as the original monolingual corpus. This German monolingual corpus consists of 1,879,765 German sentences. We apply filtering and do not use all of this data, since we aim to determine the necessary and sufficient amount of additional data to attain the best performance. The statistics of the corpora used are shown in Table 1.

Corpus        Language              # sentences    Sentence length (words)   Sentence length (characters)
bilingual     German (de)           60,000 × 2     12.1 ± 6.9                83.4 ± 51.7
              Upper Sorbian (hsb)                  10.7 ± 6.3                71.6 ± 45.4
monolingual   German (de)           1,879,765      15.4 ± 9.2                108.5 ± 64.8

Table 1: Statistics of the corpora used.

3.2 Models & Evaluation

Given the outstanding performance of Transformers in NLP tasks, especially NMT, we choose a vanilla Transformer model as our generator. Its encoder and decoder have 8 attention heads and 12 layers in total. We set the embedding dimension to 512 and the dropout rate to 0.15. We employ a warm-up strategy, after which the learning rate decays with an inverse square root schedule. On the discriminator side, we use a parallel combination of a CNN and a BiLSTM. The CNN has a convolutional kernel of size 16 × 1, while the BiLSTM has a hidden layer size of 256. Both the discriminator and the generator use an Adam optimizer.
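For reference, a configuration along these lines can be sketched in PyTorch as follows; reading "12 layers in total" as 6 encoder plus 6 decoder layers is our interpretation, and the peak learning rate, warm-up length, and Adam betas are placeholders.

```python
import torch
import torch.nn as nn

BASE_LR = 5e-4          # assumed peak learning rate
WARMUP_STEPS = 4000     # assumed number of warm-up steps

# "12 layers in total" is read here as 6 encoder + 6 decoder layers.
generator = nn.Transformer(
    d_model=512, nhead=8,
    num_encoder_layers=6, num_decoder_layers=6,
    dropout=0.15, batch_first=True,
)
optimizer = torch.optim.Adam(generator.parameters(), lr=BASE_LR, betas=(0.9, 0.98))

def lr_factor(step: int) -> float:
    """Linear warm-up followed by inverse-square-root decay."""
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return (WARMUP_STEPS / step) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
```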

All of our models and training processes were implemented using PyTorch [11]. The FAISS toolkit [8] was used for similar sentence retrieval, while the KenLM toolkit [6] was employed for training N-gram models. All model training was conducted on a single A4500 GPU. To evaluate the translation results, we used BLEU [10], chrF2 [12], and TER [17] as the evaluation metrics. The calculation of these metrics was implemented with the sacreBLEU toolkit [13].
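A minimal sketch of the metric computation with sacreBLEU's Python API, assuming lists of hypothesis and reference strings (one reference per segment):

```python
import sacrebleu

def evaluate(hypotheses, references):
    """hypotheses: list of system outputs; references: list of reference strings."""
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    chrf = sacrebleu.corpus_chrf(hypotheses, [references])   # chrF2 by default (beta = 2)
    ter = sacrebleu.corpus_ter(hypotheses, [references])
    return bleu.score, chrf.score, ter.score
```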

4 Experimental Results

4.1 Performance of Generators

The first group of rows in Table 2 shows the performance of the generator with different components. In this series of experiments, we used a vanilla Transformer model as the baseline. This model, serving as the generator, achieved a BLEU score of 37.2 for translation. Building upon this baseline, we incorporated different components.

We first integrated the TM into the input of the model. The results surpassed the baseline by 1.4 BLEU points. Considering the confidence intervals, this improvement is not significant, as the intervals for the baseline model (37.2 ± 1.2) and the model with the TM integrated (38.6 ± 1.2) overlap (38.6 − 1.2 = 37.4 < 38.4 = 37.2 + 1.2).

Subsequently, we trained the generator using a monolingual corpus in conjunction with a GAN. However, we found that the performance improvement was not significant (37.3 − 1.1 = 36.2 < 38.4 = 37.2 + 1.2). We believe that this is due to the instability of GAN training, which also confirms that GANs are not well-suited for NMT tasks [25]. Of course, this is also related to the weight proportions of the different loss values during the training process. Nevertheless, utilizing GANs was not our primary purpose. Our most important goal in employing GANs was to introduce source-side monolingual corpora into the model training process.

From the metrics, we can observe that when both the TM and the GAN are integrated into the system, the translation performance improves by 3.7 BLEU points compared to the baseline. This improvement is significant, as the intervals for the baseline model (37.2 ± 1.2) and the model with TM and GAN integrated (40.9 ± 1.2) do not overlap (40.9 − 1.2 = 39.7 > 38.4 = 37.2 + 1.2). These four sets of experiments demonstrate that the TM is an effective way of enhancing the translation performance of the model. We believe that this is because the TM increases the amount of data available for model training and serves as a prompt for the model. The GAN structure, on the other hand, enables the utilization of source-side monolingual corpora. The synthetic sentences not only do not interfere with the generator but also indirectly facilitate the learning process of the generator through the discriminator.

For comparison with the method proposed by [15], a reverse translation model (from target to source) was trained in this study (see the 1st row in Table 2).

New sentence Amount of Filter with BLEU chrF2 TER
pairs from new sentence pairs random perplexity length
Vanilla Transformer (reverse side) 0 39.6 ± 1.1 66.6 ± 0.8 45.1 ± 1.0
Vanilla Transformer 0 37.2 ± 1.2 64.2 ± 0.8 46.1 ± 1.1
+TM 38.6 ± 1.2 64.5 ± 0.8 44.3 ± 1.1
+GAN 37.3 ± 1.1 64.6 ± 0.8 45.9 ± 1.0
+TM +GAN 40.9 ± 1.2 67.6 ± 0.8 39.8 ± 1.0
+TM +GAN 60,000 37.4 ± 1.1 64.3 ± 0.8 45.6 ± 1.0
Sennrich’s work [15] 60,000 37.7 ± 1.2 64.8 ± 0.8 44.8 ± 1.1
+TM +GAN 10,000 38.2 ± 1.2 65.0 ± 0.8 45.5 ± 1.1
+TM +GAN 40,052 38.1 ± 1.1 64.7 ± 0.8 45.0 ± 1.0
+TM +GAN 19,408 38.8 ± 1.2 65.1 ± 0.9 44.5 ± 1.1
+TM +GAN 13,464 38.8 ± 1.1 65.3 ± 0.8 44.4 ± 1.0
Sennrich’s work [15] 13,464 39.1 ± 1.2 65.5 ± 0.8 44.3 ± 1.0
+TM +GAN 13,464 40.1 ± 1.3 66.2 ± 0.8 43.6 ± 1.1
Table 2: Comparison of the performance of different sentence filtering methods on an NMT and data augmentation task. The first group of rows presents the NMT systems used to train our generator model and their performance. The second group of rows shows experiments in which the generator was used to augment the original bilingual corpus; the augmented bilingual corpora were then used to train a vanilla Transformer model, which was subsequently evaluated on the test set.

4.2 High-quality Filtering & Results

As shown in the second group of rows of Table 2, we used the best-performing generator from the first group of rows (combining TM and GAN) for translation and conducted data augmentation experiments following the process illustrated in Figure 3.

Initially, we augmented the original bilingual corpus with the entire domain-selected monolingual corpus (60,000 German sentences) and their translations, without any filtering (see the 6th row in Table 2). We found that the results hardly improved. After analysis, we concluded that this was due to the gap between the synthetic translations and real natural language. Augmenting with as many synthetic sentences as real sentences interferes with the translation model, which is consistent with our previous analysis.

Therefore, we reduced the quantity and randomly selected 10,000 synthetic parallel sentence pairs for augmentation (see the 8th row in Table 2). As expected, the results showed preliminary improvement, surpassing the baseline by 1.0 BLEU point.

We then changed our filtering strategy to ensure the high quality of the selected sentence pairs. As introduced in Section 2.3, we employed three filtering methods using the sentence length ratio, the sentence perplexity ratio, and a combination of the length and perplexity ratios from the original bilingual corpus as the filtering criteria. The experimental results demonstrate that using both the length and perplexity ratios as the filtering criterion yields the best performance, surpassing the baseline by 2.9 BLEU points (see rows 2 and 13 in Table 2: 40.1 − 1.3 = 38.8 > 38.4 = 37.2 + 1.2).

Additionally, to prove that the experimental results are not entirely influenced by the number of sentences, we randomly selected the same number of sentences (13,464) as the best-performing method (see the 11th row in Table 2).

For Sennrich's method, 13,464 sentences were also randomly selected. In comparison, our method demonstrated better performance (see rows 12 and 13 in Table 2), although the improvement is not significant. In the back-translation method employed by Sennrich, the target sentences are natural, yet its results remain below those obtained with our two-criterion filtering. It can therefore be concluded that, after our high-quality filtering process, the synthesized target sentences also closely resemble natural sentences, and we conclude that our filtering process is effective.

Figure 4: Figures (a) and (b) illustrate the distribution of the original bilingual corpus in terms of sentence length ratio and perplexity ratio, respectively. In contrast, (c) and (d) depict the distribution of our newly created bilingual corpus. Subsequently, we first used the length ratio of the original bilingual corpus as the filtering criterion (1.18 ± 0.17) to filter the created bilingual corpus, with the corresponding distribution shown in (e) and (f). Figures (g) and (h) present the distribution when using the perplexity ratio as the filtering criterion (1.01 ± 0.37). Finally, (i) and (j) demonstrate the distribution obtained by employing both length and perplexity as the filtering methods.

High-quality Filtering   Figure 4 illustrates the distributions of the sentence length and perplexity ratios before and after filtering. Comparing subfigures 4(a) and 4(c), we observe that the synthetic sentences translated by our generator largely conform to the features of natural sentences in terms of length. However, there is still a discrepancy in perplexity, as shown in subfigures 4(b) and 4(d). Subfigure 4(d) is shifted to the left overall compared to 4(b), which reflects that some of the synthetic sentences (hsb) have higher perplexity, resulting in smaller ratios. This hints at some deficiency in the synthesized sentences. For that reason, we subsequently used the mean ± standard deviation of the sentence length ratio (1.18 ± 0.17) and of the perplexity ratio (1.01 ± 0.37) of the sentence pairs in the original bilingual corpus as the filtering criteria. The resulting distributions are shown in Figure 4.

5 Conclusion

This paper introduced a novel approach to leveraging a monolingual corpus on the source side when the source language was highly resourced, but the target language was low-resourced, to support NMT in low-resource scenarios. We realized this concept by employing a GAN, which augmented the training data for the discriminator while mitigating the interference of low-quality synthetic monolingual translations with the generator. Furthermore, this paper integrated TM with NMT, increasing the amount of data available to the generator. Additionally, we proposed a novel criterion to filter the synthetic sentence pairs in the augmentation process, ensuring the high quality of the data.

However, our approach also has limitations. The training process of GANs is unstable and requires a substantial amount of time. Moreover, the semantics and other information inherent in natural language are complex and implicit, making GANs not particularly suitable for natural language translation tasks. As for TM, it consumes a significant amount of computational resources. Regarding our high-quality filtering process, it cannot guarantee that all the filtered sentence pairs are of high quality. Future research can delve into these aspects for further improvements.

References

  • [1] Cai, D., Wang, Y., Li, H., Lam, W., Liu, L.: Neural machine translation with monolingual translation memory. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 7307–7318. Association for Computational Linguistics, Online (Aug 2021). https://doi.org/10.18653/v1/2021.acl-long.567, https://aclanthology.org/2021.acl-long.567
  • [2] Fadaee, M., Monz, C.: Back-translation sampling by targeting difficult words in neural machine translation. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. pp. 436–446. Association for Computational Linguistics, Brussels, Belgium (Oct-Nov 2018). https://doi.org/10.18653/v1/D18-1040, https://aclanthology.org/D18-1040
  • [3] Gale, W.A., Church, K.W.: A program for aligning sentences in bilingual corpora. Computational Linguistics 19(1), 75–102 (1993), https://aclanthology.org/J93-1004
  • [4] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K. (eds.) Advances in Neural Information Processing Systems. vol. 27. Curran Associates, Inc. (2014), https://proceedings.neurips.cc/paper_files/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf
  • [5] He, Q., Huang, G., Cui, Q., Li, L., Liu, L.: Fast and accurate neural machine translation with translation memory. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 3170–3180. Association for Computational Linguistics, Online (Aug 2021). https://doi.org/10.18653/v1/2021.acl-long.246, https://aclanthology.org/2021.acl-long.246
  • [6] Heafield, K.: KenLM: Faster and smaller language model queries. In: Proceedings of the Sixth Workshop on Statistical Machine Translation. pp. 187–197. Association for Computational Linguistics, Edinburgh, Scotland (Jul 2011), https://aclanthology.org/W11-2123
  • [7] Hoang, V.C.D., Koehn, P., Haffari, G., Cohn, T.: Iterative back-translation for neural machine translation. In: Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. pp. 18–24. Association for Computational Linguistics, Melbourne, Australia (Jul 2018). https://doi.org/10.18653/v1/W18-2703, https://aclanthology.org/W18-2703
  • [8] Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Transactions on Big Data 7(3), 535–547 (2019)
  • [9] Koehn, P., Knowles, R.: Six challenges for neural machine translation. In: Proceedings of the First Workshop on Neural Machine Translation. pp. 28–39. Association for Computational Linguistics, Vancouver (Aug 2017). https://doi.org/10.18653/v1/W17-3204, https://aclanthology.org/W17-3204
  • [10] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. pp. 311–318. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA (Jul 2002). https://doi.org/10.3115/1073083.1073135, https://aclanthology.org/P02-1040
  • [11] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch. In: NIPS-W (2017)
  • [12] Popović, M.: chrF: character n-gram F-score for automatic MT evaluation. In: Proceedings of the Tenth Workshop on Statistical Machine Translation. pp. 392–395. Association for Computational Linguistics, Lisbon, Portugal (Sep 2015). https://doi.org/10.18653/v1/W15-3049, https://aclanthology.org/W15-3049
  • [13] Post, M.: A call for clarity in reporting BLEU scores. In: Proceedings of the Third Conference on Machine Translation: Research Papers. pp. 186–191. Association for Computational Linguistics, Brussels, Belgium (Oct 2018). https://doi.org/10.18653/v1/W18-6319, https://aclanthology.org/W18-6319
  • [14] Reimers, N., Gurevych, I.: Sentence-bert: Sentence embeddings using siamese bert-networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (11 2019), https://arxiv.org/abs/1908.10084
  • [15] Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 86–96. Association for Computational Linguistics, Berlin, Germany (Aug 2016). https://doi.org/10.18653/v1/P16-1009, https://aclanthology.org/P16-1009
  • [16] Sharma, A.K., Chaurasia, S., Srivastava, D.K.: Sentimental short sentences classification by using cnn deep learning model with fine tuned word2vec. Procedia Computer Science 167, 1139–1147 (2020)
  • [17] Snover, M., Dorr, B., Schwartz, R., Micciulla, L., Makhoul, J.: A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers. pp. 223–231. Association for Machine Translation in the Americas, Cambridge, Massachusetts, USA (Aug 8-12 2006), https://aclanthology.org/2006.amta-papers.25
  • [18] Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K. (eds.) Advances in Neural Information Processing Systems. vol. 27. Curran Associates, Inc. (2014), https://proceedings.neurips.cc/paper_files/paper/2014/file/a14ac55a4f27472c5d894ec1c3c743d2-Paper.pdf
  • [19] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.u., Polosukhin, I.: Attention is all you need. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems. vol. 30. Curran Associates, Inc. (2017), https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  • [20] Wang, Y., Lepage, Y.: Can the translation memory principle benefit neural machine translation? a series of extensive experiments with input sentence annotation. In: Dita, S., Trillanes, A., Lucas, R.I. (eds.) Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation. pp. 243–252. Association for Computational Linguistics, Manila, Philippines (Oct 2022), https://aclanthology.org/2022.paclic-1.27
  • [21] Wu, L., Xia, Y., Tian, F., Zhao, L., Qin, T., Lai, J., Liu, T.Y.: Adversarial neural machine translation. In: Zhu, J., Takeuchi, I. (eds.) Proceedings of The 10th Asian Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 95, pp. 534–549. PMLR (14–16 Nov 2018), https://proceedings.mlr.press/v95/wu18a.html
  • [22] Xu, J., Crego, J., Senellart, J.: Boosting neural machine translation with similar translations. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 1580–1590. Association for Computational Linguistics, Online (Jul 2020). https://doi.org/10.18653/v1/2020.acl-main.144, https://aclanthology.org/2020.acl-main.144
  • [23] Xu, J., Crego, J., Yvon, F.: Integrating translation memories into non-autoregressive machine translation. In: Vlachos, A., Augenstein, I. (eds.) Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. pp. 1326–1338. Association for Computational Linguistics, Dubrovnik, Croatia (May 2023). https://doi.org/10.18653/v1/2023.eacl-main.96, https://aclanthology.org/2023.eacl-main.96
  • [24] Yang, Z., Chen, W., Wang, F., Xu, B.: Improving neural machine translation with conditional sequence generative adversarial nets. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). pp. 1346–1355. Association for Computational Linguistics, New Orleans, Louisiana (Jun 2018). https://doi.org/10.18653/v1/N18-1122, https://aclanthology.org/N18-1122
  • [25] Zhang, Z., Liu, S., Li, M., Zhou, M., Chen, E.: Bidirectional generative adversarial networks for neural machine translation. In: Proceedings of the 22nd Conference on Computational Natural Language Learning. pp. 190–199. Association for Computational Linguistics, Brussels, Belgium (Oct 2018). https://doi.org/10.18653/v1/K18-1019, https://aclanthology.org/K18-1019