
A Frustratingly Simple Decoding Method for
Neural Text Generation

Abstract

We introduce a frustratingly simple, highly efficient, and surprisingly effective decoding method, termed Frustratingly Simple Decoding (FSD), for neural text generation. The idea behind FSD is straightforward: we construct an anti-language model (anti-LM) based on previously generated text, which is employed to penalize the future generation of repetitive content. The anti-LM can be implemented as simply as an n-gram language model or a vectorized variant. In this way, FSD incurs no additional model parameters and negligible computational overhead (FSD can be as fast as greedy search). Despite its simplicity, FSD is surprisingly effective and generalizes across different datasets, models, and languages. Extensive experiments show that FSD outperforms established strong baselines in terms of generation quality, decoding speed, and universality. The code is available at https://github.com/LHRYANG/FSD

Keywords: language model, decoding method, universality, efficiency


A Frustratingly Simple Decoding Method for
Neural Text Generation


Haoran Yang♠∗, Deng Cai♡†, Huayang Li, Wei Bi, Wai Lam, Shuming Shi
∗Most work was done during an internship at Tencent AI Lab. †Corresponding author.
The Chinese University of Hong Kong · Tencent AI Lab · Nara Institute of Science and Technology
{hryang, wlam}@se.cuhk.edu.hk, [email protected]
{jcykcai, victoriabi, shumingshi}@tencent.com


1.   Introduction

Neural text generation has attracted increasing attention from both academia and industry. The canonical approach factors the generation process in an autoregressive fashion, reducing the generation into a series of next-token predictions conditioned on their preceding sequences. With the development of large language models (LMs) Brown et al. (2020); Touvron et al. (2023a, b), the estimation of the probability distribution for next-token predictions has become remarkably accurate. However, when it comes to open-ended text generation, such as story generation Fan et al. (2018) and writing assistance Shi et al. (2022), perhaps counter-intuitively, searching for the most likely sequences (e.g., greedy search and beam search) often results in low-quality outputs. Concretely, the generations are prone to falling into tedious and repetitive loops, a notorious issue referred to as neural text degeneration Holtzman et al. (2020); Xu et al. (2022); Shi et al. (2024).

Figure 1: FSD exploits the contrasts between the LM and the anti-LM, where the probabilities from the LM and the anti-LM are used as rewards and penalties respectively. In the above example, the top prediction of the LM is “driving”. However, the anti-LM also gives a large penalty to “driving” because it will result in repetition. Consequently, “wearing” is instead selected and the anti-LM is updated accordingly.

To address the above problem, two lines of research efforts have been devoted to devising better decoding strategies. The canonical approaches take random samples from the LM’s output distribution Fan et al. (2018); Holtzman et al. (2020); Meister et al. (2022); Hewitt et al. (2022). The introduced stochasticity can alleviate repetitive generation; however, it also increases the chance of unnatural topic drift and semantic incoherence. More recently, another class of approaches proposes to re-rank top candidate tokens using extra objectives. Concretely, contrastive search (CS) Su et al. (2022) uses a look-ahead mechanism and penalizes tokens compromising the isotropy of the LM’s latent space Ethayarajh (2019). Contrastive decoding (CD) Li et al. (2023a) searches for the token that maximizes the probability difference between the LM and another smaller LM with the same tokenization. Although better generation quality is achieved, the look-ahead mechanism in CS and the running of an external LM in CD considerably increase computational overhead. Moreover, CS relies on the isotropic property of the LM and CD depends on another LM using the same tokenization, thereby limiting their applicability.

In this paper, we propose Frustratingly Simple Decoding (FSD) for addressing the degeneration issue with minimal computational cost and without any assumptions on the underlying LM. As illustrated in Figure 1, FSD works by imposing penalties on repetitive patterns that have appeared in the prefix. This is realized through an anti-LM that can capture and memorize these patterns. Specifically, at each generation step, both the LM and the anti-LM take the current prefix as input and separately produce two next-token distributions. The generation probabilities from the LM serve as rewards and those from the anti-LM act as penalties. FSD subtracts the penalties from the rewards, selects the token that maximizes the final score, and continuously updates the anti-LM based on the growing prefix. The anti-LM can be implemented as simply as an n-gram language model or a vectorized variant, making FSD as fast as greedy search.

We perform extensive experiments to demonstrate the effectiveness, efficiency, and universality of FSD. The key findings can be summarized as follows: (1) On three canonical open-ended text generation benchmarks, the generation quality of FSD not only surpasses the standard top-p sampling but also is comparable to, if not better than, recent state-of-the-art methods, according to both automatic and human evaluations. (2) FSD exhibits robustness in handling varying generation lengths, particularly demonstrating its superiority in generating longer sequences where existing state-of-the-art methods often struggle. (3) The generation speed of FSD is as fast as greedy search (the theoretical upper bound for autoregressive generation). The speed advantage over existing state-of-the-art methods amplifies as the generation length increases. (4) FSD shows versatility across a variety of models, languages, and tasks (e.g., instruction following and summarization).

2.   Related Work

Recent years have witnessed enormous progress in neural text generation, particularly with the success of large LMs Radford et al. (2019). The most straightforward heuristic for generating text from an LM is to find the most likely sequence estimated by the LM. Although maximizing the LM probabilities (e.g., greedy search and beam search) obtains excellent performance in closed-ended text generation tasks (e.g., translation Sutskever et al. (2014) and summarization See et al. (2017)), these search-based methods suffer from generating nonsensical output in open-ended text generation tasks (e.g., story generation Fan et al. (2018)). One prominent issue is that they tend to generate dull and repetitive output Holtzman et al. (2020); Fu et al. (2021); Pillutla et al. (2021).

Decoding Methods

To tackle the above challenge, different decoding methods have been proposed, which can be broadly categorized into two classes. The first class is truncated sampling, where each token is randomly sampled from a truncated next-token distribution. For instance, top-k sampling Fan et al. (2018) only samples from the k most likely tokens. Top-p sampling Holtzman et al. (2020) only considers the minimal set of top tokens that cover a specified percentage p of the distribution. Typical sampling Meister et al. (2022) sorts tokens according to the differences between distribution entropy and probabilities. Hewitt et al. (2022) truncate words whose probabilities are below an entropy-dependent threshold. Although sampling-based methods reduce repetitions, the randomness at each sampling step also increases the chance of incoherence and topic drift.

The second class of decoding methods is still search-based but optimizes a different objective. Contrastive Search (CS) Su et al. (2022) assumes the LM has an isotropic representation space and adds a penalty term that decreases the generation probabilities of tokens producing hidden states that are similar to the previous context. However, the look-ahead operation at each step brings considerable additional cost. Contrastive Decoding (CD) Li et al. (2023a) employs an amateur LM (a smaller pre-trained LM using the same tokenization) and penalizes undesired attributes associated with the amateur model. In contrast, FSD is much more lightweight and efficient; FSD only constructs an n-gram model on-the-fly, requiring no external model and introducing negligible computational cost. In addition, FSD holds the potential for broader applicability as it does not assume the existence of an amateur LM or any properties of the LM.

Training Methods

Another group of methods attempts to improve text generation quality by fine-tuning the LMs with new training objectives. Welleck et al. (2020) propose unlikelihood training, which explicitly minimizes the generation probability of repetitive tokens. Lagutin et al. (2021) improve the generation using policy gradient with a repetition objective. Xu et al. (2022) learn to penalize probabilities of sentence-level repetitions from pseudo-repetitive data. Su et al. (2022) devise a contrastive training objective that encourages discriminative and isotropic token representations. In contrast, FSD simply employs off-the-shelf pre-trained LMs and requires zero training.

3.   Background

3.1.   Language Models

An LM is a probability distribution over token sequences. Given a sequence $x_{1:t}=x_1,x_2,\ldots,x_t$ of length $t$, an LM assigns a probability $p(x_{1:t})$ to the sequence, which is usually decomposed in an autoregressive fashion: $p(x_{1:t})=\prod_{i=1}^{t}p(x_i|x_{<i})$.

N-gram Language Model

The most traditional LM is the n-gram model, which relies on the Markov assumption Jurafsky and Martin (2009). In an n-gram LM, the probability of the $i$-th token only depends on the previous $n-1$ tokens, expressed as $p(x_i|x_{<i})=p_n(x_i|x_{i-n+1:i-1})$. This probability can be computed by evaluating the relative frequency counts within a training corpus:

$p_n(x_i|x_{i-n+1:i-1})=\frac{C(x_{i-n+1:i})}{C(x_{i-n+1:i-1})}$ (1)

where $C(\cdot)$ counts the number of occurrences of the input sequence within the training corpus. In practice, the probability distributions are often smoothed to improve the model’s generalizability. For example, the interpolation of n-gram models of different orders can help prevent the LM from assigning zero probability to unseen sequences Tonella et al. (2014).
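To make Eq. 1 concrete, below is a minimal Python sketch of the relative-frequency estimate; the function name and the list-of-tokens interface are illustrative assumptions and not part of the released implementation.

```python
from collections import Counter

def ngram_prob(tokens, n, context, token):
    """Relative-frequency estimate of p_n(token | context) as in Eq. 1.

    `tokens` is the corpus as a list of tokens and `context` is the preceding
    n-1 tokens; returns 0.0 when the context has never been observed.
    """
    assert len(context) == n - 1
    # C(x_{i-n+1:i}): counts of all n-grams in the corpus
    ngram_counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    # C(x_{i-n+1:i-1}): counts of all (n-1)-grams in the corpus
    context_counts = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 2))
    denom = context_counts[tuple(context)]
    return ngram_counts[tuple(context) + (token,)] / denom if denom else 0.0

# e.g. ngram_prob(["a", "b", "a", "b"], 2, ["a"], "b") == 1.0
```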

Neural Language Model

With the rise of deep learning, n-gram LMs have been largely superseded by neural networks, for example, the GPT family Radford et al. (2019); Brown et al. (2020) and the LLaMA family Touvron et al. (2023a, b). These models are trained to predict the next token by conditioning on the preceding context: $\mathcal{L}_{\theta}=-\sum_{i=1}^{t}\log p_{\theta}(x_i|x_{<i})$, where $\theta$ denotes the model parameters. With the capabilities acquired by large-scale pre-training, these neural LMs can be readily applied to text generation Liu et al. (2022).

3.2.   Open-Ended Text Generation

Most of our experiments are conducted on open-ended text generation tasks, where the input is a short prompt and the goal is to generate a fluent and coherent continuation. Formally, given a prompt $x_{1:l}=x_1,x_2,\ldots,x_l$, we aim to generate the next $m$ tokens, denoted by $x_{l+1:l+m}=x_{l+1},x_{l+2},\ldots,x_{l+m}$. A pre-trained neural LM can complete this task autoregressively by a series of next-token predictions:

$p_{\theta}(x_{l+1:l+m}|x_{1:l})=\prod_{i=l+1}^{l+m}p_{\theta}(x_i|x_{<i})$

Previous works have revealed that the decoding method that selects the token at each generation step has a significant impact on the generation quality Holtzman et al. (2020); Wiher et al. (2022). For example, greedy and beam search often result in repetitions while sampling-based methods suffer from incoherence Su et al. (2022); Li et al. (2023a).

4.   Method

We present our proposed decoding method, Frustratingly Simple Decoding (FSD), named after its remarkably straightforward nature. We begin by introducing the intuition and the general framework of FSD (section 4.1). We then describe the implementation of FSD in the discrete version (section 4.2) and further extend it to the vectorized version (section 4.3).

4.1.   Intuition & Framework

To produce coherent and diverse generations, it is crucial not only to select the most probable tokens but also to prevent repetitive content. While the former objective can be achieved using the original LM, the latter requires a mechanism for tracking previously generated content and reducing their likelihood of reoccurrence. To this end, we propose the construction of an anti-LM based on the preceding context. This anti-LM is expected to assign higher scores to tokens that will cause repetitions in the preceding context. Consequently, these scores serve as penalties. By integrating the original LM and the anti-LM, we can discourage repetitive token generation and promote other contextually appropriate choices.

Formally, when decoding the $t$-th token, we calculate an $\mathrm{FSD}$ score for each candidate token $v$:

$\mathrm{FSD}(v|x_{<t})=p_{\theta}(v|x_{<t})-\alpha\times p_{\omega}(v|x_{<t})$ (2)

where $p_{\theta}$ and $p_{\omega}$ represent the LM and the anti-LM respectively. The hyper-parameter $\alpha\geq 0$ is used to balance the two scores. In practice, we first select the top-k most probable tokens according to $p_{\theta}(\cdot|x_{<t})$, denoted by $V^{(k)}$. The token in $V^{(k)}$ with the largest $\mathrm{FSD}$ score is chosen as the $t$-th token.
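As a minimal sketch of this re-ranking step (not the official implementation), the helper below scores the top-k candidates with Eq. 2; `lm_probs` and `anti_lm_prob` are assumed interfaces for the LM's next-token distribution and the anti-LM penalty.

```python
import numpy as np

def fsd_select(lm_probs, anti_lm_prob, alpha=3.0, k=6):
    """Return the token id maximizing FSD(v) = p_theta(v) - alpha * p_omega(v) over V^(k).

    lm_probs: 1-D array of next-token probabilities from the LM.
    anti_lm_prob: callable mapping a candidate token id to its anti-LM penalty.
    """
    top_k_ids = np.argsort(lm_probs)[-k:]  # the k most probable tokens, V^(k)
    scores = {v: lm_probs[v] - alpha * anti_lm_prob(v) for v in top_k_ids}
    return max(scores, key=scores.get)
```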

4.2.   N-gram Model as anti-LM

Following the intuition described above, we start to devise the anti-LM. In principle, any language model capable of capturing patterns in a token sequence can be harnessed to implement the anti-LM. However, we note several critical design principles. First, the prediction of the anti-LM should be efficient, given that it is invoked at every decoding step. Second, the anti-LM should not assume any particular properties of the LM or the language, thus ensuring our method’s universal applicability across diverse settings. Last but not least, the update of the anti-LM should be easy, as it undergoes continuous evolution with the expanding prefix.

One natural (and perhaps the simplest) choice is the n-gram LM, which offers two key advantages. First, all the operations (i.e., construction, prediction, and update) associated with an n-gram model add little computational overhead. Second, an n-gram LM remains effective and efficient across different prefix lengths.

Construction and Update

Given an input prompt $x_{1:l}$, the n-gram LM is constructed and updated as follows. Initially, the prompt $x_{1:l}$ is split into n-grams. These n-grams can be stored as a set of key-value pairs $\mathcal{D}_n$. For each n-gram $x_{i-n+1:i}$, the key is the first $n-1$ tokens $x_{i-n+1:i-1}$ and the value is the last token $x_i$. After generating each new token, we update $\mathcal{D}_n$ to include the new n-gram composed of the last $n$ tokens in the sequence.

To calculate next-token probabilities, we use the last $n-1$ tokens in the sequence as the query. We first identify all key-value pairs in $\mathcal{D}_n$ whose key precisely matches the query and then compute the probabilities according to Eq. 1. All of the above operations introduce little computational overhead compared to the running of the original neural LM.
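A minimal Python sketch of such a discrete anti-LM under the above description follows; the class name `NGramAntiLM` and its interface are our own illustrative choices, not the paper's released code.

```python
from collections import defaultdict, Counter

class NGramAntiLM:
    """Discrete n-gram anti-LM: keys are (n-1)-token tuples, values count the
    tokens that followed them in the prefix generated so far."""

    def __init__(self, n, prompt_tokens):
        self.n = n
        self.table = defaultdict(Counter)          # D_n: key -> Counter over next tokens
        self.history = list(prompt_tokens)
        for i in range(n - 1, len(self.history)):
            key = tuple(self.history[i - n + 1:i])
            self.table[key][self.history[i]] += 1

    def update(self, token):
        """Add the newly generated token and the n-gram it completes."""
        self.history.append(token)
        if len(self.history) >= self.n:
            key = tuple(self.history[-self.n:-1])
            self.table[key][token] += 1

    def prob(self, token):
        """p_n(token | last n-1 tokens) via relative counts (Eq. 1); 0 if unseen."""
        if len(self.history) < self.n - 1:
            return 0.0
        key = tuple(self.history[-(self.n - 1):]) if self.n > 1 else ()
        total = sum(self.table[key].values())
        return self.table[key][token] / total if total else 0.0
```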

Input: prefix x_{<t}; n-gram models with different orders from 1 to N (p_1, p_2, ..., p_N); candidate token v; decay factor β = 0.9
Initialize r = 1, c_v = 0
for n = N, ..., 2 do
    if p_n(v|x_{<t}) ≠ 0 then
        λ_n = r · β
        r = r − λ_n
        c_v += λ_n · p_n(v|x_{t-n+1:t-1})
c_v += r · p_1(v)
Output: p_ω(v|x_{<t}) = c_v
Algorithm 1: Calculation of the penalty p_ω(v|x_{<t})

Smoothed N-gram Model

An ordinary n-gram model cannot penalize m-gram repetitions with $m<n$. Inspired by two common smoothing techniques in modern n-gram models, back-off and interpolation Jurafsky and Martin (2009), we combine n-gram models with different orders from $n=1$ to $N$ ($N$ being the highest order). The result is a smoothed n-gram model $\hat{p}$:

$\hat{p}=\lambda_N p_N+\lambda_{N-1}p_{N-1}+\cdots+\lambda_1 p_1$ (3)

where $\lambda_n$ is the weight of $p_n$ and $\sum_{n=1}^{N}\lambda_n=1$. The detailed process is elaborated in Alg. 1. In brief, we enumerate n-gram models from $n=N$ to $n=1$, setting $\lambda_n$ to decrease exponentially with a decay factor $\beta=0.9$, thus assigning greater weights to higher-order sub-models. The construction and update of the smoothed n-gram LM are straightforward; we only need to maintain $N$ copies of key-value pairs ($\mathcal{D}_1,\mathcal{D}_2,\ldots,\mathcal{D}_N$) separately.
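A compact sketch of Alg. 1, reusing the `NGramAntiLM` sketch above, might look as follows; as before, the names are illustrative and this is not the released implementation.

```python
def smoothed_penalty(anti_lms, token, beta=0.9):
    """Interpolated penalty p_omega(token | prefix) following Alg. 1.

    anti_lms: dict mapping order n (1..N) to an anti-LM exposing .prob(token),
    which conditions on the current prefix. Higher orders get larger weights;
    whatever probability mass remains goes to the unigram model.
    """
    remaining, penalty = 1.0, 0.0
    for n in sorted(anti_lms, reverse=True):
        if n == 1:
            continue
        p_n = anti_lms[n].prob(token)
        if p_n != 0.0:                       # back off when the higher-order estimate is zero
            weight = remaining * beta        # lambda_n = r * beta
            remaining -= weight              # r = r - lambda_n
            penalty += weight * p_n
    return penalty + remaining * anti_lms[1].prob(token)
```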

4.3.   Vectorized N-gram Model

We further provide a vectorized version where the keys are represented using continuous vectors instead of discrete tokens. It offers two advantages compared with the discrete version. First, it possesses the ability to penalize not only identical but also similar patterns in the preceding context, thus allowing for more generalizable pattern recognition. Second, the computation of the vectorized version can be efficiently conducted on GPU, resulting in faster decoding speed.

Specifically, we use the hidden states from the last layer of the original LM as the keys. Let $\boldsymbol{h}_1,\boldsymbol{h}_2,\ldots,\boldsymbol{h}_{t-1}$ be the hidden states for the current sequence $x_{1:t-1}$ ($\boldsymbol{h}_{t-1}$ is used to predict the $t$-th token in the original LM). Each key-value pair in the discrete version $(x_{i-n+1:i-1}, x_i)$ now becomes $(\boldsymbol{h}_{i-n+1:i-1}, x_i)$. Accordingly, the exact query-key matching in the discrete version becomes a “soft” vector matching. To predict the next token, the query is $\boldsymbol{h}_{t-n+1:t-1}$ and the matching score between the query and a key $\boldsymbol{h}_{i-n+1:i-1}$ is computed as follows:

$c_i=\cos\big(\mathrm{cat}(\boldsymbol{h}_{i-n+1:i-1}),\mathrm{cat}(\boldsymbol{h}_{t-n+1:t-1})\big)$ (4)

where $\cos$ computes cosine similarity and $\mathrm{cat}$ denotes vector concatenation. For a candidate token $v$ that appears multiple times in the sequence, we take the largest matching score as its penalty score. In addition, we clip the penalty score to ensure it is always greater than or equal to zero.
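The soft matching of Eq. 4 can be sketched as follows; this is a minimal PyTorch illustration under our own naming, and the released code may organize and batch this differently on GPU.

```python
import torch
import torch.nn.functional as F

def vectorized_penalty(hidden_states, generated_tokens, candidate, n=2):
    """Soft-matching penalty of the vectorized anti-LM (Eq. 4).

    hidden_states: (t-1, d) tensor of last-layer states h_1..h_{t-1}.
    generated_tokens: the tokens x_1..x_{t-1}. The query is the concatenation of
    the last n-1 states; for a candidate occurring several times, the largest
    cosine similarity with its stored keys is used, clipped at zero.
    """
    query = hidden_states[-(n - 1):].reshape(-1)             # cat(h_{t-n+1:t-1})
    best = 0.0                                               # starting at 0 implements the clipping
    for i, tok in enumerate(generated_tokens):
        if tok != candidate or i < n - 1:
            continue                                         # need n-1 preceding states as the key
        key = hidden_states[i - n + 1:i].reshape(-1)         # cat(h_{i-n+1:i-1})
        sim = F.cosine_similarity(query, key, dim=0).item()
        best = max(best, sim)
    return best
```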

5.   Experiments

Method | wikinews (div / mau / coh) | wikitext (div / mau / coh) | book (div / mau / coh)
Human | 0.92 / 1 / 0.65 | 0.93 / 1 / 0.63 | 0.95 / 1 / 0.51
OPT-6.7b:
  p=0.95 | 0.91 / 0.95 / 0.60 | 0.87 / 0.95 / 0.59 | 0.93 / 0.92 / 0.48
  typical=0.95 | 0.94 / 0.93 / 0.58 | 0.93 / 0.93 / 0.56 | 0.95 / 0.89 / 0.45
  CS | 0.91 / 0.93 / 0.62 | 0.87 / 0.91 / 0.57 | 0.86 / 0.88 / 0.47
  CD | 0.93 / 0.95 / 0.69 | 0.89 / 0.95 / 0.69 | 0.87 / 0.95 / 0.61
  FSD | 0.95 / 0.93 / 0.66 | 0.94 / 0.93 / 0.61 | 0.95 / 0.85 / 0.51
  FSD-vec | 0.95 / 0.93 / 0.64 | 0.92 / 0.90 / 0.60 | 0.96 / 0.87 / 0.49
GPT2-XL:
  p=0.95 | 0.94 / 0.96 / 0.60 | 0.92 / 0.94 / 0.57 | 0.94 / 0.95 / 0.46
  typical=0.95 | 0.95 / 0.93 / 0.56 | 0.95 / 0.92 / 0.53 | 0.96 / 0.87 / 0.43
  CS | 0.93 / 0.92 / 0.64 | 0.86 / 0.92 / 0.60 | 0.88 / 0.89 / 0.48
  CD | 0.92 / 0.92 / 0.69 | 0.89 / 0.93 / 0.69 | 0.83 / 0.93 / 0.64
  FSD | 0.93 / 0.93 / 0.66 | 0.94 / 0.88 / 0.61 | 0.96 / 0.90 / 0.49
  FSD-vec | 0.93 / 0.93 / 0.64 | 0.93 / 0.90 / 0.58 | 0.96 / 0.91 / 0.47
GPT2-Medium:
  p=0.95 | 0.96 / 0.94 / 0.56 | 0.96 / 0.92 / 0.53 | 0.97 / 0.92 / 0.43
  typical=0.95 | 0.96 / 0.94 / 0.56 | 0.96 / 0.93 / 0.53 | 0.97 / 0.91 / 0.43
  CS | 0.03 / 0.14 / 0.65 | 0.02 / 0.07 / 0.64 | 0.01 / 0.03 / 0.50
  CD | 0.88 / 0.95 / 0.71 | 0.83 / 0.88 / 0.71 | 0.68 / 0.92 / 0.67
  FSD | 0.94 / 0.93 / 0.65 | 0.94 / 0.91 / 0.60 | 0.97 / 0.87 / 0.49
  FSD-vec | 0.94 / 0.90 / 0.60 | 0.92 / 0.86 / 0.55 | 0.93 / 0.92 / 0.44
Table 1: Automatic evaluation results. The best results (the closer to human the better) are boldfaced.

Our main experiments focus on open-ended text generation. This task has been used for evaluating various decoding methods in recent works Li et al. (2023a); Su et al. (2022); Lan et al. (2022) because it is particularly susceptible to the repetition issue. We follow the standard setups (section 5.1) and report the results in section 5.2. In addition, we assess the speed of the decoding methods in section 5.3, an essential aspect when considering real-world deployment. Moreover, we explore the universality of our proposed method in section 5.4 from several perspectives: (1) robustness across various models, languages, and datasets; and (2) versatility for tackling other tasks such as instruction following (the most popular use of LLMs) and closed-ended generation.

5.1.   Setup for Open-Ended Text Generation

Datasets & Models

Following previous works Su et al. (2022); Li et al. (2023a); Lan et al. (2022), we compare FSD and existing decoding methods on three English benchmarks: wikinews (http://www.wikinews.org) in the news domain, wikitext-103 Merity et al. (2017) in the Wikipedia domain, and bookcorpus Zhu et al. (2015) in the story domain. For each test case, the first 32 tokens are used as the prompt and the task is to generate the following 256 tokens. We test three off-the-shelf LMs of different scales: OPT-6.7b Zhang et al. (2022), GPT2-XL, and GPT2-Medium Radford et al. (2019). The amateur LM used in CD is OPT-125m for OPT-6.7b and GPT2 for GPT2-XL and GPT2-Medium.

Evaluation Metrics

For automatic evaluation, we report three metrics assessing different aspects of the generations:

  • Diversity measures the degree of repetition at different n-gram levels. The calculation can be expressed as $\prod_{n=2}^{4}(1-\texttt{REP}_n)$, where $\texttt{REP}_n=1-\frac{\#\texttt{unique n-grams}(\hat{\mathbf{x}})}{\#\texttt{total n-grams}(\hat{\mathbf{x}})}$ and $\hat{\mathbf{x}}$ is the generated continuation. A higher diversity score indicates that generated outputs contain fewer repetitive segments (a small computation sketch is given after this list).

  • MAUVE Pillutla et al. (2021) measures the distribution similarity between the generated texts and reference texts.

  • Coherence Su et al. (2022) is defined as the cosine similarity between the embeddings of the prompt $\mathbf{x}$ and the generated continuation $\hat{\mathbf{x}}$: $\texttt{COH}=\frac{f(\mathbf{x})\cdot f(\hat{\mathbf{x}})}{\left\|f(\mathbf{x})\right\|\left\|f(\hat{\mathbf{x}})\right\|}$, where $f$ is the SimCSE Gao et al. (2021) sentence embedding function.
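As referenced in the diversity item above, a minimal sketch of the diversity computation could look like this (the tokenization of the continuation is left to the caller and the function name is our own):

```python
def diversity(tokens):
    """prod_{n=2..4} (1 - REP_n), where REP_n = 1 - (#unique n-grams / #total n-grams)."""
    score = 1.0
    for n in range(2, 5):
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        if not ngrams:          # continuation shorter than n tokens
            continue
        rep_n = 1.0 - len(set(ngrams)) / len(ngrams)
        score *= 1.0 - rep_n
    return score
```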

For human evaluation, we conduct blind A/B tests with the help of proficient English speakers from a third-party grading platform. In the process of annotation, annotators are asked to compare two continuations of the same prompt and decide which one is better (or two are equally good/bad) by jointly considering fluency, coherence, and commonsense. Each case is rated by three annotators and we use majority vote.

Implementation Details

For clarity, the variant of FSD using the vectorized n-gram model is named FSD-vec. We set n to 3 and 2 for FSD and FSD-vec respectively, and k to 6 for both variants. Based on our preliminary experiments, the penalty strength α is set to 3 and 1 for FSD and FSD-vec respectively. We find this setting is quite robust and generalizes well to different scenarios.

Baselines

To show the superior performance of FSD/FSD-vec, we mainly compare them with two recent search-based decoding methods, CD Li et al. (2023a) and CS Su et al. (2022), since they were reported to outperform other existing decoding methods. (We omit greedy and beam search due to space limitations; the text generated by these two methods is very repetitive and of low quality Li et al. (2023a).) We follow the suggested hyper-parameter settings from their respective papers. We also compare with top-p sampling Holtzman et al. (2020) because it is the most popular decoding method for open-ended text generation. We also include the results of typical sampling Meister et al. (2022). We set p in top-p sampling and typical sampling to 0.95, as adopted by Li et al. (2023a).

5.2.   Main Results

Automatic Evaluation Results

For automatic metrics, we believe that results closer to human are better because a higher score does not always indicate a better generation. For example, a random token sequence would obtain an extremely high diversity score, and a continuation identical to the input prompt would get a full coherence score. This is also commonly adopted in previous works Holtzman et al. (2020); Meister et al. (2022); Xu et al. (2022). Therefore, we highlight the results that are closest to human in our experiments. From Table 1, we can observe that:

  • For diversity (div), FSD/FSD-vec matches or outperforms all other decoding baselines in six/five out of nine settings (the combinations of three LMs and three domains). In cases where FSD and FSD-vec are not the best, the gaps between them and the best scores are minimal (< 0.03). It is worth noting that recent state-of-the-art methods (CD and CS) are very sensitive to the choice of the LM. For example, CS fails to achieve reasonable diversity scores on all three benchmarks when using GPT2-Medium. The reason is that CS relies on the isotropy of the LM’s latent space and GPT2-Medium may not fulfill this requirement. The diversity scores of CD also decrease significantly as the LM switches from GPT2-XL to GPT2-Medium, perhaps because the difference between the LM and its amateur is not sufficiently indicative of degeneration. In contrast, FSD and FSD-vec are much more stable in diversity. We attribute this stability to the fact that the operations of the anti-LM in FSD are largely independent of the choice of the LM.

  • For coherence (coh), FSD/FSD-vec achieves the best coherence scores in seven/four out of nine settings. These results emphasize the effectiveness of FSD and FSD-vec in generating coherent and contextually appropriate continuations. We can see that sampling-based methods (top-p and typical sampling) often deliver lower coherence scores than search-based methods (CS and CD). This confirms that sampling-based methods produce lexically diverse text at the cost of topic drift. Importantly, FSD and FSD-vec often attain better diversity and better coherence at the same time, suggesting our methods provide a better trade-off between diversity and coherence.

  • For MAUVE (mau), sampling-based methods (particularly top-p sampling) are generally better than search-based methods (CS, CD, FSD, and FSD-vec), though the gaps are often very small. However, it has been reported that the generation quality of CS and CD is better according to human evaluation. This indicates that MAUVE may not be a reliable metric, as also pointed out by Su and Xu (2022). Therefore, we turn to extensive human evaluation.

Human Evaluation Results

For human evaluation, we randomly select 100 prompts from each of the three benchmarks. We first compare FSD against top-p sampling and two recent state-of-the-art methods, CS and CD. The results are shown in Table 2. We can see that, on average across settings, annotators prefer FSD 1.30x more than CD, 1.26x more than top-p sampling, and 1.14x more than CS. FSD wins all the comparisons with only one exception: FSD vs. CS on book. Since these results show that CS is the most competitive baseline, we then compare FSD-vec with CS and FSD. As shown in Table 3, FSD-vec wins all the comparisons against CS and is preferred 1.49x more than CS. The quality of FSD-vec is on par with FSD.

Case Study

We find that, compared with CS, FSD is less likely to generate continuations that deviate from the topic. Table 4 shows two continuations from CS and FSD respectively. The prefix’s topic is “a musician is considering running for presidency”. But the topic of CS’s output is concert tours, which is irrelevant to that of the prefix. This may be because CS penalizes tokens more aggressively than FSD. For instance, CS has the potential to penalize tokens that have never occurred in the preceding context, as long as they produce similar hidden states. In contrast, FSD only penalizes tokens that appear in the context and genuinely result in repetitions.

Effect of Decoding Length

Next, we investigate the robustness of our methods in addressing the degeneration issue under different generation lengths. In Figure 2, we present the diversity scores of FSD, FSD-vec, CS, and CD when the generation length is 256, 512, and 768. As seen, the diversity of human-generated text is the most stable across different lengths. The diversity of CS and CD drops dramatically as the generation length increases, resulting in a progressively larger disparity between the generated text and human-generated text. In contrast, FSD has the smallest slope, and FSD-vec exhibits a similar slope to FSD from 256 to 512 and a slightly steeper one from 512 to 768. This reveals that our methods are much more robust in reducing repetitions in longer sequence generation.

wikinews A is better Neutral B is better
FSD 41% 22% 37% top-p
FSD 45% 25% 30% CS
FSD 52% 12% 36% CD
wikitext A is better Neutral B is better
FSD 46% 24% 30% top-p
FSD 39% 24% 37% CS
FSD 37% 30% 33% CD
book A is better Neutral B is better
FSD 41% 24% 35% top-p
FSD 38% 22% 40% CS
FSD 46% 19% 35% CD
Table 2: Human evaluation results of FSD (method A) against each baseline (method B). Statistically significant advantages are judged by a Sign Test with p-value < 0.05.
wikinews A is better Neutral B is better
FSD-vec 44% 25% 31% CS
FSD-vec 36% 21% 43% FSD
wikitext A is better Neutral B is better
FSD-vec 51% 26% 23% CS
FSD-vec 36% 33% 31% FSD
book A is better Neutral B is better
FSD-vec 42% 20% 38% CS
FSD-vec 37% 27% 36% FSD
Table 3: Human evaluation results of FSD-vec (method A) against each baseline (method B). Statistically significant advantages are judged by a Sign Test with p-value < 0.05.
Figure 2: Diversity across different generation lengths.
Prompt: Following months of rumours, musician Wyclef Jean today confirmed that he is considering standing for the presidency of Haiti in the country’s November elections. A statement from
CS: Jean’s publicist reads: "Wyclef will make a decision on his political future in the coming days and is consulting with his family and advisors. "The announcement comes after Jean, who has been touring the world in support of his latest album A New Day In The Old Town, was forced to cancel a concert in the Dominican Republic last week due to security concerns. He had been due to play the concert in Santo Domingo on November 4 but the concert was called off after Dominican authorities raised concerns about the safety of the concert venue, which is close to the border with Haiti. […]
FSD: Jean’s management company, La Nuit Productions, said: "Wyclef is currently in the process of gathering information and will make a decision in the coming days."Jean, who has been a vocal supporter of the Haitian government, has previously said he would consider running for the position. The musician, who is also a member of the hip-hop group A Tribe Called Quest, has not yet announced his candidacy. Haiti’s current president, Michel Martelly, is widely expected to run for re-election. In a statement, Jean said: "I am considering running for president of Haiti. I have been following the election process closely and I believe that Haiti needs a new leader who can bring about change and rebuild the country. "I have always been a supporter of President Martello and his government. I am also a fan of the music of A Tribe called Quest and have supported them in the past. "The Haitian people have suffered greatly under the current government and I feel that the time has come for a change. "Hearing the news of the earthquake in Haiti, I was deeply saddened and saddened to hear of the loss of life and the destruction of homes and businesses. […]
Table 4: Case study: FSD vs CS.

5.3.   Decoding Speed

To compare the decoding speed of different methods, we plot the decoding latency (seconds per instance) of search-based methods in Figure 3. For clarity, we omit the results of sampling-based methods because they are close to greedy search. We can see that both FSD and FSD-vec demonstrate superior decoding speed compared with CD and CS, being more than 1.5x faster. In fact, FSD and FSD-vec can match the speed of greedy search. This can be attributed to the minimal computational overhead brought by the n-gram anti-LM, as opposed to the time-consuming look-ahead mechanism in CS and the running of an amateur LM in CD. Importantly, as the generation length increases, the absolute speed gap between FSD and CS/CD becomes even more pronounced, increasing from 8/10 seconds to 20/40 seconds per instance. This highlights the great efficiency advantage of our methods in generating long sequences. Note that FSD-vec is slightly faster than FSD. The reason is that the computation of the vectorized n-gram can be efficiently performed on GPUs.

Figure 3: Decoding latency tested on GPT2-XL.
Method | german (div / mau / coh) | french (div / mau / coh) | japanese (div / mau / coh) | chinese (div / mau / coh)
Human | 0.93 / 1 / 0.77 | 0.88 / 1 / 0.65 | 0.56 / 1 / 0.60 | 0.75 / 1 / 0.87
BLOOM:
  p=0.95 | 0.93 / 0.45 / 0.69 | 0.90 / 0.95 / 0.63 | 0.56 / 0.08 / 0.51 | 0.72 / 0.82 / 0.86
  CS | 0.20 / 0.72 / 0.72 | 0.50 / 0.90 / 0.64 | 0.04 / 0.23 / 0.53 | 0.07 / 0.57 / 0.80
  FSD | 0.93 / 0.79 / 0.76 | 0.90 / 0.90 / 0.67 | 0.59 / 0.11 / 0.53 | 0.75 / 0.75 / 0.84
  FSD-vec | 0.92 / 0.73 / 0.74 | 0.91 / 0.91 / 0.65 | 0.55 / 0.06 / 0.50 | 0.80 / 0.75 / 0.83
OPT:
  p=0.95 | 0.91 / 0.70 / 0.73 | 0.89 / 0.70 / 0.60 | 0.73 / 0.61 / 0.55 | - / - / -
  CS | 0.83 / 0.60 / 0.72 | 0.84 / 0.72 / 0.60 | 0.42 / 0.18 / 0.53 | - / - / -
  FSD | 0.93 / 0.69 / 0.73 | 0.91 / 0.73 / 0.62 | 0.65 / 0.69 / 0.59 | - / - / -
  FSD-vec | 0.93 / 0.64 / 0.73 | 0.85 / 0.69 / 0.61 | 0.64 / 0.58 / 0.56 | - / - / -
LLaMA:
  p=0.95 | 0.94 / 0.94 / 0.75 | 0.90 / 0.93 / 0.64 | - / - / - | - / - / -
  CS | 0.90 / 0.78 / 0.73 | 0.90 / 0.73 / 0.61 | - / - / - | - / - / -
  FSD | 0.93 / 0.88 / 0.75 | 0.93 / 0.85 / 0.65 | - / - / - | - / - / -
  FSD-vec | 0.92 / 0.94 / 0.74 | 0.91 / 0.88 / 0.64 | - / - / - | - / - / -
Table 5: Automatic evaluation results on four non-English datasets and three LMs.
Method | BLOOM (R-1 / R-2 / R-L) | OPT (R-1 / R-2 / R-L)
beam (size=8) | 32.0 / 6.7 / 27.8 | 35.8 / 5.9 / 31.2
p=0.95 | 27.5 / 2.9 / 23.6 | 24.1 / 2.5 / 20.7
CS | 34.1 / 5.5 / 30.4 | 35.6 / 8.3 / 31.3
FSD | 34.2 / 5.9 / 31.3 | 37.4 / 9.8 / 33.7
FSD-vec | 33.2 / 5.2 / 29.1 | 37.1 / 8.6 / 32.1
Table 6: Automatic evaluation results on XSum.

5.4.   Universality

More Languages, Models and Datasets

So far, our evaluation has been primarily focused on English corpora, and the types of LMs used are also limited. We here expand our evaluation to other non-English languages using various LMs. We conduct experiments on four datasets: chinese-wiki (https://github.com/SigmaQuan/Awesome-Chinese-Corpus-Datasets-and-Models), japanese-news (https://www.kaggle.com/datasets/tanreinama/japanese-fakenews-dataset), german-wiki, and french-wiki (https://huggingface.co/datasets/wikipedia). We adopt a variety of popular LMs, including BLOOM-7b BigScience (2023), LLaMA-7b Touvron et al. (2023a), and OPT-6.7b Zhang et al. (2022). The evaluation results are shown in Table 5, where we also report the results of the state-of-the-art decoding methods, CS and top-p (missing entries indicate that the LM does not support the language). As seen, FSD and FSD-vec generally outperform CS and top-p (most of the best results are from FSD and FSD-vec). It should be noted that for BLOOM-7b, CS fails entirely in all four languages (see the extremely low diversity scores). Additionally, the performance of CS also exhibits greater sensitivity to different languages. For instance, when applied to the Japanese dataset using OPT, its diversity and MAUVE scores are notably low. In contrast, FSD and FSD-vec deliver much more stable performance across different settings, indicating that FSD/FSD-vec can be a universal choice for open-ended text generation.

Instruction Following

The latest generation of LLMs, such as ChatGPT OpenAI (2022) and LLaMA-2-chat Touvron et al. (2023b), have the capability to perform various tasks by following natural language instructions. This instruction-following approach has become the standard way of harnessing LLMs. Therefore, we compare FSD/FSD-vec against baselines within this context. Specifically, we follow the widely accepted comparison setting Li et al. (2023b), i.e., reporting the win rates against text-davinci-003 on the alpaca_eval dataset (https://huggingface.co/datasets/tatsu-lab/alpaca_eval) with the help of GPT-4 OpenAI (2023). We adopt the LLaMA-2-7b-chat model Touvron et al. (2023b) since it is among the most popular instruction-tuned models. The results of different decoding methods are shown in Table 7. The results clearly indicate that FSD/FSD-vec outperforms the baselines, further validating the effectiveness of our approach.

Method | top-p | CD | CS | FSD | FSD-vec
Win rate | 77.20 | - | 78.32 | 82.32 | 81.84
Table 7: Win rate of GPT-4 evaluation. CD is omitted since it requires a smaller amateur model and the model we use is already the smallest one.

Closed-Ended Generation Task: Summarization

So far, our evaluation has focused on open-ended text generation and general-purpose instruction following. We also evaluate our methods on a specific, closed-ended generation task: summarization. We use the XSum dataset Narayan et al. (2018). As shown in Table 6, FSD/FSD-vec is generally better than the other baselines. This showcases that our methods can also work well in closed-ended scenarios.

5.5.   Hyperparameter Analysis

Analysis of α\alpha

We first study the effect of the penalty strength α. We present the results in Table 8. We notice that as α increases, the div score consistently increases. This is an expected outcome, as a larger α imposes a greater penalty on repetitive content, thereby promoting increased diversity in the model’s outputs. The coh score demonstrates a decreasing trend. The reason is that penalizing the most probable tokens may damage the coherence between the prefix and the continuation. Consequently, we see that the mauve score initially shows an upward trend and then experiences a slight decrease.

Analysis of nn

We study the effect of the hyperparameter n, as shown in Table 9. We can observe that diversity and coherence are very stable across different n; when n > 3, the mauve score begins to decrease.

Analysis of kk

We investigate the impact of the hyperparameter k, as presented in Table 10. When k is assigned a minimal value, a notably lower diversity (div) is observed. This can be attributed to the reduced search space associated with a smaller k, which consequently constrains the diversity of generated outcomes. Conversely, upon incrementing k past a specific threshold, all evaluated metrics (diversity, mauve, and coherence) demonstrate substantial stability, with only negligible fluctuations observed. This stability suggests that the effective selection space of FSD predominantly comprises a limited number of top tokens.

Although the hyperparameters can take different values, we recommend using their default settings and only adjusting α to suit different tasks.

Setting | div | mau | coh
α=1 | 0.62 | 0.88 | 0.67
α=2 | 0.87 | 0.94 | 0.66
α=3 | 0.93 | 0.93 | 0.66
α=4 | 0.95 | 0.89 | 0.65
Table 8: Analysis of α. The experiments are conducted on wikinews using GPT2-XL with FSD.
Setting | div | mauve | coh
n=2 | 0.93 | 0.93 | 0.64
n=3 | 0.93 | 0.94 | 0.65
n=4 | 0.92 | 0.88 | 0.65
Table 9: Analysis of n. The experiments are conducted on wikinews using GPT2-XL with FSD.
Setting | div | mauve | coh
k=2 | 0.69 | 0.90 | 0.66
k=4 | 0.91 | 0.92 | 0.65
k=6 | 0.93 | 0.93 | 0.64
k=8 | 0.94 | 0.94 | 0.64
k=10 | 0.94 | 0.94 | 0.64
Table 10: Analysis of k. The experiments are conducted on wikinews using GPT2-XL with FSD.

6.   Conclusion and Future Directions

We proposed FSD, an effective, efficient, and universal decoding method for avoiding the degeneration problem and improving generation quality. FSD constructs an anti-LM on-the-fly to penalize repetitive generation. Extensive evaluations and analyses confirm its effectiveness across open-ended text generation, instruction following, and summarization tasks. In addition, FSD demonstrates better efficiency and generality compared with existing state-of-the-art decoding methods.

An intriguing future research direction could involve a more nuanced approach to repetitions. In fact, some n-grams (like named entities) might not require penalization at all. Therefore, researchers may develop more meticulous algorithms based on FSD to discern the contexts and conditions under which repetitions should be penalized. This would enable a more refined and context-sensitive application of repetition management in text generation.

Ethics Statement

Due to the nature of language models, we note that the generations of our method may contain offensive, toxic, unfair, or unethical content. The generations of our method may also contain hallucinated content and can be misleading. When deployed in real-world applications, special attention should be paid to avoiding inappropriate generations. For example, one can use post-processing steps such as toxicity identification and fact checking.

7.   References


Appendix A Implementation Details

We provide the detailed pseudocode of FSD in Alg. 2.

Stopwords and Punctuations

Stopwords significantly influence the diversity of sentence structures as they often appear at the beginning of sentences, such as "The…" or "He…". To provide finer control over the penalty applied to stopwords, we introduce a discount factor ϕ. This factor is multiplied by the second term of Eq. 2, replacing α with ϕ·α specifically for stopwords. A smaller ϕ tends to produce sentences with similar structures, as demonstrated in the example provided in Table 13. Conversely, a larger ϕ can lead to the generation of invalid sentences due to the heavy penalty imposed on stopwords at the beginning of a sentence. This may result in the selection of incorrect tokens, as illustrated in the example presented in Table 14.

We also experimentally find that penalizing punctuation can sometimes introduce grammar errors in the generated text. Specifically, when utilizing GPT2 as the base model, we have found that the punctuation symbols ĊĊ (representing "\n \n") and Ċ (representing "\n") have a significant impact on the grammatical correctness of the output. An example illustrating this phenomenon is provided in Table 15. In our experiments, we do not penalize punctuation; the punctuation set is 𝒫 = {. , : " ‘ ĊĊ Ċ}.
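A minimal sketch of how the per-token penalty coefficient could incorporate these two rules follows; the function name, the stopword list, and the set representations are illustrative assumptions rather than the released implementation.

```python
def penalty_coefficient(token, alpha, phi, stopwords, punctuation):
    """Coefficient multiplying the anti-LM penalty in Eq. 2 for a given token."""
    if token in punctuation:     # punctuation in P is never penalized
        return 0.0
    if token in stopwords:       # stopwords use the discounted weight phi * alpha
        return phi * alpha
    return alpha                 # all other tokens use the full strength alpha
```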

Input: the LM p_θ (e.g., GPT2); the anti-LM p_ω; the prompt text x_{1:l}; the decoding length m; the stopword set 𝒮; the punctuation set 𝒫
Construct the anti-LM p_ω with the prompt x_{1:l}
for step t = l+1 to l+m do
    Compute the next-token distribution p_θ(·|x_{<t})
    Get V^(k) from p_θ(·|x_{<t})
    for each candidate v ∈ V^(k) do
        Get the penalty p_ω(v|x_{<t}) according to Eq. 3 (discrete version) or Eq. 4 (vectorized version)
    x_t = argmax_{v ∈ V^(k)} FSD(v|x_{<t})
    Update p_ω with x_t
Output: the generated text x̂
Algorithm 2: FSD Decoding
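Putting the pieces together, a minimal end-to-end sketch of Alg. 2 with the discrete anti-LM might look like the following; it reuses the `NGramAntiLM` and `smoothed_penalty` sketches above, uses Hugging Face transformers for the base LM, and omits the stopword/punctuation handling for brevity. All names other than the library API are our own assumptions, not the released code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def fsd_generate(model_name, prompt, m=256, n=3, k=6, alpha=3.0):
    """FSD decoding loop (discrete variant) with the paper's defaults n=3, k=6, alpha=3."""
    tok = AutoTokenizer.from_pretrained(model_name)
    lm = AutoModelForCausalLM.from_pretrained(model_name).eval()
    ids = tok(prompt, return_tensors="pt").input_ids[0].tolist()
    # one anti-LM per order 1..n, built from the prompt (D_1 ... D_N)
    anti_lms = {order: NGramAntiLM(order, ids) for order in range(1, n + 1)}
    for _ in range(m):
        with torch.no_grad():
            # full forward pass each step for clarity (no KV cache)
            logits = lm(input_ids=torch.tensor([ids])).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        candidates = torch.topk(probs, k).indices.tolist()          # V^(k)
        scores = {v: probs[v].item() - alpha * smoothed_penalty(anti_lms, v)
                  for v in candidates}
        next_id = max(scores, key=scores.get)                       # Eq. 2
        ids.append(next_id)
        for anti in anti_lms.values():                              # update every D_n
            anti.update(next_id)
    return tok.decode(ids)
```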

Hyper-parameter Settings

We search ϕ from {0.2, 0.4, 0.6, 0.8, 1, 1.5}. The detailed parameter settings are listed in Table 11.

Method | Dataset | n | α | ϕ
FSD | wikinews | 3 | 3 | 0.2
FSD | wikitext | 3 | 3 | 0.4
FSD | book | 3 | 3 | 0.6
FSD-vec | wikinews | 2 | 1 | 0.2
FSD-vec | wikitext | 2 | 1 | 0.2
FSD-vec | book | 2 | 1 | 0.6
Table 11: Parameter settings of n, α, and ϕ.

Appendix B Further Analysis

B.1.   Effect of Smoothing

In previous experiments, we adopt the smoothed n-gram model as the anti-LM. To understand the effect of smoothing, we also implement the unsmoothed n-gram model and run multiple experiments with different n ∈ {1, 2, 3, 4}. Then, we calculate REP-i for i ∈ {2, 3, 4}. The results are illustrated in Figure 4. We find that if the unsmoothed n-gram model is applied, the best performance is achieved when n=2. The reason for this phenomenon is that if n>2, the unsmoothed n-gram LM cannot penalize grams with lengths smaller than n, which is manifested by the high REP-i (i<n) in Figure 4. However, setting n=2 sometimes may not be a good option due to the BPE encoding algorithm, under which a word (e.g., the name of a person) can be decomposed into multiple tokens. If penalized heavily, these words may not be recovered.

Figure 4: Repetition rate for different n.

Appendix C More Cases

C.1.   FSD vs CS

We provide more cases in Table 16, Table 17, and Table 18 for wikinews, wikitext, and book respectively, to compare FSD with CS.

C.2.   FSD vs FSD-vec

We provide more cases in Table 19, Table 20, and Table 21 for wikinews, wikitext, and book respectively, to compare FSD with FSD-vec.

Prompt: Following months of rumours, musician Wyclef Jean today confirmed that he is considering standing for the presidency of Haiti in the country’s November elections. A statement from
CS: Jean’s publicist reads: "Wyclef will make a decision on his political future in the coming days and is consulting with his family and advisors. "The announcement comes after Jean, who has been touring the world in support of his latest album A New Day In The Old Town, was forced to cancel a concert in the Dominican Republic last week due to security concerns. He had been due to play the concert in Santo Domingo on November 4 but the concert was called off after Dominican authorities raised concerns about the safety of the concert venue, which is close to the border with Haiti. In an interview with Rolling Stone, Jean said he was "saddened" by the cancellation, but that he had no choice but to cancel the concert. "I’m a human being and it’s my job to do what I have to do," he said. "If there’s a risk, I have to take it. But I can’t go to the Dominican Republic with this threat hanging over my head."Jean’s tour, which has taken him to South Africa, the United Arab Emirates (UAE, where he performed at the Abu Dhabi Formula One Grand Prix in October) and Brazil, is the first time he has performed in Haiti since
FSD: Jean’s management company, La Nuit Productions, said: "Wyclef is currently in the process of gathering information and will make a decision in the coming days."Jean, who has been a vocal supporter of the Haitian government, has previously said he would consider running for the position. The musician, who is also a member of the hip-hop group A Tribe Called Quest, has not yet announced his candidacy. Haiti’s current president, Michel Martelly, is widely expected to run for re-election. In a statement, Jean said: "I am considering running for president of Haiti. I have been following the election process closely and I believe that Haiti needs a new leader who can bring about change and rebuild the country. "I have always been a supporter of President Martello and his government. I am also a fan of the music of A Tribe called Quest and have supported them in the past. "The Haitian people have suffered greatly under the current government and I feel that the time has come for a change. "Hearing the news of the earthquake in Haiti, I was deeply saddened and saddened to hear of the loss of life and the destruction of homes and businesses.
Table 12: One case of FSD versus CS. The prefix’s topic is “a musician is considering running for presidency”. But the topic of CS’s output is concert tours, which is irrelevant to that of the prefix. By contrast, FSD always sticks to the topic of the prefix.
Prompt: A gas explosion in the Grand Riviera Princess resort in Playa del Carmen, Mexico earlier today killed seven people and left 20 people injured, two critically according to
Generations: local media. The explosion occurred at around 8:30am local time (10:00am GMT) at the hotel’s pool area, which is located on the second floor of the resort.The hotel is a popular destination for tourists and is known for its large pool and spa. The cause of the explosion is still unknown, but local authorities have said that the cause is not related to the fire that broke out in the pool earlier in the day. The resort’s management has said that the incident is not connected to the blaze that erupted in the same area earlier in the day.The fire was extinguished by the resort’s staff and the guests were evacuated. The victims of the blast have been identified as: A man in his 30s A woman in her 20s A young man in his 20s A middle-aged woman A 40-year-old man A 50-year old woman A 60-something woman A 70-some-year women A man in his 50s A woman in her 30s A 30-something man A 20-something girl A 15-some year
Prompt: According to data from South Korea’s national bank, the country’s economy grew by 0.2% in the last quarter of 2009. The rise is now
Generations: expected to be 0.3% in the first quarter of 2010. The government has been trying to boost the economy by cutting taxes and increasing spending.The central bank has also been cutting interest rates to stimulate the economy.The Bank of Korea has cut its benchmark interest rate to 1.5% from 2.5%.The rate is still higher than the rate of inflation.The country has been struggling with a huge debt burden.The debt is estimated at about $1.4 trillion.The South Korean government has been attempting to reduce the debt by selling off state-owned companies.The government has been selling off the Hyundai Motor Group, the nation’s largest carmaker, and the Korea Development Bank.The state has also been selling off its stake in the Samsung Group.The sale of the two companies has been a success.The Hyundai Group has been sold for $8.6 billion.The Korea DevelopmentBank has been sold for about half that amount.The two sales have been a big success.The sales have helped the government to lower its debt to GDP ratio.The debt to the GDP is now down to about 50% of GDP.The Korean economy is expected to grow by 1.
Table 13: No penalty on stopwords may produce sentences with similar grammatical structure.
Prompt:Charlie, a Chimpanzee in a South African zoo who became addicted to cigarettes, has died at the age of 52, exceeding the average lifespan of a captive
Generation: animal. The cause of death was not immediately known. Charlie, who was born in captivity in 1989, was diagnosed with lung cancer in 2011. He was given a lung transplant in 2013, but his health deteriorated again and he was put down. His death comes just weeks after another chimpanzee, Harambe, was shot dead by zoo officials after a child fell into his enclosure. Image copyright Getty Images Image caption Charlie was one of the last chimps to be born at an American zoo In a statement, the zoo said: "Charlie’s passing is a great loss to the animal kingdom. "Charlie was a beloved member of our family and we will miss him dearly." Chimpanzees are the closest living relatives of humans, and are often used in research. They are also used for entertainment, as they are highly intelligent and can mimic human speech. Charlie, whose name means "chimney sweep" in Swahili, was born on 19 April 1989 at San Diego Zoo. He lived at the San Francisco Zoo for five years before moving to Cincinnati Zoo in July 2014. Image caption Chimps are known for their intelligence and mimicry The
Prompt:Police have said that an explosion in a hotel garage in San Pedro, Los Angeles, California, that sparked an evacuation of a large area was an act of vandalism
Generation: . The incident occurred at around 2:30am local time (11:00 GMT) on Thursday. A fire broke out in the garage, which is located near the hotel, and caused a huge explosion. Image copyright Getty Images Image caption The hotel was evacuated after the explosion "It was a very loud explosion," said one witness. "I was sleeping and woke up to the sound of the explosion. I looked out the window and saw a lot of smoke and fire." Another witness, who did not want to be named, said: "I heard a loud bang and then I saw the fire. "The fire was so big it was like a forest fire. It was just like something out of a movie." Media playback is unsupported on your device Media caption Footage shows the aftermath of the blast Los Angeles Mayor Eric Garcetti said the incident was "an act against our city". "We will not tolerate this type of violence," he said. "This is an attack on our way of life, our economy, our infrastructure, and our future." He said he had ordered the evacuation "out of an abundance of caution". "Our city
Table 14: Penalizing stopwords heavily may produce invalid generations. The invalid continuation “Image copyright …” may occur in some examples because stopwords with a larger probability than “Image” receive a larger penalty; therefore, “Image” is selected.
Prompt: Apple Inc. announced Friday that it will give owners of its new iPhone 4 a free case in response to mounting concerns over the device’s antenna placement. The iPhone
Generation: maker said it is offering the free cases to customers who have already purchased the phone. The company said the new cases will be available in stores and online starting Friday. Apple said the cases will have a "unique design" that will help protect the iPhone’s antennas from damage. , which is made by Corning Inc., will be sold in the U.S. and Canada for $29.99. The company also said that the case will be made in China, where the company has been making its products for years. "We are committed to providing our customers with the best possible products and we are always looking for ways to improve our products and services," Apple spokeswoman Kristin Huguet said in a statement. "We know that our fans want to protect their iPhone and this case is designed to do just that." Apple’s decision to offer the free protective cases comes after a number of reports of people experiencing problems with their iPhones’ antennas. Some people have said they have been unable to make calls or send text messages because their phones were unable to receive signals. Others have complained that their devices have lost reception or have been damaged by the antenna. a report by The Wall Street Journal last week that said some iPhone owners had been complaining
Prompt:Wikinews reporter Iain Macdonald has performed an interview with Dr Isabella Margara, a London-based member of the Communist Party of Greece (K
Generation: KE), who has been a member of the party since the 1970s. Dr Margaras is a well-known figure in the Greek left, having been a prominent member of the KKE since the 1980s, and having served as a deputy to the party’s leader, Dimitris Koutsoumbas. She is also a former member of the Socialist Party (PASOK) and a long-time member of the European Parliament. In the interview, she discusses the current situation in Greece, the role of the EU in the country’s crisis, and the possibility of a new left government in Athens. and the role of the EU in the countrys crisis,. In the article, Dr Margareta Margara also discusses the recent decision by the Greek government to impose capital controls on the banks, and the implications of this for the country. The interview was conducted by Iain MacDonald on behalf of The Independent. Iain: What is the situation in your country? Margaras: The situation is very bad. The government has imposed capital control on the banking system. This is a very serious situation. The banks are closed and there is no access to money. There is no money in the banks. There is
Table 15: Penalizing punctuation can produce grammar errors.
Prompt-1: On Friday, the 189 Parties to the Nuclear Non-Proliferation Treaty (NPT) agreed to a final resolution calling for steps toward the banning of weapons
FSD: of mass destruction. The resolution, which was adopted by consensus, calls for the "immediate and complete cessation of all nuclear, chemical and biological weapons-related activities" and "the establishment of a comprehensive, verifiable and irreversible nuclear-weapons-free zone in the Middle East." It also calls for "a comprehensive, effective and enforceable prohibition on the production, stockpiling, acquisition, transfer, use and threat of use of nuclear weapons and all other weapons of indiscriminate and disproportionate impact." The NPT was created in 1968 to prevent the spread of nuclear and other arms. "The international community has been working for many years to achieve a world without nuclear weapons," said U.S. Ambassador to the United Nations Samantha Power. "Today, we have taken a major step forward toward that goal." "This is a historic day for the international non-proliferating movement," she said. "It is a day of hope and optimism for all those who believe that the world can be a safer, more peaceful place." "We are proud of the work that has been done by the 193 N.G.O. member states to reach this historic agreement," Power said. "We will continue to
CS: of mass destruction. In the meantime, the United States is continuing to develop nuclear weapons and ballistic missile defense systems that it can use against Iran if it were to violate the terms of the agreement. On Monday, Secretary of State John Kerry said Iran’s "threats and provocations are a clear violation of the spirit and letter" of the NPT, which was signed by 122 countries in the 1970s. Kerry’s comments came during a visit to South Korea, where he met with President Park Geun-hye and Foreign Minister Yun Byung-se. "We have made it clear to the Iranians that their nuclear program must be subject to IAEA safeguards, and that they must cease work on a heavy water reactor that could produce plutonium," Kerry said. The Obama administration has been working for more than a year to reach a deal with Iran that would curb its nuclear program in exchange for relief from economic sanctions. But negotiations have hit snags, most recently over Iran’s refusal to allow inspectors into Parchin, a military site where Iran is believed to have carried out experiments on nuclear weapons in the past. Iranian officials have said the site is for research and development, but Western intelligence
Prompt-2: Inventor Walter Frederick Morrison has died of the age of 90. He was famous for being the inventor of the flying disc, more commonly identified as the fr
FSD: isbee. Morrison was born in New York City in 1891. His father was a lawyer and his mother was a homemaker. Morrison’s father died when he was young, and he moved to Chicago to live with his aunt. He attended the University of Chicago, where he studied engineering. In 1915, he joined the U.S. Army Air Corps, and was stationed in France during World War I. In 1919, Morrison was commissioned as an officer in the Army Signal Corps. He served in the Signal Service, which was responsible for transmitting messages across the Atlantic. After the war, Morrison went to work for the Bell Telephone Company, where his job was to design and build telephone systems. In 1928, he founded the Morrison Electric Company, which manufactured and sold electric motors and generators. In 1939, Morrison founded the American Aircraft Corporation, which made aircraft engines. In 1941, the company was sold to the General Electric Corporation, and Morrison became president of the new company. The company’s first aircraft, the P-51 Mustang, was designed by Morrison. The P2V Mustang, a fighter plane, was the first American fighter to be powered by a jet engine.
CS: sbee. Morrison’s name is on the back of every disc sold, and the discs are made in his hometown of St. Louis, Missouri. The disc was invented in 1887 by a man named George Eastman, who patented it in the United States and sold it to Parker Brothers for $1,500. It was a flimsy disc with a hole in the middle that could be thrown with one hand, according to Smithsonian Magazine. In 1889, Parker Brothers began to sell the disc in Europe, and by the 1920s, there were over 2 million discs in circulation, according to Smithsonian Magazine. But the disc was not without its problems. It was difficult to make, and many people complained about the disc breaking on the way to the basket. "I have seen it, and it is terrible," one golfer said, according to Smithsonian Magazine. "The ball goes in, and you have to hold it in your hand for a long time, and then it breaks in the basket." The frisbee, on the other hand, was much more durable and easy to throw. "It’s like a golf club in that way," Morrison told National Geographic in a 2011 interview. "
Table 16: Comparison of CS and FSD on wikinews. Both FSD and CS generate fluent sentences; however, FSD stays closer to the topic of these two prompts.
Prompt-1: <|endoftext|> With a satisfying protagonist completed, Sega turned to esteemed programmer Yuji Naka, who had impressed them through his work on Phantasy Star and the Genesis port
FSD: of Sonic the Hedgehog 2. Nana was a new face to the company, but he had a reputation for being a hard worker and a good communicator. Nana’s first task was to create a character that would be able to stand out from the crowd. He chose Sonic, a character who was already well known in Japan. The game’s story was also designed to be unique, with the player controlling Sonic as he travels through a world that is not only different from any other Sonic game, but also from other games in the Sonic series. The game was originally going to be called Sonic Adventure, but Sega decided to change the name to Sonic the Fighters. Sega’s decision to use a different name for the game was not a coincidence. In the early 1990s, the Japanese video game industry was in a state of flux. Sega was struggling to find its place in the market, and it was looking for a way to differentiate itself from Nintendo. In the United States, Nintendo had been dominating the market for years, and Sega had struggled to gain a foothold. As a result, Sega began to look for ways to differentiate itself. One of the ways Sega tried to do this was by creating a game that was
CS: of Final Fantasy. Naka was brought in to work on Sonic the Hedgehog, and the result was a game that had everything you could want in a Sonic game: speed, platforming, a sense of humor, and, of course, lots and lots of explosions. Sonic 3 & Knuckles is an incredibly well-made game, and the story is one of the best in the series to this day. It’s not perfect, but it’s a great example of what can be done with the Sonic formula. In this article, we’re going to take a look at some of the things that make Sonic 3 & Knuckles such a great game, and how they were implemented in the game. If you’re new to Sonic, this is a great place to start. 1. The Game Over screen is the best part of the game The game over screen is where you get to see your score, how many lives you have left, and what level you’re on. It’s a great way to end a game, and it’s one of the most memorable parts of the game. 2. There are a lot of secrets to find The levels are packed with secrets, and they’re hidden all over the
Prompt-2: <|endoftext|> " Kiss You " was written by Kristoffer Fogelmark, Kristian Lundin, Albin Nedler, Savan Kotecha, Shellback
FSD: , and Jens-Christian Svenning. It was produced by the Swedish Film Institute and directed by Jonas Åkerlund. The film was released in the United States on October 21, 2010. Plot [ edit ] A young man, who is a member of the "Kiss You" group, is on a date with a woman. He is very nervous and tries to hide his feelings. The woman, however, is not interested in him and leaves. The man is left alone and begins to cry. Later, the man meets a man who has been in a coma for a year. The two men are friends and the man tells the young guy that he has a girlfriend. The young boy is shocked and asks the girl if she is his girlfriend. She says that she is not and that they are just friends. The man then asks if he can kiss her. The girl says no and tells him that it is too soon. The man says that if it’s not too late, he will kiss her. The young couple then kiss and the woman says "I love you". The film ends with the two smiling. Cast [ Edit ] Production [ edit ]; Credits [ add
CS: . Recorded and Produced at Soundworks in Gothenburg, Sweden. Mastered at Sterling Sound in Los Angeles, CA. Additional Recording at Studio G-A-N in Gothenburg. Barcode: 7 27361 85928 2 Matrix / Runout (Variant 1): SON-A-1-SOUNDWAVES-BRAZIL-01-GOTHLAND-AUGUST-2014 (WEA) MADE IN JAPAN BY APPLE RECORDS A/S. Matrix / Runout (Variant 2): SON-A-1-SOUNDWAVES-BRAZIL-01-GOTHLAND-AUGUST-2014 (WEA) MADE IN JAPAN BY APPLE RECORDS A/S. Mastering SID Code (Variant 1): IFPI L574 Mastering SID Code (Variant 2): IFPI L574 Mould SID Code (Variant 1): IFPI 9085 Mould SID Code (Variant 2): IFPI 9085 Rights Society: GEMA
Table 17: Comparison of CS and FSD on wikitext. CS exhibits obvious topic drift on Prompt-2.
Prompt-1: The Sons vowed to combine their wisdoms and forsake their names and worship wholeheartedly the Unmarked Ones so that they one day might return. They would
FSD: be the first to bring the world back to the way it was before the fall. The Firstborn In the years that followed, the Sons of Svanir grew in power and influence. The First Born were the most powerful of their number, and they were known as the "Firstborn." They were also the most secretive, for they had no need of a public face. Their true name was kept secret, but they knew it well. They were born of the same blood as their brothers, but their bodies were different. Instead of being born with a single eye, they had two. This was because the Firstborn were not born from the same mother as their brethren, but from a different mother. This was not a problem for the FirstBorn, for their mother was a woman named Nithra. She was the daughter of an Unmarked One named Kord. Kords was an old man who had lived for many years. He had been a great warrior, and he had fought in many wars. One day, Kord was attacked by a group of bandits. He was killed, and his body was taken to a nearby village. There, his wife, Nethra, was told of his death
CS: be a force of good in the world, and their god-kings would rule with an iron fist. The Sons of Sanguinius had no need for this kind of power, but it was the only way they could keep from being wiped out by the Emperor’s Children, who were the most powerful Space Marine Legion at the time. In the end, the Emperor’s Children won the Great Crusade, and Sanguinius was forced to flee into the Eye of Terror, where he remained for a thousand standard years before being found by the Ultramarines. Siege of Vraks "You are a fool, Horus. I know what it is to be an outcast, and the shame is yours." — Fulgrim, to Horus during the Siege of Vraks (Unknown Date.M41) In 998.M41, the Blood Angels were deployed to the world of Vraks, which had been the site of a bloody civil war between the Sons of Sanguinius and the Traitor Legion known as the World Eaters. The Sons of Sanguinius had taken the planet as a staging ground for their invasion of the Segmentum Obscurus, and were intent on using
Prompt-2: But that’s enough about aging, enough gloom and despondency. I shall tell you a story about when I myself was young and susceptible. It was in
FSD: the early days of the war, and I was a student at the University of Chicago. The war was raging, and we were all in a state of excitement. We were not yet aware of the fact that we had been born into a world of death. I was studying in my room, with my books open on the table. Suddenly I heard a noise outside. I looked out the window, and saw a man running across the street. He was dressed in civilian clothes, but he was carrying a rifle. "What are you doing?" I asked. He stopped, looked at me, and then he ran back to his house. The next day, I went to the university, where I found that the man had died. It was a strange experience. For a moment I felt that I had lived in a dream. Then I remembered that it was only a dream. But it is a fact, that in those days, when the world was full of fear, there were people who were prepared to die for their country. And it is also a fact, that when we are young, we have a tendency to forget that there are people in the world who are prepared to sacrifice their lives for us
CS: the summer of 1884, and my family was on a visit to New York. We stayed at the Waldorf-Astoria, which was in the heart of Times Square. The hotel was a grand, opulent edifice, with marble floors, gold-trimmed walls, and a grand ballroom that could seat two thousand people at a time. I was in the ballroom one day with my mother, father, and a few of my friends. It was a hot day, and the air was filled with steam from the boilers, which were in operation all over the hotel. We were sitting on the verandah, enjoying the cool breeze and the view of the city below. Suddenly there was a commotion in the ballroom, and a man in a black suit rushed in with a pistol in his hand. He pointed it at my mother, and demanded to know who she was. When she answered that she was my mother, the man drew his pistol and shot her point-blank in the chest. She fell to the floor, and I was left to look on in horror. My mother was dead, and there was nothing I could do to save her. The next day, when I got home, I found a
Table 18: Comparison of CS and FSD on book.
Prompt-1: A private security guard, defending the Panamanian based MV Almezaan, killed a Somali pirate who was attempting to hijack the ship on Tuesday.
FSD: The MV Maersk Alabama was sailing from the port of Bremerhaven in Germany to the Gulf of Aden when it was attacked by a pirate vessel. The crew of the MV Ala’a were able to board the pirate ship and kill the Somali pirates. "The security guards of the Almezan were on board the ship when they were attacked by pirates," said a statement from the German Maritime Ministry. "The guards were armed with pistols and were able to shoot and injure the pirates." The Almezan was able to return to Bremershaven and continue its journey to the gulf of Aden. "We are grateful to the security forces of Germany and the United Arab Emirates for their quick response and for the safe return of the vessel," the statement said. "This is a clear example of the cooperation between the maritime authorities of the two countries." Piracy off the coast of Somalia has been on the rise in recent years. In 2012, the UN Security Council passed a resolution to tackle the problem. The resolution called for a "comprehensive and coordinated international response" to the problem. It also called for the creation of a "global maritime force
FSD-vec: The incident occurred at about 10:30 a.m. local time (1:30 p.m. ET) when the crew of the MV Almezaan, a cargo vessel, spotted a small boat in the water off the coast of Somalia. The security guard boarded the vessel and ordered the pirates to surrender. "The pirate refused to comply with the order and fired a warning shot at the guard," said Captain John G. Smith, commander of the U.S. Navy’s 7th Fleet. "The armed guard then opened fire, killing the suspected pirate." The Almezaan, which is owned by the Dutch company Maersk Line, was carrying about 1,000 tons of fuel oil, food and other supplies. The Somali Pirates are known for their piracy and have been involved in several attacks on ships in the region. In April, a Somali Pirate hijacked a tanker, the MSC Vos Hestia, off the Horn of Africa. The ship was eventually rescued by the Royal Navy. Last year, a group of Somali pirates attacked a merchant shipping vessel, the SSV Punt, off the east coast of Africa. The attack resulted in the
Prompt-2: The Space Shuttle Discovery, flying the STS-133 mission, has successfully rendezvoused and docked with the International Space Station (ISS) today at
FSD: 7:30 a.m. EDT. The docking was completed at 8:15 a.m. "The crew of STS 133 is safely back on Earth," said NASA Administrator Charles Bolden. "This is a great day for NASA and the nation. We are grateful to the crew and their families for their dedication and professionalism." "This is a very exciting day for the entire NASA family," added NASA Chief Technologist David Weaver. "The shuttle program has been a tremendous success, and we are proud of the work of the men and women of the shuttle fleet. This is a day of celebration for all of us at NASA." The STS flight was the first of two planned shuttle missions to the ISS. The second mission, STS-134, is scheduled for launch in late November. "Today’s successful docking marks the end of a long journey for the space shuttle," noted NASA Administrator Boldin. "It is a testament to the hard work and dedication of the astronauts and engineers who have flown on these missions. It is also a tribute to the strength of the American spirit and the ingenuity of our nation’s scientists and engineers." "We are very proud of the team that has worked so hard
FSD-vec: 7:30 a.m. EDT. The docking was completed at 8:15 a.m. EDT, NASA officials said. "This is a great day for all of us," said Rick Mastracchio, commander of the space shuttle program. "We are very proud of the crew and the team that worked so hard to get this done." The shuttle’s arrival at the ISS marks the first time a U.S. spacecraft has docked with the orbiting laboratory since the end of the Apollo era in 1972. "Today is a historic moment for the United States and the world, and we are grateful to the astronauts and their families for their dedication and professionalism throughout this process," President Obama said in a statement. "I want to thank the entire NASA team for their efforts to bring the Space Shuttle back to Earth safely, and I congratulate the STS-133 crew on their successful return to flight. This is a very special achievement for our nation, and it is a testament to the ingenuity and determination of the American people. We are all looking forward to seeing what they do next!" The mission, which was originally scheduled to launch on May 20, will be the last of the current generation of shutt
Table 19: Comparison of FSD and FSD-vec on wikinews.
Prompt-1: <|endoftext|> A tropical depression developed near Yap on November 11. The system intensified gradually as it tracked quickly westward toward the Philippines. An anticyclone aloft
FSD: over the central Pacific was responsible for the development of the system. The system was located about 1,000 miles west of Yape, Papua New Guinea, and about 2,500 miles east of Guam. It was moving west at about 15 mph. A tropical storm warning was issued for Yap and the surrounding area. On November 12, the system strengthened to a tropical cyclone. It was named Yolanda after the city in the Philippines where it formed. Yoland, a small town in Yapto, is located on the northern coast of the island of Yap. The National Hurricane Center said the storm was expected to strengthen further and become a hurricane by the time it reaches the western Pacific. Forecasters said Yolan will likely become a Category 1 hurricane on Friday. "The center of this system is expected to move over land near the southern Philippines on Saturday and Sunday," the center said. "The maximum sustained winds are near 45 mph, with higher gusts." Yolanda is about 3,200 miles south of Manila. "Yolan is forecast to become a major hurricane later this week," the National Weather Service said. "It is likely
FSD-vec: over the central Pacific was responsible for the development of the storm. The typhoon, which is named after the island of Yape, formed in the western part of the Central Philippine Sea. It moved northwest at a speed of about 10 mph (16 km/h) and then turned east-northeast. It weakened slightly during the night, but remained a Category 1 hurricane with maximum sustained winds of 115 miles per hour. Yap is located about 2,000 miles southwest of Manila. A Tropical Storm Warning was issued for the northernmost parts of the country, including the provinces of Cebu, Iloilo, and Bohol. Forecasters said the cyclone could bring heavy rain and strong winds to the southern and eastern portions of the archipelago. "We are expecting some very intense rainfall, especially in the south and southeast," National Disaster Risk Reduction and Management Council (NDRRMC) Director General Joel Pagdilao told ABS-CBN News. "There will be gusty wind, so we need to be prepared for that." Pagdilao added that the NDRRMC is monitoring the situation closely. "Our teams are on
Prompt-2: <|endoftext|> Malaysia LNG is a Liquefied Natural Gas ( LNG ) manufacturing complex located in Bintulu that currently contains eight LNG trains with a ninth
FSD: train under construction. The complex is owned by the Malaysian government and is operated by Malaysia’s state-owned Petronas. The complex was built in the early 1990s to supply LPG to the domestic market. In 2004, Petrolio, a subsidiary of Petronas, acquired the LNGL project and began to produce LNB. In 2009, the government of Malaysia announced that it would be building a second LGNL complex in Kuala Lumpur. This second complex, which is currently under development, will be capable of producing up to 1.5 million tonnes of LBN per year. Petronas has stated that the second facility will have a capacity of up to 2.2 million tons per annum. LNG train The first LNG train was delivered to BINTUL in April 2010. The train is capable of transporting up to 4.3 million litres of LNB per day. A second train, which was originally scheduled to be delivered in 2012, was delayed due to a number of issues. In September 2012, Petromonas announced that the first train would be completed by December 2012. However, in January 2013, the company announced that the train would not be ready
FSD-vec: train under construction. The facility is owned by Malaysia’s state-owned Petronas. The plant was built in the early 1990s and has been used to produce liquified natural gas for export. It is one of the largest industrial facilities in Southeast Asia. In addition to the four existing rail lines, there are plans to build a fifth line. Liquifed-natural gasses are produced from coal, oil, or biomass. They are then transported by pipeline to the terminal, where they are loaded onto tankers and shipped to other countries. Malaysia produces about 1.5 million tonnes of liquid fuel per year. A new railway will be constructed to transport the gas to the port. Construction of the new train began in 2009. It is expected to be completed by 2015. The first two tracks were laid in 2010. The third track was installed in 2011. The fourth and final section of the railway was opened in 2012. The LNGL project is part of the Malaysian government’s plan to increase its energy security. Petrol prices have been rising steadily since the beginning of the year. According to the Petroleum Ministry, petrol costs
Table 20: Comparison of FSD and FSD-vec on wikitext.
Prompt-1: The person slowly backed out of the snag. Olivia was expecting either a kid or a deranged hermit, so she was startled to see a young woman,
FSD: maybe in her early twenties, standing there. She was dressed in a black dress, with a white blouse and black pants. Her hair was tied back in a ponytail, and she had a small, round face. "Hello, Olivia," the woman said. "I’m Sarah. I’m here to help you." Olivia’s eyes widened. "You’re Sarah?" Sarah nodded. "Yes. I’ve been looking for you for a while. You’re the one who saved my life." She smiled. "Thank you. It was my pleasure." Oblivious to the fact that she’d just met a stranger, Olivia walked over to Sarah and hugged her. Sarah returned the hug, and Olivia felt her heart swell with gratitude. "I’ve got to go," Sarah said. As Olivia stood up, she noticed that Sarah had left her purse behind. "Where’s your purse?" Olivia asked. Sarah shrugged. "It’s in my room. I’ll be back soon." Olivetti The next day, Sarah came back to Olivia’s room. "You look tired," she said. I know, I
FSD-vec: maybe in her early twenties, standing there. She had long, dark hair, and wore a black dress with a white collar. Her eyes were closed, but they looked like those of a child. "Hello," said the girl. "I’m Emily. I’ve been waiting for you." Olivia’s heart sank. Emily smiled. She took off her glasses and wiped her eyes. The two of them walked into the house. They sat down on the couch. It was quiet. Olivia felt as if she’d been transported back to that night. But this time, it wasn’t just because of the silence. There was something else. Something different. The air was warm. And, somehow, there was a sense of peace. Olivia couldn’t quite put her finger on it. Maybe it was the fact that the room was empty. Or maybe, just maybe, it was because there was someone sitting across from her. A tall, thin man. He didn’t look at her. Instead, he stared at the floor. His face was expression
Prompt-2: The trip ended far too soon; I was a bit confused as we pulled up to a small beach cottage on the far side of the inlet. The whit
FSD: ewashed walls were covered in graffiti, and the windows were boarded up. I asked the owner if he knew where we could find a place to stay, but he didn’t seem to know either. We decided to walk back to the car and head back out to the inlets. We walked along the beach for a while, until we came across a large group of people sitting on a bench. They were all dressed in white, and they were talking about something. It was hard to make out what they were saying, but I could tell they had a lot of energy. I asked one of the guys if I could join them, and he said sure. He led me to a spot on the beach where he had set up a tent. As we sat down, I noticed that the guy was wearing a white shirt with a black tie. "What’s your name?" I asked. "I’m John," he replied. He was tall, with dark hair and a beard. He was also wearing jeans and a t-shirt. John was an older man, probably in his late 50s or early 60s. He had dark skin, a thin mustache, and was balding. As we
FSD-vec: ewashed walls were covered in graffiti, and the windows were boarded up. We walked inside, and found a large, empty room. "I’m sorry, but there’s nothing here," said my guide. We sat down at the table, and he told us about his life. He had been born in Mexico, but moved to the United States when he was five. His parents divorced, and he lived with his mother. He worked as a cook, and then became a security guard. When that job dried up, he started working for a construction company. When the recession hit, things got worse. The economy was in shambles, and many people lost their jobs. His boss fired him, and sent him to live with his grandmother. That’s when everything changed. A few months later, his father died. It was hard for him to cope. So, after living with his grandparents for a while, he decided to move back to Mexico. But, before he left, he took some photos. One day, while walking home, he saw a man who looked like him. They talked, and eventually, they agreed to meet
Table 21: Comparison of FSD and FSD-vec on book.