Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors
Abstract
This paper investigates how LLMs encode inputs with typos. We hypothesize that specific neurons and attention heads recognize typos and fix them internally using local and global contexts. We introduce a method to identify typo neurons and typo heads that work actively when inputs contain typos. Our experimental results suggest the following: 1) LLMs can fix typos with local contexts when the typo neurons in either the early or late layers are activated, even if those in the other are not. 2) Typo neurons in the middle layers are responsible for the core of typo-fixing with global contexts. 3) Typo heads fix typos by widely considering the context not focusing on specific tokens. 4) Typo neurons and typo heads work not only for typo-fixing but also for understanding general contexts.
Kohei Tsuji1, Tatsuya Hiraoka2, Yuchang Cheng1,3, Eiji Aramaki1, Tomoya Iwakura1,3 (1NAIST, 2MBZUAI, 3Fujitsu Ltd.) [email protected], [email protected], [email protected], {cheng.yuchang, iwakura.tomoya}@fujitsu.com
1 Introduction
Inputs for real applications using large language models (LLMs) sometimes contain typographical errors (typos) Zheng and Saparov (2023); Wang et al. (2024a); Zhu et al. (2023). LLMs often produce correct answers on inputs with typos Wang et al. (2024a), which implies that LLMs can “fix” typos to recover the originally intended meaning. However, LLMs sometimes fix typos imperfectly, which can “damage” their performance on downstream tasks Zhuo et al. (2023); Wang et al. (2023); Zhu et al. (2023); Edman et al. (2024). To reduce the impact of typos on LLMs, it is essential to understand more deeply both their robustness against typos and the reasons for the performance degradation that typos cause.
Existing studies have primarily focused on the surface-level exhibition of performance degradation due to typos Wang et al. (2023); Zhu et al. (2023) and on methods for improving robustness against typos Zheng and Saparov (2023); Zhuo et al. (2023); Almagro et al. (2023). Few studies have investigated how typos affect LLMs’ inner workings Kaplan et al. (2024); García-Carrasco et al. (2024b). Moreover, previous work focused on cases where the input contains only a few subwords and a single typo, so it examined typo-fixing based only on local contexts. In contrast, studies have reported that typo-correction performance can be improved by observing longer (global) contexts Li et al. (2020); Ji et al. (2021). This implies that LLMs might also use global contexts when handling typo inputs.
Based on these previous works, we hypothesize that Transformer-based decoder LLMs also fix typos along two axes: typo-fixing with local contexts, which focuses on nearby subwords, and typo-fixing with global contexts, which relies on longer contextual information. To verify this hypothesis, we investigate neurons (typo neurons) and attention heads (typo heads) in LLMs that provide robustness against typos through the following steps. First, we investigate the inner workings against typos in contextualized words using a word identification task (§3). Then, we propose a method to identify typo neurons (§4) and typo heads (§5). Subsequently, we analyze the differences in their behavior between cases where the model is damaged by typos and cases where it is not.
We conducted experiments using Gemma 2 Team et al. (2024), Qwen 2.5 Yang et al. (2024), and two of the Llama 3 AI@Meta (2024) family to investigate the inner workings when inputs contain typos. Our findings suggest the following:
• LLMs can fix typos when the typo neurons in either the early or late layers, both of which focus on local contexts, are activated, even if those in the other are not.
• Typo neurons in the middle layers are responsible for typo-fixing considering global contexts, regardless of the models.
• Typo heads fix typos using the local and global contexts, not focusing on specific tokens.
• Typo neurons and typo heads not only fix typos but also understand general grammatical or morphological features.

2 Related work
2.1 Analysis of LLMs against Typos
Typos are mistakes in writing or typing letters, categorized into insertion, deletion, substitution, and reordering Gao et al. (2018). Research on the robustness of LLMs regards typos as a perturbation. Typos change the token sequence obtained through the tokenization process, and changing the token sequence potentially leads to a different output even if the sentence is the same Tsuji et al. (2024). Most existing LLM studies about typos focus on measuring the robustness against perturbed inputs Wang et al. (2021, 2023); Zhu et al. (2023); Edman et al. (2024) or on modifying the architecture or prompts to improve robustness Zhuo et al. (2023); Zheng and Saparov (2023); Almagro et al. (2023). Chai et al. (2024) reported that larger models are more robust to typos. Before the LLM era, researchers corrected typos using dedicated typo-correction models Li et al. (2020); Ji et al. (2021).
2.2 LLM’s Interpretability
The feed-forward network (FFN) layer in the Transformer Vaswani (2017) has two linear layers separated by an activation function. Recent studies regard the output of the activation function as “neurons” that store knowledge Geva et al. (2021). It has been reported that some neurons promote specific tasks Wang et al. (2022, 2024c), knowledge Dai et al. (2022); Bau et al. (2019); Gurnee et al. (2024), and behaviors Hiraoka and Inui (2024); Wang et al. (2024b); Chen et al. (2024).
Some attention heads also respond to specific knowledge Gould et al. (2024); Voita et al. (2019); García-Carrasco et al. (2024b) or behaviors McDougall et al. (2024); Crosbie and Shutova (2024). Additionally, some heads are responsible for merging multiple subwords of a word Correia et al. (2019); Ferrando and Voita (2024).
There are various methods to investigate LLMs’ interpretability. Some measure contributions to the residual stream García-Carrasco et al. (2024a); Hanna et al. (2024), while others observe intermediate predictions nostalgebraist (2020); Kaplan et al. (2024), graph the inference process Ferrando and Voita (2024), or directly observe activations Wang et al. (2022); Hiraoka and Inui (2024); Wang et al. (2024c). We hypothesize that typo neurons are a type of skill neuron. Therefore, we use the direct activation observation method, following previous studies on skill neurons Wang et al. (2022); Hiraoka and Inui (2024). Mosbach et al. (2024) conclude that understanding the inner workings is important to improve model performance.
Lad et al. (2024) divide LLM inference into four stages. The early layers convert token-level representations into entity-level representations with local contexts (Detokenization). The early middle layers build representations with global contexts (Feature Engineering). The late middle layers convert current representations into next-token representations (Prediction Ensembling). Finally, the late layers remove noise and refine the next-token distribution (Residual Sharpening). Elhage et al. (2022) report that the late layers perform the opposite function of the early layers’ Detokenization, converting entity-level representations back into token-level representations (Retokenization).
Kaplan et al. (2024) reveals which layers are responsible for typo-fixing. However, they only focused on isolated words as inputs by layer-level observation. We focus on neurons and heads and experiment with global contexts.
3 Preliminary
We created a dataset that LLMs can solve without typos (§3.2). Then, we applied typos to the dataset (§3.3) and conducted a preliminary experiment to observe accuracy when inputs include typos (§3.4). Next, we identify typo neurons and reveal their specific roles (§4). Similarly, we conduct analogous experiments for attention heads (§5).
3.1 Models
We used Google’s Gemma 2 Team et al. (2024) with 2B, 9B, and 27B parameters, Meta’s Llama 3.2 AI@Meta (2024) with 1B and 3B parameters, Meta’s Llama 3.1 with 8B parameters, and Qwen’s Qwen 2.5 Yang et al. (2024) with 3B, 7B, 14B, and 32B parameters; Gemma 2 27B and Qwen 2.5 32B were loaded in bfloat16, while the others were loaded in float32. Our computing environment is described in Appendix A. We conducted all experiments using greedy generation.
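As a concrete illustration of this setup, the following sketch shows how such models could be loaded with the Hugging Face transformers library; the model identifiers and the helper name are our own assumptions, not part of the paper.

```python
# A minimal sketch (not the authors' code): loading an evaluated model, using bfloat16
# only for the two largest models as described in Section 3.1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_model(model_id: str, use_bfloat16: bool = False):
    dtype = torch.bfloat16 if use_bfloat16 else torch.float32
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    model.eval()  # analysis and greedy generation only, no training
    return tokenizer, model

# e.g., load_model("google/gemma-2-2b") or load_model("google/gemma-2-27b", use_bfloat16=True)
```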
3.2 Clean Datasets without Typos
We used a word identification task in which LLMs infer a single word from a given definition. Since typo-fixing relies on vocabulary knowledge, it is crucial to use a task that directly reflects the LLMs’ vocabulary knowledge, such as word identification. Moreover, we avoided tasks requiring complex reasoning, such as NLI, as variations in sample difficulty could hinder a clear observation of typo-related phenomena.
For instance, we feed the definition of a word, such as “a young swan”, to an LLM, and the model is expected to output the corresponding word “cygnet”. Following Greco et al. (2024), we extracted 62,643 word-definition pairs from WordNet Fellbaum (2005) (accessed via NLTK Bird and Loper (2004), ver. 3.9.1) and created the dataset with these pairs. We designed a prompt so that LLMs can solve this task by predicting the tokens that follow the prompt, as shown in the middle part of Figure 1.
For our analysis, we need a dataset composed of samples that LLMs can answer correctly when the samples do not include typos. Therefore, we extracted the top 5,000 or 1,000 word-definition pairs after sorting the samples in descending order of the likelihood assigned to the correct words. (Due to Llama 3.2 1B’s weaker performance, we could not extract 5,000 such pairs for the Llama 3 family; we therefore extracted 1,000 pairs for the Llama 3 family.) Note that we created a separate dataset for each model.
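For illustration, this likelihood-based filtering could be implemented as in the following sketch; the prompt template and function names are hypothetical and not taken from the paper.

```python
# A minimal sketch (hypothetical prompt template and helpers): rank word-definition pairs by
# the log-likelihood the model assigns to the correct word, and keep the top-k pairs.
import torch

def answer_log_likelihood(model, tokenizer, definition: str, word: str) -> float:
    prompt = f'Q. What is "{definition}"? A. This is "'   # illustrative template only
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(word, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # log-probabilities of the answer tokens, each conditioned on all preceding tokens
    log_probs = torch.log_softmax(logits[0, prompt_ids.size(1) - 1 : -1], dim=-1)
    return log_probs.gather(1, answer_ids[0].unsqueeze(1)).sum().item()

# pairs = sorted(pairs, key=lambda p: answer_log_likelihood(model, tok, *p), reverse=True)[:5000]
```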
3.3 Generating Inputs with Typos
3.3.1 Typo Dataset
To focus on text with typos, we generated inputs with typos from the definition part of the clean dataset created in §3.2. We selected the top-$n$ most important tokens according to their importance scores on the word identification task, where $n$ is the number of typos to inject. Then, we injected a random single letter or digit into each selected token as a typo. The importance scores were calculated with the method used in Wang et al. (2023); Li et al. (2019), with the smallest model among those sharing the same tokenizer (e.g., Gemma 2 2B for Gemma 2, or Llama 3.2 1B for the Llama 3 family). Specifically, we obtained the importance scores by performing back-propagation while predicting words from their definitions. This process assigns higher gradients to tokens that are important for predicting the correct answer. For example, consider the sentence “a young swan” with $n = 2$, where the top two most important words are “young” and “swan.” In this case, we inject random characters such as “e” and “5” into random positions of each word, which results in “a youneg s5wan.” We exclude the positions before the spaces to avoid the situation where a typo would appear at the end of the previous token rather than within the target token.
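The following sketch illustrates this procedure under our own assumptions (the prompt template, the gradient-norm importance measure, and the helper names are ours); it is not the authors' released code.

```python
# A minimal sketch: gradient-based token importance followed by inserting a random
# letter or digit into the most important tokens.
import random
import string
import torch

def token_importance(model, tokenizer, definition: str, word: str) -> list[float]:
    prompt = f'Q. What is "{definition}"? A. This is "{word}"'   # illustrative template only
    enc = tokenizer(prompt, return_tensors="pt")
    embeds = model.get_input_embeddings()(enc.input_ids).detach().requires_grad_(True)
    loss = model(inputs_embeds=embeds, labels=enc.input_ids).loss
    loss.backward()
    # importance of each token = L2 norm of the gradient w.r.t. its input embedding
    return embeds.grad.norm(dim=-1)[0].tolist()

def inject_typo(token: str) -> str:
    """Insert one random letter or digit at a random position inside the token."""
    pos = random.randrange(1, len(token) + 1)  # inside or at the end, never before the token
    return token[:pos] + random.choice(string.ascii_lowercase + string.digits) + token[pos:]

# e.g., inject_typo("young") -> "youneg", inject_typo("swan") -> "s5wan"
```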
3.3.2 Split Dataset
We often obtain a different number of subwords when tokenizing typo inputs compared to clean inputs. For instance, the tokenizer encodes the word “young” into a single token, but it tokenizes the typo version “youneg” into two tokens (e.g., “you / neg”). When comparing the inner workings of LLMs encoding clean inputs and typo inputs, this difference in token length might prevent an appropriate analysis. Kaplan et al. (2024) reported that there are inner workings that recover the original token from differently tokenized subwords; we need to exclude the effect of this factor to focus purely on the typo-related inner workings.
To separate the typo-related inner workings into a factor corresponding to typos and one corresponding to the tokenization difference, we created a “split dataset” in addition to the “typo dataset” described in §3.3.1. The split dataset contains samples tokenized into the same number of tokens as the typo versions. For example, when the typo dataset has a sample whose tokenized sequence is “a / you / neg / swan”, a counterpart in the split dataset is “a / y / oung / swan”, whose length equals that of the typo version. We obtain the various tokenization candidates using the tokenizer and randomly select one candidate with the same length as the typo input. This process is shown in Figure 1 (left).
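A simplified sketch of this counterpart construction is shown below; it enumerates forced (non-canonical) segmentations of a clean word and glosses over the word-initial space markers ("Ġ"/"▁") that real BPE/SentencePiece vocabularies use, so it is an approximation rather than the authors' procedure.

```python
# A minimal sketch: enumerate ways to cut a clean word into exactly as many vocabulary
# pieces as its typo version, then pick one at random as the "split" counterpart.
import random

def alternative_splits(word: str, n_pieces: int, vocab: set[str]) -> list[list[str]]:
    """All segmentations of `word` into n_pieces substrings, each present in the vocabulary."""
    if n_pieces == 1:
        return [[word]] if word in vocab else []
    splits = []
    for i in range(1, len(word)):
        head = word[:i]
        if head in vocab:
            for tail in alternative_splits(word[i:], n_pieces - 1, vocab):
                splits.append([head] + tail)
    return splits

# e.g., if the typo "youneg" is tokenized into two pieces, a counterpart could be
# random.choice(alternative_splits("young", 2, vocab)), such as ["y", "oung"].
```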


3.4 Preliminary Experiment
To examine the impact of typos on model performance, we applied typos to the top-$n$ most important tokens for several values of $n$ and analyzed the change in accuracy.
Figure 2 shows the preliminary experimental results. The accuracy at $n = 0$ indicates the performance on the clean data without typos. Since the clean data consist of samples that each model can answer correctly, the accuracy for all models is 1.0. As shown in the figure, larger models maintain higher accuracy than smaller models even with many typos. This result supports existing work reporting that larger models are more robust against typos Chai et al. (2024). However, this preliminary result also indicates that the robustness of larger models is insufficient, resulting in a performance drop. We conclude that typos damage performance, but larger LLMs have some robustness against typos, which motivates us to investigate the typo-related inner workings. Furthermore, this leads us to a deeper analysis of why robustness against typos differs by model size, toward further improvement.
4 Typo Neurons
Some FFN layers have been found to combine multiple tokens into a single representation vector Kaplan et al. (2024); Elhage et al. (2022); Lad et al. (2024). Additionally, it has been reported that certain neurons within LLMs function as “skill neurons” with specific roles Wang et al. (2022). In this section, we investigate the existence of typo neurons, a particular type of skill neuron that is responsible for recognizing and fixing typos.
4.1 Method to Identify Typo Neurons
Following the approach of Hiraoka and Inui (2024), we compare the activation values of neurons between clean inputs and typo inputs to identify neurons that specifically respond to typos. Let $x$ be a sample of the dataset, where $x$ is a sequence of tokens $x = (x_1, \dots, x_{|x|})$. Each sample comprises the prompt (e.g., “Q. What is … A. This is ”) and the answer (e.g., “cygnet”).
The activation value $a_i(D)$ of a neuron $i$ when feeding a dataset $D$ is defined as follows:

$$a_i(D) = \frac{1}{|D|} \sum_{x \in D} \frac{1}{|S_x|} \sum_{t \in S_x} f_i(x, t), \qquad (1)$$

where $|D|$ is the number of samples in the dataset, $f_i(x, t)$ is a function calculating the activation value of neuron $i$ at the $t$-th token when the LLM reads the input $x$, $S_x$ is a set of indices that indicates the token positions, and $|S_x|$ is the number of indices. We define $S_x$ as the indices of the tokens comprising the answer word and the important words.
For example, in Figure 1, $S_x$ for the clean input is composed of “young” and the apostrophe before “cygnet”, while $S_x$ for the typo input is composed of “you”, “neg”, and the apostrophe, and $S_x$ for the split input is composed of “y”, “oung”, and the apostrophe. In the figure, the tokens comprising $S_x$ are indicated with an orange background.
We measure how specifically a neuron responds to typo inputs, as opposed to clean and split inputs, with the following score $s_i$:

$$s_i = a_i(D_{\mathrm{typo}}) - \frac{a_i(D_{\mathrm{clean}}) + a_i(D_{\mathrm{split}})}{2}, \qquad (2)$$

where $D_{\mathrm{typo}}$, $D_{\mathrm{clean}}$, and $D_{\mathrm{split}}$ are the typo, clean, and split datasets, respectively.
A larger $s_i$ indicates a neuron that responds specifically to typos but not to clean inputs or split inputs. Among all neurons, the top neurons based on $s_i$ are identified as typo neurons.
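The following sketch illustrates how these scores could be computed; it assumes a Llama/Gemma-style module layout (`model.model.layers[*].mlp.act_fn`), treats the activation-function outputs as the neurons, and uses our own helper names, so it is an approximation of the procedure rather than the authors' implementation.

```python
# A minimal sketch: average activations over the selected token positions S_x for the typo,
# clean, and split datasets (Eq. 1), then score neurons with Eq. (2) and take the top 0.5%.
import torch

def mean_activations(model, batches):
    """batches: iterable of (input_ids, positions), where positions are the indices S_x.
    Returns a tensor of shape [n_layers, n_neurons]."""
    acts, hooks = [], []
    for layer in model.model.layers:  # assumes a Llama/Gemma-style layout
        hooks.append(layer.mlp.act_fn.register_forward_hook(
            lambda module, inputs, output: acts.append(output.detach())))
    total, count = None, 0
    for input_ids, positions in batches:
        acts.clear()
        with torch.no_grad():
            model(input_ids)
        # acts[l] has shape [1, seq_len, n_neurons]; average over the positions in S_x
        per_layer = torch.stack([a[0, positions].mean(dim=0) for a in acts])
        total = per_layer if total is None else total + per_layer
        count += 1
    for h in hooks:
        h.remove()
    return total / count

def typo_neuron_scores(a_typo, a_clean, a_split):
    return a_typo - (a_clean + a_split) / 2  # Eq. (2)

# scores = typo_neuron_scores(...); top 0.5%: torch.topk(scores.flatten(), int(0.005 * scores.numel()))
```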
4.2 Experimental Results
This section investigates the typo neurons found with the method introduced in §4.1. We used a small number of typos per input; Appendix C additionally describes the results when many typos are introduced.
Figure 3 shows the distribution of $s_i$ and the distribution of the typo neurons in each layer. We extracted the top 0.5% of neurons with the highest $s_i$ as the typo neurons. The average (Ave) and standard deviation (SD) in Figure 3 indicate that a few neurons have significantly larger scores than the others, similar to knowledge and skill neurons Dai et al. (2022); Wang et al. (2022).
Regarding the layer-wise distribution, the Llama 3 family and Qwen 2.5 have many typo neurons in the late layers (i.e., relative depth from 0.8 to 1.0). In contrast, Gemma 2 models have many typo neurons in the early layers (i.e., relative depth from 0.0 to 0.2), and few in the late layers. Especially in the 9B and 27B models, the largest number of typo neurons is in the early layers.
According to Lad et al. (2024), the late layers perform Residual Sharpening, which removes noise from representations. Considering typos as noise, it is natural that many typo neurons are in the late layers. Besides, Elhage et al. (2022) report that the early layers are responsible for Detokenization, which converts raw token representations into coherent entities (e.g., words), while the late layers perform Retokenization, which converts them back into token-level representations. These observations suggest that Gemma 2 fixes typos during Detokenization, while the Llama 3 family and Qwen 2.5 fix typos during Retokenization. Since both processes use local contexts, we see variation across models in how the responsibility is balanced between the early and late layers. As shown in Appendix C, with many typos, typo neurons in the late layers of Gemma 2 models also increase. This indicates that the distribution of responsibility between the early and late layers is adaptable.
In the middle layers (i.e., relative depth 0.2–0.8), all models have many typo neurons. This suggests that these layers play a common role in typo-fixing across models. Since the early middle layers create representations depending on global contexts with attention heads (Feature Engineering) and the late middle layers convert current representations into next-token representations (Prediction Ensembling) Lad et al. (2024), typo-fixing in these layers seems to focus on the recognition of global contexts, in contrast to the early and late layers.
4.3 Discussion
While the experimental results in §4.2 suggest the existence of typo neurons, their impact has not yet been clarified. In this section, we therefore investigate their specific impact, focusing primarily on Gemma 2.
4.3.1 Neuron ablation
Table 1: Accuracy of Gemma 2 models on the word identification task when ablating random neurons or typo neurons, for clean inputs and typo inputs.

| Model / Ablation | Clean | Typo |
|---|---|---|
| Gemma 2 2B | 1.00 | 0.86 |
| Random Neurons | 0.98 | 0.87 |
| Typo Neurons | 0.84 | 0.73 |
| Gemma 2 9B | 1.00 | 0.93 |
| Random Neurons | 0.99 | 0.96 |
| Typo Neurons | 0.93 | 0.90 |
| Gemma 2 27B | 1.00 | 0.95 |
| Random Neurons | 0.98 | 0.94 |
| Typo Neurons | 0.96 | 0.91 |
We expect typo neurons to contribute to typo-fixing. Therefore, ablating them should result in a remarkable decrease in performance for typo inputs while not affecting the performance for clean inputs.
We test this hypothesis by conducting ablation experiments on typo neurons and randomly selected neurons of the Gemma 2 models. Appendix D discusses the results of the ablation study for the other models. From a dataset of 5,000 samples, 100 randomly selected samples were used to identify typo neurons. Then, we evaluated the performance on the word identification task using the remaining 4,900 samples while deactivating the identified neurons.
Following §4.2, we identified the top 0.5% of neurons as typo neurons. We also randomly selected 0.5% of neurons as a baseline. Deactivation was performed by setting the output values of the neurons to zero. The experiments were conducted for the clean inputs and for the typo inputs under the same typo setting as in §4.2.
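A sketch of this deactivation, under the same module-layout assumptions as before, is shown below; it is illustrative rather than the authors' implementation.

```python
# A minimal sketch: zero out the outputs of the identified typo neurons during the forward pass.
import torch

def ablate_neurons(model, neuron_ids: dict[int, list[int]]):
    """neuron_ids maps a layer index to the neuron indices whose outputs are set to zero.
    Returns the hook handles; call .remove() on each to restore the model."""
    hooks = []
    for layer_idx, layer in enumerate(model.model.layers):
        if layer_idx not in neuron_ids:
            continue
        idx = torch.tensor(neuron_ids[layer_idx])
        def make_hook(idx):
            def hook(module, inputs, output):
                output = output.clone()
                output[..., idx] = 0.0  # deactivate the selected neurons
                return output
            return hook
        hooks.append(layer.mlp.act_fn.register_forward_hook(make_hook(idx)))
    return hooks
```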
Table 1 shows the experimental results. For typo inputs, performance remained largely unchanged when random neurons were ablated, regardless of the model. However, performance decreased when typo neurons were ablated. This suggests that a small number of typo neurons play an important role in fixing typo inputs. For clean inputs, the ablation of typo neurons also resulted in a larger performance decrease than the random neuron ablation. This indicates that typo neurons may not act exclusively on typos but could also play a crucial role in processing general grammatical or morphological features. We observe similar results for the other models (Appendix D).
4.3.2 Neurons for Typo-fixing

The experiments in §4.2 identified typo neurons by comparing clean and typo inputs without considering whether the LLMs could correctly solve the task on typo inputs. This section focuses on the difference in typo neurons between cases where the LLMs answer correctly despite typos and cases where they answer incorrectly.
From the dataset of 5,000 samples, we extracted 100 samples where typos did not damage the inferences and the correct word was predicted. Similarly, we extracted another 100 samples where typos damaged the inferences and led to incorrect word predictions. We compared the activation of typo neurons in these two groups, using the same typo setting as in §4.2, and compared the layer distributions of the typo neurons with the top 0.5% $s_i$.
Figure 4 shows the result. In the 9B and 27B models, the number of typo neurons in the early layers increases when incorrect inferences are made. This suggests that some neurons in the early layers might play roles other than typo-related ones, and the activation of those neurons prevents correct recognition of typos. In the 2B model, when the model fails to fix typos, typo neurons in the middle-middle layers are activated. This suggests that the strong activations observed in the middle-middle layers of Gemma 2 2B in §4.2 are due to neurons damaged by typos rather than to neurons contributing to typo-fixing. Across all models, more typo neurons in the early middle layers (e.g., relative depth 0.2–0.4) were activated when typos did not damage the inferences. This indicates the importance of typo neurons in the early middle layers.

Table 2: Average and standard deviation of the typo-head scores $r_h$ for each model.

| | Gemma 2 2B | Gemma 2 9B | Gemma 2 27B | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B | Qwen 2.5 3B | Qwen 2.5 7B | Qwen 2.5 14B | Qwen 2.5 32B |
|---|---|---|---|---|---|---|---|---|---|---|
| Average | -0.0045 | -0.0042 | -0.0032 | -0.0040 | -0.0039 | -0.0049 | -0.0043 | -0.0053 | -0.0047 | -0.0050 |
| SD | 0.0038 | 0.0041 | 0.0049 | 0.0045 | 0.0040 | 0.0044 | 0.0046 | 0.0056 | 0.0052 | 0.0057 |
5 Typo Heads
5.1 Method to Identify Typo Heads
Typo-fixing may depend not only on neurons but also on subword merging by attention heads Correia et al. (2019); Ferrando and Voita (2024), which relies on understanding local and global contexts. We assume two types of such heads for typo inputs: 1) heads that focus on important tokens, and 2) heads that attend widely over the context.
In this section, we investigate the attention heads specialized to typo inputs by comparing attention maps. We calculate the KL divergence between a uniform distribution and each row of an attention map, treating the row as a probability distribution. The KL divergence grows with the number of tokens, which can result in higher values for typo inputs or split inputs, as they often have more tokens than clean inputs. We alleviate this problem by normalizing the KL divergence by its maximum possible value, and define the head score $\delta_h(D)$ as follows:

$$\delta_h(D) = \frac{1}{|D|} \sum_{x \in D} \frac{1}{|x| - 1} \sum_{t=2}^{|x|} \frac{D_{\mathrm{KL}}\!\left(A_{h,t}(x) \,\|\, U_t\right)}{\log t}, \qquad (3)$$

where $D_{\mathrm{KL}}$ is the function that returns the KL divergence, $U_t$ is a uniform distribution over $t$ elements, and $A_{h,t}(x)$ is the $t$-th row of the attention map output by head $h$ for the token sequence $x$; $\log t$ is the maximum possible KL divergence from $U_t$. In decoder models, the attention scores for the $t$-th token over the tokens from the first to the $t$-th sum to 1. Unlike for neurons, for the calculation of typo head identification we did not narrow down the token positions and used all tokens in the prompts.
Similar to Eq. (2) for neurons, the responsibility score $r_h$ of head $h$ for typos is defined as follows:

$$r_h = \delta_h(D_{\mathrm{typo}}) - \frac{\delta_h(D_{\mathrm{clean}}) + \delta_h(D_{\mathrm{split}})}{2}, \qquad (4)$$

where $D_{\mathrm{typo}}$, $D_{\mathrm{clean}}$, and $D_{\mathrm{split}}$ are the typo, clean, and split datasets, respectively. A large absolute value of $r_h$ indicates that the head behaves very differently for typo inputs than for clean ones. Specifically, a large positive $r_h$ indicates a head that focuses on specific tokens for typo-fixing, while a large negative $r_h$ indicates a head that attends widely over the context for typo-fixing. We identified the top heads with the highest absolute value of $r_h$ as typo heads.
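The row-wise divergence in Eq. (3) could be computed as in the following sketch; the variable names and the handling of the first row are our own choices rather than the paper's.

```python
# A minimal sketch: row-averaged KL divergence between one head's causal attention map
# and a uniform distribution, normalized by its maximum possible value.
import math
import torch

def head_divergence(attn_map: torch.Tensor) -> float:
    """attn_map: [seq_len, seq_len] attention weights of one head for one input (each row sums
    to 1 over the visible tokens). Returns the normalized KL divergence averaged over rows."""
    scores = []
    for t in range(1, attn_map.size(0)):        # the first row sees only one token, so skip it
        row = attn_map[t, : t + 1]              # distribution over the t + 1 visible tokens
        row = row / row.sum()
        uniform = 1.0 / (t + 1)
        kl = torch.sum(row * torch.log(row.clamp_min(1e-12) / uniform))
        scores.append(kl.item() / math.log(t + 1))   # normalize by the maximum KL, log(t + 1)
    return sum(scores) / len(scores)

# r_h (Eq. 4) then compares this quantity between the typo, clean, and split datasets.
```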
5.2 Experimental Results
We used the same number of typos as in §4.2; Appendices E and F discuss other settings. As shown in Figure 5, the absolute values of the minimum scores are approximately 10 times larger than the maximum scores in all models. The average and standard deviation in Table 2 also indicate that only a few heads near the minimum are distinctive. These results suggest that heads recognize and fix typos by observing the wider context, not by focusing on specific tokens.
As the model size increases, the proportion of heads with $r_h$ close to zero increases. This contrasts with the results in §4.2, where model differences contributed to differences in the distribution of typo neurons. However, we can see a similar trend between the distributions of typo neurons and typo heads in the very early layers (the first few layers). For instance, Gemma 2 has some heads with a large $|r_h|$ in these layers, while the Llama 3 family and Qwen 2.5 do not. This trend among models is similar to the one in the distribution of typo neurons (see Figure 3).
5.3 Discussion
In this section, we investigate the specific impact and behavior of typo heads, focusing primarily on Gemma 2, as in §4.3.
5.3.1 Head Ablation
Table 3: Accuracy of Gemma 2 models on the word identification task when ablating random heads or typo heads, for clean inputs and typo inputs.

| Model / Ablation | Clean | Typo |
|---|---|---|
| Gemma 2 2B | 1.00 | 0.86 |
| Random Heads | 0.87 | 0.80 |
| Typo Heads | 0.81 | 0.75 |
| Gemma 2 9B | 1.00 | 0.93 |
| Random Heads | 0.80 | 0.76 |
| Typo Heads | 0.89 | 0.81 |
| Gemma 2 27B | 1.00 | 0.95 |
| Random Heads | 0.35 | 0.33 |
| Typo Heads | 0.69 | 0.67 |
Following the approach in §4.3.1, we identified typo heads in Gemma 2 using 100 randomly selected samples of the dataset. Then, we ablated the identified typo heads and measured the accuracy on the remaining 4,900 samples. Since the total number of heads is much smaller than that of neurons, we identified the top 1.5% of heads as typo heads. We also randomly selected 1.5% of heads as a baseline. We performed ablation by setting all attention scores of the selected heads to 0. The experiments were conducted for the clean inputs and for the typo inputs under the same typo setting as before. We describe the results of the ablation study for the other models in Appendix G.
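For illustration, the ablation itself amounts to the following operation, which would have to be applied inside a patched (eager) attention forward pass that exposes the per-head attention weights; the helper name is ours.

```python
# A minimal sketch: ablate selected heads by setting all of their attention scores to zero.
import torch

def zero_head_attention(attn_weights: torch.Tensor, head_ids: list[int]) -> torch.Tensor:
    """attn_weights: [batch, n_heads, seq_len, seq_len] post-softmax attention weights.
    Zeroing a head's weights removes its value contributions from the layer output."""
    attn_weights = attn_weights.clone()
    attn_weights[:, head_ids] = 0.0
    return attn_weights
```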
Table 3 shows the experimental results. In the 9B and 27B models, the ablation of random heads damages the performance on both the clean and typo datasets significantly more than the ablation of typo heads, while the ablation of typo heads also degrades the performance to some degree. This suggests that typo heads are less important for typo-fixing than other heads, whereas typo neurons were shown in §4.3.1 to be important for both typo and clean inputs. In contrast, for the 2B model, which has fewer heads, the ablation of typo heads resulted in a greater decrease in accuracy than the ablation of random heads. This suggests that when the number of heads and parameters is limited, the typo heads are actively used for typo-fixing.
In summary, the importance of typo heads is minor in larger models but higher in smaller models. Additionally, since the ablation of typo heads also reduces accuracy on clean datasets, typo heads may play a role in processing general contextual information like typo neurons.
6 Conclusion
This paper investigated how the neurons and heads of Transformer-based LLMs respond to typo inputs. Experimental results show that LLMs can fix typos with local contexts when the typo neurons in either the early or late layers are activated, even if those in the other are not. While those layers fix typos by recognizing local contexts, typo neurons in the middle layers are responsible for the core of typo-fixing with global contexts. Typo heads fix typos by using the context widely rather than focusing on specific tokens. Additionally, typo heads are more critical for smaller models than for larger models.
Our findings indicate that Transformer-based LLMs fix typos with not only local but also global contexts, which suggests that improving typo robustness requires approaches that emphasize recognition of both local and global contexts. The results of the ablation study show that typo-fixing is related to general grammatical or morphological recognition, which suggests that methods for improving typo robustness may also enhance general contextual recognition performance. These findings also suggest that aiming at improving general contextual recognition could contribute to typo robustness.
Limitation
This work focuses on the investigation of typo-related inner workings. We believe our findings will help develop applications that alleviate the performance decrease caused by typo inputs. However, the discussion of a concrete method for such applications is out of the scope of this paper. Our analysis was limited to the Gemma 2, Llama 3 family, and Qwen 2.5 models and examined models with sizes up to 32B. Larger models or LLMs with different architectures may have different properties. For hyperparameters, our experiments were performed with only a limited set of values for the number of typos. Furthermore, our experiments focused on a specific task, and models may show different properties on a wider variety of tasks. We ran all experiments only once, although there was randomness in applying typos and in some experimental procedures. For typo neurons, models were observed to have either more typo neurons in the early layers or more in the late layers. This may be due to differences in training methods or datasets; however, the true reason remains unclear. Additionally, our method mostly detects neurons and heads that respond to inputs with typos; it cannot distinguish between those that contribute to typo-fixing and those that are damaged by typos.
References
- AI@Meta (2024) AI@Meta. 2024. Llama 3 model card.
- Almagro et al. (2023) Mario Almagro, Emilio Almazán, Diego Ortego, and David Jiménez. 2023. Lea: Improving sentence similarity robustness to typos using lexical attention bias. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 36–46.
- Bau et al. (2019) Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, and James Glass. 2019. Identifying and controlling important neurons in neural machine translation. In International Conference on Learning Representations.
- Bird and Loper (2004) Steven Bird and Edward Loper. 2004. NLTK: The natural language toolkit. In Proceedings of the ACL Interactive Poster and Demonstration Sessions, pages 214–217, Barcelona, Spain. Association for Computational Linguistics.
- Chai et al. (2024) Yekun Chai, Yewei Fang, Qiwei Peng, and Xuhong Li. 2024. Tokenization falling short: On subword robustness in large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 1582–1599, Miami, Florida, USA. Association for Computational Linguistics.
- Chen et al. (2024) Jianhui Chen, Xiaozhi Wang, Zijun Yao, Yushi Bai, Lei Hou, and Juanzi Li. 2024. Finding safety neurons in large language models. arXiv preprint arXiv:2406.14144.
- Correia et al. (2019) Gonçalo M. Correia, Vlad Niculae, and André F. T. Martins. 2019. Adaptively sparse transformers. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2174–2184, Hong Kong, China. Association for Computational Linguistics.
- Crosbie and Shutova (2024) Joy Crosbie and Ekaterina Shutova. 2024. Induction heads as an essential mechanism for pattern matching in in-context learning. arXiv preprint arXiv:2407.07011.
- Dai et al. (2022) Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, and Furu Wei. 2022. Knowledge neurons in pretrained transformers. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8493–8502, Dublin, Ireland. Association for Computational Linguistics.
- Edman et al. (2024) Lukas Edman, Helmut Schmid, and Alexander Fraser. 2024. CUTE: Measuring LLMs’ understanding of their tokens. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 3017–3026, Miami, Florida, USA. Association for Computational Linguistics.
- Elhage et al. (2022) Nelson Elhage, Tristan Hume, Catherine Olsson, Neel Nanda, Tom Henighan, Scott Johnston, Sheer ElShowk, Nicholas Joseph, Nova DasSarma, Ben Mann, Danny Hernandez, Amanda Askell, Kamal Ndousse, Andy Jones, Dawn Drain, Anna Chen, Yuntao Bai, Deep Ganguli, Liane Lovitt, Zac Hatfield-Dodds, Jackson Kernion, Tom Conerly, Shauna Kravec, Stanislav Fort, Saurav Kadavath, Josh Jacobson, Eli Tran-Johnson, Jared Kaplan, Jack Clark, Tom Brown, Sam McCandlish, Dario Amodei, and Christopher Olah. 2022. Softmax linear units. Transformer Circuits Thread. Https://transformer-circuits.pub/2022/solu/index.html.
- Fellbaum (2005) Christiane Fellbaum. 2005. Wordnet and wordnets. In Keith Brown, editor, Encyclopedia of Language and Linguistics, pages 2–665. Elsevier.
- Ferrando and Voita (2024) Javier Ferrando and Elena Voita. 2024. Information flow routes: Automatically interpreting language models at scale. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17432–17445, Miami, Florida, USA. Association for Computational Linguistics.
- Gao et al. (2018) Ji Gao, Jack Lanchantin, Mary Lou Soffa, and Yanjun Qi. 2018. Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops (SPW), pages 50–56. IEEE.
- García-Carrasco et al. (2024a) Jorge García-Carrasco, Alejandro Maté, and Juan Trujillo. 2024a. Detecting and understanding vulnerabilities in language models via mechanistic interpretability. arXiv preprint arXiv:2407.19842.
- García-Carrasco et al. (2024b) Jorge García-Carrasco, Alejandro Maté, and Juan Carlos Trujillo. 2024b. How does gpt-2 predict acronyms? extracting and understanding a circuit via mechanistic interpretability. In International Conference on Artificial Intelligence and Statistics, pages 3322–3330. PMLR.
- Geva et al. (2021) Mor Geva, Roei Schuster, Jonathan Berant, and Omer Levy. 2021. Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5484–5495, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Gould et al. (2024) Rhys Gould, Euan Ong, George Ogden, and Arthur Conmy. 2024. Successor heads: Recurring, interpretable attention heads in the wild. In The Twelfth International Conference on Learning Representations.
- Greco et al. (2024) Candida Maria Greco, Lucio La Cava, and Andrea Tagarelli. 2024. Talking the talk does not entail walking the walk: On the limits of large language models in lexical entailment recognition. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 14991–15011, Miami, Florida, USA. Association for Computational Linguistics.
- Gurnee et al. (2024) Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, and Dimitris Bertsimas. 2024. Universal neurons in gpt2 language models. arXiv preprint arXiv:2401.12181.
- Hanna et al. (2024) Michael Hanna, Ollie Liu, and Alexandre Variengien. 2024. How does gpt-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model. Advances in Neural Information Processing Systems, 36.
- Hiraoka and Inui (2024) Tatsuya Hiraoka and Kentaro Inui. 2024. Repetition neurons: How do language models produce repetitions? arXiv preprint arXiv:2410.13497.
- Ji et al. (2021) Tuo Ji, Hang Yan, and Xipeng Qiu. 2021. SpellBERT: A lightweight pretrained model for Chinese spelling check. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3544–3551, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Kaplan et al. (2024) Guy Kaplan, Matanel Oren, Yuval Reif, and Roy Schwartz. 2024. From tokens to words: On the inner lexicon of llms. arXiv preprint arXiv:2410.05864.
- Lad et al. (2024) Vedang Lad, Wes Gurnee, and Max Tegmark. 2024. The remarkable robustness of LLMs: Stages of inference? In ICML 2024 Workshop on Mechanistic Interpretability.
- Li et al. (2019) J Li, S Ji, T Du, B Li, and T Wang. 2019. Textbugger: Generating adversarial text against real-world applications. In 26th Annual Network and Distributed System Security Symposium.
- Li et al. (2020) Xiangci Li, Hairong Liu, and Liang Huang. 2020. Context-aware stand-alone neural spelling correction. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 407–414, Online. Association for Computational Linguistics.
- McDougall et al. (2024) Callum Stuart McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, and Neel Nanda. 2024. Copy suppression: Comprehensively understanding a motif in language model attention heads. In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, pages 337–363, Miami, Florida, US. Association for Computational Linguistics.
- Mosbach et al. (2024) Marius Mosbach, Vagrant Gautam, Tomás Vergara Browne, Dietrich Klakow, and Mor Geva. 2024. From insights to actions: The impact of interpretability and analysis research on NLP. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 3078–3105, Miami, Florida, USA. Association for Computational Linguistics.
- nostalgebraist (2020) nostalgebraist. 2020. interpreting gpt: the logit lens. Accessed on Nov 27, 2024.
- Team et al. (2024) Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. 2024. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118.
- Tsuji et al. (2024) Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, and Tomoya Iwakura. 2024. Subregweigh: Effective and efficient annotation weighing with subword regularization. arXiv preprint arXiv:2409.06216.
- Vaswani (2017) A Vaswani. 2017. Attention is all you need. Advances in Neural Information Processing Systems.
- Voita et al. (2019) Elena Voita, David Talbot, Fedor Moiseev, Rico Sennrich, and Ivan Titov. 2019. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5797–5808, Florence, Italy. Association for Computational Linguistics.
- Wang et al. (2023) Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, et al. 2023. Decodingtrust: A comprehensive assessment of trustworthiness in gpt models. Advances in Neural Information Processing Systems, 36.
- Wang et al. (2021) Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, and Bo Li. 2021. Adversarial glue: A multi-task benchmark for robustness evaluation of language models. In Advances in Neural Information Processing Systems.
- Wang et al. (2024a) Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Wei Ye, Haojun Huang, Xiubo Geng, et al. 2024a. On the robustness of chatgpt: An adversarial and out-of-distribution perspective. Data Engineering, page 48.
- Wang et al. (2024b) Weichuan Wang, Zhaoyi Li, Defu Lian, Chen Ma, Linqi Song, and Ying Wei. 2024b. Mitigating the language mismatch and repetition issues in LLM-based machine translation via model editing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15681–15700, Miami, Florida, USA. Association for Computational Linguistics.
- Wang et al. (2024c) Weixuan Wang, Barry Haddow, Minghao Wu, Wei Peng, and Alexandra Birch. 2024c. Sharing matters: Analysing neurons across languages and tasks in llms. arXiv preprint arXiv:2406.09265.
- Wang et al. (2022) Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, and Juanzi Li. 2022. Finding skill neurons in pre-trained transformer-based language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11132–11152, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Yang et al. (2024) An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, et al. 2024. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115.
- Zheng and Saparov (2023) Hongyi Zheng and Abulhair Saparov. 2023. Noisy exemplars make large language models more robust: A domain-agnostic behavioral analysis. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4560–4568, Singapore. Association for Computational Linguistics.
- Zhu et al. (2023) Kaijie Zhu, Jindong Wang, Jiaheng Zhou, Zichen Wang, Hao Chen, Yidong Wang, Linyi Yang, Wei Ye, Yue Zhang, Neil Gong, et al. 2023. Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts. In Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis, pages 57–68.
- Zhuo et al. (2023) Terry Yue Zhuo, Zhuang Li, Yujin Huang, Fatemeh Shiri, Weiqing Wang, Gholamreza Haffari, and Yuan-Fang Li. 2023. On robustness of prompt-based semantic parsing with large pre-trained language model: An empirical study on codex. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 1090–1102, Dubrovnik, Croatia. Association for Computational Linguistics.
Appendix A Computing Environment
We used two NVIDIA A100 40GB GPUs for Gemma 2 and Llama 3.1 8B, one NVIDIA A100 80GB GPU for Qwen 2.5, and one NVIDIA RTX 3060 GPU for Llama 3.2 1B and 3B.
Appendix B Models Using the Same Tokenizer
Since LLMs using the same tokenizer share their vocabulary, the impact of typos could be similar. To compare LLMs using the same tokenizer under similar settings, we constructed datasets for such models so that they contain as many identical samples as possible.
Appendix C Typo Neurons for Many Typos


In §4.2, we reported the results with a small number of typos. Here, we describe the behavior of typo neurons when many typos are introduced. Since we are comparing a setting with a minimal number of typos against one with an unrealistically high number of typos, the behavior for real-world typos is expected to fall somewhere between them.
Figure 6 (upper) shows that the maximum value of $s_i$ increases across all models. This indicates that typo neurons respond more strongly as the number of typos increases. Since the average and standard deviation remain close to zero, most neurons still activate similarly to the clean-input case even in this setting.
For the Llama 3 family and Qwen 2.5, the proportion of typo neurons in the late layers increases further, while there are few typo neurons in the other layers. However, we extracted only the top 0.5% of neurons with the highest $s_i$ as typo neurons. Therefore, even if neurons in other layers are activated similarly to the minimal-typo setting, a significant increase in typo neuron activation in the late layers could cause a ranking inversion of $s_i$. As a result, some activated neurons may not be extracted as typo neurons.
To address this, we redefine typo neurons by extracting the neurons whose $s_i$ values are greater than the minimum $s_i$ of the typo neurons in the minimal-typo setting for each model. In other words, we extracted, as typo neurons, the neurons that activate equally to or more strongly than the typo neurons in the minimal-typo setting. Figure 7 shows the layer-wise distribution of typo neurons under this new criterion. It shows that while typo neurons increase in the late layers of the Llama 3 family and Qwen 2.5, they also increase significantly in the middle layers. For Gemma 2, the typo neurons in the early layers decrease, while those in the late layers increase, even in Figure 7. This suggests that both the early and late layers are responsible for recognizing local contexts and that the balance of responsibility between them can shift.
Appendix D Neuron Ablation for Other Models
Table 4: Accuracy of the Llama 3 family and Qwen 2.5 models on the word identification task when ablating random neurons or typo neurons, for clean inputs and typo inputs.

| Model / Ablation | Clean | Typo |
|---|---|---|
| Llama 3.2 1B | 1.00 | 0.69 |
| Random Neurons | 0.91 | 0.61 |
| Typo Neurons | 0.73 | 0.46 |
| Llama 3.2 3B | 1.00 | 0.90 |
| Random Neurons | 0.97 | 0.89 |
| Typo Neurons | 0.87 | 0.79 |
| Llama 3.1 8B | 1.00 | 0.94 |
| Random Neurons | 0.99 | 0.93 |
| Typo Neurons | 0.83 | 0.80 |
| Qwen 2.5 3B | 1.00 | 0.92 |
| Random Neurons | 0.99 | 0.91 |
| Typo Neurons | 0.84 | 0.71 |
| Qwen 2.5 7B | 1.00 | 0.92 |
| Random Neurons | 0.98 | 0.92 |
| Typo Neurons | 0.86 | 0.80 |
| Qwen 2.5 14B | 1.00 | 0.95 |
| Random Neurons | 0.99 | 0.94 |
| Typo Neurons | 0.92 | 0.82 |
| Qwen 2.5 32B | 1.00 | 0.96 |
| Random Neurons | 0.99 | 0.96 |
| Typo Neurons | 0.93 | 0.85 |
In §4.3.1, we reported the results for Gemma 2. Here, we examined the ablation study for typo neurons in the Llama 3 family and Qwen 2.5.
Table 4 shows that the results of the ablation study are consistent across models, although the typo neuron distributions differ. In all models, ablating random neurons hardly reduced accuracy on the typo dataset. In contrast, ablating typo neurons led to a drop in accuracy on both the clean and typo datasets. This indicates that typo neurons may not act exclusively on typos but could also play a crucial role in processing general grammatical or morphological features, regardless of the model.
Appendix E Typo Heads for Many Typos

Table 5: Average and standard deviation of the typo-head scores $r_h$ for each model when many typos are introduced.

| | Gemma 2 2B | Gemma 2 9B | Gemma 2 27B | Llama 3.2 1B | Llama 3.2 3B | Llama 3.1 8B | Qwen 2.5 3B | Qwen 2.5 7B | Qwen 2.5 14B | Qwen 2.5 32B |
|---|---|---|---|---|---|---|---|---|---|---|
| Average | -0.0295 | -0.0276 | -0.0221 | -0.0330 | -0.0295 | -0.0368 | -0.0347 | -0.0401 | -0.0343 | -0.0369 |
| SD | 0.0317 | 0.0335 | 0.0394 | 0.0442 | 0.0383 | 0.0398 | 0.0557 | 0.0434 | 0.0420 | 0.0452 |
Similar to Appendix C, while §5.2 reported the results with a small number of typos, here we describe the behavior of typo heads when many typos are introduced.
Table 5 shows that $r_h$ shifts significantly in the negative direction compared to the minimal-typo setting. The minimum values in Figure 8 also show this transition. Additionally, the increase in dark blue areas in Figure 8 indicates that more heads respond relatively strongly. However, the difference between the two settings is smaller for typo heads than for typo neurons.
Appendix F Typo Heads for Qwen 2.5 14B

Figure 9 shows the distribution of $r_h$ for Qwen 2.5 14B, which was not included in §5.2 and Appendix E due to space constraints. The results are consistent with those of the other models and model sizes: the initial layers contain few typo heads, and the distribution of typo heads is sparser than in smaller models.
Appendix G Head Ablation for Other Models
Table 6: Accuracy of the Llama 3 family and Qwen 2.5 models on the word identification task when ablating random heads or typo heads, for clean inputs and typo inputs.

| Model / Ablation | Clean | Typo |
|---|---|---|
| Llama 3.2 1B | 1.00 | 0.69 |
| Random Heads | 0.07 | 0.04 |
| Typo Heads | 0.00 | 0.00 |
| Llama 3.2 3B | 1.00 | 0.90 |
| Random Heads | 0.10 | 0.10 |
| Typo Heads | 0.18 | 0.17 |
| Llama 3.1 8B | 1.00 | 0.94 |
| Random Heads | 0.09 | 0.08 |
| Typo Heads | 0.10 | 0.09 |
| Qwen 2.5 3B | 1.00 | 0.92 |
| Random Heads | 0.97 | 0.88 |
| Typo Heads | 0.46 | 0.41 |
| Qwen 2.5 7B | 1.00 | 0.92 |
| Random Heads | 0.55 | 0.53 |
| Typo Heads | 0.39 | 0.37 |
| Qwen 2.5 14B | 1.00 | 0.95 |
| Random Heads | 0.09 | 0.09 |
| Typo Heads | 0.13 | 0.12 |
| Qwen 2.5 32B | 1.00 | 0.96 |
| Random Heads | 0.18 | 0.16 |
| Typo Heads | 0.15 | 0.15 |
Similar to Appendix D, we examined the ablation study for typo heads in the Llama 3 family and Qwen 2.5.
In Table 6, both ablations significantly degraded the models’ capability in the Llama 3 family, Qwen 2.5 14B, and Qwen 2.5 32B, making it difficult to determine the importance of typo heads. In contrast, in Qwen 2.5 3B and Qwen 2.5 7B, the ablation of typo heads decreases accuracy more than the ablation of random heads. Compared to §5.3.1, where the ablation of typo heads in the Gemma 2 9B model had little impact on accuracy, this suggests that typo heads remain important even in the mid-sized Qwen 2.5 models, which have few typo neurons and typo heads in the early layers.
Appendix H Visualization of Typo Heads.

Figure 10 shows the attention maps for each input, using the top 1.5% of heads with the highest absolute value of $r_h$ in Gemma 2 9B as typo heads.
The typo head in Layer 2 Head 11 recognizes sentence boundaries.
This head is not a head that contributes to typo-fixing but is damaged by typos. Our method has a limitation in that it cannot distinguish between heads that contribute to typo-fixing and those that are damaged by typos.
The typo head in Layer 5 Head 7 responds to semantic connections and fixes typos by leveraging synonyms.
This is a typical typo-fixing mechanism of early middle layers described above, which is a recognition of global contexts.
The typo head in Layer 30 Head 3 fixes typos by recognizing local contexts.
Additionally, most typo heads strongly attend to ‘<bos>’.
Appendix I Future Work
This paper focuses on the investigation of typo-related inner workings. Therefore, we do not provide any methods to improve LLM’s robustness against typos. However, our findings imply how to create more robust LLMs against typos.
Our findings indicate that typo neurons in the early or late layers of Transformer-based LLMs fix typos with local contexts, while typo neurons in the middle layers fix typos with global contexts. The model’s robustness against typos may be enhanced by a mechanism that gives more importance to nearby tokens in the early and late layers and to distant tokens in the middle layers.
Furthermore, the results of the ablation study show that typo-fixing is related to general grammatical or morphological recognition, which suggests that methods for improving general contextual recognition could contribute to typo robustness. For example, a potential research direction could be investigating how additional training on tasks such as grammatical error correction or determining whether a given subword is part of a specific word affects robustness against typos.