Language all the way down:
Linguistic structures in statistics education

Tess O’Brien

(October 2024)

Abstract

The ability to read, write, and speak mathematics is critical to students becoming comfortable with statistical models and skills. Faster development of those skills may act as encouragement to further engage with the discipline. Vocabulary has been the focus of scholarship in existing literature on the linguistics of mathematics and statistics but there are structures that go beyond the content of words and symbols. Here I introduce ideas for grammar and discourse features through a sequence of examples.

Acknowledgments

Thanks to Laura Le for directions in the literature and a keen eye on drafts, Jacqui Ramagge for pointers on the adjective-noun description in the work of Herb and Ken Gross, and Alberto Nettel-Aguirre and Elinor Jones for advice on drafts.

Introduction

When we teach statistics, we do so in the language of mathematics. When we construct models, communicate them, use them to make predictions, we do so in the language of mathematics. Our students struggle to learn that language because it is hard, and the linguistic structures are opaque even to the fluent. By making linguistic structures explicit we may be able to help students develop confidence and competence in using mathematical language to express their ideas.

That mathematics is a language is not a new idea. In many ways the revolution in proof theory during the 19th and into the 20th century was about the language of argument structure. There has been work since then in linguistic discourse theory such as [6], and in education research [1, 7], which has predominantly addressed the vocabulary used to pose mathematical problems to school learners. From statistics specifically, [8] addresses vocabulary in statistical English and how it interacts with ordinary and mathematical English. However, vocabulary is only part of a bigger picture of language structure. Recent developments in functional linguistics as applied to physics [3] and Piaget’s ideas about mechanisms of abstraction [2] have opened up some useful avenues for thinking about more general linguistic structures in maths, advancing preliminary ideas published in [5].

In this article I will introduce and apply ideas about language structures and processes in maths to some specific statistical examples. To get the ball rolling I start with a more theoretical example, then extend those ideas into specific situations that arise both at the introductory and advanced level of statistics. Finally I will point to linguistic choices that are made outside the level of a specific sentence or equation, when we use language to connect a statistical model to the system it represents.

Example: Linear Function

We start with an example that is based in theoretical maths in order to set up the linguistic tools we will use later in a more straightforward situation.

Example 1.

Let $f$ be a function from $\mathbb{R}$ to $\mathbb{R}$ given by

f(x)=2x+1.

(1)

The main linguistic structure I see here is a grammatical process moving between noun and verb in the use of $f$ . When $f$ is first named it appears as a noun in ‘Let $f$ be’. The second appearance (in the same sentence no less) is as a verb – $f$ -ing $x$ . Bob Carpenter of the Stan developer team pointed out that this is a use-mention distinction. Our introduction of $f$ as a noun is a performative linguistic act: a speech act that itself causes something to happen. Think of a promise, a declaration by an authority figure, or a variable initialised in code. ‘Let $f$ be a function’ instantiates the mathematical object called $f$ in the minds of the readers and ascribes to it the structures of a function. We do similar things in statistics when we introduce a random variable, as we shall see in the next example.
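The comparison with a variable initialised in code can be made concrete. The following is a minimal sketch, assuming Python as the illustration language (the argument itself is not tied to any programming language). The `def` statement plays the role of ‘Let $f$ be’, after which `f` can either be mentioned as an object or used by applying it to an argument.

```python
def f(x):
    """The function from Example 1: f(x) = 2x + 1."""
    return 2 * x + 1


# Noun: f is mentioned as an object. We can ask what kind of thing it is
# and what it is called, without applying it to anything.
is_function = callable(f)
name_of_f = f.__name__

# Verb: f is used, "f-ing" the argument 3.
value = f(3)  # 2 * 3 + 1 = 7
```

The same symbol `f` appears in both roles, and only the presence of the parenthesised argument tells the reader which one is in play.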

In linguistics, we put words into word classes that describe their function in a sentence. You are likely familiar with nouns and verbs, for example. Some words move between word classes as their function changes in different contexts. The process of moving between word classes is extremely common in maths education because we tend to start by using something as a noun modifier (adjective) or operation (verb) then turn it into a noun to talk about it more. In linguistics, a word becoming a noun is nominalisation and happens commonly to verbs and adjectives. [5] points to numbers as adjectives in a construction like counting objects in the world, where the number describes the noun: five chairs, three plates, 15 people. Numbers then get nominalised when we start talking about them as mathematical objects in the abstract. In Example 1 we are moving between word classes for the function, but in the reverse direction to becoming a noun. We start with naming the object $f$ and the properties of being a function and then use $f$ to do something when we apply it to $x$ .

More generally, we can associate the grammatical process of nominalisation in maths with moving between levels of abstraction. The maths education literature on abstraction is extensive and typically connected to the process of learning maths through the work of Jean Piaget. I think the most relevant ideas are those in the Piaget process of reflective abstraction [2]. If we consider the use of a thing (verb form of the function, adjective form of the number) to be a lower level of abstraction compared to what that thing becomes post-nominalisation (function, number as nouns), we can align the grammatical process with learning the next level in mathematical structure. Once a number has been nominalised, we start talking about its properties. Once we nominalise equations as functions we can look at function properties like continuity.

In Equation (1) we also see an example of 2 going the opposite direction to nominalisation, returning to adjective form. The construction of $2x$ as a noun group (multiple words combined into a single unit that acts as a name) uses 2 as an adjective describing the noun $x$ . Herb and Ken Gross are credited with an adjective-noun description of these algebraic objects many years ago, and that description has inspired a great many people, as evidenced by the material on the now-defunct website adjectivenounmath.com [4]. However, I disagree with their claim that arithmetic alone without variables, such as $2+1=3$ , has a hidden common noun that the numbers are attached to as adjectives. Instead I think that 1, 2, and 3 are present as nouns from the nominalisation process, and it is plausible that one of the difficulties of introducing the adjective-noun algebraic construction is undoing the nominalisation process. We go from $2\times x$ where both 2 and $x$ are present as nouns to $2x$ – the adjective-noun noun group construction that has been transformed grammatically and mathematically by the now hidden multiplication process. We still have 1 running around as a noun though, so the student has to hold both grammatical forms in play simultaneously.

In the first example we have introduced some linguistic terminology in relation to a simple function definition. From here we are going to move into a far more complicated function that is more directly relevant to teaching statistics – the normal distribution cumulative distribution function.

Example: Cumulative Distribution Function

The introduction of cumulative distribution functions is difficult for students but serves as a case study where applying the above framework for grammatical structure and processes can be useful.

Example 2.

Let $X$ be normally distributed with mean $\mu$ and variance $\sigma^{2}$ . We define the cumulative distribution function $F_{X}$ to be

F_{X}(x)=\int_{-\infty}^{x}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\bigg(-\frac{(t-\mu)^{2}}{2\sigma^{2}}\bigg)\,dt.

(2)

We then use $F_{X}$ to calculate the probability of $X$ taking values in an interval as

F_{X}(x)=P(X\leq x),\text{ hence }P(a\leq X\leq b)=F_{X}(b)-F_{X}(a).

As in the first example, our naming of the cumulative distribution function $F_{X}$ is instantiation as a noun, while the use of it to actually extract probabilities is a verb applied to different nouns here. We are also using a probability function directly in $P$ . In the expression of how to calculate $F_{X}(x)$ we use other functions such as exponentiation and square root, as well as integration but these appear as verbs only as we are assuming those to be introduced by name elsewhere. We also see the performative action of introducing a random variable $X$ and ascribing structure to it, in this case having a normal distribution.
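The noun and verb roles of $F_{X}$ can be illustrated in code. This is a minimal sketch, assuming Python and using the standard identity that expresses the normal CDF through the error function `math.erf` instead of evaluating the integral in Equation (2) directly. The names `normal_cdf` and `interval_probability` are my own choices for illustration.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) = P(X <= x) for X normal with mean mu and variance sigma**2.

    Uses the identity F_X(x) = (1 + erf((x - mu) / (sigma * sqrt(2)))) / 2.
    """
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_probability(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) = F_X(b) - F_X(a)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Naming the CDF (noun), then using it on different arguments (verb).
p_within_one_sd = interval_probability(-1.0, 1.0)
p_same_interval_rescaled = interval_probability(90.0, 110.0, mu=100.0, sigma=10.0)
```

Both calls describe the interval within one standard deviation of the mean, so they should agree up to floating-point error, at roughly 0.68.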

The formula in Equation (2) uses a collection of notation for different verbs and nouns which is worth unpacking to examine as a 2-dimensional pictogram expression of the mathematical language. The integral operation (including $dt$ ) is written in a non-linear fashion and breaks the line flow for the limits. We assume that students will know exp is the name of a single function and $dt$ is not treated as a function at all. Elsewhere we use single letters for other functions that get named such as $P$ and $F$ , and reserve combinations of letters for implied multiplication as in $2\pi\sigma^{2}$ . In $2\pi\sigma^{2}$ all of the pieces being multiplied ( $2$ , $\pi$ , and $\sigma^{2}$ ) are behaving as nouns in a noun group because they’re all constants rather than a combination of a constant and a variable as we saw in Equation (1) with $2x$ .

Among the symbols in Equation (2) are Latin letters for the functions (verbs) exp, $P$ , and $F_{X}$ but also the variable (noun) $t$ . We have Greek letters for some constants that are nouns, and 2 also shows up. Other mathematical symbols are present for arithmetic in $-$ and the fraction bar, which are verbs. Powers are notated by position and size of a number in the squares, so the verb identification is based on relative position in two dimensions in that case, but the square root function is given its own symbol.

Among the verbs there are multiple ways of relating a function or operation to its arguments which are the associated nouns. Some things such as $F$ and exp use parentheses to enclose arguments. The integral operator has limits given above and below (which are part of the arguments) while the name of the integrating variable $t$ is only known from its inclusion in the operator symbol $dt$ that indicates the end of the function being integrated. At least, $dt$ does not have further meaning when we first teach students integration. The square root uses the line over its argument to show what it is applied to while squaring uses parentheses where multiple symbols are involved. Subtraction takes arguments on each side – a status really only given to arithmetic operations – but we duplicate the subtraction symbol to use as multiplication by $-1$ which applies to the entire fraction that follows. Understandably, students struggle to learn these conventions.
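One way to see how many distinct conventions are packed into Equation (2) is to rewrite it in a notation where every verb takes its arguments the same way. The sketch below is illustrative only and assumes Python. It replaces the lower limit $-\infty$ with $\mu-10\sigma$ and approximates the integral with a trapezoid rule, and the helper names `density`, `integrate`, and `cdf_by_integration` are hypothetical.

```python
import math


def density(t, mu, sigma):
    """The integrand of Equation (2), with every operation written as a call
    or an explicit infix operator."""
    scale = 1 / math.sqrt(2 * math.pi * sigma ** 2)
    return scale * math.exp(-((t - mu) ** 2) / (2 * sigma ** 2))


def integrate(func, lower, upper, steps=20000):
    """Trapezoid-rule approximation to the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


def cdf_by_integration(x, mu=0.0, sigma=1.0):
    """Approximate F_X(x); the lower limit mu - 10*sigma stands in for -infinity."""
    lower = mu - 10 * sigma
    if x <= lower:
        return 0.0
    # The integrating variable t is named explicitly as the lambda's argument,
    # which is the job that dt does in the written equation.
    return integrate(lambda t: density(t, mu, sigma), lower, x)


half = cdf_by_integration(0.0)
```

Here the limits, the integrand, and the integrating variable are all passed in the same way, whereas the written equation spreads them across positions above, below, and after the integral sign.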

In the normal distribution CDF we see much more complicated grammatical structure through the construction of Equation (2) when compared to Equation (1). Some similar relationships are at play such as constructing noun groups through implied multiplication, but the introduction of more functions acting as verbs on their arguments makes the pictogram harder to deconstruct.

In the next example we will move to multiple linear regression, which behaves more like Equation (1) in the underlying structure, but has additional features such as the combination of multiple terms.

Example: Multiple Linear Regression

In this example we will present the regression equation in two forms: first with matrix notation and then with explicit arithmetic. The choice of form is meaningful as we shall see.

Example 3.

Consider data with variables $Y$ , $X_{1}$ , and $X_{2}$ , transformed variable $X_{1}^{2}$ , and interaction term $X_{1}X_{2}$ under the linear model

Y=X\beta+\epsilon.

(3)

We can also express Equation (3) with direct reference to data as

y_{i}=\beta_{0}+\beta_{1}x_{1i}+\beta_{2}x_{2i}+\beta_{3}x_{1i}^{2}+\beta_{4}x_{1i}x_{2i}+e_{i}.

(4)

The expression in Equation (3) has a near-identical construction to Equation (1), but we have changed the meaning by sneakily swapping to matrix rather than number arithmetic and changed the order of the multiplication. We can still read $Y$ , $X$ , and $\epsilon$ as nouns, $\beta$ as an adjective (as distinct from the implied multiplication), and $+$ as a verb, but there is a considerably greater amount of information tied up in the symbols compared to Equation (1). The neutron-star-like density of information in Equation (3) is precisely why we don’t require any more symbols, but it comes at the cost of making structure implicit. There is also meta-textual information. I gave a talk about preliminary ideas on language structure at the University of Leiden, and an audience member pointed out that our choice of matrix form is intended to communicate the sort of arguments we use for proofs and calculations. Matrix expressions align with matrix operations, for example. A classic case of the medium being the message.

Equation (4) is a more explicit expression of the same relationship as Equation (3), but in lowering the density we have to lean on many more symbols being present to encode the linear structure directly. Writing out the arithmetic lets us talk much more directly about the variables, and allows for much easier tracking of transformations and interactions. We can leverage the grammatical construction of adjective-noun groups from the first example too. The terms in Equation (4) like $\beta_{1}x_{1i}$ are such constructions, and get combined with the verb $+$ . We still have $\beta_{0}$ running around as a noun on its own as well, analogous to the presence of the constant 1 in Equation (1).
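The claim that Equations (3) and (4) express the same relationship can be checked numerically. This is a minimal sketch, assuming Python with plain lists, made-up data values and coefficients, and the error term left out so that only the deterministic part is compared. The names `design_row`, `matrix_form`, and `explicit_form` are hypothetical.

```python
# Illustrative values only; these are not data from any real study.
x1 = [0.5, 1.0, 2.0]
x2 = [1.0, -1.0, 3.0]
beta = [1.0, 2.0, -0.5, 0.25, 1.5]  # beta_0, ..., beta_4


def design_row(x1i, x2i):
    """One row of X: intercept, x1, x2, transformed x1^2, interaction x1*x2."""
    return [1.0, x1i, x2i, x1i ** 2, x1i * x2i]


def matrix_form(X, beta):
    """The product X beta from Equation (3), one dot product per row."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


def explicit_form(x1i, x2i, beta):
    """The right-hand side of Equation (4) without the error term."""
    b0, b1, b2, b3, b4 = beta
    return b0 + b1 * x1i + b2 * x2i + b3 * x1i ** 2 + b4 * x1i * x2i


X = [design_row(a, b) for a, b in zip(x1, x2)]
from_matrix = matrix_form(X, beta)
from_arithmetic = [explicit_form(a, b, beta) for a, b in zip(x1, x2)]
```

The transformation and the interaction are visible in `explicit_form`, while in `matrix_form` they have been absorbed into how the columns of `X` were built, which mirrors the trade between density and explicit structure described above.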

Equations (3) and (4), combined with the initial naming of the variables and statement that they have a linear relationship, give us three different representations of the same underlying structure. There are differing levels of abstraction depending on how explicit the number arithmetic is made, with the term ‘linear model’ being the most abstract.

We can see the linguistic choices in notation in this example, and how those relate to linguistic structure. We also get to see different levels of abstraction directly linked to how explicit or implicit the mathematical structure is for the particular model. As we use linear regressions to represent systems of interest in the world, it’s worth looking at how that relationship works from the perspective of language.

Discourse and Analogy

One of the biggest conceptual struggles for students is the relationship between a statistical model that lives as a mathematical object in our heads, and the real world system we are investigating. From the linguistics perspective we leverage analogy to connect the two. Our argument for why we can use a particular mathematical object comes down to whether we believe there’s enough similarity to the behaviour of the system we care about to be useful. We rarely make this discursive mechanism explicit; the closest we get is checking assumptions in data or talking about data types. Students struggle to recognise that they can and should think about the structure of mathematical objects deeply, and consider if those objects behave in a suitable way for their particular use case. Making the analogy explicit serves a double purpose as it also shows the less mathematically inclined students why they should seek to develop a thorough understanding of the abstract objects we use.

In the classroom

The core argument of this article is that making linguistic structure clear to students may help them leverage their existing language skills to understand mathematical objects and representations. I have explicitly used this in my classes and found it useful, but there is no literature testing the results as the linguistic structures I am discussing are not part of existing theory. However, I can offer some suggestions for how to introduce the specific structure I discuss.

The main structure I talk about in this paper is grammatical. First, that we can use ideas like nouns and verbs to interpret what is going on in a symbolic expression, and second that there are grammatical processes at play that move specific symbols between word classes in order to convey different information. Demonstrating this with explicit examples is key because it is difficult to conceptualise in the abstract. As seen in Example 1, these do not need to be very complicated to get started and I have found them particularly useful for non-mathematics students who are not exposed to enough maths to develop fluency otherwise. Those ideas are applicable to more complicated cases such as Example 2 which is more relevant for a mathematics student because they will see a lot of complicated pictograms of notation.

The secondary structure of analogy that occurs at the level of discourse is less explicit in examples, and more about introducing statistics students to the idea that they are communicating with others when they build statistical models. I have found explicit reference to analogy useful to demystify the relationship between the mathematical object that is the model itself and the system that we care about. Describing this structure as an analogy immediately opens up questions about justifying the use of that object, evidence, and how that relates to the data and measurement process. The idea of an analogy can be incorporated into wider efforts to teach statistical report writing, and I find it helps explain to students why we want them to talk about model assumptions in particular.

Discussion and Conclusions

In this article I have shown a few linguistic structures that are at play when we use mathematical language in statistics. These structures exist at multiple levels, from the relationships between symbols in an equation to the choices we make about models. Making these structures explicit may help students leverage their existing language skills including the basic grammar they learn in school, and help them to see the process of learning to work with the mathematical objects we use as a process of learning the language of mathematics.

At the level of static symbols we have pictograms such as Equation (2), where a complicated web of syntactic relationships – many implicit and based on relative position – presents a major barrier to learners. Equipping students with the tools to dissect these expressions as a construction in language may help to relieve the anxiety of being confronted with such a mess. Equation (4) leverages fewer grammatical tools, but relies on the recognition of noun groups connected up by verbs in order to separate out the pieces.

Grammatical processes come into play in more dynamic situations across the three examples. Moving a particular symbol between word classes is key to how we first introduce, and then use, a mathematical object such as a function. Students really struggle to recognise that this is what they are doing when they write and use equations, and can get confused about the role symbols play as a result. The reverse process, nominalisation, plays a key role in how students are first introduced to many different mathematical objects, and I suspect it is a key barrier to developing mathematical fluency.

At the level of discourse we leverage analogy across statistics every time we use a mathematical object to represent a system we are interested in. These discursive mechanisms are almost never discussed with students which can leave them in the dark about why they need to be careful with the mathematical objects they choose. We have to make the argument that a model works as an analogy, and that is much easier for students to do when they know that they’re being asked to.

There is much work yet to be done on linguistic structure in mathematics and statistics. However, in the absence of a robust theory we can leverage observed structures to make it easier for students to be aware of what they are doing, and make better choices about how they use mathematical expressions to represent statistical models.

References

[1] F. Bruun, J. M. Diaz, and V. J. Dykes, The language of mathematics, Teaching Children Mathematics, 21 (2015), pp. 530–536.
[2] P. C. Dawkins, A. J. Hackenberg, and A. Norton, Piaget’s genetic epistemology for mathematics education research, Springer, 2024.
[3] Y. J. Doran, The discourse of physics: Building knowledge through language, mathematics and image, Routledge, 2017.
[4] H. Gross and K. Gross, Mathematics as a second language, 2012. Accessed via WebArchive https://web.archive.org/web/20120122070732/http://www.adjectivenounmath.com/index.html on 4th September 2024.
[5] T. O’Brien, Maths is a language, Australian Mathematical Society Gazette, 50 (2023), pp. 122–128.
[6] K. O’Halloran, Mathematical discourse: Language, symbolism and visual images, Continuum, 2004.
[7] S. R. Powell, S. E. Bos, and X. Lin, The assessment of mathematics vocabulary in the elementary and middle school grades, Diversity dimensions in mathematics and language learning, 24 (2021), pp. 313–330.
[8] M. Rangecroft, The language of statistics, Teaching Statistics, 24 (2002).
