
An enriched category theory of language: from syntax to semantics

Tai-Danae Bradley (X, The Moonshot Factory and Sandbox@Alphabet, Mountain View, CA; [email protected]), John Terilla (The City University of New York and Tunnel, New York, NY; [email protected]), and Yiannis Vlassopoulos (Tunnel, New York, NY; [email protected])
Abstract.

State of the art language models return a natural language text continuation from any piece of input text. This ability to generate coherent text extensions implies significant sophistication, including a knowledge of grammar and semantics. In this paper, we propose a mathematical framework for passing from probability distributions on extensions of given texts, such as the ones learned by today’s large language models, to an enriched category containing semantic information. Roughly speaking, we model probability distributions on texts as a category enriched over the unit interval. Objects of this category are expressions in language, and hom objects are conditional probabilities that one expression is an extension of another. This category is syntactical—it describes what goes with what. Then, via the Yoneda embedding, we pass to the enriched category of unit interval-valued copresheaves on this syntactical category. This category of enriched copresheaves is semantic—it is where we find meaning, logical operations such as entailment, and the building blocks for more elaborate semantic concepts.

1. Introduction

The world’s best large language models (LLMs) have recently attained new levels of sophistication by effectively learning a probability distribution on possible continuations of a given text. Interactively, one can input prefix text and then sample repeatedly from a next-word distribution to generate original, high quality texts [VSP+17, RNSS18, RWC+18, B+20]. Intuitively, the ability to continue a story implies a great deal of sophistication. A grammatically correct continuation requires a mastery of syntax, careful pronoun matching, part-of-speech awareness, a sense of tense, and much more. A language model that effectively learns a probability distribution on possible continuations must apparently also have learned some semantic knowledge. For the continuation of a story to be reasonable and internally consistent requires knowledge of the world: dogs are animals that bark, golf is played outdoors during the day, Tuesday is the day after Monday, etc. What is striking is that these LLMs can be trained, using unlabeled samples of text, simply to predict a next word. No grammatical or semantic input is provided; nevertheless, complex syntactic structures, semantic information, and world knowledge are learned and demonstrated. The present work is a response to the real-world evidence that it is possible to pass from probability distributions on text continuations to semantic information. We propose a mathematical framework for this passage.

We define a syntax category, which is a category enriched over the unit interval $[0,1]$, that models probability distributions on text continuations [BV20]. Our semantic category is then defined to be the enriched category of $[0,1]$-valued copresheaves on the syntax category. The Yoneda embedding, which embeds the syntax category as a subcategory of the semantic category, assigns to a given text its representable copresheaf. We regard the $[0,1]$-valued copresheaf represented by a text as the meaning of the text, as in dynamic semantics [NBvEV16], whose slogan is “meaning is context change potential.” Furthermore, there are categorical operations in the semantic category that allow one to combine meanings in ways that correspond to certain logical operations. In particular, there is a kind of context-sensitive implication that models whether a certain text is true, given that another text is true.

The paper is organized as follows. Section 1.1 motivates the category theoretical approach to language by contrasting it with an algebro-geometric perspective and describing a few advantages with explicit examples in Section 1.2. As noted there, a primary advantage is the ability to marry both the compositional and distributional structures of language in a principled way, a claim fully developed in Section 2. There, a few basic definitions from enriched category theory are recalled before defining the language syntax category $\mathcal{L}$ as a category enriched over $[0,1]$. We then review more enriched category theory and prove the results we need in the special case that the enriching category is the unit interval. This sets the stage to pass to our semantic category of $[0,1]$-valued copresheaves on $\mathcal{L}$ in Section 3. Section 4 details operations on enriched copresheaves that are akin to conjunction, disjunction, and implication between meanings of expressions in language. Section 5 makes use of an isomorphism between $[0,1]$ and the set of nonnegative extended reals $[0,\infty]$ to recast our ideas in the language of generalized metric spaces and tropical geometry. Finally, a conclusion and summary is provided in Section 6.

The intended audience for this work includes mathematicians as well as mathematical data scientists who may be intrigued by recent advances in deep learning and natural language, as well as researchers interested in interpretability. Some of the content may be helpful to philosophers concerned with reasoning, artificial intelligence, and logic in both abstract theory and applications. We do assume the reader is familiar with basic concepts in category theory: categories, functors, natural transformations, limits, and colimits. Excellent introductions to the subject are readily available, including [Rie17, Lei14, FS19, LS09]. We do not assume familiarity with enriched category theory and introduce definitions and prove results as needed. For a gentle introduction to the topic, see [FS19, Chapter 2] as well as [Kel82] for a more formal treatment. For a historical background of statistical language models and common techniques, see the survey in [JX19], and for nontechnical discussions of the current best-in-class LLMs see [Met20, Hea20, Hao21].

1.1. Compositionality

Language is a compositional structure: expressions can be combined to make longer expressions. This is not a new idea. Theoretical linguists have studied grammatical structures for a long time — think, for example, of the richly developed field of formal grammars [Wik21]. Focusing on compositionality, one may consider language as some kind of algebraic structure, perhaps simply as the free monoid on some set of atomic symbols quotiented by the ideal of all expressions that are not grammatically correct; or, perhaps, as something more operadic, resembling the tree-like structures of parsed texts, labelled with classes that can be assembled by some set of tree-grafting rules. In either case, the algebraist might identify the meaning of a word, such as “red,” with the principal ideal of all expressions that contain “red.” The concept of red, then, is the ideal containing red ruby, bright red ruby, red hot chili peppers, red rover, blood red, red blooded, the workers’ and peasants’ red army, red meat, red firetruck,… If one knows every expression that contains a word, then, as the thinking goes, one understands the meaning of that word. One is then led to a geometric picture by an algebro-geometric analogy: construct a space whose points are ideals and view the algebraic structure as a sort of coordinate algebra on the space. From this perspective, language is a coordinate algebra on a space of meanings. This geometric picture points in appealing directions. If one can say the same kinds of things in different languages, then those languages should be regarded as having comparable ideal structures (or possibly equivalent categories of modules), translating between different languages involves a change of coordinates on the space of meanings, and so on. Going further, one has a way to model semantics via sheaves on this space of meanings, as Lawvere did in his classic 1969 paper [Law69] about syntax-semantics duality.

We think this algebro-geometric picture is inspirational, but incomplete. Here, we furnish it with an important refinement. The compositional structure of language is only part of the structure that exists in language. What is missing is the distribution of meaningful texts. What has been, and what will be, said and written is crucially important. When one encounters the word “firetruck” it’s relevant that “red firetruck” has been observed more often than “green firetruck.” The fact that “red firetruck” is observed more often than “red idea” contributes to the meaning of red. Encountering an unlikely expression like “red idea” involves a shift in expectation that demands attention and contributes to the meaning of the broader context containing it. Again, the idea that distributional structure is important in language is not new. The so-called distributional hypothesis [Har54] in linguistics states that linguistic items with similar distributions have similar meanings. In machine learning, word frequency counts have been employed to automatically extract knowledge from a text corpus in Latent Semantic Analysis and vector models of meaning [TP10].

So, the ideals mentioned above are just a first approximation for the space of meanings. In this paper, we formalize a marriage of the compositional structure of language with the distributional structure as a category enriched over the unit interval, which we denote by $\mathcal{L}$ and call the language syntax category. Then, once we have defined this language syntax category, we pass to the category of enriched copresheaves on $\mathcal{L}$, the objects of which are unit-interval-valued functions on $\mathcal{L}$ satisfying a certain monotonicity condition. The category $\widehat{\mathcal{L}}$ of enriched copresheaves, which is itself a category enriched over the unit interval, is the semantic category. Meanings reside in this semantic category, which also admits ways for meanings to be manipulated and combined to form higher semantic concepts.

Before we discuss why we are using category theory, let us comment briefly on two other efforts to study natural language using categorical methods. The DisCoCat program of [CSC10] seeks to combine compositional and distributional structures in a single categorical framework, though a choice of grammar is needed as input. In our work, syntactical structures are inherent within the enriched category theory and no grammatical input is required. This is motivated by the observation that LLMs can continue texts in a grammatically correct way without the additional input of a grammatical structure, suggesting that all important grammatical information is already contained in the enriched category. In short, the DisCoCat program aims to attribute meaning to parts of texts and grammatical rules for combining them to build meaning of larger texts. This is like the reverse of our work, which asserts that the meaning of small texts is derived from the distribution of larger texts that contain them. In [AS14], Abramsky and Sadrzadeh consider a sheaf theoretic framework for studying language. That work makes some interesting use of the gluing condition for sheaves. Here, we work with copresheaves, without any kind of gluing conditions, but the most important difference is that Abramsky and Sadrzadeh are not using the distributional structure that we are focused on here.

1.2. Why category theory?

The algebraic perspective of viewing ideals as a proxy for meaning is consistent with a category theoretical perspective, and the latter provides a better setting in which to merge the compositional and distributional structures of language. But even before adding distributional structures, moving from an algebraic to a categorical perspective provides certain conceptual advantages that we highlight first.

Consider the category whose objects are elements of the free monoid on some finite set of atomic symbols, where there is a morphism $x\to y$ whenever $x$ is a substring of $y$, that is, when the expression $y$ is a continuation of $x$. If the finite set is taken to be a set of English words, for example, then there are morphisms red $\to$ red firetruck and red $\to$ bright red ruby and so on. Each string is a substring of itself, providing the identity morphisms, and composite morphisms are provided by transitivity: if $x$ is a substring of $y$ and $y$ is a substring of $z$, then $x$ is a substring of $z$. This category is simple to visualize: it is thin, which is to say there is at most one morphism between any two objects, and so one might have in mind a picture like that in Figure 1. A consequence of the Yoneda lemma is that a fixed object in this category is determined up to unique isomorphism by the totality of its relationships to all other objects in the category. One thus thinks of identifying an expression $x$ with the functor $h^{x}:=\hom(x,-)$ whose value on an expression $y$ is the one-point set $\ast$ if $x$ is contained in $y$ and is the empty set otherwise. The functor $h^{x}$ is an example of a copresheaf, the name given to a functor from a given category to the category of sets, and is in fact a representable copresheaf, represented by the object $x$. So in this way the preimage of $\ast$ is precisely the principal ideal generated by $x$. Representable copresheaves, therefore, are comparable to principal ideals, and both can be thought of as a first approximation for the meaning of an expression.

Figure 1. A category with at most one morphism between any two objects. Here the objects are expressions in a language and morphisms indicate when one expression is a continuation of another. Identity morphisms are not pictured, but are understood to be present.

1.3. Constructions in the unenriched setting

An immediate advantage to the shift in perspective is that the functor category of copresheaves $\widehat{\mathsf{C}}:=\mathsf{Set}^{\mathsf{C}}$ on any small category $\mathsf{C}$ is better behaved than $\mathsf{C}$ itself. It has all limits and colimits and is Cartesian closed. In fact, the category of copresheaves is a topos, which is known to be the appropriate setting for intuitionistic logic [MM12]. The full theory of toposes is not incorporated in this work, but intuition abounds when one takes inspiration from the field. Importantly, one has concrete operations for building new copresheaves from representable ones, suggestive of the “meanings are composable instructions” perspective of internalist semantics [Pie18].

1.3.1. Coproducts

In the algebraic setting, the union of ideals is not an ideal, and so one is left to wonder what algebraic structure might represent the disjunction of two concepts, say “red” or “blue.” In the categorical setting, the answer is straightforward. The coproduct of copresheaves is again a copresheaf. Given any objects $x$ and $y$ in a small category $\mathsf{C}$, the coproduct of representable copresheaves $h^{x}\sqcup h^{y}\colon\mathsf{C}\to\mathsf{Set}$ is computed “pointwise.” So suppose $\mathsf{C}$ is the category of expressions in language defined above, and let $x=\textit{red}$ and $y=\textit{blue}$. The coproduct $h^{\textit{red}}\sqcup h^{\textit{blue}}$ therefore maps an expression $c$ to the set $h^{\textit{red}}(c)\sqcup h^{\textit{blue}}(c)$. This set is isomorphic to $\ast$ if $c$ contains either red or blue, it is isomorphic to a two-point set if $c$ contains both, and otherwise it is the empty set. The output of the functor $h^{\textit{red}}\sqcup h^{\textit{blue}}$ is therefore nonempty on the union of all expressions that contain red or blue or both, and this matches well with the role of union as logical “or” among sets.

1.3.2. Products

Now think about limits, and in particular products. Like the coproduct, the product of representable copresheaves $h^{x}\times h^{y}\colon\mathsf{C}\to\mathsf{Set}$ is again a copresheaf computed pointwise. So if $\mathsf{C}$ is the category of language and $x=\textit{red}$ and $y=\textit{blue}$, then the value of the product on any expression $c$ is given by $h^{\textit{red}}(c)\times h^{\textit{blue}}(c)$, which is isomorphic to $\ast$ if $c$ contains both red and blue and is the empty set otherwise. So the output of the functor $h^{\textit{red}}\times h^{\textit{blue}}$ is nonempty on the intersection of expressions that contain red with those that contain blue, and this coincides with the role of intersection as logical “and” among sets.
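
The pointwise descriptions in Sections 1.3.1 and 1.3.2 are easy to simulate. Below is a minimal sketch in Python, under the assumption of a small hypothetical corpus, modeling each copresheaf by its support (a boolean predicate on expressions); the coproduct and product then become pointwise “or” and “and”.

# Toy model of unenriched copresheaves, assuming a small hypothetical corpus.
# A copresheaf is represented by its support: a predicate that is True on an
# expression exactly when the copresheaf's value there is nonempty.

corpus = ["red firetruck", "bright red ruby", "blue sky",
          "red and blue flag", "green idea"]

def h(x):
    """Representable copresheaf h^x as a predicate: True iff x occurs in c."""
    return lambda c: x in c

def coproduct(f, g):  # pointwise disjunction (Section 1.3.1)
    return lambda c: f(c) or g(c)

def product(f, g):    # pointwise conjunction (Section 1.3.2)
    return lambda c: f(c) and g(c)

h_red, h_blue = h("red"), h("blue")
print([c for c in corpus if coproduct(h_red, h_blue)(c)])
# ['red firetruck', 'bright red ruby', 'blue sky', 'red and blue flag']
print([c for c in corpus if product(h_red, h_blue)(c)])
# ['red and blue flag']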

1.3.3. Cartesian closure

Advantages of the categorical perspective can be further illustrated with a third example, which comes from the Cartesian closure of the category of copresheaves $\widehat{\mathsf{C}}$ on a small category $\mathsf{C}$. To say that $\widehat{\mathsf{C}}$ is Cartesian closed means that it has finite products and, moreover, that the product of copresheaves has a right adjoint (see Section I.6 of [MM92]). More precisely, there is a bifunctor $[-,-]\colon\widehat{\mathsf{C}}\times\widehat{\mathsf{C}}\to\widehat{\mathsf{C}}$ called the internal hom that fits into an isomorphism $\widehat{\mathsf{C}}(F\times G,H)\cong\widehat{\mathsf{C}}(F,[G,H])$ for all copresheaves $F$, $G$, and $H$. Here and throughout we use the notation $\mathsf{A}(x,y)$ to denote the set of morphisms from an object $x$ to an object $y$ in a category $\mathsf{A}$. The relevance of this new copresheaf $[G,H]$ can be seen in its kinship to implication in logic. Indeed, in intuitionistic propositional calculus one works within a certain kind of thin Cartesian closed category called a Heyting algebra. The usual notation in this setting is to write $x\leq y$ if there is a morphism $x\to y$. The categorical product is denoted by $\wedge$ and the internal hom is denoted by $\Rightarrow$. So, in this notation, one has $x\wedge y\leq z$ if and only if $x\leq(y\Rightarrow z)$ for all elements $x,y,z$. Keep this analogy in mind as we dissect an example of the internal hom in $\widehat{\mathsf{C}}$ when $\mathsf{C}$ is the category of expressions in language. Let’s do some unwinding to see that the copresheaf

$[h^{\textit{red}},h^{\textit{blue}}]\colon\mathsf{C}\to\mathsf{Set}$

captures something like an implication red $\Rightarrow$ blue. First use the Yoneda lemma and then the defining property of the internal hom to get

(1) $[h^{\textit{red}},h^{\textit{blue}}](c)=\widehat{\mathsf{C}}(h^{c},[h^{\textit{red}},h^{\textit{blue}}])=\widehat{\mathsf{C}}(h^{c}\times h^{\textit{red}},h^{\textit{blue}}).$

So, the internal hom assigns to an expression $c$ the set of natural transformations from the functor $h^{c}\times h^{\textit{red}}$ to the functor $h^{\textit{blue}}$. The data of a single natural transformation from $h^{c}\times h^{\textit{red}}$ to $h^{\textit{blue}}$ consists of a collection of set functions

(2) $\left\{(h^{c}\times h^{\textit{red}})(d)\to h^{\textit{blue}}(d)\right\}_{d\in\mathrm{ob}(\mathsf{C})}$

that fit into a particular commutative square. Here, the domain $(h^{c}\times h^{\textit{red}})(d)$ and the codomain $h^{\textit{blue}}(d)$ are either empty or a singleton, and so the data of a natural transformation either does not exist or is uniquely specified and automatically fits into the required commutative square. Therefore, $[h^{\textit{red}},h^{\textit{blue}}](c)$ is either the empty set $\emptyset$ or the one-point set $\ast$.

To determine whether $[h^{\textit{red}},h^{\textit{blue}}](c)$ is empty or not, let’s look closer at the functions in (2). The product is computed pointwise, and thus the domain $(h^{c}\times h^{\textit{red}})(d)=h^{c}(d)\times h^{\textit{red}}(d)$ is isomorphic to the one-point set $\ast$ whenever $d$ contains both $c$ and red and is the empty set otherwise. The codomain $h^{\textit{blue}}(d)$ is likewise $\ast$ if $d$ contains blue and is the empty set otherwise. Note that if there exists a text $d$ that contains $c$ and red but does not contain blue, then $(h^{c}\times h^{\textit{red}})(d)=\ast$ and $h^{\textit{blue}}(d)=\emptyset$, hence there does not exist a function $(h^{c}\times h^{\textit{red}})(d)\to h^{\textit{blue}}(d)$ and the set of natural transformations from $h^{c}\times h^{\textit{red}}$ to $h^{\textit{blue}}$ is empty. On the other hand, if every text $d$ that contains $c$ and red also contains blue, then there is a unique function $(h^{c}\times h^{\textit{red}})(d)\to h^{\textit{blue}}(d)$, which is specified as follows:

$\begin{cases}\ast\to\ast&\text{when $d$ contains $c$, \emph{red}, and \emph{blue}}\\ \emptyset\to\ast&\text{when $d$ does not contain both $c$ and \emph{red}, and $d$ does contain \emph{blue}}\\ \emptyset\to\emptyset&\text{when $d$ does not contain both $c$ and \emph{red}, and $d$ does not contain \emph{blue}.}\end{cases}$

For example, if $c$ is the expression French flag, then there exists a unique natural transformation from $h^{c}\times h^{\textit{red}}$ to $h^{\textit{blue}}$ provided every text that contains both French flag and red as subtexts also contains blue, and in that case $[h^{\textit{red}},h^{\textit{blue}}](\textit{French flag})=\ast$. If $c$ is the expression ruby, then there exists no natural transformation from $h^{c}\times h^{\textit{red}}$ to $h^{\textit{blue}}$ if there is a text that contains red and ruby but does not contain blue, and in that case $[h^{\textit{red}},h^{\textit{blue}}](\textit{ruby})=\emptyset$. One might say “red implies blue in the context of French flag” but “red does not imply blue in the context of ruby.”

In summary, the copresheaf $[h^{x},h^{y}]\colon\mathsf{C}\to\mathsf{Set}$ is given by

$[h^{x},h^{y}](c)=\begin{cases}\ast&\text{if every text that contains both $c$ and $x$ also contains $y$}\\ \emptyset&\text{otherwise}.\end{cases}$
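
This case analysis is mechanical enough to test in code. The following sketch, assuming a small hypothetical corpus, evaluates the formula above: the value at $c$ is nonempty exactly when every corpus text containing both $c$ and $x$ also contains $y$ (vacuously so if no such text exists).

# Internal hom of representable copresheaves over a toy corpus:
# [h^x, h^y](c) is nonempty iff every text containing both c and x
# also contains y.

corpus = ["red firetruck", "bright red ruby", "blue sky",
          "the French flag is red and blue", "green idea"]

def internal_hom(x, y):
    return lambda c: all(y in d for d in corpus if c in d and x in d)

print(internal_hom("red", "blue")("French flag"))  # True
print(internal_hom("red", "blue")("ruby"))         # False: 'bright red ruby'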

Let’s review the picture of language presented in this section. One has a simplified language category $\mathsf{C}$ whose objects are expressions in the language, with a single morphism $x\to y$ if $x$ is a subexpression of $y$. The functor $\mathsf{C}^{\text{op}}\to\mathsf{Set}^{\mathsf{C}}$ that maps an expression $x$ to the representable functor $h^{x}=\hom(x,-)$ is called the Yoneda embedding and captures something of the meaning of $x$. (Here $\mathsf{C}^{\text{op}}$ is the opposite category of $\mathsf{C}$. As a category, it has the same objects as $\mathsf{C}$, and the morphisms are defined by $\mathsf{C}^{\text{op}}(x,y):=\mathsf{C}(y,x)$.) This is comparable to making a passage from syntax to semantics, after which meanings can be combined by computing products, coproducts, and internal homs. Even more is possible, for combined meanings can be combined again to form higher concepts, and there are other categorical limits and colimits beyond products and coproducts, such as pushouts, pullbacks, equalizers, and so on. Yet this picture is still incomplete. The distributional structure of language has not yet been accounted for. For this, we use enriched category theory.

2. Enriched category theory

Enriched category theory provides a ready-made way to decorate morphisms in the simplified language category $\mathsf{C}$ from Section 1.2 with conditional probabilities, as in Figure 2. Enriched category theory begins with the observation that the morphisms between two objects may carry more structure than that of a mere set. Examples are plentiful: the set of linear maps between vector spaces is itself a vector space, for instance. In enriched category theory, the hom between two objects is itself an object in a category called the base category, or the category over which the category is enriched. When the enriching category is the category $\mathsf{Set}$ of sets, enriched category theory reduces to ordinary category theory. In order for enriched category theory to have the desired structures and axioms (such as having composable morphisms), the base category must have some of the structure that the category of sets has. One can go rather far assuming the base category is a symmetric monoidal category. In order to have convenient versions of enriched presheaves and copresheaves, the base category should be a symmetric monoidal category that is also closed, which allows the base category to be enriched over itself. If, in addition, the base category is complete and cocomplete, meaning that it contains all limits and colimits, then the categories of copresheaves and presheaves are complete and cocomplete. From a categorical point of view, our setting is relatively simple. The set $\mathsf{C}(x,y)$ of morphisms from $x$ to $y$ is either the empty set or the one-point set. Also, the category we wish to enrich over is the unit interval $[0,1]$, which becomes a complete, closed, symmetric monoidal category in a simple way that fits our purposes well. So, we will not burden the reader with the full machinery of general enriched category theory [Kel82] but rather take advantage of the simplifications afforded by our setting and specialize some of the definitions.

Figure 2. The compositional and distributional structures of language are married by decorating arrows with the conditional probability that one expression contains another.

2.1. Categories enriched over $[0,1]$

A preorder is a set together with a reflexive, transitive relation. Any preorder becomes a category whose objects are the elements of the set, with a single morphism from $a$ to $b$ if and only if $a$ is related to $b$. Reflexivity provides identity morphisms and transitivity provides composition of morphisms, which is defined in the only way it can be. Limits and colimits of finite diagrams always exist and are easy to compute: the limit is the minimum of the elements in the diagram and the colimit of a diagram is the maximum.

A monoid is a set together with an associative binary operation and a unit for that operation. A commutative monoid is a monoid whose operation is commutative. A preorder with a compatible commutative monoidal structure naturally becomes a kind of category over which one can enrich. Formally,

Definition 1.

A commutative monoidal preorder $(\mathcal{V},\leq,\otimes,1)$ is a preorder $(\mathcal{V},\leq)$ and a commutative monoid $(\mathcal{V},\otimes,1)$ satisfying $x\otimes y\leq x^{\prime}\otimes y^{\prime}$ whenever $x\leq x^{\prime}$ and $y\leq y^{\prime}$.

Definition 2.

Let $(\mathcal{V},\leq,\otimes,1)$ be a commutative monoidal preorder. The data of a (small) $\mathcal{V}$-enriched category, or simply a $\mathcal{V}$-category, consists of a set of objects $\mathcal{C}$ and, for every pair of objects $x$ and $y$, an element $\mathcal{C}(x,y)\in\mathcal{V}$ called a $\mathcal{V}$-hom object. This data satisfies

(3) $1\leq\mathcal{C}(x,x)$
(4) $\mathcal{C}(y,z)\otimes\mathcal{C}(x,y)\leq\mathcal{C}(x,z)$

for all objects $x,y,z\in\mathcal{C}$.

The unit interval $[0,1]:=\{x\in\mathbb{R}:0\leq x\leq 1\}$ is a commutative monoidal preorder with multiplication being the monoidal product, having $1$ as the unit, and with the usual $\leq$ relation being the preorder. The data of a $[0,1]$-category consists of a set of objects $\mathcal{C}$ and a $[0,1]$-valued function $(x,y)\mapsto\mathcal{C}(x,y)$ defined for every $x,y\in\mathcal{C}$ satisfying $\mathcal{C}(x,x)=1$ for every $x\in\mathcal{C}$ and $\mathcal{C}(y,z)\mathcal{C}(x,y)\leq\mathcal{C}(x,z)$ for all $x,y,z\in\mathcal{C}$.

Definition 3.

A commutative monoidal preorder $\mathcal{V}$ is said to be closed provided that for every pair of elements $x$ and $y$ in $\mathcal{V}$ there is an element $[x,y]\in\mathcal{V}$, called the internal hom, satisfying

(5) $x\otimes y\leq z\text{ if and only if }x\leq[y,z]$

for all $x,y,z$ in $\mathcal{V}$.

The relevance is that a closed commutative monoidal preorder $\mathcal{V}$ becomes a category enriched over itself by replacing the hom set $\mathcal{V}(x,y)$, which is either the empty set or the one-point set, with the internal hom $[x,y]$, which is an object of $\mathcal{V}$. To see that the assignment $(x,y)\mapsto[x,y]$ does in fact make $\mathcal{V}$ a category enriched over itself, one needs to check that $1\leq[x,x]$ and $[y,z]\otimes[x,y]\leq[x,z]$. The first inequality follows from the fact that $1\otimes x\leq x$ together with Equation (5). One can check the second inequality in two steps. First, $[y,z]\leq[y,z]$ and Equation (5) give $[y,z]\otimes y\leq z$, and similarly $[x,y]\otimes x\leq y$. Putting these two inequalities together yields $[y,z]\otimes[x,y]\otimes x\leq z$, which by Equation (5) implies $[y,z]\otimes[x,y]\leq[x,z]$. Much can be said about commutative monoidal preorders and enrichment (see, for instance, [FS19, Chapter 2]), though for now our primary focus will be on the unit interval.

Lemma 1.

The unit interval $[0,1]$ is a closed commutative monoidal preorder. The monoidal product $a\otimes b:=ab$ is the usual product of numbers and the internal hom is truncated division: for all $a,b\in[0,1]$ define

(6) $[a,b]:=\begin{cases}b/a&\text{if }b<a,\\ 1&\text{otherwise}.\end{cases}$
Proof.

To verify closure, we need to check that the formula for truncated division works as an internal hom; that is, $ab\leq c$ if and only if $a\leq[b,c]$. There are two cases. If $c<b$, then $[b,c]=c/b$ and $ab\leq c$ holds if and only if $a\leq\frac{c}{b}=[b,c]$. If $b\leq c$, then $[b,c]=1$, so $a\leq[b,c]$ holds automatically, as does $ab\leq b\leq c$. ∎

While everything constructed using the internal hom in $[0,1]$ ultimately involves statements about multiplication and the relation $\leq$, trying to argue using only multiplication and order can get messy, as statements often break down into multiple cases owing to the minimum in truncated division. Using the fact that for all $a,b,c\in[0,1]$ we have

(7) $ab\leq c\text{ if and only if }a\leq[b,c]$

often makes arguments simpler. In what follows, we refer to the equivalence in (7) as the closure property of $[0,1]$. We will occasionally write $\min\{b/a,1\}$ when $a$ might be zero; the reader should interpret $b/0$ as $\geq 1$ for any $0\leq b\leq 1$, so that $\min\{b/0,1\}=1$.

Also, keep in mind that we use juxtaposition $ab$ for ordinary multiplication of numbers. For our purposes, this is shorthand for the monoidal product $a\otimes b$. We also have the categorical product in $[0,1]$, which is given by the minimum: for two numbers $a,b\in[0,1]$ the expression $a\times b$ means $\min\{a,b\}$. The coproduct is given by the maximum and is denoted by $a\sqcup b$. So, when we are working in the unit interval $[0,1]$, remember Equation (6) for the internal hom and

$a\times b:=\min\{a,b\}$
$a\sqcup b:=\max\{a,b\}.$
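
These operations are simple enough to state in code. Here is a minimal Python sketch using exact rational arithmetic that spot-checks the closure property (7); the test values are arbitrary.

# The closed monoidal structure on [0,1]: the monoidal product is ordinary
# multiplication, the internal hom is truncated division (Equation (6)),
# and the categorical product/coproduct are min/max.
from fractions import Fraction
from random import randrange

def tensor(a, b):   # monoidal product
    return a * b

def hom(a, b):      # internal hom [a, b]; note [0, b] = 1
    return Fraction(1) if b >= a else b / a

def prod(a, b):     # categorical product: minimum
    return min(a, b)

def coprod(a, b):   # categorical coproduct: maximum
    return max(a, b)

print(prod(Fraction(1, 2), Fraction(1, 3)),
      coprod(Fraction(1, 2), Fraction(1, 3)))   # 1/3 1/2

# Spot-check the closure property (7): a*b <= c  iff  a <= [b, c].
for _ in range(1000):
    a, b, c = (Fraction(randrange(101), 100) for _ in range(3))
    assert (tensor(a, b) <= c) == (a <= hom(b, c))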

As a final remark, any thin category $\mathsf{C}$ can become a category enriched over $[0,1]$ by setting $\mathsf{C}(x,y)=1$ if there is a morphism $x\to y$ and $\mathsf{C}(x,y)=0$ otherwise. In fact, $2:=\{0,1\}$, the set that contains zero and one only, is itself a commutative monoidal preorder that is closed, a sub-commutative monoidal preorder of the unit interval, and can serve as a base category over which to enrich.

2.2. The syntax category $\mathcal{L}$

Following [BV20], we now define a category $\mathcal{L}$ enriched over $[0,1]$.

Definition 4.

We define the syntax category $\mathcal{L}$ to be the category enriched over $[0,1]$ whose objects are expressions in the language and whose $[0,1]$-hom objects are defined by

$\mathcal{L}(x,y):=\pi(y|x),$

where $\pi(y|x)$ denotes the probability that the expression $y$ extends the expression $x$. If $x$ is not a subtext of $y$, then necessarily $\pi(y|x)=0$.

So, for example, one might have

$\mathcal{L}(\textit{red},\textit{red firetruck})=0.02$
$\mathcal{L}(\textit{red},\textit{red idea})=10^{-5}$
$\mathcal{L}(\textit{red},\textit{blue sky})=0.$

That $\mathcal{L}$ is indeed a category enriched over $[0,1]$ follows from the fact that $\pi(x|x)=1$ and

(8) $\pi(z|y)\pi(y|x)=\pi(z|x)$

whenever $x$ is a subtext of $y$ and $y$ is a subtext of $z$ (if $y$ does not extend $x$, the left-hand side vanishes), and so $\mathcal{L}$ satisfies the required inequalities (3) and (4), with equality in the nested case. The reader might think of these probabilities $\pi(y|x)$ as being most well defined when $y$ is a short extension of $x$. While one may be skeptical about assigning a probability distribution on the set of all possible texts, it’s reasonable to say there is a nonzero probability that cat food will follow I am going to the store to buy a can of and, practically speaking, that probability can be estimated. Indeed, existing LLMs [RNSS18, RWC+18, B+20] successfully learn these conditional probabilities $\pi(y|x)$ using standard machine learning tools trained on large corpora of texts, which may be viewed as providing a wealth of samples drawn from these conditional probability distributions. Figure 2 gives the right toy picture: the objects are expressions in language, and the labels on the arrows describe the probability of extension.
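
To make this concrete, here is a toy Python sketch that estimates the hom objects of $\mathcal{L}$ by counting, assuming a small hypothetical sample of observed texts: $\pi(y|x)$ is approximated by the fraction of texts containing $x$ that also contain $y$. With this estimate, the composition identity (8) holds by construction for nested subtexts.

# A toy count-based estimate of the syntax category's hom objects:
# pi(y|x) ~ (# texts containing y) / (# texts containing x), defined
# when y textually extends x.

texts = ["red firetruck", "red firetruck", "red ruby",
         "red idea", "blue sky"]            # hypothetical observed texts

def count(x):
    return sum(x in t for t in texts)

def L(x, y):
    """Hom object L(x, y) ~ pi(y|x); zero unless y contains x."""
    if x not in y or count(x) == 0:
        return 0.0
    return count(y) / count(x)

print(L("red", "red firetruck"))   # 0.5 in this sample
print(L("red", "blue sky"))        # 0.0: 'blue sky' does not extend 'red'
# For nested subtexts x <= y <= z: L(y, z) * L(x, y) == L(x, z).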

As in the unenriched setting of Section 1.2, the category $\mathcal{L}$ is inherently syntactical—it encodes “what goes with what” together with the statistics of those expressions. But what about semantics? Following the same line of reasoning that led to the consideration of copresheaves in the unenriched case, we now wish to pass from $\mathcal{L}$ to the enriched version of copresheaves on $\mathcal{L}$, where we propose that meaning lies, and where concepts can again be combined through the enriched versions of limits, colimits, and internal homs. This discussion first requires a few mathematical preliminaries, beginning with what an enriched functor is, what an enriched copresheaf is, and what the enriched version of a natural transformation between functors is.

Before going on to the next section, we briefly comment on the work in [BV20]. In that paper, a $[0,1]$-enriched functor is defined between the syntax category $\mathcal{L}$ and another $[0,1]$-enriched category, embedding $\mathcal{L}$ as a subcategory of a $[0,1]$-enriched category $\mathcal{D}$ consisting of “density” operators. In this paper we construct a category $\widehat{\mathcal{L}}$ of $[0,1]$-copresheaves on $\mathcal{L}$, and the enriched version of the Yoneda embedding defines an embedding $\mathcal{L}\to\widehat{\mathcal{L}}$. The reader may think of the $[0,1]$-enriched Yoneda embedding as factoring through the functor $\mathcal{L}\to\mathcal{D}$ defined in [BV20], as in the picture below:

$\mathcal{L}\longrightarrow\mathcal{D}\longrightarrow\widehat{\mathcal{L}}$

While technical details are needed to construct the third arrow $\mathcal{D}\to\widehat{\mathcal{L}}$, one may think of the category $\mathcal{D}$ as a kind of intermediary. The logical and semantic possibilities within $\mathcal{D}$ are more limited than in the semantic category $\widehat{\mathcal{L}}$. However, the category $\mathcal{D}$, as is argued in [BV20], can be approximated by a computer model, while providing more room for semantic exploration than the syntax category $\mathcal{L}$.

3. Enriched copresheaves

Recall from Section 1.2 that a copresheaf on a category $\mathsf{C}$ is a functor $\mathsf{C}\to\mathsf{Set}$. To understand the enriched version, then, one must first have a notion of functors between enriched categories.

Definition 5.

Suppose $\mathcal{C}$ and $\mathcal{D}$ are categories enriched over a commutative monoidal preorder $(\mathcal{V},\leq,\otimes,1)$. An enriched functor $\mathcal{C}\to\mathcal{D}$ is a function $f\colon\mathcal{C}\to\mathcal{D}$ satisfying

(9) $\mathcal{C}(x,y)\leq\mathcal{D}(fx,fy)$

for all objects $x$ and $y$ in $\mathcal{C}$.

In the case that $\mathcal{V}$ is closed, it is enriched over itself and we can take $\mathcal{D}=\mathcal{V}$ to make sense of enriched copresheaves.

Definition 6.

Suppose $\mathcal{C}$ is a category enriched over a closed commutative monoidal preorder $\mathcal{V}$. An enriched copresheaf is a function $f\colon\mathcal{C}\to\mathcal{V}$ satisfying $\mathcal{C}(x,y)\leq\mathcal{V}(fx,fy)$ for all objects $x$ and $y$ in $\mathcal{C}$.

Now, we discuss how to make a $\mathcal{V}$-enriched category $\mathcal{D}^{\mathcal{C}}$ whose objects are $\mathcal{V}$-functors from $\mathcal{C}$ to $\mathcal{D}$. We need a hom object $\mathcal{D}^{\mathcal{C}}(f,g)\in\mathcal{V}$ between any two such functors $f,g\colon\mathcal{C}\to\mathcal{D}$. Formally, it is defined by an end, which is a particular kind of limit in $\mathcal{V}$,

(10) $\mathcal{D}^{\mathcal{C}}(f,g):=\int_{c\in\mathcal{C}}\mathcal{D}(fc,gc),$

which always exists if $\mathcal{V}$ is complete. The precise definition of an end can be found in Section 7.3 of [Rie14], but we will now specialize to the case of copresheaves, where the right-hand side of Equation (10) reduces to something simple, summarized in Lemma 2 below. The takeaway is that just as mappings between (ordinary) categories define a functor category, so mappings between enriched categories define an enriched functor category, at least when the base category is closed and complete.

From now on, we specialize to the case that the enriching category is $[0,1]$. We begin with a lemma that says how to assign an element of $[0,1]$ to a pair of functors enriched over $[0,1]$.

Lemma 2.

If $\mathcal{C}$ is a category enriched over $[0,1]$, then the category $\widehat{\mathcal{C}}:=[0,1]^{\mathcal{C}}$ of copresheaves is also enriched over $[0,1]$. The $[0,1]$-hom object between any pair of copresheaves $f,g\colon\mathcal{C}\to[0,1]$ is given by the following infimum over all objects in $\mathcal{C}$:

(11) $\widehat{\mathcal{C}}(f,g)=\inf_{c\in\mathcal{C}}\{[fc,gc]\}$
Proof.

The proof is a computation of the end on the right hand side of Equation (10) when the target category is the unit interval. ∎

Recalling that the internal hom in the unit interval is given by truncated division, the $[0,1]$-hom object associated to a pair of $[0,1]$-functors $f,g\colon\mathcal{C}\to[0,1]$ is given by $\widehat{\mathcal{C}}(f,g)=\inf_{c\in\mathcal{C}}\min\{1,gc/fc\}$.
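
When there are only finitely many objects, the infimum is a minimum and Lemma 2’s formula is directly computable. A small sketch, assuming copresheaves stored as Python dictionaries of hypothetical values:

# Hom object between copresheaves (Lemma 2): the infimum over objects of
# the truncated quotients [fc, gc] = min{1, gc/fc}, with [0, gc] = 1.

def enriched_hom(f, g):
    return min(min(1.0, g[c] / f[c]) if f[c] > 0 else 1.0 for c in f)

f = {"a": 0.5, "b": 0.2, "c": 0.0}    # hypothetical copresheaf values
g = {"a": 0.25, "b": 0.4, "c": 0.1}
print(enriched_hom(f, g))             # 0.5, achieved at object 'a'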

Lemma 3.

Let $\mathcal{C}$ be a $[0,1]$-category. For every object $x$, the function

$h^{x}:=\mathcal{C}(x,-)$

is a $[0,1]$-functor.

Proof.

Setting $\mathcal{V}=[0,1]$ and also $\mathcal{D}=[0,1]$, Equation (9) says what is required for a function $\mathcal{C}\to[0,1]$ to be an enriched functor. So, let $c,d\in\mathcal{C}$. We need to check that

$\mathcal{C}(c,d)\leq[h^{x}(c),h^{x}(d)].$

That is, we need to check that $\mathcal{C}(c,d)\leq[\mathcal{C}(x,c),\mathcal{C}(x,d)]$, which by the closure property (7) is equivalent to $\mathcal{C}(c,d)\mathcal{C}(x,c)\leq\mathcal{C}(x,d)$, which is satisfied since $\mathcal{C}$ is enriched over $[0,1]$. ∎

Definition 7.

For any object $x$ in a $[0,1]$-category $\mathcal{C}$, we call the functor $h^{x}:=\mathcal{C}(x,-)$ the copresheaf represented by $x$. We say a copresheaf $f\colon\mathcal{C}\to[0,1]$ is representable if $f=h^{x}$ for some $x\in\mathcal{C}$.

3.1. The $[0,1]$-enriched Yoneda lemma

Lemma 2 says that enriched copresheaves form an enriched category. Lemma 3 says that each object defines a representable copresheaf. In this subsection, we see that the assignment $x\mapsto h^{x}$ defines an enriched functor that embeds (the opposite of) a $[0,1]$-category within its category of copresheaves. It is a corollary of an enriched Yoneda lemma.

Theorem 1 (The Enriched Yoneda Lemma).

For any object $x$ in a $[0,1]$-category $\mathcal{C}$ and any $[0,1]$-copresheaf $f\colon\mathcal{C}\to[0,1]$, we have $\widehat{\mathcal{C}}(h^{x},f)=f(x)$.

Proof.

Fix an object $x\in\mathcal{C}$ and a copresheaf $f$. Since $\widehat{\mathcal{C}}(h^{x},f)=\inf_{c\in\mathcal{C}}\{[h^{x}(c),fc]\}$, we have $\widehat{\mathcal{C}}(h^{x},f)\leq[h^{x}(c),fc]$ for any particular $c\in\mathcal{C}$. For $c=x$, we have

$\widehat{\mathcal{C}}(h^{x},f)\leq[h^{x}(x),fx]=[1,fx]=fx.$

On the other hand, since $f$ is a $[0,1]$-functor from $\mathcal{C}$ to $[0,1]$, we have $\mathcal{C}(x,c)\leq[fx,fc]$ for any $c\in\mathcal{C}$. By the closure property of $[0,1]$, the inequality $\mathcal{C}(x,c)\leq[fx,fc]$ is equivalent to $\mathcal{C}(x,c)fx\leq fc$, which in turn is equivalent to $fx\leq[\mathcal{C}(x,c),fc]$. Having $fx\leq[\mathcal{C}(x,c),fc]$ for every $c\in\mathcal{C}$ implies that $fx\leq\inf_{c\in\mathcal{C}}\{[h^{x}(c),fc]\}=\widehat{\mathcal{C}}(h^{x},f)$, and the theorem is proved. ∎

Corollary 1.

$\mathcal{C}(y,x)=\widehat{\mathcal{C}}(h^{x},h^{y})$ for all objects $x,y$ in a $[0,1]$-category $\mathcal{C}$.

Proof.

Setting $f=h^{y}$ in Theorem 1 yields $\widehat{\mathcal{C}}(h^{x},h^{y})=h^{y}(x)=\mathcal{C}(y,x)$. ∎

Therefore, we have the expected interpretation of Corollary 1 as an enriched version of the (co)Yoneda embedding.

Corollary 2.

For any $[0,1]$-category $\mathcal{C}$, the assignment $x\mapsto h^{x}$ defines an enriched functor $\mathcal{C}^{\text{op}}\to\widehat{\mathcal{C}}$, embedding $\mathcal{C}^{\text{op}}$ as an enriched subcategory of $\widehat{\mathcal{C}}$.

The op in $\mathcal{C}^{\text{op}}$ stands for “opposite” and is there because the assignment $x\mapsto h^{x}$ reverses morphisms, as in the statement of Corollary 1 above.
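
Corollary 1 is easy to spot-check numerically. The following sketch builds a hypothetical three-object $[0,1]$-category satisfying the composition inequality (4) and verifies $\widehat{\mathcal{C}}(h^{x},h^{y})=\mathcal{C}(y,x)$ on all pairs of objects.

# Numeric spot-check of Corollary 1 on a hypothetical three-object
# [0,1]-category x -> y -> z, with C(x,y)=0.5, C(y,z)=0.4, C(x,z)=0.2
# (chosen so that C(y,z)*C(x,y) <= C(x,z), inequality (4), holds).

objs = ["x", "y", "z"]
C = {("x", "x"): 1, ("y", "y"): 1, ("z", "z"): 1,
     ("x", "y"): 0.5, ("y", "z"): 0.4, ("x", "z"): 0.2,
     ("y", "x"): 0, ("z", "y"): 0, ("z", "x"): 0}

def h(x):                 # representable copresheaf h^x = C(x, -)
    return {c: C[(x, c)] for c in objs}

def enriched_hom(f, g):   # Lemma 2: infimum of truncated quotients
    return min(min(1, g[c] / f[c]) if f[c] > 0 else 1 for c in objs)

for x in objs:
    for y in objs:
        assert enriched_hom(h(x), h(y)) == C[(y, x)]   # Corollary 1
print("Corollary 1 verified on all pairs.")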

3.2. The semantic category $\widehat{\mathcal{L}}$

Often, when a mathematical object $Y$ has nice structure, the set of functions $\{X\to Y\}$ from a fixed $X$ also has nice structure, even if $X$ does not; real-valued functions on any set form a vector space, for example. The unit interval has rich structure from the perspective of category theory: it is commutative monoidal closed, complete, and cocomplete. For any $[0,1]$-enriched category $\mathcal{C}$, the category $\widehat{\mathcal{C}}=[0,1]^{\mathcal{C}}$ of copresheaves on $\mathcal{C}$, as a category of functors into the unit interval, inherits rich structure as well. In particular, it has enriched versions of products, coproducts, and an internal hom. Corollary 2, from this perspective, says that any $[0,1]$-category embeds into the $[0,1]$-category $\widehat{\mathcal{C}}$, which is often much nicer.

We now look at the copresheaves on the enriched syntax category $\mathcal{L}$, which will provide a place for the combination of concepts in language in a way that’s parallel to the ideas explored in Section 1.2.

Definition 8.

Let $\mathcal{L}$ be the syntax category (Definition 4). The semantic category $\widehat{\mathcal{L}}:=[0,1]^{\mathcal{L}}$ is the $[0,1]$-category of $[0,1]$-enriched copresheaves on the $[0,1]$-category $\mathcal{L}$.

For each object $x$ in $\mathcal{L}$, the representable copresheaf $h^{x}:=\mathcal{L}(x,-)$ is given by the conditional probability of extending $x$,

(12) $c\mapsto h^{x}(c):=\begin{cases}\pi(c|x)&\text{if }x\leq c\\ 0&\text{otherwise},\end{cases}$

where we use “$x\leq c$” as shorthand for “$x$ is contained as a subtext of $c$.”

The representable enriched copresheaf $h^{x}$ is our proposal for the meaning of the expression $x$. Its support consists of all expressions containing $x$, which coincides with the principal ideal associated to $x$, and $h^{x}$ further accounts for the statistics associated with those expressions, which is precisely the distributional structure missing from Section 1.2. In other words, the meaning of a text is the varying potential of all contexts in which it is used, making our definition of $h^{x}$ as the meaning of the text $x$ consistent with at least several philosophical traditions, including a use theory of meaning [Hor04] and dynamic semantics [NBvEV16]. The embedding in Corollary 2 assigns to a text $x$ its meaning $h^{x}$.

From a mathematical point of view, the assignment $x\mapsto h^{x}$ faithfully embeds the syntax category $\mathcal{L}$ into the category of copresheaves on $\mathcal{L}$. The category $\widehat{\mathcal{L}}$ is complete and cocomplete and so contains all (the enriched versions of) limits and colimits. The meanings of texts that live in $\widehat{\mathcal{L}}$ can be manipulated and combined to form higher concepts in $\widehat{\mathcal{L}}$, none of which is possible within the confines of $\mathcal{L}$. These operations on copresheaves are the subject of the next section.
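
In code, the proposed meaning of an expression is just the partially applied hom function. A minimal illustration, reusing the count-based toy estimate of $\pi$ from the sketch in Section 2.2:

# The meaning of an expression x, in this proposal, is the copresheaf
# h^x = L(x, -), a [0,1]-valued function on all expressions.

texts = ["red firetruck", "red firetruck", "red ruby",
         "red idea", "blue sky"]            # hypothetical observed texts

def L(x, y):
    cx = sum(x in t for t in texts)
    return sum(y in t for t in texts) / cx if x in y and cx else 0.0

def meaning(x):
    return lambda c: L(x, c)   # support: all texts containing x

h_red = meaning("red")
print(h_red("red firetruck"), h_red("blue sky"))   # 0.5 0.0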

The reversal of arrows in the $[0,1]$-enriched Yoneda lemma has a nice interpretation here also. Suppose the text $y$ extends the text $x$ and $\mathcal{L}(x,y)=a\neq 0$. We have the following picture:

$x\xrightarrow{a}y$ in $\mathcal{L}$ $\qquad\longleftrightarrow\qquad$ $h^{y}\xrightarrow{a}h^{x}$ in $\widehat{\mathcal{L}}$

On the left, the text $y$ is an extension of the text $x$ in the syntax category $\mathcal{L}$. Passing to the semantic category $\widehat{\mathcal{L}}$ on the right, $h^{x}$ is the meaning of the text $x$, which represents the varying potential contexts in which $x$ might appear, and $h^{y}$ represents the varying potential contexts in which the text $y$ appears. The contexts in which $x$ can appear extend beyond those in which $y$ can appear. Continuing an expression restricts the potential contexts in which the expression can be used.

4. Enriched products and coproducts in $\widehat{\mathcal{L}}$

Think back to Section 1.3.1, for instance, where we described the coproduct of ordinary copresheaves associated to the words red and blue, which represented their disjunction red or blue. Section 1.3.2 likewise computed the product of copresheaves representing the conjunction red and blue, and Section 1.3.3 computed the copresheaf representing the implication red $\Rightarrow$ blue. In this section, we discuss the analogous constructions for enriched copresheaves. We have asserted several times that the $[0,1]$-category of $[0,1]$-copresheaves on a $[0,1]$-category $\mathcal{C}$ contains all of the enriched versions of (co)limits. Without having yet said what the enriched versions of (co)limits are, one can understand that the reason $[0,1]$-enriched copresheaf categories are (co)complete is that the appropriate (co)limits are computed “pointwise” in $[0,1]$. The real-analysis completeness of the interval implies that the infimum and supremum of any subset of the interval exist, hence the $[0,1]$-enriched categorical (co)limit of any diagram in $[0,1]$ exists. This is analogous to the way (co)limits in the functor category $\mathsf{Set}^{\mathsf{C}}$, for an ordinary category $\mathsf{C}$, are built up from (co)limits of sets, which always exist in $\mathsf{Set}$.

To make sense of (co)limits in this new enriched setting, the familiar definition must be slightly modified. This leads to the notion of weighted limits and colimits, which are the appropriate notions of limits and colimits in enriched category theory.

4.1. Weighted limits and colimits

The appropriate notions of limits and colimits in enriched category theory are called weighted limits and colimits. We focus on building intuition around limits, as the discussion for colimits is analogous. Now, to understand weighted limits, it will first help to recall the defining isomorphism for ordinary limits, also called “conical limits.” To that end, let $\mathsf{J}$ be an indexing category and let $\mathsf{C}$ be any category. Recall that the limit of a diagram $F\colon\mathsf{J}\to\mathsf{C}$, if it exists, is an object $\lim F$ in $\mathsf{C}$ together with the following isomorphism of functors $\mathsf{C}^{\text{op}}\to\mathsf{Set}$,

(13) $\mathsf{C}(-,\lim F)\cong\mathsf{Set}^{\mathsf{J}}(\ast,\mathsf{C}(-,F))$

which is natural in the first variable. Here, $\ast$ is used to denote the constant functor at the one-point set:

(14) $\ast\colon\mathsf{J}\to\mathsf{Set}$, the functor sending every object $i$ of $\mathsf{J}$ to the one-point set $\ast$ and every morphism $i\to j$ to $\mathrm{id}_{\ast}$.

So for each object $Z$ in $\mathsf{C}$ there is a bijection $\mathsf{C}(Z,\lim F)\cong\mathsf{Set}^{\mathsf{J}}(\ast,\mathsf{C}(Z,F))$, and so “morphisms into the limit are the same as a cone over the diagram, whose legs commute with morphisms in the diagram,” and one has in mind the following picture.

$Z\to\lim F$, followed by the legs $\lim F\to Fi$ and $\lim F\to Fj$ of the limit cone, which commute with the morphisms $Fi\to Fj$ of the diagram.

Taking a closer look at the right-hand side of the isomorphism in (13), notice that for each object $Z$ in $\mathsf{C}$, a natural transformation from $\ast$ to $\mathsf{C}(Z,F)$ consists of a function $\ast=\ast(i)\to\mathsf{C}(Z,Fi)$ for each object $i$ in $\mathsf{J}$. This simply picks out a morphism $Z\to Fi$ in $\mathsf{C}$, and these morphisms comprise the legs of the limit cone over $\lim F$. Naturality ensures that these legs are compatible with the morphisms in the diagram.

With this background in mind, we now introduce weighted limits. The essential difference between the two constructions is that the constant functor in (14) is replaced with a so-called $\mathcal{V}$-functor of weights $W\colon\mathcal{J}\to\mathcal{V}$, where $\mathcal{V}$ is the base category over which enrichment takes place and where $\mathcal{J}$ is an indexing $\mathcal{V}$-category. Here is the formal definition.

Definition 9.

Let $\mathcal{V}$ be a closed commutative monoidal category and let $\mathcal{J}$ and $\mathcal{E}$ be $\mathcal{V}$-categories. Given a $\mathcal{V}$-functor $F\colon\mathcal{J}\to\mathcal{E}$ and a $\mathcal{V}$-functor $W\colon\mathcal{J}\to\mathcal{V}$, the weighted limit of $F$, if it exists, is an object $\lim_{W}F$ of $\mathcal{E}$ together with the following isomorphism of $\mathcal{V}$-functors $\mathcal{E}^{\text{op}}\to\mathcal{V}$,

(15) $\mathcal{E}(-,\lim_{W}F)\cong\mathcal{V}^{\mathcal{J}}(W,\mathcal{E}(-,F))$

that is natural in the first variable.

So in other words, for each object $Z$ in $\mathcal{E}$ there is an isomorphism $\mathcal{E}(Z,\lim_{W}F)\cong\mathcal{V}^{\mathcal{J}}(W,\mathcal{E}(Z,F))$ of objects in $\mathcal{V}$. The idea behind a weighted colimit is analogous. Given a $\mathcal{V}$-functor $F\colon\mathcal{J}\to\mathcal{E}$ and a $\mathcal{V}$-functor of weights $W\colon\mathcal{J}^{\text{op}}\to\mathcal{V}$, the weighted colimit of $F$ is an object $\operatorname{colim}_{W}F$ of $\mathcal{E}$ together with an isomorphism

(16) $\mathcal{E}(\operatorname{colim}_{W}F,-)\cong\mathcal{V}^{\mathcal{J}^{\text{op}}}(W,\mathcal{E}(F,-)).$

Details may be found in [Rie14, Chapter 7].

4.2. Weighted products in $\widehat{\mathcal{L}}$

Let’s unwind the isomorphism in (15) in the simple case when $\mathcal{V}=[0,1]$, when $\mathcal{E}=\widehat{\mathcal{L}}:=[0,1]^{\mathcal{L}}$, and when the indexing category $\mathcal{J}$ is a discrete category with two objects, call them $1$ and $2$, enriched over $[0,1]$ by setting $\mathcal{J}(i,j)=\delta_{ij}$ for $i,j\in\{1,2\}$. To begin, fix a functor of weights $W\colon\mathcal{J}\to[0,1]$. This is nothing more than a choice of two numbers $w_{1}:=W(1)$ and $w_{2}:=W(2)$. Further, for a fixed pair of copresheaves $f,g\colon\mathcal{L}\to[0,1]$, define $F\colon\mathcal{J}\to\widehat{\mathcal{L}}$ to be the $[0,1]$-functor with $f:=F(1)$ and $g:=F(2)$.

Definition 10.

Denote the weighted limit of $F$ with respect to the weight $W$ by

$(w_{1},f)\times(w_{2},g):=\lim_{W}F.$
Theorem 2.

The weighted limit $(w_{1},f)\times(w_{2},g)\colon\mathcal{L}\to[0,1]$ is given by $c\mapsto\min\left\{\frac{fc}{w_{1}},\frac{gc}{w_{2}},1\right\}.$

Proof.

To check that $c\mapsto\min\left\{\frac{fc}{w_{1}},\frac{gc}{w_{2}},1\right\}$ satisfies the universal property of the weighted limit, let $Z\colon\mathcal{L}\to[0,1]$ be any copresheaf and look at Equation (15) evaluated at $Z$. We need to check that

(17) $\widehat{\mathcal{L}}(Z,\lim_{W}F)\cong[0,1]^{\mathcal{J}}(W,\widehat{\mathcal{L}}(Z,F)).$

On the left-hand side, with the claimed copresheaf substituted for the limit, we have

$\inf_{c\in\mathcal{L}}\left\{\left[Zc,\min\left\{\frac{fc}{w_{1}},\frac{gc}{w_{2}},1\right\}\right]\right\}=\inf_{c\in\mathcal{L}}\left\{\frac{fc}{w_{1}Zc},\frac{gc}{w_{2}Zc},1\right\}.$

Let’s begin looking at the right-hand side of Equation (17) by simplifying the expression $\widehat{\mathcal{L}}(Z,F)$, which is a functor $\mathcal{J}\to[0,1]$. Evaluating at $i$ in $\mathcal{J}$ yields the number

$\widehat{\mathcal{L}}(Z,Fi)=\inf_{c\in\mathcal{L}}\{[Zc,Fic]\}=\inf_{c\in\mathcal{L}}\left\{\frac{Fic}{Zc},1\right\}.$

Now, the right-hand side of Equation (17), as a hom object in $[0,1]^{\mathcal{J}}$, is the minimum over the objects of $\mathcal{J}$, which are $1$ and $2$. Using $f=F(1)$ and $g=F(2)$, we have:

$[0,1]^{\mathcal{J}}(W,\widehat{\mathcal{L}}(Z,F))=\min\left\{\left[w_{1},\inf_{c\in\mathcal{L}}\left\{\frac{fc}{Zc},1\right\}\right],\left[w_{2},\inf_{c\in\mathcal{L}}\left\{\frac{gc}{Zc},1\right\}\right]\right\}=\inf_{c\in\mathcal{L}}\left\{\frac{fc}{w_{1}Zc},\frac{gc}{w_{2}Zc},1\right\}.$

It remains to check that the assignment $c\mapsto\min\left\{\frac{fc}{w_{1}},\frac{gc}{w_{2}},1\right\}$ is indeed a $[0,1]$-copresheaf; that is, that $\mathcal{L}(c,d)\leq\left[\lim_{W}F(c),\lim_{W}F(d)\right]$, or equivalently $\mathcal{L}(c,d)\lim_{W}F(c)\leq\lim_{W}F(d)$, for all $c,d\in\mathcal{L}$. This desired inequality follows from the simple observation that

$\mathcal{L}(c,d)\lim_{W}F(c)=\min\left\{\frac{\mathcal{L}(c,d)f(c)}{w_{1}},\frac{\mathcal{L}(c,d)g(c)}{w_{2}},\mathcal{L}(c,d)\right\}\leq\min\left\{\frac{f(d)}{w_{1}},\frac{g(d)}{w_{2}},1\right\}=\lim_{W}F(d),$

where the inequality follows from the assumption that $f$ and $g$ are $[0,1]$-copresheaves, so that $\mathcal{L}(c,d)f(c)\leq f(d)$ and $\mathcal{L}(c,d)g(c)\leq g(d)$, together with $\mathcal{L}(c,d)\leq 1$. ∎

Keeping in mind that products in intuitionistic logic serve as a kind of conjunction, let’s look closer at the weighted product $(w_{1},f)\times(w_{2},g)$ in the case that $f$ and $g$ are representable. Fix a pair of expressions $x$ and $y$ in $\mathcal{L}$ and a pair of nonzero weights $w_{1},w_{2}\in[0,1]$. The weighted product $(w_{1},h^{x})\times(w_{2},h^{y})\colon\mathcal{L}\to[0,1]$ assigns to an expression $c$:

(18) $((w_{1},h^{x})\times(w_{2},h^{y}))(c)=\min\left\{\frac{h^{x}(c)}{w_{1}},\frac{h^{y}(c)}{w_{2}},1\right\}.$

Remembering Equation (12), the value of this minimum depends on whether the expressions $x$ and $y$ are jointly contained within $c$, and on whether $\pi(c|x)\leq w_{1}$ and $\pi(c|y)\leq w_{2}$:

(19) $((w_{1},h^{x})\times(w_{2},h^{y}))(c)=\begin{cases}\min\left\{\frac{\pi(c|x)}{w_{1}},\frac{\pi(c|y)}{w_{2}},1\right\}&\text{ if }x\leq c\text{ and }y\leq c\\ 0&\text{otherwise}.\end{cases}$

The support of this copresheaf thus coincides with the support of logical “and” in Boolean logic. As the weights decrease, the values of the quotients in Equation (19) increase and thus contribute less to the value of $((w_{1},h^{x})\times(w_{2},h^{y}))(c)$, which is a minimum. The copresheaf $(w_{1},h^{x})\times(w_{2},h^{y})$ captures something like a weighted conjunction.
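
Here is a sketch of the formula in (19), assuming representable copresheaves stored as dictionaries of hypothetical extension probabilities; the values are chosen only for illustration.

# Weighted product (Theorem 2) on dictionary-valued copresheaves:
# ((w1, f) x (w2, g))(c) = min{ f(c)/w1, g(c)/w2, 1 }.

def weighted_product(w1, f, w2, g):
    return {c: min(f[c] / w1, g[c] / w2, 1.0) for c in f}

# hypothetical values of h^red and h^blue on two expressions
h_red  = {"red firetruck": 0.25, "red and blue flag": 0.125}
h_blue = {"red firetruck": 0.0,  "red and blue flag": 0.25}

print(weighted_product(0.5, h_red, 0.5, h_blue))
# {'red firetruck': 0.0, 'red and blue flag': 0.25} -- the support is the
# set of expressions containing both words, as with Boolean 'and'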

It's worth looking more closely at the case when the weights are $w_{1}=w_{2}=1$. In this case, we denote the weighted product simply by $f\times g$. When both weights equal $1$, the weighted product works a lot like an ordinary product in the sense that morphisms into the product correspond precisely to products of morphisms. Here, "morphisms into the product" means a particular hom object in $[0,1]$, and "products of morphisms" means the product of objects in $[0,1]$, which, recall, is a minimum. That is,

Lemma 4.

For any copresheaves $f,g,h\colon\mathcal{L}\to[0,1]$ we have

\widehat{\mathcal{L}}(h,f\times g)=\widehat{\mathcal{L}}(h,f)\times\widehat{\mathcal{L}}(h,g).
Proof.

We compute:

\widehat{\mathcal{L}}(h,f\times g)=\inf_{c\in\mathcal{L}}\{[hc,(f\times g)c]\}
=\inf_{c\in\mathcal{L}}\{[hc,\min\{fc,gc\}]\}
=\inf_{c\in\mathcal{L}}\left\{\frac{fc}{hc},\frac{gc}{hc},1\right\}
=\min\left\{\inf_{c\in\mathcal{L}}\left\{\frac{fc}{hc},1\right\},\inf_{c\in\mathcal{L}}\left\{\frac{gc}{hc},1\right\},1\right\}
=\widehat{\mathcal{L}}(h,f)\times\widehat{\mathcal{L}}(h,g). ∎
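Lemma 4 lends itself to a quick numeric spot-check; the sketch below uses three invented $[0,1]$-valued functions on a hypothetical three-expression language.

```python
# A numeric spot-check of Lemma 4: hom into a product equals the
# product (minimum) of the homs. All values below are illustrative.

L = ["a", "ab", "abc"]
f = {"a": 0.9, "ab": 0.6, "abc": 0.3}
g = {"a": 0.8, "ab": 0.5, "abc": 0.4}
h = {"a": 0.95, "ab": 0.8, "abc": 0.5}

def hom(p, q):
    """L^(p, q) = inf_c [p(c), q(c)], with [a, b] = min{b/a, 1} (and 1 if a = 0)."""
    return min((min(q[c] / p[c], 1.0) if p[c] > 0 else 1.0) for c in L)

prod = {c: min(f[c], g[c]) for c in L}   # (f x g)(c) = min{f(c), g(c)}
assert hom(h, prod) == min(hom(h, f), hom(h, g))   # Lemma 4
```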

Lemma 4 will be helpful when we discuss enriched implications. But before we do, let’s discuss weighted coproducts.

4.3. Weighted coproducts in $\widehat{\mathcal{L}}$

Now, let's look at a simple weighted coproduct in $\widehat{\mathcal{L}}:=[0,1]^{\mathcal{L}}$. Again, let the indexing category $\mathcal{J}$ be the same discrete category with two objects. Fix a functor of weights $W\colon\mathcal{J}^{\text{op}}\to[0,1]$, setting $w_{1}:=W(1)$ and $w_{2}:=W(2)$, and a diagram $F\colon\mathcal{J}\to\widehat{\mathcal{L}}$ with $f:=F(1)$ and $g:=F(2)$.

Definition 11.

Denote the weighted colimit of $F$ with respect to the weight $W$ by

(w_{1},f)\sqcup(w_{2},g):=\operatorname{colim}_{W}F.
Theorem 3.

The weighted colimit $(w_{1},f)\sqcup(w_{2},g)\colon\mathcal{L}\to[0,1]$ is given by $c\mapsto\max\left\{w_{1}fc,w_{2}gc\right\}$.

Proof.

Note that $\mathcal{J}=\mathcal{J}^{\text{op}}$ since there are no nonidentity morphisms in $\mathcal{J}$. Let $Z\colon\mathcal{L}\to[0,1]$ be any copresheaf. We need to show

(20)   \widehat{\mathcal{L}}(\operatorname{colim}_{W}F,Z)=[0,1]^{\mathcal{J}}(W,\widehat{\mathcal{L}}(F-,Z)).

Substituting the claimed colimit into the left hand side yields

\widehat{\mathcal{L}}(\operatorname{colim}_{W}F,Z)=\inf_{c\in\mathcal{L}}\{[\max\left\{w_{1}fc,w_{2}gc\right\},Zc]\}
=\inf_{c\in\mathcal{L}}\left\{\frac{Zc}{\max\left\{w_{1}fc,w_{2}gc\right\}},1\right\}
=\inf_{c\in\mathcal{L}}\left\{\frac{Zc}{w_{1}fc},\frac{Zc}{w_{2}gc},1\right\}.

Evaluating $\widehat{\mathcal{L}}(F-,Z)$ at $i=1,2$ yields

\widehat{\mathcal{L}}(Fi,Z)=\inf_{c\in\mathcal{L}}\{[Fic,Zc]\}=\inf_{c\in\mathcal{L}}\left\{\frac{Zc}{Fic},1\right\}.

Using $f=F(1)$ and $g=F(2)$, the right hand side of Equation (20) is

[0,1]^{\mathcal{J}}(W,\widehat{\mathcal{L}}(F-,Z))=\min\left\{\left[w_{1},\inf_{c\in\mathcal{L}}\left\{\frac{Zc}{fc},1\right\}\right],\left[w_{2},\inf_{c\in\mathcal{L}}\left\{\frac{Zc}{gc},1\right\}\right]\right\}
=\inf_{c\in\mathcal{L}}\left\{\frac{Zc}{w_{1}fc},\frac{Zc}{w_{2}gc},1\right\}.

It remains to check that the assignment $c\mapsto\max\left\{w_{1}fc,w_{2}gc\right\}$ is indeed a $[0,1]$-copresheaf; that is, that $\mathcal{L}(c,d)\leq\left[\operatorname{colim}_{W}F(c),\operatorname{colim}_{W}F(d)\right]$, or equivalently $\mathcal{L}(c,d)\operatorname{colim}_{W}F(c)\leq\operatorname{colim}_{W}F(d)$, for all $c,d\in\mathcal{L}$. This desired inequality follows from the simple observation that

\mathcal{L}(c,d)\operatorname{colim}_{W}F(c)=\max\left\{\mathcal{L}(c,d)w_{1}f(c),\mathcal{L}(c,d)w_{2}g(c)\right\}
\leq\max\left\{w_{1}f(d),w_{2}g(d)\right\}
=\operatorname{colim}_{W}F(d),

where the inequality follows from the assumption that $f$ and $g$ are $[0,1]$-copresheaves. This proves that the copresheaf defined by $c\mapsto\max\left\{w_{1}fc,w_{2}gc\right\}$ satisfies the universal property of the claimed weighted coproduct. ∎

The support of the copresheaf $(w_{1},f)\sqcup(w_{2},g)$ coincides with the support of logical "or," and this matches the intuition that colimits capture something of disjunction. Here, as the weights decrease, the corresponding cofactors contribute less to the weighted coproduct.
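As with the weighted product, a small sketch makes Theorem 3 concrete; the toy table pi and its probabilities below are again invented for illustration.

```python
# A toy illustration of the weighted coproduct of two representables:
# ((w1, h^x) coproduct (w2, h^y))(c) = max{w1 * pi(c|x), w2 * pi(c|y)}.
# The probabilities are hypothetical.

pi = {
    "red":  {"red": 1.0, "red ruby": 0.3, "red idea": 0.1},
    "ruby": {"ruby": 1.0, "red ruby": 0.5},
}

def h(x, c):
    return pi[x].get(c, 0.0)   # pi(c|x); zero if c does not extend x

def weighted_coproduct(x, y, w1, w2, c):
    return max(w1 * h(x, c), w2 * h(y, c))

# Nonzero on texts containing "red" OR "ruby", like logical "or":
for c in ["red idea", "ruby", "red ruby", "blue sky"]:
    print(c, "->", weighted_coproduct("red", "ruby", 0.5, 0.5, c))
```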

Let's now turn to a discussion of implication in the enriched setting.

4.4. Enriched implication

Recall from Equation (1) in Section 1.3.3 that the internal hom between a pair of ordinary copresheaves $G,H\colon\mathsf{C}\to\mathsf{Set}$ is the copresheaf $[G,H]$ defined on objects by $c\mapsto\widehat{\mathsf{C}}(h^{c}\times G,H)$, which is the set of natural transformations from $h^{c}\times G$ to $H$.

Replacing the base category of sets with the unit interval gives the enriched version.

Definition 12.

For any $f,g\colon\mathcal{L}\to[0,1]$, let $[f,g]\colon\mathcal{L}\to[0,1]$ be defined by

[f,g](c):=\widehat{\mathcal{L}}(h^{c}\times f,g),

where $h^{c}\times f$ is the product defined above Lemma 4.

Lemma 5.

For any $f,g\colon\mathcal{L}\to[0,1]$, the function $[f,g]\colon\mathcal{L}\to[0,1]$ is, in fact, a $[0,1]$-copresheaf.

Proof.

We need to show that $\mathcal{L}(c,d)\leq[[f,g](c),[f,g](d)]$, which is equivalent to showing that $\mathcal{L}(c,d)[f,g](c)\leq[f,g](d)$. Start with the fact that $g$ is a copresheaf to get $\mathcal{L}(c,d)\leq[g(c),g(d)]$, which is equivalent to $\mathcal{L}(c,d)g(c)\leq g(d)$, which by the enriched Yoneda Lemma is equivalent to $\mathcal{L}(c,d)\widehat{\mathcal{L}}(h^{c},g)\leq\widehat{\mathcal{L}}(h^{d},g)$. This, in turn, implies $\mathcal{L}(c,d)\left(\widehat{\mathcal{L}}(h^{c},g)\times\widehat{\mathcal{L}}(f,g)\right)\leq\widehat{\mathcal{L}}(h^{d},g)\times\widehat{\mathcal{L}}(f,g)$, since $\mathcal{L}(c,d)\leq 1$. Using Lemma 4, we can rewrite this inequality as $\mathcal{L}(c,d)\widehat{\mathcal{L}}(h^{c}\times f,g)\leq\widehat{\mathcal{L}}(h^{d}\times f,g)$, which is equivalent to $\mathcal{L}(c,d)\leq[\widehat{\mathcal{L}}(h^{c}\times f,g),\widehat{\mathcal{L}}(h^{d}\times f,g)]=[[f,g](c),[f,g](d)]$, as needed. ∎

We have an enriched functor $(-\times f)\colon\widehat{\mathcal{L}}\to\widehat{\mathcal{L}}$, and Definition 12 presents $[f,-]$, which is enriched right adjoint to $-\times f$. The enriched Yoneda Lemma (Theorem 1) says $\widehat{\mathcal{L}}(h^{c},[f,g])=[f,g](c)$, and by definition we have $[f,g](c)=\widehat{\mathcal{L}}(h^{c}\times f,g)$. The equality $\widehat{\mathcal{L}}(h^{c},[f,g])=\widehat{\mathcal{L}}(h^{c}\times f,g)$ for all representables $h^{c}$ implies equality for all copresheaves $h$:

\widehat{\mathcal{L}}(h\times f,g)=\widehat{\mathcal{L}}(h,[f,g]).

That is, $[f,g]$ serves as an internal hom for $\widehat{\mathcal{L}}$, making it enriched cartesian closed. Now, let's take a closer look at this internal hom $[f,g]$ when $f$ and $g$ are representable.
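As a sanity check first, the equality $\widehat{\mathcal{L}}(h^{c},[f,g])=\widehat{\mathcal{L}}(h^{c}\times f,g)$ at representables can be verified numerically. The sketch below is illustrative only: the chain $a\leq ab\leq abc$, the hom values $\mathcal{L}(x,y)=\pi(y|x)$, and the copresheaves $f$ and $g$ are all invented.

```python
# A numeric spot-check, at representables, of the equality
# L^(h^c, [f, g]) = L^(h^c x f, g) coming from the enriched Yoneda Lemma.
# All hom values L[x][y] = pi(y|x) below are hypothetical.

objs = ["a", "ab", "abc"]
L = {  # L[x][y] = pi(y|x); zero when y does not extend x
    "a":   {"a": 1.0, "ab": 0.5, "abc": 0.2},
    "ab":  {"a": 0.0, "ab": 1.0, "abc": 0.4},
    "abc": {"a": 0.0, "ab": 0.0, "abc": 1.0},
}

def bracket(a, b):             # internal hom in [0,1]
    return 1.0 if a == 0 else min(b / a, 1.0)

def hom(p, q):                 # L^(p, q) = inf_c [p(c), q(c)]
    return min(bracket(p[c], q[c]) for c in objs)

def prod(p, q):                # (p x q)(c) = min{p(c), q(c)}
    return {c: min(p[c], q[c]) for c in objs}

f = {"a": 0.9, "ab": 0.5, "abc": 0.2}    # a copresheaf (invented values)
g = {"a": 0.5, "ab": 0.3, "abc": 0.12}   # another copresheaf

fg = {c: hom(prod(L[c], f), g) for c in objs}   # [f, g](c) = L^(h^c x f, g)
for c in objs:                                  # check at each representable
    assert hom(prod(L[c], f), g) == hom(L[c], fg)
```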

Theorem 4.

For any $x,y\in\mathcal{L}$, we have

(21)   [h^{x},h^{y}](c)=\inf_{d\in\mathcal{L}}\left\{\frac{\pi(d|y)}{\min\{\pi(d|c),\pi(d|x)\}},1\right\}.
Proof.

We compute

[h^{x},h^{y}](c)=\widehat{\mathcal{L}}(h^{c}\times h^{x},h^{y})
=\inf_{d\in\mathcal{L}}\left\{\left[(h^{c}\times h^{x})(d),h^{y}(d)\right]\right\}
=\inf_{d\in\mathcal{L}}\left\{\frac{\pi(d|y)}{\pi(d|c)\times\pi(d|x)},1\right\}
=\inf_{d\in\mathcal{L}}\left\{\frac{\pi(d|y)}{\min\{\pi(d|c),\pi(d|x)\}},1\right\}.

For the last equality, remember that the categorical product in the unit interval is the minimum. ∎

Note that if there is a text $c$ that contains $x$ but does not contain $y$, then $[h^{x},h^{y}](c)=0$, for the infimum in Equation (21) is realized when $d=c$: the numerator is $\pi(c|y)=0$ while the denominator is $\min\{\pi(c|c),\pi(c|x)\}=\pi(c|x)\neq 0$. So, if $[h^{x},h^{y}](c)\neq 0$, then within the context $c$, either $c$ does not contain $x$ or $c$ contains both $x$ and $y$, capturing a quantitative sort of implication.

Definition 13.

For any expressions $x,y\in\mathcal{L}$, define the implication $x\Rightarrow y$ to be the copresheaf $[h^{x},h^{y}]\colon\mathcal{L}\to[0,1]$.
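Here is a small numeric sketch of this implication via the formula in Theorem 4, on a hypothetical toy language $\{a,ab,ac\}$ with invented probabilities; the convention $[0,b]=1$ handles vanishing denominators.

```python
# Computing the implication [h^x, h^y](c) from Equation (21) on a
# hypothetical toy language; all probabilities are invented.

objs = ["a", "ab", "ac"]
pi = {  # pi[x][d] = pi(d|x); zero when d does not extend x
    "a":  {"a": 1.0, "ab": 0.5, "ac": 0.3},
    "ab": {"a": 0.0, "ab": 1.0, "ac": 0.0},
    "ac": {"a": 0.0, "ab": 0.0, "ac": 1.0},
}

def bracket(a, b):
    return 1.0 if a == 0 else min(b / a, 1.0)

def implication(x, y, c):
    """[h^x, h^y](c) = inf_d [ min{pi(d|c), pi(d|x)}, pi(d|y) ]."""
    return min(bracket(min(pi[c][d], pi[x][d]), pi[y][d]) for d in objs)

print(implication("a", "ab", "ac"))  # 0.0: "ac" contains "a" but not "ab"
print(implication("a", "ab", "ab"))  # 1.0: "ab" contains both "a" and "ab"
```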

5. A metric space interpretation

It is sometimes desirable to work in a category $\mathcal{M}$ that is a slight variant of the syntax category $\mathcal{L}$. One can get from $\mathcal{L}$ to $\mathcal{M}$ by applying the negative logarithm to each hom object. First notice that the set of nonnegative extended reals $[0,\infty]$ together with addition is a commutative monoid with unit $0$, where $a+\infty:=\infty$ and $\infty+a:=\infty$ for all $a$. If one further specifies a morphism from $a$ to $b$ whenever $b\leq a$, then $[0,\infty]$ is a commutative monoidal preorder. As a category, $[0,\infty]$, like the unit interval, is monoidal closed as well as complete and cocomplete: the internal hom is given by truncated subtraction, $[a,b]:=\max\{b-a,0\}$, the limit of any diagram is the supremum of the numbers in the diagram, and the colimit is given by the infimum. In particular, the categorical product and coproduct are respectively given by $a\times b=\max\{a,b\}$ and $a\sqcup b=\min\{a,b\}$ for all $a$ and $b$ in $[0,\infty]$. The function $-\ln\colon[0,1]\to[0,\infty]$ defines an isomorphism of commutative monoidal preorders whose inverse $[0,\infty]\to[0,1]$ is the map $a\mapsto\exp(-a)$; both maps are continuous and cocontinuous isomorphisms of categories.
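As a quick illustrative check (with arbitrarily chosen inputs), one can verify that $-\ln$ carries the internal hom of $[0,1]$ to truncated subtraction in $[0,\infty]$:

```python
# -ln translates the internal hom min{b/a, 1} on [0,1] into truncated
# subtraction max{b - a, 0} on [0, infinity]. Test values are arbitrary.
import math

def to_metric(p):          # -ln : [0,1] -> [0, inf]
    return math.inf if p == 0.0 else -math.log(p)

def hom_unit(a, b):        # internal hom in [0,1]
    return 1.0 if a == 0.0 else min(b / a, 1.0)

def hom_metric(a, b):      # internal hom in [0, inf]
    return max(b - a, 0.0)

for a, b in [(0.25, 0.5), (0.5, 0.25), (0.7, 0.7)]:
    assert math.isclose(to_metric(hom_unit(a, b)),
                        hom_metric(to_metric(a), to_metric(b)))
```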

By applying $-\ln$ to morphisms of $\mathcal{L}$, we thus obtain a new category $\mathcal{M}$, enriched over the commutative monoidal preorder $[0,\infty]$, having the same objects as $\mathcal{L}$ and where the hom object between any pair of expressions $x$ and $y$ is given by $\mathcal{M}(x,y):=-\ln\mathcal{L}(x,y)$. It is then straightforward to check that $0\geq\mathcal{M}(x,x)$ for all expressions $x$, and moreover that $\mathcal{M}(x,y)+\mathcal{M}(y,z)\geq\mathcal{M}(x,z)$ for all expressions $x,y,z$, thus showing that $\mathcal{M}$ is indeed a $[0,\infty]$-category. As was the case in $\mathcal{L}$, both inequalities in $\mathcal{M}$ are in fact equalities.

Categories enriched over $[0,\infty]$ are typically called generalized metric spaces [Law73, Law86], since composition of morphisms is precisely the triangle inequality, though notice that symmetry is not required as it is with usual metrics. Even so, we embrace the generalized metric space point of view and will denote hom objects in $\mathcal{M}$ by $d_{\mathcal{M}}(x,y):=\mathcal{M}(x,y)$, thinking of $d_{\mathcal{M}}$ as defining a kind of distance between texts. Texts that are likely extensions of other texts have small distances from the texts they extend, and texts that are not extensions of one another are infinitely far apart. Figure 3 illustrates a picture one might have in mind. Paths in this generalized metric space go in only one direction and represent stories: you begin somewhere, there are expectations of where the story is going, the story continues, expectations are revised, the story continues, and so on. So the two categories $\mathcal{L}$ and $\mathcal{M}$ have the same information, but the first emphasizes the probabilistic point of view while the second gives a geometric picture.

[Figure 3: a picture of the points red, red ruby, and red idea in the generalized metric space.]
Figure 3. When viewing language as a generalized metric space, a text such as red is identified with its corresponding $[0,\infty]$-copresheaf $d_{\mathcal{M}}(\textit{red},-)$. Expressions that are unlikely continuations of red are therefore far away, whereas expressions that are more likely to be continuations of red are closer to it.

Naturally, the idea that semantic information lies in a copresheaf category applies in $\mathcal{M}$ as well. There is a generalized metric space $\widehat{\mathcal{M}}:=[0,\infty]^{\mathcal{M}}$ whose objects are $[0,\infty]$-copresheaves on $\mathcal{M}$, which are functions $f\colon\mathcal{M}\to[0,\infty]$ satisfying $d_{\mathcal{M}}(x,y)\geq[fx,fy]=\max\{fy-fx,0\}$ for all expressions $x$ and $y$ in the language. We can think of $[0,\infty]$ as a generalized metric space itself and denote $[a,b]=\max\{b-a,0\}$ by $d_{[0,\infty]}(a,b)$. So a copresheaf is like a metric contraction; that is, a function $f\colon\mathcal{M}\to[0,\infty]$ satisfying $d_{[0,\infty]}(fx,fy)\leq d_{\mathcal{M}}(x,y)$.

Translating Equation (11) tells us that the hom object between any pair of $[0,\infty]$-copresheaves $f$ and $g$ is given by

\widehat{\mathcal{M}}(f,g)=\sup_{x\in\mathcal{M}}d_{[0,\infty]}(fx,gx)=\sup_{x\in\mathcal{M}}\max\{gx-fx,0\},

defining the generalized metric on the space of functions that one would expect. Translating the results from Sections 3 and 4, one finds that the generalized metric space $\widehat{\mathcal{M}}$ is also both complete and cocomplete with respect to weighted (co)limits, and moreover the Yoneda embedding $\mathcal{M}^{\text{op}}\to\widehat{\mathcal{M}}$ that maps $x\mapsto d_{\mathcal{M}}(x,-)$ defines an isometric embedding of (the opposite category of) $\mathcal{M}$ as the representable copresheaves within $\widehat{\mathcal{M}}$. Expressions $x$ and $y$ that may be unrelated in $\mathcal{M}$ can be combined in $\widehat{\mathcal{M}}$ in ways that are completely analogous to the weighted products and coproducts in $\widehat{\mathcal{L}}$. See also [Wil13] for a similar discussion of generalized metric spaces and what is called the categorical Isbell completion.
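For instance, here is a minimal sketch of this formula on a hypothetical three-point space with invented distances:

```python
# The generalized metric between [0, inf]-copresheaves:
# M^(f, g) = sup_x max{g(x) - f(x), 0}. All values below are invented.

M = ["a", "ab", "abc"]
f = {"a": 0.0, "ab": 0.7, "abc": 1.6}   # in spirit, f = d_M(a, -)
g = {"a": 0.4, "ab": 1.0, "abc": 1.3}

def dist(f, g):
    """Asymmetric distance between copresheaves f and g."""
    return max(max(g[x] - f[x], 0.0) for x in M)

print(dist(f, g), dist(g, f))   # roughly 0.4 and 0.3: not symmetric
```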

Theorem 5.

For any copresheaves $f,g\in\widehat{\mathcal{M}}$ and any weights $w_{1},w_{2}\in[0,\infty]$, the weighted product and weighted coproduct of $f$ and $g$ are given by

((w_{1},f)\times(w_{2},g))(c)=\max\{fc-w_{1},gc-w_{2},0\}
((w_{1},f)\sqcup(w_{2},g))(c)=\min\{fc+w_{1},gc+w_{2}\}
Proof.

We translate the results from Sections 4.2 and 4.3. Let $f,g\colon\mathcal{M}\to[0,\infty]$ and $w_{1},w_{2}\in[0,\infty]$. Define $f^{\prime},g^{\prime}\colon\mathcal{L}\to[0,1]$ and weights $w_{1}^{\prime},w_{2}^{\prime}\in[0,1]$ by setting $f^{\prime}c=\exp(-fc)$, $g^{\prime}c=\exp(-gc)$, and $w_{i}^{\prime}=\exp(-w_{i})$ for $i=1,2$. Then in $\widehat{\mathcal{L}}$ we have

((w1,f)×(w2,g))(c)=min{fcw1,gcw2,1}((w_{1}^{\prime},f^{\prime})\times(w_{2}^{\prime},g^{\prime}))(c)=\min\left\{\frac{f^{\prime}c}{w^{\prime}_{1}},\frac{g^{\prime}c}{w^{\prime}_{2}},1\right\}

Translating back to $\widehat{\mathcal{M}}$ by applying $-\ln$ yields

((w_{1},f)\times(w_{2},g))(c)=-\ln\left(((w_{1}^{\prime},f^{\prime})\times(w_{2}^{\prime},g^{\prime}))(c)\right)
=-\ln\left(\min\left\{\frac{f^{\prime}c}{w^{\prime}_{1}},\frac{g^{\prime}c}{w^{\prime}_{2}},1\right\}\right)
=\max\left\{-\ln\left(\frac{f^{\prime}c}{w^{\prime}_{1}}\right),-\ln\left(\frac{g^{\prime}c}{w^{\prime}_{2}}\right),0\right\}
=\max\left\{-\ln(f^{\prime}c)+\ln(w^{\prime}_{1}),-\ln(g^{\prime}c)+\ln(w^{\prime}_{2}),0\right\}
=\max\left\{fc-w_{1},gc-w_{2},0\right\}.

Similarly,

((w_{1},f)\sqcup(w_{2},g))(c)=-\ln\left(((w_{1}^{\prime},f^{\prime})\sqcup(w_{2}^{\prime},g^{\prime}))(c)\right)
=-\ln\left(\max\left\{w^{\prime}_{1}f^{\prime}c,w^{\prime}_{2}g^{\prime}c\right\}\right)
=\min\left\{-\ln\left(w^{\prime}_{1}f^{\prime}c\right),-\ln\left(w^{\prime}_{2}g^{\prime}c\right)\right\}
=\min\left\{-\ln(f^{\prime}c)-\ln(w^{\prime}_{1}),-\ln(g^{\prime}c)-\ln(w^{\prime}_{2})\right\}
=\min\left\{fc+w_{1},gc+w_{2}\right\}. ∎

Geometrically, one may think of nonrepresentable copresheaves, such as the weighted products and coproducts modeling conjoined and disjoined meanings, as new "points" added in $\widehat{\mathcal{M}}$, which are specified precisely by giving their distance to all other points in the category $\mathcal{M}$. The general idea that copresheaves $f$ in $\widehat{\mathcal{M}}$ are specified precisely by indicating the distance between $f$ and the representable copresheaves $d_{\mathcal{M}}(x,-)$ suggests that the categorical completion $\widehat{\mathcal{M}}$ resembles a kind of metric completion.

5.1. Tropical module structure

We note now that the completion $\widehat{\mathcal{M}}$ has a semi-tropical module structure, that is, the structure of a module over the semi-tropical semi-ring defined in the next paragraph. The terminology semi-tropical, and the connection to categorical cocompletions of generalized metric spaces, can be found in [Wil13]. In our context, the semi-tropical structure is analogous to the fact that the set of scalar-valued functions on a given set has the structure of a vector space, which it inherits from the field of scalars. Here, elements of $\widehat{\mathcal{M}}$ are functions valued in $[0,\infty]$, which is a semi-tropical semi-ring, and therefore $\widehat{\mathcal{M}}$ inherits the structure of a semi-tropical module.

Definition 14.

The data $((-\infty,\infty],\oplus,\odot)$ with operations $\oplus$ and $\odot$ defined by

s_{1}\oplus s_{2}=\min\{s_{1},s_{2}\}\quad\text{and}\quad s_{1}\odot s_{2}=s_{1}+s_{2}

defines a semi-ring called the tropical semi-ring or the $(\min,+)$ algebra. The sub semi-ring $([0,\infty],\oplus,\odot)$ is called the semi-tropical semi-ring. A module over a semi-ring is a commutative monoid with an action of the semi-ring.

Theorem 6.

The coproduct (with trivial weights $w_{1}=w_{2}=0$) in $\widehat{\mathcal{M}}$ makes it into a commutative monoid. The map $[0,\infty]\times\widehat{\mathcal{M}}\to\widehat{\mathcal{M}}$ defined by $(s,f)\mapsto s\odot f$, where $(s\odot f)(x)=f(x)+s$, makes $\widehat{\mathcal{M}}$ into a module over the semi-tropical semi-ring.

Proof.

We check that the action distributes over tropical addition: $((s_{1}\oplus s_{2})\odot f)(x)=f(x)+\min\{s_{1},s_{2}\}=\min\{fx+s_{1},fx+s_{2}\}=((s_{1}\odot f)\sqcup(s_{2}\odot f))(x)$, and that it is associative: $((s_{1}\odot s_{2})\odot f)(x)=fx+s_{1}+s_{2}=(s_{2}\odot f)(x)+s_{1}=(s_{1}\odot(s_{2}\odot f))(x)$. ∎

In fact, the formulas for the weighted coproduct in $\widehat{\mathcal{M}}$ can be re-expressed as a tropical linear combination:

((w_{1},f)\sqcup(w_{2},g))(c)=\min\left\{fc+w_{1},gc+w_{2}\right\}=w_{1}\odot fc\oplus w_{2}\odot gc.
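A tiny sketch of these operations, with invented weights and copresheaf values:

```python
# The (min, +) operations of Definition 14, and the weighted coproduct
# read as a tropical linear combination. Numbers are illustrative.

def oplus(s, t):   # tropical addition
    return min(s, t)

def odot(s, t):    # tropical multiplication
    return s + t

def weighted_coproduct(w1, fc, w2, gc):
    """((w1, f) coproduct (w2, g))(c) = (w1 odot fc) oplus (w2 odot gc)."""
    return oplus(odot(w1, fc), odot(w2, gc))

assert weighted_coproduct(0.2, 1.5, 0.7, 0.4) == min(1.5 + 0.2, 0.4 + 0.7)
```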

We have a hunch that tropical geometry provides more than merely a different language in which to express the results described in $\widehat{\mathcal{L}}$. Over the last twenty years, tropical geometry has been an area of active research. One thing it provides is a way to transform non-linear algebraic geometry into piecewise-linear geometry [Mac12, BIMS15, DS04b, DS04a, Yos11, SS09, SS03, Vir01]. The passage is achieved by considering usual polynomials in several variables and replacing the additions and products by their tropical counterparts. The zero set of the original polynomial is mapped to the singular set of the tropical one, which is a piecewise-linear space. Semi-tropical module structures on presheaves on generalized metric spaces, along with a variation where $\min$ is replaced with $\max$, were studied in [Wil13].

A potentially relevant discovery is a tropical module structure [DS04b, DS04a, SS09] generated by distance functions, like the ones we have in this paper, but coming from a symmetric metric and satisfying a tropical Plücker relation. A one-dimensional tropical polytope arises, which realizes the metric defined by the distance functions as a tree metric. The achievement here is that extra limit points are added in precise locations, indicating branching points, so that the distances between leaves are realized as distances along the tree paths connecting them. This result is used in phylogenetics to produce trees encoding common ancestors of species using distances defined by their DNA [SS09, DS04c]. We conjecture that a similar result for the metric semantic category $\widehat{\mathcal{M}}$ will give rise to phylogenetic structures that work like metric knowledge graphs.

More speculatively, tropical structures might provide insights about how large language models actually learn semantic information. Recently, it has been discovered [ZNL18, MCT21, CM19, SM19] that feedforward neural networks with ReLU activation functions compute tropical rational maps, opening a new way to study the mathematical structure underlying them. LLMs use neural architectures to learn probability distributions on text continuations. That is, LLMs learn the syntax category $\mathcal{L}$. It would be very interesting to determine whether there is any link between the tropical module structures in the metric semantic language category $\widehat{\mathcal{M}}$ and the tropical rational maps computed within the neural architectures of LLMs.

6. Conclusion

Knowing how to continue texts implies a good deal of semantic information. So one might think that it would be very difficult to learn the syntactic language category $\mathcal{L}$, which encodes probability distributions on text continuations. Two intelligent researchers debating five years ago could reasonably have disagreed about whether a computer model that coherently continues texts would require, as part of its training input, some semantic knowledge. Existing LLMs prove that it does not. Standard machine learning techniques demonstrate that the enriched category $\mathcal{L}$ can be learned in an unsupervised way directly from samples of existing texts. These models have then necessarily learned some semantic information. In this paper, we provided a mathematical framework for where, mathematically speaking, that semantic information lives: namely, in the category of $[0,1]$-copresheaves on $\mathcal{L}$. We then described a number of categorical constructions that define meaningful operations on that semantic information.

6.1. Applications and future directions

There are several near-term applications of the work described in this paper. One is to study the architectures of trained LLMs, like GPT-3, which have successfully learned probability distributions on text continuations. In the language of this paper, these distributions are precisely the representable enriched copresheaves. One can look for other enriched categorical structures in these architectures, and since tropical structures appear in both feedforward ReLU networks and the category of copresheaves, they might be a place to start. The work in this paper could then lead to well-defined mathematical operations on the parameters of trained language models that allow users to access, combine, and manipulate the semantic knowledge that is mathematically implicit in models that have learned to continue texts. For a simple example, the concept of a gender-neutral pronoun could be created by taking the coproduct of the copresheaves representing "he" and "she." Also, weighted limits and colimits allow one to build concepts from multiple texts with precise shapes and weights, and by going back from copresheaves to texts, one would have control of the concepts contained in generated texts. Realizing the internal hom operation, which is like a context-sensitive implication operator, would yield a powerful entailment tool, permitting certain texts to be input as given "true" when performing other NLP tasks. Perhaps it is possible to use the internal hom or the polyhedral structures arising from the tropical modules (like higher-dimensional phylogenetic trees) to automatically create knowledge graphs, or otherwise organize semantic information, from the parameters of trained LLMs. All of this depends on being able to realize certain categorical operations within existing architectures.

Another direction would be to design novel architectures which have the built-in capability to implement these categorical operations. Using density operators to model probability distributions on text continuations is one idea [BV20]. There are natural operations on density operators, like taking convex combinations, that correspond to operations on representable copresheaves. One can average the density operators representing "he" and "she" to obtain a density that doesn't represent any particular word in the language, but rather captures the concept "he" $\sqcup$ "she." Density operators also have spectral structures that can be accessed and manipulated, realizing other operations with categorical interpretations. Furthermore, tensor networks give highly efficient algorithms for storing and manipulating densities on existing classical hardware, making a tensor network language model an attractive possibility [MRT21, and references within].

In short, we have presented a mathematical framework that puts the kind of syntactical information that large language models learn into an enriched-category theoretic setting. Understanding that semantic information resides in a category of $[0,1]$-copresheaves and understanding how categorical operations act on that information leads to a number of concrete and appealing applications which should be explored further.

Acknowledgements

The authors thank Olivia Caramello, Shawn Henry, Maxim Kontsevich, Laurent Lafforgue, Jacob Miller, David Jaz Myers, David Spivak, and Simon Willerton for helpful mathematical discussions. The authors thank Juan Gastaldi and Luc Pellissier for discussions about their philosophical work [Gas21, GP21] and the anonymous referees who made suggestions that greatly improved this article.

References

  • [AS14] Samson Abramsky and Mehrnoosh Sadrzadeh. Semantic Unification, pages 1–13. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.
  • [B+20] Tom Brown et al. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc., 2020.
  • [BIMS15] Erwan Brugallé, Ilia Itenberg, Grigory Mikhalkin, and Kristin Shaw. Brief introduction to tropical geometry, 2015.
  • [BV20] Tai-Danae Bradley and Yiannis Vlassopoulos. Language modeling with reduced densities. arXiv:2007.03834, 2020. To appear in Compositionality.
  • [CM19] Vasileios Charisopoulos and Petros Maragos. A tropical approach to neural networks with piecewise linear activations, 2019.
  • [CSC10] Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark. Mathematical foundations for a compositional distributional model of meaning. arXiv:1003.4394, 2010.
  • [DS04a] Mike Develin and Bernd Sturmfels. Tropical convexity (erratum). Documenta Mathematica, 9:205–206, 2004.
  • [DS04b] Mike Develin and Bernd Sturmfels. Tropical convexity. Documenta Mathematica, 9:1–27, 2004. Erratum pp. 205–206.
  • [DS04c] Mike Develin and Bernd Sturmfels. Tropical convexity. Documenta Mathematica, 9:1–27, 2004.
  • [FS19] Brendan Fong and David I. Spivak. An Invitation to Applied Category Theory: Seven Sketches in Compositionality. Cambridge University Press, 2019.
  • [Gas21] Juan Luis Gastaldi. Why can computers understand natural language? Philosophy & Technology, 34(1):149–214, 2021.
  • [GP21] Juan Luis Gastaldi and Luc Pellissier. The calculus of language: explicit representation of emergent linguistic structure through type-theoretical paradigms. Interdisciplinary Science Reviews, 46(4):569–590, 2021.
  • [Hao21] Karen Hao. “The race to understand the exhilarating, dangerous world of language AI”. MIT Technology Review, May 20, 2021. https://www.technologyreview.com/2021/05/20/1025135/ai-large-language-models-bigscience-project/. Accessed June 1, 2021.
  • [Har54] Zellig S. Harris. Distributional structure. WORD, 10(2-3):146–162, 1954.
  • [Hea20] Will Douglas Heaven. “OpenAI’s new language generator GPT-3 is shockingly good—and completely mindless”. MIT Technology Review, July 20, 2020. https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/. Accessed June 1, 2021.
  • [Hor04] Paul Horwich. A use theory of meaning. Philosophy and Phenomenological Research, 68(2):351–372, 2004. http://www.jstor.org/stable/40040682.
  • [JX19] Kun Jing and Jungang Xu. A survey on neural network language models. arXiv:1906.03591, 2019.
  • [Kel82] G.M. Kelly. Basic Concepts of Enriched Category Theory. Lecture note series / London mathematical society. Cambridge University Press, 1982.
  • [Law69] F. William Lawvere. Adjointness in foundations. Dialectica, 23(3/4):281–296, 1969.
  • [Law73] F. William Lawvere. Metric spaces, generalized logic and closed categories. Rendiconti del Seminario Matematico e Fisico di Milano, 43:135–166, 1973. https://doi.org/10.1007/BF02924844.
  • [Law86] F. William Lawvere. Taking categories seriously. Revista Colombiana de Matematicas, 20(3-4):147–178, 1986.
  • [Lei14] Tom Leinster. Basic Category Theory. Cambridge Studies in Advanced Mathematics. Cambridge University Press, 2014.
  • [LS09] F.W. Lawvere and S.H. Schanuel. Conceptual Mathematics: A First Introduction to Categories. Cambridge University Press, 2009.
  • [Mac12] Diane Maclagan. Introduction to tropical algebraic geometry, 2012. arXiv:1207.1925.
  • [MCH+20] Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy. Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48):30046–30054, 2020.
  • [MCT21] Petros Maragos, Vasileios Charisopoulos, and Emmanouil Theodosis. Tropical geometry and machine learning. Proceedings of the IEEE, 109(5):728–755, 2021.
  • [Met20] Cade Metz. “Meet GPT-3. It has learned to code (and blog and argue).” New York Times, Nov. 24, 2020. https://www.nytimes.com/2020/11/24/science/artificial-intelligence-ai-gpt3.html. Accessed June 1, 2021.
  • [MM92] Saunders MacLane and Ieke Moerdijk. Sheaves in geometry and logic: a first introduction to topos theory. Universitext. Springer, Berlin, 1992.
  • [MM12] Saunders MacLane and Ieke Moerdijk. Sheaves in Geometry and Logic: A First Introduction to Topos Theory. Universitext. Springer New York, 2012.
  • [MRT21] Jacob Miller, Guillaume Rabusseau, and John Terilla. Tensor networks for probabilistic sequence modeling. In Arindam Banerjee and Kenji Fukumizu, editors, The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event, volume 130 of Proceedings of Machine Learning Research, pages 3079–3087. PMLR, 2021.
  • [NBvEV16] Rick Nouwen, Adrian Brasoveanu, Jan van Eijck, and Albert Visser. Dynamic Semantics. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Winter 2016 edition, 2016.
  • [Pie18] Paul M. Pietroski. Conjoining Meanings: Semantics Without Truth Values. Oxford University Press, Berlin, 2018.
  • [Rie14] Emily Riehl. Categorical Homotopy Theory. New Mathematical Monographs. Cambridge University Press, 2014.
  • [Rie17] Emily Riehl. Category Theory in Context. Aurora: Dover Modern Math Originals. Dover Publications, 2017.
  • [RNSS18] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training. 2018. https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf.
  • [RWC+18] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2018.
  • [SM19] Georgios Smyrnis and Petros Maragos. Tropical polynomial division and neural networks, 2019. arXiv:1911.12922.
  • [SS03] David Speyer and Bernd Sturmfels. The tropical Grassmannian. Advances in Geometry, 4:389–411, 2003.
  • [SS09] David Speyer and Bernd Sturmfels. Tropical mathematics. Mathematics Magazine, 82(3):163–173, 2009.
  • [TP10] P. D. Turney and P. Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188, 2010. arXiv:1003.1141.
  • [Vir01] Oleg Viro. Dequantization of real algebraic geometry on logarithmic paper. In Carles Casacuberta, Rosa Maria Miró-Roig, Joan Verdera, and Sebastià Xambó-Descamps, editors, European Congress of Mathematics, pages 135–146, Basel, 2001. Birkhäuser Basel.
  • [VSP+17] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, pages 6000–6010, Red Hook, NY, USA, 2017. Curran Associates Inc.
  • [Wik21] Wikipedia contributors. Formal grammar — Wikipedia, the free encyclopedia, 2021. [Online; accessed 28-October-2021].
  • [Wil13] Simon Willerton. Tight spans, Isbell completions and semi-tropical modules. Theory and Applications of Categories, 28:696–732, 2013.
  • [Yos11] Shuhei Yoshitomi. Generators of modules in tropical geometry, 2011. arXiv:1001.0448.
  • [ZNL18] Liwen Zhang, Gregory Naitzat, and Lek-Heng Lim. Tropical geometry of deep neural networks. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 5824–5832. PMLR, 10–15 Jul 2018.