Information content in formal languages

Bernhard Burgstaller [email protected] Departamento de Matematica, Universidade Federal de Santa Catarina, CEP 88.040-900 Florianópolis-SC, Brasil

Abstract.

Motivated by creating physical theories, formal languages $S$ with variables are considered and a kind of distance between elements of the languages is defined by the formula $d(x,y)=\ell(x\nabla y)-\ell(x){\wedge}\ell(y)$ , where $\ell$ is a length function and $x\nabla y$ means the united theory of $x$ and $y$ . Actually we mainly consider abstract abelian idempotent monoids $(S,\nabla)$ provided with length functions $\ell$ . The set of length functions can be projected to another set of length functions such that the distance $d$ is actually a pseudometric and satisfies $d(x\nabla a,y\nabla b)\leq d(x,y)+d(a,b)$ . We also propose a “signed measure” on the set of Boolean expressions of elements in $S$ , and a Banach-Mazur-like distance between abelian, idempotent monoids with length functions, or formal languages.

Key words and phrases:

formal language, variables, metric, distance, equivalence relation, information, information content, abelian, commutative, idempotent, monoid, semigroup, length, length function, measure, Boolean algebra, computing, physics, quantum gravity, physical theories

1991 Mathematics Subject Classification:

94A17, 06E75, 06F05, 68Q45, 94-04

1. Introduction

Variables are the center and core of mathematics, physics, theoretical technics and theoretical economy theory. In this paper we consider formal languages with variables (see definition 2.1), provided with a length function for sentences (for example a simple character count), and define something which appears like a pseudo-metric between the sentences within one languages (see definition 2.12).

The languages are assumed to be equipped with an equivalence relation (def. 2.6), and the more two sentences $x,y$ have in common, the closer their distance becomes.

Actually, this project was motivated by physics in the quest for a solution to quantum gravity, see for instance Weinberg [21] and Hawking and Israel [11]. It is observed, that in general physical theories are surprisingly short when formulated as equations, and that progress in advancing the correct equations in physics is often realized by small modifications in formulas. For example, the formulas between Hamiltonian mechanics and quantum mechanics are very close in distance, also in our approach here, for all given Hamiltonians, even if quantum mechanics and mechanics appear very different when considered without formulas (cf. example 11.1).

We guess that a solution that unites gravity and quantum field theory, when formulated as mathematical equations, is actually close as formulas, for example in the sense proposed in this paper, to the existing ununited theories of gravity and quantum field theory. This might lead to the direction towards a candidate for a united theory, and on the other hand, might be used as a tool to evaluate a given candidate theory by computing the distance to the existing theory so far.

In short terms, the simpler a theory is, the better. The closer it is to existing effective theories, the better, as the more likely it is the correct generalization.

In this sense, even quantum mechanics might very theoretically have had been predicted without any inspection of black body radiation or double slit experiment, because it is perhaps the closest theory to mechanics which avoids the infinities. Simply by looking at formulas.

In this sense also, a solution to quantum gravity is perhaps less a physical problem but more a pure mathematical problem. It may be formulated like so: indicate the theory with the shortest distance - in the sense of this paper, say - to the existing theory, such that no infinities occur (cf. problem 11.3). Even more so, as ultrahigh-energy experiments appear currently to be impossible by several reasons, for example the no-return from black holes obstacle.

After this motivating and ambitious remarks, which may also appear somewhat self-evident, let us come to the hard facts of this paper:

Our definitions may be used much more widely, for example in computing, and it is widely applicable, as the approach is very flexible and does not bother with details but with the core principle. World applications are viewed as formulas. Actually, the distance $d(s,t)$ between two formulas $s,t$ is defined as

d(s,t)=\ell(s\nabla t)-\ell(s){\wedge}\ell(t)

which means in words: unite both theories $s$ and $t$ (called $s\nabla t$ ), optimize their length (called $\ell$ ) and compute it (the more overlap they have the shorter it will become) and then subtract the length of the shorter theory of $s$ and $t$ .

It looks natural and simple, but is also intriguing because of the minus sign. Even if it appears natural that the triangle inequality - also called $\Delta$ -inequality in this paper - holds for $d$ (cf. lemma 3.6), it is difficult and enormous involving when trying to verify this. A key of this paper is to repair the definition of $d$ by making the $\Delta$ -inequality valid, see definition 7.1. See also theorem 6.10 for other conditions which imply the $\Delta$ -inequality.

As another example, it is difficult to get any easy criteria for the uniform continuity of a homomorphism between two languages, for example computer languages, but see lemma 12.1.

To not get lost in details or become too specific, in this paper we mainly forget languages at all but instead work with an abelian, idempotent monoid $(S,\nabla)$ provided with a length function $\ell$ (see definition 2.15).

This setting can be somewhat compared with a measured space $(M,\cup)$ with the operation $\cup$ of taking the union of sets and a measure $\ell$ on it. And indeed, we will make parallels, and show how we can compute the “signed measure” $\zeta$ (i.e. length) of a formal Boolean expression in the monoid $S$ , see definition 6.15. The above formula for $d$ is easily seen to be indeed a metric for measured spaces $(M,\cup)$ and $\nabla:=\cup$ , see lemma 3.3.

In a main result, see theorem 7.21, we show how one can, starting with a length function $\ell$ , construct a length function ${{\ell}^{(\infty)}}$ such that the above formula for $d$ with $\ell$ replaced by ${{\ell}^{(\infty)}}$ is indeed a pseudometric, and the measure $\zeta(x\cap y)$ of the Boolean expression of intersection of two sentences $x$ and $y$ , is monotonically increasing in $x$ and $y$ , as one may naturally expect it. We have however no clue in how far ${{\ell}^{(\infty)}}$ degenerates, it could even be completely zero (but don’t believe that).

A key role in this paper also plays the $\nabla$ -inequality

d(x\nabla y,a\nabla b)\leq d(x,a)+d(y,b)

which, even if it holds heuristically, see lemma 4.2, we also force by definition finally, see definition 7.4.

For homomorphisms between languages we use the same concepts as for languages itself to define a metric, enriched by the submultiplicativity (cf. lemma 8.14).

As a distance between languages, or more generally, abelian, idempotent monoids, one may use the proposed Banach-Mazur-like distance of definition 8.27.

In the final sections we give various examples which should illustrate and motivate the theory of this paper, presented in an extremely superficially and sloppy way.

Actually, defining pseudometrics on languages, and on a family of languages goes back for a long time with a wide literature, and we only cite samples, far from being in any way complete, Han and Ko [10], Djordjević, Ikodinović and Stojanović [6], Fulop and Kephart [8], Ko, Han and Salomaa [15], Imaoka [13], Cseresnyes and Seiwert [2], Ibarra, McQuillan and Ravikumar [12], Bertoni, Goldwurm and Sabadini [1], Stratford [20], Jalobeanu [14], Parker, K.B. Yancey and M.P. Yancey [16], Cui, Dang, Fischer and Ibarra [3], Pighizzini [17], and in the context of physical theories, Giangiacomo [9]. We will not compare these approaches with the one in this paper, but we think there is a different emphasis on the latter.

Let us remark, that we less stress the exact computability of the proposed metrics, but the idealized relations they should satisfy. Also, our definitions are designed such that we find them particularly useful for physical theories.

2. Metric

Under a language $(S,A)$ we mean a set $A$ called alphabet, and a set $S\subseteq\{f:\{1,\ldots,n\}\rightarrow A|\,n\geq 1\}\cup\{\emptyset\}$ of finite text-strings with letters in $A$ called sentences or formulas or theories. Here, $\emptyset$ means the empty sentence (i.e. empty set). In this section we are going to give successively several definitions, each of which becomes part of the considered setting after the definition.

Definition 2.1 (Language with variables).

A language with variables means a language $(S,A)$ , where $\emptyset\in S$ and $A$ contains at least a sign ${\vee}$ and a distinguished infinite subset of so-called variables $X\subseteq A$ , divided further into two disjoint infinite subsets $X_{0}\cup X_{1}=X$ ( $X_{0}\cap X_{1}=\emptyset$ ) called main variables set $X_{0}$ and auxiliary variable set $X_{1}$ , such that for all elements $s,t\in S$ and variable permutations $\phi\in P_{S}$ (to be explained below),

s{\vee}t\in S

\phi(s)\in S

Here, $s{\vee}t$ means the concatenated string of the string $s$ with the letter ${\vee}$ with the string $t$ in the order as indicated. The letter ${\vee}$ serves purely as a symbol for concanation of strings.

Definition 2.2 (Variables permutation).

Let $\psi:X\rightarrow X$ be a permutation (i.e. bijective function) on the variable set such that $\psi(X_{i})=X_{i}$ ( $i=1,2$ ), and let $\phi:S\rightarrow S$ denote the canonically bijective map induced by $\psi$ . That is, set $\phi(a_{1}\ldots a_{n}):=\psi(a_{1})\ldots\psi(a_{n})$ for $a_{i}\in A$ , where $\psi(a):=a$ if $a\in A\backslash X$ .

Write $P_{S}$ for the set of all such functions $\phi:S\rightarrow S$ , called variable permutation or transformation.

Definition 2.3 (Length function).

Under a length or cost function on a language $S$ with variables we mean a positive function $L:S\rightarrow{\mathbb{R}}$ satisfying

L(\emptyset)=0

L(x{\vee}y)=L(x)+L(y)

L(s)=L(\phi(s))

for all $x,y\in S$ and $\phi\in P_{S}$ .

In other words, the empty word has no length, the length of two concanated strings is the sum of of the length of the single strings, and the length or cost of a string does not change if we just choose other variable names.

Lemma 2.4.

The map $f:S\times S\rightarrow S$

f(s,t)=s{\vee}t

is an associative semigroup operation on $S$ .

Two sentences are completely unrelated if they have no variables in common because they do not speak about the same variables:

Definition 2.5 (Independence).

Two elements $x,y\in S$ are called independent, notated by $x\bot y$ , if and only if $x$ and $y$ have no variables in common.

(In other words, a variable showing up in the string $x$ must not show up in the string $y$ .)

In languages like mathematical of physical theories, one can typically express formulas in formally different ways but which are equivalent in content (for example, formally different equations may have the same solution). That is why we need an equivalence relation on the set of sentences, and the following axioms are the minimum natural ones we postulate:

Definition 2.6 (Equivalence relation).

Let be given an equivalence relation $\equiv$ on $S$ such that the following relations hold for all $a,b,s,t,x\in S$ and $\phi\in P_{S}$ :

Neutral element:

(1)

s{\vee}\emptyset\equiv\emptyset{\vee}s\equiv s

Commutativity:

(2)

a{\vee}s{\vee}t{\vee}b\equiv a{\vee}t{\vee}s{\vee}b

${\vee}$ preserves equivalent elements in case of independence:

(3)

s\equiv t,\;s\bot x,\;t\bot x\quad\Rightarrow\quad s{\vee}x\equiv t{\vee}x

A superfluous copy can be removed:

(4)

\phi(x)\bot x\quad\Rightarrow\quad x{\vee}\phi(x)\equiv x

Variable transformation does not change:

(5)

x\equiv\phi(x)

If $x$ contains no main variables then require:

(6)

x\equiv\emptyset

Let us revisit some aspects of the last definition. Relation (2) means it is unimportant in which order we list our sentences separated by ${\vee}$ , like it is unimportant in which order we list the single formulas in a longer theory, or the collection of subroutines in a longer computer program. The connection between the single sentences is regulated solely by the variables they have in common. That is why relation (3) says that the semigroup operation ${\vee}$ of concatenation does not preserve the equivalence relation in general, but only if the concatenated sentences are unrelated. Thus we also needed to incorporate $a$ and $b$ in commutativity (2). Relation (5) tells us that it is unimportant which variables we choose (but only as a whole sentence). Relation (4) says if a theory is restated and fused with the original theory with other variables then essentially nothing new is said and the so new formed theory collapses to the original theory. The relation (6) is actually theoretically unimportant in the sense that it does not shine to the later abstract approach stated in definition 2.15 below through and may thus be omitted, but it is essential in the examples that we shall consider and are the only difference and the point what the distinction between main and auxiliary variables is all about. (See also example 10.1.)

We use class brackets as in $[x]$ for classes in quotients, where $x$ is a representative. The limits of sequences and the compare of all real valued functions $f:Y\rightarrow{\mathbb{R}}$ are always understood to be taken pointwise (i.e. $f\leq g$ iff $f(x)\leq g(x)$ for all $x,y\in Y$ ).

The whole theory of this paper is mainly build only on the quotient set of sentences:

Definition 2.7 (Quotient).

Set $\overline{S}:=S/\equiv$ to be the set-theoretical quotient.

On the quotient of sentences we naturally form an operation of concatenation by concatenating different sentences by choosing copies of them with disjoint variables chosen:

Definition 2.8 (Operation).

Define $\nabla:\overline{S}\times\overline{S}\rightarrow\overline{S}$ by

[x]\nabla[y]:=[x{\vee}\phi(y)]

for all $x,y\in S$ , where $\phi\in P_{S}$ (dependent on $x,y$ ) is chosen such that $x\bot\phi(y)$ .

We will then mainly work with an abstract abelian monoid that actually comes out so far:

Lemma 2.9.

$(\overline{S},\nabla,[\emptyset])$ is an abelian, idempotent monoid (i.e. $x\nabla x=x$ for all $x\in\overline{S}$ ).

Proof.

Indeed, by (5) and (3) the above definition of $\nabla$ is independent of the chosen $\phi\in P_{S}$ . By (1), $[\emptyset]$ is a null-element, and by (2) we get commutativity. Idempotence follows from (4). ∎

On the quotient set of sentences we essentially define a length function $\ell$ by choosing the shortest possible length $L$ among the equivalent representatives (i.e. $\ell([x])=\inf_{y\equiv x}L(y)$ ), but to ensure that the length of $s\nabla t$ is always bigger or equal than the length of $s$ for $s,t$ in the quotient $\overline{S}$ , we correct this definition by the formula of definition 7.9. Thus the following definition of $\ell$ is just a somewhat obscured reformulation of defining $\ell$ at first as in definition 9.1 and then correcting it by definition 7.9:

Definition 2.10 (Length function on quotient).

Set ( $x\in S$ )

\ell:\overline{S}\rightarrow{\mathbb{R}}

\ell([x])=\inf\{L(y)\in{\mathbb{R}}|\,y,a\in S,\;x\bot a,\;y\equiv x{\vee}a\}

We shall conveniently denote $\emptyset:=[\emptyset]\in\overline{S}$ . We get very natural properties for the length function which we later will postulate axiomatically for abelian monoids:

Lemma 2.11.

$\ell$ is positive and for all $x,y\in\overline{S}$ we have:

(7)

\ell(\emptyset)=0

Subadditivity:

(8)

\ell(x\nabla y)\leq\ell(x)+\ell(y)

Monotone increasingness:

(9)

\ell(x)\leq\ell(x\nabla y)

Proof.

Let $\varepsilon>0$ . Given $x_{1},x_{2}\in S$ , by definition 2.10 choose $y_{1},y_{2},a_{1},a_{2}\in S$ such that $\ell([x_{i}])\geq L(y_{i})+\varepsilon$ where $x_{i}{\vee}a_{i}\equiv y_{i}$ and $x_{i}\bot a_{i}$ ( $i=1,2$ ).

Subadditivity: Choose any $\phi\in P_{S}$ such that $y_{1}\bot\phi(y_{2})$ . Then

\ell([x_{1}])+\ell([x_{2}])+2\varepsilon\geq L(y_{1})+L(\phi(y_{2}))=L(y_{1}{\vee}\phi(y_{2}))\geq\ell([y_{1}){\vee}\phi(y_{2})])

=\ell([y_{1}]\nabla[y_{2}])=\ell([x_{1}{\vee}a_{1}]\nabla[x_{2}{\vee}a_{2}])=\ell([x_{1}]\nabla[x_{2}]\nabla[a_{1}]\nabla[a_{2}])\geq\ell([x_{1}]\nabla[x_{2}])

where the last inequality follows from monotone increasingness, which we prove next:

\ell([x]\nabla[a])+\varepsilon=\ell([x{\vee}\phi(a)])+\varepsilon\geq L(y)\geq\ell([x])

where, given $x,a\in S$ , $y,b\in S$ is chosen such that $y\equiv x{\vee}\phi(a){\vee}b$ and the before last inequality holds. ∎

We often abbreviate the notion “monotonically increasing” for (9) simply by “monotone” or “increasing”. Let $x{\wedge}y:=\min(x,y)$ and $x{\vee}y:=\max(x,y)$ denote the minimum and maximum, respectively, of two real numbers $x,y\in{\mathbb{R}}$ .

The length function induces now two very natural notions of distance functions between sentences:

Definition 2.12 (“Metric”).

We define two functions $d,\sigma:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ by

d(s,t)=\ell(s\nabla t)-\ell(s){\wedge}\ell(t)

\sigma(s,t)=2\ell(s\nabla t)-\ell(s)-\ell(t)

We sometimes call these functions also more precisely $d_{\ell}:=d$ and $\sigma_{\ell}:=\sigma$ . They have the characteristic of distance between elements of $\overline{S}$ :

As an illustrative example let us recall that in his famous work [4], Dirac took the then known quantum numbers valued Klein-Gordon function $\psi:{\mathbb{R}}^{4}\rightarrow B(H)$ (operators on Hilbert space) defined on spacetime ${\mathbb{R}}^{4}$ and the Klein-Gordon differential equation $\Delta\psi-\partial_{t}\psi-m^{2}\psi=0$ ( $m\in{\mathbb{R}}$ constant) and took the formal square-root of the Klein-Gorden differential operator $K:=\Delta-\partial_{t}-m^{2}$ and formed the Dirac differential operator $D:=\sqrt{K}$ with constant coefficients in matrix algebra $M_{4}({\mathbb{C}})$ , that is, $D^{2}=K$ , and proposed the physical equation $D\psi=0$ and along with it predicted correctly the existence of particles and anti-particles [5]. This is a prototype example where an existing physical theory is slightly modified to obtain a new theory. That is, the Klein-Gordon equation is close to the Dirac equation, also in our “metric”. Let us compute this:

Example 2.13.

The prototype example is physical theories. $S$ are the collections of all sentences, which are equations, formulas, function definitions and so on. Equivalence relation $\equiv$ is when two formulas are equivalent in a mathematical sense. The ${\vee}$ means “and” in the sense that this and the other formula/equation holds. The equivalence relation $\equiv$ has to be enriched by the axioms of definition 2.6. Main variables are those that cannot be deleted, like the electromagnetic field in Maxwell’s equations, because it is what the theory is about. The length function $L$ could be a simple character count. See also the beginning of section 10 including example 10.1 for further explanations.

For example, consider the free Klein-Gordon equation $\Delta\psi=\partial_{t}^{2}\psi+m^{2}\psi$ and the Dirac equation $D\psi=0$ , where $D^{2}=\Delta-\partial_{t}^{2}-m^{2}=:K$ , and where in both cases $\psi$ is a main variable. Then

d([\mbox{Dirac}],[\mbox{Klein-Gordon}])=d([D\psi=0{\vee}D^{2}=\Delta-\partial_{t}^{2}-m^{2}],[(\Delta-\partial_{t}^{2}-m^{2})\psi=0])

=\ell([D\psi=0{\vee}D^{2}=\Delta-\partial_{t}^{2}-m^{2}{\vee}(\Delta-\partial_{t}^{2}-m^{2})\phi=0])-\ell([(\Delta-\partial_{t}^{2}-m^{2})\psi=0])

=\ell([D\psi=0{\vee}D^{2}=K{\vee}K:=\Delta-\partial_{t}^{2}-m^{2}{\vee}K\phi=0])-\ell([(\Delta-\partial_{t}^{2}-m^{2})\psi=0])

\approx 23-13=10

whereas $\ell([\mbox{Dirac}])=15$ and $\ell([\mbox{Klein-Gordon}])\approx 13$ .

Here, $K$ and $D$ are auxiliary variables. If we had a longer formula of $K$ (i.e. the Klein-Gordon operator) then the above distance $d(D,KG)$ would not change, but $\ell(D)$ and $\ell(KG)$ would increase.

Note that we did not explicitly transform the variables $x_{1},x_{2},x_{3},t$ in $(\Delta-\partial_{t}^{2})\phi(x,t)$ as required by the $\nabla$ -operation of definition 2.8 because they are bound variables which we can label as we like without change, and so have back transformed.

Example 2.14.

Another standard example is computer languages. For example $C$ ++. Sentences are meaningful programs or code snippets. Two programs are equivalent, if they do the same, for example, independent of time performance. Typically, the entry point of a program is a main variable. Set $L(p)$ to be the length of a program $p$ . Set ${\vee}$ to be a concatenation of programs.

Variables are the typical variables in higher computer languages. The variable transformation in the $\nabla$ -operation of definition 2.8 ensures that the variables of two different programs do not overlap.

The more two programs $s,t\in\overline{S}$ have in common, the lower $d(s,t)$ becomes.

For example, if $s$ and $t$ have one common routine, then it is enough to keep one copy of them in $s{\vee}t$ .

Throughout, if nothing else is said, we switch to the following general setting for $\overline{S},\nabla$ and $\ell$ . Only if we discuss something involving $S$ itself, we implicitly assume without saying that we use the above specific definitions of $\overline{S},\nabla$ and $\ell$ .

Definition 2.15 (More abstract definition).

Let $(\overline{S},\nabla,\emptyset)$ be an abelian, idempotent monoid.

We call $\ell:\overline{S}\rightarrow{\mathbb{R}}$ a length function if the assertions of lemma 2.11 hold for it.

$\ell(x)$ is also called the information content of $x\in\overline{S}$ .

Positivity of $\ell$ is also automatic by idempotence and subadditivity, as $\ell(x)=\ell(x\nabla x)\leq 2\ell(x)$ . Instead of the above specific metric candidates $d$ and $\sigma$ from above, we shall sometimes use the following generalization:

Definition 2.16 (Metric candidate).

We call $d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ a positive, symmetric, nilpotent function if

d(x,y)\geq 0,\quad d(x,y)=d(y,x),\quad d(x,x)=0

for all $x,y\in\overline{S}$ .

$d$ is called a pseudometric on $\overline{S}$ if it is positive, symmetric, nilpotent and satisfies the $\Delta$ -inequality (see definition 3.1).

3. $\Delta$ -inequality

The following section is a first discussion when the proposed distance functions $d$ and $\sigma$ of defintion 2.12 could indeed satisfy the well known triangle inequality (here also called $\Delta$ -inequality). This is actually a main theme of this paper and first satisfying answers are given in theorems 6.10 and 7.21.

Let $(\overline{S},\nabla,\emptyset)$ be an abelian, idempotent monoid.

Definition 3.1.

A function $d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ is said to satisfy the $\Delta$ -inequality if

d(x,z)\leq d(x,y)+d(y,z)

for all $x,y,z\in\overline{S}$

Lemma 3.2.

If a symmetric function $d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ satisfies the $\Delta$ -inequality then also the second $\Delta$ -inequality (i.e. second triangle inequality)

|d(x,z)-d(z,y)|\leq d(x,y)

To heuristically see that $d$ of definition 2.12 in the context of formal languages could satisfy the $\Delta$ -inequality we do the following considerations. Assume a sentence $s\in S$ is written in its shortest form as $s\equiv s_{1}{\vee}\ldots{\vee}s_{n}$ for $s_{i}\in S$ , that is, such that $\ell([s])\approx L(s_{1}{\vee}\ldots{\vee}s_{n})=L(s_{1})+\ldots L_{(}s_{n})$ . Similarly we do so for a given sentence $t\equiv t_{1}{\vee}\ldots t_{m}$ . Many $s_{i}$ could be auxiliary function definitions, some of which may be identical to some $t_{j}$ . As a first approximation we may estimate

\ell([s]\nabla[t])=\ell([s{\vee}\phi(t)])=\ell([s_{1}{\vee}\ldots s_{n}{\vee}\phi(t_{1}){\vee}\ldots\phi(t_{m})])

=\ell([s_{1}{\vee}\ldots s_{n}{\vee}\psi(t_{1}){\vee}\ldots\psi(t_{k})])\leq L(s_{1})+\ldots+L(s_{n})+L(t_{1})+\ldots+L(t_{k})

where in the last line we have dropped identical auxiliary function definitions $\phi(t_{k+1}),\ldots,\phi(t_{m})$ which appear already in the $s_{i}$ s, and so instead of calling $\phi(t_{j})$ we may call $s_{i}$ and that is why we had to introduce another transformation $\psi$ and dropped the superfluous $\phi(t_{j})$ by (6). In a certain sense, we formed the set theoretical union of $\{s_{1},\ldots,s_{n}\}$ and $\{t_{1},\ldots,t_{m}\}$ .

If we sloppily replace the above $\leq$ by $\approx$ and view $L$ as a measure, then the $\Delta$ -inequality holds:

Lemma 3.3.

Let $(X,{\mathcal{A}},\mu)$ be a measured space. Set ${\mathcal{A}}_{0}:=\{A\in{\mathcal{A}}|\,\mu(A)<\infty\}$ to be the collection of finite, measurable sets. Then

\displaystyle d(A,B)

\displaystyle:=

\displaystyle\mu(A\cup B)-\mu(A)\wedge\mu(B)

\sigma(A,B):=2\mu(A\cup B)-\mu(A)-\mu(B)

where $A,B\in{\mathcal{A}}_{0}$ , define metrics on ${\mathcal{A}}_{0}$ .

Proof.

Generally we have

(10)

\mu(X\cup Z)-\mu(X)=\mu(Z\backslash X)=\mu(Z)-\mu(Z\cap X)

We only prove the $d$ -case. (i) Assume $\mu(Y)\leq\mu(X)\leq\mu(Z)$ . Then we have equivalences

\mu(X\cup Z)-\mu(X)\leq\mu(X\cup Y)+\mu(Y\cup Z)-2\mu(Y)

\mu(Z)-\mu(Z\cap X)\leq\mu(X)-\mu(X\cap Y)+\mu(Z)-\mu(Z\cap Y)

(11)

\mu(X\cap Y)+\mu(Z\cap Y)\leq\mu(Z\cap X)+\mu(X)

But this is true because in general we have

\mu(X\cap Y)+\mu(Z\cap Y)\leq\mu(Y)+\mu(X\cap Y\cap Z)

(ii) Assume $\mu(X)\leq\mu(Y)\leq\mu(Z)$ . Doing the same as above leads to (11) but with $\mu(X)$ replaced with $\mu(Y)$ and this is then also true by the above argument.

(iii) Assuming $\mu(X)\leq\mu(Z)\leq\mu(Y)$ leads to (11) but with $\mu(X)$ replaced with $2\mu(Y)-\mu(X)$ and this is then also true. ∎

Actually $d$ is like the maximum’s norm or $\ell^{\infty}$ -norm, and $\sigma$ like the $\ell^{1}$ -norm in ${\mathbb{R}}^{n}$ or space of real-valued sequences:

Lemma 3.4.

Continuing the setting of the last lemma, we have $d\leq\sigma$ and

d(A,B)=\max(\mu(A\backslash B),\mu(B\backslash A))

\sigma(A,B)=\mu(A\backslash B)+\mu(B\backslash A)

This is obvious by (10). Actually, $\sigma$ as just formulated is well-known to be a metric, $d$ might be less used. We continue by imposing definitions 2.15 and 2.12.

Switch back to the notions of definitions 2.15 and 2.16 for $\overline{S},\ell,d$ and $\sigma$ .

Lemma 3.5.

For all $x,y\in\overline{S}$ we have

\ell(x){\vee}\ell(y)\leq\ell(x\nabla y)

|\ell(x)-\ell(y)|\leq d(x,y)\leq\ell(x){\vee}\ell(y)

|\ell(x)-\ell(y)|\leq\sigma(x,y)\leq\ell(x)+\ell(y)

(12)

d(x,y)\leq\sigma(x,y)\leq 2d(x,y)

d(x,\emptyset)=\sigma(x,\emptyset)=\ell(x)

Proof.

The first relation follows from (9). If $\ell(x)\leq\ell(y)$ then $|\ell(x)-\ell(y)|=\ell(y)-\ell(x)\leq\ell(x\nabla y)-\ell(x)=d(x,y)\leq\ell(x)+\ell(y)-\ell(x)=\ell(y)$ by (9), showing the second line. The third line follows from the similarly proved (12) and (9). ∎

This criterion for the triangle inequality will be extensively used in section 6:

Lemma 3.6.

$\sigma$ satisfies the $\Delta$ -inequality if and only if

\ell(x\nabla z)+\ell(y)\leq\ell(x\nabla y)+\ell(y\nabla z)

for all $x,y,z\in\overline{S}$ .

Moreover, this equivalence is true for any given function $\ell:\overline{S}\rightarrow{\mathbb{R}}$ .

Proof.

Divide the $\Delta$ -inequality spelled-out by $2$ to see the equivalence:

2\ell(x\nabla z)-\ell(x)-\ell(z)\leq 2\ell(x\nabla y)-\ell(x)-\ell(y)+2\ell(y\nabla z)-\ell(y)-\ell(z)

∎

Interestingly, if $\sigma$ satisfies the triangle inequality then $\ell$ must be a length function:

Lemma 3.7.

Given any function $\ell:\overline{S}\rightarrow{\mathbb{R}}$ , if $\sigma$ satisfies the $\Delta$ -inequality, then $\ell$ is monotonically increasing.

Moreover it is positive and subadditive if $\ell(\emptyset)\geq 0$ , and even a length function (def. 2.15) if $\ell(\emptyset)=0$ .

Proof.

Set $z=\emptyset$ in lemma 3.6 to get monotonically increasingness, $y=\emptyset$ to get subaddtivity, and positivity by $0\leq\ell(\emptyset)\leq\ell(x)$ . ∎

4. $\nabla$ -inequality

An important ingredient of our theory is the $\nabla$ -inequality defined and first discussed in this section. It is fundamental in the proof of lemma 7.14, and thus for the whole section 7, theorem 7.14 and corollary 7.24.

Let $(\overline{S},\nabla,\emptyset)$ be an abelian, idempotent monoid.

Definition 4.1.

A function $d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ is said to satisfy the $\nabla$ -inequality if

d(x\nabla y,a\nabla b)\leq d(x,a)+d(y,b)

for all $x,y,a,b\in\overline{S}$

This may be heuristically compared with normed vector spaces by setting $d(x,y):=\|x-y\|$ , $x\nabla y:=x+y$ and having

\|x+y-a-b\|\leq\|x-a\|+\|y-b\|

Going back to the remark before lemma 3.3 and the setting of this lemma, we heuristically argue that the $\nabla$ -equality may hold:

Lemma 4.2.

Let $d$ be as in lemma 3.3. Then

d(A\cup B,C\cup D)\leq d(A,C)+d(B,D)

Proof.

Write $|A|:=\mu(A)$ . We have to show that

(13)

|A\cup B\cup C\cup D|-|A\cup B|{\wedge}|C\cup D|\leq|A\cup C|-|A|{\wedge}|C|+|B\cup D|-|B|{\wedge}|D|

By symmetry, just exchange $A\leftrightarrow C$ and $B\leftrightarrow D$ , it is enough to consider the case $|A\cup B|\leq|C\cup D|$ .

Recalling (10), the left hand side of (13) is

|(C\cup D)\backslash(A\cup B)|=|(C\cup D)\cap A^{c}\cap B^{c}|

\leq|C\cap A^{c}|+|D\cap B^{c}|=|C\backslash A|+|D\backslash B|

Case $|A|\leq|C|,|B|\leq|D|$ : In this case, the last inequality verifies (13) by (10).

Case $|A|\leq|C|,|B|\geq|D|$ : In this case, $|D\backslash B|\leq|B\backslash D|$ . Hence the last inequality verifies (13). The third case is analogous. ∎

Lemma 4.3.

If a nilpotent function $d$ satisfies the $\nabla$ -inequality then ( $\forall x,y,z\in\overline{S}$ )

(14)

d(x\nabla z,y\nabla z)\leq d(x,y)

We say that a function $d$ satisfies the weak $\nabla$ -inequality if it satisfies the inequality (14), and only the very weak one, if this holds only for $x\leq y$ (see definition 5.1).

We finally remark that given a (possibly infinite) sequence of pseudometrics in our setting, we may sum them up to get possibly a more faithful pseudometric:

Lemma 4.4.

The set of length functions on $\overline{S}$ is a positive cone, i.e. if $\ell_{1},\ell_{2}$ are length functions, then also $\lambda_{1}\ell_{1}+\lambda_{2}\ell_{2}$ for constants $\lambda_{1},\lambda_{2}\geq 0$ in ${\mathbb{R}}$ . Similarly, the functions $d$ satisfying the $\nabla$ -inequality (or the $\Delta$ -inequality, or those of the form $\sigma=\sigma_{\ell}$ ) are a positive cone, and one has $\sigma_{\lambda_{1}\ell_{1}+\lambda_{2}\ell_{2}}=\lambda_{1}\sigma_{\ell_{1}}+\lambda_{2}\sigma_{\ell_{2}}$ .

5. Order relation

In this section we consider a natural order relation on an abelian idempotent monoid induced by the monoid operation. This is not only important in various occasions in the theory like in the monotone increasingness of $\ell$ and in various aspects in the next section, but also a fundamental relation for information itself. It says when a theory is information-theoretically contained in another theory, and because we work with equivalence relations on theories, this works also for formally very different looking representatives of theories. This effect will even become more intense when we shall divide out equivalences induced by a pseudometric in definition 5.7.

In this section, if nothing else is said, we continue with the setting and notions of definitions 2.15 and 2.12.

Definition 5.1.

Define an order relation $\leq$ on $\overline{S}$ by $(\forall x,y\in\overline{S})$

x\leq y\quad\Leftrightarrow\quad x\nabla y=y

The following three lemmas are straightforward to check, and the reader should recall the monotone increasingness of $\ell$ , see (9), which is often used, as it is in lemmas 5.5 and 5.6.

Lemma 5.2.

(i) This is an order relation on $\overline{S}$ .

(ii) For all $a,b,x,y\in\overline{S}$ we have

a\leq x,b\leq y\quad\Rightarrow\quad a\nabla b\leq x\nabla y

(iii) $\overline{S}$ is a directed set as

\emptyset\leq a,b\leq a\nabla b

The length function $\ell$ preserves the order and $d$ and $\sigma$ have particular simple formulas for ordered pairs:

Lemma 5.3.

For all $a,b,c\in\overline{S}$ we have

a\leq b\quad\Rightarrow\quad\ell(a)\leq\ell(b)

a\leq b\quad\Rightarrow\quad d(a,b)=\sigma(a,b)=\ell(b)-\ell(a)

a\leq b\leq c\quad\Rightarrow\quad d(a,c)=d(a,b)+d(b,c),\quad\sigma(a,c)=\sigma(a,b)+\sigma(b,c)

Here are some further relations ( $\delta_{y}$ is defined below) for $d$ and $\sigma$ under ordered pairs:

Lemma 5.4.

For all $a,b,x,y\in\overline{S}$ we have

(15)

\sigma(x,x\nabla y)=d(x,x\nabla y)=-\delta_{y}(x)

-\sigma(x,x\nabla y)+\sigma(y,x\nabla y)=\ell(x)-\ell(y)

\sigma(a,b)=d(a,a\nabla b)+d(b,a\nabla b)=\sigma(a,a\nabla b)+\sigma(b,a\nabla b)

d(a,b)=d(a,a\nabla b){\vee}d(b,a\nabla b)=\sigma(a,a\nabla b){\vee}\sigma(b,a\nabla b)

In other words, if we know the values of $d$ or $\sigma$ for all $a\leq b$ , then we know them for all $a,b$ , and for both $d$ and $\sigma$ .

The following lemma reverses lemma 5.3. It also states in words that a theory is contained in another theory (i.e. $x\leq y$ ) if and only if the information content of the united theory is the information content of the bigger theory (i.e. $\ell(x\nabla y)=\ell(y)$ ).

Lemma 5.5.

For all $x,y\in\overline{S}$ , the following assertions are equivalent:

d(x,y)=\ell(y)-\ell(x)\quad\Leftrightarrow\quad d(y,x\nabla y)=0\quad\Leftrightarrow\quad\ell(x\nabla y)=\ell(y)

\Leftrightarrow\quad\sigma(x,y)=\ell(y)-\ell(x)\quad\Leftrightarrow\quad\sigma(y,x\nabla y)=0

If $d$ or $\sigma$ is faithful (i.e. for all $x,y\in\overline{S}$ one has $d(x,y)=0\Rightarrow x=y$ ) then these relations are also equivalent to

x\leq y

Proof.

(i) if $d(x,y)=\ell(y)-\ell(x)\geq 0$ then $\ell(y)\geq\ell(x)$ and so $d(x,y)=\ell(x\nabla y)-\ell(x)$ and thus $\ell(x\nabla y)=\ell(y)\geq\ell(x)$ and so

d(y,x\nabla y)=\ell(x\nabla y)-\ell(y)=0

These implications can be reversed.

The same equivalence with $d$ replaced by $\sigma$ is analogously proved.

(ii) If $d$ is faithful this is equivalent to saying that $y=x\nabla y$ ∎

Lemma 5.6.

For all $x,y\in\overline{S}$ we have the equivalences

d(x,y)=0\quad\Leftrightarrow\quad\sigma(x,y)=0\quad\Leftrightarrow\quad\ell(x\nabla y)=\ell(x)=\ell(y)

In particular, $d$ is faithful iff $\sigma$ is faithful.

Proof.

If $\sigma(x,y)=2\ell(x\nabla y)-\ell(x)-\ell(y)=0\leq 0$ then, because $\ell(x)+\ell(y)\leq 2\ell(x\nabla y)$ , we have $2\ell(x)\leq 2\ell(x\nabla y)=\ell(x)+\ell(y)$ and so $\ell(x\nabla y)=\ell(x)=\ell(y)$ . The first equivalence is by (12). ∎

Typically, later in this paper we will construct pseudometrics satisfying the $\nabla$ -inequality and in the next definition and proposition we state how the given structures on the abelian idempotent monoid carry over naturally to the quotient when turning a pseudometric to a metric.

Definition 5.7 (From pseudometric to metric).

Assume that a pseudometric $d$ on $\overline{S}$ satisfies the $\nabla$ -inequality.

Define an equivalence relation $\cong$ on $\overline{S}$ by

x\cong y\quad\Leftrightarrow\quad d(x,y)=0\qquad\forall x,y\in\overline{S}

Set ${\check{S}}:=\overline{S}/\cong$ to be the set-theoretical quotient. Define operations, relations and functions on ${\check{S}}$ by (for all $x,y\in\overline{S}$ )

[x]\nabla[y]:=[x\nabla y]

[x]\leq[y]:\Leftrightarrow[y]=[x]\nabla[y]

{\check{d}}([x],[y]):=d(x,y)

{\check{\ell}}([x]):=\check{d}([x],[\emptyset])

Then the so defined language ${\check{S}}$ with equivalence relation satisfies the concepts of section 2 again:

Proposition 5.8.

If a pseudo-metric $d$ on $\overline{S}$ satisfies the $\nabla$ -inequality then:

(i) The $\nabla$ operation passes to $\check{S}$ , that is, the above $(\check{S},\nabla,[\emptyset])$ is an abelian, idempotent monoid with order relation as in definition 5.1, and is provided with a metric $\check{d}$ satisfying the $\nabla$ -inequality.

(ii) In case $\overline{S}=S/\equiv$ as of definition 2.7, then we may canonically identify $\check{S}=S/\equiv_{2}$ for an equivalence relation $\equiv_{2}$ on $S$ satisfying the same rules of definition 2.6. (So $\check{S}$ is again of the form $\overline{S}$ .)

(iii) In case $d=d_{\ell}$ (or $d=\sigma_{\ell}$ ), the above defined $\check{\ell}$ is a length function on $\check{S}$ and

{\check{d_{\ell}}}=d_{{\check{\ell}}}\qquad(\mbox{or }\quad{\check{\sigma_{\ell}}}=\sigma_{{\check{\ell}}})

Proof.

(i) The $\nabla$ operation is well-defined on $\tilde{S}$ as if $x\cong x^{\prime}$ and $y\cong y^{\prime}$ then

d(x\nabla y,x^{\prime}\nabla y^{\prime})\leq d(x,x^{\prime})+d(y,y^{\prime})=0

(ii) Let $\alpha:S\rightarrow\overline{S}/\cong$ the canonical map and $\equiv_{2}$ the associated equivalence relation $x\equiv_{2}y$ if and only if $\alpha(x)=\alpha(y)$ for all $x,y\in S$ . As all rules of definition hold for $\equiv$ , they hold also for $\equiv_{2}$ , excepting (3) needs explanation. But by lemma 4.3 we have

s\equiv_{2}t\Leftrightarrow d([s],[t])=0\Rightarrow 0=d([s]\nabla[x],[t]\nabla[x])=d([s{\vee}x],[t{\vee}x])\Rightarrow s{\vee}x\equiv_{2}t{\vee}x

for $s,t\bot x\in S$

(iii) By definition

\check{\ell}([x]\nabla[y])-\check{\ell}([x])=d_{\ell}(x\nabla y,\emptyset)-d_{\ell}(x,\emptyset)=\ell(x\nabla y)-\ell(y)=d(x\nabla y,x)\geq 0

so $\check{\ell}$ is monotonically increasing, and its subadditivity follows from the $\nabla$ -inequality of $\check{d}$ . ∎

6. Measure of Boolean expressions

We have seen in the considerations before lemma 3.3, that somehow, $\ell(x\nabla y)$ is the measure of the union of $x$ and $y$ . In this section we want to extend the notion of a measure also to other Boolean expressions, like the intersection $x\cap y$ of sentences $x$ and $y$ .

There is not necessarily a sentence that represents the intersection or the set-theoretical difference $x\backslash y$ of two sentences $x,y$ , see example 10.11. But we can still compute the “measure” of these non-existing, imagined elements.

Let us write formally $\zeta(E)$ for the measure of a Boolean expression $E$ , for example $\zeta(x\cap y)$ . We will give precise definitions in this section.

In this section, we assume the setting and notations of definitions 2.15 and 2.12, but suppose that $\ell$ is possibly non-increasing, that is, we cease requiring (9).

Definition 6.1 (Measure of intersection).

For all $x,y\in\overline{S}$ we set

\zeta(x\cap y):=\ell(x)+\ell(y)-\ell(x\nabla y)

An illustrative example for $\zeta(x\cap y)$ would be the amount two computer programs $x$ and $y$ have in common, for example by identical or mathematically similar routines. The following function slightly extracts some core of the last definition:

Definition 6.2 ( $\delta$ -function).

We also set, for all $x,y\in\overline{S}$ ,

\delta_{y}(x):=\ell(x)-\ell(x\nabla y)

We say the measure of intersection is increasing if

a\leq x,b\leq y\quad\Rightarrow\quad\zeta(a\cap b)\leq\zeta(x\cap y)

for all $a,b,x,y\in\overline{S}$ , and $\delta$ is increasing if

\forall a,x,y\in\overline{S}:\qquad a\leq x\quad\Rightarrow\quad\delta_{y}(a)\leq\delta_{y}(x)

The following couple of lemmas will cummulate to theorem 6.10 below.

Lemma 6.3.

We have the following equivalences:

The measure of intersection is increasing.

$\Leftrightarrow$ $\zeta(x\cap y)$ is increasing in $x\in\overline{S}$ for all fixed $y\in\overline{S}$ .

$\Leftrightarrow$ $\delta$ is increasing.

Proof.

If $\delta$ is increasing, then $\zeta(x\cap y)=\delta_{y}(x)+\ell(y)$ is increasing in $x$ for fixed $y$ , and so also in $y$ for fixed $x$ by symmetry of $\zeta(x\cap y)$ . Combining both we get that the measure of intersection is increasing. ∎

Lemma 6.4.

If $\delta$ is increasing and $\ell$ monotone, then $\sigma$ satisfies the $\Delta$ -inequality.

Proof.

$\delta$ is increasing means that for all $x,y,z\in\overline{S}$ we have

(16)

\ell(y)-\ell(y\nabla x)\leq\ell(y\nabla z)-\ell(y\nabla z\nabla x)

Monotony of $\ell$ implies

\ell(z\nabla x)\leq\ell(y\nabla z\nabla x)

Thus we get the criterion of lemma 3.6, that is,

(17)

\ell(z\nabla x)+\ell(y)\leq\ell(y\nabla z)+\ell(y\nabla x)

∎

Lemma 6.5.

If $\sigma$ satisfies the $\Delta$ -inequality then $\delta$ is increasing.

Proof.

Replace $z$ in (17), which holds by lemma 3.6, by $z\nabla y$ to obtain (16). ∎

Lemma 6.6.

If $\delta$ is increasing then $\sigma$ satisfies the $\nabla$ -inequality.

Proof.

By lemma 6.3, $\rho(x,y):=\zeta(x\cap y)$ is increasing in $x,y$ simultaneously. Then $\rho$ is increasing implies $-\rho$ is decreasing, and applying this two times yields

-2\rho(x\nabla y,a\nabla b)=2\ell(x\nabla y\nabla a\nabla b)-2\ell(x\nabla y)-2\ell(a\nabla b)

\leq-\rho(x,a)-\rho(y,b)=\ell(x\nabla a)-\ell(x)-\ell(a)+\ell(y\nabla b)-\ell(y)-\ell(b)

A simple rearrangement of the summands in this inequality yields exactly

\sigma(x\nabla a,y\nabla b)\leq\sigma(x,y)+\sigma(a,b)

∎

Lemma 6.7.

If $d$ or $\sigma$ satisfies the very weak $\nabla$ -inequality and $\ell$ is monotone, then $\delta$ is increasing.

Proof.

The very weak $\nabla$ -inequality of $d$ implies

-d(y,y\nabla z)\leq-d(y\nabla x,y\nabla z\nabla x)

which by the monotonicity of $\ell$ means

\ell(y)-\ell(y\nabla z)\leq\ell(y\nabla x)-\ell(y\nabla z\nabla x)

But this means $\delta$ is increasing. The same is true for $\sigma$ by (15). ∎

Lemma 6.8.

If $\ell$ is monotone, then $d$ satisfies the very weak $\nabla$ -inequality if and only if $\sigma$ satisfies it.

Proof.

Because of (15). ∎

Lemma 6.9.

If $\sigma$ satisfies the $\Delta$ -inequality then $d$ satisfies the $\Delta$ -inequality.

Proof.

Let $x,y,z\in\overline{S}$ . In all cases like $\ell(x)\leq\ell(y)\leq\ell(z)$ etc. we see that

\ell(x){\wedge}\ell(y)+\ell(y){\wedge}\ell(z)-\ell(x){\wedge}\ell(z)\leq\ell(y)

Thus, by lemma 3.6 we get

\ell(x\nabla z)+\ell(x){\wedge}\ell(y)+\ell(y){\wedge}\ell(z)-\ell(x){\wedge}\ell(z)\leq\ell(x\nabla y)+\ell(y\nabla z)

But this is the $\Delta$ -inequality for $d$ . ∎

It is astonishing how strongly the validity of the $\Delta$ -inequality and the $\nabla$ -inequality of $d$ and $\sigma$ , and the monotone increasingness of the measure of intersection and the $\delta$ -function are connected:

Theorem 6.10.

Assume the setting and notations of definitions 2.15 and 2.12. In particular, suppose that $\ell$ is monotonically increasing. Then the following are equivalent:

(a) $d$ satisfies the very weak $\nabla$ -inequality

(b) $\sigma$ satisfies the very weak $\nabla$ -inequality

(d) the measure of intersection is increasing

(e) $\sigma$ satisfies the $\Delta$ -inequality

(f) $\sigma$ satisfies the $\nabla$ -inequality

Each of these assertions imply that $d$ satisfies the $\Delta$ -inequality.

Reversely, the $\nabla$ -inequality for $d$ implies the above assertions.

Proof.

(a) $\Leftrightarrow$ (b) is by lemma 6.8, (a) $\Leftrightarrow$ (c) by lemma 6.7, (c) $\Leftrightarrow$ (d) by lemma 6.3, (c) $\Leftrightarrow$ (e) by lemmas 6.4 and 6.5, (c) $\Rightarrow$ (f) by lemma 6.6, and (f) $\Rightarrow$ (b) is clear. The forelast assertion is by lemma 6.9. That the last assertion implies (a) is clear. ∎

Having said in definition 6.1 what the measure of an imagined intersection of two sentences is, we continue to define it for the last Boolean operations of complement and union.

Definition 6.11 (Measure of complement).

For all $x,y\in\overline{S}$ put

\zeta(x\backslash y):=\ell(x\nabla y)-\ell(y)

As $\zeta(x\backslash y)=-\delta_{y}(x)$ , $\delta$ is increasing if and only if the measure of complement $\zeta(x\backslash y)$ is decreasing in $y$ . This is reminiscent of lemma 3.4:

Lemma 6.12.

If $\ell$ is monotone then

\zeta(x\backslash y)=d(y,x\nabla y)=\sigma(y,x\nabla y)

d(x,y)=\zeta(x\backslash y){\vee}\zeta(y\backslash x)

\sigma(x,y)=\zeta(x\backslash y)+\zeta(y\backslash x)

Definition 6.13 (Measure of union).

For $x_{1},\ldots,x_{n}\in\overline{S}$ set

\zeta(x_{1}\cup\ldots\cup x_{n}):=\ell(x_{1}\nabla\ldots\nabla x_{n})

The final goal of this section is to extend the measure of imagined Boolean expressions of sentences to all possible Boolean expressions.

Definition 6.14.

Let $B$ be the free Boolean algebra (with zero element $0$ and one-element $1$ ) generated by the set $\overline{S}$ .

That is, $B$ consists of expressions like $x\cup y,x\cap y,x\backslash y=x\cap\overline{y},\emptyset,\overline{y}$ (complement of $y$ ) etc. (also notated as $x+y,xy,x\overline{y},0,\overline{y}$ ) for $x,y\in\overline{S}$ and $x,y\in B$ .

Two elements in $B$ are regarded as same if and only if they are the same by formal rules in Boolean algebra, for example, $x+x+y=x+y,x\overline{x}=0,x+\overline{x}=1$ .

For $n=0$ and $x_{i}\in\overline{S}$ define the expression $x_{1}\nabla\ldots\nabla x_{n}:=\emptyset$ .

The above definitions for $\zeta$ will now be generalized as follows:

Definition 6.15.

Define a function

\zeta:B\rightarrow{\mathbb{R}}

as follows. Put

\zeta(0)=\zeta(1)=0

For all $x_{1},\ldots,x_{n},y_{1},\ldots,y_{m}\in\overline{S}$ with $n\geq 0,m\geq 0$ (and $y:=y_{1}\nabla\ldots\nabla y_{m}$ ) set

\zeta(x_{1}\cap...\cap x_{n}):=\sum_{k\geq 0,\{i_{1},\ldots,i_{k}\}\subseteq\{1,...,n\}}\ell(x_{i_{1}}\nabla\ldots\nabla x_{i_{k}})(-1)^{k+1}

Here it is understood that mutually $i_{s}\neq i_{t}$ . Then define

(18)

\zeta(x_{1}\cap...\cap x_{n}\cap\overline{y}_{1}\cap...\cap\overline{y}_{m})=\zeta((x_{1}\cap...\cap x_{n})\backslash(y_{1}\cup...\cup y_{m}))

:=\zeta(x_{1}\cap...\cap x_{n})-\zeta(x_{1}\cap...\cap x_{n}\cap y)

(19)

=\sum_{k\geq 0,\{i_{1},\ldots,i_{k}\}\subseteq\{1,...,n\}}\ell(y\nabla x_{i_{1}}\nabla\ldots\nabla x_{i_{k}})(-1)^{k+1}

because all expressions without $y$ cancel. Finally set

\zeta(x\cup y):=\zeta(x)+\zeta(y)\qquad\mbox{if }x\cap y=0

Lemma 6.16.

$\zeta$ is well defined.

Proof.

Because we can choose common refinements, every expression in $B$ can be brought to the form $\sum_{n}z_{n}=\sum_{n}x_{1,n}\ldots x_{m_{n},n}\overline{y}_{1,n}\ldots\overline{y}_{k_{n},n}$ with $z_{n}z_{m}=0$ for $n\neq m$ , with $x_{i,j},y_{i,j}\in\overline{S}$ and $z_{n}$ obviously defined, and so the above $\zeta$ is already fully defined.

This expression in $B$ is uniquely defined if the set of involved variables is determined. Thus we only need to check that the definition of $\zeta$ is independent of the set of involved variables:

Observe here the occurrence of $x_{n+1}$ :

\zeta(x_{1}...x_{n}x_{n+1}\overline{y}_{1}...\overline{y}_{m})+\zeta(x_{1}...x_{n}\overline{x_{n+1}}\,\overline{y}_{1}...\overline{y}_{m})

=\sum_{k\geq 0,\{i_{1},\ldots,i_{k}\}\subseteq\{1,...,n+1\}}\ell(y\nabla x_{i_{1}}\nabla\ldots\nabla x_{i_{k}})(-1)^{k+1}

+\sum_{k\geq 0,\{i_{1},\ldots,i_{k}\}\subseteq\{1,...,n\}}\ell(y\nabla x_{n+1}\nabla x_{i_{1}}\nabla\ldots\nabla x_{i_{k}})(-1)^{k+1}

=\zeta(x_{1}...x_{n}\overline{y}_{1}...\overline{y}_{m})

because all occurrences with $x_{n+1}$ cancel each other. This result is consistent with $x_{n+1}+\overline{x_{n+1}}=1$ .

By induction we can add as many variables as we want by the formula $x+\overline{x}=1$ and expansion, without changing $\zeta$ . ∎

Lemma 6.17.

$\zeta$ is a “signed measure” on the Boolean algebra B generated by $\overline{S}$ in the sense that

\zeta(x\cap y)=\zeta(x)+\zeta(y)-\zeta(x\cup y)

for all $x,y\in B$ . In particular we have

\zeta(\overline{y})=-\zeta(y)

\zeta(x\backslash y)=\zeta(x\cup y)-\zeta(y)

Proof.

This is clear, as when we form common refinements for $x$ and $y$ in $B$ as indicated in the last proof, $\zeta(x\cap y)$ sums only over the products which appear both in $x,y$ , and $\zeta(x\cup y)$ over those which appear in $x$ or $y$ .

The last two identities follow from this and $\zeta(1)=0$ . ∎

We should however be reminded that we do not work with a real positive measure on measurable sets. We only observe:

Lemma 6.18.

For all $x,y,x_{i}\in\overline{S}$ we have:

(i) $\zeta(x\cap y)\geq 0$ , $\zeta(x_{1}\cup\ldots\cup x_{n})\geq 0$

(ii) If $\ell$ is monotonically increasing then $\zeta(x\backslash y)\geq 0$ .

7. New metric

Up to now we could not clarify if $d$ or $\sigma$ satisfies the triangle or $\nabla$ -inequality in our natural prototype languages. We have theorem 6.10, and it tells us the $\nabla$ -inequality of $d$ implies all the desired natural properties. In this section we show how we may modify a given length function $\ell$ such that we get the $\nabla$ -inequality for its associated distance function $d_{\ell}$ and thus all the desired properties. The approach is as follows. At first we modify the distance function $d_{\ell}$ for a given $\ell$ as in definitions 7.1 and 7.4 such that the so obtained $d$ satisfies the $\Delta$ - and $\nabla$ -inequality. Then we extract from it a new length function $\ell_{2}$ by setting $\ell_{2}(x):=d(x,\emptyset)$ (see definition 7.11). But this new $\ell_{2}$ may not be monotone increasing as required in (9). That is why we make it monotone increasing in definition 7.9 to obtain a new length $\overline{\ell_{2}}$ . But the validity of the desired $\Delta$ - and $\nabla$ -inequality of $d_{\overline{\ell_{2}}}$ and $\sigma_{\overline{\ell_{2}}}$ associated to $\overline{\ell_{2}}$ might fail, also cf. lemma 7.13. Hence we restart the above procedure with $\overline{\ell_{2}}$ again, and then again and again infinitely many often, and in the limit the obtained length function $\ell^{\infty}$ satisfies all desired properties.

A key lemma for this procedure is lemma 7.14, whose proof shows that we need both the $\Delta$ - and $\nabla$ -inequality simultaneously. Hence, it is also intrinsic to do the procedure with the $d$ -distance, because the lemma might not hold for the $\sigma$ -distance. Doing the procedure for the $\sigma$ -distance instead leads at least to the estimates of lemma 7.27 in the limit.

Actually, in this section we start with $d$ , and if nothing else is said, $d$ always denotes a symmetric, positive, nilpotent function $d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ for an abelian, idempotent monoid $(\overline{S},\nabla,\emptyset)$ .

Example 10.10 suggests, that $d$ of definition 2.12 does not satisfy the $\Delta$ -inequality in general. A way out of that fact is to repair $d$ as follows:

Definition 7.1 (Forced $\Delta$ -inequality).

Define a function $\tilde{d}:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ by

\tilde{d}(x,y)=\inf_{n\geq 0,a_{1},\ldots,a_{n}\in\overline{S}}d(x,a_{1})+d(a_{2},a_{3})+\ldots+d(a_{n-1},a_{n})+d(a_{n},y)

Lemma 7.2.

$\tilde{d}$ satisfies the $\Delta$ -inequality, and $d=\tilde{d}$ if and only if $d$ satisfies the $\Delta$ -inequality.

Proof.

This is obvious as the right hand side of the $\Delta$ -inequality is one possible value over which the infimum is taken in $\tilde{d}$ . ∎

The following lemma may be useful for the pseudometric $\rho$ on $\overline{S}$ defined by $\rho(x,y):=|\ell(x)-\ell(y)|$ to get a lower bound for $\tilde{d}$ , cf. lemma 3.5.

Lemma 7.3.

If a function $\rho:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ satisfies the $\Delta$ -inequality then we have the following lower bound estimate:

\rho\leq d\quad\Rightarrow\quad\rho\leq\tilde{d}

Next we repair $d$ to satisfy the $\nabla$ -inequality:

Definition 7.4 (Forced $\nabla$ -inquality).

Define a function $\hat{d}:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ by

\hat{d}(x,y):=\inf_{x=x_{1}\nabla\ldots\nabla x_{n},\,y=y_{1}\nabla\ldots\nabla y_{n},\;x_{i},y_{i}\in\overline{S}}d(x_{1},y_{1})+\ldots+d(x_{n},y_{n})

Lemma 7.5.

$\hat{d}$ satisfies the $\nabla$ -inequality, and $d=\hat{d}$ if and only if $d$ satisfies the $\nabla$ -inequality.

Proof.

This is obvious as the right hand side of the $\nabla$ -inequality is one possible value over which the infimum is taken in $\hat{d}$ . ∎

Fortunately, the transition from $d$ to $\tilde{d}$ does not destroy the validitiy of the $\nabla$ -inequality:

Lemma 7.6.

If $d$ satisfies the $\nabla$ -inequality, then $\tilde{d}$ satisfies the $\nabla$ -inequality.

Proof.

By application of the $\nabla$ -inequality for $d$ we get

\tilde{d}(x\nabla y,z\nabla w)=\inf_{n\geq 0,a_{1},\ldots,a_{n}\in\overline{S}}d(x\nabla y,a_{1})+d(a_{2},a_{3})+\ldots+d(a_{n},z\nabla w)

\leq\inf_{n\geq 0,a_{i},b_{i}\in\overline{S}}d(x\nabla y,a_{1}\nabla b_{1})+d(a_{2}\nabla b_{2},a_{3}\nabla b_{3})+\ldots+d(a_{n}\nabla b_{n},z\nabla w)

\leq\tilde{d}(x,z)+\tilde{d}(y,w)

where the first inequality is because the infimum goes over a smaller selection. ∎

Lemma 7.7.

${\tilde{\hat{d}}}$ satisfies the $\nabla$ - and $\Delta$ -inequality, and $d={\tilde{\hat{d}}}$ if and only if $d$ satisfies the $\nabla$ - and $\Delta$ - inequality.

Proof.

The first assertion follows by lemmas 7.2, 7.5 and 7.6.

If $d={\tilde{\hat{d}}}$ then $d\geq\hat{d}\geq{\tilde{\hat{d}}}=d$ , thus $d=\hat{d}\geq{\tilde{\hat{d}}}=d=\hat{d}$ , so that the last equivalence is by lemmas 7.6 and 7.2. ∎

Hence we get a projection from metric candidates to those ones which satisfy the $\Delta$ - and $\nabla$ -inequality:

Corollary 7.8 (Projection onto pseudometrics satisfying $\nabla$ -inequality).

There is a projection $d\mapsto{\tilde{\hat{d}}}$ from the set of positive, symmetric, nilpotent functions on $\overline{S}$ onto the set of pseudometrics on $\overline{S}$ satisfying the $\nabla$ -inequality.

The following bar operator turns possibly non-increasing length functions to increasing ones:

Definition 7.9 (Bar operator).

Given a positive function $\ell:\overline{S}\rightarrow{\mathbb{R}}$ set $\overline{\ell}:\overline{S}\rightarrow{\mathbb{R}}$ as

\overline{\ell}(x):=\inf_{z\in\overline{S}}\ell(x\nabla z)

Lemma 7.10.

If $\ell:\overline{S}\rightarrow{\mathbb{R}}$ is a possibly non-increasing length function then $\overline{\ell}$ is an increasing length-function, i.e. (for all $x,y\in\overline{S}$ )

\overline{\ell}(x)\leq\overline{\ell}(x\nabla y)

If $\ell$ is already monotonically increasing, then $\bar{\ell}=\ell$ .

From the various distance functions we naturally extract the length functions out:

Definition 7.11.

Define positive functions $\ell,\tilde{\ell},\hat{\ell},{\tilde{\hat{\ell}}}:\overline{S}\rightarrow{\mathbb{R}}$ (depending on $d$ ) by

(20)

\ell(x):=d(x,\emptyset),\;\tilde{\ell}(x):=\tilde{d}(x,\emptyset),\;\hat{\ell}(x):=\hat{d}(x,\emptyset),\;{\tilde{\hat{\ell}}}(x):={\tilde{\hat{d}}}(x,\emptyset)

Lemma 7.12.

If $d$ satisfies the $\nabla$ -inequality, then $\ell$ of (20) is a non-increasing length function, and $\overline{\ell}$ a length function.

The next lemma shows that switching from $d$ to ${\tilde{\hat{d}}}$ may really change their associated length functions $\ell$ and ${\tilde{\hat{\ell}}}$ , respectively:

Lemma 7.13.

If $\rho\leq d$ for $\rho$ as defined before lemma 7.3 then $\tilde{\ell}=\ell$ , and if $\ell$ is subadditive then $\hat{\ell}=\ell$ . But we have ${\tilde{\hat{\ell}}}\neq\ell$ in certain cases.

Proof.

By lemma 7.3 we have

|\ell(x)-\ell(\emptyset)|\leq\tilde{d}(x,\emptyset)\leq\ell(x)

If ${\tilde{\hat{\ell}}}=\ell$ then $d_{\ell}$ would satisfy the $\Delta$ -inequality by lemma 7.22. (By independence, the forward reference is allowed.) But we know from example 10.10 that this is not in general true. ∎

This is an important key lemma which shows how our above procedure decreases the metric candidates:

Lemma 7.14.

If $d$ satisfies the $\Delta$ - and $\nabla$ -inequality then for $\ell$ of (20),

d_{\ell}\leq d

d_{\overline{\ell}}\leq d

Proof.

(i) Without loss of generality we have by the second $\Delta$ -inequality and the $\nabla$ -inequality that

d_{\ell}(x,y)=d(x\nabla y,\emptyset)-d(x,\emptyset)

\leq d(x\nabla y,x)\leq d(x,y)+d(x,x)=d(x,y)

(ii) Similarly we get

d_{\overline{\ell}}(x,y)=\inf_{z}\ell(x\nabla y\nabla z)-\inf_{w}\ell(x\nabla w){\wedge}\inf_{w}\ell(y\nabla w)

Given any $\varepsilon>0$ we may choose a $w\in\overline{S}$ such that this is

\leq\inf_{z}\ell(x\nabla y\nabla z)-\ell(x\nabla w)+\varepsilon

\leq\ell(x\nabla y\nabla w)-\ell(x\nabla w)+\varepsilon

\leq d(x\nabla y\nabla w,x\nabla w)+\varepsilon

\leq d(y,x)+d(x\nabla w,x\nabla w)+\varepsilon=d(x,y)+\varepsilon

∎

Corollary 7.15.

Using definition (20), we have

d_{{{\tilde{\hat{\ell}}}}}\leq{\tilde{\hat{d}}}

d_{\overline{{\tilde{\hat{\ell}}}}}\leq{\tilde{\hat{d}}}

Proof.

Apply lemma 7.14 to ${\tilde{\hat{d}}}$ . ∎

Let us now restart the above procedure with a new $d_{{\bar{\tilde{\hat{\ell}}}}}$ instead of $d$ and then again and again. More precisely put

Definition 7.16.

Set a start point $d^{(1)}:=d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ , and then for all $n\geq 1$ define successively functions ${\tilde{\hat{\ell^{(n)}}}},\ell^{(n+1)}:\overline{S}\rightarrow{\mathbb{R}}$ and $d^{(n+1)}:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ by

{\tilde{\hat{\ell^{(n)}}}}(x):={\tilde{\hat{d^{(n)}}}}(x,\emptyset)

\ell^{(n+1)}:={\overline{\tilde{\hat{\ell^{(n)}}}}}

(21)

d^{(n+1)}:=d_{\ell^{(n+1)}}

We are going to show that the successively applied procedure defined in the last definition which takes a length function in and gives a length function out converges to a desired fixed point.

Lemma 7.17.

All $\ell^{(n+1)}$ are length functions on $\overline{S}$ , and we have descending sequences

0\leq...\leq\ell^{(3)}=\overline{\tilde{\hat{\ell^{(2)}}}}\leq{\tilde{\hat{\ell^{(2)}}}}\leq\ell^{(2)}=\overline{\tilde{\hat{\ell}}}\leq\tilde{\hat{\ell}}\leq\ell

0\leq...\leq d^{(3)}=d_{\ell^{(3)}}\leq\tilde{\hat{d^{(2)}}}\leq d^{(2)}=d_{\ell^{(2)}}\leq\tilde{\hat{d}}\leq d

Proof.

By lemmas 7.7 and and 7.12, $\ell^{(n+1)}$ is a length function. By definitions 7.1 and 7.4, we have ${\tilde{\hat{d}}}\leq d$ . By corollary 7.15 applied to $d:=d^{(n)}$ and ${\tilde{\hat{\ell}}}:={\tilde{\hat{\ell^{(n)}}}}$ we thus obtain $d^{(n+1)}=d_{\ell^{(n+1)}}=d_{{\bar{\tilde{\hat{\ell^{(n)}}}}}}\leq{\tilde{\hat{d^{(n)}}}}\leq{d^{(n)}}$ . Hence, as $\overline{\ell}\leq\ell$ by definition 7.9,

\ell^{(n+1)}=\overline{{\tilde{\hat{\ell^{(n)}}}}}\leq{{\tilde{\hat{\ell^{(n)}}}}}={\tilde{\hat{d^{(n)}}}}(-,\emptyset)\leq d^{(n)}(-,\emptyset)=d_{\ell^{(n)}}(-,\emptyset)=\ell^{(n)}

∎

We note that all ${\tilde{\hat{d^{(n)}}}}$ are pseudometrics which satisfy the $\nabla$ -inequality by lemma 7.7, but it is unclear whether they are of the form $d_{\ell^{\prime}}$ . (Otherwise we would be done.)

Since we have monotonically falling real-valued sequences that are bounded below by $0$ we may define

Definition 7.18.

Define the pointwise function limits

{{d}^{(\infty)}}:=\lim_{n\rightarrow\infty}d^{(n)}

(22)

{{\ell}^{(\infty)}}:=\lim_{n\rightarrow\infty}\ell^{(n)}

Lemma 7.19.

The function

d^{(\infty)}=d_{{{\ell}^{(\infty)}}}=\lim_{n\rightarrow\infty}d_{\ell^{(n)}}=\lim_{n\rightarrow\infty}{\tilde{\hat{d_{\ell^{(n)}}}}}={\tilde{\hat{d^{(\infty)}}}}={\tilde{\hat{d_{{{\ell}^{(\infty)}}}}}}

is a pseudometric on $\overline{S}$ satisfying the $\nabla$ -inequality.

Proof.

The second equality is clear by the formula for $d_{\ell}$ of def. 2.12, the first one is then by definition. The third equality is because of the converging sequence of lemma 7.17. Because ${\tilde{\hat{d_{\ell^{(n)}}}}}$ satisfies the $\nabla$ - and $\Delta$ -inequality by lemma 7.7, their point-wise limit does also, and so we get the last equalities and thus the final assertion. ∎

Lemma 7.20.

${{\ell}^{(\infty)}}$ is a length function on $\overline{S}$ and

\ell^{(\infty)}={d_{{{\ell}^{(\infty)}}}}(-,\emptyset)={\tilde{\hat{\ell^{(\infty)}}}}:={\tilde{\hat{d_{{{\ell}^{(\infty)}}}}}}(-,\emptyset)=\overline{\tilde{\hat{\ell^{(\infty)}}}}=\overline{\ell^{(\infty)}}

Proof.

As ${\tilde{\hat{d_{{{\ell}^{(\infty)}}}}}}=d_{{{\ell}^{(\infty)}}}$ by lemma 7.19, we get the second identity.

The last equality holds because, as $d^{(\infty)}=d_{{{\ell}^{(\infty)}}}$ is positive,

{d_{{{\ell}^{(\infty)}}}}(x\nabla y,x)={{\ell}^{(\infty)}}(x\nabla y)-\ell^{\infty}(x)\geq 0

that is, ${{\ell}^{(\infty)}}$ satisfies already the monotone increasingness. ∎

We now summarize that the successively applied described procedure which takes a length function and forms a new one converges to a desired fixed point:

Theorem 7.21 (Ideal length function).

Given a positive symmetric idempotent function $d$ on an abelian, idempotent monoid $(\overline{S},\nabla,\emptyset)$ , define ${{\ell}^{(\infty)}}$ as described in definitions 7.16 and 7.18.

Then ${{\ell}^{(\infty)}}$ is a length function on $\overline{S}$ , and $d_{{{\ell}^{(\infty)}}}$ and $\sigma_{{{\ell}^{(\infty)}}}$ (def. 2.12) are pseudometrics on $\overline{S}$ satisfying the $\nabla$ -inequality.

Proof.

By lemmas 7.19 and 7.20, and theorem 6.10. ∎

If $d$ starting at the beginning of this section happens to be of the form $d=d_{\ell}$ for a length function $\ell$ on $\overline{S}$ , then astonishingly $d$ satisfies already the $\Delta$ - and $\nabla$ -inequality if only $\ell={\tilde{\hat{\ell}}}$ :

Proposition 7.22 (Ideal length function attained).

If $d$ is of the form $d=d_{\ell}$ for some length function $\ell$ then $d_{\ell}$ satisfies the $\nabla$ - and $\Delta$ -inequality $\Leftrightarrow$

d_{\ell}={\tilde{\hat{d_{\ell}}}}\quad\Leftrightarrow\quad{\tilde{\hat{\ell}}}=\ell\quad\Leftrightarrow\quad{\bar{\tilde{\hat{\ell}}}}=\ell\quad\Leftrightarrow\quad\ell={{\ell}^{(\infty)}}

Proof.

If $d$ satisfies the $\Delta$ -inequality then certainly $d\leq\tilde{d}\leq d$ and similarly the $\nabla$ -inequality implies $d=\hat{d}$ . So ${\tilde{\hat{d}}}=d$ by lemma 7.6 and thus ${\tilde{\hat{\ell}}}=\ell$ .

If ${\tilde{\hat{\ell}}}=\ell$ then setting $d:=d_{\ell}$ we have with corollary 7.15 that

d_{\ell}=d_{{\tilde{\hat{\ell}}}}\leq{\tilde{\hat{d}}}={\tilde{\hat{d_{\ell}}}}\leq d_{\ell}

and so $d_{\ell}={\tilde{\hat{d_{\ell}}}}$ satisfies the $\nabla$ and $\Delta$ -inequality.

If ${\bar{\tilde{\hat{\ell}}}}=\ell$ then

{\bar{\tilde{\hat{\ell}}}}=\ell\geq{\tilde{\hat{\ell}}}\geq{\bar{\tilde{\hat{\ell}}}}=\ell

and so $\ell={\tilde{\hat{\ell}}}$ . Also then, $\ell^{(2)}={\bar{\tilde{\hat{\ell}}}}=\ell$ , and thus $d^{(2)}=d_{\ell}$ and the above procedure restarts again with the same start $d_{\ell}$ , and so again $\ell^{(3)}=\ell^{(2)}$ etc and we get a constant sequence and finally ${{\ell}^{(\infty)}}=\ell$ .

If ${{\ell}^{(\infty)}}=\ell$ then by corollary 7.20 ${\bar{\tilde{\hat{\ell}}}}=\ell$ . ∎

We just remark that one will naturally turn the desired pseudometrics to metrics effortlessly:

Corollary 7.23.

By definition 5.7, we may then divide out elements in $\overline{S}$ to obtain a metric space ${\check{S}}$ , which is again an abelian idempotent monoid, with metrics ${\check{d_{{{\ell}^{(\infty)}}}}}=d_{{\check{{{\ell}^{(\infty)}}}}}$ and ${\check{\sigma_{{{\ell}^{(\infty)}}}}}=\sigma_{{\check{{{\ell}^{(\infty)}}}}}$ satisfying the $\nabla$ -inequality.

Proof.

By theorem 7.21 we have pseudometrics ${{d}^{(\infty)}}$ and ${{\sigma}^{(\infty)}}$ , and to these we apply definition 5.7, proposition 5.8 and lemma 5.6. ∎

The above procedure gives a desired projection onto the set of length functions from arbitrary length functions to desired ones:

Corollary 7.24 (Projection onto ideal length functions).

There is a projection $\ell\mapsto{{\ell}^{(\infty)}}$ from the set of length functions $\ell$ on $\overline{S}$ onto the set of those ones such that $d_{\ell}$ and $\sigma_{\ell}$ are pseudometrics which satisfy the $\nabla$ -inequality.

Proof.

Start the above procedure, definition 7.16, with $d:=d_{\ell}$ (or alternatively with $d:=\sigma_{\ell}$ , which however may result in another fixed point ${{\ell}^{(\infty)}}$ ) and end up with theorem 7.21.

Projection is onto because of lemma 7.7 and theorem 6.10. ∎

If the length function $L$ on $S$ is a simple character count as in our natural protoexamples then the fixed point for the length functions will be attained after finitely many steps:

Lemma 7.25.

If the $\ell$ -function is in a grid, i.e. $\ell(\overline{S})\subseteq\gamma{\mathbb{Z}}$ for some constant $\gamma\in{\mathbb{R}}$ , and $d=d_{\ell}$ , then the limits of definition 7.18 will all be attained at some $n$ ( $n$ depending on the points of $\overline{S}$ however).

We will now make a bigger cut and switch from $d$ to $\sigma$ . Let us now do the above procedure by starting with a length function $\ell^{(1)}$ on $\overline{S}$ , but instead always forming $d_{\ell^{(n)}}$ we now choose $\sigma_{\ell^{(n)}}$ . In other words, set

Definition 7.26.

Do everything as in definition 7.16 but modify (21) to

d^{(n+1)}:=\sigma_{\ell^{(n+1)}}

Lemma 7.27.

Assume definition 7.26, but to avoid confusion with what we have defined before, rewrite $\sigma^{(n)}:=d^{(n)}$ .

Then $(\ell^{(n)})_{n}$ is a decreasing sequence of length functions on $\overline{S}$ , with limits and estimates

{{\ell}^{(\infty)}}:=\lim_{n\rightarrow\infty}\ell^{(n)},\qquad{{\sigma}^{(\infty)}}:=\lim_{n\rightarrow\infty}\sigma^{(n)}=\sigma_{{{\ell}^{(\infty)}}}

(23)

1/2{\sigma^{(n+1)}}\leq{\tilde{\hat{\sigma^{(n)}}}}\leq{\sigma^{(n)}}

(24)

1/2{{\sigma}^{(\infty)}}\leq\liminf_{n}{\tilde{\hat{\sigma^{(n)}}}}\leq\limsup_{n}{\tilde{\hat{\sigma^{(n)}}}}\leq{\tilde{\hat{{{\sigma}^{(\infty)}}}}}\leq{{\sigma}^{(\infty)}}

Proof.

As $\ell^{(n)}$ converges pointwise to ${{\ell}^{(\infty)}}$ , $\sigma^{(n+1)}=\sigma_{\ell^{(n+1)}}$ converges also pointwise to ${{\sigma}^{(\infty)}}$ . (23) is by lemmas 3.5 and corollary 7.15 applied to $d:=\sigma^{(n)}$ , as

1/2\sigma_{\ell^{(n+1)}}=1/2\sigma_{{\bar{\tilde{\hat{\ell^{(n)}}}}}}\leq d_{{\bar{\tilde{\hat{\ell^{(n)}}}}}}\leq{\tilde{\hat{\sigma^{(n)}}}}

By definitions 7.1 and 7.4, and because $\sigma^{(n)}$ converges pointwise to ${{\sigma}^{(\infty)}}$ , for all $\varepsilon>0$ there are $a_{1},a_{2},x_{0},a_{1,0},\ldots\in\overline{S}$ and $n_{0}\geq 1$ such that for all $n\geq n_{0}$ we have that

{\tilde{\hat{{{\sigma}^{(\infty)}}}}}(x,y)+3\varepsilon\geq\hat{{{\sigma}^{(\infty)}}}(x,a_{1})+\hat{{{\sigma}^{(\infty)}}}(a_{1},a_{2})+\ldots+\hat{{{\sigma}^{(\infty)}}}(a_{m},y)+2\varepsilon

\geq{{{\sigma}^{(\infty)}}}(x_{0},a_{1,0})+\ldots+{{{\sigma}^{(\infty)}}}(a_{m,k_{m}}^{\prime},y_{m})+\varepsilon

\geq{\sigma^{(n)}}(x_{0},a_{1,0})+\ldots+{\sigma^{(n)}}(a_{m,k_{m}}^{\prime},y_{m})

\geq{\tilde{\hat{\sigma^{(n)}}}}(x_{0},a_{1,0})+\ldots+{\tilde{\hat{\sigma^{(n)}}}}(a_{m,k_{m}}^{\prime},y_{m})\geq{\tilde{\hat{\sigma^{(n)}}}}(x,y)

where the last inequality is because ${\tilde{\hat{\sigma}}}^{(n)}$ satisfies the $\nabla$ - and $\Delta$ -inequality and so we can estimate back what we have expanded in the first two lines above.

This and (23) yield (24). ∎

8. Homomorphisms

In this section we shift our attention away from the abelian idempotent monoids to their homomorphism sets between them. Thereby we consider only homomorphisms which are “uniformly continuous” similarly as the bounded linear operator between Banach spaces are, see definition 8.8. We are provided by a length on each homomorphism set (analogy: norm on operators between Banach spaces) and apply the procedure of section 7 to obtain new length functions with better theoretical properties. Thereby we enrich this concept with the desired property of product inequalities, that is, we want achieve the validity of inequalities like $\ell(U\circ V)\leq\ell(U)\ell(V)$ and $d(A\circ U,A\circ V)\leq\ell(A)d(U,V)$ for homomorphisms $A,U$ and $V$ . Similarly as in the last section we end up with a fixed point theorem yielding desired length functions, and apply them to get a Banach-Mazur-like distance between languages.

At first we define homomorphism between languages. In this section let $(S,\equiv_{S})$ and $(Q,\equiv_{Q})$ be languages with equivalence relation as in definitions 2.1 and 2.6.

Definition 8.1 (Homomorphism of languages).

A function $T:(S,\equiv_{S})\rightarrow(Q,\equiv_{Q})$ is called a homomorphism between these languages if for all $x,y\in S$ we have

x\bot y\quad\Rightarrow\quad\Big{(}Tx\bot Ty\quad\mbox{and}\quad T(x{\vee}y)=Tx{\vee}Ty\Big{)}

x\equiv_{S}y\quad\Rightarrow\quad Tx\equiv_{Q}Ty

In this section we almost exclusively return to the abstract setting of abelian idempotent monoids. In what follows let $(\overline{S},\nabla,\emptyset),(\overline{Q},\nabla,\emptyset)$ and $(\overline{R},\nabla,\emptyset)$ denote any abelian idempotent monoids. Finally, a natural notion for homomorphisms between such monoids is certainly a semigroup or monoid homomorphism:

Definition 8.2.

A function $T:\overline{S}\rightarrow\overline{Q}$ is called a homomorphism if for all $x,y\in\overline{S}$

T(x\nabla y)=Tx\nabla Ty

A homomorphism between languages becomes a homomorphism between their quotients:

Lemma 8.3.

If $T:(S,\equiv_{S})\rightarrow(Q,\equiv_{Q})$ is a homomorphism then it descends to a homomorphism $\overline{T}:\overline{S}\rightarrow\overline{Q}$ defined by $\overline{T}([s])=[T(s)]$ .

The monoid operation $\nabla$ between abelian idempotent monoids naturally carries over to the set of homomorphisms between them:

Definition 8.4 (Operation on homomorphism set).

For homomorphisms $U,V:\overline{S}\rightarrow\overline{Q}$ define the homomorphism

U\nabla V:\overline{S}\rightarrow\overline{Q}:(U\nabla V)(x):=Ux\nabla Vx

That is why we get the usual order on the homomorphism set:

Definition 8.5.

We equip $\hom(\overline{S},\overline{Q})$ with the usual order $\leq$ as of definition 5.1, that is, for all homomorphisms $U,V:\overline{S}\rightarrow\overline{Q}$ set

U\leq V\quad\Leftrightarrow\quad V=U\nabla V\quad\Leftrightarrow\quad Ux\leq Vx\;\forall x\in\overline{S}

Lemma 8.6.

For any $a\in\overline{S}$ , $T_{a}:\overline{S}\rightarrow\overline{S}$ defined by $T_{a}(x)=x\nabla a$ is a (non-unital) homomorphism .

We write the composition $U\circ V$ of two homomorphisms $U,V$ also as a product $UV$ . The following lemma states the distributive law between the (“additive”) monoid operations and the (“multiplicative”) composition operations on the homomorphism sets.

Lemma 8.7 (Distribution laws).

For all homomorphisms $U,V:\overline{Q}\rightarrow\overline{R}$ and $X:\overline{S}\rightarrow\overline{S}$ and $Y:\overline{R}\rightarrow\overline{W}$ we have

(U\nabla V)X=UX\nabla VX

Y(U\nabla V)=YU\nabla YV

Write $\emptyset$ for the operator $\emptyset:\overline{S}\rightarrow\overline{T}$ $s\mapsto[\emptyset]$ . A continuous linear operator between Banach spaces is actually uniformly continuous, that is, is bounded. We consider analogously only uniformly continuous, or in other words, bounded homomorphisms between abelian idempotent monoids equipped with metrics in order to get a natural length function on each homomorphism set in analogy to the norm of linear operators between Banach spaces:

Definition 8.8 (Natural length function on homomorphism set).

Let $\overline{S}$ and $\overline{Q}$ be equipped with positive, symmetric, nilpotent functions $d_{\overline{S}}$ and $d_{\overline{Q}}$ , respectively, satisfying the $\nabla$ -inequality. Write $\hom(\overline{S},\overline{Q})$ for the set of all “uniformly continuous” homomorphisms $U:\overline{S}\rightarrow\overline{Q}$ , i.e. for which $\ell^{\prime}(U)$ defined next is finite.

We define functions $\ell^{\prime},\ell:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}$ , where $\ell$ is a length function, by

\ell^{\prime}(U):=\inf\{M\geq 0|\,M\in{\mathbb{R}},\;\forall x,y\in\overline{S}:\;d_{\overline{Q}}(Ux,Uy)\leq Md_{\overline{S}}(x,y)\}

(25)

\ell:=\overline{\ell^{\prime}}

In (25) we used the bar operator of definition 7.9. Actually we should index in $\ell$ the data $\overline{S},\overline{Q}$ etc., but to avoid cluttering we use the sloppy light notation. If we choose for $d_{\overline{Q}}$ the trivial null function then we will get all homomorphisms in $\hom(\overline{S},\overline{Q})$ . For simplicity, we shall call elements of $\hom(\overline{S},\overline{Q})$ just homomorphisms and drop “uniformly continuous”.

Lemma 8.9.

For all (meaningful composable) homomorphisms $U,V$ we have

(26)

\ell^{\prime}(U\circ V)\leq\ell^{\prime}(U)\ell^{\prime}(V)

\ell^{\prime}(U\nabla V)\leq\ell^{\prime}(U)+\ell^{\prime}(V)

Proof.

The first inequality is elementary as for the norm of linear operators, and the second one follows from the $\nabla$ -inequality of $d_{\overline{Q}}$ . ∎

We refer to the inequality (26) (for any given $\ell^{\prime}$ ) as the product-inequality. It is analogue to the product inequality $\|ST\|\leq\|S\|\|T\|$ for bounded linear operators between Banach spaces. We tidy up the situation so far and get the usual abstract setting of abelian idempotent monoids also for the homomorphism sets:

Lemma 8.10 (Abstraction).

The homomorphism set $\big{(}(\hom(\overline{S},\overline{Q}),\nabla,\emptyset\big{)}$ is an abelian idempotent monoid and $\ell$ of definition 8.8 a length function on it.

Proof.

By definition 8.4 we check the first assertion, by lemma 8.9, $\ell^{\prime}$ is a non-increasing length function, to which we apply lemma 7.10. ∎

For the further discussion we need now consider a whole family of abelian idempotent monoids:

Definition 8.11 (Category of languages).

Let $\Lambda$ be a small category consisting of abelian, idempotent monoids $(\overline{S},\nabla,\emptyset)$ equipped with positive, symmetric, nilpotent functions $d_{\overline{S}}:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}$ satisfying the $\nabla$ -inequality as objects, and all possible “uniformly continuous” homomorphisms between these objects as in definition 8.2 with ordinary composition as morphisms.

It is now important to stress that we will in the sequel not use the distance functions $d_{\overline{S}}$ on the objects at all. We only mentioned it to have a natural length function on the homomorphism sets by definition 8.8. We require now all homomorphism sets to be equipped with length functions and metric candidates:

Definition 8.12 (Category of languages with general length functions on homomorphism sets).

Each morphism set $M$ ( $=\hom(\overline{S},\overline{Q})$ ) in $\Lambda$ assume to be equipped with a length function $\ell:M\rightarrow{\mathbb{R}}$ and a positive, symmetric, nilpotent function $d:M\times M\rightarrow{\mathbb{R}}$ . (For example the length functions $\ell$ of definition 8.8, and $d:=d_{\ell}$ of definition 2.12 are the typical natural choices.) For simplicity, all $\ell$ and $d$ are called in the same way, independent of $M$ .

Our next aim is to modify the length functions on the homomorphism sets in such a way such that we (just) get the natural inequality $d(A\circ U,A\circ V)\leq\ell(A)d_{\ell}(U,V)$ for all composable homomorphisms $A,U,V$ in analogy to the inequality $\|AU-AV\|\leq\|A\|\|U-V\|$ for linear operators between Banach spaces. Simultaneously we alter $\ell$ as to achieve that $d_{\ell}$ satisfies the $\Delta$ - and $\nabla$ -inequality. To this end we apply the fix point concept of section 7 to the homomorphism sets, enriched by the now also desired product inequality.

For the rest of this section we will define various new $\ell$ s and $d$ s simultaneously for all morphism sets $M$ in $\Lambda$ , without indicating $M$ in notation. We again note that we may choose all functions $d_{\overline{S}}$ to be the trivial null functions and any $\ell$ and $d$ s in definition 8.12 independent of the suggested $\ell$ s of definition 8.8.

We repair now all given symmetric nilpotent positive functions $d$ on the homomorphism sets in such a way that they satisfy the product inequalities with respect to the length functions:

Definition 8.13 (New submultiplicative distance function).

For all objects $\overline{S}$ and $\overline{Q}$ in $\Lambda$ define $\dot{d}:\hom(\overline{S},\overline{Q})\times\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}$ as follows. For all morphisms $X,Y:\overline{S}\rightarrow\overline{Q}$ in $\Lambda$ with same domain and range set

\dot{d}(X,Y):=\inf_{\stackrel{{\scriptstyle X=A_{1}...A_{n}UB_{1}....B_{m},\;}}{{Y=A_{1}...A_{n}VB_{1}....B_{m}}},\;n,m\geq 0}\ell(A_{1})...\ell(A_{n})d(U,V)\ell(B_{1})....\ell(B_{m})

where $A_{i},V,U,B_{i}$ are morphisms between any objects of $\Lambda$ for which their products as indicated are valid (composable).

The so repaired functions $\dot{d}$ now fulfill the desired product inequalities:

Lemma 8.14 (Product inequality).

For composable morphisms $A,B,X$ in $\Lambda$ we have the submultiplicativity relations (often called product-inequality)

(27)

\dot{d}(AX,BX)\leq\dot{d}(A,B)\ell(X)

(28)

\dot{d}(XA,XB)\leq\ell(X)\dot{d}(A,B)

Proof.

Inserting in definition 8.13 we have

\dot{d}(QX,RX):=\inf_{\stackrel{{\scriptstyle QX=A_{1}...A_{n}UB_{1}....B_{m},\;}}{{RX=A_{1}...A_{n}VB_{1}....B_{m}}}}\ell(A_{1})...\ell(A_{n})d(U,V)\ell(B_{1})....\ell(B_{m})

\leq\inf_{\stackrel{{\scriptstyle Q=A_{1}...A_{n}UB_{1}....B_{m},\;}}{{R=A_{1}...A_{n}VB_{1}....B_{m}}}}\ell(A_{1})...\ell(A_{n})d(U,V)\ell(B_{1})....\ell(B_{m})\ell(X)

because we choose the infimum over a smaller set in the second line, that is, we at first chose a resolution $Q=A_{1}...A_{n}UB_{1}....B_{m}$ and then said $QXA_{1}...A_{n}UB_{1}....B_{m}X$ . ∎

The validity of relations (27) and (28) for all morphism sets of $\Lambda$ is summarized as product inequality. It is important that $\dot{d}$ descends:

Lemma 8.15.

$\dot{d}$ is a positive, symmetric nilpotent function with

\dot{d}\leq d

Lemma 8.16.

$\dot{d}$ satisfies the product-inequality, and $d=\dot{d}$ if and only if $d$ satisfies the product-inequality.

Lemma 8.17.

If $d$ satisfies the product inequality then also $\ell:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}$ the one of (26) in case that $\ell(x)=d(x,\emptyset)$ .

We proceed now analogously and thus faster as in section 7, and use various definitions from there, but now applied to the homomorphism sets rather than to the object sets of $\Lambda$ . In what follows we use definitions 7.1, 7.4, 7.9 and 7.11 applied to the homomorphism sets. In analogy to lemma 7.7 we can now repair $d$ in such a way that we get all the desired properties we want:

Lemma 8.18.

For $\dot{d}$ being any positive, symmetric, nilpotent functions satisfying the product-inequality on each morphism set of $\Lambda$ we have:

$\hat{\dot{d}}$ satisfies the $\nabla$ - and product-inequality.

$\tilde{\dot{d}}$ satisfies the $\Delta$ - and product-inequality.

$\tilde{\hat{\dot{d}}}$ satisfies the $\nabla$ -, $\Delta$ - and product-inequality.

Proof.

By the distribution laws of lemma 8.7 and the supposed validity of the product inequalities as stated in lemma 8.14 we have

\hat{\dot{d}}(QX,RX)=\inf_{QX=\nabla_{i=1}^{n}R_{i},RX=\nabla_{i=1}^{n}R_{i}}\sum_{i=1}^{n}{\dot{d}}(Q_{i},R_{i})

\leq\inf_{Q=\nabla_{i=1}^{n}Q_{i},R=\nabla_{i=1}^{n}R_{i}}\sum_{i=1}^{n}{\dot{d}}(Q_{i}X,R_{i}X)\leq\hat{\dot{d}}(Q,R)\ell(X)

\tilde{\dot{d}}(QX,RX)=\inf_{A_{i}}{\dot{d}}(QX,A_{1})+{\dot{d}}(A_{1},A_{2})+....+{\dot{d}}(A_{n},RX)

\leq\inf_{A_{i}}{\dot{d}}(QX,A_{1}X)+{\dot{d}}(A_{1}X,A_{2}X)+....+{\dot{d}}(A_{n}X,RX)\leq\tilde{\dot{d}}(Q,R)\ell(X)

Hence $\hat{\dot{d}}$ and $\tilde{\dot{d}}$ fulfill the product inequalities, and the rest follows from lemmas 7.2, 7.5 and 7.7 ∎

Lemma 8.19.

$d_{\ell}$ satisfies the product-inequality if and only if $\sigma_{\ell}$ satisfies it.

Proof.

By lemma 5.4 we may express $\sigma_{\ell}$ by $d_{\ell}$ , and vice versa, and from this it is easily deduced. ∎

We introduce now the following functions:

Definition 8.20.

Define positive functions $\ell,\dot{\ell},{\tilde{\hat{\dot{\ell}}}}:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}$ depending on $d$ (recall that we do this for each morphism set $(M,d)$ of $\Lambda$ separately) by

\ell(x):=d(x,\emptyset),\;\dot{\ell}(x):=\dot{d}(x,\emptyset),\;{\tilde{\hat{\dot{\ell}}}}(x):={\tilde{\hat{\dot{d}}}}(x,\emptyset)

To get finally convergence, it is fundamental that the new metric candidates descend:

Corollary 8.21.

Using definitions 8.20 and 7.9, we have

d_{{{\tilde{\hat{\dot{\ell}}}}}}\leq{\tilde{\hat{\dot{d}}}},\qquad d_{\overline{{\tilde{\hat{\dot{\ell}}}}}}\leq{\tilde{\hat{\dot{d}}}}

Proof.

Apply corollary 7.15 to $d:=\dot{d}$ and $\ell:=\dot{\ell}$ . ∎

We do now an analogous procedure as in definition 7.16, but now for the homomorphism sets instead of the object sets:

Definition 8.22.

For all object sets $\overline{S}$ and $\overline{Q}$ of $\Lambda$ set the start points $d^{(1)}:=d:\hom(\overline{S},\overline{Q})\times\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}$ and $\ell^{(1)}:=\ell:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}$ (def. 8.12), and then successively define ${\tilde{\hat{\dot{\ell^{(n)}}}}},\ell^{(n+1)}:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}$ and $d^{(n+1)}:\hom(\overline{S},\overline{Q})\times\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}$ ( $n\geq 1$ ) by

{\tilde{\hat{\dot{\ell^{(n)}}}}}(x):={\tilde{\hat{\dot{d^{(n)}}}}}(x,\emptyset)

\ell^{(n+1)}:={\overline{{\tilde{\hat{\dot{\ell^{(n)}}}}}}}

d^{(n+1)}:=d_{\ell^{(n+1)}}

where $\dot{d^{(n)}}$ is defined as in definition 8.13 with respect to ${d^{(n)}}$ and ${\ell^{(n)}}$ (instead of $d$ and $\ell$ ).

The successively applied procedure to a length function as described in the last definition converges again, analogously as in the theorem 7.21, to a wished fixed point:

Theorem 8.23 (Ideal length functions on homomorphism sets).

Assume the definitions 8.11, 8.12, 8.22 and form the limits ${{\ell}^{(\infty)}}:=\lim\ell^{(n)}$ and ${{d}^{(\infty)}}:=\lim d^{(n)}$ with respect to all morphism sets $M$ of $\Lambda$ .

Then for all morphism sets $M$ of $\Lambda$ , $\ell^{(n+1)}$ and ${{\ell}^{(\infty)}}$ are length functions on $M$ fulfilling the product-inequality (26), $d_{{{\ell}^{(\infty)}}}$ and $\sigma_{{{\ell}^{(\infty)}}}$ are pseudometrics on $M$ fulfilling the $\nabla$ - and product-inequality, and moreover and more precisely

d^{(\infty)}=d_{{{\ell}^{(\infty)}}}=\lim_{n\rightarrow\infty}d_{\ell^{(n)}}={\tilde{\hat{\dot{{{d}^{(\infty)}}}}}}=\lim_{n\rightarrow\infty}{\tilde{\hat{\dot{d_{\ell^{(n)}}}}}}=\lim_{n\rightarrow\infty}{\tilde{\hat{\dot{d^{(n)}}}}}=\lim_{n\rightarrow\infty}{d^{(n)}}

\qquad{{\ell}^{(\infty)}}=\dot{{{\ell}^{(\infty)}}}={\tilde{\hat{\dot{{{\ell}^{(\infty)}}}}}}={\bar{\tilde{\hat{\dot{{{\ell}^{(\infty)}}}}}}}=\lim_{n\rightarrow\infty}\ell^{(n)}=\lim_{n\rightarrow\infty}{\bar{\tilde{\hat{\dot{\ell^{(n)}}}}}}

{d_{{{\ell}^{(\infty)}}}}(UX,VX)\leq{d_{{{\ell}^{(\infty)}}}}(U,V){{{\ell}^{(\infty)}}}(X)

{d_{{{\ell}^{(\infty)}}}}(XU,XV)\leq{{{\ell}^{(\infty)}}}(X){d_{{{\ell}^{(\infty)}}}}(U,V)

for all composable morphisms $X,U$ and $V$ of $\Lambda$ .

Proof.

We have the analogy of lemma 7.17, that is, descending sequences

\ell^{(n+2)}=\overline{{\tilde{\hat{\dot{\ell^{(n+1)}}}}}}\leq\ell^{(n+1)},\qquad d^{(n+1)}=d_{\ell^{(n+1)}}\leq{\tilde{\hat{\dot{d^{(n)}}}}}\leq d^{(n)}

by an analogous proof but by employing corollary 8.21 instead of 7.15 , and so their limits ${{d}^{(\infty)}}$ and ${{\ell}^{(\infty)}}$ and their stated identities.

Similar proofs of lemmas 7.19 and 7.20 then yield the first identities.

The last stated product inequalities with respect to the pair $({{d}^{(\infty)}},{{\ell}^{(\infty)}})$ follow then simply by taking the pointwise limits of the valid product-inequalities for the pairs $({\tilde{\hat{\dot{d^{(n)}}}}},{\ell^{(n)}})$ by definition of $\dot{d^{(n)}}$ in definition 8.22 and lemma 8.18. ∎

This is the analogy to proposition 7.22. It shows, for example, when the procedure of definition 8.22 stops:

Proposition 8.24 (Ideal length functions on homomorphism sets attained).

If $d=d_{\ell}$ for all morphism sets of $\Lambda$ , then all $d_{\ell}$ satisfy the $\nabla$ -, $\Delta$ - and product-inequality $\Leftrightarrow$

d_{\ell}={\tilde{\hat{\dot{d_{\ell}}}}}\quad\Leftrightarrow\quad{\tilde{\hat{\dot{\ell}}}}=\ell\quad\Leftrightarrow\quad{\bar{\tilde{\hat{\dot{\ell}}}}}=\ell\quad\Leftrightarrow\quad\ell={{\ell}^{(\infty)}}

where each item is understood to hold for all morphism sets of $\Lambda$ .

Proof.

This is completely analogously proved as proposition 7.22, using corollary 8.21 and theorem 8.23 instead of 7.15 and 7.20, respectively. ∎

We finally may turn the obtained pseudometrics to metrics and notice:

Corollary 8.25 (Quotient category).

By def. 5.7, for each objects $\overline{S},\overline{Q}$ in $\Lambda$ , we may then divide out elements in ${\hom(\overline{S},\overline{Q})}$ to obtain a metric space ${\check{\hom(\overline{S},\overline{Q})}}$ , which is again an abelian idempotent monoid, with metric ${\check{d_{{{\ell}^{(\infty)}}}}}=d_{{\check{{{\ell}^{(\infty)}}}}}$ satisfying the $\nabla$ - and product-inequality with respect to $\check{{{\ell}^{(\infty)}}}$ .

These quotients are compatible with composition, i.e. if $X_{1},X_{2}\in\hom(\overline{S},\overline{Q})$ with $[X_{1}]=[X_{2}]$ (class brackets) and $Y_{1},Y_{2}\in\hom(\overline{Q},\overline{R})$ with $[Y_{1}]=[Y_{2}]$ then

[X_{1}][Y_{1}]:=[X_{1}\circ Y_{1}]=[X_{2}\circ Y_{2}]

In this way we obtain a category $\check{\Lambda}$ out of $\Lambda$ by replacing the morphism sets $M$ by $\check{M}$ .

Analogous distribution formulas as in lemma 8.7 hold in $\check{\Lambda}$ .

Proof.

The first assertions follow by definition 5.7 and proposition 5.8.

For $d:=d_{{{\ell}^{(\infty)}}}$ and $\ell:={{\ell}^{(\infty)}}$ we have

d(X_{1}Y_{1},X_{2}Y_{2})\leq d(X_{1}Y_{1},X_{1}Y_{2})+d(X_{1}Y_{2},X_{2}Y_{2})

\leq\ell(X_{1})d(Y_{1},Y_{2})+d(X_{1},X_{2})\ell(Y_{2})=0

by the $\Delta$ - and product-inequality for $d$ and $\ell$ found out in theorem 8.23. ∎

The next lemma indicates how we could show that a homomorphism is uniformly continuous with respect to various derived metric candidates.

Lemma 8.26.

Let $U:\overline{S}\rightarrow\overline{Q}$ be a homomorphism. If there is an $M\geq 0$ such that

d_{\overline{Q}}(Ua,Ub)\leq Md_{\overline{S}}(a,b)

for all $a\leq b$ , then this inequality holds for all $a,b\in\overline{S}$ . Then also

{\tilde{\hat{d_{\overline{Q}}}}}(Ua,Ub)\leq M{\tilde{\hat{d_{\overline{S}}}}}(a,b)

Analogous assertions hold for $\sigma$ instead of $d$ .

Proof.

The first assertion is by the formulas of lemma 5.4, and the second one follows directly from definitions 7.1 and 7.4. ∎

We may apply the last theorem to get a Banach-Mazur-like distance between languages:

Definition 8.27 (Banach-Mazur-like distance between languages).

Let $\hom(\overline{S},\overline{Q})$ be equipped with a length function $\ell$ . Then a Banach-Mazur-like distance between $\overline{S}$ and $\overline{Q}$ (e.g. languages) may be defined as follows:

d(\overline{S},\overline{Q})=\log\inf\{\ell(\phi)\ell(\phi^{-1})\in{\mathbb{R}}|\;\phi:\overline{S}\rightarrow\overline{Q}\mbox{ an isomorphism}\}

Lemma 8.28.

This distance is a pseudometric on any category of isomorphic abelian, idempotent monoids such that the hom-sets $\hom(\overline{S},\overline{Q})$ are provided with a length function $\ell$ satisfying the product-inequality (26).

The length functions $\ell^{(\infty)}$ of theorem 8.23 may be good candidates for the Banach-Mazur-like distance as they satisfy the product inequalities (26) (indeed, apply the inequalities of theorem 8.23 and note that ${d_{{{\ell}^{(\infty)}}}}(x,\emptyset)={{\ell}^{(\infty)}}(x)$ ). Alternatively one may start with definitions 8.11 and 8.12 with $d=d_{\ell}$ , and then using $\ell$ of lemma 8.17.

9. Non-increasing length functions

In this section we work with length functions which may not be monotonically increasing, that is, without requiring inequality (9). The length function of definition 2.10 may be replaced by the following possibly non-increasing bigger one:

Definition 9.1 (Length function on quotient).

Given a language $S$ with variables and a length function $L:S\rightarrow{\mathbb{R}}$ on it, set

\ell:\overline{S}\rightarrow{\mathbb{R}}

\ell([s])=\inf\{L(t)\in{\mathbb{R}}|\,t\in S,\;s\equiv t\}

Now assume that we are given an abelian, idempotent monoid $(\overline{S},\nabla,\emptyset)$ with a non-increasing length function $\ell$ on it (i.e. without postulating (9)).

In this section we do the same limit as in definitions 7.16 and 7.18, but instead of implementing the bar operator of lemma 7.10 we choose other functions $d_{\ell}$ and $\sigma_{\ell}$ as follows, which do not require the monotone increasingness of $\ell$ and are still positive:

Definition 9.2.

We define for all $x,y\in\overline{S}$ and $p\geq 1$ ( $p\in{\mathbb{R}}$ )

d_{\ell}(x,y):=|\ell(x\nabla y)-\ell(x)|{\vee}|\ell(x\nabla y)-\ell(y)|

\sigma_{p,\ell}(x,y):=\Big{(}|\ell(x\nabla y)-\ell(x)|^{p}+|\ell(x\nabla y)-\ell(y)|^{p}\Big{)}^{1/p}

We set $\sigma_{\ell}:=\sigma_{1,\ell}$ for short. We use the same notations $d$ and $\sigma$ as in definition 2.12, as notice, there is no difference between definitions 9.2 and 2.12 for monotonically increasing length functions $\ell$ . We still have the lower bound $|\ell(x)-\ell(y)|\leq\sigma_{\ell}$ for $\sigma_{\ell}$ , but not so for $d_{\ell}$ .

Let us be given a positive, symmetric, nilpotent function $d$ on $\overline{S}$ . We are going to use definitions 7.1, 7.4 and 7.9. Define $\ell$ as in definition 7.11.

Lemma 9.3.

If $d$ satisfies the $\Delta$ - and $\nabla$ -inequality then for $\ell$ as in (20),

d_{\ell}\leq d

d_{\overline{\ell}}\leq d

Proof.

Yielded by a similar proof as for lemma 7.14. ∎

We do now a procedure as in definition 7.16, but with the bar operator now being ignored:

Definition 9.4.

Set $d^{(1)}:=d$ , and then for $n\geq 1$ put

\ell^{(n+1)}(x):={\tilde{\hat{\ell^{(n)}}}}(x):={\tilde{\hat{d^{(n)}}}}(x,\emptyset)

d^{(n+1)}:=d_{\ell^{(n+1)}}

Corollary 9.5.

Given a positive, symmetric, idempotent function $d$ on an abelian, idempotent monoid $(\overline{S},\nabla,\emptyset)$ , define $\ell^{(n)}$ and $d^{(n)}$ as in definitions 9.4 and 9.2.

We get limits ${{\ell}^{(\infty)}}$ and ${{d}^{(\infty)}}$ as in definition 7.18, and the analogous assertion of lemma 7.19. That is, ${{\ell}^{(\infty)}}$ is a non-increasing length function on $\overline{S}$ , and ${{d}^{(\infty)}}=d_{{{\ell}^{(\infty)}}}$ is a pseudometric on $\overline{S}$ satisfying the $\nabla$ -inequality.

Proof.

By lemma 9.3, applied to $d:={\tilde{\hat{d^{(n)}}}}$ , we get

d^{(n+1)}=d_{{\tilde{\hat{\ell^{(n)}}}}}\leq{\tilde{\hat{d^{(n)}}}}\leq{d^{(n)}}

So $d^{(n)}$ is decreasing, in particular thus also $\ell^{(n)}$ by definition 9.4, and we get a limit ${{\ell}^{(\infty)}}:=\lim_{n}\ell^{(n)}$ . The rest goes verbatim as in lemmas 7.19 and 7.20. ∎

Proposition 9.6.

If $d$ is of the form $d=d_{\ell}$ for a non-increasing length function $\ell$ then $d_{\ell}$ satisfies the $\nabla$ - and $\Delta$ -inequality $\Leftrightarrow$

d_{\ell}={\tilde{\hat{d_{\ell}}}}\quad\Leftrightarrow\quad{\tilde{\hat{\ell}}}=\ell\quad\Leftrightarrow\quad\ell={{\ell}^{(\infty)}}

Proof.

This is analogously proved as proposition 7.22. ∎

However, it cannot be claimed that $\ell^{(\infty)}$ is increasing.

Also, contrary to theorem 7.21, $\delta$ of definition 6.2 with respect to ${{\ell}^{(\infty)}}$ might not be increasing and ${{\sigma}^{(\infty)}}$ might not satisfy the $\nabla$ - and $\Delta$ -inequality, as all the proofs given rely on the monotonically increasingness of ${{\ell}^{(\infty)}}$ .

We could now form $\overline{{{\ell}^{(\infty)}}}$ , define $d_{\ell}$ as above and restart the above procedure, or the one of section 7. There seem to be endless many variants. A principal problem is, however this can be said, that $\ell\leq\ell_{2}$ might not imply $d_{\ell}\leq d_{\ell_{2}}$ , so one might not be able to order the limits by $\ell$ .

10. General examples

In this and the remaining sections, we assume that $S$ is the mathematical language of technics and physics, but also of typical mathematics. We may use quantors, set-definitions, but over all sentences need not to be completely defined in a rigorous mathematical way. This may also not be useful. In finding a new theory, it may be better to let the precise meaning open. For example, a quantum field theory may be singular and yield infinities, but after renormalization the situation turns, see the various textbooks about quantum field theory, for example [22].

We distinguish variables in sentences which are unbound, like $x$ in $x+2=y$ , and bound ones like $x$ in $f(x):=x+2$ . Here, $f$ is a function which is being defined, and in $a(y):=2+f(y)$ the function $f$ is being called. We sometimes underline variables (like so: $\underline{x}$ ) to indicate that these are main variables.

For the length function $L$ on $S$ we choose a simple character count (i.e. $L(a_{1}\ldots a_{n})=n$ for $n\geq 0,a_{i}\in A$ ), without counting non-notated brackets, even if they are theoretically there (as in $x^{y}$ ), and set $L(:=)=1$ , $L({\vee})=0$ . For example, $L(x^{y}{\vee}(a+b)^{2})=8$ . The sign ${\vee}$ is always understood to bind most weakly (for example, $a+b{\vee}c$ means $(a+b){\vee}c$ ). The sign $\approx$ means approximately equal between two real numbers, and its more precise meaning depends (for example, $90\approx 100$ for integers). We note that for most of the discussion this choice of $L$ is essentially irrelevant, and one may as well choose any other length function $L$ , or the length function $\ell={{\ell}^{(\infty)}}$ of theorem 7.21, but to get specific numbers in certain examples we choose the character count for simplicity.

From now on, everything is to be understood to be very sloppy, inexact, vague and superficial!

Example 10.1 (Main and auxiliary variables).

Main variables cannot be deleted by (6). They are what the theory is about, for example a fundamental field $\underline{\psi}$ in physics, or the entry point $\underline{main}$ in C++. Auxiliary variables help to simplify and shorten expressions, typically functions which are often called. For example

\underline{x}=a+b{\vee}a=3{\vee}b=5\qquad\equiv\qquad\underline{x}=8{\vee}a=3{\vee}b=5\qquad\equiv\qquad\underline{x}=8

\underline{f}(x)=g(x)^{2}+g(x)^{3}{\vee}g(x)=2x\qquad\equiv\qquad\underline{f}(x)=4x^{2}+8x^{3}

by (3) and (6).

Example 10.2 (Length function $\ell$ is not directly subadditive on the level of $S$ ).

Given $x,y\in S$ , in general we have

\ell([x{\vee}y])\not\leq\ell([x])+\ell([y])

Indeed, if $y$ is longer and contains only auxiliary variables and $x$ is short with main variables and uses the variables of $y$ (typically by function calls, that is, $x$ outsources the bulk of formulas to $y$ as auxilliary functions) then, by (6) one has $y\equiv\emptyset$ , and thus we got a contradiction

0\ll\ell([x{\vee}y])\leq\ell([x])+\ell([y])=\ell([x])\approx 0

The next two examples show that even if the class elements $[x{\vee}y{\vee}\phi(x)]\neq[x{\vee}y]$ , their lengths are approximately equal.

Example 10.3 (Copies are not superfluous in general).

Even if $\phi(x)\bot x,\phi(x)\bot y$ for $x,y\in S,\phi\in P_{S}$ , then - contrary to (4) which says that $x{\vee}\phi(x)\equiv x$ because $\phi(x)$ is a superfluous copy of $x$ - we still may have

(29)

x{\vee}y{\vee}\phi(x)\not\equiv x{\vee}y

Indeed, set $x:=\{\underline{f}:=a\}$ and $y:=\{a(x):=2x\}$ , then $\phi(x)=\{\underline{g}:=b\}$ but it cannot equivalently be deleted in

x{\vee}y{\vee}\phi(x)\quad=\quad\Big{(}\underline{f}:=a\quad{\vee}\quad a(x):=2x\quad{\vee}\quad\underline{g}:=b\Big{)}

because $\underline{g}:=b$ is a different theory than $\underline{f}:=a$ because $a$ is further determined, whereas $b$ is not.

Even if (29) is an inequality in general, it becomes approximately an identity when put into $\ell$ , because $\phi(x)$ is just a copy of $x$ :

Example 10.4 (Copies are cheap).

If $x,y\in S,\phi\in P_{S}$ and $\phi(x)\bot x,\phi(x)\bot y$ then we have

\ell([x{\vee}y{\vee}\phi(x)])\approx\ell([x{\vee}y])

In other words, if we take a theory $x{\vee}y$ , and add to it a copy $\phi(x)$ (with other variable names) of the subtheory $x$ , then the length (= cost) of the outcoming theory $x{\vee}y{\vee}\phi(x)$ does not much change. This is clear for computer programs because in $\phi(x)$ we just need to call all the routines of $x$ and need not to rewrite them twice in $\phi(x)$ . Indeed, going back to the concrete example 10.3 we may write

x{\vee}y{\vee}\phi(x)\quad\equiv\quad\Big{(}F(u,v):=u-v\quad{\vee}\quad F(\underline{f},a)=0\quad{\vee}\quad y\quad{\vee}\quad F(\underline{g},b)=0\Big{)}

That is, instead of $x$ we wrote $F(u,v):=\underline{u}-v{\vee}F(\underline{f},a)=0$ , and instead of $\phi(x)$ $F(\underline{g},b)=0$ . So we have one long definition of $F$ , approximately as long as $x$ itself, and two short calls $F(\underline{f},a)=0$ and $F(\underline{g},b)=0$ . So we may end up with the claim:

\ell([x{\vee}y{\vee}\phi(x)])\approx\ell([F(u,v):=\underline{u}-v{\vee}F(\underline{f},a)=0{\vee}y])+\ell([F(\underline{g},b)=0])\approx\ell([x{\vee}y])

In this short example it may not pay off, because the length of $x$ is so short, but if $x$ is long with few unbound variables, certainly.

By definition 2.8, the $\nabla$ -operation requires that $[x]\nabla[y]=[x{\vee}y]$ if $x$ and $y$ have no variables in common. The next lemma shows when this identity approximately holds in the quotient ${\check{S}}$ , see definition 5.7, even if there are common variables.

Lemma 10.5 ( $\nabla$ -operation as an idealization of ${\vee}$ -operation).

Here, $\precsim$ means approximately lower or equal. For all $x,y\in S$ we have

\ell([x{\vee}y])\precsim\ell([x]\nabla[y])\quad\Rightarrow\quad d([x{\vee}y],[x]\nabla[y])\approx 0

Proof.

Indeed, two-fold application of example 10.4 yields

d([x{\vee}y],[x]\nabla[y])\approx\ell([x{\vee}y{\vee}\phi(x){\vee}\psi(y)])-\ell([x{\vee}y])

\approx\ell([x{\vee}y])-\ell([x{\vee}y])=0

∎

The next two important lemmas about auxiliary and main variables are exactly proved in the general framework of section 2.

Lemma 10.6 (The more main variables the longer the sentence).

If $s\in S$ and $t\in S$ a copy of it, but where some auxiliary variables are declared to main variables, then

\ell([s])\leq\ell([t])

Proof.

In our natural definition of $S$ there is no difference between main and auxiliary variables besides the rule (6), which can only help to reduce the $\ell$ -length of a sentence, whence the inequality. ∎

The next lemma, which is an important assertion that holds actually in the general framework of section 2 as well, says that in an optimal sentence we can change all auxiliary variables to main variables without changing the element in the quotient ${\check{S}}$ :

Lemma 10.7 (Optimal sentences and main variables).

If $x\in S$ is optimal in the sense that $\ell([x])=L(x)$ , and $x^{\prime}$ is a copy of the sentence $x$ but where all unbound auxiliary variables are replaced by main variables (in a bijective way) then

d([x],[x^{\prime}])=0

[x]\leq[x^{\prime}]\mbox{ in }\overline{S}\qquad[[x]]=[[y]]\mbox{ in }{\check{S}}

Proof.

Choose a variable transformation $\phi\in P_{S}$ such that the copy $\phi(x)$ of $x$ is disjoint to $x$ and $x^{\prime}$ , i.e. $\phi(x)\bot x$ and $\phi(x)\bot x^{\prime}$ .

Let $z$ be a verbatim copy of $\phi(x)$ but where all unbound auxiliary variables are declared to main variables. Then by lemma 10.6, $\ell([\phi(x){\vee}x^{\prime}])\leq\ell([z{\vee}x^{\prime}])$ . By law (4), $z$ is just another variable transformed copy of $x^{\prime}$ and thus $z{\vee}x^{\prime}\equiv x^{\prime}$ . Hence, by assumption we get

\ell([x]\nabla[x^{\prime})])=\ell([\phi(x){\vee}x^{\prime}])\leq\ell([z{\vee}x^{\prime}])=\ell([x^{\prime}])

\leq L(x^{\prime})=L(x)=\ell([x])\leq\ell([x]\nabla[x^{\prime})])

Thus $d([x],[x^{\prime}])=0$ by definition 2.12. ∎

The following example illustrates the last lemma:

Example 10.8 (Changing auxiliary to main variables).

(i) Consider the sentence $\alpha\in S$ defined by

\alpha:=\quad\Big{(}\underline{f}:=\sin\circ u+\cos\circ u\quad{\vee}\quad u(x):=5x^{9}+2x^{3}+77x^{2}+1\Big{)}

Let $\alpha^{\prime}$ be a verbatim copy of $\alpha$ but where each $u$ is replaced by $\underline{u}$ , so a main variable. As it appears, the representations of $\alpha$ and $\alpha^{\prime}$ are already shortest possible and thus

(30)

\ell([\alpha])=L(\alpha)=L(\alpha^{\prime})=\ell([\alpha^{\prime}])

\ell([\alpha^{\prime}]\nabla[\alpha])=\ell([\alpha^{\prime}\;{\vee}\;\underline{g}(x):=\sin(\underline{u}(x))+\cos(\underline{u}(x))])=\ell([\alpha])

because $\underline{g}$ is a superfluous copy of $\underline{f}$ by (4), and where we have dropped the definition of $u$ in $\alpha$ by (6) and taken $\underline{u}$ from $\alpha^{\prime}$ instead. Thus

d([\alpha],[\alpha^{\prime}])=\ell([\alpha{\vee}\alpha^{\prime}])-\ell([\alpha])=0

Because of (30), this is in accordance with lemma 10.7.

(ii) But if $u$ is a short polynomial, say $u(x)=2x$ , then $\alpha$ can be simplified by saying $[\alpha]=[g(x):=\sin(2x)+\cos(2x)]$ (so $\alpha$ is not optimal, i.e. $\ell([\alpha])\neq L(\alpha)$ ), but $\alpha^{\prime}$ can be not, as $\underline{u}$ is a main variable and so cannot be deleted, and so

d([\alpha],[\alpha^{\prime}])=\ell([\alpha{\vee}\alpha^{\prime}])-\ell([\alpha])=\ell([\alpha^{\prime}])-\ell([\alpha])>0

This example demonstrates that the assumption of lemma 10.7 is necessary.

Example 10.9 (Approximate automaticity of (4)).

Relation (4), that is, $x{\vee}\phi(x)\equiv x$ for $x\bot\phi(x)$ , is an idealization that holds approximately automatically in our prototype language, that is, $\ell(x{\vee}\phi(x))\approx\ell(x)$ . Indeed, as an illustration considering at first the sentence $\alpha$ of example 10.11.(i), we may write (without using (4))

\alpha{\vee}\phi(\alpha)\quad\equiv\quad\alpha{\vee}\underline{g}:=\underline{f}

because we can shorten the long expression $\phi(\alpha)$ by equivalence in mathematics by just saying that we have a main variable $\underline{g}$ which is just a function like $\underline{f}$ and so we have

\ell([\alpha{\vee}\phi(\alpha)])=\ell([\alpha])+3

More generally, one drops the copy $\phi(\alpha)$ but rescues the main variables from $\phi(a)$ by adding a list $\underline{g}_{1}:=\underline{f}_{1}{\vee}\ldots{\vee}\underline{g}_{n}:=\underline{f}_{n}$ where $\underline{g}_{i}$ are the main variables appearing in $\phi(\alpha)$ and $\underline{f}_{i}$ those corresponding in $\alpha$ .

Even more, one could encode the main variables $\underline{f}_{1},\ldots,\underline{f}_{n}$ into one main variable $\underline{f}$ as a function on ${\mathbb{N}}$ and use $\underline{f}(1),\ldots,\underline{f}(n)$ instead. It would be then enough to say $\underline{g}:=\underline{f}$ .

The fact that in optimal sentences (i.e. those $x\in S$ for which $L(x)\approx\ell([x])$ ) the auxiliary variables can be turned to main variables without changing the optimality (see lemma 10.7) can be exploited to give an instance where the triangle inequality is invalid:

Example 10.10 (Violation of $\Delta$ -inequality).

Assume that $a\in S$ has two approximately optimal representations $x,y\in S$ in the sense that $[a]=[x]=[y]$ in $\overline{S}$ and $\ell([a])\approx L(x)\approx L(y)$ . Of course,

d([a],[x])=d([a],[y])=0

Since $x$ and $y$ are approximately optimal short in length, we may replace all unbound auxiliary variables of $x$ and $y$ , respectively, by main variables to obtain new sentences $x^{\prime}$ and $y^{\prime}$ , respectively, and still

d([x],[x^{\prime}])\approx 0\qquad d([y],[y^{\prime}])\approx 0

by lemma 10.7. But because $x$ and $y$ may be very different in $S$ , by all the new main variables appearing now in $x^{\prime}$ and $y^{\prime}$ , $x^{\prime}$ and $y^{\prime}$ contain much necessary information which cannot be deleted (so (6) does not apply) and it may be difficult to simplify $x^{\prime}{\vee}\phi(y^{\prime})$ , so we may get

d([x^{\prime}],[y^{\prime}])\gg 0

But this contradicts the $\Delta$ -inequality, as assuming it, we got

0\approx d([x^{\prime}],[a])+d([a],[y^{\prime}])\geq d([x^{\prime}],[y^{\prime}])\gg 0

It would be helpful if one had a difference element in the sense that if $x\leq y$ for $x,y\in\overline{S}$ then there is a ‘complement’ $z\in\overline{S}$ such that $x\nabla z=y$ ( $x$ united with $z$ is the whole $y$ ) and $\zeta(x\cap z)=0$ (the intersection of $x$ and $z$ is zero), that is,

\ell(x\nabla z)=\ell(x)+\ell(z),

by definition 6.1, or in other words, $\zeta(z)=\zeta(y\backslash x)$ . But it does not exist in general:

Example 10.11 (Difference Element does not exist).

Consider two formulas $\alpha,\gamma\in S$ as follows:

\alpha:=\quad\Big{(}\underline{b}(x):=x^{2}+1\quad{\vee}\quad\underline{y}:=b^{2}+b\Big{)}

\gamma:=\quad\Big{(}\underline{z}(x):=x^{4}+3x^{2}+2\Big{)}

We assume that $\alpha$ and $\gamma$ , as it appears, are already in their shortest form, so, observing also that $\underline{y}=\underline{z}$ are the same polynomials, and to this applying (4) and then (3), we get

[\alpha]\nabla[\gamma]=[\underline{b}(x):=x^{2}+1\;{\vee}\;\underline{y}(v)=v^{4}+3v^{2}+2\;{\vee}\;\underline{z}(t):=t^{4}+3t^{2}+2]=[\alpha]

\ell([\alpha])=L(\alpha)=15,\quad\ell([\gamma])=L(\gamma)=13,\quad[\gamma]\leq[\alpha],\quad\ell([\alpha]\nabla[\gamma])=\ell([\alpha])

\zeta([\alpha]\backslash[\gamma])=\ell([\alpha]\nabla[\gamma])-\ell([\gamma])=d([\alpha],[\gamma])=2

The last line shows that a difference element $[\alpha]\backslash[\gamma]$ in $\overline{S}$ does not exist, as it should have only length $2$ , which appears impossible in our language $S$ .

11. Examples from Physics

In this section we collect various examples from physics, and indicate that the distance between an existing and a new improved theory is often very low, as well as the lengths of the theories itself. A very good instance is already the transition form the Klein-Gordon equation to the Dirac equation discussed in example 2.13 and the paragraph before it. Another example would be Maxwell’s equations extended from Ampere’s and Faraday’s equations.

Example 11.1 (Quantum mechanics).

Classical mechanics can be described as follows. In these equations, $\dot{p}_{i}$ means derivative of $p_{i}$ after time, and $\dot{q}_{i}$ a free independent variable

p_{i}=\frac{\partial{\mathcal{L}}}{\partial\dot{q}_{i}}(q,\dot{q},t)\qquad\mbox{(compute $\dot{q}_{i}=\dot{q}_{i}(q,p,t)$ from these equations)}

H(q,p,t)=\sum_{i=1}^{n}\dot{q}_{i}p_{i}-{\mathcal{L}}\qquad\mbox{(Hamilton function)}

\dot{p}_{i}=-\frac{\partial H}{\partial q_{i}}\quad{\vee}\quad\dot{q}_{i}=\frac{\partial H}{\partial p_{i}}\qquad\mbox{(Hamilton equations)}

If the Lagrangian ${\mathcal{L}}=T-U$ (kinetic energy $-$ potential energy) then $H=E$ (energy). The Schrödinger equations [18, 19] read

(31)

\hat{p}_{i}=-ih\frac{\partial}{\partial q_{i}}

(32)

ih\frac{\partial\psi}{\partial t}(q,\hat{p},t)=H(q,\hat{p},t)\psi(q,t)

We abbreviate quantum mechanics by $QM$ , Schrödinger equations (31)-(32) by $SE$ , Hamilton function by $H$ , Hamilton equations by $HE$ and classical mechanics by $CM$ . The distance between $QM=SE{\vee}H$ and $CM=HE{\vee}H$ is

d([QM],[CM])=\ell([QM]\nabla[CM])-\ell([CM])=

\ell([SE{\vee}H{\vee}\phi(HE){\vee}\phi(H)])-\ell([CM])

for a suitable $\phi\in P_{S}$ by def. 2.8. Since $H$ and $\phi(H)$ are copies of each other, we may take the Hamilton function $H$ instead of its copy $\phi(H)$ in $\phi(HE)$ and so choose a variable transformation $\psi\in P_{S}$ accordingly and then dropping the superfluous term $\phi(H)$ which only consists of auxiliary variables and which is not needed anymore by (6). So we get

=\ell([SE{\vee}H{\vee}\psi(HE)])-\ell([HE{\vee}H])

(33)

\approx\ell([SE])+\ell([H]+\ell([\psi(HE)])-\ell([HE])-\ell([H])

=\ell([SE])\approx 100

a low number, for all Hamilton functions $H$ .

In (33) we have assumed that there is not much optimization possible between the involved parts, in particular in relation to a longer Hamilton function $H$ .

The $\nabla$ -inequality is somehow the idealization of an estimate as above. Compare it with (cf. also lemma 10.5) the fast approximate estimate by definition 4.1, provided the $\nabla$ -inequality holds for $d$ ,

d([SE]\nabla[H],[HE]\nabla[H])\leq d([SE],[HE])

Example 11.2 (General relativity).

In generalizing the field equation of Newtonian mechanics to relativity, Einstein noted that because of the equation $E=mc^{2}$ , a perpetuum mobile could be formed by sending a light beam upwards from a big mass like the earth and so lifting mass upwards in a gravitational field without using energy.

To avoid violation of the theorem of conservation of energy, he suggested that spacetime is deformed under the influence of mass, so that the upwards-traveling lightbeam could shift to red and so lose energy. To this end, mass is substituted by energy, which is turned to a tensor to obtain the energy momentum tensor $T$ , which interacts with the metric $g$ of the 4-dim spacetime by the formula [7]

R_{\lambda\mu}-\frac{R}{2}g_{\lambda\mu}=-\frac{8G\pi}{c^{4}}T_{\lambda\mu}

Once again, the distance between general relativity GR and Newtonian field equations NF appears very low, i.e.

d([GR],[NF])\approx 100

maybe the shortest possible of a solution extending SR, NF and solving the perpetuum mobile problem. Simply already because the theory of GR is short, i.e. $\ell([GR])$ is low (cf. lemma 3.5).

Since Maxwell’s equations are Lorentz invariant, and $E=mc^{2}$ is deduced from Maxwell’s equations, GR might already follow from the latter by optimization, because if nature were not Lorentz invariant, but the Maxwell’s equations hold, the whole alternative theory might be longer.

Problem 11.3.

Given any theory $x$ with singularities, indicate a theory $y$ without singularities (i.e. without mathematical pathological inconsistencies and divergences) and such that $y$ works approximately as $x$ in the range where $x$ works and

d(x,y)=\mbox{low}

Even the low distance alone might increase the likelihood that $y$ generalizes $x$ properly.

12. Examples of Computing

For computer languages we use the definitions of example 2.14.

Lemma 12.1.

Assume that $\phi:A\rightarrow B$ is an isomorphism between two computer languages $A,B$ (exceptionally with only finitely many main variables available).

Then there is a constant $M\geq 0$ such that for all $s,t\in A$

|d_{B}(\phi(s),\phi(t))-d_{A}(s,t)|\leq M

Hence

\lim_{d_{A}(s,t)\rightarrow\infty}\frac{d(\phi(s),\phi(t))}{d(s,t)}=1

Proof.

Indeed, write a compiler or interpreter $i\in B$ such that

\phi(s)\quad\equiv\quad i{\vee}\tilde{s}\quad=\quad\Big{(}i\quad{\vee}\quad a:=s^{\prime}\quad{\vee}\quad b\Big{)}

does the same on the computer $B$ as $\phi(s)$ does by compiling or interpreting the code $s\in A$ on the computer $B$ . We assume since $\phi$ is an isomorphism this is possible, and that $\phi$ is given on the formal language level as in definition 8.1 in this proof. Here $s^{\prime}$ means an optimal short program $s$ (i.e. $\ell_{A}([s])=L_{A}(s)$ ) as a pure text string, which we assign the variable $a$ above. Further, $b$ means some minor code containing all main variables of $\phi([s])$ and calling $i$ to interpret or compile $a$ . Moreover, $\tilde{s}:=(a:=s^{\prime}{\vee}b)$ is just an abbreviation.

We also assume that

L_{A}(s)=L(s^{\prime})

gives only the length of the text string $s^{\prime}$ and similarly so for $L_{B}(t)$ . Also, as we have only finitely many main variables at all (say one entry point of a program is the main variable), we assume that there is a fixed constant $K$ such that

L_{B}(b)<K

for all $s\in S$ . We have

(i{\vee}\tilde{s})\nabla(i{\vee}\tilde{t})\equiv i{\vee}\tilde{s}{\vee}j{\vee}\tilde{t}\equiv i{\vee}\tilde{s}{\vee}\tilde{t}\equiv i{\vee}\widetilde{s{\vee}t}\equiv i{\vee}\tilde{x}

for any $x=s{\vee}t$ . Here, the second equality is because we do not need a copy $j$ of the compiler $i$ , the third one is because instead of interpreting $s^{\prime}$ and $t^{\prime}$ separately, we may interpret $(s{\vee}t)^{\prime}$ at once to get the same code.

Hence, setting $M:=L_{B}(i)+3+K$ we get

\ell_{B}([\phi(s)])=\ell_{B}([i{\vee}\tilde{s}])\leq L_{B}(i)+L_{B}(\tilde{s})\leq L_{B}(i)+L(s^{\prime})+3+L_{B}(b)

\leq M+L(s^{\prime})=M+L_{A}(s)=\ell_{A}([s])+M

Analogously we get

\ell_{A}([\phi^{-1}(t)])\leq\ell_{B}([t])+M

so that we end up with

|\ell_{B}([\phi(s)])-\ell_{A}([s])|\leq M

Thus

|\sigma([i{\vee}\tilde{s}],[i{\vee}\tilde{t}])-\sigma([s],[t])|

\approx|2\ell_{B}([i{\vee}\tilde{x}])-\ell_{B}([i{\vee}\tilde{s}])-\ell_{B}([i{\vee}\tilde{t}])-2\ell_{A}([x])+\ell_{A}([s])+\ell_{A}([t])|

\leq 4M

∎

We are going to define a Reduced Instruction Set Central Processing Unit (RISC CPU) in our ordinary mathematical language of physics and technics by describing its function of memory, accumulator and program pointer in spacetime:

Example 12.2.

A simple computer can be realized as formulas like so:

x_{t+1,n}=a_{t}[x_{t,p_{t}}=S{\wedge}x_{t,p_{t}+1}=n]+x_{t,n}(1-[x_{t,p_{t}}=S{\wedge}x_{t,p_{t}+1}=n])

	$\displaystyle a_{t+1}$	$\displaystyle=$	$\displaystyle x_{t,p_{t}+1}[x_{t,p_{t}}=L]+(a_{t}+x_{t,p_{t}+1})[x_{t,p_{t}}=A]$
			$\displaystyle+(a_{t}*x_{t,p_{t}+1})[x_{t,p_{t}}=M]+a_{t}[x_{t,p_{t}}\notin\{L,A,M\}]$

p_{t+1}=x_{t,p_{t}+1}[x_{t,p_{t}}=B{\wedge}a_{t}\geq 0]+(p_{t}+2)(1-[x_{t,p_{t}}=B{\wedge}a_{t}\geq 0])

where $[A]=1\in{\mathbb{R}}$ if an assertion $A$ is true, and $[A]=0\in{\mathbb{R}}$ otherwise.

Here, $x:{\mathbb{Z}}\times{\mathbb{Z}}\rightarrow{\mathbb{R}}$ stands for memory, $a:{\mathbb{Z}}\rightarrow{\mathbb{R}}$ for an accumulator, and $p:{\mathbb{Z}}\rightarrow{\mathbb{R}}$ for program pointer. Everything is time indexed $t\in{\mathbb{Z}}$ . Then $x_{t,n}\in{\mathbb{R}}$ is the value of memory at place $n\in{\mathbb{Z}}$ at time $t\in{\mathbb{N}}$ . Similarly, $a_{t},p_{t}\in{\mathbb{R}}$ is the value of accumulator, program pointer, respectively, at time $t\in{\mathbb{N}}$ . We have the opcodes $S,L,A,M,B$ (some constant real values) which stand for store accumulator, load accumulator, add accumulator, multiply accumulator, and branch. One command has the format opcode and value in the next memory. For example

L1000\quad B2000

mean, load accu with the value at memory address 1000, or, branch to 2000 if accu $\geq 0$ .

If a sentence $s\in S$ is longer, it may pay off to implement the above computer and implement the formula $s$ via a program. In other words, to add the above computer $c$ as auxiliary variables, i.e. $s\equiv s{\vee}c$ and simplify $s{\vee}c$ by writing a program on $c$ which substitutes (part of) $s$ . In this way, long sentences are in concurrence with computing, which emerges out of nowhere simply because of the pressure of optimization.

The above formulas are similarly to differential equations in rough shape. One could analogously allow functions than real values, that is, $x:{\mathbb{Z}}\times{\mathbb{Z}}\rightarrow C^{\infty}({\mathbb{R}})$ and add a differential command for the accumulator, say, $a_{t+1}=da_{t}/dy$ .

We may enrich any language without variables with variables to get a macro language:

Example 12.3 (Macro language).

Given a language $S$ , with or without variables, we may form a macro language $M$ by adding countably infinitely many auxiliary and main variables, function definitions (macros) which accept text strings as input and give a text string as output, function calls, concatenation of text (written .), and a ${\vee}$ sign. Sentences of $M$ are sentences of $S$ enriched with macros.

For example, both $\underline{x}$ are the same:

\underline{x}:=\Big{(}f(`aa^{\prime},^{\prime}b^{\prime}){\vee}f(x,y):=x.x.y.`teee^{\prime}.y\Big{)}\quad=\quad\underline{x}:=^{\prime}aaaabteeeb^{\prime}

If two sentences $s,t$ in $S$ are similar by much text overlap, the distance $d(s,t)$ becomes low in the macro language $T$ by using macros.

Example 12.4 (Further examples).

(i) In the usual mathematical framework we could also allow Boolean algebra expressions like

A:=\quad\underline{a}=x_{1}\overline{x}_{2}+x_{9}(\overline{x}_{7}+x_{6})+....

and compare $d(A,B)$ . More generally we may compare any two functions $f,g$ which allow definitions by symbolic expressions by considering $d(f,g)$ .

(ii) Further we could work with text strings and concatenation in our general mathematical framework and in this way get a stronger macro language than in example 12.3 as we had natural numbers, functions or pointers to functions as arguments in functions calls etc.

(iii) We may consider set definitions and in this way compute the distance between sets, or algebraic structured sets like rings etc.

(iv) Mathematical proofs may be compared by writing down proofs $a$ , regarding them as text strings (or better as text strings in the framework of the mathematical logic of deduction), and compare these text strings within the macro language as in example 12.3 or in item (ii).

References

[1] A. Bertoni, M. Goldwurm, and N. Sabadini. Computing the counting function of context-free languages. STACS 87, Theoretical aspects of computer science, Proc. 4th annu. Symp., Passau/FRG 1987, Lect. Notes Comput. Sci. 247, 169-179 (1987)., 1987.
[2] Ehud Cseresnyes and Hannes Seiwert. Regular expression length via arithmetic formula complexity. J. Comput. Syst. Sci., 125:1–24, 2022.
[3] Cewei Cui, Zhe Dang, Thomas R. Fischer, and Oscar H. Ibarra. Similarity in languages and programs. Theor. Comput. Sci., 498:58–75, 2013.
[4] P. A. M. Dirac. The quantum theory of the electron. I. Proc. R. Soc. Lond., Ser. A, 117:610–624, 1928.
[5] P. A. M. Dirac. Quantised singularities in the electromagnetic field. Proc. R. Soc. Lond., Ser. A, 133:60–72, 1931.
[6] Radosav Djordjević, Nebojša Ikodinović, and Nenad Stojanović. A propositional metric logic with fixed finite ranges. Fundam. Inform., 174(2):185–199, 2020.
[7] A. Einstein. Zur allgemeinen Relativitätstheorie. Berl. Ber., 1915:778–786, 1915.
[8] Sean A. Fulop and David Kephart. Topology of language classes. In The 14th meeting on the mathematics of language. Proceedings of the meeting, MoL 14, Chicago, IL, USA, July 25–26, 2015, pages 26–38. Stroudsburg, PA: Association for Computational Linguistics, 2015.
[9] Giangiacomo Gerla. Distances, diameters and verisimilitude of theories. Arch. Math. Logic, 31(6):407–414, 1992.
[10] Yo-Sub Han and Sang-Ki Ko. Alignment distance of regular tree languages. In Implementation and application of automata. 22nd international conference, CIAA 2017, Marne-la-Vallée, France, June 27–30, 2017. Proceedings, pages 126–137. Cham: Springer, 2017.
[11] Stephen W. Hawking and W. Israel. General relativity. An Einstein centenary survey. Cambridge etc.: Cambridge University Press. XVIII, 919 p. £ 37.50 (1979)., 1979.
[12] Oscar H. Ibarra, Ian McQuillan, and Bala Ravikumar. On counting functions and slenderness of languages. Theor. Comput. Sci., 777:356–378, 2019.
[13] Teruo Imaoka. Algebraic semigroups, formal languages and computation. Proceedings of a symposium held at the Research Institute for Mathematical Sciences, Kyoto University, Kyoto, Japan, February 19–21, 2001. RIMS Kokyuroku 1222, viii, 188 p. (2001)., 2001.
[14] Ciresica Jalobeanu. Approximation in languages. In Current issues in mathematical linguistics. 1st International Conference, ICML ’93, Virgili University, Tarragona, Catalonia, Spain, on March 30- 31, 1993. Selected papers, pages 161–169. Amsterdam: North-Holland, 1994.
[15] Sang-Ki Ko, Yo-Sub Han, and Kai Salomaa. Top-down tree edit-distance of regular tree languages. In Language and automata theory and applications. 8th international conference, LATA 2014, Madrid, Spain, March 10–14, 2014. Proceedings, pages 466–477. Berlin: Springer, 2014.
[16] Austin J. Parker, Kelly B. Yancey, and Matthew P. Yancey. Regular language distance and entropy. In 42nd international symposium on mathematical foundations of computer science, MFCS 2017, August 21–25, 2017, Aalborg, Denmark, page 14. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik, 2017. Id/No 3.
[17] Giovanni Pighizzini. How hard is computing the edit distance? Inf. Comput., 165(1):1–13, 2001.
[18] E. Schrödinger. Quantisierung als Eigenwertproblem. I. Ann. der Phys. (4), 79:361–374, 1926.
[19] E. Schrödinger. Quantisierung als Eigenwertproblem. II. Ann. der Phys. (4), 79:489–527, 1926.
[20] Barney Stratford. The topology of language. J. Math. Psychol., 53(6):502–509, 2009.
[21] Steven Weinberg. Ultraviolet divergences in quantum theories of gravitation. In Hawking, Stephen W. and Israel, W., Cambridge etc.: Cambridge University Press. XVIII, 919 p. £ 37.50 (1979)., pages 790–831. 1979.
[22] Steven Weinberg. The quantum theory of fields. Vol. 1: Foundations. Cambridge: Cambridge University Press, corr. repr. of the 1995 orig. edition, 1996.

Information content in formal languages

Abstract.

Key words and phrases:

1991 Mathematics Subject Classification:

1. Introduction

2. Metric

Definition 2.1 (Language with variables).

Definition 2.2 (Variables permutation).

Definition 2.3 (Length function).

Lemma 2.4.

Definition 2.5 (Independence).

Definition 2.6 (Equivalence relation).

Definition 2.7 (Quotient).

Definition 2.8 (Operation).

Lemma 2.9.

Proof.

Definition 2.10 (Length function on quotient).

Lemma 2.11.

Proof.

Definition 2.12 (“Metric”).

Example 2.13.

Example 2.14.

Definition 2.15 (More abstract definition).

Definition 2.16 (Metric candidate).

3. Δ\Delta-inequality

Definition 3.1.

Lemma 3.2.

Lemma 3.3.

Proof.

Lemma 3.4.

Lemma 3.5.

Proof.

Lemma 3.6.

Proof.

Lemma 3.7.

Proof.

4. ∇\nabla-inequality

Definition 4.1.

Lemma 4.2.

Proof.

Lemma 4.3.

Lemma 4.4.

5. Order relation

Definition 5.1.

Lemma 5.2.

Lemma 5.3.

Lemma 5.4.

Lemma 5.5.

Proof.

Lemma 5.6.

Proof.

Definition 5.7 (From pseudometric to metric).

Proposition 5.8.

Proof.

6. Measure of Boolean expressions

Definition 6.1 (Measure of intersection).

Definition 6.2 (δ\delta-function).

Lemma 6.3.

Proof.

Lemma 6.4.

Proof.

Lemma 6.5.

Proof.

Lemma 6.6.

Proof.

Lemma 6.7.

Proof.

Lemma 6.8.

Proof.

Lemma 6.9.

Proof.

Theorem 6.10.

Proof.

Definition 6.11 (Measure of complement).

Lemma 6.12.

Definition 6.13 (Measure of union).

Definition 6.14.

Definition 6.15.

Lemma 6.16.

Proof.

3. $\Delta$ -inequality

4. $\nabla$ -inequality

Definition 6.2 ( $\delta$ -function).

Definition 7.1 (Forced $\Delta$ -inequality).

Definition 7.4 (Forced $\nabla$ -inquality).

Corollary 7.8 (Projection onto pseudometrics satisfying $\nabla$ -inequality).