This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Information content in formal languages

Bernhard Burgstaller [email protected] Departamento de Matematica, Universidade Federal de Santa Catarina, CEP 88.040-900 Florianópolis-SC, Brasil
Abstract.

Motivated by creating physical theories, formal languages SS with variables are considered and a kind of distance between elements of the languages is defined by the formula d(x,y)=(xy)(x)(y)d(x,y)=\ell(x\nabla y)-\ell(x){\wedge}\ell(y), where \ell is a length function and xyx\nabla y means the united theory of xx and yy. Actually we mainly consider abstract abelian idempotent monoids (S,)(S,\nabla) provided with length functions \ell. The set of length functions can be projected to another set of length functions such that the distance dd is actually a pseudometric and satisfies d(xa,yb)d(x,y)+d(a,b)d(x\nabla a,y\nabla b)\leq d(x,y)+d(a,b). We also propose a “signed measure” on the set of Boolean expressions of elements in SS, and a Banach-Mazur-like distance between abelian, idempotent monoids with length functions, or formal languages.

Key words and phrases:
formal language, variables, metric, distance, equivalence relation, information, information content, abelian, commutative, idempotent, monoid, semigroup, length, length function, measure, Boolean algebra, computing, physics, quantum gravity, physical theories
1991 Mathematics Subject Classification:
94A17, 06E75, 06F05, 68Q45, 94-04

1. Introduction

Variables are the center and core of mathematics, physics, theoretical technics and theoretical economy theory. In this paper we consider formal languages with variables (see definition 2.1), provided with a length function for sentences (for example a simple character count), and define something which appears like a pseudo-metric between the sentences within one languages (see definition 2.12).

The languages are assumed to be equipped with an equivalence relation (def. 2.6), and the more two sentences x,yx,y have in common, the closer their distance becomes.

Actually, this project was motivated by physics in the quest for a solution to quantum gravity, see for instance Weinberg [21] and Hawking and Israel [11]. It is observed, that in general physical theories are surprisingly short when formulated as equations, and that progress in advancing the correct equations in physics is often realized by small modifications in formulas. For example, the formulas between Hamiltonian mechanics and quantum mechanics are very close in distance, also in our approach here, for all given Hamiltonians, even if quantum mechanics and mechanics appear very different when considered without formulas (cf. example 11.1).

We guess that a solution that unites gravity and quantum field theory, when formulated as mathematical equations, is actually close as formulas, for example in the sense proposed in this paper, to the existing ununited theories of gravity and quantum field theory. This might lead to the direction towards a candidate for a united theory, and on the other hand, might be used as a tool to evaluate a given candidate theory by computing the distance to the existing theory so far.

In short terms, the simpler a theory is, the better. The closer it is to existing effective theories, the better, as the more likely it is the correct generalization.

In this sense, even quantum mechanics might very theoretically have had been predicted without any inspection of black body radiation or double slit experiment, because it is perhaps the closest theory to mechanics which avoids the infinities. Simply by looking at formulas.

In this sense also, a solution to quantum gravity is perhaps less a physical problem but more a pure mathematical problem. It may be formulated like so: indicate the theory with the shortest distance - in the sense of this paper, say - to the existing theory, such that no infinities occur (cf. problem 11.3). Even more so, as ultrahigh-energy experiments appear currently to be impossible by several reasons, for example the no-return from black holes obstacle.

After this motivating and ambitious remarks, which may also appear somewhat self-evident, let us come to the hard facts of this paper:

Our definitions may be used much more widely, for example in computing, and it is widely applicable, as the approach is very flexible and does not bother with details but with the core principle. World applications are viewed as formulas. Actually, the distance d(s,t)d(s,t) between two formulas s,ts,t is defined as

d(s,t)=(st)(s)(t)d(s,t)=\ell(s\nabla t)-\ell(s){\wedge}\ell(t)

which means in words: unite both theories ss and tt (called sts\nabla t), optimize their length (called \ell) and compute it (the more overlap they have the shorter it will become) and then subtract the length of the shorter theory of ss and tt.

It looks natural and simple, but is also intriguing because of the minus sign. Even if it appears natural that the triangle inequality - also called Δ\Delta-inequality in this paper - holds for dd (cf. lemma 3.6), it is difficult and enormous involving when trying to verify this. A key of this paper is to repair the definition of dd by making the Δ\Delta-inequality valid, see definition 7.1. See also theorem 6.10 for other conditions which imply the Δ\Delta-inequality.

As another example, it is difficult to get any easy criteria for the uniform continuity of a homomorphism between two languages, for example computer languages, but see lemma 12.1.

To not get lost in details or become too specific, in this paper we mainly forget languages at all but instead work with an abelian, idempotent monoid (S,)(S,\nabla) provided with a length function \ell (see definition 2.15).

This setting can be somewhat compared with a measured space (M,)(M,\cup) with the operation \cup of taking the union of sets and a measure \ell on it. And indeed, we will make parallels, and show how we can compute the “signed measure” ζ\zeta (i.e. length) of a formal Boolean expression in the monoid SS, see definition 6.15. The above formula for dd is easily seen to be indeed a metric for measured spaces (M,)(M,\cup) and :=\nabla:=\cup, see lemma 3.3.

In a main result, see theorem 7.21, we show how one can, starting with a length function \ell, construct a length function (){{\ell}^{(\infty)}} such that the above formula for dd with \ell replaced by (){{\ell}^{(\infty)}} is indeed a pseudometric, and the measure ζ(xy)\zeta(x\cap y) of the Boolean expression of intersection of two sentences xx and yy, is monotonically increasing in xx and yy, as one may naturally expect it. We have however no clue in how far (){{\ell}^{(\infty)}} degenerates, it could even be completely zero (but don’t believe that).

A key role in this paper also plays the \nabla-inequality

d(xy,ab)d(x,a)+d(y,b)d(x\nabla y,a\nabla b)\leq d(x,a)+d(y,b)

which, even if it holds heuristically, see lemma 4.2, we also force by definition finally, see definition 7.4.

For homomorphisms between languages we use the same concepts as for languages itself to define a metric, enriched by the submultiplicativity (cf. lemma 8.14).

As a distance between languages, or more generally, abelian, idempotent monoids, one may use the proposed Banach-Mazur-like distance of definition 8.27.

In the final sections we give various examples which should illustrate and motivate the theory of this paper, presented in an extremely superficially and sloppy way.

Actually, defining pseudometrics on languages, and on a family of languages goes back for a long time with a wide literature, and we only cite samples, far from being in any way complete, Han and Ko [10], Djordjević, Ikodinović and Stojanović [6], Fulop and Kephart [8], Ko, Han and Salomaa [15], Imaoka [13], Cseresnyes and Seiwert [2], Ibarra, McQuillan and Ravikumar [12], Bertoni, Goldwurm and Sabadini [1], Stratford [20], Jalobeanu [14], Parker, K.B. Yancey and M.P. Yancey [16], Cui, Dang, Fischer and Ibarra [3], Pighizzini [17], and in the context of physical theories, Giangiacomo [9]. We will not compare these approaches with the one in this paper, but we think there is a different emphasis on the latter.

Let us remark, that we less stress the exact computability of the proposed metrics, but the idealized relations they should satisfy. Also, our definitions are designed such that we find them particularly useful for physical theories.

2. Metric

Under a language (S,A)(S,A) we mean a set AA called alphabet, and a set S{f:{1,,n}A|n1}{}S\subseteq\{f:\{1,\ldots,n\}\rightarrow A|\,n\geq 1\}\cup\{\emptyset\} of finite text-strings with letters in AA called sentences or formulas or theories. Here, \emptyset means the empty sentence (i.e. empty set). In this section we are going to give successively several definitions, each of which becomes part of the considered setting after the definition.

Definition 2.1 (Language with variables).

A language with variables means a language (S,A)(S,A), where S\emptyset\in S and AA contains at least a sign {\vee} and a distinguished infinite subset of so-called variables XAX\subseteq A, divided further into two disjoint infinite subsets X0X1=XX_{0}\cup X_{1}=X (X0X1=X_{0}\cap X_{1}=\emptyset) called main variables set X0X_{0} and auxiliary variable set X1X_{1}, such that for all elements s,tSs,t\in S and variable permutations ϕPS\phi\in P_{S} (to be explained below),

stSs{\vee}t\in S
ϕ(s)S\phi(s)\in S

Here, sts{\vee}t means the concatenated string of the string ss with the letter {\vee} with the string tt in the order as indicated. The letter {\vee} serves purely as a symbol for concanation of strings.

Definition 2.2 (Variables permutation).

Let ψ:XX\psi:X\rightarrow X be a permutation (i.e. bijective function) on the variable set such that ψ(Xi)=Xi\psi(X_{i})=X_{i} (i=1,2i=1,2), and let ϕ:SS\phi:S\rightarrow S denote the canonically bijective map induced by ψ\psi. That is, set ϕ(a1an):=ψ(a1)ψ(an)\phi(a_{1}\ldots a_{n}):=\psi(a_{1})\ldots\psi(a_{n}) for aiAa_{i}\in A, where ψ(a):=a\psi(a):=a if aA\Xa\in A\backslash X.

Write PSP_{S} for the set of all such functions ϕ:SS\phi:S\rightarrow S, called variable permutation or transformation.

Definition 2.3 (Length function).

Under a length or cost function on a language SS with variables we mean a positive function L:SL:S\rightarrow{\mathbb{R}} satisfying

L()=0L(\emptyset)=0
L(xy)=L(x)+L(y)L(x{\vee}y)=L(x)+L(y)
L(s)=L(ϕ(s))L(s)=L(\phi(s))

for all x,ySx,y\in S and ϕPS\phi\in P_{S}.

In other words, the empty word has no length, the length of two concanated strings is the sum of of the length of the single strings, and the length or cost of a string does not change if we just choose other variable names.

Lemma 2.4.

The map f:S×SSf:S\times S\rightarrow S

f(s,t)=stf(s,t)=s{\vee}t

is an associative semigroup operation on SS.

Two sentences are completely unrelated if they have no variables in common because they do not speak about the same variables:

Definition 2.5 (Independence).

Two elements x,ySx,y\in S are called independent, notated by xyx\bot y, if and only if xx and yy have no variables in common.

(In other words, a variable showing up in the string xx must not show up in the string yy.)

In languages like mathematical of physical theories, one can typically express formulas in formally different ways but which are equivalent in content (for example, formally different equations may have the same solution). That is why we need an equivalence relation on the set of sentences, and the following axioms are the minimum natural ones we postulate:

Definition 2.6 (Equivalence relation).

Let be given an equivalence relation \equiv on SS such that the following relations hold for all a,b,s,t,xSa,b,s,t,x\in S and ϕPS\phi\in P_{S}:

Neutral element:

(1) ssss{\vee}\emptyset\equiv\emptyset{\vee}s\equiv s

Commutativity:

(2) astbatsba{\vee}s{\vee}t{\vee}b\equiv a{\vee}t{\vee}s{\vee}b

{\vee} preserves equivalent elements in case of independence:

(3) st,sx,txsxtxs\equiv t,\;s\bot x,\;t\bot x\quad\Rightarrow\quad s{\vee}x\equiv t{\vee}x

A superfluous copy can be removed:

(4) ϕ(x)xxϕ(x)x\phi(x)\bot x\quad\Rightarrow\quad x{\vee}\phi(x)\equiv x

Variable transformation does not change:

(5) xϕ(x)x\equiv\phi(x)

If xx contains no main variables then require:

(6) xx\equiv\emptyset

Let us revisit some aspects of the last definition. Relation (2) means it is unimportant in which order we list our sentences separated by {\vee}, like it is unimportant in which order we list the single formulas in a longer theory, or the collection of subroutines in a longer computer program. The connection between the single sentences is regulated solely by the variables they have in common. That is why relation (3) says that the semigroup operation {\vee} of concatenation does not preserve the equivalence relation in general, but only if the concatenated sentences are unrelated. Thus we also needed to incorporate aa and bb in commutativity (2). Relation (5) tells us that it is unimportant which variables we choose (but only as a whole sentence). Relation (4) says if a theory is restated and fused with the original theory with other variables then essentially nothing new is said and the so new formed theory collapses to the original theory. The relation (6) is actually theoretically unimportant in the sense that it does not shine to the later abstract approach stated in definition 2.15 below through and may thus be omitted, but it is essential in the examples that we shall consider and are the only difference and the point what the distinction between main and auxiliary variables is all about. (See also example 10.1.)

We use class brackets as in [x][x] for classes in quotients, where xx is a representative. The limits of sequences and the compare of all real valued functions f:Yf:Y\rightarrow{\mathbb{R}} are always understood to be taken pointwise (i.e. fgf\leq g iff f(x)g(x)f(x)\leq g(x) for all x,yYx,y\in Y).

The whole theory of this paper is mainly build only on the quotient set of sentences:

Definition 2.7 (Quotient).

Set S¯:=S/\overline{S}:=S/\equiv to be the set-theoretical quotient.

On the quotient of sentences we naturally form an operation of concatenation by concatenating different sentences by choosing copies of them with disjoint variables chosen:

Definition 2.8 (Operation).

Define :S¯×S¯S¯\nabla:\overline{S}\times\overline{S}\rightarrow\overline{S} by

[x][y]:=[xϕ(y)][x]\nabla[y]:=[x{\vee}\phi(y)]

for all x,ySx,y\in S, where ϕPS\phi\in P_{S} (dependent on x,yx,y) is chosen such that xϕ(y)x\bot\phi(y).

We will then mainly work with an abstract abelian monoid that actually comes out so far:

Lemma 2.9.

(S¯,,[])(\overline{S},\nabla,[\emptyset]) is an abelian, idempotent monoid (i.e. xx=xx\nabla x=x for all xS¯x\in\overline{S}).

Proof.

Indeed, by (5) and (3) the above definition of \nabla is independent of the chosen ϕPS\phi\in P_{S}. By (1), [][\emptyset] is a null-element, and by (2) we get commutativity. Idempotence follows from (4). ∎

On the quotient set of sentences we essentially define a length function \ell by choosing the shortest possible length LL among the equivalent representatives (i.e. ([x])=infyxL(y)\ell([x])=\inf_{y\equiv x}L(y)), but to ensure that the length of sts\nabla t is always bigger or equal than the length of ss for s,ts,t in the quotient S¯\overline{S}, we correct this definition by the formula of definition 7.9. Thus the following definition of \ell is just a somewhat obscured reformulation of defining \ell at first as in definition 9.1 and then correcting it by definition 7.9:

Definition 2.10 (Length function on quotient).

Set (xSx\in S)

:S¯\ell:\overline{S}\rightarrow{\mathbb{R}}
([x])=inf{L(y)|y,aS,xa,yxa}\ell([x])=\inf\{L(y)\in{\mathbb{R}}|\,y,a\in S,\;x\bot a,\;y\equiv x{\vee}a\}

We shall conveniently denote :=[]S¯\emptyset:=[\emptyset]\in\overline{S}. We get very natural properties for the length function which we later will postulate axiomatically for abelian monoids:

Lemma 2.11.

\ell is positive and for all x,yS¯x,y\in\overline{S} we have:

(7) ()=0\ell(\emptyset)=0

Subadditivity:

(8) (xy)(x)+(y)\ell(x\nabla y)\leq\ell(x)+\ell(y)

Monotone increasingness:

(9) (x)(xy)\ell(x)\leq\ell(x\nabla y)
Proof.

Let ε>0\varepsilon>0. Given x1,x2Sx_{1},x_{2}\in S, by definition 2.10 choose y1,y2,a1,a2Sy_{1},y_{2},a_{1},a_{2}\in S such that ([xi])L(yi)+ε\ell([x_{i}])\geq L(y_{i})+\varepsilon where xiaiyix_{i}{\vee}a_{i}\equiv y_{i} and xiaix_{i}\bot a_{i} (i=1,2i=1,2).

Subadditivity: Choose any ϕPS\phi\in P_{S} such that y1ϕ(y2)y_{1}\bot\phi(y_{2}). Then

([x1])+([x2])+2εL(y1)+L(ϕ(y2))=L(y1ϕ(y2))([y1)ϕ(y2)])\ell([x_{1}])+\ell([x_{2}])+2\varepsilon\geq L(y_{1})+L(\phi(y_{2}))=L(y_{1}{\vee}\phi(y_{2}))\geq\ell([y_{1}){\vee}\phi(y_{2})])
=([y1][y2])=([x1a1][x2a2])=([x1][x2][a1][a2])([x1][x2])=\ell([y_{1}]\nabla[y_{2}])=\ell([x_{1}{\vee}a_{1}]\nabla[x_{2}{\vee}a_{2}])=\ell([x_{1}]\nabla[x_{2}]\nabla[a_{1}]\nabla[a_{2}])\geq\ell([x_{1}]\nabla[x_{2}])

where the last inequality follows from monotone increasingness, which we prove next:

([x][a])+ε=([xϕ(a)])+εL(y)([x])\ell([x]\nabla[a])+\varepsilon=\ell([x{\vee}\phi(a)])+\varepsilon\geq L(y)\geq\ell([x])

where, given x,aSx,a\in S, y,bSy,b\in S is chosen such that yxϕ(a)by\equiv x{\vee}\phi(a){\vee}b and the before last inequality holds. ∎

We often abbreviate the notion “monotonically increasing” for (9) simply by “monotone” or “increasing”. Let xy:=min(x,y)x{\wedge}y:=\min(x,y) and xy:=max(x,y)x{\vee}y:=\max(x,y) denote the minimum and maximum, respectively, of two real numbers x,yx,y\in{\mathbb{R}}.

The length function induces now two very natural notions of distance functions between sentences:

Definition 2.12 (“Metric”).

We define two functions d,σ:S¯×S¯d,\sigma:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} by

d(s,t)=(st)(s)(t)d(s,t)=\ell(s\nabla t)-\ell(s){\wedge}\ell(t)
σ(s,t)=2(st)(s)(t)\sigma(s,t)=2\ell(s\nabla t)-\ell(s)-\ell(t)

We sometimes call these functions also more precisely d:=dd_{\ell}:=d and σ:=σ\sigma_{\ell}:=\sigma. They have the characteristic of distance between elements of S¯\overline{S}:

As an illustrative example let us recall that in his famous work [4], Dirac took the then known quantum numbers valued Klein-Gordon function ψ:4B(H)\psi:{\mathbb{R}}^{4}\rightarrow B(H) (operators on Hilbert space) defined on spacetime 4{\mathbb{R}}^{4} and the Klein-Gordon differential equation Δψtψm2ψ=0\Delta\psi-\partial_{t}\psi-m^{2}\psi=0 (mm\in{\mathbb{R}} constant) and took the formal square-root of the Klein-Gorden differential operator K:=Δtm2K:=\Delta-\partial_{t}-m^{2} and formed the Dirac differential operator D:=KD:=\sqrt{K} with constant coefficients in matrix algebra M4()M_{4}({\mathbb{C}}), that is, D2=KD^{2}=K, and proposed the physical equation Dψ=0D\psi=0 and along with it predicted correctly the existence of particles and anti-particles [5]. This is a prototype example where an existing physical theory is slightly modified to obtain a new theory. That is, the Klein-Gordon equation is close to the Dirac equation, also in our “metric”. Let us compute this:

Example 2.13.

The prototype example is physical theories. SS are the collections of all sentences, which are equations, formulas, function definitions and so on. Equivalence relation \equiv is when two formulas are equivalent in a mathematical sense. The {\vee} means “and” in the sense that this and the other formula/equation holds. The equivalence relation \equiv has to be enriched by the axioms of definition 2.6. Main variables are those that cannot be deleted, like the electromagnetic field in Maxwell’s equations, because it is what the theory is about. The length function LL could be a simple character count. See also the beginning of section 10 including example 10.1 for further explanations.

For example, consider the free Klein-Gordon equation Δψ=t2ψ+m2ψ\Delta\psi=\partial_{t}^{2}\psi+m^{2}\psi and the Dirac equation Dψ=0D\psi=0, where D2=Δt2m2=:KD^{2}=\Delta-\partial_{t}^{2}-m^{2}=:K , and where in both cases ψ\psi is a main variable. Then

d([Dirac],[Klein-Gordon])=d([Dψ=0D2=Δt2m2],[(Δt2m2)ψ=0])d([\mbox{Dirac}],[\mbox{Klein-Gordon}])=d([D\psi=0{\vee}D^{2}=\Delta-\partial_{t}^{2}-m^{2}],[(\Delta-\partial_{t}^{2}-m^{2})\psi=0])
=([Dψ=0D2=Δt2m2(Δt2m2)ϕ=0])([(Δt2m2)ψ=0])=\ell([D\psi=0{\vee}D^{2}=\Delta-\partial_{t}^{2}-m^{2}{\vee}(\Delta-\partial_{t}^{2}-m^{2})\phi=0])-\ell([(\Delta-\partial_{t}^{2}-m^{2})\psi=0])
=([Dψ=0D2=KK:=Δt2m2Kϕ=0])([(Δt2m2)ψ=0])=\ell([D\psi=0{\vee}D^{2}=K{\vee}K:=\Delta-\partial_{t}^{2}-m^{2}{\vee}K\phi=0])-\ell([(\Delta-\partial_{t}^{2}-m^{2})\psi=0])
2313=10\approx 23-13=10

whereas ([Dirac])=15\ell([\mbox{Dirac}])=15 and ([Klein-Gordon])13\ell([\mbox{Klein-Gordon}])\approx 13.

Here, KK and DD are auxiliary variables. If we had a longer formula of KK (i.e. the Klein-Gordon operator) then the above distance d(D,KG)d(D,KG) would not change, but (D)\ell(D) and (KG)\ell(KG) would increase.

Note that we did not explicitly transform the variables x1,x2,x3,tx_{1},x_{2},x_{3},t in (Δt2)ϕ(x,t)(\Delta-\partial_{t}^{2})\phi(x,t) as required by the \nabla-operation of definition 2.8 because they are bound variables which we can label as we like without change, and so have back transformed.

Example 2.14.

Another standard example is computer languages. For example CC++. Sentences are meaningful programs or code snippets. Two programs are equivalent, if they do the same, for example, independent of time performance. Typically, the entry point of a program is a main variable. Set L(p)L(p) to be the length of a program pp. Set {\vee} to be a concatenation of programs.

Variables are the typical variables in higher computer languages. The variable transformation in the \nabla-operation of definition 2.8 ensures that the variables of two different programs do not overlap.

The more two programs s,tS¯s,t\in\overline{S} have in common, the lower d(s,t)d(s,t) becomes.

For example, if ss and tt have one common routine, then it is enough to keep one copy of them in sts{\vee}t.

Throughout, if nothing else is said, we switch to the following general setting for S¯,\overline{S},\nabla and \ell. Only if we discuss something involving SS itself, we implicitly assume without saying that we use the above specific definitions of S¯,\overline{S},\nabla and \ell.

Definition 2.15 (More abstract definition).

Let (S¯,,)(\overline{S},\nabla,\emptyset) be an abelian, idempotent monoid.

We call :S¯\ell:\overline{S}\rightarrow{\mathbb{R}} a length function if the assertions of lemma 2.11 hold for it.

(x)\ell(x) is also called the information content of xS¯x\in\overline{S}.

Positivity of \ell is also automatic by idempotence and subadditivity, as (x)=(xx)2(x)\ell(x)=\ell(x\nabla x)\leq 2\ell(x). Instead of the above specific metric candidates dd and σ\sigma from above, we shall sometimes use the following generalization:

Definition 2.16 (Metric candidate).

We call d:S¯×S¯d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} a positive, symmetric, nilpotent function if

d(x,y)0,d(x,y)=d(y,x),d(x,x)=0d(x,y)\geq 0,\quad d(x,y)=d(y,x),\quad d(x,x)=0

for all x,yS¯x,y\in\overline{S}.

dd is called a pseudometric on S¯\overline{S} if it is positive, symmetric, nilpotent and satisfies the Δ\Delta-inequality (see definition 3.1).

3. Δ\Delta-inequality

The following section is a first discussion when the proposed distance functions dd and σ\sigma of defintion 2.12 could indeed satisfy the well known triangle inequality (here also called Δ\Delta-inequality). This is actually a main theme of this paper and first satisfying answers are given in theorems 6.10 and 7.21.

Let (S¯,,)(\overline{S},\nabla,\emptyset) be an abelian, idempotent monoid.

Definition 3.1.

A function d:S¯×S¯d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} is said to satisfy the Δ\Delta-inequality if

d(x,z)d(x,y)+d(y,z)d(x,z)\leq d(x,y)+d(y,z)

for all x,y,zS¯x,y,z\in\overline{S}

Lemma 3.2.

If a symmetric function d:S¯×S¯d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} satisfies the Δ\Delta-inequality then also the second Δ\Delta-inequality (i.e. second triangle inequality)

|d(x,z)d(z,y)|d(x,y)|d(x,z)-d(z,y)|\leq d(x,y)

To heuristically see that dd of definition 2.12 in the context of formal languages could satisfy the Δ\Delta-inequality we do the following considerations. Assume a sentence sSs\in S is written in its shortest form as ss1sns\equiv s_{1}{\vee}\ldots{\vee}s_{n} for siSs_{i}\in S, that is, such that ([s])L(s1sn)=L(s1)+L(sn)\ell([s])\approx L(s_{1}{\vee}\ldots{\vee}s_{n})=L(s_{1})+\ldots L_{(}s_{n}). Similarly we do so for a given sentence tt1tmt\equiv t_{1}{\vee}\ldots t_{m}. Many sis_{i} could be auxiliary function definitions, some of which may be identical to some tjt_{j}. As a first approximation we may estimate

([s][t])=([sϕ(t)])=([s1snϕ(t1)ϕ(tm)])\ell([s]\nabla[t])=\ell([s{\vee}\phi(t)])=\ell([s_{1}{\vee}\ldots s_{n}{\vee}\phi(t_{1}){\vee}\ldots\phi(t_{m})])
=([s1snψ(t1)ψ(tk)])L(s1)++L(sn)+L(t1)++L(tk)=\ell([s_{1}{\vee}\ldots s_{n}{\vee}\psi(t_{1}){\vee}\ldots\psi(t_{k})])\leq L(s_{1})+\ldots+L(s_{n})+L(t_{1})+\ldots+L(t_{k})

where in the last line we have dropped identical auxiliary function definitions ϕ(tk+1),,ϕ(tm)\phi(t_{k+1}),\ldots,\phi(t_{m}) which appear already in the sis_{i}s, and so instead of calling ϕ(tj)\phi(t_{j}) we may call sis_{i} and that is why we had to introduce another transformation ψ\psi and dropped the superfluous ϕ(tj)\phi(t_{j}) by (6). In a certain sense, we formed the set theoretical union of {s1,,sn}\{s_{1},\ldots,s_{n}\} and {t1,,tm}\{t_{1},\ldots,t_{m}\}.

If we sloppily replace the above \leq by \approx and view LL as a measure, then the Δ\Delta-inequality holds:

Lemma 3.3.

Let (X,𝒜,μ)(X,{\mathcal{A}},\mu) be a measured space. Set 𝒜0:={A𝒜|μ(A)<}{\mathcal{A}}_{0}:=\{A\in{\mathcal{A}}|\,\mu(A)<\infty\} to be the collection of finite, measurable sets. Then

d(A,B)\displaystyle d(A,B) :=\displaystyle:= μ(AB)μ(A)μ(B)\displaystyle\mu(A\cup B)-\mu(A)\wedge\mu(B)
σ(A,B):=2μ(AB)μ(A)μ(B)\sigma(A,B):=2\mu(A\cup B)-\mu(A)-\mu(B)

where A,B𝒜0A,B\in{\mathcal{A}}_{0}, define metrics on 𝒜0{\mathcal{A}}_{0}.

Proof.

Generally we have

(10) μ(XZ)μ(X)=μ(Z\X)=μ(Z)μ(ZX)\mu(X\cup Z)-\mu(X)=\mu(Z\backslash X)=\mu(Z)-\mu(Z\cap X)

We only prove the dd-case. (i) Assume μ(Y)μ(X)μ(Z)\mu(Y)\leq\mu(X)\leq\mu(Z). Then we have equivalences

μ(XZ)μ(X)μ(XY)+μ(YZ)2μ(Y)\mu(X\cup Z)-\mu(X)\leq\mu(X\cup Y)+\mu(Y\cup Z)-2\mu(Y)
μ(Z)μ(ZX)μ(X)μ(XY)+μ(Z)μ(ZY)\mu(Z)-\mu(Z\cap X)\leq\mu(X)-\mu(X\cap Y)+\mu(Z)-\mu(Z\cap Y)
(11) μ(XY)+μ(ZY)μ(ZX)+μ(X)\mu(X\cap Y)+\mu(Z\cap Y)\leq\mu(Z\cap X)+\mu(X)

But this is true because in general we have

μ(XY)+μ(ZY)μ(Y)+μ(XYZ)\mu(X\cap Y)+\mu(Z\cap Y)\leq\mu(Y)+\mu(X\cap Y\cap Z)

(ii) Assume μ(X)μ(Y)μ(Z)\mu(X)\leq\mu(Y)\leq\mu(Z). Doing the same as above leads to (11) but with μ(X)\mu(X) replaced with μ(Y)\mu(Y) and this is then also true by the above argument.

(iii) Assuming μ(X)μ(Z)μ(Y)\mu(X)\leq\mu(Z)\leq\mu(Y) leads to (11) but with μ(X)\mu(X) replaced with 2μ(Y)μ(X)2\mu(Y)-\mu(X) and this is then also true. ∎

Actually dd is like the maximum’s norm or \ell^{\infty}-norm, and σ\sigma like the 1\ell^{1}-norm in n{\mathbb{R}}^{n} or space of real-valued sequences:

Lemma 3.4.

Continuing the setting of the last lemma, we have dσd\leq\sigma and

d(A,B)=max(μ(A\B),μ(B\A))d(A,B)=\max(\mu(A\backslash B),\mu(B\backslash A))
σ(A,B)=μ(A\B)+μ(B\A)\sigma(A,B)=\mu(A\backslash B)+\mu(B\backslash A)

This is obvious by (10). Actually, σ\sigma as just formulated is well-known to be a metric, dd might be less used. We continue by imposing definitions 2.15 and 2.12.

Switch back to the notions of definitions 2.15 and 2.16 for S¯,,d\overline{S},\ell,d and σ\sigma.

Lemma 3.5.

For all x,yS¯x,y\in\overline{S} we have

(x)(y)(xy)\ell(x){\vee}\ell(y)\leq\ell(x\nabla y)
|(x)(y)|d(x,y)(x)(y)|\ell(x)-\ell(y)|\leq d(x,y)\leq\ell(x){\vee}\ell(y)
|(x)(y)|σ(x,y)(x)+(y)|\ell(x)-\ell(y)|\leq\sigma(x,y)\leq\ell(x)+\ell(y)
(12) d(x,y)σ(x,y)2d(x,y)d(x,y)\leq\sigma(x,y)\leq 2d(x,y)
d(x,)=σ(x,)=(x)d(x,\emptyset)=\sigma(x,\emptyset)=\ell(x)
Proof.

The first relation follows from (9). If (x)(y)\ell(x)\leq\ell(y) then |(x)(y)|=(y)(x)(xy)(x)=d(x,y)(x)+(y)(x)=(y)|\ell(x)-\ell(y)|=\ell(y)-\ell(x)\leq\ell(x\nabla y)-\ell(x)=d(x,y)\leq\ell(x)+\ell(y)-\ell(x)=\ell(y) by (9), showing the second line. The third line follows from the similarly proved (12) and (9). ∎

This criterion for the triangle inequality will be extensively used in section 6:

Lemma 3.6.

σ\sigma satisfies the Δ\Delta-inequality if and only if

(xz)+(y)(xy)+(yz)\ell(x\nabla z)+\ell(y)\leq\ell(x\nabla y)+\ell(y\nabla z)

for all x,y,zS¯x,y,z\in\overline{S}.

Moreover, this equivalence is true for any given function :S¯\ell:\overline{S}\rightarrow{\mathbb{R}}.

Proof.

Divide the Δ\Delta-inequality spelled-out by 22 to see the equivalence:

2(xz)(x)(z)2(xy)(x)(y)+2(yz)(y)(z)2\ell(x\nabla z)-\ell(x)-\ell(z)\leq 2\ell(x\nabla y)-\ell(x)-\ell(y)+2\ell(y\nabla z)-\ell(y)-\ell(z)

Interestingly, if σ\sigma satisfies the triangle inequality then \ell must be a length function:

Lemma 3.7.

Given any function :S¯\ell:\overline{S}\rightarrow{\mathbb{R}}, if σ\sigma satisfies the Δ\Delta-inequality, then \ell is monotonically increasing.

Moreover it is positive and subadditive if ()0\ell(\emptyset)\geq 0, and even a length function (def. 2.15) if ()=0\ell(\emptyset)=0.

Proof.

Set z=z=\emptyset in lemma 3.6 to get monotonically increasingness, y=y=\emptyset to get subaddtivity, and positivity by 0()(x)0\leq\ell(\emptyset)\leq\ell(x). ∎

4. \nabla-inequality

An important ingredient of our theory is the \nabla-inequality defined and first discussed in this section. It is fundamental in the proof of lemma 7.14, and thus for the whole section 7, theorem 7.14 and corollary 7.24.

Let (S¯,,)(\overline{S},\nabla,\emptyset) be an abelian, idempotent monoid.

Definition 4.1.

A function d:S¯×S¯d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} is said to satisfy the \nabla-inequality if

d(xy,ab)d(x,a)+d(y,b)d(x\nabla y,a\nabla b)\leq d(x,a)+d(y,b)

for all x,y,a,bS¯x,y,a,b\in\overline{S}

This may be heuristically compared with normed vector spaces by setting d(x,y):=xyd(x,y):=\|x-y\|, xy:=x+yx\nabla y:=x+y and having

x+yabxa+yb\|x+y-a-b\|\leq\|x-a\|+\|y-b\|

Going back to the remark before lemma 3.3 and the setting of this lemma, we heuristically argue that the \nabla-equality may hold:

Lemma 4.2.

Let dd be as in lemma 3.3. Then

d(AB,CD)d(A,C)+d(B,D)d(A\cup B,C\cup D)\leq d(A,C)+d(B,D)
Proof.

Write |A|:=μ(A)|A|:=\mu(A). We have to show that

(13) |ABCD||AB||CD||AC||A||C|+|BD||B||D||A\cup B\cup C\cup D|-|A\cup B|{\wedge}|C\cup D|\leq|A\cup C|-|A|{\wedge}|C|+|B\cup D|-|B|{\wedge}|D|

By symmetry, just exchange ACA\leftrightarrow C and BDB\leftrightarrow D, it is enough to consider the case |AB||CD||A\cup B|\leq|C\cup D|.

Recalling (10), the left hand side of (13) is

|(CD)\(AB)|=|(CD)AcBc||(C\cup D)\backslash(A\cup B)|=|(C\cup D)\cap A^{c}\cap B^{c}|
|CAc|+|DBc|=|C\A|+|D\B|\leq|C\cap A^{c}|+|D\cap B^{c}|=|C\backslash A|+|D\backslash B|

Case |A||C|,|B||D||A|\leq|C|,|B|\leq|D|: In this case, the last inequality verifies (13) by (10).

Case |A||C|,|B||D||A|\leq|C|,|B|\geq|D|: In this case, |D\B||B\D||D\backslash B|\leq|B\backslash D|. Hence the last inequality verifies (13). The third case is analogous. ∎

Lemma 4.3.

If a nilpotent function dd satisfies the \nabla-inequality then (x,y,zS¯\forall x,y,z\in\overline{S})

(14) d(xz,yz)d(x,y)d(x\nabla z,y\nabla z)\leq d(x,y)

We say that a function dd satisfies the weak \nabla-inequality if it satisfies the inequality (14), and only the very weak one, if this holds only for xyx\leq y (see definition 5.1).

We finally remark that given a (possibly infinite) sequence of pseudometrics in our setting, we may sum them up to get possibly a more faithful pseudometric:

Lemma 4.4.

The set of length functions on S¯\overline{S} is a positive cone, i.e. if 1,2\ell_{1},\ell_{2} are length functions, then also λ11+λ22\lambda_{1}\ell_{1}+\lambda_{2}\ell_{2} for constants λ1,λ20\lambda_{1},\lambda_{2}\geq 0 in {\mathbb{R}}. Similarly, the functions dd satisfying the \nabla-inequality (or the Δ\Delta-inequality, or those of the form σ=σ\sigma=\sigma_{\ell}) are a positive cone, and one has σλ11+λ22=λ1σ1+λ2σ2\sigma_{\lambda_{1}\ell_{1}+\lambda_{2}\ell_{2}}=\lambda_{1}\sigma_{\ell_{1}}+\lambda_{2}\sigma_{\ell_{2}}.

5. Order relation

In this section we consider a natural order relation on an abelian idempotent monoid induced by the monoid operation. This is not only important in various occasions in the theory like in the monotone increasingness of \ell and in various aspects in the next section, but also a fundamental relation for information itself. It says when a theory is information-theoretically contained in another theory, and because we work with equivalence relations on theories, this works also for formally very different looking representatives of theories. This effect will even become more intense when we shall divide out equivalences induced by a pseudometric in definition 5.7.

In this section, if nothing else is said, we continue with the setting and notions of definitions 2.15 and 2.12.

Definition 5.1.

Define an order relation \leq on S¯\overline{S} by (x,yS¯)(\forall x,y\in\overline{S})

xyxy=yx\leq y\quad\Leftrightarrow\quad x\nabla y=y

The following three lemmas are straightforward to check, and the reader should recall the monotone increasingness of \ell, see (9), which is often used, as it is in lemmas 5.5 and 5.6.

Lemma 5.2.

(i) This is an order relation on S¯\overline{S}.

(ii) For all a,b,x,yS¯a,b,x,y\in\overline{S} we have

ax,byabxya\leq x,b\leq y\quad\Rightarrow\quad a\nabla b\leq x\nabla y

(iii) S¯\overline{S} is a directed set as

a,bab\emptyset\leq a,b\leq a\nabla b

The length function \ell preserves the order and dd and σ\sigma have particular simple formulas for ordered pairs:

Lemma 5.3.

For all a,b,cS¯a,b,c\in\overline{S} we have

ab(a)(b)a\leq b\quad\Rightarrow\quad\ell(a)\leq\ell(b)
abd(a,b)=σ(a,b)=(b)(a)a\leq b\quad\Rightarrow\quad d(a,b)=\sigma(a,b)=\ell(b)-\ell(a)
abcd(a,c)=d(a,b)+d(b,c),σ(a,c)=σ(a,b)+σ(b,c)a\leq b\leq c\quad\Rightarrow\quad d(a,c)=d(a,b)+d(b,c),\quad\sigma(a,c)=\sigma(a,b)+\sigma(b,c)

Here are some further relations (δy\delta_{y} is defined below) for dd and σ\sigma under ordered pairs:

Lemma 5.4.

For all a,b,x,yS¯a,b,x,y\in\overline{S} we have

(15) σ(x,xy)=d(x,xy)=δy(x)\sigma(x,x\nabla y)=d(x,x\nabla y)=-\delta_{y}(x)
σ(x,xy)+σ(y,xy)=(x)(y)-\sigma(x,x\nabla y)+\sigma(y,x\nabla y)=\ell(x)-\ell(y)
σ(a,b)=d(a,ab)+d(b,ab)=σ(a,ab)+σ(b,ab)\sigma(a,b)=d(a,a\nabla b)+d(b,a\nabla b)=\sigma(a,a\nabla b)+\sigma(b,a\nabla b)
d(a,b)=d(a,ab)d(b,ab)=σ(a,ab)σ(b,ab)d(a,b)=d(a,a\nabla b){\vee}d(b,a\nabla b)=\sigma(a,a\nabla b){\vee}\sigma(b,a\nabla b)

In other words, if we know the values of dd or σ\sigma for all aba\leq b, then we know them for all a,ba,b, and for both dd and σ\sigma.

The following lemma reverses lemma 5.3. It also states in words that a theory is contained in another theory (i.e. xyx\leq y) if and only if the information content of the united theory is the information content of the bigger theory (i.e. (xy)=(y)\ell(x\nabla y)=\ell(y)).

Lemma 5.5.

For all x,yS¯x,y\in\overline{S}, the following assertions are equivalent:

d(x,y)=(y)(x)d(y,xy)=0(xy)=(y)d(x,y)=\ell(y)-\ell(x)\quad\Leftrightarrow\quad d(y,x\nabla y)=0\quad\Leftrightarrow\quad\ell(x\nabla y)=\ell(y)
σ(x,y)=(y)(x)σ(y,xy)=0\Leftrightarrow\quad\sigma(x,y)=\ell(y)-\ell(x)\quad\Leftrightarrow\quad\sigma(y,x\nabla y)=0

If dd or σ\sigma is faithful (i.e. for all x,yS¯x,y\in\overline{S} one has d(x,y)=0x=yd(x,y)=0\Rightarrow x=y) then these relations are also equivalent to

xyx\leq y
Proof.

(i) if d(x,y)=(y)(x)0d(x,y)=\ell(y)-\ell(x)\geq 0 then (y)(x)\ell(y)\geq\ell(x) and so d(x,y)=(xy)(x)d(x,y)=\ell(x\nabla y)-\ell(x) and thus (xy)=(y)(x)\ell(x\nabla y)=\ell(y)\geq\ell(x) and so

d(y,xy)=(xy)(y)=0d(y,x\nabla y)=\ell(x\nabla y)-\ell(y)=0

These implications can be reversed.

The same equivalence with dd replaced by σ\sigma is analogously proved.

(ii) If dd is faithful this is equivalent to saying that y=xyy=x\nabla y

Lemma 5.6.

For all x,yS¯x,y\in\overline{S} we have the equivalences

d(x,y)=0σ(x,y)=0(xy)=(x)=(y)d(x,y)=0\quad\Leftrightarrow\quad\sigma(x,y)=0\quad\Leftrightarrow\quad\ell(x\nabla y)=\ell(x)=\ell(y)

In particular, dd is faithful iff σ\sigma is faithful.

Proof.

If σ(x,y)=2(xy)(x)(y)=00\sigma(x,y)=2\ell(x\nabla y)-\ell(x)-\ell(y)=0\leq 0 then, because (x)+(y)2(xy)\ell(x)+\ell(y)\leq 2\ell(x\nabla y), we have 2(x)2(xy)=(x)+(y)2\ell(x)\leq 2\ell(x\nabla y)=\ell(x)+\ell(y) and so (xy)=(x)=(y)\ell(x\nabla y)=\ell(x)=\ell(y). The first equivalence is by (12). ∎

Typically, later in this paper we will construct pseudometrics satisfying the \nabla-inequality and in the next definition and proposition we state how the given structures on the abelian idempotent monoid carry over naturally to the quotient when turning a pseudometric to a metric.

Definition 5.7 (From pseudometric to metric).

Assume that a pseudometric dd on S¯\overline{S} satisfies the \nabla-inequality.

Define an equivalence relation \cong on S¯\overline{S} by

xyd(x,y)=0x,yS¯x\cong y\quad\Leftrightarrow\quad d(x,y)=0\qquad\forall x,y\in\overline{S}

Set Sˇ:=S¯/{\check{S}}:=\overline{S}/\cong to be the set-theoretical quotient. Define operations, relations and functions on Sˇ{\check{S}} by (for all x,yS¯x,y\in\overline{S})

[x][y]:=[xy][x]\nabla[y]:=[x\nabla y]
[x][y]:[y]=[x][y][x]\leq[y]:\Leftrightarrow[y]=[x]\nabla[y]
dˇ([x],[y]):=d(x,y){\check{d}}([x],[y]):=d(x,y)
ˇ([x]):=dˇ([x],[]){\check{\ell}}([x]):=\check{d}([x],[\emptyset])

Then the so defined language Sˇ{\check{S}} with equivalence relation satisfies the concepts of section 2 again:

Proposition 5.8.

If a pseudo-metric dd on S¯\overline{S} satisfies the \nabla-inequality then:

(i) The \nabla operation passes to Sˇ\check{S}, that is, the above (Sˇ,,[])(\check{S},\nabla,[\emptyset]) is an abelian, idempotent monoid with order relation as in definition 5.1, and is provided with a metric dˇ\check{d} satisfying the \nabla-inequality.

(ii) In case S¯=S/\overline{S}=S/\equiv as of definition 2.7, then we may canonically identify Sˇ=S/2\check{S}=S/\equiv_{2} for an equivalence relation 2\equiv_{2} on SS satisfying the same rules of definition 2.6. (So Sˇ\check{S} is again of the form S¯\overline{S}.)

(iii) In case d=dd=d_{\ell} (or d=σd=\sigma_{\ell}), the above defined ˇ\check{\ell} is a length function on Sˇ\check{S} and

dˇ=dˇ(or σˇ=σˇ){\check{d_{\ell}}}=d_{{\check{\ell}}}\qquad(\mbox{or }\quad{\check{\sigma_{\ell}}}=\sigma_{{\check{\ell}}})
Proof.

(i) The \nabla operation is well-defined on S~\tilde{S} as if xxx\cong x^{\prime} and yyy\cong y^{\prime} then

d(xy,xy)d(x,x)+d(y,y)=0d(x\nabla y,x^{\prime}\nabla y^{\prime})\leq d(x,x^{\prime})+d(y,y^{\prime})=0

(ii) Let α:SS¯/\alpha:S\rightarrow\overline{S}/\cong the canonical map and 2\equiv_{2} the associated equivalence relation x2yx\equiv_{2}y if and only if α(x)=α(y)\alpha(x)=\alpha(y) for all x,ySx,y\in S. As all rules of definition hold for \equiv, they hold also for 2\equiv_{2}, excepting (3) needs explanation. But by lemma 4.3 we have

s2td([s],[t])=00=d([s][x],[t][x])=d([sx],[tx])sx2txs\equiv_{2}t\Leftrightarrow d([s],[t])=0\Rightarrow 0=d([s]\nabla[x],[t]\nabla[x])=d([s{\vee}x],[t{\vee}x])\Rightarrow s{\vee}x\equiv_{2}t{\vee}x

for s,txSs,t\bot x\in S

(iii) By definition

ˇ([x][y])ˇ([x])=d(xy,)d(x,)=(xy)(y)=d(xy,x)0\check{\ell}([x]\nabla[y])-\check{\ell}([x])=d_{\ell}(x\nabla y,\emptyset)-d_{\ell}(x,\emptyset)=\ell(x\nabla y)-\ell(y)=d(x\nabla y,x)\geq 0

so ˇ\check{\ell} is monotonically increasing, and its subadditivity follows from the \nabla-inequality of dˇ\check{d}. ∎

6. Measure of Boolean expressions

We have seen in the considerations before lemma 3.3, that somehow, (xy)\ell(x\nabla y) is the measure of the union of xx and yy. In this section we want to extend the notion of a measure also to other Boolean expressions, like the intersection xyx\cap y of sentences xx and yy.

There is not necessarily a sentence that represents the intersection or the set-theoretical difference x\yx\backslash y of two sentences x,yx,y, see example 10.11. But we can still compute the “measure” of these non-existing, imagined elements.

Let us write formally ζ(E)\zeta(E) for the measure of a Boolean expression EE, for example ζ(xy)\zeta(x\cap y). We will give precise definitions in this section.

In this section, we assume the setting and notations of definitions 2.15 and 2.12, but suppose that \ell is possibly non-increasing, that is, we cease requiring (9).

Definition 6.1 (Measure of intersection).

For all x,yS¯x,y\in\overline{S} we set

ζ(xy):=(x)+(y)(xy)\zeta(x\cap y):=\ell(x)+\ell(y)-\ell(x\nabla y)

An illustrative example for ζ(xy)\zeta(x\cap y) would be the amount two computer programs xx and yy have in common, for example by identical or mathematically similar routines. The following function slightly extracts some core of the last definition:

Definition 6.2 (δ\delta-function).

We also set, for all x,yS¯x,y\in\overline{S},

δy(x):=(x)(xy)\delta_{y}(x):=\ell(x)-\ell(x\nabla y)

We say the measure of intersection is increasing if

ax,byζ(ab)ζ(xy)a\leq x,b\leq y\quad\Rightarrow\quad\zeta(a\cap b)\leq\zeta(x\cap y)

for all a,b,x,yS¯a,b,x,y\in\overline{S}, and δ\delta is increasing if

a,x,yS¯:axδy(a)δy(x)\forall a,x,y\in\overline{S}:\qquad a\leq x\quad\Rightarrow\quad\delta_{y}(a)\leq\delta_{y}(x)

The following couple of lemmas will cummulate to theorem 6.10 below.

Lemma 6.3.

We have the following equivalences:

The measure of intersection is increasing.

\Leftrightarrow ζ(xy)\zeta(x\cap y) is increasing in xS¯x\in\overline{S} for all fixed yS¯y\in\overline{S}.

\Leftrightarrow δ\delta is increasing.

Proof.

If δ\delta is increasing, then ζ(xy)=δy(x)+(y)\zeta(x\cap y)=\delta_{y}(x)+\ell(y) is increasing in xx for fixed yy, and so also in yy for fixed xx by symmetry of ζ(xy)\zeta(x\cap y). Combining both we get that the measure of intersection is increasing. ∎

Lemma 6.4.

If δ\delta is increasing and \ell monotone, then σ\sigma satisfies the Δ\Delta-inequality.

Proof.

δ\delta is increasing means that for all x,y,zS¯x,y,z\in\overline{S} we have

(16) (y)(yx)(yz)(yzx)\ell(y)-\ell(y\nabla x)\leq\ell(y\nabla z)-\ell(y\nabla z\nabla x)

Monotony of \ell implies

(zx)(yzx)\ell(z\nabla x)\leq\ell(y\nabla z\nabla x)

Thus we get the criterion of lemma 3.6, that is,

(17) (zx)+(y)(yz)+(yx)\ell(z\nabla x)+\ell(y)\leq\ell(y\nabla z)+\ell(y\nabla x)

Lemma 6.5.

If σ\sigma satisfies the Δ\Delta-inequality then δ\delta is increasing.

Proof.

Replace zz in (17), which holds by lemma 3.6, by zyz\nabla y to obtain (16). ∎

Lemma 6.6.

If δ\delta is increasing then σ\sigma satisfies the \nabla-inequality.

Proof.

By lemma 6.3, ρ(x,y):=ζ(xy)\rho(x,y):=\zeta(x\cap y) is increasing in x,yx,y simultaneously. Then ρ\rho is increasing implies ρ-\rho is decreasing, and applying this two times yields

2ρ(xy,ab)=2(xyab)2(xy)2(ab)-2\rho(x\nabla y,a\nabla b)=2\ell(x\nabla y\nabla a\nabla b)-2\ell(x\nabla y)-2\ell(a\nabla b)
ρ(x,a)ρ(y,b)=(xa)(x)(a)+(yb)(y)(b)\leq-\rho(x,a)-\rho(y,b)=\ell(x\nabla a)-\ell(x)-\ell(a)+\ell(y\nabla b)-\ell(y)-\ell(b)

A simple rearrangement of the summands in this inequality yields exactly

σ(xa,yb)σ(x,y)+σ(a,b)\sigma(x\nabla a,y\nabla b)\leq\sigma(x,y)+\sigma(a,b)

Lemma 6.7.

If dd or σ\sigma satisfies the very weak \nabla-inequality and \ell is monotone, then δ\delta is increasing.

Proof.

The very weak \nabla-inequality of dd implies

d(y,yz)d(yx,yzx)-d(y,y\nabla z)\leq-d(y\nabla x,y\nabla z\nabla x)

which by the monotonicity of \ell means

(y)(yz)(yx)(yzx)\ell(y)-\ell(y\nabla z)\leq\ell(y\nabla x)-\ell(y\nabla z\nabla x)

But this means δ\delta is increasing. The same is true for σ\sigma by (15). ∎

Lemma 6.8.

If \ell is monotone, then dd satisfies the very weak \nabla-inequality if and only if σ\sigma satisfies it.

Proof.

Because of (15). ∎

Lemma 6.9.

If σ\sigma satisfies the Δ\Delta-inequality then dd satisfies the Δ\Delta-inequality.

Proof.

Let x,y,zS¯x,y,z\in\overline{S}. In all cases like (x)(y)(z)\ell(x)\leq\ell(y)\leq\ell(z) etc. we see that

(x)(y)+(y)(z)(x)(z)(y)\ell(x){\wedge}\ell(y)+\ell(y){\wedge}\ell(z)-\ell(x){\wedge}\ell(z)\leq\ell(y)

Thus, by lemma 3.6 we get

(xz)+(x)(y)+(y)(z)(x)(z)(xy)+(yz)\ell(x\nabla z)+\ell(x){\wedge}\ell(y)+\ell(y){\wedge}\ell(z)-\ell(x){\wedge}\ell(z)\leq\ell(x\nabla y)+\ell(y\nabla z)

But this is the Δ\Delta-inequality for dd. ∎

It is astonishing how strongly the validity of the Δ\Delta-inequality and the \nabla-inequality of dd and σ\sigma, and the monotone increasingness of the measure of intersection and the δ\delta-function are connected:

Theorem 6.10.

Assume the setting and notations of definitions 2.15 and 2.12. In particular, suppose that \ell is monotonically increasing. Then the following are equivalent:

(a) dd satisfies the very weak \nabla-inequality

(b) σ\sigma satisfies the very weak \nabla-inequality

(c) δ\delta is increasing

(d) the measure of intersection is increasing

(e) σ\sigma satisfies the Δ\Delta-inequality

(f) σ\sigma satisfies the \nabla-inequality

Each of these assertions imply that dd satisfies the Δ\Delta-inequality.

Reversely, the \nabla-inequality for dd implies the above assertions.

Proof.

(a) \Leftrightarrow (b) is by lemma 6.8, (a) \Leftrightarrow (c) by lemma 6.7, (c) \Leftrightarrow (d) by lemma 6.3, (c) \Leftrightarrow (e) by lemmas 6.4 and 6.5, (c) \Rightarrow (f) by lemma 6.6, and (f) \Rightarrow (b) is clear. The forelast assertion is by lemma 6.9. That the last assertion implies (a) is clear. ∎

Having said in definition 6.1 what the measure of an imagined intersection of two sentences is, we continue to define it for the last Boolean operations of complement and union.

Definition 6.11 (Measure of complement).

For all x,yS¯x,y\in\overline{S} put

ζ(x\y):=(xy)(y)\zeta(x\backslash y):=\ell(x\nabla y)-\ell(y)

As ζ(x\y)=δy(x)\zeta(x\backslash y)=-\delta_{y}(x), δ\delta is increasing if and only if the measure of complement ζ(x\y)\zeta(x\backslash y) is decreasing in yy. This is reminiscent of lemma 3.4:

Lemma 6.12.

If \ell is monotone then

ζ(x\y)=d(y,xy)=σ(y,xy)\zeta(x\backslash y)=d(y,x\nabla y)=\sigma(y,x\nabla y)
d(x,y)=ζ(x\y)ζ(y\x)d(x,y)=\zeta(x\backslash y){\vee}\zeta(y\backslash x)
σ(x,y)=ζ(x\y)+ζ(y\x)\sigma(x,y)=\zeta(x\backslash y)+\zeta(y\backslash x)
Definition 6.13 (Measure of union).

For x1,,xnS¯x_{1},\ldots,x_{n}\in\overline{S} set

ζ(x1xn):=(x1xn)\zeta(x_{1}\cup\ldots\cup x_{n}):=\ell(x_{1}\nabla\ldots\nabla x_{n})

The final goal of this section is to extend the measure of imagined Boolean expressions of sentences to all possible Boolean expressions.

Definition 6.14.

Let BB be the free Boolean algebra (with zero element 0 and one-element 11) generated by the set S¯\overline{S}.

That is, BB consists of expressions like xy,xy,x\y=xy¯,,y¯x\cup y,x\cap y,x\backslash y=x\cap\overline{y},\emptyset,\overline{y} (complement of yy) etc. (also notated as x+y,xy,xy¯,0,y¯x+y,xy,x\overline{y},0,\overline{y}) for x,yS¯x,y\in\overline{S} and x,yBx,y\in B.

Two elements in BB are regarded as same if and only if they are the same by formal rules in Boolean algebra, for example, x+x+y=x+y,xx¯=0,x+x¯=1x+x+y=x+y,x\overline{x}=0,x+\overline{x}=1.

For n=0n=0 and xiS¯x_{i}\in\overline{S} define the expression x1xn:=x_{1}\nabla\ldots\nabla x_{n}:=\emptyset.

The above definitions for ζ\zeta will now be generalized as follows:

Definition 6.15.

Define a function

ζ:B\zeta:B\rightarrow{\mathbb{R}}

as follows. Put

ζ(0)=ζ(1)=0\zeta(0)=\zeta(1)=0

For all x1,,xn,y1,,ymS¯x_{1},\ldots,x_{n},y_{1},\ldots,y_{m}\in\overline{S} with n0,m0n\geq 0,m\geq 0 (and y:=y1ymy:=y_{1}\nabla\ldots\nabla y_{m}) set

ζ(x1xn):=k0,{i1,,ik}{1,,n}(xi1xik)(1)k+1\zeta(x_{1}\cap...\cap x_{n}):=\sum_{k\geq 0,\{i_{1},\ldots,i_{k}\}\subseteq\{1,...,n\}}\ell(x_{i_{1}}\nabla\ldots\nabla x_{i_{k}})(-1)^{k+1}

Here it is understood that mutually isiti_{s}\neq i_{t}. Then define

(18) ζ(x1xny¯1y¯m)=ζ((x1xn)\(y1ym))\zeta(x_{1}\cap...\cap x_{n}\cap\overline{y}_{1}\cap...\cap\overline{y}_{m})=\zeta((x_{1}\cap...\cap x_{n})\backslash(y_{1}\cup...\cup y_{m}))
:=ζ(x1xn)ζ(x1xny):=\zeta(x_{1}\cap...\cap x_{n})-\zeta(x_{1}\cap...\cap x_{n}\cap y)
(19) =k0,{i1,,ik}{1,,n}(yxi1xik)(1)k+1=\sum_{k\geq 0,\{i_{1},\ldots,i_{k}\}\subseteq\{1,...,n\}}\ell(y\nabla x_{i_{1}}\nabla\ldots\nabla x_{i_{k}})(-1)^{k+1}

because all expressions without yy cancel. Finally set

ζ(xy):=ζ(x)+ζ(y)if xy=0\zeta(x\cup y):=\zeta(x)+\zeta(y)\qquad\mbox{if }x\cap y=0
Lemma 6.16.

ζ\zeta is well defined.

Proof.

Because we can choose common refinements, every expression in BB can be brought to the form nzn=nx1,nxmn,ny¯1,ny¯kn,n\sum_{n}z_{n}=\sum_{n}x_{1,n}\ldots x_{m_{n},n}\overline{y}_{1,n}\ldots\overline{y}_{k_{n},n} with znzm=0z_{n}z_{m}=0 for nmn\neq m, with xi,j,yi,jS¯x_{i,j},y_{i,j}\in\overline{S} and znz_{n} obviously defined, and so the above ζ\zeta is already fully defined.

This expression in BB is uniquely defined if the set of involved variables is determined. Thus we only need to check that the definition of ζ\zeta is independent of the set of involved variables:

Observe here the occurrence of xn+1x_{n+1}:

ζ(x1xnxn+1y¯1y¯m)+ζ(x1xnxn+1¯y¯1y¯m)\zeta(x_{1}...x_{n}x_{n+1}\overline{y}_{1}...\overline{y}_{m})+\zeta(x_{1}...x_{n}\overline{x_{n+1}}\,\overline{y}_{1}...\overline{y}_{m})
=k0,{i1,,ik}{1,,n+1}(yxi1xik)(1)k+1=\sum_{k\geq 0,\{i_{1},\ldots,i_{k}\}\subseteq\{1,...,n+1\}}\ell(y\nabla x_{i_{1}}\nabla\ldots\nabla x_{i_{k}})(-1)^{k+1}
+k0,{i1,,ik}{1,,n}(yxn+1xi1xik)(1)k+1+\sum_{k\geq 0,\{i_{1},\ldots,i_{k}\}\subseteq\{1,...,n\}}\ell(y\nabla x_{n+1}\nabla x_{i_{1}}\nabla\ldots\nabla x_{i_{k}})(-1)^{k+1}
=ζ(x1xny¯1y¯m)=\zeta(x_{1}...x_{n}\overline{y}_{1}...\overline{y}_{m})

because all occurrences with xn+1x_{n+1} cancel each other. This result is consistent with xn+1+xn+1¯=1x_{n+1}+\overline{x_{n+1}}=1.

By induction we can add as many variables as we want by the formula x+x¯=1x+\overline{x}=1 and expansion, without changing ζ\zeta. ∎

Lemma 6.17.

ζ\zeta is a “signed measure” on the Boolean algebra B generated by S¯\overline{S} in the sense that

ζ(xy)=ζ(x)+ζ(y)ζ(xy)\zeta(x\cap y)=\zeta(x)+\zeta(y)-\zeta(x\cup y)

for all x,yBx,y\in B. In particular we have

ζ(y¯)=ζ(y)\zeta(\overline{y})=-\zeta(y)
ζ(x\y)=ζ(xy)ζ(y)\zeta(x\backslash y)=\zeta(x\cup y)-\zeta(y)
Proof.

This is clear, as when we form common refinements for xx and yy in BB as indicated in the last proof, ζ(xy)\zeta(x\cap y) sums only over the products which appear both in x,yx,y, and ζ(xy)\zeta(x\cup y) over those which appear in xx or yy.

The last two identities follow from this and ζ(1)=0\zeta(1)=0. ∎

We should however be reminded that we do not work with a real positive measure on measurable sets. We only observe:

Lemma 6.18.

For all x,y,xiS¯x,y,x_{i}\in\overline{S} we have:

(i) ζ(xy)0\zeta(x\cap y)\geq 0, ζ(x1xn)0\zeta(x_{1}\cup\ldots\cup x_{n})\geq 0

(ii) If \ell is monotonically increasing then ζ(x\y)0\zeta(x\backslash y)\geq 0.

7. New metric

Up to now we could not clarify if dd or σ\sigma satisfies the triangle or \nabla-inequality in our natural prototype languages. We have theorem 6.10, and it tells us the \nabla-inequality of dd implies all the desired natural properties. In this section we show how we may modify a given length function \ell such that we get the \nabla-inequality for its associated distance function dd_{\ell} and thus all the desired properties. The approach is as follows. At first we modify the distance function dd_{\ell} for a given \ell as in definitions 7.1 and 7.4 such that the so obtained dd satisfies the Δ\Delta- and \nabla-inequality. Then we extract from it a new length function 2\ell_{2} by setting 2(x):=d(x,)\ell_{2}(x):=d(x,\emptyset) (see definition 7.11). But this new 2\ell_{2} may not be monotone increasing as required in (9). That is why we make it monotone increasing in definition 7.9 to obtain a new length 2¯\overline{\ell_{2}}. But the validity of the desired Δ\Delta- and \nabla-inequality of d2¯d_{\overline{\ell_{2}}} and σ2¯\sigma_{\overline{\ell_{2}}} associated to 2¯\overline{\ell_{2}} might fail, also cf. lemma 7.13. Hence we restart the above procedure with 2¯\overline{\ell_{2}} again, and then again and again infinitely many often, and in the limit the obtained length function \ell^{\infty} satisfies all desired properties.

A key lemma for this procedure is lemma 7.14, whose proof shows that we need both the Δ\Delta- and \nabla-inequality simultaneously. Hence, it is also intrinsic to do the procedure with the dd-distance, because the lemma might not hold for the σ\sigma-distance. Doing the procedure for the σ\sigma-distance instead leads at least to the estimates of lemma 7.27 in the limit.

Actually, in this section we start with dd, and if nothing else is said, dd always denotes a symmetric, positive, nilpotent function d:S¯×S¯d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} for an abelian, idempotent monoid (S¯,,)(\overline{S},\nabla,\emptyset).

Example 10.10 suggests, that dd of definition 2.12 does not satisfy the Δ\Delta-inequality in general. A way out of that fact is to repair dd as follows:

Definition 7.1 (Forced Δ\Delta-inequality).

Define a function d~:S¯×S¯\tilde{d}:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} by

d~(x,y)=infn0,a1,,anS¯d(x,a1)+d(a2,a3)++d(an1,an)+d(an,y)\tilde{d}(x,y)=\inf_{n\geq 0,a_{1},\ldots,a_{n}\in\overline{S}}d(x,a_{1})+d(a_{2},a_{3})+\ldots+d(a_{n-1},a_{n})+d(a_{n},y)
Lemma 7.2.

d~\tilde{d} satisfies the Δ\Delta-inequality, and d=d~d=\tilde{d} if and only if dd satisfies the Δ\Delta-inequality.

Proof.

This is obvious as the right hand side of the Δ\Delta-inequality is one possible value over which the infimum is taken in d~\tilde{d}. ∎

The following lemma may be useful for the pseudometric ρ\rho on S¯\overline{S} defined by ρ(x,y):=|(x)(y)|\rho(x,y):=|\ell(x)-\ell(y)| to get a lower bound for d~\tilde{d}, cf. lemma 3.5.

Lemma 7.3.

If a function ρ:S¯×S¯\rho:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} satisfies the Δ\Delta-inequality then we have the following lower bound estimate:

ρdρd~\rho\leq d\quad\Rightarrow\quad\rho\leq\tilde{d}

Next we repair dd to satisfy the \nabla-inequality:

Definition 7.4 (Forced \nabla-inquality).

Define a function d^:S¯×S¯\hat{d}:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} by

d^(x,y):=infx=x1xn,y=y1yn,xi,yiS¯d(x1,y1)++d(xn,yn)\hat{d}(x,y):=\inf_{x=x_{1}\nabla\ldots\nabla x_{n},\,y=y_{1}\nabla\ldots\nabla y_{n},\;x_{i},y_{i}\in\overline{S}}d(x_{1},y_{1})+\ldots+d(x_{n},y_{n})
Lemma 7.5.

d^\hat{d} satisfies the \nabla-inequality, and d=d^d=\hat{d} if and only if dd satisfies the \nabla-inequality.

Proof.

This is obvious as the right hand side of the \nabla-inequality is one possible value over which the infimum is taken in d^\hat{d}. ∎

Fortunately, the transition from dd to d~\tilde{d} does not destroy the validitiy of the \nabla-inequality:

Lemma 7.6.

If dd satisfies the \nabla-inequality, then d~\tilde{d} satisfies the \nabla-inequality.

Proof.

By application of the \nabla-inequality for dd we get

d~(xy,zw)=infn0,a1,,anS¯d(xy,a1)+d(a2,a3)++d(an,zw)\tilde{d}(x\nabla y,z\nabla w)=\inf_{n\geq 0,a_{1},\ldots,a_{n}\in\overline{S}}d(x\nabla y,a_{1})+d(a_{2},a_{3})+\ldots+d(a_{n},z\nabla w)
infn0,ai,biS¯d(xy,a1b1)+d(a2b2,a3b3)++d(anbn,zw)\leq\inf_{n\geq 0,a_{i},b_{i}\in\overline{S}}d(x\nabla y,a_{1}\nabla b_{1})+d(a_{2}\nabla b_{2},a_{3}\nabla b_{3})+\ldots+d(a_{n}\nabla b_{n},z\nabla w)
d~(x,z)+d~(y,w)\leq\tilde{d}(x,z)+\tilde{d}(y,w)

where the first inequality is because the infimum goes over a smaller selection. ∎

Lemma 7.7.

d^~{\tilde{\hat{d}}} satisfies the \nabla- and Δ\Delta-inequality, and d=d^~d={\tilde{\hat{d}}} if and only if dd satisfies the \nabla- and Δ\Delta- inequality.

Proof.

The first assertion follows by lemmas 7.2, 7.5 and 7.6.

If d=d^~d={\tilde{\hat{d}}} then dd^d^~=dd\geq\hat{d}\geq{\tilde{\hat{d}}}=d, thus d=d^d^~=d=d^d=\hat{d}\geq{\tilde{\hat{d}}}=d=\hat{d}, so that the last equivalence is by lemmas 7.6 and 7.2. ∎

Hence we get a projection from metric candidates to those ones which satisfy the Δ\Delta- and \nabla-inequality:

Corollary 7.8 (Projection onto pseudometrics satisfying \nabla-inequality).

There is a projection dd^~d\mapsto{\tilde{\hat{d}}} from the set of positive, symmetric, nilpotent functions on S¯\overline{S} onto the set of pseudometrics on S¯\overline{S} satisfying the \nabla-inequality.

The following bar operator turns possibly non-increasing length functions to increasing ones:

Definition 7.9 (Bar operator).

Given a positive function :S¯\ell:\overline{S}\rightarrow{\mathbb{R}} set ¯:S¯\overline{\ell}:\overline{S}\rightarrow{\mathbb{R}} as

¯(x):=infzS¯(xz)\overline{\ell}(x):=\inf_{z\in\overline{S}}\ell(x\nabla z)
Lemma 7.10.

If :S¯\ell:\overline{S}\rightarrow{\mathbb{R}} is a possibly non-increasing length function then ¯\overline{\ell} is an increasing length-function, i.e. (for all x,yS¯x,y\in\overline{S})

¯(x)¯(xy)\overline{\ell}(x)\leq\overline{\ell}(x\nabla y)

If \ell is already monotonically increasing, then ¯=\bar{\ell}=\ell.

From the various distance functions we naturally extract the length functions out:

Definition 7.11.

Define positive functions ,~,^,^~:S¯\ell,\tilde{\ell},\hat{\ell},{\tilde{\hat{\ell}}}:\overline{S}\rightarrow{\mathbb{R}} (depending on dd) by

(20) (x):=d(x,),~(x):=d~(x,),^(x):=d^(x,),^~(x):=d^~(x,)\ell(x):=d(x,\emptyset),\;\tilde{\ell}(x):=\tilde{d}(x,\emptyset),\;\hat{\ell}(x):=\hat{d}(x,\emptyset),\;{\tilde{\hat{\ell}}}(x):={\tilde{\hat{d}}}(x,\emptyset)
Lemma 7.12.

If dd satisfies the \nabla-inequality, then \ell of (20) is a non-increasing length function, and ¯\overline{\ell} a length function.

The next lemma shows that switching from dd to d^~{\tilde{\hat{d}}} may really change their associated length functions \ell and ^~{\tilde{\hat{\ell}}}, respectively:

Lemma 7.13.

If ρd\rho\leq d for ρ\rho as defined before lemma 7.3 then ~=\tilde{\ell}=\ell, and if \ell is subadditive then ^=\hat{\ell}=\ell. But we have ^~{\tilde{\hat{\ell}}}\neq\ell in certain cases.

Proof.

By lemma 7.3 we have

|(x)()|d~(x,)(x)|\ell(x)-\ell(\emptyset)|\leq\tilde{d}(x,\emptyset)\leq\ell(x)

If ^~={\tilde{\hat{\ell}}}=\ell then dd_{\ell} would satisfy the Δ\Delta-inequality by lemma 7.22. (By independence, the forward reference is allowed.) But we know from example 10.10 that this is not in general true. ∎

This is an important key lemma which shows how our above procedure decreases the metric candidates:

Lemma 7.14.

If dd satisfies the Δ\Delta- and \nabla-inequality then for \ell of (20),

ddd_{\ell}\leq d
d¯dd_{\overline{\ell}}\leq d
Proof.

(i) Without loss of generality we have by the second Δ\Delta-inequality and the \nabla-inequality that

d(x,y)=d(xy,)d(x,)d_{\ell}(x,y)=d(x\nabla y,\emptyset)-d(x,\emptyset)
d(xy,x)d(x,y)+d(x,x)=d(x,y)\leq d(x\nabla y,x)\leq d(x,y)+d(x,x)=d(x,y)

(ii) Similarly we get

d¯(x,y)=infz(xyz)infw(xw)infw(yw)d_{\overline{\ell}}(x,y)=\inf_{z}\ell(x\nabla y\nabla z)-\inf_{w}\ell(x\nabla w){\wedge}\inf_{w}\ell(y\nabla w)

Given any ε>0\varepsilon>0 we may choose a wS¯w\in\overline{S} such that this is

infz(xyz)(xw)+ε\leq\inf_{z}\ell(x\nabla y\nabla z)-\ell(x\nabla w)+\varepsilon
(xyw)(xw)+ε\leq\ell(x\nabla y\nabla w)-\ell(x\nabla w)+\varepsilon
d(xyw,xw)+ε\leq d(x\nabla y\nabla w,x\nabla w)+\varepsilon
d(y,x)+d(xw,xw)+ε=d(x,y)+ε\leq d(y,x)+d(x\nabla w,x\nabla w)+\varepsilon=d(x,y)+\varepsilon

Corollary 7.15.

Using definition (20), we have

d^~d^~d_{{{\tilde{\hat{\ell}}}}}\leq{\tilde{\hat{d}}}
d^~¯d^~d_{\overline{{\tilde{\hat{\ell}}}}}\leq{\tilde{\hat{d}}}
Proof.

Apply lemma 7.14 to d^~{\tilde{\hat{d}}}. ∎

Let us now restart the above procedure with a new d^~¯d_{{\bar{\tilde{\hat{\ell}}}}} instead of dd and then again and again. More precisely put

Definition 7.16.

Set a start point d(1):=d:S¯×S¯d^{(1)}:=d:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}}, and then for all n1n\geq 1 define successively functions (n)^~,(n+1):S¯{\tilde{\hat{\ell^{(n)}}}},\ell^{(n+1)}:\overline{S}\rightarrow{\mathbb{R}} and d(n+1):S¯×S¯d^{(n+1)}:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} by

(n)^~(x):=d(n)^~(x,){\tilde{\hat{\ell^{(n)}}}}(x):={\tilde{\hat{d^{(n)}}}}(x,\emptyset)
(n+1):=(n)^~¯\ell^{(n+1)}:={\overline{\tilde{\hat{\ell^{(n)}}}}}
(21) d(n+1):=d(n+1)d^{(n+1)}:=d_{\ell^{(n+1)}}

We are going to show that the successively applied procedure defined in the last definition which takes a length function in and gives a length function out converges to a desired fixed point.

Lemma 7.17.

All (n+1)\ell^{(n+1)} are length functions on S¯\overline{S}, and we have descending sequences

0(3)=(2)^~¯(2)^~(2)=^~¯^~0\leq...\leq\ell^{(3)}=\overline{\tilde{\hat{\ell^{(2)}}}}\leq{\tilde{\hat{\ell^{(2)}}}}\leq\ell^{(2)}=\overline{\tilde{\hat{\ell}}}\leq\tilde{\hat{\ell}}\leq\ell
0d(3)=d(3)d(2)^~d(2)=d(2)d^~d0\leq...\leq d^{(3)}=d_{\ell^{(3)}}\leq\tilde{\hat{d^{(2)}}}\leq d^{(2)}=d_{\ell^{(2)}}\leq\tilde{\hat{d}}\leq d
Proof.

By lemmas 7.7 and and 7.12, (n+1)\ell^{(n+1)} is a length function. By definitions 7.1 and 7.4, we have d^~d{\tilde{\hat{d}}}\leq d. By corollary 7.15 applied to d:=d(n)d:=d^{(n)} and ^~:=(n)^~{\tilde{\hat{\ell}}}:={\tilde{\hat{\ell^{(n)}}}} we thus obtain d(n+1)=d(n+1)=d(n)^~¯d(n)^~d(n)d^{(n+1)}=d_{\ell^{(n+1)}}=d_{{\bar{\tilde{\hat{\ell^{(n)}}}}}}\leq{\tilde{\hat{d^{(n)}}}}\leq{d^{(n)}}. Hence, as ¯\overline{\ell}\leq\ell by definition 7.9,

(n+1)=(n)^~¯(n)^~=d(n)^~(,)d(n)(,)=d(n)(,)=(n)\ell^{(n+1)}=\overline{{\tilde{\hat{\ell^{(n)}}}}}\leq{{\tilde{\hat{\ell^{(n)}}}}}={\tilde{\hat{d^{(n)}}}}(-,\emptyset)\leq d^{(n)}(-,\emptyset)=d_{\ell^{(n)}}(-,\emptyset)=\ell^{(n)}

We note that all d(n)^~{\tilde{\hat{d^{(n)}}}} are pseudometrics which satisfy the \nabla-inequality by lemma 7.7, but it is unclear whether they are of the form dd_{\ell^{\prime}}. (Otherwise we would be done.)

Since we have monotonically falling real-valued sequences that are bounded below by 0 we may define

Definition 7.18.

Define the pointwise function limits

d():=limnd(n){{d}^{(\infty)}}:=\lim_{n\rightarrow\infty}d^{(n)}
(22) ():=limn(n){{\ell}^{(\infty)}}:=\lim_{n\rightarrow\infty}\ell^{(n)}
Lemma 7.19.

The function

d()=d()=limnd(n)=limnd(n)^~=d()^~=d()^~d^{(\infty)}=d_{{{\ell}^{(\infty)}}}=\lim_{n\rightarrow\infty}d_{\ell^{(n)}}=\lim_{n\rightarrow\infty}{\tilde{\hat{d_{\ell^{(n)}}}}}={\tilde{\hat{d^{(\infty)}}}}={\tilde{\hat{d_{{{\ell}^{(\infty)}}}}}}

is a pseudometric on S¯\overline{S} satisfying the \nabla-inequality.

Proof.

The second equality is clear by the formula for dd_{\ell} of def. 2.12, the first one is then by definition. The third equality is because of the converging sequence of lemma 7.17. Because d(n)^~{\tilde{\hat{d_{\ell^{(n)}}}}} satisfies the \nabla- and Δ\Delta-inequality by lemma 7.7, their point-wise limit does also, and so we get the last equalities and thus the final assertion. ∎

Lemma 7.20.

(){{\ell}^{(\infty)}} is a length function on S¯\overline{S} and

()=d()(,)=()^~:=d()^~(,)=()^~¯=()¯\ell^{(\infty)}={d_{{{\ell}^{(\infty)}}}}(-,\emptyset)={\tilde{\hat{\ell^{(\infty)}}}}:={\tilde{\hat{d_{{{\ell}^{(\infty)}}}}}}(-,\emptyset)=\overline{\tilde{\hat{\ell^{(\infty)}}}}=\overline{\ell^{(\infty)}}
Proof.

As d()^~=d(){\tilde{\hat{d_{{{\ell}^{(\infty)}}}}}}=d_{{{\ell}^{(\infty)}}} by lemma 7.19, we get the second identity.

The last equality holds because, as d()=d()d^{(\infty)}=d_{{{\ell}^{(\infty)}}} is positive,

d()(xy,x)=()(xy)(x)0{d_{{{\ell}^{(\infty)}}}}(x\nabla y,x)={{\ell}^{(\infty)}}(x\nabla y)-\ell^{\infty}(x)\geq 0

that is, (){{\ell}^{(\infty)}} satisfies already the monotone increasingness. ∎

We now summarize that the successively applied described procedure which takes a length function and forms a new one converges to a desired fixed point:

Theorem 7.21 (Ideal length function).

Given a positive symmetric idempotent function dd on an abelian, idempotent monoid (S¯,,)(\overline{S},\nabla,\emptyset), define (){{\ell}^{(\infty)}} as described in definitions 7.16 and 7.18.

Then (){{\ell}^{(\infty)}} is a length function on S¯\overline{S}, and d()d_{{{\ell}^{(\infty)}}} and σ()\sigma_{{{\ell}^{(\infty)}}} (def. 2.12) are pseudometrics on S¯\overline{S} satisfying the \nabla-inequality.

Proof.

By lemmas 7.19 and 7.20, and theorem 6.10. ∎

If dd starting at the beginning of this section happens to be of the form d=dd=d_{\ell} for a length function \ell on S¯\overline{S}, then astonishingly dd satisfies already the Δ\Delta- and \nabla-inequality if only =^~\ell={\tilde{\hat{\ell}}}:

Proposition 7.22 (Ideal length function attained).

If dd is of the form d=dd=d_{\ell} for some length function \ell then dd_{\ell} satisfies the \nabla- and Δ\Delta-inequality \Leftrightarrow

d=d^~^~=^~¯==()d_{\ell}={\tilde{\hat{d_{\ell}}}}\quad\Leftrightarrow\quad{\tilde{\hat{\ell}}}=\ell\quad\Leftrightarrow\quad{\bar{\tilde{\hat{\ell}}}}=\ell\quad\Leftrightarrow\quad\ell={{\ell}^{(\infty)}}
Proof.

If dd satisfies the Δ\Delta-inequality then certainly dd~dd\leq\tilde{d}\leq d and similarly the \nabla-inequality implies d=d^d=\hat{d}. So d^~=d{\tilde{\hat{d}}}=d by lemma 7.6 and thus ^~={\tilde{\hat{\ell}}}=\ell.

If ^~={\tilde{\hat{\ell}}}=\ell then setting d:=dd:=d_{\ell} we have with corollary 7.15 that

d=d^~d^~=d^~dd_{\ell}=d_{{\tilde{\hat{\ell}}}}\leq{\tilde{\hat{d}}}={\tilde{\hat{d_{\ell}}}}\leq d_{\ell}

and so d=d^~d_{\ell}={\tilde{\hat{d_{\ell}}}} satisfies the \nabla and Δ\Delta-inequality.

If ^~¯={\bar{\tilde{\hat{\ell}}}}=\ell then

^~¯=^~^~¯={\bar{\tilde{\hat{\ell}}}}=\ell\geq{\tilde{\hat{\ell}}}\geq{\bar{\tilde{\hat{\ell}}}}=\ell

and so =^~\ell={\tilde{\hat{\ell}}}. Also then, (2)=^~¯=\ell^{(2)}={\bar{\tilde{\hat{\ell}}}}=\ell, and thus d(2)=dd^{(2)}=d_{\ell} and the above procedure restarts again with the same start dd_{\ell}, and so again (3)=(2)\ell^{(3)}=\ell^{(2)} etc and we get a constant sequence and finally ()={{\ell}^{(\infty)}}=\ell.

If ()={{\ell}^{(\infty)}}=\ell then by corollary 7.20 ^~¯={\bar{\tilde{\hat{\ell}}}}=\ell. ∎

We just remark that one will naturally turn the desired pseudometrics to metrics effortlessly:

Corollary 7.23.

By definition 5.7, we may then divide out elements in S¯\overline{S} to obtain a metric space Sˇ{\check{S}}, which is again an abelian idempotent monoid, with metrics d()ˇ=d()ˇ{\check{d_{{{\ell}^{(\infty)}}}}}=d_{{\check{{{\ell}^{(\infty)}}}}} and σ()ˇ=σ()ˇ{\check{\sigma_{{{\ell}^{(\infty)}}}}}=\sigma_{{\check{{{\ell}^{(\infty)}}}}} satisfying the \nabla-inequality.

Proof.

By theorem 7.21 we have pseudometrics d(){{d}^{(\infty)}} and σ(){{\sigma}^{(\infty)}}, and to these we apply definition 5.7, proposition 5.8 and lemma 5.6. ∎

The above procedure gives a desired projection onto the set of length functions from arbitrary length functions to desired ones:

Corollary 7.24 (Projection onto ideal length functions).

There is a projection ()\ell\mapsto{{\ell}^{(\infty)}} from the set of length functions \ell on S¯\overline{S} onto the set of those ones such that dd_{\ell} and σ\sigma_{\ell} are pseudometrics which satisfy the \nabla-inequality.

Proof.

Start the above procedure, definition 7.16, with d:=dd:=d_{\ell} (or alternatively with d:=σd:=\sigma_{\ell}, which however may result in another fixed point (){{\ell}^{(\infty)}}) and end up with theorem 7.21.

Projection is onto because of lemma 7.7 and theorem 6.10. ∎

If the length function LL on SS is a simple character count as in our natural protoexamples then the fixed point for the length functions will be attained after finitely many steps:

Lemma 7.25.

If the \ell-function is in a grid, i.e. (S¯)γ\ell(\overline{S})\subseteq\gamma{\mathbb{Z}} for some constant γ\gamma\in{\mathbb{R}}, and d=dd=d_{\ell}, then the limits of definition 7.18 will all be attained at some nn (nn depending on the points of S¯\overline{S} however).

We will now make a bigger cut and switch from dd to σ\sigma. Let us now do the above procedure by starting with a length function (1)\ell^{(1)} on S¯\overline{S}, but instead always forming d(n)d_{\ell^{(n)}} we now choose σ(n)\sigma_{\ell^{(n)}}. In other words, set

Definition 7.26.

Do everything as in definition 7.16 but modify (21) to

d(n+1):=σ(n+1)d^{(n+1)}:=\sigma_{\ell^{(n+1)}}
Lemma 7.27.

Assume definition 7.26, but to avoid confusion with what we have defined before, rewrite σ(n):=d(n)\sigma^{(n)}:=d^{(n)}.

Then ((n))n(\ell^{(n)})_{n} is a decreasing sequence of length functions on S¯\overline{S}, with limits and estimates

():=limn(n),σ():=limnσ(n)=σ(){{\ell}^{(\infty)}}:=\lim_{n\rightarrow\infty}\ell^{(n)},\qquad{{\sigma}^{(\infty)}}:=\lim_{n\rightarrow\infty}\sigma^{(n)}=\sigma_{{{\ell}^{(\infty)}}}
(23) 1/2σ(n+1)σ(n)^~σ(n)1/2{\sigma^{(n+1)}}\leq{\tilde{\hat{\sigma^{(n)}}}}\leq{\sigma^{(n)}}
(24) 1/2σ()lim infnσ(n)^~lim supnσ(n)^~σ()^~σ()1/2{{\sigma}^{(\infty)}}\leq\liminf_{n}{\tilde{\hat{\sigma^{(n)}}}}\leq\limsup_{n}{\tilde{\hat{\sigma^{(n)}}}}\leq{\tilde{\hat{{{\sigma}^{(\infty)}}}}}\leq{{\sigma}^{(\infty)}}
Proof.

As (n)\ell^{(n)} converges pointwise to (){{\ell}^{(\infty)}}, σ(n+1)=σ(n+1)\sigma^{(n+1)}=\sigma_{\ell^{(n+1)}} converges also pointwise to σ(){{\sigma}^{(\infty)}}. (23) is by lemmas 3.5 and corollary 7.15 applied to d:=σ(n)d:=\sigma^{(n)}, as

1/2σ(n+1)=1/2σ(n)^~¯d(n)^~¯σ(n)^~1/2\sigma_{\ell^{(n+1)}}=1/2\sigma_{{\bar{\tilde{\hat{\ell^{(n)}}}}}}\leq d_{{\bar{\tilde{\hat{\ell^{(n)}}}}}}\leq{\tilde{\hat{\sigma^{(n)}}}}

By definitions 7.1 and 7.4, and because σ(n)\sigma^{(n)} converges pointwise to σ(){{\sigma}^{(\infty)}}, for all ε>0\varepsilon>0 there are a1,a2,x0,a1,0,S¯a_{1},a_{2},x_{0},a_{1,0},\ldots\in\overline{S} and n01n_{0}\geq 1 such that for all nn0n\geq n_{0} we have that

σ()^~(x,y)+3εσ()^(x,a1)+σ()^(a1,a2)++σ()^(am,y)+2ε{\tilde{\hat{{{\sigma}^{(\infty)}}}}}(x,y)+3\varepsilon\geq\hat{{{\sigma}^{(\infty)}}}(x,a_{1})+\hat{{{\sigma}^{(\infty)}}}(a_{1},a_{2})+\ldots+\hat{{{\sigma}^{(\infty)}}}(a_{m},y)+2\varepsilon
σ()(x0,a1,0)++σ()(am,km,ym)+ε\geq{{{\sigma}^{(\infty)}}}(x_{0},a_{1,0})+\ldots+{{{\sigma}^{(\infty)}}}(a_{m,k_{m}}^{\prime},y_{m})+\varepsilon
σ(n)(x0,a1,0)++σ(n)(am,km,ym)\geq{\sigma^{(n)}}(x_{0},a_{1,0})+\ldots+{\sigma^{(n)}}(a_{m,k_{m}}^{\prime},y_{m})
σ(n)^~(x0,a1,0)++σ(n)^~(am,km,ym)σ(n)^~(x,y)\geq{\tilde{\hat{\sigma^{(n)}}}}(x_{0},a_{1,0})+\ldots+{\tilde{\hat{\sigma^{(n)}}}}(a_{m,k_{m}}^{\prime},y_{m})\geq{\tilde{\hat{\sigma^{(n)}}}}(x,y)

where the last inequality is because σ^~(n){\tilde{\hat{\sigma}}}^{(n)} satisfies the \nabla- and Δ\Delta-inequality and so we can estimate back what we have expanded in the first two lines above.

This and (23) yield (24). ∎

8. Homomorphisms

In this section we shift our attention away from the abelian idempotent monoids to their homomorphism sets between them. Thereby we consider only homomorphisms which are “uniformly continuous” similarly as the bounded linear operator between Banach spaces are, see definition 8.8. We are provided by a length on each homomorphism set (analogy: norm on operators between Banach spaces) and apply the procedure of section 7 to obtain new length functions with better theoretical properties. Thereby we enrich this concept with the desired property of product inequalities, that is, we want achieve the validity of inequalities like (UV)(U)(V)\ell(U\circ V)\leq\ell(U)\ell(V) and d(AU,AV)(A)d(U,V)d(A\circ U,A\circ V)\leq\ell(A)d(U,V) for homomorphisms A,UA,U and VV. Similarly as in the last section we end up with a fixed point theorem yielding desired length functions, and apply them to get a Banach-Mazur-like distance between languages.

At first we define homomorphism between languages. In this section let (S,S)(S,\equiv_{S}) and (Q,Q)(Q,\equiv_{Q}) be languages with equivalence relation as in definitions 2.1 and 2.6.

Definition 8.1 (Homomorphism of languages).

A function T:(S,S)(Q,Q)T:(S,\equiv_{S})\rightarrow(Q,\equiv_{Q}) is called a homomorphism between these languages if for all x,ySx,y\in S we have

xy(TxTyandT(xy)=TxTy)x\bot y\quad\Rightarrow\quad\Big{(}Tx\bot Ty\quad\mbox{and}\quad T(x{\vee}y)=Tx{\vee}Ty\Big{)}
xSyTxQTyx\equiv_{S}y\quad\Rightarrow\quad Tx\equiv_{Q}Ty

In this section we almost exclusively return to the abstract setting of abelian idempotent monoids. In what follows let (S¯,,),(Q¯,,)(\overline{S},\nabla,\emptyset),(\overline{Q},\nabla,\emptyset) and (R¯,,)(\overline{R},\nabla,\emptyset) denote any abelian idempotent monoids. Finally, a natural notion for homomorphisms between such monoids is certainly a semigroup or monoid homomorphism:

Definition 8.2.

A function T:S¯Q¯T:\overline{S}\rightarrow\overline{Q} is called a homomorphism if for all x,yS¯x,y\in\overline{S}

T(xy)=TxTyT(x\nabla y)=Tx\nabla Ty

A homomorphism between languages becomes a homomorphism between their quotients:

Lemma 8.3.

If T:(S,S)(Q,Q)T:(S,\equiv_{S})\rightarrow(Q,\equiv_{Q}) is a homomorphism then it descends to a homomorphism T¯:S¯Q¯\overline{T}:\overline{S}\rightarrow\overline{Q} defined by T¯([s])=[T(s)]\overline{T}([s])=[T(s)].

The monoid operation \nabla between abelian idempotent monoids naturally carries over to the set of homomorphisms between them:

Definition 8.4 (Operation on homomorphism set).

For homomorphisms U,V:S¯Q¯U,V:\overline{S}\rightarrow\overline{Q} define the homomorphism

UV:S¯Q¯:(UV)(x):=UxVxU\nabla V:\overline{S}\rightarrow\overline{Q}:(U\nabla V)(x):=Ux\nabla Vx

That is why we get the usual order on the homomorphism set:

Definition 8.5.

We equip hom(S¯,Q¯)\hom(\overline{S},\overline{Q}) with the usual order \leq as of definition 5.1, that is, for all homomorphisms U,V:S¯Q¯U,V:\overline{S}\rightarrow\overline{Q} set

UVV=UVUxVxxS¯U\leq V\quad\Leftrightarrow\quad V=U\nabla V\quad\Leftrightarrow\quad Ux\leq Vx\;\forall x\in\overline{S}
Lemma 8.6.

For any aS¯a\in\overline{S}, Ta:S¯S¯T_{a}:\overline{S}\rightarrow\overline{S} defined by Ta(x)=xaT_{a}(x)=x\nabla a is a (non-unital) homomorphism .

We write the composition UVU\circ V of two homomorphisms U,VU,V also as a product UVUV. The following lemma states the distributive law between the (“additive”) monoid operations and the (“multiplicative”) composition operations on the homomorphism sets.

Lemma 8.7 (Distribution laws).

For all homomorphisms U,V:Q¯R¯U,V:\overline{Q}\rightarrow\overline{R} and X:S¯S¯X:\overline{S}\rightarrow\overline{S} and Y:R¯W¯Y:\overline{R}\rightarrow\overline{W} we have

(UV)X=UXVX(U\nabla V)X=UX\nabla VX
Y(UV)=YUYVY(U\nabla V)=YU\nabla YV

Write \emptyset for the operator :S¯T¯\emptyset:\overline{S}\rightarrow\overline{T} s[]s\mapsto[\emptyset]. A continuous linear operator between Banach spaces is actually uniformly continuous, that is, is bounded. We consider analogously only uniformly continuous, or in other words, bounded homomorphisms between abelian idempotent monoids equipped with metrics in order to get a natural length function on each homomorphism set in analogy to the norm of linear operators between Banach spaces:

Definition 8.8 (Natural length function on homomorphism set).

Let S¯\overline{S} and Q¯\overline{Q} be equipped with positive, symmetric, nilpotent functions dS¯d_{\overline{S}} and dQ¯d_{\overline{Q}}, respectively, satisfying the \nabla-inequality. Write hom(S¯,Q¯)\hom(\overline{S},\overline{Q}) for the set of all “uniformly continuous” homomorphisms U:S¯Q¯U:\overline{S}\rightarrow\overline{Q}, i.e. for which (U)\ell^{\prime}(U) defined next is finite.

We define functions ,:hom(S¯,Q¯)\ell^{\prime},\ell:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}}, where \ell is a length function, by

(U):=inf{M0|M,x,yS¯:dQ¯(Ux,Uy)MdS¯(x,y)}\ell^{\prime}(U):=\inf\{M\geq 0|\,M\in{\mathbb{R}},\;\forall x,y\in\overline{S}:\;d_{\overline{Q}}(Ux,Uy)\leq Md_{\overline{S}}(x,y)\}
(25) :=¯\ell:=\overline{\ell^{\prime}}

In (25) we used the bar operator of definition 7.9. Actually we should index in \ell the data S¯,Q¯\overline{S},\overline{Q} etc., but to avoid cluttering we use the sloppy light notation. If we choose for dQ¯d_{\overline{Q}} the trivial null function then we will get all homomorphisms in hom(S¯,Q¯)\hom(\overline{S},\overline{Q}). For simplicity, we shall call elements of hom(S¯,Q¯)\hom(\overline{S},\overline{Q}) just homomorphisms and drop “uniformly continuous”.

Lemma 8.9.

For all (meaningful composable) homomorphisms U,VU,V we have

(26) (UV)(U)(V)\ell^{\prime}(U\circ V)\leq\ell^{\prime}(U)\ell^{\prime}(V)
(UV)(U)+(V)\ell^{\prime}(U\nabla V)\leq\ell^{\prime}(U)+\ell^{\prime}(V)
Proof.

The first inequality is elementary as for the norm of linear operators, and the second one follows from the \nabla-inequality of dQ¯d_{\overline{Q}}. ∎

We refer to the inequality (26) (for any given \ell^{\prime}) as the product-inequality. It is analogue to the product inequality STST\|ST\|\leq\|S\|\|T\| for bounded linear operators between Banach spaces. We tidy up the situation so far and get the usual abstract setting of abelian idempotent monoids also for the homomorphism sets:

Lemma 8.10 (Abstraction).

The homomorphism set ((hom(S¯,Q¯),,)\big{(}(\hom(\overline{S},\overline{Q}),\nabla,\emptyset\big{)} is an abelian idempotent monoid and \ell of definition 8.8 a length function on it.

Proof.

By definition 8.4 we check the first assertion, by lemma 8.9, \ell^{\prime} is a non-increasing length function, to which we apply lemma 7.10. ∎

For the further discussion we need now consider a whole family of abelian idempotent monoids:

Definition 8.11 (Category of languages).

Let Λ\Lambda be a small category consisting of abelian, idempotent monoids (S¯,,)(\overline{S},\nabla,\emptyset) equipped with positive, symmetric, nilpotent functions dS¯:S¯×S¯d_{\overline{S}}:\overline{S}\times\overline{S}\rightarrow{\mathbb{R}} satisfying the \nabla-inequality as objects, and all possible “uniformly continuous” homomorphisms between these objects as in definition 8.2 with ordinary composition as morphisms.

It is now important to stress that we will in the sequel not use the distance functions dS¯d_{\overline{S}} on the objects at all. We only mentioned it to have a natural length function on the homomorphism sets by definition 8.8. We require now all homomorphism sets to be equipped with length functions and metric candidates:

Definition 8.12 (Category of languages with general length functions on homomorphism sets).

Each morphism set MM (=hom(S¯,Q¯)=\hom(\overline{S},\overline{Q})) in Λ\Lambda assume to be equipped with a length function :M\ell:M\rightarrow{\mathbb{R}} and a positive, symmetric, nilpotent function d:M×Md:M\times M\rightarrow{\mathbb{R}}. (For example the length functions \ell of definition 8.8, and d:=dd:=d_{\ell} of definition 2.12 are the typical natural choices.) For simplicity, all \ell and dd are called in the same way, independent of MM.

Our next aim is to modify the length functions on the homomorphism sets in such a way such that we (just) get the natural inequality d(AU,AV)(A)d(U,V)d(A\circ U,A\circ V)\leq\ell(A)d_{\ell}(U,V) for all composable homomorphisms A,U,VA,U,V in analogy to the inequality AUAVAUV\|AU-AV\|\leq\|A\|\|U-V\| for linear operators between Banach spaces. Simultaneously we alter \ell as to achieve that dd_{\ell} satisfies the Δ\Delta- and \nabla-inequality. To this end we apply the fix point concept of section 7 to the homomorphism sets, enriched by the now also desired product inequality.

For the rest of this section we will define various new \ells and dds simultaneously for all morphism sets MM in Λ\Lambda, without indicating MM in notation. We again note that we may choose all functions dS¯d_{\overline{S}} to be the trivial null functions and any \ell and dds in definition 8.12 independent of the suggested \ells of definition 8.8.

We repair now all given symmetric nilpotent positive functions dd on the homomorphism sets in such a way that they satisfy the product inequalities with respect to the length functions:

Definition 8.13 (New submultiplicative distance function).

For all objects S¯\overline{S} and Q¯\overline{Q} in Λ\Lambda define d˙:hom(S¯,Q¯)×hom(S¯,Q¯)\dot{d}:\hom(\overline{S},\overline{Q})\times\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}} as follows. For all morphisms X,Y:S¯Q¯X,Y:\overline{S}\rightarrow\overline{Q} in Λ\Lambda with same domain and range set

d˙(X,Y):=infY=A1AnVB1.BmX=A1AnUB1.Bm,,n,m0(A1)(An)d(U,V)(B1).(Bm)\dot{d}(X,Y):=\inf_{\stackrel{{\scriptstyle X=A_{1}...A_{n}UB_{1}....B_{m},\;}}{{Y=A_{1}...A_{n}VB_{1}....B_{m}}},\;n,m\geq 0}\ell(A_{1})...\ell(A_{n})d(U,V)\ell(B_{1})....\ell(B_{m})

where Ai,V,U,BiA_{i},V,U,B_{i} are morphisms between any objects of Λ\Lambda for which their products as indicated are valid (composable).

The so repaired functions d˙\dot{d} now fulfill the desired product inequalities:

Lemma 8.14 (Product inequality).

For composable morphisms A,B,XA,B,X in Λ\Lambda we have the submultiplicativity relations (often called product-inequality)

(27) d˙(AX,BX)d˙(A,B)(X)\dot{d}(AX,BX)\leq\dot{d}(A,B)\ell(X)
(28) d˙(XA,XB)(X)d˙(A,B)\dot{d}(XA,XB)\leq\ell(X)\dot{d}(A,B)
Proof.

Inserting in definition 8.13 we have

d˙(QX,RX):=infRX=A1AnVB1.BmQX=A1AnUB1.Bm,(A1)(An)d(U,V)(B1).(Bm)\dot{d}(QX,RX):=\inf_{\stackrel{{\scriptstyle QX=A_{1}...A_{n}UB_{1}....B_{m},\;}}{{RX=A_{1}...A_{n}VB_{1}....B_{m}}}}\ell(A_{1})...\ell(A_{n})d(U,V)\ell(B_{1})....\ell(B_{m})
infR=A1AnVB1.BmQ=A1AnUB1.Bm,(A1)(An)d(U,V)(B1).(Bm)(X)\leq\inf_{\stackrel{{\scriptstyle Q=A_{1}...A_{n}UB_{1}....B_{m},\;}}{{R=A_{1}...A_{n}VB_{1}....B_{m}}}}\ell(A_{1})...\ell(A_{n})d(U,V)\ell(B_{1})....\ell(B_{m})\ell(X)

because we choose the infimum over a smaller set in the second line, that is, we at first chose a resolution Q=A1AnUB1.BmQ=A_{1}...A_{n}UB_{1}....B_{m} and then said QXA1AnUB1.BmXQXA_{1}...A_{n}UB_{1}....B_{m}X. ∎

The validity of relations (27) and (28) for all morphism sets of Λ\Lambda is summarized as product inequality. It is important that d˙\dot{d} descends:

Lemma 8.15.

d˙\dot{d} is a positive, symmetric nilpotent function with

d˙d\dot{d}\leq d
Lemma 8.16.

d˙\dot{d} satisfies the product-inequality, and d=d˙d=\dot{d} if and only if dd satisfies the product-inequality.

Lemma 8.17.

If dd satisfies the product inequality then also :hom(S¯,Q¯)\ell:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}} the one of (26) in case that (x)=d(x,)\ell(x)=d(x,\emptyset).

We proceed now analogously and thus faster as in section 7, and use various definitions from there, but now applied to the homomorphism sets rather than to the object sets of Λ\Lambda. In what follows we use definitions 7.1, 7.4, 7.9 and 7.11 applied to the homomorphism sets. In analogy to lemma 7.7 we can now repair dd in such a way that we get all the desired properties we want:

Lemma 8.18.

For d˙\dot{d} being any positive, symmetric, nilpotent functions satisfying the product-inequality on each morphism set of Λ\Lambda we have:

d˙^\hat{\dot{d}} satisfies the \nabla- and product-inequality.

d˙~\tilde{\dot{d}} satisfies the Δ\Delta- and product-inequality.

d˙^~\tilde{\hat{\dot{d}}} satisfies the \nabla-, Δ\Delta- and product-inequality.

Proof.

By the distribution laws of lemma 8.7 and the supposed validity of the product inequalities as stated in lemma 8.14 we have

d˙^(QX,RX)=infQX=i=1nRi,RX=i=1nRii=1nd˙(Qi,Ri)\hat{\dot{d}}(QX,RX)=\inf_{QX=\nabla_{i=1}^{n}R_{i},RX=\nabla_{i=1}^{n}R_{i}}\sum_{i=1}^{n}{\dot{d}}(Q_{i},R_{i})
infQ=i=1nQi,R=i=1nRii=1nd˙(QiX,RiX)d˙^(Q,R)(X)\leq\inf_{Q=\nabla_{i=1}^{n}Q_{i},R=\nabla_{i=1}^{n}R_{i}}\sum_{i=1}^{n}{\dot{d}}(Q_{i}X,R_{i}X)\leq\hat{\dot{d}}(Q,R)\ell(X)
d˙~(QX,RX)=infAid˙(QX,A1)+d˙(A1,A2)+.+d˙(An,RX)\tilde{\dot{d}}(QX,RX)=\inf_{A_{i}}{\dot{d}}(QX,A_{1})+{\dot{d}}(A_{1},A_{2})+....+{\dot{d}}(A_{n},RX)
infAid˙(QX,A1X)+d˙(A1X,A2X)+.+d˙(AnX,RX)d˙~(Q,R)(X)\leq\inf_{A_{i}}{\dot{d}}(QX,A_{1}X)+{\dot{d}}(A_{1}X,A_{2}X)+....+{\dot{d}}(A_{n}X,RX)\leq\tilde{\dot{d}}(Q,R)\ell(X)

Hence d˙^\hat{\dot{d}} and d˙~\tilde{\dot{d}} fulfill the product inequalities, and the rest follows from lemmas 7.2, 7.5 and 7.7

Lemma 8.19.

dd_{\ell} satisfies the product-inequality if and only if σ\sigma_{\ell} satisfies it.

Proof.

By lemma 5.4 we may express σ\sigma_{\ell} by dd_{\ell}, and vice versa, and from this it is easily deduced. ∎

We introduce now the following functions:

Definition 8.20.

Define positive functions ,˙,˙^~:hom(S¯,Q¯)\ell,\dot{\ell},{\tilde{\hat{\dot{\ell}}}}:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}} depending on dd (recall that we do this for each morphism set (M,d)(M,d) of Λ\Lambda separately) by

(x):=d(x,),˙(x):=d˙(x,),˙^~(x):=d˙^~(x,)\ell(x):=d(x,\emptyset),\;\dot{\ell}(x):=\dot{d}(x,\emptyset),\;{\tilde{\hat{\dot{\ell}}}}(x):={\tilde{\hat{\dot{d}}}}(x,\emptyset)

To get finally convergence, it is fundamental that the new metric candidates descend:

Corollary 8.21.

Using definitions 8.20 and 7.9, we have

d˙^~d˙^~,d˙^~¯d˙^~d_{{{\tilde{\hat{\dot{\ell}}}}}}\leq{\tilde{\hat{\dot{d}}}},\qquad d_{\overline{{\tilde{\hat{\dot{\ell}}}}}}\leq{\tilde{\hat{\dot{d}}}}
Proof.

Apply corollary 7.15 to d:=d˙d:=\dot{d} and :=˙\ell:=\dot{\ell}. ∎

We do now an analogous procedure as in definition 7.16, but now for the homomorphism sets instead of the object sets:

Definition 8.22.

For all object sets S¯\overline{S} and Q¯\overline{Q} of Λ\Lambda set the start points d(1):=d:hom(S¯,Q¯)×hom(S¯,Q¯)d^{(1)}:=d:\hom(\overline{S},\overline{Q})\times\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}} and (1):=:hom(S¯,Q¯)\ell^{(1)}:=\ell:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}} (def. 8.12), and then successively define (n)˙^~,(n+1):hom(S¯,Q¯){\tilde{\hat{\dot{\ell^{(n)}}}}},\ell^{(n+1)}:\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}} and d(n+1):hom(S¯,Q¯)×hom(S¯,Q¯)d^{(n+1)}:\hom(\overline{S},\overline{Q})\times\hom(\overline{S},\overline{Q})\rightarrow{\mathbb{R}} (n1n\geq 1) by

(n)˙^~(x):=d(n)˙^~(x,){\tilde{\hat{\dot{\ell^{(n)}}}}}(x):={\tilde{\hat{\dot{d^{(n)}}}}}(x,\emptyset)
(n+1):=(n)˙^~¯\ell^{(n+1)}:={\overline{{\tilde{\hat{\dot{\ell^{(n)}}}}}}}
d(n+1):=d(n+1)d^{(n+1)}:=d_{\ell^{(n+1)}}

where d(n)˙\dot{d^{(n)}} is defined as in definition 8.13 with respect to d(n){d^{(n)}} and (n){\ell^{(n)}} (instead of dd and \ell).

The successively applied procedure to a length function as described in the last definition converges again, analogously as in the theorem 7.21, to a wished fixed point:

Theorem 8.23 (Ideal length functions on homomorphism sets).

Assume the definitions 8.11, 8.12, 8.22 and form the limits ():=lim(n){{\ell}^{(\infty)}}:=\lim\ell^{(n)} and d():=limd(n){{d}^{(\infty)}}:=\lim d^{(n)} with respect to all morphism sets MM of Λ\Lambda.

Then for all morphism sets MM of Λ\Lambda, (n+1)\ell^{(n+1)} and (){{\ell}^{(\infty)}} are length functions on MM fulfilling the product-inequality (26), d()d_{{{\ell}^{(\infty)}}} and σ()\sigma_{{{\ell}^{(\infty)}}} are pseudometrics on MM fulfilling the \nabla- and product-inequality, and moreover and more precisely

d()=d()=limnd(n)=d()˙^~=limnd(n)˙^~=limnd(n)˙^~=limnd(n)d^{(\infty)}=d_{{{\ell}^{(\infty)}}}=\lim_{n\rightarrow\infty}d_{\ell^{(n)}}={\tilde{\hat{\dot{{{d}^{(\infty)}}}}}}=\lim_{n\rightarrow\infty}{\tilde{\hat{\dot{d_{\ell^{(n)}}}}}}=\lim_{n\rightarrow\infty}{\tilde{\hat{\dot{d^{(n)}}}}}=\lim_{n\rightarrow\infty}{d^{(n)}}
()=()˙=()˙^~=()˙^~¯=limn(n)=limn(n)˙^~¯\qquad{{\ell}^{(\infty)}}=\dot{{{\ell}^{(\infty)}}}={\tilde{\hat{\dot{{{\ell}^{(\infty)}}}}}}={\bar{\tilde{\hat{\dot{{{\ell}^{(\infty)}}}}}}}=\lim_{n\rightarrow\infty}\ell^{(n)}=\lim_{n\rightarrow\infty}{\bar{\tilde{\hat{\dot{\ell^{(n)}}}}}}
d()(UX,VX)d()(U,V)()(X){d_{{{\ell}^{(\infty)}}}}(UX,VX)\leq{d_{{{\ell}^{(\infty)}}}}(U,V){{{\ell}^{(\infty)}}}(X)
d()(XU,XV)()(X)d()(U,V){d_{{{\ell}^{(\infty)}}}}(XU,XV)\leq{{{\ell}^{(\infty)}}}(X){d_{{{\ell}^{(\infty)}}}}(U,V)

for all composable morphisms X,UX,U and VV of Λ\Lambda.

Proof.

We have the analogy of lemma 7.17, that is, descending sequences

(n+2)=(n+1)˙^~¯(n+1),d(n+1)=d(n+1)d(n)˙^~d(n)\ell^{(n+2)}=\overline{{\tilde{\hat{\dot{\ell^{(n+1)}}}}}}\leq\ell^{(n+1)},\qquad d^{(n+1)}=d_{\ell^{(n+1)}}\leq{\tilde{\hat{\dot{d^{(n)}}}}}\leq d^{(n)}

by an analogous proof but by employing corollary 8.21 instead of 7.15 , and so their limits d(){{d}^{(\infty)}} and (){{\ell}^{(\infty)}} and their stated identities.

Similar proofs of lemmas 7.19 and 7.20 then yield the first identities.

The last stated product inequalities with respect to the pair (d(),())({{d}^{(\infty)}},{{\ell}^{(\infty)}}) follow then simply by taking the pointwise limits of the valid product-inequalities for the pairs (d(n)˙^~,(n))({\tilde{\hat{\dot{d^{(n)}}}}},{\ell^{(n)}}) by definition of d(n)˙\dot{d^{(n)}} in definition 8.22 and lemma 8.18. ∎

This is the analogy to proposition 7.22. It shows, for example, when the procedure of definition 8.22 stops:

Proposition 8.24 (Ideal length functions on homomorphism sets attained).

If d=dd=d_{\ell} for all morphism sets of Λ\Lambda, then all dd_{\ell} satisfy the \nabla-, Δ\Delta- and product-inequality \Leftrightarrow

d=d˙^~˙^~=˙^~¯==()d_{\ell}={\tilde{\hat{\dot{d_{\ell}}}}}\quad\Leftrightarrow\quad{\tilde{\hat{\dot{\ell}}}}=\ell\quad\Leftrightarrow\quad{\bar{\tilde{\hat{\dot{\ell}}}}}=\ell\quad\Leftrightarrow\quad\ell={{\ell}^{(\infty)}}

where each item is understood to hold for all morphism sets of Λ\Lambda.

Proof.

This is completely analogously proved as proposition 7.22, using corollary 8.21 and theorem 8.23 instead of 7.15 and 7.20, respectively. ∎

We finally may turn the obtained pseudometrics to metrics and notice:

Corollary 8.25 (Quotient category).

By def. 5.7, for each objects S¯,Q¯\overline{S},\overline{Q} in Λ\Lambda, we may then divide out elements in hom(S¯,Q¯){\hom(\overline{S},\overline{Q})} to obtain a metric space hom(S¯,Q¯)ˇ{\check{\hom(\overline{S},\overline{Q})}}, which is again an abelian idempotent monoid, with metric d()ˇ=d()ˇ{\check{d_{{{\ell}^{(\infty)}}}}}=d_{{\check{{{\ell}^{(\infty)}}}}} satisfying the \nabla- and product-inequality with respect to ()ˇ\check{{{\ell}^{(\infty)}}}.

These quotients are compatible with composition, i.e. if X1,X2hom(S¯,Q¯)X_{1},X_{2}\in\hom(\overline{S},\overline{Q}) with [X1]=[X2][X_{1}]=[X_{2}] (class brackets) and Y1,Y2hom(Q¯,R¯)Y_{1},Y_{2}\in\hom(\overline{Q},\overline{R}) with [Y1]=[Y2][Y_{1}]=[Y_{2}] then

[X1][Y1]:=[X1Y1]=[X2Y2][X_{1}][Y_{1}]:=[X_{1}\circ Y_{1}]=[X_{2}\circ Y_{2}]

In this way we obtain a category Λˇ\check{\Lambda} out of Λ\Lambda by replacing the morphism sets MM by Mˇ\check{M}.

Analogous distribution formulas as in lemma 8.7 hold in Λˇ\check{\Lambda}.

Proof.

The first assertions follow by definition 5.7 and proposition 5.8.

For d:=d()d:=d_{{{\ell}^{(\infty)}}} and :=()\ell:={{\ell}^{(\infty)}} we have

d(X1Y1,X2Y2)d(X1Y1,X1Y2)+d(X1Y2,X2Y2)d(X_{1}Y_{1},X_{2}Y_{2})\leq d(X_{1}Y_{1},X_{1}Y_{2})+d(X_{1}Y_{2},X_{2}Y_{2})
(X1)d(Y1,Y2)+d(X1,X2)(Y2)=0\leq\ell(X_{1})d(Y_{1},Y_{2})+d(X_{1},X_{2})\ell(Y_{2})=0

by the Δ\Delta- and product-inequality for dd and \ell found out in theorem 8.23. ∎

The next lemma indicates how we could show that a homomorphism is uniformly continuous with respect to various derived metric candidates.

Lemma 8.26.

Let U:S¯Q¯U:\overline{S}\rightarrow\overline{Q} be a homomorphism. If there is an M0M\geq 0 such that

dQ¯(Ua,Ub)MdS¯(a,b)d_{\overline{Q}}(Ua,Ub)\leq Md_{\overline{S}}(a,b)

for all aba\leq b, then this inequality holds for all a,bS¯a,b\in\overline{S}. Then also

dQ¯^~(Ua,Ub)MdS¯^~(a,b){\tilde{\hat{d_{\overline{Q}}}}}(Ua,Ub)\leq M{\tilde{\hat{d_{\overline{S}}}}}(a,b)

Analogous assertions hold for σ\sigma instead of dd.

Proof.

The first assertion is by the formulas of lemma 5.4, and the second one follows directly from definitions 7.1 and 7.4. ∎

We may apply the last theorem to get a Banach-Mazur-like distance between languages:

Definition 8.27 (Banach-Mazur-like distance between languages).

Let hom(S¯,Q¯)\hom(\overline{S},\overline{Q}) be equipped with a length function \ell. Then a Banach-Mazur-like distance between S¯\overline{S} and Q¯\overline{Q} (e.g. languages) may be defined as follows:

d(S¯,Q¯)=loginf{(ϕ)(ϕ1)|ϕ:S¯Q¯ an isomorphism}d(\overline{S},\overline{Q})=\log\inf\{\ell(\phi)\ell(\phi^{-1})\in{\mathbb{R}}|\;\phi:\overline{S}\rightarrow\overline{Q}\mbox{ an isomorphism}\}
Lemma 8.28.

This distance is a pseudometric on any category of isomorphic abelian, idempotent monoids such that the hom-sets hom(S¯,Q¯)\hom(\overline{S},\overline{Q}) are provided with a length function \ell satisfying the product-inequality (26).

The length functions ()\ell^{(\infty)} of theorem 8.23 may be good candidates for the Banach-Mazur-like distance as they satisfy the product inequalities (26) (indeed, apply the inequalities of theorem 8.23 and note that d()(x,)=()(x){d_{{{\ell}^{(\infty)}}}}(x,\emptyset)={{\ell}^{(\infty)}}(x)). Alternatively one may start with definitions 8.11 and 8.12 with d=dd=d_{\ell}, and then using \ell of lemma 8.17.

9. Non-increasing length functions

In this section we work with length functions which may not be monotonically increasing, that is, without requiring inequality (9). The length function of definition 2.10 may be replaced by the following possibly non-increasing bigger one:

Definition 9.1 (Length function on quotient).

Given a language SS with variables and a length function L:SL:S\rightarrow{\mathbb{R}} on it, set

:S¯\ell:\overline{S}\rightarrow{\mathbb{R}}
([s])=inf{L(t)|tS,st}\ell([s])=\inf\{L(t)\in{\mathbb{R}}|\,t\in S,\;s\equiv t\}

Now assume that we are given an abelian, idempotent monoid (S¯,,)(\overline{S},\nabla,\emptyset) with a non-increasing length function \ell on it (i.e. without postulating (9)).

In this section we do the same limit as in definitions 7.16 and 7.18, but instead of implementing the bar operator of lemma 7.10 we choose other functions dd_{\ell} and σ\sigma_{\ell} as follows, which do not require the monotone increasingness of \ell and are still positive:

Definition 9.2.

We define for all x,yS¯x,y\in\overline{S} and p1p\geq 1 (pp\in{\mathbb{R}})

d(x,y):=|(xy)(x)||(xy)(y)|d_{\ell}(x,y):=|\ell(x\nabla y)-\ell(x)|{\vee}|\ell(x\nabla y)-\ell(y)|
σp,(x,y):=(|(xy)(x)|p+|(xy)(y)|p)1/p\sigma_{p,\ell}(x,y):=\Big{(}|\ell(x\nabla y)-\ell(x)|^{p}+|\ell(x\nabla y)-\ell(y)|^{p}\Big{)}^{1/p}

We set σ:=σ1,\sigma_{\ell}:=\sigma_{1,\ell} for short. We use the same notations dd and σ\sigma as in definition 2.12, as notice, there is no difference between definitions 9.2 and 2.12 for monotonically increasing length functions \ell. We still have the lower bound |(x)(y)|σ|\ell(x)-\ell(y)|\leq\sigma_{\ell} for σ\sigma_{\ell}, but not so for dd_{\ell}.

Let us be given a positive, symmetric, nilpotent function dd on S¯\overline{S}. We are going to use definitions 7.1, 7.4 and 7.9. Define \ell as in definition 7.11.

Lemma 9.3.

If dd satisfies the Δ\Delta- and \nabla-inequality then for \ell as in (20),

ddd_{\ell}\leq d
d¯dd_{\overline{\ell}}\leq d
Proof.

Yielded by a similar proof as for lemma 7.14. ∎

We do now a procedure as in definition 7.16, but with the bar operator now being ignored:

Definition 9.4.

Set d(1):=dd^{(1)}:=d, and then for n1n\geq 1 put

(n+1)(x):=(n)^~(x):=d(n)^~(x,)\ell^{(n+1)}(x):={\tilde{\hat{\ell^{(n)}}}}(x):={\tilde{\hat{d^{(n)}}}}(x,\emptyset)
d(n+1):=d(n+1)d^{(n+1)}:=d_{\ell^{(n+1)}}
Corollary 9.5.

Given a positive, symmetric, idempotent function dd on an abelian, idempotent monoid (S¯,,)(\overline{S},\nabla,\emptyset), define (n)\ell^{(n)} and d(n)d^{(n)} as in definitions 9.4 and 9.2.

We get limits (){{\ell}^{(\infty)}} and d(){{d}^{(\infty)}} as in definition 7.18, and the analogous assertion of lemma 7.19. That is, (){{\ell}^{(\infty)}} is a non-increasing length function on S¯\overline{S}, and d()=d(){{d}^{(\infty)}}=d_{{{\ell}^{(\infty)}}} is a pseudometric on S¯\overline{S} satisfying the \nabla-inequality.

Proof.

By lemma 9.3, applied to d:=d(n)^~d:={\tilde{\hat{d^{(n)}}}} , we get

d(n+1)=d(n)^~d(n)^~d(n)d^{(n+1)}=d_{{\tilde{\hat{\ell^{(n)}}}}}\leq{\tilde{\hat{d^{(n)}}}}\leq{d^{(n)}}

So d(n)d^{(n)} is decreasing, in particular thus also (n)\ell^{(n)} by definition 9.4, and we get a limit ():=limn(n){{\ell}^{(\infty)}}:=\lim_{n}\ell^{(n)}. The rest goes verbatim as in lemmas 7.19 and 7.20. ∎

Proposition 9.6.

If dd is of the form d=dd=d_{\ell} for a non-increasing length function \ell then dd_{\ell} satisfies the \nabla- and Δ\Delta-inequality \Leftrightarrow

d=d^~^~==()d_{\ell}={\tilde{\hat{d_{\ell}}}}\quad\Leftrightarrow\quad{\tilde{\hat{\ell}}}=\ell\quad\Leftrightarrow\quad\ell={{\ell}^{(\infty)}}
Proof.

This is analogously proved as proposition 7.22. ∎

However, it cannot be claimed that ()\ell^{(\infty)} is increasing.

Also, contrary to theorem 7.21, δ\delta of definition 6.2 with respect to (){{\ell}^{(\infty)}} might not be increasing and σ(){{\sigma}^{(\infty)}} might not satisfy the \nabla- and Δ\Delta-inequality, as all the proofs given rely on the monotonically increasingness of (){{\ell}^{(\infty)}}.

We could now form ()¯\overline{{{\ell}^{(\infty)}}}, define dd_{\ell} as above and restart the above procedure, or the one of section 7. There seem to be endless many variants. A principal problem is, however this can be said, that 2\ell\leq\ell_{2} might not imply dd2d_{\ell}\leq d_{\ell_{2}}, so one might not be able to order the limits by \ell.

10. General examples

In this and the remaining sections, we assume that SS is the mathematical language of technics and physics, but also of typical mathematics. We may use quantors, set-definitions, but over all sentences need not to be completely defined in a rigorous mathematical way. This may also not be useful. In finding a new theory, it may be better to let the precise meaning open. For example, a quantum field theory may be singular and yield infinities, but after renormalization the situation turns, see the various textbooks about quantum field theory, for example [22].

We distinguish variables in sentences which are unbound, like xx in x+2=yx+2=y, and bound ones like xx in f(x):=x+2f(x):=x+2. Here, ff is a function which is being defined, and in a(y):=2+f(y)a(y):=2+f(y) the function ff is being called. We sometimes underline variables (like so: x¯\underline{x}) to indicate that these are main variables.

For the length function LL on SS we choose a simple character count (i.e. L(a1an)=nL(a_{1}\ldots a_{n})=n for n0,aiAn\geq 0,a_{i}\in A), without counting non-notated brackets, even if they are theoretically there (as in xyx^{y}), and set L(:=)=1L(:=)=1, L()=0L({\vee})=0. For example, L(xy(a+b)2)=8L(x^{y}{\vee}(a+b)^{2})=8. The sign {\vee} is always understood to bind most weakly (for example, a+bca+b{\vee}c means (a+b)c(a+b){\vee}c). The sign \approx means approximately equal between two real numbers, and its more precise meaning depends (for example, 9010090\approx 100 for integers). We note that for most of the discussion this choice of LL is essentially irrelevant, and one may as well choose any other length function LL, or the length function =()\ell={{\ell}^{(\infty)}} of theorem 7.21, but to get specific numbers in certain examples we choose the character count for simplicity.

From now on, everything is to be understood to be very sloppy, inexact, vague and superficial!

Example 10.1 (Main and auxiliary variables).

Main variables cannot be deleted by (6). They are what the theory is about, for example a fundamental field ψ¯\underline{\psi} in physics, or the entry point main¯\underline{main} in C++. Auxiliary variables help to simplify and shorten expressions, typically functions which are often called. For example

x¯=a+ba=3b=5x¯=8a=3b=5x¯=8\underline{x}=a+b{\vee}a=3{\vee}b=5\qquad\equiv\qquad\underline{x}=8{\vee}a=3{\vee}b=5\qquad\equiv\qquad\underline{x}=8
f¯(x)=g(x)2+g(x)3g(x)=2xf¯(x)=4x2+8x3\underline{f}(x)=g(x)^{2}+g(x)^{3}{\vee}g(x)=2x\qquad\equiv\qquad\underline{f}(x)=4x^{2}+8x^{3}

by (3) and (6).

Example 10.2 (Length function \ell is not directly subadditive on the level of SS).

Given x,ySx,y\in S, in general we have

([xy])([x])+([y])\ell([x{\vee}y])\not\leq\ell([x])+\ell([y])

Indeed, if yy is longer and contains only auxiliary variables and xx is short with main variables and uses the variables of yy (typically by function calls, that is, xx outsources the bulk of formulas to yy as auxilliary functions) then, by (6) one has yy\equiv\emptyset, and thus we got a contradiction

0([xy])([x])+([y])=([x])00\ll\ell([x{\vee}y])\leq\ell([x])+\ell([y])=\ell([x])\approx 0

The next two examples show that even if the class elements [xyϕ(x)][xy][x{\vee}y{\vee}\phi(x)]\neq[x{\vee}y], their lengths are approximately equal.

Example 10.3 (Copies are not superfluous in general).

Even if ϕ(x)x,ϕ(x)y\phi(x)\bot x,\phi(x)\bot y for x,yS,ϕPSx,y\in S,\phi\in P_{S}, then - contrary to (4) which says that xϕ(x)xx{\vee}\phi(x)\equiv x because ϕ(x)\phi(x) is a superfluous copy of xx - we still may have

(29) xyϕ(x)xyx{\vee}y{\vee}\phi(x)\not\equiv x{\vee}y

Indeed, set x:={f¯:=a}x:=\{\underline{f}:=a\} and y:={a(x):=2x}y:=\{a(x):=2x\}, then ϕ(x)={g¯:=b}\phi(x)=\{\underline{g}:=b\} but it cannot equivalently be deleted in

xyϕ(x)=(f¯:=aa(x):=2xg¯:=b)x{\vee}y{\vee}\phi(x)\quad=\quad\Big{(}\underline{f}:=a\quad{\vee}\quad a(x):=2x\quad{\vee}\quad\underline{g}:=b\Big{)}

because g¯:=b\underline{g}:=b is a different theory than f¯:=a\underline{f}:=a because aa is further determined, whereas bb is not.

Even if (29) is an inequality in general, it becomes approximately an identity when put into \ell, because ϕ(x)\phi(x) is just a copy of xx:

Example 10.4 (Copies are cheap).

If x,yS,ϕPSx,y\in S,\phi\in P_{S} and ϕ(x)x,ϕ(x)y\phi(x)\bot x,\phi(x)\bot y then we have

([xyϕ(x)])([xy])\ell([x{\vee}y{\vee}\phi(x)])\approx\ell([x{\vee}y])

In other words, if we take a theory xyx{\vee}y, and add to it a copy ϕ(x)\phi(x) (with other variable names) of the subtheory xx, then the length (= cost) of the outcoming theory xyϕ(x)x{\vee}y{\vee}\phi(x) does not much change. This is clear for computer programs because in ϕ(x)\phi(x) we just need to call all the routines of xx and need not to rewrite them twice in ϕ(x)\phi(x). Indeed, going back to the concrete example 10.3 we may write

xyϕ(x)(F(u,v):=uvF(f¯,a)=0yF(g¯,b)=0)x{\vee}y{\vee}\phi(x)\quad\equiv\quad\Big{(}F(u,v):=u-v\quad{\vee}\quad F(\underline{f},a)=0\quad{\vee}\quad y\quad{\vee}\quad F(\underline{g},b)=0\Big{)}

That is, instead of xx we wrote F(u,v):=u¯vF(f¯,a)=0F(u,v):=\underline{u}-v{\vee}F(\underline{f},a)=0, and instead of ϕ(x)\phi(x) F(g¯,b)=0F(\underline{g},b)=0. So we have one long definition of FF, approximately as long as xx itself, and two short calls F(f¯,a)=0F(\underline{f},a)=0 and F(g¯,b)=0F(\underline{g},b)=0. So we may end up with the claim:

([xyϕ(x)])([F(u,v):=u¯vF(f¯,a)=0y])+([F(g¯,b)=0])([xy])\ell([x{\vee}y{\vee}\phi(x)])\approx\ell([F(u,v):=\underline{u}-v{\vee}F(\underline{f},a)=0{\vee}y])+\ell([F(\underline{g},b)=0])\approx\ell([x{\vee}y])

In this short example it may not pay off, because the length of xx is so short, but if xx is long with few unbound variables, certainly.

By definition 2.8, the \nabla-operation requires that [x][y]=[xy][x]\nabla[y]=[x{\vee}y] if xx and yy have no variables in common. The next lemma shows when this identity approximately holds in the quotient Sˇ{\check{S}}, see definition 5.7, even if there are common variables.

Lemma 10.5 (\nabla-operation as an idealization of {\vee}-operation).

Here, \precsim means approximately lower or equal. For all x,ySx,y\in S we have

([xy])([x][y])d([xy],[x][y])0\ell([x{\vee}y])\precsim\ell([x]\nabla[y])\quad\Rightarrow\quad d([x{\vee}y],[x]\nabla[y])\approx 0
Proof.

Indeed, two-fold application of example 10.4 yields

d([xy],[x][y])([xyϕ(x)ψ(y)])([xy])d([x{\vee}y],[x]\nabla[y])\approx\ell([x{\vee}y{\vee}\phi(x){\vee}\psi(y)])-\ell([x{\vee}y])
([xy])([xy])=0\approx\ell([x{\vee}y])-\ell([x{\vee}y])=0

The next two important lemmas about auxiliary and main variables are exactly proved in the general framework of section 2.

Lemma 10.6 (The more main variables the longer the sentence).

If sSs\in S and tSt\in S a copy of it, but where some auxiliary variables are declared to main variables, then

([s])([t])\ell([s])\leq\ell([t])
Proof.

In our natural definition of SS there is no difference between main and auxiliary variables besides the rule (6), which can only help to reduce the \ell-length of a sentence, whence the inequality. ∎

The next lemma, which is an important assertion that holds actually in the general framework of section 2 as well, says that in an optimal sentence we can change all auxiliary variables to main variables without changing the element in the quotient Sˇ{\check{S}}:

Lemma 10.7 (Optimal sentences and main variables).

If xSx\in S is optimal in the sense that ([x])=L(x)\ell([x])=L(x), and xx^{\prime} is a copy of the sentence xx but where all unbound auxiliary variables are replaced by main variables (in a bijective way) then

d([x],[x])=0d([x],[x^{\prime}])=0
[x][x] in S¯[[x]]=[[y]] in Sˇ[x]\leq[x^{\prime}]\mbox{ in }\overline{S}\qquad[[x]]=[[y]]\mbox{ in }{\check{S}}
Proof.

Choose a variable transformation ϕPS\phi\in P_{S} such that the copy ϕ(x)\phi(x) of xx is disjoint to xx and xx^{\prime}, i.e. ϕ(x)x\phi(x)\bot x and ϕ(x)x\phi(x)\bot x^{\prime}.

Let zz be a verbatim copy of ϕ(x)\phi(x) but where all unbound auxiliary variables are declared to main variables. Then by lemma 10.6, ([ϕ(x)x])([zx])\ell([\phi(x){\vee}x^{\prime}])\leq\ell([z{\vee}x^{\prime}]). By law (4), zz is just another variable transformed copy of xx^{\prime} and thus zxxz{\vee}x^{\prime}\equiv x^{\prime}. Hence, by assumption we get

([x][x)])=([ϕ(x)x])([zx])=([x])\ell([x]\nabla[x^{\prime})])=\ell([\phi(x){\vee}x^{\prime}])\leq\ell([z{\vee}x^{\prime}])=\ell([x^{\prime}])
L(x)=L(x)=([x])([x][x)])\leq L(x^{\prime})=L(x)=\ell([x])\leq\ell([x]\nabla[x^{\prime})])

Thus d([x],[x])=0d([x],[x^{\prime}])=0 by definition 2.12. ∎

The following example illustrates the last lemma:

Example 10.8 (Changing auxiliary to main variables).

(i) Consider the sentence αS\alpha\in S defined by

α:=(f¯:=sinu+cosuu(x):=5x9+2x3+77x2+1)\alpha:=\quad\Big{(}\underline{f}:=\sin\circ u+\cos\circ u\quad{\vee}\quad u(x):=5x^{9}+2x^{3}+77x^{2}+1\Big{)}

Let α\alpha^{\prime} be a verbatim copy of α\alpha but where each uu is replaced by u¯\underline{u}, so a main variable. As it appears, the representations of α\alpha and α\alpha^{\prime} are already shortest possible and thus

(30) ([α])=L(α)=L(α)=([α])\ell([\alpha])=L(\alpha)=L(\alpha^{\prime})=\ell([\alpha^{\prime}])
([α][α])=([αg¯(x):=sin(u¯(x))+cos(u¯(x))])=([α])\ell([\alpha^{\prime}]\nabla[\alpha])=\ell([\alpha^{\prime}\;{\vee}\;\underline{g}(x):=\sin(\underline{u}(x))+\cos(\underline{u}(x))])=\ell([\alpha])

because g¯\underline{g} is a superfluous copy of f¯\underline{f} by (4), and where we have dropped the definition of uu in α\alpha by (6) and taken u¯\underline{u} from α\alpha^{\prime} instead. Thus

d([α],[α])=([αα])([α])=0d([\alpha],[\alpha^{\prime}])=\ell([\alpha{\vee}\alpha^{\prime}])-\ell([\alpha])=0

Because of (30), this is in accordance with lemma 10.7.

(ii) But if uu is a short polynomial, say u(x)=2xu(x)=2x, then α\alpha can be simplified by saying [α]=[g(x):=sin(2x)+cos(2x)][\alpha]=[g(x):=\sin(2x)+\cos(2x)] (so α\alpha is not optimal, i.e. ([α])L(α)\ell([\alpha])\neq L(\alpha)), but α\alpha^{\prime} can be not, as u¯\underline{u} is a main variable and so cannot be deleted, and so

d([α],[α])=([αα])([α])=([α])([α])>0d([\alpha],[\alpha^{\prime}])=\ell([\alpha{\vee}\alpha^{\prime}])-\ell([\alpha])=\ell([\alpha^{\prime}])-\ell([\alpha])>0

This example demonstrates that the assumption of lemma 10.7 is necessary.

Example 10.9 (Approximate automaticity of (4)).

Relation (4), that is, xϕ(x)xx{\vee}\phi(x)\equiv x for xϕ(x)x\bot\phi(x), is an idealization that holds approximately automatically in our prototype language, that is, (xϕ(x))(x)\ell(x{\vee}\phi(x))\approx\ell(x). Indeed, as an illustration considering at first the sentence α\alpha of example 10.11.(i), we may write (without using (4))

αϕ(α)αg¯:=f¯\alpha{\vee}\phi(\alpha)\quad\equiv\quad\alpha{\vee}\underline{g}:=\underline{f}

because we can shorten the long expression ϕ(α)\phi(\alpha) by equivalence in mathematics by just saying that we have a main variable g¯\underline{g} which is just a function like f¯\underline{f} and so we have

([αϕ(α)])=([α])+3\ell([\alpha{\vee}\phi(\alpha)])=\ell([\alpha])+3

More generally, one drops the copy ϕ(α)\phi(\alpha) but rescues the main variables from ϕ(a)\phi(a) by adding a list g¯1:=f¯1g¯n:=f¯n\underline{g}_{1}:=\underline{f}_{1}{\vee}\ldots{\vee}\underline{g}_{n}:=\underline{f}_{n} where g¯i\underline{g}_{i} are the main variables appearing in ϕ(α)\phi(\alpha) and f¯i\underline{f}_{i} those corresponding in α\alpha.

Even more, one could encode the main variables f¯1,,f¯n\underline{f}_{1},\ldots,\underline{f}_{n} into one main variable f¯\underline{f} as a function on {\mathbb{N}} and use f¯(1),,f¯(n)\underline{f}(1),\ldots,\underline{f}(n) instead. It would be then enough to say g¯:=f¯\underline{g}:=\underline{f}.

The fact that in optimal sentences (i.e. those xSx\in S for which L(x)([x])L(x)\approx\ell([x])) the auxiliary variables can be turned to main variables without changing the optimality (see lemma 10.7) can be exploited to give an instance where the triangle inequality is invalid:

Example 10.10 (Violation of Δ\Delta-inequality).

Assume that aSa\in S has two approximately optimal representations x,ySx,y\in S in the sense that [a]=[x]=[y][a]=[x]=[y] in S¯\overline{S} and ([a])L(x)L(y)\ell([a])\approx L(x)\approx L(y). Of course,

d([a],[x])=d([a],[y])=0d([a],[x])=d([a],[y])=0

Since xx and yy are approximately optimal short in length, we may replace all unbound auxiliary variables of xx and yy, respectively, by main variables to obtain new sentences xx^{\prime} and yy^{\prime}, respectively, and still

d([x],[x])0d([y],[y])0d([x],[x^{\prime}])\approx 0\qquad d([y],[y^{\prime}])\approx 0

by lemma 10.7. But because xx and yy may be very different in SS, by all the new main variables appearing now in xx^{\prime} and yy^{\prime}, xx^{\prime} and yy^{\prime} contain much necessary information which cannot be deleted (so (6) does not apply) and it may be difficult to simplify xϕ(y)x^{\prime}{\vee}\phi(y^{\prime}), so we may get

d([x],[y])0d([x^{\prime}],[y^{\prime}])\gg 0

But this contradicts the Δ\Delta-inequality, as assuming it, we got

0d([x],[a])+d([a],[y])d([x],[y])00\approx d([x^{\prime}],[a])+d([a],[y^{\prime}])\geq d([x^{\prime}],[y^{\prime}])\gg 0

It would be helpful if one had a difference element in the sense that if xyx\leq y for x,yS¯x,y\in\overline{S} then there is a ‘complement’ zS¯z\in\overline{S} such that xz=yx\nabla z=y (xx united with zz is the whole yy) and ζ(xz)=0\zeta(x\cap z)=0 (the intersection of xx and zz is zero), that is,

(xz)=(x)+(z),\ell(x\nabla z)=\ell(x)+\ell(z),

by definition 6.1, or in other words, ζ(z)=ζ(y\x)\zeta(z)=\zeta(y\backslash x). But it does not exist in general:

Example 10.11 (Difference Element does not exist).

Consider two formulas α,γS\alpha,\gamma\in S as follows:

α:=(b¯(x):=x2+1y¯:=b2+b)\alpha:=\quad\Big{(}\underline{b}(x):=x^{2}+1\quad{\vee}\quad\underline{y}:=b^{2}+b\Big{)}
γ:=(z¯(x):=x4+3x2+2)\gamma:=\quad\Big{(}\underline{z}(x):=x^{4}+3x^{2}+2\Big{)}

We assume that α\alpha and γ\gamma, as it appears, are already in their shortest form, so, observing also that y¯=z¯\underline{y}=\underline{z} are the same polynomials, and to this applying (4) and then (3), we get

[α][γ]=[b¯(x):=x2+1y¯(v)=v4+3v2+2z¯(t):=t4+3t2+2]=[α][\alpha]\nabla[\gamma]=[\underline{b}(x):=x^{2}+1\;{\vee}\;\underline{y}(v)=v^{4}+3v^{2}+2\;{\vee}\;\underline{z}(t):=t^{4}+3t^{2}+2]=[\alpha]
([α])=L(α)=15,([γ])=L(γ)=13,[γ][α],([α][γ])=([α])\ell([\alpha])=L(\alpha)=15,\quad\ell([\gamma])=L(\gamma)=13,\quad[\gamma]\leq[\alpha],\quad\ell([\alpha]\nabla[\gamma])=\ell([\alpha])
ζ([α]\[γ])=([α][γ])([γ])=d([α],[γ])=2\zeta([\alpha]\backslash[\gamma])=\ell([\alpha]\nabla[\gamma])-\ell([\gamma])=d([\alpha],[\gamma])=2

The last line shows that a difference element [α]\[γ][\alpha]\backslash[\gamma] in S¯\overline{S} does not exist, as it should have only length 22, which appears impossible in our language SS.

11. Examples from Physics

In this section we collect various examples from physics, and indicate that the distance between an existing and a new improved theory is often very low, as well as the lengths of the theories itself. A very good instance is already the transition form the Klein-Gordon equation to the Dirac equation discussed in example 2.13 and the paragraph before it. Another example would be Maxwell’s equations extended from Ampere’s and Faraday’s equations.

Example 11.1 (Quantum mechanics).

Classical mechanics can be described as follows. In these equations, p˙i\dot{p}_{i} means derivative of pip_{i} after time, and q˙i\dot{q}_{i} a free independent variable

pi=q˙i(q,q˙,t)(compute q˙i=q˙i(q,p,t) from these equations)p_{i}=\frac{\partial{\mathcal{L}}}{\partial\dot{q}_{i}}(q,\dot{q},t)\qquad\mbox{(compute $\dot{q}_{i}=\dot{q}_{i}(q,p,t)$ from these equations)}
H(q,p,t)=i=1nq˙ipi(Hamilton function)H(q,p,t)=\sum_{i=1}^{n}\dot{q}_{i}p_{i}-{\mathcal{L}}\qquad\mbox{(Hamilton function)}
p˙i=Hqiq˙i=Hpi(Hamilton equations)\dot{p}_{i}=-\frac{\partial H}{\partial q_{i}}\quad{\vee}\quad\dot{q}_{i}=\frac{\partial H}{\partial p_{i}}\qquad\mbox{(Hamilton equations)}

If the Lagrangian =TU{\mathcal{L}}=T-U (kinetic energy - potential energy) then H=EH=E (energy). The Schrödinger equations [18, 19] read

(31) p^i=ihqi\hat{p}_{i}=-ih\frac{\partial}{\partial q_{i}}
(32) ihψt(q,p^,t)=H(q,p^,t)ψ(q,t)ih\frac{\partial\psi}{\partial t}(q,\hat{p},t)=H(q,\hat{p},t)\psi(q,t)

We abbreviate quantum mechanics by QMQM, Schrödinger equations (31)-(32) by SESE, Hamilton function by HH, Hamilton equations by HEHE and classical mechanics by CMCM. The distance between QM=SEHQM=SE{\vee}H and CM=HEHCM=HE{\vee}H is

d([QM],[CM])=([QM][CM])([CM])=d([QM],[CM])=\ell([QM]\nabla[CM])-\ell([CM])=
([SEHϕ(HE)ϕ(H)])([CM])\ell([SE{\vee}H{\vee}\phi(HE){\vee}\phi(H)])-\ell([CM])

for a suitable ϕPS\phi\in P_{S} by def. 2.8. Since HH and ϕ(H)\phi(H) are copies of each other, we may take the Hamilton function HH instead of its copy ϕ(H)\phi(H) in ϕ(HE)\phi(HE) and so choose a variable transformation ψPS\psi\in P_{S} accordingly and then dropping the superfluous term ϕ(H)\phi(H) which only consists of auxiliary variables and which is not needed anymore by (6). So we get

=([SEHψ(HE)])([HEH])=\ell([SE{\vee}H{\vee}\psi(HE)])-\ell([HE{\vee}H])
(33) ([SE])+([H]+([ψ(HE)])([HE])([H])\approx\ell([SE])+\ell([H]+\ell([\psi(HE)])-\ell([HE])-\ell([H])
=([SE])100=\ell([SE])\approx 100

a low number, for all Hamilton functions HH.

In (33) we have assumed that there is not much optimization possible between the involved parts, in particular in relation to a longer Hamilton function HH.

The \nabla-inequality is somehow the idealization of an estimate as above. Compare it with (cf. also lemma 10.5) the fast approximate estimate by definition 4.1, provided the \nabla-inequality holds for dd,

d([SE][H],[HE][H])d([SE],[HE])d([SE]\nabla[H],[HE]\nabla[H])\leq d([SE],[HE])
Example 11.2 (General relativity).

In generalizing the field equation of Newtonian mechanics to relativity, Einstein noted that because of the equation E=mc2E=mc^{2}, a perpetuum mobile could be formed by sending a light beam upwards from a big mass like the earth and so lifting mass upwards in a gravitational field without using energy.

To avoid violation of the theorem of conservation of energy, he suggested that spacetime is deformed under the influence of mass, so that the upwards-traveling lightbeam could shift to red and so lose energy. To this end, mass is substituted by energy, which is turned to a tensor to obtain the energy momentum tensor TT, which interacts with the metric gg of the 4-dim spacetime by the formula [7]

RλμR2gλμ=8Gπc4TλμR_{\lambda\mu}-\frac{R}{2}g_{\lambda\mu}=-\frac{8G\pi}{c^{4}}T_{\lambda\mu}

Once again, the distance between general relativity GR and Newtonian field equations NF appears very low, i.e.

d([GR],[NF])100d([GR],[NF])\approx 100

maybe the shortest possible of a solution extending SR, NF and solving the perpetuum mobile problem. Simply already because the theory of GR is short, i.e. ([GR])\ell([GR]) is low (cf. lemma 3.5).

Since Maxwell’s equations are Lorentz invariant, and E=mc2E=mc^{2} is deduced from Maxwell’s equations, GR might already follow from the latter by optimization, because if nature were not Lorentz invariant, but the Maxwell’s equations hold, the whole alternative theory might be longer.

Problem 11.3.

Given any theory xx with singularities, indicate a theory yy without singularities (i.e. without mathematical pathological inconsistencies and divergences) and such that yy works approximately as xx in the range where xx works and

d(x,y)=lowd(x,y)=\mbox{low}

Even the low distance alone might increase the likelihood that yy generalizes xx properly.

12. Examples of Computing

For computer languages we use the definitions of example 2.14.

Lemma 12.1.

Assume that ϕ:AB\phi:A\rightarrow B is an isomorphism between two computer languages A,BA,B (exceptionally with only finitely many main variables available).

Then there is a constant M0M\geq 0 such that for all s,tAs,t\in A

|dB(ϕ(s),ϕ(t))dA(s,t)|M|d_{B}(\phi(s),\phi(t))-d_{A}(s,t)|\leq M

Hence

limdA(s,t)d(ϕ(s),ϕ(t))d(s,t)=1\lim_{d_{A}(s,t)\rightarrow\infty}\frac{d(\phi(s),\phi(t))}{d(s,t)}=1
Proof.

Indeed, write a compiler or interpreter iBi\in B such that

ϕ(s)is~=(ia:=sb)\phi(s)\quad\equiv\quad i{\vee}\tilde{s}\quad=\quad\Big{(}i\quad{\vee}\quad a:=s^{\prime}\quad{\vee}\quad b\Big{)}

does the same on the computer BB as ϕ(s)\phi(s) does by compiling or interpreting the code sAs\in A on the computer BB. We assume since ϕ\phi is an isomorphism this is possible, and that ϕ\phi is given on the formal language level as in definition 8.1 in this proof. Here ss^{\prime} means an optimal short program ss (i.e. A([s])=LA(s)\ell_{A}([s])=L_{A}(s)) as a pure text string, which we assign the variable aa above. Further, bb means some minor code containing all main variables of ϕ([s])\phi([s]) and calling ii to interpret or compile aa. Moreover, s~:=(a:=sb)\tilde{s}:=(a:=s^{\prime}{\vee}b) is just an abbreviation.

We also assume that

LA(s)=L(s)L_{A}(s)=L(s^{\prime})

gives only the length of the text string ss^{\prime} and similarly so for LB(t)L_{B}(t). Also, as we have only finitely many main variables at all (say one entry point of a program is the main variable), we assume that there is a fixed constant KK such that

LB(b)<KL_{B}(b)<K

for all sSs\in S. We have

(is~)(it~)is~jt~is~t~ist~ix~(i{\vee}\tilde{s})\nabla(i{\vee}\tilde{t})\equiv i{\vee}\tilde{s}{\vee}j{\vee}\tilde{t}\equiv i{\vee}\tilde{s}{\vee}\tilde{t}\equiv i{\vee}\widetilde{s{\vee}t}\equiv i{\vee}\tilde{x}

for any x=stx=s{\vee}t. Here, the second equality is because we do not need a copy jj of the compiler ii, the third one is because instead of interpreting ss^{\prime} and tt^{\prime} separately, we may interpret (st)(s{\vee}t)^{\prime} at once to get the same code.

Hence, setting M:=LB(i)+3+KM:=L_{B}(i)+3+K we get

B([ϕ(s)])=B([is~])LB(i)+LB(s~)LB(i)+L(s)+3+LB(b)\ell_{B}([\phi(s)])=\ell_{B}([i{\vee}\tilde{s}])\leq L_{B}(i)+L_{B}(\tilde{s})\leq L_{B}(i)+L(s^{\prime})+3+L_{B}(b)
M+L(s)=M+LA(s)=A([s])+M\leq M+L(s^{\prime})=M+L_{A}(s)=\ell_{A}([s])+M

Analogously we get

A([ϕ1(t)])B([t])+M\ell_{A}([\phi^{-1}(t)])\leq\ell_{B}([t])+M

so that we end up with

|B([ϕ(s)])A([s])|M|\ell_{B}([\phi(s)])-\ell_{A}([s])|\leq M

Thus

|σ([is~],[it~])σ([s],[t])||\sigma([i{\vee}\tilde{s}],[i{\vee}\tilde{t}])-\sigma([s],[t])|
|2B([ix~])B([is~])B([it~])2A([x])+A([s])+A([t])|\approx|2\ell_{B}([i{\vee}\tilde{x}])-\ell_{B}([i{\vee}\tilde{s}])-\ell_{B}([i{\vee}\tilde{t}])-2\ell_{A}([x])+\ell_{A}([s])+\ell_{A}([t])|
4M\leq 4M

We are going to define a Reduced Instruction Set Central Processing Unit (RISC CPU) in our ordinary mathematical language of physics and technics by describing its function of memory, accumulator and program pointer in spacetime:

Example 12.2.

A simple computer can be realized as formulas like so:

xt+1,n=at[xt,pt=Sxt,pt+1=n]+xt,n(1[xt,pt=Sxt,pt+1=n])x_{t+1,n}=a_{t}[x_{t,p_{t}}=S{\wedge}x_{t,p_{t}+1}=n]+x_{t,n}(1-[x_{t,p_{t}}=S{\wedge}x_{t,p_{t}+1}=n])
at+1\displaystyle a_{t+1} =\displaystyle= xt,pt+1[xt,pt=L]+(at+xt,pt+1)[xt,pt=A]\displaystyle x_{t,p_{t}+1}[x_{t,p_{t}}=L]+(a_{t}+x_{t,p_{t}+1})[x_{t,p_{t}}=A]
+(atxt,pt+1)[xt,pt=M]+at[xt,pt{L,A,M}]\displaystyle+(a_{t}*x_{t,p_{t}+1})[x_{t,p_{t}}=M]+a_{t}[x_{t,p_{t}}\notin\{L,A,M\}]
pt+1=xt,pt+1[xt,pt=Bat0]+(pt+2)(1[xt,pt=Bat0])p_{t+1}=x_{t,p_{t}+1}[x_{t,p_{t}}=B{\wedge}a_{t}\geq 0]+(p_{t}+2)(1-[x_{t,p_{t}}=B{\wedge}a_{t}\geq 0])

where [A]=1[A]=1\in{\mathbb{R}} if an assertion AA is true, and [A]=0[A]=0\in{\mathbb{R}} otherwise.

Here, x:×x:{\mathbb{Z}}\times{\mathbb{Z}}\rightarrow{\mathbb{R}} stands for memory, a:a:{\mathbb{Z}}\rightarrow{\mathbb{R}} for an accumulator, and p:p:{\mathbb{Z}}\rightarrow{\mathbb{R}} for program pointer. Everything is time indexed tt\in{\mathbb{Z}}. Then xt,nx_{t,n}\in{\mathbb{R}} is the value of memory at place nn\in{\mathbb{Z}} at time tt\in{\mathbb{N}}. Similarly, at,pta_{t},p_{t}\in{\mathbb{R}} is the value of accumulator, program pointer, respectively, at time tt\in{\mathbb{N}} . We have the opcodes S,L,A,M,BS,L,A,M,B (some constant real values) which stand for store accumulator, load accumulator, add accumulator, multiply accumulator, and branch. One command has the format opcode and value in the next memory. For example

L1000B2000L1000\quad B2000

mean, load accu with the value at memory address 1000, or, branch to 2000 if accu 0\geq 0.

If a sentence sSs\in S is longer, it may pay off to implement the above computer and implement the formula ss via a program. In other words, to add the above computer cc as auxiliary variables, i.e. sscs\equiv s{\vee}c and simplify scs{\vee}c by writing a program on cc which substitutes (part of) ss. In this way, long sentences are in concurrence with computing, which emerges out of nowhere simply because of the pressure of optimization.

The above formulas are similarly to differential equations in rough shape. One could analogously allow functions than real values, that is, x:×C()x:{\mathbb{Z}}\times{\mathbb{Z}}\rightarrow C^{\infty}({\mathbb{R}}) and add a differential command for the accumulator, say, at+1=dat/dya_{t+1}=da_{t}/dy.

We may enrich any language without variables with variables to get a macro language:

Example 12.3 (Macro language).

Given a language SS, with or without variables, we may form a macro language MM by adding countably infinitely many auxiliary and main variables, function definitions (macros) which accept text strings as input and give a text string as output, function calls, concatenation of text (written .), and a {\vee} sign. Sentences of MM are sentences of SS enriched with macros.

For example, both x¯\underline{x} are the same:

x¯:=(f(`aa,b)f(x,y):=x.x.y.`teee.y)=x¯:=aaaabteeeb\underline{x}:=\Big{(}f(`aa^{\prime},^{\prime}b^{\prime}){\vee}f(x,y):=x.x.y.`teee^{\prime}.y\Big{)}\quad=\quad\underline{x}:=^{\prime}aaaabteeeb^{\prime}

If two sentences s,ts,t in SS are similar by much text overlap, the distance d(s,t)d(s,t) becomes low in the macro language TT by using macros.

Example 12.4 (Further examples).

(i) In the usual mathematical framework we could also allow Boolean algebra expressions like

A:=a¯=x1x¯2+x9(x¯7+x6)+.A:=\quad\underline{a}=x_{1}\overline{x}_{2}+x_{9}(\overline{x}_{7}+x_{6})+....

and compare d(A,B)d(A,B). More generally we may compare any two functions f,gf,g which allow definitions by symbolic expressions by considering d(f,g)d(f,g).

(ii) Further we could work with text strings and concatenation in our general mathematical framework and in this way get a stronger macro language than in example 12.3 as we had natural numbers, functions or pointers to functions as arguments in functions calls etc.

(iii) We may consider set definitions and in this way compute the distance between sets, or algebraic structured sets like rings etc.

(iv) Mathematical proofs may be compared by writing down proofs aa, regarding them as text strings (or better as text strings in the framework of the mathematical logic of deduction), and compare these text strings within the macro language as in example 12.3 or in item (ii).

References

  • [1] A. Bertoni, M. Goldwurm, and N. Sabadini. Computing the counting function of context-free languages. STACS 87, Theoretical aspects of computer science, Proc. 4th annu. Symp., Passau/FRG 1987, Lect. Notes Comput. Sci. 247, 169-179 (1987)., 1987.
  • [2] Ehud Cseresnyes and Hannes Seiwert. Regular expression length via arithmetic formula complexity. J. Comput. Syst. Sci., 125:1–24, 2022.
  • [3] Cewei Cui, Zhe Dang, Thomas R. Fischer, and Oscar H. Ibarra. Similarity in languages and programs. Theor. Comput. Sci., 498:58–75, 2013.
  • [4] P. A. M. Dirac. The quantum theory of the electron. I. Proc. R. Soc. Lond., Ser. A, 117:610–624, 1928.
  • [5] P. A. M. Dirac. Quantised singularities in the electromagnetic field. Proc. R. Soc. Lond., Ser. A, 133:60–72, 1931.
  • [6] Radosav Djordjević, Nebojša Ikodinović, and Nenad Stojanović. A propositional metric logic with fixed finite ranges. Fundam. Inform., 174(2):185–199, 2020.
  • [7] A. Einstein. Zur allgemeinen Relativitätstheorie. Berl. Ber., 1915:778–786, 1915.
  • [8] Sean A. Fulop and David Kephart. Topology of language classes. In The 14th meeting on the mathematics of language. Proceedings of the meeting, MoL 14, Chicago, IL, USA, July 25–26, 2015, pages 26–38. Stroudsburg, PA: Association for Computational Linguistics, 2015.
  • [9] Giangiacomo Gerla. Distances, diameters and verisimilitude of theories. Arch. Math. Logic, 31(6):407–414, 1992.
  • [10] Yo-Sub Han and Sang-Ki Ko. Alignment distance of regular tree languages. In Implementation and application of automata. 22nd international conference, CIAA 2017, Marne-la-Vallée, France, June 27–30, 2017. Proceedings, pages 126–137. Cham: Springer, 2017.
  • [11] Stephen W. Hawking and W. Israel. General relativity. An Einstein centenary survey. Cambridge etc.: Cambridge University Press. XVIII, 919 p. £ 37.50 (1979)., 1979.
  • [12] Oscar H. Ibarra, Ian McQuillan, and Bala Ravikumar. On counting functions and slenderness of languages. Theor. Comput. Sci., 777:356–378, 2019.
  • [13] Teruo Imaoka. Algebraic semigroups, formal languages and computation. Proceedings of a symposium held at the Research Institute for Mathematical Sciences, Kyoto University, Kyoto, Japan, February 19–21, 2001. RIMS Kokyuroku 1222, viii, 188 p. (2001)., 2001.
  • [14] Ciresica Jalobeanu. Approximation in languages. In Current issues in mathematical linguistics. 1st International Conference, ICML ’93, Virgili University, Tarragona, Catalonia, Spain, on March 30- 31, 1993. Selected papers, pages 161–169. Amsterdam: North-Holland, 1994.
  • [15] Sang-Ki Ko, Yo-Sub Han, and Kai Salomaa. Top-down tree edit-distance of regular tree languages. In Language and automata theory and applications. 8th international conference, LATA 2014, Madrid, Spain, March 10–14, 2014. Proceedings, pages 466–477. Berlin: Springer, 2014.
  • [16] Austin J. Parker, Kelly B. Yancey, and Matthew P. Yancey. Regular language distance and entropy. In 42nd international symposium on mathematical foundations of computer science, MFCS 2017, August 21–25, 2017, Aalborg, Denmark, page 14. Wadern: Schloss Dagstuhl – Leibniz Zentrum für Informatik, 2017. Id/No 3.
  • [17] Giovanni Pighizzini. How hard is computing the edit distance? Inf. Comput., 165(1):1–13, 2001.
  • [18] E. Schrödinger. Quantisierung als Eigenwertproblem. I. Ann. der Phys. (4), 79:361–374, 1926.
  • [19] E. Schrödinger. Quantisierung als Eigenwertproblem. II. Ann. der Phys. (4), 79:489–527, 1926.
  • [20] Barney Stratford. The topology of language. J. Math. Psychol., 53(6):502–509, 2009.
  • [21] Steven Weinberg. Ultraviolet divergences in quantum theories of gravitation. In Hawking, Stephen W. and Israel, W., Cambridge etc.: Cambridge University Press. XVIII, 919 p. £ 37.50 (1979)., pages 790–831. 1979.
  • [22] Steven Weinberg. The quantum theory of fields. Vol. 1: Foundations. Cambridge: Cambridge University Press, corr. repr. of the 1995 orig. edition, 1996.