Uncertain Linear Logic via Fibring of Probabilistic and Fuzzy Logic
Abstract
Beginning with a simple semantics for propositions, based on counting observations, it is shown that probabilistic and fuzzy logic correspond to two different heuristic assumptions regarding the combination of propositions whose evidence bases are not currently available. These two different heuristic assumptions lead to two different sets of formulas for propagating quantitative truth values through lattice operations. It is shown that these two sets of formulas provide a natural grounding for the multiplicative and additive operator-sets in linear logic. The standard rules of linear logic then emerge as consequences of the underlying semantics. The concept of linear logic as a “logic of resources” is manifested here via the principle of “conservation of evidence” – the restrictions on weakening and contraction in linear logic serve to avoid double-counting of evidence (beyond any double-counting incurred via use of heuristic truth value functions).
1 Introduction
Linear logic [Gir87] comprises a rich and fascinating formal system that summarizes, in a nuanced way, the way logical inference works if one treats the pool of potential premises of inferences as a resource to be meted out and accounted for. The linear logic abstractions can be applied to practical reasoning systems in a variety of different ways, and can be grounded in concrete domain-specific inference formalisms via multiple routes as well.
Here we connect linear logic to uncertain reasoning based on observational semantics. Beginning with a simple semantics for propositions, based on counting observations, we argue that probabilistic and fuzzy logic correspond to two different heuristic assumptions regarding the combination of propositions whose evidence bases are not currently available. These two different heuristic assumptions lead to two different sets of formulas for propagating quantitative truth values through lattice operations. Given this set-up, it becomes immediately apparent that these two sets of formulas instantiate the same algebraic and conceptual relationships as the multiplicative and additive operator-sets in linear logic. The standard rules of linear logic then emerge as consequences of the underlying semantics of fuzzy and probabilistic evidence management.
The concept of linear logic as a “logic of resources” is manifested here via the principle of “conservation of evidence” – the restrictions on weakening and contraction in linear logic serve to avoid double-counting of evidence (beyond any double-counting incurred via use of heuristic truth value functions).
2 Core Concepts of Linear Logic
First we summarize some basic concepts of linear logic [Gir87].
In linear logic, every propositional variable is considered as a proposition.
For each proposition A, there is a proposition A⊥, the negation of A.
For each proposition A and proposition B, there are four additional propositions:
- A & B (read “with”), the additive conjunction of A and B;
- A ⊕ B (read “plus”), the additive disjunction of A and B;
- A ⊗ B (read “times”), the multiplicative conjunction of A and B;
- A ⅋ B (read “par”), the multiplicative disjunction of A and B.
There are also four constants to go with these four binary operations:
- ⊤ (read “top”), the additive truth;
- 0 (read “zero”), the additive falsity;
- 1 (read “one”), the multiplicative truth;
- ⊥ (read “bottom”), the multiplicative falsity;
and, for each proposition A, there are two additional propositions:
- !A (read “of course”), the exponential conjunction of A;
- ?A (read “why not”), the exponential disjunction of A.
The interpretation of these operations is a complex issue on which there are many complementary perspectives. [LM92] gives a concrete interpretation in terms of a specific computational model. An informal “resource interpretation” is commonly discussed, for instance with a “vending machine” metaphor [Hoa83]:
Suppose we represent having a candy bar by the atomic proposition candy, and having a dollar by $1. To state the fact that a dollar will buy you one candy bar, we might write the implication $1 ⇒ candy. But in ordinary (classical or intuitionistic) logic, from A and A ⇒ B one can conclude A ∧ B. So, ordinary logic leads us to believe that we can buy the candy bar and keep our dollar! Of course, we can avoid this problem by using more sophisticated encodings…
The vending machine is used as a metaphor for linear logic as follows:
… rather than $1 ⇒ candy, we express the property of the vending machine as a linear implication $1 ⊸ candy. From $1 and this fact, we can conclude candy, but not $1 ⊗ candy. In general, we can use the linear logic proposition A ⊸ B to express the validity of transforming resource A into resource B.
Multiplicative conjunction (⊗) denotes simultaneous occurrence of resources, to be used as the consumer directs. For example, if you buy a stick of gum and a bottle of soft drink, then you are requesting gum ⊗ drink.
Additive conjunction (&) represents alternative occurrence of resources, the choice of which the consumer controls. If in the vending machine there is a packet of chips, a candy bar, and a can of soft drink, each costing one dollar, then for that price you can buy exactly one of these products. Thus we write $1 ⊸ (candy & chips & drink). We do not write $1 ⊸ (candy ⊗ chips ⊗ drink), which would imply that one dollar suffices for buying all three products together. However, from $1 ⊸ (candy & chips & drink), we can correctly deduce $3 ⊸ (candy ⊗ chips ⊗ drink), where $3 := $1 ⊗ $1 ⊗ $1.
Additive disjunction (⊕) represents alternative occurrence of resources, the choice of which the machine controls. For example, suppose the vending machine permits gambling: insert a dollar and the machine may dispense a candy bar, a packet of chips, or a soft drink. We can express this situation as $1 ⊸ (candy ⊕ chips ⊕ drink).
Multiplicative disjunction (⅋) is more difficult to gloss in terms of the resource interpretation.
Overall, the metaphor works up to a point, but becomes confusing when pushed too far. The approach taken here instead grounds the linear logic operators in evidential resources, which is exact rather than metaphorical.
3 Observational Semantics for Uncertain Inference
Following the approach outlined in our prior writings on Probabilistic Logic Networks [GIGH08], we propose to ground the semantics of propositions in finite sets of observations made by a particular system.
Suppose each proposition A under consideration is supported by a certain set E(A) of observations, and has a certain quantitative truth value (which may be a single number or a tuple of numbers; we will consider, specifically, ⟨p, n⟩ pair truth values below).
To calculate the probabilistic truth value of A ∧ B (the conjunction of A and B) or A ∨ B (the disjunction of A and B), if one has not retained the observation-sets E(A) and E(B), one has to make some assumptions about the relationship between E(A) and E(B).
Two assumptions are particularly simple to make:
- Max Overlap: that E(A) and E(B) maximally overlap; i.e. if they are the same size, then they are identical … and if they are different sizes, then one is entirely a subset of the other
- Independence: that E(A) and E(B) are probabilistically independent samples from some larger space
These two assumptions lead to different uncertain truth value formulas:

- Max Overlap:
  - p(A ∧ B) = min(p(A), p(B))
  - p(A ∨ B) = max(p(A), p(B))
  - p(¬A) = 1 − p(A)
- Independence:
  - p(A ∧ B) = p(A) p(B)
  - p(A ∨ B) = p(A) + p(B) − p(A) p(B)
  - p(¬A) = 1 − p(A)
It happens that both conjunction formulas (min and product) are t-norms, meaning they have intuitively natural algebraic symmetry properties [GQ91].
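The two heuristics can be written out as a minimal Python sketch (the function names are mine, chosen for illustration):

```python
def p_and_max_overlap(pa, pb):
    # Max Overlap: the smaller evidence set is assumed to be contained
    # in the larger, so the conjunction is bounded by the weaker proposition.
    return min(pa, pb)

def p_or_max_overlap(pa, pb):
    # Dually, the disjunction is bounded by the stronger proposition.
    return max(pa, pb)

def p_and_independent(pa, pb):
    # Independence: evidence sets are independent samples, so
    # probabilities multiply.
    return pa * pb

def p_or_independent(pa, pb):
    # Inclusion-exclusion under independence.
    return pa + pb - pa * pb
```

On the example truth values used later in the text (p(A) = .5, p(B) = .3), these give .3 vs .15 for the conjunction, and .5 vs .65 for the disjunction.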
Alongside p(A), it is worthwhile to keep track of n(A), the number of observations on which p(A) is based. This yields two-component truth values ⟨p, n⟩, which are a variant of what are called “simple truth values” in PLN. In this regard the two assumptions yield different formulae as well:
- Max Overlap:
  - n(A ∧ B) = min(n(A), n(B))
  - n(A ∨ B) = max(n(A), n(B))
  - n(¬A) = n(A)
- Independence (writing U for the size of the overall space of potential observations):
  - n(A ∧ B) = n(A) n(B) / U
  - n(A ∨ B) = n(A) + n(B) − n(A) n(B) / U
  - n(¬A) = n(A)
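The two-component versions can be sketched as follows; as the independence count formulas require, an explicit universe size is passed in (both the function names and the `universe` parameter are my additions for illustration):

```python
def tv_and_max_overlap(tv_a, tv_b):
    # <p, n> conjunction under the Max Overlap heuristic.
    (pa, na), (pb, nb) = tv_a, tv_b
    return (min(pa, pb), min(na, nb))

def tv_or_max_overlap(tv_a, tv_b):
    (pa, na), (pb, nb) = tv_a, tv_b
    return (max(pa, pb), max(na, nb))

def tv_and_independent(tv_a, tv_b, universe):
    # The count is the expected overlap of two evidence sets drawn
    # independently from a universe of the given size.
    (pa, na), (pb, nb) = tv_a, tv_b
    return (pa * pb, na * nb / universe)

def tv_or_independent(tv_a, tv_b, universe):
    # Inclusion-exclusion for both the probability and the count.
    (pa, na), (pb, nb) = tv_a, tv_b
    return (pa + pb - pa * pb, na + nb - na * nb / universe)
```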
As a concrete example, suppose one is evaluating a population of 100 people, with A = American and B = crazy. Suppose we know
- n(A) = 20, i.e. 20 people were observed to evaluate the odds of a person being American
- p(A) = .5, i.e. based on these 20 evaluations, half were observed to be American and the other half not
- n(B) = 10, i.e. 10 people were observed to evaluate the odds of a person being crazy
- p(B) = .3, i.e. 3 of these 10 people were observed to be crazy, the others not
Then we are considering two different situations:

- Max Overlap: The 10 people evaluated regarding craziness were a subset of the 20 people evaluated regarding American-ness
- Independence: The 20 people evaluated regarding American-ness, and the 10 people evaluated regarding craziness, were independently randomly selected as subsets of the original 100 people
In these two cases we have:

- Max Overlap:
  - p(A ∧ B) = min(.5, .3) = .3, based on looking at the 10 people in E(B)
  - p(A ∨ B) = max(.5, .3) = .5, since in this assumption all the crazy people evaluated were also American … in this assumption the evidence for non-American crazy people is zero
  - n(A ∧ B) = min(20, 10) = 10, the group of people evaluated for both American-ness and craziness
  - n(A ∨ B) = max(20, 10) = 20, the group of people evaluated for American-ness (all of whom were also evaluated for craziness)
- Independence:
  - p(A ∧ B) = .5 × .3 = .15
  - p(A ∨ B) = .5 + .3 − .15 = .65
  - n(A ∧ B) = 20 × 10 / 100 = 2, the expected amount of overlap between the 20 randomly selected people evaluated for American-ness and the 10 randomly selected people evaluated for craziness
  - n(A ∨ B) = 20 + 10 − 2 = 28, the expected number of people in the union of the 20 randomly selected people evaluated for American-ness and the 10 randomly selected people evaluated for craziness
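The example's arithmetic can be recomputed directly (a standalone check; the variable names are mine):

```python
pA, nA = 0.5, 20   # American-ness: 20 people evaluated, half American
pB, nB = 0.3, 10   # craziness: 10 people evaluated, 3 crazy
U = 100            # the overall population

# Max Overlap: the 10 craziness observations lie inside the 20
p_and_max, n_and_max = min(pA, pB), min(nA, nB)
p_or_max, n_or_max = max(pA, pB), max(nA, nB)

# Independence: both observation sets drawn at random from the 100
n_and_ind = nA * nB / U            # expected overlap of the two samples
p_and_ind = pA * pB
n_or_ind = nA + nB - n_and_ind     # expected size of the union
p_or_ind = pA + pB - pA * pB
```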
4 From Observational Semantics to Linear Logic
The core novel suggestion I want to make here is to interpret:
- A & B = the conjunction of A and B, according to the assumption of Max Overlap of the underlying evidence sets
- A ⊕ B = the disjunction of A and B, according to the assumption of Max Overlap of the underlying evidence sets
- A ⊗ B = the conjunction of A and B, according to the assumption of probabilistic independence of the underlying evidence sets
- A ⅋ B = the disjunction of A and B, according to the assumption of probabilistic independence of the underlying evidence sets
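One concrete way to see the additive/multiplicative split: the Max Overlap (additive) formulas are idempotent, while the independence (multiplicative) formulas are not, which is why reusing a premise (contraction) is harmless additively but double-counts evidence multiplicatively. A small check (my illustration, using an arbitrary sample value):

```python
p = 0.6

# Additive conjunction A & A: under Max Overlap the two copies share
# the same evidence set, so nothing changes.
additive_self_conj = min(p, p)

# Multiplicative conjunction A (x) A: independence treats the two copies
# as separate samples, effectively double-counting the evidence.
multiplicative_self_conj = p * p

assert additive_self_conj == p
assert multiplicative_self_conj < p
```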
This can be viewed as a different sort of “resource interpretation” of linear logic, in which the resources involved are not candy bars and such but rather observations made (evidence gathered) in favor of propositions. More explicitly, in this interpretation/construction of linear logic:
- Multiplicative conjunction (⊗) represents utilization of a body of potential observations (e.g. the 100 people potentially observable in the example above) to evaluate two properties independently and concurrently. The set of pairs (possible observation giving evidence about American-ness, possible observation giving evidence about craziness) is quite large, since each of these components of the pair is selected independently from the whole body of observations (100 in the example).
- Additive conjunction (&) represents minimalist utilization of a body of potential observations, for making a combined observation of two properties. We assume the making of as few observations as possible, consistent with the known data.
- Additive disjunction (⊕) represents minimalist use of evidence items (for making an observation of one or the other of two properties): a certain piece of evidence might be used for evaluating craziness only, or for evaluating both craziness and American-ness.
- Multiplicative disjunction (⅋) represents use of evidence items for evaluating one or the other of two properties, in a way that assumes the processes of evaluating these properties are independent. A certain piece of evidence might be used for evaluating craziness only, for evaluating American-ness only, or for evaluating both.
The ⊤, 0, 1 and ⊥ constants of linear logic may then be interpreted as follows:
- ⊤ (read “top”), the additive truth, represents a set of observations whose size is the minimum needed to be logically compatible with all the known information about counts and probabilities
- 1 (read “one”), the multiplicative truth, represents a set of observations whose size can be estimated probabilistically based on independence assumptions (e.g. the “optimal universe size” formula from PLN)
- ⊥ (read “bottom”), the multiplicative falsity, represents the class of propositions grounded by no evidence
- 0 (read “zero”), the additive falsity, represents again the class of propositions grounded by no evidence
Intuitionistic and constructive logic began when people saw the possibility of reading A ⇒ B as “if you give me an A, I will give you a B”, which is a significant departure from the classical reading “B is true whenever A is.” Specifically, the way to read A ⊸ B in the current interpretation/construction of linear logic is: the evidential observations being used in favor of A can be transferred to B, so that they can be used in favor of B. That is: A ⊸ B means that, using the evidence supporting A and the rules of logic (and accepting the heuristic assumptions underlying each rule application we do), we can conclude B.
When we have a proof of A and a proof of A ⊸ B in linear logic, by composing them we actually consume them to get a proof of B, so that A and A ⊸ B are no longer available after the composition. What this means is that, in this application of Modus Ponens, the pieces of evidence used to estimate the truth value of A are used to help estimate the truth value of B. Thus it is not OK to then use the truth value of A and the truth value of B together in further inferences – because this would entail double-counting of evidence. In linear logic one says “A was consumed”; in this evidential interpretation we would say that “the observations used as evidence for A have been consumed in this particular inference and thus shouldn’t be used again.”
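The bookkeeping behind this can be caricatured by a toy class (entirely my sketch, not part of PLN or of linear logic proper) in which each observation may be spent on at most one inference:

```python
class EvidencePool:
    """Toy model of 'conservation of evidence': each observation
    may back at most one inference step."""

    def __init__(self, observations):
        self.available = set(observations)

    def spend(self, needed):
        # Spending evidence removes it from the pool; a second use of
        # the same observations would double-count, which is what the
        # linear restrictions on weakening and contraction prevent.
        needed = set(needed)
        if not needed <= self.available:
            raise ValueError("some observations already consumed")
        self.available -= needed

pool = EvidencePool({"obs1", "obs2", "obs3"})
pool.spend({"obs1", "obs2"})   # evidence for A, consumed to conclude B
# pool.spend({"obs1"})         # would raise: obs1 already spent
```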
It is straightforward to verify that the structural and logical rules of linear logic (see Table 1) all hold in this interpretation, with appropriate uncertain truth value formulas associated to them.

However, an interesting complexity is that when we include the uncertain truth value formulas, we get a noncommutative variant of linear logic.
The positive exponential ! has a fairly satisfying interpretation in terms of the standard resource interpretation of linear logic. Given a resource a, we know that !a means an infinite supply of a. Or, stated more concretely in terms of the connectives of linear logic: !a ≡ 1 & a & (a ⊗ a) & (a ⊗ a ⊗ a) & ⋯.
In our evidential linear logic, !A can be interpreted to mean an infinite amount of evidence about A. So this means n(!A) = ∞.
Given that ?A = (!(A⊥))⊥, we can also say that ?A means: the complement of the proposition stating there is an infinite amount of evidence in favor of the complement of A. I.e., ?A means “A is still possible,” which implies that either there is some evidence in favor of A, or there is only a finite amount of evidence against A.
The standard rules for linear logic exponentials –

- If Γ, A ⊢ Δ, then Γ, !A ⊢ Δ; conversely, if Γ ⊢ A, Δ, then Γ ⊢ !A, Δ, whenever Γ consists entirely of propositions of the form !x while Δ consists entirely of propositions of the form ?x
- Dually, if Γ ⊢ A, Δ, then Γ ⊢ ?A, Δ; conversely, if Γ, A ⊢ Δ, then Γ, ?A ⊢ Δ, whenever Γ consists entirely of propositions of the form !x, while Δ consists entirely of propositions of the form ?x

– follow if one assigns !A and ?A probabilistic truth values, although in this interpretation ! and ? turn out to be noncommutative as well (when one accounts for the associated quantitative truth value functions).
Exponential isomorphism, i.e.

!(A & B) ≡ !A ⊗ !B,

follows from the observation that if there is infinite evidence in favor of both A and B, under the assumption of maximal overlap of evidence, then there must be infinite evidence obtainable by independently choosing from the infinite evidence for A and the infinite evidence for B.
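At the level of counts, this observation can be checked mechanically with IEEE infinities (the finite universe size is an arbitrary choice of mine, and only the n-components are compared here):

```python
import math

INF = math.inf
U = 100  # arbitrary finite universe size, for illustration only

# n(!(A & B)): infinite evidence for A and for B, combined by Max Overlap.
n_bang_of_with = min(INF, INF)

# n(!A (x) !B): two infinite evidence sets combined under independence.
n_bang_times_bang = INF * INF / U

# Both sides carry infinite evidence.
assert n_bang_of_with == n_bang_times_bang == INF
```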
In linear logic, de Morgan’s laws hold even for mixtures of additive and multiplicative operators. Multiplication distributes over addition in this mixed sense, as follows:

1. A ⊗ (B ⊕ C) ≡ (A ⊗ B) ⊕ (A ⊗ C) (and on the other side);
2. A ⅋ (B & C) ≡ (A ⅋ B) & (A ⅋ C) (and on the other side).

This also works for the operations as defined here. To validate the first equivalence, observe that

p(A ⊗ (B ⊕ C)) = p(A) max(p(B), p(C)) = max(p(A) p(B), p(A) p(C)) = p((A ⊗ B) ⊕ (A ⊗ C)).

To validate the second equivalence, observe that

p(A ⅋ (B & C)) = p(A) + min(p(B), p(C)) − p(A) min(p(B), p(C)) = min(p(A) + p(B) − p(A) p(B), p(A) + p(C) − p(A) p(C)) = p((A ⅋ B) & (A ⅋ C))

(since we assume p(A) ≤ 1, so that x ↦ p(A) + x − p(A) x is nondecreasing in x).
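Both equivalences can be spot-checked numerically under the truth value formulas above (arbitrary sample values; a sanity check, not a proof):

```python
pA, pB, pC = 0.7, 0.4, 0.9

def par(x, y):
    # multiplicative disjunction under independence
    return x + y - x * y

# (1) multiplicative conjunction distributes over additive disjunction
lhs1 = pA * max(pB, pC)
rhs1 = max(pA * pB, pA * pC)

# (2) multiplicative disjunction distributes over additive conjunction
lhs2 = par(pA, min(pB, pC))
rhs2 = min(par(pA, pB), par(pA, pC))

assert abs(lhs1 - rhs1) < 1e-12
assert abs(lhs2 - rhs2) < 1e-12
```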
5 Linear Logic via Fibring of Max Overlap and Probabilistic Uncertain Logics
Putting these pieces together, we see that the rules of linear logic can be obtained via:
- (categorially) fibring together the and-or-not logic of uncertain propositions with ⟨p, n⟩ truth values and a Max Overlap heuristic, with the and-or-not logic of uncertain propositions with ⟨p, n⟩ truth values and a probabilistic-independence heuristic
- identifying the negation and the zero of the two logics being fibred together
Linear logic, as an abstract structure, can certainly be used to model many situations besides this one. But this is one way of grounding the linear logic operations which is particularly intuitive, and which is relevant to artificial intelligence applications and cognitive modeling.
6 Linear Logic and Fuzziness
Suppose that, following [GL10], we model fuzzy characters like “tall” and “crazy” by positing “tall detectors” and “crazy detectors” that we can apply to people.
So if Ben is tall to degree .7 and crazy to degree .8, this means that when we hold the tall detector up next to Ben it rings 70% of the maximum amount it could ring for anyone, and when we hold the crazy detector up next to Ben it rings 80% of the maximum amount it could ring for anyone.
What, then, is the degree to which Ben is “tall and crazy”? There are two meanings here:

1. One is holding a “tall and crazy” detector up to Ben, which rings only when Ben is judged both tall and crazy
2. One assumes that Ben is having a tall detector and a crazy detector held up simultaneously next to him, and wants to estimate the odds that, at any point in time during this experiment, both the tall detector and the crazy detector are ringing
In the first case, if one assumes that the max possible amount of ringing is the same for both the tall and crazy detectors, then one arrives at the conjunction formula

c(tall ∧ crazy) = min(c(tall), c(crazy))

(where c(tall) denotes the degree to which Ben is tall, etc.).
In the second case, if one assumes that the tall detector and the crazy detector are both configured to do all their ringing during the same interval of time after their initial placement next to Ben, and that the timing of each of their rings during that interval is random, and that the maximum activity would be ringing ceaselessly throughout the whole time interval – then one arrives at the conjunction formula

c(tall ∧ crazy) = c(tall) c(crazy).
So here the two formulas for conjunctively combining fuzzy degrees are seen to represent different conceptions of the semantics of conjunction.
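The two readings give the two fuzzy conjunction values for the running example (degrees from the text; the variable names are mine):

```python
tall, crazy = 0.7, 0.8   # Ben's degrees on the two detectors

# Reading 1: a single "tall and crazy" detector whose maximum possible
# ringing is shared -- the Max Overlap style formula.
joint_detector = min(tall, crazy)

# Reading 2: two independent detectors ringing at random moments; the
# chance both ring at a given instant is the product.
simultaneous_ringing = tall * crazy
```

Reading 1 gives .7, reading 2 gives roughly .56, mirroring the additive/multiplicative split above.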
The count n in this context may be interpreted as the number of times that the degree of output of the detector, for a certain fuzzy character of a certain individual, has been assessed.
The formulas of linear logic would seem to apply to this case just as well as to the probabilistic logic case considered above. The conclusion one comes to here is that fuzzy logic can be modeled as a special case of probabilistic logic, where one is looking at probabilities emanating from a “property detector” whose behavior one only observes in the aggregate (the amount of “ringing” it gives when exposed to a given stimulus) rather than at the micro level (one can’t detect or remember the individual “rings”).
(This also hints at some of the connections between fuzzy logic and quantum logic. In the model of fuzzy logic suggested here, we are assuming that the individual rings are unobservable, and we are then making heuristic assumptions about their relationships; quantum logic, meanwhile, connects in subtle ways with issues about what is and is not observable in principle. I’ll leave it for later to unpack these connections…)
References
- [GIGH08] B. Goertzel, M. Ikle, I. Goertzel, and A. Heljakka. Probabilistic Logic Networks. Springer, 2008.
- [Gir87] Jean-Yves Girard. Linear logic. Theoretical computer science, 50(1):1–101, 1987.
- [GL10] Ben Goertzel and Ruiting Lian. A probabilistic characterization of fuzzy semantics. Proc. of ICAI-10, Beijing, 2010.
- [GQ91] Madan M Gupta and J Qi. Theory of t-norms and fuzzy inference methods. Fuzzy sets and systems, 40(3):431–450, 1991.
- [Hoa83] Charles Antony R Hoare. Communicating sequential processes. Communications of the ACM, 26(1):100–106, 1983.
- [LM92] Patrick Lincoln and John Mitchell. Operational aspects of linear lambda calculus. In Logic in Computer Science, 1992. LICS’92., Proceedings of the Seventh Annual IEEE Symposium on, pages 235–246. IEEE, 1992.