
Cardiff University
[email protected]

Bayesian Entailment Hypothesis: How Brains Implement Monotonic and Non-monotonic Reasoning (This paper was submitted to IJCAI 2020 and rejected.)

Hiroyuki Kido
Abstract

Recent successes of Bayesian methods in neuroscience and artificial intelligence have led researchers to the hypothesis that the brain is a Bayesian machine. Since logic, as the laws of thought, is a product and a practice of our human brains, it is natural to think that there is a Bayesian algorithm and data-structure for entailment. This paper gives a Bayesian account of entailment and characterizes its abstract inferential properties. The Bayesian entailment is shown to be a monotonic consequence relation in an extreme case. In general, it is a non-monotonic consequence relation satisfying Classically cautious monotony and Classical cut, which we introduce to reconcile existing conflicting views in the field. The preferential entailment, a representative non-monotonic consequence relation, is shown to correspond to a maximum a posteriori entailment, which is an approximation of the Bayesian entailment; we derive this from the fact that maximum a posteriori estimation is an approximation of Bayesian estimation. We finally discuss the merits of our proposals in terms of encoding preferences on defaults, handling change and contradiction, and modeling human entailment.

1 Introduction

Bayes’ theorem is a simple mathematical equation published in 1763. Today, it plays an important role in various fields such as AI, neuroscience, cognitive science, statistical physics and bioinformatics. It lays the foundation of most modern AI systems, including self-driving cars, machine translation, speech recognition and medical diagnosis [30]. Recent studies in neuroscience, e.g., [22, 18, 14, 17, 4, 5, 12], empirically show that Bayesian methods explain several functions of the cerebral cortex, the outer portion of the brain in charge of higher-order functions such as perception, memory, emotion and thought. These successes of Bayesian methods have led researchers to the Bayesian brain hypothesis that the brain is a Bayesian machine [11, 31].

If the Bayesian brain hypothesis is true, then it is natural to think that there is a Bayesian algorithm and data-structure for logical reasoning. This is because logic, as the laws of thought, is a product and a practice of our human brains. Such a Bayesian account of logical reasoning is important. First, it has the potential to be a mathematical model explaining how the brain implements logical reasoning. Second, it theoretically supports the Bayesian brain hypothesis in terms of logic. Third, it gives an opportunity and a way to critically assess existing formalisms of logical reasoning. Nevertheless, little research has focused on reformulating logical reasoning from the Bayesian perspective (see Section 4 for discussion).

In this paper, we begin by assuming a probability distribution over valuation functions, denoted by $v$. The probability of each valuation function represents how natural, normal or typical the state of the world specified by that valuation function is. We then assume a causal relation from valuation functions to each sentence, denoted by $\alpha$. Under these assumptions, the probability that $\alpha$ is true, denoted by $p(\alpha)$, will be shown to satisfy

\[
p(\alpha)=\sum_{v}p(\alpha,v)=\sum_{v}p(\alpha|v)p(v).
\]

That is, the probability of any sentence is not primitive but depends on the probability distribution over all valuation functions. Given a set $\Delta$ of sentences, the same assumptions will be shown to yield

\[
p(\alpha|\Delta)=\sum_{v}p(\alpha|v)p(v|\Delta).
\]

This equation is known as Bayesian learning [30]. Intuitively speaking, $\Delta$ updates the probability distribution over valuation functions, i.e., $p(v)$ for all $v$, and the updated distribution is then used to predict the truth of $\alpha$. We define a Bayesian entailment, denoted by $\Delta\mathrel{|\!\approx}_{\omega}\alpha$, by the condition $p(\alpha|\Delta)\geq\omega$, as usual.

We derive several important facts from this idea. The Bayesian entailment is shown to be a monotonic consequence relation when $\omega=1$. In general, it is a non-monotonic consequence relation satisfying Classically cautious monotony and Classical cut, which we introduce to reconcile existing conflicting views [13, 3] in the field. The preferential entailment [32], a representative non-monotonic consequence relation, is shown to correspond to a maximum a posteriori entailment, which is an approximation of the Bayesian entailment; this is derived from the fact that maximum a posteriori estimation is an approximation of Bayesian estimation. These results imply that both monotonic and non-monotonic consequence relations can be seen as Bayesian learning with a fixed probability threshold.

This paper is organized as follows. Section 2 gives a probabilistic model for a Bayesian entailment. To establish the correctness of the Bayesian entailment, Section 3 discusses its inferential properties in terms of monotonic and non-monotonic consequence relations. In Section 4, we discuss the merits of our proposals in terms of encoding preferences on defaults, handling change and contradiction, and modeling human entailment.

2 Bayesian Entailment

Let ${\cal L}$, ${\cal P}$ and $v$ respectively denote the propositional language, the set of all propositional symbols in ${\cal L}$, and a valuation function $v:{\cal P}\rightarrow\{0,1\}$, where $0$ and $1$ denote the truth values false and true, respectively. To handle uncertainty about states of the world, we assume that valuation functions are probabilistically distributed. Let $V$ denote a random variable over valuation functions. $p(V=v_{i})$ denotes the probability of valuation function $v_{i}$; it reflects the probability of the state of the world specified by $v_{i}$. Given two valuation functions $v_{1}$ and $v_{2}$, $p(V=v_{1})>p(V=v_{2})$ represents that the state of the world specified by $v_{1}$ is more natural/typical/normal than that specified by $v_{2}$. When the cardinality of ${\cal P}$ is $n$, there are $2^{n}$ possible states of the world and thus $2^{n}$ possible valuation functions. It is the case that $0\leq p(V=v_{i})\leq 1$, for all $i$ such that $1\leq i\leq 2^{n}$, and $\sum_{i=1}^{2^{n}}p(V=v_{i})=1$.

We assume that every propositional sentence is a random variable taking a truth value, either $0$ or $1$. For all $\alpha\in{\cal L}$, $p(\alpha=1)$ represents the probability that $\alpha$ is true and $p(\alpha=0)$ the probability that $\alpha$ is false. We write $\llbracket\alpha\rrbracket$ for the set of all valuation functions in which $\alpha$ is true, and $\llbracket\alpha\rrbracket_{v}$ for the truth value of $\alpha$ under valuation function $v$.

Definition 1 (Interpretation)

Let $\alpha$ be a propositional sentence and $V$ be the random variable over valuation functions. The conditional probability distribution over $\alpha$ given $V$ is given as follows.

\begin{align*}
p(\alpha=1|V) &= \llbracket\alpha\rrbracket_{V}\\
p(\alpha=0|V) &= 1-\llbracket\alpha\rrbracket_{V}
\end{align*}

From the viewpoint of logic, it is natural to assume that the truth value of a sentence is caused only by the valuation functions. Thus, $p(\alpha)$ is given by

\[
p(\alpha)=\sum_{v_{i}}p(\alpha,V=v_{i})=\sum_{v_{i}}p(\alpha|V=v_{i})p(V=v_{i}).
\]
Example 1

Suppose two propositional symbols $a$ and $b$. The left table in Table 1 shows all of the $2^{2}=4$ possible valuation functions, and the right table shows $p(a\lor\lnot b|V)$. Let us assume the following probability distribution over valuation functions.

\begin{align*}
p(V) &= (p(V=v_{1}),p(V=v_{2}),p(V=v_{3}),p(V=v_{4}))\\
     &= (0.5,0.2,0,0.3)
\end{align*}

Now, $p(a\lor\lnot b=1)$ and $p(a\lor\lnot b=0)$ are given as follows.

\begin{align*}
p(a\lor\lnot b=1) &= \sum_{i=1}^{4}p(a\lor\lnot b=1|V=v_{i})p(V=v_{i})\\
&= \sum_{i=1}^{4}\llbracket a\lor\lnot b\rrbracket_{v_{i}}p(V=v_{i})\\
&= p(V=v_{1})+p(V=v_{3})+p(V=v_{4})=0.8\\
p(a\lor\lnot b=0) &= 1-p(a\lor\lnot b=1)=0.2
\end{align*}
Table 1: The left table shows all valuation functions and their distribution. The right table shows $p(a\lor\lnot b|V)$.

        $p(V)$   $a$   $b$
$v_1$   $0.5$    $0$   $0$
$v_2$   $0.2$    $0$   $1$
$v_3$   $0$      $1$   $0$
$v_4$   $0.3$    $1$   $1$

          $a\lor\lnot b=0$   $a\lor\lnot b=1$
$V=v_1$   $0$                $1$
$V=v_2$   $1$                $0$
$V=v_3$   $0$                $1$
$V=v_4$   $0$                $1$
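To make the computation in Example 1 concrete, the following Python sketch evaluates $p(\alpha)=\sum_{v}\llbracket\alpha\rrbracket_{v}p(v)$ over the four valuations of Table 1. It is only an illustration of the definitions above; the helper names (truth, prob) and the encoding of sentences as Python boolean expressions are assumptions of ours, not part of the formal development.

```python
from itertools import product

# A minimal sketch of Definition 1 and the marginal p(alpha), using Example 1's numbers.
# The helper names and the eval-based sentence encoding are illustrative assumptions.

symbols = ["a", "b"]
valuations = list(product([0, 1], repeat=len(symbols)))       # v1..v4 as (a, b) tuples
prior = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.0, (1, 1): 0.3}  # p(V) from Example 1

def truth(sentence, v):
    """[[sentence]]_v: truth value of a sentence written with Python's and/or/not."""
    env = dict(zip(symbols, (bool(x) for x in v)))
    return 1 if eval(sentence, {}, env) else 0

def prob(sentence):
    """p(sentence = 1) = sum_v [[sentence]]_v * p(v)."""
    return sum(truth(sentence, v) * prior[v] for v in valuations)

print(prob("a or not b"))       # 0.8, as computed in Example 1
print(1 - prob("a or not b"))   # 0.2
```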

Definition 1 implies that the probability of the truth of a sentence is not primitive, but dependent on the valuation functions. Therefore, we need to guarantee that probabilities on sentences satisfy the Kolmogorov axioms.

Proposition 1

The following expressions hold, for all formulae $\alpha,\beta\in{\cal L}$.

  1. $0\leq p(\alpha=i)$, for all $i$.

  2. $\sum_{i}p(\alpha=i)=1$.

  3. $p(\alpha\lor\beta=i)=p(\alpha=i)+p(\beta=i)-p(\alpha\land\beta=i)$, for all $i$.

Proof

Since any sentence takes a truth value, either $0$ or $1$, it is sufficient for (1) and (2) to show $p(\alpha=0)+p(\alpha=1)=1$ and $0\leq p(\alpha=0),p(\alpha=1)\leq 1$. The following expressions hold.

\begin{align*}
p(\alpha=0) &= \sum_{v}p(\alpha=0|v)p(v)=\sum_{v}(1-\llbracket\alpha\rrbracket_{v})p(v)\\
p(\alpha=1) &= \sum_{v}p(\alpha=1|v)p(v)=\sum_{v}\llbracket\alpha\rrbracket_{v}p(v)
\end{align*}

Here, we have abbreviated $p(V=v)$ to $p(v)$ and $\llbracket\alpha\rrbracket_{V=v}$ to $\llbracket\alpha\rrbracket_{v}$. (1) is true because $0\leq p(v)$ holds, for all $v$. (2) is true because $p(\alpha=0)+p(\alpha=1)=\sum_{v}p(v)=1$ holds. (3) can be developed as follows, where the first expression covers the case $i=0$ and the second the case $i=1$.

\begin{align*}
\sum_{v}p(v)(1-\llbracket\alpha\lor\beta\rrbracket_{v}) &= \sum_{v}p(v)(1-\{\llbracket\alpha\rrbracket_{v}+\llbracket\beta\rrbracket_{v}-\llbracket\alpha\land\beta\rrbracket_{v}\})\\
\sum_{v}p(v)\llbracket\alpha\lor\beta\rrbracket_{v} &= \sum_{v}p(v)\{\llbracket\alpha\rrbracket_{v}+\llbracket\beta\rrbracket_{v}-\llbracket\alpha\land\beta\rrbracket_{v}\}
\end{align*}

There are four possible cases. If $\llbracket\alpha\rrbracket_{v}=\llbracket\beta\rrbracket_{v}=0$ then the bracketed expression on the right-hand side evaluates to $0\,(=0+0-0)$; if $\llbracket\alpha\rrbracket_{v}=0$ and $\llbracket\beta\rrbracket_{v}=1$ then $1\,(=0+1-0)$; if $\llbracket\alpha\rrbracket_{v}=1$ and $\llbracket\beta\rrbracket_{v}=0$ then $1\,(=1+0-0)$; and if $\llbracket\alpha\rrbracket_{v}=\llbracket\beta\rrbracket_{v}=1$ then $1\,(=1+1-1)$. All the results are consistent with $\llbracket\alpha\lor\beta\rrbracket_{v}$.

Proposition 2

$p(\alpha=0)=p(\neg\alpha=1)$ holds, for any $\alpha\in{\cal L}$.

Proof

It is true that $p(\lnot\alpha=1)=\sum_{v}\llbracket\lnot\alpha\rrbracket_{v}p(v)=\sum_{v}(1-\llbracket\alpha\rrbracket_{v})p(v)=p(\alpha=0)$.

In what follows, we thus replace $p(\alpha=0)$ by $p(\lnot\alpha=1)$ and then abbreviate $p(\lnot\alpha=1)$ to $p(\lnot\alpha)$, for all sentences $\alpha\in{\cal L}$.

The dependency among random variables is shown in Figure 1 using a Bayesian network, a directed acyclic graphical model. A sentence $\alpha$ has a directed edge only from the valuation variable $V$; it represents that valuation functions are the direct causes of the truth values of sentences. The dependency between $V$ and another sentence $\beta_{i}$ is the same as for $\alpha$. Only $\beta_{i}$ is coloured grey, meaning that $\beta_{i}$ is assumed to be observed, in contrast to the other nodes, which are assumed to be predicted or estimated. The box surrounding $\beta_{i}$ is a plate; it represents that there are $N$ sentences $\beta_{1},\beta_{2},\ldots,\beta_{N}$ to which there is a directed edge from $V$. Given the dependency, the conditional probability of $\alpha$ given $\beta_{1},\beta_{2},\ldots,\beta_{N}$ is given as follows.

\begin{align*}
p(\alpha|\beta_{1},\beta_{2},\ldots,\beta_{N}) &= \frac{p(\alpha,\beta_{1},\beta_{2},\ldots,\beta_{N})}{p(\beta_{1},\beta_{2},\ldots,\beta_{N})}=\frac{\sum_{v}p(v)p(\alpha|v)\prod_{i=1}^{N}p(\beta_{i}|v)}{\sum_{v}p(v)\prod_{i=1}^{N}p(\beta_{i}|v)}\\
&= \frac{\sum_{v}p(v)\llbracket\alpha\rrbracket_{v}\prod_{i=1}^{N}\llbracket\beta_{i}\rrbracket_{v}}{\sum_{v}p(v)\prod_{i=1}^{N}\llbracket\beta_{i}\rrbracket_{v}}=\frac{\sum_{v}p(v)\llbracket\alpha\rrbracket_{v}\llbracket\beta_{1},\beta_{2},\ldots,\beta_{N}\rrbracket_{v}}{\sum_{v}p(v)\llbracket\beta_{1},\beta_{2},\ldots,\beta_{N}\rrbracket_{v}}\\
&= \frac{\sum_{v\in\llbracket\alpha,\beta_{1},\beta_{2},\ldots,\beta_{N}\rrbracket}p(v)}{\sum_{v\in\llbracket\beta_{1},\beta_{2},\ldots,\beta_{N}\rrbracket}p(v)}
\end{align*}
Example 2 (Continued)

$p(\lnot a|a\lor\lnot b,\lnot a\lor b)$ is given as follows.

\begin{align*}
p(\lnot a|a\lor\lnot b,\lnot a\lor b) &= \frac{\sum_{v}p(v)\llbracket\lnot a\rrbracket_{v}\llbracket a\lor\lnot b,\lnot a\lor b\rrbracket_{v}}{\sum_{v}p(v)\llbracket a\lor\lnot b,\lnot a\lor b\rrbracket_{v}}\\
&= \frac{p(v_{1})}{p(v_{1})+p(v_{4})}=\frac{0.5}{0.8}=0.625
\end{align*}

We now want to investigate the logical properties of $p(\alpha|\beta_{1},\beta_{2},\ldots,\beta_{N})$. We thus define a consequence relation between $\{\beta_{1},\beta_{2},\ldots,\beta_{N}\}$ and $\alpha$.

Definition 2 (Bayesian entailment)

Let $\alpha\in{\cal L}$ be a sentence, $\Delta\subseteq{\cal L}$ be a set of sentences, and $\omega\in[0,1]$ be a probability. $\alpha$ is a Bayesian entailment of $\Delta$ with $\omega$, denoted by $\Delta\mathrel{|\!\approx}_{\omega}\alpha$, if $p(\alpha|\Delta)\geq\omega$ or $p(\Delta)=0$.

The condition $p(\Delta)=0$ guarantees that $\alpha$ is a Bayesian entailment of $\Delta$ when $p(\alpha|\Delta)$ is undefined due to division by zero. This happens when $\Delta$ has no models, i.e., $\llbracket\Delta\rrbracket=\emptyset$, or zero probability, i.e., $p(v)=0$ for all $v\in\llbracket\Delta\rrbracket$. $\mathrel{|\!\approx}_{\omega}\alpha$ is a special case of Definition 2; it holds when $p(\alpha)\geq\omega$. We call Definition 2 a Bayesian entailment because $p(\alpha|\Delta)$ can be developed as follows.

\[
p(\alpha|\Delta)=\frac{\sum_{v}p(v,\alpha,\Delta)}{p(\Delta)}=\frac{\sum_{v}p(\alpha|v)p(v|\Delta)p(\Delta)}{p(\Delta)}=\sum_{v}p(\alpha|v)p(v|\Delta)
\]

This expression is often called Bayesian learning: $\Delta$ updates the distribution over valuation functions, i.e., $p(V)$, and the truth value of $\alpha$ is predicted using the updated distribution. The Bayesian entailment therefore allows us to see consequences in logic as predictions in Bayesian learning.

Figure 1: Dependency between the random variable $V$ over valuation functions and the random variables $\alpha$ and $\beta_{i}$ over propositional sentences.
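To complement Definition 2, the following sketch computes $p(\alpha|\Delta)$ by the model-counting formula above and applies the threshold test, reproducing the number of Example 2. It is a minimal illustration under our own naming assumptions (cond_prob, bayesian_entails) and an eval-based sentence encoding; it is not the paper's implementation.

```python
from itertools import product

# A minimal sketch of the Bayesian entailment |~=_omega (Definition 2),
# reusing the distribution of Example 1 and the query of Example 2.

symbols = ["a", "b"]
valuations = list(product([0, 1], repeat=2))
prior = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.0, (1, 1): 0.3}

def truth(sentence, v):
    """[[sentence]]_v for a sentence written with Python's and/or/not."""
    return 1 if eval(sentence, {}, dict(zip(symbols, map(bool, v)))) else 0

def cond_prob(alpha, delta):
    """p(alpha | Delta); returns None when p(Delta) = 0, i.e. the ratio is undefined."""
    den = sum(p for v, p in prior.items() if all(truth(b, v) for b in delta))
    num = sum(p for v, p in prior.items()
              if truth(alpha, v) and all(truth(b, v) for b in delta))
    return None if den == 0 else num / den

def bayesian_entails(delta, alpha, omega):
    """Delta |~=_omega alpha iff p(alpha|Delta) >= omega or p(Delta) = 0."""
    q = cond_prob(alpha, delta)
    return True if q is None else q >= omega

delta = ["a or not b", "not a or b"]
print(cond_prob("not a", delta))              # 0.625, matching Example 2
print(bayesian_entails(delta, "not a", 0.6))  # True
print(bayesian_entails(delta, "not a", 0.7))  # False
```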

3 Correctness

This section aims to show correctness of the Bayesian entailment. We first prove that a natural restriction of the Bayesian entailment is a classical consequence relation. We then prove that a weaker restriction of the Bayesian entailment can be seen as a non-monotonic consequence relation.

3.1 Propositional Entailment

Recall that the propositional entailment $\Delta\models\alpha$ is defined as follows: for all valuation functions $v$, if $\Delta$ is true in $v$ then $\alpha$ is true in $v$. The Bayesian entailment $\mathrel{|\!\approx}_{1}$ works in a similar way to the propositional entailment. The only difference is that the Bayesian entailment ignores valuation functions with zero probability; such valuation functions represent impossible states of the world.

Theorem 3.1

Let $\alpha\in{\cal L}$ be a sentence and $\Delta\subseteq{\cal L}$ be a set of sentences. $\Delta\mathrel{|\!\approx}_{1}\alpha$ holds if and only if, for all valuation functions $v$ such that $p(v)\neq 0$, if $\Delta$ is true in $v$ then $\alpha$ is true in $v$.

Proof

We show that $\Delta\not\mathrel{|\!\approx}_{1}\alpha$ holds if and only if there is a valuation function $v$ such that $p(v)\neq 0$ holds, $\Delta$ is true in $v$, and $\alpha$ is false in $v$. From Definition 2, $\Delta\not\mathrel{|\!\approx}_{1}\alpha$ holds if and only if $p(\Delta)\neq 0$ and $p(\alpha|\Delta)\neq 1$ hold. From Definition 1, $p(\Delta)\neq 0$ holds if and only if there is a valuation function $v^{*}$ such that $p(v^{*})\neq 0$ holds and $\Delta$ is true in $v^{*}$. ($\Leftarrow$) This direction follows from what has been shown so far. ($\Rightarrow$) $p(\Delta)\neq 0$ holds due to $v^{*}$. Since $p(\alpha|\Delta)=\frac{\sum_{v}p(v)\llbracket\alpha\rrbracket_{v}\llbracket\Delta\rrbracket_{v}}{\sum_{v}p(v)\llbracket\Delta\rrbracket_{v}}\neq 1$, there is $v\in\llbracket\Delta\rrbracket\setminus\llbracket\alpha\rrbracket$ such that $p(v)\neq 0$. $\Delta$ is true in $v$ and $\alpha$ is false in $v$.

When no valuation function has zero probability, the Bayesian entailment with probability one coincides with the propositional entailment.

Proposition 3

Let $\alpha\in{\cal L}$ be a sentence and $\Delta\subseteq{\cal L}$ be a set of sentences. If there is no valuation function $v$ such that $p(v)=0$, then $\Delta\mathrel{|\!\approx}_{1}\alpha$ holds if and only if $\Delta\models\alpha$ holds.

Proof

The definition of the propositional entailment is equivalent to Theorem 3.1 under the non-zero assumption.

No preference is given to valuation functions in the propositional entailment. It corresponds to the Bayesian entailment with the uniform assumption, i.e., $p(v_{1})=p(v_{2})=\cdots=p(v_{2^{n}})$. Note that Theorem 3.1 and Proposition 3 hold even when the probability distribution over valuation functions is uniform, because the uniform assumption is stricter than the non-zero assumption, i.e., $p(v)\neq 0$. The following holds when no assumption is imposed on the probability distribution over valuation functions.

Proposition 4

Let $\alpha\in{\cal L}$ be a sentence and $\Delta\subseteq{\cal L}$ be a set of sentences. If $\Delta\models\alpha$ holds then $\Delta\mathrel{|\!\approx}_{1}\alpha$ holds, but not vice versa.

Proof

This is obvious from Theorem 3.1.

Since an entailment is defined between a set of sentences and a sentence, both $\models$ and $\mathrel{|\!\approx}_{\omega}$ are subsets of $Pow({\cal L})\times{\cal L}$, where $Pow({\cal L})$ denotes the power set of the language ${\cal L}$. Proposition 4 states that $\models$ is a subset of $\mathrel{|\!\approx}_{1}$, i.e., $\models\ \subseteq\ \mathrel{|\!\approx}_{1}$; it represents that the propositional entailment is more cautious in entailment than the Bayesian entailment. Moreover, it is obvious from Definition 2 that $\mathrel{|\!\approx}_{\omega_{1}}\ \subseteq\ \mathrel{|\!\approx}_{\omega_{2}}$ holds if $\omega_{2}\leq\omega_{1}$ holds.

In propositional logic, a sentence is said to be a tautology if it is true in all valuation functions. A sentence $\alpha$ is thus a tautology if and only if $\models\alpha$ holds. In the Bayesian entailment, $\mathrel{|\!\approx}_{1}\alpha$ holds if and only if $\alpha$ is true in all valuation functions with non-zero probability.

Proposition 5

Let $\alpha\in{\cal L}$ be a sentence. $\mathrel{|\!\approx}_{1}\alpha$ holds if and only if, for all valuation functions $v$ such that $p(v)\neq 0$, $\alpha$ is true in $v$.

Proof

$p(\alpha)=\sum_{v}\llbracket\alpha\rrbracket_{v}p(v)=\sum_{v\in\llbracket\alpha\rrbracket}p(v)$. Since $\sum_{v}p(v)=1$, $\sum_{v\in\llbracket\alpha\rrbracket}p(v)\neq 1$ holds if and only if there is $v^{*}$ such that $v^{*}\notin\llbracket\alpha\rrbracket$ and $p(v^{*})\neq 0$.

Example 3 (Continued)

$\mathrel{|\!\approx}_{1}\lnot a\lor b$ holds because we have

\[
p(\lnot a\lor b)=\sum_{v\in\llbracket\lnot a\lor b\rrbracket}p(v)=p(v_{1})+p(v_{2})+p(v_{4})=1.
\]

3.2 Monotonic Consequence Relation

We investigate the inferential properties of the Bayesian entailment in terms of the monotonic consequence relation. It is known that any monotonic consequence relation can be characterised by three properties: Reflexivity, Monotony and Cut. Let $\vdash\ \subseteq\ Pow({\cal L})\times{\cal L}$ denote a consequence relation on ${\cal L}$. These properties are defined as follows, where $\alpha,\beta\in{\cal L}$ and $\Delta\subseteq{\cal L}$.

  • Reflexivity: $\forall\Delta\forall\alpha$, $\Delta,\alpha\vdash\alpha$

  • Monotony: $\forall\Delta\forall\alpha\forall\beta$, if $\Delta\vdash\alpha$ then $\Delta,\beta\vdash\alpha$

  • Cut: $\forall\Delta\forall\alpha\forall\beta$, if $\Delta\vdash\beta$ and $\Delta,\beta\vdash\alpha$ then $\Delta\vdash\alpha$

Reflexivity states that $\alpha$ is a consequence of any set containing $\alpha$. Monotony states that if $\alpha$ is a consequence of $\Delta$ then it is a consequence of any superset of $\Delta$ as well. Cut states that adding a consequence of $\Delta$ to $\Delta$ does not yield any new consequences. The Bayesian entailment $\mathrel{|\!\approx}_{1}$ is classical in the sense that it satisfies all of these properties.

Theorem 3.2

The Bayesian entailment $\mathrel{|\!\approx}_{1}$ satisfies Reflexivity, Monotony and Cut.

Proof

(Reflexivity) It is true because $\models\ \subseteq\ \mathrel{|\!\approx}_{1}$ holds. (Monotony) Since $\Delta\mathrel{|\!\approx}_{1}\alpha$ holds, $\llbracket\Delta\rrbracket\subseteq\llbracket\alpha\rrbracket$ holds or $p(v)=0$ holds, for all $v\in\llbracket\Delta\rrbracket\setminus\llbracket\alpha\rrbracket$. For all $v\notin\llbracket\Delta\rrbracket\setminus\llbracket\alpha\rrbracket$, it is thus true that if $v\in\llbracket\Delta\rrbracket$ holds then $v\in\llbracket\alpha\rrbracket$ holds. Therefore,

\[
p(\alpha|\Delta,\beta)=\frac{p(\alpha,\Delta,\beta)}{p(\Delta,\beta)}=\frac{\sum_{v}\llbracket\alpha\rrbracket_{v}\llbracket\Delta\rrbracket_{v}\llbracket\beta\rrbracket_{v}p(v)}{\sum_{v}\llbracket\Delta\rrbracket_{v}\llbracket\beta\rrbracket_{v}p(v)}=\frac{\sum_{v\notin\llbracket\Delta\rrbracket\setminus\llbracket\alpha\rrbracket}\llbracket\Delta\rrbracket_{v}\llbracket\beta\rrbracket_{v}p(v)}{\sum_{v\notin\llbracket\Delta\rrbracket\setminus\llbracket\alpha\rrbracket}\llbracket\Delta\rrbracket_{v}\llbracket\beta\rrbracket_{v}p(v)}=1.
\]

Here, we have excluded all $v\in\llbracket\Delta\rrbracket\setminus\llbracket\alpha\rrbracket$ because $p(v)=0$. (Cut) Since $\Delta\mathrel{|\!\approx}_{1}\beta$ holds, $\llbracket\Delta\rrbracket\subseteq\llbracket\beta\rrbracket$ holds or $p(v)=0$ holds, for all $v\in\llbracket\Delta\rrbracket\setminus\llbracket\beta\rrbracket$. Since $\Delta,\beta\mathrel{|\!\approx}_{1}\alpha$ holds, $\llbracket\Delta,\beta\rrbracket\subseteq\llbracket\alpha\rrbracket$ holds or $p(v)=0$ holds, for all $v\in\llbracket\Delta,\beta\rrbracket\setminus\llbracket\alpha\rrbracket$. Let $X=(\llbracket\Delta\rrbracket\setminus\llbracket\beta\rrbracket)\cup(\llbracket\Delta,\beta\rrbracket\setminus\llbracket\alpha\rrbracket)$. For all $v\notin X$, it is thus true that if $v\in\llbracket\Delta\rrbracket$ holds then $v\in\llbracket\beta\rrbracket$ and $v\in\llbracket\alpha\rrbracket$ hold. We thus have

\[
p(\alpha|\Delta,\beta)=\frac{p(\alpha,\Delta,\beta)}{p(\Delta,\beta)}=\frac{\sum_{v}\llbracket\alpha\rrbracket_{v}\llbracket\Delta\rrbracket_{v}\llbracket\beta\rrbracket_{v}p(v)}{\sum_{v}\llbracket\Delta\rrbracket_{v}\llbracket\beta\rrbracket_{v}p(v)}=\frac{\sum_{v\notin X}\llbracket\Delta\rrbracket_{v}p(v)}{\sum_{v\notin X}\llbracket\Delta\rrbracket_{v}p(v)}=1.
\]

The next theorem states the inferential properties of the Bayesian entailment $\mathrel{|\!\approx}_{\omega}$ with probability $\omega$ where $0.5<\omega<1$.

Theorem 3.3

Let $\omega$ be a probability where $0.5<\omega<1$. The Bayesian entailment $\mathrel{|\!\approx}_{\omega}$ satisfies Reflexivity, but does not satisfy Monotony or Cut.

Proof

(Reflexivity) Obvious from $\models\ \subseteq\ \mathrel{|\!\approx}_{1}\ \subseteq\ \mathrel{|\!\approx}_{\omega}$. (Monotony) We show a counter-example. Given the set $\{a,b\}$ of propositional symbols, consider the probability distribution over valuation functions shown in Table 2. Note that $\sum_{v}p(v)=1$ holds. It is the case that

\begin{align*}
p(a) &= p(v_{3})+p(v_{4})=(1-\omega)+(2\omega-1)=\omega\\
p(a|b) &= \frac{p(v_{4})}{p(v_{2})+p(v_{4})}=\frac{2\omega-1}{(1-\omega)+(2\omega-1)}=\frac{2\omega-1}{\omega}.
\end{align*}

$\omega>\frac{2\omega-1}{\omega}$ holds if and only if $(\omega-1)^{2}>0$ holds, which is always true when $0.5<\omega<1$. Therefore, $\mathrel{|\!\approx}_{\omega}a$ holds but $b\mathrel{|\!\approx}_{\omega}a$ does not. (Cut) We show a counter-example. Consider the probability distribution over valuation functions shown in Table 3. Note that $\sum_{v}p(v)=1$ holds. It is the case that

\begin{align*}
p(a) &= p(v_{3})+p(v_{4})=\omega(1-\omega)+\omega^{2}=\omega\\
p(a\land b|a) &= \frac{p(v_{4})}{p(v_{3})+p(v_{4})}=\frac{\omega^{2}}{\omega(1-\omega)+\omega^{2}}=\omega\\
p(a\land b) &= p(v_{4})=\omega^{2}
\end{align*}

$\omega>\omega^{2}$ is always true when $0.5<\omega<1$. Therefore, $\mathrel{|\!\approx}_{\omega}a$ and $a\mathrel{|\!\approx}_{\omega}a\land b$ hold, but $\mathrel{|\!\approx}_{\omega}a\land b$ does not.

Table 2: Counter-example of Monotony

        $p(V)$         $a$   $b$
$v_1$   $0$            $0$   $0$
$v_2$   $1-\omega$     $0$   $1$
$v_3$   $1-\omega$     $1$   $0$
$v_4$   $2\omega-1$    $1$   $1$

Table 3: Counter-example of Cut

        $p(V)$               $a$   $b$
$v_1$   $0$                  $0$   $0$
$v_2$   $1-\omega$           $0$   $1$
$v_3$   $\omega(1-\omega)$   $1$   $0$
$v_4$   $\omega^{2}$         $1$   $1$
Example 4

Given $\omega=0.8$ in Table 2, $p(a)=0.8$ and $p(a|b)=0.75$ hold. Monotony does not hold because $\mathrel{|\!\approx}_{0.8}a$ holds, but $b\mathrel{|\!\approx}_{0.8}a$ does not hold. Given $\omega=0.8$ in Table 3, $p(a)=0.8$, $p(a\land b|a)=0.8$ and $p(a\land b)=0.64$ hold. Cut does not hold because $\mathrel{|\!\approx}_{0.8}a$ and $a\mathrel{|\!\approx}_{0.8}a\land b$ hold, but $\mathrel{|\!\approx}_{0.8}a\land b$ does not hold.

Therefore, in contrast to $\mathrel{|\!\approx}_{1}$, the Bayesian entailment $\mathrel{|\!\approx}_{\omega}$ is not a monotonic consequence relation, for all $\omega$ with $0.5<\omega<1$.
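The counter-examples above can be checked numerically. The sketch below instantiates Tables 2 and 3 with $\omega=0.8$ and reproduces the probabilities of Example 4; the prob helper and its sentence encoding are illustrative assumptions of ours.

```python
# A numeric check of Example 4 (omega = 0.8) for the counter-examples of Theorem 3.3.

omega = 0.8
valuations = [(0, 0), (0, 1), (1, 0), (1, 1)]       # (a, b) for v1..v4

def prob(prior, alpha, delta=()):
    """p(alpha | Delta) under the given prior over (a, b); None if p(Delta) = 0."""
    def holds(s, v):
        return eval(s, {}, {"a": bool(v[0]), "b": bool(v[1])})
    den = sum(p for v, p in zip(valuations, prior) if all(holds(d, v) for d in delta))
    num = sum(p for v, p in zip(valuations, prior)
              if holds(alpha, v) and all(holds(d, v) for d in delta))
    return None if den == 0 else num / den

# Table 2: counter-example of Monotony
t2 = [0, 1 - omega, 1 - omega, 2 * omega - 1]
print(prob(t2, "a"))               # 0.8  -> |~=_0.8 a holds
print(prob(t2, "a", ["b"]))        # 0.75 -> b |~=_0.8 a fails

# Table 3: counter-example of Cut
t3 = [0, 1 - omega, omega * (1 - omega), omega ** 2]
print(prob(t3, "a"))               # 0.8  -> |~=_0.8 a holds
print(prob(t3, "a and b", ["a"]))  # 0.8  -> a |~=_0.8 a and b holds
print(prob(t3, "a and b"))         # 0.64 -> |~=_0.8 a and b fails
```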

3.3 Non-monotonic Consequence Relation

We next analyze the Bayesian entailment in terms of the inferential properties of non-monotonic consequence relations. It is known that there are at least four core properties characterizing non-monotonic consequence relations: Supraclassicality, Reflexivity, Cautious monotony and Cut. Let $\mathrel{|\!\sim}\ \subseteq\ Pow({\cal L})\times{\cal L}$ be a consequence relation on ${\cal L}$. These properties are formally defined as follows, where $\alpha,\beta\in{\cal L}$ and $\Delta\subseteq{\cal L}$.

  • Supraclassicality: $\forall\Delta\forall\alpha$, if $\Delta\vdash\alpha$ then $\Delta\mathrel{|\!\sim}\alpha$

  • Reflexivity: $\forall\Delta\forall\alpha$, $\Delta,\alpha\mathrel{|\!\sim}\alpha$

  • Cautious monotony: $\forall\Delta\forall\alpha\forall\beta$, if $\Delta\mathrel{|\!\sim}\beta$ and $\Delta\mathrel{|\!\sim}\alpha$ then $\Delta,\beta\mathrel{|\!\sim}\alpha$

  • Cut: $\forall\Delta\forall\alpha\forall\beta$, if $\Delta\mathrel{|\!\sim}\beta$ and $\Delta,\beta\mathrel{|\!\sim}\alpha$ then $\Delta\mathrel{|\!\sim}\alpha$

We have already discussed Reflexivity and Cut. Supraclassicality states that the consequence relation extends the monotonic consequence relation. Cautious monotony states that if $\alpha$ is a consequence of $\Delta$ then it is a consequence of certain supersets of $\Delta$ as well; it is weaker than Monotony because the supersets are restricted to consequences of $\Delta$. Consequence relations satisfying these properties are often called cumulative consequence relations [3]. (This definition is not absolute: the authors of [19] define a cumulative consequence relation as one satisfying Reflexivity, Left logical equivalence, Right weakening, Cut and Cautious monotony.)

Theorem 3.4

Let $\omega$ be a probability where $0.5<\omega<1$. The Bayesian entailment $\mathrel{|\!\approx}_{\omega}$ satisfies Supraclassicality and Reflexivity, but does not satisfy Cautious monotony or Cut.

Proof

(Reflexivity & Cut) See Theorem 3.3. (Supraclassicality) This is obvious from ω\models\leavevmode\nobreak\ \subseteq\leavevmode\nobreak\ \mathrel{\scalebox{1.0}[1.5]{$\shortmid$}\mkern-3.1mu\raisebox{0.43057pt}{$\approx$}}_{\omega}. (Cautious monotony) It is enough to show a counter-example. Given set {a,b}\{a,b\} of atomic propositions, consider again the distribution over valuation functions shown in Table 3. We have

p(a)=p(v3)+p(v4)=(1ω)+(2ω1)=ω\displaystyle p(a)=p(v_{3})+p(v_{4})=(1-\omega)+(2\omega-1)=\omega
p(b)=p(v2)+p(v4)=(1ω)+(2ω1)=ω\displaystyle p(b)=p(v_{2})+p(v_{4})=(1-\omega)+(2\omega-1)=\omega
p(a|b)=p(v4)p(v2)+p(v4)=2ω1ω.\displaystyle p(a|b)=\frac{p(v_{4})}{p(v_{2})+p(v_{4})}=\frac{2\omega-1}{\omega}.

ω2ω1ω\omega\leq\frac{2\omega-1}{\omega} holds if and only if (ω1)20(\omega-1)^{2}\leq 0 holds. It is always false when 0.5<ω<10.5<\omega<1 holds.

Theorem 3.4 shows that, in general, the Bayesian entailment $\mathrel{|\!\approx}_{\omega}$ is not cumulative. A natural question is which inferential properties characterize the Bayesian entailment. We thus introduce two properties: Classically cautious monotony and Classical cut.

Definition 3 (Classically cautious monotony and classical cut)

Let $\alpha,\beta\in{\cal L}$ be sentences, $\Delta\subseteq{\cal L}$ be a set of sentences, and $\mathrel{|\!\sim}\ \subseteq\ Pow({\cal L})\times{\cal L}$ be a consequence relation on ${\cal L}$. Classically cautious monotony and Classical cut are given as follows:

  • Classically cautious monotony: $\forall\Delta\forall\alpha\forall\beta$, if $\Delta\vdash\beta$ and $\Delta\mathrel{|\!\sim}\alpha$ then $\Delta,\beta\mathrel{|\!\sim}\alpha$

  • Classical cut: $\forall\Delta\forall\alpha\forall\beta$, if $\Delta\vdash\beta$ and $\Delta,\beta\mathrel{|\!\sim}\alpha$ then $\Delta\mathrel{|\!\sim}\alpha$

where $\vdash\ \subseteq\ Pow({\cal L})\times{\cal L}$ denotes a monotonic consequence relation.

Intuitively speaking, Classically cautious monotony and Classical cut state that only monotonic consequences may be used as premises of the next inference step. This is in contrast to Cautious monotony and Cut, which state that any consequences may be used as premises of the next step. Classically cautious monotony is weaker than Cautious monotony, and Cautious monotony is weaker than Monotony. Thus, if a consequence relation satisfies Monotony then it satisfies Classically cautious monotony and Cautious monotony as well. We now define a classically cumulative consequence relation.

Definition 4 (Classically cumulative consequence relation)

Let $\mathrel{|\!\sim}\ \subseteq\ Pow({\cal L})\times{\cal L}$ be a consequence relation on ${\cal L}$. $\mathrel{|\!\sim}$ is said to be a classically cumulative consequence relation if it satisfies all of the following properties.

  • Supraclassicality: $\forall\Delta\forall\alpha$, if $\Delta\vdash\alpha$ then $\Delta\mathrel{|\!\sim}\alpha$

  • Reflexivity: $\forall\Delta\forall\alpha$, $\Delta,\alpha\mathrel{|\!\sim}\alpha$

  • Classically cautious monotony: $\forall\Delta\forall\alpha\forall\beta$, if $\Delta\vdash\beta$ and $\Delta\mathrel{|\!\sim}\alpha$ then $\Delta,\beta\mathrel{|\!\sim}\alpha$

  • Classical cut: $\forall\Delta\forall\alpha\forall\beta$, if $\Delta\vdash\beta$ and $\Delta,\beta\mathrel{|\!\sim}\alpha$ then $\Delta\mathrel{|\!\sim}\alpha$

Any cumulative consequence relation is a classically cumulative consequence relation, but not vice versa. A classically cumulative consequence relation is more conservative in entailment than a cumulative one. For example, in a cumulative consequence relation $\mathrel{|\!\sim}$, if $(\emptyset,\beta)\in\mathrel{|\!\sim}$ and $(\emptyset,\alpha)\in\mathrel{|\!\sim}$ hold then $(\{\beta\},\alpha)\in\mathrel{|\!\sim}$ necessarily holds due to Cautious monotony. However, this need not hold in a classically cumulative consequence relation because $(\emptyset,\beta)\notin\ \vdash$ might be the case. The same discussion applies to Cut and Classical cut.

The next theorem shows that the Bayesian entailment $\mathrel{|\!\approx}_{\omega}$ (where $0.5<\omega<1$) is a classically cumulative consequence relation.

Theorem 3.5

Let $\omega$ be a probability where $0.5<\omega<1$. The Bayesian entailment $\mathrel{|\!\approx}_{\omega}$ is a classically cumulative consequence relation.

Proof

(Supraclassicality & Reflexivity) See Theorem 3.4. (Classically cautious monotony & Classical cut) We prove both by showing that Δ,βα\Delta,\beta\mathrel{\scalebox{1.0}[1.5]{$\shortmid$}\mkern-3.1mu\raisebox{0.43057pt}{$\sim$}}\alpha holds if and only if Δα\Delta\mathrel{\scalebox{1.0}[1.5]{$\shortmid$}\mkern-3.1mu\raisebox{0.43057pt}{$\sim$}}\alpha holds, given Δβ\Delta\vdash\beta holds. If p(Δ)=0p(\Delta)=0 holds then Δβ\Delta\vdash\beta, Δα\Delta\mathrel{\scalebox{1.0}[1.5]{$\shortmid$}\mkern-3.1mu\raisebox{0.43057pt}{$\sim$}}\alpha and Δ,βα\Delta,\beta\mathrel{\scalebox{1.0}[1.5]{$\shortmid$}\mkern-3.1mu\raisebox{0.43057pt}{$\sim$}}\alpha obviously hold from the definition. If p(Δ)0p(\Delta)\neq 0 then we have

p(α|Δ,β)=vαvΔvβvp(v)vΔvβvp(v)=vαvΔvp(v)vΔvp(v)=p(α|Δ).\displaystyle p(\alpha|\Delta,\beta)=\frac{\sum_{v}\llbracket\alpha\rrbracket_{v}\llbracket\Delta\rrbracket_{v}\llbracket\beta\rrbracket_{v}p(v)}{\sum_{v}\llbracket\Delta\rrbracket_{v}\llbracket\beta\rrbracket_{v}p(v)}=\frac{\sum_{v}\llbracket\alpha\rrbracket_{v}\llbracket\Delta\rrbracket_{v}p(v)}{\sum_{v}\llbracket\Delta\rrbracket_{v}p(v)}=p(\alpha|\Delta).

We here used the facts Δβ\llbracket\Delta\rrbracket\subseteq\llbracket\beta\rrbracket and p(Δ,β)0p(\Delta,\beta)\neq 0. Δβ\llbracket\Delta\rrbracket\subseteq\llbracket\beta\rrbracket is true because Δβ\Delta\vdash\beta. p(Δ,β)0p(\Delta,\beta)\neq 0 is true because p(Δ)0p(\Delta)\neq 0 and Δβ\llbracket\Delta\rrbracket\subseteq\llbracket\beta\rrbracket.

Note that it makes no sense to show that any classically cumulative consequence relation is a Bayesian entailment; this would correspond to showing that any monotonic consequence relation is the propositional entailment. The classically cumulative consequence relation is a meta-theoretic notion used to characterize various logical systems with specific logical languages, syntax and semantics.

Example 5

Given $\omega=0.8$ in Table 3, $p(a\lor b)=1$, $p(a)=0.8$ and $p(a|a\lor b)=0.8$. It is thus the case that $\vdash a\lor b$, $\mathrel{|\!\approx}_{0.8}a$ and $a\lor b\mathrel{|\!\approx}_{0.8}a$.
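The instance of Classically cautious monotony in Example 5 can be checked in the same way. In the sketch below, $\mathrel{|\!\approx}_{1}$ plays the role of the monotonic relation $\vdash$ (as licensed by Theorem 3.2), the distribution is Table 3 with $\omega=0.8$, and the helper names are illustrative assumptions of ours.

```python
# A numeric check of Example 5 under Table 3 with omega = 0.8.

omega = 0.8
valuations = [(0, 0), (0, 1), (1, 0), (1, 1)]                   # (a, b) for v1..v4
prior = [0, 1 - omega, omega * (1 - omega), omega ** 2]          # Table 3

def cond_prob(alpha, delta=()):
    def holds(s, v):
        return eval(s, {}, {"a": bool(v[0]), "b": bool(v[1])})
    den = sum(p for v, p in zip(valuations, prior) if all(holds(d, v) for d in delta))
    num = sum(p for v, p in zip(valuations, prior)
              if holds(alpha, v) and all(holds(d, v) for d in delta))
    return None if den == 0 else num / den

print(cond_prob("a or b"))         # 1.0 -> |~=_1 a or b, the classical premise
print(cond_prob("a"))              # 0.8 -> |~=_0.8 a
print(cond_prob("a", ["a or b"]))  # 0.8 -> a or b |~=_0.8 a (Classically cautious monotony)
```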

3.4 Preferential Entailment

The preferential entailment [32] is a representative approach to non-monotonic consequence relations. We show that the preferential entailment coincides with a maximum a posteriori entailment, which is an approximation of the Bayesian entailment. The preferential entailment is defined on a preferential structure (or preferential model) $({\cal V},\succ)$, where ${\cal V}$ is a set of valuation functions and $\succ$ is an irreflexive and transitive relation on ${\cal V}$. $v_{1}\succ v_{2}$ represents that $v_{1}$ is preferable to $v_{2}$ in the sense that the world identified by $v_{1}$ is more normal/typical/natural than the one identified by $v_{2}$. (For the sake of simplicity, we do not adopt the common practice in logic that $v_{2}\succ v_{1}$ denotes that $v_{1}$ is preferable to $v_{2}$.) Given a preferential structure $({\cal V},\succ)$, $\alpha$ is a preferential consequence of $\Delta$, denoted by $\Delta\mathrel{|\!\sim}_{({\cal V},\succ)}\alpha$, if $\alpha$ is true in all $\succ$-maximal models of $\Delta$. ($\succ$ has to be smooth (or stuttered) [19] so that a maximal model certainly exists; that is, for every valuation $v$, either $v$ is $\succ$-maximal or there is a $\succ$-maximal valuation $v^{\prime}$ such that $v^{\prime}\succ v$.) A consequence relation $\mathrel{|\!\sim}\ \subseteq\ Pow({\cal L})\times{\cal L}$ is said to be preferential [19] if it satisfies the following Or property, as well as Reflexivity, Cut and Cautious monotony, where $\Delta\subseteq{\cal L}$ and $\alpha,\beta,\gamma\in{\cal L}$.

  • Or: $\forall\Delta\forall\alpha\forall\beta\forall\gamma$, if $\Delta,\alpha\mathrel{|\!\sim}\gamma$ and $\Delta,\beta\mathrel{|\!\sim}\gamma$ then $\Delta,\alpha\lor\beta\mathrel{|\!\sim}\gamma$

In line with the fact that maximum a posteriori estimation is an approximation of Bayesian estimation, we define a maximum a posteriori (MAP) entailment, denoted by $\mathrel{|\!\approx}_{MAP}$, as an approximation of the Bayesian entailment. Several concepts need to be introduced first. $v_{MAP}$ is said to be a maximum a posteriori estimate if it satisfies

\[
v_{MAP}=\operatorname*{arg\,max}_{v}p(v|\Delta).
\]

We now assume that the distribution $p(V|\Delta)$ has a unique peak close to $1$ at some valuation function, meaning that there is a single state of the world that is highly normal/natural/typical. This results in

\[
p(V|\Delta)\simeq\begin{cases}1 & \text{if }V=v_{MAP}\\ 0 & \text{otherwise,}\end{cases}
\]

where $\simeq$ denotes an approximation. Note that no one accepting MAP estimation can refuse this assumption from the Bayesian perspective. Now, we have

\[
p(\alpha|\Delta)=\sum_{v}p(\alpha|v)p(v|\Delta)\simeq\sum_{v}p(\alpha|v)\delta(v=v_{MAP})=p(\alpha|v_{MAP}),
\]

where $\delta$ is the Kronecker delta, which returns $1$ if $v=v_{MAP}$ and $0$ otherwise. It is the case that $p(\alpha|v_{MAP})\in\{0,1\}$. Thus, a formal definition of the maximum a posteriori entailment is given as follows.

Definition 5 (Maximum a posteriori entailment)

Let $\alpha\in{\cal L}$ be a sentence and $\Delta\subseteq{\cal L}$ be a set of sentences. $\alpha$ is a maximum a posteriori entailment of $\Delta$, denoted by $\Delta\mathrel{|\!\approx}_{MAP}\alpha$, if $p(\alpha|v_{MAP})=1$ or $p(\Delta)=0$, where $v_{MAP}=\operatorname*{arg\,max}_{v}p(v|\Delta)$.
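The following sketch illustrates Definition 5 on the distribution of Example 1: the unnormalized posterior over valuations is restricted to the models of $\Delta$, its argmax is taken as $v_{MAP}$, and $\alpha$ is then evaluated in $v_{MAP}$. The helper names are illustrative assumptions of ours.

```python
# A minimal sketch of the MAP entailment |~=_MAP (Definition 5),
# reusing the Example 1 distribution over valuations.

valuations = [(0, 0), (0, 1), (1, 0), (1, 1)]    # (a, b) for v1..v4
prior = [0.5, 0.2, 0.0, 0.3]

def holds(sentence, v):
    return eval(sentence, {}, {"a": bool(v[0]), "b": bool(v[1])})

def map_entails(delta, alpha):
    """Delta |~=_MAP alpha iff p(alpha | v_MAP) = 1 with v_MAP = argmax_v p(v|Delta);
    holds vacuously when p(Delta) = 0."""
    posterior = [p if all(holds(d, v) for d in delta) else 0.0
                 for v, p in zip(valuations, prior)]     # unnormalized p(v|Delta)
    if sum(posterior) == 0:
        return True
    v_map = valuations[max(range(len(prior)), key=lambda i: posterior[i])]
    return holds(alpha, v_map)

print(map_entails(["a or not b"], "not b"))  # True: v_MAP = v1 = (0, 0)
print(map_entails(["a or not b"], "a"))      # False: a is false in v1
```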

Given two ordered sets $(S_{1},\leq_{1})$ and $(S_{2},\leq_{2})$, a function $f$ is said to be an order-preserving (or isotone) map of $(S_{1},\leq_{1})$ into $(S_{2},\leq_{2})$ if $x\leq_{1}y$ implies $f(x)\leq_{2}f(y)$, for all $x,y\in S_{1}$. The next theorem relates the maximum a posteriori entailment to the preferential entailment.

Theorem 3.6

Let $({\cal V},\succ)$ be a preferential structure and $p:{\cal V}\rightarrow[0,1]$ be a probability mass function over $V$. If $p$ is an order-preserving map of $({\cal V},\succ)$ into $([0,1],\geq)$ then $\Delta\mathrel{|\!\sim}_{({\cal V},\succ)}\alpha$ implies $\Delta\mathrel{|\!\approx}_{MAP}\alpha$.

Proof

It is obviously true when $\Delta$ has no model. Let $v^{*}$ be a $\succ$-maximal model of $\Delta$. It is sufficient to show $p(\alpha|v^{*})=1$ and $v^{*}=\operatorname*{arg\,max}_{v}p(v|\Delta)$. Since $\Delta\mathrel{|\!\sim}_{({\cal V},\succ)}\alpha$, $\alpha$ is true in $v^{*}$. Thus, $p(\alpha|v^{*})=\llbracket\alpha\rrbracket_{v^{*}}=1$. We have

\[
\operatorname*{arg\,max}_{v}p(v|\Delta)=\operatorname*{arg\,max}_{v}p(\Delta|v)p(v)=\operatorname*{arg\,max}_{v}\llbracket\Delta\rrbracket_{v}p(v).
\]

Since $\Delta$ is true in $v^{*}$, $\llbracket\Delta\rrbracket_{v^{*}}=1$. Since $p$ is order-preserving, if $v_{1}\succ v_{2}$ then $p(v_{1})\geq p(v_{2})$, for all $v_{1},v_{2}$. Thus, if $v$ is $\succ$-maximal then $p(v)$ is maximal. Therefore, $v^{*}=\operatorname*{arg\,max}_{v}\llbracket\Delta\rrbracket_{v}p(v)$ holds.

It is the case that $\mathrel{|\!\sim}_{({\cal V},\succ)},\mathrel{|\!\approx}_{MAP}\subseteq Pow({\cal L})\times{\cal L}$. Theorem 3.6 states that $\mathrel{|\!\sim}_{({\cal V},\succ)}\ \subseteq\ \mathrel{|\!\approx}_{MAP}$ holds given an appropriate probability distribution over valuation functions.

Example 6

Suppose the probability distribution over valuation functions shown in the table in Figure 2 and the preferential structure $({\cal V},\succ)$ given as follows.

\begin{align*}
{\cal V} &= \{v_{1},v_{2},v_{3},v_{4}\}\\
\succ &= \{(v_{1},v_{2}),(v_{1},v_{3}),(v_{1},v_{4}),(v_{3},v_{2}),(v_{4},v_{2})\}
\end{align*}

The transitive relation $\succ$ is depicted in the graph shown in Figure 2. As shown in the graph, the probability mass function $p$ is an order-preserving map of $({\cal V},\succ)$ into $([0,1],\geq)$.

Now, $\{a\lor\lnot b\}\mathrel{|\!\sim}_{({\cal V},\succ)}\lnot b$ holds because $\lnot b$ is true in all $\succ$-maximal models of $\{a\lor\lnot b\}$. Indeed, $\lnot b$ is true in $v_{1}$, which is uniquely $\succ$-maximal in $\{v_{1},v_{3},v_{4}\}$, the models of $a\lor\lnot b$. Meanwhile, $\{a\lor\lnot b\}\mathrel{|\!\approx}_{MAP}\lnot b$ holds because $p(\lnot b|v_{1})=1$ holds, where $v_{1}=\operatorname*{arg\,max}_{v}p(v|a\lor\lnot b)$.

However, $\{a\}\not\mathrel{|\!\sim}_{({\cal V},\succ)}\lnot b$ holds because $\lnot b$ is false in $v_{4}$, which is $\succ$-maximal in $\{v_{3},v_{4}\}$, the models of $a$. In contrast, $\{a\}\mathrel{|\!\approx}_{MAP}\lnot b$ holds because $p(\lnot b|v_{3})=1$ holds, where $v_{3}=\operatorname*{arg\,max}_{v}p(v|a)$.

Figure 2: The left table shows the probability distribution over valuation functions. The right graph shows the order-preserving probability mass function that maps each valuation function to its probability.
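For comparison, the preferential side of Example 6 can be computed directly from the relation $\succ$ listed above; the probabilities in Figure 2 are not needed for this part. The sketch below is illustrative, with helper names (maximal_models, pref_entails) of our own.

```python
# A sketch of the preferential entailment |~_(V,>) of Section 3.4,
# using the preferential structure given in Example 6.

valuations = {"v1": (0, 0), "v2": (0, 1), "v3": (1, 0), "v4": (1, 1)}   # (a, b)
succ = {("v1", "v2"), ("v1", "v3"), ("v1", "v4"), ("v3", "v2"), ("v4", "v2")}

def holds(sentence, name):
    a, b = valuations[name]
    return eval(sentence, {}, {"a": bool(a), "b": bool(b)})

def maximal_models(delta):
    """The >-maximal valuations among the models of Delta."""
    models = [n for n in valuations if all(holds(d, n) for d in delta)]
    return [m for m in models if not any((n, m) in succ for n in models)]

def pref_entails(delta, alpha):
    """Delta |~_(V,>) alpha iff alpha is true in every >-maximal model of Delta."""
    return all(holds(alpha, m) for m in maximal_models(delta))

print(maximal_models(["a or not b"]))         # ['v1']
print(pref_entails(["a or not b"], "not b"))  # True, as in Example 6
print(maximal_models(["a"]))                  # ['v3', 'v4']
print(pref_entails(["a"], "not b"))           # False: not b fails in v4
```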

The equivalence between the maximum a posteriori entailment and the preferential entailment is obtained by restricting the preferential structure to a total order.

Theorem 3.7

Let $({\cal V},\succ)$ be a totally-ordered preferential structure and $p:{\cal V}\rightarrow[0,1]$ be a probability mass function over $V$. If $p$ is an order-preserving map of $({\cal V},\succ)$ into $([0,1],\geq)$ then $\Delta\mathrel{|\!\sim}_{({\cal V},\succ)}\alpha$ holds if and only if $\Delta\mathrel{|\!\approx}_{MAP}\alpha$ holds.

Proof

The proof is the same as that of Theorem 3.6. The only difference is that the $\succ$-maximal model $v^{*}$ exists uniquely; then only $v^{*}$ satisfies $v^{*}=\operatorname*{arg\,max}_{v}\llbracket\Delta\rrbracket_{v}p(v)$.

4 Discussion and Conclusions

There is a large body of research combining logic and probability theory, e.g., [1, 8, 25, 24, 23, 26, 6, 20, 21, 9, 27, 16, 29]. Its common interest is not the notion of truth preservation but rather probability preservation, where the uncertainty of the premises carries over to the conclusion. These works differ from ours because they presuppose and extend classical logical consequence.

Besides the preferential entailment, various other semantics for non-monotonic consequence relations have been proposed, such as plausibility structures [10], possibility structures [7], ranking structures [15] and $\varepsilon$-semantics [2, 28]. The common idea of the first three approaches is that $\Delta$ entails $\alpha$ if $\llbracket\Delta\land\alpha\rrbracket$ is preferable to $\llbracket\Delta\land\lnot\alpha\rrbracket$ given a preference on models. The idea of the last approach is that $\Delta$ entails $\alpha$ if $p(\alpha|\Delta)$ is close to one in a dependency network quantifying the strength of the causal relationships between sentences. In contrast to the last approach, we focus on the causal relationship between sentences and models, i.e., states of the world. All sentences are conditionally independent given a model. This fact makes it possible to update the probability distribution over models using the observed sentences $\Delta$, and then to predict the truth of an unobserved sentence $\alpha$ using only the updated distribution. Our approach differs from the first three approaches, which assume a preference on models prior to the analysis. It also differs from the last approach, which introduces a new, rather intricate semantics, i.e., $\varepsilon$-semantics, to handle the interaction between models and sentences outside probabilistic inference. This characteristic allows us to answer the open question of [3]:

Perhaps, the greatest technical challenge left for circumscription and model preference theories in general is how to encode preferences among abnormalities or defaults.

The abnormalities or defaults can be seen as unobserved statements. We thus think that the preferences should be encoded by their posterior probabilities, derived by taking into account all the uncertainty about models.

A natural criticism of our work is that the Bayesian entailment is inadequate as a non-monotonic consequence relation due to the lack of Cautious monotony and Cut. Indeed, Gabbay [13] holds, on the basis of his intuition, that non-monotonic consequence relations should satisfy at least Cautious monotony, Reflexivity and Cut. However, this is controversial because of the unintuitive behaviour of Cautious monotony and Cut in extreme cases. A consequence relation $\mathrel{|\!\sim}$ with Cut, for instance, satisfies $\mathrel{|\!\sim}x_{N+1}$ when it satisfies $\mathrel{|\!\sim}x_{1}$ and $x_{i}\mathrel{|\!\sim}x_{i+1}$, for all $i$ ($1\leq i\leq N$). This is very unintuitive when $N$ is large. Brewka [3] in fact points out this infinite transitivity as a weakness of Cut. In this paper, we reconcile both positions by providing the alternative inferential properties Classically cautious monotony and Classical cut. The reconciliation does not come from our intuition, but from a theoretical analysis of the Bayesian entailment. All we introduced to define the Bayesian entailment is the probability distribution over valuation functions, representing uncertainty about states of the world. Given the distribution, the Bayesian entailment is simply derived from the laws of probability theory. Furthermore, the preferential entailment, which satisfies Cautious monotony and Cut, was shown to correspond to the maximum a posteriori entailment, an approximation of the Bayesian entailment. This tells us that Cautious monotony and Cut are ideal under the special condition that a state of the world exists deterministically; they are not ideal under the general perspective that states of the world are probabilistically distributed.

The Bayesian entailment is flexible to extend. For example, a possible extension of Figure 1 is the hidden Markov model shown in Figure 3. It has a valuation variable and a sentence variable for each time step $t$ where $1\leq t\leq N$. The entailment $\Delta_{1},\ldots,\Delta_{N}\mathrel{|\!\approx}_{\omega}\alpha_{N}$, defined in accordance with Definition 2, concludes $\alpha_{N}$ by taking into account not only the current observation $\Delta_{N}$ but also the previous state of the world $V_{N-1}$, updated by all of the past observations $\Delta_{1},\ldots,\Delta_{N-1}$. It is especially useful when observations are contradictory, ambiguous or liable to change.

Finally, our hypothesis is that the Bayesian entailment can be a mathematical model of how human brains implement entailment. Recent studies in neuroscience, e.g., [22, 18, 14, 17, 4, 5, 12], empirically show that Bayesian inference, or an approximation of it, explains several functions of the cerebral cortex, the outer portion of the brain in charge of higher-order functions such as perception, memory, emotion and thought. This raises the Bayesian brain hypothesis [11] that the brain is a Bayesian machine. Since logic, as the laws of thought, is a product of the human brain, it is natural to think that there is a Bayesian interpretation of logic. Of course, we understand that the Bayesian brain hypothesis is controversial and is subject to scientific experiment. We think, however, that this paper provides sufficient evidence for the hypothesis in terms of logic.

This paper has given a Bayesian account of entailment and characterized its abstract inferential properties. The Bayesian entailment was shown to be a monotonic consequence relation in an extreme case. In general, it is a non-monotonic consequence relation satisfying Classically cautious monotony and Classical cut, which we introduced to reconcile existing conflicting views. The preferential entailment was shown to correspond to a maximum a posteriori entailment, which is an approximation of the Bayesian entailment. We finally discussed the merits of our proposals in terms of encoding preferences on defaults, handling change and contradiction, and modeling human entailment.

Figure 3: Hidden Markov model for an extended functionality of the Bayesian entailment.

References

  • [1] Adams, E.W.: A Primer of Probability Logic. Stanford, CA: CSLI Publications (1998)
  • [2] Adams, E.W.: The Logic of Conditionals. Dordrecht: D. Reidel Publishing Co (1975)
  • [3] Brewka, G., Dix, J., Konolige, K.: Nonmonotonic Reasoning: An Overview. CSLI Publications (1997)
  • [4] Chikkerur, S., Serre, T., Tan, C., Poggio, T.: What and where: A bayesian inference theory of attention. Vision Research 50 (2010)
  • [5] Colombo, M., Seriès, P.: Bayes in the brain: On bayesian modelling in neuroscience. The British Journal for the Philosophy of Science 63, 697–723 (2012)
  • [6] Cross, C.: From worlds to probabilities: A probabilistic semantics for modal logic. Journal of Philosophical Logic 22, 169–192 (1993)
  • [7] Dubois, D., Prade, H.: Readings in uncertain reasoning. chap. An Introduction to Possibilistic and Fuzzy Logics, pp. 742–761. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1990), http://dl.acm.org/citation.cfm?id=84628.85368
  • [8] van Fraassen, B.: Probabilistic semantics objectified: I. postulates and logics. Journal of Philosophical Logic 10, 371–391 (1981)
  • [9] van Fraassen, B.: Gentlemen’s wagers: Relevant logic and probability. Philosophical Studies 43, 47–61 (1983)
  • [10] Friedman, N., Halpern, J.Y.: Plausibility measures and default reasoning. In: Proc. of the 13th National Conference on Artificial Intelligence. pp. 1297–1304 (1996)
  • [11] Friston, K.: The history of the future of the bayesian brain. Neuroimage 62-248(2), 1230–1233 (2012)
  • [12] Funamizu, A., Kuhn, B., Doya, K.: Neural substrate of dynamic bayesian inference in the cerebral cortex. Nature Neuroscience 19, 1682–1689 (2016)
  • [13] Gabbay, D.: Theoretical Foundations for Non-monotonic Reasoning in Expert Systems. Springer-Verlag (1985)
  • [14] George, D., Hawkins, J.: A hierarchical bayesian model of invariant pattern recognition in the visual cortex. In: Proc. of 2005 International Joint Conference on Neural Networks. pp. 1812–1817 (2005)
  • [15] Goldszmidt, M., Pearl, J.: Rank-based systems: A simple approach to belief revision, belief update, and reasoning about evidence and actions. In: Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning. pp. 661–672. KR’92, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1992), http://dl.acm.org/citation.cfm?id=3087223.3087290
  • [16] Goosens, W.K.: Alternative axiomatizations of elementary probability theory. Notre Dame Journal of Formal Logic 20, 227–239 (1979)
  • [17] Ichisugi, Y.: The cerebral cortex model that self-organizes conditional probability tables and executes belief propagation. In: Proc. of 2007 International Joint Conference on Neural Networks. pp. 1065–1070 (2007)
  • [18] Knill, D.C., Pouget, A.: The bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences 27, 712–719 (2004)
  • [19] Kraus, S., Lehmann, D., Magidor, M.: Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence 44(1-2), 167–207 (1990)
  • [20] Leblanc, H.: Probabilistic semantics for first-order logic. Zeitschrift für mathematische Logik und Grundlagen der Mathematik 25, 497–509 (1979)
  • [21] Leblanc, H.: Alternatives to Standard First-Order Semantics, vol. I, pp. 189–274. Dordrecht: Reidel, handbook of philosophical logic, d. gabbay and f. guenthner (eds.) edn. (1983)
  • [22] Lee, T., Mumford, D.: Hierarchical bayesian inference in the visual cortex. Journal of Optical Society of America 20, 1434–1448 (2003)
  • [23] Morgan, C.: Simple probabilistic semantics for propositional K, T, B, S4, and S5. Journal of Philosophical Logic 11, 443–458 (1982)
  • [24] Morgan, C.: There is a probabilistic semantics for every extension of classical sentence logic. Journal of Philosophical Logic 11, 431–442 (1982)
  • [25] Morgan, C.: Probabilistic Semantics for Propositional Modal Logics, pp. 97–116. New York, NY: Haven Publications, essays in epistemology and semantics, h. leblanc, r. gumb, and r. stern (eds.) edn. (1983)
  • [26] Morgan, C., Leblanc, H.: Probabilistic semantics for intuitionistic logic. Notre Dame Journal of Formal Logic 24, 161–180 (1983)
  • [27] Pearl, J.: Probabilistic Semantics for Nonmonotonic Reasoning, pp. 157–188. Cambridge, MA: The MIT Press, philosophy and ai: essays at the interface, r. cummins and j. pollock (eds.) edn. (1991)
  • [28] Pearl, J.: Probabilistic semantics for nonmonotonic reasoning: a survey. In: Proc. of the 1st International Conference on Principles of Knowledge Representation and Reasoning. pp. 505–516 (1989)
  • [29] Richardson, M., Domingos, P.: Markov logic networks. Machine Learning 62, 107–136 (2006)
  • [30] Russell, S., Norvig, P.: Artificial Intelligence : A Modern Approach, Third Edition. Pearson Education, Inc. (2009)
  • [31] Sanborn, A.N., Chater, N.: Bayesian brains without probabilities. Trends in Cognitive Sciences 20, 883–893 (2016)
  • [32] Shoham, Y.: Nonmonotonic logics: Meaning and utility. In: Proc. of the 10th International Joint Conference on Artificial Intelligence. pp. 388–393 (1987)