
Affiliation: Division of Computing and Mathematical Sciences (CMS), California Institute of Technology. Email: [email protected] and [email protected]

Aggregation of models,
choices, beliefs, and preferences

Hamed Hamze Bajgiran, Houman Owhadi
Abstract.

A natural notion of rationality/consistency for aggregating models is that, for all (possibly aggregated) models $A$ and $B$, if the output of model $A$ is $f(A)$ and if the output of model $B$ is $f(B)$, then the output of the model obtained by aggregating $A$ and $B$ must be a weighted average of $f(A)$ and $f(B)$. Similarly, a natural notion of rationality for aggregating preferences of ensembles of experts is that, for all (possibly aggregated) experts $A$ and $B$, and all possible choices $x$ and $y$, if both $A$ and $B$ prefer $x$ over $y$, then the expert obtained by aggregating $A$ and $B$ must also prefer $x$ over $y$. Rational aggregation is an important element of uncertainty quantification, and it lies behind many seemingly different results in economic theory, spanning social choice, belief formation, and individual decision making. Three examples of rational aggregation rules are as follows. (1) Give each individual model (expert) a weight (a score) and use weighted averaging to aggregate individual or finite ensembles of models (experts). (2) Order/rank individual models (experts) and let the aggregation of a finite ensemble of individual models (experts) be the highest-ranked individual model (expert) in that ensemble. (3) Give each individual model (expert) a weight, introduce a weak order/ranking over the set of models/experts (two models may share the same rank), and aggregate $A$ and $B$ as the weighted average of the highest-ranked models (experts) in $A$ or $B$. Note that (1) and (2) are particular cases of (3) (in (1) all models/experts share the same rank, and in (2) the ranking is strict). In this paper, we show that all rational aggregation rules are of the form (3). This result unifies aggregation procedures across many different economic environments, showing that they all rely on the same basic result. Following the main representation, we present applications and extensions of our representation in various separate topics in economics, such as belief formation, choice theory, aggregation of optimal models, and social welfare economics.

1. Introduction

This paper presents a general framework characterizing rational/consistent aggregation (of models, choices, beliefs, and preferences, which we simply refer to as features) with applications to economic theory. In this framework, individual features have outcomes, and aggregation rules identify the outcome of groups of features. We focus on a recursive form of aggregation, as in the case-based decision theory developed by [18, 6], where the aggregate outcome for larger collections of features results from aggregating the outcomes of smaller subsets. Specifically, the aggregate outcome of the union of two disjoint collections of features is a weighted average of the outcome of each collection of features separately. We show that this form of recursive aggregation is a common structure that lies behind many seemingly unrelated results in economic theory.

Our central axiom, the weighted averaging axiom/property (we will use the terms axiom and property interchangeably), is a simple formalization of this recursivity. It imposes a structure on how the outcome of the union of two disjoint subsets of features relates to the outcome of each of the subsets separately. The axiom states that the outcome of a set of features can be computed recursively by first partitioning the set of features into two disjoint subsets; the aggregated outcome is then a weighted average of the outcomes of the two smaller subsets.

Our contribution is three-fold. (1) We find all aggregation procedures that satisfy the weighted averaging axiom, which generalizes the result of [6]. Moreover, by adding a continuity axiom, we connect the weighted averaging axiom to the path independence axiom studied in the choice literature. (2) With a simple geometrical duality argument, we connect weighted averaging to the combination axiom of [18] and the extended Pareto axiom of [35]. (3) We present applications and extensions to different domains of economics, notably in the context of belief formation, choice theory, and welfare economics.

Formally, we define an aggregation rule as a function on the set of subsets of features that maps each subset of features to an outcome. Our main result characterizes all aggregation rules that satisfy recursivity in the form of our weighted averaging axiom. We show that as long as, for any two disjoint subsets of features, the outcome of their union is a weighted average (with non-negative weights) of the outcomes of each subset, the aggregation rule has a simple form (under a technical richness condition):

There exist a strictly positive weight function and a weak order (a transitive and complete order) over the set of features such that the outcome of any subset of features is the weighted average of the outcomes of the highest-ordered features of that subset.

The importance of the result is that the weight of each feature is independent of the group of features being aggregated. The role of the weak order in the main representation is to partition the set of features into different equivalence classes and rank them from the highest class to the lowest class. If all features of a subset of features are in the same class, then the outcome is the weighted average of the outcomes of each member of the set. However, if some features have a higher ranking than others, then the aggregation rule will ignore lower-ordered features.

Following the main result, we discuss two special cases of our main result. In the first case, we introduce the strict weighted averaging axiom to represent the case where the outcome of the union of two disjoint subsets of features is contained in the “relative” interior of the outcomes of each subset separately. We then show that the strict weighted averaging axiom is the necessary and sufficient condition for the weak order, in the main representation, to have only one equivalence class. Hence, the outcome of a subset of features is just the weighted average (with strictly positive weights) of the outcomes of each feature separately.

In the second case, we model the space of features as a subset of a vector space. By considering the distance between vectors, we capture the notion of similarity or closeness of features. In this context, we can consider the following notion of continuity of outcomes with respect to features: replacing a feature in a subset of features with another closely similar feature, the outcome of the new subset stays close to the outcome of the previous one. Under this property, which we define as the continuity property, we show that all sufficiently similar features attain the same ranking with respect to the weak order. Moreover, the weight function is a continuous function over the set of features; in other words, the weights of two close (or similar) features should be close. In the special case where the space of features is a convex set, we show that all features attain the same ranking. In this case, there is no difference between the weighted averaging and the strict weighted averaging property.

Depending on the application, features and the aggregation rules may have different interpretations. A feature may represent a signal or an event containing some information about the true state of nature. In this case, the role of an aggregation rule is to form a belief about the true state of nature. In the context of choice theory, features may represent choice objects, where an aggregation rule behaves as a decision-maker that selects a lottery or a random choice out of a group of choice objects. Another interpretation is in the context of welfare economics, where each feature represents a preference of an individual over some alternatives. In this case, an aggregation rule represents a social welfare function that associates with each preference profile, a single preference ordering over the set of alternatives.

To describe a natural interpretation of our result, consider the problem of modeling an agent who seeks to make a prediction about the true state of nature, conditional on observing a set of events. In this context, a feature represents an event, and the outcome of the model conditional on observing a set of events is a belief about the true state of nature. Our main result provides a necessary and sufficient condition for the belief formation process to behave as a Bayesian updater. Under the averaging property of the belief formation process, there exists a conditional probability system associated with the set of events, and the belief formation process conditional on observing a set of events behaves like a conditional probability. The weak order of the main result captures the idea that, conditional on observing even a zero-probability event, the belief formation process still behaves as a Bayesian updater.

To motivate the proposed framework, sections 4, 5, 6, and 7 present applications and extensions of our main representation results. We show that the weighted averaging axiom is closely related to many known axioms in different topics, from the Pareto axiom in Social Choice Theory to the path independence axiom in Choice Theory.

2. Model aggregation

Let $X$ be a nonempty set (we make no assumptions about the cardinality or topology of $X$). Write $X^{*}$ for the set of all nonempty finite subsets of $X$. One may interpret $X$ as a (possibly infinite) set of models, $A\in X^{*}$ as a finite set of models, and $X^{*}$ as the set of all nonempty finite sets of models. Let $H$ be a separable Hilbert space, with $H=\mathbb{R}^{n}$ as a prototypical example.

Definition 1.

An aggregation rule on $X$ is a function $f:X^{*}\rightarrow H$ that associates with every $A\in X^{*}$ a vector $f(A)\in H$. (All discussions of this paper continue to hold if $H$ is replaced by a general, possibly infinite-dimensional, normed vector space.)

For $x\in X$ and $A\in X^{*}$, one may interpret $f(\{x\})$ as the output of the model $x$, and $f(A)$ as the output of the aggregation of the models contained in $A$. The purpose of this section is to characterize aggregation rules satisfying the weighted averaging axiom/property defined below.

Definition 2.

We say that an aggregation rule $f$ satisfies the weighted averaging axiom/property if for all $A,B\in X^{*}$ such that $A\cap B=\emptyset$, it holds true that

$$f(A\cup B)=\lambda f(A)+(1-\lambda)f(B) \tag{2.1}$$

for some $\lambda\in[0,1]$ (which may depend on $A$ and $B$). We say that $f$ satisfies the strict weighted averaging axiom/property if (2.1) holds true for $\lambda\in(0,1)$. We say that $f$ satisfies the extreme weighted averaging axiom/property if (2.1) holds true for $\lambda\in\{0,1\}$.

Two simple examples of aggregation rules satisfying the weighted averaging property are as follows.

Example 1.

Write $\mathbb{R}_{++}$ for the set of strictly positive real numbers and let $w:X\to\mathbb{R}_{++}$ be a weight function on $X$. For $x\in X$, let $f(x)$ be the output of model $x$ (abusing notation, we write $f(x)$ for $f(\{x\})$ when $x\in X$). For $A\in X^{*}$, define

$$f(A)=\sum\limits_{x\in A}\left(\frac{w(x)}{\sum\limits_{y\in A}w(y)}f(x)\right). \tag{2.2}$$

Then $f$ satisfies the strict weighted averaging property.
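For concreteness, here is a minimal Python sketch of the rule in (2.2). The model names, outputs, and weights below are illustrative placeholders rather than objects from the paper.

```python
import numpy as np

# Illustrative model outputs f({x}) in H = R^2 and strictly positive weights w(x).
outputs = {"m1": np.array([1.0, 0.0]),
           "m2": np.array([0.0, 1.0]),
           "m3": np.array([2.0, 2.0])}
weights = {"m1": 1.0, "m2": 2.0, "m3": 0.5}

def aggregate(A):
    """Weighted-average aggregation as in (2.2): f(A) = sum_x w(x) f(x) / sum_y w(y)."""
    total = sum(weights[x] for x in A)
    return sum(weights[x] * outputs[x] for x in A) / total

fA = aggregate({"m1", "m2"})
fB = aggregate({"m3"})
fAB = aggregate({"m1", "m2", "m3"})

# Strict weighted averaging: f(A ∪ B) is a strict convex combination of f(A) and f(B),
# with lambda equal to the relative total weight of A.
lam = sum(weights[x] for x in ("m1", "m2")) / sum(weights.values())
assert np.allclose(fAB, lam * fA + (1 - lam) * fB)
```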

Example 2.

Consider a complete strict order $\succ$ on $X$. Given any feature $x\in X$, let $f(x)\in H$ be the output of the model $x$. For $A\in X^{*}$, write $M(A,\succ)$ for the highest-ordered element in $A$ (i.e., $M(A,\succ)\in A$ and $M(A,\succ)\succ x$ for all $x\in A\setminus\{M(A,\succ)\}$). For $A\in X^{*}$, define

$$f(A)=f(M(A,\succ)). \tag{2.3}$$

Then $f$ satisfies the extreme weighted averaging property.

We will now show that all aggregation rules satisfying the strict weighted averaging property must be as in Example 1 if $X$ contains at least three elements $x,y,z$ such that $f(x)$, $f(y)$, and $f(z)$ are not collinear.

Definition 3.

An aggregation rule $f:X^{*}\rightarrow H$ is rich if the range of $f$ is not a subset of a line.

Theorem 1.

Let the aggregation rule $f:X^{*}\to H$ be rich. The following are equivalent:

  1. The aggregation rule $f$ satisfies the strict weighted averaging property.

  2. There exists a weight function $w:X\to\mathbb{R}_{++}$ such that for every $A\in X^{*}$:

     $$f(A)=\frac{\sum\limits_{x\in A}w(x)f(x)}{\sum\limits_{x\in A}w(x)}. \tag{2.4}$$

Moreover, the function $w$ is unique up to multiplication by a positive number.

We will now show that all aggregation rules satisfying the weighted averaging property must be a combination of Examples 1 and 2 if for all $x\in X$ we can find $y,z\in X$ such that $f(\{x\})$, $f(\{y\})$, and $f(\{z\})$ are not collinear and the pairwise aggregations of $x,y,z$ do not satisfy the extreme weighted averaging property.

Definition 4.

An aggregation rule $f:X^{*}\rightarrow H$ is strongly rich if for any $x\in X$ there exist $y,z\in X$ such that:

  1. $f(\{x\})$, $f(\{y\})$, and $f(\{z\})$ are not on the same line.

  2. $f(\{x,y\})\notin\{f(x),f(y)\}$ and $f(\{x,z\})\notin\{f(x),f(z)\}$. (In the proof of our main result, we show that as long as $f(\{x\})$, $f(\{y\})$, and $f(\{z\})$ are not on the same line, $f(\{x,y\})\notin\{f(x),f(y)\}$ and $f(\{x,z\})\notin\{f(x),f(z)\}$ imply $f(\{y,z\})\notin\{f(y),f(z)\}$.)

Definition 5.

A binary relation $\succcurlyeq$ on $X$ is a weak order on $X$ if it is reflexive ($x\succcurlyeq x$), transitive ($x\succcurlyeq y$ and $y\succcurlyeq z$ imply $x\succcurlyeq z$), and complete (for all $x,y\in X$, $x\succcurlyeq y$ or $y\succcurlyeq x$). We say that $x$ is equivalent to $y$, and write $x\sim y$, if $x\succcurlyeq y$ and $y\succcurlyeq x$.

Theorem 2.

Let the aggregation rule $f:X^{*}\to H$ be strongly rich. Then the following are equivalent:

  1. The aggregation rule $f$ satisfies the weighted averaging axiom.

  2. There exist a unique weak order $\succcurlyeq$ on $X$ and a weight function $w:X\to\mathbb{R}_{++}$ such that for every $A\in X^{*}$:

     $$f(A)=\sum\limits_{x\in M(A,\succcurlyeq)}\left(\frac{w(x)}{\sum\limits_{y\in M(A,\succcurlyeq)}w(y)}\right)f(x). \tag{2.5}$$

Moreover, in this case, the function $w$ is unique up to multiplication by a positive number in each of the equivalence classes of the weak order $\succcurlyeq$.

The representation (2.5) has two components: one is captured by the weak order $\succcurlyeq$; the other is the weight function $w$. The weak order partitions the set of features into equivalence classes and ranks them from top to bottom. If all models $x\in A$ have the same ranking, then the outcome $f(A)$ of $A\in X^{*}$ is the weighted average of the outcomes of each element $x\in A$. However, if some elements have a higher ranking than others, then the aggregation rule ignores the lower-ranked elements. Hence, the assessment of the aggregation rule has two steps: first, it considers only the highest-ranked elements; then, it uses the weight function to form the weighted average over those highest-ranked elements.
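As an illustration of the two-step structure of (2.5), the following sketch assumes the weak order is encoded by an integer rank (equal ranks form one equivalence class; a higher rank is preferred). All feature names, outputs, weights, and ranks are illustrative.

```python
import numpy as np

# Illustrative features: output f({x}), weight w(x) > 0, and a rank encoding the weak order.
features = {
    "x": {"out": np.array([1.0, 0.0]), "w": 1.0, "rank": 2},
    "y": {"out": np.array([0.0, 1.0]), "w": 3.0, "rank": 2},
    "z": {"out": np.array([5.0, 5.0]), "w": 1.0, "rank": 1},
}

def aggregate(A):
    """Representation (2.5): weighted average over the highest-ranked features M(A, ≽) of A."""
    top = max(features[x]["rank"] for x in A)
    M = [x for x in A if features[x]["rank"] == top]
    total = sum(features[x]["w"] for x in M)
    return sum(features[x]["w"] * features[x]["out"] for x in M) / total

print(aggregate({"x", "y", "z"}))  # z is ignored: [0.25, 0.75]
print(aggregate({"x", "z"}))       # only x is top-ranked: [1.0, 0.0]
print(aggregate({"z"}))            # [5.0, 5.0]
```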

The richness condition is necessary for both Theorems 1 and 2. Example 3 shows that without this condition, aggregation rules may satisfy the strict weighted averaging axiom without having a weighted average representation.

Example 3.

Let $X=\{x,y,z\}$ with $f(\{x\})=0$, $f(\{y\})=1/2$, $f(\{z\})=1$, $f(\{x,y\})=1/4$, $f(\{y,z\})=3/4$, $f(\{x,z\})=3/8$, and $f(\{x,y,z\})=7/16$. Assume that there exists a positive weight function on $X$ such that the aggregation rule over any coalition of $X$ has a representation as a weighted average of its elements, and let $w:X\rightarrow\mathbb{R}_{++}$ be the corresponding weight function. In order to have such a representation, we should have $f(\{x,y\})=\frac{w(x)f(x)+w(y)f(y)}{w(x)+w(y)}$. By considering the values of $f(\{x,y\})$, $f(\{x\})$, and $f(\{y\})$, we get $\frac{w(x)}{w(y)}=1$. Similarly, by considering the coalition $\{y,z\}$ we get $\frac{w(y)}{w(z)}=1$. Combining these two observations, we get $\frac{w(x)}{w(z)}=1$. However, considering the coalition $\{x,z\}$ and the representation $f(\{x,z\})=\frac{w(x)f(x)+w(z)f(z)}{w(x)+w(z)}$, we get $\frac{w(x)}{w(z)}=5/3$, which is a contradiction. Hence, no such representation exists in this case.
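The contradiction in Example 3 can also be checked mechanically. The short script below recovers the weight ratio that a weighted-average representation would force for each two-element menu; the helper `ratio` is ours, not part of the paper.

```python
# Outcomes from Example 3 (scalars, so H = R).
f = {frozenset("x"): 0.0, frozenset("y"): 0.5, frozenset("z"): 1.0,
     frozenset("xy"): 0.25, frozenset("yz"): 0.75, frozenset("xz"): 0.375}

def ratio(a, b):
    """w(a)/w(b) implied by f({a,b}) = (w(a) f(a) + w(b) f(b)) / (w(a) + w(b))."""
    fa, fb, fab = f[frozenset(a)], f[frozenset(b)], f[frozenset(a + b)]
    return (fb - fab) / (fab - fa)

print(ratio("x", "y"))  # 1.0
print(ratio("y", "z"))  # 1.0  -> together these force w(x)/w(z) = 1
print(ratio("x", "z"))  # 5/3  -> contradiction: no weighted-average representation
```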

Assume now that $X$ is a subset of a normed vector space. We will now show that the weight $w$ in the representation (2.5) must be continuous if $f$ is continuous, as defined below.

Definition 6.

An aggregation rule $f:X^{*}\rightarrow H$ is continuous if, for any $A\in X^{*}\cup\{\emptyset\}$, any $x\in X\setminus A$, and any sequence $(x_{n})_{n=1}^{\infty}$ in $X$, $x_{n}\to x$ implies $f(A\cup\{x_{n}\})\to f(A\cup\{x\})$. (The convergence in $X$ is with respect to the norm on $X$, and the convergence in the range of the aggregation rule is with respect to the norm of $H$.)

Theorem 3.

Let $X$ be a subset of a normed vector space and let $f:X^{*}\to H$ be a strongly rich continuous aggregation rule satisfying the weighted averaging property. Then the representation (2.5) holds true with a continuous weight function $w:X\to\mathbb{R}_{++}$. Furthermore, for any $x\in X$ there exists $\epsilon>0$ such that $y\sim x$ for all $y\in X$ with $\|y-x\|<\epsilon$.

We will now show that if $X$ is a convex subset of a normed vector space, then any continuous aggregation rule on $X$ satisfying the weighted averaging axiom can only have a single equivalence class; as a consequence, both the weighted averaging and the strict weighted averaging properties lead to the representation (2.4) for $f$.

Theorem 4.

Let $X$ be a convex subset of a normed vector space, and let $f:X^{*}\to H$ be a rich continuous aggregation rule satisfying the weighted averaging property. Then, there exists a continuous weight function $w:X\to\mathbb{R}_{++}$ such that the representation (2.4) holds true.

3. Preference aggregation and duality

In many cases where the range of the aggregation rule is a set of linear functionals, a simple geometrical interpretation of the weighted averaging axiom results in a related but different consistency axiom. Let $H$ be a Hilbert space and write $\langle\cdot,\cdot\rangle$ for the associated inner product. Let $S\subset H$ be a convex subset of $H$. Every $h\in H$ induces a weak order (a reflexive, transitive, and complete binary relation) $\succsim_{h}$ over the set $S$ by:

$$s_{1}\succsim_{h}s_{2}\ \Leftrightarrow\ \langle h,s_{1}\rangle\geq\langle h,s_{2}\rangle \tag{3.1}$$

Let $X$ be a nonempty set. In this section we define an aggregation rule as a function $f$ mapping $X^{*}$ to $H$. (By the Riesz representation theorem, $f$ can also be defined as a function mapping $X^{*}$ to the space of continuous linear functionals on $H$, in which case, for $A\in X^{*}$, $f(A)$ is identified with the unique element $h\in H$ such that $f(A)(x)=\langle h,x\rangle$ for $x\in H$.) Since we may interpret each $h\in H$ as a linear ranking of the elements of the set $S$, the goal of an aggregator $f$ is to attach an aggregated linear ranking to every finite subset $A$ of $X$.

Example 4.

A simple interpretation of $X$, $S$, and $f$ is as follows. Let $X$ be a set of experts and $S$ a set of alternatives (models, decisions, choices). An expert $x\in X$ defines a ranking/preference $f(\{x\})$ over $S$. An aggregation rule $f$ is a voting mechanism enabling the aggregation of the preferences of a finite set of experts. A rational notion of consistency (employed here and formally introduced below in Definition 7) is that if $A,B\in X^{*}$ are two disjoint sets of experts such that both $f(A)$ and $f(B)$ prefer $s_{1}\in S$ over $s_{2}\in S$, then their aggregate $f(A\cup B)$ must also prefer $s_{1}$ over $s_{2}$.
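A small numerical sketch of this consistency notion, with two hypothetical experts represented by vectors $h_A$ and $h_B$ and two alternatives $s_1,s_2$ in $\mathbb{R}^2$ (the normalization to $N_\nu$ introduced below is omitted here): a unanimous preference survives any weighted average of the functionals.

```python
import numpy as np

# Two illustrative experts, each identified with a vector h that ranks
# alternatives s via the inner product <h, s>, as in (3.1).
h_A = np.array([2.0, 1.0])
h_B = np.array([0.5, 3.0])
s1, s2 = np.array([1.0, 1.0]), np.array([0.0, 0.5])

def prefers(h, a, b):
    return h @ a >= h @ b

# Aggregate by weighted averaging of the functionals (the weight 0.4 is illustrative).
lam = 0.4
h_AB = lam * h_A + (1 - lam) * h_B

# If both experts weakly prefer s1 to s2, so does the aggregate: consistency as in (3.2).
if prefers(h_A, s1, s2) and prefers(h_B, s1, s2):
    assert prefers(h_AB, s1, s2)
```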

Observing that the order (3.1) is invariant under scaling of $h$, we will restrict the range of aggregation rules to the set $N_{\nu}=\{h\in H\ |\ \langle h,\nu\rangle=1\}$ for some $\nu\in H$. This restriction also avoids entirely opposite ranking directions by imposing a shared ranking of $\nu$. The existence of such a $\nu$ is what we call a minimal agreement condition.

Definition 7.

An aggregation rule $f:X^{*}\to N_{\nu}$ is weakly consistent if for all disjoint sets $A,B\in X^{*}$, and for all $s_{1},s_{2}\in S$,

$$s_{1}\succsim_{f(A)}s_{2}\,,\ \ s_{1}\succsim_{f(B)}s_{2}\ \Rightarrow\ s_{1}\succsim_{f(A\cup B)}s_{2} \tag{3.2}$$

Moreover, it is consistent if it also satisfies the following condition:

$$s_{1}\succ_{f(A)}s_{2}\,,\ \ s_{1}\succsim_{f(B)}s_{2}\ \Rightarrow\ s_{1}\succ_{f(A\cup B)}s_{2} \tag{3.3}$$

A simple duality argument (Farkas’s lemma) results in the following theorem.

Theorem 5.

Let $f:X^{*}\to N_{\nu}$ be an aggregation rule. Then, the following are equivalent:

  1. $f$ is consistent.

  2. $f$ satisfies the strict weighted averaging property.

Moreover, the following are also equivalent:

  1. $f$ is weakly consistent.

  2. $f$ satisfies the weighted averaging property.

Using Theorem 1, we immediately obtain the representation of consistent aggregation rules.

Corollary 1.

Let $f:X^{*}\to N_{\nu}$ be a consistent rich aggregation rule. Then, there exists a weight function $w:X\to\mathbb{R}_{++}$ such that for every set of features $A\in X^{*}$,

$$f(A)=\sum\limits_{x\in A}\left(\frac{w(x)}{\sum\limits_{y\in A}w(y)}\right)f(x). \tag{3.4}$$

Moreover, the weight function is unique up to multiplication by a positive number.

Note that we can generalize the result to the case of weakly consistent rules.

Remark 1.

The notion of consistency obtained as a dual interpretation of weighted averaging is the same as the extended Pareto axiom introduced by [35]. Similarly, it corresponds to the combination axiom in [18, 19].

4. Belief Formation

In this section, we interpret the set of features as signals. Each signal contains some information about the distribution of states of nature. The role of an aggregation rule is that of an agent who makes a prediction about the true state of nature based on observing some signals. In this context, the range of an aggregation rule is the set of probability distributions over the states of nature. Following [6], an aggregation rule is a belief formation process that associates with each finite set of signals a belief over the states of nature.

The representation of the belief formation process under the weighted averaging axiom is a straightforward application of our main results. Using our representation, on the one hand, we propose an extension where the timing of signals may be important. We consider the case where an agent receives signals at different times in the past. The agent forms a prediction at the present time, and she may perceive signals received closer to the time of the prediction as more credible. To capture this, we introduce the stationarity axiom, under which the belief induced by a set of received signals and their timing is the same as the belief induced by shifting the timings of all signals by a constant amount into the past.

Under stationarity, any belief formation process satisfying the strict weighted averaging axiom has a weight function over the set of signals and an exponential discount factor over time. The belief associated with a set of received signals is the discounted weighted average of the beliefs associated with each signal. In this case, the weight function captures the time-independent value of each signal.

On the other hand, we interpret the set of signals as the information structure of an agent who wants to predict the true state. We interpret each subset of signals as an event in her information structure. We show that, as long as the information structure has finite cardinality, the strict weighted averaging axiom is the necessary and sufficient condition for a rich belief formation process to appear as a Bayesian updater. This result answers the question in [36] regarding a necessary and sufficient condition for a belief formation process to act as a Bayesian updating rule.

4.1. Belief Formation Processes

Let $\Omega=\{1,2,\ldots,n\}$ be a set of states of nature and let $\Delta(\Omega)$ be the set of all probability distributions over $\Omega$. We interpret the elements of the set $X$ as disjoint signals or events. The role of an aggregation rule over a finite subset of $X$ is to predict the true state of nature by assigning probabilities to each state of $\Omega$. Therefore, following [6], aggregation rules can be interpreted as belief formation processes, which assign a belief over the set of states of nature after observing a finite subset of signals.

Definition 8.

A belief formation process is a function $f:X^{*}\to\Delta(\Omega)$ that associates with every finite set of signals $A\in X^{*}$ a belief $f(A)\in\Delta(\Omega)$ over the states of nature.

Theorem 2 shows that if the belief induced by the union of two disjoint finite sets of signals is on the line segment connecting the beliefs induced by each set of signals separately, then, under the strong richness condition, there exists a strictly positive weight function and a weak order over the set of signals such that the belief over any finite subset of signals is a weighted average of the beliefs induced by each of the highest-ordered signals of that subset.

By requiring the belief formation process to use both induced beliefs, i.e., requiring the belief induced by the union of two disjoint finite sets of signals to be in the interior of the line segment connecting the beliefs induced by each set of signals separately, we can use Theorem 1 to obtain the representation. Formally, we have:

Corollary 2.

Let $f:X^{*}\to\Delta(\Omega)$ be a strongly rich belief formation process satisfying the weighted averaging property. Then, there exist a unique weak order $\succcurlyeq$ on $X$ and a weight function $w:X\to\mathbb{R}_{++}$ such that for every $A\in X^{*}$:

$$f(A)=\sum\limits_{x\in M(A,\succcurlyeq)}\left(\frac{w(x)}{\sum\limits_{y\in M(A,\succcurlyeq)}w(y)}\right)f(x). \tag{4.1}$$

Moreover, if $f$ satisfies the strict weighted averaging property, then the weak order $\succcurlyeq$ has only one equivalence class and for every $A\in X^{*}$:

$$f(A)=\sum\limits_{x\in A}\left(\frac{w(x)}{\sum\limits_{y\in A}w(y)}\right)f(x). \tag{4.2}$$

Although representation (4.2) under the strict weighted averaging property is similar to the one in [6], their belief formation process is defined over sequences of signals, in which each sequence can contain multiple copies of the same signal. In contrast, we define the belief formation process over sets of signals, so there can be only one copy of a signal in each set. Billot et al.'s main axiom, the concatenation axiom, is defined over any two sequences of signals and counts the number of occurrences of each signal in each sequence. Our strict weighted averaging property, expressed for $f(A\cup B)$, does not allow $A$ and $B$ to both contain the same signal.

4.2. Role of Timing

We now explore the role of the timing of signals by associating signals with time labels. In this setting, a signal received closer to the time of the prediction may be perceived as more credible (have more weight) than the same signal received further in the past. Formally, let $X$ be the set of signals. The present time is denoted by 0, and time $t\in\mathbb{N}$ represents $t$ units of time before the present. For a given finite subset of signals $A\in X^{*}$, a function $T_{A}:A\to\mathbb{N}$ represents the timing of each signal in the set $A$, i.e., for any signal $x\in A$, $T_{A}(x)$ is the time at which the signal $x$ was received. Given $c\in\mathbb{N}$, $T_{A}+c$ represents a time shift of size $c$ of the timing $T_{A}$ of a set of received signals $A$. Finally, the set $X^{T}=\{(A,T_{A})\ |\ A\in X^{*},\ T_{A}:A\to\mathbb{N}\}$ represents all possible realizations of the received signals. In this context, a belief formation process is a function $f:X^{T}\to\Delta(\Omega)$.

Our main consistency property, in addition to the strict weighted averaging property, is the stationarity property. A belief formation process is stationary if a belief induced by a set of received signals and their timing is the same as the belief induced by a constant shift of timings of the same received signals. More precisely:

Definition 9.

A stationary belief formation process is a function $f:X^{T}\to\Delta(\Omega)$ such that

$$f(A,T_{A}+c)=f(A,T_{A}),$$

for all $A\in X^{*}$, $T_{A}:A\to\mathbb{N}$, and $c\in\mathbb{N}$.

The next proposition characterizes stationary belief formation processes satisfying the strict weighted averaging property.

Proposition 1.

Let a rich and stationary belief formation process $f:X^{T}\to\Delta(\Omega)$ satisfy the strict weighted averaging property. Then, there exist a unique discount factor $q\in(0,\infty)$ and a unique (up to multiplication by a positive number) weight function $w:X\to\mathbb{R}_{++}$ such that for all $(A,T_{A})\in X^{T}$:

$$f(A,T_{A})=\frac{\sum\limits_{x\in A}q^{T_{A}(x)}w(x)f(x)}{\sum\limits_{x\in A}q^{T_{A}(x)}w(x)}. \tag{4.3}$$

As a consequence of the representation, under the assumptions of the proposition, the weight of a received signal $x\in A$ can be separated into two factors. One is the intrinsic value of the signal, captured by $w(x)$; the other is the role of timing, captured by $q^{T_{A}(x)}$. Moreover, the only discounting that captures the role of timing is the exponential form. If $q=1$, timing is not important, and the belief formation process only considers the intrinsic value of each signal. However, when $q\neq 1$, the belief formation process places relatively more ($q\in(0,1)$) or less ($q\in(1,\infty)$) weight on signals received closer to the time of the prediction.
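A minimal sketch of the discounted representation (4.3), with illustrative per-signal beliefs, intrinsic weights, and discount factor; the second call illustrates stationarity (shifting all timings by the same amount leaves the belief unchanged).

```python
import numpy as np

# Illustrative beliefs f(x) over Omega = {1, 2, 3}, intrinsic weights w(x),
# and a discount factor q; T_A(x) is the receipt time (units before the prediction).
beliefs = {"s1": np.array([0.7, 0.2, 0.1]),
           "s2": np.array([0.1, 0.6, 0.3])}
w = {"s1": 1.0, "s2": 2.0}
q = 0.8

def form_belief(T_A):
    """Representation (4.3): exponentially discounted weighted average of beliefs."""
    coef = {x: (q ** t) * w[x] for x, t in T_A.items()}
    total = sum(coef.values())
    return sum(c * beliefs[x] for x, c in coef.items()) / total

print(form_belief({"s1": 0, "s2": 3}))  # the recent signal s1 gets relatively more weight
print(form_belief({"s1": 2, "s2": 5}))  # same belief: all timings shifted by a constant
```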

4.3. Bayesian Updating

Let $(X,X^{*}\cup\{\emptyset\})$ be the measurable space of events, where $X$ consists of a finite number of disjoint events. The space of events captures the information structure of the belief formation process. Similarly, considering the set $\Omega=\{1,\ldots,n\}$, we denote by $(\Omega,2^{\Omega})$ the measurable space of states of nature, where $2^{\Omega}$ is the set of subsets of $\Omega$. For any probability distribution $d\in\Delta(\Omega)$ and any subset of states of nature $B\subseteq\Omega$, let $d(B)$ denote the probability of $B$ induced by the distribution $d$, i.e., $d(B)=\sum_{\omega\in B}d(\omega)$.

Definition 10.

A belief formation process $f:X^{*}\to\Delta(\Omega)$ is Bayesian if there exists a probability measure $P$ on the space $(\Omega\times X,2^{\Omega\times X})$ such that for every $A\in X^{*}$ and $B\in 2^{\Omega}$ we have:

$$\big(f(A)\big)(B)=\frac{P(B\times A)}{P_{X}(A)} \tag{4.4}$$

where $P_{X}$ is the marginal probability distribution of $P$ over $X$.

The right-hand side of the previous equation is the conditional probability of $B$ given $A$. Therefore, a Bayesian belief formation process $f$ behaves as a Bayesian updater: upon observing an event $A$ in her information structure $X^{*}$, her prediction of the probability that the true state lies in a subset $B\subseteq\Omega$ comes from Bayes' rule. To put it differently, $\big(f(A)\big)(B)$ is equal to the conditional probability $P(B|A)$.
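A small sketch of a Bayesian belief formation process as in (4.4), built from an illustrative joint distribution $P$ over $\Omega\times X$; the final assertion illustrates why such a process satisfies the strict weighted averaging property, with $\lambda=P_X(A)/P_X(A\cup B)$.

```python
import numpy as np

# Illustrative joint distribution P over Omega x X, with Omega = {0, 1} (rows)
# and X = {e1, e2, e3} (columns); entries sum to one.
P = np.array([[0.10, 0.20, 0.05],
              [0.15, 0.30, 0.20]])
events = {"e1": 0, "e2": 1, "e3": 2}

def belief(A):
    """Bayesian belief formation (4.4): f(A)(.) = P(. x A) / P_X(A)."""
    cols = [events[x] for x in A]
    marginal = P[:, cols].sum()              # P_X(A)
    return P[:, cols].sum(axis=1) / marginal

# The belief conditional on a union is a weighted average of the conditional beliefs,
# with weights proportional to P_X(A) and P_X(B): strict weighted averaging.
fA, fB, fAB = belief({"e1"}), belief({"e2", "e3"}), belief({"e1", "e2", "e3"})
lam = P[:, [0]].sum() / P.sum()
assert np.allclose(fAB, lam * fA + (1 - lam) * fB)
```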

Our next proposition shows that our strict weighted averaging axiom is the necessary and sufficient condition for a rich belief formation process to be Bayesian.

Proposition 2.

A rich belief formation process is Bayesian if and only if it satisfies the strict weighted averaging property.

Note that the richness condition is crucial; otherwise, as shown in Example 3, there are cases where a belief formation process satisfies the strict weighted averaging axiom but is not a Bayesian updater. We will now present a more general version of Proposition 2 by adding the strong richness condition and weakening the strict weighted averaging condition to the weighted averaging property. In this more general version, zero-probability events are possible, and the belief formation process behaves as a Bayesian updater even conditional on observing a zero-probability event. To capture this idea, we need the following definition.

Definition 11.

A class of functions $\{P_{A}\ |\ P_{A}:2^{\Omega\times X}\to[0,1],\ A\in X^{*}\}$ is a conditional probability system if it satisfies the following properties:

  1. For every $A\in X^{*}$ such that $A\neq\emptyset$, $P_{A}$ is a probability measure on $\Omega\times X$ with $P_{A}(\Omega\times A)=1$.

  2. For every pair of disjoint events $A_{1},A_{2}\in X^{*}$ and every $C\subseteq\Omega\times X$, we have:

     $$P_{A_{1}\cup A_{2}}(C)=P_{A_{1}\cup A_{2}}(\Omega\times A_{1})P_{A_{1}}(C)+P_{A_{1}\cup A_{2}}(\Omega\times A_{2})P_{A_{2}}(C)$$

In this definition, the probability measure $P_{X}$ (conditioning on the entire set of events) represents a prior probability measure, and $P_{A}$ represents a posterior (conditional) probability given the event $A$. Therefore, for any set $B\subseteq\Omega$, $P_{A}(B\times A)$ is the conditional probability of $B$ given $A$. Moreover, for any two events $A_{2}\subset A_{1}$ in $X^{*}$, $P_{A_{1}}(\Omega\times A_{2})$ is the conditional probability of the event $A_{2}$ given $A_{1}$. The first property of Definition 11 requires that the support of the posterior probability conditioned on an event $A$ is contained in $A$. The second property requires that, conditional on the event $A_{1}\cup A_{2}$, the Bayesian updating rule is satisfied even if the prior probability of $A_{1}\cup A_{2}$ is zero.

Definition 12.

A belief formation process $f:X^{*}\to\Delta(\Omega)$ is rationalizable by a conditional probability system $\{P_{A}\ |\ P_{A}:2^{\Omega\times X}\to[0,1],\ A\in X^{*}\}$ if for every $A\in X^{*}$ and $B\in 2^{\Omega}$ we have:

$$\big(f(A)\big)(B)=P_{A}(B\times A). \tag{4.5}$$

By adding the strong richness condition, the next theorem shows that the weighted averaging axiom is the necessary and sufficient condition for rationalizing a belief formation process by a conditional probability system.

Proposition 3.

A strongly rich belief formation process is rationalizable by a conditional probability system if and only if it satisfies the weighted averaging axiom.

Remark 2.

[36] considers the problem of characterizing the updating rules (in our context, the belief formation processes) that appear to be Bayesian. By providing an example, they show that their soundness condition, which corresponds to our strict weighted averaging condition, is not sufficient for an updating rule to behave as a Bayesian updater. However, we show that the strict weighted averaging condition is necessary and sufficient as long as the belief formation process satisfies our richness condition.

5. Average Choice Functions

In this section, the set of features is a subset of $\mathbb{R}^{n}$. We interpret each feature as a choice object. The aggregation rule is interpreted as a decision-maker who selects a choice randomly from a menu of choice objects. We model the decision-maker as an average choice function that associates with any menu of choice objects an average choice (the mean of the distribution of choices) in the convex hull of the choice objects. The average choice is easier to report and obtain than the entire distribution (see [1] for a complete discussion of the merits of the average choice). However, except for the case where the elements of a menu are affinely independent, the average choice does not uniquely reveal the underlying distribution of choices.

First, using our main representation, we show that it is possible to uniquely extract the underlying distribution of choices as long as the average choice function satisfies the weighted averaging axiom.

Then, we illustrate two applications of the result. In one application, we consider the class of average choice functions that can be rationalized by a Luce rule, i.e., a stochastic choice function that satisfies the independence of irrelevant alternatives axiom (IIA) proposed by [29]. We show that the average choice functions satisfying the strict weighted averaging axiom are exactly the ones that can be rationalized by a Luce rule. More generally, we show that the class of average choice functions satisfying the weighted averaging axiom is the same as the class of average choice functions rationalizable by a two-stage Luce model proposed by [14].

In the second application, we consider continuous average choice functions. First, we show that any continuous average choice function under the weighted averaging axiom is rationalizable by a Luce rule. This means that there is no continuous average choice function that is rationalizable by a two-stage Luce rule but not with a Luce rule.

Then, we illustrate a connection of our result with the one by [22], regarding the impossibility of an average choice function to satisfy both the path independence axiom and continuity.

5.1. Set up

In this section, $X$ is a nonempty subset of $\mathbb{R}^{n}$ that is not a subset of a line. For any $A\subseteq\mathbb{R}^{n}$, we denote by ${\rm Conv}(A)$ the set of all convex combinations of vectors in $A$.

Definition 13.

An aggregation rule $f:X^{*}\to\mathbb{R}^{n}$ is called an average choice function if, for any (menu of choices) $A\in X^{*}$, $f(A)\in\text{Conv}(A)$.

One of the goals of this section is to present a connection between our weighted averaging condition and the path independent, Luce, and two-stage Luce choice models. The following is a corollary of Theorems 2 and 4.

Corollary 3.

Let an average choice function $f:X^{*}\to\text{Conv}(X)$ be strongly rich. The following statements are equivalent:

  1. The average choice function $f$ satisfies the weighted averaging condition.

  2. There exist a unique weak order $\succcurlyeq$ on $X$ and a weight function $w:X\to\mathbb{R}_{++}$, unique up to multiplication by a positive number on each equivalence class of the weak order, such that for every $A\in X^{*}$:

     $$f(A)=\frac{\sum\limits_{x\in M(A,\succcurlyeq)}w(x)\,x}{\sum\limits_{x\in M(A,\succcurlyeq)}w(x)}=\sum\limits_{x\in M(A,\succcurlyeq)}\left(\frac{w(x)}{\sum\limits_{y\in M(A,\succcurlyeq)}w(y)}\right)x. \tag{5.1}$$

Moreover, if the average choice function $f$ satisfies continuity and the weighted averaging condition, then the weight function $w$ is continuous and the weak order $\succcurlyeq$ has a single equivalence class. In this case, for every $A\in X^{*}$:

$$f(A)=\frac{\sum\limits_{x\in A}w(x)\,x}{\sum\limits_{x\in A}w(x)}=\sum\limits_{x\in A}\left(\frac{w(x)}{\sum\limits_{y\in A}w(y)}\right)x. \tag{5.2}$$

5.2. Luce Rationalizable Average Choice Functions

The following definitions are standard in the context of individual decision-making.

Definition 14.

A stochastic choice is a function $\rho:X^{*}\to\Delta(X)$ such that $\rho(A)\in\Delta(A)$ for any $A\in X^{*}$.

For an average choice function $f:X^{*}\to\text{Conv}(X)$ and a menu $A\in X^{*}$, we have $f(A)\in\text{Conv}(A)$. Therefore, there exists a stochastic choice $\rho:X^{*}\to\Delta(X)$ (which may not be unique) that rationalizes the average choice function $f$, i.e., $f(A)=\sum_{x\in A}\rho(x,A)x$, where $\rho(x,A)$ is the probability of selecting the element $x$ from the menu $A$.

One appealing form of stochastic choice is one that satisfies Luce's IIA, i.e., the probability of selecting an element over another element is independent of any other elements. [29] shows that the stochastic choices satisfying the IIA axiom are exactly the Luce rules.

Definition 15.

A stochastic choice $\rho:X^{*}\to\Delta(X)$ is a Luce rule if there is a function $w:X\to\mathbb{R}_{++}$ such that:

$$\rho(x,A)=\frac{w(x)}{\sum_{y\in A}w(y)}.$$

Furthermore, if $w$ is continuous, then $\rho$ is a continuous Luce rule.

Definition 16.

An average choice function $f$ is rationalizable by a stochastic choice $\rho$ if for all $A\in X^{*}$:

$$f(A)=\sum_{x\in A}\rho(x,A)x.$$

Furthermore, if there exists a Luce rule that rationalizes the average choice function $f$, then $f$ is Luce rationalizable.

By Theorem 1 and Corollary 3, an average choice function $f$ has a Luce-form representation, i.e., $f(A)=\sum_{x\in A}\big(\frac{w(x)}{\sum_{y\in A}w(y)}\big)x$, if and only if it satisfies the strict weighted averaging condition. As a result:

Corollary 4.

An average choice function is Luce rationalizable if and only if it satisfies the strict weighted averaging condition. Moreover, the Luce rule that rationalizes the average choice function is unique.
Furthermore, an average choice function is continuous Luce rationalizable if and only if it is continuous and satisfies the strict weighted averaging condition.

In the Luce model, the decision-maker selects each element of a given menu with a strictly positive probability. However, this is not a plausible assumption in many situations: the decision-maker may always select the better of two alternatives. We model this behavior with a two-stage Luce model, introduced by [14]. In this model, there exist a ranking order and a weight function over elements. A decision-maker choosing from a menu selects only the highest-ordered elements of the menu, and the probability of selecting each highest-ordered element is proportional to the weight associated with that element. Formally:

Definition 17.

A stochastic choice $\rho:X^{*}\to\Delta(X)$ is a two-stage Luce rule if there are a function $w:X\to\mathbb{R}_{++}$ and a weak order $\succcurlyeq$ over the elements of $X$ such that:

$$\rho(x,A)=\begin{cases}\frac{w(x)}{\sum_{y\in M(A,\succcurlyeq)}w(y)}&\text{if }x\in M(A,\succcurlyeq),\\ 0&\text{otherwise}.\end{cases} \tag{5.3}$$

Given $A\in X^{*}$, the decision-maker selects only the elements of $M(A,\succcurlyeq)$, which are the highest-ordered elements of $A$. She chooses each element of $M(A,\succcurlyeq)$ with a probability proportional to its weight.
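A minimal sketch of a two-stage Luce rule as in (5.3) and of the average choice function it induces; the objects, weights, and ranks below are illustrative.

```python
import numpy as np

# Illustrative choice objects in R^2 with weights w(x) > 0 and ranks encoding the weak order.
objects = {
    "a": {"vec": np.array([1.0, 0.0]), "w": 2.0, "rank": 1},
    "b": {"vec": np.array([0.0, 1.0]), "w": 1.0, "rank": 1},
    "c": {"vec": np.array([3.0, 3.0]), "w": 5.0, "rank": 0},
}

def two_stage_luce(x, A):
    """Choice probability rho(x, A) as in (5.3)."""
    top = max(objects[y]["rank"] for y in A)
    M = [y for y in A if objects[y]["rank"] == top]
    if objects[x]["rank"] < top:
        return 0.0
    return objects[x]["w"] / sum(objects[y]["w"] for y in M)

def average_choice(A):
    """The induced average choice f(A) = sum_x rho(x, A) x."""
    return sum(two_stage_luce(x, A) * objects[x]["vec"] for x in A)

menu = {"a", "b", "c"}
print([two_stage_luce(x, menu) for x in sorted(menu)])  # [2/3, 1/3, 0.0]
print(average_choice(menu))                             # [2/3, 1/3]
```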

By Theorem 2, any average choice function satisfying the weighted averaging axiom is rationalizable by a two-stage Luce rule.

Corollary 5.

A strongly rich average choice function is two-stage Luce rationalizable if and only if it satisfies the weighted averaging axiom. Moreover, the two-stage Luce rule that rationalizes the average choice function is unique.

Remark 3.

Under the continuity condition, Theorem 4 implies that the two-stage Luce model and the Luce model are equivalent. The next section discusses this observation.

5.3. Continuous Average Choice Functions

In this section, we consider the class of continuous average choice functions satisfying the weighted averaging condition. First, we reinterpret Corollary 3 as an impossibility result: no continuous average choice function is rationalizable by a two-stage Luce model but not by a Luce model. Then, we show the connection with the impossibility result of [22] concerning the impossibility of a choice function satisfying both path independence and continuity.

[32] extensively studies choice functions under the path independence axiom. Plott's notion of path independence requires the choice from the union of two disjoint menus $A\cup B$ to be the choice between the choice from $A$ and the choice from $B$. Using this axiom, the choice from any menu can be obtained recursively by partitioning the elements of the menu into disjoint sub-menus; the choice from the whole menu is then the choice from the choices of the sub-menus. In our setup, for an average choice function $f$, we have:

Definition 18.

$f$ satisfies the path independence condition if

$$f(A\cup B)=f(\{f(A),f(B)\})$$

for all $A,B\in X^{*}$ such that $A\cap B=\emptyset$.

The path independence condition is stronger than our weighted averaging condition; in other words, any average choice function satisfying Plott's notion of path independence satisfies the weighted averaging condition. More precisely, given an average choice function $f:X^{*}\to\text{Conv}(X)$ and two disjoint menus $A,B\in X^{*}$, path independence gives $f(A\cup B)=f(\{f(A),f(B)\})$. By the definition of average choice functions, $f(\{f(A),f(B)\})\in\text{Conv}(\{f(A),f(B)\})$, which shows that the average choice function $f$ satisfies the weighted averaging axiom.
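As a quick illustration, the sketch below builds a hypothetical path-independent rule (choose the lexicographically maximal element of a menu) and checks that the choice from a union equals the choice from the pair of choices, and hence lies on the segment between them, here with an extreme weight.

```python
import numpy as np

# Hypothetical path-independent rule on points in R^2: choose the lexicographic maximum.
def choose(menu):
    return max(map(tuple, menu))

A = [(0.0, 1.0), (2.0, 0.5)]
B = [(1.0, 3.0), (0.5, 0.2)]

union_choice = np.array(choose(A + B))
pair_choice = np.array(choose([choose(A), choose(B)]))
assert np.allclose(union_choice, pair_choice)  # path independence holds for this rule

# Path independence places the choice from A ∪ B on the segment between the two choices:
# recover lambda in choose(A ∪ B) = lam * choose(A) + (1 - lam) * choose(B).
fa, fb = np.array(choose(A)), np.array(choose(B))
lam = (union_choice - fb)[0] / (fa - fb)[0]  # first coordinates differ here
assert np.allclose(union_choice, lam * fa + (1 - lam) * fb)
print(lam)  # 1.0: an extreme weighted average
```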

Continuity is an appealing property of an average choice function. It specifies that replacing an element of a menu with another element close to it (with respect to the norm of $X$) yields a new menu whose average choice is close to the average choice of the previous menu. [22] shows that there is no average choice function that satisfies both the path independence axiom and continuity. Here, we reinterpret the result of Corollary 3 to obtain a more general result for average choice functions.

Corollary 3 states that, for a strongly rich continuous average choice function $f:X^{*}\to\text{Conv}(X)$ satisfying the weighted averaging condition, there exists a unique weight function $w:X\to\mathbb{R}_{++}$ such that for any $A\in X^{*}$:

$$f(A)=\sum\limits_{x\in A}\left(\frac{w(x)}{\sum\limits_{y\in A}w(y)}\right)x.$$

There are two important observations regarding the representation above.

First, by the discussion in Section 5.2, the representation shows that any continuous average choice function that is rationalizable by a two-stage Luce model is also rationalizable by a Luce model. Second, since the function $w$ is strictly positive, the average choice from any menu must lie in the relative interior of the convex hull of the members of the menu.

As a result, our impossibility result states that an average choice function satisfying the weighted averaging condition cannot both satisfy the continuity condition and have, for some menu, a choice that lies on the relative boundary of the convex hull of the elements of the menu. We summarize this observation in the following corollary. (To see the connection between our Corollary 6 and the result in [22], it is enough to consider a menu with three non-collinear members. [22, Thm. 1] shows that the average choice of a path independent average choice function from any menu is the average choice of the average choice function from a sub-menu of two members of the menu. Hence the average choice from a menu with three non-collinear members is on the line segment connecting two of the members of the menu. As a result, the choice is on the relative boundary of the menu, which is why it cannot satisfy continuity.)

Corollary 6.

If $X$ is a nonempty convex subset of a vector space that contains at least three non-collinear points, then an average choice function $f:X^{*}\to X$ that satisfies the weighted averaging condition cannot both be continuous and contain a menu $A\in X^{*}$ with $f(A)\in\partial^{r}(\text{Conv}(A))$, the relative boundary of $\text{Conv}(A)$.

6. Extended Pareto Aggregation Rules

This section demonstrates an application of Section 3 to social choice problems. In this domain, each feature represents a preference ordering of an individual over a set of alternatives. Each preference ordering satisfies the axioms of [38]. The role of an aggregation rule is to associate with each coalition of individuals another vN-M preference ordering over the set of alternatives.

An appealing property of an aggregation rule in this context is the extended Pareto axiom, introduced by [35]. It specifies that if two disjoint coalitions of individuals each prefer an outcome over another outcome, then the union of the coalitions should also prefer the same outcome over the other one. Moreover, if one of them strictly prefers one outcome over the other, then the union of the coalitions should also strictly prefer that outcome.

First, we show that under a normalization of cardinal utilities of individuals and a minor richness condition, aggregation rules under the strict weighted averaging (weighted averaging) axiom are exactly aggregation rules under the extended Pareto (extended weak Pareto) axiom.

Following this equivalence, we use our main representation result as a technical tool to pin down the representation of extended Pareto aggregation rules. We show that the only possible extended Pareto aggregation is to assign a positive weight to each individual in the society; the aggregated preference ordering of a given group of individuals is then the weighted sum of their preference orderings.

The representation can be considered as a multi-profile version of the theorem by [20] on utilitarianism. Harsanyi considers a single profile of individuals and a variant of Pareto to obtain utilitarianism. In our approach, however, we partition a profile into smaller groups and aggregate the preference orderings of these smaller groups using extended Pareto. Hence, we obtain utilitarianism through this consistent form of aggregation. As a result, in our representation, the weight associated with each individual appears in all sub-profiles that contain her. (Similar to the discussion of [39] regarding the Sen-Harsanyi debate, our result is better interpreted as a representation rather than as a justification of utilitarianism.)

In Section 6.3, we extend our result on extended Pareto aggregation rules to the class of generalized social welfare functions. Unlike in our previous model, individuals may have different preference orderings. Therefore, the domain of a generalized social welfare function is the set of all groups (of all possible sizes) of individuals, with each individual allowed any possible preference ordering. Our definition of a generalized social welfare function extends the standard definition used by [3], in which the domain is a set of fixed-length profiles of individuals.

For a technical reason, we restrict the set of vN-M preferences to those that strictly prefer one fixed lottery to another fixed one. We show that the only possible extended Pareto generalized social welfare functions are the ones that associate a positive number with each individual's preferences (unlike the previous section, in which each weight may depend on both the individual and the whole profile), and that associate with each coalition the weighted sum of their cardinal utilities, using the weights associated with their preferences.

The important observation is that each positive weight in the representation is independent of the other individuals in any profile; the weight depends only on the individual and her own preference ordering.

Our representation has a positive nature, in contrast to the claims by [23] and [21] that the negative conclusion of Arrow's theorem holds even with vN-M preferences. Moreover, the representation provides an answer to the main concern of [7, 8] regarding the correctness of the main theorem of [12].

[12], by considering a set of axioms other than Arrow's, provides one of the first axiomatizations of relative utilitarianism as a possibility result. However, [7] shows a counterexample to their representation. Our representation fixes the error using our variant of the extended Pareto axiom and our restricted domain of the generalized social welfare function.

Finally, adding anonymity and the weak IIA axiom of [12] yields relative utilitarianism as one possible choice of the weight function. However, the primary concern of our paper is to show that weighted averaging of preferences is the only generalized social welfare function that respects extended Pareto; the possible choices of weights are not our focus in this paper.

6.1. Set up

Let $M=\{0,1,\ldots,m\}$ and $L=\{(p_{1},\ldots,p_{m})\,|\,\sum_{i=1}^{m}p_{i}\leq 1,\ p_{i}\geq 0\}$. A lottery $p\in L$ associates the probability $p_{i}$ with the prospect $i\in M\setminus\{0\}$ and $1-\sum_{i=1}^{m}p_{i}$ with the prospect 0.

A vN-M preference over the set $L$ is a preference relation that satisfies the axioms of [38], as defined below. (If $R$ is a vN-M preference over the set $L$, then, by the vN-M theorem, there exists an affine representation of the preference $R$. For notational convenience, we normalize all affine representations to have the value 0 at the prospect 0.)

Definition 19.

We say that $R$ is a vN-M preference over the set $L$ if it is a weak order and there exists a $u\in\mathbb{R}^{m}$, known as a utility, such that for any $x,y\in L$, $xRy$ if and only if $u\cdot x\geq u\cdot y$, where "$\cdot$" denotes the inner product in $\mathbb{R}^{m}$. Moreover, the (unique) ray $U=\{\alpha u\,|\,\alpha>0\}$ contains all normalized affine utilities that represent the vN-M preference $R$. We write $\mathcal{R}$ for the set of all vN-M preferences over $L$ and $\overline{R}$ for the strict part of the preference $R\in\mathcal{R}$.

Let $X=\{1,\ldots,n\}$ represent the set of all agents and $X^{*}$ the set of all finite subsets of $X$. Write $\mathcal{R}^{X}$ for the $X$-fold Cartesian product of $\mathcal{R}$. Every $R^{X}\in\mathcal{R}^{X}$ defines a preference profile of the set of agents over the set of lotteries.

Definition 20.

A group aggregation rule on $X$ is a function $f:X^{*}\to\mathcal{R}$ that associates with every coalition of agents $A\in X^{*}$ a vN-M preference $f(A)\in\mathcal{R}$.

A rational property of group aggregation rules is that whenever two disjoint coalitions, e.g., $A,B\in X^{*}$, both prefer a lottery $x$ to another lottery $y$, their union $A\cup B$ also prefers the lottery $x$ to the lottery $y$.

Definition 21.

A group aggregation rule $f:X^{*}\to\mathcal{R}$ satisfies the extended Pareto property if for all disjoint coalitions of agents $A,B\in X^{*}$, and for all lotteries $x,y\in L$,

$$x\ f(A)\ y,\ \ x\ f(B)\ y\ \Rightarrow\ x\ f(A\cup B)\ y \tag{6.1}$$
$$x\ \overline{f(A)}\ y,\ \ x\ f(B)\ y\ \Rightarrow\ x\ \overline{f(A\cup B)}\ y \tag{6.2}$$

Our last condition requires the existence of two lotteries such that all agents strictly prefer one over the other.

Definition 22.

A group aggregation rule $f:X^{*}\to\mathcal{R}$ satisfies the minimal agreement condition if there exist two lotteries $\overline{x},\underline{x}\in L$ such that for every agent $i\in X$, $\overline{x}\ \overline{f(i)}\ \underline{x}$.

Remark 4.

Let a group aggregation rule $f:X^{*}\to\mathcal{R}$ satisfy both the minimal agreement condition and the extended Pareto axiom. Given two agents $i,j\in X$, applying the strict part of the definition of the extended Pareto axiom, we have $\overline{x}\ \overline{f(\{i,j\})}\ \underline{x}$. Similarly, for every coalition of agents $A\in X^{*}$, recursively applying the strict part of the extended Pareto axiom, we deduce $\overline{x}\ \overline{f(A)}\ \underline{x}$.

Remark 5.

Let the vector $v\in\mathbb{R}^{m}$ be $\overline{x}-\underline{x}$, where $\overline{x},\underline{x}$ are the two lotteries in the definition of the minimal agreement condition. Let $u_{i}\in\mathbb{R}^{m}$ represent the vN-M preference $f(i)$. Hence, $\overline{x}\ \overline{f(i)}\ \underline{x}$ if and only if $u_{i}\cdot v>0$. Therefore, the minimal agreement condition is equivalent to the existence of a direction $v\in\mathbb{R}^{m}$ such that $u_{i}\cdot v>0$ for all $i\in X$. This interpretation of $v$ is exactly the role of $\nu$ in Section 3.

6.2. The Representation of Extended Pareto Group Aggregation Rules

In this section, we assume that the group aggregation rule f:X^{*}\to\mathcal{R} satisfies the minimal agreement condition. In particular, we assume that all agents strictly prefer the lottery \overline{x}\in L over the lottery \underline{x}\in L. In view of Remark 5, we define v=\overline{x}-\underline{x} as the direction on which every agent agrees. For a coalition of agents A\in X^{*}, let the ray U_{A} represent the vN-M preference f(A). Let H:=\{u\in\mathbb{R}^{m}|\ u\cdot v=1\} denote the normalization of utilities for which the difference between the utility values of the lottery \overline{x} and the lottery \underline{x} is exactly 1. For every coalition of agents A\in X^{*}, there is a unique cardinal utility \hat{u}_{A}\in U_{A} that lies in H. For the rest of the section, for every coalition A\in X^{*}, we use this unique cardinal utility \hat{u}_{A}\in H to represent the vN-M preference f(A). With this representation, we can represent the group aggregation rule f:X^{*}\to\mathcal{R} by a normalized group aggregation rule f_{H}:X^{*}\to\mathbb{R}^{m}, where f_{H}(A)=\hat{u}_{A}.
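The following minimal Python sketch illustrates this normalization; the direction v, the cardinal utilities, and the helper names are hypothetical examples of ours, not data from the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize_to_H(u, v):
    """Return the unique positive rescaling of u lying on H = {u : u . v = 1}."""
    s = dot(u, v)
    assert s > 0, "minimal agreement fails: u does not strictly prefer xbar to xunder"
    return [a / s for a in u]

v = [1.0, -1.0, 0.0]                            # hypothetical direction xbar - xunder
utilities = [[2.0, 1.0, 0.5], [3.0, 1.0, 2.0]]  # hypothetical cardinal utilities u_i
u_hat = [normalize_to_H(u, v) for u in utilities]
print(u_hat)                                    # each rescaled utility satisfies dot(u_hat_i, v) == 1
```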

Remark 6.

Without loss of generality, we can assume that the lottery \underline{x} in the definition of the minimal agreement condition is the lottery 0. In that case, H is the space of cardinal utilities attaining the value 0 at the lottery 0 and the value 1 at the lottery \overline{x}.

The next corollary, which restates Theorem 5, shows that under the representation of the vN-M preference f(A) by \hat{u}_{A}, the extended Pareto property is equivalent to the strict weighted averaging property. Formally, we have:

Corollary 7.

Let a group aggregation rule f:X^{*}\to\mathcal{R} satisfy the minimal agreement condition with v\in\mathbb{R}^{m} as the direction on which all agents agree. Then, the following are equivalent:

  (1) f satisfies the extended Pareto property.

  (2) f_{H} satisfies the strict weighted averaging property.

Using the result of Theorem 1, we deduce the representation of the extended Pareto group aggregation rules.

Corollary 8.

Let a rich group aggregation rule f:X^{*}\to\mathcal{R} satisfy both the extended Pareto property and the minimal agreement condition. Then, there exists a weight function w:X\to\mathbb{R}_{++} such that for every coalition of agents A\in X^{*},

f_{H}(A)=\sum\limits_{i\in A}\left(\frac{w(i)}{\sum\limits_{j\in A}w(j)}\right)f_{H}(i). (6.3)

Moreover, the weight function is unique up to multiplication by a positive number.
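As an illustration of equation (6.3), the following minimal Python sketch aggregates coalitions by weighted averaging; the weights and the normalized utilities are hypothetical, and the point to notice is that the same weight function is reused across all coalitions.

```python
def f_H(A, w, u_hat):
    """Aggregate the coalition A (a set of agent indices) as in equation (6.3)."""
    total = sum(w[i] for i in A)
    m = len(next(iter(u_hat.values())))
    return [sum(w[i] * u_hat[i][k] for i in A) / total for k in range(m)]

w = {1: 2.0, 2: 1.0, 3: 0.5}                           # hypothetical weight function on agents
u_hat = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}  # hypothetical normalized utilities in H
print(f_H({1, 2}, w, u_hat))      # weighted average of agents 1 and 2
print(f_H({1, 2, 3}, w, u_hat))   # the same weights are reused when agent 3 joins
```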

As shown in Example 3, the richness condition is crucial. Richness here is equivalent to the existence of three non-collinear “normalized” cardinal utilities in the space H (the range of the aggregation rule). We can interpret the result as a generalization of the main theorem of [20] on utilitarianism. Moreover, our result shows how the weights of individuals are connected across different sub-coalitions of the main profile.

To see the connection with Harsanyi's result, we rewrite the theorem in an additive form: let the group aggregation rule f:X^{*}\to\mathcal{R} satisfy both the extended Pareto property and the minimal agreement condition. Then, there exists a weight function w:X\to\mathbb{R}_{++} such that for every coalition of agents A\in X^{*}, f(A) has the following representation:

\sum\limits_{i\in A}w(i)f_{H}(i). (6.4)

Defining u(i):=w(i)f_{H}(i) for i\in X, we can rewrite equation (6.4) in the additive form \sum\limits_{i\in A}u(i). Moreover, if we consider only the representations attaining the value 0 at the lottery 0, this representation is unique up to multiplication by a positive number.

6.3. The Representation of Extended Pareto Generalized Social Welfare Functions

The setup of this subsection is the same as in the previous subsection. Without loss of generality, we assume that the lottery \underline{x}\in L in the definition of the minimal agreement condition is the vector 0. Let \overline{x}\in L be any lottery other than 0. Define \mathcal{R}_{\overline{x}}\subset\mathcal{R} as the set of all vN-M preferences that strictly prefer \overline{x} to 0. Let \mathcal{R}_{\overline{x}}^{X} be the X-fold Cartesian product of \mathcal{R}_{\overline{x}}. Every R\in\mathcal{R}_{\overline{x}}^{X} defines a preference profile of the set of individuals. For any coalition A\in X^{*} and any preference profile R\in\mathcal{R}_{\overline{x}}^{X}, let R_{A}\in\mathcal{R}_{\overline{x}}^{A} denote the restriction of the profile R to the coalition A.

As in Definition 19, we can represent each preference R\in\mathcal{R} by a unique ray U_{R}=\{\alpha u|\ \alpha>0\}, where u\in\mathbb{R}^{m} is a cardinal utility representing R. Moreover, for any preference R\in\mathcal{R}_{\overline{x}}, there is a unique cardinal utility u_{R}\in U_{R} with u_{R}\cdot\overline{x}=1. Write H=\{u\in\mathbb{R}^{m}|\ u\cdot\overline{x}=1\} for the space of all cardinal utilities attaining the value 0 at the lottery 0 and the value 1 at the lottery \overline{x}. Let the function u_{H}:\mathcal{R}_{\overline{x}}\to H associate each preference R\in\mathcal{R}_{\overline{x}} with the unique cardinal utility u_{H}(R)\in H that represents it. This function is a bijection associating each preference with the unique cardinal utility attaining the value 0 at the lottery 0 and the value 1 at the lottery \overline{x}.

Write \mathcal{R}_{X}\subset\mathcal{R}_{\overline{x}}^{X} for the set of all profiles in which the individuals' cardinal utilities, represented in the space H, are not contained in a single line. Formally, we define \mathcal{R}_{X}=\{R\in\mathcal{R}_{\overline{x}}^{X}|\ d(\{u_{H}(R_{i})|\ i\in X\})>1\}, where d(\{u_{H}(R_{i})|\ i\in X\}) is the dimension of the smallest linear variety containing all u_{H}(R_{i}),\ i\in X. (There should be at least four alternatives; otherwise, \mathcal{R}_{X} is the empty set.)

Finally, write \mathcal{R}_{X}^{*}=\{R_{A}|\ A\in X^{*},\ R\in\mathcal{R}_{X}\} for the set containing every profile in \mathcal{R}_{X} together with its restrictions to all sub-coalitions. \mathcal{R}_{X}^{*} is the domain of our generalized social welfare functions. Formally, we have:

Definition 23.

A generalized social welfare function on \mathcal{R}_{X} is a function f:\mathcal{R}_{X}^{*}\to\mathcal{R} that associates with any coalition A\in X^{*} and any profile R\in\mathcal{R}_{X} a preference f(R_{A})\in\mathcal{R}. Moreover, we assume that for any individual i\in X and any profile R\in\mathcal{R}_{X}, f(R_{i})=R_{i}.

In our setup, the domain of generalized social welfare functions is a rich set of profiles of all sizes. Moreover, any generalized social welfare function satisfies the Individualism axiom: it associates with any single individual that individual's own preference.

The connection between profiles of different sizes is the extended Pareto property. It states that if the preference orderings associated with two disjoint coalitions of individuals A and B each prefer a lottery x to y, then the preference ordering associated with the union of the two coalitions (under the same profile) should also prefer x to y.

Definition 24.

A generalized social welfare function f:\mathcal{R}_{X}^{*}\to\mathcal{R} satisfies the extended Pareto property if for every preference profile R\in\mathcal{R}_{X}, any two disjoint coalitions A,B\in X^{*}, and all lotteries x,y\in L,

x\ f(R_{A})\ y,\ x\ f(R_{B})\ y\Rightarrow x\ f(R_{A\cup B})\ y (6.5)
x\ \overline{f(R_{A})}\ y,\ x\ f(R_{B})\ y\Rightarrow x\ \overline{f(R_{A\cup B})}\ y (6.6)

Our main result of this section characterizes the class of extended Pareto generalized social welfare functions.

Theorem 6.

Let X be a set of individuals with |X|\geq 4. The generalized social welfare function f:\mathcal{R}_{X}^{*}\to\mathcal{R} satisfies the extended Pareto property if and only if there exists a weight function w:X\times\mathcal{R}_{\overline{x}}\to\mathbb{R}_{++} such that for any coalition A\in X^{*} and any preference profile R\in\mathcal{R}_{X}, f(R_{A}) has the following representation:

u_{H}(f(R_{A}))=\sum\limits_{i\in A}\left(\frac{w(i,R_{i})}{\sum\limits_{j\in A}w(j,R_{j})}\right)u_{H}(R_{i}). (6.7)

Moreover, the weight function is unique up to multiplication by a positive number.

Remark 7.

We can rewrite the theorem as follows: the generalized social welfare function f:\mathcal{R}_{X}^{*}\to\mathcal{R} satisfies the extended Pareto axiom if and only if there exists a weight function w:X\times\mathcal{R}_{\overline{x}}\to\mathbb{R}_{++} such that for any coalition A\subseteq X and any preference profile R\in\mathcal{R}_{X}, f(R_{A}) has the following representation:

\sum\limits_{i\in A}w(i,R_{i})u_{H}(R_{i}). (6.8)

Note that each weight depends only on the associated individual's preference and not on the preferences of the other individuals.

The weight function in the representation depends on each individual's index. However, adding the classical Anonymity condition makes the weight function independent of the individuals' indexes.

Definition 25.

An extended Pareto aggregation rule f:\mathcal{R}^{*}_{X}\to\mathcal{R} satisfies the Anonymity condition if no permutation of the individuals' indexes changes the generalized social welfare function.

The Anonymity condition makes any extended Pareto generalized social welfare function independent of the individuals' indexes. Hence, by the uniqueness of the weight function in Theorem 6, the weight function associated with an anonymous extended Pareto aggregation rule is independent of the indexes. Therefore, we have:

Corollary 9.

Let X be a set of individuals with |X|\geq 5. The extended Pareto generalized social welfare function f:\mathcal{R}_{X}^{*}\to\mathcal{R} satisfies the Anonymity condition if and only if there exists a weight function w:\mathcal{R}_{\overline{x}}\to\mathbb{R}_{++} such that for any coalition A\in X^{*} and any preference profile R\in\mathcal{R}_{X}, f(R_{A}) has the representation:

u_{H}(f(R_{A}))=\sum\limits_{i\in A}\left(\frac{w(R_{i})}{\sum\limits_{j\in A}w(R_{j})}\right)u_{H}(R_{i}). (6.9)

Or, equivalently, if and only if f(R_{A}) has the representation:

\sum\limits_{i\in A}w(R_{i})u_{H}(R_{i}). (6.10)

Moreover, the weight function is unique up to multiplication by a positive number.

The positive nature of our result appears to contradict the conjectures of [23] and [21] that the negative conclusion of the impossibility theorem of [3] holds even with vN-M preferences. However, besides the differences between our model and theirs, we only consider the restricted domain in which all preferences strictly prefer the lottery \overline{x} over the lottery \underline{x}. As discussed before, the definition of this restricted domain is crucial for Corollary 9.

Remark 8.

Relative utilitarianism can be obtained by adding the weak IIA axiom of [12]: the weight function normalizes each preference so that the difference between the cardinal utility of the best alternative and that of the worst alternative becomes 1. In other words, for any preference R\in\mathcal{R}_{\overline{x}}, w(R)=\frac{1}{\max_{j}(u_{H}(R))_{j}-\min_{j}(u_{H}(R))_{j}}.
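A minimal Python sketch of this normalization is given below; the profile of normalized utilities is hypothetical, and we simply apply the weight of Remark 8 inside the additive representation (6.10).

```python
def relative_utilitarian_weight(u_H_vec):
    """Weight of Remark 8: one over the spread between the best and worst prospects."""
    return 1.0 / (max(u_H_vec) - min(u_H_vec))

def aggregate(profile):
    """Sum over the coalition of w(R_i) * u_H(R_i), as in equation (6.10)."""
    m = len(profile[0])
    return [sum(relative_utilitarian_weight(u) * u[k] for u in profile) for k in range(m)]

profile = [[0.0, 2.0, 1.0], [3.0, 0.0, 1.5]]   # hypothetical u_H(R_i) vectors, not from the paper
print(aggregate(profile))                      # [1.0, 1.0, 1.0] for these numbers
```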

7. A Conditional Subjective Expected Utility Theory of State-Dependent Preferences

The choice-theoretic foundation of subjective expected utility was developed in the seminal works of [33], [34], and [2]. In the standard model, the decision-maker has a ranking over acts (state-contingent outcomes). The representation of this ranking consists of a subjective probability over the set of states, capturing the decision maker's beliefs, and a cardinal utility representing the decision maker's tastes over the set of outcomes, independent of the realization of the true state. However, in many applications, such as models of buying health insurance, the independence of the utility from the set of states is not a plausible assumption (see [4], [9], and [24] for further discussion).

In this section, we provide a simple theory of subjective expected utility with state-dependent utility by reinterpreting our representation of extended Pareto aggregation rules. We build our model on the framework of [2]. In our model, the decision-maker has a preference ordering over the set of conditional constant acts. This means that, given any fixed event, the decision-maker has a hypothetical conditional preference ordering over the set of lotteries, representing her conditional preference upon learning that only that event is happening. (In Section 7.1, we illustrate another interpretation of hypothetical conditional preferences, derived from a preference ordering over the set of conditional constant acts.) Each of these hypothetical conditional preferences satisfies the axioms of [38], which means each has an affine representation. We show that as long as the class of hypothetical conditional preferences satisfies the extended Pareto axiom, there exist a subjective probability measure over the set of states and a state-dependent utility over the set of alternatives such that the class of hypothetical conditional preferences has a representation in the form of conditional expectations with respect to the subjective probability and the state-dependent utility.

The result shows that extended Pareto is the main force behind the separation of the belief and the state-dependent utility. However, the representation is not unique; the challenge is therefore to give meaning to a decision maker's prior beliefs when utility is state-dependent. We obtain uniqueness by adding a stronger version of our minimal agreement condition. The strong minimal agreement condition specifies that there exist two lotteries, one of which is strictly preferred to the other regardless of the state. Moreover, the decision maker's conditional preference for each of them is independent of the state.

We show that under the strong minimal agreement, the belief is unique. Moreover, the state-dependent utility is unique up to affine transformation.

7.1. Set up and main result

In this section, we develop a simple theory of subjective expected utility with state-dependent utility by reinterpreting the results of Sections 6.2 and 4.3. Our model is built on the framework of [2]. Let \Omega=\{1,2,\ldots,n\} be a finite set of states of nature. The finite set M represents outcomes. The simplex L=\Delta(M) represents the set of lotteries over the set M. A lottery l\in L associates the probability l_{i} with the outcome i\in M. Let O\notin M. In our setup, the objects of choice are conditional constant acts. For any lottery l\in L and any event A\in 2^{\Omega}\setminus\{\emptyset\}, the function f:\Omega\to L\cup\{O\} such that f(\omega)=l for \omega\in A and f(\omega)=O for \omega\in A^{c} is termed a conditional constant act and denoted by f=(l,A,O,A^{c}). The interpretation is that if the event A is realized, the decision maker faces the lottery l; otherwise, O is realized. We assume that the decision maker has a preference relation \succcurlyeq, not necessarily complete, over the set of conditional constant acts.

Let F=\{(l,A,O,A^{c})|\ \emptyset\neq A\in 2^{\Omega},\ l\in L\} represent the set of conditional constant acts. For any event \emptyset\neq A\in 2^{\Omega}, let F_{A}=\{(l,A,O,A^{c})|\ l\in L\} be the set of all conditional constant acts attaining a lottery on the event A and staying out on the event A^{c}. We represent the conditional preference ordering of the decision maker over F_{A} by \succcurlyeq_{A}. For any two lotteries l_{1},l_{2}, we write l_{1}\succcurlyeq_{A}l_{2} as a shorthand for (l_{1},A,O,A^{c})\succcurlyeq(l_{2},A,O,A^{c}).

Our interpretation of conditional preference orderings is related to the models developed by [30], [15], [37], and [26]. However, there is another interpretation of the conditional preference, similar to the conditional decision model of [16]. In this interpretation, we assume that the decision-maker may receive information that only states \omega\in A can be realized. In this case, \succcurlyeq_{A} represents the decision maker's ex-post preference over the set of lotteries. Similarly, \succcurlyeq_{\Omega} represents her ex-ante preference over exactly the same set of lotteries.

Regardless of the interpretation, the goal is to provide a theory that connects the class of conditional preferences through Bayesian updating. Formally, our goal is to find sufficient conditions under which there exist a state-dependent utility function u:\Omega\times M\to\mathbb{R} and a subjective probability measure P:\Omega\to\mathbb{R}_{++} such that for every two lotteries x,y\in L and any event A, the following holds:

x\succcurlyeq_{A}y\Leftrightarrow\sum\limits_{\omega\in A}P(\omega|A)E^{x}[u(\omega,\cdot)]\geq\sum\limits_{\omega\in A}P(\omega|A)E^{y}[u(\omega,\cdot)]. (7.1)

In the equation above, E^{x}[u(\omega,\cdot)] denotes the expected utility of the state-dependent utility u in the state \omega with respect to the lottery x. The right-hand side compares the conditional expected utilities of the lotteries x and y with respect to the subjective probability measure P and the state-dependent utility u. The importance of the result is that the probability measure P enters the representation for each event A through Bayes' rule. We will obtain (7.1) from the following axioms/conditions.
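Before turning to the axioms, the following minimal Python sketch spells out the comparison in (7.1); the prior, the state-dependent utility, the lotteries, and the function names are hypothetical examples of ours, with the prior conditioned on the event A by Bayes' rule.

```python
def conditional_value(lottery, A, P, u):
    """Sum over states w in A of P(w|A) * E^lottery[u(w, .)], with P(w|A) from Bayes' rule."""
    PA = sum(P[w] for w in A)
    value = 0.0
    for w in A:
        expected_u = sum(p_m * u[(w, m)] for m, p_m in enumerate(lottery))
        value += (P[w] / PA) * expected_u
    return value

def prefers(x, y, A, P, u):
    """x >=_A y under the representation (7.1)."""
    return conditional_value(x, A, P, u) >= conditional_value(y, A, P, u)

P = {0: 0.5, 1: 0.3, 2: 0.2}                                         # hypothetical prior over 3 states
u = {(w, m): (w + 1) * (m + 1) for w in range(3) for m in range(2)}  # hypothetical state-dependent utility
x, y = [0.7, 0.3], [0.2, 0.8]                                        # hypothetical lotteries over 2 outcomes
print(prefers(x, y, {0, 1}, P, u))                                   # False for these particular numbers
```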

Axiom 7.1.

(Weak Order) For any event A, the conditional preference \succcurlyeq_{A} is complete and transitive.

Axiom 7.2.

(vN-M Continuity) For any event A and for every x,y,z\in L, if x\succcurlyeq_{A}y\succcurlyeq_{A}z, there exist \alpha,\beta\in(0,1) such that

\alpha x+(1-\alpha)z\succcurlyeq_{A}y\succcurlyeq_{A}\beta x+(1-\beta)z
Axiom 7.3.

(Independence) For any event A, every x,y,z\in L, and every \alpha\in(0,1),

x\succcurlyeq_{A}y\Rightarrow\alpha x+(1-\alpha)z\succcurlyeq_{A}\alpha y+(1-\alpha)z
Axiom 7.4.

(extended Pareto) For any two disjoint events A,B and for every x,y\in L,

x\ \succcurlyeq_{A}\ y,\ x\ \succcurlyeq_{B}\ y\Rightarrow x\ \succcurlyeq_{A\cup B}\ y (7.2)
x\ \succ_{A}\ y,\ x\ \succcurlyeq_{B}\ y\Rightarrow x\ \succ_{A\cup B}\ y (7.3)
Axiom 7.5.

(Minimal Agreement) There exist two lotteries \overline{x},\underline{x}\in L such that for every \omega\in\Omega, \overline{x}\succ_{\omega}\underline{x}.

Axiom 7.6.

(Richness) There exist three states \omega_{1},\omega_{2},\omega_{3}\in\Omega such that for any \omega\in\{\omega_{1},\omega_{2},\omega_{3}\}, there exist two lotteries x,y\in L with x\succ_{\omega}y and y\succcurlyeq_{\omega^{\prime}}x for all \omega^{\prime}\in\{\omega_{1},\omega_{2},\omega_{3}\}\setminus\{\omega\}.

By considering these six axioms, we can rationalize the behavior of the decision-maker as a subjective expected utility maximizer with a state-dependent utility.

Theorem 7.

Suppose that the decision maker's conditional preferences satisfy Axioms 7.1–7.6. Then there exist a function u:\Omega\times M\to\mathbb{R} and a probability measure P:\Omega\to\mathbb{R}_{++} such that for every two lotteries x,y\in L and any event A, the following holds:

x\succcurlyeq_{A}y\Leftrightarrow\sum\limits_{\omega\in A}P(\omega|A)E^{x}[u(\omega,\cdot)]\geq\sum\limits_{\omega\in A}P(\omega|A)E^{y}[u(\omega,\cdot)] (7.4)

The proof is similar to the proof of Corollary 8. The probability measure P is not unique: let Q:\Omega\to\mathbb{R}_{++} be any probability measure on \Omega; defining the state-dependent utility w(\omega,x)=\frac{P(\omega)u(\omega,x)}{Q(\omega)}, equation (7.4) continues to hold with Q and w. However, if we replace the minimal agreement axiom with a stronger version, we obtain uniqueness. In the stronger version of minimal agreement, we assume that the decision maker's valuation of each of the lotteries \overline{x} and \underline{x} is unaffected by the realization of the state. Formally:

Axiom 7.7.

(Strong Minimal Agreement) There exist two lotteries \overline{x},\underline{x}\in L such that for every \omega\in\Omega, \overline{x}\succ_{\omega}\underline{x}. Moreover, (\overline{x},\{\omega_{1}\},O,\Omega\setminus\{\omega_{1}\})\sim(\overline{x},\{\omega_{2}\},O,\Omega\setminus\{\omega_{2}\}) and (\underline{x},\{\omega_{1}\},O,\Omega\setminus\{\omega_{1}\})\sim(\underline{x},\{\omega_{2}\},O,\Omega\setminus\{\omega_{2}\}) for all \omega_{1},\omega_{2}\in\Omega.

Conceptually, this axiom is closely related to the A.0 axiom of [26]. However, unlike Karni's axiom, we do not require these two lotteries to be the best and worst lotteries in the set of lotteries. Our model only needs two lotteries, one strictly preferred to the other regardless of the state; moreover, the decision maker's conditional preference for each of them is independent of the state.

By replacing the minimal agreement axiom with the strong minimal agreement axiom, we can “uniquely” separate the belief from the state-dependent preference.

Theorem 8.

Suppose that the decision maker's conditional preferences satisfy Axioms 7.1–7.7. Then there exist a function u:\Omega\times M\to\mathbb{R} and a probability measure P:\Omega\to\mathbb{R}_{++} such that for every two lotteries x,y\in L and every event A, the following holds:

x\succcurlyeq_{A}y\Leftrightarrow\sum\limits_{\omega\in A}P(\omega|A)E^{x}[u(\omega,\cdot)]\geq\sum\limits_{\omega\in A}P(\omega|A)E^{y}[u(\omega,\cdot)] (7.5)

Moreover, the probability measure PP is unique and the function uu is unique up to affine transformations.

Proof.

Based on Theorem 7, there exists a pair (P,u) satisfying equation (7.5). To prove uniqueness, assume that (P_{1},u_{1}) and (P_{2},u_{2}) both represent the same class of conditional preferences. By considering the conditional preference \succcurlyeq_{\omega} and the vN-M theorem, we know that u_{2}(\omega,\cdot)=\alpha_{\omega}u_{1}(\omega,\cdot)+\beta_{\omega}. By the strong minimal agreement axiom, we have u_{1}(\omega_{1},\overline{x})=u_{1}(\omega_{2},\overline{x}), u_{2}(\omega_{1},\overline{x})=u_{2}(\omega_{2},\overline{x}), u_{1}(\omega_{1},\underline{x})=u_{1}(\omega_{2},\underline{x}), and u_{2}(\omega_{1},\underline{x})=u_{2}(\omega_{2},\underline{x}) for any two states \omega_{1},\omega_{2}\in\Omega. Therefore, \alpha_{\omega_{1}}=\alpha_{\omega_{2}} and \beta_{\omega_{1}}=\beta_{\omega_{2}} for all \omega_{1},\omega_{2}\in\Omega. Hence, u_{2}(\omega,\cdot)=\alpha u_{1}(\omega,\cdot)+\beta for all \omega\in\Omega.

Now consider an event A. Both (P_{1},u_{1}) and (P_{2},u_{2}) represent the conditional preference \succcurlyeq_{A}. Considering the pair (P_{2},u_{2}), \succcurlyeq_{A} has the representation

\begin{split}\sum\limits_{\omega\in A}P_{2}(\omega|A)E^{(\cdot)}[u_{2}(\omega,\cdot)]&=\sum\limits_{\omega\in A}P_{2}(\omega|A)E^{(\cdot)}[\alpha u_{1}(\omega,\cdot)+\beta]\\ &=\alpha\sum\limits_{\omega\in A}P_{2}(\omega|A)E^{(\cdot)}[u_{1}(\omega,\cdot)]+\beta\,.\end{split}

Since \alpha is strictly positive, the last representation is equivalent to \sum\limits_{\omega\in A}P_{2}(\omega|A)E^{(\cdot)}[u_{1}(\omega,\cdot)]. However, using the other pair (P_{1},u_{1}), we get the representation \sum\limits_{\omega\in A}P_{1}(\omega|A)E^{(\cdot)}[u_{1}(\omega,\cdot)]. Therefore, for any event A, \sum\limits_{\omega\in A}P_{2}(\omega|A)E^{(\cdot)}[u_{1}(\omega,\cdot)] and \sum\limits_{\omega\in A}P_{1}(\omega|A)E^{(\cdot)}[u_{1}(\omega,\cdot)] both represent the conditional preference \succcurlyeq_{A}. Using the richness axiom, the strong minimal agreement condition, and the uniqueness part of Corollary 8, we have P_{1}=P_{2}. This completes the proof. ∎

8. Related Literature

Our methods are applicable to different areas of economic theory, and generalize existing ideas in those areas. In particular, instances of our weighted averaging axiom appear in several different papers.

The theory of case-based prediction was developed in the seminal works of [17, 18, 19] and [6]. In this context, the concatenation axiom proposed by [6] is closely related to the strict case of our axiom. However, there are differences between the two axioms. As discussed in detail in Section 4.1, their belief formation process is defined over “sequences” of cases, in which each sequence can contain multiple copies of the same case, and the role of the concatenation axiom is to count the number of occurrences of each case. In our framework, by contrast, we define our axiom over “sets” of signals, in which each set contains only one copy of each signal. Moreover, our axiom is defined only over disjoint sets; if we weaken the definition to apply to arbitrary pairs of sets, our result no longer holds. Through the duality argument, the consistency axiom of our paper corresponds to the combination axiom of [18]. Our goal, however, is to show that the combination axiom can be obtained from a weaker version of the concatenation axiom using a simple duality argument.

The paper by [36] provides an example, on a binary state space, showing that their soundness condition is not sufficient for an updating rule to behave as a Bayesian rule. We show that, under our richness assumption, the strict weighted averaging axiom (which is the same as their soundness condition) is necessary and sufficient for an updating rule to behave as a Bayesian rule. We also generalize our result to the class of updating rules that can be rationalized by a conditional probability system.

In the context of choice theory, [1] introduces a model of continuous average choice over convex domains. In this application, we generalize their result in several ways. First, their result holds for the strict case of our axiom. Moreover, continuity and convexity are the two important forces behind their result. We show, however, that the strictness of an average choice function, continuity, and convexity of the domain are not the main forces behind extracting the underlying distribution of choices; the main force is our weighted averaging axiom. Moreover, we show that it is possible to rationalize an average choice function by a two-stage Luce model as long as it satisfies our weighted averaging axiom.

Path-independent choice functions are studied extensively by [32]. Our representation of average choice functions under the weighted averaging axiom and continuity generalizes the results of [22] and [31] regarding the impossibility of a choice function satisfying both path independence and continuity.

In the context of social choice, [12] and [5] study variants of extended Pareto rules.

[5] study the extended Pareto rule over vN-M preferences by relaxing the completeness axiom. Besides the technical and conceptual differences between the two approaches and results, their model depends on their non-degeneracy condition. That condition is satisfied only when there is a spanning tree over the preferences and every three consecutive preferences in the spanning tree are linearly independent. By contrast, the richness condition of our theorem only requires three linearly independent vectors among the whole set of preferences. Moreover, our result applies even to the class of extended weak Pareto aggregation rules under our strong richness condition. Note that our primary goal in this paper is to show that extended Pareto and extended weak Pareto are special cases of our weighted averaging axiom (under the minimal agreement condition).

The papers by [12], [13], and [8], each considering a different set of axioms other than Arrow's, provide an axiomatization of relative utilitarianism as a positive result. The paper by [12] is the closest to ours. Dhillon considers a variant of extended Pareto to obtain a weighted averaging structure; however, [7] gives a counterexample to that representation. We restrict the domain and use our definition of extended Pareto to obtain the weighted averaging structure as a consequence of our main theorem. Again, the technique we developed can also be used to provide a representation of extended weak Pareto social welfare functions.

Finally, there are many papers and different approaches to address the shortcomings of subjective expected utility theory. Note that our goal, in this context, is to explain the basic underlying structure that lets us separate beliefs and state-dependent utilities.

[28] and [27] use hypothetical preferences on hypothetical lotteries to obtain the identification of the beliefs and state-dependent preferences. [10], [11], and [25] present different theories to identify state-dependent preferences in situations where moral hazard is present.

[30] and [15] use preferences on an enlarged choice space of all conditional acts to model subjective expected utility with state-dependent preferences. Our paper, however, only considers hypothetical conditional preferences on the set of conditional “constant” acts. We find necessary and sufficient conditions under which our conditional preferences are related to each other through a subjective probability and a state-dependent utility.

The papers by [37], [16], and [26] are conceptually close to our main result of Section 7, although there are many differences between the results. Our goal is to build a model in which extended Pareto alone drives the separation of beliefs and state-dependent preferences.

[37] presents a nonexpected utility model by considering hypothetical preferences over the set of act-event pairs. His coherence axiom plays the same role as the extended Pareto axiom in our setup. However, he uses the solvability axiom in order to apply Debreu's additive representation theorem. In our paper, we consider the class of conditional vN-M preferences; as a result, we only require extended Pareto for our representation.

[26] presents a general model with a preference ordering over the set of unconditional acts. Using this preference order, he defines the set of conditional preferences over the set of all conditional acts. Therefore, to connect the class of conditional preferences, the model needs the existence of constant-valuation acts. Moreover, the cardinal and ordinal coherence axioms are the main forces behind obtaining Bayesian updating in his representation. In our more restricted domain, we only need extended Pareto to obtain our representation.

Finally, [16], by replacing Savage's sure-thing principle with dynamic consistency, obtains a subjective expected utility theory in which the conditional preferences are connected through Bayes' rule. However, his representation only holds for state-independent preferences.

Acknowledgments

The authors gratefully acknowledge support from Beyond Limits (Learning Optimal Models) through CAST (The Caltech Center for Autonomous Systems and Technologies) and partial support from the Air Force Office of Scientific Research under awards number FA9550-18-1-0271 (Games for Computation and Learning) and FA9550-20-1-0358 (Machine Learning and Physics-Based Modeling and Simulation).

The first version of the paper was written during the first author’s Ph.D. studies with many helpful comments from Federico Echenique and Kota Saito. The first author thanks his Ph.D. advisors Jaksa Cvitanic, Federico Echenique, Kota Saito, and Robert Sherman. For helpful discussions, the first author thanks Itai Ashlagi, Kim Border, Martin Cripps, David Dillenberger, Drew Fudenberg, Simone Galperti, Michihiro Kandori, Igor Kopylov, Jay Lu, Fabio Maccheroni, Thomas Palfrey, Charles Plott, Luciano Pomatto, Antonio Rangel, Pablo Schenone, Omer Tamuz, and Leeat Yariv.

References

  • [1] D. Ahn, F. Echenique and K. Saito “On Path Independent Stochastic Choice” In Theoretical Economics 13, 2018, pp. 61–85
  • [2] F. Anscombe and R. Aumann “A Definition of Subjective Probability” In The Annals of Mathematical Statistics 34, 1963, pp. 199–205
  • [3] K. Arrow “Social Choice and Individual Values”, 12 Yale Univ. Press, 1963
  • [4] K. Arrow “Optimal Insurance and Generalized Deductibles” In Scandinavian Actuarial Journal 1, 1974, pp. 1–42
  • [5] M. Baucells and L. Shapley “Multiperson utility” In Games and Economic Behavior 62, 2008, pp. 329–347
  • [6] A. Billot, I. Gilboa, D. Samet and D. Schmeidler “Probabilities as Similarity-Weighted Frequencies” In Econometrica 73, 2005, pp. 1125–1136
  • [7] T. Borgers and Y. Choo “A Counterexample to Dhillon” In Social Choice Welfare. 48, 2017, pp. 837–843
  • [8] T. Borgers and Y. Choo “Revealed Relative Utilitarianism” Working Paper, 2017
  • [9] P. Cook and D. Graham “The Demand for Insurance and Protection: The Case of Irreplaceable Commodities” In Quarterly Journal of Economics 91, 1977, pp. 143–156
  • [10] J. Drèze “Decision Theory with Moral Hazard and State-dependent Preferences” In Essays on Economic Decisions Under Uncertainty Cambridge University Press, 1987
  • [11] J. Drèze and A. Rustichini “Moral Hazard and Conditional Preferences” In Journal of Mathematical Economics 31, 1999, pp. 159–181
  • [12] A. Dhillon “Extended Pareto Rules and Relative Utilitarianism” In Social Choice Welfare 15, 1998, pp. 521–542
  • [13] A. Dhillon and J.. Mertens “Relative Utilitarianism” In Econometrica 67, 1999, pp. 471–498
  • [14] F. Echenique and K. Saito “General Luce Model” In Theoretical Economics 13, 2018, pp. 61–85
  • [15] P. Fishburn “A Mixture-Set Axiomatization of Conditional Subjective Expected Utility” In Econometrica 41, 1973, pp. 1–25
  • [16] P. Ghirardato “Revisiting Savage in a Conditional World” In Economic Theory 20, 2002, pp. 83–92
  • [17] I. Gilboa and D. Schmeidler “Case-Based Decision Theory” In Quarterly Journal of Economics 110, 1995, pp. 605–639
  • [18] I. Gilboa and D. Schmeidler “Inductive Inference: An Axiomatic Approach” In Econometrica 71, 2003, pp. 1–26
  • [19] I. Gilboa and D. Schmeidler “Case-Based Predictions: An Axiomatic Approach to Prediction, Classification and Statistical Learning” World Scientific Publishing Co, Singapore., 2012
  • [20] J. Harsanyi “Cardinal Welfare, Individual Ethics, and Interpersonal Comparisons of Utility” In Journal of Political Economy 63, 1955, pp. 309–321
  • [21] A. Hylland “Aggregation Procedure for Cardinal Preferences: A Comments” In Econometrica 48, 1980, pp. 539–542
  • [22] E. Kalai and N. Megiddo “Path Independent Choices” In Econometrica 48, 1980, pp. 781–784
  • [23] E. Kalai and N. Schmeidler “Aggregation Procedure for Cardinal Preferences: A Formulation and Proof of Samuelson’s Impossibility Conjecture” In Econometrica 45, 1977, pp. 1431–1438
  • [24] E. Karni “Decision making under uncertainty: the case of state-dependent preferences” Cambridge: Harvard University Press, 1985
  • [25] E. Karni “Subjective Expected Utility Theory Without States of the World” In Journal of Mathematical Economics 42, 2006, pp. 325–342
  • [26] E. Karni and D. Schmeidler “Foundations of Bayesian theory” In Journal of Economic Theory 132, 2007, pp. 167–188
  • [27] E. Karni and D. Schmeidler “An Expected Utility Theory for State-Dependent Preferences” In Theory and Decision 81, 2016, pp. 467–478
  • [28] E. Karni, D. Schmeidler and K. Vind “On State Dependent Preferences and Subjective Probabilities” In Econometrica 51, 1983, pp. 1021–1031
  • [29] R. Luce “Individual Choice Behavior a Theoretical Analysis” John Wiley & Sons, 1959
  • [30] R. Luce and D. Krantz “Conditional Expected Utility” In Econometrica 39, 1971, pp. 253–271
  • [31] M. Machina and R. Parks “On Path Independent Randomized Choice” In Econometrica 49, 1981, pp. 1345–1347
  • [32] C. Plott “Path Independence, Rationality, and Social Choice” In Econometrica 41, 1973, pp. 1075–1091
  • [33] F. Ramsey “Truth and Probability” In The Foundations of Mathematics and Other Logical Essays Routledge & Kegan Paul Ltd, 1931
  • [34] L. Savage “The Foundations of Statistics” John Wiley & Sons, 1954
  • [35] L. Shapley and M. Shubik “Preferences and Utility” In Game Theory in the Social Sciences: Concepts and Solutions Cambridge, MA: MIT Press, 1982
  • [36] E. Shmaya and L. Yariv “Foundations for Bayesian Updating” Working Paper, Caltech, 2007
  • [37] C. Skiadas “Conditioning and Aggregation of Preferences” In Econometrica 65, 1997, pp. 347–367
  • [38] J. von Neumann and O. Morgenstern “Theory of Games and Economic Behavior” Princeton University Press, 1944
  • [39] J. Weymark “A Reconsideration of the Harsanyi-Sen Debate on Utilitarianism” In Interpersonal Comparisons of Well-Being Cambridge University Press, 1991, pp. 255–320

9. Appendix

9.1. Proof of Theorem 1

The main part of the proof follows the steps of [6], with some twists.

The following two lemmas contain the central ideas behind the proof. They help us first define the function w and then extend it from binary sets to sets of any finite cardinality.

Lemma 1.

Let X be any nonempty set and let X^{*} denote the set of all nonempty finite subsets of X. Consider two functions f_{1},f_{2}:X^{*}\to\mathbb{R}^{n} that satisfy the strict weighted averaging axiom. Select four points a,b,c,d in X^{*} such that a\cup b=c\cup d and a\cap b=c\cap d=\emptyset. If f_{1}(x)=f_{2}(x) for all x\in\{a,b,c,d\} and not all of \{f_{1}(a),f_{1}(b),f_{1}(c),f_{1}(d)\} lie on the same line, then f_{1}(a\cup b)=f_{2}(a\cup b).

Proof.

Since f_{1} satisfies the strict weighted averaging axiom, a\cup b=c\cup d, and a\cap b=c\cap d=\emptyset, the point f_{1}(a\cup b) is on the line connecting f_{1}(a) and f_{1}(b). Also, since a\cup b=c\cup d, f_{1}(a\cup b)=f_{1}(c\cup d) should be on the line connecting f_{1}(c) and f_{1}(d). But \{f_{1}(a),f_{1}(b),f_{1}(c),f_{1}(d)\} are not collinear, so the line connecting f_{1}(a) and f_{1}(b) and the line connecting f_{1}(c) and f_{1}(d) can intersect in at most a single point. Since f_{1}(a\cup b) is on both lines, it must be their unique intersection.

Similarly, the same is true for f_{2}: f_{2}(a\cup b) must be the unique intersection of the line passing through f_{2}(a),f_{2}(b) and the line passing through f_{2}(c),f_{2}(d). Since f_{1}(x)=f_{2}(x) for all x\in\{a,b,c,d\}, f_{2}(a\cup b) is the unique intersection of the line passing through f_{1}(a),f_{1}(b) and the line passing through f_{1}(c),f_{1}(d). But we have already shown that f_{1}(a\cup b) is also this unique intersection. Thus, f_{1}(a\cup b)=f_{2}(a\cup b). ∎
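The following minimal Python sketch illustrates the geometric step used above with hypothetical two-dimensional points: when the four outcomes are not collinear, the two lines intersect in at most one point, and that intersection is the only candidate for f(a\cup b).

```python
def line_intersection(p1, p2, q1, q2):
    """Intersection of the line through p1, p2 with the line through q1, q2 (assumed non-parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, q1, q2
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

f_a, f_b = (0.0, 0.0), (2.0, 2.0)   # hypothetical outcomes of the disjoint coalitions a and b
f_c, f_d = (0.0, 2.0), (2.0, 0.0)   # hypothetical outcomes of c and d with a u b = c u d
print(line_intersection(f_a, f_b, f_c, f_d))   # (1.0, 1.0): the only candidate for f(a u b)
```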

Lemma 2.

Assume that \{x,y,z\} are three points in X such that f(x),f(y),f(z) are not collinear. Let f satisfy the strict weighted averaging axiom and f(\{x,y,z\})=a_{1}f(x)+a_{2}f(y)+a_{3}f(z). Then a_{1}/a_{2} is independent of the choice of z, as long as f(x),f(y),f(z) are not collinear. Moreover, if f(\{x,y\})=\lambda f(x)+(1-\lambda)f(y), then \frac{a_{1}}{a_{2}}=\frac{\lambda}{1-\lambda}.

Proof.

Since f(x),f(y),f(z) are not collinear, they are affinely independent. Hence, a_{1},a_{2},a_{3} are uniquely defined.

By the strict weighted averaging axiom, there exists \lambda_{1}\in(0,1) such that f(\{x,y,z\})=\lambda_{1}f(\{x,y\})+(1-\lambda_{1})f(\{z\}). Again, by the strict weighted averaging axiom, there exists \lambda\in(0,1) such that f(\{x,y\})=\lambda f(x)+(1-\lambda)f(y). Hence, f(\{x,y,z\})=\lambda_{1}(\lambda f(x)+(1-\lambda)f(y))+(1-\lambda_{1})f(\{z\}). By the affine independence of f(x),f(y),f(z), we must have a_{1}=\lambda_{1}\lambda and a_{2}=\lambda_{1}(1-\lambda). This means that \frac{a_{1}}{a_{2}}=\frac{\lambda}{1-\lambda}, so a_{1}/a_{2} is independent of the choice of z, as long as f(x),f(y),f(z) are not collinear. ∎
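The following minimal Python sketch checks the conclusion of Lemma 2 numerically with hypothetical two-dimensional points and hypothetical mixing coefficients: recovering the barycentric coefficients of f(\{x,y,z\}) gives a_{1}/a_{2}=\lambda/(1-\lambda).

```python
def barycentric(p, a, b, c):
    """Coefficients (w_a, w_b, w_c) with p = w_a*a + w_b*b + w_c*c and w_a + w_b + w_c = 1."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b

fx, fy, fz = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)    # hypothetical non-collinear outcomes
lam, lam1 = 0.25, 0.6                              # hypothetical mixing coefficients
f_xy = (lam * fx[0] + (1 - lam) * fy[0], lam * fx[1] + (1 - lam) * fy[1])
f_xyz = (lam1 * f_xy[0] + (1 - lam1) * fz[0], lam1 * f_xy[1] + (1 - lam1) * fz[1])
a1, a2, a3 = barycentric(f_xyz, fx, fy, fz)
print(abs(a1 / a2 - lam / (1 - lam)) < 1e-12)      # True: the ratio matches lambda / (1 - lambda)
```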

9.1.1. Proving the necessity and the uniqueness parts

Assume that the weight function w exists. Therefore, f(A)=\frac{\sum_{x\in A}w(x)f(x)}{\sum_{x\in A}w(x)}. It follows that if A\cap B=\emptyset, then f(A\cup B)=\frac{\sum_{x\in A\cup B}w(x)f(x)}{\sum_{x\in A\cup B}w(x)}=\left(\frac{\sum_{x\in A}w(x)}{\sum_{x\in A\cup B}w(x)}\right)\left(\frac{\sum_{x\in A}w(x)f(x)}{\sum_{x\in A}w(x)}\right)+\left(\frac{\sum_{x\in B}w(x)}{\sum_{x\in A\cup B}w(x)}\right)\left(\frac{\sum_{x\in B}w(x)f(x)}{\sum_{x\in B}w(x)}\right). By defining \lambda=\frac{\sum_{x\in A}w(x)}{\sum_{x\in A\cup B}w(x)}, we have f(A\cup B)=\lambda f(A)+(1-\lambda)f(B). Thus, the strict weighted averaging axiom is satisfied.
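The following minimal Python sketch checks this identity numerically; the one-dimensional outcomes, weights, sets, and helper names are hypothetical examples of ours.

```python
def f_star(A, w, f_point):
    """Weighted-average aggregate with one-dimensional outcomes, for brevity."""
    total = sum(w[x] for x in A)
    return sum(w[x] * f_point[x] for x in A) / total

w = {'a': 1.0, 'b': 2.0, 'c': 3.0}        # hypothetical weights
f_point = {'a': 0.0, 'b': 1.0, 'c': 4.0}  # hypothetical singleton outcomes
A, B = {'a'}, {'b', 'c'}
lam = sum(w[x] for x in A) / sum(w[x] for x in A | B)
lhs = f_star(A | B, w, f_point)
rhs = lam * f_star(A, w, f_point) + (1 - lam) * f_star(B, w, f_point)
print(abs(lhs - rhs) < 1e-12)             # True: the aggregate of A u B is the stated convex combination
```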

Regarding the uniqueness of w, assume that there exist two weight functions w_{1},w_{2} such that f(A)=\frac{\sum_{x\in A}w_{1}(x)f(x)}{\sum_{x\in A}w_{1}(x)}=\frac{\sum_{x\in A}w_{2}(x)f(x)}{\sum_{x\in A}w_{2}(x)}. Since the range of f is not a subset of a line, there exist at least three elements x,y,z\in X such that f(x),f(y),f(z) are not collinear; hence they are affinely independent. Therefore, f(\{x,y,z\})=a_{1}f(x)+a_{2}f(y)+a_{3}f(z) has a unique solution a_{1},a_{2},a_{3}, and there must be an \alpha such that w_{1}(p)/w_{2}(p)=\alpha for all p\in\{x,y,z\}. We will show that w_{1}(r)/w_{2}(r)=\alpha for every other point r\in X.

Select a point r\in X. By the assumption on \{x,y,z\}, there are at least two points u,v in \{x,y,z\} such that f(r),f(u),f(v) are not collinear. Without loss of generality, assume that \{u,v\}=\{x,y\}. Since f(r),f(x),f(y) are affinely independent, f(\{x,y,r\})=b_{1}f(x)+b_{2}f(y)+b_{3}f(r), where b_{1},b_{2},b_{3} are unique. Therefore, there exists \beta such that w_{1}(p)/w_{2}(p)=\beta for all p\in\{x,y,r\}. But notice that \alpha=w_{1}(x)/w_{2}(x)=\beta. Hence, w_{1}(r)/w_{2}(r)=\alpha, which is what we wanted to prove.

9.1.2. Proving the sufficiency part

First, in order to define the function w, fix an element x_{0}\in X and set w(x_{0})=1. By the strict weighted averaging axiom, for any y\in X\setminus\{x_{0}\} such that f(y)\neq f(x_{0}), there is a unique \lambda\in(0,1) such that f(\{x_{0},y\})=\lambda f(x_{0})+(1-\lambda)f(y). We define w(y)=\frac{1-\lambda}{\lambda}.

To define the weight of any other y\in X\setminus\{x_{0}\} with f(y)=f(x_{0}), we fix another point z_{0}\in X\setminus\{x_{0}\} such that f(z_{0})\neq f(x_{0}). Since f(x_{0})=f(y), we have f(y)\neq f(z_{0}). By the strict weighted averaging axiom, there exists a unique \lambda\in(0,1) such that f(\{z_{0},y\})=\lambda f(z_{0})+(1-\lambda)f(y). Since the weight of z_{0} has already been defined, we define the weight of y so that \frac{w(y)}{w(z_{0})}=\frac{1-\lambda}{\lambda}; that is, w(y)=w(z_{0})\times\frac{1-\lambda}{\lambda}.
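The following minimal Python sketch mirrors this construction with hypothetical one-dimensional outcomes and hypothetical pairwise aggregates: the weight of z_{0} is read off from f(\{x_{0},z_{0}\}), and the weight of a point y with f(y)=f(x_{0}) is read off from f(\{z_{0},y\}).

```python
def lam_of(f_pair, f_p, f_q):
    """Solve f_pair = lam * f_p + (1 - lam) * f_q for lam (scalar outcomes, f_p != f_q)."""
    return (f_pair - f_q) / (f_p - f_q)

f = {'x0': 0.0, 'z0': 3.0, 'y': 0.0}   # hypothetical singleton outcomes; f(y) == f(x0), so y needs z0
f_x0_z0 = 2.0                          # hypothetical aggregate f({x0, z0})
f_z0_y = 1.0                           # hypothetical aggregate f({z0, y})

w = {'x0': 1.0}
lam = lam_of(f_x0_z0, f['x0'], f['z0'])   # weight of z0 relative to x0
w['z0'] = (1 - lam) / lam
lam = lam_of(f_z0_y, f['z0'], f['y'])     # weight of y relative to z0
w['y'] = w['z0'] * (1 - lam) / lam
print(w)                                  # {'x0': 1.0, 'z0': 2.0, 'y': 4.0} for these numbers
```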

In the rest of this section, we prove that w satisfies the representation of the theorem; that is, defining f^{*}(A)=\frac{\sum_{x\in A}w(x)f(x)}{\sum_{x\in A}w(x)}, we show that f^{*}(A)=f(A).

First, in Step 1 we prove that the representation holds for any three points whose images under f are not collinear. In Step 2, we prove that the representation holds for any two points. In Step 3 (which is not strictly necessary, but which we provide because it captures the main ideas in a simple setting), we prove that the representation holds for any three points. Finally, in Step 4, using induction on the cardinality of subsets of X, we show that the representation holds for any subset of X.

Step 1: For any three points r,s,t such that f(r),f(s),f(t) are not collinear, we have f(\{r,s,t\})=a_{1}f(r)+a_{2}f(s)+a_{3}f(t), where the a_{i} are unique. Note that it is enough to prove that \frac{a_{1}}{a_{2}}=\frac{w(r)}{w(s)}, because in the same way we also get \frac{a_{2}}{a_{3}}=\frac{w(s)}{w(t)} and \frac{a_{3}}{a_{1}}=\frac{w(t)}{w(r)}. There are two cases:

Case 1: If x_{0},r,s are such that f(x_{0}),f(r),f(s) are not collinear, then f(\{x_{0},r,s\})=b_{1}f(x_{0})+b_{2}f(r)+b_{3}f(s). By Lemma 2, we know that \frac{a_{1}}{a_{2}}=\frac{b_{2}}{b_{3}}. Again using Lemma 2 and the way we defined w, we know that \frac{b_{1}}{b_{2}}=\frac{1}{w(r)} and \frac{b_{1}}{b_{3}}=\frac{1}{w(s)}, which means that \frac{b_{2}}{b_{3}}=\frac{w(r)}{w(s)}. Hence, \frac{a_{1}}{a_{2}}=\frac{w(r)}{w(s)}.

Case 2: If x_{0},r,s are such that f(x_{0}),f(r),f(s) are collinear, then both \{f(x_{0}),f(r),f(t)\} and \{f(x_{0}),f(s),f(t)\} are not collinear. By the same technique as in the first case, we get \frac{a_{1}}{a_{3}}=\frac{w(r)}{w(t)} and \frac{a_{3}}{a_{2}}=\frac{w(t)}{w(s)}. Hence, \frac{a_{1}}{a_{2}}=\frac{a_{1}}{a_{3}}\times\frac{a_{3}}{a_{2}}=\frac{w(r)}{w(t)}\times\frac{w(t)}{w(s)}=\frac{w(r)}{w(s)}, which is what we wanted to prove.

Step 2: Assume that r,s\in X. We want to show that f^{*}(\{r,s\})=f(\{r,s\}). If f(r)=f(s), this is immediate. If f(r)\neq f(s), then by the richness condition there exists an element t\in X such that \{f(t),f(r),f(s)\} are not collinear. By Step 1, we know that f(\{t,r,s\})=f^{*}(\{t,r,s\}); we also have f(t)=f^{*}(t), f(r)=f^{*}(r), and f(s)=f^{*}(s). Notice that, by the strict weighted averaging axiom, f(\{r,s\}) is on the line connecting f(r) and f(s). It is also on the line connecting f(\{t,r,s\}) and f(t): by the strict weighted averaging axiom, there exists \lambda\in(0,1) such that f(\{t,r,s\})=\lambda f(t)+(1-\lambda)f(\{r,s\}), which means that f(\{r,s\}) is on the line connecting f(\{t,r,s\}) and f(t). Similarly, everything holds for f^{*}, which means that f^{*}(\{r,s\}) is on the line connecting f^{*}(r) and f^{*}(s) and also on the line connecting f^{*}(\{t,r,s\}) and f^{*}(t). Since \{f(t),f(r),f(s)\} are not collinear, these two lines intersect in at most one point, and since f(\{t,r,s\})=f^{*}(\{t,r,s\}), f(t)=f^{*}(t), f(r)=f^{*}(r), and f(s)=f^{*}(s), an argument similar to Lemma 1 gives a unique intersection satisfying f^{*}(\{r,s\})=f(\{r,s\}). This is what we wanted to prove.

Step 3: (This is the tricky part; we provide it to capture the main ideas, and we will use the same technique in Step 4.) We are going to prove that for any three points r,s,t we have f^{*}(\{r,s,t\})=f(\{r,s,t\}). There are two separate cases to be considered.

Case 1: If f(r),f(s),f(t) are not collinear, then the claim follows from Step 1.

Case 2: Assume that f(r),f(s),f(t) are collinear. If all of them are the same, then by the strict weighted averaging axiom f^{*}(\{r,s,t\})=f(\{r,s,t\}). Hence, assume that they are not all the same.

Without loss of generality, assume that f(s)\neq f(r) and f(s)\neq f(t). By the richness condition on f, there is a point v\in X such that f(v),f(r),f(s), and f(t) are not all collinear. Note that f(v),f(r),f(s) are not collinear; similarly, f(v),f(s),f(t) are not collinear. By Case 1, we know that f^{*}(\{v,r,s\})=f(\{v,r,s\}) and f^{*}(\{v,s,t\})=f(\{v,s,t\}). Also, we know that f^{*}(v)=f(v), f^{*}(r)=f(r), f^{*}(s)=f(s), and f^{*}(t)=f(t). Using the strict weighted averaging axiom, f(\{v,r,s,t\}) is on the intersection of the line passing through f(\{v,r,s\}) and f(t) and the line passing through f(\{v,s,t\}) and f(r). Also, note that f(\{v,r,s\}), f(\{v,s,t\}), f(r), and f(t) are not all collinear, since otherwise f(v) would have to be on the line connecting f(r) and f(s). Similarly, the same properties hold for f^{*}. Based on the argument of Lemma 1, we have f(\{v,r,s,t\})=f^{*}(\{v,r,s,t\}).

Using the strict weighted averaging axiom, f(\{r,s,t\}) is on the line passing through f(\{v,r,s,t\}) and f(v), since there exists \lambda\in(0,1) such that f(\{v,r,s,t\})=\lambda f(\{r,s,t\})+(1-\lambda)f(v). Again, by the strict weighted averaging axiom, f(\{r,s,t\}) is on the line passing through f(\{r,s\}) and f(t). The same holds for f^{*}. Moreover, we have f(\{v,r,s,t\})=f^{*}(\{v,r,s,t\}), f(\{r,s\})=f^{*}(\{r,s\}), f(v)=f^{*}(v), and f(t)=f^{*}(t). Also, f(\{v,r,s,t\}), f(\{r,s\}), f(v), and f(t) are not all on the same line, since otherwise f(v), f(r), f(s), and f(t) would be collinear, which is not the case. As a result, based on the argument of Lemma 1, we have f^{*}(\{r,s,t\})=f(\{r,s,t\}), which is what we wanted to prove.

Step 4 (the main step): Up to this point, we have proved that f^{*}(A)=f(A) for any A\in X^{*} with |A|\leq 3. To complete the proof, we use induction on the cardinality of A. Assume that f^{*}(A)=f(A) for all A\in X^{*} with |A|\leq k. We are going to show that f^{*}(A)=f(A) for all A\in X^{*} with |A|=k+1.

Fix a subset A with |A|=k+1 and write A=\{x_{1},\ldots,x_{k+1}\}. There are two separate cases to be considered.

Case 1: Assume that not all \{f(x_{i})\}_{i=1}^{k+1} are collinear. Note that, by the induction hypothesis, for all x\in A and all B\in 2^{A\setminus\{x\}} we have f(B)=f^{*}(B). Define line(f(x),f(A\setminus\{x\})) as the line passing through f(x) and f(A\setminus\{x\}) when f(x)\neq f(A\setminus\{x\}); if f(x)=f(A\setminus\{x\}), define it as the single point f(x).

If there exists x\in A such that f(x)=f(A\setminus\{x\}), then by the strict weighted averaging axiom there exists \lambda\in(0,1) such that f(A)=\lambda f(x)+(1-\lambda)f(A\setminus\{x\})=f(x). Similarly, f^{*}(A)=f^{*}(x). But we know that f(x)=f^{*}(x), which means that f(A)=f^{*}(A).

If f(x)\neq f(A\setminus\{x\}) for all x\in A, then there exist x,y\in A such that f(x),f(A\setminus\{x\}),f(y),f(A\setminus\{y\}) are not all collinear; otherwise, all the f(x_{i}) would be on the line through f(x) and f(A\setminus\{x\}), which cannot be the case since not all \{f(x_{i})\}_{i=1}^{k+1} are collinear. Considering such x,y\in A, by the strict weighted averaging axiom, f(A) is on line(f(x),f(A\setminus\{x\})) and also on line(f(y),f(A\setminus\{y\})). Similarly, applying the strict weighted averaging axiom to f^{*}, f^{*}(A) is on line(f^{*}(x),f^{*}(A\setminus\{x\})) and also on line(f^{*}(y),f^{*}(A\setminus\{y\})).

Since (1) f(x)=f^{*}(x), f(A\setminus\{x\})=f^{*}(A\setminus\{x\}), f(y)=f^{*}(y), f(A\setminus\{y\})=f^{*}(A\setminus\{y\}) and (2) f(x),f(A\setminus\{x\}),f(y), and f(A\setminus\{y\}) are not all collinear, Lemma 1 gives f^{*}(A)=f(A). Hence, in the case where not all \{f(x_{i})\}_{i=1}^{k+1} are collinear, we have shown that f^{*}(A)=f(A).

Case 2: Assume that {f(xi)}i=1k+1\{f(x_{i})\}_{i=1}^{k+1} are collinear. Without loss of generality, assume that f(x1),f(xk+1)f(x_{1}),f(x_{k+1}) are the two extreme points on the line that contains them, which means that all other points are between this two.

If f(x1)=f(xk+1)f(x_{1})=f(x_{k+1}),then all {f(xi)}i=1k+1\{f(x_{i})\}_{i=1}^{k+1} are the same. Using the strict weighted averaging axiom, it shows that f(A)=f(x1)=f(x1)=f(A)f(A)=f(x_{1})=f^{*}(x_{1})=f^{*}(A).

If f(x1)f(xk+1)f(x_{1})\neq f(x_{k+1}), based on the richness condition of the aggregation rule ff, we can select a point yXAy\in X\setminus A such that not all f(y),f(x1)f(y),f(x_{1}), and f(xk+1)f(x_{k+1}) are collinear. Based on the previous Case 1, we know that f(y,x1,,xk)=f(y,x1,,xk)f(y,x_{1},\ldots,x_{k})=f^{*}(y,x_{1},\ldots,x_{k}), since we have proved that ff and ff^{*} are coincided for any k+1k+1 not collinear points. Similarly, we have f(y,x2,,xk+1)=f(y,x2,,xk+1)f(y,x_{2},\allowbreak\ldots,x_{k+1})=f^{*}(y,x_{2},\ldots,x_{k+1}).

Using the strict weighted averaging axiom, f({y,x1,,xk+1})\allowbreak f(\{y,x_{1},\ldots,x_{k+1}\}) is on the line(f({y,x1,,xk}),f(xk+1))\allowbreak line(\allowbreak f(\{y,x_{1},\ldots,x_{k}\})\allowbreak,f(x_{k+1})). It is also on the line(f({y,x2,,xk+1}),f(x1))\allowbreak line(f(\{y,x_{2},\allowbreak\ldots,x_{k+1}\}),f(x_{1})). Also, not all f({y,x1,,xk})f(\{y,x_{1},\ldots,x_{k}\}), f(xk+1))f(x_{k+1})), f({y,x2,,xk+1})\allowbreak f(\{y,x_{2},\allowbreak\ldots,\allowbreak x_{k+1}\allowbreak\}), f(x1))f(x_{1})) are collinear, since f({y,x1,,xk})f(\{y,x_{1},\allowbreak\ldots,x_{k}\}) cannot be on line(f(x1),f(xk+1))\allowbreak line(f(x_{1}),\allowbreak f(x_{k+1})) otherwise f(y)f(y) must be on that line which is not correct. Similarly, everything holds for the ff^{*}.

Since, f({y,x1,,xk})=f({y,x1,,xk})f(\{y,x_{1},\allowbreak\ldots,x_{k}\})=f^{*}(\{y,x_{1},\ldots,x_{k}\}), f(xk+1)=f(xk+1)f(x_{k+1})=f^{*}(x_{k+1}), f({y,x2,,xk+1})=f({y,x2,,xk+1})f(\{y,x_{2},\ldots,x_{k+1}\})=f^{*}(\{y,x_{2},\ldots,x_{k+1}\}), and f(x1)=f(x1)f(x_{1})=f^{*}(x_{1}) then again by using Lemma 1, we get f({y,x1,,xk+1})=f({y,x1,,xk+1})f(\{y,x_{1},\ldots,x_{k+1}\})=f^{*}(\{y,x_{1},\ldots,x_{k+1}\}).

The point f({x1,,xk+1})f(\{x_{1},\ldots,x_{k+1}\}) is on line(f(x1),f(xk+1))line(f(x_{1}),f(x_{k+1})). It is also on line(f(y),f({y,x1,,xk+1}))line(f(y),f(\{y,x_{1},\ldots,x_{k+1}\})), since by the strict weighted averaging axiom f({y,x1,,xk+1})=λf(y)+(1λ)f({x1,,xk+1})f(\{y,x_{1},\ldots,x_{k+1}\})=\lambda f(y)+(1-\lambda)f(\{x_{1},\ldots,x_{k+1}\}) for some λ(0,1)\lambda\in(0,1). The same holds for ff^{*}. Finally, since (1) f(y)=f(y)f(y)=f^{*}(y), f({y,x1,,xk+1})=f({y,x1,,xk+1})f(\{y,x_{1},\ldots,x_{k+1}\})=f^{*}(\{y,x_{1},\ldots,x_{k+1}\}), f(x1)=f(x1)f(x_{1})=f^{*}(x_{1}), and (2) f(xk+1)=f(xk+1)f(x_{k+1})=f^{*}(x_{k+1}) and f(x1),f(xk+1)f(x_{1}),f(x_{k+1}), and f(y)f(y) are not collinear, using the same type of argument as in Lemma 1 we get f(A)=f(A)f(A)=f^{*}(A), which is what we wanted to prove.

Hence, for all AXA\in X^{*} with cardinality k+1k+1, we have f(A)=f(A)f(A)=f^{*}(A). By induction, f(A)=f(A)f(A)=f^{*}(A) for all AXA\in X^{*}. This completes the proof.
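As an informal numerical illustration of the geometric step used repeatedly above (this is not part of the proof, and the four outcome vectors below are made up): when f(x), f(A\setminus\{x\}), f(y), f(A\setminus\{y\}) are not all collinear, the lines line(f(x),f(A\setminus\{x\})) and line(f(y),f(A\setminus\{y\})) meet in a single point, which pins down the common value of f(A) and f^{*}(A).

```python
import numpy as np

# Illustration only: hypothetical 2-D outcomes with f(x), f(A\{x}), f(y), f(A\{y})
# not all collinear.
fx, fAx = np.array([0.0, 0.0]), np.array([1.0, 2.0])   # points on line(f(x), f(A\{x}))
fy, fAy = np.array([2.0, 0.0]), np.array([1.0, 1.0])   # points on line(f(y), f(A\{y}))

def line_intersection(p, q, r, s):
    """Intersection of line(p, q) and line(r, s), assuming the lines are not parallel."""
    # Solve p + t*(q - p) = r + u*(s - r) for (t, u).
    A = np.column_stack([q - p, -(s - r)])
    t, _ = np.linalg.solve(A, r - p)
    return p + t * (q - p)

# The strict weighted averaging axiom forces both f(A) and f*(A) onto both lines,
# so both must equal this unique intersection point.
print(line_intersection(fx, fAx, fy, fAy))
```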

9.2. Proof of Theorem 2

There are a couple of steps in the proof. Defining the weak order:

Step 1: First, we define a binary relation \succcurlyeq over every two different elements x,yXx,y\in X by:

Case 1: If f(x)f(y)f(x)\neq f(y), we define xyf({x,y})f(y)x\succcurlyeq y\iff f(\{x,y\})\neq f(y).

Case 2: If f(x)=f(y)f(x)=f(y), then by the strong richness condition we can select another point zXz\in X such that f({x,z}){f(x),f(z)}f(\{x,z\})\notin\{f(x),f(z)\}. Hence, f(z)f(x)=f(y)f(z)\neq f(x)=f(y) (otherwise the weighted averaging axiom would force f({x,z})=f(x)f(\{x,z\})=f(x)). In this case, we define xyf({z,y})f(y)x\succcurlyeq y\iff f(\{z,y\})\neq f(y).

To obtain reflexivity, for any xXx\in X, we define xxx\succcurlyeq x.
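Purely as an illustration (not part of the proof), the construction in Step 1 can be mirrored in code. The aggregation rule below is a made-up dictionary of outcomes for singletons and pairs, chosen to be consistent with the representation of Theorem 2, and the hypothetical function `weakly_above` implements Cases 1 and 2 above.

```python
from itertools import combinations

# A toy aggregation rule on X = {"a", "b", "c"}: outcomes of singletons and pairs.
# Values are made up solely to illustrate the construction of the relation.
f = {
    frozenset({"a"}): (0.0, 0.0),
    frozenset({"b"}): (1.0, 0.0),
    frozenset({"c"}): (0.0, 1.0),
    frozenset({"a", "b"}): (0.4, 0.0),   # strictly between f(a) and f(b): a ~ b
    frozenset({"a", "c"}): (0.0, 0.0),   # equals f(a): a is ranked above c
    frozenset({"b", "c"}): (1.0, 0.0),   # equals f(b): b is ranked above c
}
X = ["a", "b", "c"]

def weakly_above(x, y):
    """x >= y  iff  f({x, y}) != f(y)  (Case 1); Case 2 uses an auxiliary point z."""
    if x == y:
        return True                      # reflexivity, as defined in Step 1
    if f[frozenset({x})] != f[frozenset({y})]:
        return f[frozenset({x, y})] != f[frozenset({y})]
    # Case 2: pick z with f({x, z}) not in {f(x), f(z)} and compare z against y.
    z = next(w for w in X if w not in (x, y)
             and f[frozenset({x, w})] not in (f[frozenset({x})], f[frozenset({w})]))
    return f[frozenset({z, y})] != f[frozenset({y})]

for x, y in combinations(X, 2):
    print(x, ">=", y, ":", weakly_above(x, y), "|", y, ">=", x, ":", weakly_above(y, x))
```

On this toy data the relation ranks a and b together and both strictly above c, matching the pairwise outcomes f(\{a,c\})=f(a) and f(\{b,c\})=f(b).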

Step 2: We prove that \succcurlyeq is a weak order. The reflexivity and the completeness are trivial. We only need to establish the transitivity. Assume that xy,yzx\succcurlyeq y,y\succcurlyeq z. We will show that xzx\succcurlyeq z.

The proof is by contradiction. Therefore, assume that zxz\succ x.

Case 1: Assume that f(x),f(y),f(z)f(x),f(y),f(z) are non-collinear. Since zxz\succ x, based on the way we defined \succcurlyeq, we have f({x,z})=f(z)f(\{x,z\})=f(z).

Consider the coalition {x,y,z}\{x,y,z\}. By using the weighted averaging axiom over the sub-coalitions {x,z}\{x,z\} and {y}\{y\}, the vector f({x,y,z})f(\{x,y,z\}) should be on the line joining f(y)f(y) and f({x,z})f(\{x,z\}) (which is the same as f(z)f(z)). Similarly, by considering the sub-coalitions {x,y}\{x,y\} and {z}\{z\}, f({x,y,z})f(\{x,y,z\}) should be on the line passing through f({x,y})f(\{x,y\}) and f(z)f(z). Since f({x,y})f(y)f(\{x,y\})\neq f(y) and f({x,y}),f(y),f(z)f(\{x,y\}),f(y),f(z) are non-collinear, we have f({x,y,z})=f(z)f(\{x,y,z\})=f(z). However, considering the sub-coalitions {y,z}\{y,z\} and {x}\{x\}, together with the fact that f({y,z})f(z)f(\{y,z\})\neq f(z), shows that this cannot happen. Therefore, xzx\succcurlyeq z.

Case 2: Assume that f(x),f(y),f(z)f(x),f(y),f(z) are collinear. By using the strong richness condition, we can select a point uXu\in X such that f(u)f(u) is not on the line passing through f(x),f(y),f(z)f(x),f(y),f(z), and also f({u,x}){f(x),f(u)}f(\{u,x\})\notin\{f(x),f(u)\} (this means that xux\sim u). First, using Case 1 applied to the coalitions {u,x,y}\{u,x,y\} and {u,x,z}\{u,x,z\}, we have uyu\succcurlyeq y and zuz\succ u. Since f(u),f(y),f(z)f(u),f(y),f(z) are non-collinear, using Case 1 again, zu,uyzyz\succ u,u\succcurlyeq y\Rightarrow z\succ y. This contradicts yzy\succcurlyeq z. Therefore, xzx\succcurlyeq z.

The main part: proving f(A)=f(M(A,))f(A)=f(M(A,\succcurlyeq)).

So far, we have shown that \succcurlyeq is a weak order. Next, we show that for any coalition AXA\in X^{*} we have f(A)=f(M(A,))\ f(A)=f(M(A,\succcurlyeq)).

We use the letter HH for the highest-ranked elements of AA, and LL for the rest. In other words, H:=M(A,),L:=AHH:=M(A,\succcurlyeq),\ L:=A\setminus H. The proof is by a double induction on the cardinalities of HH and LL. In Step 1, we will show that if xXx\in X and LXL\in X^{*} are such that yL:xy\forall y\in L:\ x\succ y, then we should have f({x}L)=f(x)f(\{x\}\cup L)=f(x).

In Step 2, we show that for a given coalition HXH\in X^{*}, where all elements of HH are in the same equivalence class, and for all LXL\in X^{*}, if for all xH,yL:xyx\in H,\ y\in L:x\succ y, then we have f(HL)=f(H)f(H\cup L)=f(H). Using these two steps, we will finish the proof.

Step 1: Fix an element xXx\in X. By induction on the cardinality of LL, where yL,xy\forall y\in L,\ x\succ y, we prove that f({x}L)=f(x)f(\{x\}\cup L)=f(x).

We have already proved the case where |L|=1|L|=1. Assume that for all |L|k|L|\leq k the result is correct. We will show that for all LL with |L|=k+1|L|=k+1, the result is also correct.

Fix a coalition LL with |L|=k+1|L|=k+1 such that yL,xy\forall y\in L,\ x\succ y. Write L={y1,,yk+1}L=\{y_{1},\ldots,y_{k+1}\}.

If for all yL:f(y)=f(x)y\in L:f(y)=f(x), then using the weighted averaging axiom f({x}L)=f(x)f(\{x\}\cup L)=f(x), which is what we wanted to prove. Similarly, if f(L)=f(x)f(L)=f(x), we have f({x}L)=f(x)f(\{x\}\cup L)=f(x).

Therefore, consider the case in which not all of the f(y)f(y), yLy\in L, are equal to f(x)f(x) and f(L)f(x)f(L)\neq f(x).

Using our definition of the lineline in the proof of Theorem 1, for each yLy\in L we consider line(f({x}(L{y})),f(y))line(f(\{x\}\cup(L\setminus\{y\})),f(y)). Using the weighted averaging axiom, for all yL,f({x}L)line(f({x}L{y}),f(y))y\in L,\ f(\{x\}\cup L)\in line(f(\{x\}\cup L\setminus\{y\}),f(y)). By using the induction hypothesis f({x}L{y})=f(x)f(\{x\}\cup L\setminus\{y\})=f(x). Therefore, yL,f({x}L)line(f(x),f(y))\forall y\in L,\ f(\{x\}\cup L)\in line(f(x),f(y)).

Similar to the proof of Theorem 1, we consider two separate cases.

Case 1: Consider the case where there exist two elements y1,y2Ly_{1},y_{2}\in L such that f(x),f(y1),f(y2)f(x),f(y_{1}),f(y_{2}) are non-collinear. We use the same technique as in the proof of Theorem 1.

We know that f({x}L)line(f(x),f(y1))f(\{x\}\cup L)\in line(f(x),f(y_{1})) as well as f({x}L)line(f(x),f(y2))f(\{x\}\cup L)\in line(f(x),f(y_{2})). Moreover, f(x),f(y1),f(y2)f(x),f(y_{1}),f(y_{2}) are non-collinear, so the two lines intersect only at f(x)f(x). This shows that f({x}L)=f(x)f(\{x\}\cup L)=f(x).

Case 2: All the vectors f(x),f(y1),,f(yk+1)f(x),f(y_{1}),\ldots,f(y_{k+1}) are collinear. In this case, the idea is to add a point xXx^{\prime}\in X such that xxx^{\prime}\sim x and f(x)f(x^{\prime}) is not on the line containing all of f(x),f(y1),,f(yk+1)f(x),f(y_{1}),\ldots,f(y_{k+1}). This is possible because of the strong richness condition. By the transitivity of \succcurlyeq, yL:xy\forall y\in L:\ x^{\prime}\succ y.

Fix a point y0Ly_{0}\in L such that f(y0)f(x)f(y_{0})\neq f(x). This is possible since we have already assumed that not all f(y)f(y), with yLy\in L, are the same as f(x)f(x).

Consider the coalition {x}{x}L\{x\}\cup\{x^{\prime}\}\cup L. By using the weighted averaging axiom and the sub-coalitions {x}(L{y0})\{x\}\cup(L\setminus\{y_{0}\}) and {x,y0}\{x^{\prime},y_{0}\}, we have f({x}{x}L)line(f({x}(L{y0})),f({x,y0}))f(\{x\}\cup\{x^{\prime}\}\cup L)\in line(f(\{x\}\cup(L\setminus\{y_{0}\})),f(\{x^{\prime},y_{0}\})). Using the induction hypothesis, f({x}(L{y0}))=f(x)f(\{x\}\cup(L\setminus\{y_{0}\}))=f(x) and f({x,y0})=f(x)f(\{x^{\prime},y_{0}\})=f(x^{\prime}). Therefore, f({x}{x}L)line(f(x),f(x))f(\{x\}\cup\{x^{\prime}\}\cup L)\in line(f(x),f(x^{\prime})).

Next, we show that f({x}{x}L)f(x)f(\{x\}\cup\{x^{\prime}\}\cup L)\neq f(x). Since xxx\sim x^{\prime} and f(x)f(x)f(x)\neq f(x^{\prime}), we have f({x,x}){f(x),f(x)}f(\{x,x^{\prime}\})\notin\{f(x),f(x^{\prime})\}. Moreover, based on the way we selected the point xx^{\prime}, f({x,x})f(\{x,x^{\prime}\}) is not on the line containing f(x)f(x) and {f(y)|yL}\{f(y)|y\in L\}. Consider the partition of {x}{x}L\{x\}\cup\{x^{\prime}\}\cup L into {x,x}\{x,x^{\prime}\} and LL. By the choice of LL at the beginning of Step 1, f(L)f(x)f(L)\neq f(x). Since f({x,x})f(x)f(\{x,x^{\prime}\})\neq f(x), f(L)f(x)f(L)\neq f(x), and f(x),f(x),f(L)f(x),f(x^{\prime}),f(L) are non-collinear, we have f({x}{x}L)f(x)f(\{x\}\cup\{x^{\prime}\}\cup L)\neq f(x).

Finally, by partitioning {x}{x}L\{x\}\cup\{x^{\prime}\}\cup L into {x}\{x^{\prime}\} and {x}L\{x\}\cup L, the weighted averaging axiom gives f({x}{x}L)line(f({x}L),f(x))f(\{x\}\cup\{x^{\prime}\}\cup L)\in line(f(\{x\}\cup L),f(x^{\prime})). Therefore, f({x}L)f(\{x\}\cup L) is on the line joining f({x}{x}L)f(\{x\}\cup\{x^{\prime}\}\cup L) and f(x)f(x^{\prime}). However, we have already shown that f({x}{x}L)f(\{x\}\cup\{x^{\prime}\}\cup L) is on the line passing through f(x)f(x) and f(x)f(x^{\prime}). Thus, f({x}L)f(\{x\}\cup L) must be on the line joining f(x)f(x) and f(x)f(x^{\prime}). However, the only intersection of line(f(x),f(x))line(f(x),f(x^{\prime})) with the line containing all the points f(x),f(y1),,f(yk+1)f(x),f(y_{1}),\dots,f(y_{k+1}) is the point f(x)f(x). Thus, f({x}L)=f(x)f(\{x\}\cup L)=f(x), which completes the induction and hence Step 1.

Step 2: In this step, by using induction on the cardinality of the set HH, in which all elements are equivalent under \succcurlyeq, we show that for any coalition LL whose elements are all ranked strictly below the elements of HH, we have f(HL)=f(H)f(H\cup L)=f(H).

Fix a set LL. Based on Step 1, we know that for any xXx\in X such that yL:xy\forall y\in L:x\succ y, we have f({x}L)=f(x)f(\{x\}\cup L)=f(x). This is the base case of the induction. Assume that for all HH with |H|=k|H|=k, we have f(HL)=f(H)f(H\cup L)=f(H). We will show that for any HH with |H|=k+1|H|=k+1, we have f(HL)=f(H)f(H\cup L)=f(H).

For any xHx\in H, by the weighted averaging axiom over the sub-coalitions {x}L\{x\}\cup L and H{x}H\setminus\{x\}, we have f(HL)line(f({x}L),f(H{x}))f(H\cup L)\in line(f(\{x\}\cup L),f(H\setminus\{x\})). Based on step 1, we know that f({x}L)=f(x)f(\{x\}\cup L)=f(x). Therefore, f(HL)line(f(x),f(H{x}))f(H\cup L)\in line(f(x),f(H\setminus\{x\})). Similarly, by the weighted averaging axiom over the coalition HH and its sub-coalitions {x},H{x}\{x\},H\setminus\{x\}, we should have f(H)line(f(x),f(H{x}))f(H)\in line(f(x),f(H\setminus\{x\})). Consider two cases:

Case 1: Consider the case in which not all members of {f(x)|xH}\{f(x)|x\in H\} are collinear. Hence, there are at least two elements x,yHx,y\in H such that f(x),f(H{x}),f(y),f(x),f(H\setminus\{x\}),f(y), and f(H{y})f(H\setminus\{y\}) are not all collinear. Therefore, the line joining f(x),f(H{x})f(x),f(H\setminus\{x\}) and the line joining f(y),f(H{y})f(y),f(H\setminus\{y\}) intersect in at most one point. Since f(H)f(H) is on both lines, the unique intersection must be f(H)f(H). But f(HL)f(H\cup L) is also on both lines. Hence, f(HL)=f(H)f(H\cup L)=f(H), which completes this case.

Case 2: Consider the case where all members of the set {f(x)|xH}\{f(x)|x\in H\} are on a line. By using the strong richness condition, there exists an element xXx^{\prime}\in X, equivalent to the elements of HH, such that f(x)f(x^{\prime}) is not on that line. We consider the coalition {x}HL\{x^{\prime}\}\cup H\cup L. By using the weighted averaging axiom over the sub-coalitions {x}L\{x^{\prime}\}\cup L and HH, f({x}HL)f(\{x^{\prime}\}\cup H\cup L) must be on the line joining f({x}L)f(\{x^{\prime}\}\cup L) and f(H)f(H). By Step 1, f({x}L)=f(x)f(\{x^{\prime}\}\cup L)=f(x^{\prime}). Hence, f({x}HL)line(f(x),f(H))f(\{x^{\prime}\}\cup H\cup L)\in line(f(x^{\prime}),f(H)).

Similarly, by partitioning the set {x}HL\{x^{\prime}\}\cup H\cup L into HLH\cup L and {x}\{x^{\prime}\}, we have f({x}HL)line(f(x),f(HL))f(\{x^{\prime}\}\cup H\cup L)\in line(f(x^{\prime}),f(H\cup L)).

Select an element x1Hx_{1}\in H. By partitioning the coalition {x}HL\{x^{\prime}\}\cup H\cup L between the sub-coalitions {x1,x}\{x_{1},x^{\prime}\} and (H{x1})L(H\setminus\{x_{1}\})\cup\ L, using the weighted averaging axiom, we obtain f({x}HL)f({x})f(\{x^{\prime}\}\cup H\cup L)\neq f(\{x^{\prime}\}).

Finally, using (1) f({x}HL)line(f(x),f(H))f(\{x^{\prime}\}\cup H\cup L)\in line(f(x^{\prime}),f(H)), (2) f({x}HL)line(f(x),f(HL))f(\{x^{\prime}\}\cup H\cup L)\in line(f(x^{\prime}),f(H\cup L)), and (3) f({x}HL)f(x)f(\{x^{\prime}\}\cup H\cup L)\neq f(x^{\prime}), we have line(f(x),f(H))=line(f(x),f(HL))line(f(x^{\prime}),f(H))=line(f(x^{\prime}),f(H\cup L)). But this line and the line containing all the points {f(x)|xH}\{f(x)|x\in H\} intersect in at most one point, and both f(H)f(H) and f(HL)f(H\cup L) lie on both lines. Therefore, f(HL)=f(H)f(H\cup L)=f(H), which completes Step 2.

Completing the proof:

Consider a coalition HH where all elements have the same order. Consider any two disjoint sub-coalitions H1,H2HH_{1},H_{2}\subseteq H with f(H1)f(H2)f(H_{1})\neq f(H_{2}). Using the same technique as in the previous part, we have f(H1H2)f(H1)f(H_{1}\cup H_{2})\neq f(H_{1}).

By using the result of Theorem 1, we obtain the appropriate representation within each equivalence class. Moreover, by the result of the previous part, f(A)=f(M(A,))f(A)=f(M(A,\succcurlyeq)). Combining these two results completes the proof. ∎
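For reference, the representation that Theorem 2 establishes — average only over the highest-ranked elements of A, with positive weights — can be written as a short routine. This is an illustrative sketch of the statement being proved; the `outcome`, `weight`, and `rank` inputs below are hypothetical, not objects constructed in the proof.

```python
import numpy as np

def aggregate(A, outcome, weight, rank):
    """Weighted average of outcomes over the highest-ranked elements of A.

    outcome[x]: vector f(x); weight[x] > 0; rank[x]: larger means higher ranked.
    This mirrors f(A) = f(M(A, >=)) with weighted averaging inside the top class.
    """
    top = max(rank[x] for x in A)
    M = [x for x in A if rank[x] == top]           # M(A, >=)
    w = np.array([weight[x] for x in M])
    V = np.array([outcome[x] for x in M], dtype=float)
    return (w[:, None] * V).sum(axis=0) / w.sum()

# Toy example (all values hypothetical):
outcome = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
weight  = {"a": 1.0, "b": 3.0, "c": 5.0}
rank    = {"a": 2,   "b": 2,   "c": 1}             # a ~ b, both ranked above c
print(aggregate({"a", "b", "c"}, outcome, weight, rank))  # weighted average of f(a), f(b); c ignored
```

With these inputs, aggregating {a,b,c} ignores c entirely and returns the weighted average of f(a) and f(b), exactly the pattern f(A)=f(M(A,\succcurlyeq)) established above.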

9.3. Proof of Theorem 3

The following two lemmas help us prove the theorem.

Lemma 3.

Given any two linearly independent vectors v1,v2v_{1},v_{2} in n\mathbb{R}^{n}, there exists a neighborhood of v1v_{1} such that every vector in that neighborhood is linearly independent of v2v_{2}. More generally, given any mm vectors {v1,,vm}\{v_{1},\ldots,v_{m}\} such that v1v_{1} is not in the linear space generated by the remaining vectors, there exists a neighborhood of v1v_{1} such that no point in that neighborhood is in span({v2,,vm})\text{span}(\{v_{2},\ldots,v_{m}\}).

Proof.

Since K=span({v2,,vm})K=\text{span}(\{v_{2},\ldots,v_{m}\}) is a closed set that does not contain the vector v1v_{1}, the distance between v1v_{1} and KK is strictly positive. Hence, there exists a neighborhood of v1v_{1} (for example, the ball of radius dist(v1,K)/2dist(v_{1},K)/2 around v1v_{1}) disjoint from KK. As a result, no point in that neighborhood is in span({v2,,vm})\text{span}(\{v_{2},\ldots,v_{m}\}). ∎
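Lemma 3 is simply the statement that the distance from v_1 to the closed subspace K is positive. The following sketch (with randomly generated, hypothetical vectors) checks this numerically by projecting onto the span; it is an illustration, not part of the argument.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the columns of V play the role of v2, ..., vm in R^4,
# and v1 is a generic vector, hence (almost surely) outside their span.
V = rng.standard_normal((4, 3))
v1 = rng.standard_normal(4)

def dist_to_span(x, V):
    """Distance from x to the span of the columns of V (norm of the projection residual)."""
    coeffs, *_ = np.linalg.lstsq(V, x, rcond=None)
    return np.linalg.norm(x - V @ coeffs)

d = dist_to_span(v1, V)
print("dist(v1, K) =", d)                     # strictly positive

# One sample point within d/2 of v1: still outside K, as the lemma asserts.
direction = rng.standard_normal(4)
u = v1 + 0.99 * (d / 2) * direction / np.linalg.norm(direction)
print("dist(u, K)  =", dist_to_span(u, V))    # still strictly positive
```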

Lemma 4.

Let v1,v2nv_{1},v_{2}\in\mathbb{R}^{n} be two linearly independent vectors and let v=αv1+(1α)v2v=\alpha v_{1}+(1-\alpha)v_{2}, for some α[0,1]\alpha\in[0,1], be a vector between v1,v2v_{1},v_{2}. If the vectors vn=αnv1+(1αn)v2nv_{n}=\alpha_{n}v_{1}+(1-\alpha_{n})v_{2}^{n} are such that αn[0,1]\alpha_{n}\in[0,1], v2nv2v_{2}^{n}\rightarrow v_{2}, and vnvv_{n}\rightarrow v, then αnα\alpha_{n}\to\alpha.

Proof.

We prove it by contradiction. If the claim does not hold, there exist a subsequence αnk\alpha_{n_{k}} of αn\alpha_{n} and some ϵ>0\epsilon>0 such that nk:αnkBϵ(α)\forall\ n_{k}:\ \alpha_{n_{k}}\notin B_{\epsilon}(\alpha). By compactness of [0,1][0,1], there exists a subsequence αnkj\alpha_{n_{k_{j}}} of αnk\alpha_{n_{k}} that converges to some β[0,1]\beta\in[0,1]. Since αnkBϵ(α)\alpha_{n_{k}}\notin B_{\epsilon}(\alpha), we have βα\beta\not=\alpha. By the assumptions of the lemma, since the sequence vnv_{n} converges to vv, the subsequence vnkjv_{n_{k_{j}}} also converges to vv. Similarly, v2nkjv_{2}^{n_{k_{j}}} converges to v2v_{2}. Hence, vnkj=αnkjv1+(1αnkj)v2nkjβv1+(1β)v2v_{n_{k_{j}}}=\alpha_{n_{k_{j}}}v_{1}+(1-\alpha_{n_{k_{j}}})v_{2}^{n_{k_{j}}}\rightarrow\beta v_{1}+(1-\beta)v_{2} and vnkjvv_{n_{k_{j}}}\rightarrow v. As a result, βv1+(1β)v2=v=αv1+(1α)v2\beta v_{1}+(1-\beta)v_{2}=v=\alpha v_{1}+(1-\alpha)v_{2}. However, since v1,v2v_{1},v_{2} are linearly independent, α\alpha and β\beta must be equal, which is a contradiction. The contradiction shows that αnα\alpha_{n}\to\alpha. ∎
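A quick numerical sanity check of Lemma 4 (all vectors below are hypothetical): perturb v_2, take v_n to be the point of the form \alpha_n v_1+(1-\alpha_n)v_2^n closest to v, and observe that \alpha_n converges to \alpha.

```python
import numpy as np

# Hypothetical vectors used only to illustrate Lemma 4 numerically.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])                     # linearly independent of v1
alpha = 0.3
v = alpha * v1 + (1 - alpha) * v2

for n in [10, 100, 1000, 10000]:
    v2_n = v2 + np.array([0.3, -0.2, 1.0]) / n     # v2^n -> v2
    d = v1 - v2_n
    alpha_n = np.dot(v - v2_n, d) / np.dot(d, d)   # closest point on the line through v1, v2^n
    v_n = alpha_n * v1 + (1 - alpha_n) * v2_n      # of the form in the lemma, and v_n -> v
    print(n, "|v_n - v| =", round(np.linalg.norm(v_n - v), 6),
          "alpha_n =", round(float(alpha_n), 6))
# alpha_n converges to alpha = 0.3, as Lemma 4 predicts.
```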

Using the lemmas above, we now complete the proof. By Theorem 2, there exist a unique weak order \succcurlyeq and a weight function w:X++w:X\to\mathbb{R}_{++} such that for any AXA\in X^{*}

f(A)=xM(A,)w(x)f(x)xM(A,)w(x).f(A)=\frac{\sum\limits_{x\in M(A,\succcurlyeq)}w(x)f(x)}{\sum\limits_{x\in M(A,\succcurlyeq)}w(x)}.

Let xXx\in X be any given point. We need to prove that the weight function is continuous at xx and that any point close enough to xx has the same order as xx with respect to the weak order \succcurlyeq.

To complete the proof, assume that xnXx_{n}\in X and xnxx_{n}\to x. We are going to prove that:
1) w(xn)w(x)w(x_{n})\to w(x),
2) N\exists N\in\mathbb{N} such that for all n>N:n>N: xnxx_{n}\sim x.
Proving these two completes the proof.

Based on the strong richness condition, there should be a point yXy\in X such that (1) f(x),f(y)f(x),f(y) are linearly independent, and (2) f({x,y})=w(x)f(x)+w(y)f(y)w(x)+w(y)f(\{x,y\})=\frac{w(x)f(x)+w(y)f(y)}{w(x)+w(y)}, which means that xyx\sim y. The reason is that by the strong richness condition, there should be at least two other points y,zy,z with the same order as xx, such that not all of f(x),f(y),f(x),f(y), and f(z)f(z) are collinear. This means that f(x)f(x) and at least one of f(y)f(y) or f(z)f(z) should be linearly independent. Without loss of generality, we assume that f(x)f(x) and f(y)f(y) are linearly independent.

Given any two points a,bXa,b\in X, we define the function 𝟏a(b)\boldsymbol{1}_{a}(b) as follows:

𝟏a(b)={1if ba,0Otherwise.\boldsymbol{1}_{a}(b)=\begin{cases}1&\text{if }b\succcurlyeq a,\\ 0&\text{Otherwise}.\end{cases}

Consider the sequence of vectors f({xn,y})f(\{x_{n},y\}). By Theorem 2, we have f({xn,y})=𝟏y(xn)w(xn)f(xn)+𝟏xn(y)w(y)f(y)𝟏y(xn)w(xn)+𝟏xn(y)w(y)=𝟏y(xn)w(xn)𝟏y(xn)w(xn)+𝟏xn(y)w(y)f(xn)+𝟏xn(y)w(y)𝟏xn(y)w(y)+𝟏y(xn)w(xn)f(y)\allowbreak f(\{x_{n},y\})\allowbreak=\allowbreak\frac{\boldsymbol{1}_{y}(x_{n})w(x_{n})f(x_{n})+\boldsymbol{1}_{x_{n}}(y)w(y)f(y)}{\boldsymbol{1}_{y}(x_{n})w(x_{n})+\boldsymbol{1}_{x_{n}}(y)w(y)}\allowbreak=\frac{\boldsymbol{1}_{y}(x_{n})w(x_{n})}{\boldsymbol{1}_{y}(x_{n})w(x_{n})+\boldsymbol{1}_{x_{n}}(y)w(y)}f(x_{n})+\frac{\boldsymbol{1}_{x_{n}}(y)w(y)}{\boldsymbol{1}_{x_{n}}(y)w(y)+\boldsymbol{1}_{y}(x_{n})w(x_{n})}f(y). Based on continuity of the aggregation rule ff, f(xn)f(x)f(x_{n})\to f(x) and f({xn,y})f({x,y})f(\{x_{n},y\})\to f(\{x,y\}). Since f(x)f(x) and f(y)f(y) are linearly independent, all conditions of Lemma 4 are satisfied. Hence, we have 𝟏xn(y)w(y)𝟏y(xn)w(xn)+𝟏xn(y)w(y)w(y)w(x)+w(y)\frac{\boldsymbol{1}_{x_{n}}(y)w(y)}{\boldsymbol{1}_{y}(x_{n})w(x_{n})+\boldsymbol{1}_{x_{n}}(y)w(y)}\to\frac{w(y)}{w(x)+w(y)} and 𝟏y(xn)w(xn)𝟏y(xn)w(xn)+𝟏xn(y)w(y)w(x)w(x)+w(y)\frac{\boldsymbol{1}_{y}(x_{n})w(x_{n})}{\boldsymbol{1}_{y}(x_{n})w(x_{n})+\boldsymbol{1}_{x_{n}}(y)w(y)}\to\frac{w(x)}{w(x)+w(y)}. Since both w(x)w(x) and w(y)w(y) are strictly positive, we should have 𝟏y(xn)1\boldsymbol{1}_{y}(x_{n})\to 1 and similarly 𝟏xn(y)1\boldsymbol{1}_{x_{n}}(y)\to 1. This means that for large nn, xnyx_{n}\sim y. Since yxy\sim x, for large nn we have xnxx_{n}\sim x. This complete part 2 of the proof.

For the part 1, since we have already proved that for large nn, xnxyx_{n}\sim x\sim y, the convergence 𝟏xn(y)w(y)𝟏y(xn)w(xn)+𝟏xn(y)w(y)w(y)w(x)+w(y)\frac{\boldsymbol{1}_{x_{n}}(y)w(y)}{\boldsymbol{1}_{y}(x_{n})w(x_{n})+\boldsymbol{1}_{x_{n}}(y)w(y)}\to\frac{w(y)}{w(x)+w(y)} becomes w(y)w(xn)+w(y)w(y)w(x)+w(y)\frac{w(y)}{w(x_{n})+w(y)}\to\frac{w(y)}{w(x)+w(y)}. This means that w(xn)w(x)w(x_{n})\to w(x), which proves that ww is continuous at xx.

Proving parts 1 and 2 completes the proof.
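The mechanism behind part 1 can also be illustrated numerically: when f(x) and f(y) are linearly independent, the weight of x can be read off the pairwise aggregate f(\{x,y\}), and this read-off varies continuously. The outcomes, weights, and the hypothetical helper functions below are illustrative only.

```python
import numpy as np

# Hypothetical outcomes and weights; f(x) and f(y) are linearly independent.
fx, fy = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w_y = 3.0

def pair_aggregate(fx_n, w_x_n):
    """f({x_n, y}) under the representation, assuming x_n ~ y."""
    return (w_x_n * fx_n + w_y * fy) / (w_x_n + w_y)

def recover_weight(fx_n, fxy_n):
    """Read w(x_n) off the pairwise aggregate, given f(y) and w(y)."""
    lam = np.dot(fxy_n - fy, fx_n - fy) / np.dot(fx_n - fy, fx_n - fy)
    return w_y * lam / (1 - lam)

# If f(x_n) -> f(x) and f({x_n, y}) -> f({x, y}), the recovered weights converge,
# mirroring how part 1 reads w(x_n) off the pairwise aggregates.
for n in [10, 100, 1000]:
    fx_n = fx + np.array([0.5, 0.5]) / n           # f(x_n) -> f(x)
    w_x_n = 2.0 + 1.0 / n                          # hypothetical weights of the x_n
    print(n, recover_weight(fx_n, pair_aggregate(fx_n, w_x_n)))   # -> 2.0
```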

9.4. Proof of Proposition 1

There are a couple of steps to prove the result.

Step 1: Assume that all signals arrive at time 11. By using Corollary 2, there exists a unique (up to multiplication) weight function w:X++w:X^{*}\to\mathbb{R}_{++}, such that for all AXA\in X^{*}, f(A,1)=xAw(x)f(x)xAw(x)f(A,1)=\frac{\sum\limits_{x\in A}w(x)f(x)}{\sum\limits_{x\in A}w(x)}. By using the uniqueness of ww and the stationarity axiom, for any constant time shift cc and for all AXA\in X^{*} we have:

f(A,c)=xAw(x)f(x)xAw(x).f(A,c)=\frac{\sum\limits_{x\in A}w(x)f(x)}{\sum\limits_{x\in A}w(x)}.

Consider two signals x0,y0Xx_{0},y_{0}\in X, where f(x0)f(y0)f(x_{0})\neq f(y_{0}). Let the timing of x0,y0x_{0},y_{0} be T{x0,y0}(x0)=1,T{x0,y0}(y0)=2T_{\{x_{0},y_{0}\}}(x_{0})=1,T_{\{x_{0},y_{0}\}}(y_{0})=2. Using the strict weighted averaging axiom, there exists a λ(0,1)\lambda\in(0,1) where f({x0,y0},T{x0,y0})=λf(x0)+(1λ)f(y0)f(\{x_{0},y_{0}\},T_{\{x_{0},y_{0}\}})=\lambda f(x_{0})+(1-\lambda)f(y_{0}). We define qq such that 1λλ=q×w(y0)w(x0)\frac{1-\lambda}{\lambda}=q\times\frac{w(y_{0})}{w(x_{0})}.

In the rest of the proof, we show that these choices of w,qw,q attain the representation of Proposition 1.

Step 2: We show that for any signal zXz\in X, the representation holds for the coalition {x0,z}\{x_{0},z\} and for the timing function T{x0,z}(x0)=1,T{x0,z}(z)=2T_{\{x_{0},z\}}(x_{0})=1,T_{\{x_{0},z\}}(z)=2.

Case 1: Consider any signal zXz\in X such that {f(x0),f(y0),f(z)}\{f(x_{0}),f(y_{0}),f(z)\} are not collinear. We form the coalition {x0,y0,z}\{x_{0},y_{0},z\} with the timing T{x0,y0,z}(x0)=1,T{x0,y0,z}(y0)=2T_{\{x_{0},y_{0},z\}}(x_{0})=1,T_{\{x_{0},y_{0},z\}}(y_{0})=2, T{x0,y0,z}(z)=2T_{\{x_{0},y_{0},z\}}(z)=2. Using the strict weighted averaging axiom, by considering the sub-coalitions {x0}\{x_{0}\} and {y0,z}\{y_{0},z\} and the fact that y0y_{0} and zz have the same timing, Lemma 2 in the proof of Theorem 1 shows that the representation holds for the coalition {x0,z}\{x_{0},z\} with the timing T{x0,z}(x0)=1,T{x0,z}(z)=2T_{\{x_{0},z\}}(x_{0})=1,T_{\{x_{0},z\}}(z)=2.

Case 2: Consider any signal zXz\in X such that {f(x0),f(y0),f(z)}\{f(x_{0}),f(y_{0}),f(z)\} are collinear. By the richness condition, there exists a signal zXz^{\prime}\in X such that {f(x0),f(y0),f(z),f(z)}\{f(x_{0}),f(y_{0}),f(z),f(z^{\prime})\} are not collinear. We consider the timing T{x0,y0,z,z}(x0)=1,T{x0,y0,z,z}(y0)=2T_{\{x_{0},y_{0},z,z^{\prime}\}}(x_{0})=1,T_{\{x_{0},y_{0},z,z^{\prime}\}}(y_{0})=2, T{x0,y0,z,z}(z)=2,T{x0,y0,z,z}(z)=2T_{\{x_{0},y_{0},z,z^{\prime}\}}(z)=2,T_{\{x_{0},y_{0},z,z^{\prime}\}}(z^{\prime})=2. The representation holds for the sub-coalitions {x0,y0,z}\{x_{0},y_{0},z^{\prime}\} (by Case 1) and {y0,z,z}\{y_{0},z,z^{\prime}\} (since all three have the same timing). Thus, by applying Lemma 2 first to {y0,z,z}\{y_{0},z,z^{\prime}\} and then to {x0,y0,z}\{x_{0},y_{0},z^{\prime}\}, we can show that the representation holds for the coalition {x0,z}\{x_{0},z\} with the timing T{x0,z}(x0)=1,T{x0,z}(z)=2T_{\{x_{0},z\}}(x_{0})=1,T_{\{x_{0},z\}}(z)=2.

Step 3: We show that the representation holds for any two signals u,vXu,v\in X with the timing function T{u,v}(u)=1,T{u,v}(v)=2T_{\{u,v\}}(u)=1,T_{\{u,v\}}(v)=2.

Case 1: If {f(x0),f(u),f(v)}\{f(x_{0}),f(u),f(v)\} are non-collinear, then we consider the timing function T{x0,u,v}(x0)=1,T{x0,u,v}(u)=1,T{x0,u,v}(v)=2T_{\{x_{0},u,v\}}(x_{0})=1,T_{\{x_{0},u,v\}}(u)=1,T_{\{x_{0},u,v\}}(v)=2. By applying Lemma 2 twice, to {x0,u}\{x_{0},u\} and {x0,v}\{x_{0},v\} with their corresponding timings, we can show that the representation holds for u,vXu,v\in X with the timing function T{u,v}(u)=1,T{u,v}(v)=2T_{\{u,v\}}(u)=1,T_{\{u,v\}}(v)=2.

Case 2: If {f(x0),f(u),f(v)}\{f(x_{0}),f(u),f(v)\} are collinear, then by the richness condition, there exists a signal zXz\in X such that {f(x0),f(u),f(v),f(z)}\{f(x_{0}),f(u),f(v),f(z)\} are not collinear. Consider the timing function T{x0,u,v,z}(x0)=1,T{x0,u,v,z}(u)=1,T{x0,u,v,z}(v)=2,T{x0,u,v,z}(z)=2T_{\{x_{0},u,v,z\}}(x_{0})=1,T_{\{x_{0},u,v,z\}}(u)=1,T_{\{x_{0},u,v,z\}}(v)=2,T_{\{x_{0},u,v,z\}}(z)=2. Applying Lemma 2 to the sub-coalition {x0,u,z}\{x_{0},u,z\} with its corresponding timing shows that the representation holds for {u,z}\{u,z\} with the timing T{u,z}(u)=1,T{u,z}(z)=2T_{\{u,z\}}(u)=1,T_{\{u,z\}}(z)=2. Then, by considering the coalition {u,v,z}\{u,v,z\} and its corresponding timing, Lemma 2 shows that the representation holds for {u,v}\{u,v\} and the timing function T{u,v}(u)=1,T{u,v}(v)=2T_{\{u,v\}}(u)=1,T_{\{u,v\}}(v)=2.

Step 4: In this step, we show that given any tt\in\mathbb{N}, the representation holds for any two signals u,vXu,v\in X and the timing function T{u,v}(u)=1,T{u,v}(v)=tT_{\{u,v\}}(u)=1,T_{\{u,v\}}(v)=t. The proof is by induction on tt. By Step 3, the representation holds for t=2t=2. Assume that the representation holds for all t<kt<k, where k3k\geq 3. We will show that it also holds for t=kt=k.

Case 1: If f(u)f(v)f(u)\neq f(v), then we consider a signal zXz\in X such that {f(u),f(v),f(z)}\{f(u),f(v),f(z)\} are not collinear. Let the timing function be T{u,v,z}(u)=1,T{u,v,z}(v)=k,T{u,v,z}(z)=k1T_{\{u,v,z\}}(u)=1,T_{\{u,v,z\}}(v)=k,T_{\{u,v,z\}}(z)=k-1.

Consider any w(u,1),w(v,k),w(z,k1)(0,1)w(u,1),w(v,k),w(z,k-1)\in(0,1) such that f({u,v,z},Tu,v,z)=w(u,1)f(u)+w(v,k)f(v)+w(z,k1)f(z)f(\{u,v,z\},T_{u,v,z})=w(u,1)f(u)+w(v,k)f(v)+w(z,k-1)f(z). By Lemma 2, induction hypothesis, and the stationarity axiom, we have w(v,k)w(u,1)=w(v,k)w(z,k1)×w(z,k1)w(u,1)=(qw(v)w(z))(qk2w(z)w(u))=qk1w(v)w(u)\frac{w(v,k)}{w(u,1)}=\frac{w(v,k)}{w(z,k-1)}\times\frac{w(z,k-1)}{w(u,1)}=(q\frac{w(v)}{w(z)})(q^{k-2}\frac{w(z)}{w(u)})=q^{k-1}\frac{w(v)}{w(u)}. Thus, the representation holds.

Case 2: If f(u)=f(v)f(u)=f(v), then we consider two signals z,zXz,z^{\prime}\in X such that {f(u),f(v),f(z),f(z)}\{f(u),f(v),f(z),f(z^{\prime})\} are not collinear (which is possible by the richness condition). Let the timing function be T{u,v,z,z}(u)=1,T{u,v,z,z}(v)=k,T{u,v,z,z}(z)=k,T{u,v,z,z}(z)=kT_{\{u,v,z,z^{\prime}\}}(u)=1,T_{\{u,v,z,z^{\prime}\}}(v)=k,T_{\{u,v,z,z^{\prime}\}}(z)=k,T_{\{u,v,z,z^{\prime}\}}(z^{\prime})=k. By the uniqueness part of Theorem 1 and the induction hypothesis, the representation still holds in this case.

Step 5: Finally, for any coalition AXA\in X^{*} and any timing function TAT_{A}, the uniqueness part of Theorem 1 and Step 4 establish that the representation holds with q,wq,w.
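Putting the steps together, the ratios derived above (for instance w(v,k)/w(u,1)=q^{k-1}w(v)/w(u)) are consistent with each signal's weight being scaled by q^{T_{A}(x)-1}. The sketch below spells out this target representation; the outcomes, weights, timings, and the value of q are all hypothetical.

```python
import numpy as np

def aggregate_with_timing(A, outcome, weight, timing, q):
    """Weighted average in which each signal's weight is scaled by q**(timing - 1)."""
    w = np.array([q ** (timing[x] - 1) * weight[x] for x in A])
    V = np.array([outcome[x] for x in A], dtype=float)
    return (w[:, None] * V).sum(axis=0) / w.sum()

# Toy data: x0 arrives at time 1, y0 and z at time 2.
outcome = {"x0": [0.0, 0.0], "y0": [1.0, 0.0], "z": [0.0, 1.0]}
weight  = {"x0": 1.0, "y0": 2.0, "z": 1.5}
timing  = {"x0": 1, "y0": 2, "z": 2}
q = 0.5                                            # hypothetical recency/decay factor
A = ["x0", "y0", "z"]
print(aggregate_with_timing(A, outcome, weight, timing, q))
```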

9.5. Proof of Theorem 5 and Corollary 7

Assume that the aggregation rule f:Xmf:X^{*}\to\mathcal{R}^{m} satisfies the minimal agreement condition and vmv\in\mathbb{R}^{m} is the direction on which all agents agree.

Consider two disjoint coalitions A,BXA,B\in X^{*} with the corresponding cardinal utilities uAUAu_{A}\in U_{A} and uBUBu_{B}\in U_{B}. Assume that uABUABu_{A\cup B}\in U_{A\cup B} is a cardinal utility that represents the preference ordering of the union ABA\cup B. If uA=uBu_{A}=u_{B}, then the result is trivial. Hence, consider the case uAuBu_{A}\neq u_{B}.

First, by using Farkas’ Lemma, we show that the extended Pareto axiom is equivalent to uABCone(uA,uB)u_{A\cup B}\in\text{Cone}^{\circ}(u_{A},u_{B}), where Cone(uA,uB)\text{Cone}^{\circ}(u_{A},u_{B}) denotes the interior of the cone generated by uAu_{A} and uBu_{B}.

If uABCone(uA,uB)u_{A\cup B}\in\text{Cone}^{\circ}(u_{A},u_{B}), then there exist α,β>0\alpha,\beta>0 such that uAB=αuA+βuBu_{A\cup B}=\alpha u_{A}+\beta u_{B}. Therefore, for any x,yLx,y\in L if uAxuAyu_{A}\cdot x\geq u_{A}\cdot y and uBxuByu_{B}\cdot x\geq u_{B}\cdot y, then uABxuAByu_{A\cup B}\cdot x\geq u_{A\cup B}\cdot y. Similarly, if uAx>uAyu_{A}\cdot x>u_{A}\cdot y and uBxuByu_{B}\cdot x\geq u_{B}\cdot y, then uABx>uAByu_{A\cup B}\cdot x>u_{A\cup B}\cdot y. This proves that the preference ordering of ABA\cup B, satisfies the extended Pareto axiom.

Conversely, if the utility of the union uABCone(uA,uB)u_{A\cup B}\notin\text{Cone}^{\circ}(u_{A},u_{B}), then α,β>0\nexists\alpha,\beta>0 such that uAB=αuA+βuBu_{A\cup B}=\alpha u_{A}+\beta u_{B}. Farkas’ Lemma then guarantees that there exists a vector zmz\in\mathbb{R}^{m} such that zuA0,zuB0z\cdot u_{A}\geq 0,z\cdot u_{B}\geq 0 and zuAB<0z\cdot u_{A\cup B}<0.

Consider a vector ymy\in\mathbb{R}^{m} that is in the interior of LL. We select λ>0\lambda>0 such that y+λzLy+\lambda z\in L; this is possible since yy is in the interior of LL. Defining x=y+λzx=y+\lambda z, we get xy=λzx-y=\lambda z. Since λ>0\lambda>0 and zuA0,zuB0z\cdot u_{A}\geq 0,z\cdot u_{B}\geq 0, we have uAxuAyu_{A}\cdot x\geq u_{A}\cdot y and uBxuByu_{B}\cdot x\geq u_{B}\cdot y. But since zuAB<0z\cdot u_{A\cup B}<0, we have uABx<uAByu_{A\cup B}\cdot x<u_{A\cup B}\cdot y. By the extended Pareto axiom, this cannot be true. Therefore, uABCone(uA,uB)u_{A\cup B}\in\text{Cone}^{\circ}(u_{A},u_{B}).

Now consider the intersection of H={xm|xv=1}H=\{x\in\mathbb{R}^{m}|\ x\cdot v=1\} and Cone(uA,uB)\text{Cone}^{\circ}(u_{A},u_{B}). Since uAv>0u_{A}\cdot v>0 and uBv>0u_{B}\cdot v>0, there are a unique u^AUA\hat{u}_{A}\in U_{A} and a unique u^BUB\hat{u}_{B}\in U_{B}, both in HH. It is immediate that Cone(uA,uB)=Cone(u^A,u^B)\text{Cone}^{\circ}(u_{A},u_{B})=\text{Cone}^{\circ}(\hat{u}_{A},\hat{u}_{B}).

Since both u^Av>0\hat{u}_{A}\cdot v>0 and u^Bv>0\hat{u}_{B}\cdot v>0, the intersection of the interior of the cone generated by them and the linear variety HH is the segment [u^A,u^B]={λu^A+(1λ)u^B|λ(0,1)}[\hat{u}_{A},\hat{u}_{B}]=\{\lambda\hat{u}_{A}+(1-\lambda)\hat{u}_{B}|\ \lambda\in(0,1)\}. Since uABCone(u^A,u^B)u_{A\cup B}\in\text{Cone}^{\circ}(\hat{u}_{A},\hat{u}_{B}), we have vuAB>0v\cdot u_{A\cup B}>0. Hence, there is a u^ABH\hat{u}_{A\cup B}\in H representing uABu_{A\cup B}. Therefore, u^AB[u^A,u^B]\hat{u}_{A\cup B}\in[\hat{u}_{A},\hat{u}_{B}]. This completes the proof.
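The cone criterion established in the first half of the proof is easy to test numerically for concrete utility vectors. The sketch below uses hypothetical utilities and assumes u_A and u_B are linearly independent; it checks whether a candidate u_{A\cup B} is a strictly positive combination of u_A and u_B.

```python
import numpy as np

def in_open_cone(u_ab, u_a, u_b, tol=1e-9):
    """Is u_ab = alpha*u_a + beta*u_b for some alpha, beta > 0 (least-squares test)?"""
    U = np.column_stack([u_a, u_b])
    coeffs, *_ = np.linalg.lstsq(U, u_ab, rcond=None)
    residual = np.linalg.norm(u_ab - U @ coeffs)
    return residual < tol and bool(np.all(coeffs > tol))

u_A = np.array([1.0, 0.0, 2.0])
u_B = np.array([0.0, 1.0, 1.0])
print(in_open_cone(2 * u_A + 3 * u_B, u_A, u_B))          # True: strictly positive combination
print(in_open_cone(u_A - u_B, u_A, u_B))                  # False: one coefficient is negative
print(in_open_cone(np.array([0.0, 0.0, 1.0]), u_A, u_B))  # False: outside span(u_A, u_B)
```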

9.6. Proof of Theorem 6

There are a couple of steps in the proof. Note that for any profile RXR\in\mathcal{R}_{X}, and for any coalition AXA\subseteq X, RAR_{A} denotes the restricted sub-profile of the coalition AA.

Step 1: Fix a preference r^x¯\hat{r}\in\mathcal{R}_{\overline{x}}. Using Corollary 8, for any profile RXR\in\mathcal{R}_{X} such that R1=r^R_{1}=\hat{r}, we can uniquely define a weight function (which depends on the full profile RR) wR:X++w^{R}:X\to\mathbb{R}_{++} with wR(1)=1w^{R}(1)=1, such that for any coalition AXA\subseteq X we have:

uH(f(RA))=iA(wR(i)jAwR(j))uH(Ri).u_{H}(f(R_{A}))=\sum\limits_{i\in A}\left(\frac{w^{R}(i)}{\sum\limits_{j\in A}w^{R}(j)}\right)u_{H}(R_{i}).

First, we show that for any individual iX{1}i\in X\setminus\{1\} and for any two profiles Ra,RbXR_{a},R_{b}\in\mathcal{R}_{X} with (Ra)1=(Rb)1=r^(R_{a})_{1}=(R_{b})_{1}=\hat{r} and (Ra)i=(Rb)i(R_{a})_{i}=(R_{b})_{i}, we have wRa(i)=wRb(i)w^{R_{a}}(i)=w^{R_{b}}(i). There are two separate cases:

Case 1: If (Ra)i=(Rb)ir^(R_{a})_{i}=(R_{b})_{i}\neq\hat{r}, then using (Ra){1,i}=(Rb){1,i}(R_{a})_{\{1,i\}}=(R_{b})_{\{1,i\}} and the result of corollary 8, we should have wRa(i)=wRb(i)w^{R_{a}}(i)=w^{R_{b}}(i).

Case 2: If (Ra)i=(Rb)i=r^(R_{a})_{i}=(R_{b})_{i}=\hat{r}, then by considering the definition of the domain X\mathcal{R}_{X}, which requires the existence of three non-collinear preferences in each profile, there should be a profile RcXR_{c}\in\mathcal{R}_{X} and two individuals j1,j2X{1,i}{j_{1}},{j_{2}}\in X\setminus\{1,i\} such that (Rc)1=(Rc)i=r^(R_{c})_{1}=(R_{c})_{i}=\hat{r}, (Rc)j1=(Ra)j1r^\ (R_{c})_{j_{1}}=(R_{a})_{j_{1}}\neq\hat{r}, and (Rc)j2=(Rb)j2r^\ (R_{c})_{j_{2}}=(R_{b})_{j_{2}}\neq\hat{r}. Using Case 1, we have wRc(j1)=wRa(j1)w^{R_{c}}(j_{1})=w^{R_{a}}(j_{1}) and wRc(j2)=wRb(j2)w^{R_{c}}(j_{2})=w^{R_{b}}(j_{2}). Since (Ra){i,j1}=(Rc){i,j1}(R_{a})_{\{i,j_{1}\}}=(R_{c})_{\{i,j_{1}\}} and wRa(j1)=wRc(j1)w^{R_{a}}(j_{1})=w^{R_{c}}(j_{1}), using Corollary 8, we should have wRa(i)=wRc(i)w^{R_{a}}(i)=w^{R_{c}}(i). Similarly, we have (Rb){i,j2}=(Rc){i,j2}(R_{b})_{\{i,j_{2}\}}=(R_{c})_{\{i,j_{2}\}} and wRb(j2)=wRc(j2)w^{R_{b}}(j_{2})=w^{R_{c}}(j_{2}). Therefore, we should have wRb(i)=wRc(i)w^{R_{b}}(i)=w^{R_{c}}(i). Hence, we have wRa(i)=wRb(i)w^{R_{a}}(i)=w^{R_{b}}(i).

By considering profiles of the form RXR\in\mathcal{R}_{X} with R1=r^R_{1}=\hat{r}, we can define the weight function w:X{1}×x¯++w:X\setminus\{1\}\times\mathcal{R}_{\overline{x}}\to\mathbb{R}_{++} such that w(i,Ri)=wR(i)w(i,R_{i})=w^{R}(i) for all iX{1}i\in X\setminus\{1\}. By the result of Step 1, this function is well defined. Moreover, we define w(1,r^)=1w(1,\hat{r})=1.

At this point, for any preference profile RXR\in\mathcal{R}_{X} with R1=r^R_{1}=\hat{r} and for any coalition AXA\subseteq X, we have:

uH(f(RA))=iA(w(i,Ri)jAw(j,Rj))uH(Ri).u_{H}(f(R_{A}))=\sum\limits_{i\in A}\left(\frac{w(i,R_{i})}{\sum\limits_{j\in A}w(j,R_{j})}\right)u_{H}(R_{i}).

Step 2: We need to define w(1,r)w(1,r) for all rx¯r\in\mathcal{R}_{\overline{x}}. We have already fixed the value w(1,r^)=1w(1,\hat{r})=1. For any rx¯{r^}r\in\mathcal{R}_{\overline{x}}\setminus\{\hat{r}\}, let RXR\in\mathcal{R}_{X} be a profile with R1=rR_{1}=r and R2=r^R_{2}=\hat{r}. By corollary 8, there should be a unique function wR:X++w^{R}:X\to\mathbb{R}_{++} with wR(2)=w(2,r^)w^{R}(2)=w(2,\hat{r}). We define w(1,r)=wR(1)w(1,r)=w^{R}(1).

Notice that for any two profiles Ra,RbXR_{a},R_{b}\in\mathcal{R}_{X} with (Ra)1=(Rb)1=r(R_{a})_{1}=(R_{b})_{1}=r and (Ra)2=(Rb)2=r^(R_{a})_{2}=(R_{b})_{2}=\hat{r}, if we normalize wRa(2)=wRb(2)=w(2,r^)w^{R_{a}}(2)=w^{R_{b}}(2)=w(2,\hat{r}), then we should have wRa(1)=wRb(1)w^{R_{a}}(1)=w^{R_{b}}(1). Hence, the value w(1,r)w(1,r) is independent of the choice of the profile RR.

At this point the function w:X×x¯++w:X\times\mathcal{R}_{\overline{x}}\to\mathbb{R}_{++} is fully defined. We only need to show that it works.

Step 3: Select any profile RXR\in\mathcal{R}_{X}. We need to show that the representation holds with the weight function defined above.

If R1=r^R_{1}=\hat{r}, by the result of Step 1 the representation holds. Hence, fix any r¯x¯\overline{r}\in\mathcal{R}_{\overline{x}} where r¯r^\overline{r}\neq\hat{r}. In the rest of the proof we show that the representation holds for any RXR\in\mathcal{R}_{X} with R1=r¯R_{1}=\overline{r}.

Similarly to Step 1, using Corollary 8, for any profile RXR\in\mathcal{R}_{X} such that R1=r¯R_{1}=\overline{r}, we can uniquely define a weight function (depending on the full profile RR) wR:X++w^{\prime R}:X\to\mathbb{R}_{++} with wR(1)=w(1,r¯)w^{\prime R}(1)=w(1,\overline{r}), such that for any coalition AXA\subseteq X we have:

uH(f(RA))=iA(wR(i)jAwR(j))uH(Ri).u_{H}(f(R_{A}))=\sum\limits_{i\in A}\left(\frac{w^{\prime R}(i)}{\sum\limits_{j\in A}w^{\prime R}(j)}\right)u_{H}(R_{i}).

In the same manner as in Step 1, for any individual iX{1}i\in X\setminus\{1\} and any two profiles Ra,RbXR_{a},R_{b}\in\mathcal{R}_{X} with (Ra)1=(Rb)1=r¯(R_{a})_{1}=(R_{b})_{1}=\overline{r} and (Ra)i=(Rb)i(R_{a})_{i}=(R_{b})_{i}, we should have wRa(i)=wRb(i)w^{\prime R_{a}}(i)=w^{\prime R_{b}}(i). Hence, by considering profiles of the form RXR\in\mathcal{R}_{X} with R1=r¯R_{1}=\overline{r}, we can define the weight function w:X{1}×x¯++w^{\prime}:X\setminus\{1\}\times\mathcal{R}_{\overline{x}}\to\mathbb{R}_{++} such that for all iX{1}:w(i,Ri)=wR(i)i\in X\setminus\{1\}:\ w^{\prime}(i,R_{i})=w^{\prime R}(i). By the same argument as in Step 1, this function is well defined. Moreover, we fix w(1,r¯)=w(1,r¯)w^{\prime}(1,\overline{r})=w(1,\overline{r}).

For every preference profile RXR\in\mathcal{R}_{X} with R1=r¯R_{1}=\overline{r} and for every coalition AXA\subseteq X, we have:

uH(f(RA))=iA(w(i,Ri)jAw(j,Rj))uH(Ri).u_{H}(f(R_{A}))=\sum\limits_{i\in A}\left(\frac{w^{\prime}(i,R_{i})}{\sum\limits_{j\in A}w^{\prime}(j,R_{j})}\right)u_{H}(R_{i}).

To complete the proof, since we have w(1,r¯)=w(1,r¯)w(1,\overline{r})=w^{\prime}(1,\overline{r}), it remains to show that for all iX{1}i\in X\setminus\{1\} and for all rx¯r\in\mathcal{R}_{\overline{x}} we have w(i,r)=w(i,r)w(i,r)=w^{\prime}(i,r).

Case 1: Since r^r¯\hat{r}\neq\overline{r} and w(1,r¯)=w(1,r¯)w(1,\overline{r})=w^{\prime}(1,\overline{r}), based on Step 2 we should have w(2,r^)=w(2,r^)w(2,\hat{r})=w^{\prime}(2,\hat{r}).

Case 2: Assume that rr^r\neq\hat{r} and iX{1,2}i\in X\setminus\{1,2\}. Since N5N\geq 5, based on definition of X\mathcal{R}_{X}, there exist Ra,RbXR_{a},R_{b}\in\mathcal{R}_{X} such that (Ra)1=r^,(Ra)2=r^,(Ra)i=r(R_{a})_{1}=\hat{r},(R_{a})_{2}=\hat{r},(R_{a})_{i}=r and (Rb)1=r¯,(Rb)2=r^,(Rb)i=r(R_{b})_{1}=\overline{r},(R_{b})_{2}=\hat{r},(R_{b})_{i}=r.

Since rr^r\neq\hat{r}, (Ra){2,i}=(Rb){2,i}(R_{a})_{\{2,i\}}=(R_{b})_{\{2,i\}}, and by Case 1 w(2,r^)=w(2,r^)w(2,\hat{r})=w^{\prime}(2,\hat{r}), then we should have w(i,r)=w(i,r)w(i,r)=w^{\prime}(i,r).

Case 3: Assume that r=r^r=\hat{r} and iX{1,2}i\in X\setminus\{1,2\}. Since N5N\geq 5, we can select an individual jX{1,2,i}j\in X\setminus{\{1,2,i\}}. Based on the definition of X\mathcal{R}_{X}, there exist Ra,RbXR_{a},R_{b}\in\mathcal{R}_{X} such that (Ra)1=r^,(Ra)i=r^,(Ra)j=r¯(R_{a})_{1}=\hat{r},(R_{a})_{i}=\hat{r},(R_{a})_{j}=\overline{r} and (Rb)1=r¯,(Rb)i=r^,(Rb)j=r¯(R_{b})_{1}=\overline{r},(R_{b})_{i}=\hat{r},(R_{b})_{j}=\overline{r}.

Since r=r^r¯r=\hat{r}\neq\overline{r}, (Ra){i,j}=(Rb){i,j}(R_{a})_{\{i,j\}}=(R_{b})_{\{i,j\}}, and by Case 2 w(j,r¯)=w(j,r¯)w(j,\overline{r})=w^{\prime}(j,\overline{r}), then we should have w(i,r)=w(i,r)w(i,r)=w^{\prime}(i,r).

Case 4: Finally, assume that i=2i=2 and rr^r\neq\hat{r}. Select an individual jX{1,2}j\in X\setminus{\{1,2\}}. We consider profiles Ra,RbXR_{a},R_{b}\in\mathcal{R}_{X} such that (Ra)1=r^,(Ra)2=r,(Ra)j=r^(R_{a})_{1}=\hat{r},(R_{a})_{2}=r,(R_{a})_{j}=\hat{r} and (Rb)1=r¯,(Rb)2=r,(Rb)j=r^(R_{b})_{1}=\overline{r},(R_{b})_{2}=r,(R_{b})_{j}=\hat{r}. By Case 3, we have w(j,r^)=w(j,r^)w(j,\hat{r})=w^{\prime}(j,\hat{r}). Hence, since rr^r\neq\hat{r} and (Ra){2,j}=(Rb){2,j}(R_{a})_{\{2,j\}}=(R_{b})_{\{2,j\}}, we should have w(2,r)=w(2,r)w(2,r)=w^{\prime}(2,r).

The last observation completes the proof. ∎
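The bookkeeping in Steps 1–3 amounts to gluing together weights that are pinned down only through pairwise ratios, after fixing the single normalization w(1,\hat{r})=1. The toy sketch below uses entirely hypothetical ratios and shows the pattern: once each individual is linked by a ratio to an already-normalized one, and the ratios from overlapping pairs agree, all weights are determined.

```python
# Toy illustration (hypothetical ratios): pairwise comparisons pin down weights
# only up to a common scale; fixing w[1] = 1 determines the rest, provided the
# ratios are consistent across overlapping pairs -- the same consistency used in
# Steps 1-3 when the weight function is glued across profiles.
ratios = {
    (2, 1): 2.0,   # w(2) / w(1), read off a profile containing individuals 1 and 2
    (3, 2): 1.5,   # w(3) / w(2)
    (4, 2): 0.5,   # w(4) / w(2)
}

w = {1: 1.0}                       # normalization, as in Step 1 (w(1, r_hat) = 1)
for (i, j), r in ratios.items():   # each new individual is reached through a known one
    w[i] = r * w[j]

print(w)   # {1: 1.0, 2: 2.0, 3: 3.0, 4: 1.0}
```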