Recovering utility

Christopher P. Chambers (Department of Economics, Georgetown University), Federico Echenique (Department of Economics, UC Berkeley), and Nicolas S. Lambert (Department of Economics, University of Southern California)

Abstract.

We provide sufficient conditions under which a utility function may be recovered from a finite choice experiment. Identification, as is commonly understood in decision theory, is not enough. We provide a general recoverability result that is widely applicable to modern theories of choice under uncertainty. Key is to allow for a monetary environment, in which an objective notion of monotonicity is meaningful. In such environments, we show that subjective expected utility, as well as variational preferences, and other parametrizations of utilities over uncertain acts are recoverable. We also consider utility recovery in a statistical model with noise and random deviations from utility maximization.

Echenique thanks the National Science Foundation for its support through grant SES 1558757. Lambert gratefully acknowledges the financial support and hospitality of Microsoft Research New York and the Yale University Cowles Foundation.

1. Introduction

Economists are often interested in recovering preferences and utility functions from data on agents’ choices. If we are able to recover a utility function, then a preference relation is obviously implied, but the inverse procedure is more delicate. In this paper, we presume access to data on an agent’s choices, and that these describe the agent’s preferences (or that preferences have been obtained as the outcome of a statistical estimation procedure). Our results describe sufficient conditions under which one can recover, or learn, a utility function from the agents’ choices.

At a high level, the problem is that preferences essentially are choices, because they encode the choice that would be made from each binary choice problem. When we write $x\succ y$ we really mean that $x$ would be chosen from the set $\{x,y\}$ . Utility functions are much richer objects, and a given choice behavior may be described by many different utilities. For example, one utility can be used to discuss an agent’s risk preferences: they could have a “constant relative risk aversion” utility, for which a single parameter describes attitudes towards risk. But the same preferences can be represented by a utility that does not have such a convenient parametrization. So recovering, or learning, utilities presents important challenges that go beyond the problem of recovering a preference. In the paper, we describe some simple examples that illustrate the challenges. Our main results describe when one may (non-parametrically) recover a utility representation from choice data.

We first consider choice under uncertainty. We adopt the standard (Anscombe-Aumann) setting of choice under uncertainty, and focus attention on a class of utility representations that has been extensively studied in the literature. Special cases include subjective expected utility, the max-min expected utility model of Gilboa and Schmeidler (1989), Choquet expected utility (Schmeidler, 1989), the variational preferences of Maccheroni, Marinacci, and Rustichini (2006), and many other popular models. Decision theorists usually place significance on the uniqueness of their utility representations, arguing that uniqueness provides an identification argument that allows for utility to be recovered from choice data. We argue, in contrast, that uniqueness of a utility representation is not enough to recover a utility from finite choice data.

Counterexamples are not hard to find. Indeed, even when a utility representation is unique, one may find a convergent sequence of utilities that is consistent with larger and larger finite datasets, but that does not converge to the utility function that generated the choices in the data, or to any utility to which it is equivalent. So uniqueness is necessary but not sufficient for a utility representation to be empirically tractable, in the sense of ensuring that a utility is recovered from large, but finite, choice experiments.

Our main results are positive, and exhibit sufficient conditions for utility recovery. Key to our results is the availability of an objective direction of improvements in utility: we focus our attention on models of monotone preferences. Our paper considers choices among monetary acts, meaning state-contingent monetary payoffs. For such acts, there is a natural notion of monotonicity. Between two acts, if one pays more in every state of the world, the agent should prefer it. As a discipline on the recovery exercise, this essential notion of monotonicity suffices to ensure that a sequence of utilities that explains the choices in the data converges to the utility function that generated the choices.

We proceed by first discussing the continuity of a utility function in its dependence on the underlying preference relation. If $U(\succeq,x)$ is a function of a preference $\succeq$ and of choice objects $x$ , then we say that it is a utility function if $x\mapsto U(\succeq,x)$ represents $\succeq$ . We draw on the existing literature (Theorem 1) to argue that such continuous utilities exist in very general circumstances. Continuity of this mapping in the preference ensures that if the choice data allow for preference recovery, they also allow a utility to be recovered. The drawback, however, of such general utility representation results is that they do not cover the special theories of utility in which economists generally take interest. There is no reason to expect that the utility $U(\succeq,x)$ coincides with the standard parametrizations of, for example, subjective expected utility or variational preferences.

We then go on to our main exercise, which constrains the environment to the Anscombe-Aumann setting, and considers utility representations that have received special attention in the theory of choice under uncertainty. We consider a setup that is flexible enough to accommodate most theories of choice under uncertainty that have been studied in the literature. Our main result (Theorem 2) says that, whenever a choice experiment succeeds in recovering agents’ underlying preferences, it also serves to recover a utility in the class of utilities of interest. For example, if an agent has subjective expected utility preferences, and these can be recovered from a choice experiment, then so can the parameters of the subjective expected utility representation: the agents’ beliefs and Bernoulli utility index. Or, if the agent has variational preferences that can be inferred from choice data, then so can the different components of the variational utility representation.

Actual data on choices may be subject to sampling noise, and agents who randomly deviate from their preferences. The results we have just mentioned are useful in such settings, once the randomness in preference estimates is taken into account. As a complement to our main findings, we proceed with a model that explicitly takes noisy choice, and randomness, into account. Specifically, we consider choice problems that are sampled at random, and an agent who may deviate from their preferences. They make mistakes. In such a setting, we present sufficient conditions for the consistency of utility function estimates (Theorem 3).

In the last part of the paper we take a step back and revisit the problem of preference recovery, with the goal of showing how data from a finite choice experiment can approximate a preference relation, and, in consequence, a utility function. Our model considers a large, but finite, number of binary choices. We show that when preferences are monotone, then preference recovery is possible (Theorem 5). In such environments, utility recovery follows for the models of choice under uncertainty that we have been interested in (Corollary 1).

Related literature.

The literature on revealed preference theory in economics is primarily devoted to tests for consistency with rational choice. The main result in the literature, Afriat’s theorem (Afriat, 1967a; Diewert, 1973; Varian, 1982), is in the context of standard demand theory (assuming linear budgets and a finite dataset). Versions of Afriat’s result have been obtained in a model with infinite data (Reny, 2015), nonlinear budget sets (e.g., Matzkin, 1991; Forges and Minelli, 2009), general choice problems (e.g., Chavas and Cox, 1993; Nishimura, Ok, and Quah, 2017), and multiperson equilibrium models (e.g., Brown and Matzkin, 1996; Carvajal, Deb, Fenske, and Quah, 2013). Algorithmic questions related to revealed preference are discussed by Echenique, Golovin, and Wierman (2011) and Camara (2022). The monograph by Chambers and Echenique (2016) presents an overview of results.

The revealed preference literature is primarily concerned with describing the datasets that are consistent with the theory, not with recovering or learning a preference, or a utility. In the context of demand theory and choice from linear budgets, Mas-Colell (1978) introduces sufficient conditions under which a preference relation is recovered, in the limit, from a sequence of ever richer demand data observations. More recently, Forges and Minelli (2009) derive the analog of Mas-Colell’s results for nonlinear budget sets. An important strand of literature focuses on non-parametric econometric estimation methods applied to demand theory data: Blundell, Browning, and Crawford (2003, 2008) propose statistical tests for revealed preference data, and consider counterfactual bounds on demand changes.

The problem of preference and utility recovery has been studied from the perspective of statistical learning theory. Beigman and Vohra (2006) consider the problem of learning a demand function within the PAC paradigm, which is closely related to the exercise we perform in Section 4. A key difference is that we work with data on pairwise choices, which are common in experimental settings (including in many recent large-scale online experiments). Zadimoghaddam and Roth (2012) look at the utility recovery problem, as in Beigman and Vohra (2006), but instead of learning a demand function they want to understand when a utility can be learned efficiently. Balcan, Daniely, Mehta, Urner, and Vazirani (2014) follow up on this important work by providing sample complexity guarantees, while Ugarte (2022) considers the problem of recovery of preferences under noisy choice data, as in our paper, but within the demand theory framework. Similarly, the early work of Balcan, Constantin, Iwata, and Wang (2012) considers a PAC learning question, focusing on important sub-classes of valuations in economics. Bei, Chen, Garg, Hoefer, and Sun (2016) pursue the problem assuming that a seller proposes budgets with the objective of learning an agent’s utility (they focus on quasilinear utility, and a seller that obtains aggregate demand data). Zhang and Conitzer (2020) consider this problem under an active-learning paradigm, and contrast it with the PAC sample complexity.

In all, these works are important precedents for our paper, but they are all within the demand theory setting. The results do not port to other environments, such as, for example, binary choice under risk or uncertainty. The closest paper to ours is Chambers, Echenique, and Lambert (2021), which looks at a host of related questions to our paper but focusing on preference, not utility, recovery. The work by Chambers, Echenique, and Lambert considers choices from binary choice problems, but does not address the question of recovering, or learning, a utility function. As we explain below in the paper, the problem for utilities is more delicate than the problem for preferences. In this line of work, Chase and Prasad (2019) obtain important results on learning a utility but restricted to settings of intertemporal choice. The work by Basu and Echenique (2020) looks at learnability of utility functions (within the PAC learning paradigm), but focusing on particular models of choice under uncertainty. Some of our results rely on measures of the richness of a theory, or of a family of preferences, which are discussed by Basu and Echenique (2020) and Fudenberg, Gao, and Liang (2021): the former by estimating the VC dimension of theories of choice under uncertainty, and the latter by proposing and analyzing new measures of richness that are well-suited for economics, as well as implementing them on economic datasets.

Finally, it is worth mentioning that preference and utility recovery is potentially subject to strategic manipulations, as emphasized by Dong, Roth, Schutzman, Waggoner, and Wu (2018) and Echenique and Prasad (2020). This possibility is ignored in our work.

2. The Question

We want to understand when utilities can be recovered from data on an agent’s choices. Consider an agent with a utility function $u$ . We want to know when, given enough data on the agent’s choices, we can “estimate” or “recover” a utility function that is guaranteed to be close to $u$ .

In statistical terminology, recovery is analogous to the consistency of an estimator, and approximation guarantees are analogous to learnability. Imagine a dataset of size $k$ , obtained from an incentivized experiment with $k$ different choice problems.¹ The observed choice behavior in the data may be described by a preference $\succeq^{k}$ , which is associated with a utility function $u^{k}$ . The preference $\succeq^{k}$ could be a rationalizing preference, or a preference estimate. So we choose a utility representation for $u^{k}$ . The recovery, or consistency, property is that $u^{k}\to u$ as $k\to\infty$ .

¹ Such datasets are common in experimental economics, including cases with very large $k$ . See, for example, von Gaudecker, van Soest, and Wengstrom (2011), Chapman, Dean, Ortoleva, Snowberg, and Camerer (2017), Chapman, Dean, Ortoleva, Snowberg, and Camerer (2022) and Falk, Becker, Dohmen, Enke, Huffman, and Sunde (2018). One can also apply our results to roll call data from Congress, as in Poole and Rosenthal (1985) or Clinton, Jackman, and Rivers (2004). Large-scale A/B testing by tech firms may provide further examples (albeit involving proprietary datasets).

Suppose that the utility $u$ represents preferences $\succeq$ , which summarize the agent’s full choice behavior. Clearly, unless $\succeq^{k}\to\succeq$ , the exercise is hopeless. So our first order of business is to understand when $\succeq^{k}\to\succeq$ is enough to ensure that $u^{k}\to u$ . In other words, we want to understand when recovering preferences is sufficient for recovering utilities. Our main results on this question are in Section 3.4. In recovering a utility, we are interested in particular parametric representations. In choice under uncertainty, for example, one may be interested in measures of risk attitudes, or uncertainty aversion. It is key then that the utility recovery exercise preserves the aspects of utility that allow such measures to have meaning. If, say, preferences have the “constant relative risk aversion” (CRRA) form, then we want to recover the Arrow-Pratt measure of risk aversion.

Our data is presumably obtained in an experimental setting, where an agent’s behavior may be recorded with errors, or in which the agent may randomly deviate from their underlying preference $\succeq$ . Despite such errors, with high probability, “on the sample path,” we should obtain that $\succeq^{k}\to\succeq$ . In our paper we uncover situations where this convergence leads to utility recovery. Indeed, the results in Sections 3.4 and 3.5 may be applied to say that, in many popular models in decision theory, when $\succeq^{k}\to\succeq$ (with high probability), then the resulting utility representations enable utility recovery (with high probability).

The next step is to discuss learning and sample complexity. Here we need to explicitly account for randomness and errors. We lay out a model of random choice, with random sampling of choice problems and errors in agents’ choices. The errors may take a very general form, as long as random choices are more likely to go in the direction of preferences than against them (if $x\succ y$ then $x$ is the more likely choice from the choice problem $\{x,y\}$ ), and this likelihood ratio remains bounded away from one. Contrast this with the standard theory of discrete choice, where the randomness usually is taken to be additive, and independent of the particular pair of alternatives that are being compared.

Here we consider a formal statistical consistency problem, and exhibit situations where utility recovery is feasible. We use ideas from the literature on PAC learning to provide formal finite sample-size bounds for each desired approximation guarantee. See Section 4.

3. The Model

3.1. Basic definitions and notational conventions

Let $X$ be a set. Given a binary relation $R\subseteq X\times X$ , we write $x\mathrel{R}y$ when $(x,y)\in R$ . A binary relation that is complete and transitive is called a weak order. If $X$ is a topological space, then we say that $R$ is continuous if $R$ is closed as a subset of $X\times X$ (see, for example, Bergstrom, Parks, and Rader, 1976). A preference relation is a weak order that is also continuous.

A preference relation $\succeq$ is locally strict if, for all $x,y\in X$ , $x\succeq y$ implies that for each neighborhood $U$ of $(x,y)$ , there is $(x^{\prime},y^{\prime})\in U$ with $x^{\prime}\succ y^{\prime}$ . The notion of local strictness was first introduced by Border and Segal (1994) as a generalization of the property of being locally non-satiated from consumer theory.

If $\succeq$ is a preference on $X$ and $u:X\to\mathbf{R}$ is a function for which $x\succeq y$ if and only if $u(x)\geq u(y)$ then we say that $u$ is a representation of $\succeq$ , or that $u$ is a utility function for $\succeq$ .

If $A\subseteq\mathbf{R}^{d}$ is a Borel set, we write $\Delta(A)$ for the set of all Borel probability measures on $A$ . We endow $\Delta(A)$ with the weak* topology. If $S$ is a finite set, then we topologize $\Delta(A)^{S}$ with the product topology.

For $p,q\in\Delta(A)$ , we say that $p$ is larger than $q$ in the sense of first-order stochastic dominance if $\int_{A}f\mathop{}\!\mathrm{d}p\geq\int_{A}f\mathop{}\!\mathrm{d}q$ for all monotone increasing, continuous and bounded functions $f$ on $A$ .
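For lotteries with finite support on the real line, first-order stochastic dominance can be checked through cumulative distribution functions: $p$ dominates $q$ exactly when the CDF of $p$ lies weakly below that of $q$ everywhere. The following is a minimal sketch of that check, assuming lotteries are encoded as dictionaries from outcomes to probabilities; the function names and the example lotteries are illustrative and not part of the paper.

```python
def cdf_on_grid(lottery, grid):
    """CDF of a discrete lottery {outcome: probability} at each grid point."""
    return [sum(pr for x, pr in lottery.items() if x <= t) for t in grid]


def fosd(p, q, tol=1e-12):
    """True if p weakly first-order stochastically dominates q.

    For finitely supported lotteries the CDFs are step functions, so it is
    enough to compare them at the union of the two supports.
    """
    grid = sorted(set(p) | set(q))
    return all(fp <= fq + tol
               for fp, fq in zip(cdf_on_grid(p, grid), cdf_on_grid(q, grid)))


# Hypothetical monetary lotteries on [0, 100].
p = {0: 0.2, 50: 0.3, 100: 0.5}
q = {0: 0.5, 50: 0.3, 100: 0.2}
r = {50: 1.0}

print(fosd(p, q))              # p shifts mass toward higher prizes than q
print(fosd(p, r), fosd(r, p))  # p and r are not ranked by dominance
```

The last line illustrates that dominance is only a partial order: a sure payment of 50 and a spread around 50 are not comparable, which is why monotonicity alone does not pin down a preference.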

3.2. Topologies on preferences and utilities.

The set of preferences over $X$ , when $X$ is a topological space, is endowed with the topology of closed convergence. The space of corresponding utility representations is endowed with the compact-open topology. These are the standard topologies for preferences and utilities, used in prior work in mathematical economics. See, for example, Hildenbrand (1970), Kannai (1970), and Mas-Colell (1974). Here we offer definitions and a brief discussion of our choice of topology.

Let $X$ be a topological space, and $\mathcal{F}=\{F^{n}\}_{n}$ be a sequence of closed sets in $X\times X$ (with the product topology). We define $\textrm{Li}(\mathcal{F})$ and $\textrm{Ls}(\mathcal{F})$ to be closed subsets of $X\times X$ as follows:

•

$(x,y)\in\textrm{Li}(\mathcal{F})$ if and only if, for all neighborhoods $V$ of $(x,y)$ , there exists $N\in\mathbf{N}$ such that $F^{n}\cap V\neq\varnothing$ for all $n\geq N$ .
•

$(x,y)\in\textrm{Ls}(\mathcal{F})$ if and only if, for all neighborhoods $V$ of $(x,y)$ , and all $N\in\mathbf{N}$ , there is $n\geq N$ such that $F^{n}\cap V\neq\varnothing$ .

Observe that $\textrm{Li}(\mathcal{F})\subseteq\textrm{Ls}(\mathcal{F})$ . The definition of closed convergence is as follows.

Definition 1.

$F^{n}$ converges to $F$ in the topology of closed convergence if $\textrm{Li}(\mathcal{F})=F=\textrm{Ls}(\mathcal{F})$ .

Closed convergence captures the property that agents with similar preferences should have similar choice behavior—a property that is necessary to be able to learn the preference from finite data. Specifically, if $X\subseteq\mathbf{R}^{n}$ , and $\mathcal{P}$ is the set of all locally strict and continuous preferences on $X$ , then the topology of closed convergence is the smallest topology on $\mathcal{P}$ for which the sets

$$\{(x,y,\succeq):x\succ y\}\subseteq X\times X\times\mathcal{P}$$

are open.² In words: suppose that $x\succ y$ , then for $x^{\prime}$ close to $x$ , $y^{\prime}$ close to $y$ , and $\succeq^{\prime}$ close to $\succeq$ , we obtain that $x^{\prime}\succ^{\prime}y^{\prime}$ .

² See Kannai (1970) and Hildenbrand (1970) for a discussion; a proof of this claim is available from the authors upon request.

For utility functions, we adopt the compact-open topology, which we also claim is a natural choice of topology. The compact-open topology is characterized by the convergence criterion of uniform convergence on compact sets. The reason it is natural for utility functions is that a utility usually has two arguments: one is the object being “consumed” (a lottery, for example) and the other is the ordinal preference that utility is meant to represent. (The preference argument is usually implicit, but of course it remains a key aspect of the exercise.) Now an analyst wants the utility to be “jointly continuous,” or continuous in both of its arguments. For such a purpose, the natural topology on the set of utilities, when they are viewed solely as functions of consumption, is indeed the compact-open topology. More formally, consider the following result, originally due to Mas-Colell (1977).³

³ Levin (1983) provides a generalization to incomplete preferences.

Theorem 1.

Let $X$ be a locally compact Polish space, and $\mathcal{P}$ the space of all continuous preferences on $X$ endowed with the topology of closed convergence. Then there exists a continuous function $U:\mathcal{P}\times X\to[0,1]$ so that $x\mapsto U(\succeq,x)$ represents $\succeq$ .

We may view the map $U$ as a mapping from $\succeq$ to the space of utility functions. Then continuity of this induced mapping is equivalent to the joint continuity result discussed in Theorem 1, as long as we impose the compact-open topology on the space of utility functions (see Fox (1945)).

3.3. The model

As laid out in Section 2, we want to understand when we may conclude that $u^{k}\to u$ from knowing that $\succeq^{k}\to\succeq$ . Mas-Colell’s theorem (Theorem 1) provides general conditions under which there exists one utility representation that has the requisite convergence property, but he is clear about the practical limitations of his result: “There is probably not a simple constructive (“canonical”) method to find a $U$ function.” In contrast, economists are generally interested in specific parameterizations of utility.

For example, if an agent has subjective expected-utility preferences, economists want to estimate beliefs and a von-Neumann-Morgenstern index; not some arbitrary representation of the agent’s preferences. Or, if the data involve intertemporal choices, and the agent discounts utility exponentially, then an economist will want to estimate their discount factor. Such specific parameterizations of utility are not meaningful in the context of Theorem 1.

The following (trivial) example shows that there is indeed a problem to be studied. Convergence of arbitrary utility representations to the correct limit is not guaranteed, even when recovered utilities form a convergent sequence, and recovered preferences converge to the correct limit.

Example 1.

Consider expected-utility preferences on $\Delta(K)^{S}$ , where $K$ is a compact space, $S$ a finite set of states, and $\Delta(K)^{S}$ is the set of Anscombe-Aumann acts. Fix an affine function $v:\Delta(K)\to\mathbf{R}$ , a prior $\mu\in\Delta(S)$ , and consider the preference $\succeq$ with representation $\int_{S}v(f(s))\mathop{}\!\mathrm{d}\mu(s)$ .

Now if we set $\succeq^{k}=\succeq$ then $\succeq^{k}\to\succeq$ holds trivially. However, it is possible to choose an expected utility representation $\int_{S}v^{k}(f(s))\mathop{}\!\mathrm{d}\mu^{k}(s)$ that does not converge to a utility representation (of any kind) for $\succeq$ . In fact one could choose a $\mu^{k}$ and a “normalization” for $v^{k}$ , for example $\|v^{k}\|=1$ (imagine for concreteness that $K$ is finite, and use the Euclidean norm for $v^{k}$ ). Specifically, let $\mu^{k}=\mu$ , choose scalars $\beta^{k}$ with $\|\beta^{k}+\frac{1}{k}v\|=1$ , and set $v^{k}=\beta^{k}+\frac{1}{k}v$ . Since $v^{k}$ is a positive affine transformation of $v$ , the utility $f\mapsto\int_{S}v^{k}(f(s))\mathop{}\!\mathrm{d}\mu(s)$ represents $\succeq^{k}$ ; and since $\frac{1}{k}v\to 0$ , it converges to a constant function.

The punchline is that the limiting utility represents the preference that exhibits complete indifference among all acts. This is true, no matter what the original preference $\succeq$ was.
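A small numerical sketch of Example 1 may help. It assumes that $K$ consists of three prizes, so that the index $v$ can be identified with a vector in $\mathbf{R}^{3}$, and it reads $\beta^{k}+\frac{1}{k}v$ as adding the scalar $\beta^{k}$ to every coordinate of $\frac{1}{k}v$. The vector `v` and the function name are hypothetical choices made only for illustration.

```python
import math


def normalized_index(v, k):
    """Return v^k = beta^k + v/k, with beta^k chosen so that ||v^k|| = 1.

    beta^k solves n*b^2 + 2*b*sum(v)/k + ||v||^2/k^2 - 1 = 0 (larger root).
    The discriminant is positive for the vector used below; for other
    vectors it may require k to be large enough.
    """
    n, s, sq = len(v), sum(v), sum(x * x for x in v)
    disc = (s / k) ** 2 - n * (sq / k ** 2 - 1.0)
    beta = (-s / k + math.sqrt(disc)) / n
    return [beta + x / k for x in v]


v = [0.0, 0.5, 1.0]  # hypothetical utility index over three prizes

for k in (1, 10, 1000):
    vk = normalized_index(v, k)
    norm = math.sqrt(sum(x * x for x in vk))
    spread = max(vk) - min(vk)
    # The norm stays at 1 and the ranking of prizes is unchanged,
    # but the spread between best and worst prize shrinks like 1/k.
    print(k, norm, spread)
```

Every $v^{k}$ ranks the prizes exactly as $v$ does, so the preference is the same at each $k$, while the coordinates of $v^{k}$ all approach $1/\sqrt{3}$; the normalization does nothing to prevent the collapse to indifference.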

In the example, we have imposed some discipline on the representation. Given that the utility converges to a constant, the discipline we have chosen is a particular normalization of the utility representations (their norm is constant). The normalization just makes the construction of the example slightly more challenging, and reflects perhaps the most basic care that an analyst could impose on the recovery exercise.

3.4. Anscombe-Aumann acts

We present our first main result in the context of Anscombe-Aumann acts, the workhorse model of the modern theory of decisions under uncertainty. Let $S$ be a finite set of states of the world, and fix a closed interval of the real line $[a,b]\subseteq\mathbf{R}$ . An act is a function $f:S\to\Delta([a,b])$ . We interpret the elements of $\Delta([a,b])$ as monetary lotteries, so that acts are state-contingent monetary lotteries. The set of all acts is $\Delta([a,b])^{S}$ . When $p\in\Delta([a,b])$ , we denote the constant act that is identically equal to $p$ by $(p,\ldots,p)$ ; or sometimes by $p$ for short.

Note that we do not work with abstract, general, Anscombe-Aumann acts, but in assuming monetary lotteries we impose a particular structure on the objective lotteries in our Anscombe-Aumann framework. The reason is that our theory necessitates a certain known and objective direction of preference. Certain preference comparisons must be known a priori: monotonicity of preference will do the job, but for monotonicity to be objective we need the structure of monetary lotteries.

An act $f$ dominates an act $g$ if, for all $s\in S$ , $f(s)$ first-order stochastically dominates $g(s)$ . And $f$ strictly dominates $g$ if, for all $s\in S$ , $f(s)$ strictly first-order stochastically dominates $g(s)$ . A preference $\succeq$ over acts is weakly monotone if $f\succeq g$ whenever $f$ dominates $g$ .

Let $U$ be the set of all continuous and monotone weakly increasing functions $u:[a,b]\to\mathbf{R}$ with $u(a)=0$ and $u(b)=1$ . A pair $(V,u)$ is a standard representation if $V:\Delta([a,b])^{S}\to\mathbf{R}$ and $u\in U$ are continuous functions such that $V(p,\ldots,p)=\int_{[a,b]}u\mathop{}\!\mathrm{d}p$ , for all constant acts $(p,\ldots,p)$ . Moreover, we say that a standard representation $(V,u)$ is aggregative if there is an aggregator $H:[0,1]^{S}\to\mathbf{R}$ with $V(f)=H((\int u\mathop{}\!\mathrm{d}f(s))_{s\in S})$ for $f\in\Delta([a,b])^{S}$ . An aggregative representation with aggregator $H$ is denoted by $(V,u,H)$ . Observe that a standard representation rules out total indifference.

A preference $\succeq$ on $\Delta([a,b])^{S}$ is standard if it is weakly monotone, and there is a standard representation $(V,u)$ in which $V$ represents $\succeq$ . Roughly, standard preferences will be those that satisfy the expected utility axioms across constant acts, and are monotone with respect to the (statewise) first order stochastic dominance relation. Aggregative preferences will additionally satisfy an analogue of Savage’s P3 or the Anscombe-Aumann notion of monotonicity.

Example 2.

Variational preferences (Maccheroni, Marinacci, and Rustichini, 2006) are standard and aggregative.⁴

⁴ Variational preferences are widely used in macroeconomics and finance to capture decision makers’ concerns for using a misspecified model. Here it is important to recover the different components of a representation, $v$ and $c$ , because they quantify key features of the environment. See for example Hansen and Sargent (2001); Hansen, Sargent, Turmuhambetova, and Williams (2006); Hansen and Sargent (2022).

Let

$$V(f)=\inf\{\int v(f(s))d\pi(s)+c(\pi):\pi\in\Delta(S)\}$$

where

(1)

$v:\Delta([a,b])\to\mathbf{R}$ is continuous and affine.
(2)

$c:\Delta(S)\to[0,\infty]$ is lower semicontinuous, convex and grounded (meaning that $\inf\{c(\pi):\pi\in\Delta(S)\}=0$ ).

Note that $V(p,\ldots,p)=v(p)+\inf\{c(\pi):\pi\in\Delta(S)\}=\int u\mathop{}\!\mathrm{d}p$ , by the assumption that $c$ is grounded, and where the existence of $u:[a,b]\to\mathbf{R}$ so that $v(p)=\int u\mathop{}\!\mathrm{d}p$ is an instance of the Riesz representation theorem. It is clear that we may choose $u\in U$ . So $(V,u)$ is a standard representation.

Letting $H:[0,1]^{S}\to\mathbf{R}$ be defined by $H(x)=\inf\{\sum_{s\in S}x(s)\pi(s)+c(\pi):\pi\in\Delta(S)\}$ , we see that indeed $(V,u,H)$ is also an aggregative representation of these preferences.
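To make the aggregator concrete, the sketch below approximates $H(x)=\inf\{\sum_{s}x(s)\pi(s)+c(\pi)\}$ on a grid, assuming two states so that a prior is summarized by the probability of the first state. The quadratic cost and the interval of priors are hypothetical choices (both are convex, lower semicontinuous and grounded); they are not taken from the paper, and the grid search is only an approximation of the infimum.

```python
def variational_H(x, cost, grid_size=2001):
    """Grid approximation of H(x) = inf_pi { pi . x + c(pi) } with two states.

    x    : pair of state-wise expected utilities, each in [0, 1]
    cost : function of pi1 (probability of state 1), values in [0, inf]
    """
    best = float("inf")
    for i in range(grid_size):
        pi1 = i / (grid_size - 1)
        best = min(best, pi1 * x[0] + (1 - pi1) * x[1] + cost(pi1))
    return best


# Assumed cost 1: quadratic penalty around a reference prior (0.5, 0.5).
quadratic = lambda pi1: 2.0 * (pi1 - 0.5) ** 2

# Assumed cost 2: convex-analysis indicator of the prior set [0.3, 0.7],
# which gives the max-min special case.
maxmin = lambda pi1: 0.0 if 0.3 <= pi1 <= 0.7 else float("inf")

x = (0.2, 0.8)
print(variational_H(x, quadratic))           # below the reference-prior mean 0.5
print(variational_H(x, maxmin))              # worst-case mean over the prior set
print(variational_H((0.6, 0.6), quadratic))  # constant profile: H(t, t) = t
```

The last line reflects groundedness: on constant utility profiles the cost term vanishes at the minimizing prior, which is what makes $V(p,\ldots,p)=\int u\,\mathrm{d}p$ hold in the standard representation.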

Some other examples of aggregative preferences include special cases of the variational model (Gilboa and Schmeidler, 1989), as well as generalizations of it (Cerreia-Vioglio, Maccheroni, Marinacci, and Montrucchio, 2011; Chandrasekher, Frick, Iijima, and Le Yaouanq, 2021), and others which are not comparable (Schmeidler, 1989; Chateauneuf, Grabisch, and Rico, 2008; Chateauneuf and Faro, 2009).⁵

⁵ A class of variational preferences that are of particular interest to computer scientists are preferences with a max-min representation (Gilboa and Schmeidler, 1989). These evaluate acts by $V(f)=\inf\{\int v(f(s))d\pi(s):\pi\in\Pi\}$ , with $\Pi\subseteq\Delta(S)$ a closed and convex set. Here $c$ is the indicator function of $\Pi$ (as defined in convex analysis).

Theorem 2.

Let $\succeq$ be a standard preference with standard representation $(V,u)$ , and $\{\succeq^{k}\}$ a sequence of standard preferences, each with a standard representation $(V^{k},u^{k})$ .

(1)

If $\succeq^{k}\to\succeq$ , then $(V^{k},u^{k})\to(V,u)$ .
(2)

If, in addition, these preferences are aggregative with representations $(V^{k},u^{k},H^{k})$ and $(V,u,H)$ , then $H^{k}\to H$ .

In terms of interpretation, Theorem 2 suggests that, as preferences converge, risk attitudes, or von Neumann-Morgenstern utility indices, also converge in a pointwise sense. The aggregative part claims that we can study the convergence of risk attitudes and the convergence of the aggregator controlling for risk separately. So, for example, in the multiple priors case, two decision makers whose preferences are close will have similar sets of priors.

3.5. Preferences over lotteries and certainty equivalents

In this section, we focus on a canonical representation for preferences over lotteries: the certainty equivalent. There are many models of preferences over lotteries, but we have in mind in particular Cerreia-Vioglio, Dillenberger, and Ortoleva (2015), whereby a preference representation over lotteries is given by $U(p)=\inf_{u\in\mathcal{U}}u^{-1}(\int udp)$ ; a minimum over a set of certainty equivalents for expected utility maximizers. Key is that for this representation, and any degenerate lottery $\delta_{x}$ , $U(\delta_{x})=x$ .

Let $[a,b]\subset\mathbf{R}$ , where $a<b$ , be an interval in the real line and consider $\Delta([a,b])$ . Say that $\succeq$ on $\Delta([a,b])$ is certainty monotone if $p\succeq q$ whenever $p$ first-order stochastically dominates $q$ , and $\delta_{x}\succ\delta_{y}$ for all $x,y\in[a,b]$ with $x>y$ . For any certainty monotone continuous preference $\succeq$ and any lottery $p\in\Delta([a,b])$ , there is a unique certainty equivalent $x\in[a,b]$ satisfying $\delta_{x}\sim p$ . We define $\mbox{ce}(\succeq,p)$ to be the certainty equivalent of $p$ for $\succeq$ . It is clear that, fixing $\succeq$ , $\mbox{ce}(\succeq,\cdot)$ is a continuous utility representation of $\succeq$ .
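Because a certainty monotone preference ranks $p$ above $\delta_{x}$ for sure amounts below the certainty equivalent and below $\delta_{x}$ for amounts above it, the certainty equivalent can be located by bisection using only binary comparisons. The sketch below assumes access to a comparison oracle; the expected-utility agent with a square-root index on $[0,100]$ is a hypothetical example used only to supply that oracle.

```python
import math


def certainty_equivalent(lottery_weakly_preferred_to, a, b, tol=1e-10):
    """Bisection for the x in [a, b] with delta_x ~ p.

    lottery_weakly_preferred_to(x) should return True when p is weakly
    preferred to the sure amount x. Certainty monotonicity makes this
    True below the certainty equivalent and False above it.
    """
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lottery_weakly_preferred_to(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical agent: expected utility with u(x) = sqrt(x) on [0, 100].
p = {0.0: 0.5, 100.0: 0.5}
expected_utility = sum(pr * math.sqrt(x) for x, pr in p.items())

ce = certainty_equivalent(lambda x: expected_utility >= math.sqrt(x), 0.0, 100.0)
print(ce)  # close to 25, since sqrt(25) equals the expected utility of 5
```

The routine never evaluates a utility function directly; it only asks which of two objects is preferred, which matches the binary-choice data considered in the paper.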

Proposition 1.

Let $\succeq$ be a certainty monotone preference and let $p\in\Delta([a,b])$ . Let $\{\succeq^{k}\}$ be a sequence of certainty monotone preferences and let $p^{k}$ be a sequence in $\Delta([a,b])$ . If $(\succeq^{k},p^{k})\rightarrow(\succeq,p)$ , then $\mbox{ce}(\succeq^{k},p^{k})\rightarrow\mbox{ce}(\succeq,p)$ .

In other words, the map carrying each preference to its certainty equivalent representation is continuous in the topology of closed convergence.

4. Utility recovery with noisy choice data

We develop a model of noisy choice data, and consider when utility may be recovered from a traditional estimation procedure. Recovery here takes the form of an explicit consistency result, together with sample complexity bounds in a PAC learning framework.

The focus is on the Wald representation, analogous to the certainty equivalent we considered in Section 3.5. When choosing among vectors $x\in\mathbf{R}^{d}$ , the Wald representation is the number $u(x)\in\mathbf{R}$ for which

x\sim(u(x),\ldots,u(x)).

If the choice space is well behaved, a Wald representation exists for any monotone and continuous preference relation. Thus we move beyond the Anscombe-Aumann setting that we considered above, but it should be clear that some versions of Anscombe-Aumann can be accommodated within the assumptions of this section.
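As an illustration of how a Wald representation can be computed, the sketch below starts from an arbitrary monotone, continuous utility `v` and finds by bisection the scalar $t$ with $v(t\mathbf{1})=v(x)$. The function names and the particular logarithmic utility are assumptions chosen for the example.

```python
import math

def wald_representation(v, x, tol=1e-12):
    """Return t such that v((t, ..., t)) = v(x), for v continuous and monotone increasing.

    By monotonicity, t lies between the smallest and largest coordinates of x.
    """
    d = len(x)
    lo, hi = min(x), max(x)
    target = v(x)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if v((mid,) * d) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical utility representing a homothetic (Cobb-Douglas) preference on R^2.
def v(x):
    return math.log(x[0]) + math.log(x[1])

u_x = wald_representation(v, (1.0, 4.0))    # solves 2 ln t = ln 4
u_2x = wald_representation(v, (2.0, 8.0))   # same bundle scaled by 2
```

Here `v` is not itself a Wald representation, but the recovered value for $(1,4)$ is $2$, and doubling the bundle doubles the value, in line with the homogeneity that homothetic preferences impose on Wald representations.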

Our main results for the model that explicitly accounts for noisy choice data assume Wald representations that are either Lipschitz or homogeneous (meaning that preferences are homothetic).

4.1. Noisy choice data

The primitives of our noisy choice model are collected in the tuple $(X,\mathcal{P},\lambda,q)$ , where:

•

$X\subseteq\mathbf{R}^{d}$ is the ambient choice, or consumption, space. The set $X$ is endowed with the (relative) topology inherited from $\mathbf{R}^{d}$ .
•

$\mathcal{P}$ is a class of continuous and locally strict preferences on $X$ . The class comes with a set of utility functions $\mathcal{U}$ , so that each element of $\mathcal{P}$ has a utility representation in the set $\mathcal{U}$ .
•

$\lambda$ is a probability measure on $X$ , assumed to be absolutely continuous with respect to Lebesgue measure. We also assume that $\lambda\geq c\,\mathrm{Leb}$ , where $c>0$ is a constant and Leb denotes Lebesgue measure.
•

$q:X\times X\times\mathcal{P}\to[0,1]$ is a random choice function, so $q(x,y;\succeq)$ is the probability that an agent with preferences $\succeq$ chooses $x$ over $y$ . Assume that if $x\succ y$ , then $x$ is chosen with probability $q(x,y;\succeq)>1/2$ and $y$ with probability $q(y,x;\succeq)=1-q(x,y;\succeq)$ . If $x\sim y$ then $x$ and $y$ are chosen with equal probability.
•

We shall assume that the random choice function $q$ satisfies

$\Theta\equiv\inf\{q(x,y;\succeq):x\succ y\text{ and }\succeq\in\mathcal{P}\}>\frac{1}{2}.$

The tuple $(X,\mathcal{P},\lambda,q)$ describes a data-generating process for noisy choice data. Fix a sample size $n$ and consider an agent with preference $\succeq^{*}\in\mathcal{P}$ . A sequence of choice problems $\{x_{i},y_{i}\}$ , $1\leq i\leq n$ , is obtained by drawing $x_{i}$ and $y_{i}$ from $X$ , independently, according to the law $\lambda$ . Then a choice is made from each problem $\{x_{i},y_{i}\}$ according to $q(\cdot,\cdot;\succeq^{*})$ .

Observe that our assumptions on $q$ are mild. We allow errors to depend on the pair $\{x,y\}$ under consideration, almost arbitrarily. The only requirement is that one is more likely to choose according to one’s preference than to go against it, as well as the more technical assumptions of measurability and a uniform bound $\Theta>1/2$ that keeps choice probabilities away from $1/2$-$1/2$ choice.

To keep track of the chosen alternative, we order the elements of each problem so that $(x_{i},y_{i})$ means that $x_{i}$ was chosen from the choice problem $\{x_{i},y_{i}\}$ . So a sample of size $n$ is $\{(x_{1},y_{1}),\ldots,(x_{n},y_{n})\}$ , consisting of $n$ iid draws from $X\times X$ according to our stochastic choice model: in the $i$ th draw, the choice problem was $\{x_{i},y_{i}\}$ and $x_{i}$ was chosen.

A utility function $u_{n}\in\mathcal{U}$ is chosen to maximize the number of rationalized choices in the data. So $u_{n}$ maximizes $\sum_{i=1}^{n}\mathbf{1}_{u(x_{i})\geq u(y_{i})}$ . The space of utility functions is endowed with a metric, $\rho$ . In this section, all we ask of $\rho$ is that, for any $u,u^{\prime}\in\mathcal{U}$ , there is $x\in X$ with $\left|u(x)-u^{\prime}(x)\right|\geq\rho(u,u^{\prime})$ . For example, we could use the sup norm for the purposes of any of the results in this section.
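The data-generating process and the estimator can be sketched in a few lines. The example below is only an illustration under strong simplifying assumptions: $X=[0,1]^{2}$ with $\lambda$ uniform, a hypothetical one-parameter family of linear Wald utilities $u_{w}(x)=wx_{1}+(1-w)x_{2}$ searched over a finite grid, and a constant probability `accuracy` of choosing the preferred alternative. None of these choices come from the paper.

```python
import random

def linear_wald(w):
    """Hypothetical Wald utility on R^2: u(x) = w*x1 + (1-w)*x2, so that u((t, t)) = t."""
    return lambda x: w * x[0] + (1.0 - w) * x[1]

def simulate_choices(u_true, n, accuracy, rng):
    """Draw n choice problems uniformly from [0,1]^2 x [0,1]^2; the chosen item is listed first."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        y = (rng.random(), rng.random())
        better, worse = (x, y) if u_true(x) >= u_true(y) else (y, x)
        if rng.random() < accuracy:      # choice agrees with the true preference
            data.append((better, worse))
        else:                            # choice error
            data.append((worse, better))
    return data

def rationalized_count(u, data):
    """Number of observed choices (x chosen over y) consistent with u(x) >= u(y)."""
    return sum(1 for x, y in data if u(x) >= u(y))

def estimate(grid, data):
    """Pick the grid parameter whose utility rationalizes the most observed choices."""
    return max(grid, key=lambda w: rationalized_count(linear_wald(w), data))

rng = random.Random(0)
grid = [i / 10 for i in range(1, 10)]
data = simulate_choices(linear_wald(0.3), n=2000, accuracy=0.8, rng=rng)
w_hat = estimate(grid, data)
```

With a constant accuracy of $0.8$ the bound $\Theta>1/2$ holds trivially; in this sketch the estimate `w_hat` should land at or next to the true parameter $0.3$ for samples of this size.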

4.1.1. Lipschitz utilities

One set of sufficient conditions will need the family of relevant utility representations to satisfy a Lipschitz property with a common Lipschitz bound. The representations are of the Wald kind, as in Section 3.5. We now add the requirement of having the Lipschitz property, which allows us to connect differences in utility functions to quantifiable, observable (but noisy) choice behavior. The main idea is expressed in Lemma 4 of Section 7.

We say that $(X,\mathcal{P},\lambda,q)$ is a Lipschitz environment if:

(1)

$X\subseteq\mathbf{R}^{d}$ is convex, compact, and has nonempty interior.
(2)

Each preference $\succeq\in\mathcal{P}$ has a Wald utility representation $u_{\succeq}:X\to\mathbf{R}$ so that $x\sim u_{\succeq}(x)\mathbf{1}$ .
(3)

All utilities in $\mathcal{U}$ are Lipschitz, and admit a common Lipschitz constant $\kappa$ . So, for any $x,x^{\prime}\in X$ and $u\in\mathcal{U}$ , $|u(x)-u(x^{\prime})|\leq\kappa\|x-x^{\prime}\|$ .

4.1.2. Homothetic preferences

The second set of sufficient conditions involves homothetic preferences. It turns out, in this case, that the Wald representations have a homogeneity property, and this allows us to connect differences in utilities to a probability of detecting such differences. The key insight is contained in Lemma 5 of Section 7.

We employ the following auxiliary notation. $S^{M}_{\alpha}=\{x\in\mathbf{R}^{d}:\|x\|=M\text{ and }x\geq\alpha\mathbf{1}\}$ and $D^{M}_{\alpha}=\{\theta x:x\in S^{M}_{\alpha}\text{ and }\theta\in[0,1]\}$ .

We say that $(X,\mathcal{P},\lambda,q)$ is a homothetic environment if:

(1)

$X=D^{M}_{\alpha}$ for some (small) $\alpha>0$ and (large) $M>0$ .
(2)

$\mathcal{P}$ is a class of continuous, monotone, homothetic, and complete preferences on $X\subseteq\mathbf{R}^{d}$ .
(3)

$\mathcal{U}$ is a class of Wald representations, so that for each $\succeq\in\mathcal{P}$ there is a utility function $u\in\mathcal{U}$ with $x\sim u(x)\mathbf{1}$ .

Remark: if $u\in\mathcal{U}$ is the Wald representation of $\succeq$ , then $u$ is homogeneous of degree one: by homotheticity, $x\sim u(x)\mathbf{1}$ iff $\theta x\sim\theta u(x)\mathbf{1}$ for any scalar $\theta>0$ , so $u(\theta x)=\theta u(x)$ .

4.1.3. VC dimension

The Vapnik-Chervonenkis (VC) dimension of a set $\mathcal{P}$ of preferences is the largest sample size $n$ for which there exists a utility $u\in\mathcal{U}$ that perfectly rationalizes all the choices in the data, no matter what those are. That is, for which $n=\sum_{i=1}^{n}\mathbf{1}_{u(x_{i})\geq u(y_{i})}$ for any dataset $(x_{i},y_{i})_{i=1}^{n}$ of size $n$ .

VC dimension is a basic ingredient in the standard PAC learning paradigm. It is a measure of the complexity of a theory used in machine learning, and lies behind standard results on uniform laws of large numbers (see, for example, Boucheron, Bousquet, and Lugosi (2005)). Applications of VC to decision theory can be found in Basu and Echenique (2020) and Chambers, Echenique, and Lambert (2021).

It is worth noting that VC dimension is used in classification tasks. It may not be obvious, but when it comes to preferences, our exercise may be thought of as classification. For each pair of alternatives $x$ and $y$ , a preference $\succeq$ “classifies” the pair as $x\succeq y$ or $y\succ x$ . Then we can think of preference recovery as a problem of learning a classifier within the class $\mathcal{P}$ .
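The classification view can be illustrated by checking directly whether a finite family of utilities realizes every possible pattern of choices on a given list of pairs (the "shattering" condition behind VC dimension). The sketch below uses a hypothetical grid of linear utilities on $\mathbf{R}^{2}$; the family and the function names are assumptions for illustration.

```python
def labels(u, pairs):
    """Classify each pair (x, y): 1 if u ranks x weakly above y, else 0."""
    return tuple(1 if u(x) >= u(y) else 0 for x, y in pairs)

def is_shattered(pairs, utilities):
    """True if every one of the 2^n labelings of the pairs is produced by some utility."""
    realized = {labels(u, pairs) for u in utilities}
    return len(realized) == 2 ** len(pairs)

# Hypothetical family: u_w(x) = w*x1 + (1-w)*x2 for w in {0, 0.1, ..., 1}.
utilities = [(lambda x, w=i / 10: w * x[0] + (1 - w) * x[1]) for i in range(11)]

one_pair = [((1.0, 0.0), (0.0, 1.0))]
two_pairs = [((1.0, 0.0), (0.0, 1.0)), ((0.0, 1.0), (1.0, 0.0))]

shatters_one = is_shattered(one_pair, utilities)    # both rankings are attainable
shatters_two = is_shattered(two_pairs, utilities)   # "neither is weakly preferred" is impossible
```

The single pair is shattered because some weights rank $(1,0)$ above $(0,1)$ and others do the reverse. The two mirrored pairs are not, since no utility can label both $x\succeq y$ and $y\succeq x$ as false.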

4.2. Consistency and sample complexity

Theorem 3.

Consider a noisy choice environment $(X,\mathcal{P},\lambda,q)$ that is either a homothetic or a Lipschitz environment. Suppose that $u^{*}\in\mathcal{U}$ is the Wald utility representation of $\succeq^{*}\in\mathcal{P}$ .

(1)

The estimates $u_{n}$ converge to $u^{*}$ in probability.
(2)

There are constants $K$ and $\bar{C}$ so that, for any $\delta\in(0,1)$ and $n$ , with probability at least $1-\delta$ ,

$\rho(u_{n},u^{*})\leq\bar{C}\left(K\sqrt{V/n}+\sqrt{2\ln(1/\delta)/n}\right)^{1/D},$

where $V$ is the VC dimension of $\mathcal{P}$ , $D=d$ when the environment is Lipschitz and $D=2d$ when it is homothetic.

Of course, the second statement in the theorem is only meaningful when the VC dimension of $\mathcal{P}$ is finite. The constants $K$ and $\bar{C}$ depend on the primitives in the environment, but not on preferences, utilities, or sample sizes.
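To get a feel for the rate in Theorem 3, the sketch below evaluates the right-hand side of the bound. The theorem only asserts that constants $K$ and $\bar{C}$ exist, so the default values `K=1.0` and `C_bar=1.0` are placeholders assumed purely for illustration.

```python
import math

def recovery_bound(n, V, delta, D, K=1.0, C_bar=1.0):
    """Right-hand side of the bound: C_bar * (K*sqrt(V/n) + sqrt(2 ln(1/delta)/n))^(1/D).

    K and C_bar are placeholder values; the theorem does not specify them.
    """
    inner = K * math.sqrt(V / n) + math.sqrt(2.0 * math.log(1.0 / delta) / n)
    return C_bar * inner ** (1.0 / D)

d = 2
lipschitz_small = recovery_bound(n=10**4, V=3, delta=0.05, D=d)       # D = d
lipschitz_large = recovery_bound(n=10**6, V=3, delta=0.05, D=d)
homothetic_small = recovery_bound(n=10**4, V=3, delta=0.05, D=2 * d)  # D = 2d
```

The bound shrinks as $n$ grows, and for a given sample size the exponent $1/D$ makes the homothetic bound ($D=2d$) looser than the Lipschitz one ($D=d$) whenever the term in parentheses is below one.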

5. Recovering preferences and utilities

The discussion in Section 3.4 focused on utility recovery, taking convergence of preferences as given. Here we take a step back, provide some conditions for preference recovery that are particularly relevant for the setting of Section 3.4, and then connect these back to utility recovery in Corollary 1. First we describe an experimental setting in which preferences may be elicited: an agent, or subject, faces a sequence of (incentivized) choice problems, and the choices made produce data on his preferences. The specific model and description below is borrowed from Chambers, Echenique, and Lambert (2021), but the setting is completely standard in choice theory.

Let $X=\Delta([a,b])^{S}$ be the set of acts over monetary lotteries, as discussed in Section 3.4. A choice function is a pair $(\Sigma,c)$ with $\Sigma\subseteq 2^{X}\setminus\{\varnothing\}$ a collection of nonempty subsets of $X$ , and $c:\Sigma\to 2^{X}$ with $\varnothing\neq c(A)\subseteq A$ for all $A\in\Sigma$ . When $\Sigma$ , the domain of $c$ , is implied, we refer to $c$ as a choice function.

A choice function $(\Sigma,c)$ is generated by a preference relation $\succeq$ over $X$ if

c(A)=\{x\in A:x\succeq y\text{ for all }y\in A\},

for all $A\in\Sigma$ .

The notation $(\Sigma,c_{\succeq})$ means that the choice function $(\Sigma,c_{\succeq})$ is generated by the preference relation $\succeq$ on $X$ .

Our model features an experimenter (a female) and a subject (a male). The subject chooses among alternatives in a way described by a preference $\succeq^{*}$ over $X$ , which we refer to as the data-generating preference. The experimenter seeks to infer $\succeq^{*}$ from the subject’s choices in a finite experiment.

In a finite experiment, the subject is presented with finitely many unordered pairs of alternatives $B_{k}=\{x_{k},y_{k}\}$ in $X$ . For every pair $B_{k}$ , the subject is asked to choose one of the two alternatives: $x_{k}$ or $y_{k}$ .

A sequence of experiments is a collection $\Sigma_{\infty}=\{B_{i}\}_{i\in\mathbf{N}}$ of pairs of possible choices presented to the subject. Let $\Sigma_{k}=\{B_{1},\dots,B_{k}\}$ collect the first $k$ elements of a sequence of experiments, and $B=\cup_{k=1}^{\infty}B_{k}$ be the set of all alternatives that are used over all the experiments in a sequence. Here $\Sigma_{k}$ is a finite experiment of size $k$ .

We make two assumptions on $\Sigma_{\infty}$ . The first is that $B$ is dense in $X$ . The second is that, for any $x,y\in B$ there is $k$ for which $B_{k}=\{x,y\}$ . The first assumption is obviously needed to obtain any general preference recovery result. The second assumption means that the experimenter is able to elicit the subject’s choices over all pairs used in her experiment.⁶

⁶ If there is a countable dense $A\subseteq X$ , then one can always construct such a sequence of experiments via a standard diagonalization argument.

For each $k$ , the subject’s preference $\succeq^{*}$ generates a choice function $(\Sigma_{k},c)$ by letting, for each $B_{i}\in\Sigma_{k}$ , $c(B_{i})$ be a maximal element of $B_{i}$ according to $\succeq^{*}$ . Thus the choice behavior observed by the experimenter is always consistent with $(\Sigma_{k},c_{\succeq^{*}})$ .

We introduce two notions of rationalization: weak and strong. A preference $\succeq_{k}$ weakly rationalizes $(\Sigma_{k},c)$ if, for all $B_{i}\in\Sigma_{k}$ , $c(B_{i})\subseteq c_{\succeq_{k}}(B_{i})$ . A preference $\succeq$ weakly rationalizes a choice sequence $(\Sigma_{\infty},c)$ if it weakly rationalizes the choice function of order $k$ , $(\Sigma_{k},c)$ , for all $k\geq 1$ .

A preference $\succeq_{k}$ strongly rationalizes $(\Sigma_{k},c)$ if, for all $B_{i}\in\Sigma_{k}$ , $c(B_{i})=c_{\succeq_{k}}(B_{i})$ . A preference $\succeq$ strongly rationalizes a choice sequence $(\Sigma_{\infty},c)$ if it strongly rationalizes the choice function of order $k$ , $(\Sigma_{k},c)$ , for all $k\geq 1$ .
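The difference between the two notions is easy to see on a toy dataset. In the sketch below, alternatives are stand-in labels rather than acts, preferences are given by utility functions, and all names are hypothetical; it simply checks the set inclusion (weak) and set equality (strong) conditions on each observed pair.

```python
def chosen_set(u, menu):
    """Choice generated by the preference represented by u: all maximizers of u in the menu."""
    best = max(u(x) for x in menu)
    return frozenset(x for x in menu if u(x) == best)

def weakly_rationalizes(u, observed):
    """Observed choices are contained in the maximizers, for every menu."""
    return all(choice <= chosen_set(u, menu) for menu, choice in observed)

def strongly_rationalizes(u, observed):
    """Observed choices coincide exactly with the maximizers, for every menu."""
    return all(choice == chosen_set(u, menu) for menu, choice in observed)

# Hypothetical finite experiment: a chosen from {a, b}, b chosen from {b, c}.
observed = [
    (frozenset({"a", "b"}), frozenset({"a"})),
    (frozenset({"b", "c"}), frozenset({"b"})),
]

strict = {"a": 3, "b": 2, "c": 1}.get   # strict ranking a > b > c
flat = lambda x: 0                      # complete indifference

results = (
    weakly_rationalizes(strict, observed),
    strongly_rationalizes(strict, observed),
    weakly_rationalizes(flat, observed),
    strongly_rationalizes(flat, observed),
)
```

Complete indifference weakly rationalizes any data but fails strong rationalization as soon as a single alternative is chosen from a pair, which is why strong rationalization imposes more discipline on the recovered preference.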

In the history of revealed preference theory in consumer theory, strong rationalizability came first. It is essentially the notion in Samuelson (1938) and Richter (1966). Strong rationalizability is the appropriate notion when it is known that all potentially chosen alternatives are actually chosen, or when we want to impose, as an added discipline, that the observed choices are uniquely optimal in each choice problem. This makes sense when studying demand functions, as Samuelson did. Weak rationalizability was one of the innovations in Afriat (1967b), who was interested in demand correspondences.⁷ Of course, Afriat’s approach is also distinct in assuming a finite dataset. See Chambers and Echenique (2016) for a detailed discussion.

⁷ As an illustration of the difference between these two notions of rationalizability, note that, in the setting of consumer theory, strong rationalizability leads to the Strong Axiom of Revealed Preference while weak rationalizability leads to the Generalized Axiom of Revealed Preference.

5.1. A general “limiting” result

Our next result serves to contrast what can be achieved with the “limiting” (countably infinite) data with the limit of preferences recovered from finite choice experiments.

Theorem 4.

Suppose that $\succeq$ and $\succeq^{*}$ are two continuous preference relations (complete and transitive). If $\succeq|_{B\times B}=\succeq^{*}|_{B\times B}$ , then $\succeq=\succeq^{*}$ .

Indeed, as the proof makes clear, Theorem 4 would hold more generally for any $X$ which is a connected topological space, but it may not hold in the absence of connectedness. There is a sense in which the limiting case with an infinite amount of data offers no problems for preference recovery. The structure we impose is needed for the limit of rationalizations drawn from finite data.

5.2. Recovery from finite data in the AA model

Here we adopt the same structural assumptions as in Section 3.4, namely that $X=\Delta([a,b])^{S}$ , endowed with the weak topology and the first order stochastic dominance relation. However, the result easily extends to broader environments, as the proof makes clear.

Theorem 5.

There is a sequence of finite experiments $\Sigma_{\infty}$ so that if the subject’s preference $\succeq^{*}$ is continuous and weakly monotone, and for each $k\in\mathbf{N}$ , $\succeq^{k}$ is a continuous and weakly monotone preference that strongly rationalizes a choice function $(\Sigma_{k},c)$ generated by $\succeq^{*}$ , then $\succeq^{k}\rightarrow\succeq^{*}$ .

Corollary 1.

Let $\succeq^{*}$ and $\succeq^{k}$ be as in the statement of Theorem 5. If, in addition, $\succeq^{*}$ and $\succeq^{k}$ have standard representations $(V,u)$ and $(V^{k},u^{k})$ then $(V,u)=\lim_{k\to\infty}(V^{k},u^{k})$ .

Note that Theorem 5 requires the existence of the data-generating preference $\succeq^{*}$ .

A “dual” result to Theorem 5 was established in Chambers, Echenique, and Lambert (2021). There, the focus was on weak rationalization via $\succeq^{k}$ , which is a weaker notion than the strong rationalization hypothesized here. To achieve a weak rationalization result, we assumed instead that preferences were strictly monotone.

6. Proofs

In this section, unless we say otherwise, we denote by $X$ the set of acts $\Delta([a,b])^{S}$ , and the elements of $X$ by $x,y,z$ etc. Note that $X$ is compact Polish when $\Delta([a,b])$ is endowed with the topology of weak convergence of probability measures. Let $\mathcal{P}$ be the set of all complete and continuous binary relations on $X$ .

6.1. Lemmas

The lemmas stated here will be used in the proofs of our results.

Lemma 1.

Let $X\subseteq\mathbf{R}^{n}$ . If $\{x^{\prime}_{n}\}$ is an increasing sequence in $X$ and $\{x^{\prime\prime}_{n}\}$ is a decreasing sequence in $X$ such that $\sup\{x^{\prime}_{n}:n\geq 1\}=x^{*}=\inf\{x^{\prime\prime}_{n}:n\geq 1\}$ , then

\lim_{n\rightarrow\infty}x^{\prime}_{n}=x^{*}=\lim_{n\rightarrow\infty}x^{\prime\prime}_{n}.

Proof.

This is obviously true for $n=1$ . For $n>1$ , convergence and sups and infs are obtained component-by-component, so the result follows. ∎

Lemma 2.

Let $X=\Delta([a,b])$ . Let $\{x_{n}\}$ be a convergent sequence in $X$ , with $x_{n}\rightarrow x^{*}$ . Then there is an increasing sequence $\{x^{\prime}_{n}\}$ and a decreasing sequence $\{x^{\prime\prime}_{n}\}$ such that $x^{\prime}_{n}\leq x_{n}\leq x^{\prime\prime}_{n}$ , and $\lim_{n\rightarrow\infty}x^{\prime}_{n}=x^{*}=\lim_{n\rightarrow\infty}x^{\prime\prime}_{n}$ .

Proof.

The set $X$ ordered by first order stochastic dominance is a complete lattice (see, for example, Lemma 3.1 in Kertz and Rösler (2000)). Suppose that $x_{n}\rightarrow x^{*}$ . Define $x^{\prime}_{n}$ and $x^{\prime\prime}_{n}$ by $x^{\prime}_{n}=\inf\{x_{m}:n\leq m\}$ and $x^{\prime\prime}_{n}=\sup\{x_{m}:n\leq m\}$ . Clearly, $\{x^{\prime}_{n}\}$ is an increasing sequence, $\{x^{\prime\prime}_{n}\}$ is decreasing, and $x^{\prime}_{n}\leq x_{n}\leq x^{\prime\prime}_{n}$ .

Let $F_{x}$ denote the cdf associated with $x$ . Note that $F_{x^{\prime\prime}_{n}}(r)=\inf\{F_{x_{m}}(r):n\leq m\}$ while $F_{x^{\prime}_{n}}(r)$ is the right-continuous modification of $\sup\{F_{x_{m}}(r):n\leq m\}$ . For any point of continuity $r$ of $F_{x^{*}}$ , $F_{x_{m}}(r)\rightarrow F_{x^{*}}(r)$ , so

F_{x^{*}}(r)=\sup\{\inf\{F_{x_{m}}(r):n\leq m\}:n\geq 1\}

by Lemma 1.

Moreover, $F_{x^{*}}(r)=\inf\{\sup\{F_{x_{m}}(r):n\leq m\}:n\geq 1\}$ . Let $\varepsilon>0$ . Then

\begin{split}F_{x^{*}}(r-\varepsilon)\leftarrow\sup\{F_{x_{m}}(r-\varepsilon):n\leq m\}\leq F_{x^{\prime}_{n}}(r)\leq\sup\{F_{x_{m}}(r+\varepsilon):n\leq m\}\\ \rightarrow F_{x^{*}}(r+\varepsilon)\end{split}

Then $F_{x^{\prime}_{n}}(r)\rightarrow F_{x^{*}}(r)$ , as $r$ is a point of continuity of $F_{x^{*}}$ . ∎
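The lattice operations used in this proof have a simple form for distributions supported on a common finite grid: the supremum in first order stochastic dominance has the pointwise minimum of the cdfs, and the infimum has the pointwise maximum. The sketch below illustrates this under that finite-grid assumption (so no right-continuous modification is needed); the function names are illustrative.

```python
from itertools import accumulate

def cdf(pmf):
    """Cumulative distribution over an increasing grid of outcomes."""
    return list(accumulate(pmf))

def pmf_from_cdf(F):
    return [F[0]] + [F[i] - F[i - 1] for i in range(1, len(F))]

def fosd_sup(p, q):
    """Least upper bound in first order stochastic dominance: pointwise min of cdfs."""
    return pmf_from_cdf([min(a, b) for a, b in zip(cdf(p), cdf(q))])

def fosd_inf(p, q):
    """Greatest lower bound in first order stochastic dominance: pointwise max of cdfs."""
    return pmf_from_cdf([max(a, b) for a, b in zip(cdf(p), cdf(q))])

def dominates(p, q, tol=1e-12):
    """p first order stochastically dominates q iff F_p <= F_q pointwise."""
    return all(fp <= fq + tol for fp, fq in zip(cdf(p), cdf(q)))

# Two lotteries on the grid of outcomes (0, 1, 2) that are not ranked by dominance.
p = [0.5, 0.0, 0.5]
q = [0.0, 1.0, 0.0]
upper = fosd_sup(p, q)
lower = fosd_inf(p, q)
```

In this example the upper bound places probability $1/2$ on each of the outcomes $1$ and $2$, while the lower bound places probability $1/2$ on each of $0$ and $1$, sandwiching both lotteries as in the construction of $x^{\prime}_{n}$ and $x^{\prime\prime}_{n}$.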

The results we have obtained motivate two definitions that will prove useful. Say that the set $X$ , together with the collection of finite experiments $\Sigma_{\infty}$ , has the countable order property if for each $x\in X$ and each neighborhood $V$ of $x$ in $X$ there are $x^{\prime},x^{\prime\prime}\in(\cup_{i}B_{i})\cap V$ with $x^{\prime}\leq x\leq x^{\prime\prime}$ . We say that $X$ has the squeezing property if for any convergent sequence $\{x_{n}\}_{n}$ in $X$ , if $x_{n}\rightarrow x^{*}$ then there is an increasing sequence $\{x^{\prime}_{n}\}_{n}$ , and a decreasing sequence $\{x^{\prime\prime}_{n}\}_{n}$ , such that $x^{\prime}_{n}\leq x_{n}\leq x^{\prime\prime}_{n}$ , and $\lim_{n\rightarrow\infty}x^{\prime}_{n}=x^{*}=\lim_{n\rightarrow\infty}x^{\prime\prime}_{n}$ .

Lemma 3.

If $X=\Delta([a,b])^{S}$ , then $X$ has the squeezing property, and there is $\Sigma_{\infty}$ such that $(X,\Sigma_{\infty})$ has the countable order property.

Proof.

The squeezing property follows from Lemma 2, and the countable order property from Theorem 15.11 of Aliprantis and Border (2006). Indeed, let $B$ be the set of probability distributions $p$ with finite support on $\mathbf{Q}\cap[a,b]$ , where for all $q\in\mathbf{Q}\cap[a,b]$ , $p(q)\in\mathbf{Q}$ . Then we may choose a sequence of pairs $\{B_{i}\}$ , and let $\Sigma_{\infty}$ be $\{B_{i}\}_{i\in\mathbf{N}}$ , with $B=\cup_{i}B_{i}$ , so that the countable order property is satisfied. ∎

6.2. Proof of Theorem 2

Without loss of generality, we may set $[a,b]=[0,1]$ . First we show that $u^{k}\to u$ in the compact-open topology. To this end, let $x^{k}\to x$ . We want to show that $u^{k}(x^{k})\to u(x)$ . Suppose then that this is not the case and, by selecting a subsequence, that $u^{k}(x^{k})\to Y>u(x)$ (without loss). Note that $\delta_{x^{k}}\sim^{k}p^{k}$ , where $p^{k}$ is the lottery that pays $1$ with probability $u^{k}(x^{k})\in[0,1]$ , and $0$ with probability $1-u^{k}(x^{k})$ . Let $p$ be the lottery that pays $1$ with probability $Y$ , and $0$ with probability $1-Y$ (given that the range of $u^{k}$ is $[0,1]$ , we must have $Y\in[0,1]$ ). Now we have that $(\delta_{x^{k}},p^{k})\to(\delta_{x},p)$ and $\delta_{x^{k}}\sim^{k}p^{k}$ implies $\delta_{x}\sim p$ . This is a contradiction because $\delta_{x}$ is indifferent in $\succeq$ to the lottery that pays $1$ with probability $u(x)\in[0,1]$ , and $0$ with probability $1-u(x)$ . The latter is strictly first-order stochastically dominated by the lottery $p$ .

To finish the proof, we show that $V^{k}\to V$ . This is the same as proving that $V^{k}(f^{k})\to V(f)$ when $f^{k}\to f$ . For each $k$ , continuity and weak monotonicity imply that there is $x^{k}\in[0,1]$ so that

V^{k}(f^{k})=V^{k}(\delta_{x^{k}},\ldots,\delta_{x^{k}})=u^{k}(x^{k}).

Similarly, there is $x$ with $V(f)=V(\delta_{x},\ldots,\delta_{x})=u(x)$ .

Now we argue that $x^{k}\to x$ . Indeed $\{x^{k}\}$ is a sequence in $[0,1]$ . If there is a subsequence that converges to, say, $x^{\prime}>x$ then we may choose $x^{\prime\prime}=\frac{x+x^{\prime}}{2}$ and eventually

f^{k}\succeq^{k}(\delta_{x^{\prime\prime}},\ldots,\delta_{x^{\prime\prime}})\succ(\delta_{x},\ldots,\delta_{x})\sim f,

using weak monotonicity. This is impossible because $(f^{k},(\delta_{x^{k}},\ldots,\delta_{x^{k}}))\to(f,(\delta_{x^{\prime}},\ldots,\delta_{x^{\prime}}))$ and $f^{k}\succeq^{k}(\delta_{x^{k}},\ldots,\delta_{x^{k}})$ imply that $f\succeq(\delta_{x^{\prime}},\ldots,\delta_{x^{\prime}})\succeq(\delta_{x^{\prime\prime}},\ldots,\delta_{x^{\prime\prime}})$ .

Finally, using what we know about the convergence of $u^{k}$ to $u$ , $V^{k}(f^{k})=u^{k}(x^{k})\to u(x)=V(f)$ .

We now turn to the second statement in the theorem. Observe that $H^{k}$ is a continuous function from $[0,1]^{S}$ onto $[0,1]$ . Let $z^{k}\in[0,1]^{S}$ be an arbitrary convergent sequence, and say that $z^{k}\rightarrow z^{*}$ . We claim that $H^{k}(z^{k})\rightarrow H(z^{*})$ . Without loss we may assume that $H^{k}(z^{k})\rightarrow Y$ , by taking a subsequence if necessary. For each $k$ and $s$ , choose $y^{k}(s)\in[0,1]$ for which $u^{k}(y^{k}(s))=z^{k}(s)$ . Again, without loss, we may assume that $y^{k}\rightarrow y^{*}$ by taking a subsequence if necessary, and using the finiteness of $S$ . Observe also that $u(y^{*}(s))=z^{*}(s)$ as we have shown that $u^{k}\to u$ in the compact-open topology.

Now, we may also choose $\hat{z}^{k}\in[0,1]$ so that

u^{k}(\hat{z}^{k})=H^{k}(z^{k})=H^{k}((u^{k}(y^{k}(s)))_{s\in S}),

and further may again without loss (by taking a subsequence) assume that $\hat{z}^{k}$ converges to $\hat{z}^{*}$ . Thus $u(\hat{z}^{*})=\lim u^{k}(\hat{z}^{k})=\lim H^{k}(z^{k})=Y$ , again using what we have shown regarding $u^{k}\to u$ . Then $(\delta_{\hat{z}^{k}},\ldots,\delta_{\hat{z}^{k}})\sim^{k}(\delta_{y^{k}(s)})_{s\in S}$ so that, by taking limits, $(\delta_{\hat{z}^{*}},\ldots,\delta_{\hat{z}^{*}})\sim(\delta_{y^{*}(s)})_{s\in S}$ . This implies that $Y=u(\hat{z}^{*})=H((u(y^{*}(s)))_{s\in S})=H(z^{*})$ .

6.3. Proof of Proposition 1

Take $(\succeq^{k},p^{k})$ as in the statement of the Proposition, and observe that for every $k$ , $\mbox{ce}(\succeq^{k},p^{k})\in[a,b]$ . Suppose by means of contradiction that $\mbox{ce}(\succeq^{k},p^{k})\rightarrow\mbox{ce}(\succeq,p)$ is false. Then there is some $\epsilon>0$ and a subsequence for which $|\mbox{ce}(\succeq^{k_{m}},p^{k_{m}})-\mbox{ce}(\succeq,p)|>\epsilon$ . Since $[a,b]$ is compact, by taking a further subsequence we may assume without loss that $\mbox{ce}(\succeq^{k_{m}},p^{k_{m}})\rightarrow\alpha\neq\mbox{ce}(\succeq,p)$ . Now, $p^{k_{m}}\sim^{k_{m}}\delta_{\mbox{ce}(\succeq^{k_{m}},p^{k_{m}})}$ , and $p^{k_{m}}\rightarrow p$ and $\delta_{\mbox{ce}(\succeq^{k_{m}},p^{k_{m}})}\rightarrow\delta_{\alpha}$ . So by definition of closed convergence, it follows that $p\sim\delta_{\alpha}$ ; but this violates certainty monotonicity as $\alpha\neq\mbox{ce}(\succeq,p)$ .

7. Proof of Theorem 3

First some notation. Let $\mu_{n}(\succeq)=\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}_{x_{i}\succeq y_{i}}$ , and $\succeq_{n}\in\mathcal{P}$ be represented by $u_{n}\in\mathcal{U}$ . By definition of $u_{n}$ , we have that $\mu_{n}(\succeq_{n})\geq\mu_{n}(\succeq)$ for all $\succeq\in\mathcal{P}$ . And we use $\mathrm{Vol}(A)$ to denote the volume of a set $A$ in $\mathbf{R}^{d}$ , when this is well defined (see Schneider (2014)).

Consider the measure $\mu$ on $X\times X$ defined as

\mu(A,\succeq)=\int_{A}q(\succeq;x,y)\mathop{}\!\mathrm{d}\lambda(x,y).

In particular

\mu(\succeq^{\prime},\succeq)=\int_{X\times X}\mathbf{1}_{\succeq^{\prime}}(x,y)q(\succeq;x,y)\mathop{}\!\mathrm{d}\lambda(x,y)

is the probability that a choice with error made at a randomly-drawn choice problem by an agent with preference $\succeq$ will coincide with $\succeq^{\prime}$ .

The key identification result shown in Chambers, Echenique, and Lambert (2021) is that, if $\succeq^{\prime}\neq\succeq$ , then

\mu(\succeq^{\prime},\succeq)<\mu(\succeq,\succeq).

Lemma 4.

Consider a Lipschitz noisy choice environment $(X,\mathcal{P},\lambda,q)$ . There is a constant $C$ with the following property: if $\succeq$ and $\succeq^{\prime}$ are two preferences in $\mathcal{P}$ with representations $u$ and $u^{\prime}$ (respectively) in $\mathcal{U}$ , then

C\rho(u,u^{\prime})^{d}\leq\mu(\succeq,\succeq)-\mu(\succeq^{\prime},\succeq)

Proof.

The ball in $\mathbf{R}^{d}$ with center $x$ and radius $\varepsilon$ is denoted by $B_{\varepsilon}(x)$ . First we show that the map

\varepsilon\mapsto\frac{\mathrm{Vol}(B_{\epsilon}(x)\cap X)}{\mathrm{Vol}(B_{\epsilon}(x))},

defined for $x\in X$ , is nonincreasing as a function of $\epsilon>0$ .

Indeed, let $\epsilon_{1}<\epsilon_{2}$ , and let $y\in B_{\epsilon_{2}}(x)\cap X$ . Then $y\in X$ and $\|y-x\|\leq\epsilon_{2}$ . By convexity of $X$ , $y_{1}\equiv x+\frac{\epsilon_{1}}{\epsilon_{2}}(y-x)=(1-\frac{\epsilon_{1}}{\epsilon_{2}})x+\frac{\epsilon_{1}}{\epsilon_{2}}y\in X$ , and $y_{1}\in B_{\epsilon_{1}}(x)$ . Observe further by properties of Lebesgue measure in $\mathbf{R}^{d}$ that $\mathrm{Vol}(\{x+\frac{\epsilon_{1}}{\epsilon_{2}}(y-x):y\in B_{\epsilon_{2}}(x)\cap X\})=\left(\frac{\epsilon_{1}}{\epsilon_{2}}\right)^{d}\mathrm{Vol}(B_{\epsilon_{2}}(x)\cap X)$ . Therefore, $\mathrm{Vol}(B_{\epsilon_{1}}(x)\cap X)\geq\left(\frac{\epsilon_{1}}{\epsilon_{2}}\right)^{d}\mathrm{Vol}(B_{\epsilon_{2}}(x)\cap X)$ . Since $\mathrm{Vol}(B_{\epsilon_{1}}(x))=\left(\frac{\epsilon_{1}}{\epsilon_{2}}\right)^{d}\mathrm{Vol}(B_{\epsilon_{2}}(x))$ , it follows that

\frac{\mathrm{Vol}(B_{\epsilon_{1}}(x)\cap X)}{\mathrm{Vol}(B_{\epsilon_{1}}(x))}\geq\frac{\mathrm{Vol}(B_{\epsilon_{2}}(x)\cap X)}{\mathrm{Vol}(B_{\epsilon_{2}}(x))},

as we wanted to show.

Now observe that there exists $\bar{\varepsilon}>0$ large enough that $X\subseteq B_{\varepsilon}(x)$ for all $\varepsilon\geq\bar{\varepsilon}$ and $x\in X$ . Hence, for any $x\in X$ and $\varepsilon\in(0,\bar{\varepsilon}]$

\frac{\mathrm{Vol}(B_{\epsilon}(x)\cap X)}{\mathrm{Vol}(B_{\epsilon}(x))}\geq\frac{\mathrm{Vol}(X)}{\mathrm{Vol}(B_{\bar{\epsilon}}(x))}\equiv c^{\prime}>0,

as $X$ has nonempty interior and the volume of a ball in $\mathbf{R}^{d}$ is independent of its center.

Now we proceed with the proof of the statement in the lemma. Let $\Delta=\rho(u,u^{\prime})$ and fix $x\in X$ with (wlog) $u(x)-u^{\prime}(x)=\Delta>0$ . Set

\varepsilon=\frac{\Delta}{4\kappa}.

We may assume that $\varepsilon\leq 2\bar{\varepsilon}$ as defined above, as otherwise we can use a larger upper bound on the Lipschitz constants for the functions in $\mathcal{U}$ .

Consider the interval

I=[(u^{\prime}(x)+\kappa\varepsilon)\mathbf{1},(u(x)-\kappa\varepsilon)\mathbf{1}],

with volume

(u(x)-\kappa\varepsilon-(u^{\prime}(x)+\kappa\varepsilon))^{d}=(\Delta/2)^{d}.

Consider $B_{\varepsilon/2}(x)$ . If $y\in B_{\varepsilon/2}(x)$ then $\left|\tilde{u}(y)-\tilde{u}(x)\right|<\kappa\varepsilon$ for any $\tilde{u}\in\mathcal{U}$ .

Now, if $z\in I$ and $y\in B_{\varepsilon/2}(x)$ then

u(y)>u(x)-\kappa\varepsilon=u((u(x)-\kappa\varepsilon)\mathbf{1})\geq u(z)

by the Wald property $u(t\mathbf{1})=t$ and monotonicity. Similarly,

u^{\prime}(z)\geq u^{\prime}((u^{\prime}(x)+\kappa\varepsilon)\mathbf{1})=u^{\prime}(x)+\kappa\varepsilon>u^{\prime}(y).

Thus $(y,z)\in\succ\setminus\succeq^{\prime}$ for any $(y,z)\in B_{\varepsilon/2}(x)\times I$ , and

\begin{split}
\mu(\succeq,\succeq)-\mu(\succeq^{\prime},\succeq)&=\int 1_{\succ\setminus\succ^{\prime}}(y,z)[q(\succeq;(y,z))-q(\succeq;(z,y))]\mathop{}\!\mathrm{d}\lambda(y,z)\\
&\geq\int_{B_{\varepsilon/2}(x)\times I}1_{\succ\setminus\succ^{\prime}}(y,z)[q(\succeq;(y,z))-q(\succeq;(z,y))]\mathop{}\!\mathrm{d}\lambda(y,z)\\
&\geq\lambda(B_{\varepsilon/2}(x)\times I)\inf\{q(\succeq;(y,z))-q(\succeq;(z,y)):(y,z)\in B_{\varepsilon/2}(x)\times I\}.
\end{split}

Here the identity in the first line is shown in Chambers, Echenique, and Lambert (2021). The first inequality follows because $q(\succeq;(x,y))>1/2>q(\succeq;(y,x))$ on $(x,y)\in\succ$ , so the integrand is nonnegative. The second inequality holds because $(y,z)\in\succ\setminus{\succeq}^{\prime}\subseteq\succ\setminus{\succ^{\prime}}$ on $B_{\varepsilon/2}(x)\times I$ .

By the assumptions we have placed on $\lambda$ (writing $\bar{c}>0$ for the constant in the lower bound of $\lambda$ by Lebesgue measure), and the calculations above, we know that

\lambda(B_{\varepsilon/2}(x))\geq\bar{c}\;\mathrm{Vol}(B_{\varepsilon/2}(x)\cap X)\geq\bar{c}c^{\prime}\;\mathrm{Vol}(B_{\varepsilon/2}(x))=\bar{c}c^{\prime}\frac{(\varepsilon/2)^{d}\pi^{d/2}}{\Gamma(1+d/2)}.

So there is a constant $C^{\prime\prime}$ (that only depends on $X$ and $\bar{c}$ ) so that $\lambda(B_{\varepsilon/2}(x)\times I)$ is bounded below by

(\Delta/2)^{d}\frac{C^{\prime\prime}(\varepsilon/2)^{d}\pi^{d/2}}{\Gamma(1+d/2)}=(\Delta/2)^{d}\frac{C^{\prime\prime}\Delta^{d}\pi^{d/2}}{(8\kappa)^{d}\Gamma(1+d/2)}=C^{\prime}\Delta^{2d}.

Here $C^{\prime}$ is a constant that only depends on $C^{\prime\prime}$ , $\kappa$ and $d$ .
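The algebra in the last display can be checked numerically. The sketch below substitutes $\varepsilon=\Delta/(4\kappa)$ into the product of the interval volume and the ball-volume bound, and compares it with the closed form $C^{\prime}\Delta^{2d}$. The placeholder value `c2` stands in for $C^{\prime\prime}$, which the proof does not pin down, and the function names are illustrative.

```python
import math

def product_lower_bound(gap, kappa, d, c2=1.0):
    """(gap/2)^d * c2 * Vol(ball of radius eps/2 in R^d), with eps = gap / (4*kappa)."""
    eps = gap / (4.0 * kappa)
    interval_volume = (gap / 2.0) ** d
    ball_volume = (eps / 2.0) ** d * math.pi ** (d / 2.0) / math.gamma(1.0 + d / 2.0)
    return interval_volume * c2 * ball_volume

def closed_form_constant(kappa, d, c2=1.0):
    """The constant C' multiplying gap^(2d) after substituting eps/2 = gap / (8*kappa)."""
    return c2 * math.pi ** (d / 2.0) / (2.0 ** d * (8.0 * kappa) ** d * math.gamma(1.0 + d / 2.0))

gap, kappa, d = 0.2, 1.5, 3
lhs = product_lower_bound(gap, kappa, d)
rhs = closed_form_constant(kappa, d) * gap ** (2 * d)
```

The two quantities agree up to floating-point error, which confirms that the lower bound scales as $\Delta^{2d}$ with a constant depending only on $C^{\prime\prime}$ , $\kappa$ and $d$ .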

By the assumption that $\Theta>1/2$ , we get that

\mu(\succeq,\succeq)-\mu(\succeq^{\prime},\succeq)\geq C\Delta^{2d}

for some constant $C$ that depends on $C^{\prime}$ and $\Theta$ . ∎

Lemma 5.

Consider a homothetic noisy choice environment $(X,\mathcal{P},\lambda,q)$ . There is a constant $C$ with the following property: if $\succeq$ and $\succeq^{\prime}$ are two preferences in $\mathcal{P}$ with representations $u$ and $u^{\prime}$ (respectively) in $\mathcal{U}$ , then

C\rho(u,u^{\prime})^{2d}\leq\mu(\succeq,\succeq)-\mu(\succeq^{\prime},\succeq)

Proof.

Let $x\in X$ be such that

\rho(u,u^{\prime})\leq u(x)-u^{\prime}(x)=\Delta>0.

Choose $\eta\in(0,1)$ so that $u(\eta x)-u^{\prime}(x)=\Delta/2$ . Let

I=(u^{\prime}(x)\mathbf{1},u(\eta x)\mathbf{1})

and

Z_{\eta}=[\eta x,x]\cap D^{M}_{\alpha}.

Note that $I\subseteq X$ . Indeed, by homotheticity we may take $\|x\|=M$ , and hence $x\geq\alpha\mathbf{1}$ . Then we must have $\alpha\mathbf{1}\leq u^{\prime}(x)\mathbf{1}$ , as $\alpha\mathbf{1}\not\leq u^{\prime}(x)\mathbf{1}$ would mean that $u^{\prime}(x)\mathbf{1}\ll\alpha\mathbf{1}$ , contradicting monotonicity and $x\sim^{\prime}u^{\prime}(x)\mathbf{1}$ .

Observe that if $y\in I$ and $z\in Z_{\eta}$ then we have that

u(y)<u(u(\eta x)\mathbf{1})=u(\eta x)\leq u(z),

as $y<u(\eta x)\mathbf{1}$ and $\eta x\leq z$ ; while

u^{\prime}(z)\leq u^{\prime}(x)=u^{\prime}(u^{\prime}(x)\mathbf{1})<u^{\prime}(y).

Hence $(z,y)\in{\succ}\setminus{\succeq^{\prime}}$ .

First we estimate $\mathrm{Vol}(Z_{\eta})$ . Write $Z_{0}$ for $[0,x]\cap D^{M}_{\alpha}$ . Define the function $f(z)=x+(1-\eta)(z-x)$ and note that when $z\in Z_{0}$ then $f(z)=\eta x+(1-\eta)z\in[\eta x,x]$ because $z\geq 0$ . Note also that $f(z)$ is a convex combination of $x$ and $z$ , so $f(z)\in D^{M}_{\alpha}$ as the latter is a convex set. This shows that

Z_{\eta}=\{x\}+(1-\eta)(Z_{0}-\{x\}),

and hence that $\mathrm{Vol}(Z_{\eta})=(1-\eta)^{d}\mathrm{Vol}(Z_{0})$ .

Now, since $Z_{0}$ is star shaped we have

\mathrm{Vol}(Z_{0})=\frac{1}{d}\int_{y\in S^{M}_{\alpha}}\rho(y,[0,x])^{d}\mathop{}\!\mathrm{d}y\geq\frac{1}{d}\left(\frac{\alpha}{M}\right)^{d}A^{M}_{\alpha},

where $A^{M}_{\alpha}$ is the surface area of $S^{M}_{\alpha}$ and $\rho(y,[0,x])=\max\{\theta>0:\theta y\in[0,x]\}$ is the radial function of the set $[0,x]$ (see Schneider (2014) page 57). The inequality results from $\rho(y,[0,x])\geq\alpha/M$ , as $x_{i}\geq\alpha$ and $y_{i}\leq M$ for any $y\in S^{M}_{\alpha}$ .

Now,

1-\eta=1-\frac{\Delta/2+u^{\prime}(x)}{u(x)}=\frac{\Delta/2}{u(x)}\geq\frac{\Delta/2}{M},

as $u(x)\leq M$ . Thus we have that

\mathrm{Vol}(Z_{\eta})\geq\Delta^{d}C^{\prime},

with $C^{\prime}=\mathrm{Vol}(Z_{0})/(2M)^{d}>0$ , a constant.
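The choice of $\eta$ and the lower bound on $1-\eta$ can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: it takes $d=3$ , uses the arithmetic mean for $u$ and the minimum coordinate for $u^{\prime}$ (both are monotone, homogeneous of degree one, and satisfy $u(c\mathbf{1})=c$ ), measures $M$ with the max norm, and picks an arbitrary bundle $x$ . It does not check membership in $D^{M}_{\alpha}$ .

```python
from statistics import mean

def u(x):
    """Hypothetical homothetic utility: arithmetic mean, so u(c*1) = c."""
    return mean(x)

def u_prime(x):
    """Hypothetical competing utility: minimum coordinate, so u'(c*1) = c."""
    return min(x)

def eta_for(x):
    """Return (eta, Delta) with u(eta*x) - u'(x) = Delta/2.

    Uses homogeneity of degree one, u(eta*x) = eta*u(x), to solve for eta.
    """
    delta = u(x) - u_prime(x)
    eta = (delta / 2 + u_prime(x)) / u(x)
    return eta, delta

x = [4.0, 2.0, 1.0]   # arbitrary illustrative bundle
M = max(x)            # max norm of x (an assumption about the norm)
eta, delta = eta_for(x)
scaled = [eta * xi for xi in x]

# The three facts used in the proof, checked up to rounding error.
print(abs(u(scaled) - u_prime(x) - delta / 2) < 1e-12)   # u(eta x) - u'(x) = Delta/2
print(abs((1 - eta) - (delta / 2) / u(x)) < 1e-12)       # 1 - eta = (Delta/2)/u(x)
print(1 - eta >= delta / (2 * M))                        # 1 - eta >= Delta/(2M)
```

With these illustrative choices $u(x)=7/3$ , $u^{\prime}(x)=1$ and $\Delta=4/3$ , so $\eta=5/7$ and $1-\eta=2/7\geq\Delta/(2M)=1/6$ .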

Moreover, we have $\mathrm{Vol}(I)=(\Delta/2)^{d}$ as $I\subseteq X$ . Then we obtain, again using a formula derived in Chambers, Echenique, and Lambert (2021), and that $q(\succeq;(x,y))>1/2>q(\succeq;(y,x))$ on $(x,y)\in\succ$ :

	$\displaystyle\mu(\succeq,\succeq)-\mu(\succeq^{\prime},\succeq)$	$\displaystyle=\int 1_{\succ\setminus\succ^{\prime}}(z,y)[q(\succeq;(z,y))-q(\succeq;(y,z))]\mathop{}\!\mathrm{d}\lambda(z,y)$
		$\displaystyle\geq\int_{Z_{\eta}\times I}1_{\succ\setminus\succ^{\prime}}(z,y)[q(\succeq;(z,y))-q(\succeq;(y,z))]\mathop{}\!\mathrm{d}\lambda(z,y)$
		$\displaystyle\geq\lambda(Z_{\eta}\times I)\inf\{q(\succeq;(z,y))-q(\succeq;(y,z)):(z,y)\in Z_{\eta}\times I\}$
		$\displaystyle\geq(\Delta/2)^{d}C^{\prime}\Delta^{d}\Theta,$

where $\Theta=\inf\{q(\succeq;(z,y))-q(\succeq;(y,z)):(z,y)\in Z_{\eta}\times I\}>0$ . ∎
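To match the last bound with the statement of Lemma 5, the constants can be collected as follows. This is only a sketch: it treats $C^{\prime}$ and $\Theta$ as fixed positive numbers, as the proof does, and uses $\Delta\geq\rho(u,u^{\prime})$ from the choice of $x$ .

```latex
(\Delta/2)^{d}\,C^{\prime}\Delta^{d}\,\Theta
  = \frac{C^{\prime}\Theta}{2^{d}}\,\Delta^{2d}
  \geq \frac{C^{\prime}\Theta}{2^{d}}\,\rho(u,u^{\prime})^{2d}
```

Under that reading, one admissible choice of the constant is $C=C^{\prime}\Theta/2^{d}$ .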

7.1. Proof of Theorem 3

For the rest of this proof, we denote $\mu(\succeq,\succeq^{*})$ by $\mu(\succeq)$ .

The rest of the proof uses routine ideas from statistical learning theory. By standard results (see, for example, Theorem 3.1 in Boucheron, Bousquet, and Lugosi (2005)), there exists an event $E$ with probability at least $1-\delta$ on which:

\sup\{\left|\mu_{n}(\succeq)-\mu(\succeq)\right|:\succeq\in\mathcal{P}\}\leq\mathbf{E}\sup\{\left|\mu_{n}(\succeq)-\mu(\succeq)\right|:\succeq\in\mathcal{P}\}+\sqrt{\frac{2\ln(1/\delta)}{n}}.

Moreover, again by standard arguments (see Theorem 3.2 in Boucheron, Bousquet, and Lugosi (2005)), we also have

\mathbf{E}\sup\{\left|\mu_{n}(\succeq)-\mu(\succeq)\right|:\succeq\in\mathcal{P}\}\leq 2R_{n}(\mathcal{P}),

where

R_{n}(\mathcal{P})=\mathbf{E}\sup\{\frac{1}{n}\left|\sum_{i}\sigma_{i}\mathbf{1}_{\tilde{x}_{i}\succeq y_{i}}\right|:\succeq\in\mathcal{P}\}

is the Rademacher average of $\mathcal{P}$ .
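For intuition, a Rademacher average of this kind can be approximated by simulation. The sketch below is an illustration under assumptions that are not in the text: the class of preferences is replaced by a finite list of hypothetical linear preferences (weight vectors), the sample of pairs is held fixed, and only the expectation over the signs $\sigma_{i}$ is approximated, so it estimates the empirical (sample-conditional) version of $R_{n}(\mathcal{P})$ .

```python
import random

def rademacher_average(pairs, weights, draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma sup_w |(1/n) sum_i sigma_i 1[w.x_i >= w.y_i]|.

    pairs   : list of (x_i, y_i), each alternative a tuple of floats
    weights : finite list of weight vectors, one per hypothetical linear preference
    """
    rng = random.Random(seed)
    n = len(pairs)
    # 0/1 indicator of "x_i weakly preferred to y_i" for each preference.
    indicators = [
        [1 if sum(w_j * (a - b) for w_j, a, b in zip(w, x, y)) >= 0 else 0
         for x, y in pairs]
        for w in weights
    ]
    total = 0.0
    for _ in range(draws):
        # Independent signs, each +1 or -1 with probability 1/2.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(abs(sum(s * v for s, v in zip(sigma, ind))) / n
                     for ind in indicators)
    return total / draws

# Illustrative sample: 200 random pairs in the unit square, 11 weight vectors.
sample_rng = random.Random(1)
pairs = [((sample_rng.random(), sample_rng.random()),
          (sample_rng.random(), sample_rng.random())) for _ in range(200)]
weights = [(k / 10, 1 - k / 10) for k in range(11)]
estimate = rademacher_average(pairs, weights)
print(estimate)
```

The estimate always lies in $[0,1]$ . Averaging it over repeated draws of the sample would approximate the quantity in the text more closely.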

Now, by the Vapnik-Chervonenkis inequality (see Theorem 3.4 in Boucheron, Bousquet, and Lugosi (2005)), we have that

\mathbf{E}\sup\{\left|\mu_{n}(\succeq)-\mu(\succeq)\right|:\succeq\in\mathcal{P}\}\leq K\sqrt{\frac{V}{n}},

where $V$ is the VC dimension of $\mathcal{P}$ , and $K$ is a universal constant.

So on the event $E$ , we have that

\sup\{\left|\mu_{n}(\succeq)-\mu(\succeq)\right|:\succeq\in\mathcal{P}\}\leq K\sqrt{V/n}+\sqrt{\frac{2\ln(1/\delta)}{n}}.

We now combine these statements with Lemmas 4 and 5. In particular, we let $D=d$ or $D=2d$ depending on which of the lemmas we invoke. Let $u^{*}\in\mathcal{U}$ represent $\succeq^{*}$ and $u_{n}\in\mathcal{U}$ represent $\succeq_{n}$ . Let $\Delta=\rho(u^{*},u_{n})$ , a magnitude that depends on the sample. Then, on the event $E$ , by Lemma 4 or 5, we have that

	$\displaystyle C\Delta^{D}$	$\displaystyle\leq\mu(\succeq^{*})-\mu(\succeq_{n})$
		$\displaystyle=\mu(\succeq^{*})-\mu_{n}(\succeq^{*})+\mu_{n}(\succeq^{*})-\mu_{n}(\succeq_{n})+\mu_{n}(\succeq_{n})-\mu(\succeq_{n})$
		$\displaystyle\leq 2K\sqrt{\frac{V}{n}}+2\sqrt{\frac{2\ln(1/\delta)}{n}},$

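where we have used that $\mu_{n}(\succeq^{*})-\mu_{n}(\succeq_{n})\leq 0$ by definition of $\succeq_{n}$ , and bounded each of the other two differences by the uniform deviation bound that holds on $E$ . This proves the second statement in the theorem.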
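To see how the resulting rate behaves, the bound can be solved for $\Delta$ and evaluated numerically. The sketch below is not part of the proof: the constants $K$ and $C$ are not pinned down in the text, so the default values used here (both equal to 1), the VC dimension and the exponent $D$ are placeholders chosen only for illustration.

```python
import math

def deviation_bound(n, vc_dim, delta, K=1.0):
    """Right-hand side 2K*sqrt(V/n) + 2*sqrt(2 ln(1/delta)/n) of the final inequality."""
    return 2 * K * math.sqrt(vc_dim / n) + 2 * math.sqrt(2 * math.log(1 / delta) / n)

def distance_bound(n, vc_dim, delta, C=1.0, D=2, K=1.0):
    """Upper bound on Delta = rho(u*, u_n) implied by C * Delta**D <= deviation_bound.

    C, D and K are placeholders: the text only asserts that such constants exist.
    """
    return (deviation_bound(n, vc_dim, delta, K) / C) ** (1.0 / D)

# Illustrative values only: the bound shrinks at rate n**(-1/(2D)).
for n in (100, 1_000, 10_000):
    print(n, round(distance_bound(n, vc_dim=4, delta=0.05), 3))
```

Because the right-hand side is of order $n^{-1/2}$ , the implied bound on $\Delta$ is of order $n^{-1/(2D)}$ , which makes the dependence on the dimension through $D$ visible.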

To prove the first statement in the theorem, by Lemmas 4 and 5 again, and using that $\mu_{n}(\succeq_{n})\geq\mu_{n}(\succeq^{*})$ , we have that, for any $\eta>0$ ,

	$\displaystyle\mathop{}\mathrm{Pr}(\rho(u^{*},u_{n})>\eta)$	$\displaystyle\leq\mathop{}\mathrm{Pr}(\mu(\succeq^{*})-\mu(\succeq_{n})>C\eta^{D})$
		$\displaystyle\leq\mathop{}\mathrm{Pr}(\mu(\succeq^{*})-\mu_{n}(\succeq^{*})>C\eta^{D}/2)+\mathop{}\mathrm{Pr}(\mu_{n}(\succeq_{n})-\mu(\succeq_{n})>C\eta^{D}/2)$
		$\displaystyle\leq 2\mathop{}\mathrm{Pr}(\sup\{\left|\mu(\succeq^{\prime})-\mu_{n}(\succeq^{\prime})\right|:\succeq^{\prime}\in\mathcal{P}\}>C\eta^{D}/2)\to 0$

as $n\to\infty$ by the uniform convergence in probability result shown in Chambers, Echenique, and Lambert (2021).

7.2. Proof of Theorem 5

By standard results (see Hildenbrand (1970)), since $X$ is locally compact Polish, the topology of closed convergence is compact metric.

We will show that for any subsequence of $\succeq^{k}$ , there is a subsubsequence converging to $\succeq^{*}$ , which will establish that $\succeq^{k}\rightarrow\succeq^{*}$ .

So choose a convergent subsubsequence of the given subsequence. To simplify notation and with a slight abuse of notation, let us also refer to this subsubsequence as $\succeq^{k}$ . Call its limit $\succeq$ ; $\succeq$ is complete as the set of complete relations is closed in the closed convergence topology. It is therefore sufficient to establish that $\succ^{*}\subseteq\succ$ and $\succeq^{*}\subseteq\succeq$ .

First we show that $x\succ^{*}y$ implies that $x\succ y$ . So let $x\succ^{*}y$ . Let $U$ and $V$ be neighborhoods of $x$ and $y$ , respectively, such that $x^{\prime}\succ^{*}y^{\prime}$ for all $x^{\prime}\in U$ and $y^{\prime}\in V$ . Such neighborhoods exist by the continuity of $\succeq^{*}$ . We prove first that if $(x^{\prime},y^{\prime})\in U\times V$ , then there exists $N$ such that $x^{\prime}\succ_{n}y^{\prime}$ for all $n\geq N$ . Recall that $B=\cup\{B^{\prime}:B^{\prime}\in\Sigma_{\infty}\}$ . By hypothesis, there exist $x^{\prime\prime}\in U\cap B$ and $y^{\prime\prime}\in V\cap B$ such that $x^{\prime\prime}\leq x^{\prime}$ and $y^{\prime}\leq y^{\prime\prime}$ . Each $\succeq_{n}$ is a strong rationalization of the finite experiment of order $n$ , so if $\{\tilde{x},\tilde{y}\}\in\Sigma_{n}$ then $\tilde{x}\succ_{n}\tilde{y}$ implies that $\tilde{x}\succ_{m}\tilde{y}$ for all $m\geq n$ . Since $x^{\prime\prime},y^{\prime\prime}\in B$ , there is $N$ such that $\{x^{\prime\prime},y^{\prime\prime}\}\in\Sigma_{N}$ . Thus $x^{\prime\prime}\succ^{*}y^{\prime\prime}$ implies that $x^{\prime\prime}\succ_{n}y^{\prime\prime}$ for all $n\geq N$ . So, for $n\geq N$ , $x^{\prime}\succ_{n}y^{\prime}$ , as $x^{\prime}\geq x^{\prime\prime}\succ_{n}y^{\prime\prime}\geq y^{\prime}$ and $\succeq_{n}$ is weakly monotone.
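The sandwiching step in this paragraph can be illustrated with a small sketch. Everything concrete in it is an assumption made for the example and does not come from the text: the set $B$ is replaced by a dyadic grid in the plane, the comparison observed on grid points is generated by the sum of coordinates, and the neighborhoods $U$ and $V$ are ignored. The only point is that a strict comparison between a grid point below $x^{\prime}$ and a grid point above $y^{\prime}$ carries over to $(x^{\prime},y^{\prime})$ for any weakly monotone preference that agrees with the grid comparison.

```python
import math

def round_down(point, m):
    """Largest point of the dyadic grid with mesh 2**-m that is <= point coordinatewise."""
    step = 2.0 ** -m
    return tuple(math.floor(c / step) * step for c in point)

def round_up(point, m):
    """Smallest point of the dyadic grid with mesh 2**-m that is >= point coordinatewise."""
    step = 2.0 ** -m
    return tuple(math.ceil(c / step) * step for c in point)

def certified_strict(x, y, grid_utility, m):
    """True if a grid comparison certifies that x is strictly preferred to y.

    x_low <= x and y <= y_high are grid points, so weak monotonicity gives
    x >= x_low > y_high >= y whenever grid_utility ranks x_low strictly above y_high.
    False only means that this grid level yields no certificate.
    """
    x_low, y_high = round_down(x, m), round_up(y, m)
    return grid_utility(x_low) > grid_utility(y_high)

# Hypothetical comparison on grid points: the sum of coordinates.
print(certified_strict((0.8, 0.7), (0.2, 0.3), sum, m=3))
print(certified_strict((0.8, 0.7), (0.2, 0.3), sum, m=0))
```

A finer grid (larger `m`) brings the two grid points closer to the original pair, which mirrors the requirement that $x^{\prime\prime}$ and $y^{\prime\prime}$ lie in $U$ and $V$ .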

Now we establish that $x\succ y$ . Let $\{(x_{n},y_{n})\}$ be an arbitrary sequence with $(x_{n},y_{n})\rightarrow(x,y)$ . By hypothesis, there is an increasing sequence $\{x^{\prime}_{n}\}$ , and a decreasing sequence $\{y^{\prime}_{n}\}$ , such that $x^{\prime}_{n}\leq x_{n}$ and $y_{n}\leq y^{\prime}_{n}$ while $(x,y)=\lim_{n\rightarrow\infty}(x^{\prime}_{n},y^{\prime}_{n})$ .

Let $N$ be large enough that $x^{\prime}_{N}\in U$ and $y^{\prime}_{N}\in V$ . Let $N^{\prime}\geq N$ be such that $x^{\prime}_{N}\succ_{n}y^{\prime}_{N}$ for all $n\geq N^{\prime}$ (we established the existence of such $N^{\prime}$ above). Then, for any $n\geq N^{\prime}$ we have that

x_{n}\geq x^{\prime}_{n}\geq x^{\prime}_{N}\succ_{n}y^{\prime}_{N}\geq y^{\prime}_{n}\geq y_{n}.

By the weak monotonicity of $\succeq_{n}$ , then, $x_{n}\succ_{n}y_{n}$ . The sequence $\{(x_{n},y_{n})\}$ was arbitrary, so $(y,x)\notin\succeq=\lim_{n\rightarrow\infty}\succeq_{n}$ . Thus $\neg(y\succeq x)$ . Completeness of $\succeq$ implies that $x\succ y$ .

Second, we show that if $x\succeq^{*}y$ then $x\succeq y$ , thus completing the proof. So let $x\succeq^{*}y$ . We recursively construct sequences $x^{n_{k}},y^{n_{k}}$ such that $x^{n_{k}}\succeq_{n_{k}}y^{n_{k}}$ and $x^{n_{k}}\rightarrow x$ , $y^{n_{k}}\rightarrow y$ .

So, for any $k\geq 1$ , choose $x^{\prime}\in N_{x}(1/k)\cap B$ with $x^{\prime}\geq x$ , and $y^{\prime}\in N_{y}(1/k)\cap B$ with $y^{\prime}\leq y$ ; so that $x^{\prime}\succeq^{*}x\succeq^{*}y\succeq^{*}y^{\prime}$ , as $\succeq^{*}$ is weakly monotone. Recall that $\succeq_{n}$ strongly rationalizes $c_{\succeq^{*}}$ for $\Sigma_{n}$ . So $x^{\prime}\succeq^{*}y^{\prime}$ and $x^{\prime},y^{\prime}\in B$ imply that $x^{\prime}\succeq_{n}y^{\prime}$ for all $n$ large enough. Let $n_{k}>n_{k-1}$ (where we can take $n_{0}=0$ ) such that $x^{\prime}\succeq_{n_{k}}y^{\prime}$ ; and let $x^{n_{k}}=x^{\prime}$ and $y^{n_{k}}=y^{\prime}$ .

Then we have $(x^{n_{k}},y^{n_{k}})\rightarrow(x,y)$ and $x^{n_{k}}\succeq_{n_{k}}y^{n_{k}}$ . Thus $x\succeq y$ .

7.3. Proof of Theorem 4

First, it is straightforward to show that $x\succ y$ implies $x\succeq^{\prime}y$ . Otherwise, there are $x,y$ for which $x\succ y$ and $y\succ^{\prime}x$ . By continuity of both preferences, there is an open neighborhood $U$ of $(x,y)$ on which these strict comparisons persist, and hence a pair $(z,w)\in U\cap(B\times B)$ for which $z\succ w$ and $w\succ^{\prime}z$ , contradicting that $\succeq$ and $\succeq^{\prime}$ agree on $B\times B$ . Symmetrically, $x\succ^{\prime}y$ implies $x\succeq y$ .

Now, without loss, suppose that there is a pair $x,y$ for which $x\succ y$ and $x\sim^{\prime}y$ . By connectedness and continuity, $V=\{z:x\succ z\succ y\}$ is nonempty. Indeed, if we assume, towards a contradiction, that $V=\varnothing$ , then $\{z:x\succ z\}$ and $\{z:z\succ y\}$ are nonempty open sets. Further, for any $z\in X$ , either $x\succ z$ or $z\succ y$ (because if $\neg(x\succ z)$ then by completeness $z\succeq x$ , which implies that $z\succ y$ ). Conclude that $\{z:x\succ z\}\cup\{z:z\succ y\}=X$ and each of the sets is nonempty and open (by continuity of the preference $\succeq$ ); these sets are disjoint because $V=\varnothing$ , violating connectedness of $X$ . So we conclude that $V$ is nonempty. By continuity of the preference $\succeq$ , $V$ is open.

We claim that there is a pair $(w,z)\in(V\times V)\cap(B\times B)$ for which $w\succ z$ . For otherwise, for all $(w,z)\in(V\times V)\cap(B\times B)$ , $w\sim z$ . Conclude then by continuity that for all $(w,z)\in V\times V$ , $w\sim z$ . Observe that this implies that, for any $w\in V$ , the set $\{z:w\succ z\succ y\}=\varnothing$ , as if $w\succ z\succ y$ , we also have that $x\succeq w\succ z$ , from which we conclude $x\succ z$ , so that $z\in V$ and hence $z\sim w$ , a contradiction. Observe that $\{z:w\succ z\succ y\}=\varnothing$ contradicts the continuity of $\succeq$ and the connectedness of $X$ (same argument as for the nonemptiness of $V$ ; see our discussion above).

We have shown that there is $(w,z)\in(V\times V)\cap(B\times B)$ for which $w\succ z$ , so that $x\succ w\succ z\succ y$ . Further, we have hypothesized that $x\sim^{\prime}y$ . By the first paragraph, we know that $x\succeq^{\prime}w\succeq^{\prime}z\succeq^{\prime}y$ . If, by means of contradiction, we have $w\succ^{\prime}z$ , then $x\succ^{\prime}y$ , a contradiction. So $w\sim^{\prime}z$ and $w\succ z$ , a contradiction to $\succeq_{B\times B}=\succeq^{\prime}_{B\times B}$ .

References

Afriat (1967a) Afriat, S. N. (1967a): “The Construction of Utility Functions from Expenditure Data,” International Economic Review, 8(1), 67–77.
Afriat (1967b) Afriat, S. N. (1967b): “The Construction of Utility Functions from Expenditure Data,” International Economic Review, 8(1), 67–77.
Aliprantis and Border (2006) Aliprantis, C. D., and K. Border (2006): Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, 3 edn.
Balcan, Constantin, Iwata, and Wang (2012) Balcan, M. F., F. Constantin, S. Iwata, and L. Wang (2012): “Learning valuation functions,” in Conference on Learning Theory, pp. 4–1. JMLR Workshop and Conference Proceedings.
Balcan, Daniely, Mehta, Urner, and Vazirani (2014) Balcan, M.-F., A. Daniely, R. Mehta, R. Urner, and V. V. Vazirani (2014): “Learning economic parameters from revealed preferences,” in International Conference on Web and Internet Economics, pp. 338–353. Springer.
Basu and Echenique (2020) Basu, P., and F. Echenique (2020): “On the falsifiability and learnability of decision theories,” Theoretical Economics, 15(4), 1279–1305.
Bei, Chen, Garg, Hoefer, and Sun (2016) Bei, X., W. Chen, J. Garg, M. Hoefer, and X. Sun (2016): “Learning Market Parameters Using Aggregate Demand Queries,” in AAAI.
Beigman and Vohra (2006) Beigman, E., and R. Vohra (2006): “Learning from revealed preference,” in Proceedings of the 7th ACM Conference on Electronic Commerce, pp. 36–42.
Bergstrom, Parks, and Rader (1976) Bergstrom, T. C., R. P. Parks, and T. Rader (1976): “Preferences which Have Open Graphs,” Journal of Mathematical Economics, 3(3), 265–268.
Blundell, Browning, and Crawford (2008) Blundell, R., M. Browning, and I. Crawford (2008): “Best Nonparametric Bounds on Demand Responses,” Econometrica, 76(6), 1227–1262.
Blundell, Browning, and Crawford (2003) Blundell, R. W., M. Browning, and I. A. Crawford (2003): “Nonparametric Engel Curves and Revealed Preference,” Econometrica, 71(1), 205–240.
Border and Segal (1994) Border, K. C., and U. Segal (1994): “Dynamic Consistency Implies Approximately Expected Utility Preferences,” Journal of Economic Theory, 63(2), 170–188.
Boucheron, Bousquet, and Lugosi (2005) Boucheron, S., O. Bousquet, and G. Lugosi (2005): “Theory of classification: A survey of some recent advances,” ESAIM: probability and statistics, 9, 323–375.
Brown and Matzkin (1996) Brown, D. J., and R. L. Matzkin (1996): “Testable Restrictions on the Equilibrium Manifold,” Econometrica, 64(6), 1249–1262.
Camara (2022) Camara, M. K. (2022): “Computationally Tractable Choice,” in Proceedings of the 23rd ACM Conference on Economics and Computation, EC ’22, p. 28, New York, NY, USA. Association for Computing Machinery.
Carvajal, Deb, Fenske, and Quah (2013) Carvajal, A., R. Deb, J. Fenske, and J. K.-H. Quah (2013): “Revealed Preference Tests of the Cournot Model,” Econometrica, 81(6), 2351–2379.
Cerreia-Vioglio, Dillenberger, and Ortoleva (2015) Cerreia-Vioglio, S., D. Dillenberger, and P. Ortoleva (2015): “Cautious expected utility and the certainty effect,” Econometrica, 83(2), 693–728.
Cerreia-Vioglio, Maccheroni, Marinacci, and Montrucchio (2011) Cerreia-Vioglio, S., F. Maccheroni, M. Marinacci, and L. Montrucchio (2011): “Uncertainty averse preferences,” Journal of Economic Theory, 146(4), 1275–1330.
Chambers and Echenique (2016) Chambers, C. P., and F. Echenique (2016): Revealed preference theory, vol. 56. Cambridge University Press.
Chambers, Echenique, and Lambert (2021) Chambers, C. P., F. Echenique, and N. S. Lambert (2021): “Recovering preferences from finite data,” Econometrica, 89(4), 1633–1664.
Chandrasekher, Frick, Iijima, and Le Yaouanq (2021) Chandrasekher, M., M. Frick, R. Iijima, and Y. Le Yaouanq (2021): “Dual-self representations of ambiguity preferences,” Econometrica, forthcoming.
Chapman, Dean, Ortoleva, Snowberg, and Camerer (2017) Chapman, J., M. Dean, P. Ortoleva, E. Snowberg, and C. Camerer (2017): “Willingness to Pay and Willingness to Accept are Probably Less Correlated Than You Think,” NBER working paper No. 23954.
Chapman, Dean, Ortoleva, Snowberg, and Camerer (2022) Chapman, J., M. Dean, P. Ortoleva, E. Snowberg, and C. Camerer (2022): “Econographics,” Forthcoming, Journal of Political Economy: Microeconomics.
Chase and Prasad (2019) Chase, Z., and S. Prasad (2019): “Learning Time Dependent Choice,” in 10th Innovations in Theoretical Computer Science Conference (ITCS).
Chateauneuf and Faro (2009) Chateauneuf, A., and J. H. Faro (2009): “Ambiguity through confidence functions,” Journal of Mathematical Economics, 45(9-10), 535–558.
Chateauneuf, Grabisch, and Rico (2008) Chateauneuf, A., M. Grabisch, and A. Rico (2008): “Modeling attitudes toward uncertainty through the use of the Sugeno integral,” Journal of Mathematical Economics, 44(11), 1084–1099.
Chavas and Cox (1993) Chavas, J.-P., and T. L. Cox (1993): “On Generalized Revealed Preference Analysis,” The Quarterly Journal of Economics, 108(2), 493–506.
Clinton, Jackman, and Rivers (2004) Clinton, J., S. Jackman, and D. Rivers (2004): “The Statistical Analysis of Roll Call Data,” The American Political Science Review, 98(2), 355–370.
Diewert (1973) Diewert, W. E. (1973): “Afriat and Revealed Preference Theory,” The Review of Economic Studies, 40(3), 419–425.
Dong, Roth, Schutzman, Waggoner, and Wu (2018) Dong, J., A. Roth, Z. Schutzman, B. Waggoner, and Z. S. Wu (2018): “Strategic classification from revealed preferences,” in Proceedings of the 2018 ACM Conference on Economics and Computation, pp. 55–70.
Echenique, Golovin, and Wierman (2011) Echenique, F., D. Golovin, and A. Wierman (2011): “A revealed preference approach to computational complexity in economics,” in Proceedings of the 12th ACM conference on Electronic commerce, pp. 101–110.
Echenique and Prasad (2020) Echenique, F., and S. Prasad (2020): “Incentive Compatible Active Learning,” in 11th Innovations in Theoretical Computer Science Conference (ITCS).
Falk, Becker, Dohmen, Enke, Huffman, and Sunde (2018) Falk, A., A. Becker, T. Dohmen, B. Enke, D. Huffman, and U. Sunde (2018): “Global Evidence on Economic Preferences,” The Quarterly Journal of Economics, 133(4), 1645–1692.
Forges and Minelli (2009) Forges, F., and E. Minelli (2009): “Afriat’s Theorem for General Budget Sets,” Journal of Economic Theory, 144(1), 135–145.
Fox (1945) Fox, R. H. (1945): “On topologies for function spaces,” Bull. Amer. Math. Soc., 51, 429–432.
Fudenberg, Gao, and Liang (2021) Fudenberg, D., W. Gao, and A. Liang (2021): “How Flexible is that Functional Form? Measuring the Restrictiveness of Theories,” in Proceedings of the 22nd ACM Conference on Economics and Computation, pp. 497–498.
Gilboa and Schmeidler (1989) Gilboa, I., and D. Schmeidler (1989): “Maxmin expected utility with non-unique prior,” Journal of mathematical economics, 18(2), 141–153.
Hansen and Sargent (2001) Hansen, L. P., and T. J. Sargent (2001): “Robust control and model uncertainty,” American Economic Review, 91(2), 60–66.
Hansen and Sargent (2022) Hansen, L. P., and T. J. Sargent (2022): “Risk, Ambiguity, and Misspecification: Decision Theory, Robust Control, and Statistics,” Mimeo: NYU.
Hansen, Sargent, Turmuhambetova, and Williams (2006) Hansen, L. P., T. J. Sargent, G. Turmuhambetova, and N. Williams (2006): “Robust control and model misspecification,” Journal of Economic Theory, 128(1), 45–90.
Hildenbrand (1970) Hildenbrand, W. (1970): “On Economies with Many Agents,” Journal of Economic Theory, 2(2), 161–188.
Kannai (1970) Kannai, Y. (1970): “Continuity Properties of the Core of a Market,” Econometrica, 38(6), 791–815.
Kertz and Rösler (2000) Kertz, R. P., and U. Rösler (2000): “Complete Lattices of Probability Measures with Applications to Martingale Theory,” Lecture Notes-Monograph Series, 35, 153–177.
Levin (1983) Levin, V. L. (1983): “A continuous utility theorem for closed preorders on a metrizable $\sigma$ -compact space,” Doklady Akademii Nauk, 273(4), 800–804.
Maccheroni, Marinacci, and Rustichini (2006) Maccheroni, F., M. Marinacci, and A. Rustichini (2006): “Ambiguity aversion, robustness, and the variational representation of preferences,” Econometrica, 74(6), 1447–1498.
Mas-Colell (1974) Mas-Colell, A. (1974): “Continuous and Smooth Consumers: Approximation Theorems,” Journal of Economic Theory, 8(3), 305–336.
Mas-Colell (1977) Mas-Colell, A. (1977): “On the Continuous Representation of Preorders,” International Economic Review, 18(2), 509–513.
Mas-Colell (1978) Mas-Colell, A. (1978): “On Revealed Preference Analysis,” The Review of Economic Studies, 45(1), 121–131.
Matzkin (1991) Matzkin, R. L. (1991): “Axioms of Revealed Preference for Nonlinear Choice Sets,” Econometrica, 59(6), 1779–1786.
Nishimura, Ok, and Quah (2017) Nishimura, H., E. A. Ok, and J. K.-H. Quah (2017): “A Comprehensive Approach to Revealed Preference Theory,” American Economic Review, 107(4), 1239–1263.
Poole and Rosenthal (1985) Poole, K. T., and H. Rosenthal (1985): “A Spatial Model for Legislative Roll Call Analysis,” American Journal of Political Science, 29(2), 357–384.
Reny (2015) Reny, P. J. (2015): “A Characterization of Rationalizable Consumer Behavior,” Econometrica, 83(1), 175–192.
Richter (1966) Richter, M. K. (1966): “Revealed Preference Theory,” Econometrica, 34(3), 635–645.
Samuelson (1938) Samuelson, P. A. (1938): “A note on the pure theory of consumer’s behaviour,” Economica, 5(17), 61–71.
Schmeidler (1989) Schmeidler, D. (1989): “Subjective probability and expected utility without additivity,” Econometrica, 57(3), 571–587.
Schneider (2014) Schneider, R. (2014): Convex Bodies: the Brunn-Minkowski Theory. Cambridge University Press, 2 edn.
Ugarte (2022) Ugarte, C. (2022): “Preference Recoverability from Inconsistent Choices,” Mimeo, UC Berkeley.
Varian (1982) Varian, H. R. (1982): “The Nonparametric Approach to Demand Analysis,” Econometrica, 50(4), 945–973.
von Gaudecker, van Soest, and Wengstrom (2011) von Gaudecker, H.-M., A. van Soest, and E. Wengstrom (2011): “Heterogeneity in Risky Choice Behavior in a Broad Population,” The American Economic Review, 101(2), 664–94.
Zadimoghaddam and Roth (2012) Zadimoghaddam, M., and A. Roth (2012): “Efficiently learning from revealed preference,” in International Workshop on Internet and Network Economics, pp. 114–127. Springer.
Zhang and Conitzer (2020) Zhang, H., and V. Conitzer (2020): “Learning the Valuations of a k-demand Agent,” in International Conference on Machine Learning.