
The Value of Context:
Human versus Black Box Evaluators

Andrei Iakovlev and Annie Liang (Department of Economics, Northwestern University). We thank Modibo Camara, Krishna Dasaratha, Alex Frankel, Ben Golub, Kevin He, Xiaosheng Mu, Matthew Murphy, Jacopo Perego, Debraj Ray, and Marzena Rostek for helpful comments and suggestions.
Abstract

Machine learning algorithms are now capable of performing evaluations previously conducted by human experts (e.g., medical diagnoses). How should we conceptualize the difference between evaluation by humans and by algorithms, and when should an individual prefer one over the other? We propose a framework to examine one key distinction between the two forms of evaluation: Machine learning algorithms are standardized, fixing a common set of covariates by which to assess all individuals, while human evaluators tailor which covariates they acquire to each individual. Our framework defines and analyzes the advantage of this customization—the value of context—in environments with high-dimensional data. We show that unless the agent has precise knowledge about the joint distribution of covariates, the benefit of additional covariates generally outweighs the value of context.

1 Introduction

“A statistical formula may be highly successful in predicting whether or not a person will go to a movie in the next week. But someone who knows that this person is laid up with a broken leg will beat the formula. No formula can take into account the infinite range of such exceptional events.” — Atul Gawande, Complications: A Surgeon’s Notes on an Imperfect Science


Predictions about people are increasingly automated using black-box algorithms. How should individuals compare evaluation by algorithms (e.g., medical diagnosis by a machine learning algorithm) with more traditional evaluation by human experts (e.g., medical diagnosis by a doctor)?

One important distinction is that black-box algorithms are standardized, fixing a common set of inputs by which to assess all individuals. Unless the inputs to the black box are exhaustive, additional information can (in some cases) substantially modify the interpretation of those inputs that have been acquired. For example, the context that a patient is currently fasting may change the interpretations of “dizziness” and “electrolyte imbalance,” and the context that a job applicant is an environmental activist may change how a prior history of arrest is perceived. If these auxiliary characteristics are not specified as inputs in the algorithm, the individual cannot supply them.

In contrast, individuals can often explain their unusual circumstances or characteristics to a human evaluator through conversation. Thus, even if the human evaluator considers fewer inputs than a black box algorithm does, these inputs may be better adapted to the individual being evaluated. The perception that humans are better able to take into account an individual’s unique situation is a significant factor in patient resistance to AI in healthcare (Longoni et al., 2019). Our objective is to understand when, and to what extent, this difference between human and machine evaluation matters.

Our contribution in this paper is twofold. First, we propose a theoretical framework that formalizes this distinction between human and black box evaluation. Second, we identify assumptions under which the agent should prefer one form of evaluation over the other. We see our paper as a complement to a growing empirical literature that compares human versus black box evaluation. Here our goal is to conceptualize the difference between human and black box evaluators, and to clarify properties of the informational environment that are important for choosing between the two.

In our model, an agent is described by a binary covariate vector and a real-valued type (e.g., the severity of the agent’s medical condition). The type can be written as a function of the covariates, which we henceforth call the type function. Covariates are separated into standard covariates (e.g., medical history, lab tests, imaging scans) and nonstandard covariates (e.g., religious information, genetic data, wearable device data, and financial data).

We suppose that the agent may know how the standard covariates are correlated with the type, but cannot distinguish between the predictive roles of the nonstandard covariates. Formally, the agent has a belief over the type function, and we impose two assumptions on the agent's prior. The first is a symmetry assumption that says that the agent's prior over these functions is unchanged by permuting the labels and values of the nonstandard covariates. If we interpret the covariates as signals about the agent's type, then uncertainty about the type function corresponds to uncertainty about the signal structure (à la model uncertainty, e.g., Acemoglu et al. (2015) and Morris and Yildiz (2019)). The second assumption fixes the unpredictability of the agent's type to be constant in the total number of covariates. We impose this because in many applications, machine learning algorithms have millions of inputs, and yet cannot predict the outcome perfectly. Thus our "many covariates" limit does not represent a situation in which the total amount of information grows large, but rather one in which the type function can be arbitrarily complex. We view these two assumptions as useful conceptual benchmarks, but subsequently show that neither is essential for our main results (see Section 5 for details).

The agent’s payoff is determined by his true type and an evaluation, which may be made either by a human evaluator or a black-box evaluator. In either case the evaluation is a conditional expectation of the agent’s type given the agent’s standard covariates and some fraction of the agent’s nonstandard covariates. But the sets of nonstandard covariates that are observed by the black box evaluator and the human evaluator differ in two ways.

First, the black box evaluator observes a larger fraction of the nonstandard covariates than the human evaluator does (since humans cannot process as much information as black box algorithms can). Second, the nonstandard covariates observed by the black box evaluator are a pre-specified set of algorithmic inputs, which are fixed across individuals. For example, a designer of a medical algorithm may specify a set of inputs including (among others) blood type, BMI, and smoking status. The black box algorithm learns a mapping from those inputs into the diagnosis. We view the human evaluator as instead uncovering nonstandard covariates during a conversation, where the specific path of questioning may vary across agents. Thus the human evaluator may end up learning about one individual’s sleep schedule but another individual’s financial situation.

Rather than modeling these conversations directly, we consider an upper bound on the agent’s payoff under human evaluation, where the covariates that the human observes are the ones that maximize the agent’s payoffs (subject to the human’s capacity constraint). We say that the agent prefers the black box if the agent’s expected payoffs are higher under black box evaluation even compared to these best-case conversations with the human.

This comparison essentially reduces to the question of whether the agent prefers an evaluator who observes a larger fraction of (non-targeted) nonstandard covariates about the agent, or an evaluator who observes a smaller but targeted fraction of nonstandard covariates. (Footnote: This question is spiritually related to Akbarpour et al. (2024)'s comparison of the network diffusion value of a small number of targeted seeds versus a larger number of randomly selected seeds. Like them, we will find that a larger number of (non-targeted) inputs is superior, but the mechanisms behind these results are very different; in particular, network structure does not play a role in our results.) Towards this comparison, we first introduce a benchmark, which is the expected payoff that the agent would receive if interacting with an evaluator who observes no nonstandard covariates. We define the value of context to be the improvement in the agent's payoffs under best-case human evaluation, relative to this benchmark. The value of context thus quantifies the extent to which the agent's payoffs can possibly be improved when the evaluator observes nonstandard covariates suited to that agent.

Our first main result says that under our assumptions on the agent’s prior, the expected value of context vanishes to zero as the number of covariates grows large. Thus even though there may be realizations of the type function given which the value of context is large, in expectation it is not. The contrapositive of this result is that if the expected value of context is high in some application, it must be that our assumptions on the prior do not hold, i.e., the agent has some ex-ante knowledge about the predictive roles of the nonstandard covariates.

We prove this result by studying the sensitivity of the evaluator’s expectation to the set of covariates that are revealed. Intuitively, a large value of context requires that the evaluator’s beliefs move sharply after observing certain nonstandard covariates. We show that the largest feasible change in the evaluator’s beliefs can be written as the maximum over a set of random variables, each corresponding to the movement in the evaluator’s beliefs for a given choice of covariates to reveal. The proof proceeds by first reducing this problem to studying the maximum of a growing sequence of (appropriately constructed) i.i.d. variables, and then applying a result from Chernozhukov et al. (2013) to show that this expected maximum concentrates on its expectation as the number of covariates grows large. We conclude by bounding this expectation and demonstrating that it vanishes.

We next use this result to compare the agent's expected payoff under human and black box evaluation, when the total number of covariates is sufficiently large. We show that when the agent prefers a more accurate evaluation—formally, when the agent's payoff is convex in the evaluation—the agent should (eventually) prefer an algorithmic evaluator with access to more covariates over a human evaluator to whom the agent can provide context. And when the agent's payoff is concave in the evaluation, the conclusion is (eventually) reversed. We view these conclusions as relevant not only in the many-covariates limit: We quantify the number of covariates that is needed for our result to hold, and show that it can be quite small. For example, if the agent's utility function satisfies mild regularity conditions, the human evaluator observes 10% of the covariates, and the black box evaluator observes 90%, then our result holds as long as there are at least 14 covariates.

We subsequently strengthen our main results in two ways: First, we show that not only does the expected value of context vanish for each agent, but in fact the expected maximum value of context across agents also vanishes. Thus, the expected value of context is eventually small for everyone in the population. Second, we show that our main results extend when the agent and evaluator interact in a disclosure game, where the agent chooses which nonstandard covariates to reveal, and the evaluator makes inferences about the agent based on which covariates are revealed (given the agent’s equilibrium reporting strategy).

We conclude by examining the role of our assumptions about the agent's prior, and the extent to which our results depend on them. First, we study two variations of our main model, in which the symmetry assumption is relaxed: In the first, we suppose that there is a "low-dimensional" set of covariates that is relevant for predicting the agent's type; in the second, we suppose that the agent knows ex-ante the predictive role of certain nonstandard covariates. In both of these settings, our main results extend partially but can also fail: For example, if the set of relevant covariates is sufficiently small that they can be fully disclosed to the evaluator, then the expected value of context typically will not vanish. Next we show that our results also extend in a model in which the predictability of the agent's type is higher in environments with a larger number of covariates (thus relaxing our second assumption on the prior). Finally we provide an abstract learning condition under which our results extend: It is enough for the informativeness of each individual set of covariates to vanish as the total number of covariates grows large. Together with our main results, these extensions clarify different categories of informational assumptions under which the expected value of context does or does not turn out to be high.

Our model is not meant to be a complete description of the differences between human and black box evaluation. For example, we do not consider human or algorithmic bias (Kleinberg et al., 2017; Gillis et al., 2021), explainability (Yang et al., 2024), preferences for empathetic evaluators, or the possibility that the human evaluator has access to information that is not available to the algorithm (e.g., for privacy protection reasons as in Agarwal et al. (2023)). We also suppose that both evaluators form correct conditional expectations, thus abstracting away from the possibility of algorithmic overfitting and of bounded human rationality (e.g., as considered in Spiegler (2020) and Haghtalab et al. (2021)). (Footnote: The problem of overfitting, while practically important, is a function of how the algorithm is trained. We are interested here in intrinsic differences between the qualitative nature of human and black box evaluation, which are difficult to resolve by training the algorithm differently.) We leave extensions of our model that include these other interesting differences to future work.

1.1 Related Literature

Our paper is situated at the intersection of the literatures on learning (Section 1.1.1) and strategic information disclosure (Section 1.1.2), where our analysis is primarily differentiated from the previous frameworks by our assumption that the agent has model uncertainty (see Section 1.1.1). Our paper is also inspired by a recent empirical literature that compares human and AI evaluation, which we review in Section 1.1.3.

1.1.1 (Asymptotic) Learning

A large literature studies asymptotic learning and agreement across Bayesian agents (Blackwell and Dubins, 1962). Our main result (Theorem 3.1) can be viewed as bounding (in expectation) the differences in beliefs across Bayesian agents who are given different information. As in Vives (1992), Golub and Jackson (2012), Liang and Mu (2019), Harel et al. (2020), and Frick et al. (2023) among others, we quantify the rate of convergence in beliefs. The learning rates that we look at are, however, of a different nature from those studied previously. One important distinction is that these previous papers consider asymptotics as the total amount of information accumulates, while our analysis considers asymptotics with respect to a sequence of information structures that we show are increasingly less informative. A second important difference is that the classic learning models suppose that the agent updates to a signal with a known signal structure, while our agent has uncertainty over the signal structure (as in Acemoglu et al. (2015) and Morris and Yildiz (2019)). Our results characterize the informativeness of this signal in expectation, where the agent’s model uncertainty takes a particular (and new) form motivated by the applications we have in mind.

Finally, our paper is related to Di Tillio et al. (2021), which compares the informativeness of an unbiased signal to the informativeness of a selected signal whose realization is the maximum realization across i.i.d. unbiased signals. Again the key difference is our assumption of model uncertainty—that is, in Di Tillio et al. (2021), the signal structures that are being compared are deterministic and known, while in ours they are random and compared in expectation. In particular, our agent’s prior belief over signal structures can have support on signal processes which are not i.i.d. (for example, it may be that the meaning of one signal is dependent on the meaning of another).

1.1.2 Strategic Information Disclosure

Several literatures study persuasion via strategic information disclosure. Our model—in which the sender has private information about his type vector, and selectively chooses which elements to disclose to a naive receiver—is closest to models of disclosure of hard information (Dye, 1985; Grossman and Hart, 1980), in particular Milgrom (1981). (Footnote: A similar model of information is considered in Glazer and Rubinstein (2004) and Antic and Chakraborty (2023).) The key difference (which follows from our assumption of model uncertainty) is that our sender has uncertainty about how his reports are interpreted. Additionally, our focus is not on examining which incentive-compatible reporting strategy is optimal (Footnote: Indeed, in our main model we do not require choice of an incentive-compatible reporting strategy, since the receiver updates to the sender's disclosure as if it were exogenous information. This is primarily for convenience—we show in Section 4.2 that our results extend in a disclosure game.), but instead on asymptotic limits of belief manipulability as the number of components in the type vector grows large. This latter focus is special to our motivating applications.

Our model also has important differences from the other main strands of the persuasion literature. Unlike models of cheap talk (Crawford and Sobel, 1982), our agent chooses between messages whose meanings are fixed exogenously (through the realization of the joint distribution relating covariates to the type) rather than in an equilibrium. Unlike the literature on Bayesian persuasion (Kamenica and Gentzkow (2011)), our sender chooses which signal realization to share ex-post from a finite set of signal realizations, rather than committing to a flexibly chosen information structure ex-ante. (Footnote: Thus, for example, Bayes plausibility is not satisfied in our setting—the sender's expectation of the receiver's expectation of the state (following disclosure) is generally not the prior expectation of the state.) Indeed, our model gives the sender substantial power to influence the receiver's beliefs relative to this previous literature. It is perhaps surprising, then, that despite the lack of constraints imposed on the sender, we find that the sender is extremely limited in his influence. In our model, this emerges because the sender has a limited choice from a set of information structures, whose informativeness (we show) is vanishing in the total number of covariates. (Footnote: The covariates in our model play a similar role to attributes, although the literature on attributes has focused on choice of which attributes to learn about (e.g., Klabjan et al. (2014) and Liang et al. (2022)), rather than which attributes to disclose for the purpose of persuasion. An exception is Bardhi (2023), who studies a principal-agent problem in which a principal selectively samples attributes to influence an agent decision.)

1.1.3 Human vs AI Evaluation

Recent empirical papers compare the accuracy of human evaluation with AI evaluation, finding that machine learning algorithms outperform experts in problems including medical diagnosis (Rajpurkar et al., 2017; Jung et al., 2017; Agarwal et al., 2023), prediction of pretrial misconduct (Kleinberg et al., 2017; Angelova et al., 2022), and prediction of worker productivity (Chalfin et al., 2016). Nonetheless, many individuals continue to distrust algorithmic predictions (Jussupow and Heinzl, 2020; Bastani et al., 2022; Lai et al., 2023). These findings motivate our goal of understanding whether individuals should prefer human evaluators, and when instead the replacement of human evaluation with algorithmic evaluation is welfare-improving for users, as suggested in Obermeyer and Emanuel (2016) among others.

In principle, human decision-making guided by algorithmic predictions should be superior to either human or algorithmic prediction alone. In practice the evidence is more mixed, with the provision of algorithmic recommendations sometimes leading human decision-makers to less accurate predictions (Hoffman et al., 2017; Angelova et al., 2022; Agarwal et al., 2023). (Footnote: Other papers instead consider algorithmic prediction tools that take human evaluation as an input, with greater success towards improving accuracy (e.g., Raghu et al. (2019)).) The question of how to aggregate human and machine evaluations is thus important but subtle, and depends on (among other things) whether human decision-makers understand the correlation between their information and that of the algorithm (McLaughlin and Spiess, 2022; Gillis et al., 2021; Agarwal et al., 2023). We abstract away from these complexities, focusing instead on (one aspect of) the more basic question of why human oversight is even necessary to begin with. We provide a tractable way of formalizing the advantage of human evaluation, and quantify the size of this advantage.

2 Model

2.1 Setup

Agents are each described by a binary covariate vector $\mathbb{x}_{n}=(x_{1},x_{2},\dots,x_{n})$ and a type $y\in[-\overline{y},\overline{y}]$ (where $0\leq\overline{y}<\infty$), which are structurally related by the function

$$y=f(x_{1},\dots,x_{n}).$$

We refer to $f$ henceforth as the type function. The distribution over covariate vectors is uniform in the population. (Footnote: All of our results extend for arbitrary finite-valued covariates.)

We refer to the covariates indexed to $\mathcal{S}=\{1,\dots,s\}$ as standard covariates and the covariates indexed to $\mathcal{N}=\{s+1,\dots,n\}$ as nonstandard covariates.

Example 1 (Job Interview).

Standard covariates describing a job applicant may include their work history, education level, college GPA, and the coding languages they know. Nonstandard covariates may include their social media activity (e.g., number of followers, posts, likes), wearable device data (e.g., sleep patterns, physical activity level), and hobbies (e.g., whether they are active readers, whether they enjoy extreme sports).

Example 2 (Medical Prediction).

Standard covariates describing a patient may include symptoms, prior diagnoses, family medical history, lab tests and imaging results. Nonstandard covariates may include the patient's religious practices, genetic data, wearable device data, and financial data. (Footnote: See Acosta et al. (2022) for further examples of nonstandard patient covariates that may be predictive, but which are not currently used by clinicians for medical evaluations.)

An evaluation of the agent, $\hat{y}\in[-\overline{y},\overline{y}]$, is described in the following section. The agent has a Lipschitz continuous utility function $u:[-\overline{y},\overline{y}]^{2}\rightarrow\mathbb{R}$, which maps the evaluation $\hat{y}$ and the agent's true type $y$ into a payoff.

Example 3 (Higher Evaluations are Better).

The agent’s payoff is

$$u(\hat{y},y)=\phi(\hat{y})$$

for some increasing $\phi$. This corresponds, for example, to an agent receiving a desired outcome (e.g., a loan or a promotion) with probability increasing in the evaluation.

Example 4 (More Accurate Evaluations are Better).

The agent’s payoff is

$$u(\hat{y},y)=-(\hat{y}-y)^{2}.$$

This corresponds to harms that are decreasing in the accuracy of the evaluation, e.g., medical prediction problems where more accurate evaluations are desired.

2.2 Evaluation of the agent

There are two evaluators: a black box evaluator, henceforth Black Box (it), and a human evaluator, henceforth Human (she). Both form evaluations as an expectation of the agent's type $y$ given observed covariates, so we will introduce notation for these conditional expectations. For any covariate vector $\mathbb{x}_{n}=(x_{1},\dots,x_{n})$ and subset of nonstandard covariates $A\subseteq\mathcal{N}$, let

$$C_{A}(\mathbb{x}_{n})=\{\tilde{x}\in\{0,1\}^{n}:\tilde{x}_{i}=x_{i}\ \forall i\in\mathcal{S}\cup A\} \qquad (2.1)$$

be the set of all covariate vectors that agree with $\mathbb{x}_{n}$ on the covariates with indices in $\mathcal{S}\cup A$. Further define

$$\hat{y}^{f}_{\mathbb{x}_{n}}(A)=\frac{1}{|C_{A}(\mathbb{x}_{n})|}\sum_{x\in C_{A}(\mathbb{x}_{n})}f(x) \qquad (2.2)$$

to be the conditional expectation of the agent's type given their standard covariates and their nonstandard covariates with indices in $A$. We use

$$U^{f}_{\mathbb{x}_{n}}(A)=u\left(\hat{y}^{f}_{\mathbb{x}_{n}}(A),y\right)$$

to denote the agent’s payoff given this evaluation.
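To make (2.1) and (2.2) concrete, the following is a minimal Python sketch (the helper name `evaluate` is ours, not the paper's) that enumerates the cell $C_{A}(\mathbb{x}_{n})$ by brute force and averages $f$ over it. It is only practical for small $n$, since the cell has $2^{n-|\mathcal{S}\cup A|}$ elements.

```python
from itertools import product

def evaluate(f, x, standard, A):
    """Posterior expectation of y = f(x) given the standard covariates and
    the nonstandard covariates with indices in A, per (2.1)-(2.2).

    f        : function from a 0/1 tuple of length n to a type y
    x        : the agent's covariate vector, a tuple of 0/1 of length n
    standard : indices of the standard covariates (always observed)
    A        : indices of the observed nonstandard covariates
    """
    n = len(x)
    fixed = set(standard) | set(A)
    free = [i for i in range(n) if i not in fixed]
    total = 0.0
    # Enumerate the cell C_A(x): every vector agreeing with x on S and A.
    for bits in product([0, 1], repeat=len(free)):
        x_tilde = list(x)
        for i, b in zip(free, bits):
            x_tilde[i] = b
        total += f(tuple(x_tilde))
    return total / 2 ** len(free)
```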

Both the human and black box evaluation take the form (2.2), but the observed sets of nonstandard covariates $A$ differ across the evaluators. Black Box observes the nonstandard covariates in the set $B=\{s+1,\dots,s+b_{n}\}$, where $b_{n}=\lfloor\alpha_{b}\cdot n\rfloor$. (Footnote: One can instead assume that these nonstandard covariates are selected uniformly at random. This will not affect the results of this paper.) Importantly, this set is held fixed across agents. So an individual with covariate vector $\mathbb{x}_{n}$ receives the evaluation $\hat{y}_{\mathbb{x}_{n}}^{f}(B)$ and payoff $U_{\mathbb{x}_{n}}^{f}(B)$ when evaluated by the Black Box. (Footnote: It is not important for our results that $B$ is common across individuals; what we require is that any randomness in $B$ is independent of the agent's covariates and type. For example, if the set $B$ were drawn uniformly at random for each agent, our results would hold.)

Human differs from Black Box in two ways. First, Human has a capacity of $h_{n}=\lfloor\alpha_{h}\cdot n\rfloor$ nonstandard covariates per agent, where $\alpha_{h}<\alpha_{b}$ (i.e., Human cannot process as many inputs as Black Box). Second, Human does not pre-specify which nonstandard covariates to observe, but rather learns these through conversation, and thus potentially observes different nonstandard covariates for each agent. For example, a doctor (evaluator) may pose different questions to different patients (agents) depending on their answers to previous questions. Or a job candidate (agent) might choose to disclose to an interviewer (evaluator) certain nonstandard covariates that are favorable to him.

Rather than modeling the complex process of a conversation, we study the quantity

$$\max_{H\subseteq\mathcal{N},\,|H|\leq\alpha_{h}\cdot n}U_{\mathbb{x}_{n}}^{f}(H) \qquad (2.3)$$

which is the agent's payoff when the posterior expectation about his type is based on those $\alpha_{h}\cdot n$ or fewer covariates that are best for him.

We can interpret this quantity as an upper bound for the agent's payoffs under certain assumptions. First, if the evaluator selects which covariates to observe, then (2.3) is an upper bound on the agent's possible payoffs across all possible evaluator selection rules. Second, if covariates are disclosed by the agent, but the evaluator updates to the disclosed covariates as if they had been chosen exogenously, then again (2.3) represents an upper bound on the agent's possible payoffs. (Footnote: Jin et al. (2021) and Farina et al. (2023) report that the beliefs of experimental subjects fall somewhere in between this naive benchmark and equilibrium beliefs, since subjects do not completely account for the strategic nature of disclosure.)

If however the covariates are disclosed by the agent in a disclosure game, and the evaluator accounts for the strategic nature of this disclosure, then whether (2.3) represents an upper bound will depend on what we assume the agent knows at the time of disclosure. (Footnote: This is roughly because the agent can potentially "sneak in" information about the other covariates via the covariates that are revealed.) We show in Section 4.2 that if the agent knows his entire covariate vector, then (2.3) need not upper bound every agent's payoffs. Nevertheless, we present a different quantity that does upper bound the maximum payoff that any agent can obtain in this disclosure game, and show that our main results extend when we replace (2.3) with this quantity. To streamline the exposition we focus on the prior two interpretations (in which the human evaluator either selects the covariates herself or updates to the agent's disclosures naively), and discuss disclosure games in Section 4.2.

2.3 Value of context

A key input towards understanding the comparison between Human and Black Box is quantifying the extent to which individualized context improves the agent’s payoffs.

Definition 2.1 (Value of Context).

The value of context for an agent with covariate vector $\mathbb{x}_{n}$ and type $y=f(\mathbb{x}_{n})$ is

$$v(f,\mathbb{x}_{n})=\max_{H\subseteq\mathcal{N},\,|H|\leq\alpha_{h}n}U_{\mathbb{x}_{n}}^{f}(H)-U_{\mathbb{x}_{n}}^{f}(\varnothing)$$

i.e., the best possible improvement in the agent's utility when the evaluator additionally observes up to $\alpha_{h}\cdot n$ covariates for the agent.

In general, the value of context depends on the type function $f$ as well as on the agent's own covariate vector $\mathbb{x}_{n}$. (Footnote: The value of context given a specific function $f$ is spiritually related to the communication complexity of $f$ (Kushilevitz and Nisan, 1996).)

Example 5 (High Value of Context).

Let $u(\hat{y},y)=\hat{y}$, i.e., the agent's payoff is the evaluation. Suppose $x_{1}$ is a standard covariate (observed no matter what), while $x_{2},\dots,x_{100}$ are nonstandard covariates. The type $y$ is related to these covariates via the type function

$$y=f(x_{1},\dots,x_{100})=\begin{cases}c&\text{ if }x_{1}=x_{2}\\ -c&\text{ if }x_{1}\neq x_{2}\end{cases}$$

For an agent who can reveal (up to) one covariate and whose covariate vector is $(1,1,\dots,1)$, the value of context is $c$, since revealing $x_{2}=1$ moves the expectation of his type from 0 to $c$. This example corresponds to settings in which some nonstandard covariate substantially moderates the interpretation of a standard covariate. For such type functions $f$, it is important for the evaluator to observe the right nonstandard covariates, and so the value of context can be large.
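As a check on this example, the sketch below (reusing the hypothetical `evaluate` helper from Section 2.2, and scaling the example down from 100 to 10 covariates purely to keep brute-force enumeration cheap) computes the value of context of Definition 2.1 by enumerating all admissible sets $H$; for the all-ones agent it returns $c$, as claimed.

```python
from itertools import combinations

def value_of_context(f, x, standard, nonstandard, h, u):
    """Value of context (Definition 2.1): best improvement in the agent's
    utility from revealing at most h nonstandard covariates, relative to
    revealing none. Requires evaluate() from the Section 2.2 sketch."""
    y = f(x)
    base = u(evaluate(f, x, standard, set()), y)
    best = base  # H = empty set is always admissible
    for k in range(1, h + 1):
        for H in combinations(nonstandard, k):
            best = max(best, u(evaluate(f, x, standard, set(H)), y))
    return best - base

# Scaled-down Example 5: x1 standard, x2..x10 nonstandard, c = 1.
c = 1.0
f = lambda x: c if x[0] == x[1] else -c
u = lambda y_hat, y: y_hat  # the agent's payoff is the evaluation itself
print(value_of_context(f, (1,) * 10, {0}, range(1, 10), h=1, u=u))  # -> 1.0
```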

Example 6 (Low Value of Context).

Suppose the type function in the previous example is instead $y=f(x_{1})=x_{1}$ (leaving all other details of the example unchanged). Then the value of context is 0 for every agent. In this example, nonstandard covariates are irrelevant for predicting the type, so there is no value to the evaluator discovering the "right" covariates.

In what follows, we give the agent uncertainty about $f$ and characterize the agent's expected value of context and expected payoffs, integrating over the agent's belief about $f$. (Footnote: If we interpret the covariates in our model as signals about the type, then the function relating covariates to type corresponds to the signal structure.)

We do this for two reasons. First, in many applications it is not realistic to suppose that the agent knows $f$. For example, a patient who anticipates that a diagnosis will be based on an image scan of his kidney may recognize that there are properties of the image that are indicative of whether he has the condition or not, but likely does not know what the relevant properties are, or how they determine the diagnosis. (Footnote: In the case of a job interview, the function $f$ may reflect particular subjective preferences of the firm, which are initially unknown to the agent.)

Second, the case with uncertainty about $f$ turns out to yield a more elegant analysis than the one in which $f$ is known. That is, although the value of context for specific functions $f$ depends on details of that function and on the agent's own covariate vector, there is a large class of prior beliefs (described in the following section) for which it is possible to draw strong detail-free conclusions about the expected value of context.

2.4 Model Uncertainty

We impose two assumptions on the agent's prior belief about $f$. Together, these assumptions deliver a setting in which many different structural relationships between the covariates and the type are possible (including both ones where the value of context is high and low), but ex-ante those relationships are not known.

The first assumption says that while the agent may know how standard covariates impact the type, he has no ex-ante knowledge about the roles of the nonstandard covariates.

Assumption 1 (Exchangeability).

For every realization of the standard covariates $(x_{1},\dots,x_{s})$, the sequence

$$(Y^{n}_{1},\dots,Y^{n}_{2^{n-s}})\equiv(f(x_{1},\dots,x_{s},x_{s+1},\dots,x_{n}):(x_{s+1},\dots,x_{n})\in\{0,1\}^{n-s}) \qquad (2.4)$$

is finitely exchangeable.

The set $\{f(x_{1},\dots,x_{s},x_{s+1},\dots,x_{n}):(x_{s+1},\dots,x_{n})\in\{0,1\}^{n-s}\}$ ranges over all covariate vectors that share the standard covariate values $(x_{1},\dots,x_{s})$. Assumption 1 says that the joint distribution of these types is ex-ante invariant to permutations of the labels and values of the nonstandard covariates. An agent whose prior satisfies Assumption 1 is thus agnostic about how the nonstandard covariates impact the type.

While our assumption of no prior knowledge about the role of nonstandard covariates is strong, it is consistent with our interpretation of the nonstandard covariates as those covariates for which there is little historical data about correlations. For example, if it were known that a higher GPA positively correlates with on-the-job performance, but not how a large number of social media followers predicts performance, then we would think of GPA as a standard covariate and the number of social media followers as a nonstandard covariate.

Assumption 1 does not restrict how the agent's prior varies with $n$, the number of covariates. In a model in which $x_{1},x_{2},\dots$ were drawn i.i.d. from a type-dependent distribution $F_{y}$, the total quantity of information about $y$ would increase in the number of covariates, and the evaluator's uncertainty about $y$ would vanish as $n$ grew large. This does not seem descriptive of real applications: credit scoring algorithms and healthcare algorithms use millions of covariates, but there remains substantial residual uncertainty about the agent's type. We take the opposite extreme in which the predictability of the agent's type is a primitive of the setting, which is held constant for all $n$. In our model, $n$ is not a measure of the total quantity of information, but instead moderates the richness of the informational environment and the potential complexity of the mapping $f$. Loosely speaking, as $n$ grows large, the agent has a more extensive set of words to describe a fixed unknown $y$. (Footnote: As $n$ grows large, the smallest possible informational size of each covariate (in the sense of McLean and Postlewaite (2002)) vanishes. But we do not require each covariate to be equally informationally relevant in the realized function. So, for example, $f(x_{1},\dots,x_{n})=x_{1}$ can be in the support of the agent's beliefs for $n$ arbitrarily large (see Example 7).)

Assumption 2 (Constant Unpredictability of $Y$).

Fix any realization of the standard covariates $(x_{1},\dots,x_{s})$, and let $(Y^{n}_{1},\dots,Y^{n}_{2^{n-s}})$ be defined as in (2.4) for each $n\in\mathbb{Z}_{+}$. Then for every pair $n^{\prime}>\tilde{n}$, the sequence $(Y^{\tilde{n}}_{1},\dots,Y^{\tilde{n}}_{2^{\tilde{n}-s}})$ and the truncated sequence $(Y^{n^{\prime}}_{1},\dots,Y^{n^{\prime}}_{2^{\tilde{n}-s}})$ are identically distributed.

The statement of Assumption 2 formally depends on the ordering of types within the vector $(Y^{n^{\prime}}_{1},\dots,Y^{n^{\prime}}_{2^{n^{\prime}-s}})$, since this determines which types appear in the truncated sequence $(Y^{n^{\prime}}_{1},\dots,Y^{n^{\prime}}_{2^{\tilde{n}-s}})$. But if we further impose Assumption 1 (and we will always impose these assumptions jointly), then the ordering of types is irrelevant: That is, when Assumption 2 holds for one such ordering, it will hold for all orderings.


It is important to note that in our model, Assumptions 1 and 2 are placed ex-ante on the agent's prior, and not ex-post on the realized function $f$. For example, the function $f(x_{1},\dots,x_{n})=x_{1}$, which says that the only covariate that matters is $x_{1}$, is strongly asymmetric ($x_{1}$ is differentiated from the other covariates) and also features a single "large" covariate (the realization of $x_{1}$ completely determines $y$). Our assumptions do not rule out the possibility of this function. Rather, they require that if this function is considered possible, then certain other functions are as well. (Footnote: Assumption 1 requires that for every permutation $\pi:\{0,1\}^{n}\rightarrow\{0,1\}^{n}$, the function $g_{\pi}$ satisfying $g_{\pi}(x_{1},\dots,x_{n})=f(\pi(x_{1},\dots,x_{n}))$ is also in the support of the agent's beliefs.)

Simple examples of priors satisfying these two assumptions are given below.

Example 7.

Let $y\in\{0,1\}$, in which case the space of possible functions $f:\mathcal{X}\rightarrow\mathcal{Y}$ can be identified with $\{0,1\}^{2^{n}}$. Suppose that for each $n$, the agent has a uniform prior on the set of all functions $\{0,1\}^{2^{n}}$. Then Assumptions 1 and 2 are satisfied.

Example 8.

Suppose there is a distribution $F$ on $[-\overline{y},\overline{y}]$ such that for each $n$,

$$\left(f(x_{1},\dots,x_{s},x_{s+1},\dots,x_{n}):(x_{s+1},\dots,x_{n})\in\{0,1\}^{n-s}\right)\sim_{\text{i.i.d.}}F.$$

Then Assumptions 1 and 2 are satisfied.
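For intuition, here is one way such a prior could be simulated (a sketch; the name `sample_type_function` is ours, and taking $F$ to be Uniform$[-\overline{y},\overline{y}]$ is an illustrative assumption): draw an independent type for every cell of covariate values, so that permuting nonstandard labels or values leaves the joint law unchanged, and the draw for a given $n$ agrees in distribution with its truncation to fewer covariates.

```python
import random
from itertools import product

def sample_type_function(n, y_bar=1.0, seed=None):
    """Sample a type function in the spirit of Example 8 with
    F = Uniform[-y_bar, y_bar]: each of the 2^n covariate cells gets an
    independent draw from F, making the prior exchangeable over nonstandard
    covariates (Assumption 1) and consistent across n (Assumption 2)."""
    rng = random.Random(seed)
    table = {cell: rng.uniform(-y_bar, y_bar)
             for cell in product([0, 1], repeat=n)}
    return lambda x: table[tuple(x)]
```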


Priors that violate these assumptions include the following.

Example 9 (Only One Covariate is Relevant).

The type is equal to the value of the nonstandard covariate $x_{I}$, where the index $I$ is drawn uniformly at random from $\mathcal{N}$. Then Assumption 1 fails. (Footnote: Suppose $n=2$, and both covariates are nonstandard. Then under the agent's prior, $f\in\{\hat{f},\tilde{f}\}$, where $\hat{f}(1,1)=\hat{f}(1,0)=1$ while $\hat{f}(0,1)=\hat{f}(0,0)=0$, and $\tilde{f}(1,1)=\tilde{f}(0,1)=1$ while $\tilde{f}(1,0)=\tilde{f}(0,0)=0$. So the agent knows with certainty that $f(1,1)=1$ but $f(0,0)=0$, in violation of Assumption 1.) This example is consistent with exchangeability in the labels of the nonstandard covariates, but not with exchangeability in their realizations.

Example 10 (Higher Values are Better).

The value of $f(\mathbb{x}_{n})$ is (independently) drawn from a uniform distribution on $[1,2]$ if $x_{s+1}=1$, and (independently) drawn from a uniform distribution on $[0,1]$ if $x_{s+1}=0$. Then Assumption 1 fails.

We view these two assumptions as useful conceptual benchmarks, but neither is necessary for our subsequent results. In Section 5, we explore how far our main results generalize under different relaxations of Assumptions 1 and 2.

2.5 Expected Value of Context

We now define the expected value of context from the point of view of an agent who knows his covariate vector $\mathbb{x}_{n}$ but does not know the function $f$ (and hence also does not know his type $y=f(\mathbb{x}_{n})$). As we show in Section 4.1, the assumption that the agent knows $\mathbb{x}_{n}$ is immaterial for the results.

Definition 2.2 (Expected Value of Context).

For every $n\in\mathbb{Z}_{+}$ and covariate vector $\mathbb{x}_{n}\in\{0,1\}^{n}$, the expected value of context is

$$V(n,\mathbb{x}_{n})=\mathbb{E}\left[v(f,\mathbb{x}_{n})\right].$$

This quantity tells us the extent to which context improves the agent’s payoffs in expectation.

We similarly compare evaluators based on the expected payoff that the agent receives.

Definition 2.3.

Consider any agent with covariate vector $\mathbb{x}_{n}$. If

$$\mathbb{E}\left[\max_{H\subseteq\mathcal{N},\,|H|\leq\alpha_{h}\cdot n}U_{\mathbb{x}_{n}}^{f}(H)\right]<\mathbb{E}\left[U_{\mathbb{x}_{n}}^{f}(B)\right] \qquad (2.5)$$

then say that the agent prefers the black box evaluator. And if

$$\mathbb{E}\left[\min_{H\subseteq\mathcal{N},\,|H|\leq\alpha_{h}\cdot n}U_{\mathbb{x}_{n}}^{f}(H)\right]>\mathbb{E}\left[U_{\mathbb{x}_{n}}^{f}(B)\right] \qquad (2.6)$$

then say that the agent prefers the human evaluator.

These definitions correspond to a thought experiment in which (for example) a patient has a choice between being seen by a doctor or assessed by an algorithm. If the patient chooses the algorithm, his standard covariates and $\alpha_{b}\cdot n$ arbitrarily chosen nonstandard covariates will be sent to the algorithm. If the patient chooses the doctor, he will engage in a conversation with the doctor, where his standard covariates and $\alpha_{h}\cdot n$ selected nonstandard covariates will be revealed. Which should the patient choose?

The first part of Definition 2.3 compares the agent's expected payoff under black box evaluation with the best-case expected payoff under human evaluation, namely when the human evaluator observes those (up to) $\alpha_{h}\cdot n$ covariates that maximize the agent's payoffs. If the agent's expected payoff is nevertheless higher under black box evaluation even after biasing the agent towards the human in this way, we say that the agent prefers to be evaluated by the black box. The second part of the definition compares the agent's expected payoff under black box evaluation with the worst-case expected payoff under human evaluation, namely when the human evaluator observes those (up to) $\alpha_{h}\cdot n$ covariates that minimize the agent's payoffs. If the agent's expected payoff is lower under black box evaluation even after biasing the agent against the human in this way, then we say that the agent prefers to be evaluated by the human. (Footnote: In Section 4.2 we further discuss the extent to which these interpretations are valid when the evaluator also updates her beliefs to the selection of covariates.)

These are clearly very conservative criteria for what it means to prefer the human or the black box. In practice, we would expect the set of revealed covariates $H$ to be intermediate to the two cases considered in Definition 2.3, i.e., that $H$ neither maximizes nor minimizes the agent's payoffs. (Footnote: Angelova et al. (2022) provide evidence that some judges condition on irrelevant defendant covariates when predicting misconduct rates.) But if we can conclude either that the agent prefers the black box evaluator or the human evaluator according to Definition 2.3, then the same conclusion would hold for these more realistic models of $H$.

3 Main Results

Section 3.1 characterizes the expected value of context in a simple example. Section 3.2 presents our first main result, which says that the expected value of context vanishes to zero as the number of covariates grows large. Section 3.3 compares human and black box evaluators.

3.1 Example

Suppose there are two covariates, $x_{1}$ and $x_{2}$, both nonstandard. For each covariate vector $\mathbb{x}\in\{0,1\}^{2}$, define the random variable $Y_{\mathbb{x}}=f(\mathbb{x})$, where the randomness is in the realization of $f$.

$X_{1}$   $X_{2}$   $Y_{\mathbb{x}}$
0   0   $Y_{00}$
0   1   $Y_{01}$
1   0   $Y_{10}$
1   1   $Y_{11}$
Table 1: The four possible covariate vectors and their associated types.

The agent has utility function $u(\hat{y},y)=\hat{y}$ and covariate vector $(1,1)$. Suppose Human observes up to one nonstandard covariate; then, there are three possibilities for what the evaluator observes. If Human observes $x_{1}=1$, her evaluation is

$$Z_{1}\equiv\frac{Y_{10}+Y_{11}}{2}.$$

If Human observes $x_{2}=1$, her evaluation is

$$Z_{2}\equiv\frac{Y_{01}+Y_{11}}{2}.$$

And if Human observes no nonstandard covariates, then her evaluation remains the unconditional average

$$Z_{\varnothing}\equiv\frac{Y_{00}+Y_{01}+Y_{10}+Y_{11}}{4}.$$

So the expected value of context for this agent is

$$\mathbb{E}\left[\max\left\{Z_{\varnothing},Z_{1},Z_{2}\right\}-Z_{\varnothing}\right]. \qquad (3.1)$$

Suppose $n$ grows large with up to $h_{n}=\lfloor n/2\rfloor$ covariates observed. There are two opposing forces affecting the value of context. First, when $n$ is larger there are more distinct sets of covariates that can be revealed to Human, and hence the max in (3.1) is taken over a larger number of posterior expectations. This increases the value of context. On the other hand, each $Z_{k}$ is a sample average, and the number of elements in this sample average also grows in $n$. (Footnote: For example, observing $X_{1}=1$ with $n=2$ gives the evaluator a posterior expectation of $(Y_{10}+Y_{11})/2$, while the same observation gives the evaluator a posterior expectation of $(Y_{100}+Y_{101}+Y_{110}+Y_{111})/4$ if $n=3$.) By the law of large numbers, each $Z_{k}$ thus concentrates on its expectation (which is common across $k$) as $n$ grows large, so the difference between any $Z_{k}$ and $Z_{k^{\prime}}$ grows small. What we have to determine is whether the growth rate in the number of subsets of nonstandard covariates (of size $\leq h_{n}$) is sufficiently large such that the maximum difference in evaluations across these sets is nevertheless asymptotically bounded away from zero. The answer turns out to be no.
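The following Monte Carlo sketch makes this race concrete under the prior of Example 7 (so each cell's type is an independent fair coin flip), taking $s=0$, $u(\hat{y},y)=\hat{y}$, the all-ones agent, and $h_{n}=\lfloor n/2\rfloor$; these specific choices are ours, purely for illustration. The estimated expected value of context shrinks as $n$ grows, in line with Theorem 3.1 below.

```python
import numpy as np
from itertools import combinations

def expected_value_of_context(n, trials=300, seed=0):
    """Estimate E[max_H Z_H - Z_empty] under Example 7's prior (each of the
    2^n cell types i.i.d. Bernoulli(1/2)), for the all-ones agent with
    s = 0 and |H| <= floor(n/2)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(2 ** n)
    masks = []  # one boolean mask per nonempty admissible disclosure set H
    for k in range(1, n // 2 + 1):
        for H in combinations(range(n), k):
            m = np.ones(2 ** n, dtype=bool)
            for i in H:
                m &= ((idx >> i) & 1) == 1  # keep vectors with x_i = 1
            masks.append(m)
    total = 0.0
    for _ in range(trials):
        f = rng.integers(0, 2, size=2 ** n)   # one draw of the type function
        z_empty = f.mean()
        z_best = max(z_empty, max(f[m].mean() for m in masks))
        total += z_best - z_empty
    return total / trials

for n in (4, 6, 8, 10):
    print(n, round(expected_value_of_context(n), 3))
```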

3.2 The Expected Value of Context

Our main result says that for every agent, the expected value of context (as defined in Definition 2.2) vanishes as $n$ grows large.

Theorem 3.1.

Suppose Assumptions 1 and 2 hold. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large, i.e.,

$$\lim_{n\rightarrow\infty}V(n,\mathbb{x}_{n})=0.$$

Thus although the value of context may be substantial for certain type functions (such as in Example 5), it does not matter on average across these functions when the agent's prior satisfies Assumptions 1 and 2. This also implies that for sufficiently large $n$, the provision of context does not "typically" matter; that is, the probability that the agent gains substantially from targeted information acquisition is small.

The core of the proof of Theorem 3.1 is an argument that the extent to which context can change the evaluator's posterior expectation vanishes in the number of covariates. We outline that argument here. For each $n$, there are $K_{n}=\sum_{j=0}^{\lfloor\alpha_{h}n\rfloor}\binom{n-s}{j}$ sets of $\alpha_{h}n$ (or fewer) nonstandard covariates that can be disclosed. We can enumerate and index these sets to $k=1,\dots,K_{n}$. Each set $k$ induces a posterior expectation $Z_{k}$, which is a sample average of random variables $Y_{x}\equiv f(x)$. The expected value of context (for this utility function) is

$$\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\right]-\mathbb{E}[Z_{\varnothing}]$$

where $Z_{\varnothing}$ is Human's posterior expectation given observation of standard covariates only. Normalizing $\mathbb{E}[Z_{\varnothing}]=0$, it remains to study properties of the first-order statistic $\max_{1\leq k\leq K_{n}}Z_{k}$.

There are two challenges to analyzing this quantity. First, the correlation structure of $Z_{1},\dots,Z_{K_{n}}$ can be complex: The variables $Z_{k}$ are neither independent (because the same random variable $Y_{x}$ can appear as an element in different sample averages $Z_{k},Z_{k^{\prime}}$) nor identically distributed (because the sample averages are of different sizes depending on how many nonstandard covariates are revealed). The second challenge is that the length of the sequence $(Z_{1},\dots,Z_{K_{n}})$ grows exponentially in $n$. Thus even though each term within the maximum eventually converges to a normally distributed random variable (with shrinking variance), the errors of each term may in principle accumulate in a way that the maximum grows large.

Our approach is to first construct new i.i.d. variables $Z^{iid}_{k}$ with the property that

$$\mathbb{E}\left[\max\{Z_{1},\dots,Z_{K_{n}}\}\right]\leq\mathbb{E}\left[\max\{Z^{iid}_{1},\dots,Z^{iid}_{K_{n}}\}\right]. \qquad (3.2)$$

Applying a result from Chernozhukov et al. (2013), we show that $\max_{1\leq k\leq K_{n}}Z_{k}^{iid}$ (properly normalized) converges in distribution to $\max_{1\leq k\leq K_{n}}Z_{k}^{Normal}$, where (due to properties of our problem) $Z_{k}^{Normal}\sim_{iid}\mathcal{N}\left(0,\frac{1}{2^{n(1-\alpha_{h})-s}}\right)$. Having reduced the analysis to studying the expected maximum of i.i.d. Gaussian variables, classic bounds apply to show that this quantity is no more than

$$\frac{1}{2^{n(1-\alpha_{h})-s}}\sqrt{\log(K_{n})}. \qquad (3.3)$$

This display quantifies the importance of each of the two forces discussed in Section 3.1. First, as $n$ grows larger, the number of posterior expectations $K_{n}=\sum_{j=0}^{\lfloor\alpha_{h}n\rfloor}\binom{n-s}{j}\leq 2^{n-s}$ grows exponentially in $n$, increasing the expected value of context. But second, as $n$ grows larger, each $Z_{k}$ concentrates on its expectation, where its variance, $\frac{1}{2^{n(1-\alpha_{h})-s}}$, decreases exponentially in $n$. What the bound in display (3.3) tells us is that the exponential growth in the number of variables is eventually dominated by the exponential reduction in the variance of each variable, yielding the result.
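A quick numerical reading of this bound (our own illustration, taking $s=0$ and $\alpha_{h}=0.1$ as example values) shows the exponentially shrinking variance term overwhelming the slowly growing $\sqrt{\log K_{n}}$ term:

```python
import math

def context_bound(n, alpha_h=0.1, s=0):
    """The bound in display (3.3): the variance term 1 / 2^(n(1-alpha_h)-s)
    times sqrt(log K_n), where K_n counts the admissible disclosure sets."""
    K_n = sum(math.comb(n - s, j) for j in range(math.floor(alpha_h * n) + 1))
    variance = 2.0 ** -(n * (1 - alpha_h) - s)
    return variance * math.sqrt(math.log(K_n))

for n in (20, 40, 80, 160):
    print(n, context_bound(n))  # decreases rapidly toward zero
```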


This proof sketch also clarifies the role of Assumption 1. As we show in Section 5.4, the statement of the theorem extends so long as the evaluator's posterior expectation $Z_{k}$ concentrates on its expectation sufficiently quickly as $n$ grows large. Roughly speaking, this means that the informativeness of any specific set of covariates is decreasing in the total number of covariates. Thus the precise symmetry imposed by Assumption 1 is not critical for Theorem 3.1 to hold.

On the other hand, the conclusion of Theorem 3.1 can fail if the agent has substantial prior knowledge about how $y$ is related to the covariates.

Example 11.

Let $s=0$, so that there are no standard covariates. Suppose that for each $n$,

$$y=f(x_{1},\dots,x_{n})=\frac{1}{n}\sum_{i=1}^{n}x_{i}\cdot U$$

where $U$ is a uniform random variable on $[0,1]$. This model violates Assumption 1, since it is known that higher realizations of the agent's covariates are good news about the agent's type. The conclusion of Theorem 3.1 also does not hold: For any $n$, the evaluator's prior expectation is $\mathbb{E}[f(\mathbb{x}_{n})]=1/4$. But if $\lfloor\alpha\cdot n\rfloor$ covariates are revealed to be 1, the evaluator's posterior expectation is equal to $\frac{1}{4}+\frac{1}{4}\frac{\lfloor\alpha n\rfloor}{n}$. So the expected value of context for an agent with $\mathbf{x}_{n}=(1,\dots,1)$ is asymptotically bounded away from zero.

In Section 5 we explore several relaxations of Assumptions 1 and 2. Our first relaxation of Assumption 1 supposes that there is a "low-dimensional" set of covariates that is predictive of the agent's type, while the remaining covariates are irrelevant. The second relaxation supposes that there is a subset of nonstandard covariates whose effects are known. We also consider a relaxation of Assumption 2 where the evaluator's ability to predict $Y$ is increasing in the total number of covariates that define the type. We formalize these extensions of our main model and examine the extent to which Theorem 3.1 extends.

3.3 Human versus Black Box

We now turn to the question of when the agent should prefer the human evaluator and when the agent should prefer the black box evaluator.

Assumption 3.

The agent's expected utility can be written as $\mathbb{E}[\phi(\hat{y})]$ for some twice continuously differentiable function $\phi$. (Footnote: Restricting to utility functions that depend on a posterior mean is a common assumption in the literature on information design, see e.g., Kamenica and Gentzkow (2011), Frankel (2014) and Dworczak and Martini (2019).) Moreover, there exists $C<\infty$ such that

$$\frac{\sup_{\hat{y}\in[-\overline{y},\overline{y}]}|\phi^{\prime}(\hat{y})|}{\inf_{\hat{y}\in[-\overline{y},\overline{y}]}|\phi^{\prime\prime}(\hat{y})|}<\frac{C}{2}.$$

Roughly speaking, the numerator describes the sensitivity of the function $\phi$ to the evaluation, and the denominator describes the curvature of the function $\phi$. The assumption thus says that the curvature of the function must be sufficiently large relative to its slope. While there is no formal relationship, the LHS is evocative of the coefficient of absolute risk aversion of the function $\phi$. (Footnote: Recall that the coefficient of absolute risk aversion of the function $\phi$ is $-\frac{\phi^{\prime}(\hat{y})}{\phi^{\prime\prime}(\hat{y})}$.)

Theorem 3.2.

Suppose Assumptions 1-3 hold, and let

$$N=\min\left\{n\in\mathbb{Z}_{+}:(\alpha_{b}-\alpha_{h})n-\frac{1}{2}\log_{2}(n)-1>\log_{2}(C)\right\}. \qquad (3.4)$$

Then:

  • (a) If $\phi$ is strictly convex, the agent prefers the black box evaluator for all $n\geq N$.

  • (b) If $\phi$ is strictly concave, the agent prefers the human evaluator for all $n\geq N$.

The comparisons in this theorem apply for reasonably small $N$. For example, let $C=100$, in which case the restriction in Assumption 3 is quite weak. Figure 1 fixes $\alpha_{h}=0.1$ and plots $N$ for different values of $\alpha_{b}$. If, say, the human evaluator observes 10% of covariates while Black Box observes 90%, then the comparisons in Theorem 3.2 hold for all $n\geq 14$.

[Figure 1: Let $C=100$ and $\alpha_{h}=0.1$. Then the comparisons in Theorem 3.2 hold for all $n\geq N$ as depicted here.]
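A direct reading of (3.4) gives a small helper for computing $N$ (a sketch, with a name of our choosing; we read the displayed formula literally, so outputs may differ slightly from the values reported around Figure 1 if additional floor or rounding conventions enter the paper's own computation):

```python
import math

def threshold_N(alpha_b, alpha_h, C):
    """Smallest integer n with (alpha_b - alpha_h)*n - log2(n)/2 - 1 > log2(C),
    reading display (3.4) literally."""
    n = 1
    while (alpha_b - alpha_h) * n - 0.5 * math.log2(n) - 1 <= math.log2(C):
        n += 1
    return n

# e.g., threshold_N(alpha_b=0.9, alpha_h=0.1, C=100)
```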

The case of convex $\phi$ (Part (a)) corresponds to a preference for more accurate evaluations. (Footnote: Consider any two sets of covariates $A\subset A^{\prime}$ and let $\hat{y}_{A}$, $\hat{y}_{A^{\prime}}$ be the corresponding posterior expectations. The distribution of $\hat{y}_{A^{\prime}}$ (i.e., the posterior expectation that conditions on more information) is a mean-preserving spread of the distribution of $\hat{y}_{A}$. When $\phi$ is convex, the former leads to a higher expected utility.) Such an agent "prefers more accurate evaluations" in the sense that giving the evaluator better information (in the standard Blackwell sense) leads to an improvement in the agent's expected utility. Such an agent prefers for the evaluation to be based on more information (advantaging Black Box), but also prefers for the evaluation to be based on more relevant covariates (advantaging Human). We show that what eventually dominates is how many covariates the evaluators observe, not how they are selected; for an agent who prefers accuracy, this favors the Black Box.

Part (b) of Theorem 3.2 says that if instead $\phi$ is concave, then the agent should eventually prefer the human evaluator. We conclude this section with example decision problems that induce utility functions satisfying the conditions of either part of the theorem.

Example 12.

Suppose the agent receives a dollar wage equal to the evaluation, and is risk averse in money. Then his utility function is $u(\hat{y},y)=\phi(\hat{y})$ for some increasing and concave $\phi$, and Part (b) of Theorem 3.2 says that the agent prefers to be evaluated by the human.

Example 13.

Suppose the agent's type is $y\in\{0,1\}$, and the evaluator chooses an action $a$ based on the observed covariates. The evaluator and agent share the utility function $-\mathbb{E}[(a-y)^{2}]$. The evaluator's optimal action is $a=\hat{y}$, and the agent's expected payoff given this action is

$$\mathbb{E}\left[-(\hat{y}-y)^{2}\right]=\mathbb{E}\left[-\left(\hat{y}(1-\hat{y})^{2}+(1-\hat{y})\hat{y}^{2}\right)\right]=\mathbb{E}\left[\phi(\hat{y})\right]$$

where ϕ(y^)=y^2y^\phi(\hat{y})=\hat{y}^{2}-\hat{y} is convex. So Part (a) of Theorem 3.2 says that the agent eventually prefers evaluation by the black box evaluator. Although the conditions of Theorem 3.2 are no longer met when yy is not binary, we show in Appendix A.7 that the conclusion of Part (a) of Theorem 3.2 generalizes for arbitrary yy given the mean-squared error payoff function described in this example.
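As a quick numerical check of the identity above (our own illustration, assuming $y\sim\text{Bernoulli}(\hat{y})$ with the arbitrary value $\hat{y}=0.3$):

```python
import random

random.seed(0)
y_hat = 0.3                        # an illustrative posterior expectation
ys = [random.random() < y_hat for _ in range(200_000)]  # y ~ Bernoulli(y_hat)
monte_carlo = sum(-(y_hat - y) ** 2 for y in ys) / len(ys)
closed_form = y_hat ** 2 - y_hat   # phi(y_hat) from the display above
print(monte_carlo, closed_form)    # agree up to sampling error (about -0.21)
```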

4 Extensions

We now strengthen our main results (Theorems 3.1 and 3.2) in the following ways. In Section 4.1, we show that not only does the expected value of context vanish for each individual agent, but the expected maximum value of context across agents also vanishes. That is, in expectation the most that context can benefit any agent in the population is small. From this, it is immediate that our main results also extend to a generalization of our model in which the agent has uncertainty over his covariate vector. In Section 4.2, we show that our main results extend when the agent and evaluator interact in a disclosure game, wherein the evaluator updates his beliefs in response to the agent’s strategic choice of what to disclose.

4.1 Max value of context across agents

So far we have studied the expected value of context for a single agent. If we instead ask whether the firm should use human or algorithmic evaluation—for example, whether a hospital should automate diagnoses or rely on doctor evaluations—other statistics may also be relevant. For example, it may matter whether the value of context is large for any agent in the population (e.g., because a lawsuit regarding algorithmic error may be brought on the basis of harm to any specific individual (Jha, 2020)). We thus study the expected maximum value of context, as defined below.

Definition 4.1.

For any $n\in\mathbb{Z}_{+}$, the expected maximum value of context is

$V^{\text{max}}(n)=\mathbb{E}\left[\max_{\mathbb{x}_{n}\in\{0,1\}^{n}}v(f,\mathbb{x}_{n})\right].$

The following corollary says that this quantity also vanishes as nn grows large.

Corollary 1.

Suppose Assumptions 1 and 2 hold. Then the expected maximum value of context vanishes to zero as $n$ grows large, i.e., $\lim_{n\rightarrow\infty}V^{\text{max}}(n)=0$.

Thus, the expected value of context vanishes uniformly across agents in the population. This result immediately implies that Theorems 3.1 and 3.2 extend to any generalization of our model in which the agent has uncertainty not only over $f$ but also over his own covariate vector $\mathbb{x}_{n}$.

4.2 Strategic Disclosure

So far we have remained agnostic as to whether the agent or the evaluator chooses which nonstandard covariates are revealed, assuming in either case that the evaluator updates as if the covariates were revealed exogenously. We now consider a more traditional disclosure game, in which the agent chooses which nonstandard covariates to reveal, and the human evaluator updates her beliefs about the agent’s type in part based on which covariates are chosen.

For any fixed function $f$, call the following an $f$-context disclosure game: There are two players, the agent and the evaluator. The function $f$ is common knowledge. (We do not interpret this assumption literally; at the other extreme, where $f$ is unknown to the agent, there is no informational content in which covariates the agent chooses to reveal, as they are all symmetric from the agent’s point of view.) The set of possible disclosures $\mathcal{D}$ is the set of all pairs $(H,(x_{i})_{i\in H})$ consisting of a set of nonstandard covariates $H\subseteq\mathcal{N}$ and values for those covariates. A disclosure $d=(H,(x^{\prime}_{i})_{i\in H})$ is feasible for an agent with covariate vector $(x_{1},\dots,x_{n})$ if the disclosed covariate values are truthful, i.e., $x_{i}=x_{i}^{\prime}$ for every $i\in H$.

The agent chooses a disclosure strategy, which is a map

$\sigma:\{0,1\}^{n}\rightarrow\mathcal{D}$

from covariate vectors to feasible disclosures. The agent then privately observes his covariate vector $\mathbb{x}_{n}$ and discloses $\sigma(\mathbb{x}_{n})$. The evaluator observes this disclosure and chooses an action $\hat{y}$. That is, the evaluator’s strategy is a function $\sigma_{E}:\mathcal{D}\rightarrow[-\overline{y},\overline{y}]$. The evaluator’s payoff is $-(\hat{y}-y)^{2}$ and the agent’s payoff is some function $u(\hat{y})$.
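To fix ideas, here is a minimal sketch of the feasible-disclosure set for a given covariate vector, assuming (as an illustration) the capacity constraint from our main model of at most $\lfloor\alpha_{h}\cdot n\rfloor$ nonstandard covariates; the function name and parameter values are our own:

```python
from itertools import combinations

def feasible_disclosures(x, alpha_h):
    """All truthful disclosures (H, values) for nonstandard covariate vector x,
    where H ranges over index sets of size at most floor(alpha_h * len(x))."""
    n = len(x)
    cap = int(alpha_h * n)
    return [(H, tuple(x[i] for i in H))
            for size in range(cap + 1)
            for H in combinations(range(n), size)]

# n = 5 nonstandard covariates, capacity floor(0.4 * 5) = 2:
# C(5,0) + C(5,1) + C(5,2) = 1 + 5 + 10 = 16 feasible disclosures.
print(len(feasible_disclosures((1, 0, 1, 1, 0), alpha_h=0.4)))
```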

In this section we focus on pure-strategy Perfect Bayesian Nash equilibria (PBE) of this game, henceforth simply equilibria. (A similar result holds for mixed-strategy equilibria, as demonstrated in the appendix.)

Definition 4.2.

Let $v^{D}(f,\mathbb{x}_{n})$ denote the highest payoff that an agent with covariate vector $\mathbb{x}_{n}$ receives in any pure-strategy equilibrium of the $f$-context disclosure game. The expected maximum value of context disclosure is

$V^{D}(n)=\mathbb{E}\left[\max_{\mathbb{x}_{n}\in\{0,1\}^{n}}v^{D}(f,\mathbb{x}_{n})\right].$

We show that the best payoff an agent can receive in any pure-strategy equilibrium of the $f$-context disclosure game is bounded above by the maximum value of context across agents.

Proposition 4.1.

Suppose Assumptions 1 and 2 hold. Then for all $n$,

$V^{D}(n)\leq V^{\text{max}}(n).$

Thus, applying Proposition 4.1 and Corollary 1, our previous results extend.

5 Relaxing our Assumptions on the Prior

As shown in Example 11, our main results can fail if the assumption of symmetric uncertainty over the role of the nonstandard covariate values (Assumption 1) is broken. We now propose two relaxations of Assumption 1 and one relaxation of Assumption 2, and explore the extent to which our main result extends. In Section 5.1, we suppose that it is known ex-ante that some $r_{n}$ covariates are relevant, while the remaining $n-r_{n}$ are not, so that even as $n$ grows to infinity, the effective number of covariates potentially grows more slowly. In Section 5.2, we allow the agent to have prior knowledge about the role of certain nonstandard covariates. In Section 5.3, we consider a model in which the predictability of the agent’s type is increasing in the total number of covariates. Finally, Section 5.4 provides an abstract condition on the learning environment under which our main results hold, which requires the evaluator’s uncertainty about the agent’s type to grow sufficiently fast in $n$.

5.1 Not all covariates are relevant

Under Assumption 1, it cannot be known ex-ante that some covariates are irrelevant for predicting the type. The assumption thus rules out settings such as the following.

Example 14.

The evaluator is a job interviewer. Although in principle there are infinitely many covariates that could describe a job candidate, it is understood that not all of them are actually relevant to the candidate’s ability. That is, there is some potentially large (but not exhaustive) set of covariates that contains all of the predictive content about the candidate’s ability; the remaining covariates are either irrelevant for predicting ability, or are predictive only because they correlate with other intrinsically predictive covariates.

If irrelevant covariates cannot be disclosed to the evaluator, then we return to our main model with a smaller $n$, and our previous results extend directly. The more novel case is the one in which it is known that $n-r_{n}$ covariates are irrelevant, but those covariates can still be disclosed to the evaluator (for example, because it is not commonly understood that they are irrelevant). (To see the difference, consider the case in which the agent simply wants the evaluator to hold a higher posterior expectation. The irrelevant covariates create noise, and for some realizations of $f$ it may be that disclosing an irrelevant covariate leads to a higher evaluation.)

To model this, we suppose there is a sequence of sets of relevant covariates $(R_{1},R_{2},\dots)$ such that each $R_{n}$ includes the standard covariates in $\mathcal{S}$ and is of size $s+r_{n}$, where $r_{n}=\lfloor\alpha_{r}\cdot n\rfloor$ is the (known) number of relevant nonstandard covariates. Moreover, without loss of generality, we index these sets so that the relevant covariates are $x_{s+1},x_{s+2},\dots,x_{s+r_{n}}$. The irrelevance of the remaining covariates is reflected in the following assumption, which says that, holding fixed the values of the relevant covariates, the values of the irrelevant covariates do not change the type.

Assumption 4 (Irrelevance).

There is a function $g(x_{1},\dots,x_{s+r_{n}})$ such that

$f(x_{1},\dots,x_{n})=g(x_{1},\dots,x_{s+r_{n}})$

for every $(x_{1},\dots,x_{n})\in\{0,1\}^{n}$.

We then modify Assumptions 1 and 2 to apply only to the relevant covariates.

Assumption 5 (Exchangeability).

For every realization of $(x_{1},\dots,x_{s})$, the sequence

$(\widetilde{Y}^{n}_{1},\dots,\widetilde{Y}^{n}_{2^{r_{n}}})\equiv(g(x_{1},\dots,x_{s},x_{s+1},\dots,x_{s+r_{n}}):(x_{s+1},\dots,x_{s+r_{n}})\in\{0,1\}^{r_{n}}) \qquad (5.1)$

is finitely exchangeable.

Assumption 6 (Constant Unpredictability of $Y$).

For every realization of the standard covariates $(x_{1},\dots,x_{s})$ and every pair $n^{\prime}>n$, the sequence $(\widetilde{Y}^{n}_{1},\dots,\widetilde{Y}^{n}_{2^{r_{n}}})$ and the truncated sequence $(\widetilde{Y}^{n^{\prime}}_{1},\dots,\widetilde{Y}^{n^{\prime}}_{2^{r_{n}}})$ are identical in distribution.

Our main model is otherwise unchanged—in particular, we allow the agent to disclose any of the $n-s$ nonstandard covariates, including those which are irrelevant. We show that our previous results extend so long as $\alpha_{h}<\alpha_{r}$.

Proposition 5.1.

Suppose Assumptions 5 and 6 hold, where $\alpha_{h}<\alpha_{r}$. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large.

The case where $\alpha_{r}<\alpha_{h}$ (violating the assumption of the result) corresponds to a setting in which the number of relevant covariates is so small that the agent can disclose all of them. For example, if a job candidate is convinced that only 10 nonstandard covariates are actually relevant for predicting his on-the-job ability, and all of these nonstandard covariates can be shared during a job interview, then our main results do not extend and we should think of the value of context as being potentially large. On the other hand, if the set of relevant covariates is low-dimensional relative to the total number of covariates, but its elements are still too numerous to be fully revealed, then our main results do extend.

This result suggests that whether human or black box evaluation is more appropriate should be determined in part based on whether the available signal is concentrated in a small number of covariates (favoring the human evaluator) or spread out across a large number of covariates (favoring the black box evaluator). The same application may transition between these regimes over time. For example, in a medical setting where black box diagnosis is highly accurate based on non-interpretable features of an image scan, it may not be possible to communicate sufficient information via any small number of covariates. But if the predictive features of the image are subsequently better understood and defined, then it may be that a small set of (newly defined) features does eventually capture all of the signal content, and can be fully disclosed in a conversation.

5.2 Certain nonstandard covariates have known effects

Another possibility is that the agent knows how certain nonstandard covariates are correlated with the type.

Example 15.

The agent is a patient who resided near Chernobyl at the time of the 1986 nuclear disaster. The agent is being evaluated for potential thyroid conditions, and knows that this part of his history increases the probability of a thyroid condition.

Specifically, suppose there is a set $K\subseteq\{1,\dots,n\}$ of covariate indices whose effects are known. The set $K$ includes the standard covariates, but possibly also includes some nonstandard covariates. Without loss of generality, we index these as $x_{1},\dots,x_{|K|}$. We weaken Assumption 1 to the following:

Assumption 7 (Exchangeability).

For every realization of the covariates $(x_{1},\dots,x_{|K|})$,

$(Y^{n}_{1},\dots,Y^{n}_{2^{n-|K|}})\equiv(f(x_{1},\dots,x_{|K|},x_{|K|+1},\dots,x_{n}):(x_{|K|+1},\dots,x_{n})\in\{0,1\}^{n-|K|}) \qquad (5.2)$

is finitely exchangeable.

This assumption imposes exchangeability only over the nonstandard covariates whose effects are not ex-ante known. Clearly, if $K$ is a strict superset of $\mathcal{S}$, then the expected value of context need not vanish under Assumptions 2 and 7. A simple example is the following.

Example 16.

Suppose there are no standard covariates and $K=\{1\}$, i.e., the first nonstandard covariate has a known effect, where $f(\mathbb{x}_{n})\sim U[-1,0]$ if $x_{1}=0$ and $f(\mathbb{x}_{n})\sim U[0,1]$ if $x_{1}=1$. Suppose further that the agent’s covariate vector satisfies $x_{1}=1$. Then the prior expectation of the agent’s type is $0$, but revealing $x_{1}=1$ moves the posterior expectation to $1/2$. So the expected value of context does not vanish.
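A minimal simulation of this example (our own illustration) reproduces the two conditional expectations:

```python
import random

random.seed(1)
# Example 16: the type is U[-1,0] when x1 = 0 and U[0,1] when x1 = 1.
pairs = []
for _ in range(100_000):
    x1 = random.random() < 0.5
    y = random.uniform(0, 1) if x1 else random.uniform(-1, 0)
    pairs.append((x1, y))
print(sum(y for _, y in pairs) / len(pairs))   # prior expectation: approximately 0
ones = [y for x1, y in pairs if x1]
print(sum(ones) / len(ones))                   # posterior given x1 = 1: approximately 1/2
```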

But if we modify the definition in (2.2) to

$\hat{y}^{f}_{\mathbb{x}_{n}}(A)=\mathbb{E}[Y\mid X_{i}=x_{i}\ \forall i\in K\cup A]$

with $K$ replacing $\mathcal{S}$, and again let $U^{f}_{\mathbb{x}_{n}}(A)=u\left(\hat{y}^{f}_{\mathbb{x}_{n}}(A),y\right)$, then the modified expected value of context

$v(f,\mathbb{x}_{n})=\max_{H\subseteq\mathcal{N}\backslash K,\;|H|\leq\alpha_{h}n}U_{\mathbb{x}_{n}}^{f}(H)-U_{\mathbb{x}_{n}}^{f}(\varnothing)$

evaluates the value of context beyond those covariates with known effects. The same proof shows that this expected value of context vanishes to zero as $n$ grows large. That is, beyond the value of context that is already clear to the agent based on private knowledge about his nonstandard covariates, the agent does not expect substantial additional gain from the remaining covariates.

5.3 Information accumulates in $n$

So far we have assumed that the predictability of $Y$ is constant in the number of covariates. This is not essential for our results. Suppose instead that

$y_{n}=f(x_{1},\dots,x_{n})+\varepsilon_{n}$

where $\varepsilon_{n}$ is a mean-zero random variable that is independent of the covariates $(x_{1},\dots,x_{n})$. This describes a setting in which the covariates are not sufficient to reveal the agent’s type, and there is a residual unknown.

Our previous results extend directly when the distribution of $\varepsilon_{n}$ is the same for all $n$. Another natural case is one in which $\text{Var}(\varepsilon_{n})$ decreases monotonically in $n$, with $\lim_{n\rightarrow\infty}\text{Var}(\varepsilon_{n})=0$; that is, in environments with a larger number of covariates, the agent’s type is more predictable. In Appendix B.5 we show that Theorem 3.1 directly extends. We also show that the comparisons in parts (a) and (b) of Theorem 3.2 hold for sufficiently large $n$, under the following assumption:

Assumption 8.

For each $n\in\mathbb{Z}_{+}$, let $\sigma_{\varepsilon,n}^{2}:=\text{Var}(\varepsilon_{n})$ and assume that $\varepsilon_{n}/\sigma_{\varepsilon,n}$ admits a pdf, which we denote by $g_{n}$. The sequence $\{g_{n}(0)\}_{n}$ is bounded.

Loosely speaking, our comparisons in Theorem 3.2 continue to hold so long as the variance of $\varepsilon_{n}$ does not increase too fast in $n$.

5.4 Sufficient residual uncertainty

In this final section, we provide an abstract condition on the evaluator’s learning environment, under which Theorem 3.1 extends.

For each $n$, let $\mathcal{D}_{n}$ denote the set of all disclosures respecting the human evaluator’s capacity constraint, i.e., all pairs $(H,(x_{i})_{i\in H})$ consisting of a set $H\subseteq\{s+1,\dots,n\}$ with $\lfloor\alpha_{h}\cdot n\rfloor$ or fewer nonstandard covariates, and values $(x_{i})_{i\in H}$ for those covariates. Further define $\mathcal{D}=\cup_{n\geq 1}\mathcal{D}_{n}$ to be the set of all disclosures. Similarly, for each $n$ let $\mathcal{F}_{n}$ be the set of all type functions $f:\{0,1\}^{n}\rightarrow[-\overline{y},\overline{y}]$, and define $\mathcal{F}=\cup_{n\geq 1}\mathcal{F}_{n}$. An evaluation rule is any family $\rho=(\rho_{f})_{f\in\mathcal{F}}$ where each $\rho_{f}:\mathcal{D}\rightarrow[-\overline{y},\overline{y}]$ maps disclosures into evaluations for the given function $f$. Finally, fixing any evaluation rule $\rho$, number of covariates $n$, and disclosure $d\in\mathcal{D}_{n}$, let

$Z_{d}^{n}=\rho_{f}(d)$

be the random evaluation when $f$ is drawn from $\mathcal{F}_{n}$ according to the agent’s prior.

We impose two assumptions below on the evaluation rule. The first says that the expected evaluation $Z_{d}^{n}$ is equal to the prior expected type $\mu\equiv\mathbb{E}[Y]$; the second says that the distribution of the evaluation concentrates on $\mu$ sufficiently fast as the number of covariates $n$ grows large. Intuitively, the assumption requires that as the number of residual unknowns—i.e., the covariates which are predictive of the type but are not revealed to the evaluator—grows large, the informativeness of any fixed disclosure becomes small. (In the limit with an uninformative disclosure, the distribution of the evaluation is degenerate at the prior expectation $\mu$ for any Bayesian updating rule.)

Assumption 9 (Unbiased).

$\mathbb{E}[Z_{d}^{n}]=\mu$ for every disclosure $d$.

Assumption 10 (Fast Concentration).

For any sequence of feasible disclosures $(d_{n})_{n\geq 1}$,

$\text{Var}(Z_{d_{n}}^{n})=o\left(\frac{1}{K_{n}}\right)$

where $K_{n}=\sum_{j=0}^{\lfloor\alpha_{h}n\rfloor}\binom{n-s}{j}$ is the number of distinct sets $H\subseteq\{s+1,\dots,n\}$ with $\lfloor\alpha_{h}n\rfloor$ or fewer elements.

These assumptions do not in general represent a weakening of our main model. Previously we studied the evaluation rule $\rho$ mapping each disclosure into the conditional expectation of the agent’s type, and imposed Assumption 1 on the agent’s prior about $f$. In that model, the evaluation $Z_{d}^{n}$ for any disclosure $d=(H,(x_{i})_{i\in H})$ could be represented as a sample average of $2^{n-s-|H|}$ elements. Assumption 9 is clearly satisfied (because the update rule is Bayesian), but one can select a sequence of disclosures $(d_{n})$ such that $\text{Var}(Z_{d_{n}}^{n})=\frac{1}{2^{n(1-\alpha_{h})-s}}$ (see the proof of Theorem 3.1 for details). Thus the speed of convergence demanded in Assumption 10 is not met when $\alpha_{h}$ is sufficiently large.
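The following minimal sketch (our own illustration) compares the variance $2^{-(n(1-\alpha_{h})-s)}$ from our main model against the $o(1/K_{n})$ benchmark of Assumption 10; the product $\text{Var}(Z_{d_{n}}^{n})\cdot K_{n}$ must vanish for the assumption to hold, and in this numerical experiment it does so only for the smaller value of $\alpha_{h}$:

```python
from math import comb

def K(n, s, alpha_h):
    """K_n = sum_{j <= floor(alpha_h * n)} C(n - s, j)."""
    return sum(comb(n - s, j) for j in range(int(alpha_h * n) + 1))

s = 2
for alpha_h in (0.1, 0.4):
    for n in (20, 40, 60):
        var = 2.0 ** -(n * (1 - alpha_h) - s)      # Var(Z) in the main model
        print(alpha_h, n, var * K(n, s, alpha_h))  # vanishes iff Assumption 10 is met
```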

Nevertheless, Assumption 10 identifies the qualitative property of our main setting that gave us Theorem 3.1: residual uncertainty must have the power to overwhelm any information revealed through disclosure. Under these assumptions, our main result extends.

Proposition 5.2.

Suppose Assumptions 9 and 10 hold. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large, i.e.,

$\lim_{n\rightarrow\infty}V(n,\mathbb{x}_{n})=0.$

This result also clarifies that neither the precise symmetry imposed by Assumption 1, nor the assumption of Bayesian updating in our main model, is crucial for our main result.

6 Conclusion

One argument against replacing human experts with algorithmic predictions is that no matter how many covariates are taken as input by the algorithm, the number of potentially relevant circumstances and characteristics is still more numerous. In cases where some important fact is missed by a human evaluator, it is often possible to correct this oversight. There is no such safety net with a black box algorithm.

This is a compelling narrative, yet our results suggest that it may be less important than it initially seems. When there is a large number of nonstandard covariates that may matter for the prediction problem, but the agent does not know how these nonstandard covariates impact the type, then the expected value of disclosing additional information is small—even when we assume that the agent can identify the most useful covariates to disclose, and that the claims about these covariates are taken at face value.

In contrast, if the agent has substantial prior knowledge about the predictive roles of the nonstandard covariates, then our conclusion will not be appropriate. In particular, if there is a “low-dimensional” set of covariates that predicts the type and can be fully disclosed (as in Example 9), or if there is a known structural relationship between covariates and the type (as in Example 11), then the expected value of disclosing additional information may be large. We thus view our results as revealing a link between the value of targeting information acquisition (beyond simply conditioning on large quantities of information) and the extent of prior “structural information” about the numerous covariates that can be brought up as explanations.

We conclude with two alternative interpretations of our model and results.

Online versus offline learning. In our model, a key distinction between human and black box evaluation is that the human can adapt which covariates are acquired based on other properties of the agent, while the black box cannot. This is an appropriate comparison of human and black box evaluators as they currently stand: The black box algorithms used to make predictions about humans are usually supervised machine learning algorithms which are pre-trained on a large data set. But new black box algorithms, such as large language models, blur this distinction, and future evaluations (e.g., medical diagnoses) may be conducted by black box systems with which the agent can communicate.

From this more forward-looking perspective, our results can be understood as comparing the merits of online versus offline learning. That is, how valuable is it to have the evaluator dynamically acquire information given feedback from the agent? Our result suggests that this is not important in expectation. For example, Part (a) of Theorem 3.2 implies that an agent who cares about accuracy should prefer a supervised machine learning algorithm trained on a large number of covariates over a conversation with ChatGPT that reveals a smaller number of covariates.

Value of human supervision of algorithms. While we have interpreted the ss standard covariates as a small set of covariates acquired by the human evaluator, an alternative interpretation is that they are the initial inputs to an algorithm. In this case, the expected value of context quantifies the sensitivity of the algorithm’s predictions to the addition of further relevant inputs, e.g., as identified by a human manager. This interpretation is particularly relevant when we consider accuracy as the objective, in which case the value of context tells us how wrong the algorithm is compared to if the algorithm could be retrained on additional relevant inputs. Theorem 3.1 says that while in certain cases additional inputs would lead to a substantially more accurate prediction, under our symmetry assumption on the agent’s prior this will not typically be the case.

References

  • Acemoglu et al. (2015) Acemoglu, D., V. Chernozhukov, and M. Yildiz (2015): “Fragility of Asymptotic Agreement under Bayesian Learning,” Theoretical Economics, 11, 187–225.
  • Acosta et al. (2022) Acosta, J., G. Falcone, P. Rajpurkar, and E. Topol (2022): “Multimodal biomedical AI,” Nature Medicine, 28, 1773–1784.
  • Agarwal et al. (2023) Agarwal, N., A. Moehring, P. Rajpurkar, and T. Salz (2023): “Combining Human Expertise with Artificial Intelligence: Experimental Evidence from Radiology,” Working Paper 31422, National Bureau of Economic Research.
  • Akbarpour et al. (2024) Akbarpour, M., S. Malladi, and A. Saberi (2024): “Just a Few Seeds More: Value of Network Information for Diffusion,” Working Paper.
  • Angelova et al. (2022) Angelova, V., W. Dobbie, and C. S. Yang (2022): “Algorithmic Recommendations and Human Discretion,” Working Paper.
  • Antic and Chakraborty (2023) Antic, N. and A. Chakraborty (2023): “Selected Facts,” Working Paper.
  • Arnold and Groeneveld (1979) Arnold, B. C. and R. A. Groeneveld (1979): “Bounds on expectations of linear systematic statistics based on dependent samples,” The Annals of Statistics, 220–223.
  • Bardhi (2023) Bardhi, A. (2023): “Attributes: Selective Learning and Influence,” Working Paper.
  • Bastani et al. (2022) Bastani, H., O. Bastani, and W. P. Sinchaisri (2022): “Improving Human Decision-Making with Machine Learning.”
  • Berman (1964) Berman, S. M. (1964): “Limit Theorems for the Maximum Term in Stationary Sequences,” The Annals of Mathematical Statistics, 35, 502–516.
  • Blackwell and Dubins (1962) Blackwell, D. and L. Dubins (1962): “Merging of Opinions with Increasing Information,” The Annals of Mathematical Statistics.
  • Chalfin et al. (2016) Chalfin, A., O. Danieli, A. Hillis, Z. Jelveh, M. Luca, J. Ludwig, and S. Mullainathan (2016): “Productivity and Selection of Human Capital with Machine Learning,” American Economic Review, 106, 124–27.
  • Chernozhukov et al. (2013) Chernozhukov, V., D. Chetverikov, and K. Kato (2013): “Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.”
  • Crawford and Sobel (1982) Crawford, V. P. and J. Sobel (1982): “Strategic information transmission,” Econometrica: Journal of the Econometric Society, 1431–1451.
  • Di Tillio et al. (2021) Di Tillio, A., M. Ottaviani, and P. N. Sørensen (2021): “Strategic Sample Selection,” Econometrica, 89, 911–953.
  • Dworczak and Martini (2019) Dworczak, P. and G. Martini (2019): “The Simple Economics of Optimal Persuasion,” Journal of Political Economy, 127, 1993–2048.
  • Dye (1985) Dye, R. A. (1985): “Disclosure of Nonproprietary Information,” Journal of Accounting Research, 23, 123–145.
  • Farina et al. (2023) Farina, A., G. Frechette, A. Lizzeri, and J. Perego (2023): “The Selective Disclosure of Evidence: An Experiment,” Working Paper.
  • Frankel (2014) Frankel, A. (2014): “Aligned Delegation,” American Economic Review, 104, 66–83.
  • Frick et al. (2023) Frick, M., R. Iijima, and Y. Ishii (2023): “Learning Efficiency of Multiagent Information Structures,” Journal of Political Economy, 131, 3377–3414.
  • Gillis et al. (2021) Gillis, T., B. McLaughlin, and J. Spiess (2021): “On the Fairness of Machine-Assisted Human Decisions,” Working Paper.
  • Glazer and Rubinstein (2004) Glazer, J. and A. Rubinstein (2004): “On optimal rules of persuasion,” Econometrica, 72, 1715–1736.
  • Golub and Jackson (2012) Golub, B. and M. Jackson (2012): “How Homophily Affects the Speed of Learning and Best-Response Dynamics,” The Quarterly Journal of Economics, 127, 1287–1338.
  • Grossman and Hart (1980) Grossman, S. J. and O. D. Hart (1980): “Disclosure Laws and Takeover Bids,” The Journal of Finance, 35, 323–334.
  • Haghtalab et al. (2021) Haghtalab, N., M. Jackson, and A. Procaccia (2021): “Belief polarization in a complex world: A learning theory perspective,” PNAS, 118, 141–73.
  • Harel et al. (2020) Harel, M., E. Mossel, P. Strack, and O. Tamuz (2020): “Rational Groupthink,” The Quarterly Journal of Economics, 136, 621–668.
  • Hoffman et al. (2017) Hoffman, M., L. B. Kahn, and D. Li (2017): “Discretion in Hiring,” The Quarterly Journal of Economics, 133, 765–800.
  • Jha (2020) Jha, S. (2020): “Can you sue an algorithm for malpractice? It depends.”
  • Jin et al. (2021) Jin, G. Z., M. Luca, and D. Martin (2021): “Is No News (Perceived As) Bad News? An Experimental Investigation of Information Disclosure,” American Economic Journal: Microeconomics, 13, 141–73.
  • Jung et al. (2017) Jung, J., C. Concannon, R. Shroff, S. Goel, and D. G. Goldstein (2017): “Simple rules for complex decisions,” Working Paper.
  • Jussupow and Heinzl (2020) Jussupow, E., I. Benbasat, and A. Heinzl (2020): “Why are we averse towards algorithms? A comprehensive literature review on algorithm aversion,” in Proceedings of the 28th European Conference on Information Systems.
  • Kamenica and Gentzkow (2011) Kamenica, E. and M. Gentzkow (2011): “Bayesian Persuasion,” American Economic Review, 101, 2590–2615.
  • Klabjan et al. (2014) Klabjan, D., W. Olszewski, and A. Wolinsky (2014): “Attributes,” Games and Economic Behavior, 88, 190–206.
  • Kleinberg et al. (2017) Kleinberg, J., H. Lakkaraju, J. Leskovec, J. Ludwig, and S. Mullainathan (2017): “Human Decisions and Machine Predictions,” The Quarterly Journal of Economics, 133, 237–293.
  • Kushilevitz and Nisan (1996) Kushilevitz, E. and N. Nisan (1996): Communication Complexity, Cambridge University Press.
  • Lai et al. (2023) Lai, V., C. Chen, A. Smith-Renner, Q. V. Liao, and C. Tan (2023): “Towards a Science of Human-AI Decision Making: An Overview of Design Space in Empirical Human-Subject Studies,” in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, New York, NY, USA: Association for Computing Machinery, FAccT ’23, 1369–1385.
  • Liang and Mu (2019) Liang, A. and X. Mu (2019): “Complementary Information and Learning Traps,” The Quarterly Journal of Economics, 135, 389–448.
  • Liang et al. (2022) Liang, A., X. Mu, and V. Syrgkanis (2022): “Dynamically Aggregating Diverse Information,” Econometrica, 90, 47–80.
  • Longoni et al. (2019) Longoni, C., A. Bonezzi, and C. K. Morewedge (2019): “Resistance to Medical Artificial Intelligence,” Journal of Consumer Research, 46, 629–650.
  • McLaughlin and Spiess (2022) McLaughlin, B. and J. Spiess (2022): “Algorithmic Assistance with Recommendation-Dependent Preferences,” Working Paper.
  • McLean and Postlewaite (2002) McLean, R. and A. Postlewaite (2002): “Informational Size and Incentive Compatibility,” Econometrica, 70, 2421–2453.
  • Milgrom (1981) Milgrom, P. R. (1981): “Good News and Bad News: Representation Theorems and Applications,” The Bell Journal of Economics, 12, 380–391.
  • Morris and Yildiz (2019) Morris, S. and M. Yildiz (2019): “Crises: Equilibrium Shifts and Large Shocks,” American Economic Review, 109, 2823–54.
  • Obermeyer and Emanuel (2016) Obermeyer, Z. and E. J. Emanuel (2016): “Predicting the Future - Big Data, Machine Learning, and Clinical Medicine,” The New England Journal of Medicine, 375, 1216–9.
  • Raghu et al. (2019) Raghu, M., K. Blumer, G. Corrado, J. Kleinberg, Z. Obermeyer, and S. Mullainathan (2019): “The Algorithmic Automation Problem: Prediction, Triage, and Human Effort,” Working Paper.
  • Rajpurkar et al. (2017) Rajpurkar, P., J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, M. P. Lungren, and A. Y. Ng (2017): “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning,” Working Paper.
  • Spiegler (2020) Spiegler, R. (2020): “Behavioral Implications of Causal Misperceptions,” Annual Review of Economics, 12, 81–106.
  • Vives (1992) Vives, X. (1992): “How Fast do Rational Agents Learn?” Review of Economic Studies, 60, 329–347.
  • Yang et al. (2024) Yang, K. H., N. Yoder, and A. Zentefis (2024): “Explaining Models,” Working Paper.

Appendix A Proof of Generalization of Theorem 3.1

In a change of notation relative to the main text, we subsequently use $\mathbb{X}_{n}$ to denote the agent’s covariate vector and $Y$ to denote the agent’s type (leaving $\mathbb{x}_{n}$ and $y$ to denote realizations of these random variables). Moreover, rather than supposing that $Y$ is deterministically related to $\mathbb{X}_{n}$ via a function $f$, we let $(\mathbb{X}_{n},Y)\sim P^{n}$ where $P^{n}$ is unknown. We replace Assumptions 1 and 2 with the following weaker assumption.

Assumption 11.

Fix any realization of the standard covariates $\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}$. There is an infinitely exchangeable sequence $(\widetilde{Y}_{1},\widetilde{Y}_{2},\dots)$ such that for every $n\in\mathbb{N}$, the sequence

$\left(\mathbb{E}[Y\mid(X_{1},\dots,X_{n})=(\mathbb{x}_{\mathcal{S}},\mathbb{x}_{-\mathcal{S}})]\right)_{\mathbb{x}_{-\mathcal{S}}\in\{0,1\}^{n-s}}$

has the same distribution as $(\widetilde{Y}_{1},\dots,\widetilde{Y}_{2^{n-s}})$.

That is, permuting the labels and/or values of the nonstandard covariates does not change the joint distribution of the conditional expectations of $y$. When $y$ is degenerate conditional on $\mathbb{x}_{n}$, Assumption 11 reduces to our previous two assumptions. We will prove the following generalization of Theorem 3.1.

Theorem A.1.

Suppose Assumption 11 holds. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large, i.e., $\lim_{n\rightarrow\infty}V(n,\mathbb{x}_{n})=0$.

Towards this, we first prove the conclusion under a strengthening of Assumption 11, in which exchangeability is replaced by the assumption that conditional expectations are i.i.d. across the different possible completions of the agent’s covariate vector.

Assumption 12.

Fix any realization of the standard covariates $\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}$. Then there is a distribution $F$ such that for every $n\in\mathbb{N}$, the conditional expectations

$\mathbb{E}[Y\mid(X_{1},\dots,X_{n})=(\mathbb{x}_{\mathcal{S}},\mathbb{x}_{-\mathcal{S}})]\sim_{iid}F$

across all vectors $\mathbb{x}_{-\mathcal{S}}\in\{0,1\}^{n-s}$.

Theorem A.2.

Suppose Assumption 12 holds. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large, i.e., $\lim_{n\rightarrow\infty}V(n,\mathbb{x}_{n})=0$.

Sections A.1-A.4 prove Theorem A.2, and Section A.5 shows that Theorem A.2 implies Theorem A.1.

A.1 Outline for Proof of Theorem A.2

Fix any realization $(x_{1},\dots,x_{s})$ of the agent’s standard covariates. After observing $(x_{1},\dots,x_{s})$, the evaluator assigns positive probability to the $2^{n-s}$ covariate vectors whose first $s$ entries are equal to $(x_{1},\dots,x_{s})$. Let these covariate vectors be indexed by $\mathbb{x}^{j}$ where $j=1,\dots,2^{n-s}$, and define

$Y_{j}\equiv\mathbb{E}_{P^{n}}\left[Y\mid(X_{1},\dots,X_{n})=\mathbb{x}^{j}\right]$

to be the (random) expected type given covariate vector $\mathbb{x}^{j}$. By assumption that the marginal distribution over covariate vectors is uniform, the evaluator’s posterior expectation of the agent’s type after observing the agent’s standard covariates is

$\widehat{Y}(\varnothing,\mathbb{x}_{n})=\frac{1}{2^{n-s}}\sum_{j=1}^{2^{n-s}}Y_{j}\equiv Z^{n}_{\varnothing}.$

There are $K_{n}=\sum_{k=0}^{h_{n}}\binom{n-s}{k}$ subsets of $\{s+1,\dots,n\}$ that contain $h_{n}$ or fewer elements. Enumerate these sets as $H_{1},\dots,H_{K_{n}}$. For each $H_{k}$, let

$S_{k}=\left\{j\,:\,\mathbb{x}^{j}\in C_{H_{k}}(\mathbb{x}_{n})\right\}$

be the set of indices of those covariate vectors $\mathbb{x}^{j}$ that agree with the agent’s covariate vector $\mathbb{x}_{n}$ in entries $(1,\dots,s)\cup H_{k}$ (where $C_{H_{k}}(\mathbb{x}_{n})$ is as defined in (2.1)). After observing the agent’s nonstandard covariates in the set $H_{k}$, the evaluator’s posterior expectation about the agent’s type is

$\widehat{Y}(H_{k},\mathbb{x}_{n})=\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|}\equiv Z_{k}.$

Although the distributions of the random variables $Z_{k}$ vary across $n$, we suppress this dependence in what follows to save on notation. The remainder of the proof proceeds by first showing that in expectation the possible increase of the evaluator’s posterior expectation over the prior expectation $\mu\equiv\mathbb{E}[Y]$ is vanishing.

Proposition A.1.

$\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}Z_{k}-\mu]=0.$

This is subsequently strengthened to the statement that the expected maximum absolute difference between $Z_{k}$ and $\mu$ converges to zero.

Proposition A.2.

$\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}|Z_{k}-\mu|]=0.$

Finally, we apply the above proposition to demonstrate the conclusion of the theorem. Suppressing the dependence of $V$ on the covariate vector $\mathbb{x}_{n}$ in what follows (writing simply $V(n)$), this means

$\lim_{n\rightarrow\infty}V(n)=\lim_{n\rightarrow\infty}\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},Y)\right]-\mathbb{E}\left[u\left(Z^{n}_{\varnothing},Y\right)\right]=0,$

so in expectation the possible increase in the agent’s payoff also vanishes.
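The outline can also be visualized numerically. Below is a minimal Monte Carlo sketch (our own illustration, with $s=0$, the $Y_{j}$ drawn i.i.d. Uniform$[-1,1]$ so that $\mu=0$ per Assumption 12, and capacity $h_{n}=\lfloor n/4\rfloor$); the estimated expected maximum shrinks toward $\mu$ as $n$ grows:

```python
import itertools
import random

random.seed(2)

def expected_max_Z(n, h, trials=100):
    """Monte Carlo estimate of E[max_k Z_k] with s = 0 and Y_j iid U[-1, 1]."""
    subsets = [H for r in range(h + 1) for H in itertools.combinations(range(n), r)]
    x = (0,) * n  # the agent's covariate vector (any choice works, by symmetry)
    total = 0.0
    for _ in range(trials):
        Y = {v: random.uniform(-1, 1) for v in itertools.product((0, 1), repeat=n)}
        total += max(
            sum(y for v, y in Y.items() if all(v[i] == x[i] for i in H))
            / 2 ** (n - len(H))
            for H in subsets
        )
    return total / trials

for n in (4, 6, 8, 10):
    print(n, round(expected_max_Z(n, h=n // 4), 3))
```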

A.2 Proof of Proposition A.1

Statement of the proposition: $\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}Z_{k}-\mu]=0.$

The quantity $\mathbb{E}[\max_{1\leq k\leq K_{n}}Z_{k}]$ is the expected maximum of the sequence of non-i.i.d. variables $Z_{1},\dots,Z_{K_{n}}$. The proof is organized as follows. In Sections A.2.1 and A.2.2, we define i.i.d. variables $Z^{iid}_{k}$ with the property that

$\mathbb{E}\left[\max\{Z_{1},\dots,Z_{K_{n}}\}\right]\leq\mathbb{E}\left[\max\{Z^{iid}_{1},\dots,Z^{iid}_{K_{n}}\}\right]. \qquad (A.1)$

In Sections A.2.3 and A.2.4, we show that the RHS of the above display converges to $\mu$ as $n$ grows large.

A.2.1 Replacing $Z_{k}$’s with independent variables $Z_{k}^{ind}$

In general, disclosures $k$ and $k^{\prime}$ may lead to posterior expectations $Z_{k}$ and $Z_{k^{\prime}}$ that are correlated due to the presence of the same $Y_{j}$’s across the different sample averages. We first show that replacing these $Z_{k}$’s with properly defined independent random variables weakly increases the expected maximum.

Definition A.1.

For each $1\leq k\leq K_{n}$ define

$Z_{k}^{ind}=\frac{\sum_{j=1}^{|S_{k}|}Y_{j}^{k}}{|S_{k}|} \qquad (A.2)$

where $Y_{j}^{k}\sim_{iid}F$, so that each $Z_{k}^{ind}$ has the same distribution as $Z_{k}$, but the variables $Z_{1}^{ind},\dots,Z_{K_{n}}^{ind}$ are mutually independent.

Lemma A.1.

Let

$V_{n}\equiv\mathbb{E}[\max\{Z_{1},\dots,Z_{K_{n}}\}]$

and

$V^{ind}_{n}\equiv\mathbb{E}[\max\{Z_{1}^{ind},\dots,Z_{K_{n}}^{ind}\}].$

Then $V_{n}\leq V^{ind}_{n}$ for all $n\in\mathbb{Z}_{+}$.

Proof.

Throughout we use $X\succeq Y$ to mean that the distribution of $X$ first-order stochastically dominates the distribution of $Y$.

Sublemma 1.

Let $X_{1},\dots,X_{Q},W$ be a sequence of real-valued random variables (not necessarily i.i.d.). Let $a_{1}>a_{2}>\dots>a_{Q-1}>a_{Q}>0$ be a sequence of positive constants. Further, let $Y_{1},\dots,Y_{Q}$ be i.i.d. random variables, independent of $(X_{1},\dots,X_{Q},W)$. Define

$M_{C}=\max_{i\in\{1,\dots,Q\}}\{X_{i}+a_{i}Y_{1}\}$
$M_{I}=\max_{i\in\{1,\dots,Q\}}\{X_{i}+a_{i}Y_{i}\}.$

Then $M_{I}\succeq M_{C}$ and $\max\{M_{I},W\}\succeq\max\{M_{C},W\}$.

Proof.

For $q\in\{1,\dots,Q\}$ define:

$M_{C}^{q}=\max\left\{\max_{i\in\{1,\dots,q-1\}}\{X_{i}+a_{i}Y_{1}\},\,X_{q}+a_{q}Y_{1}\right\}$
$\widetilde{M}_{C}^{q}=\max\left\{\max_{i\in\{1,\dots,q-1\}}\{X_{i}+a_{i}Y_{1}\},\,X_{q}+a_{q}Y_{q}\right\}$

so that $M_{C}^{q}$ is the maximum of the first $q$ terms in $M_{C}$, and $\widetilde{M}_{C}^{q}$ replaces $Y_{1}$ in the $q$-th term of $M_{C}^{q}$ with $Y_{q}$. We first demonstrate an analogue of the desired conclusions for $M_{C}^{q}$ and $\widetilde{M}_{C}^{q}$.

Sublemma 2.

$\widetilde{M}_{C}^{q}\succeq M_{C}^{q}$ and $\max\{\widetilde{M}_{C}^{q},W\}\succeq\max\{M_{C}^{q},W\}$.

Proof.

Without loss of generality set $a_{q}=1$. We first show that $\widetilde{M}_{C}^{q}\succeq M_{C}^{q}$. To establish first-order stochastic dominance, we need to show that for all $t\in\mathbb{R}$,

$\mathbb{P}(M_{C}^{q}\leq t)-\mathbb{P}(\widetilde{M}_{C}^{q}\leq t)\geq 0.$

For each $i\in\{1,\dots,q-1\}$ define the event

$B_{i}:=\{X_{q}+Y_{1}>X_{i}+a_{i}Y_{1}\}\equiv\left\{Y_{1}<\frac{1}{a_{i}-1}(X_{q}-X_{i})\right\}.$

Further let

$B=\bigcap_{i=1}^{q-1}B_{i}=\left\{Y_{1}<\min_{i\in\{1,\dots,q-1\}}\frac{1}{a_{i}-1}(X_{q}-X_{i})\right\}$

be the event that $X_{q}+Y_{1}$ achieves the maximum among $\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q}$. We show that the FOSD rankings in Sublemma 2 hold both on the event $B$ and on its complement $B^{c}$.

Define

$\widetilde{B}:=\left\{Y_{q}<\min_{i\in\{1,\dots,q-1\}}\frac{1}{a_{i}-1}(X_{q}-X_{i})\right\}$

to be the event that $X_{q}+Y_{q}$ achieves the maximum among $\{X_{i}+a_{i}Y_{q}\}_{i=1}^{q}$. Then

$\widetilde{M}_{C}^{q}|B\succeq(X_{q}+Y_{q})|B$
$\stackrel{d}{=}X_{q}|B+Y_{q}$ (since $Y_{q}\perp\!\!\!\perp(X_{1},\dots,X_{q},Y_{1})$)
$\succeq X_{q}|B+Y_{q}|\widetilde{B}$ (since $Y_{q}\succeq Y_{q}\mid\widetilde{B}$)
$\stackrel{d}{=}X_{q}|B+Y_{1}|B$ (since $Y_{1}\mid B\stackrel{d}{=}Y_{q}\mid\widetilde{B}$)
$\stackrel{d}{=}(X_{q}+Y_{1})|B\stackrel{d}{=}M_{C}^{q}|B.$

Thus $\widetilde{M}_{C}^{q}|B\succeq M_{C}^{q}|B$.

Now consider the event $B^{c}$, on which $X_{q}+Y_{1}$ does not achieve the maximum among $\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q}$. Then either $X_{q}+Y_{q}\leq\max\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q-1}$, in which case $\widetilde{M}_{C}^{q}=M_{C}^{q}$, or $X_{q}+Y_{q}>\max\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q-1}$, in which case $\widetilde{M}_{C}^{q}>M_{C}^{q}$. So

$\widetilde{M}_{C}^{q}|B^{c}\succeq\max\{X_{1}+a_{1}Y_{1},\dots,X_{q-1}+a_{q-1}Y_{1}\}|B^{c}\stackrel{d}{=}M_{C}^{q}|B^{c},$

and hence $\widetilde{M}_{C}^{q}|B^{c}\succeq M_{C}^{q}|B^{c}$.

Now we show that $\max\{\widetilde{M}_{C}^{q},W\}\succeq\max\{M_{C}^{q},W\}$. For any realization $w$ of $W$, let $X^{w}_{i}$ denote the conditional random variable $X_{i}|W=w$. Define $M_{C}^{q,w}$ and $\widetilde{M}_{C}^{q,w}$ identically to $M_{C}^{q}$ and $\widetilde{M}_{C}^{q}$, replacing each $X_{i}$ by $X_{i}^{w}$. Then by independence of $W$ and $(Y_{1},\dots,Y_{q})$, the distribution of $\max\{M_{C}^{q,w},w\}$ is identical to that of $\max\{M_{C}^{q},W\}|(W=w)$, and the distribution of $\max\{\widetilde{M}_{C}^{q,w},w\}$ is identical to that of $\max\{\widetilde{M}_{C}^{q},W\}|(W=w)$.

Applying the first part of this sublemma to $M^{q,w}_{C}$ and $\widetilde{M}_{C}^{q,w}$, we conclude that $\widetilde{M}_{C}^{q,w}\succeq M_{C}^{q,w}$. Since $\max\{\cdot,w\}$ is an increasing function, it preserves the first-order stochastic dominance relation, and hence $\max\{\widetilde{M}_{C}^{q},W\}|(W=w)\succeq\max\{M^{q}_{C},W\}|(W=w)$. This argument holds pointwise for all $w$, so $\max\{\widetilde{M}_{C}^{q},W\}\succeq\max\{M_{C}^{q},W\}$ as desired. ∎

We now complete the proof that $\max\{M_{I},W\}\succeq\max\{M_{C},W\}$. From similar (omitted) arguments it follows that $M_{I}\succeq M_{C}$. For each $q\in\{1,\dots,Q\}$ define

$\widehat{M}_{C}^{q}=\max\left\{\max\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q},\,\max\{X_{i}+a_{i}Y_{i}\}_{i=q+1}^{Q},\,W\right\}$

(where the middle maximum is empty when $q=Q$), observing that $\max\{M_{I},W\}=\widehat{M}_{C}^{1}$ and $\widehat{M}_{C}^{Q}=\max\{M_{C},W\}$. Moreover, for each $q\in\{2,\dots,Q\}$,

$\widehat{M}_{C}^{q}=\max\left\{M_{C}^{q},W^{q}\right\}$
$\widehat{M}_{C}^{q-1}=\max\{\widetilde{M}_{C}^{q},W^{q}\}$

where $W^{q}=\max\left\{\max\{X_{i}+a_{i}Y_{i}\}_{i=q+1}^{Q},W\right\}$ is independent of $(Y_{1},\dots,Y_{q})$. So applying Sublemma 2, $\widehat{M}_{C}^{q-1}\succeq\widehat{M}_{C}^{q}$, and chaining these comparisons yields $\max\{M_{I},W\}=\widehat{M}_{C}^{1}\succeq\widehat{M}_{C}^{Q}=\max\{M_{C},W\}$ as desired. ∎

Finally, we use Sublemma 1 to establish Lemma A.1, i.e., that the expected maximum weakly increases if we make the $Y$’s appearing in different disclosures independent. We prove this iteratively. For arbitrary $n\in\mathbb{N}$, define the random variable

$M=\max\{Z_{1},\dots,Z_{K_{n}}\}=\max\left\{\frac{\sum_{j\in S_{1}}Y_{j}}{|S_{1}|},\dots,\frac{\sum_{j\in S_{K_{n}}}Y_{j}}{|S_{K_{n}}|}\right\}.$

Fix any $Y_{i}$. We will show that replacing $Y_{i}$ across different sample averages with independent copies of this random variable leads to a FOSD increase in the distribution of $M$.

Let $I=\{k:i\in S_{k}\}$ be the set of indices of sample averages which contain $Y_{i}$. Then we can rewrite the previous display as

$\max\left\{\,\max_{k\in I}\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|},\,\max_{k\notin I}\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|}\right\}$

or

$\max\left\{\max_{k\in I}\left\{X_{k}+\frac{1}{|S_{k}|}Y_{i}\right\},W\right\} \qquad (A.3)$

where $X_{k}\equiv\frac{1}{|S_{k}|}\sum_{j\in S_{k},j\neq i}Y_{j}$ for each $k\in I$, and $W\equiv\max_{k\notin I}\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|}$. Because the $Y_{j}$’s are mutually independent, $Y_{i}$ is independent of each $X_{k}$ and of $W$. So applying Sublemma 1, the random variable in (A.3) has a distribution that is first-order stochastically dominated by the distribution of

$\max\left\{\max_{k\in I}\left\{X_{k}+\frac{1}{|S_{k}|}Y_{i}^{k}\right\},W\right\}$

as desired. Since $Y_{i}$ was arbitrary, iterating this replacement across all $i$ concludes the proof.

A.2.2 Replacing $Z_{k}^{ind}$ with i.i.d. variables $Z_{k}^{iid}$

The variables $Z_{1}^{ind},\dots,Z_{K_{n}}^{ind}$ are sample averages of unequal sizes ranging between $2^{n-s-h_{n}}$ and $2^{n-s}$ elements. We next show that replacing each of these variables with a sample average of $2^{n-s-h_{n}}$ elements (the smallest size) weakly increases the expected maximum.

Definition A.2.

For each $1\leq k\leq K_{n}$ define

$Z_{k}^{iid}=\frac{\sum_{j=1}^{2^{n-s-h_{n}}}Y_{j}^{k}}{2^{n-s-h_{n}}} \qquad (A.4)$

to be the analogue of $Z_{k}^{ind}$ with $2^{n-s-h_{n}}$ elements instead of $|S_{k}|\geq 2^{n-s-h_{n}}$, so that the variables $Z_{1}^{iid},\dots,Z_{K_{n}}^{iid}$ are i.i.d.

Lemma A.2.

Let

$V^{iid}_{n}\equiv\mathbb{E}\left[\max\{Z^{iid}_{1},\dots,Z^{iid}_{K_{n}}\}\right].$

Then $V^{ind}_{n}\leq V^{iid}_{n}$ for all $n\in\mathbb{Z}_{+}$.

Proof.

We use the following result.

Sublemma 3.

Suppose $Y_{1},Y_{2},\dots,Y_{n}$ are independent and identically distributed random variables, and define $\overline{Y}_{n}=\frac{1}{n}\sum_{i=1}^{n}Y_{i}$ to be their sample average. Let $n^{\prime}<n$ and define $\overline{Y}_{n^{\prime}}=\frac{1}{n^{\prime}}\sum_{i=1}^{n^{\prime}}Y_{i}$. Then the distribution of $\overline{Y}_{n^{\prime}}$ is a mean-preserving spread of the distribution of $\overline{Y}_{n}$.

Proof.

First observe that $\mathbb{E}[Y_{j}\mid\overline{Y}_{n}]=\overline{Y}_{n}$ for any $j=1,\dots,n$, since

$\overline{Y}_{n}=\mathbb{E}[\overline{Y}_{n}\mid\overline{Y}_{n}]=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[Y_{i}\mid\overline{Y}_{n}]=\mathbb{E}[Y_{j}\mid\overline{Y}_{n}],$

where the final equality follows by assumption that the $Y_{i}$’s are i.i.d. Then

$\mathbb{E}[\overline{Y}_{n^{\prime}}\mid\overline{Y}_{n}]=\frac{1}{n^{\prime}}\sum_{i=1}^{n^{\prime}}\mathbb{E}[Y_{i}\mid\overline{Y}_{n}]=\frac{1}{n^{\prime}}\sum_{i=1}^{n^{\prime}}\overline{Y}_{n}=\overline{Y}_{n},$

and the distribution of $\overline{Y}_{n^{\prime}}$ is a mean-preserving spread of the distribution of $\overline{Y}_{n}$ as desired. ∎

This sublemma implies that each $Z_{k}^{iid}$ is a mean-preserving spread of the corresponding $Z_{k}^{ind}$ (since $|S_{k}|\geq 2^{n-s-h_{n}}$ for all $k$), i.e., each $Z_{k}^{ind}$ second-order stochastically dominates $Z_{k}^{iid}$. The desired result then follows by Jensen’s inequality, since the entries of $(Z_{1}^{ind},\dots,Z_{K_{n}}^{ind})$ are (by construction) independent and the maximum is a convex function. ∎
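The role of Sublemma 3 here can be checked numerically: a sample mean over fewer draws is a mean-preserving spread of one over more draws, so the expected maximum of several independent such means (a convex functional) weakly increases as sample sizes shrink. A minimal sketch, with Uniform$[-1,1]$ draws as an arbitrary illustrative choice:

```python
import random

random.seed(3)

def exp_max_of_means(K, m, trials=10_000):
    """E[max of K independent sample means, each over m Uniform[-1,1] draws]."""
    total = 0.0
    for _ in range(trials):
        total += max(sum(random.uniform(-1, 1) for _ in range(m)) / m
                     for _ in range(K))
    return total / trials

# The m = 16 means are mean-preserving spreads of the m = 64 means,
# so the second expected maximum exceeds the first.
print(exp_max_of_means(K=10, m=64), exp_max_of_means(K=10, m=16))
```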

A.2.3 Asymptotic Normality

Lemma A.3.

Let

$V^{N}_{n}\equiv\mathbb{E}\left[\max\{Z_{1}^{N},\dots,Z_{K_{n}}^{N}\}\right]$

where $Z^{N}_{k}\sim\mathcal{N}\left(\mu,\frac{1}{2^{n-s-h_{n}}}\right)$ independently across $k$. Then $\lim_{n\rightarrow\infty}|V^{iid}_{n}-V^{N}_{n}|=0$.

Proof.

Without loss of generality, let $\mu=0$ and $\text{Var}(Y_{j}^{k})=1$. (If $\text{Var}(Y_{j}^{k})=0$, the statement of Theorem 3.1 holds trivially.) First observe that

$\sqrt{2^{n-s-h_{n}}}\cdot V^{iid}_{n}=\mathbb{E}\left[\max\{\widetilde{Z}^{iid}_{1},\dots,\widetilde{Z}^{iid}_{K_{n}}\}\right]$

where each

$\widetilde{Z}^{iid}_{k}=\frac{1}{\sqrt{2^{n-s-h_{n}}}}\sum_{i=1}^{2^{n-s-h_{n}}}Y^{k}_{i}.$

Similarly we can write

$\sqrt{2^{n-s-h_{n}}}\cdot V^{N}_{n}=\mathbb{E}\left[\max\{\widetilde{Z}^{N}_{1},\dots,\widetilde{Z}^{N}_{K_{n}}\}\right]$

where each

$\widetilde{Z}^{N}_{k}\sim_{iid}\mathcal{N}(0,1).$

When the assumptions for Corollary 2.1 from Chernozhukov et al. (2013) are met (to be verified momentarily), we can conclude that

$\rho\left(\max\{\widetilde{Z}^{iid}_{1},\dots,\widetilde{Z}^{iid}_{K_{n}}\},\max\{\widetilde{Z}^{N}_{1},\dots,\widetilde{Z}^{N}_{K_{n}}\}\right)\rightarrow 0$

where $\rho$ denotes the Kolmogorov distance. Thus also

$\rho(M^{iid}_{n},M^{N}_{n})\rightarrow 0 \qquad (A.5)$

where

$M^{iid}_{n}=\frac{1}{\sqrt{2^{n-s-h_{n}}}}\max\{\widetilde{Z}^{iid}_{1},\dots,\widetilde{Z}^{iid}_{K_{n}}\}$
$M^{N}_{n}=\frac{1}{\sqrt{2^{n-s-h_{n}}}}\max\{\widetilde{Z}^{N}_{1},\dots,\widetilde{Z}^{N}_{K_{n}}\}.$

By assumption, each $Y_{i}^{k}$ is supported on $[-\overline{y},\overline{y}]$ for some finite $\overline{y}$. This implies $|M_{n}^{iid}|\leq\overline{y}$ for all $n$, so the sequence $(M_{n}^{iid})_{n}$ is uniformly integrable. The convergence in (A.5) thus implies

$\lim_{n\rightarrow\infty}\left|\mathbb{E}\left[M_{n}^{iid}\right]-\mathbb{E}\left[M_{n}^{N}\right]\right|=\lim_{n\rightarrow\infty}|V^{iid}_{n}-V^{N}_{n}|=0$

as desired.

It remains to verify that the conditions of Corollary 2.1 from Chernozhukov et al. (2013) are met. This follows from the assumption that the $Y_{j}^{k}$’s are uniformly bounded, together with the observation that

$\frac{\left(\log(K_{n}\cdot 2^{n-s-h_{n}})\right)^{7}}{2^{(1-c)(n-s-h_{n})}}\xrightarrow{n\rightarrow\infty}0$

for any $c\in(0,1)$, since $K_{n}=\sum_{j=0}^{h_{n}}\binom{n-s}{j}\leq 2^{n-s}$ by the Binomial Theorem and $\alpha_{h}<1$. ∎

A.2.4 Upper Bound for Expected Maximum of Gaussians

Finally, Berman (1964) provides an upper bound for the expected maximum of independent Gaussian random variables, which gives

$V^{N}_{n}\leq\sqrt{\frac{1}{2^{n-s-h_{n}}}}\cdot 2\sqrt{\log(K_{n})}\leq\sqrt{\frac{1}{2^{n(1-\alpha_{h})-s}}}\cdot 2\sqrt{n},$

where the first inequality uses the standard deviation $\sqrt{1/2^{n-s-h_{n}}}$ of the $Z^{N}_{k}$ (recall the normalization $\mu=0$), and the final expression converges to zero as $n\rightarrow\infty$ by assumption that $\alpha_{h}<1$. Since clearly also $\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}Z_{k}-\mu]\geq 0$, this concludes the proof of Proposition A.1.
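To get a feel for the order of this bound, the following minimal sketch (our own illustration, with unit variance and base-2 logarithms) compares the simulated expected maximum of $K$ i.i.d. standard Gaussians with $2\sqrt{\log_{2}(K)}$:

```python
import math
import random

random.seed(4)
K, trials = 200, 5_000
sim = sum(max(random.gauss(0, 1) for _ in range(K)) for _ in range(trials)) / trials
print(sim, 2 * math.sqrt(math.log2(K)))  # simulated expected maximum vs. the upper bound
```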

A.3 Proof of Proposition A.2

Statement of the proposition: $\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}|Z_{k}-\mu|]=0.$

In an abuse of notation, let $Z_{k}\equiv Z_{k}-\mu$ denote the de-meaned sample average. Rewriting the max within the expectation, we obtain

$\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}|\right]=\mathbb{E}\left[\max\left\{\max_{1\leq k\leq K_{n}}Z_{k},\,-\min_{1\leq k\leq K_{n}}Z_{k}\right\}\right]\leq\mathbb{E}\left[\max\left\{\max_{1\leq k\leq K_{n}}Z_{k},0\right\}\right]+\mathbb{E}\left[\max\left\{-\min_{1\leq k\leq K_{n}}Z_{k},0\right\}\right].$

We will show that each term of this final expression converges to zero. Observe that

$\mathbb{E}\left[\max\left\{\max_{1\leq k\leq K_{n}}Z_{k},0\right\}\right]=\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right] \qquad (A.6)$

Moreover,

$\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\right]=\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right]+\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}<0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}<0\right],$

so

$\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right]=\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\right]-\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}<0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}<0\right]. \qquad (A.7)$

From Proposition A.1,

$\lim_{n\rightarrow\infty}\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\right]=0. \qquad (A.8)$

Moreover, we showed in Section A.2.1 that the distribution of $(Z^{ind}_{1},\dots,Z^{ind}_{K_{n}})$ first-order stochastically dominates that of $(Z_{1},\dots,Z_{K_{n}})$, so

$\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}<0\right)\leq\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z^{ind}_{k}<0\right)\leq\prod_{1\leq k\leq K_{n}}\mathbb{P}(Z_{k}^{ind}<0),$

which converges to zero as $n$ grows large since each $\mathbb{P}(Z_{k}^{ind}<0)<1$.

$\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}<0\right]\in\left[-\overline{Y},\overline{Y}\right] \qquad (A.9)$

uniformly across $n$. Putting together (A.6)–(A.9), we have that

$\lim_{n\rightarrow\infty}\mathbb{E}\left[\max\left\{\max_{1\leq k\leq K_{n}}Z_{k},0\right\}\right]=0$

as desired. The argument that

$\lim_{n\rightarrow\infty}\mathbb{E}\left[\max\left\{-\min_{1\leq k\leq K_{n}}Z_{k},0\right\}\right]=0$

follows identically, observing that Proposition A.1 holds with $\widetilde{Y}\equiv-Y$ in place of $Y$, and that

$-\min_{1\leq k\leq K_{n}}Z_{k}=\max_{1\leq k\leq K_{n}}\left(-\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|}\right)=\max_{1\leq k\leq K_{n}}\frac{\sum_{j\in S_{k}}\widetilde{Y}_{j}}{|S_{k}|}.$

A.4 Concluding the proof of Theorem A.2

Recall that $Z^{n}_{\varnothing}\equiv\frac{1}{2^{n-s}}\sum_{j=1}^{2^{n-s}}Y_{j}$ denotes the (random) posterior expectation when the agent chooses not to disclose any nonstandard covariates. Clearly $V(n)\geq 0$ (since the agent can always choose to disclose nothing). Also

\begin{align}
V(n)&=\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},Y)\right]-\mathbb{E}\left[u(Z^{n}_{\varnothing},Y)\right]\nonumber\\
&\leq\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|u(Z_{k},Y)-u(Z^{n}_{\varnothing},Y)|\right]\tag{A.10}
\end{align}

Each absolute difference $|u(Z_{k},Y)-u(Z^{n}_{\varnothing},Y)|$ can be bounded from above using the triangle inequality:

\begin{equation}
|u(Z_{k},Y)-u(Z^{n}_{\varnothing},Y)|\leq|u(Z_{k},Y)-u(\mu,Y)|+|u(\mu,Y)-u(Z^{n}_{\varnothing},Y)|\tag{A.11}
\end{equation}

Since $u$ is by assumption Lipschitz continuous in the first argument, there is a constant $B$ such that

\begin{equation}
|u(z_{k},y)-u(\mu,y)|\leq B|z_{k}-\mu|\tag{A.12}
\end{equation}

and

\begin{equation}
|u(\mu,y)-u(z_{\varnothing},y)|\leq B|z_{\varnothing}-\mu|\tag{A.13}
\end{equation}

for any realizations $z_{k}$ and $z_{\varnothing}$ of $Z_{k}$ and $Z^{n}_{\varnothing}$. Combining (A.10)–(A.13), we get

\[
V(n)\leq B\left(\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}-\mu|\right]+\mathbb{E}\left[|Z^{n}_{\varnothing}-\mu|\right]\right)
\]

Clearly $\mathbb{E}[Z^{n}_{\varnothing}]=\mu$. Moreover, by the assumption that each $Y$ is uniformly bounded above and below, the sequence $(Z^{n}_{\varnothing})$ is uniformly integrable. It follows from the Law of Large Numbers that

\[
\lim_{n\rightarrow\infty}\mathbb{E}\left[|Z^{n}_{\varnothing}-\mu|\right]=0
\]

Finally, $\lim_{n\rightarrow\infty}\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}-\mu|\right]=0$ follows directly from Lemma A.2. So the right-hand side of the preceding display converges to zero, implying $V(n)\rightarrow 0$ as desired.

A.5 Theorem A.2 implies Theorem A.1

In an abuse of notation, let $P^{n}\sim F$ mean that $Y_{\mathbb{x}_{n}}\sim_{iid}F$ across all covariate vectors $\mathbb{x}_{n}$. We have already shown in Theorem A.2 that $\lim_{n\rightarrow\infty}\mathbb{E}_{P^{n}\sim F}(v_{n}(P))=0$ for any distribution $F$. Now suppose instead that Assumption 1 is satisfied. By de Finetti's theorem, there exist a set $\Theta$, a family of conditional measures $(\pi_{\theta})_{\theta\in\Theta}$, and a measure $\nu\in\Delta(\Theta)$ such that

\[
V(n,\mathbb{x})=\int_{\Theta}\mathbb{E}_{P^{n}\sim F_{\theta}}\left(v_{n}(P,\mathbb{x}_{n})\right)d\nu(\theta)
\]

where the inner expectation converges to zero for every $\theta$ by Theorem A.2. By the assumption that $u$ is Lipschitz continuous on a compact domain, there exist $\underline{u}$ and $\overline{u}$ such that $u(\hat{y},y)\in[\underline{u},\overline{u}]$ for all $(\hat{y},y)$. So $\mathbb{E}_{P^{n}\sim F_{\theta}}(v_{n}(P,\mathbb{x}_{n}))$ is pointwise bounded above by $\overline{u}-\underline{u}$, and the Dominated Convergence Theorem implies $\lim_{n\rightarrow\infty}V(n,\mathbb{x})=0$, as desired.

A.6 Proof of Theorem 3.2

Throughout the proof we set $s=0$, $\mu=0$, and $\sigma^{2}=\mathbb{E}(Y_{i}^{2})=1$ without loss of generality. We first demonstrate that the stated results hold asymptotically (i.e., for large enough $n$), and subsequently prove that the bound in (3.4) is sufficient.

(a) As before, let $B_{n}\subseteq\{1,\dots,2^{n}\}$ index those $2^{n-b_{n}}$ covariate vectors that agree with the agent's covariate vector for all covariates in $B$. Then the black box evaluator's posterior expectation is the sample average

\[
Z_{B}^{n}=\frac{1}{2^{n-b_{n}}}\sum_{j\in B_{n}}Y_{j}.
\]

We will show that

\begin{align*}
\Delta(n)&\equiv\mathbb{E}\left[\phi(Z_{B}^{n})\right]-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\phi(Z_{k})\right]\\
&=\mathbb{E}\left[\phi(Z_{B}^{n})-\phi(0)\right]-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)\right]>0
\end{align*}

for large enough nn.

We start by analyzing the first difference, $\mathbb{E}[\phi(Z_{B}^{n})-\phi(0)]$. Using a Taylor expansion, we get

\[
\mathbb{E}\left[\phi(Z_{B}^{n})-\phi(0)\right]=\mathbb{E}\left[\phi^{\prime}(0)Z_{B}^{n}\right]+\mathbb{E}\left[\frac{\phi^{\prime\prime}(\tilde{Z})}{2}(Z_{B}^{n})^{2}\right]
\]

for some $\tilde{Z}\in[0,Z_{B}^{n}]$. Note that $\mathbb{E}[Z_{B}^{n}]=\mathbb{E}[Y]=0$. Moreover, $\phi^{\prime\prime}(\tilde{Z})\geq c_{1}>0$ for some $c_{1}$, since $\phi$ is strictly convex. Thus

\begin{equation}
\mathbb{E}\left[\phi(Z_{B}^{n})-\phi(0)\right]\geq c_{1}\mathbb{E}\left[(Z_{B}^{n})^{2}\right]=\frac{c_{1}}{2^{(1-\alpha_{b})n}}\tag{A.14}
\end{equation}

Next turn to $\mathbb{E}[\max_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)]$. For each term inside the maximum we have that

\begin{equation}
\phi(Z_{k})-\phi(0)\leq c_{2}|Z_{k}|\tag{A.15}
\end{equation}

where the inequality follows from the fact that $\phi^{\prime}$ is continuous on a compact set, and hence bounded by some $c_{2}\geq 0$. Thus

\[
\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)\right]\leq c_{2}\,\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}|\right]
\]

From our proof of Proposition A.2 it follows that

\[
\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}|\right]\leq\frac{1}{2^{(1-\alpha_{h})n-1}}\sqrt{\log(K_{n})}
\]

Thus

\[
\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)\right]\leq\frac{c_{2}}{2^{(1-\alpha_{h})n-1}}\sqrt{\log(K_{n})}
\]

Combining the two bounds above, we get

\[
\Delta(n)\geq\frac{c_{1}}{2^{(1-\alpha_{b})n}}-\frac{2c_{2}}{2^{(1-\alpha_{h})n}}\sqrt{\log(K_{n})}
\]

The RHS is positive for all large $n$ if and only if

\[
\frac{2^{(1-\alpha_{h})n}}{2^{(1-\alpha_{b})n}}\xrightarrow{n\rightarrow\infty}\infty
\]

since $\sqrt{\log(K_{n})}$ has sub-exponential but non-constant asymptotics. This condition is satisfied if and only if $\alpha_{b}>\alpha_{h}$.
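A quick numerical check of this comparison (with placeholder constants $c_{1}=c_{2}=1$; all values illustrative) confirms that the lower bound on $\Delta(n)$ is eventually positive exactly when $\alpha_{b}>\alpha_{h}$:

\begin{verbatim}
# Sign of the lower bound on Delta(n); c1, c2 are placeholder constants.
import math

def lower_bound(n, alpha_b, alpha_h, c1=1.0, c2=1.0):
    K_n = sum(math.comb(n, j) for j in range(int(alpha_h * n) + 1))
    return (c1 / 2 ** ((1 - alpha_b) * n)
            - 2 * c2 * math.sqrt(math.log(K_n)) / 2 ** ((1 - alpha_h) * n))

for n in [20, 40, 80]:
    print(n, lower_bound(n, 0.6, 0.3) > 0,   # alpha_b > alpha_h: True
             lower_bound(n, 0.3, 0.6) > 0)   # alpha_b < alpha_h: False
\end{verbatim}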

(b) Since $-\phi$ is convex, the above arguments apply to show that

\[
\mathbb{E}\left[\phi(Z_{B}^{n})-\phi(0)\right]\leq-\frac{c_{1}}{2^{(1-\alpha_{b})n}}
\]

for some $c_{1}>0$, while

\begin{align*}
\mathbb{E}\left[\min_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)\right]&=-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\left(-\phi(Z_{k})\right)-\left(-\phi(0)\right)\right]\\
&\geq-\frac{c_{2}}{2^{(1-\alpha_{h})n-1}}\sqrt{\log(K_{n})}
\end{align*}

for some $c_{2}>0$. The desired conclusion follows.


Finally, we show that the bound in (3.4) is sufficient for the comparison in part (a) of the result (with identical computations applying to part (b)). Suppose $\phi(\cdot)$ is strictly convex and denote $C=\frac{2c_{2}}{c_{1}}$, where $c_{1},c_{2}>0$ are the constants used above, respectively reflecting $\phi$'s lowest degree of convexity ($c_{1}=\inf_{y\in[-\overline{y},\overline{y}]}|\phi^{\prime\prime}(y)|$) and largest growth rate ($c_{2}=\sup_{y\in[-\overline{y},\overline{y}]}|\phi^{\prime}(y)|$). Following the proof of part (a), the Black Box is preferred if

\[
2^{(\alpha_{b}-\alpha_{h})n}>C\sqrt{\log(K_{n})}
\]

Since $K_{n}\leq 2^{n}$, this inequality is satisfied if

\[
(\alpha_{b}-\alpha_{h})n-\frac{1}{2}\log_{2}(n)>\log_{2}(C)
\]

The above inequality implicitly defines a threshold $N(C)$ on the number of covariates, beyond which the Black Box is preferred.
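As an illustration, the threshold can be computed by direct search; the parameter values below are hypothetical, and we assume $\alpha_{b}>\alpha_{h}$ so that the search terminates:

\begin{verbatim}
# Smallest n with (alpha_b - alpha_h) n - 0.5 log2(n) > log2(C).
# Assumes alpha_b > alpha_h; all inputs are hypothetical.
import math

def N_of_C(C, alpha_b, alpha_h):
    n = 1
    while (alpha_b - alpha_h) * n - 0.5 * math.log2(n) <= math.log2(C):
        n += 1
    return n

print(N_of_C(C=10.0, alpha_b=0.6, alpha_h=0.3))   # prints 19
\end{verbatim}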

A.7 Result Extending Theorem 3.2 Part (a)

Consider a model in which the evaluator chooses an action $a$ given the realization of the agent's covariates, and the evaluator and agent share the payoff function $-(a-y)^{2}$. The following result shows that the conclusion of Part (a) of Theorem 3.2 extends to non-binary types $y$.

Proposition A.3.

There exists an $N$ sufficiently large such that the agent prefers the black box evaluator for all $n\geq N$.

Proof.

Throughout the proof, set $s=0$, $\mathbb{E}[Y]=0$, and $\sigma^{2}=\mathbb{E}(Y_{i}^{2})=1$ without loss. We will show that

\begin{align*}
\mathbb{E}\left[u(Z_{B}^{n},y)\right]&-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},y)\right]\\
&=\mathbb{E}\left[u(Z_{B}^{n},y)-u(0,y)\right]-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},y)-u(0,y)\right]>0
\end{align*}

for large enough nn.

Let $x_{B}=(x_{i})_{i\in B}$ denote the covariates that the Black Box observes, and as before let $Z_{B}^{n}=\mathbb{E}[y\mid x_{B}]$ denote the Black Box's (random) posterior expectation. The optimal action choice $a=Z_{B}^{n}$ yields expected payoff $-\text{Var}(y\mid x_{B})$. By the Law of Total Variance, $\mathbb{E}[-\text{Var}(y\mid x_{B})]=\text{Var}(Z_{B}^{n})-\text{Var}(Y)$. Since additionally $\mathbb{E}[u(0,y)]=-\text{Var}(y)$, we obtain

\[
\mathbb{E}\left[u(Z_{B}^{n},y)-u(0,y)\right]=\mathbb{E}\left[(Z_{B}^{n})^{2}\right]=\frac{1}{2^{(1-\alpha_{B})n}}.
\]
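This variance computation is easy to verify numerically. The sketch below draws the $2^{n-b}$ conditional types as iid standard normals (an illustrative choice; the argument only uses mean zero and variance one) and checks that $\mathbb{E}[(Z_{B}^{n})^{2}]$ matches $2^{-(1-\alpha_{B})n}$:

\begin{verbatim}
# Check E[(Z_B^n)^2] = 2^{-(1 - alpha_B) n} for iid mean-0, variance-1 types.
import numpy as np

rng = np.random.default_rng(1)
n, alpha_B = 12, 0.5
m = 2 ** (n - int(alpha_B * n))          # covariate vectors matching x_B
Z = rng.standard_normal((20000, m)).mean(axis=1)
print(np.mean(Z ** 2), 2.0 ** (-(1 - alpha_B) * n))   # both close to 2^-6
\end{verbatim}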

Now turn to $\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},y)-u(0,y)\right]$. By Lipschitz continuity of $u$, there is a constant $c_{2}$ such that $u(z_{k},y)-u(0,y)\leq c_{2}|z_{k}|$ holds pointwise for each realization of $(z_{k},y)$. So

\[
\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},Y)-u(0,Y)\right]\leq c_{2}\,\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}|\right]
\]

The remainder of the proof proceeds identically to the proof of Theorem 3.2. ∎

Appendix B Proofs for Results in Sections 4 and 5

B.1 Proof of Corollary 1

We continue in the general setting outlined in the proof of Theorem A.1. Fix any realization $\mathbb{x}_{\mathcal{S}}=(x_{1},\dots,x_{s})$ of the standard covariates. As in the proof of Theorem 3.1, there are $2^{n-s}$ covariate vectors $\mathbb{x}_{n}\in\{0,1\}^{n}$ with positive probability conditional on $\mathbb{x}_{\mathcal{S}}$. Index these by $j=1,\dots,2^{n-s}$, and define

\[
Y^{\mathbb{x}_{\mathcal{S}}}_{j}\equiv\mathbb{E}_{P^{n}}\left[Y\mid(X_{1},\dots,X_{n})=\mathbb{x}^{j}_{n}\right]
\]

to be the expected type given covariate vector $\mathbb{x}_{n}^{j}$. For each covariate vector $\mathbb{x}_{n}$ and each disclosure set $D_{k}\subseteq\{s+1,\dots,n\}$, there is a corresponding set of covariate vectors $S_{k}$ such that the evaluator's posterior expectation after the agent discloses his covariates in set $D_{k}$ is

\[
Z_{k}^{\mathbb{x}_{\mathcal{S}}}=\frac{\sum_{j\in S_{k}}Y^{\mathbb{x}_{\mathcal{S}}}_{j}}{|S_{k}|}.
\]

Unlike in the proof of Theorem 3.1, there are now $\overline{K}_{n}=\sum_{j=0}^{h_{n}}\binom{n-s}{j}2^{j}$ unique sets $S_{k}$ (ranging not only over the different possible sets of covariates to disclose but also over their values). By the Binomial Theorem,

\[
\sum_{j=0}^{h_{n}}\binom{n-s}{j}2^{j}\leq\sum_{j=0}^{n-s}\binom{n-s}{j}2^{j}=3^{n-s}.
\]
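The identity behind this bound is elementary and can be spot-checked directly (the values of $m=n-s$ and the truncation point below are arbitrary):

\begin{verbatim}
# Spot-check: sum_j C(m, j) 2^j = 3^m, and any truncated sum is below it.
import math

for m in [5, 10, 20]:
    full = sum(math.comb(m, j) * 2 ** j for j in range(m + 1))
    trunc = sum(math.comb(m, j) * 2 ** j for j in range(m // 2 + 1))
    assert full == 3 ** m and trunc <= full
    print(m, trunc, full)
\end{verbatim}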

Following the proof of Lemma A.1, we obtain that

\[
\mathbb{E}\left(\max_{1\leq k\leq\overline{K}_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right)\leq\frac{1}{2^{n-s-h_{n}}}C\sqrt{\log(\overline{K}_{n})}\leq\frac{1}{2^{n(1-\alpha_{h})-s}}C\sqrt{\log(3^{n-s})}
\]

which again converges to zero by the assumption that $\alpha_{h}<1$. Finally, observe that

\begin{align*}
\mathbb{E}\left[\max_{\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}}\left(\max_{1\leq k\leq K_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right)\right]&\leq\mathbb{E}\left[\sum_{\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}}\max_{1\leq k\leq K_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right]\\
&=\sum_{\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}}\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right].
\end{align*}

Since each $\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right]\rightarrow 0$ as $n\rightarrow\infty$, the RHS converges to zero. We thus obtain the analogue of Lemma A.2 for the expected maximum value of context, and the remainder of the proof proceeds identically to Theorem 3.1.

B.2 Proof of Proposition 4.1

Throughout this proof, we set $s=0$ for simplicity of notation.

Let $(\sigma^{*},\mu^{*})$ denote a typical PBE, where $\sigma^{*}$ is the Sender's disclosure strategy and $\mu^{*}$ is the Receiver's belief function. Fixing any such equilibrium, we use $Z_{\mu^{*}}(d)$ to denote the Receiver's posterior expectation given disclosure $d$. We first prove that at least one pure-strategy equilibrium always exists.

Proposition B.1.

For every $n$ and $f$ there exists a pure-strategy $f$-context equilibrium.

Proof.

Consider a candidate equilibrium $(\sigma^{*},\mu^{*})$, where $\sigma^{*}(\mathbf{x}_{n})=\varnothing$ for all $\mathbf{x}_{n}\in\{0,1\}^{n}$ (which is clearly a feasible disclosure for all agents). The Receiver's beliefs at disclosure $\varnothing$ are pinned down by Bayes' rule. For any other disclosure $d\neq\varnothing$, we construct out-of-equilibrium beliefs such that $u(Z_{\mu^{*}}(\varnothing))\geq u(Z_{\mu^{*}}(d))$. This is always possible, for example by setting $Z_{\mu^{*}}(d)=Z_{\mu^{*}}(\varnothing)$ for every $d$. Then by construction reporting $\varnothing$ is a best response for any $\mathbf{x}_{n}$, so we are done. ∎

Consider any function $f$ and any pure-strategy equilibrium $(\sigma^{*},\mu^{*})$ of the $f$-context disclosure game. Let $d_{1},\dots,d_{N}$ index the disclosures that have positive probability under $\sigma^{*}$ (i.e., all $d\in\mathcal{D}$ such that $\sigma^{*}(\mathbb{x}_{n})=d$ for some $\mathbb{x}_{n}$). For each such disclosure $d_{i}$,

\[
Z_{\mu^{*}}(d_{i})=\frac{1}{|\{x:\sigma^{*}(x)=d_{i}\}|}\sum_{x:\sigma^{*}(x)=d_{i}}f(x)
\]

is the evaluator's posterior expectation upon observing disclosure $d_{i}$. Given the evaluator's payoff function, the optimal action for the evaluator is precisely $Z_{\mu^{*}}(d_{i})$. Let

\begin{equation}
d^{*}=\left(H^{*},(\mathcal{X}^{*}_{i})_{i\in H^{*}}\right):=\operatorname*{arg\,max}_{1\leq i\leq N}u(Z_{\mu^{*}}(d_{i}))\tag{B.1}
\end{equation}

be the disclosure that yields the highest payoff to the Sender. Then it must be that $\sigma^{*}(\mathbb{x}_{n})=d^{*}$ for every covariate vector $\mathbb{x}_{n}$ for which disclosure $d^{*}$ is feasible; otherwise $d^{*}$ would be a profitable deviation. Hence the evaluator's posterior expectation in this equilibrium is the same as it would have been given disclosure of $d^{*}$ in our main model. So

\[
u(Z_{\mu^{*}}(d^{*}))\leq\max_{\mathbb{x}_{n}\in\{0,1\}^{n}}v(f,\mathbb{x}_{n}).
\]

Since the payoff received by an agent with any other covariate vector cannot exceed $u(Z_{\mu^{*}}(d^{*}))$ (by (B.1)), we have the desired result.

B.3 Result for Mixed Strategy Equilibria

In this part we restrict attention to equilibria $(\sigma^{*},\mu^{*})$ with the property that $\operatorname*{arg\,max}_{\hat{y}\in A_{(\sigma^{*},\mu^{*})}}u(\hat{y})$ is unique, where $A_{(\sigma^{*},\mu^{*})}$ is the set of posterior expectations with positive probability in the equilibrium. Call these equilibria generic. (A sufficient condition for all equilibria to be generic is that $u$ is strictly monotone.)

For each $n$ and $f$, let $v^{D}(f,\mathbb{x}_{n})$ denote the highest payoff that an agent with covariate vector $\mathbb{x}_{n}$ receives in any generic equilibrium (potentially mixed) of the $f$-context disclosure game. Further define

\[
v^{D}_{f}(n)=\max_{\mathbb{x}_{n}}v^{D}(f,\mathbb{x}_{n})
\]

and

\[
V^{\mathcal{D}}(n)=\mathbb{E}\left[v^{D}_{f}(n)\right]
\]

where the expectation is with respect to the realization of $f$.

Proposition B.2.

Suppose Assumption 1 holds and $u(\cdot)$ is twice continuously differentiable. Then $\lim_{n\rightarrow\infty}V^{\mathcal{D}}(n)=0$.

Proof.

Fix $n$, $f$, and a context equilibrium $(\sigma^{*},\mu^{*})$ of the $f$-context disclosure game. Let $\mathcal{Z}^{*}\subseteq[-\overline{y},\overline{y}]$ be the compact set of all equilibrium posterior expectations that are realized with positive probability in this equilibrium. Further, denote

\[
Z_{(1)}^{*}=\operatorname*{arg\,max}_{z\in\mathcal{Z}^{*}}u(z)
\]

to be the most-preferred achievable posterior expectation, which is unique by assumption of genericity of the equilibrium.

Since $Z_{(1)}^{*}$ is the best attainable posterior expectation, an agent achieves $Z_{(1)}^{*}$ in equilibrium if and only if it is feasible. (Otherwise, the agent can profitably deviate to the feasible disclosure that induces this posterior expectation.)

Let $\mathcal{X}^{*}\subseteq\{0,1\}^{n}$ denote the set of agents who have a feasible disclosure that achieves $Z_{(1)}^{*}$. Let $\mathcal{D}(\mathcal{X}^{*})$ be the set of disclosures that agents in $\mathcal{X}^{*}$ send with positive probability in equilibrium. By the logic above, $\mathcal{D}(\mathcal{X}^{*})\cap\mathcal{D}(\mathcal{X}\setminus\mathcal{X}^{*})=\varnothing$. Using the structure of this equilibrium we can write

\begin{equation}
\mathbb{E}[Y]=Z_{(1)}^{*}\,p_{\mathcal{X}^{*}}+(1-p_{\mathcal{X}^{*}})\,\mathbb{E}[Y\mid X\notin\mathcal{X}^{*}]\tag{B.2}
\end{equation}

where $p_{\mathcal{X}^{*}}$ is the ex-ante probability that the agent's covariate vector belongs to $\mathcal{X}^{*}$, and $\mathbb{E}[Y\mid X\notin\mathcal{X}^{*}]$ is the expectation of the agent's type given that his covariate vector does not belong to $\mathcal{X}^{*}$. Here we utilize the fact that the evaluator's posterior expectation is constant at $Z_{(1)}^{*}$ across all agents with covariate vectors in $\mathcal{X}^{*}$.\footnote{In general this does not have to be the case. We rule this out in the definition of the equilibrium.}

Now, consider the following alternative ``strategy'' $\sigma_{0}$, which relaxes the feasibility constraint: For any $\mathbb{x}\in\mathcal{X}\setminus\mathcal{X}^{*}$ let $\sigma_{0}(\mathbf{x})\equiv\sigma^{*}(\mathbf{x})$, i.e., the disclosures are the same as in the original equilibrium. Further, choose some arbitrary disclosure $d_{0}\in\mathcal{D}(\mathcal{X}^{*})$ and let $\sigma_{0}(\mathbf{x})=d_{0}$ for all $\mathbb{x}\in\mathcal{X}^{*}$. The Receiver's posterior expectation following observation of disclosure $d_{0}$ is

\[
Z_{0}=\frac{\sum_{x\in\mathcal{X}^{*}}Y_{x}}{|\mathcal{X}^{*}|}
\]

and, analogous to (B.2), we can write

\begin{equation}
\mathbb{E}[Y]=Z_{0}\,p_{\mathcal{X}^{*}}+(1-p_{\mathcal{X}^{*}})\,\mathbb{E}[Y\mid X\notin\mathcal{X}^{*}]\tag{B.3}
\end{equation}

Combining equations (B.2) and (B.3), we conclude:

\[
Z_{(1)}^{*}=\frac{\sum_{x\in\mathcal{X}^{*}}Y_{x}}{|\mathcal{X}^{*}|}
\]

which almost surely converges to $\mathbb{E}[Y]$ so long as $|\mathcal{X}^{*}|\xrightarrow{n\rightarrow\infty}\infty$. Since the $Y_{x}$'s are uniformly bounded, this also implies $\mathbb{E}[Z^{*}_{(1)}]\rightarrow\mathbb{E}[Y]$, as desired. We now demonstrate that indeed $|\mathcal{X}^{*}|\xrightarrow{n\rightarrow\infty}\infty$.

For any disclosure $d$, denote by $C_{d}\subseteq\{0,1\}^{n}$ the set of all covariate vectors $\mathbf{x}$ given which $d$ is feasible. Since $Z_{(1)}^{*}$ is achieved by all agents for whom $Z_{(1)}^{*}$ is feasible, it must be that for every disclosure $d\in\mathcal{D}(\mathcal{X}^{*})$ we have $C_{d}\subseteq\mathcal{X}^{*}$. Then for any $d\in\mathcal{D}(\mathcal{X}^{*})$,

\[
|\mathcal{X}^{*}|\geq|C_{d}|\xrightarrow{n\rightarrow\infty}\infty,
\]

where the limit follows by the assumption that $\alpha_{h}<1$. This completes the proof. ∎

B.4 Proof of Proposition 5.1

We again continue in the general setting outlined in the proof of Theorem A.1, and adopt the conventions that $\mathbb{E}(Y)=\mu$ while $\text{Var}(Y)=1$. We prove the result for a weakening of Assumptions 5 and 6 to the following.

Assumption 13.

Fix any realization of the standard covariates $\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}$. There is an infinitely exchangeable sequence $(\widetilde{Y}_{1},\widetilde{Y}_{2},\dots)$ such that for every $n\in\mathbb{N}$, the sequence

\[
\left(Y_{\mathbb{x}_{R_{n}},\mathbb{x}_{-R_{n}}}:(x_{i})_{i\in R_{n}\backslash\mathcal{S}}\in\{0,1\}^{r_{n}-s}\right)
\]

has the same distribution as $(\widetilde{Y}_{1},\dots,\widetilde{Y}_{2^{r_{n}}})$.

Recalling that $r_{n}$ is the number of relevant covariates, there are $2^{r_{n}}$ distinct expected conditional types, which we can enumerate as $Y_{1},\dots,Y_{2^{r_{n}}}$. If disclosure $k$ involves disclosing $k_{r}$ relevant covariates, then there is a set $S_{k}$ of size $2^{r_{n}-k_{r}}$ such that the evaluator's posterior expectation can be written

\[
Z_{k}=\frac{1}{2^{n-h_{n}}}\sum_{j\in S_{k}}2^{n-r_{n}-(h_{n}-k_{r})}Y_{j}=\frac{1}{2^{r_{n}-k_{r}}}\sum_{j\in S_{k}}Y_{j}.
\]

As in Step 1 of the proof of Theorem 3.1 (Section A.2.1), replace each $Y_{j}$ with a variable $Y_{j}^{k}\stackrel{d}{=}Y_{j}$ which is independent across disclosure sets. This yields the random variables

\[
Z_{k}^{ind}=\frac{1}{2^{r_{n}-k_{r}}}\sum_{j\in S_{k}}Y^{k}_{j}.
\]

As in the proof of Proposition A.1, it follows from Lemma 1 that

\[
\mathbb{E}\left[\max\{Z_{1},\dots,Z_{K_{n}}\}\right]\leq\mathbb{E}\left[\max\{Z_{1}^{ind},\dots,Z_{K_{n}}^{ind}\}\right].
\]

Next define

\[
Z^{iid}_{k}=\frac{1}{2^{r_{n}-h_{n}}}\sum_{j=1}^{2^{r_{n}-h_{n}}}Y_{j}^{k}
\]

and note that these are independently and identically distributed with common variance

\[
\text{Var}(Z^{iid}_{k})=\frac{1}{2^{r_{n}-h_{n}}}.
\]

Following the arguments in Step 2 of the proof of Theorem 3.1 (Section A.2.2), we get

\[
\mathbb{E}\left[\max\{Z^{ind}_{1},\dots,Z^{ind}_{K_{n}}\}\right]\leq\mathbb{E}\left[\max\{Z^{iid}_{1},\dots,Z^{iid}_{K_{n}}\}\right],
\]

where as before $K_{n}=\sum_{j=0}^{h_{n}}\binom{n}{j}$. Further, by the argument given in Step 3 of the proof of Theorem 3.1 (Section A.2.3),

\[
\lim_{n\rightarrow\infty}|V^{iid}_{n}-V^{N}_{n}|=0
\]

where

\[
V^{N}_{n}\equiv\mathbb{E}\left[\max\{Z_{1}^{N},\dots,Z_{K_{n}}^{N}\}\right]
\]

and $Z_{k}^{N}\sim\mathcal{N}\left(\mu,\frac{1}{2^{r_{n}-h_{n}}}\right)$. Again applying the bound from Berman (1964), we have

\[
V^{N}_{n}\leq\frac{1}{2^{r_{n}-h_{n}}}C\sqrt{\log(K_{n})}\leq\frac{1}{2^{n(\alpha_{r}-\alpha_{h})}}C\sqrt{n}.
\]

By the assumption that $\alpha_{r}>\alpha_{h}$, the right-hand expression converges to zero as $n$ grows large, concluding the proof.
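The Gaussian maximum bound invoked here is also easy to probe by simulation. The sketch below compares the simulated expected maximum of $K$ iid normals against the classical $s\sqrt{2\log K}$ benchmark (the standard deviation $s$ and the values of $K$ are illustrative):

\begin{verbatim}
# E[max of K iid N(0, s^2)] versus the benchmark s * sqrt(2 log K).
import numpy as np

rng = np.random.default_rng(2)
s, trials = 0.1, 5000
for K in [10, 100, 1000]:
    emax = rng.normal(0.0, s, (trials, K)).max(axis=1).mean()
    print(K, round(emax, 4), round(s * np.sqrt(2 * np.log(K)), 4))
    # the simulated maximum stays below the benchmark in each case
\end{verbatim}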

B.5 Supporting Materials for Section 5.3

We show here that both main results generalize to the setting with noise. Instead of repeating the proofs step by step, we emphasize what changes must be made in order for the proofs to translate. To be consistent with our previous notation, we again operate with $\tilde{y}_{i,n}=Y_{i}+\varepsilon_{n}$, where $i$ enumerates covariate vectors. In all subsequent notation a tilde indicates an object that includes the noise $\varepsilon_{n}$; objects without one are the same as in the main text. Observe that after adding the noise term $\varepsilon_{n}$, the key objects from the proofs transform in the following way: $Z_{k}$ is replaced by $\tilde{Z}_{k}=Z_{k}+\varepsilon_{n}$, and $\mathbb{E}[\max_{k}Z_{k}]$ is replaced by $\mathbb{E}[\max_{k}\tilde{Z}_{k}]$. Since $\mathbb{E}[\max_{k}\tilde{Z}_{k}]=\mathbb{E}[\max_{k}Z_{k}]$, the proof of Theorem 3.1 translates directly.

To demonstrate Theorem 3.2, we show how to adjust the proof of part (a), with part (b) following analogously. The first change is that (A.14) becomes

\[
\mathbb{E}\left[\phi(\tilde{Z}_{B}^{n})-\phi(0)\right]\geq c_{1}\mathbb{E}\left[(Z_{B}^{n})^{2}\right]-c_{1}\sigma_{\varepsilon,n}^{2}
\]

The inequality in (A.15) is also modified to

\[
\phi(\tilde{Z}_{k})-\phi(0)\leq c_{2}|Z_{k}|+c_{2}|\varepsilon_{n}|
\]

for every disclosure $k$. Thus we obtain

\[
\Delta(n)\geq\frac{c_{1}}{2^{(1-\alpha_{b})n}}-\frac{2c_{2}}{2^{(1-\alpha_{h})n}}\sqrt{\log(K_{n})}+c_{2}\mathbb{E}[|\varepsilon_{n}|]-c_{1}\sigma_{\varepsilon,n}^{2}
\]

We will show that the ratio $\frac{\mathbb{E}[|\varepsilon_{n}|]}{\sigma_{\varepsilon,n}^{2}}$ grows arbitrarily large with $n$, thus asymptotically exceeding $\frac{c_{1}}{c_{2}}$. Fixing some $d>0$ and applying Markov's inequality, we obtain

\begin{equation}
\frac{\mathbb{E}[|\varepsilon_{n}|]}{\sigma_{\varepsilon,n}^{2}}\geq d\cdot\mathbb{P}\left(\frac{|\varepsilon_{n}|}{\sqrt{\sigma_{\varepsilon,n}^{2}}}\geq d\sqrt{\sigma_{\varepsilon,n}^{2}}\right)\tag{B.4}
\end{equation}

If we denote the CDF of $\frac{\varepsilon_{n}}{\sqrt{\sigma_{\varepsilon,n}^{2}}}$ by $G_{n}$, the RHS of the above inequality can be rewritten as $d\cdot\left(1+G_{n}(-d\sqrt{\sigma_{\varepsilon,n}^{2}})-G_{n}(d\sqrt{\sigma_{\varepsilon,n}^{2}})\right)$. As $n$ grows large, the term in brackets tends to $1-2g_{n}(0)d\sqrt{\sigma_{\varepsilon,n}^{2}}+o\left(\sqrt{\sigma_{\varepsilon,n}^{2}}\right)$. We omit the $o(\cdot)$ term until the end of the proof.

Fix an arbitrary $\delta>0$ and let $d=\frac{c_{1}}{c_{2}}+2\delta$. Further, fix $N$ such that $\sqrt{\sigma_{\varepsilon,N}^{2}}\leq\frac{1}{2gd}\left(1-\frac{c_{1}+c_{2}\delta}{c_{1}+2c_{2}\delta}\right)$, where $g=\max_{n}g_{n}(0)<\infty$. Then, since $\sigma_{\varepsilon,n}^{2}$ is decreasing and $g_{n}(0)\leq g$, we have that for all $n\geq N$

\[
2g_{n}(0)d\sqrt{\sigma_{\varepsilon,n}^{2}}\leq 2gd\cdot\frac{1}{2gd}\left(1-\frac{c_{1}+c_{2}\delta}{c_{1}+2c_{2}\delta}\right)=1-\frac{c_{1}+c_{2}\delta}{c_{1}+2c_{2}\delta}
\]

Combining this inequality with (B.4) we get

\[
\frac{\mathbb{E}[|\varepsilon_{n}|]}{\sigma_{\varepsilon,n}^{2}}\geq\left(\frac{c_{1}}{c_{2}}+2\delta\right)\left(1-\left(1-\frac{c_{1}+c_{2}\delta}{c_{1}+2c_{2}\delta}\right)\right)+o\left(\sqrt{\sigma_{\varepsilon,n}^{2}}\right)=\left(\frac{c_{1}}{c_{2}}+\delta\right)+o\left(\sqrt{\sigma_{\varepsilon,n}^{2}}\right)
\]

for all $n\geq N$. Since $\delta$ is an arbitrary positive number, this concludes the proof.
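For intuition, if the noise is Gaussian (an assumption made here purely for illustration), the ratio can be computed in closed form: $\mathbb{E}[|\varepsilon_{n}|]=\sigma_{\varepsilon,n}\sqrt{2/\pi}$, so $\mathbb{E}[|\varepsilon_{n}|]/\sigma_{\varepsilon,n}^{2}=\sqrt{2/\pi}/\sigma_{\varepsilon,n}$, which indeed diverges as $\sigma_{\varepsilon,n}^{2}\rightarrow 0$:

\begin{verbatim}
# Gaussian illustration: E|eps| / sigma^2 = sqrt(2/pi) / sigma -> infinity.
import math

for sigma in [0.1, 0.01, 0.001]:
    print(sigma, math.sqrt(2 / math.pi) / sigma)   # ~7.98, 79.8, 797.9
\end{verbatim}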

B.6 Proof of Proposition 5.2

Throughout the proof we assume $u(x)\equiv x$ and $s=0$. In addition, for simplicity of notation, we enumerate feasible disclosures by $k$ and denote the corresponding posteriors (as random variables) by $Z_{k}^{n}:=\rho_{f}(d_{k})$. To upper bound the value of context, we apply a result from Arnold and Groeneveld (1979):

\begin{equation}
\begin{split}
&\left|\mathbb{E}\left[\max_{k\in\{1,\dots,K_{n}\}}Z_{k}^{n}-\mathbb{E}\left[\frac{\sum_{i=1}^{K_{n}}Z_{i}^{n}}{K_{n}}\right]\right]\right|\leq\\
&\qquad\sqrt{\left(1-\frac{1}{K_{n}}\right)\sum_{i=1}^{K_{n}}\text{Var}(Z_{i}^{n})+\frac{1}{K_{n}}\sum_{i=1}^{K_{n}}\left(\sqrt{K_{n}}\left(\mathbb{E}[Z_{i}^{n}]-\frac{\sum_{i=1}^{K_{n}}\mathbb{E}[Z_{i}^{n}]}{K_{n}}\right)\right)^{2}}
\end{split}\tag{B.5}
\end{equation}

By Assumption 9, inequality (B.5) simplifies to

\[
\left|\mathbb{E}\left[\max_{k\in\{1,\dots,K_{n}\}}Z_{k}^{n}\right]-\mu\right|\leq\sqrt{\left(1-\frac{1}{K_{n}}\right)\sum_{i=1}^{K_{n}}\text{Var}(Z_{i}^{n})}
\]

Finally, Assumption 10 implies that $\text{Var}(Z_{k}^{n})=o\left(\frac{1}{K_{n}}\right)$ for every disclosure $k$. Hence

\[
\left|\mathbb{E}\left[\max_{k\in\{1,\dots,K_{n}\}}Z_{k}^{n}\right]-\mu\right|\leq\sqrt{\left(1-\frac{1}{K_{n}}\right)K_{n}\,o(K_{n}^{-1})}
\]

which yields the desired result after taking a limit in $n$. The argument for the lower bound follows the same line of reasoning and is thus omitted.
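A small simulation illustrates how conservative the bound (B.5) is in the symmetric case covered by Assumption 9, where all posteriors share a common mean (the normal distribution and the parameter values below are illustrative):

\begin{verbatim}
# LHS and RHS of the simplified bound for K iid posteriors N(mu, sigma2).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, K, trials = 0.0, 0.01, 50, 20000
Z = rng.normal(mu, np.sqrt(sigma2), (trials, K))
lhs = abs(Z.max(axis=1).mean() - mu)
rhs = np.sqrt((1 - 1 / K) * K * sigma2)
print(lhs, rhs)   # the bound holds with room to spare
\end{verbatim}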