
The Value of Context:
Human versus Black Box Evaluators

Andrei Iakovlev and Annie Liang (Department of Economics, Northwestern University). We thank Modibo Camara, Krishna Dasaratha, Alex Frankel, Ben Golub, Kevin He, Xiaosheng Mu, Matthew Murphy, Jacopo Perego, Debraj Ray, and Marzena Rostek for helpful comments and suggestions.
Abstract

Machine learning algorithms are now capable of performing evaluations previously conducted by human experts (e.g., medical diagnoses). How should we conceptualize the difference between evaluation by humans and by algorithms, and when should an individual prefer one over the other? We propose a framework to examine one key distinction between the two forms of evaluation: Machine learning algorithms are standardized, fixing a common set of covariates by which to assess all individuals, while human evaluators tailor which covariates they acquire to each individual. Our framework defines and analyzes the advantage of this customization—the value of context—in environments with high-dimensional data. We show that unless the agent has precise knowledge about the joint distribution of covariates, the benefit of additional covariates generally outweighs the value of context.

1 Introduction

“A statistical formula may be highly successful in predicting whether or not a person will go to a movie in the next week. But someone who knows that this person is laid up with a broken leg will beat the formula. No formula can take into account the infinite range of such exceptional events.” — Atul Gawande, Complications: A Surgeon’s Notes on an Imperfect Science


Predictions about people are increasingly automated using black-box algorithms. How should individuals compare evaluation by algorithms (e.g., medical diagnosis by a machine learning algorithm) with more traditional evaluation by human experts (e.g., medical diagnosis by a doctor)?

One important distinction is that black-box algorithms are standardized, fixing a common set of inputs by which to assess all individuals. Unless the inputs to the black box are exhaustive, additional information can (in some cases) substantially modify the interpretation of those inputs that have been acquired. For example, the context that a patient is currently fasting may change the interpretations of “dizziness” and “electrolyte imbalance,” and the context that a job applicant is an environmental activist may change how a prior history of arrest is perceived. If these auxiliary characteristics are not specified as inputs in the algorithm, the individual cannot supply them.

In contrast, individuals can often explain their unusual circumstances or characteristics to a human evaluator through conversation. Thus, even if the human evaluator considers fewer inputs than a black box algorithm does, these inputs may be better adapted to the individual being evaluated. The perception that humans are better able to take into account an individual’s unique situation is a significant factor in patient resistance to AI in healthcare (Longoni et al., 2019). Our objective is to understand when, and to what extent, this difference between human and machine evaluation matters.

Our contribution in this paper is twofold. First, we propose a theoretical framework that formalizes this distinction between human and black box evaluation. Second, we identify assumptions under which the agent should prefer one form of evaluation over the other. We see our paper as a complement to a growing empirical literature that compares human versus black box evaluation. Here our goal is to conceptualize the difference between human and black box evaluators, and to clarify properties of the informational environment that are important for choosing between the two.

In our model, an agent is described by a binary covariate vector and a real-valued type (e.g., the severity of the agent’s medical condition). The type can be written as a function of the covariates, which we henceforth call the type function. Covariates are separated into standard covariates (e.g., medical history, lab tests, imaging scans) and nonstandard covariates (e.g., religious information, genetic data, wearable device data, and financial data).

We suppose that the agent may know how the standard covariates are correlated with the type, but cannot distinguish between the predictive roles of the nonstandard covariates. Formally, the agent has a belief over the type function, and we impose two assumptions on the agent's prior. The first is a symmetry assumption that says that the agent's prior over these functions is unchanged by permuting the labels and values of the nonstandard covariates. If we interpret the covariates as signals about the agent's type, then uncertainty about the type function corresponds to uncertainty about the signal structure (à la model uncertainty, e.g., Acemoglu et al. (2015) and Morris and Yildiz (2019)). The second assumption fixes the unpredictability of the agent's type to be constant in the total number of covariates. We impose this because in many applications, machine learning algorithms have millions of inputs, and yet cannot predict the outcome perfectly. Thus our "many covariates" limit does not represent a situation in which the total amount of information grows large, but rather one in which the type function can be arbitrarily complex. We view these two assumptions as useful conceptual benchmarks, but subsequently show that neither is essential for our main results (see Section 5 for details).

The agent’s payoff is determined by his true type and an evaluation, which may be made either by a human evaluator or a black-box evaluator. In either case the evaluation is a conditional expectation of the agent’s type given the agent’s standard covariates and some fraction of the agent’s nonstandard covariates. But the sets of nonstandard covariates that are observed by the black box evaluator and the human evaluator differ in two ways.

First, the black box evaluator observes a larger fraction of the nonstandard covariates than the human evaluator does (since humans cannot process as much information as black box algorithms can). Second, the nonstandard covariates observed by the black box evaluator are a pre-specified set of algorithmic inputs, which are fixed across individuals. For example, a designer of a medical algorithm may specify a set of inputs including (among others) blood type, BMI, and smoking status. The black box algorithm learns a mapping from those inputs into the diagnosis. We view the human evaluator as instead uncovering nonstandard covariates during a conversation, where the specific path of questioning may vary across agents. Thus the human evaluator may end up learning about one individual’s sleep schedule but another individual’s financial situation.

Rather than modeling these conversations directly, we consider an upper bound on the agent’s payoff under human evaluation, where the covariates that the human observes are the ones that maximize the agent’s payoffs (subject to the human’s capacity constraint). We say that the agent prefers the black box if the agent’s expected payoffs are higher under black box evaluation even compared to these best-case conversations with the human.

This comparison essentially reduces to the question of whether the agent prefers an evaluator who observes a larger fraction of (non-targeted) nonstandard covariates about the agent, or an evaluator who observes a smaller but targeted fraction of nonstandard covariates. (Footnote: This question is spiritually related to Akbarpour et al. (2024)'s comparison of the network diffusion value of a small number of targeted seeds versus a larger number of randomly selected seeds. Like them, we will find that a larger number of (non-targeted) inputs is superior, but the mechanisms behind these results are very different; in particular, network structure does not play a role in our results.) Towards this comparison, we first introduce a benchmark, which is the expected payoff that the agent would receive if interacting with an evaluator who observes no nonstandard covariates. We define the value of context to be the improvement in the agent's payoffs under best-case human evaluation, relative to this benchmark. The value of context thus quantifies the extent to which the agent's payoffs can possibly be improved when the evaluator observes nonstandard covariates suited to that agent.

Our first main result says that under our assumptions on the agent’s prior, the expected value of context vanishes to zero as the number of covariates grows large. Thus even though there may be realizations of the type function given which the value of context is large, in expectation it is not. The contrapositive of this result is that if the expected value of context is high in some application, it must be that our assumptions on the prior do not hold, i.e., the agent has some ex-ante knowledge about the predictive roles of the nonstandard covariates.

We prove this result by studying the sensitivity of the evaluator’s expectation to the set of covariates that are revealed. Intuitively, a large value of context requires that the evaluator’s beliefs move sharply after observing certain nonstandard covariates. We show that the largest feasible change in the evaluator’s beliefs can be written as the maximum over a set of random variables, each corresponding to the movement in the evaluator’s beliefs for a given choice of covariates to reveal. The proof proceeds by first reducing this problem to studying the maximum of a growing sequence of (appropriately constructed) i.i.d. variables, and then applying a result from Chernozhukov et al. (2013) to show that this expected maximum concentrates on its expectation as the number of covariates grows large. We conclude by bounding this expectation and demonstrating that it vanishes.

We next use this result to compare the agent's expected payoff under human and black box evaluation, when the total number of covariates is sufficiently large. We show that when the agent prefers a more accurate evaluation—formally, when the agent's payoff is convex in the evaluation—the agent should (eventually) prefer an algorithmic evaluator with access to more covariates over a human evaluator to whom the agent can provide context. And when the agent's payoff is concave in the evaluation, the conclusion is (eventually) reversed. We view these conclusions as relevant not only in the many-covariates limit: We quantify the number of covariates that is needed for our result to hold, and show that it can be quite small. For example, if the agent's utility function satisfies mild regularity conditions, the human evaluator observes 10% of the covariates, and the black box evaluator observes 90%, then our result holds as long as there are at least 14 covariates.

We subsequently strengthen our main results in two ways: First, we show that not only does the expected value of context vanish for each agent, but in fact the expected maximum value of context across agents also vanishes. Thus, the expected value of context is eventually small for everyone in the population. Second, we show that our main results extend when the agent and evaluator interact in a disclosure game, where the agent chooses which nonstandard covariates to reveal, and the evaluator makes inferences about the agent based on which covariates are revealed (given the agent’s equilibrium reporting strategy).

We conclude by examining the role of our assumptions about the agent's prior, and the extent to which our results depend on them. First, we study two variations of our main model, in which the symmetry assumption is relaxed: In the first, we suppose that there is a "low-dimensional" set of covariates that is relevant for predicting the agent's type; in the second, we suppose that the agent knows ex-ante the predictive role of certain nonstandard covariates. In both of these settings, our main results extend partially but can also fail: For example, if the set of relevant covariates is sufficiently small that they can be fully disclosed to the evaluator, then the expected value of context typically will not vanish. Next we show that our results also extend in a model in which the predictability of the agent's type is higher in environments with a larger number of covariates (thus relaxing our second assumption on the prior). Finally we provide an abstract learning condition under which our results extend: It is enough for the informativeness of each individual set of covariates to vanish as the total number of covariates grows large. Together with our main results, these extensions clarify different categories of informational assumptions under which the expected value of context does or does not turn out to be high.

Our model is not meant to be a complete description of the differences between human and black box evaluation. For example, we do not consider human or algorithmic bias (Kleinberg et al., 2017; Gillis et al., 2021), explainability (Yang et al., 2024), preferences for empathetic evaluators, or the possibility that the human evaluator has access to information that is not available to the algorithm (e.g., for privacy protection reasons as in Agarwal et al. (2023)). We also suppose that both evaluators form correct conditional expectations, thus abstracting away from the possibility of algorithmic overfitting and of bounded human rationality (e.g., as considered in Spiegler (2020) and Haghtalab et al. (2021)). (Footnote: The problem of overfitting, while practically important, is a function of how the algorithm is trained. We are interested here in intrinsic differences between the qualitative nature of human and black box evaluation, which are difficult to resolve by training the algorithm differently.) We leave extensions of our model that include these other interesting differences to future work.

1.1 Related Literature

Our paper is situated at the intersection of the literatures on learning (Section 1.1.1) and strategic information disclosure (Section 1.1.2), where our analysis is primarily differentiated from the previous frameworks by our assumption that the agent has model uncertainty (see Section 1.1.1). Our paper is also inspired by a recent empirical literature that compares human and AI evaluation, which we review in Section 1.1.3.

1.1.1 (Asymptotic) Learning

A large literature studies asymptotic learning and agreement across Bayesian agents (Blackwell and Dubins, 1962). Our main result (Theorem 3.1) can be viewed as bounding (in expectation) the differences in beliefs across Bayesian agents who are given different information. As in Vives (1992), Golub and Jackson (2012), Liang and Mu (2019), Harel et al. (2020), and Frick et al. (2023) among others, we quantify the rate of convergence in beliefs. The learning rates that we look at are, however, of a different nature from those studied previously. One important distinction is that these previous papers consider asymptotics as the total amount of information accumulates, while our analysis considers asymptotics with respect to a sequence of information structures that we show are increasingly less informative. A second important difference is that the classic learning models suppose that the agent updates to a signal with a known signal structure, while our agent has uncertainty over the signal structure (as in Acemoglu et al. (2015) and Morris and Yildiz (2019)). Our results characterize the informativeness of this signal in expectation, where the agent’s model uncertainty takes a particular (and new) form motivated by the applications we have in mind.

Finally, our paper is related to Di Tillio et al. (2021), which compares the informativeness of an unbiased signal to the informativeness of a selected signal whose realization is the maximum realization across i.i.d. unbiased signals. Again the key difference is our assumption of model uncertainty—that is, in Di Tillio et al. (2021), the signal structures that are being compared are deterministic and known, while in ours they are random and compared in expectation. In particular, our agent’s prior belief over signal structures can have support on signal processes which are not i.i.d. (for example, it may be that the meaning of one signal is dependent on the meaning of another).

1.1.2 Strategic Information Disclosure

Several literatures study persuasion via strategic information disclosure. Our model—in which the sender has private information about his type vector, and selectively chooses which elements to disclose to a naive receiver—is closest to models of disclosure of hard information (Dye, 1985; Grossman and Hart, 1980), in particular Milgrom (1981). (Footnote: A similar model of information is considered in Glazer and Rubinstein (2004) and Antic and Chakraborty (2023).) The key difference (which follows from our assumption of model uncertainty) is that our sender has uncertainty about how his reports are interpreted. Additionally, our focus is not on examining which incentive-compatible reporting strategy is optimal (Footnote: Indeed, in our main model we do not require choice of an incentive-compatible reporting strategy, since the receiver updates to the sender's disclosure as if it were exogenous information. This is primarily for convenience—we show in Section 4.2 that our results extend in a disclosure game.), but instead on asymptotic limits of belief manipulability as the number of components in the type vector grows large. This latter focus is special to our motivating applications.

Our model also has important differences from the other main strands of the persuasion literature. Unlike models of cheap talk (Crawford and Sobel, 1982), our agent chooses between messages whose meanings are fixed exogenously (through the realization of the joint distribution relating covariates to the type) rather than in an equilibrium. Unlike the literature on Bayesian persuasion (Kamenica and Gentzkow (2011)), our sender chooses which signal realization to share ex-post from a finite set of signal realizations, rather than committing to a flexibly chosen information structure ex-ante. (Footnote: Thus, for example, Bayes plausibility is not satisfied in our setting—the sender's expectation of the receiver's expectation of the state (following disclosure) is generally not the prior expectation of the state.) Indeed, our model gives the sender substantial power to influence the receiver's beliefs relative to this previous literature. It is perhaps surprising, then, that despite the lack of constraints imposed on the sender, we find that the sender is extremely limited in his influence. In our model, this emerges because the sender has a limited choice from a set of information structures, whose informativeness (we show) is vanishing in the total number of covariates. (Footnote: The covariates in our model play a similar role to attributes, although the literature on attributes has focused on choice of which attributes to learn about (e.g., Klabjan et al. (2014) and Liang et al. (2022)), rather than which attributes to disclose for the purpose of persuasion. An exception is Bardhi (2023), who studies a principal-agent problem in which a principal selectively samples attributes to influence an agent decision.)

1.1.3 Human vs AI Evaluation

Recent empirical papers compare the accuracy of human evaluation with AI evaluation, finding that machine learning algorithms outperform experts in problems including medical diagnosis (Rajpurkar et al., 2017; Jung et al., 2017; Agarwal et al., 2023), prediction of pretrial misconduct (Kleinberg et al., 2017; Angelova et al., 2022), and prediction of worker productivity (Chalfin et al., 2016). Nonetheless, many individuals continue to distrust algorithmic predictions (Jussupow and Heinzl, 2020; Bastani et al., 2022; Lai et al., 2023). These findings motivate our goal of understanding whether individuals should prefer human evaluators, and when instead the replacement of human evaluation with algorithmic evaluation is welfare-improving for users, as suggested in Obermeyer and Emanuel (2016) among others.

In principle, human decision-making guided by algorithmic predictions should be superior to either human or algorithmic prediction alone. In practice the evidence is more mixed, with the provision of algorithmic recommendations sometimes leading human decision-makers to less accurate predictions (Hoffman et al., 2017; Angelova et al., 2022; Agarwal et al., 2023). (Footnote: Other papers instead consider algorithmic prediction tools that take human evaluation as an input, with greater success towards improving accuracy (e.g., Raghu et al. (2019)).) The question of how to aggregate human and machine evaluations is thus important but subtle, and depends on (among other things) whether human decision-makers understand the correlation between their information and that of the algorithm (McLaughlin and Spiess, 2022; Gillis et al., 2021; Agarwal et al., 2023). We abstract away from these complexities, focusing instead on (one aspect of) the more basic question of why human oversight is even necessary to begin with. We provide a tractable way of formalizing the advantage of human evaluation, and quantify the size of this advantage.

2 Model

2.1 Setup

Agents are each described by a binary covariate vector $\mathbb{x}_{n}=(x_{1},x_{2},\dots,x_{n})$ and a type $y\in[-\overline{y},\overline{y}]$ (where $0\leq\overline{y}<\infty$), which are structurally related by the function

$$y=f(x_{1},\dots,x_{n}).$$

We refer to $f$ henceforth as the type function. The distribution over covariate vectors is uniform in the population. (Footnote: All of our results extend for arbitrary finite-valued covariates.)

We refer to the covariates indexed to $\mathcal{S}=\{1,\dots,s\}$ as standard covariates and the covariates indexed to $\mathcal{N}=\{s+1,\dots,n\}$ as nonstandard covariates.

Example 1 (Job Interview).

Standard covariates describing a job applicant may include their work history, education level, college GPA, and the coding languages they know. Nonstandard covariates may include their social media activity (e.g., number of followers, posts, likes), wearable device data (e.g., sleep patterns, physical activity level), and hobbies (e.g., whether they are active readers, whether they enjoy extreme sports).

Example 2 (Medical Prediction).

Standard covariates describing a patient may include symptoms, prior diagnoses, family medical history, lab tests and imaging results. Nonstandard covariates may include the patient's religious practices, genetic data, wearable device data, and financial data. (Footnote: See Acosta et al. (2022) for further examples of nonstandard patient covariates that may be predictive, but which are not currently used by clinicians for medical evaluations.)

An evaluation of the agent, $\hat{y}\in[-\overline{y},\overline{y}]$, is described in the following section. The agent has a Lipschitz continuous utility function $u:[-\overline{y},\overline{y}]^{2}\rightarrow\mathbb{R}$, which maps the evaluation $\hat{y}$ and the agent's true type $y$ into a payoff.

Example 3 (Higher Evaluations are Better).

The agent’s payoff is

$$u(\hat{y},y)=\phi(\hat{y})$$

for some increasing $\phi$. This corresponds, for example, to an agent receiving a desired outcome (e.g., a loan or a promotion) with probability increasing in the evaluation.

Example 4 (More Accurate Evaluations are Better).

The agent’s payoff is

$$u(\hat{y},y)=-(\hat{y}-y)^{2}.$$

This corresponds to harms that are decreasing in the accuracy of the evaluation, e.g., medical prediction problems where more accurate evaluations are desired.

2.2 Evaluation of the agent

There are two evaluators: a black box evaluator, henceforth Black Box (it), and a human evaluator, henceforth Human (she). Both form evaluations as an expectation of the agent's type $y$ given observed covariates, so we will introduce notation for these conditional expectations. For any covariate vector $\mathbb{x}_{n}=(x_{1},\dots,x_{n})$ and subset of nonstandard covariates $A\subseteq\mathcal{N}$, let

$$C_{A}(\mathbb{x}_{n})=\{\tilde{x}\in\{0,1\}^{n}:\tilde{x}_{i}=x_{i}\ \forall i\in\mathcal{S}\cup A\} \qquad (2.1)$$

be the set of all covariate vectors that agree with $\mathbb{x}_{n}$ on the covariates with indices in $\mathcal{S}\cup A$. Further define

$$\hat{y}^{f}_{\mathbb{x}_{n}}(A)=\frac{1}{|C_{A}(\mathbb{x}_{n})|}\sum_{x\in C_{A}(\mathbb{x}_{n})}f(x) \qquad (2.2)$$

to be the conditional expectation of the agent's type given their standard covariates and their nonstandard covariates with indices in $A$. We use

$$U^{f}_{\mathbb{x}_{n}}(A)=u\left(\hat{y}^{f}_{\mathbb{x}_{n}}(A),y\right)$$

to denote the agent’s payoff given this evaluation.
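To make (2.1) and (2.2) concrete, the following is a minimal Python sketch (the helper name `evaluate` is ours, not the paper's) that enumerates the cell $C_{A}(\mathbb{x}_{n})$ by brute force and averages $f$ over it. It is only practical for small $n$, since the cell has $2^{n-|\mathcal{S}\cup A|}$ elements.

```python
from itertools import product

def evaluate(f, x, standard, A):
    """Posterior expectation of y = f(x) given the standard covariates and
    the nonstandard covariates with indices in A, per (2.1)-(2.2).

    f        : function from a 0/1 tuple of length n to a type y
    x        : the agent's covariate vector, a tuple of 0/1 of length n
    standard : indices of the standard covariates (always observed)
    A        : indices of the observed nonstandard covariates
    """
    n = len(x)
    fixed = set(standard) | set(A)
    free = [i for i in range(n) if i not in fixed]
    total = 0.0
    # Enumerate the cell C_A(x): every vector agreeing with x on S and A.
    for bits in product([0, 1], repeat=len(free)):
        x_tilde = list(x)
        for i, b in zip(free, bits):
            x_tilde[i] = b
        total += f(tuple(x_tilde))
    return total / 2 ** len(free)
```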

Both the human and black box evaluation take the form (2.2), but the observed sets of nonstandard covariates $A$ differ across the evaluators. Black Box observes the nonstandard covariates in the set $B=\{s+1,\dots,s+b_{n}\}$, where $b_{n}=\lfloor\alpha_{b}\cdot n\rfloor$. (Footnote: One can instead assume that these nonstandard covariates are selected uniformly at random. This will not affect the results of this paper.) Importantly, this set is held fixed across agents. So an individual with covariate vector $\mathbb{x}_{n}$ receives the evaluation $\hat{y}_{\mathbb{x}_{n}}^{f}(B)$ and payoff $U_{\mathbb{x}_{n}}^{f}(B)$ when evaluated by the Black Box. (Footnote: It is not important for our results that $B$ is common across individuals; what we require is that any randomness in $B$ is independent of the agent's covariates and type. For example, if the set $B$ were drawn uniformly at random for each agent, our results would hold.)

Human differs from Black Box in two ways. First, Human has a capacity of $h_{n}=\lfloor\alpha_{h}\cdot n\rfloor$ nonstandard covariates per agent, where $\alpha_{h}<\alpha_{b}$ (i.e., Human cannot process as many inputs as Black Box). Second, Human does not pre-specify which nonstandard covariates to observe, but rather learns these through conversation, and thus potentially observes different nonstandard covariates for each agent. For example, a doctor (evaluator) may pose different questions to different patients (agents) depending on their answers to previous questions. Or a job candidate (agent) might choose to disclose to an interviewer (evaluator) certain nonstandard covariates that are favorable to him.

Rather than modeling the complex process of a conversation, we study the quantity

$$\max_{H\subseteq\mathcal{N},\,|H|\leq\alpha_{h}\cdot n}U_{\mathbb{x}_{n}}^{f}(H) \qquad (2.3)$$

which is the agent's payoff when the posterior expectation about his type is based on those $\alpha_{h}\cdot n$ or fewer covariates that are best for him.

We can interpret this quantity as an upper bound for the agent's payoffs under certain assumptions. First, if the evaluator selects which covariates to observe, then (2.3) is an upper bound on the agent's possible payoffs across all possible evaluator selection rules. Second, if covariates are disclosed by the agent, but the evaluator updates to the disclosed covariates as if they had been chosen exogenously, then again (2.3) represents an upper bound on the agent's possible payoffs. (Footnote: Jin et al. (2021) and Farina et al. (2023) report that the beliefs of experimental subjects fall somewhere in between this naive benchmark and equilibrium beliefs, since subjects do not completely account for the strategic nature of disclosure.)

If however the covariates are disclosed by the agent in a disclosure game, and the evaluator accounts for the strategic nature of this disclosure, then whether (2.3) represents an upper bound will depend on what we assume the agent knows at the time of disclosure. (Footnote: This is roughly because the agent can potentially "sneak in" information about the other covariates via the covariates that are revealed.) We show in Section 4.2 that if the agent knows his entire covariate vector, then (2.3) need not upper bound every agent's payoffs. Nevertheless, we present a different quantity that does upper bound the maximum payoff that any agent can obtain in this disclosure game, and show that our main results extend when we replace (2.3) with this quantity. To streamline the exposition we focus on the prior two interpretations (in which the human evaluator either selects the covariates herself or updates to the agent's disclosures naively), and discuss disclosure games in Section 4.2.

2.3 Value of context

A key input towards understanding the comparison between Human and Black Box is quantifying the extent to which individualized context improves the agent’s payoffs.

Definition 2.1 (Value of Context).

The value of context for an agent with covariate vector $\mathbb{x}_{n}$ and type $y=f(\mathbb{x}_{n})$ is

$$v(f,\mathbb{x}_{n})=\max_{H\subseteq\mathcal{N},\,|H|\leq\alpha_{h}n}U_{\mathbb{x}_{n}}^{f}(H)-U_{\mathbb{x}_{n}}^{f}(\varnothing)$$

i.e., the best possible improvement in the agent's utility when the evaluator additionally observes up to $\alpha_{h}\cdot n$ covariates for the agent.

In general, the value of context depends on the type function $f$ as well as on the agent's own covariate vector $\mathbb{x}_{n}$. (Footnote: The value of context given a specific function $f$ is spiritually related to the communication complexity of $f$ (Kushilevitz and Nisan, 1996).)

Example 5 (High Value of Context).

Let $u(\hat{y},y)=\hat{y}$, i.e., the agent's payoff is the evaluation. Suppose $x_{1}$ is a standard covariate (observed no matter what), while $x_{2},\dots,x_{100}$ are nonstandard covariates. The type $y$ is related to these covariates via the type function

$$y=f(x_{1},\dots,x_{100})=\begin{cases}c&\text{ if }x_{1}=x_{2}\\ -c&\text{ if }x_{1}\neq x_{2}\end{cases}$$

For an agent who can reveal (up to) one covariate and whose covariate vector is $(1,1,\dots,1)$, the value of context is $c$, since revealing $x_{2}=1$ moves the expectation of his type from 0 to $c$. This example corresponds to settings in which some nonstandard covariate substantially moderates the interpretation of a standard covariate. For such type functions $f$, it is important for the evaluator to observe the right nonstandard covariates, and so the value of context can be large.
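As a check on this example, the sketch below (reusing the hypothetical `evaluate` helper from Section 2.2, and scaling the example down from 100 to 10 covariates purely to keep brute-force enumeration cheap) computes the value of context of Definition 2.1 by enumerating all admissible sets $H$; for the all-ones agent it returns $c$, as claimed.

```python
from itertools import combinations

def value_of_context(f, x, standard, nonstandard, h, u):
    """Value of context (Definition 2.1): best improvement in the agent's
    utility from revealing at most h nonstandard covariates, relative to
    revealing none. Requires evaluate() from the Section 2.2 sketch."""
    y = f(x)
    base = u(evaluate(f, x, standard, set()), y)
    best = base  # H = empty set is always admissible
    for k in range(1, h + 1):
        for H in combinations(nonstandard, k):
            best = max(best, u(evaluate(f, x, standard, set(H)), y))
    return best - base

# Scaled-down Example 5: x1 standard, x2..x10 nonstandard, c = 1.
c = 1.0
f = lambda x: c if x[0] == x[1] else -c
u = lambda y_hat, y: y_hat  # the agent's payoff is the evaluation itself
print(value_of_context(f, (1,) * 10, {0}, range(1, 10), h=1, u=u))  # -> 1.0
```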

Example 6 (Low Value of Context).

Suppose the type function in the previous example is instead $y=f(x_{1})=x_{1}$ (leaving all other details of the example unchanged). Then the value of context is 0 for every agent. In this example, nonstandard covariates are irrelevant for predicting the type, so there is no value to the evaluator discovering the "right" covariates.

In what follows, we give the agent uncertainty about $f$ and characterize the agent's expected value of context and expected payoffs, integrating over the agent's belief about $f$. (Footnote: If we interpret the covariates in our model as signals about the type, then the function relating covariates to type corresponds to the signal structure.)

We do this for two reasons. First, in many applications it is not realistic to suppose that the agent knows $f$. For example, a patient who anticipates that a diagnosis will be based on an image scan of his kidney may recognize that there are properties of the image that are indicative of whether he has the condition or not, but likely does not know what the relevant properties are, or how they determine the diagnosis. (Footnote: In the case of a job interview, the function $f$ may reflect particular subjective preferences of the firm, which are initially unknown to the agent.)

Second, the case with uncertainty about $f$ turns out to yield a more elegant analysis than the one in which $f$ is known. That is, although the value of context for specific functions $f$ depends on details of that function and on the agent's own covariate vector, there is a large class of prior beliefs (described in the following section) for which it is possible to draw strong detail-free conclusions about the expected value of context.

2.4 Model Uncertainty

We impose two assumptions on the agent's prior belief about $f$. Together, these assumptions deliver a setting in which many different structural relationships between the covariates and the type are possible (including both ones where the value of context is high and low), but ex-ante those relationships are not known.

The first assumption says that while the agent may know how standard covariates impact the type, he has no ex-ante knowledge about the roles of the nonstandard covariates.

Assumption 1 (Exchangeability).

For every realization of the standard covariates $(x_{1},\dots,x_{s})$, the sequence

$$(Y^{n}_{1},\dots,Y^{n}_{2^{n-s}})\equiv(f(x_{1},\dots,x_{s},x_{s+1},\dots,x_{n}):(x_{s+1},\dots,x_{n})\in\{0,1\}^{n-s}) \qquad (2.4)$$

is finitely exchangeable.

The set $\{f(x_{1},\dots,x_{s},x_{s+1},\dots,x_{n}):(x_{s+1},\dots,x_{n})\in\{0,1\}^{n-s}\}$ ranges over all covariate vectors that share the standard covariate values $(x_{1},\dots,x_{s})$. Assumption 1 says that the joint distribution of these types is ex-ante invariant to permutations of the labels and values of the nonstandard covariates. An agent whose prior satisfies Assumption 1 is thus agnostic about how the nonstandard covariates impact the type.

While our assumption of no prior knowledge about the role of nonstandard covariates is strong, it is consistent with our interpretation of the nonstandard covariates as those covariates for which there is little historical data about correlations. For example, if it were known that a higher GPA positively correlates with on-the-job performance, but not how a large number of social media followers predicts performance, then we would think of GPA as a standard covariate and the number of social media followers as a nonstandard covariate.

Assumption 1 does not restrict how the agent's prior varies with $n$, the number of covariates. In a model in which $x_{1},x_{2},\dots$ were drawn i.i.d. from a type-dependent distribution $F_{y}$, the total quantity of information about $y$ would increase in the number of covariates, and the evaluator's uncertainty about $y$ would vanish as $n$ grew large. This does not seem descriptive of real applications: credit scoring algorithms and healthcare algorithms use millions of covariates, but there remains substantial residual uncertainty about the agent's type. We take the opposite extreme in which the predictability of the agent's type is a primitive of the setting, which is held constant for all $n$. In our model, $n$ is not a measure of the total quantity of information, but instead moderates the richness of the informational environment and the potential complexity of the mapping $f$. Loosely speaking, as $n$ grows large, the agent has a more extensive set of words to describe a fixed unknown $y$. (Footnote: As $n$ grows large, the smallest possible informational size of each covariate (in the sense of McLean and Postlewaite (2002)) vanishes. But we do not require each covariate to be equally informationally relevant in the realized function. So, for example, $f(x_{1},\dots,x_{n})=x_{1}$ can be in the support of the agent's beliefs for $n$ arbitrarily large (see Example 7).)

Assumption 2 (Constant Unpredictability of $Y$).

Fix any realization of the standard covariates $(x_{1},\dots,x_{s})$, and let $(Y^{n}_{1},\dots,Y^{n}_{2^{n-s}})$ be defined as in (2.4) for each $n\in\mathbb{Z}_{+}$. Then for every pair $n^{\prime}>\tilde{n}$, the sequence $(Y^{\tilde{n}}_{1},\dots,Y^{\tilde{n}}_{2^{\tilde{n}-s}})$ and the truncated sequence $(Y^{n^{\prime}}_{1},\dots,Y^{n^{\prime}}_{2^{\tilde{n}-s}})$ are identically distributed.

The statement of Assumption 2 formally depends on the ordering of types within the vector $(Y^{n^{\prime}}_{1},\dots,Y^{n^{\prime}}_{2^{n^{\prime}-s}})$, since this determines which types appear in the truncated sequence $(Y^{n^{\prime}}_{1},\dots,Y^{n^{\prime}}_{2^{\tilde{n}-s}})$. But if we further impose Assumption 1 (and we will always impose these assumptions jointly), then the ordering of types is irrelevant: That is, when Assumption 2 holds for one such ordering, it will hold for all orderings.


It is important to note that in our model, Assumptions 1 and 2 are placed ex-ante on the agent's prior, and not ex-post on the realized function $f$. For example, the function $f(x_{1},\dots,x_{n})=x_{1}$, which says that the only covariate that matters is $x_{1}$, is strongly asymmetric ($x_{1}$ is differentiated from the other covariates) and also features a single "large" covariate (the realization of $x_{1}$ completely determines $y$). Our assumptions do not rule out the possibility of this function. Rather, they require that if this function is considered possible, then certain other functions are as well. (Footnote: Assumption 1 requires that for every permutation $\pi:\{0,1\}^{n}\rightarrow\{0,1\}^{n}$, the function $g_{\pi}$ satisfying $g_{\pi}(x_{1},\dots,x_{n})=f(\pi(x_{1},\dots,x_{n}))$ is also in the support of the agent's beliefs.)

Simple examples of priors satisfying these two assumptions are given below.

Example 7.

Let $y\in\{0,1\}$, in which case the space of possible functions $f:\mathcal{X}\rightarrow\mathcal{Y}$ can be identified with $\{0,1\}^{2^{n}}$. Suppose that for each $n$, the agent has a uniform prior on the set of all functions $\{0,1\}^{2^{n}}$. Then Assumptions 1 and 2 are satisfied.

Example 8.

Suppose there is a distribution $F$ on $[-\overline{y},\overline{y}]$ such that for each $n$,

$$\left(f(x_{1},\dots,x_{s},x_{s+1},\dots,x_{n}):(x_{s+1},\dots,x_{n})\in\{0,1\}^{n-s}\right)\sim_{\text{i.i.d.}}F.$$

Then Assumptions 1 and 2 are satisfied.
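For intuition, here is one way such a prior could be simulated (a sketch; the name `sample_type_function` is ours, and taking $F$ to be Uniform$[-\overline{y},\overline{y}]$ is an illustrative assumption): draw an independent type for every cell of covariate values, so that permuting nonstandard labels or values leaves the joint law unchanged, and the draw for a given $n$ agrees in distribution with its truncation to fewer covariates.

```python
import random
from itertools import product

def sample_type_function(n, y_bar=1.0, seed=None):
    """Sample a type function in the spirit of Example 8 with
    F = Uniform[-y_bar, y_bar]: each of the 2^n covariate cells gets an
    independent draw from F, making the prior exchangeable over nonstandard
    covariates (Assumption 1) and consistent across n (Assumption 2)."""
    rng = random.Random(seed)
    table = {cell: rng.uniform(-y_bar, y_bar)
             for cell in product([0, 1], repeat=n)}
    return lambda x: table[tuple(x)]
```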


Priors that violate these assumptions include the following.

Example 9 (Only One Covariate is Relevant).

The type is equal to the value of the nonstandard covariate $x_{I}$, where the index $I$ is drawn uniformly at random from $\mathcal{N}$. Then Assumption 1 fails. (Footnote: Suppose $n=2$, and both covariates are nonstandard. Then under the agent's prior, $f\in\{\hat{f},\tilde{f}\}$, where $\hat{f}(1,1)=\hat{f}(1,0)=1$ while $\hat{f}(0,1)=\hat{f}(0,0)=0$, and $\tilde{f}(1,1)=\tilde{f}(0,1)=1$ while $\tilde{f}(1,0)=\tilde{f}(0,0)=0$. So the agent knows with certainty that $f(1,1)=1$ but $f(0,0)=0$, in violation of Assumption 1.) This example is consistent with exchangeability in the labels of the nonstandard covariates, but not with exchangeability in their realizations.

Example 10 (Higher Values are Better).

The value of $f(\mathbb{x}_{n})$ is (independently) drawn from a uniform distribution on $[1,2]$ if $x_{s+1}=1$, and (independently) drawn from a uniform distribution on $[0,1]$ if $x_{s+1}=0$. Then Assumption 1 fails.

We view these two assumptions as useful conceptual benchmarks, but neither is necessary for our subsequent results. In Section 5, we explore how far our main results generalize under different relaxations of Assumptions 1 and 2.

2.5 Expected Value of Context

We now define the expected value of context from the point of view of an agent who knows his covariate vector $\mathbb{x}_{n}$ but does not know the function $f$ (and hence also does not know his type $y=f(\mathbb{x}_{n})$). As we show in Section 4.1, the assumption that the agent knows $\mathbb{x}_{n}$ is immaterial for the results.

Definition 2.2 (Expected Value of Context).

For every $n\in\mathbb{Z}_{+}$ and covariate vector $\mathbb{x}_{n}\in\{0,1\}^{n}$, the expected value of context is

$$V(n,\mathbb{x}_{n})=\mathbb{E}\left[v(f,\mathbb{x}_{n})\right].$$

This quantity tells us the extent to which context improves the agent’s payoffs in expectation.

We similarly compare evaluators based on the expected payoff that the agent receives.

Definition 2.3.

Consider any agent with covariate vector $\mathbb{x}_{n}$. If

$$\mathbb{E}\left[\max_{H\subseteq\mathcal{N},\,|H|\leq\alpha_{h}\cdot n}U_{\mathbb{x}_{n}}^{f}(H)\right]<\mathbb{E}\left[U_{\mathbb{x}_{n}}^{f}(B)\right] \qquad (2.5)$$

then say that the agent prefers the black box evaluator. And if

$$\mathbb{E}\left[\min_{H\subseteq\mathcal{N},\,|H|\leq\alpha_{h}\cdot n}U_{\mathbb{x}_{n}}^{f}(H)\right]>\mathbb{E}\left[U_{\mathbb{x}_{n}}^{f}(B)\right] \qquad (2.6)$$

then say that the agent prefers the human evaluator.

These definitions correspond to a thought experiment in which (for example) a patient has a choice between being seen by a doctor or assessed by an algorithm. If the patient chooses the algorithm, his standard covariates and $\alpha_{b}\cdot n$ arbitrarily chosen nonstandard covariates will be sent to the algorithm. If the patient chooses the doctor, he will engage in a conversation with the doctor, where his standard covariates and $\alpha_{h}\cdot n$ selected nonstandard covariates will be revealed. Which should the patient choose?

The first part of Definition 2.3 compares the agent's expected payoff under black box evaluation with the best-case expected payoff under human evaluation, namely when the human evaluator observes those (up to) $\alpha_{h}\cdot n$ covariates that maximize the agent's payoffs. If the agent's expected payoff is nevertheless higher under black box evaluation even after biasing the agent towards the human in this way, we say that the agent prefers to be evaluated by the black box. The second part of the definition compares the agent's expected payoff under black box evaluation with the worst-case expected payoff under human evaluation, namely when the human evaluator observes those (up to) $\alpha_{h}\cdot n$ covariates that minimize the agent's payoffs. If the agent's expected payoff is lower under black box evaluation even after biasing the agent against the human in this way, then we say that the agent prefers to be evaluated by the human. (Footnote: In Section 4.2 we further discuss the extent to which these interpretations are valid when the evaluator also updates her beliefs to the selection of covariates.)

These are clearly very conservative criteria for what it means to prefer the human or the black box. In practice, we would expect the set of revealed covariates $H$ to be intermediate to the two cases considered in Definition 2.3, i.e., that $H$ neither maximizes nor minimizes the agent's payoffs. (Footnote: Angelova et al. (2022) provide evidence that some judges condition on irrelevant defendant covariates when predicting misconduct rates.) But if we can conclude either that the agent prefers the black box evaluator or the human evaluator according to Definition 2.3, then the same conclusion would hold for these more realistic models of $H$.

3 Main Results

Section 3.1 characterizes the expected value of context in a simple example. Section 3.2 presents our first main result, which says that the expected value of context vanishes to zero as the number of covariates grows large. Section 3.3 compares human and black box evaluators.

3.1 Example

Suppose there are two covariates, $x_{1}$ and $x_{2}$, both nonstandard. For each covariate vector $\mathbb{x}\in\{0,1\}^{2}$, define the random variable $Y_{\mathbb{x}}=f(\mathbb{x})$, where the randomness is in the realization of $f$.

$X_{1}$   $X_{2}$   $Y_{\mathbb{x}}$
0   0   $Y_{00}$
0   1   $Y_{01}$
1   0   $Y_{10}$
1   1   $Y_{11}$
Table 1: The four possible covariate vectors and their associated types.

The agent has utility function $u(\hat{y},y)=\hat{y}$ and covariate vector $(1,1)$. Suppose Human observes up to one nonstandard covariate; then, there are three possibilities for what the evaluator observes. If Human observes $x_{1}=1$, her evaluation is

$$Z_{1}\equiv\frac{Y_{10}+Y_{11}}{2}.$$

If Human observes $x_{2}=1$, her evaluation is

$$Z_{2}\equiv\frac{Y_{01}+Y_{11}}{2}.$$

And if Human observes no nonstandard covariates, then her evaluation remains the unconditional average

$$Z_{\varnothing}\equiv\frac{Y_{00}+Y_{01}+Y_{10}+Y_{11}}{4}.$$

So the expected value of context for this agent is

$$\mathbb{E}\left[\max\left\{Z_{\varnothing},Z_{1},Z_{2}\right\}-Z_{\varnothing}\right]. \qquad (3.1)$$

Suppose $n$ grows large with up to $h_{n}=\lfloor n/2\rfloor$ covariates observed. There are two opposing forces affecting the value of context. First, when $n$ is larger there are more distinct sets of covariates that can be revealed to Human, and hence the max in (3.1) is taken over a larger number of posterior expectations. This increases the value of context. On the other hand, each $Z_{k}$ is a sample average, and the number of elements in this sample average also grows in $n$. (Footnote: For example, observing $X_{1}=1$ with $n=2$ gives the evaluator a posterior expectation of $(Y_{10}+Y_{11})/2$, while the same observation gives the evaluator a posterior expectation of $(Y_{100}+Y_{101}+Y_{110}+Y_{111})/4$ if $n=3$.) By the law of large numbers, each $Z_{k}$ thus concentrates on its expectation (which is common across $k$) as $n$ grows large, so the difference between any $Z_{k}$ and $Z_{k^{\prime}}$ grows small. What we have to determine is whether the growth rate in the number of subsets of nonstandard covariates (of size $\leq h_{n}$) is sufficiently large such that the maximum difference in evaluations across these sets is nevertheless asymptotically bounded away from zero. The answer turns out to be no.
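The following Monte Carlo sketch makes this race concrete under the prior of Example 7 (so each cell's type is an independent fair coin flip), taking $s=0$, $u(\hat{y},y)=\hat{y}$, the all-ones agent, and $h_{n}=\lfloor n/2\rfloor$; these specific choices are ours, purely for illustration. The estimated expected value of context shrinks as $n$ grows, in line with Theorem 3.1 below.

```python
import numpy as np
from itertools import combinations

def expected_value_of_context(n, trials=300, seed=0):
    """Estimate E[max_H Z_H - Z_empty] under Example 7's prior (each of the
    2^n cell types i.i.d. Bernoulli(1/2)), for the all-ones agent with
    s = 0 and |H| <= floor(n/2)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(2 ** n)
    masks = []  # one boolean mask per nonempty admissible disclosure set H
    for k in range(1, n // 2 + 1):
        for H in combinations(range(n), k):
            m = np.ones(2 ** n, dtype=bool)
            for i in H:
                m &= ((idx >> i) & 1) == 1  # keep vectors with x_i = 1
            masks.append(m)
    total = 0.0
    for _ in range(trials):
        f = rng.integers(0, 2, size=2 ** n)   # one draw of the type function
        z_empty = f.mean()
        z_best = max(z_empty, max(f[m].mean() for m in masks))
        total += z_best - z_empty
    return total / trials

for n in (4, 6, 8, 10):
    print(n, round(expected_value_of_context(n), 3))
```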

3.2 The Expected Value of Context

Our main result says that for every agent, the expected value of context (as defined in Definition 2.2) vanishes as $n$ grows large.

Theorem 3.1.

Suppose Assumptions 1 and 2 hold. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large, i.e.,

$$\lim_{n\rightarrow\infty}V(n,\mathbb{x}_{n})=0.$$

Thus although the value of context may be substantial for certain type functions (such as in Example 5), it does not matter on average across these functions when the agent's prior satisfies Assumptions 1 and 2. This also implies that for sufficiently large $n$, the provision of context does not "typically" matter; that is, the probability that the agent gains substantially from targeted information acquisition is small.

The core of the proof of Theorem 3.1 is an argument that the extent to which context can change the evaluator's posterior expectation vanishes in the number of covariates. We outline that argument here. For each $n$, there are $K_{n}=\sum_{j=0}^{\lfloor\alpha_{h}n\rfloor}\binom{n-s}{j}$ sets of $\alpha_{h}n$ (or fewer) nonstandard covariates that can be disclosed. We can enumerate and index these sets to $k=1,\dots,K_{n}$. Each set $k$ induces a posterior expectation $Z_{k}$, which is a sample average of random variables $Y_{x}\equiv f(x)$. The expected value of context (for this utility function) is

$$\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\right]-\mathbb{E}[Z_{\varnothing}]$$

where $Z_{\varnothing}$ is Human's posterior expectation given observation of standard covariates only. Normalizing $\mathbb{E}[Z_{\varnothing}]=0$, it remains to study properties of the first-order statistic $\max_{1\leq k\leq K_{n}}Z_{k}$.

There are two challenges to analyzing this quantity. First, the correlation structure of $Z_{1},\dots,Z_{K_{n}}$ can be complex: The variables $Z_{k}$ are neither independent (because the same random variable $Y_{x}$ can appear as an element in different sample averages $Z_{k},Z_{k^{\prime}}$) nor identically distributed (because the sample averages are of different sizes depending on how many nonstandard covariates are revealed). The second challenge is that the length of the sequence $(Z_{1},\dots,Z_{K_{n}})$ grows exponentially in $n$. Thus even though each term within the maximum eventually converges to a normally distributed random variable (with shrinking variance), the errors of each term may in principle accumulate in a way that the maximum grows large.

Our approach is to first construct new i.i.d. variables $Z^{iid}_{k}$ with the property that

$$\mathbb{E}\left[\max\{Z_{1},\dots,Z_{K_{n}}\}\right]\leq\mathbb{E}\left[\max\{Z^{iid}_{1},\dots,Z^{iid}_{K_{n}}\}\right]. \qquad (3.2)$$

Applying a result from Chernozhukov et al. (2013), we show that $\max_{1\leq k\leq K_{n}}Z_{k}^{iid}$ (properly normalized) converges in distribution to $\max_{1\leq k\leq K_{n}}Z_{k}^{Normal}$, where (due to properties of our problem) $Z_{k}^{Normal}\sim_{iid}\mathcal{N}\left(0,\frac{1}{2^{n(1-\alpha_{h})-s}}\right)$. Having reduced the analysis to studying the expected maximum of i.i.d. Gaussian variables, classic bounds apply to show that this quantity is no more than

$$\frac{1}{2^{n(1-\alpha_{h})-s}}\sqrt{\log(K_{n})}. \qquad (3.3)$$

This display quantifies the importance of each of the two forces discussed in Section 3.1. First, as $n$ grows larger, the number of posterior expectations $K_{n}=\sum_{j=0}^{\lfloor\alpha_{h}n\rfloor}\binom{n-s}{j}\leq 2^{n-s}$ grows exponentially in $n$, increasing the expected value of context. But second, as $n$ grows larger, each $Z_{k}$ concentrates on its expectation, where its variance, $\frac{1}{2^{n(1-\alpha_{h})-s}}$, decreases exponentially in $n$. What the bound in display (3.3) tells us is that the exponential growth in the number of variables is eventually dominated by the exponential reduction in the variance of each variable, yielding the result.
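A quick numerical reading of this bound (our own illustration, taking $s=0$ and $\alpha_{h}=0.1$ as example values) shows the exponentially shrinking variance term overwhelming the slowly growing $\sqrt{\log K_{n}}$ term:

```python
import math

def context_bound(n, alpha_h=0.1, s=0):
    """The bound in display (3.3): the variance term 1 / 2^(n(1-alpha_h)-s)
    times sqrt(log K_n), where K_n counts the admissible disclosure sets."""
    K_n = sum(math.comb(n - s, j) for j in range(math.floor(alpha_h * n) + 1))
    variance = 2.0 ** -(n * (1 - alpha_h) - s)
    return variance * math.sqrt(math.log(K_n))

for n in (20, 40, 80, 160):
    print(n, context_bound(n))  # decreases rapidly toward zero
```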


This proof sketch also clarifies the role of Assumption 1. As we show in Section 5.4, the statement of the theorem extends so long as the evaluator's posterior expectation $Z_{k}$ concentrates on its expectation sufficiently quickly as $n$ grows large. Roughly speaking, this means that the informativeness of any specific set of covariates is decreasing in the total number of covariates. Thus the precise symmetry imposed by Assumption 1 is not critical for Theorem 3.1 to hold.

On the other hand, the conclusion of Theorem 3.1 can fail if the agent has substantial prior knowledge about how $y$ is related to the covariates.

Example 11.

Let $s=0$, so that there are no standard covariates. Suppose that for each $n$,

$$y=f(x_{1},\dots,x_{n})=\frac{1}{n}\sum_{i=1}^{n}x_{i}\cdot U$$

where $U$ is a uniform random variable on $[0,1]$. This model violates Assumption 1, since it is known that higher realizations of the agent's covariates are good news about the agent's type. The conclusion of Theorem 3.1 also does not hold: For any $n$, the evaluator's prior expectation is $\mathbb{E}[f(\mathbb{x}_{n})]=1/4$. But if $\lfloor\alpha\cdot n\rfloor$ covariates are revealed to be 1, the evaluator's posterior expectation is equal to $\frac{1}{4}+\frac{1}{4}\frac{\lfloor\alpha n\rfloor}{n}$. So the expected value of context for an agent with $\mathbf{x}_{n}=(1,\dots,1)$ is asymptotically bounded away from zero.

In Section 5 we explore several relaxations of Assumptions 1 and 2. Our first relaxation of Assumption 1 supposes that there is a "low-dimensional" set of covariates that is predictive of the agent's type, while the remaining covariates are irrelevant. The second relaxation supposes that there is a subset of nonstandard covariates whose effects are known. We also consider a relaxation of Assumption 2 where the evaluator's ability to predict $Y$ is increasing in the total number of covariates that define the type. We formalize these extensions of our main model and examine the extent to which Theorem 3.1 extends.

3.3 Human versus Black Box

We now turn to the question of when the agent should prefer the human evaluator and when the agent should prefer the black box evaluator.

Assumption 3.

The agent's expected utility can be written as $\mathbb{E}[\phi(\hat{y})]$ for some twice continuously differentiable function $\phi$. (Footnote: Restricting to utility functions that depend on a posterior mean is a common assumption in the literature on information design, see e.g., Kamenica and Gentzkow (2011), Frankel (2014) and Dworczak and Martini (2019).) Moreover, there exists $C<\infty$ such that

$$\frac{\sup_{\hat{y}\in[-\overline{y},\overline{y}]}|\phi^{\prime}(\hat{y})|}{\inf_{\hat{y}\in[-\overline{y},\overline{y}]}|\phi^{\prime\prime}(\hat{y})|}<\frac{C}{2}.$$

Roughly speaking, the numerator describes the sensitivity of the function $\phi$ to the evaluation, and the denominator describes the curvature of the function $\phi$. The assumption thus says that the curvature of the function must be sufficiently large relative to its slope. While there is no formal relationship, the LHS is evocative of the coefficient of absolute risk aversion of the function $\phi$. (Footnote: Recall that the coefficient of absolute risk aversion of the function $\phi$ is $-\frac{\phi^{\prime}(\hat{y})}{\phi^{\prime\prime}(\hat{y})}$.)

Theorem 3.2.

Suppose Assumptions 1-3 hold, and let

$$N=\min\left\{n\in\mathbb{Z}_{+}:(\alpha_{b}-\alpha_{h})n-\frac{1}{2}\log_{2}(n)-1>\log_{2}(C)\right\}. \qquad (3.4)$$

Then:

  • (a) If $\phi$ is strictly convex, the agent prefers the black box evaluator for all $n\geq N$.

  • (b) If $\phi$ is strictly concave, the agent prefers the human evaluator for all $n\geq N$.

The comparisons in this theorem apply for reasonably small $N$. For example, let $C=100$, in which case the restriction in Assumption 3 is quite weak. Figure 1 fixes $\alpha_{h}=0.1$ and plots $N$ for different values of $\alpha_{b}$. If, say, the human evaluator observes 10% of covariates while Black Box observes 90%, then the comparisons in Theorem 3.2 hold for all $n\geq 14$.

[Figure 1: Let $C=100$ and $\alpha_{h}=0.1$. Then the comparisons in Theorem 3.2 hold for all $n\geq N$ as depicted here.]
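A direct reading of (3.4) gives a small helper for computing $N$ (a sketch, with a name of our choosing; we read the displayed formula literally, so outputs may differ slightly from the values reported around Figure 1 if additional floor or rounding conventions enter the paper's own computation):

```python
import math

def threshold_N(alpha_b, alpha_h, C):
    """Smallest integer n with (alpha_b - alpha_h)*n - log2(n)/2 - 1 > log2(C),
    reading display (3.4) literally."""
    n = 1
    while (alpha_b - alpha_h) * n - 0.5 * math.log2(n) - 1 <= math.log2(C):
        n += 1
    return n

# e.g., threshold_N(alpha_b=0.9, alpha_h=0.1, C=100)
```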

The case of convex $\phi$ (Part (a)) corresponds to a preference for more accurate evaluations. (Footnote: Consider any two sets of covariates $A\subset A^{\prime}$ and let $\hat{y}_{A}$, $\hat{y}_{A^{\prime}}$ be the corresponding posterior expectations. The distribution of $\hat{y}_{A^{\prime}}$ (i.e., the posterior expectation that conditions on more information) is a mean-preserving spread of the distribution of $\hat{y}_{A}$. When $\phi$ is convex, the former leads to a higher expected utility.) Such an agent "prefers more accurate evaluations" in the sense that giving the evaluator better information (in the standard Blackwell sense) leads to an improvement in the agent's expected utility. Such an agent prefers for the evaluation to be based on more information (advantaging Black Box), but also prefers for the evaluation to be based on more relevant covariates (advantaging Human). We show that what eventually dominates is how many covariates the evaluators observe, not how they are selected; for an agent who prefers accuracy, this favors the Black Box.

Part (b) of Theorem 3.2 says that if instead $\phi$ is concave, then the agent should eventually prefer the human evaluator. We conclude this section with example decision problems that induce utility functions satisfying the conditions of either part of the theorem.

Example 12.

Suppose the agent receives a dollar wage equal to the evaluation, and is risk averse in money. Then his utility function is $u(\hat{y},y)=\phi(\hat{y})$ for some increasing and concave $\phi$, and Part (b) of Theorem 3.2 says that the agent prefers to be evaluated by the human.

Example 13.

Suppose the agent's type is $y\in\{0,1\}$, and the evaluator chooses an action $a$ based on the observed covariates. The evaluator and agent share the utility function $-\mathbb{E}[(a-y)^{2}]$. The evaluator's optimal action is $a=\hat{y}$, and the agent's expected payoff given this action is

$$\mathbb{E}\left[-(\hat{y}-y)^{2}\right]=\mathbb{E}\left[-\left(\hat{y}(1-\hat{y})^{2}+(1-\hat{y})\hat{y}^{2}\right)\right]=\mathbb{E}\left[\phi(\hat{y})\right]$$

where ϕ(y^)=y^2y^\phi(\hat{y})=\hat{y}^{2}-\hat{y} is convex. So Part (a) of Theorem 3.2 says that the agent eventually prefers evaluation by the black box evaluator. Although the conditions of Theorem 3.2 are no longer met when yy is not binary, we show in Appendix A.7 that the conclusion of Part (a) of Theorem 3.2 generalizes for arbitrary yy given the mean-squared error payoff function described in this example.
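As a quick numerical check of the identity above (our own illustration, assuming $y\sim\text{Bernoulli}(\hat{y})$ with the arbitrary value $\hat{y}=0.3$):

```python
import random

random.seed(0)
y_hat = 0.3                        # an illustrative posterior expectation
ys = [random.random() < y_hat for _ in range(200_000)]  # y ~ Bernoulli(y_hat)
monte_carlo = sum(-(y_hat - y) ** 2 for y in ys) / len(ys)
closed_form = y_hat ** 2 - y_hat   # phi(y_hat) from the display above
print(monte_carlo, closed_form)    # agree up to sampling error (about -0.21)
```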

4 Extensions

We now strengthen our main results (Theorems 3.1 and 3.2) in the following ways. In Section 4.1, we show that not only does the expected value of context vanish for each individual agent, but the expected maximum value of context across agents also vanishes. That is, in expectation the most that context can benefit any agent in the population is small. From this, it is immediate that our main results also extend to a generalization of our model in which the agent has uncertainty over his covariate vector. In Section 4.2, we show that our main results extend when the agent and evaluator interact in a disclosure game, wherein the evaluator updates his beliefs in response to the agent’s strategic choice of what to disclose.

4.1 Max value of context across agents

So far we have studied the expected value of context for a single agent. If we instead ask whether the firm should use human or algorithmic evaluation—for example, whether a hospital should automate diagnoses or rely on doctor evaluations—other statistics may also be relevant. For example, it may matter whether the value of context is large for any agent in the population (e.g., because a lawsuit regarding algorithmic error may be brought on the basis of harm to any specific individual (Jha, 2020)). We thus study the expected maximum value of context, as defined below.

Definition 4.1.

For any $n\in\mathbb{Z}_{+}$, the expected maximum value of context is

$V^{\text{max}}(n)=\mathbb{E}\left[\max_{\mathbb{x}_{n}\in\{0,1\}^{n}}v(f,\mathbb{x}_{n})\right].$

The following corollary says that this quantity also vanishes as nn grows large.

Corollary 1.

Suppose Assumptions 1 and 2 hold. Then the expected maximum value of context vanishes to zero as $n$ grows large, i.e., $\lim_{n\rightarrow\infty}V^{\text{max}}(n)=0$.

Thus, the expected value of context vanishes uniformly across agents in the population. This result immediately implies that Theorems 3.1 and 3.2 extend to any generalization of our model in which the agent has uncertainty not only over $f$ but also over his own covariate vector $\mathbb{x}_{n}$.

4.2 Strategic Disclosure

So far we have remained agnostic as to whether the agent or the evaluator chooses which nonstandard covariates are revealed, assuming in either case that the evaluator updates as if the covariates were revealed exogenously. We now consider a more traditional disclosure game, in which the agent chooses which nonstandard covariates to reveal, and the human evaluator updates her beliefs about the agent’s type in part based on which covariates are chosen.

For any fixed function $f$, call the following an $f$-context disclosure game: There are two players, the agent and the evaluator. The function $f$ is common knowledge. (We do not interpret this assumption literally; at the other extreme, where $f$ is unknown to the agent, there is no informational content in which covariates the agent chooses to reveal, as they are all symmetric from the agent’s point of view.) The set of possible disclosures $\mathcal{D}$ is the set of all pairs $(H,(x_{i})_{i\in H})$ consisting of a set of nonstandard covariates $H\subseteq\mathcal{N}$ and values for those covariates. A disclosure $d=(H,(x^{\prime}_{i})_{i\in H})$ is feasible for an agent with covariate vector $(x_{1},\dots,x_{n})$ if the disclosed covariate values are truthful, i.e., $x_{i}=x_{i}^{\prime}$ for every $i\in H$.

The agent chooses a disclosure strategy, which is a map

$\sigma:\{0,1\}^{n}\rightarrow\mathcal{D}$

from covariate vectors to feasible disclosures. The agent then privately observes his covariate vector $\mathbb{x}_{n}$ and discloses $\sigma(\mathbb{x}_{n})$. The evaluator observes this disclosure and chooses an action $\hat{y}$. That is, the evaluator’s strategy is a function $\sigma_{E}:\mathcal{D}\rightarrow[-\overline{y},\overline{y}]$. The evaluator’s payoff is $-(\hat{y}-y)^{2}$ and the agent’s payoff is some function $u(\hat{y})$.
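To fix ideas, here is a minimal sketch of the feasible-disclosure set for a given covariate vector, assuming (as an illustration) the capacity constraint from our main model of at most $\lfloor\alpha_{h}\cdot n\rfloor$ nonstandard covariates; the function name and parameter values are our own:

```python
from itertools import combinations

def feasible_disclosures(x, alpha_h):
    """All truthful disclosures (H, values) for nonstandard covariate vector x,
    where H ranges over index sets of size at most floor(alpha_h * len(x))."""
    n = len(x)
    cap = int(alpha_h * n)
    return [(H, tuple(x[i] for i in H))
            for size in range(cap + 1)
            for H in combinations(range(n), size)]

# n = 5 nonstandard covariates, capacity floor(0.4 * 5) = 2:
# C(5,0) + C(5,1) + C(5,2) = 1 + 5 + 10 = 16 feasible disclosures.
print(len(feasible_disclosures((1, 0, 1, 1, 0), alpha_h=0.4)))
```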

In this section we focus on pure-strategy Perfect Bayesian Nash equilibria (PBE) of this game, henceforth simply equilibria. (A similar result holds for mixed-strategy equilibria, as demonstrated in the appendix.)

Definition 4.2.

Let $v^{D}(f,\mathbb{x}_{n})$ denote the highest payoff that an agent with covariate vector $\mathbb{x}_{n}$ receives in any pure-strategy equilibrium of the $f$-context disclosure game. The expected maximum value of context disclosure is

$V^{D}(n)=\mathbb{E}\left[\max_{\mathbb{x}_{n}\in\{0,1\}^{n}}v^{D}(f,\mathbb{x}_{n})\right].$

We show that the best payoff an agent can receive in any pure-strategy equilibrium of the $f$-context disclosure game is bounded above by the maximum value of context across agents.

Proposition 4.1.

Suppose Assumptions 1 and 2 hold. Then for all $n$,

$V^{D}(n)\leq V^{\text{max}}(n).$

Thus, applying Proposition 4.1 and Corollary 1, our previous results extend.

5 Relaxing our Assumptions on the Prior

As shown in Example 11, our main results can fail if the assumption of symmetric uncertainty over the role of the nonstandard covariate values (Assumption 1) is broken. We now propose two relaxations of Assumption 1 and one relaxation of Assumption 2, and explore the extent to which our main result extends. In Section 5.1, we suppose that it is known ex-ante that some $r_{n}$ covariates are relevant, while the remaining $n-r_{n}$ are not, so that even as $n$ grows to infinity, the effective number of covariates potentially grows more slowly. In Section 5.2, we allow the agent to have prior knowledge about the role of certain nonstandard covariates. In Section 5.3, we consider a model in which the predictability of the agent’s type is increasing in the total number of covariates. Finally, Section 5.4 provides an abstract condition on the learning environment under which our main results hold, which requires the evaluator’s uncertainty about the agent’s type to grow sufficiently fast in $n$.

5.1 Not all covariates are relevant

Under Assumption 1, it cannot be known ex-ante that some covariates are irrelevant for predicting the type. The assumption thus rules out settings such as the following.

Example 14.

The evaluator is a job interviewer. Although in principle there are infinitely many covariates that could describe a job candidate, it is understood that not all of them are actually relevant to the candidate’s ability. That is, there is some potentially large (but not exhaustive) set of covariates that contains all of the predictive content about the candidate’s ability; the remaining covariates are either irrelevant for predicting ability, or are predictive only because they correlate with other intrinsically predictive covariates.

If irrelevant covariates cannot be disclosed to the evaluator, then we return to our main model with a smaller $n$, and our previous results extend directly. The more novel case is the one in which it is known that $n-r_{n}$ covariates are irrelevant, but those covariates can still be disclosed to the evaluator (for example, because it is not commonly understood that they are irrelevant). (To see the difference, consider the case in which the agent simply wants the evaluator to hold a higher posterior expectation. The irrelevant covariates create noise, and for some realizations of $f$ it may be that disclosing an irrelevant covariate leads to a higher evaluation.)

To model this, we suppose there is a sequence of sets of relevant covariates $(R_{1},R_{2},\dots)$ such that each $R_{n}$ includes the standard covariates in $\mathcal{S}$ and is of size $s+r_{n}$, where $r_{n}=\lfloor\alpha_{r}\cdot n\rfloor$ is the (known) number of relevant nonstandard covariates. Moreover, without loss of generality, we index these sets so that the relevant covariates are $x_{s+1},x_{s+2},\dots,x_{s+r_{n}}$. The irrelevance of the remaining covariates is reflected in the following assumption, which says that, holding fixed the values of the relevant covariates, the values of the irrelevant covariates do not change the type.

Assumption 4 (Irrelevance).

There is a function $g(x_{1},\dots,x_{s+r_{n}})$ such that

$f(x_{1},\dots,x_{n})=g(x_{1},\dots,x_{s+r_{n}})$

for every $(x_{1},\dots,x_{n})\in\{0,1\}^{n}$.

We then modify Assumptions 1 and 2 to apply only to the relevant covariates.

Assumption 5 (Exchangeability).

For every realization of $(x_{1},\dots,x_{s})$, the sequence

$(\widetilde{Y}^{n}_{1},\dots,\widetilde{Y}^{n}_{2^{r_{n}}})\equiv(g(x_{1},\dots,x_{s},x_{s+1},\dots,x_{s+r_{n}}):(x_{s+1},\dots,x_{s+r_{n}})\in\{0,1\}^{r_{n}}) \qquad (5.1)$

is finitely exchangeable.

Assumption 6 (Constant Unpredictability of $Y$).

For every realization of the standard covariates $(x_{1},\dots,x_{s})$ and every pair $n^{\prime}>n$, the sequence $(\widetilde{Y}^{n}_{1},\dots,\widetilde{Y}^{n}_{2^{r_{n}}})$ and the truncated sequence $(\widetilde{Y}^{n^{\prime}}_{1},\dots,\widetilde{Y}^{n^{\prime}}_{2^{r_{n}}})$ are identical in distribution.

Our main model is otherwise unchanged—in particular, we allow the agent to disclose any of the $n-s$ nonstandard covariates, including those which are irrelevant. We show that our previous results extend so long as $\alpha_{h}<\alpha_{r}$.

Proposition 5.1.

Suppose Assumptions 5 and 6 hold, where $\alpha_{h}<\alpha_{r}$. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large.

The case where $\alpha_{r}<\alpha_{h}$ (violating the assumption of the result) corresponds to a setting in which the number of relevant covariates is so small that the agent can disclose all of them. For example, if a job candidate is convinced that only 10 nonstandard covariates are actually relevant for predicting his on-the-job ability, and all of these nonstandard covariates can be shared during a job interview, then our main results do not extend and we should think of the value of context as being potentially large. On the other hand, if the set of relevant covariates is low-dimensional relative to the total number of covariates, but its elements are still too numerous to be fully revealed, then our main results do extend.

This result suggests that whether human or black box evaluation is more appropriate should be determined in part based on whether the available signal is concentrated in a small number of covariates (favoring the human evaluator) or spread out across a large number of covariates (favoring the black box evaluator). The same application may transition between these regimes over time. For example, in a medical setting where black box diagnosis is highly accurate based on non-interpretable features of an image scan, it may not be possible to communicate sufficient information via any small number of covariates. But if the predictive features of the image are subsequently better understood and defined, then it may be that a small set of (newly defined) features does eventually capture all of the signal content, and can be fully disclosed in a conversation.

5.2 Certain nonstandard covariates have known effects

Another possibility is that the agent knows how certain nonstandard covariates are correlated with the type.

Example 15.

The agent is a patient who resided near Chernobyl at the time of the 1986 nuclear disaster. The agent is being evaluated for potential thyroid conditions, and knows that this part of his history increases the probability of a thyroid condition.

Specifically, suppose there is a set $K\subseteq\{1,\dots,n\}$ of covariate indices whose effects are known. The set $K$ includes the standard covariates, but possibly also includes some nonstandard covariates. Without loss of generality, we index these as $x_{1},\dots,x_{|K|}$. We weaken Assumption 1 to the following:

Assumption 7 (Exchangeability).

For every realization of the covariates $(x_{1},\dots,x_{|K|})$,

$(Y^{n}_{1},\dots,Y^{n}_{2^{n-|K|}})\equiv(f(x_{1},\dots,x_{|K|},x_{|K|+1},\dots,x_{n}):(x_{|K|+1},\dots,x_{n})\in\{0,1\}^{n-|K|}) \qquad (5.2)$

is finitely exchangeable.

This assumption imposes exchangeability only over the nonstandard covariates whose effects are not ex-ante known. Clearly, if $K$ is a strict superset of $\mathcal{S}$, then the expected value of context need not vanish under Assumptions 2 and 7. A simple example is the following.

Example 16.

Suppose there are no standard covariates and $K=\{1\}$, i.e., the first nonstandard covariate has a known effect, where $f(\mathbb{x}_{n})\sim U[-1,0]$ if $x_{1}=0$ and $f(\mathbb{x}_{n})\sim U[0,1]$ if $x_{1}=1$. Suppose further that the agent’s covariate vector satisfies $x_{1}=1$. Then the prior expectation of the agent’s type is $0$, but revealing $x_{1}=1$ moves the posterior expectation to $1/2$. So the expected value of context does not vanish.
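A minimal simulation of this example (our own illustration) reproduces the two conditional expectations:

```python
import random

random.seed(1)
# Example 16: the type is U[-1,0] when x1 = 0 and U[0,1] when x1 = 1.
pairs = []
for _ in range(100_000):
    x1 = random.random() < 0.5
    y = random.uniform(0, 1) if x1 else random.uniform(-1, 0)
    pairs.append((x1, y))
print(sum(y for _, y in pairs) / len(pairs))   # prior expectation: approximately 0
ones = [y for x1, y in pairs if x1]
print(sum(ones) / len(ones))                   # posterior given x1 = 1: approximately 1/2
```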

But if we modify the definition in (2.2) to

$\hat{y}^{f}_{\mathbb{x}_{n}}(A)=\mathbb{E}[Y\mid X_{i}=x_{i}\ \forall i\in K\cup A]$

with $K$ replacing $\mathcal{S}$, and again let $U^{f}_{\mathbb{x}_{n}}(A)=u\left(\hat{y}^{f}_{\mathbb{x}_{n}}(A),y\right)$, then the modified expected value of context

$v(f,\mathbb{x}_{n})=\max_{H\subseteq\mathcal{N}\backslash K,\;|H|\leq\alpha_{h}n}U_{\mathbb{x}_{n}}^{f}(H)-U_{\mathbb{x}_{n}}^{f}(\varnothing)$

evaluates the value of context beyond those covariates with known effects. The same proof shows that this expected value of context vanishes to zero as $n$ grows large. That is, beyond the value of context that is already clear to the agent based on private knowledge about his nonstandard covariates, the agent does not expect substantial additional gain from the remaining covariates.

5.3 Information accumulates in $n$

So far we have assumed that the predictability of $Y$ is constant in the number of covariates. This is not essential for our results. Suppose instead that

$y_{n}=f(x_{1},\dots,x_{n})+\varepsilon_{n}$

where $\varepsilon_{n}$ is a mean-zero random variable that is independent of the covariates $(x_{1},\dots,x_{n})$. This describes a setting in which the covariates are not sufficient to reveal the agent’s type, and there is a residual unknown.

Our previous results extend directly when the distribution of $\varepsilon_{n}$ is the same for all $n$. Another natural case is one in which $\text{Var}(\varepsilon_{n})$ decreases monotonically in $n$, with $\lim_{n\rightarrow\infty}\text{Var}(\varepsilon_{n})=0$; that is, in environments with a larger number of covariates, the agent’s type is more predictable. In Appendix B.5 we show that Theorem 3.1 directly extends. We also show that the comparisons in parts (a) and (b) of Theorem 3.2 hold for sufficiently large $n$, under the following assumption:

Assumption 8.

For each $n\in\mathbb{Z}_{+}$, let $\sigma_{\varepsilon,n}^{2}:=\text{Var}(\varepsilon_{n})$ and assume that $\varepsilon_{n}/\sigma_{\varepsilon,n}$ admits a pdf, which we denote by $g_{n}$. The sequence $\{g_{n}(0)\}_{n}$ is bounded.

Loosely speaking, our comparisons in Theorem 3.2 continue to hold so long as the variance of $\varepsilon_{n}$ does not increase too fast in $n$.

5.4 Sufficient residual uncertainty

In this final section, we provide an abstract condition on the evaluator’s learning environment, under which Theorem 3.1 extends.

For each $n$, let $\mathcal{D}_{n}$ denote the set of all disclosures respecting the human evaluator’s capacity constraint, i.e., all pairs $(H,(x_{i})_{i\in H})$ consisting of a set $H\subseteq\{s+1,\dots,n\}$ with $\lfloor\alpha_{h}\cdot n\rfloor$ or fewer nonstandard covariates, and values $(x_{i})_{i\in H}$ for those covariates. Further define $\mathcal{D}=\cup_{n\geq 1}\mathcal{D}_{n}$ to be the set of all disclosures. Similarly, for each $n$ let $\mathcal{F}_{n}$ be the set of all type functions $f:\{0,1\}^{n}\rightarrow[-\overline{y},\overline{y}]$, and define $\mathcal{F}=\cup_{n\geq 1}\mathcal{F}_{n}$. An evaluation rule is any family $\rho=(\rho_{f})_{f\in\mathcal{F}}$ where each $\rho_{f}:\mathcal{D}\rightarrow[-\overline{y},\overline{y}]$ maps disclosures into evaluations for the given function $f$. Finally, fixing any evaluation rule $\rho$, number of covariates $n$, and disclosure $d\in\mathcal{D}_{n}$, let

$Z_{d}^{n}=\rho_{f}(d)$

be the random evaluation when $f$ is drawn from $\mathcal{F}_{n}$ according to the agent’s prior.

We impose two assumptions below on the evaluation rule. The first says that the expected evaluation $Z_{d}^{n}$ is equal to the prior expected type $\mu\equiv\mathbb{E}[Y]$; the second says that the distribution of the evaluation concentrates on $\mu$ sufficiently fast as the number of covariates $n$ grows large. Intuitively, the assumption requires that as the number of residual unknowns—i.e., the covariates which are predictive of the type but are not revealed to the evaluator—grows large, the informativeness of any fixed disclosure becomes small. (In the limit with an uninformative disclosure, the distribution of the evaluation is degenerate at the prior expectation $\mu$ for any Bayesian updating rule.)

Assumption 9 (Unbiased).

$\mathbb{E}[Z_{d}^{n}]=\mu$ for every disclosure $d$.

Assumption 10 (Fast Concentration).

For any sequence of feasible disclosures $(d_{n})_{n\geq 1}$,

$\text{Var}(Z_{d_{n}}^{n})=o\left(\frac{1}{K_{n}}\right)$

where $K_{n}=\sum_{j=0}^{\lfloor\alpha_{h}n\rfloor}\binom{n-s}{j}$ is the number of distinct sets $H\subseteq\{s+1,\dots,n\}$ with $\lfloor\alpha_{h}n\rfloor$ or fewer elements.

These assumptions do not in general represent a weakening of our main model. Previously we studied the evaluation rule $\rho$ mapping each disclosure into the conditional expectation of the agent’s type, and imposed Assumption 1 on the agent’s prior about $f$. In that model, the evaluation $Z_{d}^{n}$ for any disclosure $d=(H,(x_{i})_{i\in H})$ could be represented as a sample average of $2^{n-s-|H|}$ elements. Assumption 9 is clearly satisfied (because the update rule is Bayesian), but one can select a sequence of disclosures $(d_{n})$ such that $\text{Var}(Z_{d_{n}}^{n})=\frac{1}{2^{n(1-\alpha_{h})-s}}$ (see the proof of Theorem 3.1 for details). Thus the speed of convergence demanded in Assumption 10 is not met when $\alpha_{h}$ is sufficiently large.
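The following minimal sketch (our own illustration) compares the variance $2^{-(n(1-\alpha_{h})-s)}$ from our main model against the $o(1/K_{n})$ benchmark of Assumption 10; the product $\text{Var}(Z_{d_{n}}^{n})\cdot K_{n}$ must vanish for the assumption to hold, and in this numerical experiment it does so only for the smaller value of $\alpha_{h}$:

```python
from math import comb

def K(n, s, alpha_h):
    """K_n = sum_{j <= floor(alpha_h * n)} C(n - s, j)."""
    return sum(comb(n - s, j) for j in range(int(alpha_h * n) + 1))

s = 2
for alpha_h in (0.1, 0.4):
    for n in (20, 40, 60):
        var = 2.0 ** -(n * (1 - alpha_h) - s)      # Var(Z) in the main model
        print(alpha_h, n, var * K(n, s, alpha_h))  # vanishes iff Assumption 10 is met
```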

Nevertheless, Assumption 10 identifies the qualitative property of our main setting that gave us Theorem 3.1: residual uncertainty must have the power to overwhelm any information revealed through disclosure. Under these assumptions, our main result extends.

Proposition 5.2.

Suppose Assumptions 9 and 10 hold. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large, i.e.,

$\lim_{n\rightarrow\infty}V(n,\mathbb{x}_{n})=0.$

This result also clarifies that neither the precise symmetry imposed by Assumption 1, nor the assumption of Bayesian updating in our main model, is crucial for our main result.

6 Conclusion

One argument against replacing human experts with algorithmic predictions is that no matter how many covariates are taken as input by the algorithm, the number of potentially relevant circumstances and characteristics is still more numerous. In cases where some important fact is missed by a human evaluator, it is often possible to correct this oversight. There is no such safety net with a black box algorithm.

This is a compelling narrative, yet our results suggest that it may be less important than it initially seems. When there is a large number of nonstandard covariates that may matter for the prediction problem, but the agent does not know how these nonstandard covariates impact the type, then the expected value of disclosing additional information is small—even when we assume that the agent can identify the most useful covariates to disclose, and that the claims about these covariates are taken at face value.

In contrast, if the agent has substantial prior knowledge about the predictive roles of the nonstandard covariates, then our conclusion will not be appropriate. In particular, if there is a “low-dimensional” set of covariates that predicts the type and can be fully disclosed (as in Example 9), or if there is a known structural relationship between covariates and the type (as in Example 11), then the expected value of disclosing additional information may be large. We thus view our results as revealing a link between the value of targeting information acquisition (beyond simply conditioning on large quantities of information) and the extent of prior “structural information” about the numerous covariates that can be brought up as explanations.

We conclude with two alternative interpretations of our model and results.

Online versus offline learning. In our model, a key distinction between human and black box evaluation is that the human can adapt which covariates are acquired based on other properties of the agent, while the black box cannot. This is an appropriate comparison of human and black box evaluators as they currently stand: The black box algorithms used to make predictions about humans are usually supervised machine learning algorithms which are pre-trained on a large data set. But new black box algorithms, such as large language models, blur this distinction, and future evaluations (e.g., medical diagnoses) may be conducted by black box systems with which the agent can communicate.

From this more forward-looking perspective, our results can be understood as comparing the merits of online versus offline learning. That is, how valuable is it to have the evaluator dynamically acquire information given feedback from the agent? Our result suggests that this is not important in expectation. For example, Part (a) of Theorem 3.2 implies that an agent who cares about accuracy should prefer a supervised machine learning algorithm trained on a large number of covariates over a conversation with ChatGPT that reveals a smaller number of covariates.

Value of human supervision of algorithms. While we have interpreted the ss standard covariates as a small set of covariates acquired by the human evaluator, an alternative interpretation is that they are the initial inputs to an algorithm. In this case, the expected value of context quantifies the sensitivity of the algorithm’s predictions to the addition of further relevant inputs, e.g., as identified by a human manager. This interpretation is particularly relevant when we consider accuracy as the objective, in which case the value of context tells us how wrong the algorithm is compared to if the algorithm could be retrained on additional relevant inputs. Theorem 3.1 says that while in certain cases additional inputs would lead to a substantially more accurate prediction, under our symmetry assumption on the agent’s prior this will not typically be the case.

References

  • Acemoglu et al. (2015) Acemoglu, D., V. Chernozhukov, and M. Yildiz (2015): “Fragility of Asymptotic Agreement under Bayesian Learning,” Theoretical Economics, 11, 187–225.
  • Acosta et al. (2022) Acosta, J., G. Falcone, P. Rajpurkar, and E. Topol (2022): “Multimodal biomedical AI,” Nature Medicine, 28, 1773–1784.
  • Agarwal et al. (2023) Agarwal, N., A. Moehring, P. Rajpurkar, and T. Salz (2023): “Combining Human Expertise with Artificial Intelligence: Experimental Evidence from Radiology,” Working Paper 31422, National Bureau of Economic Research.
  • Akbarpour et al. (2024) Akbarpour, M., S. Malladi, and A. Saberi (2024): “Just a Few Seeds More: Value of Network Information for Diffusion,” Working Paper.
  • Angelova et al. (2022) Angelova, V., W. Dobbie, and C. S. Yang (2022): “Algorithmic Recommendations and Human Discretion,” Working Paper.
  • Antic and Chakraborty (2023) Antic, N. and A. Chakraborty (2023): “Selected Facts,” Working Paper.
  • Arnold and Groeneveld (1979) Arnold, B. C. and R. A. Groeneveld (1979): “Bounds on expectations of linear systematic statistics based on dependent samples,” The Annals of Statistics, 220–223.
  • Bardhi (2023) Bardhi, A. (2023): “Attributes: Selective Learning and Influence,” Working Paper.
  • Bastani et al. (2022) Bastani, H., O. Bastani, and W. P. Sinchaisri (2022): “Improving Human Decision-Making with Machine Learning.”
  • Berman (1964) Berman, S. M. (1964): “Limit Theorems for the Maximum Term in Stationary Sequences,” The Annals of Mathematical Statistics, 35, 502–516.
  • Blackwell and Dubins (1962) Blackwell, D. and L. Dubins (1962): “Merging of Opinions with Increasing Information,” The Annals of Mathematical Statistics.
  • Chalfin et al. (2016) Chalfin, A., O. Danieli, A. Hillis, Z. Jelveh, M. Luca, J. Ludwig, and S. Mullainathan (2016): “Productivity and Selection of Human Capital with Machine Learning,” American Economic Review, 106, 124–27.
  • Chernozhukov et al. (2013) Chernozhukov, V., D. Chetverikov, and K. Kato (2013): “Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors.”
  • Crawford and Sobel (1982) Crawford, V. P. and J. Sobel (1982): “Strategic information transmission,” Econometrica: Journal of the Econometric Society, 1431–1451.
  • Di Tillio et al. (2021) Di Tillio, A., M. Ottaviani, and P. N. Sørensen (2021): “Strategic Sample Selection,” Econometrica, 89, 911–953.
  • Dworczak and Martini (2019) Dworczak, P. and G. Martini (2019): “The Simple Economics of Optimal Persuasion,” Journal of Political Economy, 127, 1993–2048.
  • Dye (1985) Dye, R. A. (1985): “Disclosure of Nonproprietary Information,” Journal of Accounting Research, 23, 123–145.
  • Farina et al. (2023) Farina, A., G. Frechette, A. Lizzeri, and J. Perego (2023): “The Selective Disclosure of Evidence: An Experiment,” Working Paper.
  • Frankel (2014) Frankel, A. (2014): “Aligned Delegation,” American Economic Review, 104, 66–83.
  • Frick et al. (2023) Frick, M., R. Iijima, and Y. Ishii (2023): “Learning Efficiency of Multiagent Information Structures,” Journal of Political Economy, 131, 3377–3414.
  • Gillis et al. (2021) Gillis, T., B. McLaughlin, and J. Spiess (2021): “On the Fairness of Machine-Assisted Human Decisions,” Working Paper.
  • Glazer and Rubinstein (2004) Glazer, J. and A. Rubinstein (2004): “On optimal rules of persuasion,” Econometrica, 72, 1715–1736.
  • Golub and Jackson (2012) Golub, B. and M. Jackson (2012): “How Homophily Affects the Speed of Learning and Best-Response Dynamics,” The Quarterly Journal of Economics, 127, 1287–1338.
  • Grossman and Hart (1980) Grossman, S. J. and O. D. Hart (1980): “Disclosure Laws and Takeover Bids,” The Journal of Finance, 35, 323–334.
  • Haghtalab et al. (2021) Haghtalab, N., M. Jackson, and A. Procaccia (2021): “Belief polarization in a complex world: A learning theory perspective,” PNAS, 118, 141–73.
  • Harel et al. (2020) Harel, M., E. Mossel, P. Strack, and O. Tamuz (2020): “Rational Groupthink,” The Quarterly Journal of Economics, 136, 621–668.
  • Hoffman et al. (2017) Hoffman, M., L. B. Kahn, and D. Li (2017): “Discretion in Hiring,” The Quarterly Journal of Economics, 133, 765–800.
  • Jha (2020) Jha, S. (2020): “Can you sue an algorithm for malpractice? It depends.”
  • Jin et al. (2021) Jin, G. Z., M. Luca, and D. Martin (2021): “Is No News (Perceived As) Bad News? An Experimental Investigation of Information Disclosure,” American Economic Journal: Microeconomics, 13, 141–73.
  • Jung et al. (2017) Jung, J., C. Concannon, R. Shroff, S. Goel, and D. G. Goldstein (2017): “Simple rules for complex decisions,” Working Paper.
  • Jussupow and Heinzl (2020) Jussupow, E., I. Benbasat, and A. Heinzl (2020): “Why are we averse towards algorithms? A comprehensive literature review on algorithm aversion,” in Proceedings of the 28th European Conference on Information Systems.
  • Kamenica and Gentzkow (2011) Kamenica, E. and M. Gentzkow (2011): “Bayesian Persuasion,” American Economic Review, 101, 2590–2615.
  • Klabjan et al. (2014) Klabjan, D., W. Olszewski, and A. Wolinsky (2014): “Attributes,” Games and Economic Behavior, 88, 190–206.
  • Kleinberg et al. (2017) Kleinberg, J., H. Lakkaraju, J. Leskovec, J. Ludwig, and S. Mullainathan (2017): “Human Decisions and Machine Predictions,” The Quarterly Journal of Economics, 133, 237–293.
  • Kushilevitz and Nisan (1996) Kushilevitz, E. and N. Nisan (1996): Communication Complexity, Cambridge University Press.
  • Lai et al. (2023) Lai, V., C. Chen, A. Smith-Renner, Q. V. Liao, and C. Tan (2023): “Towards a Science of Human-AI Decision Making: An Overview of Design Space in Empirical Human-Subject Studies,” in Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, New York, NY, USA: Association for Computing Machinery, FAccT ’23, 1369–1385.
  • Liang and Mu (2019) Liang, A. and X. Mu (2019): “Complementary Information and Learning Traps,” The Quarterly Journal of Economics, 135, 389–448.
  • Liang et al. (2022) Liang, A., X. Mu, and V. Syrgkanis (2022): “Dynamically Aggregating Diverse Information,” Econometrica, 90, 47–80.
  • Longoni et al. (2019) Longoni, C., A. Bonezzi, and C. K. Morewedge (2019): “Resistance to Medical Artificial Intelligence,” Journal of Consumer Research, 46, 629–650.
  • McLaughlin and Spiess (2022) McLaughlin, B. and J. Spiess (2022): “Algorithmic Assistance with Recommendation-Dependent Preferences,” Working Paper.
  • McLean and Postlewaite (2002) McLean, R. and A. Postlewaite (2002): “Informational Size and Incentive Compatibility,” Econometrica, 70, 2421–2453.
  • Milgrom (1981) Milgrom, P. R. (1981): “Good News and Bad News: Representation Theorems and Applications,” The Bell Journal of Economics, 12, 380–391.
  • Morris and Yildiz (2019) Morris, S. and M. Yildiz (2019): “Crises: Equilibrium Shifts and Large Shocks,” American Economic Review, 109, 2823–54.
  • Obermeyer and Emanuel (2016) Obermeyer, Z. and E. J. Emanuel (2016): “Predicting the Future - Big Data, Machine Learning, and Clinical Medicine,” The New England Journal of Medicine, 375, 1216–9.
  • Raghu et al. (2019) Raghu, M., K. Blumer, G. Corrado, J. Kleinberg, Z. Obermeyer, and S. Mullainathan (2019): “The Algorithmic Automation Problem: Prediction, Triage, and Human Effort,” Working Paper.
  • Rajpurkar et al. (2017) Rajpurkar, P., J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, M. P. Lungren, and A. Y. Ng (2017): “CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning,” Working Paper.
  • Spiegler (2020) Spiegler, R. (2020): “Behavioral Implications of Causal Misperceptions,” Annual Review of Economics, 12, 81–106.
  • Vives (1992) Vives, X. (1992): “How Fast do Rational Agents Learn?” Review of Economic Studies, 60, 329–347.
  • Yang et al. (2024) Yang, K. H., N. Yoder, and A. Zentefis (2024): “Explaining Models,” Working Paper.

Appendix A Proof of Generalization of Theorem 3.1

In a change of notation relative to the main text, we subsequently use $\mathbb{X}_{n}$ to denote the agent’s covariate vector and $Y$ to denote the agent’s type (leaving $\mathbb{x}_{n}$ and $y$ to denote realizations of these random variables). Moreover, rather than supposing that $Y$ is deterministically related to $\mathbb{X}_{n}$ via a function $f$, we let $(\mathbb{X}_{n},Y)\sim P^{n}$ where $P^{n}$ is unknown. We replace Assumptions 1 and 2 with the following weaker assumption.

Assumption 11.

Fix any realization of the standard covariates $\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}$. There is an infinitely exchangeable sequence $(\widetilde{Y}_{1},\widetilde{Y}_{2},\dots)$ such that for every $n\in\mathbb{N}$, the sequence

$\left(\mathbb{E}[Y\mid(X_{1},\dots,X_{n})=(\mathbb{x}_{\mathcal{S}},\mathbb{x}_{-\mathcal{S}})]\right)_{\mathbb{x}_{-\mathcal{S}}\in\{0,1\}^{n-s}}$

has the same distribution as $(\widetilde{Y}_{1},\dots,\widetilde{Y}_{2^{n-s}})$.

That is, permuting the labels and/or values of the nonstandard covariates does not change the joint distribution of the conditional expectations of $y$. When $y$ is degenerate conditional on $\mathbb{x}_{n}$, Assumption 11 reduces to our previous two assumptions. We will prove the following generalization of Theorem 3.1.

Theorem A.1.

Suppose Assumption 11 holds. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large, i.e., $\lim_{n\rightarrow\infty}V(n,\mathbb{x}_{n})=0$.

Towards this, we first prove the conclusion under a strengthening of Assumption 11, in which exchangeability is replaced by the assumption that conditional expectations are i.i.d. across the different possible completions of the agent’s covariate vector.

Assumption 12.

Fix any realization of the standard covariates $\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}$. Then there is a distribution $F$ such that for every $n\in\mathbb{N}$, the conditional expectations

$\mathbb{E}[Y\mid(X_{1},\dots,X_{n})=(\mathbb{x}_{\mathcal{S}},\mathbb{x}_{-\mathcal{S}})]\sim_{iid}F$

across all vectors $\mathbb{x}_{-\mathcal{S}}\in\{0,1\}^{n-s}$.

Theorem A.2.

Suppose Assumption 12 holds. Then for every covariate vector $\mathbb{x}\in\{0,1\}^{\infty}$, the expected value of context vanishes to zero as $n$ grows large, i.e., $\lim_{n\rightarrow\infty}V(n,\mathbb{x}_{n})=0$.

Sections A.1-A.4 prove Theorem A.2, and Section A.5 shows that Theorem A.2 implies Theorem A.1.

A.1 Outline for Proof of Theorem A.2

Fix any realization $(x_{1},\dots,x_{s})$ of the agent’s standard covariates. After observing $(x_{1},\dots,x_{s})$, the evaluator assigns positive probability to the $2^{n-s}$ covariate vectors whose first $s$ entries are equal to $(x_{1},\dots,x_{s})$. Let these covariate vectors be indexed by $\mathbb{x}^{j}$ where $j=1,\dots,2^{n-s}$, and define

$Y_{j}\equiv\mathbb{E}_{P^{n}}\left[Y\mid(X_{1},\dots,X_{n})=\mathbb{x}^{j}\right]$

to be the (random) expected type given covariate vector $\mathbb{x}^{j}$. By assumption that the marginal distribution over covariate vectors is uniform, the evaluator’s posterior expectation of the agent’s type after observing the agent’s standard covariates is

$\widehat{Y}(\varnothing,\mathbb{x}_{n})=\frac{1}{2^{n-s}}\sum_{j=1}^{2^{n-s}}Y_{j}\equiv Z^{n}_{\varnothing}.$

There are $K_{n}=\sum_{k=0}^{h_{n}}\binom{n-s}{k}$ subsets of $\{s+1,\dots,n\}$ that contain $h_{n}$ or fewer elements. Enumerate these sets as $H_{1},\dots,H_{K_{n}}$. For each $H_{k}$, let

$S_{k}=\left\{j\,:\,\mathbb{x}^{j}\in C_{H_{k}}(\mathbb{x}_{n})\right\}$

be the set of indices of those covariate vectors $\mathbb{x}^{j}$ that agree with the agent’s covariate vector $\mathbb{x}_{n}$ in entries $(1,\dots,s)\cup H_{k}$ (where $C_{H_{k}}(\mathbb{x}_{n})$ is as defined in (2.1)). After observing the agent’s nonstandard covariates in the set $H_{k}$, the evaluator’s posterior expectation about the agent’s type is

$\widehat{Y}(H_{k},\mathbb{x}_{n})=\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|}\equiv Z_{k}.$

Although the distributions of the random variables $Z_{k}$ vary across $n$, we suppress this dependence in what follows to save on notation. The remainder of the proof proceeds by first showing that in expectation the possible increase of the evaluator’s posterior expectation over the prior expectation $\mu\equiv\mathbb{E}[Y]$ is vanishing.

Proposition A.1.

$\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}Z_{k}-\mu]=0.$

This is subsequently strengthened to the statement that the expected maximum absolute difference between $Z_{k}$ and $\mu$ converges to zero.

Proposition A.2.

$\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}|Z_{k}-\mu|]=0.$

Finally, we apply the above proposition to demonstrate the conclusion of the theorem. Suppressing the dependence of $V$ on the covariate vector $\mathbb{x}_{n}$ in what follows (writing simply $V(n)$), this means

$\lim_{n\rightarrow\infty}V(n)=\lim_{n\rightarrow\infty}\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},Y)\right]-\mathbb{E}\left[u\left(Z^{n}_{\varnothing},Y\right)\right]=0,$

so in expectation the possible increase in the agent’s payoff also vanishes.
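The outline can also be visualized numerically. Below is a minimal Monte Carlo sketch (our own illustration, with $s=0$, the $Y_{j}$ drawn i.i.d. Uniform$[-1,1]$ so that $\mu=0$ per Assumption 12, and capacity $h_{n}=\lfloor n/4\rfloor$); the estimated expected maximum shrinks toward $\mu$ as $n$ grows:

```python
import itertools
import random

random.seed(2)

def expected_max_Z(n, h, trials=100):
    """Monte Carlo estimate of E[max_k Z_k] with s = 0 and Y_j iid U[-1, 1]."""
    subsets = [H for r in range(h + 1) for H in itertools.combinations(range(n), r)]
    x = (0,) * n  # the agent's covariate vector (any choice works, by symmetry)
    total = 0.0
    for _ in range(trials):
        Y = {v: random.uniform(-1, 1) for v in itertools.product((0, 1), repeat=n)}
        total += max(
            sum(y for v, y in Y.items() if all(v[i] == x[i] for i in H))
            / 2 ** (n - len(H))
            for H in subsets
        )
    return total / trials

for n in (4, 6, 8, 10):
    print(n, round(expected_max_Z(n, h=n // 4), 3))
```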

A.2 Proof of Proposition A.1

Statement of the proposition: $\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}Z_{k}-\mu]=0.$

The quantity $\mathbb{E}[\max_{1\leq k\leq K_{n}}Z_{k}]$ is the expected maximum of the sequence of non-i.i.d. variables $Z_{1},\dots,Z_{K_{n}}$. The proof is organized as follows. In Sections A.2.1 and A.2.2, we define i.i.d. variables $Z^{iid}_{k}$ with the property that

$\mathbb{E}\left[\max\{Z_{1},\dots,Z_{K_{n}}\}\right]\leq\mathbb{E}\left[\max\{Z^{iid}_{1},\dots,Z^{iid}_{K_{n}}\}\right]. \qquad (A.1)$

In Sections A.2.3 and A.2.4, we show that the RHS of the above display converges to $\mu$ as $n$ grows large.

A.2.1 Replacing $Z_{k}$’s with independent variables $Z_{k}^{ind}$

In general, disclosures $k$ and $k^{\prime}$ may lead to posterior expectations $Z_{k}$ and $Z_{k^{\prime}}$ that are correlated due to the presence of the same $Y_{j}$’s across the different sample averages. We first show that replacing these $Z_{k}$’s with properly defined independent random variables weakly increases the expected maximum.

Definition A.1.

For each $1\leq k\leq K_{n}$ define

$Z_{k}^{ind}=\frac{\sum_{j=1}^{|S_{k}|}Y_{j}^{k}}{|S_{k}|} \qquad (A.2)$

where $Y_{j}^{k}\sim_{iid}F$, so that each $Z_{k}^{ind}$ has the same distribution as $Z_{k}$, but the variables $Z_{1}^{ind},\dots,Z_{K_{n}}^{ind}$ are mutually independent.

Lemma A.1.

Let

$V_{n}\equiv\mathbb{E}[\max\{Z_{1},\dots,Z_{K_{n}}\}]$

and

$V^{ind}_{n}\equiv\mathbb{E}[\max\{Z_{1}^{ind},\dots,Z_{K_{n}}^{ind}\}].$

Then $V_{n}\leq V^{ind}_{n}$ for all $n\in\mathbb{Z}_{+}$.

Proof.

Throughout we use $X\succeq Y$ to mean that the distribution of $X$ first-order stochastically dominates the distribution of $Y$.

Sublemma 1.

Let $X_{1},\dots,X_{Q},W$ be a sequence of real-valued random variables (not necessarily i.i.d.). Let $a_{1}>a_{2}>\dots>a_{Q-1}>a_{Q}>0$ be a sequence of positive constants. Further, let $Y_{1},\dots,Y_{Q}$ be i.i.d. random variables, independent of $(X_{1},\dots,X_{Q},W)$. Define

$M_{C}=\max_{i\in\{1,\dots,Q\}}\{X_{i}+a_{i}Y_{1}\}$
$M_{I}=\max_{i\in\{1,\dots,Q\}}\{X_{i}+a_{i}Y_{i}\}.$

Then $M_{I}\succeq M_{C}$ and $\max\{M_{I},W\}\succeq\max\{M_{C},W\}$.

Proof.

For $q\in\{1,\dots,Q\}$ define:

$M_{C}^{q}=\max\left\{\max_{i\in\{1,\dots,q-1\}}\{X_{i}+a_{i}Y_{1}\},\,X_{q}+a_{q}Y_{1}\right\}$
$\widetilde{M}_{C}^{q}=\max\left\{\max_{i\in\{1,\dots,q-1\}}\{X_{i}+a_{i}Y_{1}\},\,X_{q}+a_{q}Y_{q}\right\}$

so that $M_{C}^{q}$ is the maximum of the first $q$ terms in $M_{C}$, and $\widetilde{M}_{C}^{q}$ replaces $Y_{1}$ in the $q$-th term of $M_{C}^{q}$ with $Y_{q}$. We first demonstrate an analogue of the desired conclusions for $M_{C}^{q}$ and $\widetilde{M}_{C}^{q}$.

Sublemma 2.

$\widetilde{M}_{C}^{q}\succeq M_{C}^{q}$ and $\max\{\widetilde{M}_{C}^{q},W\}\succeq\max\{M_{C}^{q},W\}$.

Proof.

Without loss of generality set $a_{q}=1$. We first show that $\widetilde{M}_{C}^{q}\succeq M_{C}^{q}$. To establish first-order stochastic dominance, we need to show that for all $t\in\mathbb{R}$,

$\mathbb{P}(M_{C}^{q}\leq t)-\mathbb{P}(\widetilde{M}_{C}^{q}\leq t)\geq 0.$

For each $i\in\{1,\dots,q-1\}$ define the event

$B_{i}:=\{X_{q}+Y_{1}>X_{i}+a_{i}Y_{1}\}\equiv\left\{Y_{1}<\frac{1}{a_{i}-1}(X_{q}-X_{i})\right\}.$

Further let

$B=\bigcap_{i=1}^{q-1}B_{i}=\left\{Y_{1}<\min_{i\in\{1,\dots,q-1\}}\frac{1}{a_{i}-1}(X_{q}-X_{i})\right\}$

be the event that $X_{q}+Y_{1}$ achieves the maximum among $\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q}$. We show that the FOSD rankings in Sublemma 2 hold both on the event $B$ and on its complement $B^{c}$.

Define

$\widetilde{B}:=\left\{Y_{q}<\min_{i\in\{1,\dots,q-1\}}\frac{1}{a_{i}-1}(X_{q}-X_{i})\right\}$

to be the event that $X_{q}+Y_{q}$ achieves the maximum among $\{X_{i}+a_{i}Y_{q}\}_{i=1}^{q}$. Then

$\widetilde{M}_{C}^{q}|B\succeq(X_{q}+Y_{q})|B$
$\stackrel{d}{=}X_{q}|B+Y_{q}$ (since $Y_{q}\perp\!\!\!\perp(X_{1},\dots,X_{q},Y_{1})$)
$\succeq X_{q}|B+Y_{q}|\widetilde{B}$ (since $Y_{q}\succeq Y_{q}\mid\widetilde{B}$)
$\stackrel{d}{=}X_{q}|B+Y_{1}|B$ (since $Y_{1}\mid B\stackrel{d}{=}Y_{q}\mid\widetilde{B}$)
$\stackrel{d}{=}(X_{q}+Y_{1})|B\stackrel{d}{=}M_{C}^{q}|B.$

Thus $\widetilde{M}_{C}^{q}|B\succeq M_{C}^{q}|B$.

Now consider the event $B^{c}$, on which $X_{q}+Y_{1}$ does not achieve the maximum among $\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q}$. Then either $X_{q}+Y_{q}\leq\max\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q-1}$, in which case $\widetilde{M}_{C}^{q}=M_{C}^{q}$, or $X_{q}+Y_{q}>\max\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q-1}$, in which case $\widetilde{M}_{C}^{q}>M_{C}^{q}$. So

$\widetilde{M}_{C}^{q}|B^{c}\succeq\max\{X_{1}+a_{1}Y_{1},\dots,X_{q-1}+a_{q-1}Y_{1}\}|B^{c}\stackrel{d}{=}M_{C}^{q}|B^{c},$

and hence $\widetilde{M}_{C}^{q}|B^{c}\succeq M_{C}^{q}|B^{c}$.

Now we show that $\max\{\widetilde{M}_{C}^{q},W\}\succeq\max\{M_{C}^{q},W\}$. For any realization $w$ of $W$, let $X^{w}_{i}$ denote the conditional random variable $X_{i}|W=w$. Define $M_{C}^{q,w}$ and $\widetilde{M}_{C}^{q,w}$ identically to $M_{C}^{q}$ and $\widetilde{M}_{C}^{q}$, replacing each $X_{i}$ by $X_{i}^{w}$. Then by independence of $W$ and $(Y_{1},\dots,Y_{q})$, the distribution of $\max\{M_{C}^{q,w},w\}$ is identical to that of $\max\{M_{C}^{q},W\}|(W=w)$, and the distribution of $\max\{\widetilde{M}_{C}^{q,w},w\}$ is identical to that of $\max\{\widetilde{M}_{C}^{q},W\}|(W=w)$.

Applying the first part of this sublemma to $M^{q,w}_{C}$ and $\widetilde{M}_{C}^{q,w}$, we conclude that $\widetilde{M}_{C}^{q,w}\succeq M_{C}^{q,w}$. Since $\max\{\cdot,w\}$ is an increasing function, it preserves the first-order stochastic dominance relation, and hence $\max\{\widetilde{M}_{C}^{q},W\}|(W=w)\succeq\max\{M^{q}_{C},W\}|(W=w)$. This argument holds pointwise for all $w$, so $\max\{\widetilde{M}_{C}^{q},W\}\succeq\max\{M_{C}^{q},W\}$ as desired. ∎

We now complete the proof that $\max\{M_{I},W\}\succeq\max\{M_{C},W\}$. From similar (omitted) arguments it follows that $M_{I}\succeq M_{C}$. For each $q\in\{1,\dots,Q\}$ define

$\widehat{M}_{C}^{q}=\max\left\{\max\{X_{i}+a_{i}Y_{1}\}_{i=1}^{q},\,\max\{X_{i}+a_{i}Y_{i}\}_{i=q+1}^{Q},\,W\right\}$

(where the middle maximum is empty when $q=Q$), observing that $\max\{M_{I},W\}=\widehat{M}_{C}^{1}$ and $\widehat{M}_{C}^{Q}=\max\{M_{C},W\}$. Moreover, for each $q\in\{2,\dots,Q\}$,

$\widehat{M}_{C}^{q}=\max\left\{M_{C}^{q},W^{q}\right\}$
$\widehat{M}_{C}^{q-1}=\max\{\widetilde{M}_{C}^{q},W^{q}\}$

where $W^{q}=\max\left\{\max\{X_{i}+a_{i}Y_{i}\}_{i=q+1}^{Q},W\right\}$ is independent of $(Y_{1},\dots,Y_{q})$. So applying Sublemma 2, $\widehat{M}_{C}^{q-1}\succeq\widehat{M}_{C}^{q}$, and chaining these comparisons yields $\max\{M_{I},W\}=\widehat{M}_{C}^{1}\succeq\widehat{M}_{C}^{Q}=\max\{M_{C},W\}$ as desired. ∎

Finally, we use Sublemma 1 to establish Lemma A.1, i.e., that the expected maximum weakly increases if we make the $Y$’s appearing in different disclosures independent. We prove this iteratively. For arbitrary $n\in\mathbb{N}$, define the random variable

$M=\max\{Z_{1},\dots,Z_{K_{n}}\}=\max\left\{\frac{\sum_{j\in S_{1}}Y_{j}}{|S_{1}|},\dots,\frac{\sum_{j\in S_{K_{n}}}Y_{j}}{|S_{K_{n}}|}\right\}.$

Fix any $Y_{i}$. We will show that replacing $Y_{i}$ across different sample averages with independent copies of this random variable leads to a FOSD increase in the distribution of $M$.

Let $I=\{k:i\in S_{k}\}$ be the set of indices of sample averages which contain $Y_{i}$. Then we can rewrite the previous display as

$\max\left\{\,\max_{k\in I}\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|},\,\max_{k\notin I}\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|}\right\}$

or

$\max\left\{\max_{k\in I}\left\{X_{k}+\frac{1}{|S_{k}|}Y_{i}\right\},W\right\} \qquad (A.3)$

where $X_{k}\equiv\frac{1}{|S_{k}|}\sum_{j\in S_{k},j\neq i}Y_{j}$ for each $k\in I$, and $W\equiv\max_{k\notin I}\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|}$. Because the $Y_{j}$’s are mutually independent, $Y_{i}$ is independent of each $X_{k}$ and of $W$. So applying Sublemma 1, the random variable in (A.3) has a distribution that is first-order stochastically dominated by the distribution of

$\max\left\{\max_{k\in I}\left\{X_{k}+\frac{1}{|S_{k}|}Y_{i}^{k}\right\},W\right\}$

as desired. Since $Y_{i}$ was arbitrary, iterating this replacement across all $i$ concludes the proof.

A.2.2 Replacing $Z_{k}^{ind}$ with i.i.d. variables $Z_{k}^{iid}$

The variables $Z_{1}^{ind},\dots,Z_{K_{n}}^{ind}$ are sample averages of unequal sizes ranging between $2^{n-s-h_{n}}$ and $2^{n-s}$ elements. We next show that replacing each of these variables with a sample average of $2^{n-s-h_{n}}$ elements (the smallest size) weakly increases the expected maximum.

Definition A.2.

For each $1\leq k\leq K_{n}$ define

$Z_{k}^{iid}=\frac{\sum_{j=1}^{2^{n-s-h_{n}}}Y_{j}^{k}}{2^{n-s-h_{n}}} \qquad (A.4)$

to be the analogue of $Z_{k}^{ind}$ with $2^{n-s-h_{n}}$ elements instead of $|S_{k}|\geq 2^{n-s-h_{n}}$, so that the variables $Z_{1}^{iid},\dots,Z_{K_{n}}^{iid}$ are i.i.d.

Lemma A.2.

Let

$V^{iid}_{n}\equiv\mathbb{E}\left[\max\{Z^{iid}_{1},\dots,Z^{iid}_{K_{n}}\}\right].$

Then $V^{ind}_{n}\leq V^{iid}_{n}$ for all $n\in\mathbb{Z}_{+}$.

Proof.

We use the following result.

Sublemma 3.

Suppose $Y_{1},Y_{2},\dots,Y_{n}$ are independent and identically distributed random variables, and define $\overline{Y}_{n}=\frac{1}{n}\sum_{i=1}^{n}Y_{i}$ to be their sample average. Let $n^{\prime}<n$ and define $\overline{Y}_{n^{\prime}}=\frac{1}{n^{\prime}}\sum_{i=1}^{n^{\prime}}Y_{i}$. Then the distribution of $\overline{Y}_{n^{\prime}}$ is a mean-preserving spread of the distribution of $\overline{Y}_{n}$.

Proof.

First observe that $\mathbb{E}[Y_{j}\mid\overline{Y}_{n}]=\overline{Y}_{n}$ for any $j=1,\dots,n$, since

$\overline{Y}_{n}=\mathbb{E}[\overline{Y}_{n}\mid\overline{Y}_{n}]=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[Y_{i}\mid\overline{Y}_{n}]=\mathbb{E}[Y_{j}\mid\overline{Y}_{n}],$

where the final equality follows by assumption that the $Y_{i}$’s are i.i.d. Then

$\mathbb{E}[\overline{Y}_{n^{\prime}}\mid\overline{Y}_{n}]=\frac{1}{n^{\prime}}\sum_{i=1}^{n^{\prime}}\mathbb{E}[Y_{i}\mid\overline{Y}_{n}]=\frac{1}{n^{\prime}}\sum_{i=1}^{n^{\prime}}\overline{Y}_{n}=\overline{Y}_{n},$

and the distribution of $\overline{Y}_{n^{\prime}}$ is a mean-preserving spread of the distribution of $\overline{Y}_{n}$ as desired. ∎

This sublemma implies that each $Z_{k}^{iid}$ is a mean-preserving spread of the corresponding $Z_{k}^{ind}$ (since $|S_{k}|\geq 2^{n-s-h_{n}}$ for all $k$), i.e., each $Z_{k}^{ind}$ second-order stochastically dominates $Z_{k}^{iid}$. The desired result then follows by Jensen’s inequality, since the entries of $(Z_{1}^{ind},\dots,Z_{K_{n}}^{ind})$ are (by construction) independent and the maximum is a convex function. ∎
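The role of Sublemma 3 here can be checked numerically: a sample mean over fewer draws is a mean-preserving spread of one over more draws, so the expected maximum of several independent such means (a convex functional) weakly increases as sample sizes shrink. A minimal sketch, with Uniform$[-1,1]$ draws as an arbitrary illustrative choice:

```python
import random

random.seed(3)

def exp_max_of_means(K, m, trials=10_000):
    """E[max of K independent sample means, each over m Uniform[-1,1] draws]."""
    total = 0.0
    for _ in range(trials):
        total += max(sum(random.uniform(-1, 1) for _ in range(m)) / m
                     for _ in range(K))
    return total / trials

# The m = 16 means are mean-preserving spreads of the m = 64 means,
# so the second expected maximum exceeds the first.
print(exp_max_of_means(K=10, m=64), exp_max_of_means(K=10, m=16))
```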

A.2.3 Asymptotic Normality

Lemma A.3.

Let

$V^{N}_{n}\equiv\mathbb{E}\left[\max\{Z_{1}^{N},\dots,Z_{K_{n}}^{N}\}\right]$

where $Z^{N}_{k}\sim\mathcal{N}\left(\mu,\frac{1}{2^{n-s-h_{n}}}\right)$ independently across $k$. Then $\lim_{n\rightarrow\infty}|V^{iid}_{n}-V^{N}_{n}|=0$.

Proof.

Without loss of generality, let $\mu=0$ and $\text{Var}(Y_{j}^{k})=1$. (If $\text{Var}(Y_{j}^{k})=0$, the statement of Theorem 3.1 holds trivially.) First observe that

$\sqrt{2^{n-s-h_{n}}}\cdot V^{iid}_{n}=\mathbb{E}\left[\max\{\widetilde{Z}^{iid}_{1},\dots,\widetilde{Z}^{iid}_{K_{n}}\}\right]$

where each

$\widetilde{Z}^{iid}_{k}=\frac{1}{\sqrt{2^{n-s-h_{n}}}}\sum_{i=1}^{2^{n-s-h_{n}}}Y^{k}_{i}.$

Similarly we can write

$\sqrt{2^{n-s-h_{n}}}\cdot V^{N}_{n}=\mathbb{E}\left[\max\{\widetilde{Z}^{N}_{1},\dots,\widetilde{Z}^{N}_{K_{n}}\}\right]$

where each

$\widetilde{Z}^{N}_{k}\sim_{iid}\mathcal{N}(0,1).$

When the assumptions for Corollary 2.1 from Chernozhukov et al. (2013) are met (to be verified momentarily), we can conclude that

$\rho\left(\max\{\widetilde{Z}^{iid}_{1},\dots,\widetilde{Z}^{iid}_{K_{n}}\},\max\{\widetilde{Z}^{N}_{1},\dots,\widetilde{Z}^{N}_{K_{n}}\}\right)\rightarrow 0$

where $\rho$ denotes the Kolmogorov distance. Thus also

$\rho(M^{iid}_{n},M^{N}_{n})\rightarrow 0 \qquad (A.5)$

where

$M^{iid}_{n}=\frac{1}{\sqrt{2^{n-s-h_{n}}}}\max\{\widetilde{Z}^{iid}_{1},\dots,\widetilde{Z}^{iid}_{K_{n}}\}$
$M^{N}_{n}=\frac{1}{\sqrt{2^{n-s-h_{n}}}}\max\{\widetilde{Z}^{N}_{1},\dots,\widetilde{Z}^{N}_{K_{n}}\}.$

By assumption, each $Y_{i}^{k}$ is supported on $[-\overline{y},\overline{y}]$ for some finite $\overline{y}$. This implies $|M_{n}^{iid}|\leq\overline{y}$ for all $n$, so the sequence $(M_{n}^{iid})_{n}$ is uniformly integrable. The convergence in (A.5) thus implies

$\lim_{n\rightarrow\infty}\left|\mathbb{E}\left[M_{n}^{iid}\right]-\mathbb{E}\left[M_{n}^{N}\right]\right|=\lim_{n\rightarrow\infty}|V^{iid}_{n}-V^{N}_{n}|=0$

as desired.

It remains to verify that the conditions of Corollary 2.1 from Chernozhukov et al. (2013) are met. This follows from the assumption that the $Y_{j}^{k}$’s are uniformly bounded, together with the observation that

$\frac{\left(\log(K_{n}\cdot 2^{n-s-h_{n}})\right)^{7}}{2^{(1-c)(n-s-h_{n})}}\xrightarrow{n\rightarrow\infty}0$

for any $c\in(0,1)$, since $K_{n}=\sum_{j=0}^{h_{n}}\binom{n-s}{j}\leq 2^{n-s}$ by the Binomial Theorem and $\alpha_{h}<1$. ∎

A.2.4 Upper Bound for Expected Maximum of Gaussians

Finally, Berman (1964) provides an upper bound for the expected maximum of independent Gaussian random variables, which gives

$V^{N}_{n}\leq\sqrt{\frac{1}{2^{n-s-h_{n}}}}\cdot 2\sqrt{\log(K_{n})}\leq\sqrt{\frac{1}{2^{n(1-\alpha_{h})-s}}}\cdot 2\sqrt{n},$

where the first inequality uses the standard deviation $\sqrt{1/2^{n-s-h_{n}}}$ of the $Z^{N}_{k}$ (recall the normalization $\mu=0$), and the final expression converges to zero as $n\rightarrow\infty$ by assumption that $\alpha_{h}<1$. Since clearly also $\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}Z_{k}-\mu]\geq 0$, this concludes the proof of Proposition A.1.
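To get a feel for the order of this bound, the following minimal sketch (our own illustration, with unit variance and base-2 logarithms) compares the simulated expected maximum of $K$ i.i.d. standard Gaussians with $2\sqrt{\log_{2}(K)}$:

```python
import math
import random

random.seed(4)
K, trials = 200, 5_000
sim = sum(max(random.gauss(0, 1) for _ in range(K)) for _ in range(trials)) / trials
print(sim, 2 * math.sqrt(math.log2(K)))  # simulated expected maximum vs. the upper bound
```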

A.3 Proof of Proposition A.2

Statement of the proposition: $\lim_{n\rightarrow\infty}\mathbb{E}[\max_{1\leq k\leq K_{n}}|Z_{k}-\mu|]=0.$

In an abuse of notation, let $Z_{k}\equiv Z_{k}-\mu$ denote the de-meaned sample average. Rewriting the max within the expectation, we obtain

$\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}|\right]=\mathbb{E}\left[\max\left\{\max_{1\leq k\leq K_{n}}Z_{k},\,-\min_{1\leq k\leq K_{n}}Z_{k}\right\}\right]\leq\mathbb{E}\left[\max\left\{\max_{1\leq k\leq K_{n}}Z_{k},0\right\}\right]+\mathbb{E}\left[\max\left\{-\min_{1\leq k\leq K_{n}}Z_{k},0\right\}\right].$

We will show that each term of this final expression converges to zero. Observe that

$\mathbb{E}\left[\max\left\{\max_{1\leq k\leq K_{n}}Z_{k},0\right\}\right]=\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right] \qquad (A.6)$

Moreover,

$\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\right]=\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right]+\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}<0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}<0\right],$

so

$\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}\geq 0\right]=\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\right]-\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}<0\right)\cdot\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}<0\right]. \qquad (A.7)$

From Proposition A.1,

$\lim_{n\rightarrow\infty}\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\right]=0. \qquad (A.8)$

Moreover, we showed in Section A.2.1 that the distribution of $(Z^{ind}_{1},\dots,Z^{ind}_{K_{n}})$ first-order stochastically dominates that of $(Z_{1},\dots,Z_{K_{n}})$, so

$\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z_{k}<0\right)\leq\mathbb{P}\left(\max_{1\leq k\leq K_{n}}Z^{ind}_{k}<0\right)\leq\prod_{1\leq k\leq K_{n}}\mathbb{P}(Z_{k}^{ind}<0),$

which converges to zero as $n$ grows large since each $\mathbb{P}(Z_{k}^{ind}<0)<1$.

$\mathbb{E}\left[\max_{1\leq k\leq K_{n}}Z_{k}\mid\max_{1\leq k\leq K_{n}}Z_{k}<0\right]\in\left[-\overline{Y},\overline{Y}\right] \qquad (A.9)$

uniformly across $n$. Putting together (A.6)–(A.9), we have that

$\lim_{n\rightarrow\infty}\mathbb{E}\left[\max\left\{\max_{1\leq k\leq K_{n}}Z_{k},0\right\}\right]=0$

as desired. The argument that

$\lim_{n\rightarrow\infty}\mathbb{E}\left[\max\left\{-\min_{1\leq k\leq K_{n}}Z_{k},0\right\}\right]=0$

follows identically, observing that Proposition A.1 holds with $\widetilde{Y}\equiv-Y$ in place of $Y$, and that

$-\min_{1\leq k\leq K_{n}}Z_{k}=\max_{1\leq k\leq K_{n}}\left(-\frac{\sum_{j\in S_{k}}Y_{j}}{|S_{k}|}\right)=\max_{1\leq k\leq K_{n}}\frac{\sum_{j\in S_{k}}\widetilde{Y}_{j}}{|S_{k}|}.$

A.4 Concluding the proof of Theorem A.2

Recall that $Z^{n}_{\varnothing}\equiv\frac{1}{2^{n-s}}\sum_{j=1}^{2^{n-s}}Y_{j}$ denotes the (random) posterior expectation when the agent chooses not to disclose any nonstandard covariates. Clearly $V(n)\geq 0$ (since the agent can always choose to disclose nothing). Also

\begin{align}
V(n)&=\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},Y)\right]-\mathbb{E}\left[u(Z^{n}_{\varnothing},Y)\right]\nonumber\\
&\leq\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|u(Z_{k},Y)-u(Z^{n}_{\varnothing},Y)|\right]\tag{A.10}
\end{align}

Each absolute difference $|u(Z_{k},Y)-u(Z^{n}_{\varnothing},Y)|$ can be bounded from above using the triangle inequality:

\begin{equation}
|u(Z_{k},Y)-u(Z^{n}_{\varnothing},Y)|\leq|u(Z_{k},Y)-u(\mu,Y)|+|u(\mu,Y)-u(Z^{n}_{\varnothing},Y)|\tag{A.11}
\end{equation}

Since $u$ is by assumption Lipschitz continuous in the first argument, there is a constant $B$ such that

\begin{equation}
|u(z_{k},y)-u(\mu,y)|\leq B|z_{k}-\mu|\tag{A.12}
\end{equation}

and

\begin{equation}
|u(\mu,y)-u(z_{\varnothing},y)|\leq B|z_{\varnothing}-\mu|\tag{A.13}
\end{equation}

for any realizations $z_{k}$ and $z_{\varnothing}$ of $Z_{k}$ and $Z^{n}_{\varnothing}$. Combining (A.10)–(A.13), we get

\[
V(n)\leq B\left(\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}-\mu|\right]+\mathbb{E}\left[|Z^{n}_{\varnothing}-\mu|\right]\right)
\]

Clearly $\mathbb{E}[Z^{n}_{\varnothing}]=\mu$. Moreover, by the assumption that each $Y$ is uniformly bounded above and below, the sequence $(Z^{n}_{\varnothing})$ is uniformly integrable. It follows from the Law of Large Numbers that

\[
\lim_{n\rightarrow\infty}\mathbb{E}\left[|Z^{n}_{\varnothing}-\mu|\right]=0
\]

Finally, $\lim_{n\rightarrow\infty}\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}-\mu|\right]=0$ follows directly from Lemma A.2. So the right-hand side of the preceding display converges to zero, implying $V(n)\rightarrow 0$ as desired.

A.5 Theorem A.2 implies Theorem A.1

In an abuse of notation, let $P^{n}\sim F$ mean that $Y_{\mathbb{x}_{n}}\sim_{iid}F$ across all covariate vectors $\mathbb{x}_{n}$. We have already shown in Theorem A.2 that $\lim_{n\rightarrow\infty}\mathbb{E}_{P^{n}\sim F}(v_{n}(P))=0$ for any distribution $F$. Now suppose instead that Assumption 1 is satisfied. By de Finetti's theorem, there exist a set $\Theta$, a family of conditional measures $(\pi_{\theta})_{\theta\in\Theta}$, and a measure $\nu\in\Delta(\Theta)$ such that

\[
V(n,\mathbb{x})=\int_{\Theta}\mathbb{E}_{P^{n}\sim F_{\theta}}\left(v_{n}(P,\mathbb{x}_{n})\right)d\nu(\theta)
\]

where the inner expectation converges to zero for every $\theta$ by Theorem A.2. By the assumption that $u$ is Lipschitz continuous on a compact domain, there exist $\underline{u}$ and $\overline{u}$ such that $u(\hat{y},y)\in[\underline{u},\overline{u}]$ for all $(\hat{y},y)$. So $\mathbb{E}_{P^{n}\sim F_{\theta}}(v_{n}(P,\mathbb{x}_{n}))$ is pointwise bounded above by $\overline{u}-\underline{u}$, and the Dominated Convergence Theorem implies $\lim_{n\rightarrow\infty}V(n,\mathbb{x})=0$, as desired.

A.6 Proof of Theorem 3.2

Throughout the proof we set $s=0$, $\mu=0$, and $\sigma^{2}=\mathbb{E}(Y_{i}^{2})=1$ without loss of generality. We first demonstrate that the stated results hold asymptotically (i.e., for large enough $n$), and subsequently prove that the bound in (3.4) is sufficient.

(a) As before, let $B_{n}\subseteq\{1,\dots,2^{n}\}$ index those $2^{n-b_{n}}$ covariate vectors that agree with the agent's covariate vector for all covariates in $B$. Then the black box evaluator's posterior expectation is the sample average

\[
Z_{B}^{n}=\frac{1}{2^{n-b_{n}}}\sum_{j\in B_{n}}Y_{j}.
\]

We will show that

\begin{align*}
\Delta(n)&\equiv\mathbb{E}\left[\phi(Z_{B}^{n})\right]-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\phi(Z_{k})\right]\\
&=\mathbb{E}\left[\phi(Z_{B}^{n})-\phi(0)\right]-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)\right]>0
\end{align*}

for large enough nn.

We start by analyzing the first difference, $\mathbb{E}[\phi(Z_{B}^{n})-\phi(0)]$. Using a Taylor expansion, we get

\[
\mathbb{E}\left[\phi(Z_{B}^{n})-\phi(0)\right]=\mathbb{E}\left[\phi^{\prime}(0)Z_{B}^{n}\right]+\mathbb{E}\left[\frac{\phi^{\prime\prime}(\tilde{Z})}{2}(Z_{B}^{n})^{2}\right]
\]

for some $\tilde{Z}\in[0,Z_{B}^{n}]$. Note that $\mathbb{E}[Z_{B}^{n}]=\mathbb{E}[Y]=0$. Moreover, $\phi^{\prime\prime}(\tilde{Z})\geq c_{1}>0$ for some $c_{1}$, since $\phi$ is strictly convex. Thus

\begin{equation}
\mathbb{E}\left[\phi(Z_{B}^{n})-\phi(0)\right]\geq c_{1}\mathbb{E}\left[(Z_{B}^{n})^{2}\right]=\frac{c_{1}}{2^{(1-\alpha_{b})n}}\tag{A.14}
\end{equation}

Next turn to $\mathbb{E}[\max_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)]$. For each term inside the maximum we have that

\begin{equation}
\phi(Z_{k})-\phi(0)\leq c_{2}|Z_{k}|\tag{A.15}
\end{equation}

where the inequality follows from the fact that $\phi^{\prime}$ is continuous on a compact set, and hence bounded by some $c_{2}\geq 0$. Thus

\[
\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)\right]\leq c_{2}\,\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}|\right]
\]

From our proof of Proposition A.2 it follows that

\[
\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}|\right]\leq\frac{1}{2^{(1-\alpha_{h})n-1}}\sqrt{\log(K_{n})}
\]

Thus

\[
\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)\right]\leq\frac{c_{2}}{2^{(1-\alpha_{h})n-1}}\sqrt{\log(K_{n})}
\]

Combining the two bounds above, we get

\[
\Delta(n)\geq\frac{c_{1}}{2^{(1-\alpha_{b})n}}-\frac{2c_{2}}{2^{(1-\alpha_{h})n}}\sqrt{\log(K_{n})}
\]

The RHS is positive for all large $n$ if and only if

\[
\frac{2^{(1-\alpha_{h})n}}{2^{(1-\alpha_{b})n}}\xrightarrow{n\rightarrow\infty}\infty
\]

since $\sqrt{\log(K_{n})}$ has sub-exponential but non-constant asymptotics. This condition is satisfied if and only if $\alpha_{b}>\alpha_{h}$.
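A quick numerical check of this comparison (with placeholder constants $c_{1}=c_{2}=1$; all values illustrative) confirms that the lower bound on $\Delta(n)$ is eventually positive exactly when $\alpha_{b}>\alpha_{h}$:

\begin{verbatim}
# Sign of the lower bound on Delta(n); c1, c2 are placeholder constants.
import math

def lower_bound(n, alpha_b, alpha_h, c1=1.0, c2=1.0):
    K_n = sum(math.comb(n, j) for j in range(int(alpha_h * n) + 1))
    return (c1 / 2 ** ((1 - alpha_b) * n)
            - 2 * c2 * math.sqrt(math.log(K_n)) / 2 ** ((1 - alpha_h) * n))

for n in [20, 40, 80]:
    print(n, lower_bound(n, 0.6, 0.3) > 0,   # alpha_b > alpha_h: True
             lower_bound(n, 0.3, 0.6) > 0)   # alpha_b < alpha_h: False
\end{verbatim}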

(b) Since $-\phi$ is convex, the above arguments apply to show that

\[
\mathbb{E}\left[\phi(Z_{B}^{n})-\phi(0)\right]\leq-\frac{c_{1}}{2^{(1-\alpha_{b})n}}
\]

for some $c_{1}>0$, while

\begin{align*}
\mathbb{E}\left[\min_{1\leq k\leq K_{n}}\phi(Z_{k})-\phi(0)\right]&=-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}\left(-\phi(Z_{k})\right)-\left(-\phi(0)\right)\right]\\
&\geq-\frac{c_{2}}{2^{(1-\alpha_{h})n-1}}\sqrt{\log(K_{n})}
\end{align*}

for some $c_{2}>0$. The desired conclusion follows.


Finally, we show that the bound in (3.4) is sufficient for the comparison in part (a) of the result (with identical computations applying to part (b)). Suppose $\phi(\cdot)$ is strictly convex and denote $C=\frac{2c_{2}}{c_{1}}$, where $c_{1},c_{2}>0$ are the constants used above, respectively reflecting $\phi$'s lowest degree of convexity ($c_{1}=\inf_{y\in[-\overline{y},\overline{y}]}|\phi^{\prime\prime}(y)|$) and largest growth rate ($c_{2}=\sup_{y\in[-\overline{y},\overline{y}]}|\phi^{\prime}(y)|$). Following the proof of part (a), the Black Box is preferred if

\[
2^{(\alpha_{b}-\alpha_{h})n}>C\sqrt{\log(K_{n})}
\]

Since $K_{n}\leq 2^{n}$, this inequality is satisfied if

\[
(\alpha_{b}-\alpha_{h})n-\frac{1}{2}\log_{2}(n)>\log_{2}(C)
\]

The above inequality implicitly defines a threshold $N(C)$ on the number of covariates, beyond which the Black Box is preferred.
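As an illustration, the threshold can be computed by direct search; the parameter values below are hypothetical, and we assume $\alpha_{b}>\alpha_{h}$ so that the search terminates:

\begin{verbatim}
# Smallest n with (alpha_b - alpha_h) n - 0.5 log2(n) > log2(C).
# Assumes alpha_b > alpha_h; all inputs are hypothetical.
import math

def N_of_C(C, alpha_b, alpha_h):
    n = 1
    while (alpha_b - alpha_h) * n - 0.5 * math.log2(n) <= math.log2(C):
        n += 1
    return n

print(N_of_C(C=10.0, alpha_b=0.6, alpha_h=0.3))   # prints 19
\end{verbatim}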

A.7 Result Extending Theorem 3.2 Part (a)

Consider a model in which the evaluator chooses an action $a$ given the realization of the agent's covariates, and the evaluator and agent share the payoff function $-(a-y)^{2}$. The following result shows that the conclusion of Part (a) of Theorem 3.2 extends to non-binary types $y$.

Proposition A.3.

There exists an $N$ sufficiently large such that the agent prefers the black box evaluator for all $n\geq N$.

Proof.

Throughout the proof, set $s=0$, $\mathbb{E}[Y]=0$, and $\sigma^{2}=\mathbb{E}(Y_{i}^{2})=1$ without loss. We will show that

\begin{align*}
\mathbb{E}\left[u(Z_{B}^{n},y)\right]&-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},y)\right]\\
&=\mathbb{E}\left[u(Z_{B}^{n},y)-u(0,y)\right]-\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},y)-u(0,y)\right]>0
\end{align*}

for large enough nn.

Let $x_{B}=(x_{i})_{i\in B}$ denote the covariates that the Black Box observes, and as before let $Z_{B}^{n}=\mathbb{E}[y\mid x_{B}]$ denote the Black Box's (random) posterior expectation. The optimal action choice $a=Z_{B}^{n}$ yields expected payoff $-\text{Var}(y\mid x_{B})$. By the Law of Total Variance, $\mathbb{E}[-\text{Var}(y\mid x_{B})]=\text{Var}(Z_{B}^{n})-\text{Var}(Y)$. Since additionally $\mathbb{E}[u(0,y)]=-\text{Var}(y)$, we obtain

\[
\mathbb{E}\left[u(Z_{B}^{n},y)-u(0,y)\right]=\mathbb{E}\left[(Z_{B}^{n})^{2}\right]=\frac{1}{2^{(1-\alpha_{B})n}}.
\]
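This variance computation is easy to verify numerically. The sketch below draws the $2^{n-b}$ conditional types as iid standard normals (an illustrative choice; the argument only uses mean zero and variance one) and checks that $\mathbb{E}[(Z_{B}^{n})^{2}]$ matches $2^{-(1-\alpha_{B})n}$:

\begin{verbatim}
# Check E[(Z_B^n)^2] = 2^{-(1 - alpha_B) n} for iid mean-0, variance-1 types.
import numpy as np

rng = np.random.default_rng(1)
n, alpha_B = 12, 0.5
m = 2 ** (n - int(alpha_B * n))          # covariate vectors matching x_B
Z = rng.standard_normal((20000, m)).mean(axis=1)
print(np.mean(Z ** 2), 2.0 ** (-(1 - alpha_B) * n))   # both close to 2^-6
\end{verbatim}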

Now turn to $\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},y)-u(0,y)\right]$. By Lipschitz continuity of $u$, there is a constant $c_{2}$ such that $u(z_{k},y)-u(0,y)\leq c_{2}|z_{k}|$ holds pointwise for each realization of $(z_{k},y)$. So

\[
\mathbb{E}\left[\max_{1\leq k\leq K_{n}}u(Z_{k},Y)-u(0,Y)\right]\leq c_{2}\,\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}|\right]
\]

The remainder of the proof proceeds identically to the proof of Theorem 3.2. ∎

Appendix B Proofs for Results in Sections 4 and 5

B.1 Proof of Corollary 1

We continue in the general setting outlined in the proof of Theorem A.1. Fix any realization $\mathbb{x}_{\mathcal{S}}=(x_{1},\dots,x_{s})$ of the standard covariates. As in the proof of Theorem 3.1, there are $2^{n-s}$ covariate vectors $\mathbb{x}_{n}\in\{0,1\}^{n}$ with positive probability conditional on $\mathbb{x}_{\mathcal{S}}$. Index these by $j=1,\dots,2^{n-s}$, and define

\[
Y^{\mathbb{x}_{\mathcal{S}}}_{j}\equiv\mathbb{E}_{P^{n}}\left[Y\mid(X_{1},\dots,X_{n})=\mathbb{x}^{j}_{n}\right]
\]

to be the expected type given covariate vector $\mathbb{x}_{n}^{j}$. For each covariate vector $\mathbb{x}_{n}$ and each disclosure set $D_{k}\subseteq\{s+1,\dots,n\}$, there is a corresponding set of covariate vectors $S_{k}$ such that the evaluator's posterior expectation after the agent discloses his covariates in set $D_{k}$ is

\[
Z_{k}^{\mathbb{x}_{\mathcal{S}}}=\frac{\sum_{j\in S_{k}}Y^{\mathbb{x}_{\mathcal{S}}}_{j}}{|S_{k}|}.
\]

Unlike in the proof of Theorem 3.1, there are now $\overline{K}_{n}=\sum_{j=0}^{h_{n}}\binom{n-s}{j}2^{j}$ unique sets $S_{k}$ (ranging not only over the different possible sets of covariates to disclose but also over their values). By the Binomial Theorem,

\[
\sum_{j=0}^{h_{n}}\binom{n-s}{j}2^{j}\leq\sum_{j=0}^{n-s}\binom{n-s}{j}2^{j}=3^{n-s}.
\]
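The identity behind this bound is elementary and can be spot-checked directly (the values of $m=n-s$ and the truncation point below are arbitrary):

\begin{verbatim}
# Spot-check: sum_j C(m, j) 2^j = 3^m, and any truncated sum is below it.
import math

for m in [5, 10, 20]:
    full = sum(math.comb(m, j) * 2 ** j for j in range(m + 1))
    trunc = sum(math.comb(m, j) * 2 ** j for j in range(m // 2 + 1))
    assert full == 3 ** m and trunc <= full
    print(m, trunc, full)
\end{verbatim}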

Following the proof of Lemma A.1, we obtain that

\[
\mathbb{E}\left(\max_{1\leq k\leq\overline{K}_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right)\leq\frac{1}{2^{n-s-h_{n}}}C\sqrt{\log(\overline{K}_{n})}\leq\frac{1}{2^{n(1-\alpha_{h})-s}}C\sqrt{\log(3^{n-s})}
\]

which again converges to zero by the assumption that $\alpha_{h}<1$. Finally, observe that

\begin{align*}
\mathbb{E}\left[\max_{\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}}\left(\max_{1\leq k\leq K_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right)\right]&\leq\mathbb{E}\left[\sum_{\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}}\max_{1\leq k\leq K_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right]\\
&=\sum_{\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}}\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right].
\end{align*}

Since each $\mathbb{E}\left[\max_{1\leq k\leq K_{n}}|Z_{k}^{\mathbb{x}_{\mathcal{S}}}-\mu|\right]\rightarrow 0$ as $n\rightarrow\infty$, the RHS converges to zero. We thus obtain the analogue of Lemma A.2 for the expected maximum value of context, and the remainder of the proof proceeds identically to Theorem 3.1.

B.2 Proof of Proposition 4.1

Throughout this proof, we set $s=0$ for simplicity of notation.

Let $(\sigma^{*},\mu^{*})$ denote a typical PBE, where $\sigma^{*}$ is the Sender's disclosure strategy and $\mu^{*}$ is the Receiver's belief function. Fixing any such equilibrium, we use $Z_{\mu^{*}}(d)$ to denote the Receiver's posterior expectation given disclosure $d$. We first prove that at least one pure-strategy equilibrium always exists.

Proposition B.1.

For every $n$ and $f$ there exists a pure-strategy $f$-context equilibrium.

Proof.

Consider a candidate equilibrium $(\sigma^{*},\mu^{*})$, where $\sigma^{*}(\mathbf{x}_{n})=\varnothing$ for all $\mathbf{x}_{n}\in\{0,1\}^{n}$ (which is clearly a feasible disclosure for all agents). The Receiver's beliefs at disclosure $\varnothing$ are pinned down by Bayes' rule. For any other disclosure $d\neq\varnothing$, we construct out-of-equilibrium beliefs such that $u(Z_{\mu^{*}}(\varnothing))\geq u(Z_{\mu^{*}}(d))$. This is always possible, for example by setting $Z_{\mu^{*}}(d)=Z_{\mu^{*}}(\varnothing)$ for every $d$. Then by construction reporting $\varnothing$ is a best response for any $\mathbf{x}_{n}$, so we are done. ∎

Consider any function $f$ and any pure-strategy equilibrium $(\sigma^{*},\mu^{*})$ of the $f$-context disclosure game. Let $d_{1},\dots,d_{N}$ index the disclosures that have positive probability under $\sigma^{*}$ (i.e., all $d\in\mathcal{D}$ such that $\sigma^{*}(\mathbb{x}_{n})=d$ for some $\mathbb{x}_{n}$). For each such disclosure $d_{i}$,

\[
Z_{\mu^{*}}(d_{i})=\frac{1}{|\{x:\sigma^{*}(x)=d_{i}\}|}\sum_{x:\sigma^{*}(x)=d_{i}}f(x)
\]

is the evaluator's posterior expectation upon observing disclosure $d_{i}$. Given the evaluator's payoff function, the optimal action for the evaluator is precisely $Z_{\mu^{*}}(d_{i})$. Let

\begin{equation}
d^{*}=\left(H^{*},(\mathcal{X}^{*}_{i})_{i\in H^{*}}\right):=\operatorname*{arg\,max}_{1\leq i\leq N}u(Z_{\mu^{*}}(d_{i}))\tag{B.1}
\end{equation}

be the disclosure that yields the highest payoff to the Sender. Then it must be that $\sigma^{*}(\mathbb{x}_{n})=d^{*}$ for every covariate vector $\mathbb{x}_{n}$ for which disclosure $d^{*}$ is feasible; otherwise $d^{*}$ would be a profitable deviation. Hence the evaluator's posterior expectation in this equilibrium is the same as it would have been given disclosure of $d^{*}$ in our main model. So

\[
u(Z_{\mu^{*}}(d^{*}))\leq\max_{\mathbb{x}_{n}\in\{0,1\}^{n}}v(f,\mathbb{x}_{n}).
\]

Since the payoff received by an agent with any other covariate vector cannot exceed $u(Z_{\mu^{*}}(d^{*}))$ (by (B.1)), we have the desired result.

B.3 Result for Mixed Strategy Equilibria

In this part we restrict attention to equilibria $(\sigma^{*},\mu^{*})$ with the property that $\operatorname*{arg\,max}_{\hat{y}\in A_{(\sigma^{*},\mu^{*})}}u(\hat{y})$ is unique, where $A_{(\sigma^{*},\mu^{*})}$ is the set of posterior expectations with positive probability in the equilibrium. Call these equilibria generic. (A sufficient condition for all equilibria to be generic is that $u$ is strictly monotone.)

For each $n$ and $f$, let $v^{D}(f,\mathbb{x}_{n})$ denote the highest payoff that an agent with covariate vector $\mathbb{x}_{n}$ receives in any generic equilibrium (potentially mixed) of the $f$-context disclosure game. Further define

\[
v^{D}_{f}(n)=\max_{\mathbb{x}_{n}}v^{D}(f,\mathbb{x}_{n})
\]

and

\[
V^{\mathcal{D}}(n)=\mathbb{E}\left[v^{D}_{f}(n)\right]
\]

where the expectation is with respect to the realization of $f$.

Proposition B.2.

Suppose Assumption 1 holds and $u(\cdot)$ is twice continuously differentiable. Then $\lim_{n\rightarrow\infty}V^{\mathcal{D}}(n)=0$.

Proof.

Fix $n$, $f$, and a context equilibrium $(\sigma^{*},\mu^{*})$ of the $f$-context disclosure game. Let $\mathcal{Z}^{*}\subseteq[-\overline{y},\overline{y}]$ be the compact set of all equilibrium posterior expectations that are realized with positive probability in this equilibrium. Further, denote

\[
Z_{(1)}^{*}=\operatorname*{arg\,max}_{z\in\mathcal{Z}^{*}}u(z)
\]

to be the most-preferred achievable posterior expectation, which is unique by assumption of genericity of the equilibrium.

Since $Z_{(1)}^{*}$ is the best attainable posterior expectation, an agent achieves $Z_{(1)}^{*}$ in equilibrium if and only if it is feasible. (Otherwise, the agent can profitably deviate to the feasible disclosure that induces this posterior expectation.)

Let $\mathcal{X}^{*}\subseteq\{0,1\}^{n}$ denote the set of agents who have a feasible disclosure that achieves $Z_{(1)}^{*}$. Let $\mathcal{D}(\mathcal{X}^{*})$ be the set of disclosures that agents in $\mathcal{X}^{*}$ send with positive probability in equilibrium. By the logic above, $\mathcal{D}(\mathcal{X}^{*})\cap\mathcal{D}(\mathcal{X}\setminus\mathcal{X}^{*})=\varnothing$. Using the structure of this equilibrium we can write

\begin{equation}
\mathbb{E}[Y]=Z_{(1)}^{*}\,p_{\mathcal{X}^{*}}+(1-p_{\mathcal{X}^{*}})\,\mathbb{E}[Y\mid X\notin\mathcal{X}^{*}]\tag{B.2}
\end{equation}

where $p_{\mathcal{X}^{*}}$ is the ex-ante probability that the agent's covariate vector belongs to $\mathcal{X}^{*}$, and $\mathbb{E}[Y\mid X\notin\mathcal{X}^{*}]$ is the expectation of the agent's type given that his covariate vector does not belong to $\mathcal{X}^{*}$. Here we utilize the fact that the evaluator's posterior expectation is constant at $Z_{(1)}^{*}$ across all agents with covariate vectors in $\mathcal{X}^{*}$.\footnote{In general this does not have to be the case. We rule this out in the definition of the equilibrium.}

Now, consider the following alternative ``strategy'' $\sigma_{0}$, which relaxes the feasibility constraint: For any $\mathbb{x}\in\mathcal{X}\setminus\mathcal{X}^{*}$ let $\sigma_{0}(\mathbf{x})\equiv\sigma^{*}(\mathbf{x})$, i.e., the disclosures are the same as in the original equilibrium. Further, choose some arbitrary disclosure $d_{0}\in\mathcal{D}(\mathcal{X}^{*})$ and let $\sigma_{0}(\mathbf{x})=d_{0}$ for all $\mathbb{x}\in\mathcal{X}^{*}$. The Receiver's posterior expectation following observation of disclosure $d_{0}$ is

\[
Z_{0}=\frac{\sum_{x\in\mathcal{X}^{*}}Y_{x}}{|\mathcal{X}^{*}|}
\]

and, analogous to (B.2), we can write

\begin{equation}
\mathbb{E}[Y]=Z_{0}\,p_{\mathcal{X}^{*}}+(1-p_{\mathcal{X}^{*}})\,\mathbb{E}[Y\mid X\notin\mathcal{X}^{*}]\tag{B.3}
\end{equation}

Combining equations (B.2) and (B.3), we conclude:

\[
Z_{(1)}^{*}=\frac{\sum_{x\in\mathcal{X}^{*}}Y_{x}}{|\mathcal{X}^{*}|}
\]

which almost surely converges to $\mathbb{E}[Y]$ so long as $|\mathcal{X}^{*}|\xrightarrow{n\rightarrow\infty}\infty$. Since the $Y_{x}$'s are uniformly bounded, this also implies $\mathbb{E}[Z^{*}_{(1)}]\rightarrow\mathbb{E}[Y]$, as desired. We now demonstrate that indeed $|\mathcal{X}^{*}|\xrightarrow{n\rightarrow\infty}\infty$.

For any disclosure $d$, denote by $C_{d}\subseteq\{0,1\}^{n}$ the set of all covariate vectors $\mathbf{x}$ given which $d$ is feasible. Since $Z_{(1)}^{*}$ is achieved by all agents for whom $Z_{(1)}^{*}$ is feasible, it must be that for every disclosure $d\in\mathcal{D}(\mathcal{X}^{*})$ we have $C_{d}\subseteq\mathcal{X}^{*}$. Then for any $d\in\mathcal{D}(\mathcal{X}^{*})$,

\[
|\mathcal{X}^{*}|\geq|C_{d}|\xrightarrow{n\rightarrow\infty}\infty,
\]

where the limit follows by the assumption that $\alpha_{h}<1$. This completes the proof. ∎

B.4 Proof of Proposition 5.1

We again continue in the general setting outlined in the proof of Theorem A.1, and adopt the conventions that $\mathbb{E}(Y)=\mu$ while $\text{Var}(Y)=1$. We prove the result for a weakening of Assumptions 5 and 6 to the following.

Assumption 13.

Fix any realization of the standard covariates $\mathbb{x}_{\mathcal{S}}\in\{0,1\}^{s}$. There is an infinitely exchangeable sequence $(\widetilde{Y}_{1},\widetilde{Y}_{2},\dots)$ such that for every $n\in\mathbb{N}$, the sequence

\[
\left(Y_{\mathbb{x}_{R_{n}},\mathbb{x}_{-R_{n}}}:(x_{i})_{i\in R_{n}\backslash\mathcal{S}}\in\{0,1\}^{r_{n}-s}\right)
\]

has the same distribution as $(\widetilde{Y}_{1},\dots,\widetilde{Y}_{2^{r_{n}}})$.

Recalling that $r_{n}$ is the number of relevant covariates, there are $2^{r_{n}}$ distinct expected conditional types, which we can enumerate as $Y_{1},\dots,Y_{2^{r_{n}}}$. If disclosure $k$ involves disclosing $k_{r}$ relevant covariates, then there is a set $S_{k}$ of size $2^{r_{n}-k_{r}}$ such that the evaluator's posterior expectation can be written

\[
Z_{k}=\frac{1}{2^{n-h_{n}}}\sum_{j\in S_{k}}2^{n-r_{n}-(h_{n}-k_{r})}Y_{j}=\frac{1}{2^{r_{n}-k_{r}}}\sum_{j\in S_{k}}Y_{j}.
\]

As in Step 1 of the proof of Theorem 3.1 (Section A.2.1), replace each $Y_{j}$ with a variable $Y_{j}^{k}\stackrel{d}{=}Y_{j}$ which is independent across disclosure sets. This yields the random variables

\[
Z_{k}^{ind}=\frac{1}{2^{r_{n}-k_{r}}}\sum_{j\in S_{k}}Y^{k}_{j}.
\]

As in the proof of Proposition A.1, it follows from Lemma 1 that

\[
\mathbb{E}\left[\max\{Z_{1},\dots,Z_{K_{n}}\}\right]\leq\mathbb{E}\left[\max\{Z_{1}^{ind},\dots,Z_{K_{n}}^{ind}\}\right].
\]

Next define

\[
Z^{iid}_{k}=\frac{1}{2^{r_{n}-h_{n}}}\sum_{j=1}^{2^{r_{n}-h_{n}}}Y_{j}^{k}
\]

and note that these are independently and identically distributed with common variance

\[
\text{Var}(Z^{iid}_{k})=\frac{1}{2^{r_{n}-h_{n}}}.
\]

Following the arguments in Step 2 of the proof of Theorem 3.1 (Section A.2.2), we get

\[
\mathbb{E}\left[\max\{Z^{ind}_{1},\dots,Z^{ind}_{K_{n}}\}\right]\leq\mathbb{E}\left[\max\{Z^{iid}_{1},\dots,Z^{iid}_{K_{n}}\}\right],
\]

where as before $K_{n}=\sum_{j=0}^{h_{n}}\binom{n}{j}$. Further, by the argument given in Step 3 of the proof of Theorem 3.1 (Section A.2.3),

\[
\lim_{n\rightarrow\infty}|V^{iid}_{n}-V^{N}_{n}|=0
\]

where

\[
V^{N}_{n}\equiv\mathbb{E}\left[\max\{Z_{1}^{N},\dots,Z_{K_{n}}^{N}\}\right]
\]

and $Z_{k}^{N}\sim\mathcal{N}\left(\mu,\frac{1}{2^{r_{n}-h_{n}}}\right)$. Again applying the bound from Berman (1964), we have

\[
V^{N}_{n}\leq\frac{1}{2^{r_{n}-h_{n}}}C\sqrt{\log(K_{n})}\leq\frac{1}{2^{n(\alpha_{r}-\alpha_{h})}}C\sqrt{n}.
\]

By the assumption that $\alpha_{r}>\alpha_{h}$, the right-hand expression converges to zero as $n$ grows large, concluding the proof.
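The Gaussian maximum bound invoked here is also easy to probe by simulation. The sketch below compares the simulated expected maximum of $K$ iid normals against the classical $s\sqrt{2\log K}$ benchmark (the standard deviation $s$ and the values of $K$ are illustrative):

\begin{verbatim}
# E[max of K iid N(0, s^2)] versus the benchmark s * sqrt(2 log K).
import numpy as np

rng = np.random.default_rng(2)
s, trials = 0.1, 5000
for K in [10, 100, 1000]:
    emax = rng.normal(0.0, s, (trials, K)).max(axis=1).mean()
    print(K, round(emax, 4), round(s * np.sqrt(2 * np.log(K)), 4))
    # the simulated maximum stays below the benchmark in each case
\end{verbatim}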

B.5 Supporting Materials for Section 5.3

We show here that both main results generalize to the setting with noise. Instead of repeating the proofs step by step, we emphasize what changes must be made in order for the proofs to translate. To be consistent with our previous notation, we again operate with $\tilde{y}_{i,n}=Y_{i}+\varepsilon_{n}$, where $i$ enumerates covariate vectors. In all subsequent notation a tilde indicates an object that includes the noise $\varepsilon_{n}$; objects without one are the same as in the main text. Observe that after adding the noise term $\varepsilon_{n}$, the key objects from the proofs transform in the following way: $Z_{k}$ is replaced by $\tilde{Z}_{k}=Z_{k}+\varepsilon_{n}$, and $\mathbb{E}[\max_{k}Z_{k}]$ is replaced by $\mathbb{E}[\max_{k}\tilde{Z}_{k}]$. Since $\mathbb{E}[\max_{k}\tilde{Z}_{k}]=\mathbb{E}[\max_{k}Z_{k}]$, the proof of Theorem 3.1 translates directly.

To demonstrate Theorem 3.2, we show how to adjust the proof of part (a), with part (b) following analogously. The first change is that (A.14) becomes

\[
\mathbb{E}\left[\phi(\tilde{Z}_{B}^{n})-\phi(0)\right]\geq c_{1}\mathbb{E}\left[(Z_{B}^{n})^{2}\right]-c_{1}\sigma_{\varepsilon,n}^{2}
\]

The inequality in (A.15) is also modified to

\[
\phi(\tilde{Z}_{k})-\phi(0)\leq c_{2}|Z_{k}|+c_{2}|\varepsilon_{n}|
\]

for every disclosure $k$. Thus we obtain

\[
\Delta(n)\geq\frac{c_{1}}{2^{(1-\alpha_{b})n}}-\frac{2c_{2}}{2^{(1-\alpha_{h})n}}\sqrt{\log(K_{n})}+c_{2}\mathbb{E}[|\varepsilon_{n}|]-c_{1}\sigma_{\varepsilon,n}^{2}
\]

We will show that the ratio $\frac{\mathbb{E}[|\varepsilon_{n}|]}{\sigma_{\varepsilon,n}^{2}}$ grows arbitrarily large with $n$, thus asymptotically exceeding $\frac{c_{1}}{c_{2}}$. Fixing some $d>0$ and applying Markov's inequality, we obtain

\begin{equation}
\frac{\mathbb{E}[|\varepsilon_{n}|]}{\sigma_{\varepsilon,n}^{2}}\geq d\cdot\mathbb{P}\left(\frac{|\varepsilon_{n}|}{\sqrt{\sigma_{\varepsilon,n}^{2}}}\geq d\sqrt{\sigma_{\varepsilon,n}^{2}}\right)\tag{B.4}
\end{equation}

If we denote the CDF of $\frac{\varepsilon_{n}}{\sqrt{\sigma_{\varepsilon,n}^{2}}}$ by $G_{n}$, the RHS of the above inequality can be rewritten as $d\cdot\left(1+G_{n}(-d\sqrt{\sigma_{\varepsilon,n}^{2}})-G_{n}(d\sqrt{\sigma_{\varepsilon,n}^{2}})\right)$. As $n$ grows large, the term in brackets tends to $1-2g_{n}(0)d\sqrt{\sigma_{\varepsilon,n}^{2}}+o\left(\sqrt{\sigma_{\varepsilon,n}^{2}}\right)$. We omit the $o(\cdot)$ term until the end of the proof.

Fix an arbitrary $\delta>0$ and let $d=\frac{c_{1}}{c_{2}}+2\delta$. Further, fix $N$ such that $\sqrt{\sigma_{\varepsilon,N}^{2}}\leq\frac{1}{2gd}\left(1-\frac{c_{1}+c_{2}\delta}{c_{1}+2c_{2}\delta}\right)$, where $g=\max_{n}g_{n}(0)<\infty$. Then, since $\sigma_{\varepsilon,n}^{2}$ is decreasing and $g_{n}(0)\leq g$, we have that for all $n\geq N$

\[
2g_{n}(0)d\sqrt{\sigma_{\varepsilon,n}^{2}}\leq 2gd\cdot\frac{1}{2gd}\left(1-\frac{c_{1}+c_{2}\delta}{c_{1}+2c_{2}\delta}\right)=1-\frac{c_{1}+c_{2}\delta}{c_{1}+2c_{2}\delta}
\]

Combining this inequality with (B.4) we get

\[
\frac{\mathbb{E}[|\varepsilon_{n}|]}{\sigma_{\varepsilon,n}^{2}}\geq\left(\frac{c_{1}}{c_{2}}+2\delta\right)\left(1-\left(1-\frac{c_{1}+c_{2}\delta}{c_{1}+2c_{2}\delta}\right)\right)+o\left(\sqrt{\sigma_{\varepsilon,n}^{2}}\right)=\left(\frac{c_{1}}{c_{2}}+\delta\right)+o\left(\sqrt{\sigma_{\varepsilon,n}^{2}}\right)
\]

for all $n\geq N$. Since $\delta$ is an arbitrary positive number, this concludes the proof.
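For intuition, if the noise is Gaussian (an assumption made here purely for illustration), the ratio can be computed in closed form: $\mathbb{E}[|\varepsilon_{n}|]=\sigma_{\varepsilon,n}\sqrt{2/\pi}$, so $\mathbb{E}[|\varepsilon_{n}|]/\sigma_{\varepsilon,n}^{2}=\sqrt{2/\pi}/\sigma_{\varepsilon,n}$, which indeed diverges as $\sigma_{\varepsilon,n}^{2}\rightarrow 0$:

\begin{verbatim}
# Gaussian illustration: E|eps| / sigma^2 = sqrt(2/pi) / sigma -> infinity.
import math

for sigma in [0.1, 0.01, 0.001]:
    print(sigma, math.sqrt(2 / math.pi) / sigma)   # ~7.98, 79.8, 797.9
\end{verbatim}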

B.6 Proof of Proposition 5.2

Throughout the proof we assume $u(x)\equiv x$ and $s=0$. In addition, for simplicity of notation, we enumerate feasible disclosures by $k$ and denote the corresponding posteriors (as random variables) by $Z_{k}^{n}:=\rho_{f}(d_{k})$. To upper bound the value of context, we apply a result from Arnold and Groeneveld (1979):

\begin{equation}
\begin{split}
&\left|\mathbb{E}\left[\max_{k\in\{1,\dots,K_{n}\}}Z_{k}^{n}-\mathbb{E}\left[\frac{\sum_{i=1}^{K_{n}}Z_{i}^{n}}{K_{n}}\right]\right]\right|\leq\\
&\qquad\sqrt{\left(1-\frac{1}{K_{n}}\right)\sum_{i=1}^{K_{n}}\text{Var}(Z_{i}^{n})+\frac{1}{K_{n}}\sum_{i=1}^{K_{n}}\left(\sqrt{K_{n}}\left(\mathbb{E}[Z_{i}^{n}]-\frac{\sum_{i=1}^{K_{n}}\mathbb{E}[Z_{i}^{n}]}{K_{n}}\right)\right)^{2}}
\end{split}\tag{B.5}
\end{equation}

By Assumption 9, inequality (B.5) simplifies to

\[
\left|\mathbb{E}\left[\max_{k\in\{1,\dots,K_{n}\}}Z_{k}^{n}\right]-\mu\right|\leq\sqrt{\left(1-\frac{1}{K_{n}}\right)\sum_{i=1}^{K_{n}}\text{Var}(Z_{i}^{n})}
\]

Finally, Assumption 10 implies that $\text{Var}(Z_{k}^{n})=o\left(\frac{1}{K_{n}}\right)$ for every disclosure $k$. Hence

\[
\left|\mathbb{E}\left[\max_{k\in\{1,\dots,K_{n}\}}Z_{k}^{n}\right]-\mu\right|\leq\sqrt{\left(1-\frac{1}{K_{n}}\right)K_{n}\,o(K_{n}^{-1})}
\]

which yields the desired result after taking a limit in $n$. The argument for the lower bound follows the same line of reasoning and is thus omitted.
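A small simulation illustrates how conservative the bound (B.5) is in the symmetric case covered by Assumption 9, where all posteriors share a common mean (the normal distribution and the parameter values below are illustrative):

\begin{verbatim}
# LHS and RHS of the simplified bound for K iid posteriors N(mu, sigma2).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, K, trials = 0.0, 0.01, 50, 20000
Z = rng.normal(mu, np.sqrt(sigma2), (trials, K))
lhs = abs(Z.max(axis=1).mean() - mu)
rhs = np.sqrt((1 - 1 / K) * K * sigma2)
print(lhs, rhs)   # the bound holds with room to spare
\end{verbatim}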