Agreement Implies Accuracy for Substitutable Signals
Inspired by Aumann’s agreement theorem, [Aar05] studied the amount of communication necessary for two Bayesian experts to approximately agree on the expectation of a random variable. Aaronson showed that, remarkably, the number of bits does not depend on the amount of information available to each expert. However, in general the agreed-upon estimate may be inaccurate: far from the estimate they would settle on if they were to share all of their information. We show that if the experts’ signals are substitutes—meaning the experts’ information has diminishing marginal returns—then it is the case that if the experts are close to agreement then they are close to the truth. We prove this result for a broad class of agreement and accuracy measures that includes squared distance and KL divergence. Additionally, we show that although these measures capture fundamentally different kinds of agreement, Aaronson’s agreement result generalizes to them as well.
1 Introduction
Suppose that Alice and Bob are honest, rational Bayesians who wish to estimate some quantity—say, the unemployment rate one year from now. Alice is an expert on historical macroeconomic trends, while Bob is an expert on contemporary monetary policy. They convene to discuss and share their knowledge with each other until they reach an agreement about the expected value of the future unemployment rate. Alice and Bob could reach agreement by sharing everything they had ever learned, at which point they would have the same information, but the process would take years. How then should they proceed?
In the seminal work “Agreeing to Disagree,” Aumann [Aum76] observed that Alice and Bob can reach agreement simply by taking turns sharing their current expected value for the quantity. In addition to modeling communication between Bayesian agents, protocols similar to this one model financial markets: each trader shares partial information about their expected value on their turn (discussed in Section 5). A remarkable result by Scott Aaronson [Aar05] shows that if Alice and Bob follow certain protocols of this form, they will agree to within $\epsilon$ with probability $1 - \delta$ by communicating $O\left(\frac{1}{\delta\epsilon^2}\right)$ bits. (To ensure that each message is short, Alice and Bob share discretized versions of their estimates; we discuss this in Section 2.) Notably, this bound only depends on the error Alice and Bob are willing to tolerate, and not on the amount of information available to them.
Absent from Aaronson’s results, however, is what Alice and Bob agree on. In particular, there is no guarantee that Alice and Bob will be accurate, meaning that their agreed-upon estimate will be close (in e.g. expected squared distance) to what they would believe if they shared all of their information. In fact, they might agree on something highly inaccurate: suppose that Alice and Bob have independent, uniformly random bits $x$ and $y$, and wish to estimate the XOR $x \oplus y$. Alice and Bob agree from the outset, as from each of their perspectives, the expected value of $x \oplus y$ is $\frac{1}{2}$. Yet this expectation is far from the best estimate given their collective knowledge, which is either $0$ or $1$. So while agreement is fundamental to understanding communication between Bayesians—in Aumann’s terms, they cannot “agree to disagree”—agreement is far from the whole story. An important open problem is therefore what assumptions guarantee that Alice and Bob are accurate once they agree.
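To make the failure concrete, the following minimal sketch (our own illustration; the variable names are ours, not from [Aar05]) simulates the XOR example and checks that Alice and Bob agree exactly while both are far from the truth:

```python
from itertools import product

# Minimal sketch of the XOR information structure: x and y are independent
# uniform bits, and the quantity of interest is x XOR y.
pairs = list(product([0, 1], repeat=2))

def alice_estimate(x):
    # Alice knows x but not y, so she averages over Bob's uniform bit.
    return sum(x ^ y for y in (0, 1)) / 2

def bob_estimate(y):
    return sum(x ^ y for x in (0, 1)) / 2

# Expected squared disagreement, and expected squared error against the truth:
agreement = sum((alice_estimate(x) - bob_estimate(y)) ** 2 for x, y in pairs) / 4
accuracy = sum((alice_estimate(x) - (x ^ y)) ** 2 for x, y in pairs) / 4
print(agreement, accuracy)  # 0.0 0.25: perfect agreement, far from the truth
```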
We address this open problem by introducing a natural condition, called rectangle substitutes, under which agreement implies accuracy. Rectangle substitutes is a notion of informational substitutes: the property that additional information has diminishing marginal returns. The notion of substitutes is ubiquitous in optimization problems, and informational substitutes conditions have recently been used to analyze equilibria in markets [CW16]. We show that under the rectangle substitutes condition, any protocol leading to agreement will also lead to accuracy. We then extend these results beyond the case of squared error, to a broad family of measures of agreement and accuracy including KL divergence.
1.1 Overview of approach and results
In [Aar05], Alice and Bob are said to agree if the squared distance between their estimates is small. Likewise, we can say that Alice and Bob are accurate if the squared distance between each of their estimates and the truth is small. In Section 3 we present our first main result: under these definitions, if the information structure satisfies rectangle substitutes, then agreement implies accuracy. In other words, under this assumption, when two Bayesians agree—regardless of how little information they have shared—they necessarily agree on the truth.
The proof involves carefully partitioning the space of posterior beliefs induced by the protocol.
Agreement is used to show that Alice and Bob usually fall into the same partition element, which means that Bob would not learn much from learning the partition element of Alice’s expectation.
Then, the rectangle substitutes condition is used to show that if Bob were to learn Alice’s partition element, then he would be very close to knowing the truth.
Aaronson measures agreement in terms of squared error, yet other measures like KL divergence may be better suited for some settings. For example, if Alice and Bob estimate the probability of a catastrophic event as two small values, one a hundred times the other, then under squared error they are said to agree closely, but arguably they disagree strongly, as reflected by their large KL divergence. Motivated by these different ways to measure agreement, we next ask:
1. Can Aaronson’s protocols be generalized to other notions of agreement, such that the number of bits communicated is independent of the amount of information available to Alice and Bob?
2. Do other notions of agreement necessarily imply accuracy under rectangle substitutes?
In Section 4, we give our second and third main results: the answer to both questions is yes. Specifically, the positive results apply when measuring agreement and accuracy using Bregman divergences, a class of error measures that includes both squared distance and KL divergence. (The third result holds under an “approximate triangle inequality” condition on the Bregman divergence, which is satisfied by most or all natural choices; indeed, it is nontrivial to construct a Bregman divergence that does not satisfy this property.)
Aaronson’s proof of his agreement theorem turns out to be specific to squared distance. Our agreement theorem (Theorem 4.11) modifies Aaronson’s protocol to depend on the particular Bregman divergence, i.e. the relevant error measure. It then proceeds in a manner inspired by Aaronson but using several new ideas. Our proof that agreement implies accuracy under rectangle substitutes for general Bregman divergences also involves some nontrivial changes to our proof for squared distance. In particular, the fact that the length of an interval cannot be inferred from the Bregman divergence between its endpoints necessitates a closer analysis of the partition of Alice’s and Bob’s beliefs.
We conclude in Section 5 with a discussion of connections between agreement protocols and information revelation in financial markets, and discuss an interesting potential avenue for future work.
1.2 Related Work
Our setting is related to but distinct from communication complexity. In that field (e.g. [RY20]), the goal is for Alice and Bob to correctly compute a function of their inputs while communicating as few bits as possible and using any protocol necessary. By contrast, [Aar05] considered a goal of agreement, not correctness, and focused on specific natural protocols, which he showed achieve this goal in a constant number of bits. Our work focuses on Aaronson’s setting. We discuss how our results might be framed in terms of communication complexity in Appendix E.
Our introduction of the substitutes condition is inspired by its usefulness in prediction markets [CW16]. The “expectation-sharing” agreement protocols we study bear a strong similarity to dynamics of market prices. [Ost12] introduced a condition under which convergence of prices in a market implies that all information is aggregated. This can be viewed as an “agreement implies accuracy” condition. We discuss these works and the connection of our work to markets in Section 5. Another similar definition of informational substitutes is used by [NR21] in the context of robust aggregation of forecasts.
Finally, we note that the “agreement protocols” we study are not related to key agreement protocols in cryptography, where the goal is for two communicating parties to jointly construct a shared string for cryptographic use.
2 Preliminaries
2.1 Information Structures
We consider a set $\Omega$ of states of the world, with a probability distribution $\mathbb{P}$ over the world states. There are two experts, Alice and Bob. Alice learns the value of a random variable $\sigma$; we call $\sigma$ Alice’s signal and $\Sigma$ her signal set. Correspondingly, Bob learns the value of a random variable $\tau$ with signal set $\mathcal{T}$. These signals each convey partial information about the true state $\omega$. Alice and Bob are interested in a third random variable $Y$, which takes values in $[0,1]$. We use the term information structure to refer to the tuple $\mathcal{I} = (\Omega, \mathbb{P}, \sigma, \tau, Y)$.
We denote by $Y_{\sigma\tau}$ the random variable that is equal to the expected value of $Y$ conditioned on both Alice’s signal $\sigma$ and Bob’s signal $\tau$. We also define $Y_\sigma := \mathbb{E}[Y \mid \sigma]$ and $Y_\tau := \mathbb{E}[Y \mid \tau]$. For a measurable set $S \subseteq \Sigma$, we define $Y_S := \mathbb{E}[Y \mid \sigma \in S]$; we define $Y_T$ analogously for measurable $T \subseteq \mathcal{T}$. Additionally, for $T \subseteq \mathcal{T}$, we define $Y_{\sigma T} := \mathbb{E}[Y \mid \sigma, \tau \in T]$, i.e. the expected value of $Y$ conditioned on the particular value of $\sigma$ and the knowledge that $\tau \in T$. If Alice knows that Bob’s signal belongs to $T$ (and nothing else about his signal), then the expected value of $Y$ conditional on her information is $Y_{\sigma T}$; we refer to this as Alice’s expectation. Likewise, for $S \subseteq \Sigma$, we define $Y_{S\tau} := \mathbb{E}[Y \mid \sigma \in S, \tau]$. Finally, we define $Y_{ST} := \mathbb{E}[Y \mid \sigma \in S, \tau \in T]$. This is the expectation of a third party who only knows that $\sigma \in S$ and $\tau \in T$.
In general we often wish to take expectations conditioned on $\sigma \in S$ and $\tau \in T$ (for some $S \subseteq \Sigma$ and $T \subseteq \mathcal{T}$). We will use the shorthand $\mathbb{E}_{ST}[\cdot]$ for $\mathbb{E}[\cdot \mid \sigma \in S, \tau \in T]$ in such cases.
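For readers who prefer code, here is a minimal sketch (ours, not from the paper) of a finite information structure and the conditional expectations just defined; the toy distribution is hypothetical:

```python
# A finite information structure, represented as a map from signal pairs
# (sigma, tau) to (probability, E[Y | sigma, tau]). The numbers are ours.
structure = {
    (0, 0): (0.25, 0.1), (0, 1): (0.25, 0.5),
    (1, 0): (0.25, 0.6), (1, 1): (0.25, 0.9),
}

def Y_rect(S, T):
    """Y_{ST}: the expectation of Y given sigma in S and tau in T."""
    mass = sum(p for (s, t), (p, _) in structure.items() if s in S and t in T)
    total = sum(p * y for (s, t), (p, y) in structure.items() if s in S and t in T)
    return total / mass

print(Y_rect({0}, {0, 1}))     # Y_{sigma T} for sigma = 0, T = {0, 1}: 0.3
print(Y_rect({0, 1}, {0, 1}))  # Y_{ST} for the full rectangle: 0.525
```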
2.2 Agreement Protocols
The notion of agreement between Alice and Bob is central to our work. We first define agreement in terms of squared error, and generalize to other error measures in Section 4.
Definition 2.1 (-agree).
Let $a$ and $b$ be Alice’s and Bob’s expectations, respectively ($a$ and $b$ are random variables on $\Omega$). Alice and Bob $\epsilon$-agree if $\frac{1}{4}\mathbb{E}\left[(a - b)^2\right] \le \epsilon$.
The constant $\frac{1}{4}$ makes the left-hand side represent Alice’s and Bob’s expected squared distance to the average $\frac{a+b}{2}$ of their expectations.
Our setting follows [Aar05], which examined communication protocols that cause Alice and Bob to agree. In a (deterministic) communication protocol, Alice and Bob take turns sending each other messages. On Alice’s turns, Alice communicates a message that is a deterministic function of her input (i.e. her signal $\sigma$) and all previous communication, and likewise for Bob on his turns. A rectangle is a set of the form $S \times T$, where $S \subseteq \Sigma$ and $T \subseteq \mathcal{T}$. The transcript of the protocol at a time step $t$ (i.e. after $t$ messages have been sent) partitions $\Sigma \times \mathcal{T}$ into rectangles: for any given sequence of messages, there are subsets $S_t \subseteq \Sigma$ and $T_t \subseteq \mathcal{T}$ such that the protocol transcript at time $t$ is equal to this sequence if and only if $(\sigma, \tau) \in S_t \times T_t$. For a given communication protocol, we may think of $S_t$ and $T_t$ as random variables. Alice’s expectation at time $t$ (i.e. after the $t$-th message has been sent) is $a_t := Y_{\sigma T_t}$ and Bob’s expectation at time $t$ is $b_t := Y_{S_t\tau}$. Finally, the protocol terminates at a certain time (which need not be known in advance of the protocol). While typically in communication complexity a protocol is associated with a final output, in this case we are interested in Alice’s and Bob’s expectations, so we do not require an output.
It will be convenient to hypothesize a third-party observer, whom we call Charlie, who observes the protocol but has no other information. At time $t$, Charlie has expectation $c_t := Y_{S_t T_t}$. Charlie’s expectation can also be interpreted as the expectation of $Y$ according to Alice and Bob’s common knowledge.
The following definition formalizes the relationship between communication protocols and agreement.
Definition 2.2 (-agreement protocol).
Given an information structure $\mathcal{I}$, a communication protocol causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ if Alice and Bob $\epsilon$-agree at the end of the protocol, i.e., if $\frac{1}{4}\mathbb{E}\left[(a - b)^2\right] \le \epsilon$ for their final expectations $a$ and $b$, where the expected value is over Alice’s and Bob’s inputs. We say that a communication protocol is an $\epsilon$-agreement protocol if the protocol causes Alice and Bob to $\epsilon$-agree on every information structure.
Aaronson defines and analyzes two $\epsilon$-agreement protocols. (A minor difference to our framing is that [Aar05] focuses on probable approximate agreement: protocols that cause the absolute difference between Alice’s and Bob’s estimates to be at most $\epsilon$ with probability all but $\delta$. While the results as presented in this section are stronger than those in [Aar05] (the original results follow from these as a consequence of Markov’s inequality), these results follow from a straightforward modification of his proofs.) The first of these is the standard protocol, in which Alice and Bob take turns stating their expectations for a number of time steps that can be computed by Alice and Bob independently in advance of the protocol, and which is guaranteed to be at most $O(1/\epsilon)$.
The fact that exchanging their expectations for this many time steps results in $\epsilon$-agreement is profound and compelling. However, the standard protocol may require an unbounded number of bits of communication, since Alice and Bob are exchanging real numbers. To address this, Aaronson defines another agreement protocol that is truly polynomial-communication (which we slightly modify for our purposes):
Definition 2.3 (Discretized protocol, [Aar05]).
Choose $\epsilon > 0$. In the discretized protocol with parameter $\epsilon$, on her turn (at time $t$), Alice sends “low” if her expectation is smaller than Charlie’s by more than $\sqrt{\epsilon}$, i.e. if $a_t < c_{t-1} - \sqrt{\epsilon}$; “high” if her expectation is larger than Charlie’s by more than $\sqrt{\epsilon}$; and “medium” otherwise. Bob acts analogously on his turn. At the start of the protocol, Alice and Bob use the information structure to independently compute the time $t^*$ that minimizes the expected disagreement $\frac{1}{4}\mathbb{E}\left[(a_{t^*} - b_{t^*})^2\right]$. The protocol ends at this time.
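The following sketch illustrates one of Alice’s turns in the discretized protocol as reconstructed above (the $\sqrt{\epsilon}$ threshold and the toy structure are our assumptions, not verbatim from [Aar05]):

```python
import math

# One of Alice's turns in the discretized protocol (sketch).
STRUCT = {(0, 0): (0.2, 0.0), (0, 1): (0.3, 0.4),
          (1, 0): (0.3, 0.6), (1, 1): (0.2, 1.0)}

def y_rect(S, T):
    mass = sum(p for (s, t), (p, _) in STRUCT.items() if s in S and t in T)
    return sum(p * y for (s, t), (p, y) in STRUCT.items() if s in S and t in T) / mass

def message(expectation, charlie, eps):
    if expectation < charlie - math.sqrt(eps):
        return "low"
    if expectation > charlie + math.sqrt(eps):
        return "high"
    return "medium"

def alice_turn(sigma, S, T, eps):
    """Alice announces low/medium/high; everyone then shrinks S to the set of
    Alice-signals that would have produced the same announcement."""
    charlie = y_rect(S, T)
    sent = message(y_rect({sigma}, T), charlie, eps)
    return {s for s in S if message(y_rect({s}, T), charlie, eps) == sent}

S = alice_turn(sigma=1, S={0, 1}, T={0, 1}, eps=0.01)
print(S, y_rect(S, {0, 1}))  # {1} 0.76: the rectangle and Charlie's estimate update
```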
Theorem 2.4 ([Aar05, Theorem 4]).
The discretized protocol with parameter $\epsilon$ is an $\epsilon$-agreement protocol with transcript length $O\left(1/\epsilon^2\right)$ bits.
In general, we refer to Aaronson’s standard and discretized protocols as examples of expectation-sharing protocols. We will define other examples in Section 4, similar to Aaronson’s discretized protocol but with different cutoffs for low, medium, and high. We also interpret expectation-sharing protocols in the context of markets in Section 5.
2.3 Accuracy and Informational Substitutes
Most of our main results give conditions such that if Alice and Bob $\epsilon$-agree, then Alice’s and Bob’s estimates are accurate. By accurate, we mean that Alice’s and Bob’s expectations are close to $Y_{\sigma\tau}$, i.e., what they would believe if they knew each other’s signals. (After all, they cannot hope to have a better estimate of $Y$ than $Y_{\sigma\tau}$; for this reason we sometimes refer to $Y_{\sigma\tau}$ as the “truth.”) Formally:
Definition 2.5 (-accurate).
Let $a$ be Alice’s expectation. Alice is $\epsilon$-accurate if $\mathbb{E}\left[(a - Y_{\sigma\tau})^2\right] \le \epsilon$. We define $\epsilon$-accuracy analogously for Bob.
One cannot hope for an unconditional result stating that if Alice and Bob agree, then they are accurate. Consider for instance the XOR information structure from the introduction: Alice and Bob each receive independent random bits as input, and $Y$ is the XOR of these bits. Then from the start Alice and Bob agree that the expected value of $Y$ is exactly $\frac{1}{2}$, but this value is far from $Y_{\sigma\tau}$, which is either $0$ or $1$.
Intuitively, this situation arises because Alice’s and Bob’s signals are informational complements: each signal is not informative by itself, but they are informative when taken together. On the other hand, we say that signals are informational substitutes if learning one signal is less valuable if you already know the other signal. An extreme example is if $\sigma = \tau = Z$ for any random variable $Z$: here $\tau$ becomes useless upon learning $\sigma$ and vice versa. In [CW16] (we recommend the arXiv version for the most up-to-date introduction to informational substitutes), the authors discuss formalizations of several notions of informational substitutes. All of these notions capture “diminishing marginal value,” in the sense that, roughly speaking, the value of partial information is a submodular set function. The various definitions proposed by [CW16] only differ in how finely they allow decomposing $\sigma$ and $\tau$ to obtain a marginal unit. Our definition has the same format, but uses a decomposition inspired by information rectangles in communication complexity. Recall that we write $\mathbb{E}_{ST}[\cdot]$ as shorthand for $\mathbb{E}[\cdot \mid \sigma \in S, \tau \in T]$.
Definition 2.6.
An information structure satisfies rectangle substitutes if it satisfies weak substitutes on every sub-rectangle, i.e., if for every $S \subseteq \Sigma$ and $T \subseteq \mathcal{T}$, we have
$$\mathbb{E}_{ST}\left[(Y - Y_{\sigma T})^2\right] - \mathbb{E}_{ST}\left[(Y - Y_{\sigma\tau})^2\right] \;\le\; \mathbb{E}_{ST}\left[(Y - Y_{ST})^2\right] - \mathbb{E}_{ST}\left[(Y - Y_{S\tau})^2\right].$$
We will show that under rectangle substitutes, if Alice and Bob approximately agree, then they are approximately accurate.
Interpreting substitutes.
Both sides of the inequality in Definition 2.6 represent the “value” of learning $\tau$, as measured by a decrease in error. The left-hand side gives the decrease if one already knows $\sigma$ and that $\tau \in T$; the right-hand side gives the decrease if one only knows that $\sigma \in S$ and $\tau \in T$. Substitutes thus says: the marginal value of learning $\tau$ is smaller if one already knows $\sigma$ than if one does not. This statement should hold for every sub-rectangle $S \times T$. We remark that the inequality can be rearranged to focus instead on the marginal value of $\sigma$ rather than $\tau$. We also note that in the XOR information structure, the left-hand side of the inequality is $\frac{1}{4}$ while the right-hand side is zero: a large violation of the substitutes condition. In the example $\sigma = \tau$, the left side is always zero.
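As a sanity check, the following sketch (ours) evaluates both sides of the weak substitutes inequality on the full rectangle of the XOR structure, exhibiting exactly the violation just described:

```python
from itertools import product

# Both sides of the weak substitutes inequality (Definition 2.6) on the full
# rectangle of the XOR structure. Sketch; the encoding is our own.
pairs = list(product([0, 1], repeat=2))  # uniform: probability 1/4 each
truth = {(x, y): x ^ y for x, y in pairs}

def err(info):
    """E[(Y - E[Y | info])^2], where info maps a pair to its information cell."""
    cells = {}
    for pair in pairs:
        cells.setdefault(info(pair), []).append(pair)
    return sum((truth[p] - sum(truth[q] for q in cell) / len(cell)) ** 2
               for cell in cells.values() for p in cell) / len(pairs)

lhs = err(lambda p: p[0]) - err(lambda p: p)      # value of tau, knowing sigma
rhs = err(lambda p: "all") - err(lambda p: p[1])  # value of tau, knowing nothing
print(lhs, rhs)  # 0.25 0.0: the left side exceeds the right, violating substitutes
```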
[CW16] discusses three interpretations of substitutes, which motivate it as a natural condition. (1) Each side of the inequality measures an improvement in prediction error, here the squared loss, due to learning $\tau$. Under substitutes, the improvement is smaller if one already knows $\sigma$. (2) Each side measures a decrease in uncertainty due to learning $\tau$. Under substitutes, $\tau$ provides less information about $Y$ if one already knows $\sigma$. (Here, uncertainty is measured by the variance of one’s belief. Under the KL divergence analogue covered in Section 4.1, uncertainty is measured in bits via Shannon entropy.) (3) Each side measures the decrease in distance of a posterior expectation from the truth when learning $\tau$. The distance to the truth changes less if one already knows $\sigma$.
2.4 The Pythagorean Theorem
We will use the following fact throughout. We defer the proof to Appendix C, where we establish a more general version of this statement.
Proposition 2.7 (Pythagorean theorem).
Let $Z$ be a random variable, let $A = \mathbb{E}[Z \mid \mathcal{F}]$, where $\mathcal{F}$ is a sigma-algebra, and let $B$ be a random variable defined on $\mathcal{F}$ (i.e. $\mathcal{F}$-measurable). Then
$$\mathbb{E}\left[(Z - B)^2\right] = \mathbb{E}\left[(Z - A)^2\right] + \mathbb{E}\left[(A - B)^2\right].$$
We use the phrase Pythagorean theorem in part because of its form, and in part because it is precisely the familiar Pythagorean theorem when the random variables are viewed as points in a Hilbert space with inner product $\langle U, V \rangle := \mathbb{E}[UV]$. (We do not make use of this abstraction in our work, but we refer the interested reader to [Šid57].)
Informally, $Z$ is a random variable, $A$ is the expected value of $Z$ conditional on some partial information, and $B$ is a random variable that only depends on this information. So the theorem applies when $A$ is a coarse estimate of $Z$ and $B$ is at least as coarse as $A$, a scenario that often occurs in our setting.
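The identity is easy to verify numerically; the following sketch (with a toy distribution of our own choosing) checks it for $Z = Y$, $A = \mathbb{E}[Y \mid \sigma]$, and $B = \mathbb{E}[Y]$:

```python
# Spot-check of the Pythagorean theorem (Proposition 2.7) with
# Z = Y, A = E[Y | sigma], B = E[Y], on a toy distribution of our own.
dist = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.3, (1, 1): 0.2}
yval = {(0, 0): 0.0, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 1.0}

B = sum(p * yval[w] for w, p in dist.items())                      # E[Y] = 0.5
mass = {s: sum(p for w, p in dist.items() if w[0] == s) for s in (0, 1)}
A = {s: sum(p * yval[w] for w, p in dist.items() if w[0] == s) / mass[s]
     for s in (0, 1)}                                              # E[Y | sigma]

lhs = sum(p * (yval[w] - B) ** 2 for w, p in dist.items())
rhs = (sum(p * (yval[w] - A[w[0]]) ** 2 for w, p in dist.items())
       + sum(p * (A[w[0]] - B) ** 2 for w, p in dist.items()))
print(round(lhs, 12), round(rhs, 12))  # both 0.106: the two sides coincide
```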
One application of the Pythagorean theorem in our context takes $Z = Y$, $A = Y_{\sigma\tau}$ (the expected value of $Y$ conditioned on the experts’ signals), and $B = Y_{\sigma T}$ (Alice’s expected value, which only depends on her signal and thus on the signal pair). This particular application, along with the symmetric one taking $A = Y_{S\tau}$ and $B = Y_{ST}$, allows us to rewrite the rectangle substitutes condition in a form that we will find more convenient:
Remark 2.8.
An information structure satisfies rectangle substitutes if and only if
$$\mathbb{E}_{ST}\left[(Y_{\sigma\tau} - Y_{\sigma T})^2\right] \;\le\; \mathbb{E}_{ST}\left[(Y_{S\tau} - Y_{ST})^2\right] \qquad (1)$$
for all $S \subseteq \Sigma$ and $T \subseteq \mathcal{T}$.
3 Results for Squared Distance
Our main results show that, under the rectangle substitutes condition, any communication protocol that causes Alice and Bob to agree also causes them to be accurate. We now show the first of these results, which is specific to the squared distance error measure that we have been discussing.
3.1 Agreement Implies Accuracy
Theorem 3.1.
Let $\mathcal{I}$ be an information structure that satisfies rectangle substitutes. For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$, Alice and Bob are $O(\epsilon^{1/3})$-accurate after the protocol terminates.
The crux of the argument is the following lemma.
Lemma 3.2.
Let $\mathcal{I}$ be an information structure that satisfies rectangle substitutes, and let $\epsilon := \frac{1}{4}\mathbb{E}\left[(Y_\sigma - Y_\tau)^2\right]$. Then
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_\tau)^2\right] \le O\left(\epsilon^{1/3}\right).$$
Proof of Theorem 3.1.
Consider any protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$. Let $S$ be the set of possible signals of Alice at the end of the protocol which are consistent with the protocol transcript, and define $T$ likewise for Bob. Intuitively, $S \times T$ is the set of plausible signal pairs according to an external observer of the protocol. Observe that $S$ and $T$ are random variables, each a function of both $\sigma$ and $\tau$. Writing $\epsilon_{ST} := \frac{1}{4}\mathbb{E}_{ST}\left[(Y_{\sigma T} - Y_{S\tau})^2\right]$ for the disagreement within the final rectangle (so that $\mathbb{E}[\epsilon_{ST}] \le \epsilon$), we have
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_{S\tau})^2\right] = \mathbb{E}\left[\mathbb{E}_{ST}\left[(Y_{\sigma\tau} - Y_{S\tau})^2\right]\right] \le \mathbb{E}\left[O\left(\epsilon_{ST}^{1/3}\right)\right] \le O\left(\mathbb{E}\left[\epsilon_{ST}\right]^{1/3}\right) \le O\left(\epsilon^{1/3}\right).$$
In the second step, we apply Lemma 3.2 to the information structure restricted to $S \times T$. (Note that we use the fact that if $\mathcal{I}$ satisfies rectangle substitutes, then so does its restriction; this is because a rectangle of the restriction is also a rectangle of $\mathcal{I}$.) The third step follows by the concavity of $x \mapsto x^{1/3}$. Therefore, Bob is accurate (and Alice is likewise by symmetry). ∎
The proof of Lemma 3.2 relies on the following claim. We defer the proof of Lemma 3.2 (and Claim 3.3) to Appendix B, and instead sketch the proofs here.
Claim 3.3.
For any $n$, it is possible to partition $[0,1]$ into intervals $I_1, \dots, I_n$ in a way so that each interval has length at most $\frac{2}{n}$, and
$$\Pr\left[k(\sigma) \ne k(\tau)\right] \le n \cdot \mathbb{E}\left[\left|Y_\sigma - Y_\tau\right|\right],$$
where $k(\tau)$ denotes the $k$ such that $Y_\tau \in I_k$, and $k(\sigma)$ is defined analogously. (For convenience we take the right endpoint of $I_n$ to be some number greater than $1$, so that every possible value falls into exactly one interval.)
Intuitively, Claim 3.3 is true because if $|Y_\sigma - Y_\tau|$ is small, then $Y_\sigma$ and $Y_\tau$ are likely to fall into the same interval.
We now sketch the proof of Lemma 3.2. To see why Claim 3.3 is relevant, recall that we wish to upper bound the expectation of $(Y_{\sigma\tau} - Y_\tau)^2$. For each $k$, let $B_k := \{\tau : Y_\tau \in I_k\}$, and let $B := B_{k(\tau)}$. By the Pythagorean theorem, we have
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_\tau)^2\right] \le \mathbb{E}\left[(Y_{\sigma\tau} - Y_B)^2\right] = \mathbb{E}\left[(Y_{\sigma\tau} - Y_{\sigma B})^2\right] + \mathbb{E}\left[(Y_{\sigma B} - Y_B)^2\right].$$
By using the rectangle substitutes condition for $(\Sigma, B_k)$ for every $k$, we find that
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_{\sigma B})^2\right] \le \mathbb{E}\left[(Y_\tau - Y_B)^2\right]. \qquad (2)$$
Therefore, we have
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_\tau)^2\right] \le \mathbb{E}\left[(Y_\tau - Y_B)^2\right] + \mathbb{E}\left[(Y_{\sigma B} - Y_B)^2\right]. \qquad (3)$$
Claim 3.3 lets us argue that the first of these two terms is small (because $Y_\tau$ and $Y_B$ are always within $\frac{2}{n}$ of each other) and that the second term is also small (because conditioned on $\sigma$, the interval index $k(\tau)$ is known with high probability). We find that choosing $n = \left\lceil \epsilon^{-1/6} \right\rceil$ gives us the bound in Lemma 3.2.
Theorem 3.1 is a general result about agreement protocols. Applying the result to Aaronson’s discretized protocol gives us the following result.
Corollary 3.4.
Let $\mathcal{I}$ be any information structure that satisfies rectangle substitutes. For any $\epsilon > 0$, Alice and Bob will be $O(\epsilon^{1/3})$-accurate after running Aaronson’s discretized protocol with parameter $\epsilon$ (and this takes $O(1/\epsilon^2)$ bits of communication).
Remark 3.5.
The discretized protocol is not always the most efficient agreement protocol. For example, Proposition B.1 shows that if the rectangle substitutes condition holds, agreement (and therefore accuracy) can be reached with just $O(\log(1/\epsilon))$ bits, an improvement on Corollary 3.4. We discuss communication complexity further in Appendix E. Even if more efficient protocols are sometimes possible, expectation-sharing protocols are of interest because they model naturally-occurring communication processes. For example, they capture the dynamics of prices in markets, which we also discuss in Section 5. More generally, we find it remarkable that Alice and Bob become accurate by running the agreement protocol (indeed any agreement protocol), despite such protocols being designed with only agreement in mind.
Finally, we observe the following important consequence of Theorem 3.1: once Alice and Bob agree, they continue to agree.
Corollary 3.6.
Let $\mathcal{I}$ be an information structure that satisfies rectangle substitutes. Consider a communication protocol with the property that Alice and Bob $\epsilon$-agree after round $t$. Then Alice and Bob $O(\epsilon^{1/3})$-agree on all subsequent time steps.
Proof.
If Alice and Bob $\epsilon$-agree then they are $\delta$-accurate for some $\delta = O(\epsilon^{1/3})$, so in particular $\mathbb{E}\left[(a_t - Y_{\sigma\tau})^2\right] \le \delta$. Note that $\mathbb{E}\left[(a_s - Y_{\sigma\tau})^2\right]$ is a decreasing function of $s$, since for any $s$ we have
$$\mathbb{E}\left[(a_s - Y_{\sigma\tau})^2\right] = \mathbb{E}\left[(a_{s+1} - Y_{\sigma\tau})^2\right] + \mathbb{E}\left[(a_s - a_{s+1})^2\right]$$
by the Pythagorean theorem. Therefore, for any $s \ge t$, we have $\mathbb{E}\left[(a_s - Y_{\sigma\tau})^2\right] \le \delta$. Symmetrically, we have $\mathbb{E}\left[(b_s - Y_{\sigma\tau})^2\right] \le \delta$. Therefore, $\frac{1}{4}\mathbb{E}\left[(a_s - b_s)^2\right] \le \frac{1}{2}\left(\mathbb{E}\left[(a_s - Y_{\sigma\tau})^2\right] + \mathbb{E}\left[(b_s - Y_{\sigma\tau})^2\right]\right) \le \delta$, which means that after round $t$, Alice and Bob $\delta$-agree. ∎
Corollary 3.6 stands in contrast to the more general case, in which it is possible that Alice and Bob nearly agree for the first $t$ time steps, then disagree violently at the $(t+1)$-th step [Aar05, §2.2]. Thus, while the main purpose of Theorem 3.1 is a property about accuracy, an agreement property falls out naturally: under the rectangle substitutes condition, once Alice and Bob are close to agreement, they will remain in relatively close agreement into the future.
3.2 Graceful Decay Under Closeness to Rectangle Substitutes
In a sense, the rectangle substitutes condition is quite strong: it requires that the weak substitutes condition be satisfied on every sub-rectangle. One might hope for a result that generalizes Theorem 3.1 to information structures that almost satisfy the rectangle substitutes condition but do not quite. Let us formally define a notion of closeness to rectangle substitutes.
Definition 3.7.
An information structure satisfies $\delta$-approximate rectangle substitutes if for every partition $\mathcal{P}$ of $\Sigma \times \mathcal{T}$ into rectangles, the rectangle substitutes condition holds in expectation over the partition, up to an additive constant of $\delta$, i.e., if we have
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_{\sigma T_{\mathcal{P}}})^2\right] \;\le\; \mathbb{E}\left[(Y_{S_{\mathcal{P}}\tau} - Y_{S_{\mathcal{P}}T_{\mathcal{P}}})^2\right] + \delta \qquad (4)$$
where $S_{\mathcal{P}} \times T_{\mathcal{P}}$ is the rectangle of $\mathcal{P}$ containing $(\sigma, \tau)$. (There are partitions into rectangles that cannot arise from a communication protocol. Our results would apply equally if this condition were instead defined only for partitions that could arise from a communication protocol, but we state this condition more generally so that it could be applicable in a broader context than the analysis of communication protocols.)
Remark 3.8.
The $\delta$-approximate rectangle substitutes property is a relaxation of the rectangle substitutes property, in the sense that the two are equivalent if $\delta = 0$. To see this, first observe that if $\mathcal{I}$ satisfies rectangle substitutes, then it satisfies Equation 4 with $\delta = 0$ pointwise across all rectangle partitions, and thus in expectation. In the other direction, suppose that $\mathcal{I}$ satisfies $0$-approximate rectangle substitutes. Let $S \subseteq \Sigma$ and $T \subseteq \mathcal{T}$, and consider the partition of $\Sigma \times \mathcal{T}$ into rectangles that contains $S \times T$ and, separately, every other signal pair in its own rectangle. For this partition, Equation 4 reduces precisely to Equation 1 (the rectangle substitutes condition for $S$ and $T$).
Theorem 3.1 generalizes to approximate rectangle substitutes as follows.
Theorem 3.9.
Let $\mathcal{I}$ be an information structure that satisfies $\delta$-approximate rectangle substitutes. For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$, Alice and Bob are $\left(O(\epsilon^{1/3}) + \delta\right)$-accurate after the protocol terminates.
Proof.
We first observe that Lemma 3.2 can be modified as follows.
Lemma 3.10.
Let $\mathcal{I}$ be an information structure that satisfies $\delta$-approximate rectangle substitutes, and let $\epsilon := \frac{1}{4}\mathbb{E}\left[(Y_\sigma - Y_\tau)^2\right]$. Then
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_\tau)^2\right] \le O\left(\epsilon^{1/3}\right) + \delta.$$
The proof of Lemma 3.10 is exactly the same as that of Lemma 3.2, except that Equation 2 (Equation 6 in the full proof) is weakened by an additive term of $\delta$:
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_{\sigma B})^2\right] \le \mathbb{E}\left[(Y_\tau - Y_B)^2\right] + \delta.$$
This modified inequality follows immediately from the $\delta$-approximate rectangle substitutes condition, noting that one partition of $\Sigma \times \mathcal{T}$ into rectangles is $\{\Sigma \times B_k\}_k$. The extra term produces the $\delta$ term in the lemma statement.
To prove the theorem, let $S$ be the set of possible signals of Alice at the end of the protocol which are consistent with the protocol transcript, and define $T$ likewise for Bob. Let $\delta_{ST}$ be the minimum $\delta'$ such that $\mathcal{I}$ restricted to $S \times T$ satisfies $\delta'$-approximate rectangle substitutes. Note that $\mathbb{E}[\delta_{ST}] \le \delta$: otherwise, by taking the union over the worst-case partitions for each final rectangle we would exhibit a partition of $\Sigma \times \mathcal{T}$ into rectangles that would violate the $\delta$-approximate rectangle substitutes property. The theorem then follows by the same chain of inequalities as in the proof of Theorem 3.1, with the second step now applying Lemma 3.10 to the information structure restricted to $S \times T$. ∎
4 Results for Other Divergence Measures
Squared distance is a compelling error measure because it elicits the mean. That is, if you wish to estimate a random variable $Y$ and will be penalized according to the squared distance between $Y$ and your estimate, the strategy that minimizes your expected penalty is to report the expected value of $Y$ (conditional on the information you have). This is in contrast to e.g. absolute distance as an error measure, which would instead elicit the median of your distribution. The class of error measures that elicit the mean is precisely the class of Bregman divergences (defined below).
In this section, our main result is a generalization of Theorem 3.1 to (almost) arbitrary Bregman divergences (see e.g. Theorem 4.14). Additionally, we provide a generalization of Aaronson’s discretized protocol to arbitrary Bregman divergences (Theorem 4.11).
4.1 Preliminaries on Bregman Divergences
Definition 4.1.
Given a differentiable, strictly convex function $\varphi$ (when we say “differentiable,” we mean differentiable on the interior of the interval on which $\varphi$ is defined), and points $y, z$ in that interval, the Bregman divergence from $y$ to $z$ is
$$D_\varphi(y, z) := \varphi(y) - \varphi(z) - \varphi'(z)(y - z).$$
Proposition 4.2 ([Ban+05]).
Given a random variable $Y$, the quantity $\mathbb{E}\left[D_\varphi(Y, z)\right]$ is minimized by $z = \mathbb{E}[Y]$.
An intuitive formulation of Bregman divergence is that $D_\varphi(y, z)$ can be found by drawing the line tangent to $\varphi$ at $z$ and computing how far below the point $(y, \varphi(y))$ this line passes. We illustrate this in Figure 1. Note that the Bregman divergence is not in general symmetric in its arguments; indeed, the quadratic $\varphi(x) = x^2$ is essentially the only $\varphi$ for which it is.
The Bregman divergence with respect to $\varphi(x) = x^2$ is precisely the squared distance. Another common Bregman divergence is the KL divergence, which corresponds to $\varphi(x) = x\log x + (1 - x)\log(1 - x)$, the negative of Shannon entropy.
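A quick sketch (ours) of these two instances, confirming that the Bregman divergence recovers squared distance for $\varphi(x) = x^2$ and binary KL divergence for the negative entropy:

```python
import math

# Bregman divergence D(y, z) = phi(y) - phi(z) - phi'(z) (y - z),
# instantiated for the two running examples.
def bregman(phi, dphi, y, z):
    return phi(y) - phi(z) - dphi(z) * (y - z)

sq = (lambda x: x * x, lambda x: 2 * x)
neg_entropy = (lambda x: x * math.log(x) + (1 - x) * math.log(1 - x),
               lambda x: math.log(x) - math.log(1 - x))

y, z = 0.3, 0.7
print(bregman(*sq, y, z), (y - z) ** 2)  # both 0.16 (up to float rounding)
print(bregman(*neg_entropy, y, z),
      y * math.log(y / z) + (1 - y) * math.log((1 - y) / (1 - z)))  # binary KL
```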
We generalize relevant notions such as agreement and accuracy to arbitrary Bregman divergences as follows. In the definitions below, $\varphi$ is a differentiable, strictly convex function.
Definition 4.3.
Let $a$ be Alice’s expectation. Alice is $\epsilon$-accurate if $\mathbb{E}\left[D_\varphi(Y_{\sigma\tau}, a)\right] \le \epsilon$, and likewise for Bob.
We discuss our choice of the order of these two arguments (i.e. why we do not instead consider the expectation of $D_\varphi(a, Y_{\sigma\tau})$) in Appendix D. We now define $\epsilon$-agreement, and to do so we first define the Jensen-Bregman divergence.
Definition 4.4.
For $y, z \in [0,1]$, the Jensen-Bregman divergence between $y$ and $z$ with respect to $\varphi$ is
$$JB_\varphi(y, z) := \frac{D_\varphi\left(y, \frac{y+z}{2}\right) + D_\varphi\left(z, \frac{y+z}{2}\right)}{2} = \frac{\varphi(y) + \varphi(z)}{2} - \varphi\left(\frac{y + z}{2}\right).$$
The validity of the second equality can be easily derived from the definition of Bregman divergence. Note that the Jensen-Bregman divergence, unlike the Bregman divergence, is symmetric in its arguments. The Jensen-Bregman divergence is a lower bound on the average Bregman divergence from Alice’s and Bob’s expectations to any other point (see Proposition C.1 (i)).
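In code (again our own sketch), the Jensen-Bregman divergence and its squared-distance specialization $JB_\varphi(y, z) = \left(\frac{y - z}{2}\right)^2$ look as follows; for the negative entropy, $JB_\varphi$ is the (binary) Jensen-Shannon divergence:

```python
import math

# Jensen-Bregman divergence JB(y, z) = (phi(y) + phi(z))/2 - phi((y + z)/2).
def jensen_bregman(phi, y, z):
    return (phi(y) + phi(z)) / 2 - phi((y + z) / 2)

phi_sq = lambda x: x * x
# For squared distance, JB(y, z) = ((y - z)/2)^2:
print(math.isclose(jensen_bregman(phi_sq, 0.3, 0.7), ((0.3 - 0.7) / 2) ** 2))  # True
```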
Definition 4.5.
Let $a$ and $b$ be Alice’s and Bob’s expectations, respectively. Alice and Bob $\epsilon$-agree with respect to $\varphi$ if $\mathbb{E}\left[JB_\varphi(a, b)\right] \le \epsilon$.
In Appendix D we discuss alternative definitions of agreement and accuracy. The upshot of this discussion is that our definition of agreement is the weakest reasonable one, and our definition of accuracy is the strongest reasonable one. This means that the main result of this section—that under a wide class of Bregman divergences, agreement implies accuracy—is quite powerful: it starts with a weak premise and proves a strong conclusion.
Definition 4.6.
Given an information structure $\mathcal{I}$, a communication protocol causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ with respect to $\varphi$ if Alice and Bob $\epsilon$-agree with respect to $\varphi$ at the end of the protocol. A communication protocol is an $\epsilon$-agreement protocol with respect to $\varphi$ if the protocol causes Alice and Bob to $\epsilon$-agree with respect to $\varphi$ on every information structure.
We also generalize the notion of rectangle substitutes to this domain, following [CW16], which explored notions of substitutes for arbitrary Bregman divergences.
Definition 4.7.
Let $\varphi$ be a differentiable, strictly convex function. An information structure satisfies rectangle substitutes with respect to $\varphi$ if for every $S \subseteq \Sigma$ and $T \subseteq \mathcal{T}$, we have
$$\mathbb{E}_{ST}\left[D_\varphi(Y, Y_{\sigma T})\right] - \mathbb{E}_{ST}\left[D_\varphi(Y, Y_{\sigma\tau})\right] \;\le\; \mathbb{E}_{ST}\left[D_\varphi(Y, Y_{ST})\right] - \mathbb{E}_{ST}\left[D_\varphi(Y, Y_{S\tau})\right].$$
The Pythagorean theorem (Proposition 2.7) generalizes to arbitrary Bregman divergences:
Proposition 4.8.
Let $Z$ be a random variable, let $A = \mathbb{E}[Z \mid \mathcal{F}]$, where $\mathcal{F}$ is a sigma-algebra, and let $B$ be a random variable defined on $\mathcal{F}$. Then
$$\mathbb{E}\left[D_\varphi(Z, B)\right] = \mathbb{E}\left[D_\varphi(Z, A)\right] + \mathbb{E}\left[D_\varphi(A, B)\right].$$
Although the proof of this observation is fairly straightforward, to our knowledge Proposition 4.8 is original to this work. We provide a proof in Appendix C. Just as we did with squared error, this general Pythagorean theorem allows us to rewrite the rectangle substitutes condition for Bregman divergences.
Remark 4.9.
An information structure satisfies rectangle substitutes with respect to $\varphi$ if and only if for all $S \subseteq \Sigma$ and $T \subseteq \mathcal{T}$ we have
$$\mathbb{E}_{ST}\left[D_\varphi(Y_{\sigma\tau}, Y_{S\tau})\right] \;\le\; \mathbb{E}_{ST}\left[D_\varphi(Y_{\sigma T}, Y_{ST})\right]. \qquad (5)$$
Given the interpretation of Bregman divergences as measures of error, we can interpret the left side as Bob’s expected error in predicting the truth, while the right side is Charlie’s expected error when predicting Alice’s expectation. Both sides measure a prediction error due to not having Alice’s signal, but from different starting points.
4.2 Generalizing the Discretized Protocol
Later in this work, we will show that under some weak conditions, protocols that cause Alice and Bob to agree with respect to $\varphi$ also cause Alice and Bob to be accurate with respect to $\varphi$. However, this raises an interesting question: are there protocols that cause Alice and Bob to agree with respect to $\varphi$ in the first place? In particular, we are interested in natural expectation-sharing protocols. Aaronson’s discretized protocol is specific to squared distance, and it is not immediately obvious how to generalize it. We present the following generalization.
Definition 4.10.
Let $\varphi$ be a differentiable, strictly convex function, and let $\epsilon > 0$. In the discretized protocol with respect to $\varphi$ with parameter $\epsilon$, on her turn (at time $t$), Alice sends “medium” if $JB_\varphi(a_t, c_{t-1}) \le \epsilon$, and otherwise either “low” or “high”, depending on whether $a_t$ is smaller or larger (respectively) than $c_{t-1}$. Bob acts analogously on his turn. At the start of the protocol, Alice and Bob use the information structure to independently compute the time $t^*$ that minimizes the expected disagreement $\mathbb{E}\left[JB_\varphi(a_{t^*}, b_{t^*})\right]$. The protocol ends at this time.
Theorem 4.11.
The discretized protocol with respect to $\varphi$ with parameter $\epsilon$ is an $\epsilon$-agreement protocol with respect to $\varphi$, and the number of bits it communicates depends only on $\epsilon$ and $\varphi$ (not on the information structure).
Our proof draws inspiration from Aaronson’s proof of the discretized protocol, but has significant differences. The key idea is to keep track of the monovariant $\mathbb{E}\left[D_\varphi(Y_{\sigma\tau}, c_t)\right]$. This is Charlie’s expected error (as measured by the Bregman divergence from the correct answer $Y_{\sigma\tau}$) after time step $t$—recall that Charlie is our name for a third-party observer of the protocol. Note that this quantity is at least zero and at most a constant that depends only on $\varphi$. Hence, if we show that the quantity decreases by at least some fixed amount every time Alice and Bob do not $\epsilon$-agree, then we will have shown that Alice and Bob must $\epsilon$-agree within a bounded number of time steps. We defer the proof to Appendix C.
4.3 Approximate Triangle Inequality
Our results will hold for a class of Jensen-Bregman divergences that satisfy an approximate version of the triangle inequality. Specifically, we will require $JB_\varphi$ to satisfy the following $c$-approximate triangle inequality for some constant $c$.
Definition 4.12.
Given a differentiable, strictly convex function $\varphi$ and a positive number $c$, we say that $JB_\varphi$ satisfies the $c$-approximate triangle inequality if for all $x, y, z$ we have
$$JB_\varphi(x, z) \le c\left(JB_\varphi(x, y) + JB_\varphi(y, z)\right).$$
It is possible to construct functions $\varphi$ such that there is no positive $c$ for which $JB_\varphi$ satisfies the $c$-approximate triangle inequality. However, $JB_\varphi$ satisfies the $c$-approximate triangle inequality for some positive $c$ for essentially all natural choices of $\varphi$.
Proposition 4.13.
Let $\varphi$ be a differentiable, strictly convex function.
(i) If $\sqrt{JB_\varphi}$ satisfies the triangle inequality, then $JB_\varphi$ satisfies the $2$-approximate triangle inequality.
(ii) If $\varphi(x) = x^2$ (i.e. $D_\varphi$ is squared distance) or if $\varphi(x) = x\log x + (1 - x)\log(1 - x)$ (i.e. $D_\varphi$ is KL divergence), then $\sqrt{JB_\varphi}$ satisfies the triangle inequality (and so $JB_\varphi$ satisfies the $2$-approximate triangle inequality).
Proof.
Regarding Fact (i), suppose that $\sqrt{JB_\varphi}$ satisfies the triangle inequality. Then for all $x, y, z$ we have $\sqrt{JB_\varphi(x, z)} \le \sqrt{JB_\varphi(x, y)} + \sqrt{JB_\varphi(y, z)}$. Squaring both sides and observing that $(u + v)^2 \le 2u^2 + 2v^2$ completes the proof. Regarding Fact (ii), the case of squared distance is immediate, since there $\sqrt{JB_\varphi(y, z)} = \frac{|y - z|}{2}$; for KL divergence, $JB_\varphi$ is the (binary) Jensen-Shannon divergence, whose square root is known to be a metric [ES03; CCR08].
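The following sketch (ours) spot-checks Fact (ii) numerically: $\sqrt{JB_\varphi}$ satisfies the triangle inequality for both running examples, which by Fact (i) yields the $2$-approximate triangle inequality. A randomized check is not a proof, but it is a useful sanity test:

```python
import math, random

# sqrt(JB) should satisfy the triangle inequality for squared distance and
# for negative entropy (binary Jensen-Shannon).
def jb(phi, y, z):
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, (phi(y) + phi(z)) / 2 - phi((y + z) / 2))

phis = [lambda x: x * x,
        lambda x: x * math.log(x) + (1 - x) * math.log(1 - x)]

random.seed(0)
for phi in phis:
    for _ in range(10_000):
        x, y, z = (random.uniform(0.001, 0.999) for _ in range(3))
        d = lambda u, v: math.sqrt(jb(phi, u, v))
        assert d(x, z) <= d(x, y) + d(y, z) + 1e-12
print("ok")
```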
4.4 Generalized Agreement Implies Generalized Accuracy
In all of the results in this subsection, we consider the following setting: $\varphi$ is a differentiable convex function; $c$ is a positive real number such that $JB_\varphi$ satisfies the $c$-approximate triangle inequality; and $\mathcal{I}$ is an information structure that satisfies rectangle substitutes with respect to $\varphi$.
We prove generalizations of Theorem 3.1, showing that under the rectangle substitutes condition, if a protocol ends with Alice and Bob in approximate agreement, then Alice and Bob are approximately accurate. The first result we state assumes that is symmetric, but is otherwise quite general.
Theorem 4.14.
Assume that $\varphi$ is symmetric about the line $x = \frac{1}{2}$. For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ with respect to $\varphi$, and for any $\lambda > 0$, Alice and Bob are accurate after the protocol terminates, to within an error bound that depends only on $\epsilon$, $\lambda$, $c$, and $\varphi$, and that can be made to vanish as $\epsilon \to 0$ by choosing $\lambda$ appropriately.
This result is not our most general, as it assumes that $\varphi$ is symmetric, but this assumption likely holds for most use cases. To apply the result optimally, one must first optimize $\lambda$ as a function of $\epsilon$. For example, an appropriate setting of $\lambda$ gives us the following corollary. (Corollary 4.15 as stated, i.e. without the symmetry assumption, is actually a corollary of Theorem 4.18.)
Corollary 4.15.
Assume that $\varphi'$ is bounded. For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ with respect to $\varphi$, Alice and Bob are $O(\epsilon^{1/3})$-accurate after the protocol terminates, where the constant hidden by the $O(\cdot)$ depends on $\varphi$ and $c$.
Remark 4.16.
Concretely, boundedness and metric-type assumptions on $\varphi'$ and $\sqrt{JB_\varphi}$ each simplify the general bound; when both hold, as is the case for squared distance, the bound recovers our result in Theorem 3.1.
For $\varphi$ equal to the negative of Shannon entropy (i.e. the $\varphi$ for which $D_\varphi$ is KL divergence), an appropriate setting of $\lambda$ in Theorem 4.14 gives us the following corollary.
Corollary 4.17.
If $\varphi(x) = x\log x + (1 - x)\log(1 - x)$, then for any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ with respect to $\varphi$, Alice and Bob are accurate after the protocol terminates, with an error bound that depends only on $\epsilon$ and vanishes as $\epsilon \to 0$.
Theorem 4.14 follows from our most general result about agreement implying accuracy:
Theorem 4.18.
Let $\kappa(d)$ be the maximum possible difference in $\varphi$-values of two points that differ by at most $d$, and let $\hat{\kappa}$ be the concave envelope of $\kappa$, i.e. the pointwise smallest concave function that upper bounds $\kappa$. For any communication protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ with respect to $\varphi$, and for any $\lambda > 0$, Alice and Bob are accurate after the protocol terminates, to within an error bound expressible in terms of $\epsilon$, $\lambda$, $c$, $\kappa$, and $\hat{\kappa}$.
Proof.
To prove Theorem 4.18, it suffices to prove the following lemma.
Lemma 4.19.
Let $\varphi$ be a differentiable convex function on $[0,1]$ and let $c$ be such that $JB_\varphi$ satisfies the $c$-approximate triangle inequality. Let $\mathcal{I}$ be an information structure that satisfies rectangle substitutes with respect to $\varphi$, and let $\epsilon := \mathbb{E}\left[JB_\varphi(Y_\sigma, Y_\tau)\right]$. Then for any $\lambda > 0$, the error $\mathbb{E}\left[D_\varphi(Y_{\sigma\tau}, Y_\tau)\right]$ admits an upper bound of the same form as in Theorem 4.18, in terms of $\epsilon$, $\lambda$, $c$, $\kappa$, and $\hat{\kappa}$.
Consider any protocol that causes Alice and Bob to $\epsilon$-agree on $\mathcal{I}$ with respect to $\varphi$. Let $S$ be the set of possible signals of Alice at the end of the protocol which are consistent with the protocol transcript, and define $T$ likewise for Bob. Let $\epsilon_{ST} := \mathbb{E}_{ST}\left[JB_\varphi(Y_{\sigma T}, Y_{S\tau})\right]$, and note that $\mathbb{E}\left[\epsilon_{ST}\right] \le \epsilon$. For any $\lambda > 0$, we first apply Lemma 4.19 to the information structure restricted to $S \times T$ (as in the proof of Theorem 3.1, a rectangle of the restriction is also a rectangle of $\mathcal{I}$, so the restriction again satisfies rectangle substitutes with respect to $\varphi$), and then take the expectation over the final rectangle; the remaining steps follow from Jensen’s inequality, using the convexity or concavity of the relevant functions. ∎
The basic outline of the proof of Lemma 4.19 is similar to that of Lemma 3.2. Once again, we partition $[0,1]$ into intervals. Analogously to Equation 3, and with $B$ defined analogously, we find that $\mathbb{E}\left[D_\varphi(Y_{\sigma\tau}, Y_\tau)\right]$ is bounded by two summands: one controlled by the Jensen-Bregman width of the intervals, and one controlled by how much a third party learns from $B$.
As before, we wish to upper bound each summand. However, the fact that the Bregman divergence is now arbitrary introduces complications. First, it is no longer the case that we can directly relate the length of an interval to the Bregman divergence between its endpoints. Second, we consider functions $\varphi$ that become infinitely steep near $0$ and $1$ (such as the negative of Shannon entropy), which makes matters more challenging. This means that we need to be more careful when partitioning $[0,1]$ into intervals: see Algorithm C.3 for our new approach. Additionally, bounding the second summand involves reasoning carefully about the behavior of the function $\hat{\kappa}$, which is responsible for its appearance in the lemma statement. We defer the full proof of Lemma 4.19 to Appendix C.
5 Connections to Markets
In this work, we established a natural condition on information structures, rectangle substitutes, under which any agreement protocol results in accurate beliefs. As we saw, a particularly natural class of agreement protocols are expectation-sharing protocols, where Alice and Bob take turns stating their current expected value, or discretizations thereof.
Expectation-sharing protocols have close connections to financial markets. In markets, the actions of traders reveal partial information about their believed value for an asset, i.e., their expectation. Specifically, a trader’s decision about whether to buy or sell, and how much, can be viewed as revealing a discretization of this expectation. In many theoretical models of markets (see e.g. [Ost12]) traders eventually reach agreement. The intuition behind this phenomenon is that a trader who disagrees with the price leaves money on the table by refusing to trade. Our work thus provides a lens into a well-studied question: when are market prices accurate? (This is related to the efficient market hypothesis, the problem of when market prices reflect all available information, which traces back at least to [Fam70] and [Hay45]. Modern models of financial markets are often based on [Kyl85]; we refer the reader to e.g. [Ost12] and references therein for further information.)
An important caveat, however, is that traders behave strategically, and may not disclose their true expected value. For example, a trader may choose to withhold information until a later point when doing so would be more profitable. Therefore, to interpret the actions of traders as revealing discretized versions of their expected value, one first has to understand the Bayes-Nash equilibria of the market. [CW16] studies conditions under which traders are incentivized to reveal all of their information on their first trading opportunity. They call a market equilibrium all-rush if every trader is incentivized to reveal their information immediately. Their main result, roughly speaking, is that there is an all-rush equilibrium if and only if the information structure satisfies strong substitutes—a different strengthening of the weak substitutes condition. This result is specific to settings in which traders have the option to reveal all of their information on their turn—a setting that would be considered trivial from the standpoint of communication theory.
An exciting question for further study is therefore: under what information structure conditions and market settings is it a Bayes-Nash equilibrium to follow an agreement protocol leading to accurate beliefs? In other words, what conditions give not only that agreement implies accuracy, but also that the market incentivizes participants to follow the protocol? Together with [CW16], our work suggests that certain substitutes-like conditions could suffice.
References
- [Aar05] Scott Aaronson “The complexity of agreement” In Proceedings of the 37th Annual ACM Symposium on Theory of Computing, Baltimore, MD, USA, May 22-24, 2005 ACM, 2005, pp. 634–643 DOI: 10.1145/1060590.1060686
- [ABB13] Sreangsu Acharyya, Arindam Banerjee and Daniel Boley “Bregman Divergences and Triangle Inequality” In Proceedings of the 13th SIAM International Conference on Data Mining, May 2-4, 2013. Austin, Texas, USA SIAM, 2013, pp. 476–484 DOI: 10.1137/1.9781611972832.53
- [Aum76] Robert J. Aumann “Agreeing to Disagree” In The Annals of Statistics 4.6 Institute of Mathematical Statistics, 1976, pp. 1236–1239 URL: http://www.jstor.org/stable/2958591
- [Ban+05] Arindam Banerjee, Srujana Merugu, Inderjit S. Dhillon and Joydeep Ghosh “Clustering with Bregman divergences” In Journal of Machine Learning Research 6, 2005, pp. 1705–1749
- [CCR08] P. Chen, Y. Chen and M. Rao “Metrics defined by Bregman Divergences” In Communications in Mathematical Sciences 6.4 International Press of Boston, 2008, pp. 915–926 DOI: cms/1229619676
- [CW16] Yiling Chen and Bo Waggoner “Informational Substitutes” In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA IEEE Computer Society, 2016, pp. 239–247 DOI: 10.1109/FOCS.2016.33
- [ES03] Dominik Maria Endres and Johannes E. Schindelin “A new metric for probability distributions” In IEEE Trans. Inf. Theory 49.7, 2003, pp. 1858–1860 DOI: 10.1109/TIT.2003.813506
- [Fam70] Eugene F. Fama “Efficient capital markets: A review of theory and empirical work” In The Journal of Finance 25.2 Blackwell Publishing for the American Finance Association, 1970, pp. 383–417
- [Hay45] Friedrich August Hayek “The use of knowledge in society” In The American economic review 35.4 JSTOR, 1945, pp. 519–530
- [Kyl85] Albert S. Kyle “Continuous auctions and insider trading” In Econometrica: Journal of the Econometric Society 53.6, 1985, pp. 1315–1335
- [NR21] Eric Neyman and Tim Roughgarden “Are You Smarter Than a Random Expert? The Robust Aggregation of Substitutable Signals”, 2021
- [NR21a] Eric Neyman and Tim Roughgarden “From Proper Scoring Rules to Max-Min Optimal Forecast Aggregation” In EC ’21: The 22nd ACM Conference on Economics and Computation, Budapest, Hungary, July 18-23, 2021 ACM, 2021, pp. 734 DOI: 10.1145/3465456.3467599
- [Ost12] Michael Ostrovsky “Information aggregation in dynamic markets with strategic traders” In Econometrica 80.6 Wiley Online Library, 2012, pp. 2595–2647
- [RY20] Anup Rao and Amir Yehudayoff “Communication Complexity: and Applications” Cambridge University Press, 2020
- [Šid57] Z. Šidák “On Relations Between Strict-Sense and Wide-Sense Conditional Expectations” In Theory of Probability and Its Applications 2, 1957, pp. 267–272
Appendix A Details Omitted From Section 2
Above we stated that the weak substitutes condition is not sufficient for agreement to imply accuracy. To see this, consider the following information structure:
• Nature flips a coin. Alice’s and Bob’s signals are each a pair consisting of the outcome of the coin flip and an additional bit:
• If the coin lands heads, Alice and Bob are given highly correlated bits as part of their signals: the bits are $(0, 0)$ and $(1, 1)$ with probability 45% each, and $(0, 1)$ and $(1, 0)$ with probability 5% each.
• If the coin lands tails, Alice and Bob are given highly anticorrelated bits as part of their signals: the bits are $(0, 0)$ and $(1, 1)$ with probability 5% each, and $(0, 1)$ and $(1, 0)$ with probability 45% each.
The value $Y$ is the XOR of Alice’s and Bob’s bits. It can be verified that this information structure satisfies weak substitutes; the intuition is that a majority of the value comes from knowing the outcome of the coin flip, which can be inferred from Alice’s (or Bob’s) signal alone.
Alice and Bob $0$-agree at the very start. However, they are far from accurate, since their expectations are either 10% or 90%, and the right answer is either $0$ or $1$. Therefore, weak substitutes alone is insufficient for agreement to imply accuracy.
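The following sketch (ours) verifies the claims in this example: weak substitutes holds on the full signal space, while the agreed-upon estimates have squared error $0.09$ against a truth of $0$ or $1$:

```python
from itertools import product

# The coin-plus-correlated-bits structure from Appendix A.
dist = {}
for coin in ("H", "T"):
    for a, b in product([0, 1], repeat=2):
        match = 0.45 if a == b else 0.05
        dist[(coin, a, b)] = 0.5 * (match if coin == "H" else 0.5 - match)

Y = lambda w: w[1] ^ w[2]              # XOR of the two extra bits
alice = lambda w: (w[0], w[1])         # Alice sees the coin and her bit
bob = lambda w: (w[0], w[2])

def err(view):
    """E[(Y - E[Y | view])^2]: expected squared error given a signal map."""
    cells = {}
    for w, p in dist.items():
        m, s = cells.get(view(w), (0.0, 0.0))
        cells[view(w)] = (m + p, s + p * Y(w))
    mean = {c: s / m for c, (m, s) in cells.items()}
    return sum(p * (Y(w) - mean[view(w)]) ** 2 for w, p in dist.items())

value_tau_alone = err(lambda w: "none") - err(bob)     # 0.16
value_tau_given_sigma = err(alice) - err(lambda w: w)  # 0.09
print(value_tau_given_sigma <= value_tau_alone)  # True: weak substitutes holds
print(err(alice))  # 0.09: the agreeing estimates (0.1 or 0.9) miss Y (0 or 1)
```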
Appendix B Details Omitted From Section 3
Proof of Claim 3.3.
We claim that in fact we can choose the ’s so that each is in . This ensures that each interval has length at most .
For , let be the probability that is between and , inclusive. Note that .
Observe that if is selected uniformly from , the expected value of is equal to , because both quantities are equal to the probability that is between and . Therefore, if is additionally chosen according to , we have
This means that
Thus, if each is selected uniformly at random from , the expected value of would be at most . In particular this means that there exist choices of the ’s such that . ∎
Proof of Lemma 3.2.
Fix a large positive integer $n$ (we will later find it optimal to set $n = \left\lceil \epsilon^{-1/6} \right\rceil$). Consider a partition of $[0,1]$ into intervals $I_1, \dots, I_n$ satisfying the conditions of Claim 3.3. Let $B_k := \{\tau : Y_\tau \in I_k\}$ and $B := B_{k(\tau)}$. Additionally, let $k(\sigma)$ and $k(\tau)$ be as defined in Claim 3.3.
Our goal is to upper bound the expectation of $(Y_{\sigma\tau} - Y_\tau)^2$. In pursuit of this goal, we observe that by the Pythagorean theorem, we have
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_\tau)^2\right] \le \mathbb{E}\left[(Y_{\sigma\tau} - Y_B)^2\right] = \mathbb{E}\left[(Y_{\sigma\tau} - Y_{\sigma B})^2\right] + \mathbb{E}\left[(Y_{\sigma B} - Y_B)^2\right].$$
We now use the rectangle substitutes assumption: for any $k$, by applying Equation 1 to $\Sigma$ and $B_k$, we know that
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_{\sigma B_k})^2 \mid \tau \in B_k\right] \le \mathbb{E}\left[(Y_\tau - Y_{B_k})^2 \mid \tau \in B_k\right].$$
Taking the expectation over $k$ (i.e. choosing each $k$ with probability equal to $\Pr[\tau \in B_k]$), we have that
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_{\sigma B})^2\right] \le \mathbb{E}\left[(Y_\tau - Y_B)^2\right]. \qquad (6)$$
Therefore, we have
$$\mathbb{E}\left[(Y_{\sigma\tau} - Y_\tau)^2\right] \le \mathbb{E}\left[(Y_\tau - Y_B)^2\right] + \mathbb{E}\left[(Y_{\sigma B} - Y_B)^2\right]. \qquad (7)$$
We will use Claim 3.3 to argue that each of these two summands is small. The argument regarding the first summand is straightforward: for any $\tau$, we have that $Y_\tau, Y_B \in I_{k(\tau)}$, which means that $(Y_\tau - Y_B)^2 \le \left(\frac{2}{n}\right)^2$.
We now upper bound the second summand. (The proof below takes sums over signals and thus implicitly assumes that the signal spaces are finite, but the proof extends to the infinite case, with sums replaced by integrals with respect to the probability measure.) For any , let and . Then and . Observe that
(8) |
To handle the first expectation, we note that if , then . To see this, observe
Rearranging and taking absolute values, we conclude
Therefore, recalling , we have
Proposition B.1.
Consider the following protocol, parametrized by $\delta > 0$. Alice and Bob send their initial expectations to each other, rounding to the nearest multiple of $\delta$. This protocol entails communicating $O(\log(1/\delta))$ bits. At the end of the protocol, Alice and Bob $O(\delta^2)$-agree and are $O(\delta^{2/3})$-accurate (with respect to squared distance, under rectangle substitutes).
Proof.
Let $S$ be the set of possible signals of Alice at the end of the protocol which are consistent with the protocol transcript, and define $T$ likewise for Bob. Recall that we use $\Sigma$ and $\mathcal{T}$ to denote the sets of all of Alice’s and Bob’s possible signals, respectively. We have
since the rounded and true expectations are guaranteed to be within $\delta$ of each other by construction. Thus, Bob is accurate, and likewise for Alice. By the approximate triangle inequality for $JB$, it follows that Alice and Bob agree. ∎
Appendix C Details Omitted From Section 4
Proof of Proposition 4.8.
Let . We have
The third-to-last step follows from the fact that $B$ is $\mathcal{F}$-measurable (we are using the “pulling out known factors” property of conditional expectation). The last step follows from the fact that $\mathbb{E}[Z \mid \mathcal{F}] = A$. ∎
Proposition C.1.
Let $\varphi$ be a differentiable convex function on the interval $[0,1]$. For all $y, z \in [0,1]$, we have:
(i) $JB_\varphi(y, z) \le \frac{1}{2}\left(D_\varphi(y, x) + D_\varphi(z, x)\right)$ for every $x \in [0,1]$.
(ii) $JB_\varphi$ satisfies the reverse triangle inequality: for every $x$ between $y$ and $z$, we have $JB_\varphi(y, x) + JB_\varphi(x, z) \le JB_\varphi(y, z)$.
(iii) For all , we have .
(iv) For a random variable supported on , we have
Proof.
Fact (i) follows from Proposition 4.2. Regarding Fact (ii), without loss of generality assume that and that (uniformly adding a constant to the derivative of does not change any Jensen-Bregman divergence, hence the second assumption). Then , so . Since , we also have that , so . Thus, we have
Regarding the equality in Fact (iv), we have
where the first step follows from the fact that , and .
Regarding the inequality in Fact (iv), without loss of generality assume that . By convexity we have that
so
In the last step we use the fact that for a convex function and a random variable defined on an interval with mean , the maximum possible value of is attained if is either or with the appropriate probabilities. ∎
Proof of Theorem 4.11.
Case 1:
. Let us use “hi,” “lo,” and “md” to denote the events that Alice says “high,” Alice says “low,” and Alice says “medium,” respectively. We have
where “” is short for “,” a notation we use throughout the proof. We thus have
(9) |
We now make use of the following lemma.
Lemma C.2.
Suppose that turn is Alice’s. Let “hi” denote the event that Alice says “high.” Let . Then
The analogous statement is true if Alice says “low,” and likewise if it is instead Bob’s turn.
Case 2:
. Using the Pythagorean theorem to write the same Bregman divergence in two ways, we have that
This means that one of the two summands on the left-hand side is at least .
Case 2a: . In that case we have that
by the same logic as in Case 1.
Case 2b: .
In each of our cases, we have that
Therefore, the total number of steps until agreement is first reached cannot be more than
This completes the proof. ∎
We now prove Lemma C.2.
Proof of Lemma C.2.
We will restrict our probability space to outcomes where Charlie knows at time (and thus omit “” from here on). For convenience, we will let be Alice’s expectation (a random variable) and be Charlie’s expectation (which is a particular number in ). We will let , so that if Alice says “high” then Charlie knows that and that .
Let , and let . Note that if Alice says “high” then . In our new notation, we may write , and we wish to show that . Put otherwise, our goal is to show that
For convenience we will let denote the quantity on the left-hand side.
Let be the number larger than such that , so that whenever Alice says “high.” (If no such number exists, then Alice never says “high” and the lemma statement is trivial.) Observe that since is convex (Bregman divergences are convex in their first argument), for a fixed value of , the value of is maximized when is either or (with probabilities and , respectively). Therefore we have
(10) |
Case 1:
. In that case we have
Case 2:
. In that case we have
(11) |
Case 2a: . Then we have
(In the last step we again use that .) Now, it is easy to verify that the second fraction is at least (this comes down to the fact that ), so we indeed have that .
Case 2b: . We claim that for all , we have that
(12) |
To see this, suppose for contradiction that for some we have . Then
On the other hand, we have that both and are less than or equal to , by definition of . This means that
but this implies that and that , a contradiction.
Plugging in into Equation 12, we find that
Plugging this bound into Equation 11, we get that
where in the second-to-last step we use that and in the last step we again use the fact that . ∎
Proof of Lemma 4.19.
We will partition into a number of small intervals , , , …, with certain desirable properties (which we will describe below). For , we will let . For a given , we will let be the such that .
Our goal is to upper bound the expectation of . In pursuit of this goal, we observe that by Proposition 4.8 we have
(13) |
Now, for any , by applying Equation 5 to and , we know that
(Here, “” is short for “.”) This is our only use of the rectangle substitutes assumption. Now, taking the expectation over (i.e. choosing each with probability equal to ), we have that
Together with Equation 13, this tells us that
(14) |
Our goal will be to bound the two summands in Equation 14. We will specify the boundaries of the intervals with this goal in mind.
On an intuitive level, we are hoping for two things to be true:
• In order for the first summand to be small, we want and to be similar in value. In other words, we want each interval to be “short” (for a notion of shortness with respect to that we are about to discuss).
• In order for the second summand to be small, we want and to be similar in value. In other words, the estimate of a third party who knows shouldn’t change much upon learning . One way to ensure this is by creating the intervals in a way that makes the third party very confident about the value of before learning it. Intuitively this should be true because Alice and Bob approximately agree, so Alice’s estimate is likely to be close to Bob’s. However, we must be careful to strategically choose the boundaries of our intervals so that Alice’s and Bob’s estimates are unlikely to be on opposite sides of a boundary. (This limits how many intervals we can reasonably use, which is why we cannot make our intervals arbitrarily short to satisfy the first of our two criteria.)
What, formally, do we need for the first summand to be small? For any , we have . We can apply Proposition C.1 (iv) to the random variable on the probability subspace given by . Since takes on values in , we have that
(15) |
where is shorthand for the Jensen-Bregman divergence between the endpoints of . Therefore, if is small for all , then the first summand (which is an expected value of over ) is also small.
What about the second summand? As per the intuition above, we wish to choose our boundary points so that Alice’s and Bob’s estimates are unlikely to be on opposite sides of any boundary. Let be the smaller of the two estimates and be the larger one. We say that thwart a point if and . We define the thwart density of to be
Roughly speaking, we will choose such that is small on average.
We will approach this problem by first creating intervals to satisfy the first criterion (short intervals), without regard to the second, and then modifying them to satisfy the second without compromising the first. Formally, we choose our intervals according to the following algorithm.
Algorithm C.3 (Partitioning into intervals ).
(1) Choose points such that the intervals thus created all have Jensen-Bregman divergence between and , inclusive, where and are as in the statement of Lemma 4.19. ( is not pre-determined; it is defined as one more than the number of intervals created.) To see why this is possible: define so that , which is possible because is continuous in its arguments; then define so that ; and keep going until an endpoint is defined such that adding another point as before would leave an interval with Jensen-Bregman divergence less than . Now, instead of defining this last point as before, define it so that . Since , the -approximate triangle inequality that we have by assumption tells us that .
(2) Let for convenience. Define . For , let . Let be such that . (If the infimum is achieved, e.g. if the space of signals to Alice and Bob is finite, then we can set ; our algorithm works in more generality, at the expense of a factor of in our final bound. Note that by replacing with a smaller constant, one can arbitrarily reduce this factor.)
(3) Return the intervals .
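The following Python sketch instantiates Algorithm C.3 in the squared-distance case, where the Jensen-Bregman divergence of an interval with endpoints x and y is (y - x)^2 / 4. The constants, the grid search, and the window within which each boundary may move are our simplifications, not the exact parameters of Lemma 4.19.

```python
# Sketch of Algorithm C.3 for squared distance: JB(x, y) = (y - x)^2 / 4.
# Step 1 greedily cuts [0, 1] into intervals whose JB lies in [c_lo, c_hi];
# step 2 nudges each interior cut point, within a small window, toward a
# position of low thwart density. Window and grid sizes are illustrative.
import math

def jb(x, y):
    """Jensen-Bregman divergence between x and y for f(t) = t^2."""
    return (y - x) ** 2 / 4.0

def partition(c_lo, c_hi, thwart_prob, n_grid=100):
    step = 2.0 * math.sqrt(c_lo)             # interval length with JB = c_lo
    cuts = [0.0]
    while 1.0 - cuts[-1] > 2.0 * math.sqrt(c_hi):
        cuts.append(cuts[-1] + step)
    cuts.append(1.0)                          # final interval has JB <= c_hi
    for k in range(1, len(cuts) - 1):
        # Non-overlapping windows keep the cut points in increasing order.
        lo, hi = cuts[k] - step / 4, cuts[k] + step / 4
        grid = [lo + (hi - lo) * i / n_grid for i in range(n_grid + 1)]
        cuts[k] = min(grid, key=thwart_prob)  # approximate the infimum
    return cuts

# Any nonnegative function works here as a stand-in for the thwart density.
print(partition(0.0025, 0.01, lambda x: abs(math.sin(50 * x))))
```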
We begin by observing that for any , we have
where for convenience we define . Therefore, by Equation 15, we have
(16)
It remains to bound the second summand of Equation 14, , which is the bulk of the proof. We proceed in two steps:
(1) (Lemma C.4) We show that is small. This means that Alice’s and Bob’s estimates are unlikely to lie on opposite sides of some boundary point . As a consequence, Bob is highly likely to know with a lot of confidence.
(2) (Lemma C.7) We bound the second summand as a function of . The intuition is that if is small, then Bob is highly likely to know with a lot of confidence, which means that he does not learn too much from learning .
We begin with the first step; recall our notation and .
Lemma C.4.
Proof.
We use the following claim, whose proof we provide afterward.
Claim C.5.
Let be any sub-interval of and let . Then there is an increasing sequence of points , such that for every , , and where
We apply Claim C.5 to the intervals , with . Let be the points whose existence the claim proves, and let , so that . Observe that , because the intervals are disjoint for all . We make the following claim (we provide the proof afterward).
Claim C.6.
(17)
We may rewrite Equation 17 as
Recall that . Scaling the ’s to add to decreases the left-hand side above, so we may assume that . Note that is convex. Thus, by using a weighted Jensen inequality on the left-hand side with weights , we find that
This completes the proof of Lemma C.4. ∎
Proof of Claim C.5.
Let , or if this number does not exist or is larger than . Note that , as we have , so if the first term were less than we would have some with . On the other hand, , since
and if the right-hand side were more than then that would contradict the definition of as an infimum. Therefore, .
If , we are done. Otherwise, let . Then . Define analogously, and so forth.
All that remains to show is the upper bound on . This is where we use the fact that (by construction) . Summing over all , we have
which (after rearranging) completes the proof. ∎
Proof of Claim C.6.
First note that by construction, for all . By repeated use of the -approximate triangle inequality (we sub-divide into and , then subdivide each of these, and so on), we find that
On the other hand, we have
Here, the third step follows by the reverse triangle inequality (Fact (ii) of Proposition C.1) and the fourth step follows by rearranging the order of summation. (The case in which the space of signals is infinite is identical, except that the summation is replaced by an integral over the probability space.) Combining the last two facts gives us that
which rearranges to the desired identity. ∎
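The proof above leans repeatedly on the -approximate triangle inequality. For intuition, the snippet below checks it numerically in the squared-distance case, where it holds with constant 2 because (x - z)^2 <= 2(x - y)^2 + 2(y - z)^2; the check is ours and uses only this elementary fact.

```python
# Numerical check: for squared distance, JB(x, z) <= 2 * (JB(x, y) + JB(y, z))
# for all x, y, z, since (x - z)^2 <= 2(x - y)^2 + 2(y - z)^2. This is the
# alpha-approximate triangle inequality with alpha = 2.
import random

def jb(x, y):
    return (x - y) ** 2 / 4.0

random.seed(0)
worst = 0.0
for _ in range(100_000):
    x, y, z = (random.random() for _ in range(3))
    lhs, rhs = jb(x, z), 2.0 * (jb(x, y) + jb(y, z))
    if rhs > 0:
        worst = max(worst, lhs / rhs)
print("max observed ratio (stays at most 1):", worst)
```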
We are now ready to bound the second summand, i.e. , where is the such that Alice’s estimate lies in . For convenience we will define for Bob by analogy as the such that lies in . By Lemma C.4 and the preceding discussion, we know that
Lemma C.7.
Let . Then
The key idea is that because with probability near , learning is unlikely to make Bob update his estimate much.
Proof.
Consider any signal and let . (This proof takes sums over and thus implicitly assumes that is finite, but it extends to infinite , with sums over replaced by integrals with respect to the probability measure over .) We have
Note that , so by Proposition C.1 we have that
Let , so . Then
The second term is at most , since is the range of . To bound the first term, we note that cannot differ from by more than , as otherwise the average value of could not be . It follows that is bounded by the largest possible difference in -values of two points that differ by at most . Therefore, we have
where is defined as in the statement of Lemma C.7. If is symmetric on , then for and otherwise. This is a concave function, but is not in general concave. However, consider as defined in the lemma statement, so for all . Then
Here, the second step follows by Jensen’s inequality with terms and weights , the second-to-last step follows from the fact that is convex and , and the last step follows from the fact that is convex and . ∎
Appendix D Alternative Definitions of Agreement and Accuracy
For arbitrary Bregman divergences, there are several notions of agreement and accuracy that are worth considering. Before we discuss these, we make a note about the order of arguments in a Bregman divergence. In our context, it makes the most sense to speak of the Bregman divergence from a more informed estimate to a less informed estimate. By a “more informed estimate” we mean a finer-grained one, i.e. one that is informed by more knowledge. For example, in the estimation setting that this work explores, is more informed than , which is more informed than and , which are each more informed than , which is more informed than .
To see that this is the natural order of the arguments, recall that Bregman divergences are motivated by the property that they elicit the mean (see Proposition 4.2): if an agent who gives an estimate of for the value of a random variable incurs a loss of , then the agent minimizes their expected loss by reporting . This means that the expert ought to report the expected value of given the information that the expert knows.
This means that given two estimates of , and , of which is more informed, the quantity has a natural interpretation: it is the expected amount the expert gains by learning more and refining their estimate from to . This follows by the Pythagorean theorem: writing $\hat{y}_1$ for the more informed estimate and $\hat{y}_2$ for the less informed one,
$$\mathbb{E}[D(Y \,\|\, \hat{y}_2)] = \mathbb{E}[D(Y \,\|\, \hat{y}_1)] + \mathbb{E}[D(\hat{y}_1 \,\|\, \hat{y}_2)].$$
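As a sanity check, the following snippet verifies this identity by simulation in the squared-distance case (D(u, v) = (u - v)^2) with a made-up two-signal information structure; refining the estimate from E[Y] to E[Y | s] gains exactly the expected divergence between the two estimates.

```python
# Monte Carlo check of the Bregman Pythagorean identity for squared distance:
# E[D(Y, less_informed)] = E[D(Y, more_informed)] + E[D(more, less)],
# where more_informed = E[Y | s] refines less_informed = E[Y]. The
# information structure below is a toy example.
import random

random.seed(1)
N = 200_000
def D(u, v): return (u - v) ** 2

total_less = total_more = total_gain = 0.0
for _ in range(N):
    s = random.randint(0, 1)        # Alice's signal
    t = random.randint(0, 1)        # Bob's signal
    y = (s + t) / 2.0               # the quantity being estimated
    less = 0.5                      # E[Y] (no signals)
    more = (s + 0.5) / 2.0          # E[Y | s]
    total_less += D(y, less)
    total_more += D(y, more)
    total_gain += D(more, less)

# Both sides are approximately 0.125 for this example.
print(total_less / N, total_more / N + total_gain / N)
```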
D.1 Alternative Definitions of Agreement
One important motivation for using the Jensen-Bregman divergence to the midpoint as the definition of agreement is that this quantity serves as a lower bound on the expected amount that Charlie disagrees with Alice and Bob. Formally:
Definition D.1.
Let , , and be Alice’s, Bob’s, and Charlie’s expectations, respectively (these are random variables on ). Alice and Bob -agree with Charlie if .
(This is the order of arguments because Alice and Bob are more informed than Charlie.) By Proposition C.1 (i), we know that if Alice and Bob -agree with Charlie then they -agree.
As it happens, under this (stronger) definition of agreement, the fact that agreement implies accuracy under rectangle substitutes follows immediately:
Proposition D.2.
Let be an information structure that satisfies rectangle substitutes. For any communication protocol that causes Alice and Bob to -agree with Charlie on , Alice and Bob are -accurate after the protocol terminates.
Proof.
Let be the set of possible signals of Alice at the end of the protocol which are consistent with the protocol transcript, and define likewise for Bob. Recall that Charlie’s expectation is . We have
where the first inequality follows by rectangle substitutes and the last inequality follows because Alice and Bob -agree with Charlie. ∎
The drawback of Definition D.1 is that it is not so much a definition of Alice and Bob’s agreement with each other as a definition of agreement with respect to the protocol being run (since Charlie only exists within the context of the protocol). Put otherwise, it is impossible to determine whether Alice and Bob -agree with Charlie simply by knowing Alice’s and Bob’s expectations; one must also know Charlie’s expectation, which cannot be determined from theirs. The question “how far from agreement are Alice and Bob if Alice believes 25% and Bob believes 30%?” makes sense in the context of -agreement, but not in the context of -agreement with Charlie.
A different notion of agreement, which (like -agreement) only depends on Alice’s and Bob’s expectations, uses the symmetrized Bregman divergence between these expectations: .
Definition D.3.
Let and be Alice’s and Bob’s expectations, respectively (these are random variables on ). Alice and Bob satisfy symmetrized -agreement if .
By Proposition C.1 (iii), we know that if Alice and Bob satisfy symmetrized -agreement then they -agree.
In our context, symmetrized Bregman divergence is less natural than Jensen-Bregman divergence. This is because symmetrized Bregman divergence (unlike Jensen-Bregman divergence) does not seem to relate closely to our previous discussion of the Bregman divergence from a more informed to a less informed estimate.
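To make the comparison concrete, the snippet below computes both quantities in the KL-divergence case on Bernoulli estimates, where the Jensen-Bregman divergence is the binary Jensen-Shannon divergence; we take “symmetrized” to mean the average of the two Bregman divergences, which is our reading of the definition rather than a quotation of it.

```python
# Compare Jensen-Bregman divergence (here: binary Jensen-Shannon divergence)
# with symmetrized KL, taken as (KL(a||b) + KL(b||a)) / 2 (our assumption).
# JB is never larger, matching the claim that symmetrized eps-agreement
# implies eps-agreement.
import math

def kl(p, q):
    """Binary KL divergence KL(Ber(p) || Ber(q))."""
    def term(x, y):
        return 0.0 if x == 0 else x * math.log(x / y)
    return term(p, q) + term(1 - p, 1 - q)

def jensen_bregman(a, b):
    m = (a + b) / 2.0
    return (kl(a, m) + kl(b, m)) / 2.0

def symmetrized(a, b):
    return (kl(a, b) + kl(b, a)) / 2.0

for a, b in [(0.25, 0.30), (0.1, 0.5), (0.01, 0.99)]:
    print(a, b, jensen_bregman(a, b), symmetrized(a, b))
```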
D.2 Alternative Notions of Accuracy
Our definition of Alice’s accuracy as the expected Bregman divergence from the truth to Alice’s expectation seems like the most natural one. However, one may desire a definition of accuracy that takes both Alice’s and Bob’s expectations into account, judging the pair’s accuracy based on their consensus belief rather than on each of their individual beliefs. For instance, one could say that Alice and Bob are -midpoint-accurate if . By this definition, Alice’s and Bob’s expectations could individually be far from the truth, yet the pair would be considered accurate so long as the average of their expectations is close to correct.
Proposition D.4.
If Alice and Bob are -accurate, then they are -midpoint-accurate.
Proof.
Observe that for all it is the case that
The first inequality is true simply because lies in between and . Therefore,
∎
Another natural choice for Alice’s and Bob’s consensus belief is the QA pool (see [NR21a]). Proposition D.4 likewise holds for the QA pool in place of the midpoint, and indeed holds for any choice of consensus belief that is guaranteed to lie in between Alice’s and Bob’s expectations. Thus, any such definition will be weaker than our definition of -accuracy for Alice and Bob (up to a constant factor).
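In the squared-distance case, the relationship between individual accuracy and midpoint accuracy reduces to convexity; a minimal check with arbitrary made-up beliefs is below.

```python
# For squared distance, the midpoint's error is at most the average of the
# two individual errors: ((a+b)/2 - y)^2 <= ((a-y)^2 + (b-y)^2) / 2 by
# convexity of the square. So eps-accuracy implies eps-midpoint-accuracy
# in this special case. The triples below are arbitrary test values.
def sq(u, v): return (u - v) ** 2

for a, b, y in [(0.25, 0.30, 0.6), (0.1, 0.9, 0.5), (0.0, 1.0, 0.2)]:
    mid = (a + b) / 2.0
    assert sq(mid, y) <= (sq(a, y) + sq(b, y)) / 2.0
    print(a, b, y, sq(mid, y), (sq(a, y) + sq(b, y)) / 2.0)
```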
To summarize, among the above definitions of agreement, -agreement is the weakest; and among the above definitions of accuracy, Alice’s and Bob’s -accuracy is the strongest. This is an indication of the strength of Theorem 4.18: it starts from a relatively weak premise and reaches a relatively strong conclusion.
Appendix E Implications for Communication Complexity
Our results can be framed in a communication complexity context, where they imply that “substitutable” functions can be computed with probability (over the inputs) with a transcript length depending only on . This is a nonstandard and weak notion of computing the function, but sketching the reduction may inspire future work on connections between substitutes and communication complexity.
In a classic deterministic communication complexity setup (e.g. [RY20]), Alice holds , Bob holds , and the goal is to compute some function using a communication protocol (see Section 2.2). Our setting captures this model when . Observe that in this case, , i.e. Alice and Bob’s information together determine completely. A communication protocol defines its output by a function where is the space of transcripts. We can simply let , i.e. rounding the ex post expectation to either zero or one. This is equivalent to the belief of “Charlie”, or the common knowledge of Alice and Bob after the protocol is completed.
Definition E.1 (Rectangle substitutes, -computes).
Given a function and a distribution over , we say satisfy rectangle substitutes if the corresponding information structure with satisfies rectangle substitutes (Definition 2.6). We say a protocol -computes over if, with probability at least over , the protocol has .
By our results, under rectangle substitutes , any agreement protocol approximately computes over . More precisely, using a fast substitutes-agreement protocol similar to Proposition B.1, we obtain the following.
Corollary E.2.
Suppose satisfy rectangle substitutes. Then for every , there is a deterministic communication protocol using bits of communication that -computes over .
Proof.
In round one, Alice sends her current expectation rounded to a multiple of ; call this message . In round two, Bob sends his updated expectation rounded to a multiple of ; call this message . The protocol then halts, and the output is rounded to either zero or one. It uses bits. Let be the random rectangle associated with the protocol.
By construction, , and is the expectation of conditioned on , so it follows that . Using substitutes (just as in Proposition B.1),
By construction, . Therefore, by the -approximate triangle inequality for squared distance (e.g. Proposition 4.13),
Now, the protocol is incorrect if . Using Markov’s inequality,
Therefore, given , we run the protocol with . The probability of an incorrect output is at most , and we use bits of communication. ∎
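To make the reduction tangible, here is a toy Python implementation of the two-round protocol. The information structure (uniform signals in {0, 1, 2} with f equal to 1 iff the signals sum to at least 3) is our own example and is not claimed to satisfy rectangle substitutes, so this illustrates only the protocol mechanics, not the accuracy guarantee.

```python
# Toy run of the two-round protocol from the proof of Corollary E.2:
# Alice sends her posterior mean rounded to a multiple of eps; Bob updates
# on that message and replies in kind; the output rounds the common
# posterior (Charlie's expectation) to 0 or 1.
from itertools import product

SIGMA = TAU = [0, 1, 2]                      # hypothetical signal spaces
def f(s, t): return 1 if s + t >= 3 else 0   # the bit to be computed
def prob(s, t): return 1.0 / 9.0             # uniform joint distribution

def post_mean(S, T):
    """E[f | sigma in S, tau in T] under the joint distribution."""
    num = sum(prob(s, t) * f(s, t) for s in S for t in T)
    den = sum(prob(s, t) for s in S for t in T)
    return num / den

def run_protocol(s, t, eps):
    rnd = lambda x: round(x / eps) * eps               # discretize messages
    m_a = rnd(post_mean([s], TAU))                     # round 1: Alice
    S = [s2 for s2 in SIGMA if rnd(post_mean([s2], TAU)) == m_a]
    m_b = rnd(post_mean(S, [t]))                       # round 2: Bob
    T = [t2 for t2 in TAU if rnd(post_mean(S, [t2])) == m_b]
    charlie = post_mean(S, T)                          # common posterior
    return 1 if charlie >= 0.5 else 0

errors = sum(prob(s, t) * (run_protocol(s, t, 0.05) != f(s, t))
             for s, t in product(SIGMA, TAU))
print("error probability:", errors)
```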