

Online Allocation and Learning in the Presence of Strategic Agents

Steven Yin
Department of Industrial Engineering and Operations Research
Columbia University
New York, NY 10027
sy2737@columbia.edu
Shipra Agrawal
Department of Industrial Engineering and Operations Research
Columbia University
New York, NY 10027
sa3305@columbia.edu
Assaf Zeevi
Graduate School of Business
Columbia University
New York, NY 10027
assaf@gsb.columbia.edu
Abstract

We study the problem of allocating T sequentially arriving items among n homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them. The agents' valuations for the item in each round are assumed to be i.i.d., but their distribution is a priori unknown to the central planner. Therefore, the central planner needs to implicitly learn these distributions from the observed values in order to pick a good allocation policy. However, an added challenge here is that the agents are strategic, with incentives to misreport their valuations in order to receive better allocations. This sets our work apart both from the online auction design settings, which typically assume known valuation distributions and/or involve payments, and from the online learning settings that do not consider strategic agents. To that end, our main contribution is an online learning based allocation mechanism that is approximately Bayesian incentive compatible and, when all agents are truthful, guarantees a sublinear regret for individual agents' utility compared to that under the optimal offline allocation policy.

1 Introduction

A classic sequential resource allocation problem is to allocate T sequentially arriving items to n agents, where each agent must receive a predetermined fraction of the items. The goal is to maximize social welfare, i.e., the agents' total valuation of the items allocated to them. This problem is non-trivial even when the agents' valuations are stochastic and i.i.d. with a known distribution, the main difficulty being that the allocations must be performed in real-time; specifically, an item must be allocated to an agent in the current round without knowledge of their future valuations.

A more challenging (and quite useful) extension of the problem which has been the focus of recent literature considers the case where the distribution of the agents' valuations is a priori unknown to the planner. In such settings, algorithms based on online learning can be used to adaptively learn the valuation distribution from observed valuations in previous rounds, and improve the allocation policy over time (see [2, 10, 6, 5] for some examples). However, these mechanisms implicitly assume that the agents report their valuations truthfully, so that the mechanism can directly learn from the reported valuations in order to maximize the social welfare.

Many practical resource allocation settings do not conform to the truthful reporting assumption. In particular, selfish and strategic agents may have an incentive to misreport their valuations if doing so leads to an individual utility gain (possibly at the expense of social welfare). Hence, an allocation policy that does not take such misreporting incentives into account can incur a significant loss in social welfare in the presence of strategic agents. For example, consider a simple setting with two agents whose true valuations are i.i.d. and uniformly distributed between 0 and 1. That is,

X_{1},X_{2}\stackrel{i.i.d.}{\sim}\text{Uniform}[0,1]

Each agent is pre-determined to receive an equal fraction of all the items. The optimal welfare-maximizing allocation policy is to allocate the item to the agent with the higher valuation in (almost) every round. This policy results in T/3 expected utility (\mathbb{E}[X_{1}|X_{1}>X_{2}]/2 per round) for each agent, and a social welfare of 2T/3. However, suppose that the first agent chooses to misreport in the following way: the agent reports a high valuation of 1 whenever her true valuation is in [0.5,1] and a low valuation of 0 whenever her true valuation is in [0,0.5]. Assuming the other agent remains truthful, this will lead to the first agent receiving all the items in her top 1/2 quantile, and therefore a significantly increased utility of 3T/8 (\mathbb{E}[X_{1}|X_{1}>0.5]/2 per round) compared to T/3 under truthful reporting. The social welfare, however, goes down to 5T/8 in this case. Thus under the optimal policy, each agent has an incentive to misreport her valuations in order to gain individual utility. The incentives to misreport may be further amplified under an online learning based allocation algorithm that learns approximately optimal policies from the valuations observed in the previous rounds. In such settings, the agents can potentially mislead the online learning algorithm into learning a more favorable policy over time by repeatedly misreporting their values.
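As a quick sanity check of the numbers in this example, the following minimal Monte Carlo sketch (our own illustration, not part of the paper's mechanism; the round count and seed are arbitrary choices) estimates the per-round utilities and welfare under truthful reporting and under the threshold misreporting strategy described above.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000  # number of simulated rounds

# True valuations: i.i.d. Uniform[0,1] for both agents.
x1 = rng.uniform(size=T)
x2 = rng.uniform(size=T)

# Truthful reporting: allocate each item to the higher reported value.
win1 = x1 > x2
u1_truthful = np.mean(x1 * win1)                 # ~1/3 per round
welfare_truthful = np.mean(np.maximum(x1, x2))   # ~2/3 per round

# Agent 1 misreports: 1 if her value is in [0.5, 1], else 0; agent 2 stays truthful.
report1 = np.where(x1 >= 0.5, 1.0, 0.0)
win1_strategic = report1 > x2
u1_strategic = np.mean(x1 * win1_strategic)      # ~3/8 per round
welfare_strategic = np.mean(x1 * win1_strategic + x2 * ~win1_strategic)  # ~5/8

print(u1_truthful, u1_strategic, welfare_truthful, welfare_strategic)
```

Both strategies give agent 1 half of the items in expectation, so the fairness constraint is respected on average; multiplying the per-round estimates by T recovers the totals T/3, 3T/8, 2T/3, and 5T/8 quoted above.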

Motivated by these shortcomings, in this paper we consider the problem of designing an online learning and allocation mechanism in the presence of strategic agents. Specifically, we consider the problem of sequentially allocating T items to n strategic agents. The problem proceeds in T rounds. In each round t=1,\ldots,T, the agents' true valuations X_{i,t}, i=1,\ldots,n for the t^{th} item are generated i.i.d. from a distribution F a priori unknown to the central planner. However, the central planner can only observe a value \tilde{X}_{i,t} reported by each agent i, which may or may not be the same as her true valuation X_{i,t} for the item. Using the reported valuations from the current and previous rounds, the central planner needs to make an irrevocable decision of which agent to allocate the current item to. The allocations should be made in a way such that each agent at the end receives a fixed fraction p^{*}_{i} of the T items, where p^{*}_{i}>0, \sum_{i=1}^{n}p^{*}_{i}=1. The objective of the central planner is to maximize the total utility of the agents, where the utility of each agent is defined as the sum of the true valuations of the items received by that agent.

Our main contribution is a mechanism that achieves both: (a) Bayesian incentive compatibility, i.e., assuming all the other agents are truthful, with high probability no single agent can gain significant utility by deviating from the truthful reporting strategy; and (b) near-optimal regret guarantees, namely, the utility of each individual agent under the online mechanism is "close" to that achieved under the optimal offline allocation policy.

Organization

After discussing the related literature in some further detail in Section 2, we formally introduce the problem setting and some of the core concepts in Section 3. Section 4 describes our online learning and allocation algorithm and provides formal statements of our main results (Theorem 1 and Theorem 2). Section 5 and Section 6 provide an overview of the proofs of the above theorems. All the missing details of the proofs are provided in the appendix. Finally, in Section 7 we discuss some limitations and future directions.

2 Literature review

Our work lies at the intersection of online learning and mechanism design. From an online learning perspective, our setting is closely related to recent work on constrained online resource allocation under stochastic i.i.d. rewards/costs (e.g., see [10, 2, 1, 6, 5]). However, a crucial assumption in those settings is that the central planner can observe the true rewards/costs of an allocation, which in our setting would mean that the central planner can observe agents' true valuations of the items being allocated. Our work extends these settings to allow for selfish and strategic agents who may have incentives to misreport their valuations. As discussed in the introduction, unless the online allocation mechanism takes these incentives into account, strategic agents may misreport their valuations and cause a significant loss in social welfare.

Incentives and strategic agents have been previously considered in online allocation mechanism design; however, most of that work has focused on auction design, where payments are used as a key mechanism for limiting rational agents' incentives to misreport their valuations. For example, [3] study a posted-price mechanism in a repeated auction setting where buyers' valuations are context dependent. [12] extend this work to the multi-buyer setting, using a second-price auction with dynamic personalized reserve prices. [17] study a similar problem in a non-contextual setting. (There is also a significant literature that studies learning in repeated auction settings from the bidder's perspective. Since our paper focuses on the central planner's point of view, we omit references to that literature.) All of the above-mentioned works are concerned with maximizing revenue for the seller, and use money/payments as a key instrument for eliciting private information about the bidders' valuations for the items. In this paper, we are concerned with online allocation without money, and the goal is to maximize each agent's utility.

Recently, there has been some work on reductions from mechanism design with money to mechanism design without money. [13] provided a black-box reduction from any one-shot BIC mechanism with money to an approximately BIC mechanism without money. However, their reduction relies crucially on knowing the true value distribution of the agents and is therefore not applicable to our setting. [18] consider a specific (one-round) facility allocation problem and explicitly formulate the idea of designing mechanisms without money to achieve approximately optimal performance against mechanisms with money. Subsequently, a series of papers extended the results on mechanism design without money in a single-shot setting, when the bidders' value distribution is unknown [14, 16, 8, 9]. These papers either use a very restricted setting with just two agents, or use very specific/simple valuation functions for the agents. Even in these basic settings, they show that the best one can hope for is a constant approximation to what one can achieve with a mechanism that uses money. It is not clear what kind of regret guarantees such reductions imply in a repeated online learning setting. Therefore, we do not consider such reductions from auction mechanisms with money to be a fruitful direction for achieving our goals of both incentive compatibility and low (sublinear) regret for our online allocation problem.

Finally, in repeated allocation settings with known valuation distributions, there are more positive results for truthful mechanism design without money. For example, [15] and later [7] studied the problem of repeatedly allocating items to agents with known value distributions; both use a state-based "promised utility" framework.

To summarize, to the best of our knowledge, this is the first paper to incorporate strategic agents’ incentives in the well-studied online allocation problem with stochastic i.i.d. rewards and unknown distributions. Thus, it bridges the gap between the online learning and allocation literature which focuses on non-strategic inputs, and the work on learning in repeated auctions which focuses on allocation mechanisms that utilize money (payments) to achieve incentive compatibility.

3 Problem formulation

The offline problem

We first state the offline version of the problem which will serve as our benchmark for the online problem. There is a set of n agents, and a distribution \bm{F} over \mathcal{X}\coloneqq[0,\bar{x}]^{n}. Each draw \bm{X}\sim\bm{F} from this distribution represents the n agents' valuations of one item: \bm{X}=[X_{1},\ldots,X_{n}]. We assume that the agents' valuations are i.i.d., i.e.

\bm{F}=F\otimes\ldots\otimes F.

A matching policy (aka allocation policy) maps, potentially with some exogenous randomness, a value vector \bm{X} to one of the agents i\in\{1,\ldots,n\}. Specifically, given a realized value vector \bm{X}\in[0,\bar{x}]^{n}, a (possibly randomized) policy \pi maps \bm{X} to agent \pi(\bm{X})\in\{1,\ldots,n\}, with the probability of agent i receiving an allocation given by \mathbb{P}(\pi(\bm{X})=i). The offline optimization problem is to find a social welfare-maximizing policy \pi^{*} such that each agent i in expectation receives a predetermined fraction p^{*}_{i} of the pool of items, where p_{i}^{*}>0, \sum_{i}p_{i}^{*}=1. The problem of finding the optimal policy can therefore be stated as follows:

\max\limits_{\pi}\quad\mathbb{E}\left[\sum_{i=1}^{n}X_{i}\mathbbm{1}(\pi(\bm{X})=i)\right] (1)
\text{s.t.}\quad\mathbb{P}(\pi(\bm{X})=i)=p^{*}_{i}\quad\forall i

where the expectations are taken both over \bm{X}\sim\bm{F} and over any randomness in the mapping made by policy \pi given \bm{X}. Solving the offline problem is non-trivial, as it is an infinite-dimensional optimization problem as stated in its current form in (1). But it turns out to be closely related to Semi-Discrete Optimal Transport, and the dual of (1) can be written as

\min_{\lambda\in\mathbb{R}^{n}}\mathcal{E}(\lambda,\bm{F})\coloneqq\sum_{i\in[n]}\int_{\mathbb{L}_{i}(\lambda)}(X_{i}+\lambda_{i})\,d\bm{F}(\bm{X})-\lambda^{\top}p^{*} (2)

where \mathbb{L}_{i} is what is sometimes referred to as the Laguerre cell:

\mathbb{L}_{i}(\lambda)=\left\{\bm{X}:X_{i}+\lambda_{i}>X_{j}+\lambda_{j}\ \forall j\neq i\right\}. (3)

Let \lambda^{*}(\bm{F}) denote an optimal solution to (2). When it is clear from the context, we omit the distribution \bm{F}. It is known that an optimal solution to (1) is given by the following deterministic policy defined by the Laguerre cell partition (Proposition 2.1 in [4]):

\pi^{*}(\bm{X})=i\ \text{ for all }\bm{X}\in\mathbb{L}_{i}(\lambda^{*}),\quad i=1,\ldots,n (4)
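For instance (consistent with the two-agent example from the introduction), if the valuations are i.i.d. Uniform[0,1] and p^{*}=(1/2,1/2), then \lambda^{*}=(0,0) is an optimal dual solution, since its Laguerre cells already have probability p^{*}_{i}=1/2 each, and the induced policy simply gives each item to the agent with the higher valuation:

\mathbb{L}_{1}(\lambda^{*})=\{\bm{X}:X_{1}>X_{2}\},\quad\mathbb{L}_{2}(\lambda^{*})=\{\bm{X}:X_{2}>X_{1}\},\quad\mathbb{P}(\pi^{*}(\bm{X})=i)=1/2.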

More generally, we will refer to any policy defined by a Laguerre cell partition as a greedy policy.

Definition 1 (Greedy allocation policy).

Consider any allocation policy that partitions the domain [0,\bar{x}]^{n} into the cells \mathbb{L}_{i}(\lambda) (as defined in (3)) for some \lambda\in\mathbb{R}^{n}. We refer to such a policy as the greedy allocation policy with parameter \lambda.

Note that there are efficient algorithms for solving (2) when the distribution \bm{F} is known (see [4]). Therefore we will treat \lambda^{*}(\cdot) as a black-box that can be computed efficiently for any given input distribution \hat{\bm{F}}.
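For concreteness, one simple way to instantiate this black-box on an empirical distribution is (sub)gradient descent on the dual (2): for a product distribution, the i-th coordinate of a subgradient of \mathcal{E}(\lambda,\bm{F}) is \mathbb{P}_{\bm{X}\sim\bm{F}}(\bm{X}\in\mathbb{L}_{i}(\lambda))-p^{*}_{i}, i.e., the probability of the Laguerre cell minus the target fraction. The sketch below is our own illustrative implementation (not the specific stochastic solver of [4]); the batch size, step size, and iteration count are arbitrary choices, and cell probabilities are estimated by resampling n-dimensional valuation vectors from a pooled one-dimensional sample.

```python
import numpy as np

def solve_lambda(samples_1d, p_star, iters=2000, step=0.5, batch=512, seed=0):
    """Approximate lambda*(F_hat) by subgradient descent on the dual (2),
    where F_hat is the empirical (product) distribution of the pooled
    one-dimensional sample `samples_1d` and p_star are the target fractions."""
    rng = np.random.default_rng(seed)
    p_star = np.asarray(p_star, dtype=float)
    n = len(p_star)
    lam = np.zeros(n)
    for _ in range(iters):
        # Draw i.i.d. valuation vectors X ~ F_hat x ... x F_hat.
        X = rng.choice(samples_1d, size=(batch, n), replace=True)
        # Empirical Laguerre-cell frequencies: argmax_i of (X_i + lambda_i).
        winners = np.argmax(X + lam, axis=1)
        p_hat = np.bincount(winners, minlength=n) / batch
        # Subgradient of the dual objective: grad_i = P(L_i(lambda)) - p*_i.
        lam -= step * (p_hat - p_star)
        # The dual objective is invariant to adding a constant to all coordinates.
        lam -= lam.mean()
    return lam
```

With pooled Uniform[0,1] samples and p_star = [0.5, 0.5], the returned \lambda is close to (0,0), matching the worked example above.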

The online problem: approximate Bayesian incentive compatibility and regret

We are interested in the case where items are sequentially allocated over T rounds, and the distribution \bm{F} is initially unknown. Specifically, in each round t=1,\ldots,T, the agents' true valuations \bm{X}_{t}=(X_{i,t}, i=1,\ldots,n) are generated i.i.d. from the distribution \bm{F} a priori unknown to the central planner. However, the central planner does not observe \bm{X}_{t} but only observes the reported valuations \tilde{\bm{X}}_{t}=(\tilde{X}_{i,t}, i=1,\ldots,n), which may or may not be the same as the true valuations.

An online allocation mechanism consists of a sequence of allocation policies \pi_{1},\ldots,\pi_{T}, where the policy \pi_{t} at time t may be adaptively chosen based on the information observed before time t:

\mathcal{H}_{t}=\{\tilde{\bm{X}}_{1},\ldots,\tilde{\bm{X}}_{t-1},\pi_{1},\ldots,\pi_{t-1}\}. (5)

Given the allocation policy \pi_{t} at time t, agent i's utility at time t is given by

u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\pi_{t})=X_{i,t}\mathbbm{1}[\pi_{t}(\tilde{\bm{X}}_{t})=i]

If an agent i has already reached his target allocation of p^{*}_{i}T items, then he cannot be allocated more items. Note that since the allocation policy may be randomized, for any given value vector \bm{X}, \pi_{t}(\bm{X}) is a random variable. To ensure truthful reporting in the presence of strategic agents, we are interested in mechanisms that are (approximately) Bayesian incentive compatible.

Definition 2 (Approximate-BIC).

For an online allocation mechanism, let \pi_{t}, t=1,\ldots,T be the sequence of allocation policies when all agents report truthfully, i.e., when \tilde{\bm{X}}_{t}=\bm{X}_{t}\ \forall t; and let \tilde{\pi}^{i}_{t}, t=1,\ldots,T be the sequence when all agents except i report truthfully, i.e., X_{j,t}=\tilde{X}_{j,t}\ \forall j\neq i. Then the online allocation mechanism is called (\alpha,\delta)-approximate Bayesian Incentive Compatible if, for all i=1,\ldots,n, with probability at least 1-\delta,

\sum_{t=1}^{T}u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\tilde{\pi}_{t}^{i})-\sum_{t=1}^{T}u_{i}(\bm{X}_{t},\bm{X}_{t},\pi_{t})\leq\alpha

Here, the probability is with respect to the randomness in the true valuations \bm{X}_{t}\sim\bm{F} and any randomness in the online allocation policy. For the online policy to be approximate-BIC, the statement should hold for all possible misreporting of valuations \tilde{X}_{i,t}\neq X_{i,t}.

Therefore, if α\alpha is small, then an individual agent has little incentive to strategize. Note that approximate-BIC also implies that truthful reporting for all agents constitutes an approximate-Nash equilibrium. Assuming that all agents are truthful, we are also interested in bounding each individual agent’s regret.

Definition 3 (Individual regret).

We define an individual agent i's regret under an online allocation mechanism as the difference between the expected utility agent i would achieve over T rounds under the optimal offline policy and her realized utility under the online mechanism. That is,

\text{Regret}_{i}(T)=T\,\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]-\sum_{t=1}^{T}u_{i}(\bm{X}_{t},\bm{X}_{t},\pi_{t}). (6)

Here \pi_{1},\ldots,\pi_{T} denote the allocation policies used by the online allocation mechanism in rounds t=1,\ldots,T.

Note that since social welfare is given by the sum of all agents’ utilities, a bound on individual regret implies a bound on the regret in social welfare of the mechanism.

4 Algorithm and main results

We present an online allocation mechanism that is approximately BIC and, when all agents are truthful, achieves a sublinear bound on each agent's individual regret. Our algorithm contains two components: a learner and a detector. Intuitively, the detector makes sure that the mechanism is approximately BIC, and the learner adaptively learns utility-maximizing allocation policies assuming truthful agents.

The learner runs in epochs with geometrically increasing lengths. The starting time of each epoch k is given by L_{k}=2^{k}, k=0,1,\ldots, which is also when the allocation policy is updated. At the end of each epoch k (i.e., at time t=L_{k+1}-1), the learner takes all the previously reported values from all the agents, and uses them to construct an empirical distribution of the agents' valuations. The learner implicitly assumes truthful agents in its computations. Therefore, since the agents' true valuations are i.i.d., it first constructs a single, one-dimensional empirical distribution \hat{F}_{t}, and then uses it to construct the corresponding n-dimensional product distribution \hat{\bm{F}}_{t}:

\hat{F}_{t}(x)=\frac{1}{tn}\sum_{s=1}^{t}\sum_{i=1}^{n}\mathbbm{1}[\tilde{X}_{i,s}\leq x] (7)
\hat{\bm{F}}_{t}=\hat{F}_{t}\otimes\ldots\otimes\hat{F}_{t}

The learning algorithm then solves the offline problem (2) using \hat{\bm{F}}_{t}, and uses the resulting greedy allocation policy characterized by \lambda^{*}(\hat{\bm{F}}_{t}) to allocate the items in the following epoch.

In parallel to the learner, the detector constructs and monitors, in each time step t and for each agent i, two empirical distributions: one using the reported valuations from agent i, \bar{F}_{t}, and one using the reported valuations from all the other agents, \tilde{F}_{t}. The detection algorithm then computes the supremum of the absolute difference between the two empirical CDFs, \sup_{x}|\bar{F}_{t}(x)-\tilde{F}_{t}(x)|. If this difference is greater than a predetermined threshold \Delta_{t}, then the detector raises a flag that there has been a violation of truthful reporting and the entire allocation game stops. Otherwise, the process continues.

The threshold \Delta_{t} needs to be chosen such that if everyone is truthful, then with high probability the detection algorithm will not pull the trigger; at the same time, if someone deviates from truthful reporting significantly, then it should detect this with high probability. The typical concentration result used in comparing empirical CDFs is the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [11]. However, since a strategic agent can adaptively change its misreporting strategy, we cannot directly apply the DKW inequality, which assumes i.i.d. samples. Instead, we use a martingale version of the DKW inequality (Lemma 1) to choose an appropriate threshold \Delta_{t}. The details are given in Algorithm 1 and Algorithm 2.

Input: T, \delta
Initialize: \lambda=[0,\ldots,0], k=0, K=\log_{2}(T), L_{k}=2^{k}, k=0,\ldots,K;
for t\leftarrow 1,2,3,\ldots,T do
      Observe \bm{X}_{t}. Run Detection Algorithm (Algorithm 2) with sample set S=\{\bm{X}_{1},\ldots,\bm{X}_{t}\} and threshold \Delta_{t}=64\sqrt{\frac{1}{t}\log(\frac{256et}{\delta})}
      if Detection Algorithm returns Reject then
            Terminate.
      if one of the agents i\in\{1,\ldots,n\} has reached the allocation capacity p^{*}_{i}T then
            Allocate randomly among agents who have not reached capacity
      else
            Allocate the item using the greedy allocation policy with parameter \lambda
      if t=L_{k+1}-1 then
            Compute \hat{F}_{t} from samples \{\bm{X}_{1},\ldots,\bm{X}_{t}\} as in (7).
            \lambda\leftarrow\lambda^{*}(\hat{\bm{F}}_{t})
            k\leftarrow k+1
Algorithm 1 Epoch Based Online Allocation Algorithm
Input: Sample set S=\{\bm{X}_{1},\ldots,\bm{X}_{t}\}, threshold \Delta_{t}.
for i\leftarrow 1,\ldots,n do
      Compute \bar{F}_{t}(x)=\frac{1}{t}\sum_{s=1}^{t}\mathbbm{1}[\bm{X}^{s}_{i}\leq x], the empirical CDF of the samples collected from agent i
      Compute \tilde{F}_{t}(x)=\frac{1}{t(n-1)}\sum_{s=1}^{t}\sum_{j\neq i}\mathbbm{1}[\bm{X}^{s}_{j}\leq x], the empirical CDF of all reported values from the other agents
      if \sup_{x}|\tilde{F}_{t}(x)-\bar{F}_{t}(x)|\geq\frac{\Delta_{t}}{2} then
            Return Reject
Return Accept
Algorithm 2 Detection Algorithm
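To make the detection step concrete, here is a minimal sketch of Algorithm 2 together with the threshold \Delta_{t} used in Algorithm 1 (an illustrative implementation with our own naming; `reports` collects the values reported so far, one row per round and one column per agent).

```python
import numpy as np

def threshold(t, delta):
    """Delta_t = 64 * sqrt(log(256*e*t/delta) / t), as set in Algorithm 1."""
    return 64.0 * np.sqrt(np.log(256.0 * np.e * t / delta) / t)

def ks_distance(a, b):
    """sup_x |F_a(x) - F_b(x)| between the empirical CDFs of samples a and b."""
    grid = np.concatenate([a, b])
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

def detect(reports, delta):
    """Return 'Reject' if some agent's reported distribution deviates from the
    pooled reports of the other agents by at least Delta_t / 2, else 'Accept'."""
    t, n = reports.shape
    dt = threshold(t, delta)
    for i in range(n):
        own = reports[:, i]
        others = np.delete(reports, i, axis=1).ravel()
        if ks_distance(own, others) >= dt / 2:
            return "Reject"
    return "Accept"
```

In Algorithm 1, this check would be run every round on the reports accumulated so far, and the mechanism terminates upon a Reject.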

Our main results are the following guarantees on incentive compatibility and regret of our online allocation algorithm.

Theorem 1 (Approximate-BIC).

Algorithm 1 is (O(\sqrt{nT\log(nT/\delta)}),\delta)-approximate BIC.

Since truthful reporting constitutes an approximate equilibrium, it is reasonable to then assume that agents will act truthfully. We show the following individual regret bound assuming truthfulness.

Theorem 2 (Individual Regret).

Assuming all agents report their valuations truthfully, then under the online allocation mechanism given by Algorithm 1, with probability 1-\delta, every agent i's individual regret can be bounded as:

\text{Regret}_{i}(T)\leq\frac{4\sqrt{2}}{\sqrt{2}-1}\sqrt{nT\log\left(\frac{4n\log_{2}T+nT}{\delta}\right)}\,\bar{x}=O(\sqrt{nT\log(nT/\delta)})

Showing approximate incentive compatibility, and then guaranteeing regret under the assumption of truthfulness, is an approach commonly seen in the online mechanism design literature (e.g. Theorem 4 in [17]). In the next section, we describe the high level proof ideas for the main results above.

5 Proof ideas

Proof ideas for Theorem 1

We establish that the mechanism is approximately BIC by showing that no single agent has an incentive to significantly deviate from reporting true valuations if all the other agents are truthful. The proof consists of two parts. In Step 1, we prove that any significant deviation from the truth can be detected and will lead the mechanism to terminate. In Steps 2 and 3, we prove that in order to achieve a significant gain in utility, an agent indeed has to report values that significantly deviate from the truth.

Step 1

Assuming that there is only one (unidentified) strategic agent while all the other agents are truthful, we first show that if the detector does not trigger a violation by time t, then with high probability the empirical distribution of valuations reported by the strategic agent is no more than O(1/\sqrt{t}) away from the true distribution (Lemma 3). The key observation here is that since the agents' valuations are i.i.d., we can compare their reported values to detect if any single agent's distribution is significantly different from everyone else's. A technical challenge in making statistical comparisons here is that the strategic agent can adaptively change their reporting strategy over time based on the realized outcomes. Therefore, we derive a novel martingale version of the DKW inequality to show concentration of the empirical distribution relative to the true underlying distribution.

Step 2

In a given round, given the history, the mechanism's allocation policy is a fixed greedy allocation policy with parameter \lambda. If the distribution of the strategic agent's reported values differs from the true distribution by at most \Delta, then the agent's expected utility gain in that round, compared to reporting truthfully, is at most O(\Delta) (see Lemma 4).

Step 3

If over t rounds, the distribution of the strategic agent's reported values is at most \Delta away from the true distribution, then the learning algorithm will, with high probability, find an allocation policy that is at most O(\sqrt{n}\Delta) away (in terms of individual utility) from what it would have learned if all the agents were truthful (see Lemma 2).

To understand the significance of and distinction between the results in Step 2 vs. Step 3, note that a strategic agent has two separate ways to gain utility. The first is to report valuations in a way that the agent immediately wins more/better items under the central planner's current allocation policy. However, since the central planner is updating its allocation policy over time, the strategic agent can also misreport in a way that benefits its future utility, by "tricking" the central planner into learning a policy that favors him later on. Together, Step 2 and Step 3 show that the agent cannot gain a significant advantage over being truthful in either manner.

In many existing works on online auction mechanism design, where the central planner dynamically adjusts the reserve price over time, these two types of strategic behavior are in conflict: the agent either sacrifices future utility to gain immediate utility, or sacrifices near-term utility for future utility. The results in those settings therefore often rely on this observation to show approximate incentive compatibility. In our case, however, since there is no money involved, it is not clear whether such a conflict between short and long term utility exists. Nonetheless, we are able to bound the agents' ability to strategize. Step 2 bounds the agent's short term incentive to be strategic, whereas Step 3 bounds the longer term incentive to be strategic. Combining these steps gives us a proof of Theorem 1.

Proof ideas for Theorem 2

Recall that here we assume all agents are truthful. The proof involves two main steps.

Step 1

We show that uniformly for any t=1,\ldots,T, with high probability, the empirical distribution constructed from the first t samples is close (within a distance of \tilde{O}(1/\sqrt{nt})) to the true distribution F. Here, the factor of 1/\sqrt{n} comes from the fact that in each round we observe n independent samples from the value distribution, one from each of the agents. This also implies that if all the agents are reporting truthfully, then, with high probability, the detector will not falsely trigger.

Step 2

We show that the allocation policy learned under the empirical distribution estimated from the samples is close to the optimal allocation policy (Lemma 2). Specifically, after t rounds, if the empirical CDF is at most O(1/\sqrt{nt}) away from the true distribution, then each agent's expected utility in one round under the learned allocation policy is at most \sqrt{n/t} away (both from above and from below) from the optimal. By using an epoch based approach we can then show that each agent's individual regret is, with high probability, bounded by O(\sqrt{nT}) over the entire planning horizon T.
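To see where the \sqrt{T} scaling (and the \sqrt{2}/(\sqrt{2}-1) factor in Theorem 2) comes from, here is a rough accounting (ours, suppressing constants and writing \log(\cdot) for the log factor): the policy used in epoch k is trained on roughly 2^{k} rounds, i.e., n\cdot 2^{k} pooled samples, so by the DKW bound and Lemma 2 the per-round suboptimality in that epoch is of order \bar{x}\sqrt{n\log(\cdot)/2^{k}}, and the epoch has length 2^{k}. Summing over the \log_{2}T epochs,

\sum_{k=0}^{\log_{2}T}2^{k}\cdot\bar{x}\sqrt{\frac{n\log(\cdot)}{2^{k}}}=\bar{x}\sqrt{n\log(\cdot)}\sum_{k=0}^{\log_{2}T}2^{k/2}\leq\frac{\sqrt{2}}{\sqrt{2}-1}\,\bar{x}\sqrt{nT\log(\cdot)}=O\left(\sqrt{nT\log(\cdot)}\right),

using the geometric-sum bound \sum_{k=0}^{K}2^{k/2}\leq\frac{2^{(K+1)/2}}{\sqrt{2}-1} with 2^{K}=T.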

6 Proof details

We will now outline our proof in more detail. All missing proofs can be found in the Appendix. First we state the following martingale variation of the well-known Dvoretzky-Kiefer-Wolfowitz (DKW) inequality [11]. This is critical when dealing with strategic agents, as they can adapt their strategy over time, resulting in non-independent (reported) values.

Lemma 1 (Martingale Version of DKW Inequality).

Given a sequence of random variables Y_{1},\ldots,Y_{T}, let \mathcal{F}_{t}=\sigma(Y_{1},\ldots,Y_{t}), t=1,\ldots,T be the filtration representing the information in the first t variables. Let F_{t}(y):=\Pr(Y_{t}\leq y|\mathcal{F}_{t-1}), and \bar{F}_{T}(y):=\frac{1}{T}\sum_{t=1}^{T}\mathbbm{1}[Y_{t}\leq y]. Then,

\mathbb{P}\left(\sup_{y}\left|\bar{F}_{T}(y)-\frac{1}{T}\sum_{t=1}^{T}F_{t}(y)\right|\geq\alpha\right)\leq\left(\frac{128eT}{\alpha}\right)e^{-T\alpha^{2}/128}

Next we introduce new notation to denote the fraction of allocations that agent j receives under the greedy allocation policy with parameter \lambda and valuation distribution \bm{F}:

p_{j}(\bm{F},\lambda):=\mathbb{P}_{\bm{X}\sim\bm{F}}(\bm{X}\in\mathbb{L}_{j}(\lambda)).

We start by proving Theorem 2, as we will use it to prove Theorem 1 later.

6.1 Individual Regret Bound (Theorem 2)

In Algorithm 1, the allocation policy is trained on the empirical distribution constructed from samples. We want to show that the difference between the empirical and the population distribution will not impair the performance of the resulting allocation policy too much.

Lemma 2.

Let \bm{G}=G^{1}\otimes\ldots\otimes G^{n} and \bm{F}=F^{1}\otimes\ldots\otimes F^{n} be two distributions over [0,\bar{x}]^{n} where the marginals on each coordinate are independent. Suppose \sup_{x}|F^{i}(x)-G^{i}(x)|\leq\Delta\ \forall i. Let \lambda=\lambda^{*}(\bm{G}) and \lambda^{*}=\lambda^{*}(\bm{F}). Then

\left|\mathbb{E}_{\bm{X}\sim\bm{F}}[u_{i}(\bm{X},\bm{X},\lambda)]-\mathbb{E}_{\bm{X}\sim\bm{F}}[u_{i}(\bm{X},\bm{X},\lambda^{*})]\right|\leq n\Delta\bar{x}
Proof of Theorem 2

Now we have the main ingredients for Theorem 2. We use the DKW inequality to show that the empirical distribution constructed in (7) is close to the true distribution w.h.p. Then we use Lemma 2 to show that the allocation policy selected by the learner based on the empirical distribution is almost optimal in expectation. The details can be found in Appendix B.

6.2 Approximate-Bayesian Incentive Compatibility (Theorem 1)

Theorem 2 says that the online utility of each agent cannot be too far below the offline optimum if everyone behaves truthfully. In order to show approximate-BIC, it suffices to show that a strategic agent cannot gain too much more than the offline optimum. To do so, we need to bound both the short term and the longer term incentives for the agent to be strategic.

6.2.1 Short term incentive

We start with bounding the short term strategic incentive. We first show that if the agent reports from an average distribution that is very different from the truthful distribution, then with high probability Algorithm 2 can detect that. Note that given the strategic agent's strategy in a given round, his reported value is drawn from a distribution potentially different from F.

Lemma 3.

Fix a time step t. Let \Delta=64\sqrt{\frac{\log(\frac{256et}{\delta})}{t}}. Let F_{s}, s=1,\ldots,t be the strategic agent's reported value distributions in each time step given the history, i.e., F_{s}(x):=\mathbb{P}(\tilde{X}_{i,s}\leq x|{\cal H}_{s}). If the average distribution \bar{F}=\frac{1}{t}\sum_{s=1}^{t}F_{s} is such that \sup_{x}|\bar{F}(x)-F(x)|\geq\Delta, then Algorithm 1 will terminate at or before time t with probability at least 1-\delta.

Next, we show that if the agent restricts the reported distribution to not deviate more than \Delta from the true distribution (so that the deviation may go undetected by the detection algorithm), then the potential gain in the agent's utility compared to truthful reporting is upper bounded by \bar{x}\Delta. This bounds the agent's incentive to be strategic.

Lemma 4.

Fix a round t and a single strategic agent i, so that the remaining agents are truthful, i.e., \tilde{X}_{j,t}=X_{j,t}\ \forall j\neq i. Let F_{r}(\cdot) denote the marginal distribution of the value \tilde{X}_{i,t} reported by the strategic agent i at time t conditional on the history, i.e.,

F_{r}(x):=\mathbb{P}(\tilde{X}_{i,t}\leq x|{\cal H}_{t}).

Suppose that \sup_{x}|F(x)-F_{r}(x)|\leq\Delta. Then, at any time t, given any greedy allocation policy \lambda,

\mathbb{E}[u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\lambda)|{\cal H}_{t}]-\mathbb{E}[u_{i}(\bm{X}_{t},\bm{X}_{t},\lambda)]\leq\bar{x}\Delta

Note that F_{r} specifies only the marginal distribution of \tilde{X}_{i,t}|\mathcal{H}_{t} and not the joint distribution of (\tilde{\bm{X}}_{t},\bm{X}_{t})|\mathcal{H}_{t}. Indeed, the above lemma claims that the given bound on utility gain holds for all possible joint distributions as long as the marginal F_{r} of \tilde{X}_{i,t}|\mathcal{H}_{t} is at most \Delta away from F.

Intuitively, Lemma 3 and Lemma 4 together bound the agent's short term incentive to be strategic: if the agent deviates from the truthful distribution too much, then the mechanism will terminate early and the agent will lose out on all the future utility (Lemma 3); and given any greedy allocation policy set by the central planner, if the agent deviates within the undetectable range of Algorithm 2, then the gain in utility compared to acting truthfully is small (Lemma 4). Next, we bound an agent's incentive to lie in order to make the mechanism learn a suboptimal greedy allocation policy that is more favorable to the agent.

6.2.2 Long term incentive

In order to bound the longer term incentive to be strategic, we want to show that despite agent i being strategic, the central planner can still learn an allocation policy that closely approximates the offline optimal allocation policy. This means that the agent's influence over the central planner's allocation policies is limited.

Lemma 5.

Fix a round T^{\prime}\leq T and a strategic agent i. If agent i is the only one being strategic, and Algorithm 2 has not been triggered by the end of time T^{\prime}, then with probability 1-\delta, \hat{\lambda}\coloneqq\lambda^{*}(\hat{\bm{F}}_{T^{\prime}}) satisfies

\mathbb{E}[u_{i}(\bm{X},\bm{X},\hat{\lambda})]-\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]\leq n\Delta_{T^{\prime}}\bar{x}

where \Delta_{T^{\prime}}=81\sqrt{\frac{1}{nT^{\prime}}\log(\frac{256eT^{\prime}}{\delta})} and \lambda^{*}=\lambda^{*}(\bm{F}).

In particular, consider T^{\prime}=L_{k}. Then the lemma above shows that if an agent was not kicked out by the end of epoch k-1, then with high probability the greedy allocation policy in epoch k will be such that the agent's expected utility from being truthful is close to what he would have received under the offline optimal solution (Lemma 5). We can now combine this result with Lemma 3 and Lemma 4 to bound the utility that any single strategic agent can gain over the entire trajectory.

Lemma 6.

If agent i is the only one being strategic, then with probability 1-\delta, agent i's online utility is upper bounded by

\sum_{t=1}^{T}u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\tilde{\lambda}_{k_{t}})\leq T\,\mathbb{E}\left[u_{i}(\bm{X},\bm{X},\lambda^{*})\right]+\frac{286\sqrt{2}}{\sqrt{2}-1}\sqrt{nT\log\left(\frac{256e\log_{2}T}{\delta}\right)}\,\bar{x}

Here k_{t} denotes the epoch that time step t lies in, and \tilde{\lambda}_{k_{t}} denotes the allocation policy used by the central planner in that epoch.

Proof of Theorem 1

We have already proven Theorem 2, which bounds the individual regret defined as the difference T\,\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]-\sum_{t=1}^{T}u_{i}(\bm{X}_{t},\bm{X}_{t},\lambda_{k_{t}}), i.e., the difference between the utility of agent i under the offline optimal policy and that under the allocation policy learned by the algorithm when all the agents are truthful. The proof of Theorem 1 follows from plugging the upper bound on T\,\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})] from this theorem into Lemma 6, to obtain the desired bound on the expression \sum_{t=1}^{T}u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\tilde{\lambda}_{k_{t}})-\sum_{t=1}^{T}u_{i}(\bm{X}_{t},\bm{X}_{t},\lambda_{k_{t}}), i.e., on the total gain in utility achievable by misreporting under our mechanism. Further details are in Appendix C.

7 Limitations and Future Directions

Although our goal is to develop mechanisms that are robust to selfish and strategic agents, real applications often involve bad-faith actors that have extrinsic motivation to behave adversarially. As such, deployment of such resource allocation mechanisms in critical applications requires significant additional validation. In future work we would like to explore relaxing the i.i.d. assumption that we place on the distribution of valuations across agents. This is a natural relaxation because if one agent thinks an item is good, then it is likely that other agents would like the item as well. Furthermore, it is conceivable that agents are heterogeneous and thus have different value distributions for the items. However, this seems to require a completely different strategy for detecting and disincentivizing strategic behavior, as we can no longer catch the strategic agent by comparing each agent's reported distribution with that of the others.

Acknowledgement

This work was supported in part by an NSF CAREER award [CMMI 1846792] awarded to author S. Agrawal.

References

  • [1] Shipra Agrawal and Nikhil R Devanur “Fast algorithms for online stochastic convex programming” In Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms, 2014, pp. 1405–1424 SIAM
  • [2] Shipra Agrawal, Zizhuo Wang and Yinyu Ye “A Dynamic Near-Optimal Algorithm for Online Linear Programming” In Operations Research 62.4 Linthicum, MD, USA: INFORMS, 2014, pp. 876–890
  • [3] Kareem Amin, Afshin Rostamizadeh and Umar Syed “Repeated contextual auctions with strategic buyers” In Advances in Neural Information Processing Systems 27, 2014
  • [4] Aude Genevay, Marco Cuturi, Gabriel Peyré and Francis Bach “Stochastic optimization for large-scale optimal transport” In arXiv preprint arXiv:1605.08527, 2016
  • [5] Santiago Balseiro, Haihao Lu and Vahab Mirrokni “Regularized online allocation problems: Fairness and beyond” In International Conference on Machine Learning, 2021, pp. 630–639 PMLR
  • [6] Santiago Balseiro, Haihao Lu and Vahab Mirrokni “The best of many worlds: Dual mirror descent for online allocation problems” In arXiv preprint arXiv:2011.10124, 2020
  • [7] Santiago R Balseiro, Huseyin Gurkan and Peng Sun “Multiagent mechanism design without money” In Operations Research 67.5 INFORMS, 2019, pp. 1417–1436
  • [8] Richard Cole, Vasilis Gkatzelis and Gagan Goel “Mechanism design for fair division: allocating divisible items without payments” In Proceedings of the fourteenth ACM conference on Electronic commerce, 2013, pp. 251–268
  • [9] Richard Cole, Vasilis Gkatzelis and Gagan Goel “Positive results for mechanism design without money”, 2013
  • [10] Nikhil R. Devanur, Kamal Jain, Balasubramanian Sivan and Christopher A. Wilkens “Near Optimal Online Algorithms and Fast Approximation Algorithms for Resource Allocation Problems” In J. ACM 66.1 New York, NY, USA: Association for Computing Machinery, 2019 DOI: 10.1145/3284177
  • [11] Aryeh Dvoretzky, Jack Kiefer and Jacob Wolfowitz “Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator” In The Annals of Mathematical Statistics JSTOR, 1956, pp. 642–669
  • [12] Negin Golrezaei, Adel Javanmard and Vahab Mirrokni “Dynamic incentive-aware learning: Robust pricing in contextual auctions” In Advances in Neural Information Processing Systems 32, 2019
  • [13] Artur Gorokh, Siddhartha Banerjee and Krishnamurthy Iyer “From monetary to nonmonetary mechanism design via artificial currencies” In Mathematics of Operations Research INFORMS, 2021
  • [14] Mingyu Guo and Vincent Conitzer “Strategy-proof allocation of multiple items between two agents without payments or priors.” In AAMAS, 2010, pp. 881–888 Citeseer
  • [15] Mingyu Guo, Vincent Conitzer and Daniel M Reeves “Competitive repeated allocation without payments” In International Workshop on Internet and Network Economics, 2009, pp. 244–255 Springer
  • [16] Li Han, Chunzhi Su, Linpeng Tang and Hongyang Zhang “On strategy-proof allocation without payments or priors” In International Workshop on Internet and Network Economics, 2011, pp. 182–193 Springer
  • [17] Yash Kanoria and Hamid Nazerzadeh “Incentive-compatible learning of reserve prices for repeated auctions” In Operations Research 69.2 INFORMS, 2021, pp. 509–524
  • [18] Ariel D Procaccia and Moshe Tennenholtz “Approximate mechanism design without money” In ACM Transactions on Economics and Computation (TEAC) 1.4 ACM New York, NY, USA, 2013, pp. 1–26
  • [19] Alexander Rakhlin, Karthik Sridharan and Ambuj Tewari “Sequential complexities and uniform martingale laws of large numbers” In Probability Theory and Related Fields 161.1-2 Springer, 2015, pp. 111–153

Appendix A Concentration Results

Lemma 7 (DKW Inequality [11]).

Given i.i.d. samples X_{1},\ldots,X_{T} from a distribution F (CDF), let \hat{F}_{T}(x)=\frac{1}{T}\sum_{t=1}^{T}\mathbbm{1}[X_{t}\leq x]. Then,

\mathbb{P}\left(\sup_{x}\left|\hat{F}_{T}(x)-F(x)\right|\geq\alpha\right)\leq 2e^{-2T\alpha^{2}}

A.1 Proof of Lemma 1

Lemma 1 (restated).

Proof.

This follows from sequential uniform convergence, see Lemmas 10 and 11 in [19], and the fact that indicator functions have fat-shattering dimension 1. ∎

A more convenient way to use Lemma 1 is the following corollary:

Corollary 1.

Given a sequence of random variables Y_{1},\ldots,Y_{T}, let \mathcal{F}_{t}=\sigma(Y_{1},\ldots,Y_{t}), t=1,\ldots,T be the filtration representing the information in the first t variables. Suppose F_{t}(y)=\Pr(Y_{t}\leq y|\mathcal{F}_{t-1}), and let \bar{F}_{T}(y):=\frac{1}{T}\sum_{t=1}^{T}\mathbbm{1}[Y_{t}\leq y]. If \alpha\geq 16\sqrt{\frac{\log(\frac{128eT}{\delta})}{T}}, then with probability 1-\delta

\sup_{x}\left|\bar{F}_{T}(x)-\frac{1}{T}\sum_{t=1}^{T}F_{t}(x)\right|\leq\alpha
Proof.
\left(\frac{128eT}{\alpha}\right)e^{-T\alpha^{2}/128}\leq\delta
\iff\quad\alpha^{2}\geq\frac{128\log(\frac{128eT}{\delta})}{T}+\frac{128}{T}\log\left(\frac{1}{\alpha^{2}}\right)
\impliedby\quad\alpha^{2}\geq\frac{256\log(\frac{128eT}{\delta})}{T}
\iff\quad\alpha\geq 16\sqrt{\frac{\log(\frac{128eT}{\delta})}{T}}

Appendix B Proof of Theorem 2

B.1 Proof of Lemma 2

We first state two helper claims. The first one states that for any fixed greedy allocation policy \lambda, if the two distributions of valuations are similar, then the final allocation sizes for each receiver will also be close.

Claim 1.

Fix a greedy allocation policy \lambda. Let \bm{G}=G^{1}\otimes\ldots\otimes G^{n} and \bm{F}=F^{1}\otimes\ldots\otimes F^{n} be two distributions over [0,\bar{x}]^{n} where the marginals in each coordinate are independent. Suppose \sup_{x}|F^{i}(x)-G^{i}(x)|\leq\Delta\ \forall i. Then

\sum_{j}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{G},\lambda))^{+}=\sum_{j}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+}\leq n\Delta.

Next we show that if the allocation sizes are similar for two different greedy allocation policies, then the corresponding allocation decisions (domain partitions) are also similar.

Claim 2.

Let \lambda^{\prime},\lambda be any two fixed greedy allocation policies, and \bm{F} a distribution over [0,\bar{x}]^{n}. For all j, let \Omega_{j}^{\prime}=\mathbb{L}_{j}(\lambda^{\prime}) and \Omega_{j}=\mathbb{L}_{j}(\lambda). Suppose \sum_{j}(p_{j}(\bm{F},\lambda^{\prime})-p_{j}(\bm{F},\lambda))^{+}=\sum_{j}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{F},\lambda^{\prime}))^{+}\leq\Delta. Then

\mathbb{P}(\Omega^{\prime}_{j}\setminus\Omega_{j})\leq\Delta\quad\forall j

Using these two Claims, we can now prove Lemma 2. The proofs for these two helper Claims follow after the proof of Lemma 2.

Proof of Lemma 2.

Claim 1 shows that

\sum_{j}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{G},\lambda))^{+}=\sum_{j}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+}\leq n\Delta

Note that by definition of \lambda^{*} (due to the constraint \Pr(\Omega_{j})=p_{j}^{*} in problem (1)), we have p_{j}(\bm{G},\lambda)=p^{*}_{j}=p_{j}(\bm{F},\lambda^{*}) for all j. This means that

\sum_{j}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{F},\lambda^{*}))^{+}=\sum_{j}(p_{j}(\bm{F},\lambda^{*})-p_{j}(\bm{F},\lambda))^{+}\leq n\Delta

Now we can apply Claim 2 and conclude that

\mathbb{P}(\Omega^{*}_{j}\setminus\Omega_{j})\leq n\Delta\quad\forall j,\qquad\text{and}\qquad\mathbb{P}(\Omega_{j}\setminus\Omega^{*}_{j})\leq n\Delta\quad\forall j, (8)

where \Omega_{j}=\mathbb{L}_{j}(\lambda) and \Omega^{*}_{j}=\mathbb{L}_{j}(\lambda^{*}). Therefore,

\mathbb{E}_{\bm{X}\sim\bm{F}}[u_{i}(\bm{X},\bm{X},\lambda)]-\mathbb{E}_{\bm{X}\sim\bm{F}}[u_{i}(\bm{X},\bm{X},\lambda^{*})]
=\int_{\bm{X}\in\Omega_{i}}X_{i}\,d\bm{F}(\bm{X})-\int_{\bm{X}\in\Omega^{*}_{i}}X_{i}\,d\bm{F}(\bm{X})
\leq\int_{\bm{X}\in\Omega_{i}\setminus\Omega^{*}_{i}}X_{i}\,d\bm{F}(\bm{X})
\leq n\Delta\bar{x}

Using the same steps as above we can also show that

\mathbb{E}_{\bm{X}\sim\bm{F}}[u_{i}(\bm{X},\bm{X},\lambda^{*})]-\mathbb{E}_{\bm{X}\sim\bm{F}}[u_{i}(\bm{X},\bm{X},\lambda)]\leq n\Delta\bar{x}.

B.1.1 Proof of Claim 1

We first show a variant of Claim 1 where the two distributions differ only in one coordinate:

Claim 3.

Fix a greedy allocation policy \lambda. Let \bm{G}=G^{1}\otimes G^{2}\otimes\ldots\otimes G^{n} and \bm{F}=F^{1}\otimes F^{2}\otimes\ldots\otimes F^{n} be two distributions over [0,\bar{x}]^{n} where the marginals in each coordinate are independent. Assume that \bm{G} and \bm{F} differ only in one coordinate, w.l.o.g. say coordinate i. Then, if \sup_{x}|F^{i}(x)-G^{i}(x)|\leq\Delta,

\sum_{j}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{G},\lambda))^{+}=\sum_{j}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+}\leq\Delta.
Proof.

We start with the distribution \bm{G}=F^{1}\otimes\cdots\otimes F^{i-1}\otimes G^{i}\otimes F^{i+1}\otimes\cdots\otimes F^{n} and replace G^{i} with a distribution F^{i} to construct \bm{F}=F^{1}\otimes F^{2}\otimes\cdots\otimes F^{n}. We will construct F^{i} in such a way that it is at most \Delta away from G^{i} and the changes in the allocation proportions are maximized. Note that since \sum_{i}p_{i}(\bm{F},\lambda)=1 and \sum_{i}p_{i}(\bm{G},\lambda)=1, we know that

LHS(\bm{F})\coloneqq\sum_{j}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{G},\lambda))^{+}=\sum_{j}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+}\eqqcolon RHS(\bm{F})

is always true for any \bm{F},\bm{G},\lambda. This means that we can focus on maximizing either the LHS or the RHS of the above equation. There are two types of F^{i} that we can use: one is such that p_{i}(\bm{F},\lambda)-p_{i}(\bm{G},\lambda)\geq 0 and the other is such that p_{i}(\bm{F},\lambda)-p_{i}(\bm{G},\lambda)<0. We can therefore bound the above quantity under these two scenarios separately:

\max_{F^{i}:p_{i}(\bm{F},\lambda)-p_{i}(\bm{G},\lambda)\geq 0}RHS(\bm{F})\iff\max_{F^{i}:p_{i}(\bm{F},\lambda)-p_{i}(\bm{G},\lambda)\geq 0}\sum_{j\neq i}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+} (9)
\max_{F^{i}:p_{i}(\bm{F},\lambda)-p_{i}(\bm{G},\lambda)<0}LHS(\bm{F})\iff\max_{F^{i}:p_{i}(\bm{F},\lambda)-p_{i}(\bm{G},\lambda)<0}\sum_{j\neq i}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{G},\lambda))^{+} (10)

Therefore for the rest of the proof we can focus on bounding the right hand side of (9) and (10).

Bounding the RHS of (9)

Let \tilde{F}(x)\coloneqq(G^{i}(x)-\Delta)^{+}\ \forall x<\bar{x}, \tilde{F}(\bar{x})\coloneqq 1. We claim that the F^{i} that maximizes \sum_{j\neq i}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+}, while being at most \Delta away, is \tilde{F}. To see this, consider a different distribution F^{\prime} on the support [0,\bar{x}] such that \sup_{x}|F^{\prime}(x)-G^{i}(x)|\leq\Delta. We know that F^{\prime}(x)\geq\tilde{F}(x).

Later in Claim 4, we show that for any two distributions G and F, we can sample X\sim F using Y sampled from G by performing the following transformation:

F^{-1}(G^{u}(Y))

where G^{u} is the random function defined for distribution G in (19) and F^{-1}(p)\coloneqq\inf\{x\in\mathbb{R}:F(x)\geq p\} denotes the generalized inverse, sometimes also referred to as the quantile function. This is essentially the inverse CDF method applied to a general distribution (instead of a uniformly sampled variable). In particular, let G^{iu} be the following random function:

G^{iu}(y)=\begin{cases}G^{i}(y)&\text{ if }G^{i}(y)=G^{i}(y_{-})\\ \text{Uniform}[G^{i}(y_{-}),G^{i}(y)]&\text{ if }G^{i}(y)>G^{i}(y_{-})\end{cases}

Now, denote by \tilde{\bm{F}},\bm{F}^{\prime} the joint distributions that we get from \bm{G} on replacing G^{i} with \tilde{F} and F^{\prime}, respectively. Then, the winning probabilities for the agents in these two cases are:

p_{i}(\tilde{\bm{F}},\lambda)=\int_{0}^{\bar{x}}\prod_{j\neq i}F^{j}(\tilde{x}+\lambda_{i}-\lambda_{j})\,d\tilde{F}(\tilde{x})=\int_{0}^{\bar{x}}\mathbb{E}_{G^{iu}(x)}\left[\prod_{j\neq i}F^{j}(\tilde{F}^{-1}(G^{iu}(x))+\lambda_{i}-\lambda_{j})\right]dG^{i}(x), (11)
p_{j}(\tilde{\bm{F}},\lambda)=\int_{x}\tilde{F}(x+\lambda_{j}-\lambda_{i})\prod_{k\not\in\{i,j\}}F^{k}(x+\lambda_{j}-\lambda_{k})\,dF^{j}(x),\quad\forall j\neq i (12)
p_{i}(\bm{F}^{\prime},\lambda)=\int_{x}\mathbb{E}_{G^{iu}(x)}\left[\prod_{j\neq i}F^{j}(F^{\prime-1}(G^{iu}(x))+\lambda_{i}-\lambda_{j})\right]dG^{i}(x), (13)
p_{j}(\bm{F}^{\prime},\lambda)=\int_{x}F^{\prime}(x+\lambda_{j}-\lambda_{i})\prod_{k\not\in\{i,j\}}F^{k}(x+\lambda_{j}-\lambda_{k})\,dF^{j}(x),\quad\forall j\neq i (14)

Since \tilde{F}(x)\leq F^{\prime}(x)\ \forall x\in[0,\bar{x}] by construction, \tilde{F}^{-1}(p)\geq F^{\prime-1}(p)\ \forall p\in[0,1]. It is easy to see that (11) \geq (13), and (12) \leq (14). Using this we have

\sum_{j\neq i}(p_{j}(\bm{G},\lambda)-p_{j}(\tilde{\bm{F}},\lambda))^{+}\geq\sum_{j\neq i}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F}^{\prime},\lambda))^{+}.

and substituting F^{\prime} by G^{i} and again using (11) \geq (13) and (12) \leq (14), we get

p_{j}(\bm{G},\lambda)-p_{j}(\tilde{\bm{F}},\lambda)\geq 0\quad\forall j\neq i,
p_{i}(\bm{G},\lambda)-p_{i}(\tilde{\bm{F}},\lambda)\leq 0

This shows that \tilde{F} is the maximizer of the RHS of (9) among all distributions that are at most \Delta away from G^{i}. Using this we have

\max_{F^{i}:p_{i}(\bm{F},\lambda)-p_{i}(\bm{G},\lambda)\geq 0}\sum_{j\neq i}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+}
=\sum_{j\neq i}(p_{j}(\bm{G},\lambda)-p_{j}(\tilde{\bm{F}},\lambda))^{+}
=p_{i}(\tilde{\bm{F}},\lambda)-p_{i}(\bm{G},\lambda)
=\int_{0}^{\bar{x}}\prod_{j\neq i}F^{j}(x+\lambda_{i}-\lambda_{j})\,d\tilde{F}(x)-\int_{0}^{\bar{x}}\prod_{j\neq i}F^{j}(x+\lambda_{i}-\lambda_{j})\,dG^{i}(x)
=\int_{x_{\Delta}}^{\bar{x}}\prod_{j\neq i}F^{j}(x+\lambda_{i}-\lambda_{j})\,dG^{i}(x)+\prod_{j\neq i}F^{j}(\bar{x}+\lambda_{i}-\lambda_{j})\,\Delta-\int_{0}^{\bar{x}}\prod_{j\neq i}F^{j}(x+\lambda_{i}-\lambda_{j})\,dG^{i}(x)
\leq\Delta
Bounding the RHS of (10)

Let \hat{F}(x)\coloneqq\min(G^{i}(x)+\Delta,1). Then we can use the same steps as above for the LHS to show that \hat{F}(x) maximizes \sum_{j\neq i}(p_{j}(\hat{\bm{F}},\lambda)-p_{j}(\bm{G},\lambda))^{+}, and that

pj(𝑮,λ)pj(𝑭^,λ)0jip_{j}(\bm{G},\lambda)-p_{j}(\hat{\bm{F}},\lambda)\leq 0\,\forall j\neq i
pi(𝑮,λ)pi(𝑭^,λ)0p_{i}(\bm{G},\lambda)-p_{i}(\hat{\bm{F}},\lambda)\geq 0

This shows that F^\hat{F} is the maximizer of the RHS of (10). From there, we have

maxFi:pi(𝑭,λ)pi(𝑮,λ)<0ji(pj(𝑭,λ)pj(𝑮,λ))+\displaystyle\max_{F^{i}:p_{i}(\bm{F},\lambda)-p_{i}(\bm{G},\lambda)<0}\sum_{j\neq i}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{G},\lambda))^{+}
=\displaystyle= j(pj(𝑭^,λ)pj(𝑮,λ))+\displaystyle\sum_{j}(p_{j}(\hat{\bm{F}},\lambda)-p_{j}(\bm{G},\lambda))^{+}
=\displaystyle= pi(𝑮,λ)pi(𝑭^,λ)\displaystyle p_{i}(\bm{G},\lambda)-p_{i}(\hat{\bm{F}},\lambda)
=\displaystyle= 0x¯jiFj(x+λiλj)dGi(x)0x¯jiFj(x+λiλj)dF^(x)\displaystyle\int_{0}^{\bar{x}}\prod_{j\neq i}F^{j}(x+\lambda_{i}-\lambda_{j})dG^{i}(x)-\int_{0}^{\bar{x}}\prod_{j\neq i}F^{j}\left(x+\lambda_{i}-\lambda_{j}\right)d\hat{F}(x)
=\displaystyle= \int_{0}^{\bar{x}}\prod_{j\neq i}F^{j}(x+\lambda_{i}-\lambda_{j})dG^{i}(x)-\Delta\prod_{j\neq i}F^{j}\left(\lambda_{i}-\lambda_{j}\right)-\int_{0}^{x^{\Delta}}\prod_{j\neq i}F^{j}\left(x+\lambda_{i}-\lambda_{j}\right)dG^{i}(x)
\displaystyle\leq \int_{0}^{\bar{x}}\prod_{j\neq i}F^{j}(x+\lambda_{i}-\lambda_{j})dG^{i}(x)-\int_{0}^{x^{\Delta}}\prod_{j\neq i}F^{j}\left(x+\lambda_{i}-\lambda_{j}\right)dG^{i}(x)
\displaystyle\leq Δ\displaystyle\Delta

where x^{\Delta}=(G^{i})^{-1}(1-\Delta). ∎

Using Claim 3, we can now easily prove the original Claim 1.

Proof of Claim 1.

First we construct the following sequence of distributions, in which any two adjacent distributions differ in only one coordinate.

𝑮0\displaystyle\bm{G}_{0} =𝑮=G1Gn,\displaystyle=\bm{G}=G^{1}\otimes\ldots\otimes G^{n},
𝑮1\displaystyle\bm{G}_{1} =F1G2Gn,\displaystyle=F^{1}\otimes G^{2}\otimes\ldots\otimes G^{n},
𝑮2\displaystyle\bm{G}_{2} =F1F2G3Gn,\displaystyle=F^{1}\otimes F^{2}\otimes G^{3}\otimes\ldots\otimes G^{n},
\displaystyle\ldots
𝑮n\displaystyle\bm{G}_{n} =𝑭.\displaystyle=\bm{F}.

Then we can decompose the difference between p(𝑭,λ)p(\bm{F},\lambda) and p(\bm{G},\lambda) into a sum of differences:

p(𝑭,λ)p(𝑮,λ)1\displaystyle||p(\bm{F},\lambda)-p(\bm{G},\lambda)||_{1} =p(𝑮n,λ)p(𝑮0,λ)1\displaystyle=||p(\bm{G}_{n},\lambda)-p(\bm{G}_{0},\lambda)||_{1}
i=1np(𝑮i,λ)p(𝑮i1,λ)1\displaystyle\leq\sum_{i=1}^{n}||p(\bm{G}_{i},\lambda)-p(\bm{G}_{i-1},\lambda)||_{1}
2nΔ\displaystyle\leq 2n\Delta

where the last step follows from Claim 3. Since j(pj(𝑭,λ)pj(𝑮,λ))+=j(pj(𝑮,λ)pj(𝑭,λ))+=12p(𝑭,λ)p(𝑮,λ)1\sum_{j}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{G},\lambda))^{+}=\sum_{j}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+}=\frac{1}{2}||p(\bm{F},\lambda)-p(\bm{G},\lambda)||_{1}, we have

j(pj(𝑭,λ)pj(𝑮,λ))+=j(pj(𝑮,λ)pj(𝑭,λ))+nΔ\sum_{j}(p_{j}(\bm{F},\lambda)-p_{j}(\bm{G},\lambda))^{+}=\sum_{j}(p_{j}(\bm{G},\lambda)-p_{j}(\bm{F},\lambda))^{+}\leq n\Delta
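As a numerical sanity check (this is only an illustration, not part of the proof), the following Python sketch estimates the greedy allocation probabilities by Monte Carlo under a product distribution and under a coordinatewise perturbation of it whose CDFs are at most Δ\Delta away in sup-norm, and compares the resulting gap to the nΔn\Delta bound of Claim 1; the specific distributions and boost vector are our own illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, T, Delta = 3, 200000, 0.05
lam = np.array([0.0, 0.1, -0.05])      # a fixed vector of greedy boosts (illustrative)

# G: each agent's values are Uniform[0,1]; F: each coordinate's CDF is shifted
# down by at most Delta (realized by inflating the sampled values by Delta).
X_G = rng.uniform(0.0, 1.0, size=(T, n))
X_F = np.minimum(X_G + Delta, 1.0)     # sup_x |F^i(x) - G^i(x)| <= Delta for every i

def alloc_probs(X, lam):
    """Empirical probability that each agent wins under the greedy rule."""
    winners = np.argmax(X + lam, axis=1)
    return np.bincount(winners, minlength=X.shape[1]) / X.shape[0]

p_G = alloc_probs(X_G, lam)
p_F = alloc_probs(X_F, lam)
half_l1 = 0.5 * np.abs(p_F - p_G).sum()    # = sum_j (p_j(F) - p_j(G))^+
print(half_l1, "<=", n * Delta)            # Claim 1 predicts half_l1 <= n * Delta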

B.1.2 Proof of Claim 2

Proof.

WLOG, assume that \lambda_{1}-\lambda^{\prime}_{1}\geq\lambda_{2}-\lambda^{\prime}_{2}\geq\ldots\geq\lambda_{n}-\lambda^{\prime}_{n}. Let Qjk=ΩjΩkQ_{jk}=\Omega_{j}\cap\Omega^{\prime}_{k} be the “error flow” of items from jj to kk. It is easy to see that for k<jk<j, 𝑿j+λj>𝑿k+λk𝑿j+λj>𝑿k+λk\bm{X}_{j}+\lambda_{j}>\bm{X}_{k}+\lambda_{k}\implies\bm{X}_{j}+\lambda^{\prime}_{j}>\bm{X}_{k}+\lambda^{\prime}_{k}. Therefore Qjk=Q_{jk}=\emptyset for k<jk<j. Then it follows that

ΩjΩji:i<jQiji:i<jk:kjQik.\Omega^{\prime}_{j}\setminus\Omega_{j}\subseteq\bigcup\limits_{i:i<j}Q_{ij}\subseteq\bigcup\limits_{i:i<j}\bigcup\limits_{k:k\geq j}Q_{ik}.

The right-hand side above is the event that an item flows out of the group {i:i<j}\{i:i<j\}. Since Qjk=Q_{jk}=\emptyset for k<jk<j, no item flows into this group, so the probability of this event equals the group’s net outflow, i.e., the sum of the individual agents’ net outflows p_{i}(\bm{F},\lambda)-p_{i}(\bm{F},\lambda^{\prime}) over i<j. We can therefore bound the probability of the RHS by

\mathbb{P}\left(\bigcup\limits_{i:i<j}\bigcup\limits_{k:k\geq j}Q_{ik}\right)\leq\sum_{i:i<j}\left(p_{i}(\bm{F},\lambda)-p_{i}(\bm{F},\lambda^{\prime})\right)^{+}\leq\Delta ∎
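The key observation in this proof is easy to check numerically; the following small simulation (our own illustration, with arbitrary numbers) verifies that once the agents are sorted so that λjλj\lambda_{j}-\lambda^{\prime}_{j} is non-increasing, an item allocated to agent jj under λ\lambda is never re-allocated to a lower-indexed agent under λ\lambda^{\prime}, i.e., QjkQ_{jk} is empty for k<jk<j.

import numpy as np

rng = np.random.default_rng(1)
n, T = 4, 100000
lam  = np.array([0.30, 0.20, 0.05, 0.00])
lam2 = np.array([0.10, 0.05, 0.00, 0.02])   # lam - lam2 is non-increasing

X = rng.uniform(0.0, 1.0, size=(T, n))
w  = np.argmax(X + lam,  axis=1)            # winner of each item under lambda
w2 = np.argmax(X + lam2, axis=1)            # winner of each item under lambda'
print(np.any(w2 < w))                       # False: no item flows to a lower-indexed agent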

B.2 Proof of Theorem 2

Proof.

Fix an epoch kk, and let Δ=12n(Lk1)log(2δ)\Delta=\sqrt{\frac{1}{2n(L_{k}-1)}\log(\frac{2}{\delta})} and λ^=λ(𝑭^Lk1)\hat{\lambda}=\lambda^{*}(\hat{\bm{F}}_{L_{k}-1}).

(Lemma 7) supx|F^Lk1(x)F(x)|Δ\displaystyle\sup_{x}|\hat{F}_{L_{k}-1}(x)-F(x)|\leq\Delta w.p. 1δ/2\displaystyle\text{w.p. }1-\delta/2
(Lemma 2)\displaystyle\text{(Lemma~{}\ref{lem: utility bound from learning error})}\implies 𝔼[ui(𝑿,𝑿,λ)]𝔼[ui(𝑿,𝑿,λ^)]nΔx¯i\displaystyle\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]-\mathbb{E}[u_{i}(\bm{X},\bm{X},\hat{\lambda})]\leq n\Delta\bar{x}\quad\forall i w.p. 1δ/2\displaystyle\text{w.p. }1-\delta/2 (15)
\displaystyle(\text{Chernoff bound})\implies \mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})](L_{k+1}-L_{k})-\sum_{t=L_{k}}^{L_{k+1}-1}u_{i}(\bm{X}_{t},\bm{X}_{t},\hat{\lambda})
\displaystyle\leq nΔx¯(Lk+1Lk)+x¯(Lk+1Lk)2log(2δ)\displaystyle n\Delta\bar{x}(L_{k+1}-L_{k})+\bar{x}\sqrt{\frac{(L_{k+1}-L_{k})}{2}\log(\frac{2}{\delta})} w.p. 1δ\displaystyle\text{w.p. }1-\delta
\displaystyle\leq \sqrt{2^{k}n\log(\frac{2}{\delta})}\bar{x}+\sqrt{2^{k-1}\log(\frac{2}{\delta})}\bar{x} \text{w.p. }1-\delta (16)

The above bounds the regret in one epoch if the algorithm does not terminate before the epoch ends. It remains to show that the algorithm with high probability does not terminate too early. This involves showing that with high probability, no agent hits their capacity constraint pjTp^{*}_{j}T significantly earlier than TT, and that the detection algorithm does not falsely trigger.
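For intuition on how the per-epoch bound (16) adds up across the doubling epochs Lk=2kL_{k}=2^{k}, the short computation below (with illustrative values of TT, nn and δ\delta) compares the sum of the per-epoch terms with the closed-form nT\sqrt{nT}-type bound that appears in (17).

import math

T, n, delta = 2**16, 5, 0.01
per_epoch = [math.sqrt(2**k * n * math.log(2 / delta))        # first term of (16)
             + math.sqrt(2**(k - 1) * math.log(2 / delta))    # second term of (16)
             for k in range(1, int(math.log2(T)) + 1)]
closed_form = 2 * math.sqrt(2) / (math.sqrt(2) - 1) * math.sqrt(n * T * math.log(2 / delta))
print(sum(per_epoch), "<=", closed_form)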

Continuing from (16), for any time step TTT^{\prime}\leq T, we have

T𝔼[ui(𝑿,𝑿,λ)]t=1Tui(𝑿t,𝑿t,λkt)\displaystyle T\mathbb{E}\left[u_{i}(\bm{X},\bm{X},\lambda^{*})\right]-\sum_{t=1}^{T^{\prime}}u_{i}(\bm{X}_{t},\bm{X}_{t},\lambda_{k_{t}})
\displaystyle\leq \sum_{k=0}^{\log_{2}T^{\prime}}\left[(L_{k+1}-L_{k})\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]-\sum_{t=L_{k}}^{L_{k+1}-1}u_{i}(\bm{X}_{t},\bm{X}_{t},\lambda_{k})\right]
+(TT)𝔼[ui(𝑿,𝑿,λ)]\displaystyle+(T-T^{\prime})\mathbb{E}\left[u_{i}(\bm{X},\bm{X},\lambda^{*})\right]
\displaystyle\leq \sum_{k=1}^{\log_{2}T}\sqrt{n2^{k}\log(\frac{2}{\delta})}\bar{x}+\sqrt{\frac{2^{k}}{2}\log(\frac{2}{\delta})}\bar{x}+(T-T^{\prime})\bar{x} \text{w.p }1-\delta\log_{2}T
\displaystyle\leq 2nlog(2δ)x¯k=1log2T2k+(TT)x¯\displaystyle 2\sqrt{n\log(\frac{2}{\delta})}\bar{x}\sum_{k=1}^{\log_{2}T}\sqrt{2^{k}}+(T-T^{\prime})\bar{x} w.p 1δlog2T\displaystyle\text{w.p }1-\delta\log_{2}T
\displaystyle\leq 2221nTlog(2δ)x¯+(TT)x¯\displaystyle\frac{2\sqrt{2}}{\sqrt{2}-1}\sqrt{nT\log(\frac{2}{\delta})}\bar{x}+(T-T^{\prime})\bar{x} w.p 1δlog2T\displaystyle\text{w.p }1-\delta\log_{2}T (17)

where the second inequality follows from (16) and a union bound. Since there are at most log2(T)\log_{2}(T) epochs for any TTT^{\prime}\leq T, the above holds for all epochs, and therefore for all TT^{\prime}, with probability 1δlog2(T)1-\delta\log_{2}(T). Next we show that, with high probability, for all TT2221nTlog(2δ)T^{\prime}\leq T-\frac{2\sqrt{2}}{\sqrt[]{2}-1}\sqrt[]{nT\log(\frac{2}{\delta})} and any fixed agent ii, the constraint that the total allocation to agent ii be less than piTp^{*}_{i}T is satisfied. Note that a byproduct of applying Lemma 2 in (15) is that |pi(𝑭,λk)pi|nΔLk1|p_{i}(\bm{F},\lambda_{k})-p^{*}_{i}|\leq n\Delta_{L_{k}-1} (see (8)). Fix a time step τ\tau,

t=1τ𝟙[argmaxj𝑿j+λktj=i]\displaystyle\sum_{t=1}^{\tau}\mathbbm{1}[\operatorname*{arg\,max}_{j}\bm{X}_{j}+\lambda_{k_{t}j}=i]
(Chernoff)\displaystyle(\text{Chernoff})\quad\leq k=1log2τ(Lk+1Lk)pi(𝑭,λk)+(Lk+1Lk)2log(2δ)\displaystyle\sum_{k=1}^{\log_{2}\tau}(L_{k+1}-L_{k})p_{i}(\bm{F},\lambda_{k})+\sqrt{\frac{(L_{k+1}-L_{k})}{2}\log(\frac{2}{\delta})} w.p. 1δlog2τ\displaystyle\text{w.p. }1-\delta\log_{2}\tau
\displaystyle\leq k=1log2τ(Lk+1Lk)(pi+nΔLk1)+(Lk+1Lk)2log(2δ)\displaystyle\sum_{k=1}^{\log_{2}\tau}(L_{k+1}-L_{k})(p^{*}_{i}+n\Delta_{L_{k}-1})+\sqrt{\frac{(L_{k+1}-L_{k})}{2}\log(\frac{2}{\delta})} w.p. 1δlog2τ\displaystyle\text{w.p. }1-\delta\log_{2}\tau
(Lk=2k)\displaystyle(L_{k}=2^{k})\quad\leq piτ+k=1log2τ(n2klog(2δ)+2k1log(2δ))\displaystyle p^{*}_{i}\tau+\sum_{k=1}^{\log_{2}\tau}\left(\sqrt{n2^{k}\log(\frac{2}{\delta})}+\sqrt{2^{k-1}\log(\frac{2}{\delta})}\right) w.p. 1δlog2τ\displaystyle\text{w.p. }1-\delta\log_{2}\tau
\displaystyle\leq piτ+2221nτlog(2δ)\displaystyle p^{*}_{i}\tau+\frac{2\sqrt{2}}{\sqrt[]{2}-1}\sqrt[]{n\tau\log(\frac{2}{\delta})} w.p. 1δlog2τ\displaystyle\text{w.p. }1-\delta\log_{2}\tau

This means that for all τT2221nTlog(2δ)\tau\leq T-\frac{2\sqrt{2}}{\sqrt[]{2}-1}\sqrt[]{nT\log(\frac{2}{\delta})}, with probability 1δlog2T1-\delta\log_{2}T,

t=1τ𝟙[argmaxj𝑿j+λktj=i]piT,\sum_{t=1}^{\tau}\mathbbm{1}[\operatorname*{arg\,max}_{j}\bm{X}_{j}+\lambda_{k_{t}j}=i]\leq p^{*}_{i}T,

Combining the above with (17), we have that with probability 12δlog2T1-2\delta\log_{2}T, for any fixed ii, if the algorithm terminates at TT^{\prime} because the allocation limit of agent ii is reached, then T^{\prime}\geq T-\frac{2\sqrt{2}}{\sqrt{2}-1}\sqrt{nT\log(\frac{2}{\delta})}, so that

T\mathbb{E}\left[u_{i}(\bm{X},\bm{X},\lambda^{*})\right]-\sum_{t=1}^{T^{\prime}}u_{i}(\bm{X}_{t},\bm{X}_{t},\lambda_{k_{t}})\leq\frac{2\sqrt{2}}{\sqrt{2}-1}\sqrt{nT\log(\frac{2}{\delta})}\bar{x}+\frac{2\sqrt{2}}{\sqrt{2}-1}\sqrt{nT\log(\frac{2}{\delta})}\bar{x}

Finally, we also have to bound the probability that the detection algorithm falsely triggers. For a given time tt and for each ii, let

\displaystyle F^{i}_{t}(x)= \frac{1}{t}\sum_{s=1}^{t}\mathbbm{1}[X_{i,s}\leq x]
\tilde{F}_{t}(x)= \frac{1}{t(n-1)}\sum_{s=1}^{t}\sum_{j\neq i}\mathbbm{1}[\bm{X}_{j,s}\leq x]

be the empirical CDFs of agent ii’s values and of the remaining agents’ values, respectively. Since all agents are truthful, using Lemma 7 we have that with probability 1δ1-\delta,

supx|Fti(x)F(x)|\displaystyle\sup_{x}|F^{i}_{t}(x)-F(x)|\leq 12tlog(2δ)\displaystyle\sqrt{\frac{1}{2t}\log(\frac{2}{\delta})}
supx|F~t(x)F(x)|\displaystyle\sup_{x}|\tilde{F}_{t}(x)-F(x)|\leq 12t(n1)log(2δ)\displaystyle\sqrt{\frac{1}{2t(n-1)}\log(\frac{2}{\delta})}

By the triangle inequality, this means that \sup_{x}|F^{i}_{t}(x)-\tilde{F}_{t}(x)|\leq\sqrt{\frac{2}{t}\log(\frac{2}{\delta})}\leq 32\sqrt{\frac{1}{t}\log(\frac{256et}{\delta})}=\Delta_{t}/2, so Algorithm 2 is not triggered by agent ii. Taking a union bound over all time steps and agents, we know that with probability at least 1δnT1-\delta nT, the algorithm will not end early because of a false trigger (by any agent).
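For concreteness, here is a minimal sketch (our own illustration, not the paper’s implementation of Algorithm 2) of the comparison used in this step: the empirical CDF of agent ii’s reports is compared with the pooled empirical CDF of the other agents’ reports, and agent ii is flagged only if the sup-distance exceeds Δt/2=32(1/t)log(256et/δ)\Delta_{t}/2=32\sqrt{\log(256et/\delta)/t}.

import numpy as np

def sup_distance(a, b):
    """sup_x |F_a(x) - F_b(x)| between the empirical CDFs of two samples."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.abs(Fa - Fb).max()

rng = np.random.default_rng(2)
t, n, delta = 200000, 4, 0.01
reports_i      = rng.uniform(0, 1, size=t)            # agent i, truthful here
reports_others = rng.uniform(0, 1, size=t * (n - 1))  # remaining agents, truthful
Delta_t = 64 * np.sqrt(np.log(256 * np.e * t / delta) / t)
flagged = sup_distance(reports_i, reports_others) > Delta_t / 2
print(flagged)    # False with high probability when agent i is truthful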

The result follows by replacing δ\delta with δn(2log2T+T)\frac{\delta}{n(2\log_{2}T+T)} and taking a union bound over all agents. ∎

Appendix C Proof of Theorem 1

C.1 Proof of Lemma 3

Proof.

Let α=Δ4\alpha=\frac{\Delta}{4}. We first check that the given condition on Δ\Delta ensures that (128etα)etα2/128δ2\left(\frac{128et}{\alpha}\right)e^{-t\alpha^{2}/128}\leq\frac{\delta}{2} and that 2e2t(n1)α2δ22e^{-2t(n-1)\alpha^{2}}\leq\frac{\delta}{2}:

(128etα)etα2/128δ2\displaystyle\left(\frac{128et}{\alpha}\right)e^{-t\alpha^{2}/128}\leq\frac{\delta}{2}
\displaystyle\iff α2128log(256etδ)t+64tlog(1α2)\displaystyle\alpha^{2}\geq\frac{128\log(\frac{256et}{\delta})}{t}+\frac{64}{t}\log(\frac{1}{\alpha^{2}})
\displaystyle\impliedby α2256log(256etδ)t\displaystyle\alpha^{2}\geq\frac{256\log(\frac{256et}{\delta})}{t}
\displaystyle\iff Δ64log(256etδ)t\displaystyle\Delta\geq 64\sqrt{\frac{\log(\frac{256et}{\delta})}{t}}
2e2t(n1)α2δ2\displaystyle 2e^{-2t(n-1)\alpha^{2}}\leq\frac{\delta}{2}
\displaystyle\iff α12t(n1)log(4δ)\displaystyle\alpha\geq\sqrt{\frac{1}{2t(n-1)}\log(\frac{4}{\delta})}
\displaystyle\impliedby Δ64log(256etδ)t\displaystyle\Delta\geq 64\sqrt{\frac{\log(\frac{256et}{\delta})}{t}}
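The two conditions above can also be checked numerically; the following snippet (with illustrative values of tt, nn and δ\delta) verifies them for α=Δ/4\alpha=\Delta/4 and Δ=64log(256et/δ)/t\Delta=64\sqrt{\log(256et/\delta)/t}.

import math

t, n, delta = 200000, 4, 0.01
Delta = 64 * math.sqrt(math.log(256 * math.e * t / delta) / t)
alpha = Delta / 4
print((128 * math.e * t / alpha) * math.exp(-t * alpha**2 / 128) <= delta / 2)   # True
print(2 * math.exp(-2 * t * (n - 1) * alpha**2) <= delta / 2)                    # True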

Let F¯t(x)=1ts=1t𝟙[X~i,sx]\bar{F}_{t}(x)=\frac{1}{t}\sum_{s=1}^{t}\mathbbm{1}[\tilde{X}_{i,s}\leq x] be the empirical CDF of the samples collected from agent ii. Let F~t(x)=1(n1)ts=1tji𝟙[X~j,sx]\tilde{F}_{t}(x)=\frac{1}{(n-1)t}\sum_{s=1}^{t}\sum_{j\neq i}\mathbbm{1}[\tilde{X}_{j,s}\leq x] be the empirical CDF of all reported values from the other agents. Let F¯(x)=1ts=1tFs(x)\bar{F}(x)=\frac{1}{t}\sum_{s=1}^{t}F_{s}(x), where Fs(x)=(X~i,sx|s)F_{s}(x)=\mathbb{P}(\tilde{X}_{i,s}\leq x|\mathcal{H}_{s}). Lemma 1 tells us that with probability 1δ/21-\delta/2,

supx|F¯t(x)F¯(x)|Δ4\sup_{x}|\bar{F}_{t}(x)-\bar{F}(x)|\leq\frac{\Delta}{4} (18)

Since the other agents are truthful, their reported values are independent, and we can use the standard DKW inequality to bound the empirical distribution constructed from their values. Using Lemma 7, we can show that with probability 1δ/21-\delta/2,

supx|F~t(x)F(x)|Δ4.\sup_{x}|\tilde{F}_{t}(x)-F(x)|\leq\frac{\Delta}{4}.

Using union bound, we can conclude that if supx|F¯(x)F(x)|Δ\sup_{x}|\bar{F}(x)-F(x)|\geq\Delta, then with probability 1δ1-\delta:

supx|F~t(x)F¯t(x)|>Δ2\sup_{x}|\tilde{F}_{t}(x)-\bar{F}_{t}(x)|>\frac{\Delta}{2}

which means that Algorithm 2 would have returned Reject. ∎

C.2 Proof of Lemma 4

First we state a technical result on monotone mapping between two distributions. Given a cumulative distribution function FF, we define the following random function:

Fu(y)={F(y) if F(y)=F(y)Uniform[F(y),F(y)] if F(y)>F(y)F^{u}(y)=\begin{cases}F(y)&\text{ if }F(y)=F(y_{-})\\ \text{Uniform}[F(y_{-}),F(y)]\quad&\text{ if }F(y)>F(y_{-})\end{cases} (19)

If FF is a continuous distribution then FuF^{u} is deterministic and is the same as FF. However if FF contains point masses, then at points where FF jumps, FuF^{u} is uniformly sampled from the interval of that jump. It is easy to see that FuF^{u} has the nice property that if YFY\sim F, then Fu(Y)Uniform[0,1]F^{u}(Y)\sim\text{Uniform}[0,1].

Claim 4.

Let GG be any distribution (cdf) over 𝒳\mathcal{X}\subseteq\mathbb{R}, and FF over 𝒴\mathcal{Y}\subseteq\mathbb{R}. Then there exists a unique joint distribution rr over 𝒳×𝒴\mathcal{X}\times\mathcal{Y} with marginals G,FG,F such that the conditional distribution r(|Y)r(\cdot|Y) has the following monotonicity property: define x¯r(),x¯r()\bar{x}_{r}(\cdot),\underline{x}_{r}(\cdot) so that X[x¯r(Y),x¯r(Y)]X\in[\underline{x}_{r}(Y),\bar{x}_{r}(Y)] almost surely, i.e.,

x¯r(y)=inf{x:(X>x|Y=y)=0}\displaystyle\bar{x}_{r}(y)=\inf\{x:\mathbb{P}(X>x|Y=y)=0\}
x¯r(y)=sup{x:(X<x|Y=y)=0},\displaystyle\underline{x}_{r}(y)=\sup\{x:\mathbb{P}(X<x|Y=y)=0\},

then

x¯r(y1)x¯r(y2)y1<y2.\bar{x}_{r}(y_{1})\leq\underline{x}_{r}(y_{2})\quad\forall y_{1}<y_{2}.

In particular, the random variable X|Yr(|Y)X|Y\sim r(\cdot|Y) can be sampled as G1(Fu(Y))G^{-1}(F^{u}(Y)), where FuF^{u} is the random function defined in (19) and G^{-1}(p)\coloneqq\inf\{x\in\mathbb{R}:G(x)\geq p\} denotes the generalized inverse, sometimes also referred to as the quantile function.

The proof of this claim is in Appendix D.1. Using the above result, we derive the following key result that provides insight into a strategic agent’s best response to a greedy allocation strategy. Note that given a particular marginal distribution GG for agent ii’s reported values and the true value distribution FF, there are many potential joint distributions between the true and reported valuations. In the following claim, we show that the “best” joint distribution among these, in terms of agent ii’s utility maximization, is the one characterized in Claim 4.
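The construction in Claim 4 is easy to make concrete. The short sketch below (our own illustration; the specific FF and GG are arbitrary choices) samples the randomized CDF FuF^{u} from (19) and the monotone coupling X=G1(Fu(Y))X=G^{-1}(F^{u}(Y)) for an FF with a point mass, and checks empirically that Fu(Y)F^{u}(Y) is uniform and that XX has marginal GG.

import numpy as np

rng = np.random.default_rng(3)

def sample_F(size):
    # F: mixture of a point mass at 0.5 (prob 0.3) and Uniform[0,1] (prob 0.7)
    u = rng.uniform(0, 1, size)
    y = rng.uniform(0, 1, size)
    return np.where(u < 0.3, 0.5, y)

def F(y):           # CDF of the mixture above
    return 0.7 * np.clip(y, 0, 1) + 0.3 * (y >= 0.5)

def F_minus(y):     # left limit F(y-)
    return 0.7 * np.clip(y, 0, 1) + 0.3 * (y > 0.5)

def F_u(y):
    # randomized CDF from (19): uniform on [F(y-), F(y)] wherever F jumps
    lo, hi = F_minus(y), F(y)
    return lo + rng.uniform(0, 1, np.shape(y)) * (hi - lo)

def G_inv(p):
    # generalized inverse of G = Uniform[0, 2] (an arbitrary target marginal)
    return 2.0 * p

Y = sample_F(200000)
U = F_u(Y)                      # should be ~ Uniform[0,1]
X = G_inv(U)                    # should have marginal G, and is monotone in Y
print(abs(U.mean() - 0.5), abs(X.mean() - 1.0))   # both close to 0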

Claim 5.

Fix a greedy allocation policy λ\lambda. Let 𝐗[0,x¯]n\bm{X}\in[0,\bar{x}]^{n} be drawn from FFF\otimes\ldots\otimes F. Fix another distribution GG over [0,x¯][0,\bar{x}]. Given 𝐗\bm{X}, define 𝐗~\tilde{\bm{X}}^{*} as follows: let X~i=G1(Fu(Xi))\tilde{X}_{i}^{*}=G^{-1}(F^{u}(X_{i})), and X~j=Xjji\tilde{X}_{j}^{*}=X_{j}\,\forall j\neq i. Let \mathcal{R} be the set of all joint distributions over [0,x¯]2[0,\bar{x}]^{2} such that the marginals are FF and GG; and for any rr\in\mathcal{R}, given 𝐗\bm{X} define 𝐗~r\tilde{\bm{X}}^{r} as follows: X~irr(|Xi)\tilde{X}^{r}_{i}\sim r(\cdot|X_{i}), and X~jr=Xjji\tilde{X}^{r}_{j}=X_{j}\,\forall j\neq i. Then

𝔼[ui(𝑿~,𝑿,λ)]maxr𝔼[ui(𝑿~r,𝑿,λ)].\mathbb{E}[u_{i}(\tilde{\bm{X}}^{*},\bm{X},\lambda)]\geq\max_{r\in\mathcal{R}}\mathbb{E}[u_{i}(\tilde{\bm{X}}^{r},\bm{X},\lambda)].
Proof.

First we show that for any joint distribution that is not monotone (i.e., does not have the monotonicity property defined in Claim 4), there is a monotone one that obtains at least as much utility. Suppose rr is one such joint distribution that is not monotone, i.e., x1<x2,\exists x_{1}<x_{2}, s.t. x¯r(x1)>x¯r(x2)\bar{x}_{r}(x_{1})>\underline{x}_{r}(x_{2}) (as defined in Claim 4). Recall that since XjF,jX_{j}\sim F,\forall j are independent, the expected utility can be written as follows:

𝔼[ui(𝑿~r,𝑿,λ)]=0x¯x¯r(x)x¯r(x)xjiF(x~+λiλj)dr(x~|x)dF(x)\displaystyle\mathbb{E}[u_{i}(\tilde{\bm{X}}^{r},\bm{X},\lambda)]=\int_{0}^{\bar{x}}\int_{\underline{x}_{r}(x)}^{\bar{x}_{r}(x)}x\prod_{j\neq i}F(\tilde{x}+\lambda_{i}-\lambda_{j})dr(\tilde{x}|x)dF(x)

Now consider a pair of values x~1>x~2\tilde{x}_{1}>\tilde{x}_{2} such that (x~1,x1)(\tilde{x}_{1},x_{1}) and (x~2,x2)(\tilde{x}_{2},x_{2}) have non-zero probability density under the distribution rr. Such a pair exists because x¯r(x1)>x¯r(x2)\bar{x}_{r}(x_{1})>\underline{x}_{r}(x_{2}). Then, using the fact that for a,b,c,d>0,a<b,c<d:ac+bd>ad+bca,b,c,d>0,a<b,c<d:ac+bd>ad+bc (which holds since ac+bd-ad-bc=(b-a)(d-c)>0), we can see that:

x_{1}\prod_{j\neq i}F(\tilde{x}_{1}+\lambda_{i}-\lambda_{j})+x_{2}\prod_{j\neq i}F(\tilde{x}_{2}+\lambda_{i}-\lambda_{j})<x_{1}\prod_{j\neq i}F(\tilde{x}_{2}+\lambda_{i}-\lambda_{j})+x_{2}\prod_{j\neq i}F(\tilde{x}_{1}+\lambda_{i}-\lambda_{j})

This means that if we exchange the probability mass between the two conditional distributions at x1,x2x_{1},x_{2}, the utility is at least as large as before. Hence at least one monotone joint distribution belongs to the set of utility-maximizing joint distributions. Since Claim 4 showed that the distribution of (G^{-1}(F^{u}(X)),X) is the unique joint distribution that is monotone, we conclude that 𝑿~\tilde{\bm{X}}^{*} as defined in the claim statement is indeed utility maximizing. ∎

Proof of Lemma 4
Proof.

Let G(x)(F(x)Δ)+x<x¯G(x)\coloneqq(F(x)-\Delta)^{+}\forall x<\bar{x}, G(x¯)1G(\bar{x})\coloneqq 1 be the distribution whose CDF is shifted down from FF by Δ\Delta. Let r~\tilde{r} be the utility-maximizing joint distribution from Claim 5. Let r^\hat{r}, F^\hat{F} be any other pair of joint and marginal distributions such that supx|F(x)F^(x)|Δ\sup_{x}|F(x)-\hat{F}(x)|\leq\Delta. By construction, F^(x)G(x)\hat{F}(x)\geq G(x) for all xx. Agent ii’s utilities for using r^\hat{r} and r~\tilde{r} respectively, are:

𝔼r^[ui(𝑿^,𝑿,λ)]=\displaystyle\mathbb{E}_{\hat{r}}[u_{i}(\hat{\bm{X}},\bm{X},\lambda)]= 0x¯xx¯r^(x)x¯r^(x)jiF(x^+λiλj)dr^(x^|x)dF(x)\displaystyle\int_{0}^{\bar{x}}x\int_{\underline{x}_{\hat{r}}(x)}^{\bar{x}_{\hat{r}}(x)}\prod\limits_{j\neq i}F(\hat{x}+\lambda_{i}-\lambda_{j})d\hat{r}(\hat{x}|x)dF(x)
=\displaystyle= 0x¯x𝔼Fu(x)[jiF(F^1(Fu(x))+λiλj)]𝑑F(x)\displaystyle\int_{0}^{\bar{x}}x\mathbb{E}_{F^{u}(x)}\left[\prod\limits_{j\neq i}F\left(\hat{F}^{-1}(F^{u}(x))+\lambda_{i}-\lambda_{j}\right)\right]dF(x) (20)

and

𝔼r~[ui(𝑿~,𝑿,λ)]=\displaystyle\mathbb{E}_{\tilde{r}}[u_{i}(\tilde{\bm{X}},\bm{X},\lambda)]= x𝔼Fu(x)[jiF(G1(Fu(x))+λiλj)]𝑑F(x)\displaystyle\int x\mathbb{E}_{F^{u}(x)}\left[\prod\limits_{j\neq i}F\left(G^{-1}(F^{u}(x))+\lambda_{i}-\lambda_{j}\right)\right]dF(x) (21)

respectively. Since F^(x)G(x)\hat{F}(x)\geq G(x), we know F^1(p)G1(p)\hat{F}^{-1}(p)\leq G^{-1}(p). Clearly (20)(21)\eqref{eq: alternative utility}\leq\eqref{eq: delta below utility}. We conclude that given a greedy allocation policy λ\lambda, true valuation Xi,tX_{i,t} and truthful agents jij\neq i (with X~j,t=Xj,t\tilde{X}_{j,t}=X_{j,t}), reporting X~i,tr~(|Xi,t)\tilde{X}_{i,t}\sim\tilde{r}(\cdot|X_{i,t}) is a strategy for agent ii that maximizes 𝔼[ui(𝑿~t,𝑿t,λ)]\mathbb{E}[u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\lambda)] subject to the marginal distribution constraint supx|F(x)Fr(x)|Δ\sup_{x}|F(x)-F_{r}(x)|\leq\Delta. That is,

𝔼r[ui(𝑿~t,𝑿t,λ)]𝔼r~[ui(𝑿~t,𝑿t,λ)]r s.t. supx|Fr(x)F(x)|Δ\mathbb{E}_{r}[u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\lambda)]\leq\mathbb{E}_{\tilde{r}}[u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\lambda)]\quad\forall r\text{ s.t. }\sup_{x}|F_{r}(x)-F(x)|\leq\Delta

It remains to bound the difference 𝔼r~[ui(𝑿~,𝑿,λ)]𝔼[ui(𝑿,𝑿,λ)]\mathbb{E}_{\tilde{r}}[u_{i}(\tilde{\bm{X}},\bm{X},\lambda)]-\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda)]. First note that G1(p)=F1(p+Δ)G^{-1}(p)=F^{-1}(p+\Delta). Then we have that

\displaystyle\mathbb{E}_{\tilde{r}}[u_{i}(\tilde{\bm{X}},\bm{X},\lambda)]-\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda)] (22)
=\displaystyle= 0x¯x(𝔼Fu(x)[jiF(F1(Fu(x)+Δ)+λiλj)]jiF(x+λiλj))𝑑F(x)\displaystyle\int_{0}^{\bar{x}}x\left(\mathbb{E}_{F^{u}(x)}\left[\prod_{j\neq i}F\left(F^{-1}(F^{u}(x)+\Delta)+\lambda_{i}-\lambda_{j}\right)\right]-\prod_{j\neq i}F(x+\lambda_{i}-\lambda_{j})\right)dF(x)
\displaystyle\leq x¯0x¯(𝔼Fu(x)[jiF(F1(Fu(x)+Δ)+λiλj)]jiF(x+λiλj))𝑑F(x)\displaystyle\bar{x}\int_{0}^{\bar{x}}\left(\mathbb{E}_{F^{u}(x)}\left[\prod_{j\neq i}F\left(F^{-1}(F^{u}(x)+\Delta)+\lambda_{i}-\lambda_{j}\right)\right]-\prod_{j\neq i}F(x+\lambda_{i}-\lambda_{j})\right)dF(x) (23)

where the inequality holds because the term in the parentheses is nonnegative (since F1(Fu(x)+Δ)xF^{-1}(F^{u}(x)+\Delta)\geq x with probability 1 for all xx) and xx¯x\leq\bar{x}. To bound the remaining integral, we use the fact that the marginal distribution of x~\tilde{x} under the joint distribution r~(x~,x)\tilde{r}(\tilde{x},x) is GG, so that

0x¯0x¯jiF(x~+λiλj)dr~(x~|x)dF(x)\displaystyle\int_{0}^{\bar{x}}\int_{0}^{\bar{x}}\prod_{j\neq i}F\left(\tilde{x}+\lambda_{i}-\lambda_{j}\right)d\tilde{r}(\tilde{x}|x)dF(x)
=\displaystyle= 0x¯jiF(x+λiλj)dG(x)\displaystyle\int_{0}^{\bar{x}}\prod_{j\neq i}F\left(x+\lambda_{i}-\lambda_{j}\right)dG(x)
=\displaystyle= xΔx¯jiF(x+λiλj)dF(x)+jiF(x¯+λiλj)Δ\displaystyle\int_{x_{\Delta}}^{\bar{x}}\prod_{j\neq i}F\left(x+\lambda_{i}-\lambda_{j}\right)dF(x)+\prod_{j\neq i}F\left(\bar{x}+\lambda_{i}-\lambda_{j}\right)\Delta
\displaystyle\leq xΔx¯jiF(x+λiλj)dF(x)+Δ\displaystyle\int_{x_{\Delta}}^{\bar{x}}\prod_{j\neq i}F\left(x+\lambda_{i}-\lambda_{j}\right)dF(x)+\Delta (24)

where xΔ:=F1(Δ)x_{\Delta}:=F^{-1}(\Delta). Similarly,

0x¯jiF(x+λiλj)dF(x)\displaystyle\int_{0}^{\bar{x}}\prod_{j\neq i}F\left(x+\lambda_{i}-\lambda_{j}\right)dF(x)
=\displaystyle= 0xΔjiF(x+λiλj)dF(x)+xΔx¯jiF(x+λiλj)dF(x)\displaystyle\int_{0}^{x_{\Delta}}\prod_{j\neq i}F\left(x+\lambda_{i}-\lambda_{j}\right)dF(x)+\int_{x_{\Delta}}^{\bar{x}}\prod_{j\neq i}F\left(x+\lambda_{i}-\lambda_{j}\right)dF(x)
\displaystyle\geq xΔx¯jiF(x+λiλj)dF(x)\displaystyle\int_{x_{\Delta}}^{\bar{x}}\prod_{j\neq i}F\left(x+\lambda_{i}-\lambda_{j}\right)dF(x) (25)

Plugging (24) and (25) back into (23), we can now bound the expression in (22), and thereby the profit from strategizing, by x¯Δ\bar{x}\Delta. ∎
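As a numerical illustration of the bound just derived (again, not part of the proof; the uniform distribution, the boost vector and Δ\Delta are arbitrary choices), note that for F=Uniform[0,1]F=\text{Uniform}[0,1] the monotone best response of Lemma 4 reduces to reporting min(Xi+Δ,1)\min(X_{i}+\Delta,1); the sketch below estimates the resulting utility gain and compares it to x¯Δ=Δ\bar{x}\Delta=\Delta.

import numpy as np

rng = np.random.default_rng(4)
n, T, Delta = 3, 400000, 0.05
lam = np.array([0.0, 0.02, -0.03])
i = 0                                     # the strategic agent

X = rng.uniform(0, 1, size=(T, n))        # true valuations, F = Uniform[0,1]
truthful  = X + lam
strategic = truthful.copy()
strategic[:, i] = np.minimum(X[:, i] + Delta, 1.0) + lam[i]   # monotone best response

u_truth = np.mean(X[:, i] * (np.argmax(truthful,  axis=1) == i))
u_strat = np.mean(X[:, i] * (np.argmax(strategic, axis=1) == i))
print(u_strat - u_truth, "<=", Delta)     # Lemma 4: the gain is at most x_bar * Delta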

C.3 Proof of Lemma 5

Proof.

Let F¯\bar{F} be the average distribution that agent ii reported from up to round TT^{\prime}: F¯=1Tt=1TFt\bar{F}=\frac{1}{T^{\prime}}\sum_{t=1}^{T^{\prime}}F_{t}, where FtF_{t} is the reported value distribution of agent ii at time tt: Ft(x)(X~i,tx|t)F_{t}(x)\coloneqq\mathbb{P}(\tilde{X}_{i,t}\leq x|\mathcal{H}_{t}). Since the detection algorithm has not been triggered, we can conclude using Lemma 3 that with probability 1δ1-\delta,

supx|F¯(x)F(x)|<Δ64log(256eTδ)T,\displaystyle\sup_{x}|\bar{F}(x)-F(x)|<\Delta\coloneqq 64\sqrt{\frac{\log(\frac{256eT^{\prime}}{\delta})}{T^{\prime}}},
and supx|F¯T(x)F¯(x)|<Δ4=16log(256eTδ)T.\displaystyle\sup_{x}|\bar{F}_{T^{\prime}}(x)-\bar{F}(x)|<\frac{\Delta}{4}=16\sqrt{\frac{\log(\frac{256eT^{\prime}}{\delta})}{T^{\prime}}}.

The second inequality holds on the same event: it is exactly the bound (18) that the proof of Lemma 3 uses to establish the first inequality. Combining the above two inequalities, we have

\sup_{x}|\bar{F}_{T^{\prime}}(x)-F(x)|<\frac{5\Delta}{4}=80\sqrt{\frac{\log(\frac{256eT^{\prime}}{\delta})}{T^{\prime}}}\qquad\text{w.p. }1-\delta. (26)

This shows that if the detection algorithm has not been triggered, the empirical CDF of the strategic agent’s reported values is close to the true CDF. Let F~T(x)=1(n1)Tt=1Tji𝟙[Xj,tx]\tilde{F}_{T^{\prime}}(x)=\frac{1}{(n-1)T^{\prime}}\sum_{t=1}^{T^{\prime}}\sum_{j\neq i}\mathbbm{1}[X_{j,t}\leq x] be the empirical distribution from all agents other than ii. We know from Lemma 7 that

supx|F~T(x)F(x)|12(n1)Tlog(2δ)w.p. 1δ.\sup_{x}|\tilde{F}_{T^{\prime}}(x)-F(x)|\leq\sqrt{\frac{1}{2(n-1)T^{\prime}}\log(\frac{2}{\delta})}\qquad\text{w.p. }1-\delta. (27)

Combining (26) and (27), we can now bound the error in the combined estimation, F^T=1nTt=1Tj=1n𝟙[𝑿jtx]\hat{F}_{T^{\prime}}=\frac{1}{nT^{\prime}}\sum_{t=1}^{T^{\prime}}\sum_{j=1}^{n}\mathbbm{1}[\bm{X}^{t}_{j}\leq x]:

supx|F^T(x)F(x)|\displaystyle\sup_{x}|\hat{F}_{T^{\prime}}(x)-F(x)|
=\displaystyle= supx|1nF¯T(x)+n1nF~T(x)F(x)|\displaystyle\sup_{x}|\frac{1}{n}\bar{F}_{T^{\prime}}(x)+\frac{n-1}{n}\tilde{F}_{T^{\prime}}(x)-F(x)|
=\displaystyle= supx|1nF¯T(x)1nF(x)+n1nF~T(x)n1nF(x)|\displaystyle\sup_{x}|\frac{1}{n}\bar{F}_{T^{\prime}}(x)-\frac{1}{n}F(x)+\frac{n-1}{n}\tilde{F}_{T^{\prime}}(x)-\frac{n-1}{n}F(x)|
\displaystyle\leq supx|1nF¯T(x)1nF(x)|+supx|n1nF~T(x)n1nF(x)|\displaystyle\sup_{x}|\frac{1}{n}\bar{F}_{T^{\prime}}(x)-\frac{1}{n}F(x)|+\sup_{x}|\frac{n-1}{n}\tilde{F}_{T^{\prime}}(x)-\frac{n-1}{n}F(x)|
\displaystyle\leq 80log(256eTδ)nT+12nTlog(2δ)\displaystyle 80\sqrt{\frac{\log(\frac{256eT^{\prime}}{\delta})}{nT^{\prime}}}+\sqrt{\frac{1}{2nT^{\prime}}\log(\frac{2}{\delta})} w.p. 12δ\displaystyle\text{w.p. }1-2\delta
\displaystyle\leq 81log(256eTδ)nT\displaystyle 81\sqrt{\frac{\log(\frac{256eT^{\prime}}{\delta})}{nT^{\prime}}} w.p. 12δ\displaystyle\text{w.p. }1-2\delta (28)

Let 𝑭^T=F^TF^T\hat{\bm{F}}_{T^{\prime}}=\hat{F}_{T^{\prime}}\otimes\ldots\otimes\hat{F}_{T^{\prime}}, and λ=λ(𝑭^T)\lambda=\lambda^{*}(\hat{\bm{F}}_{T^{\prime}}), and ΔT=81log(256eTδ)nT\Delta_{T^{\prime}}=81\sqrt{\frac{\log(\frac{256eT^{\prime}}{\delta})}{nT^{\prime}}}. Applying Lemma 2 to (28) we have

supx|F^T(x)F(x)|ΔTw.p. 12δ\displaystyle\sup_{x}|\hat{F}_{T^{\prime}}(x)-F(x)|\leq\Delta_{T^{\prime}}\quad\text{w.p. }1-2\delta
(Lemma 2)\displaystyle(\text{Lemma~{}\ref{lem: utility bound from learning error}})\implies 𝔼[ui(𝑿,𝑿,λ)]𝔼[ui(𝑿,𝑿,λ)]nΔTx¯\displaystyle\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda)]-\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]\leq n\Delta_{T^{\prime}}\bar{x}
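The weighted-average structure used above is easy to see numerically. In the sketch below (our own illustration, with a made-up distortion of agent ii’s reports), the pooled empirical CDF is the average of agent ii’s empirical CDF with weight 1/n1/n and the other agents’ empirical CDF with weight (n1)/n(n-1)/n, so a sup-norm distortion of size Δ\Delta in agent ii’s reports contributes only about Δ/n\Delta/n to the pooled estimation error.

import numpy as np

def ecdf(values, grid):
    """Empirical CDF of `values` evaluated on `grid`."""
    v = np.sort(values)
    return np.searchsorted(v, grid, side="right") / len(v)

rng = np.random.default_rng(5)
n, T = 5, 100000
grid = np.linspace(0.0, 1.0, 1001)
others  = rng.uniform(0, 1, size=(T, n - 1)).ravel()              # truthful reports
agent_i = np.minimum(rng.uniform(0, 1, size=T) + 0.1, 1.0)        # distorted by <= 0.1

F_hat = (ecdf(agent_i, grid) + (n - 1) * ecdf(others, grid)) / n  # pooled empirical CDF
print(np.abs(F_hat - grid).max())      # roughly 0.1 / n, much smaller than 0.1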

C.4 Proof of Lemma 6

Proof.

Let Ft,t=1,,TF_{t},t=1,\ldots,T be the distributions that agent ii reports from in each round given the history, i.e., X~i,t|tFt\tilde{X}_{i,t}|\mathcal{H}_{t}\sim F_{t}. First we bound the utility that the strategic agent can gain in a single epoch. Fix an epoch kk. Let TT^{\prime} be the first time at which either the detection algorithm is triggered or some agent hits his allocation budget of pjTp^{*}_{j}T. Let τ=min(T,Lk+11)\tau=\min(T^{\prime},L_{k+1}-1). We now define three distributions:

F¯1=\displaystyle\bar{F}^{1}= 1Lk1t=1Lk1Ft\displaystyle\frac{1}{L_{k}-1}\sum_{t=1}^{L_{k}-1}F_{t}
F¯2=\displaystyle\bar{F}^{2}= 1τ1t=1τ1Ft\displaystyle\frac{1}{\tau-1}\sum_{t=1}^{\tau-1}F_{t}
F¯3=\displaystyle\bar{F}^{3}= 1τLkt=Lkτ1Ft\displaystyle\frac{1}{\tau-L_{k}}\sum_{t=L_{k}}^{\tau-1}F_{t}

These are the average distributions that agent ii reported from, averaged across three time periods: [1,Lk)[1,L_{k}), [1,τ)[1,\tau) and [Lk,τ)[L_{k},\tau). In particular, F¯3\bar{F}^{3} is the average distribution that the strategic agent reports from in epoch kk. From Lemma 3 we know that with probability 12δ1-2\delta:

supx|F¯1(x)F(x)|\displaystyle\sup_{x}|\bar{F}^{1}(x)-F(x)|\leq 64log(256e(Lk1)δ)n(Lk1)\displaystyle 64\sqrt{\frac{\log(\frac{256e(L_{k}-1)}{\delta})}{n(L_{k}-1)}}
supx|F¯2(x)F(x)|\displaystyle\sup_{x}|\bar{F}^{2}(x)-F(x)|\leq 64log(256e(τ1)δ)n(τ1)\displaystyle 64\sqrt{\frac{\log(\frac{256e(\tau-1)}{\delta})}{n(\tau-1)}}

which together imply that

\displaystyle\sup_{x}|\bar{F}^{2}(x)-F(x)|=\sup_{x}|\frac{L_{k}-1}{\tau-1}(\bar{F}^{1}(x)-F(x))+\frac{\tau-L_{k}}{\tau-1}(\bar{F}^{3}(x)-F(x))|
\displaystyle\implies \sup_{x}|\bar{F}^{2}(x)-F(x)|\geq\sup_{x}|\frac{\tau-L_{k}}{\tau-1}(\bar{F}^{3}(x)-F(x))|-\sup_{x}|\frac{L_{k}-1}{\tau-1}(\bar{F}^{1}(x)-F(x))|
\displaystyle\implies supx|F¯3(x)F(x)|Δ¯kmin(128ττLklog(256e(τ1)δ)n(τ1),1)\displaystyle\sup_{x}|\bar{F}^{3}(x)-F(x)|\leq\bar{\Delta}_{k}\coloneqq\min\left(\frac{128\tau}{\tau-L_{k}}\sqrt{\frac{\log(\frac{256e(\tau-1)}{\delta})}{n(\tau-1)}},1\right)

Note that the last step also uses the fact that the difference between two CDFs cannot be bigger than 1. Let rr be any joint distribution for agent ii’s reported and true valuation (x~,x)(\tilde{x},x) such that the marginal for the reported valuation is equal to F¯3\bar{F}^{3}, i.e.,

X¯i,tr(|Xi,t),Xi,tFFr(x)(X¯i,tx)=F¯3\bar{X}_{i,t}\sim r(\cdot|X_{i,t}),X_{i,t}\sim F\implies F_{r}(x)\coloneqq\mathbb{P}(\bar{X}_{i,t}\leq x)=\bar{F}^{3}

Let 𝑿¯\bar{\bm{X}} denote the reported value vector when ii is the only strategic agent and uses r(|Xi)r(\cdot|X_{i}) to pick his reported value: X¯j=Xjji\bar{X}_{j}=X_{j}\,\forall j\neq i, X¯ir(|Xi)\bar{X}_{i}\sim r(\cdot|X_{i}). Let ΔLk1=81log(256e(Lk1)δ)n(Lk1)\Delta_{L_{k}-1}=81\sqrt{\frac{\log(\frac{256e(L_{k}-1)}{\delta})}{n(L_{k}-1)}}. Using this, we have

(Lemma 5)\displaystyle(\text{Lemma~{}\ref{lem: learning under strategic report}})\implies 𝔼[ui(𝑿,𝑿,λ)]𝔼[ui(𝑿,𝑿,λ)]nΔLk1x¯\displaystyle\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda)]-\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]\leq n\Delta_{L_{k}-1}\bar{x}
(Lemma 4)\displaystyle(\text{Lemma~{}\ref{lem: utility gain bound given fixed allocation policy}})\implies 𝔼[ui(𝑿¯,𝑿,λ)]𝔼[ui(𝑿,𝑿,λ)]nΔLk1x¯+Δ¯kx¯\displaystyle\mathbb{E}[u_{i}(\bar{\bm{X}},\bm{X},\lambda)]-\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]\leq n\Delta_{L_{k}-1}\bar{x}+\bar{\Delta}_{k}\bar{x}
(Corollary 1)\displaystyle(\text{Corollary~{}\ref{cor: martingale uniform convergence}})\implies t=Lkτ1ui(𝑿~t,𝑿t,λ~kt)(τLk)𝔼[ui(𝑿,𝑿,λ)]\displaystyle\sum_{t=L_{k}}^{\tau-1}u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\tilde{\lambda}_{k_{t}})-(\tau-L_{k})\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]
(nΔLk1+Δ¯k)x¯(τLk)+16(τLk)log(128e(τLk)δ)x¯\displaystyle\leq(n\Delta_{L_{k}-1}+\bar{\Delta}_{k})\bar{x}(\tau-L_{k})+16\sqrt{(\tau-L_{k})\log(\frac{128e(\tau-L_{k})}{\delta})}\bar{x} w.p. 1δ\displaystyle\text{w.p. }1-\delta
81n(τLk)22(Lk1)log(256eLkδ)x¯+1442τlog(256eτδ)x¯\displaystyle\leq 81\sqrt{\frac{n(\tau-L_{k})^{2}}{2(L_{k}-1)}\log(\frac{256eL_{k}}{\delta})}\bar{x}+144\sqrt{2\tau\log(\frac{256e\tau}{\delta})}\bar{x} w.p. 1δ\displaystyle\text{w.p. }1-\delta (29)

The above is a high-probability bound on how much utility the strategic agent can gain in one epoch. We can now bound the strategic agent’s utility over the full horizon.

t=1Tui(𝑿~t,𝑿t,λ~kt)T𝔼[ui(𝑿,𝑿,λ)]\displaystyle\sum_{t=1}^{T^{\prime}}u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\tilde{\lambda}_{k_{t}})-T^{\prime}\mathbb{E}\left[u_{i}(\bm{X},\bm{X},\lambda^{*})\right]
\displaystyle\leq k=0log2T1[t=LkLk+11ui(𝑿~t,𝑿t,λ)(Lk+1Lk)𝔼[ui(𝑿,𝑿,λ)]]\displaystyle\sum_{k=0}^{\log_{2}T^{\prime}-1}\left[\sum_{t=L_{k}}^{L_{k+1}-1}u_{i}(\tilde{\bm{X}}_{t},\bm{X}_{t},\lambda)-(L_{k+1}-L_{k})\mathbb{E}[u_{i}(\bm{X},\bm{X},\lambda^{*})]\right]
(Using (29))\displaystyle(\text{Using }\eqref{eq: individual gain helper})\leq x¯(L11)+k=0log2T1(81n(Lk+1Lk)22(Lk1)log(256e(Lk1)δ)x¯\displaystyle\bar{x}(L_{1}-1)+\sum_{k=0}^{\log_{2}T^{\prime}-1}\left(81\sqrt{\frac{n(L_{k+1}-L_{k})^{2}}{2(L_{k}-1)}\log(\frac{256e(L_{k}-1)}{\delta})}\bar{x}\right.
+1442Lk+1log(256eLk+1δ)x¯)\displaystyle+\left.144\sqrt{2L_{k+1}\log(\frac{256eL_{k+1}}{\delta})}\bar{x}\right) w.p 1δlog2T\displaystyle\text{w.p }1-\delta\log_{2}T
(Lk=2k)\displaystyle(L_{k}=2^{k})\leq x¯+k=0log2T1285n2klog(256eTδ)x¯\displaystyle\bar{x}+\sum_{k=0}^{\log_{2}T^{\prime}-1}285\sqrt{n2^{k}\log(\frac{256eT^{\prime}}{\delta})}\bar{x} w.p 1δlog2T\displaystyle\text{w.p }1-\delta\log_{2}T
\displaystyle\leq \left(\frac{285\sqrt{2}}{\sqrt{2}-1}\sqrt{nT^{\prime}\log(\frac{256eT^{\prime}}{\delta})}+1\right)\bar{x} \text{w.p }1-\delta\log_{2}T

The result follows by replacing the original δ\delta with δlog2T\frac{\delta}{\log_{2}T}.

Appendix D Auxiliary Proofs

D.1 Proof of Claim 4

See Claim 4.

Proof.

We first prove existence by constructing a joint distribution with the desired marginals and monotonicity, then we show uniqueness.

Existence.

We will construct the joint distribution by defining the conditional distribution of XX given Y=yY=y for every yy. Note that if FF is a continuous distribution, then we can easily construct r(|Y=y)r(\cdot|Y=y) using the inverse-CDF method:

X|y=G1(F(y))X|y=G^{-1}(F(y))

where G^{-1}(p)\coloneqq\inf\{x\in\mathbb{R}:G(x)\geq p\} is the generalized inverse. This works because F(Y)F(Y)\sim Uniform[0,1]. If FF contains point masses, then F(Y)F(Y) is no longer uniformly distributed, and the inverse-CDF method does not work. To resolve this, we construct a different random variable Fu(y)F^{u}(y) for each value yy. For a given sample yy, if F(y)F(y)F(y)\neq F(y_{-}), let Fu(y)Uniform[F(y),F(y)]F^{u}(y)\sim\text{Uniform}[F(y_{-}),F(y)]. Otherwise, let Fu(y)=F(y)F^{u}(y)=F(y). Now we let

X|y=G1(Fu(y))X|y=G^{-1}(F^{u}(y))

To see that XX sampled using this process has the marginal distribution GG, we just need to show that Fu(Y)F^{u}(Y) is uniformly distributed. For a given pp, if ys.t.F(y)=p\exists y\,s.t.\,F(y)=p, then (Fu(Y)p)=(F(Y)p)=(Yy)=p\mathbb{P}(F^{u}(Y)\leq p)=\mathbb{P}(F(Y)\leq p)=\mathbb{P}(Y\leq y)=p. Otherwise that means ys.t.p1F(y)p and p2F(y)>p\exists y\,s.t.\,p_{1}\coloneqq F(y_{-})\leq p\text{ and }p_{2}\coloneqq F(y)>p.

(Fu(Y)p)\displaystyle\mathbb{P}(F^{u}(Y)\leq p)
=\displaystyle= (Y<y)+(Fu(y)p|Y=y)(Y=y)\displaystyle\mathbb{P}(Y<y)+\mathbb{P}(F^{u}(y)\leq p|Y=y)\mathbb{P}(Y=y)
=\displaystyle= p1+pp1p2p1(p2p1)\displaystyle p_{1}+\frac{p-p_{1}}{p_{2}-p_{1}}(p_{2}-p_{1})
=\displaystyle= p\displaystyle p

This construction also satisfies monotonicity: if y1<y2y_{1}<y_{2}, then F^{u}(y_{1})\leq F(y_{1})\leq F^{u}(y_{2}) with probability 1, and applying the non-decreasing map G1G^{-1} preserves this ordering.

Uniqueness.

Now we show uniqueness. For a given (x,y)(x,y) pair, suppose x<x¯r(y)x<\bar{x}_{r}(y). Then from monotonicity we know x¯r(y)x¯r(y)>x\underline{x}_{r}(y^{\prime})\geq\bar{x}_{r}(y)>x for all y>yy^{\prime}>y, which implies that

r(Xx,Yy)=G(x).\mathbb{P}_{r}(X\leq x,Y\leq y)=G(x).

If xx¯r(y)x\geq\bar{x}_{r}(y), then from monotonicity we know x¯r(y)x¯r(y)x\bar{x}_{r}(y^{\prime})\leq\bar{x}_{r}(y)\leq x for all y<yy^{\prime}<y, which implies that

r(Xx,Yy)=F(y)\mathbb{P}_{r}(X\leq x,Y\leq y)=F(y)

Since GG and FF are fixed, this determines the joint CDF \mathbb{P}_{r}(X\leq x,Y\leq y) at every (x,y)(x,y), so all joint distributions rr with the required marginals and the monotonicity property coincide. ∎