
Recommender Systems meet Mechanism Design

Yang Cai (supported by a Sloan Foundation Research Fellowship and NSF Award CCF-1942583 (CAREER))
Computer Science Department
Yale University
   Constantinos Daskalakis (supported by NSF Awards CCF-1901292, DMS-2022448 and DMS-2134108, by a Simons Investigator Award, by the Simons Collaboration on the Theory of Algorithmic Fairness, by a DSTA grant, and by the DOE PhILMs project (No. DE-AC05-76RL01830))
EECS and CSAIL
MIT
Abstract

Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, however, these mechanisms can be extremely sensitive to the Bayesian prior that they target, which becomes problematic when that prior is only approximately known. At the same time, even if access to the exact Bayesian prior is given, it is known that optimal or even approximately optimal multi-item mechanisms run into sample, computational, representation and communication intractability barriers.

We consider a natural class of multi-item mechanism design problems with very large numbers of items, but where the bidders’ value distributions can be well-approximated by a topic model akin to those used in recommendation systems with very large numbers of possible recommendations. We propose a mechanism design framework for this setting, building on a recent robustification framework by Brustle et al., which disentangles the statistical challenge of estimating a multi-dimensional prior from the task of designing a good mechanism for it, and robustifies the performance of the latter against the estimation error of the former. We provide an extension of this framework appropriate for our setting, which allows us to exploit the expressive power of topic models to reduce the effective dimensionality of the mechanism design problem and remove the dependence of its computational, communication and representation complexity on the number of items.

1 Introduction

Mechanism Design has found important applications in the design of offline and online markets. One of its main applications is the design of auctions, where a common goal is to maximize the seller’s revenue from the sale of one or multiple items to one or multiple bidders. This is challenging because bidders are strategic and interact with the auction in a way that benefits themselves rather than the seller. It is well-understood that, without any information about the bidders’ willingness to pay for different bundles of items, there is no meaningful way to optimize revenue. As such, a classical approach in Economics is to assume that bidders’ types – which determine their values for different bundles and thus their willingness to pay for different bundles – are not arbitrary but randomly drawn from a joint distribution $D$ that is common knowledge, i.e. known to all bidders and the auctioneer. With such a Bayesian prior, the revenue of different mechanisms is compared on the basis of what revenue they achieve in expectation with respect to bidder type vectors drawn from $D$, and assuming that bidders play according to some (Bayesian) Nash equilibrium strategies, or some other type of (boundedly) rational behavior, e.g. no-regret learning.

Even with a Bayesian prior, however, revenue maximization is quite a challenging task. While Myerson’s celebrated work showed that a relatively simple mechanism is optimal in single-item settings [38], characterizing the structure of optimal multi-item mechanisms has been notoriously difficult both analytically and computationally. Indeed, it is known that (even approximately) optimal multi-item mechanisms may require description complexity that scales exponentially in the number of items, even when there is a single buyer [34, 27, 24, 3], they might be computationally intractable, even in simple settings [10, 23, 20], and they may exhibit several counter-intuitive properties which do not arise in single-item settings; see survey [21]. Nevertheless, recent years have seen substantial progress on various fronts: analytical characterizations of optimal multi-item mechanisms [22, 30, 35, 24]; computational frameworks for computing near-optimal multi-item mechanisms [2, 9, 10, 11, 12]; approximate multi-item revenue optimization via simple mechanisms [17, 18, 1, 33, 36, 14, 4, 46, 40, 13, 19, 15, 25]; and (approximate) multi-item revenue optimization using sample access to the type distribution [37, 31, 8, 44, 32, 7], including via the use of deep learning [29, 42, 28].

The afore-described progress on multi-item revenue optimization provides a diversity of tools that can be combined to alleviate the analytical and computational intractability of optimal mechanisms. Yet, there still remains an important challenge in applying those tools, which is that they typically require that the type distribution $D$ is either known or can be sampled. However, this is too strong an assumption. It is common that $D$ is estimated through market research or econometric analysis in related settings, involving similar items or a subset of the items. In this case, we would only hope to know some approximate distribution $\widehat{D}$ that is close to $D$. In other settings, we may have sample access to the true distribution $D$ but there might be errors in measuring or recording those samples. Again, we might hope to estimate an approximate distribution $\widehat{D}$ that is close to $D$. Unfortunately, it is well understood that designing a mechanism for $\widehat{D}$ and using it for $D$ might be a bad idea, as optimal mechanisms tend to overfit the details of the type distribution. This has motivated a strand of recent literature to study how to robustify mechanisms to errors in the distribution [5, 8, 6].

There is, in fact, another important reason why one might want to design mechanisms for some approximate type distribution. Multi-dimensional data is complex and one would want to leverage the extensive statistical and machine learning toolkit that allows approximating such high-dimensional distributions with more structured models. Indeed, while the true type distribution $D$ might not conform to a simple model, it might be close to a distribution $\widehat{D}$ that does. We would like to leverage the simple structure in $\widehat{D}$ to (i) alleviate the computational intractability of multi-item mechanisms, and (ii) reduce the amount of communication that the bidders and the auctioneer need to exchange. While the structured model $\widehat{D}$ might allow (i) and (ii), we need the guarantee that the revenue of our mechanism be robust when we apply it to the true distribution $D$.

Motivated by the discussion above, in this work we build a multi-item mechanism design framework that combines matrix factorization models developed for recommendation systems with mechanism design, targeting two issues: (1) the intractability of Mechanism Design with respect to the number of items (arising from the exponential dependence of the number of types on the number of items if no further assumptions are placed); (2) the lack of exact access to the Bayesian priors. In particular, we assume that each bidder draws their type – specifying their values for a very large universe of $N$ items (think all restaurants in a city or all items on Amazon) – from a distribution $D_i$ that is close to a Matrix Factorized model $\widehat{D}_i$, whose latent dimension is $k\ll N$. Targeting these approximate distributions $\widehat{D}_i$ allows us to reduce the effective dimensionality of bidder types to $k$, which has huge advantages in terms of the computational/representation/communication/sample complexity of mechanism design. We develop tools that allow us to (a) use the mechanism constructed for the approximate $\widehat{D}_i$’s under the true $D_i$’s without sacrificing much revenue; and (b) interact with the bidders who are unaware of the latent codes (they only understand their values for the $N$ items and are oblivious to the matrix factorized model) yet exploit the factorized model for efficiently communicating with them without the impractical burden of having them communicate their $N$-dimensional type to the mechanism. In sum, our results are as follows:

  • Given a query protocol $\mathcal{Q}$ that learns an approximate latent representation of a bidder’s type, Theorem 1 shows how to combine it with any mechanism $\widehat{M}$ that is designed only for the Matrix Factorization model to produce a mechanism that generates comparable revenue but with respect to the true distribution. The result is obtained via a refinement of the robustification result in [7], where the loss in revenue, as well as the violation of incentive compatibility, now depends only on the effective dimension of the Matrix Factorization model, $k$, and not on the total number of items, $N$ (Lemma 2).

  • We show that if the valuations are constrained-additive (Definition 5), we can obtain communication-efficient query protocols in several natural settings (Theorem 2). The queries we consider ask a bidder whether they are willing to purchase an item at a given price. In the first setting, the design matrix of the Matrix Factorization model contains a diagonally dominant matrix – a generalization of the well-known separability assumption of Donoho and Stodden [26]. In two other settings, we assume that the design matrix is generated from a probabilistic model and show that a simple query protocol succeeds with high probability.

  • Combining Theorems 1 and 2, we show that, given any mechanism $\widehat{M}$ that is designed only for the Matrix Factorization model, we can design a mechanism that achieves comparable revenue and only requires the bidders to answer a small number of simple queries. In particular, in several natural settings, we show that the number of queries scales quasi-linearly with the effective dimension of the Matrix Factorization model and is independent of the total number of items (Proposition 1).

2 Preliminaries

2.1 Brief Introduction to Mechanism Design

We provide a brief introduction to mechanism design. To avoid a very long introduction, we only define the concepts in the context of multi-item auctions, which will be the focus of this paper. See Chapter 9 of [39] and the references therein for a more detailed introduction to mechanism design.

Multi-item Auctions.

The seller is selling $N$ heterogeneous items to $m$ bidders. Each bidder $i$ is assumed to have a private type $t_i$ that encodes their preference over the items and bundles of items. We assume that $t_i$ lives in the $N$-dimensional Euclidean space. For each bidder, there is a publicly known valuation function $v_i(\cdot,\cdot)$, where $v_i(t_i,S)\in\mathbb{R}$ is bidder $i$’s value for bundle $S\subseteq[N]$ when $i$’s private type is $t_i$. In this paper, we consider the Bayesian setting with private types, that is, each bidder’s type $t_i$ is drawn privately and independently from a publicly known distribution $D_i$.

Mechanism.

The seller designs a mechanism to sell the items to bidders. A mechanism consists of an allocation rule and a payment rule, where the allocation rule decides a way to allocate the items to the bidders, and the payment rule decides how much to charge each bidder.

Direct Mechanism:

In a direct mechanism, the mechanism directly solicits types from the bidders and applies the allocation and payment rules to the reported types. More specifically, for any reported type profile $b=(b_1,\ldots,b_m)$, a direct mechanism $M:=(x(\cdot),p(\cdot))$ selects $x(b)\in\{0,1\}^{m\times N}$ as the allocation and charges each bidder $i$ payment $p_i(b)$. (Note that $p(b)=(p_1(b),\ldots,p_m(b))$.) We slightly abuse notation to allow the allocation rule to be randomized, so $x(b)\in\Delta\left(\{0,1\}^{m\times N}\right)$. We assume that bidders have quasi-linear utilities. If bidder $i$’s private type is $t_i$, her utility under reported bid profile $b$ is $u_i\left(t_i,M(b)\right)=\mathbb{E}\left[v_i\left(t_i,x(b)\right)-p_i(b)\right]$, where the expectation is over the randomness of the allocation and payment rules.

Expected Revenue:

In this paper, our goal is to design mechanisms with high expected revenue. For a direct mechanism $M$, we use $\textsc{Rev}(M,D)$ to denote $\mathbb{E}_{t\sim D}\left[\sum_{i\in[m]}p_i(t)\right]$, where $t=(t_1,\ldots,t_m)$ is the type profile and is drawn from $D=\bigtimes_{i\in[m]}D_i$.
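As a concrete illustration of this objective, the following sketch estimates $\textsc{Rev}(M,D)$ by straightforward Monte Carlo simulation. The posted-price mechanism and the uniform prior are hypothetical stand-ins chosen only for the example; they are not part of the paper's setting.

```python
import random

def estimate_revenue(mechanism, sample_profile, n_samples=20000, seed=0):
    """Monte Carlo estimate of Rev(M, D) = E_{t ~ D}[ sum_i p_i(t) ].

    `mechanism(t)` returns (allocation, payments) for a type profile t;
    `sample_profile(rng)` draws t = (t_1, ..., t_m) from D.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        _, payments = mechanism(sample_profile(rng))
        total += sum(payments)
    return total / n_samples

# Hypothetical example: one bidder, one item, value ~ U[0,1], posted price 0.5.
def posted_price(t, price=0.5):
    buys = t[0] >= price
    return ([int(buys)], [price if buys else 0.0])

rev = estimate_revenue(posted_price, lambda rng: (rng.random(),))
# Expected revenue is price * Pr[value >= price] = 0.5 * 0.5 = 0.25.
```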

Incentive Compatibility and Individual Rationality

Since the bidders’ types are private, unless the mechanism incentivizes the bidders to report truthfully, there is no reason to expect that the reported types correspond to the true types. The notion of incentive compatibility is defined to capture this.

  • $\varepsilon$-Bayesian Incentive Compatible ($\varepsilon$-BIC): if bidders draw their types from some distribution $D=\bigtimes_{i=1}^{m}D_i$, then a direct mechanism $M$ is $\varepsilon$-BIC with respect to $D$ if for each bidder $i\in[m]$

    $\mathbb{E}_{t_{-i}\sim D_{-i}}[u_i(t_i,M(t_i,t_{-i}))]\geq\mathbb{E}_{t_{-i}\sim D_{-i}}[u_i(t_i,M(t'_i,t_{-i}))]-\varepsilon,$

    for all potential misreports $t'_i$, in expectation over the other bidders’ bids $t_{-i}$. A mechanism is BIC if it is 0-BIC.

  • $(\varepsilon,\delta)$-Bayesian Incentive Compatible ($(\varepsilon,\delta)$-BIC): if bidders draw their types from some distribution $D=\bigtimes_{i=1}^{m}D_i$, then a direct mechanism $M$ is $(\varepsilon,\delta)$-BIC with respect to $D$ if for each bidder $i\in[m]$:

    $\Pr_{t_i\sim D_i}\left[\mathbb{E}_{t_{-i}\sim D_{-i}}[u_i(t_i,M(t_i,t_{-i}))]\geq\mathbb{E}_{t_{-i}\sim D_{-i}}[u_i(t_i,M(t'_i,t_{-i}))]-\varepsilon,\ \forall t'_i\right]\geq 1-\delta.$
  • Individually Rational (IR): A direct mechanism $M$ is IR if for all type profiles $t=(t_1,\ldots,t_m)$,

    $u_i(t_i,M(t_i,t_{-i}))\geq 0$

    for all bidders $i\in[m]$.
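To make these definitions concrete, the sketch below numerically verifies that a single-item second-price auction – a standard textbook example, not part of this paper's construction – is 0-BIC and IR for a small discrete prior, by enumerating interim expected utilities. The three-point prior is an arbitrary illustration.

```python
import itertools

def second_price(bids):
    """Single-item second-price auction: highest bid wins, pays the second-highest."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    pay = max(b for i, b in enumerate(bids) if i != winner)
    x = [int(i == winner) for i in range(len(bids))]
    p = [pay if i == winner else 0.0 for i in range(len(bids))]
    return x, p

types = [1.0, 2.0, 3.0]   # each D_i: uniform over these three types (illustrative)
m = 3                      # number of bidders

def interim_utility(i, t_i, report):
    """E_{t_{-i} ~ D_{-i}}[ u_i(t_i, M(report, t_{-i})) ] with quasi-linear utility."""
    total = 0.0
    others = list(itertools.product(types, repeat=m - 1))
    for t_rest in others:
        bids = list(t_rest[:i]) + [report] + list(t_rest[i:])
        x, p = second_price(bids)
        total += t_i * x[i] - p[i]
    return total / len(others)

# Truth-telling is a best response (0-BIC) and never yields negative utility (IR).
for t in types:
    for r in types:
        assert interim_utility(0, t, t) >= interim_utility(0, t, r) - 1e-9
    assert interim_utility(0, t, t) >= 0
```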

Indirect Mechanism:

An indirect mechanism does not directly solicit the bidders’ types. After interacting with the bidders, the mechanism selects an allocation and payments. The notions of $\varepsilon$-Bayesian Incentive Compatibility and Individual Rationality can be extended to indirect mechanisms using the solution concept of $\varepsilon$-Bayes Nash equilibrium. The notion of $(\varepsilon,\delta)$-Bayesian Incentive Compatibility can be extended to indirect mechanisms using a new solution concept, which we call $(\varepsilon,\delta)$-weak approximate Bayes Nash equilibrium. In an incomplete information game, a strategy profile is an $(\varepsilon,\delta)$-weak approximate Bayes Nash equilibrium if, for every bidder, with probability no more than $\delta$ (over the randomness of their own type), unilateral deviation from the Bayes Nash strategy can increase the deviating bidder’s expected utility (with respect to the randomness of the other bidders’ types, assuming those bidders follow their Bayes Nash equilibrium strategies) by more than $\varepsilon$.

Remark 1.

For an $(\varepsilon,\delta)$-weak approximate Bayes Nash equilibrium, we compute expected revenue in this paper under the convention that all bidders follow their $(\varepsilon,\delta)$-weak approximate Bayes Nash equilibrium strategies. At the cost of an additive $m^2\delta H$ loss in revenue (where $H$ is the highest possible value of any bidder), we can instead assume that only the $(1-\delta)$-fraction of types of each bidder who have no more than $\varepsilon$ incentive to deviate from the weak approximate Bayes Nash equilibrium strategies follow these strategies, while the remaining $\delta$ fraction use arbitrary strategies. Similarly, we can interpret the $(\varepsilon,\delta)$-weak approximate Bayes Nash equilibrium definition as requiring that at least a $(1-\delta)$-fraction of the types of each bidder have at most $O(\varepsilon+m\delta H)$ incentive to deviate from the Bayes Nash strategies, assuming that for every other bidder at most a $\delta$ fraction of their types deviate from their Bayes Nash strategies.

2.2 Further Preliminaries

Definition 1.

Let $(U,d)$ be a metric space and $\mathcal{B}$ be a $\sigma$-algebra on $U$. For $A\in\mathcal{B}$, let $A^{\varepsilon}=\{x:\exists y\in A\ \text{s.t.}\ d(x,y)<\varepsilon\}$. Two probability measures $P$ and $Q$ on $\mathcal{B}$ have Prokhorov distance

$\inf\left\{\varepsilon>0:P(A)\leq Q(A^{\varepsilon})+\varepsilon\ \text{ and }\ Q(A)\leq P(A^{\varepsilon})+\varepsilon,\ \forall A\in\mathcal{B}\right\}.$

We consider distributions supported on some Euclidean space, and we choose $d$ to be the $\ell_{\infty}$-distance. We denote the $\ell_{\infty}$-Prokhorov distance between distributions $F$ and $\widehat{F}$ by $d_P(F,\widehat{F})$.

We will also make use of the following characterization of the Prokhorov metric by [43].

Lemma 1 (Characterization of the Prokhorov Metric [43]).

Let $F$ and $\widehat{F}$ be two distributions supported on $\mathbb{R}^n$. Then $d_P(F,\widehat{F})\leq\varepsilon$ if and only if there exists a coupling $\gamma$ of $F$ and $\widehat{F}$ such that $\Pr_{(x,y)\sim\gamma}\left[\lVert x-y\rVert_{\infty}>\varepsilon\right]\leq\varepsilon$.
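Lemma 1 suggests a simple empirical certificate: any coupling, presented as a finite list of paired samples, certifies an upper bound on the $\ell_\infty$-Prokhorov distance between the two marginals. The sketch below is our own illustration (the helper name is hypothetical); it finds the smallest $\varepsilon$ for which the fraction of coupled pairs at $\ell_\infty$-distance more than $\varepsilon$ is at most $\varepsilon$.

```python
def coupling_epsilon(pairs):
    """Given a coupling presented as paired samples (x, y), return the smallest
    eps with  #{ ||x - y||_inf > eps } / n  <=  eps.  By Lemma 1, any such
    coupling certifies that the l_inf-Prokhorov distance between the two
    empirical marginals is at most eps.
    """
    dists = [max(abs(a - b) for a, b in zip(x, y)) for x, y in pairs]
    n = len(dists)
    # The minimal feasible eps is attained either at an observed distance or
    # at a breakpoint k/n of the empirical tail fraction.
    candidates = sorted(set(dists) | {k / n for k in range(n + 1)})
    for eps in candidates:
        if sum(d > eps for d in dists) / n <= eps:
            return eps
    return max(dists)  # unreachable: eps = max distance is always feasible

# Hypothetical coupled samples in R^1: one pair out of four is far apart.
pairs = [((0.0,), (0.1,)), ((0.0,), (0.0,)), ((0.0,), (0.9,)), ((0.0,), (0.1,))]
eps = coupling_epsilon(pairs)
```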

Definition 2 (Influence Matrix and Weak Dependence).

For any $d$-dimensional random vector $X=(X_1,\ldots,X_d)$, we define the influence of variable $j$ on variable $i$ as

$\alpha_{i,j}:=\sup_{x_{-i-j},\,x_j\neq x'_j}d_{TV}\left(F_{X_i\mid X_j=x_j,X_{-i-j}=x_{-i-j}},\ F_{X_i\mid X_j=x'_j,X_{-i-j}=x_{-i-j}}\right),$

where $F_{X_i\mid X_{-i}=x_{-i}}$ denotes the conditional distribution of $X_i$ given $X_{-i}=x_{-i}$, and $d_{TV}(D,D')$ denotes the total variation distance between distributions $D$ and $D'$. Also, let $\alpha_{i,i}:=0$ for each $i$, and we use $\textsc{Inf}(X)$ to denote the $d\times d$ matrix $(\alpha_{i,j})_{i\in[d],j\in[d]}$. In this paper, we consider the coordinates of $X$ to be weakly dependent if $\lVert\textsc{Inf}(X)\rVert_2<1$.
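For intuition, the influence matrix of Definition 2 can be computed exactly for small discrete distributions. The sketch below is an illustration we add (the helper name is hypothetical): it enumerates the conditionals and takes the worst-case total variation distance for each pair $(i,j)$.

```python
import itertools

def influence_matrix(pmf, supports):
    """Inf(X) for a discrete joint distribution pmf: {tuple x -> prob}.
    alpha[i][j] = sup over x_{-i-j} and x_j != x'_j of the total variation
    distance between the two conditionals of X_i (Definition 2)."""
    d = len(supports)

    def conditional(i, fixed):
        # Distribution of X_i given the coordinates listed in `fixed`.
        w = {v: 0.0 for v in supports[i]}
        for x, p in pmf.items():
            if all(x[c] == val for c, val in fixed.items()):
                w[x[i]] += p
        z = sum(w.values())
        return {v: w[v] / z for v in w} if z > 0 else w

    alpha = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            if i == j:
                continue  # alpha_{i,i} := 0
            rest = [c for c in range(d) if c not in (i, j)]
            for vals in itertools.product(*(supports[c] for c in rest)):
                fixed = dict(zip(rest, vals))
                for xj, xj2 in itertools.combinations(supports[j], 2):
                    p = conditional(i, {**fixed, j: xj})
                    q = conditional(i, {**fixed, j: xj2})
                    tv = 0.5 * sum(abs(p[v] - q[v]) for v in supports[i])
                    alpha[i][j] = max(alpha[i][j], tv)
    return alpha

# Two weakly correlated bits: equal with probability 0.6.
pmf = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}
alpha = influence_matrix(pmf, [[0, 1], [0, 1]])
# Here Inf(X) = [[0, 0.2], [0.2, 0]]; its spectral norm is 0.2 < 1, so the
# coordinates are weakly dependent in the sense of Definition 2.
```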

3 Our Model and Main Results

Setting and Goal:

We consider a classical mechanism design problem, wherein a seller is selling $N$ items to $m$ buyers, and buyer $i$’s type $t_i$ is drawn independently from a distribution $D_i$ over $\mathbb{R}^N$. The goal is to design a mechanism that maximizes the seller’s revenue. In this paper, we operate in a setting where $D_i$ is unknown, but we are given access to the following components: (I) For each bidder $i$, we are given a machine learning model $\widehat{D}_i$ – of the matrix factorization type described below – which approximates $D_i$. (II) We are given a good mechanism $\widehat{M}$ for the approximate type distributions; in its design, this mechanism can exploit the low effective dimensionality, $k$, of types in the approximate model. Our goal is (III) to use (I) and (II) to obtain a good mechanism for the true type distributions.

(I) The Machine Learning Component:

We assume that each bidder’s type distribution $D_i$ can be well-approximated by a known Matrix Factorization (MF) model $\widehat{D}_i$. In particular:

  • We use $A\in\mathbb{R}^{N\times k}$ to denote the design matrix of the model, where each column can be viewed as the type (over $N$ items) of an “archetype.” As described in the following two bullets, types are sampled by each $\widehat{D}_i$ as linear combinations of archetypes.

  • We use $\widehat{D}_{z,i}$ to denote a distribution over $[0,1]^k$. The subscript $z$ is not a parameter of the distribution – it serves to remind us that this distribution samples in the latent space $[0,1]^k$ and to distinguish it from the distribution $\widehat{D}_i$ defined next.

  • If $F$ is a distribution over $\mathbb{R}^k$, we use $A\circ F$ to denote the distribution of the random variable $Az$, where $z\sim F$. With this notation, we use $\widehat{D}_i$ to denote $A\circ\widehat{D}_{z,i}$.

  • We assume that, for each bidder, the matrix factorization model is not far from the true type distribution, that is, for some $\varepsilon_1>0$ we have $d_P(D_i,\widehat{D}_i)\leq\varepsilon_1$ for all $i\in[m]$.
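A minimal sketch of sampling from such a model, assuming – purely for illustration – a uniform latent prior $\widehat{D}_{z,i}$ on $[0,1]^k$ and a random design matrix (neither choice is prescribed by the paper):

```python
import random

rng = random.Random(0)
N, k = 1000, 5   # huge item universe, small latent dimension

# Design matrix A (N x k): column j is the N-dimensional type of archetype j.
A = [[rng.random() for _ in range(k)] for _ in range(N)]

def sample_type():
    """Draw a latent code z ~ D_hat_{z,i} over [0,1]^k and return the observed
    N-dimensional type t = A z, i.e. a sample from A ∘ D_hat_{z,i}."""
    z = [rng.random() for _ in range(k)]   # illustrative uniform latent prior
    return [sum(A[item][j] * z[j] for j in range(k)) for item in range(N)]

t = sample_type()   # an N-dimensional type, a mixture of the k archetypes
```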

Remark 2.

In the above description we assumed that all $\widehat{D}_i$’s share the same design matrix $A$. This is done to avoid overloading notation, but all our results would hold if each $\widehat{D}_i$ had its own design matrix $A_i$.

(II) The Mechanism Design Component:

We assume that we are given a direct mechanism $\widehat{M}$ for types drawn from the Machine Learning model. In particular, we assume that this mechanism makes use of the effective dimension $k$ of the Machine Learning model, accepting “latent types” (of dimension $k$) as input from the bidders. Specifically:

  • Recall that, for each bidder $i$, their valuation function $v_i(\cdot,\cdot):\mathbb{R}^N\times 2^{[N]}\rightarrow\mathbb{R}$ is common knowledge. (Recall that $v_i$ takes as input the bidder’s type and a subset of items, so how the bidder values different subsets of items depends on their private type.)

  • The designer is given $A$ and $\widehat{D}_{z,i}$ for each bidder $i$, and treats bidder $i$’s type as drawn from $\widehat{D}_{z,i}$, i.e. in the latent space $[0,1]^k$. With respect to such “latent types,” there is an induced valuation function. In particular, for each bidder $i$, we use $v^A_i:\mathbb{R}^k\times 2^{[N]}\rightarrow\mathbb{R}$ to denote the valuation function defined as $v^A_i(z_i,S):=v_i(Az_i,S)$, where $z_i\in\mathbb{R}^k$.

  • With the above as setup, we assume that the designer designs a mechanism $\widehat{M}$ that is BIC and IR w.r.t. $\widehat{D}_z=\bigtimes_{i=1}^m\widehat{D}_{z,i}$ and valuation functions $\{v^A_i(\cdot,\cdot)\}_{i\in[m]}$.

(III) The New Component:

We consider the regime where $N\gg k$, and our goal is to combine the Machine Learning component with the Mechanism Design component to produce a mechanism that generates revenue comparable to $\textsc{Rev}(\widehat{M},\widehat{D}_z)$ when used for bidders whose types are drawn from $D=\bigtimes_{i=1}^m D_i$. There are two challenges: (i) $\widehat{M}$ takes as input the latent representation of a bidder’s type under $\widehat{D}_z$; however, under $D$ a bidder is simply ignorant of any latent representation of their type, so they cannot be asked about it; (ii) $\widehat{M}$’s revenue is evaluated with respect to $\widehat{D}_z$ and valuation functions $\{v^A_i(\cdot,\cdot)\}_{i\in[m]}$, while our goal is to obtain a mechanism whose revenue is similar under $D$ and valuation functions $\{v_i(\cdot,\cdot)\}_{i\in[m]}$. We show how to use a communication-efficient query protocol together with a robustification procedure to combine the Machine Learning and Mechanism Design components.

To state our results, we first need to formally define query protocols and some of their properties.

Definition 3 ($(\varepsilon,\delta)$-query protocol).

Let $\mathcal{Q}$ be a query protocol, i.e., some communication protocol that exchanges messages with a bidder over possibly several rounds and outputs a vector in $\mathbb{R}^k$. We say that a bidder interacts with the query protocol truthfully if, whenever the protocol asks the bidder to evaluate some function on their type, the bidder evaluates the function and returns the result truthfully. We use $\mathcal{Q}(t)\in\mathbb{R}^k$ to denote the output of $\mathcal{Q}$ when interacting with a truthful bidder whose type is $t\in\mathbb{R}^N$. $\mathcal{Q}$ is called an $(\varepsilon,\delta)$-query protocol if, for any $t\in\mathbb{R}^N$ and $z\in\mathbb{R}^k$ satisfying $\lVert t-Az\rVert_\infty\leq\varepsilon$, we have $\lVert z-\mathcal{Q}(t)\rVert_\infty\leq\delta$.
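As a toy illustration of how purchase queries can implement such a protocol, suppose – as in the separability-style special case mentioned in our second result – that the first $k$ rows of $A$ form the identity, so each latent coordinate equals the bidder's value for an "anchor" item. Binary search with price queries then recovers each coordinate. The code below is our own sketch under that assumption, not the paper's protocol, and all names are hypothetical.

```python
def estimate_value(ask, lo=0.0, hi=1.0, tol=1e-3):
    """Estimate a truthful bidder's value for a single item with purchase
    queries: ask(p) is True iff the bidder would buy the item at price p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if ask(mid):
            lo = mid        # value >= mid
        else:
            hi = mid        # value < mid
    return (lo + hi) / 2.0

def query_protocol(ask_item, k, tol=1e-3):
    """Sketch of a query protocol for the special case where the first k rows
    of the design matrix A form the identity: z_j is then the bidder's value
    for anchor item j, and k independent binary searches recover the latent
    type with O(k log(1/tol)) queries in total."""
    return [estimate_value(lambda p: ask_item(j, p), tol=tol) for j in range(k)]

# Truthful bidder whose values for the anchor items are the latent code itself:
z_true = [0.3, 0.7, 0.5]
z_hat = query_protocol(lambda j, p: z_true[j] >= p, k=3)
# Each coordinate is recovered to within tol.
```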

We also need the notion of Lipschitz valuations to formally state our result.

Definition 4 (Lipschitz Valuations).

$v(\cdot,\cdot):\mathbb{R}^N\times 2^{[N]}\rightarrow\mathbb{R}$ is an $\mathcal{L}$-Lipschitz valuation if, for any two types $t,t'\in\mathbb{R}^N$ and any bundle $S\subseteq[N]$, $|v(t,S)-v(t',S)|\leq\mathcal{L}\lVert t-t'\rVert_\infty$.

This includes familiar settings; for example, if the bidder is $c$-demand, the Lipschitz constant is $\mathcal{L}=c$. (A bidder is $c$-demand if, for any set $S$ of items, the bidder picks their favorite bundle of size at most $c$ in $S$, evaluating the value of each such bundle additively, with values determined by the bidder’s type $t$. Formally, $v(t,S)=\max_{B\subseteq S,|B|\leq c}\sum_{j\in B}t_j$.)
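The $c$-demand valuation above can be written out directly, together with an empirical check of the Lipschitz bound $\mathcal{L}=c$ from Definition 4 (the numeric types below are arbitrary illustrations):

```python
def c_demand_value(t, S, c):
    """v(t, S) = max_{B ⊆ S, |B| <= c} sum_{j in B} t_j: the bidder's favorite
    bundle of at most c items in S, valued additively under type t."""
    top = sorted((t[j] for j in S), reverse=True)[:c]
    return sum(v for v in top if v > 0)   # a smaller bundle can be better

# Empirical Lipschitz check: perturbing the type by at most 0.02 in l_inf
# changes the value by at most c * 0.02, consistent with L = c.
t1 = [0.9, 0.1, 0.5, 0.7]
t2 = [0.92, 0.08, 0.52, 0.68]        # ||t1 - t2||_inf = 0.02
S, c = [0, 1, 2, 3], 2
gap = abs(c_demand_value(t1, S, c) - c_demand_value(t2, S, c))
assert gap <= c * 0.02 + 1e-12
```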

We are now ready to state our first main result.

Theorem 1.

Let $D=\bigtimes_{i=1}^m D_i$ be the bidders’ type distributions and let $v_i:\mathbb{R}^N\times 2^{[N]}\rightarrow\mathbb{R}$ be an $\mathcal{L}$-Lipschitz valuation for each bidder $i\in[m]$. Also, let $A\in\mathbb{R}^{N\times k}$ be a design matrix and $\widehat{D}_{z,i}$ be a distribution over $\mathbb{R}^k$ for each $i\in[m]$.

Suppose we are given query access to a mechanism $\widehat{M}$ that is BIC and IR w.r.t. $\widehat{D}_z=\bigtimes_{i=1}^m\widehat{D}_{z,i}$ and valuations $\{v^A_i\}_{i\in[m]}$ (as defined in the second bullet of the Mechanism Design component above), and that there exists $\varepsilon_1>0$ such that $d_P(D_i,A\circ\widehat{D}_{z,i})\leq\varepsilon_1$ for all $i\in[m]$. Given any $(\varepsilon_1,\varepsilon)$-query protocol $\mathcal{Q}$ with $\varepsilon\geq\varepsilon_1$, we can construct a mechanism $M$, using only query access to $\widehat{M}$ and obliviously with respect to $D$, such that for any $D$ satisfying the above Prokhorov distance conditions the following hold:

  1. $M$ only interacts with every bidder using $\mathcal{Q}$ once;

  2. $M$ is $(\kappa,\varepsilon_1)$-BIC w.r.t. $D$ and IR, where $\kappa=O\left(\mathcal{L}\varepsilon_1+\lVert A\rVert_\infty\mathcal{L}m\varepsilon+\lVert A\rVert_\infty\mathcal{L}\sqrt{m\varepsilon}\right)$;

  3. The expected revenue of $M$ is at least $\textsc{Rev}(\widehat{M},\widehat{D}_z)-O(m\kappa)$.

Remark 3.

The mechanism $M$ will be an indirect mechanism. We are slightly imprecise in calling the mechanism $(\kappa,\varepsilon_1)$-BIC. Formally, what we mean is that interacting with $\mathcal{Q}$ truthfully is a $(\kappa,\varepsilon_1)$-weak approximate Bayes Nash equilibrium. We compute the expected revenue assuming all bidders interact with $\mathcal{Q}$ truthfully. As we discussed in Remark 1, with an additional additive $\lVert A\rVert_\infty\mathcal{L}m^2\varepsilon_1$ loss in revenue, we can assume that only the $(1-\delta)$-fraction of types of each bidder who have no more than $\varepsilon$ incentive to deviate from the Bayes Nash strategies interact with $\mathcal{Q}$ truthfully, while the remaining $\delta$ fraction use arbitrary strategies.

Why isn’t  [7] sufficient?

One may be tempted to prove Theorem 1 using [7]. However, there are two subtle issues with this approach: (i) The violation of the incentive compatibility constraints and the revenue loss of the robustification process in [7] depend linearly on $N$, rather than on $\lVert A\rVert_\infty$ as in Theorem 1. Note that $\lVert A\rVert_\infty=\max_{i\in[N]}\sum_{j=1}^k|A_{ij}|$, which depends only on $k$ and the largest value an archetype can have for a single item, and thus could be significantly smaller than $N$. (ii) The robustification process involves sampling from the conditional distribution of $A\circ\widehat{D}_{z,i}$ on an $N$-dimensional cube, which is equivalent to sampling from the conditional distribution of $\widehat{D}_{z,i}$ on a set $S$ whose image under the linear transformation $A$ is the $N$-dimensional cube. However, $S$ may be difficult to sample from if $A$ is not well-conditioned.

In the following lemma, we refine the robustification result in [7] (Theorem 3 in that paper) and show that, given an approximate distribution $\widehat{F}$ in the latent space and a BIC and IR mechanism $\widehat{M}$ w.r.t. $\widehat{F}$, we can robustify $\widehat{M}$ with negligible revenue loss so that it is an approximately BIC and exactly IR mechanism w.r.t. $F$, for any distribution $F$ within the $\varepsilon$-Prokhorov ball around $\widehat{F}$. Importantly, we exploit the effective dimension of the matrix factorization model to replace the dependence on $N$ with $\lVert A\rVert_\infty$ in both the violation of the incentive compatibility constraints and the revenue loss. Additionally, we only need to be able to sample from the conditional distribution of $\widehat{D}_{z,i}$ on a $k$-dimensional cube. We postpone the proof of Lemma 2 to Appendix A.

Lemma 2.

Let $A\in\mathbb{R}^{N\times k}$ be the design matrix. Suppose we are given a collection of distributions over latent types $\{\widehat{F}_{z,i}\}_{i\in[m]}$, where the support of each $\widehat{F}_{z,i}$ lies in $[0,1]^k$, and a BIC and IR mechanism $\widehat{M}$ w.r.t. $\widehat{F}=\bigtimes_{i=1}^m\widehat{F}_{z,i}$ and valuations $\{v^A_i\}_{i\in[m]}$, where each $v_i$ is an $\mathcal{L}$-Lipschitz valuation. Let $F=\bigtimes_{i=1}^m F_{z,i}$ be any distribution such that $d_P(F_{z,i},\widehat{F}_{z,i})\leq\varepsilon$ for all $i\in[m]$. Given access to a sampling algorithm $\mathcal{S}_i$ for each $i\in[m]$, where $\mathcal{S}_i(x,\delta)$ draws a sample from the conditional distribution of $\widehat{F}_{z,i}$ on the $k$-dimensional cube $\bigtimes_{j\in[k]}[x_j,x_j+\delta)$, we can construct a randomized mechanism $\widetilde{M}$, using only query access to $\widehat{M}$ and obliviously with respect to $F$, such that for any $F$ satisfying the above Prokhorov distance conditions the following hold:

  1. $\widetilde{M}$ is $\kappa$-BIC and IR w.r.t. $F$ and valuations $\{v^A_i\}_{i\in[m]}$, where $\kappa=O\left(\lVert A\rVert_\infty\mathcal{L}m\varepsilon+\lVert A\rVert_\infty\mathcal{L}\left(\delta+\frac{m\varepsilon}{\delta}\right)\right)$;

  2. The expected revenue of $\widetilde{M}$ satisfies $\textsc{Rev}\left(\widetilde{M},F\right)\geq\textsc{Rev}(\widehat{M},\widehat{F})-O(m\kappa)$.
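The core resampling step behind such a robustification can be sketched as follows. This shows only the shape of the reduction under our reading of the lemma statement – round a reported latent type down to the $\delta$-grid and replace it with a draw from the sampler $\mathcal{S}_i(x,\delta)$ before running $\widehat{M}$ – and omits the rest of the construction in [7] (e.g. any payment adjustments). The uniform latent prior and the placeholder mechanism are hypothetical.

```python
import math
import random

rng = random.Random(0)

def make_robust_mechanism(M_hat, samplers, delta):
    """Sketch of the resampling step of M_tilde (Lemma 2): round each reported
    latent type down to the delta-grid, replace it with a fresh draw from
    F_hat_{z,i} conditioned on that grid cube (the sampler S_i(x, delta)),
    then run the given mechanism M_hat on the resampled profile."""
    def M_tilde(reports):
        resampled = []
        for i, z in enumerate(reports):
            x = [math.floor(zj / delta) * delta for zj in z]  # cube corner
            resampled.append(samplers[i](x, delta))           # S_i(x, delta)
        return M_hat(resampled)
    return M_tilde

# If F_hat_{z,i} is uniform on [0,1]^k, its conditional on a cube is uniform:
def uniform_cube_sampler(x, delta):
    return [xj + rng.random() * delta for xj in x]

M_hat = lambda profile: (profile, [0.0] * len(profile))  # placeholder mechanism
M_tilde = make_robust_mechanism(M_hat, [uniform_cube_sampler], delta=0.1)
zs, _ = M_tilde([[0.234, 0.871]])
# The resampled latent type stays within delta of the report in each coordinate.
assert all(abs(a - b) < 0.1 for a, b in zip(zs[0], [0.234, 0.871]))
```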

Equipped with Lemma 2, we proceed to prove Theorem 1.

Proof of Theorem 1: Consider the following mechanism:

1:  Construct mechanism $\widetilde{M}$ using Lemma 2 by choosing $\widehat{F}_{z,i}$ to be $\widehat{D}_{z,i}$ for each $i\in[m]$ and $\delta$ to be $\sqrt{m\varepsilon}$.
2:  Query each agent $i$ using $\mathcal{Q}$. Let $\mathcal{Q}(b_i)$ be the output after interacting with bidder $i$. (For any possible output produced by $\mathcal{Q}$, there exists a type $b\in\mathbb{R}^N$ that generates it, so this is w.l.o.g.)
3:  Execute mechanism $\widetilde{M}$ on bid profile $\left(\mathcal{Q}(b_1),\ldots,\mathcal{Q}(b_m)\right)$.
Algorithm 1 Query-based Indirect Mechanism $M$

Let t_{i} be bidder i's type and z_{i} be a random variable distributed according to \widehat{D}_{z,i}. Since d_{P}(D_{i},\widehat{D}_{i})\leq\varepsilon_{1}, Lemma 1 guarantees a coupling between t_{i} and Az_{i} under which their \ell_{\infty} distance exceeds \varepsilon_{1} with probability at most \varepsilon_{1}. As \mathcal{Q} is an (\varepsilon_{1},\varepsilon)-query protocol, whenever t_{i} and Az_{i} are within \ell_{\infty} distance \varepsilon_{1}, we have \left\lVert\mathcal{Q}(t_{i})-z_{i}\right\rVert_{\infty}\leq\varepsilon. Hence, there exists a coupling between \mathcal{Q}(t_{i}) and z_{i} under which their \ell_{\infty} distance exceeds \varepsilon with probability at most \varepsilon (recall \varepsilon_{1}\leq\varepsilon). Choosing F_{z,i} to be the distribution of \mathcal{Q}(t_{i}), \widehat{F}_{z,i} to be \widehat{D}_{z,i}, and \delta to be \sqrt{m\varepsilon}, Lemma 2 states that \widetilde{M} is an O\left(\left\lVert A\right\rVert_{\infty}\mathcal{L}m\varepsilon+\left\lVert A\right\rVert_{\infty}\mathcal{L}\sqrt{m\varepsilon}\right)-BIC mechanism when bidder i has valuation v_{i}^{A}(\cdot) and type \mathcal{Q}(t_{i}). Consider two cases: (a) when \left\lVert t_{i}-Az_{i}\right\rVert_{\infty}\leq\varepsilon_{1}, we have \left\lVert t_{i}-A\mathcal{Q}(t_{i})\right\rVert_{\infty}\leq\varepsilon_{1}+\left\lVert A\right\rVert_{\infty}\varepsilon; since v_{i}(\cdot) is \mathcal{L}-Lipschitz, deviating from interacting truthfully with \mathcal{Q} can increase the expected utility by at most O\left(\mathcal{L}\varepsilon_{1}+\left\lVert A\right\rVert_{\infty}\mathcal{L}m\varepsilon+\left\lVert A\right\rVert_{\infty}\mathcal{L}\sqrt{m\varepsilon}\right). (b) When \left\lVert t_{i}-Az_{i}\right\rVert_{\infty}>\varepsilon_{1}, the bidder may substantially improve their expected utility by deviating; luckily, this case happens with probability at most \varepsilon_{1}. \Box

In Theorem 2, we show how to obtain (\varepsilon,\delta)-query protocols in various settings. We further assume that the bidders' valuations are all constrained-additive.

Definition 5 (Constrained-Additive valuation).

A valuation function v:\mathbb{R}^{N}\times 2^{[N]}\rightarrow\mathbb{R} is constrained-additive if v(t,S)=\max_{T\in\mathcal{I}\cap 2^{S}}\sum_{j\in T}(\mu_{j}+t_{j}), where \mathcal{I} is a downward-closed set system and \mu=(\mu_{1},\ldots,\mu_{N}) is a fixed vector. (One can interpret \mu as the common base values for the items, shared among all types.) For example, a unit-demand valuation is one where \mathcal{I} contains all subsets of size at most 1. If every element of \mathcal{I} has size at most \mathcal{L}, then v is an \mathcal{L}-Lipschitz valuation.
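For concreteness, the value computation can be sketched in code. This is a minimal illustration assuming the cardinality set system \mathcal{I}=\{T:|T|\leq L\} (the function name and inputs are hypothetical, not part of the paper):

```python
def constrained_additive_value(t, mu, S, L):
    """v(t, S) = max over T in I ∩ 2^S of sum_{j in T} (mu_j + t_j),
    for the cardinality set system I = { T : |T| <= L }."""
    # sort the per-item values mu_j + t_j over the awarded set S, best first
    vals = sorted((mu[j] + t[j] for j in S), reverse=True)
    total = 0.0
    for v in vals[:L]:
        if v <= 0:        # a smaller T is also feasible, so skip non-positive terms
            break
        total += v
    return total
```

With L = 1 this is a unit-demand valuation; the function is L-Lipschitz in t, matching the remark in the definition.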

Theorem 2.

Let all bidders' valuations be constrained-additive. We consider queries of the form e_{j}^{T}t\overset{?}{\geq}p, where e_{j} is the j-th standard unit vector in \mathbb{R}^{N}. Such a query asks whether the bidder is willing to pay at least p+\mu_{j} to win item j, and the bidder provides a Yes/No answer. We obtain communication-efficient protocols in the following settings:

  • Deterministic Structure: A^{T} can be expressed as [C^{T}\,H^{T}]\Pi_{N}, where \Pi_{N}\in\mathbb{R}^{N\times N} is a permutation matrix, H is an arbitrary (N-k)\times k matrix, and C\in\mathbb{R}^{k\times k} is diagonally dominant both by rows and by columns. This is a relaxation of the well-known separability assumption of Donoho and Stodden [26], under which A^{T} can be expressed as [I_{k}\,H^{T}]\Pi_{N}, where I_{k} is the k-dimensional identity matrix. Let \alpha=\min_{i\in[k]}\left(|C_{ii}|-\sum_{j\neq i}|C_{ij}|\right) and \beta=\min_{j\in[k]}\left(|C_{jj}|-\sum_{i\neq j}|C_{ij}|\right). We have a \left(\varepsilon,\frac{4\max_{j\in[k]}C_{jj}}{\alpha\beta}\cdot\varepsilon\right)-query protocol using O\left(k\log\left(\frac{\left\lVert A\right\rVert_{\infty}}{\varepsilon}\right)\right) queries for any \varepsilon>0.

  • Ex-ante Analysis: A is generated at random, with each archetype (column of A) an independent copy of an N-dimensional random vector \theta.

    • Multivariate Gaussian Distributions: \theta is distributed according to a multivariate Gaussian distribution \mathcal{N}(0,\Sigma). If there exists a subset S\subseteq[N] such that \frac{\mathrm{Tr}(\Sigma_{S})}{\rho(\Sigma_{S})}>64k, where \Sigma_{S}=\mathbb{E}[\theta_{S}\theta_{S}^{T}] is the covariance matrix for the items in S (\theta_{S} denotes the |S|-dimensional vector containing all \theta_{i} with i\in S) and \rho(\Sigma_{S}) is the largest eigenvalue of \Sigma_{S}, then with probability at least 1-2\exp\left(-\frac{\mathrm{Tr}(\Sigma_{S})}{16\rho(\Sigma_{S})}\right), we have a \left(\varepsilon,\frac{64\sqrt{|S|k}}{\sqrt{\mathrm{Tr}(\Sigma_{S})}}\cdot\varepsilon\right)-query protocol using O\left(|S|\log\left(\frac{\left\lVert A\right\rVert_{\infty}}{\varepsilon}\right)\right) queries for any \varepsilon>0. Note that when the entries of \theta are i.i.d., any S of size at least 64k satisfies the condition.

    • Bounded Distributions with Weak Dependence: Let each \theta_{i} be supported on [-c,c] with mean 0, for i\in[N]. If there exists a subset S\subseteq[N] such that \left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}<1 and \sum_{i\in S}v_{i}^{2}>\frac{16c^{2}k\sqrt{|S|}}{1-\left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}}, where v_{i}^{2}:=\mathrm{Var}[\theta_{i}], then with probability at least 1-2\exp\left(-\frac{\left(1-\left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}\right)\cdot(\sum_{i\in S}v_{i}^{2})^{2}}{64c^{4}k|S|}\right), we have a \left(\varepsilon,\frac{64\sqrt{|S|k}}{\sqrt{\sum_{i\in S}v_{i}^{2}}}\cdot\varepsilon\right)-query protocol using O\left(|S|\log\left(\frac{\left\lVert A\right\rVert_{\infty}}{\varepsilon}\right)\right) queries for any \varepsilon>0. Note that when the entries of \theta are independent, \left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}=0 for any set S; if each \theta_{i} additionally has variance \Omega(c^{2}), then any set of size at least \alpha k^{2} suffices, for some absolute constant \alpha.

Remark 4.

In the ex-ante analysis, the success probabilities depend on the parameters of the distributions, but note that they are both at least 12exp(4k)1-2\exp(-4k).

Before we prove Theorem 2, we combine it with Theorem 1 to derive results for a few concrete settings.

Proposition 1.

Under the same setting as in Theorem 1 with the extra assumption that every valuation viv_{i} is constrained-additive, we can construct mechanism MM using only query access to the given mechanism M^\widehat{M} and oblivious to the true type distribution DD, such that for any possible DD, MM is (η,ε1)\left(\eta,\varepsilon_{1}\right)-BIC and IR, where η=O(ε1+Amf(ε1)+Amf(ε1))\eta=O\left(\mathcal{L}\varepsilon_{1}+\left\lVert A\right\rVert_{\infty}\mathcal{L}mf(\varepsilon_{1})+\left\lVert A\right\rVert_{\infty}\mathcal{L}\sqrt{mf(\varepsilon_{1})}\right), and has revenue at least Rev(M^,D^z)O(Am2f(ε1)+Am3/2f(ε1)1/2)\textsc{Rev}(\widehat{M},\widehat{D}_{z})-O\left(\left\lVert A\right\rVert_{\infty}\mathcal{L}m^{2}f(\varepsilon_{1})+\left\lVert A\right\rVert_{\infty}\mathcal{L}m^{3/2}f(\varepsilon_{1})^{1/2}\right). Recall that ε1\varepsilon_{1} satisfies dP(Di,AD^z,i)ε1d_{P}(D_{i},A\circ\widehat{D}_{z,i})\leq\varepsilon_{1} for all i[m]i\in[m]. We compute the function f()f(\cdot) and the number of queries for the following three concrete settings (one for each of the three assumptions in Theorem 2).

1.

Deterministic Structure: Separability. If the design matrix A satisfies the separability assumption of Donoho and Stodden [26], that is, A^{T} can be expressed as [I_{k}\,H^{T}]\Pi_{N}, where \Pi_{N}\in\mathbb{R}^{N\times N} is a permutation matrix, then f(\varepsilon_{1})=4\varepsilon_{1} for all \varepsilon_{1}>0. The number of queries each bidder needs to answer is O\left(k\log\left(\frac{\left\lVert A\right\rVert_{\infty}}{\varepsilon_{1}}\right)\right).

2.

Multivariate Gaussian Distributions: Well-Conditioned Covariance Matrix. Let A be generated at random, with each archetype an independent draw from an N-dimensional normal distribution \mathcal{N}(0,\Sigma). Let \kappa(\Sigma) be the condition number of \Sigma. (\Sigma is well-conditioned if \kappa(\Sigma) is small; when \Sigma=I_{N}, \kappa(\Sigma)=1.) For any set S of size 64\kappa(\Sigma)k, if we query each bidder about the items in S, then with probability at least 1-2\exp(-4k), f(\varepsilon_{1})=O\left(\frac{k\sqrt{\kappa(\Sigma)}}{\sqrt{\mathrm{Tr}(\Sigma_{S})}}\cdot\varepsilon_{1}\right), and each bidder needs to answer O\left(\kappa(\Sigma)k\log\left(\frac{\left\lVert A\right\rVert_{\infty}}{\varepsilon_{1}}\right)\right) queries.

3.

Weak Dependence: Sufficient Variance per Item. Let A be generated at random, with each archetype an independent copy of an N-dimensional random vector \theta. Assume (i) \left\lVert\textsc{Inf}(\theta)\right\rVert_{2}<1, (ii) each \theta_{i} lies in [-c,c], and (iii) \mathrm{Var}[\theta_{i}]\geq a^{2} for each i\in[N]. Then for any set S of size \frac{256c^{4}k^{2}}{a^{4}\left(1-\left\lVert\textsc{Inf}(\theta)\right\rVert_{2}\right)^{2}}, if we query each bidder about the items in S, then with probability at least 1-2\exp(-4k), f(\varepsilon_{1})=O\left(\frac{\sqrt{k}}{a}\cdot\varepsilon_{1}\right), and each bidder needs to answer O\left(\frac{c^{4}k^{2}}{a^{4}\left(1-\left\lVert\textsc{Inf}(\theta)\right\rVert_{2}\right)^{2}}\log\left(\frac{\left\lVert A\right\rVert_{\infty}}{\varepsilon_{1}}\right)\right) queries. (Conditions (i)-(iii) can clearly be weakened: the result still holds if there is a set S such that the vector \theta_{S} satisfies (i), (ii), and (iii), and |S| is at least \frac{256c^{4}k^{2}}{a^{4}\left(1-\left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}\right)^{2}}.)

Proof.

The results in the first and last settings follow directly from Theorem 2. For the second setting, notice that by the eigenvalue interlacing theorem, \kappa(\Sigma_{S})\leq\kappa(\Sigma), as \Sigma_{S} is a principal submatrix of \Sigma. Therefore, \frac{\mathrm{Tr}(\Sigma_{S})}{\rho(\Sigma_{S})}\geq\frac{|S|}{\kappa(\Sigma_{S})}\geq 64k. Now the result follows from Theorem 2. ∎
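The interlacing step used above is easy to check numerically. The following sketch builds an arbitrary positive definite covariance (the sizes, seed, and index set are illustrative choices, not from the paper) and verifies that passing to a principal submatrix can only improve the condition number:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((6, 6))
Sigma = M @ M.T + 0.1 * np.eye(6)   # an arbitrary positive definite covariance
S = [0, 2, 5]
Sigma_S = Sigma[np.ix_(S, S)]       # principal submatrix for the items in S

def cond(X):
    w = np.linalg.eigvalsh(X)       # eigenvalues in ascending order
    return w[-1] / w[0]

# Cauchy interlacing: eigenvalues of Sigma_S lie inside
# [lambda_min(Sigma), lambda_max(Sigma)], hence kappa(Sigma_S) <= kappa(Sigma).
```
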

Proof of Theorem 2: Instead of directly studying the query complexity under our query model, we first consider a seemingly stronger model, where we directly query the bidder for the value e_{j}^{T}t and the answer is guaranteed to lie within e_{j}^{T}t\pm\eta for some \eta>0. We refer to such queries as noisy value queries. Since for each item j we have |e_{j}^{T}Az|\leq\left\lVert A\right\rVert_{\infty} for all z\in[0,1]^{k}, and we only care about types in \mathbb{R}^{N} that are close to some Az, we can binary search on p to simulate a noisy value query. In particular, we only need \log\left\lVert A\right\rVert_{\infty}+\log(1/\eta)+\log(1/\varepsilon) queries to simulate one noisy value query. From now on, the plan is to first investigate the query complexity with noisy value queries, and then convert the result to query complexity in the original model.
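The binary-search simulation can be sketched as follows; `threshold_oracle` is a hypothetical stand-in for the bidder's Yes/No answer to "is e_j^T t ≥ p?":

```python
def noisy_value_query(threshold_oracle, j, A_inf, eta):
    """Estimate e_j^T t to within +/- eta using Yes/No threshold queries
    'is e_j^T t >= p ?', assuming |e_j^T t| <= A_inf."""
    lo, hi = -A_inf, A_inf
    num_queries = 0
    while hi - lo > eta:
        p = (lo + hi) / 2.0
        num_queries += 1
        if threshold_oracle(j, p):   # Yes: the value is at least p
            lo = p
        else:                        # No: the value is below p
            hi = p
    return (lo + hi) / 2.0, num_queries
```

The loop runs about log2(2·A_inf/η) times, matching the query count in the proof up to constants.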

We first fix notation. Let \ell be the number of noisy value queries, and Q\in\mathbb{R}^{\ell\times N} be the query matrix, each row of which is a standard unit vector. We use \hat{y}\in\mathbb{R}^{\ell} to denote the bidder's answers to the queries and y\in\mathbb{R}^{\ell} to denote the true answers. Note that \left\lVert\hat{y}-y\right\rVert_{\infty}\leq\eta. Given \hat{y}, we solve the following least squares problem: \min_{z\in\mathbb{R}^{k}}\left\lVert QAz-\hat{y}\right\rVert_{2}^{2}.

The problem has a closed form solution: z^=(ATQTQA)1ATQTy^\hat{z}=\left(A^{T}Q^{T}QA\right)^{-1}A^{T}Q^{T}\hat{y}. Let B:=QAB:=QA, and z(t)kz(t)\in\mathbb{R}^{k} be a vector that satisfies tAz(t)ε\left\lVert t-Az(t)\right\rVert_{\infty}\leq\varepsilon. We are interested in upper bounding z^z(t)\left\lVert\hat{z}-z(t)\right\rVert_{\infty}. Note that

z^z(t)=\displaystyle\hat{z}-z(t)= (BTB)1BT(y^Bz(t))\displaystyle(B^{T}B)^{-1}B^{T}(\hat{y}-Bz(t))
=\displaystyle= (BTB)1BT((y^y)+(yBz(t)))\displaystyle(B^{T}B)^{-1}B^{T}\left((\hat{y}-y)+(y-Bz(t))\right)
=\displaystyle= (BTB)1BT(y^y)+(BTB)1BTQ(tAz(t))\displaystyle(B^{T}B)^{-1}B^{T}(\hat{y}-y)+(B^{T}B)^{-1}B^{T}Q(t-Az(t))

Since the rows of QQ are all standard unit vectors, Q=1\left\lVert Q\right\rVert_{\infty}=1.

z^z(t)\displaystyle\left\lVert\hat{z}-z(t)\right\rVert_{\infty}\leq (BTB)1BT(y^y)+(BTB)1BTQ(tAz(t))\displaystyle\left\lVert(B^{T}B)^{-1}B^{T}(\hat{y}-y)\right\rVert_{\infty}+\left\lVert(B^{T}B)^{-1}B^{T}Q(t-Az(t))\right\rVert_{\infty}
\displaystyle\leq (BTB)1BT(η+Q(tAz(t)))\displaystyle\left\lVert(B^{T}B)^{-1}\right\rVert_{\infty}\left\lVert B^{T}\right\rVert_{\infty}\left(\eta+\left\lVert Q(t-Az(t))\right\rVert_{\infty}\right)
\displaystyle\leq (BTB)1BT(η+ε).\displaystyle\left\lVert(B^{T}B)^{-1}\right\rVert_{\infty}\left\lVert B^{T}\right\rVert_{\infty}(\eta+\varepsilon).
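As a numerical sanity check of this recovery step (the sizes, seed, and uniform design matrix below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, ell = 50, 3, 25                    # items, archetypes, queries (illustrative)
A = rng.uniform(-1, 1, (N, k))           # design matrix
z_true = rng.uniform(0, 1, k)            # latent type
t = A @ z_true                           # bidder type, here exactly A z (epsilon = 0)

rows = rng.choice(N, ell, replace=False) # rows of Q: which items we query
B = A[rows]                              # B = Q A
eta = 1e-3
y_hat = t[rows] + rng.uniform(-eta, eta, ell)  # noisy value-query answers

# least squares: z_hat = (B^T B)^{-1} B^T y_hat
z_hat, *_ = np.linalg.lstsq(B, y_hat, rcond=None)
err = np.max(np.abs(z_hat - z_true))
```

The recovery error `err` is on the order of η times the amplification factor ‖(B^T B)^{-1}‖_∞‖B^T‖_∞ analyzed above.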

Next, we bound (BTB)1BT\left\lVert(B^{T}B)^{-1}\right\rVert_{\infty}\left\lVert B^{T}\right\rVert_{\infty} under the different assumptions.

Deterministic Structure:

We choose =k\ell=k and QQ so that QA=B=CQA=B=C. Since CC is diagonally dominant, CC is non-singular, and (CTC)1=C1(CT)1(C^{T}C)^{-1}=C^{-1}(C^{T})^{-1}.

Lemma 3 (Adapted from Theorem 1 and Corollary 1 of [45]).

If a matrix Un×nU\in\mathbb{R}^{n\times n} is diagonally dominant both by rows and by columns, and α=mini[n](|Uii|ji|Uij|)\alpha=\min_{i\in[n]}\left(|U_{ii}|-\sum_{j\neq i}|U_{ij}|\right) and β=minj[n](|Ujj|ij|Uij|)\beta=\min_{j\in[n]}\left(|U_{jj}|-\sum_{i\neq j}|U_{ij}|\right), then U11/α\left\lVert U^{-1}\right\rVert_{\infty}\leq 1/\alpha and (UT)11/β\left\lVert(U^{T})^{-1}\right\rVert_{\infty}\leq 1/\beta.

By Lemma 3, (CTC)1CTCTαβ\left\lVert(C^{T}C)^{-1}\right\rVert_{\infty}\left\lVert C^{T}\right\rVert_{\infty}\leq\frac{\left\lVert C^{T}\right\rVert_{\infty}}{\alpha\beta}. Note that CT=maxj[k]i[k]|Cij|2maxj[k]Cjj\left\lVert C^{T}\right\rVert_{\infty}=\max_{j\in[k]}\sum_{i\in[k]}|C_{ij}|\leq 2\max_{j\in[k]}C_{jj}. The last inequality is because CC is diagonally dominant by columns. To sum up, if we choose QQ so that QA=CQA=C,

z^z(t)(ε+η)CTαβ2(ε+η)maxj[k]Cjjαβ.\left\lVert\hat{z}-z(t)\right\rVert_{\infty}\leq\frac{(\varepsilon+\eta)\cdot\left\lVert C^{T}\right\rVert_{\infty}}{\alpha\beta}\leq\frac{2(\varepsilon+\eta)\cdot\max_{j\in[k]}C_{jj}}{\alpha\beta}.
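Lemma 3's bounds are easy to verify numerically on a small example matrix that is diagonally dominant by rows and by columns (the matrix below is arbitrary):

```python
import numpy as np

C = np.array([[4.0, 1.0, 0.5],
              [0.5, 3.0, 1.0],
              [1.0, 0.5, 5.0]])   # diagonally dominant by rows and by columns
n = C.shape[0]
alpha = min(abs(C[i, i]) - sum(abs(C[i, j]) for j in range(n) if j != i) for i in range(n))
beta  = min(abs(C[j, j]) - sum(abs(C[i, j]) for i in range(n) if i != j) for j in range(n))

inf_norm = lambda M: np.max(np.abs(M).sum(axis=1))   # induced infinity norm
bound_rows = inf_norm(np.linalg.inv(C))              # should be <= 1/alpha
bound_cols = inf_norm(np.linalg.inv(C.T))            # should be <= 1/beta
```
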

Ex-ante Analysis:

Since (BTB)1k(BTB)12\left\lVert(B^{T}B)^{-1}\right\rVert_{\infty}\leq\sqrt{k}\left\lVert(B^{T}B)^{-1}\right\rVert_{2} and BTB2\left\lVert B^{T}\right\rVert_{\infty}\leq\sqrt{\ell}\left\lVert B\right\rVert_{2},

z^z(t)kσmax(B)σmin(B)2(η+ε),\left\lVert\hat{z}-z(t)\right\rVert_{\infty}\leq\frac{\sqrt{\ell k}\cdot\sigma_{max}(B)}{\sigma_{min}(B)^{2}}\cdot(\eta+\varepsilon),

where σmax(B)\sigma_{max}(B) (or σmin(B)\sigma_{min}(B)) is BB’s largest (or smallest) singular value.

Multivariate Gaussian distribution:

When \theta is distributed according to a multivariate Gaussian distribution, we choose \ell=|S| and Q so that each row corresponds to e_{j} for some j\in S. Now B is an \ell\times k random matrix in which each column is an independent copy of \theta_{S}. We use Lemma 4 to bound B's largest singular value \sigma_{max}(B) and smallest singular value \sigma_{min}(B). The proof of Lemma 4 is postponed to Section 4.1.

Lemma 4.

[Concentration of Singular Values under multivariate Gaussian distributions]
Let U=[X^{(1)},\ldots,X^{(n)}] be an m\times n random matrix, where each column of U is an independent copy of an m-dimensional random vector X distributed according to a multivariate Gaussian distribution \mathcal{N}(0,\Lambda^{T}D\Lambda). In particular, \Lambda\in\mathbb{R}^{m\times m} is an orthonormal matrix, and D\in\mathbb{R}^{m\times m} is a diagonal matrix. We have \sigma_{max}(U)\leq 2\sqrt{\mathrm{Tr}(D)} and \sigma_{min}(U)\geq\frac{\sqrt{\mathrm{Tr}(D)}}{4}, with probability at least 1-2\exp\left(-\frac{\mathrm{Tr}(D)}{8d_{max}}+4n\right), where d_{max} is the largest entry of D.

Since Tr(ΣS)ρ(ΣS)>64k\frac{\mathrm{Tr}(\Sigma_{S})}{\rho(\Sigma_{S})}>64k, by Lemma 4, σmax(B)2Tr(ΣS)\sigma_{max}(B)\leq 2\sqrt{\mathrm{Tr}(\Sigma_{S})} and σmin(B)Tr(ΣS)/4\sigma_{min}(B)\geq\sqrt{\mathrm{Tr}(\Sigma_{S})}/4 with probability at least 12exp(Tr(ΣS)16ρ(ΣS))12exp(4k)1-2\exp\left(-\frac{\mathrm{Tr}(\Sigma_{S})}{16\cdot\rho(\Sigma_{S})}\right)\geq 1-2\exp(-4k). Hence, z^z(t)32|S|kTr(ΣS)(η+ε)\left\lVert\hat{z}-z(t)\right\rVert_{\infty}\leq\frac{32\sqrt{|S|k}}{\sqrt{\mathrm{Tr}(\Sigma_{S})}}\cdot(\eta+\varepsilon) with probability at least 12exp(Tr(ΣS)16ρ(ΣS))1-2\exp\left(-\frac{\mathrm{Tr}(\Sigma_{S})}{16\cdot\rho(\Sigma_{S})}\right).
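A quick Monte Carlo check of Lemma 4's event, taking \Sigma_{S}=I for simplicity (the sizes and seed below are illustrative; with identity covariance, columns of B are standard Gaussian vectors):

```python
import numpy as np

rng = np.random.default_rng(1)
k, s = 3, 256                            # archetypes and |S|; s > 64k as required
Sigma_S = np.eye(s)                      # i.i.d. coordinates: Tr / rho = s
B = rng.standard_normal((s, k))          # columns are i.i.d. N(0, Sigma_S) draws
sv = np.linalg.svd(B, compute_uv=False)
tr = np.trace(Sigma_S)

# Lemma 4's event: sigma_max <= 2 sqrt(Tr) and sigma_min >= sqrt(Tr)/4
```
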

Weakly Dependent Distributions:

When the coordinates of \theta_{S} are weakly dependent, i.e., \left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}<1, we choose \ell=|S| and Q so that each row corresponds to e_{j} for some j\in S. Now B is an \ell\times k random matrix in which each column is an independent copy of \theta_{S}. We use Lemma 5 to bound B's largest singular value \sigma_{max}(B) and smallest singular value \sigma_{min}(B). The proof of Lemma 5 is postponed to Section 4.2.

Lemma 5.

[Concentration of Singular Values under Weak Dependence]
Let U=[X^{(1)},\ldots,X^{(n)}] be an m\times n random matrix, where each column of U is an independent copy of an m-dimensional random vector X. We assume that the coordinates of X are weakly dependent, i.e., \left\lVert\textsc{Inf}(X)\right\rVert_{2}<1, and that each coordinate of X lies in [-c,c] and has mean 0 and variance v_{i}^{2}. Let v=\sqrt{\sum_{i\in[m]}v_{i}^{2}}. We have \sigma_{max}(U)\leq 2v and \sigma_{min}(U)\geq\frac{v}{4}, with probability at least 1-2\exp\left(-\frac{\left(1-\left\lVert\textsc{Inf}(X)\right\rVert_{2}\right)v^{4}}{32c^{4}nm}+4n\right).

Since iSvi2>16c2k|S|1Inf(θS)2\sum_{i\in S}v_{i}^{2}>\frac{16c^{2}k\sqrt{|S|}}{1-\left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}}, by Lemma 5, we have σmax(B)2iSvi2\sigma_{max}(B)\leq 2\sqrt{\sum_{i\in S}v_{i}^{2}} and σmin(B)iSvi2/4\sigma_{min}(B)\geq\sqrt{\sum_{i\in S}v_{i}^{2}}/4 with probability at least 12exp((1Inf(θS)2)(iSvi2)264c4k|S|)12exp(4k)1-2\exp\left(-\frac{\left(1-\left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}\right)\cdot(\sum_{i\in S}v_{i}^{2})^{2}}{64c^{4}k|S|}\right)\geq 1-2\exp(-4k). Therefore, z^z(t)32|S|kiSvi2(η+ε)\left\lVert\hat{z}-z(t)\right\rVert_{\infty}\leq\frac{32\sqrt{|S|k}}{\sqrt{\sum_{i\in S}v_{i}^{2}}}\cdot(\eta+\varepsilon) with probability at least 12exp((1Inf(θS)2)(iSvi2)264c4k|S|)1-2\exp\left(-\frac{\left(1-\left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}\right)\cdot(\sum_{i\in S}v_{i}^{2})^{2}}{64c^{4}k|S|}\right).
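A similar check for the weakly dependent case, using independent uniform coordinates (so \left\lVert\textsc{Inf}\right\rVert_{2}=0); the value of |S| below is an illustrative choice made large enough that Lemma 5's variance condition holds:

```python
import numpy as np

rng = np.random.default_rng(2)
k, s, c = 3, 25000, 1.0                 # |S| chosen so the lemma's condition holds
B = rng.uniform(-c, c, (s, k))          # independent coordinates: ||Inf||_2 = 0
v = np.sqrt(s * c**2 / 3)               # sqrt of the summed per-coordinate variances
sv = np.linalg.svd(B, compute_uv=False)

# Lemma 5's condition: sum of variances (= s c^2 / 3) exceeds 16 c^2 k sqrt(s)
# Lemma 5's event: sigma_max <= 2v and sigma_min >= v/4
```
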

Query Complexity in Different Models:

We set η\eta to be ε\varepsilon.

  • Deterministic structure: we have a (ε,4maxj[k]Cjjαβε)\left(\varepsilon,\frac{4\cdot\max_{j\in[k]}C_{jj}}{\alpha\beta}\cdot\varepsilon\right)-query protocol using k(logA+2log(1/ε))k(\log{\left\lVert A\right\rVert_{\infty}}+2\log(1/\varepsilon)) queries.

  • Multivariate Gaussian distributions: with probability at least 12exp(Tr(ΣS)16ρ(ΣS))1-2\exp\left(-\frac{\mathrm{Tr}(\Sigma_{S})}{16\cdot\rho(\Sigma_{S})}\right) (no less than 12exp(4k)1-2\exp(-4k) by our choice of SS), we have a (ε,64|S|kTr(ΣS)ε)\left(\varepsilon,\frac{64\sqrt{|S|k}}{\sqrt{\mathrm{Tr}(\Sigma_{S})}}\cdot\varepsilon\right)-query protocol using |S|(logA+2log(1/ε))|S|(\log{\left\lVert A\right\rVert_{\infty}}+2\log(1/\varepsilon)) queries.

  • Weakly dependent distributions: with probability at least 12exp((1Inf(θS)2)(iSvi2)264c4k|S|)1-2\exp\left(-\frac{\left(1-\left\lVert\textsc{Inf}(\theta_{S})\right\rVert_{2}\right)\cdot(\sum_{i\in S}v_{i}^{2})^{2}}{64c^{4}k|S|}\right) (no less than 12exp(4k)1-2\exp(-4k) by our choice of SS), we have a (ε,64|S|kiSvi2ε)\left(\varepsilon,\frac{64\sqrt{|S|k}}{\sqrt{\sum_{i\in S}v_{i}^{2}}}\cdot\varepsilon\right)-query protocol using |S|(logA+2log(1/ε))|S|(\log{\left\lVert A\right\rVert_{\infty}}+2\log(1/\varepsilon)) queries.

\Box

4 Bounding the Largest and Smallest Singular Values

We prove Lemmas 4 and 5 using an \varepsilon-net argument. We first state a lemma showing that, for any matrix M, upper and lower bounds on \left\lVert Mx\right\rVert_{2} over all points x of an \varepsilon-net yield bounds on the largest and smallest singular values of M.

Lemma 6 (Adapted from [41]).

For any ε<1\varepsilon<1, there exists an ε\varepsilon-net 𝒦Sn1\mathcal{K}\subseteq S^{n-1}, i.e., xSn1y𝒦xy2<ε\forall x\in S^{n-1}~{}\exists y\in\mathcal{K}~{}\left\lVert x-y\right\rVert_{2}<\varepsilon, such that |𝒦|(3/ε)n|\mathcal{K}|\leq(3/\varepsilon)^{n}. For any matrix Mm×nM\in\mathbb{R}^{m\times n}, let a=maxx𝒦Mx2a=max_{x\in\mathcal{K}}\left\lVert Mx\right\rVert_{2} and b=minx𝒦Mx2b=min_{x\in\mathcal{K}}\left\lVert Mx\right\rVert_{2}, then σmax(M)a1ε\sigma_{max}(M)\leq\frac{a}{1-\varepsilon} and σmin(M)bε1εa\sigma_{min}(M)\geq b-\frac{\varepsilon}{1-\varepsilon}\cdot a.

Proof of Lemma 6: Let x^{*}\in S^{n-1} be a vector that satisfies \left\lVert Mx^{*}\right\rVert_{2}=\sigma_{max}(M), and let x\in\mathcal{K} be such that \left\lVert x-x^{*}\right\rVert_{2}\leq\varepsilon. Then \sigma_{max}(M)=\left\lVert Mx^{*}\right\rVert_{2}\leq\left\lVert Mx\right\rVert_{2}+\left\lVert M(x-x^{*})\right\rVert_{2}\leq a+\varepsilon\sigma_{max}(M), which implies \sigma_{max}(M)\leq\frac{a}{1-\varepsilon}. On the other hand, for any y\in S^{n-1}, let y^{\prime}\in\mathcal{K} be such that \left\lVert y-y^{\prime}\right\rVert_{2}\leq\varepsilon; then \left\lVert My\right\rVert_{2}\geq\left\lVert My^{\prime}\right\rVert_{2}-\left\lVert M(y-y^{\prime})\right\rVert_{2}\geq b-\varepsilon\sigma_{max}(M)\geq b-\frac{\varepsilon}{1-\varepsilon}\cdot a. Taking the minimum over y gives \sigma_{min}(M)\geq b-\frac{\varepsilon}{1-\varepsilon}\cdot a. \Box
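Lemma 6 can be sanity-checked numerically in two dimensions, where an \varepsilon-net of S^{1} is just a fine grid of angles (the matrix and \varepsilon below are arbitrary illustrative choices):

```python
import numpy as np

eps = 0.1
M = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [2.0, 0.3]])              # an arbitrary 3 x 2 matrix (n = 2)

# an eps-net of the unit circle S^1: adjacent points are < eps apart,
# so every unit vector is within eps of some net point
num = 64                                 # 2 * sin(pi / 64) ~ 0.098 < eps
angles = 2 * np.pi * np.arange(num) / num
net = np.stack([np.cos(angles), np.sin(angles)], axis=1)

norms = np.linalg.norm(net @ M.T, axis=1)   # ||M x||_2 for every net point x
a, b = norms.max(), norms.min()
sv = np.linalg.svd(M, compute_uv=False)
```
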

4.1 Multivariate Gaussian Distributions

In this section, we prove the case where the columns of the random matrix are drawn from a multivariate Gaussian distribution. The key is to prove that, for every unit vector x, \left\lVert Ux\right\rVert_{2} lies in [c_{1}\mathbb{E}[\left\lVert Ux\right\rVert_{2}],c_{2}\mathbb{E}[\left\lVert Ux\right\rVert_{2}]] with high probability, for absolute constants c_{1} and c_{2} (Lemma 7). Lemma 4 then follows by combining Lemmas 7 and 6 with the union bound.

Proof of Lemma 4: Let Y^{(1)},\ldots,Y^{(n)} be n i.i.d. samples from the distribution \mathcal{N}(0,I_{m}), and V:=D^{1/2}[Y^{(1)},\ldots,Y^{(n)}].

Proposition 2.

𝒩(0,Σ)=𝑑ΛT𝒩(0,D)\mathcal{N}(0,\Sigma)\overset{d}{=}\Lambda^{T}\circ\mathcal{N}(0,D) and U=𝑑ΛTVU\overset{d}{=}\Lambda^{T}V.

Proof.

𝔼[ΛTD1/2Y(i)(Y(i))TD1/2Λ]=ΛTD1/2𝔼[Y(i)(Y(i))T]D1/2Λ=ΛTDΛ=Σ\mathbb{E}[\Lambda^{T}D^{1/2}Y^{(i)}(Y^{(i)})^{T}D^{1/2}\Lambda]=\Lambda^{T}D^{1/2}\mathbb{E}[Y^{(i)}(Y^{(i)})^{T}]D^{1/2}\Lambda=\Lambda^{T}D\Lambda=\Sigma. ∎

Since \Lambda is an orthonormal matrix, \sigma_{max}(U)=\sigma_{max}(V) and \sigma_{min}(U)=\sigma_{min}(V). We proceed to show that both \sigma_{max}(V) and \sigma_{min}(V) concentrate around their means, via an \varepsilon-net argument.

Lemma 7.

For any fixed x\in S^{n-1}, \mathbb{E}[\left\lVert Vx\right\rVert_{2}^{2}]=\mathrm{Tr}(D). Moreover,

Pr[Vx22Tr(D)4]exp(Tr(D)8dmax),\Pr\left[\left\lVert Vx\right\rVert_{2}^{2}\leq\frac{\mathrm{Tr}(D)}{4}\right]\leq\exp\left(-\frac{\mathrm{Tr}(D)}{8\cdot d_{max}}\right),

and

Pr[Vx222Tr(D)]exp(Tr(D)4dmax).\Pr\left[\left\lVert Vx\right\rVert_{2}^{2}\geq 2\mathrm{Tr}(D)\right]\leq\exp\left(-\frac{\mathrm{Tr}(D)}{4\cdot d_{max}}\right).

Proof of Lemma 7: Let g_{1},\ldots,g_{m} be m i.i.d. samples from \mathcal{N}(0,1). It is not hard to see that Vx\overset{d}{=}(\sqrt{d_{1}}g_{1},\ldots,\sqrt{d_{m}}g_{m})^{T}: since the Y^{(j)} are i.i.d. \mathcal{N}(0,I_{m}) and \left\lVert x\right\rVert_{2}=1, the vector \sum_{j}x_{j}Y^{(j)} is distributed as \mathcal{N}(0,I_{m}). So we need to prove that \sum_{i\in[m]}d_{i}g_{i}^{2} concentrates around its mean \mathrm{Tr}(D).

\Pr\left[\sum_{i\in[m]}d_{i}g_{i}^{2}\leq\mathrm{Tr}(D)-t\right]
=\Pr\left[\exp\left(\lambda\cdot(\mathrm{Tr}(D)-\sum_{i\in[m]}d_{i}g_{i}^{2})\right)\geq\exp(\lambda t)\right]\quad(\lambda>0\text{ and will be specified later})
\leq\frac{\exp(\lambda\mathrm{Tr}(D))\mathbb{E}\left[\exp\left(-\lambda\sum_{i\in[m]}d_{i}g_{i}^{2}\right)\right]}{\exp(\lambda t)}=\frac{\exp(\lambda\mathrm{Tr}(D))\prod_{i\in[m]}\mathbb{E}\left[\exp\left(-\lambda d_{i}g_{i}^{2}\right)\right]}{\exp(\lambda t)}

Since g_{i}^{2} is distributed according to a chi-square distribution, its moment generating function satisfies

𝔼[exp(λdigi2)]=11+2λdi.\mathbb{E}\left[\exp\left(-\lambda\cdot d_{i}g_{i}^{2}\right)\right]=\frac{1}{\sqrt{1+2\lambda d_{i}}}.

If we choose λ\lambda to be no more than 1/2dmax1/2d_{max}, since for any a[0,1]a\in[0,1], 1+2aea1+2a\geq e^{a}, we have that

11+2λdiexp(λdi/2).\frac{1}{\sqrt{1+2\lambda d_{i}}}\leq\exp(-\lambda d_{i}/2).

Putting everything together, we have that

\Pr\left[\sum_{i\in[m]}d_{i}g_{i}^{2}\leq\mathrm{Tr}(D)-t\right]\leq\exp\left(-\lambda\cdot(t-\mathrm{Tr}(D)/2)\right).

When we choose \lambda=1/(2d_{max}) and t=(3/4)\mathrm{Tr}(D), the RHS of the inequality becomes \exp\left(-\frac{\mathrm{Tr}(D)}{8d_{max}}\right).

Next, we upper bound Pr[i[n]digi2Tr(D)+t]\Pr\left[\sum_{i\in[n]}d_{i}g_{i}^{2}\geq\mathrm{Tr}(D)+t\right] via a similar approach.

\Pr\left[\sum_{i\in[m]}d_{i}g_{i}^{2}\geq\mathrm{Tr}(D)+t\right]
=\Pr\left[\exp\left(\lambda\cdot(\sum_{i\in[m]}d_{i}g_{i}^{2}-\mathrm{Tr}(D))\right)\geq\exp(\lambda t)\right]\quad(\lambda>0\text{ and will be specified later})
\leq\frac{\prod_{i\in[m]}\mathbb{E}\left[\exp\left(\lambda\cdot(d_{i}g_{i}^{2}-d_{i})\right)\right]}{\exp(\lambda t)}

Note that 𝔼[exp(λ(digi2di))]=exp(λdi)12λdi\mathbb{E}\left[\exp\left(\lambda\cdot(d_{i}g_{i}^{2}-d_{i})\right)\right]=\frac{\exp(-\lambda d_{i})}{\sqrt{1-2\lambda d_{i}}}.

Proposition 3.

For any x[0,1/4]x\in[0,1/4], exp(x)12x1+2x\frac{\exp(-x)}{\sqrt{1-2x}}\leq\sqrt{1+2x}.

Proof of Proposition 3: We first state a few inequalities that are not hard to verify. First, for all x>0x>0, ex1x+x2e^{-x}\leq 1-x+x^{2}. Second, 14x212x28x4\sqrt{1-4x^{2}}\geq 1-2x^{2}-8x^{4} if x[0,1/2)x\in[0,1/2). Finally, 12x28x41x+x21-2x^{2}-8x^{4}\geq 1-x+x^{2} if x[0,1/4]x\in[0,1/4]. Combining all three inequalities, we have that

ex14x2=12x1+2x,for all x[0,1/4].e^{-x}\leq\sqrt{1-4x^{2}}=\sqrt{1-2x}\sqrt{1+2x},~{}\text{for all $x\in[0,1/4]$}.

\Box

If we choose λ\lambda to be no more than 1/4dmax1/4d_{max}, then by Proposition 3, exp(λdi)12λdi1+2λdi\frac{\exp(-\lambda d_{i})}{\sqrt{1-2\lambda d_{i}}}\leq\sqrt{1+2\lambda d_{i}}, which is upper bounded by exp(λdi)\exp(\lambda d_{i}). Putting everything together, we have that

\Pr\left[\sum_{i\in[m]}d_{i}g_{i}^{2}\geq\mathrm{Tr}(D)+t\right]\leq\exp\left(-\lambda(t-\mathrm{Tr}(D))\right).

When we choose \lambda=1/(4d_{max}) and t=2\mathrm{Tr}(D), the RHS of the inequality becomes \exp\left(-\frac{\mathrm{Tr}(D)}{4d_{max}}\right). \Box
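The lower-tail bound can be checked empirically by Monte Carlo (the choice of D, the number of trials, and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
m, trials = 50, 1000
d = np.ones(m)                       # D = I_m, so Tr(D) = m and d_max = 1
g = rng.standard_normal((trials, m))
sums = (d * g**2).sum(axis=1)        # draws of sum_i d_i g_i^2

tr = d.sum()
# empirical frequency of the lower-tail event {sum <= Tr(D)/4};
# the bound above says it is at most exp(-Tr(D) / (8 d_max))
lower_frac = np.mean(sums <= tr / 4)
```
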

We now restrict attention to the good event that, for all points x in the \varepsilon-net, \left\lVert Vx\right\rVert_{2}\in\left[\frac{\sqrt{\mathrm{Tr}(D)}}{2},\sqrt{2\mathrm{Tr}(D)}\right]. Combining Lemma 7 and the union bound, the good event happens with probability at least 1-2\exp\left(-\frac{\mathrm{Tr}(D)}{8d_{max}}+\ln(3/\varepsilon)\cdot n\right). By Lemma 6, \sigma_{max}(V)\leq\frac{\sqrt{2\mathrm{Tr}(D)}}{1-\varepsilon} and \sigma_{min}(V)\geq\frac{\sqrt{\mathrm{Tr}(D)}}{2}-\frac{\varepsilon}{1-\varepsilon}\cdot\sqrt{2\mathrm{Tr}(D)}. Choosing \varepsilon=1/7 gives \sigma_{max}(V)\leq 2\sqrt{\mathrm{Tr}(D)} and \sigma_{min}(V)\geq\frac{\sqrt{\mathrm{Tr}(D)}}{4}. \Box

4.2 Bounded Distributions with Weak Dependence

In this section, we prove the case where the columns of the random matrix are drawn from an m-dimensional distribution that satisfies weak dependence. The overall plan is similar to the one for multivariate Gaussian distributions: the key is again to prove that, for every unit vector x, \left\lVert Ux\right\rVert_{2} lies in \left[c_{1}\mathbb{E}[\left\lVert Ux\right\rVert_{2}],c_{2}\mathbb{E}[\left\lVert Ux\right\rVert_{2}]\right] with high probability, for absolute constants c_{1} and c_{2} (Lemma 8). Lemma 5 then follows by combining Lemmas 8 and 6 with the union bound.

Proof of Lemma 5:

We first show that for each fixed x\in S^{n-1}, \left\lVert Ux\right\rVert_{2} concentrates around its mean. Then we apply Lemma 6 to bound \sigma_{max}(U) and \sigma_{min}(U).

Lemma 8.

Let U=[X^{(1)},\ldots,X^{(n)}] be an m\times n random matrix, where each column of U is an independent copy of an m-dimensional random vector X. We assume that the coordinates of X are weakly dependent, i.e., \left\lVert\textsc{Inf}(X)\right\rVert_{2}<1, and that each coordinate of X lies in [-c,c] and has mean 0 and variance v_{i}^{2}. Let v=\sqrt{\sum_{i\in[m]}v_{i}^{2}}. For any fixed x\in S^{n-1}, \mathbb{E}[\left\lVert Ux\right\rVert_{2}^{2}]=v^{2} and

Pr[|Ux22v2|>t]2exp((1Inf(X)2)t216c4nm)\Pr\left[|\left\lVert Ux\right\rVert_{2}^{2}-v^{2}|>t\right]\leq 2\exp\left(-\frac{\left(1-\left\lVert\textsc{Inf}(X)\right\rVert_{2}\right)t^{2}}{16c^{4}nm}\right)

Proof of Lemma 8: We first expand Ux22\left\lVert Ux\right\rVert_{2}^{2}.

\left\lVert Ux\right\rVert_{2}^{2}=\sum_{i\in[m]}\left(\sum_{j\in[n]}u_{ij}x_{j}\right)^{2}=\sum_{i\in[m]}\left(\sum_{j\in[n]}u_{ij}^{2}x_{j}^{2}+2\sum_{k<j}u_{ij}u_{ik}x_{j}x_{k}\right).

Therefore, 𝔼[Ux22]=i[m]vi2=v2\mathbb{E}\left[\left\lVert Ux\right\rVert_{2}^{2}\right]=\sum_{i\in[m]}v_{i}^{2}=v^{2}. To prove that Ux22\left\lVert Ux\right\rVert_{2}^{2} concentrates, we first need a result by Chatterjee [16].

Lemma 9 (Adapted from Theorem 4.3 in [16]).

Let $X$ be a $d$-dimensional random vector. Suppose the function $f$ satisfies the following generalized Lipschitz condition:
\[
|f(x)-f(y)|\leq\sum_{i\in[d]}c_i\mathds{1}[x_i\neq y_i]
\]
for all $x$ and $y$ in the support of $X$. If $\lVert\textsc{Inf}(X)\rVert_2<1$, we have
\[
\Pr\left[|f(X)-\mathbb{E}[f(X)]|\geq t\right]\leq 2\exp\left(-\frac{\left(1-\lVert\textsc{Inf}(X)\rVert_2\right)t^2}{\sum_{i\in[d]}c_i^2}\right).
\]

The function we care about is $\lVert Ux\rVert_2^2$, viewed as a function of the variables $\{u_{ij}\}_{i\in[m],j\in[n]}$. If $U$ and $U'$ differ only in the $(i,j)$ entry, then
\[
\begin{aligned}
\left|\lVert Ux\rVert_2^2-\lVert U'x\rVert_2^2\right|
&=\left|u_{ij}^2x_j^2+2\sum_{k\neq j}u_{ij}u_{ik}x_jx_k-(u'_{ij})^2x_j^2-2\sum_{k\neq j}u'_{ij}u_{ik}x_jx_k\right|\\
&\leq c^2x_j^2+4c^2|x_j|\sum_{k\neq j}|x_k|\leq 4c^2|x_j|\left(\sum_{k\in[n]}|x_k|\right)\leq 4c^2\sqrt{n}|x_j|,
\end{aligned}
\]
where the last step uses the Cauchy-Schwarz inequality and $\lVert x\rVert_2=1$. We denote $4c^2\sqrt{n}|x_j|$ by $c_{ij}$. Clearly, for any $U$ and $U'$, $\left|\lVert Ux\rVert_2^2-\lVert U'x\rVert_2^2\right|\leq\sum_{i\in[m],j\in[n]}c_{ij}\mathds{1}[u_{ij}\neq u'_{ij}]$. Also, notice that $\textsc{Inf}(U)=I_n\otimes\textsc{Inf}(X)$, where $\otimes$ denotes the Kronecker product, and therefore $\lVert\textsc{Inf}(U)\rVert_2=\lVert\textsc{Inf}(X)\rVert_2$. Applying Lemma 9 to $\lVert Ux\rVert_2^2$ yields the following inequality:
\[
\Pr\left[\left|\lVert Ux\rVert_2^2-v^2\right|>t\right]\leq 2\exp\left(-\frac{\left(1-\lVert\textsc{Inf}(X)\rVert_2\right)t^2}{\sum_{i\in[m],j\in[n]}c_{ij}^2}\right)=2\exp\left(-\frac{\left(1-\lVert\textsc{Inf}(X)\rVert_2\right)t^2}{16c^4nm}\right).
\]

\Box
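The identity $\lVert I_n\otimes\textsc{Inf}(X)\rVert_2=\lVert\textsc{Inf}(X)\rVert_2$ used above holds because $I_n\otimes A$ is block-diagonal with $n$ copies of $A$, so its spectrum equals that of $A$. A quick numerical check, with an arbitrary symmetric nonnegative matrix standing in for $\textsc{Inf}(X)$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 4
A = rng.uniform(size=(m, m))
A = (A + A.T) / 2                 # stand-in for Inf(X): symmetric, nonnegative entries
K = np.kron(np.eye(n), A)         # I_n (x) A: block-diagonal with n copies of A

# the two spectral norms coincide
print(np.linalg.norm(K, 2), np.linalg.norm(A, 2))
```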

Next, condition on the good event that, for all points $x$ in the $\varepsilon$-net, $\lVert Ux\rVert_2\in\left[\frac{v}{2},\sqrt{2}v\right]$. Combining Lemma 8 (setting $t=\frac{3}{4}v^2$) with the union bound over the net, the good event happens with probability at least $1-2\exp\left(-\frac{9\left(1-\lVert\textsc{Inf}(X)\rVert_2\right)v^4}{256c^4nm}+n\ln(3/\varepsilon)\right)$. By Lemma 6, $\sigma_{\max}(U)\leq\frac{\sqrt{2}v}{1-\varepsilon}$ and $\sigma_{\min}(U)\geq\frac{v}{2}-\frac{\varepsilon}{1-\varepsilon}\cdot\sqrt{2}v$. Choosing $\varepsilon=1/7$ gives $\sigma_{\max}(U)\leq 2v$ and $\sigma_{\min}(U)\geq\frac{v}{4}$. $\Box$
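The conclusion $\sigma_{\min}(U)\geq v/4$ and $\sigma_{\max}(U)\leq 2v$ can also be observed empirically when $m$ is large relative to $n$. The sketch below again uses i.i.d. uniform entries as an illustrative special case; the specific dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, c = 2000, 10, 1.0
v = np.sqrt(m * c**2 / 3.0)   # sqrt of the sum of per-coordinate variances

U = rng.uniform(-c, c, size=(m, n))
s = np.linalg.svd(U, compute_uv=False)   # all singular values of U

# both bounds from the lemma hold comfortably for such a tall matrix
print(s.max() <= 2 * v, s.min() >= v / 4)
```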

References

  • [1] Saeed Alaei. Bayesian Combinatorial Auctions: Expanding Single Buyer Mechanisms to Many Buyers. In the 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2011.
  • [2] Saeed Alaei, Hu Fu, Nima Haghpanah, Jason Hartline, and Azarakhsh Malekian. Bayesian Optimal Auctions via Multi- to Single-agent Reduction. In the 13th ACM Conference on Electronic Commerce (EC), 2012.
  • [3] Moshe Babaioff, Yannai A Gonczarowski, and Noam Nisan. The menu-size complexity of revenue approximation. Games and Economic Behavior, 2021.
  • [4] Moshe Babaioff, Nicole Immorlica, Brendan Lucier, and S. Matthew Weinberg. A Simple and Approximately Optimal Mechanism for an Additive Buyer. In the 55th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2014.
  • [5] Dirk Bergemann and Karl Schlag. Robust monopoly pricing. Journal of Economic Theory, 146(6):2527–2543, 2011.
  • [6] Johannes Brustle, Yang Cai, and Constantinos Daskalakis. Multi-item mechanisms without item-independence: Learnability via robustness. CoRR, abs/1911.02146, 2019.
  • [7] Johannes Brustle, Yang Cai, and Constantinos Daskalakis. Multi-item mechanisms without item-independence: Learnability via robustness. In EC, 2020.
  • [8] Yang Cai and Constantinos Daskalakis. Learning multi-item auctions with (or without) samples. In FOCS, 2017.
  • [9] Yang Cai, Constantinos Daskalakis, and S. Matthew Weinberg. An Algorithmic Characterization of Multi-Dimensional Mechanisms. In the 44th Annual ACM Symposium on Theory of Computing (STOC), 2012.
  • [10] Yang Cai, Constantinos Daskalakis, and S. Matthew Weinberg. Optimal Multi-Dimensional Mechanism Design: Reducing Revenue to Welfare Maximization. In the 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2012.
  • [11] Yang Cai, Constantinos Daskalakis, and S. Matthew Weinberg. Reducing Revenue to Welfare Maximization : Approximation Algorithms and other Generalizations. In the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2013.
  • [12] Yang Cai, Constantinos Daskalakis, and S. Matthew Weinberg. Understanding Incentives: Mechanism Design becomes Algorithm Design. In the 54th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2013.
  • [13] Yang Cai, Nikhil R. Devanur, and S. Matthew Weinberg. A duality based unified approach to bayesian mechanism design. In STOC, 2016.
  • [14] Yang Cai and Zhiyi Huang. Simple and Nearly Optimal Multi-Item Auctions. In the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2013.
  • [15] Yang Cai and Mingfei Zhao. Simple mechanisms for subadditive buyers via duality. In STOC, 2017.
  • [16] Sourav Chatterjee. Concentration inequalities with exchangeable pairs. PhD thesis, Citeseer, 2005.
  • [17] Shuchi Chawla, Jason D. Hartline, and Robert D. Kleinberg. Algorithmic Pricing via Virtual Valuations. In the 8th ACM Conference on Electronic Commerce (EC), 2007.
  • [18] Shuchi Chawla, Jason D. Hartline, David L. Malec, and Balasubramanian Sivan. Multi-Parameter Mechanism Design and Sequential Posted Pricing. In the 42nd ACM Symposium on Theory of Computing (STOC), 2010.
  • [19] Shuchi Chawla and J. Benjamin Miller. Mechanism design for subadditive agents via an ex-ante relaxation. In Proceedings of the ACM Conference on Economics and Computation (EC), 2016.
  • [20] Xi Chen, Ilias Diakonikolas, Anthi Orfanou, Dimitris Paparas, Xiaorui Sun, and Mihalis Yannakakis. On the complexity of optimal lottery pricing and randomized mechanisms. In Proceedings of the 56th Annual Symposium on Foundations of Computer Science (FOCS), 2015.
  • [21] Constantinos Daskalakis. Multi-item auctions defying intuition? ACM SIGecom Exchanges, 14(1):41–75, 2015.
  • [22] Constantinos Daskalakis, Alan Deckelbaum, and Christos Tzamos. Mechanism design via optimal transport. In Michael J. Kearns, R. Preston McAfee, and Éva Tardos, editors, ACM Conference on Electronic Commerce, EC ’13, Philadelphia, PA, USA, June 16-20, 2013, pages 269–286. ACM, 2013.
  • [23] Constantinos Daskalakis, Alan Deckelbaum, and Christos Tzamos. The complexity of optimal mechanism design. In the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2014.
  • [24] Constantinos Daskalakis, Alan Deckelbaum, and Christos Tzamos. Strong duality for a multiple-good monopolist. Econometrica, 85(3):735–767, 2017.
  • [25] Constantinos Daskalakis, Maxwell Fishelson, Brendan Lucier, Vasilis Syrgkanis, and Santhoshini Velusamy. Simple, credible, and approximately-optimal auctions. In EC, 2020.
  • [26] David Donoho and Victoria Stodden. When does non-negative matrix factorization give a correct decomposition into parts? In Proceedings of the 16th International Conference on Neural Information Processing Systems, NIPS’03, page 1141–1148, Cambridge, MA, USA, 2003. MIT Press.
  • [27] Shaddin Dughmi, Li Han, and Noam Nisan. Sampling and representation complexity of revenue maximization. In WINE, 2014.
  • [28] Paul Dütting, Zhe Feng, Harikrishna Narasimhan, David Parkes, and Sai Srivatsa Ravindranath. Optimal auctions through deep learning. In International Conference on Machine Learning, pages 1706–1715. PMLR, 2019.
  • [29] Zhe Feng, Harikrishna Narasimhan, and David C Parkes. Deep learning for revenue-optimal auctions with budgets. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, 2018.
  • [30] Yiannis Giannakopoulos and Elias Koutsoupias. Duality and optimality of auctions for uniform distributions. In Proceedings of the 15th ACM conference on Economics and Computation (EC), 2014.
  • [31] Kira Goldner and Anna R Karlin. A prior-independent revenue-maximizing auction for multiple additive bidders. In International Conference on Web and Internet Economics, pages 160–173. Springer, 2016.
  • [32] Yannai A Gonczarowski and S Matthew Weinberg. The sample complexity of up-to-$\varepsilon$ multi-dimensional revenue maximization. In FOCS, 2018.
  • [33] Sergiu Hart and Noam Nisan. Approximate Revenue Maximization with Multiple Items. In EC, 2012.
  • [34] Sergiu Hart, Noam Nisan, et al. The menu-size complexity of auctions. Center for the Study of Rationality, 2013.
  • [35] Ian A Kash and Rafael M Frongillo. Optimal auctions with restricted allocations. In Proceedings of the 2016 ACM Conference on Economics and Computation, pages 215–232, 2016.
  • [36] Robert Kleinberg and S. Matthew Weinberg. Matroid Prophet Inequalities. In the 44th Annual ACM Symposium on Theory of Computing (STOC), 2012.
  • [37] Jamie Morgenstern and Tim Roughgarden. Learning simple auctions. In Conference on Learning Theory, pages 1298–1318. PMLR, 2016.
  • [38] Roger B. Myerson. Optimal Auction Design. Mathematics of Operations Research, 6(1):58–73, 1981.
  • [39] Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay V. Vazirani, editors. Algorithmic Game Theory. Cambridge University Press, 2007.
  • [40] Aviad Rubinstein and S. Matthew Weinberg. Simple mechanisms for a subadditive buyer and applications to revenue monotonicity. In EC, 2015.
  • [41] Mark Rudelson. Recent developments in non-asymptotic theory of random matrices. Modern aspects of random matrix theory, 72:83, 2014.
  • [42] Weiran Shen, Pingzhong Tang, and Song Zuo. Automated mechanism design via neural networks. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, 2019.
  • [43] Volker Strassen. The existence of probability measures with given marginals. The Annals of Mathematical Statistics, 36(2):423–439, 1965.
  • [44] Vasilis Syrgkanis. A sample complexity measure with applications to learning optimal auctions. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
  • [45] James M Varah. A lower bound for the smallest singular value of a matrix. Linear Algebra and its applications, 11(1):3–5, 1975.
  • [46] Andrew Chi-Chih Yao. An n-to-1 bidder reduction for multi-item auctions and its applications. In SODA, 2015.

Appendix A Missing Proof of Lemma 2

Proof of Lemma 2: The proof essentially follows the analysis of Theorem 3 in [7]; we only provide a sketch here. Since we work with the matrix factorization model and can directly exploit the low dimensionality of the latent representation, we manage to replace the dependence on $N$ with $\lVert A\rVert_\infty$ in both the revenue loss and the violation of the truthfulness constraints. Our proof relies on the idea of "simultaneously coupling" by Brustle et al. [7]. More specifically, it couples $\widehat{F}_{z,i}$ with every distribution $F_{z,i}$ in the $\varepsilon$-Prokhorov-ball around $\widehat{F}_{z,i}$. If we round both $\widehat{F}_{z,i}$ and any such $F_{z,i}$ to a random grid $G$ of size $\delta$, we can argue that the expected total variation distance (over the randomness of the grid) between the two rounded distributions is $O(\varepsilon+\frac{\varepsilon}{\delta})$ (using Theorem 2 in [7]). Now consider the following mechanism: choose a random grid $G$, round the bids to $G$, and apply the mechanism $M_G$ that we designed for the rounded distribution of $\bigtimes_i\widehat{F}_{z,i}$. More specifically, $M_G$ is the following mechanism: for each bid $b$, use $\mathcal{S}_i(b_i,\delta)$ to sample a bid $b'_i$ and run $\widehat{M}$ on the bid profile $(b'_1,\ldots,b'_m)$. Since the expected total variation distance between the two rounded distributions is $O(\varepsilon+\frac{\varepsilon}{\delta})$, it remains to argue that, when the given distribution and the true distribution are close in total variation distance, a mechanism designed for one can be robustified to work for the other. This is a much easier task, and we again use an argument similar to the one in [7] to prove it.
Combining everything, we conclude that the randomized mechanism we constructed is approximately truthful and loses only negligible revenue compared to $\widehat{M}$ under any distribution within the $\varepsilon$-Prokhorov-ball around the given distribution. $\Box$
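The rounding step admits a minimal numerical sketch: draw the grid offset uniformly at random and round each bid down to the nearest point of the shifted $\delta$-grid, so no bid moves by more than $\delta$. The deterministic round-down rule below is an illustrative assumption; the actual scheme $\mathcal{S}_i(b_i,\delta)$ of [7] is a randomized, coupling-based rounding.

```python
import numpy as np

rng = np.random.default_rng(3)
delta = 0.1
shift = rng.uniform(0, delta)          # random offset determines the grid G
bids = np.array([0.37, 1.02, 0.58])

# round each bid down to the nearest point of the shifted delta-grid
rounded = np.floor((bids - shift) / delta) * delta + shift

# each bid moves down, and by strictly less than delta
print(np.all(rounded <= bids), np.all(bids - rounded < delta))
```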