Learning against Non-credible Auctions
Abstract
The standard framework of online bidding algorithm design assumes that the seller commits himself to faithfully implementing the rules of the adopted auction. However, if the auction belongs to the class of non-credible auctions, the seller may attempt to cheat in execution to increase his revenue. For example, in a second-price auction, the seller could create a fake bid between the highest bid and the second-highest bid. This paper focuses on one such case: online bidding in repeated non-credible second-price auctions. At each time $t$, the winner with bid $b_t$ is charged not the highest competing bid $d_t$ but a manipulated price $p_t = \alpha d_t + (1-\alpha) b_t$, where the parameter $\alpha \in [0,1]$ in essence measures the seller's credibility. Unlike classic repeated-auction settings where the bidder has access to samples of $d_t$, she can only receive mixed signals of $b_t$, $d_t$ and $\alpha$ in this problem. The task for the bidder is to learn not only the bid distributions of her competitors but also the seller's credibility. We establish regret lower bounds in various information models and provide corresponding online bidding algorithms that achieve near-optimal performance. Specifically, we consider three cases of prior information, depending on whether the credibility $\alpha$ and the distribution $G$ of the highest competing bids are known. Our goal is to characterize the landscape of online bidding in non-credible auctions and to understand the impact of the seller's credibility on online bidding algorithm design under different information structures.
1 Introduction
Digital advertising has experienced significant expansion due to the rapid rise of online activities, surpassing traditional advertising as the dominant marketing channel in various industries. Between 2021 and 2022, digital advertising revenues in the U.S. grew 10.8% year-over-year, totaling $209.7 billion (IAB 2023). In practice, a huge volume of online ads is sold via real-time auctions implemented on advertising platforms, and advertisers participate in such repeated online auctions to purchase advertising opportunities. This has motivated a flourishing line of work on the problem of online bidding algorithm design. In particular, learning to bid in repeated second-price auctions—often with constraints or unknown own valuations—has been well studied due to the popularity of this auction format in practice (Weed, Perchet, and Rigollet 2016; Balseiro and Gur 2019; Golrezaei et al. 2021; Balseiro, Lu, and Mirrokni 2022; Feng, Padmanabhan, and Wang 2022; Chen et al. 2022, 2023). However, these studies rest on an implicit assumption that the seller commits himself to faithfully implementing the rules of the announced second-price auction.
The possibility that a seller can profitably cheat in a second-price auction was pointed out as early as the seminal paper (Vickrey 1961) that introduced this auction format. After observing all the bids, the seller can strategically exaggerate the highest competing bid and overcharge the winner by up to the amount of her own bid. Several subsequent papers studied the issue of seller cheating in second-price auctions (Rothkopf and Harstad 1995; Porter and Shoham 2005; McAdams and Schwarz 2007). Recent theoretical work by Akbarpour and Li (2020) formally modeled credibility in an extensive-form game where the seller is allowed to deviate from the auction rules as long as the deviation cannot be detected by bidders. They defined an auction to be credible if it is incentive-compatible for the seller to follow the rules in the presence of cheating opportunities. In a second-price auction, the seller can even charge the winner the amount of her own bid to obtain higher utility, with an innocent explanation that the highest and second-highest bids were identical. Therefore, the prevalent second-price auction belongs to the class of non-credible auctions in this framework. Taking credibility into consideration, advertisers are confronted with a question of practical importance: how should an advertiser bid in repeated non-credible second-price auctions to maximize her cumulative utility?
In this work, we formulate the above problem as an online learning problem for a single bidder. We consider the scenario with a single seller who runs repeated non-credible second-price auctions. At each time $t$, the winner with bid $b_t$ is charged not the highest competing bid $d_t$ but a manipulated price $p_t = \alpha d_t + (1-\alpha) b_t$ for some $\alpha \in [0,1]$. The parameter $\alpha$ in essence captures the seller's credibility, i.e., the extent to which the seller deviates from the second-price auction rules. Our linear model is equivalent in expectation to the classic bid-shilling model (Rothkopf and Harstad 1995; Porter and Shoham 2005; McAdams and Schwarz 2007). In the bid-shilling model, the seller cheats by inserting a shill bid after observing all of the bids with probability $1-\alpha$. The seller is assumed to take full advantage of his power, so the winner pays her own bid whenever the seller cheats. Moreover, no matter what charging rule the seller actually uses, as long as the estimated credibility within this linear model is away from $1$, it can be confirmed that the seller is cheating. We believe our results and techniques have implications for more complicated settings beyond this linear model.
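To make the equivalence concrete: if the seller inserts a winning shill bid with probability $1-\alpha$ and runs the auction faithfully otherwise, then, under the linear model reconstructed above,
$$\mathbb{E}[\text{payment} \mid b_t, d_t] = (1-\alpha)\, b_t + \alpha\, d_t = p_t.$$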
We assume the highest competing bids $d_t$ are i.i.d. samples from a distribution $G$. The bidder aims to maximize her expected cumulative utility, which is given by the expected difference between the total value and the total payment. Moving to the information model, we investigate three cases of prior information: (1) known $\alpha$ and unknown $G$; (2) unknown $\alpha$ and known $G$; (3) unknown $\alpha$ and unknown $G$. For all three cases, we consider bandit feedback, where the bidder can observe the realized allocation and cost at each round. For the last case, where neither is known, we additionally consider full feedback, where the price is always observable regardless of the auction outcome. More discussion of the model is provided in Section 2.
The key challenge of this problem lies in the lack of credibility and its impact on the learning process. If the seller has full commitment, it is well known that truthful bidding is the dominant strategy in second-price auctions, but this truthfulness no longer holds in non-credible second-price auctions. Identifying optimal bidding strategies for utility maximization requires not only knowing the bidder's own values but also considering the strategies of her competitors and the seller. In classic repeated-auction settings (assuming a trustworthy seller), online bidding algorithms can collect historical samples to estimate the distribution $G$. However, the bidder in non-credible auctions needs to cope with an additional dimension of uncertainty: the available observations under either bandit or full feedback are all manipulated prices, i.e., mixed signals of $d_t$ and $\alpha$. As a result, difficulties arise in the estimation of both the distribution of her competitors' bids and the seller's credibility.
1.1 Main Contributions
First, we characterize the optimal clairvoyant bidding strategy in non-credible second-price auctions when the bidder knows both the credibility $\alpha$ and the distribution $G$. This clairvoyant strategy also serves as the benchmark in the regret definition.
Next, we establish regret lower bounds in various information models and provide corresponding online bidding algorithms that are optimal up to log factors. Our results are summarized in Table 1.
• For the case where $\alpha$ is known and $G$ is unknown, we explore the landscape by discussing how the problem varies with the value of the credibility parameter $\alpha$. We mainly contribute to the regret analysis for $\alpha \in (0,1)$, with a proven $\Omega(\sqrt{T})$ lower bound and a concrete near-optimal algorithm.
• For the case where $\alpha$ is unknown and $G$ is known, we develop an algorithm that adopts a dynamic estimation approach to approximate $\alpha$.
• For the challenging case where both $\alpha$ and $G$ are unknown, we observe that under bandit feedback, an $\Omega(T^{2/3})$ lower bound and an $\tilde{O}(T^{2/3})$ algorithm follow directly from existing algorithms. We then turn to the more interesting setting with full information feedback, for which we propose an episodic bidding algorithm that learns $\alpha$ and $G$ simultaneously in an efficient manner, while achieving a near-optimal regret of $\tilde{O}(\sqrt{T})$.
Overall, this work provides a theoretical regret analysis for learning against non-credible auctions. We aim to characterize the landscape of online bidding in non-credible auctions and analyze how the seller's credibility influences the design of online bidding algorithms under different information structures.
$\alpha$ | $G$ | Feedback | Upper bound | Lower bound | Theorem
---|---|---|---|---|---
Known, $\alpha = 1$ | Unknown | Bandit | 0 | 0 | Theorem 3.1
Known, $\alpha = 0$ | Unknown | Bandit | $\tilde{O}(T^{2/3})$ | $\Omega(T^{2/3})$ | Theorem 3.1
Known, $\alpha \in (0,1)$ | Unknown | Bandit | $\tilde{O}(\sqrt{T})$ | $\Omega(\sqrt{T})$ | Theorem 3.1
Unknown | Known | Bandit | $O(\log^2 T)$ | $\Omega(1)$ | Theorem 4.3¹
Unknown | Unknown | Bandit | $\tilde{O}(T^{2/3})$ | $\Omega(T^{2/3})$ | Corollary 5.1
Unknown | Unknown | Full | $\tilde{O}(\sqrt{T})$ | $\Omega(\sqrt{T})$ | Theorem 5.6¹
1 The regret bounds of these theorems rely on corresponding assumptions.
1.2 Related Work
Credibility in auctions. The issue of seller cheating has been studied in the game-theoretic literature within a strategic framework (McAdams and Schwarz 2007; Porter and Shoham 2005; Rothkopf and Harstad 1995). Recently, Akbarpour and Li (2020) explored the setting where the seller deviates from the auction rules in ways that can be innocently explained. To this end, they defined credibility based on the detectability of seller deviations. They further established an impossibility result that no optimal auction simultaneously achieves staticity, credibility and strategy-proofness, and they showed that the first-price auction is the unique static optimal auction that achieves credibility. Hakimov and Raghavan (2023) considered the general allocation problem and introduced the definitions of verifiability (i.e., allowing participants to check the correctness of their assignments) and transparency (i.e., allowing participants to check whether the allocation rule has been deviated from). These are stronger security notions than the credibility concept investigated by Akbarpour and Li (2020). Dianat and Freer (2021) studied how the credibility of an auction format affects bidding behavior and final outcomes via laboratory experiments. Their empirical findings confirm the theory that sellers do have incentives to break the auction rules and overcharge the winning bidder. Like our work, these pioneering studies discuss how a participant can potentially detect and learn about non-credible mechanisms. In contrast, our work is based on the online learning framework, where information revelation is partial and sequential in nature.
Learning to bid. Our work is closely related to the line of literature on learning to bid in repeated auctions. Balseiro et al. (2019); Han, Zhou, and Weissman (2020); Han et al. (2020); Badanidiyuru, Feng, and Guruganesh (2023); Zhang et al. (2022) studied the problem of learning in repeated first-price auctions. Castiglioni, Celli, and Kroer (2022); Wang et al. (2023) studied no-regret learning in repeated first-price auctions with budget constraints. As for repeated second-price auctions, Balseiro and Gur (2019); Balseiro, Lu, and Mirrokni (2022); Chen et al. (2022) considered the bidding problem with budget constraints, and Feng, Padmanabhan, and Wang (2022); Golrezaei et al. (2021) further considered return-on-spend (RoS) constraints. All these works assume that the seller fully commits to the announced auction rules.
2 Problem Formulation
We consider the problem of online learning in repeated non-credible second-price auctions. We focus on a single bidder in a large population of bidders over a time horizon $T$. In each round $t \in \{1, \ldots, T\}$, there is an item auctioned by a single seller. The bidder perceives a value $v_t$ for this item, and then submits a bid $b_t$ based on $v_t$ and all historical observations available to her. We denote the maximum bid of all other bidders by $d_t$. As usual, we use a bold symbol without subscript to denote the corresponding vector, e.g., $\boldsymbol{v} = (v_1, \ldots, v_T)$; the same goes for the other variables in this paper.
We consider a stochastic setting where $v_t$ is i.i.d. sampled from a distribution $F$ and $d_t$ is i.i.d. sampled from a distribution $G$. The latter assumption follows from the standard mean-field approximation (Iyer, Johari, and Sundararajan 2014; Balseiro, Besbes, and Weintraub 2015) and is common practice in the literature. The main rationale behind this assumption is that when the number of other bidders is large, on average their valuations and bidding strategies are static over time. Whether $G$ is known to the bidder depends on the information structure, while $F$ is always unknown to the bidder.
The seller claims that he follows the rules of the second-price auction, but he actually uses a combination of the first-price auction and the second-price auction. For simplicity, we assume a linear model. The auction outcome in round $t$ is then as follows: if $b_t \ge d_t$, the bidder wins the item and pays $p_t = \alpha d_t + (1-\alpha) b_t$, where $\alpha \in [0,1]$ is assumed to be a fixed weight throughout the period; if $b_t < d_t$, the bidder loses the auction and pays nothing. Here we assume that ties are broken in favor of the bidder we focus on, to simplify exposition. We only consider $\alpha \in [0,1]$, since $\alpha < 0$ would be immediately detected by the winner (the price would exceed her own bid) and $\alpha > 1$ would lead to lower revenue than a faithfully run second-price auction. One can observe that the pricing rule follows the second-price auction when $\alpha = 1$, and it follows the first-price auction when $\alpha = 0$.
Let $x_t = \mathbb{1}\{b_t \ge d_t\}$ be the binary variable indicating whether the bidder wins the item. Let $c_t = x_t p_t$ be the bidder's cost and let $r_t = x_t (v_t - p_t)$ be the corresponding reward.
Information structure.
In this paper, we investigate three cases of prior information: 1) known credibility $\alpha$ and unknown distribution $G$; 2) unknown credibility $\alpha$ and known distribution $G$; 3) unknown credibility $\alpha$ and unknown distribution $G$.
The first two cases not only serve as warm-up analyses, but also have practical significance in their own right. In reality, a bidder may receive additional signals outside the learning process to construct her belief about the seller's credibility or the strategies of other bidders, e.g., the seller's reputation as reported by other bidders, or bidding data collected through other credible channels. Her bidding algorithm then mainly aims to learn the remaining unknown part of the competing environment.
We consider two cases of information feedback:
1. Bandit information feedback. The bidder can observe the allocation $x_t$ and the cost $c_t$ at the end of each round $t$.
2. Full information feedback. The bidder can observe the allocation $x_t$ and the price $p_t$ at the end of each round $t$.
The second type of feedback makes sense in non-censored auctions, where the supply-side platform (SSP) is supposed to provide the minimum winning price to every bidder regardless of the outcome. If the bidder wins, a dishonest seller will overstate the minimum winning price to overcharge the winner; if the bidder loses, a dishonest seller will understate the minimum winning price to trick the bidder into raising her bids in the following rounds. For simplicity, the full-feedback model assumes that these two types of deception are symmetric and controlled by the same parameter $\alpha$, so that $p_t = \alpha d_t + (1-\alpha) b_t$ is revealed whether the bidder wins or loses. Note that when $\alpha = 0$, both feedback models are equivalent to the binary feedback model in first-price auctions.
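For concreteness, here is a minimal Python sketch of one auction round under the two feedback models, using the linear pricing rule reconstructed above; the symmetric form of the losing-round signal is an assumption of this sketch.

```python
def run_round(v, b, d, alpha):
    """One round of the non-credible second-price auction (linear model).

    Returns the bandit-feedback observation (x, cost), the full-feedback
    observation (x, price), and the realized reward. The price follows
    p = alpha * d + (1 - alpha) * b and is reported in both winning and
    losing rounds (symmetric deception).
    """
    x = 1 if b >= d else 0                 # allocation; ties favor this bidder
    p = alpha * d + (1.0 - alpha) * b      # manipulated price signal
    bandit_obs = (x, x * p)                # cost is incurred only upon winning
    full_obs = (x, p)                      # price observable regardless of outcome
    return bandit_obs, full_obs, x * (v - p)

# alpha = 1 recovers the second-price rule, alpha = 0 the first-price rule.
print(run_round(v=0.9, b=0.8, d=0.5, alpha=1.0))   # winner pays 0.5
print(run_round(v=0.9, b=0.8, d=0.5, alpha=0.0))   # winner pays her own bid 0.8
```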
We denote by $\mathcal{H}_t$ the historical observations available to the bidder before submitting a bid in round $t$. For the two cases of information feedback, we have, respectively, $\mathcal{H}_t^{B} = \{(v_s, b_s, x_s, c_s)\}_{s=1}^{t-1}$ and $\mathcal{H}_t^{F} = \{(v_s, b_s, x_s, p_s)\}_{s=1}^{t-1}$.
We will omit the superscript $B$ or $F$ in the remainder of this paper when the context is clear.
Bidding strategy and regret.
A bidding strategy $\pi$ maps $(v_t, \mathcal{H}_t)$ to a (possibly random) bid $b_t$ for each $t$. For a strategy $\pi$, we denote by $R(\pi)$ the expected performance of $\pi$, defined as follows:
$$R(\pi) = \mathbb{E}\Big[\sum_{t=1}^{T} x_t\,(v_t - p_t)\Big],$$
where the expectation is taken with respect to the values $v_t$, the highest competing bids $d_t$, and any possible randomness embedded in the strategy $\pi$. The expected regret of the bidder is defined as the difference between the expected cumulative reward of the optimal bidding strategy, which has perfect knowledge of $\alpha$ and $G$ to maximize this target, and that of the bidder's strategy:
$$\mathrm{Regret}(\pi) = \max_{\pi'} R(\pi') - R(\pi).$$
Now we look at the optimal bidding strategy. Using the independence of $d_t$ from $v_t$ and $b_t$, one has
$$\mathbb{E}\big[r_t \mid v_t = v,\, b_t = b\big] = \mathbb{E}_{d \sim G}\big[\mathbb{1}\{b \ge d\}\,(v - \alpha d - (1-\alpha) b)\big].$$
Let $r(b; v) = \mathbb{E}_{d \sim G}\big[\mathbb{1}\{b \ge d\}\,(v - \alpha d - (1-\alpha) b)\big]$ and let $b^*(v) = \arg\max_{b} r(b; v)$ (taking the largest maximizer in the case of a tie). Then, with perfect knowledge of $\alpha$ and $G$, the optimal strategy in each round submits $b^*(v_t)$, denoted by $b_t^*$ for short. The expected regret of strategy $\pi$ can be written as
$$\mathrm{Regret}(\pi) = \mathbb{E}\Big[\sum_{t=1}^{T} \big(r(b_t^*; v_t) - r(b_t; v_t)\big)\Big].$$
We will omit the superscript in the remainder of this paper when the context is clear.
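As an illustration of the clairvoyant benchmark above, the following sketch computes $b^*(v)$ by a grid search over bids, assuming (purely for illustration) that $G$ is the uniform distribution on $[0,1]$. In that case $r(b; v) = vb - (1-\alpha/2)b^2$, so the maximizer has the closed form $b^*(v) = \min\{1, v/(2-\alpha)\}$, which reduces to truthful bidding at $\alpha = 1$ and to the classic first-price shading $v/2$ at $\alpha = 0$.

```python
import numpy as np

def expected_reward(b, v, alpha, g_pdf, grid):
    """Approximate r(b; v) = E_{d~G}[ 1{b >= d} (v - alpha*d - (1-alpha)*b) ]."""
    integrand = (grid <= b) * (v - alpha * grid - (1 - alpha) * b) * g_pdf(grid)
    return np.sum(integrand) * (grid[1] - grid[0])    # Riemann sum over d

def optimal_bid(v, alpha, g_pdf, n_grid=2001):
    grid = np.linspace(0.0, 1.0, n_grid)
    rewards = [expected_reward(b, v, alpha, g_pdf, grid) for b in grid]
    return grid[int(np.argmax(rewards))]

uniform_pdf = lambda x: ((x >= 0) & (x <= 1)).astype(float)   # illustrative choice of G
for alpha in (1.0, 0.5, 0.0):
    v = 0.8
    # grid-search maximizer vs. the closed form v / (2 - alpha) for uniform G
    print(alpha, optimal_bid(v, alpha, uniform_pdf), min(1.0, v / (2 - alpha)))
```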
3 Learning with Known $\alpha$ and Unknown $G$
We start with the scenario where the credibility parameter $\alpha$ is known but the distribution $G$ of the highest competing bids is unknown. The algorithms and proofs of this section are deferred to Appendix A.
Observe that when $\alpha$ reaches an endpoint of $[0,1]$, this problem degenerates into a bidding problem in repeated second-price or first-price auctions. Therefore, we consider three cases separately: $\alpha = 1$, $\alpha = 0$, and $\alpha \in (0,1)$. The following Theorem 3.1 provides a comprehensive characterization of this setting. For each case, it establishes a regret lower bound and gives an algorithm with optimal performance up to log factors.
Theorem 3.1.
For repeated non-credible second-price auctions with known credibility $\alpha$, unknown distribution $G$ and bandit feedback:
(1) when $\alpha = 1$, truthful bidding achieves no regret;
(2) when $\alpha = 0$, there exists a bidding algorithm (Algorithm A.1) that achieves an $\tilde{O}(T^{2/3})$ regret, and the lower bound on regret for this case is $\Omega(T^{2/3})$;
(3) when $\alpha \in (0,1)$, there exists a bidding algorithm (Algorithm A.2) that achieves an $\tilde{O}(\sqrt{T})$ regret, and the lower bound on regret for this case is $\Omega(\sqrt{T})$.
The lower bounds in Theorem 3.1 show that the hardness of this learning problem increases as $\alpha$ decreases, which demonstrates the impact of reduced credibility on online bidding optimization. When $\alpha$ deviates from $1$, truthful bidding is no longer a dominant strategy. The bidder has to learn the competitors' bids to make decisions, and thus the distribution estimation error introduces an inevitable regret of order $\Omega(\sqrt{T})$. As long as $\alpha > 0$, the bidder can infer the highest competing bid from her payment once she wins an auction, by measuring the difference between $p_t$ and $b_t$. However, when $\alpha$ becomes $0$, $p_t$ in winning rounds provides no additional information about $d_t$. The complete loss of credibility cripples the bidder's ability to observe the competitive environment and estimate the distribution $G$, so the regret lower bound leaps from $\Omega(\sqrt{T})$ to $\Omega(T^{2/3})$.
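Concretely, under the linear pricing model above, a winning round with $\alpha > 0$ reveals the highest competing bid exactly, while at $\alpha = 0$ the payment carries no information beyond the bidder's own bid:
$$p_t = \alpha d_t + (1-\alpha) b_t \;\Longrightarrow\; d_t = \frac{p_t - (1-\alpha) b_t}{\alpha} \ \ (\alpha > 0), \qquad p_t = b_t \ \ (\alpha = 0).$$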
The first statement of Theorem 3.1 is trivial due to the nature of the second-price auction.
The second case, $\alpha = 0$, is equivalent to bidding in repeated first-price auctions with binary feedback (receiving only the binary outcome $x_t = \mathbb{1}\{b_t \ge d_t\}$), which can be modeled as a contextual bandits problem with cross-learning:
• Cross-learning over contexts. The bidder in round $t$ not only receives the reward under her realized value $v_t$, but also observes the rewards she would have received under every other value $v$.
For the contextual bandits problem with cross-learning over contexts in the stochastic setting, Balseiro et al. (2019) proposed a UCB-based algorithm that achieves an $\tilde{O}(\sqrt{KT})$ regret, where $K$ is the number of actions. Applying this algorithm to the auction setting with a discretized bid space of size $K$ results in a regret bound of $\tilde{O}(\sqrt{KT} + T/K)$, where the last term comes from the discretization error, and the upper bound becomes $\tilde{O}(T^{2/3})$ with $K = T^{1/3}$. They also proved an $\Omega(T^{2/3})$ regret lower bound via a reduction to the problem of dynamic pricing.
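The choice $K = T^{1/3}$ simply balances the two terms of the discretized bound sketched above:
$$\sqrt{KT} = \frac{T}{K} \iff K^{3} = T \iff K = T^{1/3}, \qquad \text{so that } \sqrt{KT} + \frac{T}{K} = \Theta\big(T^{2/3}\big).$$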
Lemma 3.2 (Balseiro et al. (2019)).
For repeated first-price auctions with binary feedback, Algorithm A.1 achieves an $\tilde{O}(T^{2/3})$ regret, and there exists a problem instance on which any algorithm must incur a regret of at least $\Omega(T^{2/3})$.
The third case, $\alpha \in (0,1)$, is similar to bidding in repeated first-price auctions with censored feedback, where the seller runs first-price auctions and always reveals the winning bid to every bidder, so that a bidder can see the highest competing bid only if she loses the auction. Han, Zhou, and Weissman (2020) modeled that problem as a contextual bandits problem with cross-learning and partial ordering:
• Cross-learning over contexts.
• Partial ordering over actions. There exists a partial order over the action set such that the bidder in round $t$ not only receives the reward under her own action $b_t$, but also observes the rewards under every action that dominates $b_t$ in this partial order.
• Partial ordering over contexts. There exists a partial order over the context set such that if $v \preceq v'$, then $b^*(v) \le b^*(v')$, where $b^*(v)$ is the optimal action under context $v$.
Lemma 3.3.
[Han, Zhou, and Weissman (2020)] For the contextual bandits problem with cross-learning over contexts and partial ordering over actions and contexts in the stochastic setting, there exists a bidding algorithm that achieves an $\tilde{O}(\sqrt{T})$ regret.
We carefully adapt their algorithm to our setting (Algorithm A.2) and verify that the third case, $\alpha \in (0,1)$, satisfies the above three properties:
• Cross-learning over values. Under bandit feedback, the reward the bidder would have received in round $t$ under any other value $v$ can be calculated by
(1) $r_t(v) = x_t v - c_t$.
• Partial ordering over bids. If the bidder wins in round $t$, she can infer the highest competing bid by
(2) $d_t = \dfrac{p_t - (1-\alpha) b_t}{\alpha}$.
Therefore, the reward under any other bid $b \ge d_t$ can be calculated by using the corresponding allocation and price. If the bidder loses the auction with $b_t$, she would also lose with any $b \le b_t$, and the reward is $0$ (a minimal counterfactual computation is sketched after Lemma 3.4 below).
• Partial ordering over values. We have shown in the previous section that the optimal bid under value $v$ is $b^*(v)$. The following lemma shows it is a non-decreasing function of $v$.
Lemma 3.4.
$b^*(v)$ (taking the largest maximizer in the case of a tie) is a non-decreasing function of both $v$ and $\alpha$.
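The following minimal sketch, assuming the linear pricing model and a known $\alpha \in (0,1)$, illustrates the counterfactual reward computation behind the cross-learning and partial-ordering properties; `None` marks rewards that the observed round does not reveal.

```python
def counterfactual_reward(v, b, won, b_t, p_t, alpha):
    """Reward the bidder would have obtained with value v and bid b,
    given the observation (won, b_t, p_t) of round t and known alpha in (0, 1)."""
    if won:
        d_t = (p_t - (1.0 - alpha) * b_t) / alpha   # invert the winning-round price
        if b >= d_t:
            return v - (alpha * d_t + (1.0 - alpha) * b)
        return 0.0
    # Losing round: only bids no larger than b_t are resolved (they also lose).
    if b <= b_t:
        return 0.0
    return None  # the outcome under a higher bid is not identified by this round

# Example: a winning round with bid 0.7, price 0.52 and alpha = 0.6 implies d_t = 0.4,
# so the counterfactual reward of bidding 0.5 with value 0.9 is 0.9 - 0.44 = 0.46.
print(counterfactual_reward(v=0.9, b=0.5, won=True, b_t=0.7, p_t=0.52, alpha=0.6))
```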
Therefore, Algorithm A.2 achieves an $\tilde{O}(\sqrt{T})$ regret when $\alpha \in (0,1)$. For the last piece of the puzzle, we prove the following regret lower bound. Remark that Lemma 3.5 holds for any $\alpha \in [0,1)$, though the bound is not tight when $\alpha = 0$.
Lemma 3.5.
For repeated non-credible second-price auctions with known credibility $\alpha \in [0,1)$ and unknown distribution $G$, there exists a constant $c_0 > 0$ such that the expected regret of any bidding strategy is at least $c_0 \sqrt{T}$,
even in the special case with fixed values and full feedback.
4 Learning with Unknown $\alpha$ and Known $G$
We next consider the scenario where the distribution $G$ is known but the credibility $\alpha$ is unknown. The proofs of this section are deferred to Appendix B.
(3)
Our bidding algorithm for this setting is depicted in Algorithm 4.1. The bidder first conducts a one-round exploration to make an appropriate initialization. After receiving the value $v_t$ in each round $t$, the bidder computes $\hat{\alpha}_t$, an estimate of $\alpha$ based on the historical observations from the past rounds, and bids as if $\hat{\alpha}_t$ were the true credibility. Recall that the optimal bid $b^*(v_t)$ shown in Section 2 maximizes $r(\cdot\,; v_t)$. Thus, by the choice of the bid, if the estimator $\hat{\alpha}_t$ is close to $\alpha$, the expected reward will be close to the optimal reward $r(b_t^*; v_t)$.
Equation 3 in Algorithm 4.1 estimates $\alpha$ by fitting the observed rewards to the expected rewards, which can be computed since $G$ is known. We apply the Azuma–Hoeffding inequality to obtain the concentration result in Lemma 4.1 below.
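Since the exact form of Equation 3 is not reproduced here, the sketch below shows one natural instantiation of this reward-matching idea, assuming a known uniform $G$ on $[0,1]$ and a grid search over candidate credibility values; the paper's Algorithm 4.1 may differ in its details.

```python
import numpy as np

def model_reward(b, v, a):
    """Expected reward E_{d~U[0,1]}[1{b>=d}(v - a*d - (1-a)*b)] for uniform G."""
    return v * b - (1.0 - a / 2.0) * b ** 2

def estimate_alpha(history, grid=np.linspace(0.0, 1.0, 201)):
    """Pick the credibility whose model-implied average reward best matches
    the average observed reward (observed reward = x*v - cost under bandit feedback)."""
    obs = np.mean([x * v - c for (v, b, x, c) in history])
    model = np.array([np.mean([model_reward(b, v, a) for (v, b, x, c) in history])
                      for a in grid])
    return grid[int(np.argmin(np.abs(model - obs)))]

# Tiny synthetic check with true alpha = 0.7.
rng = np.random.default_rng(0)
alpha, hist = 0.7, []
for _ in range(5000):
    v, b, d = rng.random(), rng.random(), rng.random()
    x = float(b >= d)
    p = alpha * d + (1.0 - alpha) * b
    hist.append((v, b, x, x * p))
print(estimate_alpha(hist))   # should be close to 0.7
```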
Lemma 4.1.
For technical purposes, we make the following assumption.
Assumption 4.2.
$G$ is twice differentiable and log-concave with density function $g$. There exist positive constants $\underline{g}$ and $\bar{g}$ such that $\underline{g} \le g(x) \le \bar{g}$ for $x \in [0,1]$.
The CDFs of many common distributions, such as gamma distributions, Gaussian distributions and uniform distributions, are all log-concave. The existence of positive bounds on the density function is also a standard and common assumption in various learning problems.
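As a quick sanity check, the uniform distribution on $[0,1]$ (one of the examples above) satisfies Assumption 4.2:
$$G(x) = x, \qquad \log G(x) = \log x \ \text{is concave}, \qquad g(x) \equiv 1, \ \text{so } \underline{g} = \bar{g} = 1.$$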
Theorem 4.3.
Suppose that Assumption 4.2 holds. For repeated non-credible second-price auctions with unknown credibility $\alpha$, known distribution $G$ and bandit feedback, there exists a bidding algorithm (Algorithm 4.1) that achieves an $O(\log^2 T)$ regret, and any algorithm must incur at least a constant regret.
Remark that, due to the additional assumption, we cannot conclude from a comparison of the two lower bounds in Theorem 4.3 and Theorem 3.1 that estimating $\alpha$ is generally easier than estimating $G$. For example, the two-point method used in the proof of Lemma 3.5 constructs two discrete distributions to show that no bidding algorithm can obtain low regret simultaneously under both distributions, while Assumption 4.2 rules out such bad cases.
A key step in the proof of Theorem 4.3 is to show that, with high probability, the reward obtained by the bidder in each round is close to the reward under the optimal bid $b_t^*$, with a per-round loss controlled by the estimation error of $\hat{\alpha}_t$. The regret upper bound is then established by summing these per-round losses from $t = 1$ to $T$. It is also worth discussing the robustness of this result: even without Assumption 4.2, the per-round loss can still be bounded linearly in the estimation error, and with a weaker continuity condition, the guarantee in Corollary 4.4 below still holds.
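To illustrate the summation step under the rates reconstructed above (an estimation error of order $\sqrt{\log T / t}$ after $t$ rounds, a quadratic per-round loss under Assumption 4.2, and a linear one without it):
$$\sum_{t=1}^{T} \frac{\log T}{t} = O(\log^2 T), \qquad \sum_{t=1}^{T} \sqrt{\frac{\log T}{t}} = O\big(\sqrt{T \log T}\big),$$
which is consistent with the gap between Theorem 4.3 and Corollary 4.4.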
Corollary 4.4.
Suppose that $G$ is continuous. For repeated non-credible second-price auctions with unknown credibility $\alpha$, known distribution $G$ and bandit feedback, Algorithm 4.1 can achieve an $\tilde{O}(\sqrt{T})$ regret.
An intuition for why the proof of the regret upper bound may fail under some discontinuous distributions is given in Example B.3. In spite of this, we conjecture that Algorithm 4.1 can also guarantee low regret in the case of a discontinuous $G$, and we leave this as a future direction.
5 Learning with Unknown $\alpha$ and Unknown $G$
Bandit feedback.
For the last scenario, with both $\alpha$ and $G$ unknown, we first consider bandit feedback. Although the seller's credibility is unknown, this case still satisfies the cross-learning property over values, i.e., Equation 1 holds. Thus, Algorithm A.1, which does not actually use the value of $\alpha$, still works in this case and achieves an $\tilde{O}(T^{2/3})$ regret. The regret lower bound also follows directly from Theorem 3.1, since no bidding strategy can obtain a better regret guarantee than $\Omega(T^{2/3})$ when $\alpha = 0$. Hence, we have the following result.
Corollary 5.1.
For repeated non-credible second-price auctions with unknown credibility $\alpha$, unknown distribution $G$ and bandit feedback, there exists a bidding algorithm (Algorithm A.1) that achieves an $\tilde{O}(T^{2/3})$ regret, and the lower bound on regret for this problem is $\Omega(T^{2/3})$.
In spite of the sublinear regret, the result of Corollary 5.1 is not satisfactory. Algorithm A.1 falls into the category of standard UCB policies with cross-learning and does not make full use of the structure of this auction setting. In fact, it treats the seller's mechanism as a black box, without really estimating the seller's credibility or the other bidders' strategies.
Full feedback.
With the above in mind, we explore whether we can do better with richer feedback and a more careful estimation of $\alpha$ and $G$. In the full feedback model, $p_t$ is always observable, which should intuitively help the estimation. Nevertheless, when $\alpha = 0$, full feedback degenerates to the binary feedback of first-price auctions, and the $\Omega(T^{2/3})$ lower bound still applies. Thus, in what follows we exclude the extreme case of $\alpha = 0$, for which it is impossible to achieve a better bound than $\Omega(T^{2/3})$. Note that sellers in reality often do not set $\alpha = 0$, due to concerns about reputation or cheating costs (McAdams and Schwarz 2007). This assumption is also consistent with the recent empirical findings of Dianat and Freer (2021): although sellers in non-credible second-price auctions often overcharge, they typically do not use the rules of the first-price auction to maximize revenue.
Assumption 5.2.
There exists a constant $\alpha_0 > 0$ such that $\alpha \ge \alpha_0$.
We also make a slightly stronger assumption on $G$.
Assumption 5.3.
$G$ is twice differentiable with a log-concave density function $g$. There exist positive constants $\underline{g}$, $\bar{g}$ and $\bar{g}'$ such that $\underline{g} \le g(x) \le \bar{g}$ and $|g'(x)| \le \bar{g}'$ for $x \in [0,1]$.
Our bidding algorithm for full feedback is presented in Algorithm 5.1. It runs in an episodic manner, similar to Cesa-Bianchi, Gentile, and Mansour (2014), Javanmard and Nazerzadeh (2019), and Badanidiyuru, Feng, and Guruganesh (2023). The time horizon $T$ is divided into consecutive episodes, and we denote by $\mathcal{T}_s$ the set of time steps in episode $s$. For time steps in the first episode, the bids are chosen for a proper initialization. For any time step in episode $s \ge 2$, we use the credibility parameter and distribution estimated at the end of episode $s-1$ to set the bid, and we only update these estimators at the end of each episode, using the data observed within that episode.
The main difficulty is how to update the estimators of $\alpha$ and $G$ in each episode. Recall that when $G$ is known, the estimation step (Equation 3) in Algorithm 4.1 is essentially similar to using the maximum likelihood estimation (MLE) method to find the most probable $\alpha$. However, without knowledge of $G$, the bidder cannot directly estimate $\alpha$ by matching observed rewards to expected rewards. To handle this challenge, we combine a non-parametric log-concave density estimator with the MLE method to learn $\alpha$ and $G$ simultaneously.
We first introduce the non-parametric estimator of the log-concave density function $g$, which is adopted from Dümbgen and Rufibach (2009). In each episode $s$, given the observations realized in that episode, Algorithm 5.1 computes, for any candidate credibility parameter, a density estimator
(11)
In this work, we restrict the estimator to the class of log-concave densities. It is without loss of generality to re-parameterize the density as $e^{\phi}$, where $\phi$ is a concave function; it is then equivalent to compute an estimator of $\phi$,
(12)
Let $\hat{G}_s$ be the distribution function induced by the above density estimator, and let $\tilde{G}_s$ be the empirical distribution of the samples in episode $s$. Dümbgen and Rufibach (2009) proved the following result.
Lemma 5.4 (Dümbgen and Rufibach (2009)).
The optimizer exists and is unique. For any , .
Given the above characterization, we provide a uniform convergence bound for the estimated distribution in the following lemma.
Lemma 5.5.
Suppose that for some , then with probability at least , holds .
Given the non-parametric estimator introduced above, we minimize the following MLE loss function to compute the estimate of $\alpha$,
(13)
where the indicator in the loss function is carefully chosen.
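Since the details of the loss (13) and its indicator are beyond the scope of a short illustration, the following heavily simplified sketch shows the joint estimation idea under full feedback: for each candidate credibility $a$ on a grid, invert the observed prices to implied competing bids, fit a density to half of them (a plain Gaussian kernel estimate stands in for the log-concave MLE of (11)–(12)), and score $a$ by the held-out log-likelihood of the other half, including the change-of-variables term $-\log a$. All names and constants here are illustrative.

```python
import numpy as np

def fit_kde(train, h=0.05):
    """Return a log-density function: a Gaussian KDE stand-in for the log-concave MLE."""
    def logpdf(z):
        z = np.atleast_1d(z)[:, None]
        logs = -0.5 * ((z - train[None, :]) / h) ** 2 - np.log(h * np.sqrt(2 * np.pi))
        return np.logaddexp.reduce(logs, axis=1) - np.log(len(train))
    return logpdf

def estimate_alpha_and_G(bids, prices, grid=np.linspace(0.05, 1.0, 96)):
    """Profile pseudo-likelihood over candidate credibility values."""
    n = len(bids) // 2
    best_a, best_score, best_logpdf = None, -np.inf, None
    for a in grid:
        z = (prices - (1.0 - a) * bids) / a          # implied competing bids under a
        logpdf = fit_kde(z[:n])
        # held-out log-likelihood of the observed prices: log g(z) - log a
        score = np.mean(logpdf(z[n:])) - np.log(a)
        if score > best_score:
            best_a, best_score, best_logpdf = a, score, logpdf
    return best_a, best_logpdf

# Synthetic check: true alpha = 0.6, competing bids drawn from Beta(2, 2).
rng = np.random.default_rng(1)
alpha, b = 0.6, rng.random(2000)
d = rng.beta(2.0, 2.0, size=2000)
p = alpha * d + (1.0 - alpha) * b
a_hat, _ = estimate_alpha_and_G(b, p)
print(a_hat)   # should be close to 0.6
```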
Theorem 5.6.
Suppose that Assumption 5.2 holds. For repeated non-credible second-price auctions with unknown credibility $\alpha$, unknown distribution $G$ and full feedback, there exists a bidding algorithm (Algorithm 5.1) that achieves an $\tilde{O}(\sqrt{T})$ regret if Assumption 5.3 also holds. The lower bound on regret for this problem is $\Omega(\sqrt{T})$.
We first bound the distance between the estimated and true credibility in Lemma C.3. Next, we show that the distribution estimator converges uniformly to the true distribution in Lemma C.4. Our algorithm and regret analysis follow the same spirit as Theorem 4.3 in Badanidiyuru, Feng, and Guruganesh (2023); the main difference lies in the choice of the indicator function in the MLE loss. We defer the proofs of this section to Appendix C.
6 Conclusion
In this paper, we formulate and study the problem of learning against non-credible auctions and evaluate several algorithms under various information structures.
It remains an open problem whether we can further relax the assumptions on the distribution $G$, especially for Theorem 4.3 and Theorem 5.6. Can we design a near-optimal algorithm for general distributions with unknown $\alpha$ and known $G$? Can we extend the upper bound of Theorem 5.6 to the bandit feedback model? In this work, we assume the seller's credibility is fixed over time. Another research direction is to consider a seller who may use a time-dependent charging rule. Will learning become impossible in this more complicated setting?
References
- Akbarpour and Li (2020) Akbarpour, M.; and Li, S. 2020. Credible auctions: A trilemma. Econometrica, 88(2): 425–467.
- An (1996) An, M. 1996. Log-concave Probability Distributions: Theory and Statistical Testing. Game theory and information, University Library of Munich, Germany.
- Badanidiyuru, Feng, and Guruganesh (2023) Badanidiyuru, A.; Feng, Z.; and Guruganesh, G. 2023. Learning to Bid in Contextual First Price Auctions. In Proceedings of the ACM Web Conference 2023, 3489–3497.
- Balseiro et al. (2019) Balseiro, S.; Golrezaei, N.; Mahdian, M.; Mirrokni, V.; and Schneider, J. 2019. Contextual bandits with cross-learning. Advances in Neural Information Processing Systems, 32.
- Balseiro, Besbes, and Weintraub (2015) Balseiro, S. R.; Besbes, O.; and Weintraub, G. Y. 2015. Repeated auctions with budgets in ad exchanges: Approximations and design. Management Science, 61(4): 864–884.
- Balseiro and Gur (2019) Balseiro, S. R.; and Gur, Y. 2019. Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Science, 65(9): 3952–3968.
- Balseiro, Lu, and Mirrokni (2022) Balseiro, S. R.; Lu, H.; and Mirrokni, V. 2022. The best of many worlds: Dual mirror descent for online allocation problems. Operations Research.
- Castiglioni, Celli, and Kroer (2022) Castiglioni, M.; Celli, A.; and Kroer, C. 2022. Online learning with knapsacks: the best of both worlds. In International Conference on Machine Learning, 2767–2783. PMLR.
- Cesa-Bianchi, Gentile, and Mansour (2014) Cesa-Bianchi, N.; Gentile, C.; and Mansour, Y. 2014. Regret minimization for reserve prices in second-price auctions. IEEE Transactions on Information Theory, 61(1): 549–564.
- Chen et al. (2023) Chen, Y.; Wang, Q.; Duan, Z.; Sun, H.; Chen, Z.; Yan, X.; and Deng, X. 2023. Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets. arXiv preprint arXiv:2306.07709.
- Chen et al. (2022) Chen, Z.; Wang, C.; Wang, Q.; Pan, Y.; Shi, Z.; Tang, C.; Cai, Z.; Ren, Y.; Zhu, Z.; and Deng, X. 2022. Dynamic Budget Throttling in Repeated Second-Price Auctions. arXiv preprint arXiv:2207.04690.
- Dianat and Freer (2021) Dianat, A.; and Freer, M. 2021. Credibility in Second-Price Auctions: An Experimental Test. arXiv preprint arXiv:2105.00204.
- Dümbgen and Rufibach (2009) Dümbgen, L.; and Rufibach, K. 2009. Maximum likelihood estimation of a log-concave density and its distribution function: Basic properties and uniform consistency. Bernoulli, 15(1): 40–68.
- Feng, Padmanabhan, and Wang (2022) Feng, Z.; Padmanabhan, S.; and Wang, D. 2022. Online Bidding Algorithms for Return-on-Spend Constrained Advertisers. arXiv preprint arXiv:2208.13713.
- Golrezaei et al. (2021) Golrezaei, N.; Jaillet, P.; Liang, J. C. N.; and Mirrokni, V. 2021. Bidding and pricing in budget and ROI constrained markets. arXiv preprint arXiv:2107.07725.
- Hakimov and Raghavan (2023) Hakimov, R.; and Raghavan, M. 2023. Improving Transparency and Verifiability in School Admissions: Theory and Experiment.
- Han et al. (2020) Han, Y.; Zhou, Z.; Flores, A.; Ordentlich, E.; and Weissman, T. 2020. Learning to bid optimally and efficiently in adversarial first-price auctions. arXiv preprint arXiv:2007.04568.
- Han, Zhou, and Weissman (2020) Han, Y.; Zhou, Z.; and Weissman, T. 2020. Optimal no-regret learning in repeated first-price auctions. arXiv preprint arXiv:2003.09795.
- IAB (2023) Interactive Advertising Bureau (IAB). 2023. Internet Advertising Revenue Report, Full Year 2022 Results.
- Iyer, Johari, and Sundararajan (2014) Iyer, K.; Johari, R.; and Sundararajan, M. 2014. Mean field equilibria of dynamic auctions with learning. Management Science, 60(12): 2949–2970.
- Javanmard and Nazerzadeh (2019) Javanmard, A.; and Nazerzadeh, H. 2019. Dynamic pricing in high-dimensions. The Journal of Machine Learning Research, 20(1): 315–363.
- McAdams and Schwarz (2007) McAdams, D.; and Schwarz, M. 2007. Who pays when auction rules are bent? International Journal of Industrial Organization, 25(5): 1144–1157.
- Porter and Shoham (2005) Porter, R.; and Shoham, Y. 2005. On cheating in sealed-bid auctions. Decision Support Systems, 39(1): 41–54.
- Rothkopf and Harstad (1995) Rothkopf, M. H.; and Harstad, R. M. 1995. Two models of bid-taker cheating in Vickrey auctions. Journal of Business, 257–267.
- Tsybakov (2009) Tsybakov, A. B. 2009. Introduction to Nonparametric Estimation. New York: Springer-Verlag.
- Vickrey (1961) Vickrey, W. 1961. Counterspeculation, auctions, and competitive sealed tenders. The Journal of finance, 16(1): 8–37.
- Wang et al. (2023) Wang, Q.; Yang, Z.; Deng, X.; and Kong, Y. 2023. Learning to bid in repeated first-price auctions with budgets. arXiv preprint arXiv:2304.13477.
- Weed, Perchet, and Rigollet (2016) Weed, J.; Perchet, V.; and Rigollet, P. 2016. Online learning in repeated auctions. Journal of Machine Learning Research, 49(June): 1562–1583.
- Zhang et al. (2022) Zhang, W.; Han, Y.; Zhou, Z.; Flores, A.; and Weissman, T. 2022. Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions. arXiv preprint arXiv:2211.06358.
Supplementary Material of “Learning against Non-credible Auctions”
Appendix A Missing Algorithms and Proofs of Section 3
Proof of Lemma 3.4.
Fix any with . For any , it holds that
where the inequality follows from the definition of and the conditions . Thus, all bids no larger than cannot be the largest maximizer for .
Fix any with . For any , it holds that
where the inequality follows from the definition of and the conditions . Thus, all bids no larger than cannot be the largest maximizer for . ∎
Proof of Lemma 3.5.
The proof follows the Le Cam’s two-point method (Tsybakov 2009). For any , consider the following two candidate distributions supported on :
where is some parameter to be chosen later. In other words, corresponds to a discrete random variable taking value in with probability , and corresponds to the probability . Fixing , let and be the expected per-round reward when bidding under and , respectively. After some algebra, it is straightforward to check that
Hence, for any , we have
(23)
The inequality (23) is the separation condition required in the two-point method: there is no single bid that can obtain a uniformly small instantaneous regret under both and .
In the full information model, when , the bidder can always infer by
Let be the distribution of all historical samples at the beginning of time . Then for any policy ,
(24)
where (a) is due to the fact that the maximum is no smaller than the average, (b) follows from (23), and (c) is due to the identity . Invoking Lemma A.1 and using the fact that for ,
(25)
we have the following inequality on total variation distance:
(26)
Finally, choosing , combining inequalities (24) and (26), we conclude that Lemma 3.5 holds with the constant . ∎
Lemma A.1 (Tsybakov (2009)).
Let be two probability measures on the same space. It holds that
Appendix B Missing Proofs of Section 4
Proof of Lemma 4.1.
Taking expectation on w.r.t. , one has
Then by applying Azuma–Hoeffding inequality, with probability at least ,
Then by (3) in Algorithm 4.1,
Then the proof can be concluded by applying a union bound. ∎
Proof of Theorem 4.3.
First, we show that the loss incurred by choosing such a bid is small with probability at least . For all ,
(27)
where (a) holds by the choice of and (b) holds since , together with Lemma B.1; (c) follows from Lemma 4.1.
Next, we have
where (a) holds by Lemma 3.4 or Lemma B.1 and (b) holds with probability at least by applying Azuma–Hoeffding inequality.
By Lemma B.2, if , no algorithm can obtain positive expected reward under regardless of the seller's credibility, which also implies that any algorithm that guarantees can get zero regret. Therefore, we can presume is a positive constant depending on the known distribution . Then for some constant , when , we have
(28)
Combining (27) and (28), we have with probability at least ,
which is by taking . The proof of the regret upper bound can be finished by taking expectation.
The lower bound part trivially holds. We have for some constant ,
since, at least in the first round, no single bid can obtain a uniformly small instantaneous regret for all . ∎
Proof of Corollary 4.4.
The proof is analogous to that of Theorem 4.3. We first have
(29)
where the second inequality uses the preceding fact rather than Lemma B.1. Together with inequality (28), which still holds given that is continuous, the proof can be finished by bounding . ∎
Lemma B.1.
Suppose that Assumption 4.2 holds. Then is strictly increasing in with derivative bounded by .
Proof of Lemma B.1.
First, we have
Denote . Given that Assumption 4.2 holds, we have . Thus, is a strictly increasing function and we have
∎
Lemma B.2.
Suppose that is continuous. If , then for any , no algorithm can obtain positive expected reward.
Proof of Lemma B.2.
First, as is increasing in by Lemma 3.4, is also increasing in . Thus, if , it holds that for nearly all ,
except a set contained in on which the probability distribution has a zero measure.
Next, by the continuity and monotonicity of , for all , we have on . As
we have by the tie-breaking rule in the definition of . Thus, remains zero on for all . Again by the continuity of , we have on where .
For nearly all except a zero-measure set,
where the inequality holds (1) by when and (2) by for when . Therefore, no algorithm can obtain positive expected reward under such a distribution regardless of the seller’s credibility . ∎
Example B.3.
Let . Consider the following distribution:
Then for ,
The optimal bid is calculated by
Taking , we have
Thus, in the proof of Theorem 4.3 is neither a positive constant nor , but an infinitesimal o(1). As a result, we are not able to give an upper bound for .
Intuitively, it requires for an algorithm to achieve low regret. However, when is approaching , although the bidder can still win with positive probability, it becomes harder and harder for the bidder to distinguish different under bandit feedback since in the winning rounds .
Appendix C Missing Proofs of Section 5
Proposition C.1.
Suppose that Assumption 5.3 holds. There exist positive constants and , such that ,
(30)
(31)
Note that if the density function is log-concave, then the cumulative distribution function and the survival function are both log-concave (An 1996). Thus, Equations 30 and 31 trivially hold on the bounded interval .
Proof of Lemma 5.5.
Denote . For any , the probability that there exists a point such that is at least
Then for any and such that , we can decompose in the following,
The second inequality follows from is -Lipschitz, Lemma 5.4, Dvoretzky-Kiefer-Wolfowitz (DKW) inequality and is -Lipschitz. It also holds with probability at least since . ∎
Given Lemma 5.5, is arbitrarily close to when is sufficiently large. In addition, Dümbgen and Rufibach (2009) also show is arbitrarily close to when is sufficiently large. Therefore, we can show,
Proposition C.2.
Suppose that Assumption 5.3 holds. Under Algorithm 5.1, there exist positive constants and , such that both
and
hold almost surely.
Proof of Theorem 5.6.
The lower bound directly follows from Lemma 3.5 since no algorithm can avoid regret even with known . Then we mainly focus on the upper bound. We first rewrite the regret per round in the following way,
where the first inequality holds by the choice of , i.e.,
Lemma C.3.
For each episode , we have with probability at least ,
where
Proof.
For notation simplicity, we re-parameterize by denoting . By Assumption 5.2, has an upper bound . To compute , we minimize the following MLE loss function,
We abuse notation and denote . Observe that
When equals the real , should minimize the MLE loss function. By the second-order Taylor theorem, we have
for some on the line segment between and . Given the definition of , we have
where and are defined as follows,
Based on the construction of our algorithm, is independent of . Therefore, are independent of for any , and we have
Thus, by Lemma 5.5 and Proposition C.2, we have
holds with probability at least . Then by Hoeffding’s inequality and union bound
holds with probability at least . By the optimality of ,
Invoking into , we have
holds with probability at least . So we have
holds with probability at least . ∎
Lemma C.4.
For any fixed , suppose and conditioned on , we have for all ,
holds with probability at least .
Proof.
Let be the empirical distribution of samples , i.e.
First, we give a uniform convergence bound for . The main challenge is that we cannot directly apply the DKW inequality, since depends on , . To handle this challenge, we bound the quantity from below and from above separately. Since , and , , we have
Thus, conditioned on , for any , we have
Similarly, we have for any , conditioned on .
Therefore, applying a union bound and Lipschitzness of we have,
(32)
holds with probability at least .
Second, we apply a technique similar to the one used in Lemma 5.5 to bound . Denote , . Thus, for any , there must exist at least one s.t. . Let , then for any , the probability that there exists a point s.t. , is at least,
Therefore, for any , we can decompose in the following,
Indeed, the characterization result of Lemma 5.4 applies to the samples . Then we have . By the Lipschitz continuity of and , Equation 32, and a union bound, we have
holds with probability at least when . ∎