This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

A non-parametric approach for estimating consumer valuation distributions using second price auctions

Sourav Mukherjeelabel=e2][email protected] [    Rohit K Patralabel=e1][email protected] [    Kshitij Kharelabel=e3][email protected] [ Department of Statistics, University of Florida,
Abstract

We focus on online second price auctions, where bids are made sequentially, and the winning bidder pays the maximum of the second-highest bid and a seller specified reserve price. For many such auctions, the seller does not see all the bids or the total number of bidders accessing the auction, and only observes the current selling prices throughout the course of the auction. We develop a novel non-parametric approach to estimate the underlying consumer valuation distribution based on this data. Previous non-parametric approaches in the literature only use the final selling price and assume knowledge of the total number of bidders. The resulting estimate, in particular, can be used by the seller to compute the optimal profit-maximizing price for the product. Our approach is free of tuning parameters, and we demonstrate its computational and statistical efficiency in a variety of simulation settings, and also on an Xbox 7-day auction dataset on eBay.

Second price auction, Semi-parametric maximum likelihood estimation, consumer valuation distribution, standing price sequence,
keywords:
\startlocaldefs\endlocaldefs

, , and

1 Introduction

In a second price auction with reserve price, the product on sale is awarded to the highest bidder if the corresponding bid is higher than a seller-specified reserve price rr. The price paid by the winner is however, the maximum of the reserve price and the second highest bid. These auctions have been the industry standard for a long time, and are attractive to sellers as they induce the bidders to bid their true “private value” for the product, i.e., the maximum price they wish to pay for it. While some platforms have recently moved to first-price auctions, the second-price auction is still widely used on E-commerce platforms such as eBay, Rokt and online ad exchanges such as Xandr. The analysis of data obtained from these auctions presents unique challenges. For a clear understanding of these challenges, we first discuss in detail the auction framework, the observed data and the quantity of interest that we want to estimate/extract.

Auction framework: We consider an auction setting where a single product is on sale for a fixed time window [0,τ][0,\tau]. The seller sets the reserve price rr, which is used as the current selling price at time 0. Any bidder who arrives subsequently is allowed to place a bid only if his/her bid value is higher than current selling price at that time. If the bid is placed, the current selling price is updated to the second-highest bid value among the set of all placed bids up to that time, including this latest bid (the reserve price is also treated as a placed bid). 111Typically, a small increment (e.g. $0.01\$0.01) is also added to the second highest bid, but this insignificant increment is unlikely to influence bidder’s behaviour, and we ignore it in our analysis for ease of exposition. For example, suppose the current selling price at a given time is $4\$4 and the highest placed bid value up to that time is $5\$5. If a bidder comes and bids $3\$3, the bid will not be placed. If the bidder were to bid $4.5\$4.5, the bid would be placed and the current selling price would be updated to $4.5\$4.5 (since now the second largest placed bid is $4.5\$4.5). If the bidder were to bid $6\$6, the bid would be placed and the current selling price would be updated to $5\$5. At the end of the auction period, if no bid above the reserve price is placed, the item goes unsold, otherwise it is sold to the highest bidder at the selling price at time τ\tau. This final selling price is the second highest placed bid (including the reserve price) throughout the course of the auction.

Observed data: The observed data is the sequence of current selling price values (also sometimes referred to as the standing price) throughout the course of the auction, and the times at which there is a change in the selling/standing price. Typically, such data is available for multiple auctions of the same product. For example, in Section 5, we analyze data with current selling prices for 9393 different eBay 77-day auctions for Xbox. A key observation to make here is that consumers who access the auction but have bids which are less than the current selling price (standing price) are not allowed to place their bids. In other words, instead of observing the bids of all the customers who access the auction, we only observe the running second maximum of such bids.

Quantity of interest: Each bidder in the consumer population is assumed to have an independent private valuation (IPV) of the product. The IPV assumption in particular makes sense for products that are used for personal use/consumption (such as watches, jewelry, gaming equipment etc.) and is commonly used in the modeling of internet auctions (see Song (2004); Hou and Rego (2007); George and Hui (2012) and the references therein). Economic theory suggests that the dominant strategy for a bidder in a second-price auction is to bid one’s true valuation (Vickrey (1961)). The quantity that we want to estimate from the above data is the distribution of the valuations of the product under consideration for the consumer population. We refer to this as the consumer valuation distribution, and denote it by FF. As noted in George and Hui (2012), knowledge of FF provides the consumer demand curve for the product, and hence can be used by the seller to identify the profit-maximizing price (see the discussion in Section 4.1 from George and Hui (2012)).

Refer to caption
Figure 1: An illustration of a second price auction. True bid values are generated from a Pareto distribution and reserve price is at $2\$2. Blue vertical lines are the bid values, black horizontal lines are the current selling prices, and black dots are the time points when the bids are made.

An illustration of a second price auction: Figure 1 provides a concrete illustration of how a second price auction works. The data for this auction has been simulated from a setting where the bids follow a Pareto distribution with location parameter 33 and dispersion parameter 100100. The waiting times between bids are generated from an exponential distribution with rate parameter 11. In total, 100100 bids are generated in a time period τ\tau of around 115115 minutes. The reserve price for the auction is $2\$2. In Figure 1, the bid values and the current selling prices during the course of the auction are represented by the blue vertical lines and the black horizontal lines, respectively. The black dots on the black horizontal lines represents the time points (in minutes) of the 100100 bids. As can be seen from Figure 1, the initial selling price is equal to the reserve price ($2\$2). We have the first bid of around $8.05\$8.05 at 0.550.55 minutes. Since it’s higher than the reserve price $2\$2, the reserve price still remains the current selling price at 0.550.55 minutes. However, when the second bid of $5.09\$5.09 occurs at 5.965.96 minutes (5.415.41 minutes after the first bid’s occurrence), the current selling price jumps to $5.09\$5.09 as it’s the second highest value among the reserve price ($2\$2) and the two existing placed bids ($8.05\$8.05 and $5.09\$5.09). We don’t observe any jumps in the current selling price for the next few bids as they all are less than the current selling price of $5.09\$5.09 (and hence are not placed). Then we see another jump at 9.659.65 minutes, where a bid of $12.82\$12.82 which makes the current selling price jump to $8.05\$8.05. The next few bids again happen to be less than the current selling price ($8.05\$8.05). At around 2424 minutes, we see a last jump in the current selling price to $10.14\$10.14 (based on a new bid of $10.14\$10.14 which exceeds $8.05\$8.05). The subsequent bids are all less than $10.14\$10.14, it remains the current selling price throughout the rest of the auction period. This can be observed through the flat horizontal black line at $10.14\$10.14 in the time period of (24.7,114.43)(24.7,114.43) minutes. The final selling price for the auction is therefore $10.14\$10.14. The observed data for the above auction is the sequence of current selling prices given by ($2,$5.09,$8.05,$10.14)(\$2,\$5.09,\$8.05,\$10.14) and the sequence of times at which there was a change in the current selling price, given by (5.96,9.65,24)(5.96,9.65,24).

The problem of demand-curve/valuation distribution estimation using auction data has been tackled in the last two decades for a variety of auction frameworks, see Song (2004); Park and Bradlow (2005); Chan, Kadiyali and Park (2007); Yao and Mela (2008); George and Hui (2012); Backus and Lewis (2020) and the references therein. Some of these papers assume a parametric form for FF, but as George and Hui (2012) argue, available auction data may often not be rich enough to verify the validity of the underlying parametric forms. George and Hui (2012) consider the second-price ascending bid auction framework for a single (homogeneous) product described above, and develop a Bayesian non-parametric approach to estimate FF using only the final selling prices in multiple auctions for a jewelry item. Note that the final selling price is not only the second highest placed bid in the auction period, it is also the second maximum order statistic of the (potential or placed) bids of all consumers who access the auction. Using only the final selling prices leads to identifiability problems, and George and Hui (2012) address this by assuming that the total number of consumers accessing the auction is also known. This information is available for the particular jewelry dataset from a third-party vendor, but is generally not available for most such datasets. When the total number of consumers accessing the auction is not known, the identifiability issue can be solved by using another order statistic (Song (2004)) such as the largest placed bid value (if available) along with the final selling price. (Backus and Lewis, 2020, Section 5) use such an approach for analyzing a dataset containing compact camera auctions on eBay.

To the best of our knowledge, none of the existing methods use the entire sequence of current selling price values to estimate the consumer valuation distribution FF. This was the primary motivation for our work, as only using the final selling price (and possibly the maximum placed bid, if available) leaves out a lot of available information. Including the current selling price information throughout the course of the auction however, involves significant conceptual and methodological challenges. As observed previously, the final selling price (second largest placed bid) is also the second largest (potential or placed) bid of all the consumers who accessed the auction. Hence, under relevant assumptions (see beginning of Section 2), the final selling price can be interpreted as the second largest order statistic of i.i.d. samples from FF. This interpretation is central to the methodology developed in George and Hui (2012). However, as the authors in George and Hui (2012) point out, the second largest current selling price throughout the course of the auction is not necessarily the third largest order statistic among all (placed or potential) bids, unless some severely restrictive and unrealisitc assumptions are imposed on the order in which bidders arrive in the auction. Hence, extending the methodology in George and Hui (2012) to include the entire sequence of current selling prices is not feasible; a completely new method of attack is needed.

In this paper, we fill this gap in the literature and develop novel methodology for non-parametric estimation of the consumer valuation distribution in second-price ascending bid auctions which uses the entire sequence of current selling prices. Additionally, the total number of consumers accessing the auction is not assumed to be known (unlike George and Hui (2012)), and the highest placed bid in the auction is also not assumed to be known (unlike Backus and Lewis (2020)). Incorporating the above novel features is very challenging, and the methodological development (provided in Section 2) is quite involved. The extensive simulation results in Section 4 demonstrate the significant accuracy gains in the estimation of FF that can be obtained by using the entire sequence of current selling prices as opposed to just the final selling price. We note that incorporating the current selling prices during the entire auction does come with a cost. In particular, two additional assumptions (compared to those in George and Hui (2012)) that need to be made about the rate of bidder arrival and number of bids made by a single consumer. We find some deviations from these two assumptions in the eBay 77-day Xbox data that we analyze in Section 5, but these deviations are minor (see discussion at beginning of Section 2). In such settings, it is reasonable to expect that the advantage of incorporating substantial additional data/information outweighs the cost of these approximations/deviations. We conclude the paper with a discussion of future directions in Section 6.

2 Methodology for learning the consumer valuation distribution FF

We start by discussing the main assumptions needed for the subsequent methodological development. We consider a setting where we have data from several independent, non-overlapping auctions for a single (homogeneous) product. As mentioned previously, we work within the IPV framework which is quite reasonable for items/products that are used for personal consumption. Similar to George and Hui (2012), we will assume that the collection of bidders who access the auction is an i.i.d. sample from the consumer population for the corresponding product, and that the collection of private product valuations of these bidders is an i.i.d. sample from the consumer valuation distribution FF. As stated in the introduction, it can be shown that the dominant strategy for a bidder in a second-price auction is to bid his/her true valuation. Under this strategy, any consumer who accesses the auction would bid his/her valuation with no need for multiple bidding, and we assume this behavior. We do note that in practice, some consumers do not follow this strategy. For example, eBay provides an option called proxy bidding or automatic bidding which allows the computer to automatically place multiple incremental bids below a cutoff price on behalf of the consumer (see also Ockenfels and Roth (2006); P. Bajari and A. Hortacsu (2003)). Since George and Hui (2012) only use the final selling price, they use a weaker assumption which allows for multiple bidding and stipulates that a consumer bids his/her true valuation sometime before the end of the auction.

Our final assumption is regarding the arrival mechanism of bidders in the auction. We assume that consumers arrive at/access the auction according to a Poisson process with constant rate λ\lambda. Again, a rational bidder (based on economic theory) in a second-price auction should be indifferent to the timing of his/her bid (see Milgrom (2004); Barbaro and Bracht (2021)), and this assumption makes sense in such settings. Again, we note that late-bidding (sniping) has been observed in some eBay auctions (see Bose and Daripa (2017); Barbaro and Bracht (2021)).

To summarize, our assumptions are identical to those in George and Hui (2012) with the exception of the single bidding-assumption and the constant rate of arrival assumption. While these assumptions are supported in general by economic theory, deviations from these two assumptions have been observed in some online auctions. However, for datasets such as the Xbox dataset analyzed in Section 5, these deviations are minor/anomalies. For example, in the Xbox dataset, only around 10%10\% of the bidders with placed bids show a multiple bidding behavior. In such settings, there is certainly value in using the subsequent methodology which uses the entire current selling price/standing price profile and does not assume knowledge of the total number of consumers accessing the auctions. If there is strong evidence/suspicion that the assumptions are being extensively violated, of course the results from this methodology should be treated with due skepticism and caution.

The subsequent methodological development in this section is quite involved, and we have tried to make it accessible to the reader by dividing it into subsections based on the major steps, and then highlighting the key milestones within each subsection, wherever necessary. We start by finding the joint density of the observed data obtained from a single second price auction.

2.1 Joint density of the observed data in a single second price auction

Consider a given (single) second price auction with reserve price rr. Hence, the initial selling price, denoted by X0X_{0}, is equal to rr. The first time a consumer with bid value greater than rr arrives at the auction, that bid is placed but the current selling price remains rr. Subsequently, the current selling price (standing price) changes whenever a bid greater than the existing selling price is placed. Let MM denote the number of times the selling price changes throughout the course of the auction. When M>0M>0, let {Xi}i=1M\{X_{i}\}_{i=1}^{M} denote the sequence of current selling prices observed throughout the course of the auction, with XiX_{i} denoting the new selling/standing price after the ithi^{th} change. When M>0M>0, let TiT_{i} denote the intermediate time between the ithi^{th} and (i+1)th(i+1)^{th} changes in the selling/standing price for 0iM10\leq i\leq M-1. In particular, it follows that when M>0M>0, T0T_{0} denotes the waiting time from the start of the auction until the moment when for the the first time, the selling price changes to a higher value than the reserve price rr. When M=0M=0, we define T0=τT_{0}=\tau. Finally, let OO be a binary random variable indicating whether the item is sold before the end of the auction, i.e., O=1O=1 indicates the item is sold and O=0O=0 indicates that the item is not sold. Our observed data comprises of MM, OO, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, and {Ti}i=0M1\{T_{i}\}_{i=0}^{M-1}.

We define TM=τi=0M1TiT_{M}=\tau-\sum_{i=0}^{M-1}T_{i} as the time after the last selling price change and until the auction closes. As discussed earlier, the number of consumers/bidders accessing a given (single) second price auction, denoted by NN, remains unobserved in our setup. Based on our assumption regarding the arrival mechanism (Poisson process with constant rate of arrival λ\lambda) of bidders in the auction, we note that NPoisson(λτ)N\sim\text{Poisson}(\lambda\tau).

Note that there are three scenarios at the end of the auction: (a) the item is sold above the reserve price (M>0,O=1M>0,O=1), (b) the item is sold at the reserve price (M=0,O=1M=0,O=1), and (c) the item is not sold (M=0,O=0M=0,O=0). The following lemma provides a unified formula for the joint density of the observed data encompassing all these three scenarios.

Lemma 2.1.

For the second price auction described above, the joint density of MM, OO, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Ti}i=0M1\{T_{i}\}_{i=0}^{M-1} at values mm, oo, {xi}i=1m\{x_{i}\}_{i=1}^{m}, {ti}i=0m1\{t_{i}\}_{i=0}^{m-1} is given by

g(m,o,{xi}i=1m,{ti}i=0m1)=exp(λτ) 2m(λm+1t0(1F(xm)))1{o=1}exp(λi=0mF(xi)ti)(C1i=1mf(xi))1{m>0},\displaystyle\begin{split}{}&g\Big{(}m,o,\{x_{i}\}_{i=1}^{m},\{t_{i}\}_{i=0}^{m-1}\Big{)}\\ ={}&\exp(-\lambda\tau)\ 2^{m}\ \Big{(}\lambda^{m+1}\ t_{0}\ \big{(}1-F(x_{m})\big{)}\Big{)}^{1_{\{o=1\}}}\exp\bigg{(}\lambda\sum_{i=0}^{m}F(x_{i})t_{i}\bigg{)}\bigg{(}C_{1}\prod_{i=1}^{m}f(x_{i})\bigg{)}^{1_{\{m>0\}}},\end{split}

where C1C_{1} does not depend on FF, x0x_{0} represents the reserve price (rr), λ\lambda denotes the constant rate of arrival of the bidders throughout the course of the auction, and ff represents the density function corresponding to FF.

The proof of Lemma 2.1 is quite involved and is provided in Appendix A. Next, we generalize our analysis to consider data from several independent, non-overlapping auctions of identical copies of a single (homogeneous) product.

2.2 Likelihood based on the observed data from multiple identical, non-overlapping second price auctions

Suppose we consider KK independent second price auctions of identical copies of an item (with possibly different reserve prices r1,r2,,rKr_{1},r_{2},\ldots,r_{K}). The observed data is {(Mk,Ok),{(Xi,k,Ti1,k)}i=1Mk}k=1K\{(M_{k},O_{k})\ ,\ \{(X_{i,k}\ ,\ T_{i-1,k})\}_{i=1}^{M_{k}}\}_{k=1}^{K}, where MkM_{k} denotes the number of selling price changes for the kthk^{th} auction, OkO_{k} denotes the indicator of the item being sold at the end of the kthk^{th} auction, {Xi,k}i=1Mk\{X_{i,k}\}_{i=1}^{M_{k}} denote the selling/standing price sequence for the kthk^{th} auction, and {Ti1,k}i=1Mk\{T_{i-1,k}\}_{i=1}^{M_{k}} denote the sequence of intermediate waiting times between successive changes in the standing prices. Finally, let TMk,k=τi=0Mk1Ti,kT_{M_{k},k}=\tau-\sum_{i=0}^{M_{k}-1}T_{i,k}, for all k=1,2,,Kk=1,2,\ldots,K. Since the auctions are independent, it follows by Lemma 2.1 that the likelihood function of the unknown parameters λ\lambda and FF for observed data values {(mk,ok),{(xi,k,ti1,k)}i=1mk}k=1K\{(m_{k},o_{k})\ ,\ \{(x_{i,k}\ ,\ t_{i-1,k})\}_{i=1}^{m_{k}}\}_{k=1}^{K} is given by Lik(λ,F)=k=1Kg(mk,ok,{xi,k}i=1mk,{ti,k}i=0mk1)Lik(\lambda,F)=\prod_{k=1}^{K}g(m_{k},o_{k},\{x_{i,k}\}_{i=1}^{m_{k}},\{t_{i,k}\}_{i=0}^{m_{k}-1}).

Ideally, one would like to obtain estimates of λ\lambda and FF by maximizing the function Lik(λ,F)Lik(\lambda,F). However, this likelihood function is intractable for direct maximization. A natural direction to proceed is to use the alternative maximization approach, which produces a sequence of iterates by alternatively maximizing LikLik with respect to FF given the current value of λ\lambda and then maximizing LikLik with respect to λ\lambda given the current value of FF. However, we found that such an approach can suffer from instability issues, which is not very surprising given the highly complicated and non-convex setting. Hence, we pursue and develop a slightly different approach which consists of two major steps:

  • Directly obtain an estimator λ^\widehat{\lambda} of λ\lambda using generalized method of moments.

  • Obtain an estimate of FF by maximizing the profile log-likelihood Lik(λ^,F)Lik(\widehat{\lambda},F).

Both of the steps above, especially the maximization with respect to FF, require intricate analysis, and we careful describe the details in Sections 2.3 and 2.4 below. This approach is computationally much less expensive than alternative maximization of λ\lambda and FF, and as our extensive simulations in Section 4 show, also provides stable and accurate estimates.

2.3 Estimation of λ\lambda: Generalized method of moments

Consider first a single second price auction with reserve price rr, and recall that MM denotes the number of times the selling price changes throughout the course of the auction. Our goal is to find a function hh such that E[h(M)]=λE[h(M)]=\lambda. To this end, we consider the process of consumers accessing the auction whose bid value is greater than or equal to rr. Since we are assuming that the consumers bid their true private value, it follows that the proportion of such consumers in the population of all customers is 1F(r)1-F(r), and this “thinned” process of arriving consumers with bid values greater than rr is a Poisson process with rate λ(1F(r))\lambda(1-F(r)). Let NrN_{r} represent the total number of consumers who access the auction in the period [0,τ][0,\tau] and have bid values greater than the reserve price rr. Then NrPoisson(λτ(1F(r)))N_{r}\sim\mbox{Poisson}(\lambda\tau(1-F(r))). Moreover, given Nr=nN_{r}=n, let AiA_{i} (i=1,2,,ni=1,2,\ldots,n) represent the event that the current selling price changes after the ithi^{\text{th}} consumer (with bid greater than rr) accesses the auction. Let, 1Ai\textbf{1}_{A_{i}} be the indicator function of the occurrence of the event AiA_{i}.

Note that E[MNr=0]=0=E[MNr=1]E[M\mid N_{r}=0]=0=E[M\mid N_{r}=1], and for n2n\geq 2, we have

E[MNr=n]=E[Number of selling price changesNr=n]=\displaystyle E\big{[}M\mid N_{r}=n\big{]}=E\big{[}\text{Number of selling price changes}\mid N_{r}=n\big{]}={} E[i=1n1Ai|Nr=n]\displaystyle E\bigg{[}\sum_{i=1}^{n}\textbf{1}_{A_{i}}\Big{|}N_{r}=n\bigg{]}
=(a)\displaystyle\stackrel{{\scriptstyle(a)}}{{=}} i=2nP(AiNr=n)\displaystyle\sum_{i=2}^{n}P\big{(}A_{i}\mid N_{r}=n\big{)}
=(b)\displaystyle\stackrel{{\scriptstyle(b)}}{{=}} i=2n2(i1)i(i1)\displaystyle\sum_{i=2}^{n}\frac{2(i-1)}{i(i-1)}
=\displaystyle={} 2i=2ni1.\displaystyle 2\sum_{i=2}^{n}i^{-1}. (2.1)

Here (a)(a) follows from the fact that two bids above rr are needed for the first change in the standing/selling price. For (b)(b), note that the arrival of ithi^{\text{th}} consumer with bid greater than rr changes the selling price if and only if the corresponding bid is the highest or second highest among the ii reserve price exceeding bids. Note that these bid values are i.i.d. with distribution FF truncated above rr. There are i(i1)i(i-1) possible choices for the joint positions of the highest and second highest bids. The ithi^{th} bid is the highest bid for (i1)(i-1) of these choices, and the second highest bid for another (i1)(i-1) choices, leading us to (b)(b). It follows from (2.1) that

E[M]=2E[1{Nr>1}i=2Nri1]=:g(λτ(1F(r))) (say). E[M]=2E\bigg{[}1_{\{N_{r}>1\}}\sum_{i=2}^{N_{r}}i^{-1}\bigg{]}=:g\big{(}\lambda\tau\big{(}1-F(r)\big{)}\big{)}\mbox{ (say). }

Note that 1{Nr>1}i=2Nri11_{\{N_{r}>1\}}\sum_{i=2}^{N_{r}}i^{-1} is increasing in NrN_{r}, and NrN_{r} is stochastically increasing in terms of its mean parameter λτ(1F(r))\lambda\tau(1-F(r)). Hence gg is a strictly increasing function, and

λ=g1(E[M])/τ(1F(r)).\lambda=g^{-1}(E[M])/\tau(1-F(r)). (2.2)

If the reserve price rr is negligible, for example compared to the smallest final selling price seen in the data set, then it is reasonable to assume that F(r)0F(r)\approx 0. Suppose now, we consider the data from KK independent second price auctions of identical copies of an item, with MkM_{k} denoting the number of standing/selling price changes throughout the course of the kthk^{th} auction, and with rkr_{k} the reserve price for the kthk^{th} auction for 1kK1\leq k\leq K. Let KrK_{r} be the collection of all auction indices with negligible reserve prices. Then, it follows that from (2.2) that

λ^:=τ1g1(|Kr|1kKrMk),\widehat{\lambda}:=\tau^{-1}\ g^{-1}\bigg{(}|K_{r}|^{-1}\sum_{k\in K_{r}}M_{k}\bigg{)},

should be a reasonable generalized method of moments estimator for λ\lambda.

Of course, the function gg is not available in closed form and needs to be computed using numerical methods. A natural approach, given the definition of gg as a Poisson expectation, is Monte Carlo. Indeed, we computed g(x)g(x) for every xx on a fine grid (with spacing 0.10.1) ranging from 0 to 55. This Monte Carlo computation of gg is a one-time process that required minimal computational effort. The resulting plot of gg is provided in Figure 2.

Refer to caption
Figure 2: Plot of gg over the interval [0,5]\left[0,5\right].

2.4 Estimation of FF: Some new notation based on pooled standing prices across all auctions

With the estimator λ^\widehat{\lambda} in our hand, we now obtain an estimate of the valuation distribution FF by maximizing the function LikP(F):=Lik(λ^,F)Lik_{P}(F):=Lik(\widehat{\lambda},F), which can be thought of as a version of profile likelihood for FF. Here the nuisance parameter λ\lambda has been profiled out not by conditional maximization, but by substituting the generalized method of moments estimator λ^\widehat{\lambda}.

Note that the function FF is constrained to be non-decreasing. A key transformation to a constraint-free parametrization (described in Section 2.5 below) is needed to facilitate the maximization of LikP(F)Lik_{P}(F). A crucial precursor to this re-parametrization is introduction of some new notation obtained by merging the standing prices from all the KK different auctions together. Let L=k=1KMkL=\sum_{k=1}^{K}M_{k} denote the total number of standing/selling price changes in all the KK auctions. Recall that {(mk,ok),{(xi,k,ti1,k)}i=1mk}k=1K\{(m_{k},o_{k})\ ,\ \{(x_{i,k}\ ,\ t_{i-1,k})\}_{i=1}^{m_{k}}\}_{k=1}^{K} denote the observed data values, and tmk,k=τi=0mk1ti,kt_{m_{k},k}=\tau-\sum_{i=0}^{m_{k}-1}t_{i,k} for 1kK1\leq k\leq K. Let \ell denote the observed value of LL. We will denote by 𝐱¯=(x¯1,x¯2,,x¯){\bf\bar{x}}=(\bar{x}_{1},\bar{x}_{2},\cdots,\bar{x}_{\ell}) the arrangement/ordering of the pooled collection  {{xi,k}i=1mk}k=1K\{\{x_{i,k}\}_{i=1}^{m_{k}}\}_{k=1}^{K}  such that x¯1<x¯2<<x¯\bar{x}_{1}<\bar{x}_{2}<\ldots<\bar{x}_{\ell}; under the assumption that FF is a continuous cdf, there should be no ties in the entries of 𝐱¯{\bf\bar{x}} with probability one. In other words, we pool the standing prices from all the auctions (excluding the reserve prices) and then arrange them in ascending order as (x¯1,x¯2,,x¯)(\bar{x}_{1},\bar{x}_{2},\cdots,\bar{x}_{\ell}). Also, for 1iL1\leq i\leq L, we define t¯i=ti¯,k¯\bar{t}_{i}=t_{\bar{i},\bar{k}} where i¯\bar{i} and k¯\bar{k} are such that x¯i=xi¯,k¯\bar{x}_{i}=x_{\bar{i},\bar{k}}. Further, let

S:=\displaystyle S:={} Set of ranks/positions ofxmk,k(k=1,2,,K)in𝐱¯for all the auctions where the\displaystyle\text{Set of ranks/positions of}\ x_{m_{k},k}\ (k=1,2,\ldots,K)\ \text{in}\ {\bf\bar{x}}\ \text{for all the auctions where the}
item is sold above the reserve price,
Ks:=\displaystyle K_{s}:={} Set of auction indices for which the item is sold.\displaystyle\text{Set of auction indices for which the item is sold}.

Consider, for example a setting with K=4K=4 independent second price auctions, with reserve prices (r1,r2,r3,r4)=(10,5,13,7)(r_{1},r_{2},r_{3},r_{4})=(10,5,13,7). Suppose that the first auction has m1=3m_{1}=3 standing price changes with (x1,1,x2,1,x3,1)=(12,15,19)(x_{1,1},\ x_{2,1},\ x_{3,1})=(12,15,19), the second auction has m2=4m_{2}=4 standing price changes with (x1,2,x2,2,x3,2,x4,2)=(16,18,20,25)(x_{1,2},\ x_{2,2},\ x_{3,2},\ x_{4,2})=(16,18,20,25), the item is sold at the reserve price r3=13r_{3}=13 in the third auction (m3=0,o3=1m_{3}=0,o_{3}=1), and the item is unsold in the fourth auction (m4=0,o4=1m_{4}=0,o_{4}=1). Pooling and rearranging the standing prices (excluding reserve prices) from all the auctions, we see that =3+4+0+0=7\ell=3+4+0+0=7, and

𝐱¯=(x¯1,x¯2,x¯3,x¯4,x¯5,x¯6,x¯7)=(12,15,16,18,19,20,25).{\bf\bar{x}}=(\bar{x}_{1},\bar{x}_{2},\bar{x}_{3},\bar{x}_{4},\bar{x}_{5},\bar{x}_{6},\bar{x}_{7})=(12,15,16,18,19,20,25).

Note that the auction item is sold above the reserve price in the first two auctions, and the final selling prices are xm1,1=19x_{m_{1},1}=19 and xm2,2=25x_{m_{2},2}=25 respectively. Examining the positions of these two prices in 𝐱¯{\bf\bar{x}} gives us S={5,7}S=\{5,7\}. Finally, Ks={1,2,3}K_{s}=\{1,2,3\} is the collection of auction indices where the item is sold.

Using the newly introduced notation above and Lemma 2.1, it follows that the function LikP(F):=Lik(λ^,F)Lik_{P}(F):=Lik(\widehat{\lambda},F) is

LikP(F)=\displaystyle Lik_{P}\left(F\right)={} Cexp(Kλ^τ)λ^+|Ks| 2kKst0,kiS(1F(x¯i))\displaystyle C^{\star}\exp\big{(}-K\widehat{\lambda}\tau\big{)}\ \widehat{\lambda}^{\ell+\left|K_{s}\right|}\ 2^{\ell}\prod_{k\in K_{s}}t_{0,k}\ \prod_{i\in S}\big{(}1-F(\bar{x}_{i})\big{)}
×exp(λ^(k=1KF(rk)t0,k+j=1F(x¯j)t¯j))[j=1f(x¯j)]1{>0},\displaystyle{\color[rgb]{0,0,0}\times\exp\bigg{(}\widehat{\lambda}\bigg{(}\sum_{k=1}^{K}F(r_{k})t_{0,k}+\sum_{j=1}^{\ell}F(\bar{x}_{j})\bar{t}_{j}\bigg{)}\bigg{)}\ \bigg{[}\prod_{j=1}^{\ell}f(\bar{x}_{j})\bigg{]}^{1_{\{\ell>0\}}},} (2.3)

where CC^{\star} doesn’t depend on FF. Maximization of the above likelihood over absolutely continuous CDFs leads to one of the standard difficulties in non-parametric estimation. As FF moves closer and closer to a CDF with a jump discontinuity at any x¯j\bar{x}_{j}, the function LikP(F)Lik_{P}(F) converges to infinity. Hence, any absolutely continuous CDF with a density function cannot be a maximizer of the above profile likelihood function. Following widely used convention in the literature (see Murphy (1994), Vardi (1982)), we will extend the parameter space to allow for the MLE of FF to be a discrete distribution function. To allow for discrete CDFs, we replace f(x¯j)f(\bar{x}_{j}) by F(x¯j)F(x¯j)F(\bar{x}_{j})-F(\bar{x}_{j}-). Thus the adapted profile likelihood can be written as

LikPA(F)=\displaystyle Lik_{PA}\left(F\right)={} Cexp(Kλ^τ)λ^+|Ks| 2kKst0,kiS(1F(x¯i))\displaystyle C^{\star}\exp\big{(}-K\widehat{\lambda}\tau\big{)}\ \widehat{\lambda}^{\ell+\left|K_{s}\right|}\ 2^{\ell}\prod_{k\in K_{s}}t_{0,k}\ \prod_{i\in S}\big{(}1-F(\bar{x}_{i})\big{)}
×exp(λ^(k=1KF(rk)t0,k+j=1F(x¯j)t¯j))[j=1(F(x¯j)F(x¯j))]1{>0},\displaystyle\times\exp\bigg{(}\widehat{\lambda}\bigg{(}\sum_{k=1}^{K}{\color[rgb]{0,0,0}F(r_{k})}t_{0,k}+\sum_{j=1}^{\ell}F(\bar{x}_{j})\bar{t}_{j}\bigg{)}\bigg{)}\ \bigg{[}\prod_{j=1}^{\ell}\Big{(}F(\bar{x}_{j})-F(\bar{x}_{j}-)\Big{)}\bigg{]}^{1_{\{\ell>0\}}}, (2.4)

where x¯0=0\bar{x}_{0}=0. We now establish a final bit of notation necessary to introduce the constraint-free reparametrization of FF. We now pool the +K\ell+K standing prices from all the auctions (including the reserve prices), i.e., {{xi,k}i=0mk}k=1K\{\{x_{i,k}\}_{i=0}^{m_{k}}\}_{k=1}^{K}, and arrange them in ascending order as z1<z2<<z+Kz_{1}<z_{2}<\cdots<z_{\ell+K}, and denote 𝐳:=(z1,z2,,z+K){\bf z}:=(z_{1},z_{2},\cdots,z_{\ell+K}). Under the assumption that FF is a continuous cdf, there should be no ties in the entries of 𝐱¯{\bf\bar{x}} with probability one. The only other entries in 𝐳{\bf z} are the KK reserve prices. In practice, it is possible that there are ties in the reserve prices, in which case we add a very small noise to the reserve prices to ensure that there are no ties in the entries of 𝐳{\bf z}. Similar to t¯i\bar{t}_{i}, we define t~i=ti~,k~\widetilde{t}_{i}=t_{\widetilde{i},\widetilde{k}} where i~\widetilde{i} and k~\widetilde{k} are such that zi=xi~,k~z_{i}=x_{\widetilde{i},\widetilde{k}}. Further, let

S¯:=\displaystyle\bar{S}:={} Set of ranks/positions ofxmk,k(k=1,2,,K)in𝐳for all the auctions where the\displaystyle\text{Set of ranks/positions of}\ x_{m_{k},k}\ (k=1,2,\ldots,K)\ \text{in}\ {\bf z}\ \text{for all the auctions where the}
item is sold above the reserve price,
𝐮:=\displaystyle\bf{u}:={} {u1,u2,,u},where ui=position ofx¯iin𝐳,i.e., x¯i=zui.\displaystyle\{u_{1},u_{2},\ldots,u_{\ell}\},\ \text{where }u_{i}=\ \text{position of}\ \bar{x}_{i}\ \text{in}\ {\bf z},\ \text{i.e., }\bar{x}_{i}=z_{u_{i}}.

Since the entries of 𝐱¯{\bf\bar{x}} and 𝐳{\bf z} both are arranged in ascending order, it follows that u1<u2<<uu_{1}<u_{2}<\ldots<u_{\ell}. In the example considered earlier in this subsection with K=4K=4 auctions, by pooling the 44 reserve price values with entries of 𝐱¯{\bf\bar{x}} and rearranging them in ascending order, we obtain

𝐳=(5,7,10,12,13,15,16,18,19,20,25).{\bf z}=(5,7,10,12,13,15,16,18,19,20,25).

Recall that the item is sold above the reserve price only in the first two auctions. By identifying the positions/ranks of xm1,1x_{m_{1},1} and xm2,2x_{m_{2},2} in 𝐳{\bf z}, we obtain S¯=(9,11)\bar{S}=(9,11). Similarly, by identifying the positions/ranks of the entries of 𝐱¯{\bf\bar{x}} in 𝐳{\bf z}, we obtain 𝐮=(4,6,7,8,9,10,11){\bf u}=(4,6,7,8,9,10,11).

It is clear from (2.4) that for maximizing LikPALik_{PA} it is enough to search over the class of CDFs with jump discontinuities at elements of 𝐱¯{\bf\bar{x}}. The next lemma (proved in Appendix B) shows that the search for a maximizer can be further restricted to a certain class of CDFs with possible jump discontinuities at elements of 𝐳{\bf z}.

Lemma 2.2.

Let 𝐳\mathcal{F}_{\bf z} denote the class of CDFs which are piece-wise constant in [0,z+K][0,z_{\ell+K}], such that the set of points of jump discontinuity (in [0,z+K][0,z_{\ell+K}]) is a superset of elements of 𝐱¯{\bf\bar{x}} and a subset of elements of 𝐳{\bf z}. Then, given any cdf FF with jump discontinuities at elements of 𝐱¯{\bf\bar{x}}, there exists F~𝐳\widetilde{F}\in\mathcal{F}_{\bf z} such that LikPA(F)LikPA(F~)Lik_{PA}(F)\leq Lik_{PA}(\widetilde{F}).

For any F𝐳F\in\mathcal{F}_{\bf z}, note that LikPA(F)Lik_{PA}(F) depends on FF only through {F(z1),F(z2)F(z2),,F(zL+K)F(z+K)}\{F(z_{1}),F(z_{2})-F(z_{2}-),\ldots,F(z_{L+K})-F(z_{\ell+K}-)\} or equivalently through

F(𝐳)=(F(z1),F(z2),,F(z+K))F({\bf z})=(F(z_{1}),F(z_{2}),\cdots,F(z_{\ell+K}))

(since FF only has jump discontinuities at elements of 𝐳{\bf z} and is otherwise piece-wise constant). This is typical in a non-parametric setting, and we can hope/expect to only obtain estimates of the valuation distribution FF at the observed standing prices (including the reserve prices).

2.5 Estimation of FF: A constraint-free reparametrization

Note that the entries of the vector F(𝐳)F({\bf z}) are non-decreasing, and this constraint complicates the maximization of FLikPA(F)F\mapsto Lik_{PA}(F). So, we transform F(𝐳)F\left(\bf{z}\right) to another +K\ell+K dimensional parameter vector 𝜽:=(θ1,θ2,,θ+K)T\boldsymbol{\theta}:=\left(\theta_{1},\theta_{2},\ldots,\theta_{\ell+K}\right)^{T} as follows:

θi=G(zi)G(zi1), 1i(+K),\theta_{i}=\dfrac{G(z_{i})}{G(z_{i-1})}\ ,\ \forall\ 1\leq i\leq(\ell+K), (2.5)

where

G(zi)=1F(zi), 1i(+K),andG(z0)=1withz0=0.G(z_{i})=1-F(z_{i})\ ,\ \forall\ 1\leq i\leq(\ell+K),\ \text{and}\ G(z_{0})=1\ \text{with}\ z_{0}=0. (2.6)

Since FF is non-decreasing, and takes values in [0,1][0,1], it follows that θi[0,1]\theta_{i}\in[0,1] (with the convention 0/0:=00/0:=0). Focusing our search on the class of CDFs in 𝐳\mathcal{F}_{\bf z} leads to additional constraints. Since any cdf FF in this class has a jump discontinuity at each x¯l=zul\bar{x}_{l}=z_{u_{l}}, it follows that G(zi)=1F(zi)>0G(z_{i})=1-F(z_{i})>0 for i<ui<u_{\ell}, and G(zi)<G(zi1)G(z_{i})<G(z_{i-1}) for every i𝐮i\in{\bf u}. In other words, we have θi<1\theta_{i}<1 for i𝐮i\in{\bf u}, and θi>0\theta_{i}>0 for i<ui<u_{\ell}. There are no other constraints on the elements of 𝜽\boldsymbol{\theta}. Also, we can retrieve F(𝐳)F\left({\bf z}\right) given 𝜽\boldsymbol{\theta} using the following equality.

F(zi)=1G(zi)=1j=1iθj, 1i(+K).F\left(z_{i}\right)=1-G(z_{i})=1-\prod_{j=1}^{i}\theta_{j}\ ,\ \forall\ 1\leq i\leq(\ell+K). (2.7)

Now, using (2.4), (2.6) and (2.5), we can rewrite the ‘adapted profile’ likelihood LikPALik_{PA} in terms of 𝜽\boldsymbol{\theta} as follows:

LikPA(𝜽)\displaystyle Lik_{PA}(\boldsymbol{\theta})
=\displaystyle={} Cexp(Kλ^τ)λ^+|Ks| 2kKst0,kiSG(x¯i)\displaystyle C^{\star}\exp\big{(}-K\widehat{\lambda}\tau\big{)}\ \widehat{\lambda}^{\ell+\left|K_{s}\right|}\ 2^{\ell}\prod_{k\in K_{s}}t_{0,k}\prod_{i\in S}G(\bar{x}_{i})
×exp(λ^(k=1K(1G(rk))t0,k+l=1(1G(x¯l))t¯l))[l=1(G(x¯l)G(x¯l))]1{>0}\displaystyle\times\exp\bigg{(}\widehat{\lambda}\bigg{(}\sum_{k=1}^{K}\big{(}1-G(r_{k})\big{)}t_{0,k}+\sum_{l=1}^{\ell}\big{(}1-G(\bar{x}_{l})\big{)}\bar{t}_{l}\bigg{)}\bigg{)}\ \bigg{[}\prod_{l=1}^{\ell}\Big{(}G(\bar{x}_{l}-)-G(\bar{x}_{l})\Big{)}\bigg{]}^{1_{\{\ell>0\}}}
=\displaystyle={} Cexp(Kλ^τ)λ^+|Ks| 2kKst0,kiSG(x¯i)\displaystyle C^{\star}\exp\big{(}-K\widehat{\lambda}\tau\big{)}\ \widehat{\lambda}^{\ell+\left|K_{s}\right|}\ 2^{\ell}\prod_{k\in K_{s}}t_{0,k}\prod_{i\in S}G(\bar{x}_{i})
×exp(λ^(Kτk=1KG(rk)t0,kl=1G(x¯l)t¯l))[l=1(G(x¯l)G(x¯l))]1{>0}\displaystyle\times\exp\bigg{(}\widehat{\lambda}\bigg{(}K\tau-\sum_{k=1}^{K}G(r_{k})t_{0,k}-\sum_{l=1}^{\ell}G(\bar{x}_{l})\bar{t}_{l}\bigg{)}\bigg{)}\ \bigg{[}\prod_{l=1}^{\ell}\Big{(}G(\bar{x}_{l}-)-G(\bar{x}_{l})\Big{)}\bigg{]}^{1_{\{\ell>0\}}}
=\displaystyle={} Cλ^+|Ks| 2kKst0,kiSG(x¯i)\displaystyle C^{\star}\ \widehat{\lambda}^{\ell+\left|K_{s}\right|}\ 2^{\ell}\prod_{k\in K_{s}}t_{0,k}\prod_{i\in S}G(\bar{x}_{i})
×exp(λ^(k=1KG(rk)t0,k+l=1G(x¯l)t¯l))[l=1(G(x¯l)G(x¯l))]1{>0}\displaystyle\times\exp\bigg{(}-\widehat{\lambda}\bigg{(}\sum_{k=1}^{K}G(r_{k})t_{0,k}+\sum_{l=1}^{\ell}G(\bar{x}_{l})\bar{t}_{l}\bigg{)}\bigg{)}\ \bigg{[}\prod_{l=1}^{\ell}\Big{(}G(\bar{x}_{l}-)-G(\bar{x}_{l})\Big{)}\bigg{]}^{1_{\{\ell>0\}}}
=\displaystyle={} Cλ^+|Ks| 2kKst0,kiS¯G(zi)exp(λ^i=1+KG(zi)t~i)[l=1(G(zul)G(zul))]1{>0},\displaystyle C^{\star}\ \widehat{\lambda}^{\ell+\left|K_{s}\right|}\ 2^{\ell}\prod_{k\in K_{s}}t_{0,k}\prod_{i\in\bar{S}}G(z_{i})\ \exp\bigg{(}-\widehat{\lambda}\sum_{i=1}^{\ell+K}G(z_{i})\widetilde{t}_{i}\bigg{)}\ \bigg{[}\prod_{l=1}^{\ell}\Big{(}G(z_{u_{l}}-)-G(z_{u_{l}})\Big{)}\bigg{]}^{1_{\{\ell>0\}}},

where u0=0u_{0}=0. Using (2.7), we get

LikPA(𝜽)=\displaystyle Lik_{PA}(\boldsymbol{\theta})={} Cλ^+|Ks| 2(kKst0,k)(iS¯j=1iθj)exp(λ^i=1+Kt~i(j=1iθj))\displaystyle C^{\star}\ \widehat{\lambda}^{\ell+\left|K_{s}\right|}\ 2^{\ell}\bigg{(}\prod_{k\in K_{s}}t_{0,k}\bigg{)}\bigg{(}\prod_{i\in\bar{S}}\prod_{j=1}^{i}\theta_{j}\bigg{)}\ \exp\bigg{(}-\widehat{\lambda}\sum_{i=1}^{\ell+K}\widetilde{t}_{i}\bigg{(}\prod_{j=1}^{i}\theta_{j}\bigg{)}\bigg{)}
×[l=1((1θul)j=1ul1θj)]1{>0}.\displaystyle\times\bigg{[}\prod_{l=1}^{\ell}\bigg{(}(1-\theta_{u_{l}})\prod_{j=1}^{u_{l}-1}\theta_{j}\bigg{)}\bigg{]}^{1_{\{\ell>0\}}}. (2.8)

The goal now is to maximize LikPALik_{PA} with respect to 𝜽\boldsymbol{\theta}, where each entry of 𝜽\boldsymbol{\theta} is in [0,1][0,1], θi<1\theta_{i}<1 for i𝐮i\in{\bf u}, and θi>0\theta_{i}>0 for i<ui<u_{\ell}. We achieve this using the coordinate-wise ascent algorithm. The details of this algorithm are derived in Section 3.

3 Maximizing LikPA(𝜽)Lik_{PA}(\boldsymbol{\theta}): Coordinate ascent algorithm

Applying natural logarithm on both sides of the equation in (2.8), we get

ln(LikPA(𝜽))=\displaystyle\ln(Lik_{PA}(\boldsymbol{\theta}))={} ln(C)+(+|Ks|)ln(λ^)+ln(2)+kKsln(t0,k)+iS¯j=1iln(θj)\displaystyle\ln(C^{\star})+\big{(}\ell+\left|K_{s}\right|\big{)}\ln(\widehat{\lambda})+\ell\ln(2)+\sum_{k\in K_{s}}\ln(t_{0,k})+\sum_{i\in\bar{S}}\sum_{j=1}^{i}\ln(\theta_{j})
λ^i=1+Kt~i(j=1iθj)+1{>0}[l=1ln(1θul)+l=1j=1ul1lnθj],\displaystyle-\widehat{\lambda}\sum_{i=1}^{\ell+K}\widetilde{t}_{i}\bigg{(}\prod_{j=1}^{i}\theta_{j}\bigg{)}+1_{\{\ell>0\}}\bigg{[}\sum_{l=1}^{\ell}\ln(1-\theta_{u_{l}})+\sum_{l=1}^{\ell}\sum_{j=1}^{u_{l}-1}\ln\theta_{j}\bigg{]}, (3.1)

where u0=0u_{0}=0. We now introduce notation which allows for a more compact and accessible representation of ln(LikPA)\ln(Lik_{PA}). Recall that 𝐳{\bf z} is obtained by pooling all the KK reserve prices and the =k=1Kmk\ell=\sum_{k=1}^{K}m_{k} ‘non-reserve’ standing prices (elements of 𝐱¯{\bf\bar{x}}), and uju_{j} represents the position of the x¯i\bar{x}_{i} in 𝐳{\bf z} for 1j+K1\leq j\leq\ell+K. In particular, uu_{\ell} is the position of x¯\bar{x}_{\ell}, the largest ‘non-reserve’ standing price across all the KK auctions in 𝐳{\bf z}. In other words, x¯=zu\bar{x}_{\ell}=z_{u_{\ell}}. It is possible that u<+Ku_{\ell}<\ell+K. For example, in settings where the reserve price in one of the auctions where the item is unsold is larger than x¯\bar{x}_{\ell}, it follows that x¯\bar{x}_{\ell} is not the largest entry in 𝐳{\bf z} and u<+Ku_{\ell}<\ell+K. With this background, we define

li\displaystyle l_{i} =0if1iu11,\displaystyle=0\quad\text{if}\quad 1\leq i\leq u_{1}-1,
li\displaystyle l_{i} =jifujiuj+11,fori=u1,u1+1,,u1,\displaystyle=j\quad\text{if}\quad u_{j}\leq i\leq u_{j+1}-1\ ,\ \text{for}\ i=u_{1},u_{1}+1,\ldots,u_{\ell}-1,
andli\displaystyle\text{and}\ l_{i} =ifui+K.\displaystyle=\ell\quad\text{if}\quad u_{\ell}\leq i\leq\ell+K. (3.2)

In other words, note that u1<u2<<uu_{1}<u_{2}<\cdots<u_{\ell} induce an ordered partition of the set {1,2,,u1}\{1,2,\cdots,u_{\ell}-1\} into \ell disjoint subsets

{1,,u11},{u1,,u21},,{u1,,u1}.\{1,\cdots,u_{1}-1\},\{u_{1},\cdots,u_{2}-1\},\cdots,\{u_{\ell-1},\cdots,u_{\ell}-1\}.

Hence, any 1iu1\leq i\leq u_{\ell} belongs to one of the subsets in the above partition, and lil_{i} is defined to be one less than the position of that subset in the partition. For ui+Ku_{\ell}\leq i\leq\ell+K we define li=l_{i}=\ell. In the example with K=4K=4 auctions from Section 2.4, u=+K=11u_{\ell}=\ell+K=11, and

(l1,l2,l3,l4,l5,l6,l7,l8,l9,l10,l11)=(0,0,0,1,1,2,3,4,5,6,7).(l_{1},l_{2},l_{3},l_{4},l_{5},l_{6},l_{7},l_{8},l_{9},l_{10},l_{11})=(0,0,0,1,1,2,3,4,5,6,7).

Using the above notation, it follows from (3.1) that

ln(LikPA(𝜽))=\displaystyle\ln(Lik_{PA}(\boldsymbol{\theta}))={} ln(C)+(+|Ks|)ln(λ^)+ln(2)+kKsln(t0,k)+i=1+K|Qi|ln(θi)\displaystyle\ln(C^{\star})+\big{(}\ell+\left|K_{s}\right|\big{)}\ln(\widehat{\lambda})+\ell\ln(2)+\sum_{k\in K_{s}}\ln(t_{0,k})+\sum_{i=1}^{\ell+K}\left|Q_{i}\right|\ln(\theta_{i})
λ^i=1+Kt~i(j=1iθj)+1{>0}[l=1ln(1θul)+i=1+K(li)lnθi],\displaystyle-\widehat{\lambda}\sum_{i=1}^{\ell+K}\widetilde{t}_{i}\bigg{(}\prod_{j=1}^{i}\theta_{j}\bigg{)}+1_{\{\ell>0\}}\bigg{[}\sum_{l=1}^{\ell}\ln(1-\theta_{u_{l}})+\sum_{i=1}^{\ell+K}(\ell-l_{i})\ln\theta_{i}\bigg{]}, (3.3)

where

Qi:={jS¯:ji}=Set of jS¯ which are greater than or equal to i.\displaystyle Q_{i}:=\left\{j\in\bar{S}:j\geq i\right\}=\text{Set of $j\in\bar{S}$ which are greater than or equal to $i$}.

Note that

|Q1|=|S¯|=Number of auctions where the item is sold above the reserve price.\left|Q_{1}\right|=\left|\bar{S}\right|=\text{Number of auctions where the item is sold above the reserve price.}

To maximize ln(LikPA(𝜽))\ln(Lik_{PA}(\boldsymbol{\theta})), we pursue the coordinate-wise ascent approach where each iteration of the algorithm cycles through maximization of ln(LikPA(𝜽))\ln(Lik_{PA}(\boldsymbol{\theta})) with respect to the co-ordinate θi\theta_{i} (with other entries of 𝜽\boldsymbol{\theta} fixed at their current values) for every 1i+K1\leq i\leq\ell+K. We now show that each of these +K\ell+K coordinate-wise maximizers are available in closed form.

3.1 Coordinate-wise maximizers for LikPA(𝜽)Lik_{PA}(\boldsymbol{\theta})

Based on the algebraic structure of LikPA(𝜽)Lik_{PA}(\boldsymbol{\theta}), we divide the coordinate-wise maximization steps into three groups: One with θk\theta_{k} when k𝐮k\in\bf{u}, where 𝐮\bf{u} is defined to be the set {u1,u2,,u}\{u_{1},u_{2},\ldots,u_{\ell}\}, the second with θk\theta_{k} when k𝐮k\notin\bf{u} and k<uk<u_{\ell}, and the third with θk\theta_{k} when kuk\notin u and k>uk>u_{\ell}. We discuss each case in detail separately below.

Case I: Maximization w.r.t. θi\theta_{i} for i𝐮i\in\bf{u}.   If 𝐮{\bf u} is non-empty, then >0\ell>0. For any i𝐮i\in\bf{u}, taking derivative of the expression for ln(LikPA(𝜽))\ln(Lik_{PA}(\boldsymbol{\theta})) in (3.3) w.r.t. θi\theta_{i} and equating it to zero gives us the following

[ln(LikPA(𝜽))]θi=0\displaystyle\dfrac{\partial\big{[}\ln(Lik_{PA}(\boldsymbol{\theta}))\big{]}}{\partial\theta_{i}}=0
\displaystyle\Leftrightarrow{} λ^i~=i+Kt~i~(j=1jii~θj)+(|Qi|+(li))θi11θi=0\displaystyle-\widehat{\lambda}\sum_{\widetilde{i}=i}^{\ell+K}\widetilde{t}_{\widetilde{i}}\Bigg{(}\prod_{\underset{j\neq i}{j=1}}^{\widetilde{i}}\theta_{j}\Bigg{)}+\dfrac{\big{(}\left|Q_{i}\right|+(\ell-l_{i})\big{)}}{\theta_{i}}-\dfrac{1}{1-\theta_{i}}=0
\displaystyle\Leftrightarrow{} Ai+Biθi11θi=0,\displaystyle-A_{i}+\dfrac{B_{i}}{\theta_{i}}-\dfrac{1}{1-\theta_{i}}=0, (3.4)

where

Ai=\displaystyle A_{i}={} λ^i~=i+Kt~i~(j=1jii~θj)>0\displaystyle\widehat{\lambda}\sum_{\widetilde{i}=i}^{\ell+K}\widetilde{t}_{\widetilde{i}}\Bigg{(}\prod_{\underset{j\neq i}{j=1}}^{\widetilde{i}}\theta_{j}\Bigg{)}>0
Bi=\displaystyle B_{i}={} |Qi|+(li)>0.\displaystyle\left|Q_{i}\right|+(\ell-l_{i})>0. (3.5)

Since θi1\theta_{i}\leq 1 and θi>0\theta_{i}>0, it follows that

[ln(LikPA(𝜽))]θi=0\displaystyle\dfrac{\partial\big{[}\ln(Lik_{PA}(\boldsymbol{\theta}))\big{]}}{\partial\theta_{i}}=0
\displaystyle\implies{} Aiθi2(Ai+Bi+1)θi+Bi=0.\displaystyle A_{i}\theta_{i}^{2}-\big{(}A_{i}+B_{i}+1\big{)}\theta_{i}+B_{i}=0. (3.6)

Since Bi>0B_{i}>0, it follows that the discriminant of the quadratic equation (3.6), denoted by DiD_{i}, satisfies

Di=\displaystyle D_{i}={} (Ai+Bi+1)24AiBi\displaystyle\big{(}A_{i}+B_{i}+1\big{)}^{2}-4A_{i}B_{i}
=\displaystyle={} (AiBi+1)2+4Bi> 0.\displaystyle\big{(}A_{i}-B_{i}+1\big{)}^{2}+4B_{i}\ >\ 0. (3.7)

Hence, the quadratic equation (3.6) has two real roots, namely,

θi=(Ai+Bi+1)+-Di2Ai.\theta_{i}=\dfrac{\big{(}A_{i}+B_{i}+1\big{)}\underset{-}{+}\sqrt{D_{i}}}{2A_{i}}. (3.8)

Since Di>AiBi+1\sqrt{D_{i}}>A_{i}-B_{i}+1 by (3.7), it follows that

(Ai+Bi+1)+Di2Ai2(Ai+1)2Ai>1,\displaystyle\dfrac{\big{(}A_{i}+B_{i}+1\big{)}+\sqrt{D_{i}}}{2A_{i}}\geq\dfrac{2\big{(}A_{i}+1\big{)}}{2A_{i}}>1,

since Ai>0A_{i}>0. Hence the larger root with the positive sign for the square-root discriminant always lies outside the set of allowable values for θi\theta_{i}. The smaller root with the negative sign can be shown to be strictly positive since (Ai+Bi+1)2Di=4AiBi>0(A_{i}+B_{i}+1)^{2}-D_{i}=4A_{i}B_{i}>0. Also, if AiBi+1A_{i}\geq B_{i}+1, then

Ai+Bi+1Di<Ai+Bi+12Ai.A_{i}+B_{i}+1-\sqrt{D_{i}}<A_{i}+B_{i}+1\leq 2A_{i}.

If Ai<Bi+1A_{i}<B_{i}+1, then using Ai>0A_{i}>0 we get

(Bi+1Ai)2=(Bi+1+Ai)24AiBi4Ai<Di\displaystyle(B_{i}+1-A_{i})^{2}=(B_{i}+1+A_{i})^{2}-4A_{i}B_{i}-4A_{i}<D_{i}
\displaystyle\Rightarrow (Bi+1+Ai)Di<2Ai.\displaystyle(B_{i}+1+A_{i})-\sqrt{D_{i}}<2A_{i}.

It follows that the smaller root lies in (0,1)(0,1). Since

2[ln(LikPA(𝜽))]θi2=Biθi21(1θi)2< 0,\dfrac{\partial^{2}\big{[}\ln(Lik_{PA}(\boldsymbol{\theta}))\big{]}}{\partial\theta_{i}^{2}}=-\dfrac{B_{i}}{\theta_{i}^{2}}-\dfrac{1}{(1-\theta_{i})^{2}}\ <\ 0,

it follows that the smaller root is the unique maximizer of ln(LikPA(𝜽))\ln(Lik_{PA}(\boldsymbol{\theta})) with respect to θi\theta_{i}. To conclude, the unique maximizer of ln(LikPA(𝜽))\ln(Lik_{PA}(\boldsymbol{\theta})) with respect to θi\theta_{i} is given by

θ^i=(Ai+Bi+1)Di2Ai,\widehat{\theta}_{i}=\dfrac{\big{(}A_{i}+B_{i}+1\big{)}-\sqrt{D_{i}}}{2A_{i}}, (3.9)

where AiA_{i} and BiB_{i} are as defined in (3.5).

Case II: Maximization w.r.t. θi\theta_{i} for i𝐮i\notin\bf{u} and i<ui<u_{\ell}. For any θi\theta_{i} with i𝐮i\notin\bf{u} and i<ui<u_{\ell}, the coefficient of ln(θi)\ln(\theta_{i}), given by |Qi|+1{>0}(li)\left|Q_{i}\right|+1_{\{\ell>0\}}(\ell-l_{i}), is strictly positive. Again taking derivative of the log-likelihood expression in (3.3) w.r.t. θi\theta_{i} and equating it to zero gives us

[ln(LikPA(𝜽))]θi=0\displaystyle\dfrac{\partial\big{[}\ln(Lik_{PA}(\boldsymbol{\theta}))\big{]}}{\partial\theta_{i}}=0
\displaystyle\Leftrightarrow{} λ^i~=i+Kt~i~(j=1jii~θj)+(|Qi|+1{>0}(li))θi=0\displaystyle-\widehat{\lambda}\sum_{\widetilde{i}=i}^{\ell+K}\widetilde{t}_{\widetilde{i}}\Bigg{(}\prod_{\underset{j\neq i}{j=1}}^{\widetilde{i}}\theta_{j}\Bigg{)}+\dfrac{\big{(}\left|Q_{i}\right|+1_{\{\ell>0\}}(\ell-l_{i})\big{)}}{\theta_{i}}=0
\displaystyle\Leftrightarrow{} θi=(|Qi|+1{>0}(li))Ai,\displaystyle\theta_{i}=\dfrac{\big{(}\left|Q_{i}\right|+1_{\{\ell>0\}}(\ell-l_{i})\big{)}}{A_{i}},

where AiA_{i} is as defined in (3.5). Note that (|Qi|+1{>0}(li))/Ai{(\left|Q_{i}\right|+1_{\{\ell>0\}}(\ell-l_{i}))}/{A_{i}} is positive but not guaranteed to be less than or equal to 11. However, since

2[ln(LikPA(𝜽))]θi2=(|Qi|+1{>0}(li))θi2< 0,\dfrac{\partial^{2}\big{[}\ln(Lik_{PA}(\boldsymbol{\theta}))\big{]}}{\partial\theta_{i}^{2}}=-\dfrac{\big{(}\left|Q_{i}\right|+1_{\{\ell>0\}}(\ell-l_{i})\big{)}}{\theta_{i}^{2}}\ <\ 0,

it follows that [ln(LikPA(𝜽))]/θi>0{\partial[\ln(Lik_{PA}(\boldsymbol{\theta}))]}/{\partial\theta_{i}}>0, i.e., ln(LikPA(𝜽))\ln(Lik_{PA}(\boldsymbol{\theta})) is an increasing function of θi\theta_{i} for θi<(|Qi|+1{>0}(li))/Ai\theta_{i}<{(\left|Q_{i}\right|+1_{\{\ell>0\}}(\ell-l_{i}))}/{A_{i}}. Hence, the unique maximizer of ln(LikPA(𝜽))\ln(Lik_{PA}(\boldsymbol{\theta})) with respect to θi\theta_{i} is given by

θ^i=min{1,(|Qi|+1{>0}(li))Ai}.\widehat{\theta}_{i}=\min\bigg{\{}1,\dfrac{\big{(}\left|Q_{i}\right|+1_{\{\ell>0\}}(\ell-l_{i})\big{)}}{A_{i}}\bigg{\}}. (3.10)

Case III: Maximization w.r.t. θi\theta_{i} for i𝐮i\notin\bf{u} and i>ui>u_{\ell}. In this case |Qi|=0|Q_{i}|=0 and li=l_{i}=\ell. It follows from (3.3) that ln(LikPA(𝜽))\ln(Lik_{PA}(\boldsymbol{\theta})) is maximized with respect to θi\theta_{i} at 0. Hence, we set

θ^i=0.\widehat{\theta}_{i}=0. (3.11)

This amounts to estimating F(zi)F(z_{i}) for i>ui>u_{\ell} by 11. Note that any such ziz_{i} corresponds to a reserve price which is greater than the largest observed bid. Since the data offers no evidence that the support of the true valuation distribution FF extends up to ziz_{i}, setting the estimate of F(zi)F(z_{i}) to 11 indeed seems a sensible choice in this non-parametric setting.

3.2 Constructing an initial estimate F^init\widehat{F}_{init} of FF (and 𝜽(0)\boldsymbol{\theta}^{(0)} of 𝜽\boldsymbol{\theta}) based exclusively on final selling prices and first observed bids

The details of all the steps of the coordinate ascent maximization algorithm for LikPALik_{PA} are explicitly derived above in Section 3.1. However, a crucial detail which needs to be worked out is a ‘good’ initial starting point for the algorithm. Especially for highly non-convex maximizations such as in the current setting, the choice of the initial/starting value can play a critical role in the quality of the final estimate produced by the coordinate ascent algorithm. In this section, we construct an initial estimate of FF based on the empirical distribution functions of both the final selling prices and the first observed bids (i.e., the price when for the first time the standing price jumps to a higher value from its respective reserve price), respectively. Note that the methodology developed in George and Hui (2012) also relies exclusively on the final selling prices. However, that approach requires the knowledge of the total number of consumers accessing each of the KK auctions. We do not assume this knowledge in the current setting, and need to overcome this additional challenge. Also, as stated above, we will use the first non-reserve standing prices (first observed bids) to improve the quality of our initial estimator of FF.

Once the initial estimate F^init\widehat{F}_{init} of FF is constructed, we can easily construct the initial estimate 𝜽(0):=(θ1(0),θ2(0),,θ+K(0))T\boldsymbol{\theta}^{(0)}:=(\theta_{1}^{(0)},\theta_{2}^{(0)},\ldots,\theta_{\ell+K}^{(0)})^{T} of 𝜽\boldsymbol{\theta} using (2.5) as follows:

θi(0)=1F^init(zi)1F^init(zi1), 1i(+K).\theta_{i}^{(0)}=\frac{1-\widehat{F}_{init}(z_{i})}{1-\widehat{F}_{init}(z_{i-1})},\ \forall\ 1\leq i\leq(\ell+K). (3.12)

We now describe in detail the various steps involved in construction of F^init\widehat{F}_{init}.

Step I: Construct an estimate of FF based on the empirical distribution function of only the final selling prices of auctions with relatively small reserve prices.   First, consider a single second price auction with reserve price X0=rX_{0}=r. As in Section 2.3 consider the process of consumers accessing the auction whose bid value is greater than or equal to rr, and let NrN_{r} represents the number of such consumers that access the auction in the period [0,τ][0,\tau]. As observed in Section 2.3, this thinned process of arriving consumers is a Poisson process with rate λ(1F(r))\lambda(1-F(r)), and NrPoisson(λτ(1F(r)))N_{r}\sim Poisson(\lambda\tau(1-F(r))). We derive the conditional cdf of the final selling price XMX_{M} given X0=r,Nr2X_{0}=r,N_{r}\geq 2 as a function of F(x)F(x). For this purpose, note that

P(XMxX0=r,Nr2)=\displaystyle P(X_{M}\leq x\mid X_{0}=r,N_{r}\geq 2)={} n=2P(XMxNr=n,X0=r)P(Nr=nNr2)\displaystyle\sum_{n=2}^{\infty}P(X_{M}\leq x\mid N_{r}=n,X_{0}=r)P(N_{r}=n\mid N_{r}\geq 2)
=\displaystyle={} n=2P(XMx,R>xX0=r,Nr=n)P(Nr=nNr2)\displaystyle\sum_{n=2}^{\infty}P(X_{M}\leq x,R>x\mid X_{0}=r,N_{r}=n)P(N_{r}=n\mid N_{r}\geq 2)
+n=2P(XMx,RxX0=r,Nr=n)P(Nr=nNr2).\displaystyle+\sum_{n=2}^{\infty}P(X_{M}\leq x,R\leq x\mid X_{0}=r,N_{r}=n)P(N_{r}=n\mid N_{r}\geq 2). (3.13)

Recall that RR denotes the maximum placed bid during the course of the auction (we do not observe it), and that the valuation distribution of customers arriving in the thinned Poisson process discussed above is the truncated version of FF at rr, denoted by

Fr(x):=F(x)F(r)1F(r) 1{x>r}.F_{r}(x):=\frac{F(x)-F(r)}{1-F(r)}\ 1_{\{x>r\}}.

Now, note that the event {XMx,R>x}\{X_{M}\leq x,R>x\} is equivalent to the constraint that the largest order statistic of the valuations of all the customers arriving via the thinned Poisson process is greater than xx, but second largest order statistic is less than or equal to xx. Similarly, the event {XMx,Rx}\{X_{M}\leq x,R\leq x\} is equivalent to the constraint that the largest order statistic of the valuations of all the customers arriving via the thinned Poisson process is less than or equal to xx. With λr=λ(1F(r))\lambda_{r}=\lambda(1-F(r)), it follows from (3.13) that

P(XMxX0=r,Nr2)\displaystyle P(X_{M}\leq x\mid X_{0}=r,N_{r}\geq 2)
=\displaystyle={} 1P(N2)[n=2n(Fr(x))n1(1Fr(x))exp(λrτ)(λrτ)nn!+n=2(Fr(x))nexp(λrτ)(λrτ)nn!]\displaystyle\frac{1}{P(N\geq 2)}\bigg{[}\sum_{n=2}^{\infty}\frac{n(F_{r}(x))^{n-1}(1-F_{r}(x))\exp(-{\lambda_{r}}\tau)({\lambda_{r}}\tau)^{n}}{n!}+\sum_{n=2}^{\infty}\frac{(F_{r}(x))^{n}\exp(-{\lambda_{r}}\tau)({\lambda_{r}}\tau)^{n}}{n!}\bigg{]}
=\displaystyle={} 1P(N2)[λrτexp(λrτ)(1Fr(x))n=2(λrτFr(x))n1(n1)!+exp(λrτ)n=2(λrτFr(x))nn!]\displaystyle\frac{1}{P(N\geq 2)}\bigg{[}{\lambda_{r}}\tau\exp(-{\lambda_{r}}\tau)(1-F_{r}(x))\sum_{n=2}^{\infty}\frac{({\lambda_{r}}\tau F_{r}(x))^{n-1}}{(n-1)!}+\exp(-{\lambda_{r}}\tau)\sum_{n=2}^{\infty}\frac{({\lambda_{r}}\tau F_{r}(x))^{n}}{n!}\bigg{]}
=\displaystyle={} λrτexp(λrτ)(1Fr(x))(exp(λrτFr(x))1)+exp(λrτ)(exp(λrτFr(x))λrτFr(x)1)P(N2)\displaystyle\frac{{\lambda_{r}}\tau\exp(-{\lambda_{r}}\tau)(1-F_{r}(x))\big{(}\exp({\lambda_{r}}\tau F_{r}(x))-1\big{)}+\exp(-{\lambda_{r}}\tau)\big{(}\exp({\lambda_{r}}\tau F_{r}(x))-{\lambda_{r}}\tau F_{r}(x)-1\big{)}}{P(N\geq 2)}
=\displaystyle={} exp(λrτ)(λrτ(1η)(exp(λrτη)1)+exp(λrτη)λrτη1)1exp(λrτ)λrτexp(λrτ),\displaystyle\frac{\exp(-{\lambda_{r}}\tau)\ \Big{(}{\lambda_{r}}\tau(1-\eta)\big{(}\exp({\lambda_{r}}\tau\eta)-1\big{)}+\exp({\lambda_{r}}\tau\eta)-{\lambda_{r}}\tau\eta-1\Big{)}}{1-\exp(-{\lambda_{r}}\tau)-{\lambda_{r}}\tau\exp(-{\lambda_{r}}\tau)},

where η=Fr(x)\eta=F_{r}(x). Let,

Gλr(η):=exp(λrτ)(λrτ(1η)(exp(λrτη)1)+exp(λrτη)λrτη1)1exp(λrτ)λrτexp(λrτ).G_{{\lambda_{r}}}(\eta):=\frac{\exp(-{\lambda_{r}}\tau)\ \Big{(}{\lambda_{r}}\tau(1-\eta)\big{(}\exp({\lambda_{r}}\tau\eta)-1\big{)}+\exp({\lambda_{r}}\tau\eta)-{\lambda_{r}}\tau\eta-1\Big{)}}{1-\exp(-{\lambda_{r}}\tau)-{\lambda_{r}}\tau\exp(-{\lambda_{r}}\tau)}. (3.14)

Note that

ddηGλr(η)=\displaystyle\frac{d}{d\eta}G_{{\lambda_{r}}}(\eta)={} (λrτ)2(1η)exp(λrτη)λrτ(exp(λrτη)1)+λrτexp(λrτη)λrτexp(λrτ)(1+λrτ)\displaystyle\frac{({\lambda_{r}}\tau)^{2}(1-\eta)\exp({\lambda_{r}}\tau\eta)-{\lambda_{r}}\tau\big{(}\exp({\lambda_{r}}\tau\eta)-1\big{)}+{\lambda_{r}}\tau\exp({\lambda_{r}}\tau\eta)-{\lambda_{r}}\tau}{\exp({\lambda_{r}}\tau)-(1+{\lambda_{r}}\tau)}
=\displaystyle={} (λrτ)2(1η)exp(λrτη)exp(λrτ)(1+λrτ)> 0forη(0,1).\displaystyle\frac{({\lambda_{r}}\tau)^{2}(1-\eta)\exp({\lambda_{r}}\tau\eta)}{\exp({\lambda_{r}}\tau)-(1+{\lambda_{r}}\tau)}\ >\ 0\quad\text{for}\ \eta\in(0,1). (3.15)

It follows that GλrG_{\lambda_{r}} is a strictly increasing function for η[0,1]\eta\in[0,1].

Now, coming back to our setting with KK independent auctions, suppose that we have multiple auctions with a given reserve price rr (or close to rr) where the item is sold above the reserve price. Then based on the Glivenko-Cantelli lemma, (3.14) and (3.15), we can use the function Gλr1G_{\lambda_{r}}^{-1} (with an appropriate estimate of λr\lambda_{r}) applied to empirical cdf of the final selling prices of these auctions to estimate Fr(x)F_{r}(x) for x>rx>r. Setting the estimate of F(r)F(r) to be zero, we can then obtain an estimate of F(x)F(x) for x>rx>r. Clearly, we would like to choose rr to be as small as possible.

With this background, let rminr_{\min} denote smallest non-negative number such that there are at least a qq-fraction of the reserve prices among {r1,r2,,rK}\{r_{1},r_{2},\cdots,r_{K}\} lie in (rminϵ,rmin+ϵ)(r_{\min}-\epsilon,r_{\min}+\epsilon). Here q(0,1)q\in(0,1) and ϵ\epsilon are user-specified constants, and we denote the set of indices of reserve prices which lie within (rminϵ,rmin+ϵ)(r_{\min}-\epsilon,r_{\min}+\epsilon) as V(q,ϵ)V(q,\epsilon). Ideally, one would like to have a reasonable number of auctions with very small/negligible reserve prices. For example, in the Xbox data analyzed in Section 5, roughly 25%25\% of the auctions have a reserve price less than $1\$1 (the smallest final selling price is $28\$28). Let

GSP(x):=1|V(q,ϵ)|kV(q,ϵ)1{Xmk,kx},G_{SP}(x):=\frac{1}{|V(q,\epsilon)|}\sum_{k\in V(q,\epsilon)}1_{\{X_{m_{k},k}\ \leq\ x\}},

be the empirical distribution function of the final selling prices for auctions in V(q,ϵ)V(q,\epsilon). Based on the above discussion we construct the estimator F^SP\widehat{F}_{SP} of FF as

F^SP(x)=Gλ^1(GSP(x)),x.\widehat{F}_{SP}(x)=G_{\widehat{\lambda}}^{-1}(G_{SP}(x)),\ \forall\ x\in\mathbb{R}. (3.16)

Since Gλ^1(0)=0G_{\widehat{\lambda}}^{-1}(0)=0, it follows that F^SP(x)=0\widehat{F}_{SP}(x)=0 for xrminx\leq r_{\min}. In fact, F^SP(x)=0\widehat{F}_{SP}(x)=0 for all values below the smallest final selling price for auctions corresponding to V(q,ϵ)V(q,\epsilon)). There are likely many observed standing prices in the KK auctions which are below this smallest final selling price, and these values can/should be used to improve the estimator F^SP\widehat{F}_{SP}. This process is described in the next step.

Step II: Incorporate the first non-reserve standing prices into the construction of the initial estimate of FF.   Consider again, to begin with, a single second price auction with reserve price rr, and the associated thinned Poisson process of arriving consumers with valuation greater than rr. Letting Y1,Y2Y_{1},Y_{2} represent valuations of the first two arriving consumers in the thinned process, we have

P(X1xX0=r,Nr2)=P(min{Y1,Y2}x)=1(1Fr(x))2.P(X_{1}\leq x\mid X_{0}=r,N_{r}\geq 2)=P(\min\{Y_{1},Y_{2}\}\leq x)=1-(1-F_{r}(x))^{2}. (3.17)

Similar to Step I, let

GFP(x):=1|V(q,ϵ)|kV(q,ϵ)1{X1,kx},G_{FP}(x):=\frac{1}{|V(q,\epsilon)|}\sum_{k\in V(q,\epsilon)}1_{\{X_{1,k}\ \leq\ x\}},

be the empirical distribution function of the first non-reserve standing prices for auctions in V(q,ϵ)V(q,\epsilon). Based on (3.17), we construct the estimator F^FP\widehat{F}_{FP} of FF as

F^FP(x)=11GFP(x),x.\widehat{F}_{FP}(x)=1-\sqrt{1-G_{FP}(x)},\ \forall\ x\in\mathbb{R}. (3.18)

Note that GFP(rmin)=0G_{FP}(r_{\min})=0, which implies that F^FP(rmin)=0\widehat{F}_{FP}(r_{\min})=0. However, F^FP(x)>0\widehat{F}_{FP}(x)>0 when xx is larger than the smallest first non-reserve standing price among auctions in V(q,ϵ)V(q,\epsilon). This smallest non-reserve standing price is often much smaller than the smallest final selling price, and hence F^FP\widehat{F}_{FP} can be combined with F^SP\widehat{F}_{SP} of Step I, to get a better initial estimate of FF as follows.

Step III: Combining the two different initial estimates, namely, F^FP\widehat{F}_{FP} and F^SP\widehat{F}_{SP}.   Let p1p_{1} and p2p_{2} respectively represent the largest non-reserve standing price and the smallest final selling price for auctions in V(q,ϵ)V(q,\epsilon). As discussed previously, F^SP\widehat{F}_{SP} underestimates FF below p2p_{2} and F^FP\widehat{F}_{FP} overestimates FF above p1p_{1}. Let cc be the largest real number min{p1,p2}\leq\min\{p_{1},p_{2}\} such that F^FP(c)F^SP(m1)\widehat{F}_{FP}(c)\leq\widehat{F}_{SP}(m_{1}). Then, we define a function F^(0)\widehat{F}_{(0)} based on F^FP\widehat{F}_{FP} and F^SP\widehat{F}_{SP} as follows:

F^(0)(x)={F^FP(x)ifxcF^SP(x)ifx>p1F^FP(c)+(F^SP(p1)F^FP(c)p1c)(xc)ifc<xp1.\widehat{F}_{(0)}(x)=\begin{cases}\widehat{F}_{FP}(x)&\text{if}\ \ x\leq c\\ \widehat{F}_{SP}(x)&\text{if}\ \ x>p_{1}\\ \widehat{F}_{FP}(c)+\Big{(}\frac{\widehat{F}_{SP}(p_{1})-\widehat{F}_{FP}(c)}{p_{1}-c}\Big{)}(x-c)&\text{if}\ \ c<x\leq p_{1}.\end{cases} (3.19)

This function F^(0)\widehat{F}_{(0)} in (3.19) combines the strengths of the two estimators F^FP\widehat{F}_{FP} and F^SP\widehat{F}_{SP}, and gives a more balanced estimator of FF over all regions. Finally, since GFP,GSPG_{FP},G_{SP} are step functions, so are F^FP,F^SP\widehat{F}_{FP},\widehat{F}_{SP}. It follows based on (3.19) that F^(0)\widehat{F}_{(0)} is a step function as well, and has jumps only at the first non-reserve standing prices and final selling prices for auctions in V(q,ϵ)V(q,\epsilon). A continuous version of this estimator, denoted by F^init\widehat{F}_{init} can be obtained by linear interpolation of the values between the jump points.

3.3 The Coordinate ascent algorithm for maximizing LikPALik_{PA}

All the developments and derivations in the earlier subsections can now be compiled and summarized via the following coordinate ascent algorithm to maximize LikPA(𝜽)Lik_{PA}(\boldsymbol{\theta}).

Algorithm 3.1.

Coordinate ascent algorithm:

  1. Step 1.

    Start with initial value 𝜽(0)=(θ1(0),θ2(0),,θ+K(0))T\boldsymbol{\theta}^{(0)}=(\theta_{1}^{(0)},\theta_{2}^{(0)},\ldots,\theta_{\ell+K}^{(0)})^{T}, and a user defined tolerance level ϵ>0\epsilon>0.

  2. Step 2.

    Set m=0m=0.

  3. Step 3.

    For any 1i(+K)1\leq i\leq(\ell+K), sequentially obtain θi(m+1)\theta_{i}^{(m+1)} from (3.9), (3.10) and (3.11) by using the coordinate values in (θ1(m+1),,θi1(m+1),θi+1(m),,θ+K(m))T(\theta_{1}^{(m+1)},\ldots,\theta_{i-1}^{(m+1)},\theta_{i+1}^{(m)},\ldots,\theta_{\ell+K}^{(m)})^{T} to compute Ai,Bi,CiA_{i},B_{i},C_{i}.

  4. Step 4.

    If

    ln(LikPA(𝜽(m+1)))ln(LikPA(𝜽(m)))>ϵ,\ln\big{(}Lik_{PA}\big{(}\boldsymbol{\theta}^{(m+1)}\big{)}\big{)}-\ln\big{(}Lik_{PA}\big{(}\boldsymbol{\theta}^{(m)}\big{)}\big{)}>\epsilon,

    set mm+1m\leftarrow m+1. Otherwise, go to Step5~{}5.

  5. Step 5.

    Set 𝜽^MLE=𝜽(m+1)\widehat{\boldsymbol{\theta}}_{MLE}=\boldsymbol{\theta}^{(m+1)}.

Once we get 𝜽^MLE\widehat{\boldsymbol{\theta}}_{MLE}, we can easily get the corresponding maximum likelihood estimator of FF as follows

F^MLE(zi)=1j=1iθ^MLE,j, 1i(+K).\widehat{F}_{MLE}(z_{i})=1-\prod_{j=1}^{i}\widehat{\theta}_{MLE,j}\ ,\ \forall\ 1\leq i\leq(\ell+K). (3.20)

As explained above, the adapted objective function LikPALik_{PA} and the search space 𝐳\mathcal{F}_{\bf z} of CDFs with relevant jump discontinuities are artifacts of the non-parametric approach that we pursue. However, once the estimates of the valuation distribution at elements of 𝐳{\bf z} are obtained using Algorithm 3.1, a continuous estimate on the entire valuation distribution can be constructed via interpolation. In particular, we use the values of F^(MLE)\widehat{F}_{(MLE)} at ziz_{i}’s, F^(MLE)(0)=0\widehat{F}_{(MLE)}(0)=0, and linear interpolation to construct a continuous estimator of the population valuation distribution over the entire real line.

3.4 Instability of F^MLE\widehat{F}_{MLE} near 0 and boundary correction

From extensive simulation results using several choices of the true valuation distribution, we observed that F^MLE\widehat{F}_{MLE} seems to overestimate FF near 0, particularly, in the region below the minimum of all (non-reserve) standing prices from all the auctions. In other words, F^MLE(x)\widehat{F}_{MLE}(x) seems to overestimate F(x)F(x) for x<x¯1x<\bar{x}_{1}, where x¯1\bar{x}_{1} is the smallest (non-reserve) standing price obtained from all the auctions.

Instability of non-parametric MLE near the boundaries is a common phenomenon in the literature. In our case, it happens near 0. To illustrate this, we generate a data set containing K=150K=150 independent auctions where each auction runs for τ=100\tau=100 time units. The true underlying valuation distribution FF is taken to be a gamma distribution with shape parameter 1010 and rate parameter 22. The plots of the corresponding true FF (black curve), F^MLE\widehat{F}_{MLE} (orange curve), and F^init\widehat{F}_{init} (blue curve) are shown in Figure 3. In this case, the minimum of standing prices across all the auctions is x¯1=1.742\bar{x}_{1}=1.742 (green vertical line). It is easy to observe from Figure 3 that for x<1.742x<1.742, the graph of F^MLE\widehat{F}_{MLE} (orange curve) lies way above the graphs of both true FF and F^init\widehat{F}_{init}. It indicates that F^MLE\widehat{F}_{MLE} (orange curve) overestimates FF in the region (0,1.742)(0,1.742). In comparison, we notice that the initial estimate F^init\widehat{F}_{init} is more stable in this region (this was consistently observed in a broad variety of simulation settings). We leverage this observation to make the following modification in the steps of the coordinate ascent algorithm: we only look for distribution functions which are constrained to be equal to the initial estimator F^init\widehat{F}_{init} in the region (0,x¯1)(0,\bar{x}_{1}). The modified coordinate ascent algorithm is described below in Algorithm 3.2. Once we get the constrained MLE of 𝜽\boldsymbol{\theta}, denoted by 𝜽^c,MLE\widehat{\boldsymbol{\theta}}_{c,MLE}, from Algorithm 3.2; we can easily get the values of the corresponding constrained MLE of FF, denoted by F^c,MLE\widehat{F}_{c,MLE}, at all components of 𝐳\mathbf{z} using (3.20). As discussed earlier, linear interpolation can be used to get the values of F^c,MLE\widehat{F}_{c,MLE} at all other positive real numbers. The plot of F^c,MLE\widehat{F}_{c,MLE} (red curve) for the simulated data discussed above is provided in Figure 3, and clearly illustrates the performance improvement near 0 as compared to the earlier unconstrained MLE F^MLE\widehat{F}_{MLE} (orange curve).

Refer to caption
Figure 3: The constrained MLE, F^c,MLE\widehat{F}_{c,MLE} (red curve) is more accurate than the unconstrained one (yellow curve), especially in the region below the lowest non-reserve standing price of 1.7421.742 (green vertical line) from all the auctions.
Algorithm 3.2.

Modified coordinate ascent algorithm:

  1. Step 1.

    Start with initial value 𝜽(0)=(θ1(0),θ2(0),,θ+K(0))T\boldsymbol{\theta}^{(0)}=(\theta_{1}^{(0)},\theta_{2}^{(0)},\ldots,\theta_{\ell+K}^{(0)})^{T}, and a user defined tolerance level ϵ>0\epsilon>0.

  2. Step 2.

    Set m=0m=0.

  3. Step 3.

    Obtain 𝜽(m+1)\boldsymbol{\theta}^{(m+1)} from 𝜽(m)\boldsymbol{\theta}^{(m)} as follows:

    1. (a)

      Recall that zu1=x¯1z_{u_{1}}=\bar{x}_{1}, the smallest non-reserve standing price. Set the first u1u_{1} many elements of 𝜽(m+1)\boldsymbol{\theta}^{(m+1)} to be the same as corresponding elements of 𝜽(0)\boldsymbol{\theta}^{(0)}, i.e., θi(m+1)=θi(0)\theta^{(m+1)}_{i}=\theta^{(0)}_{i} for 1iu11\leq i\leq u_{1}.

    2. (b)

      For(u1+1)i(+K)(u_{1}+1)\leq i\leq(\ell+K), sequentially obtain θi(m+1)\theta_{i}^{(m+1)} from (3.9), (3.10) and (3.11) by using the coordinate values in (θ1(m+1),,θi1(m+1),θi+1(m),,θ+K(m))T(\theta_{1}^{(m+1)},\ldots,\theta_{i-1}^{(m+1)},\theta_{i+1}^{(m)},\ldots,\theta_{\ell+K}^{(m)})^{T} to compute Ai,Bi,CiA_{i},B_{i},C_{i}.

  4. Step 4.

    If

    ln(LikPA(𝜽(m+1)))ln(LikPA(𝜽(m)))>ϵ,\ln\big{(}Lik_{PA}\big{(}\boldsymbol{\theta}^{(m+1)}\big{)}\big{)}-\ln\big{(}Lik_{PA}\big{(}\boldsymbol{\theta}^{(m)}\big{)}\big{)}>\epsilon,

    set mm+1m\leftarrow m+1. Otherwise, go to Step5~{}5.

  5. Step 5.

    Set 𝜽^c,MLE=𝜽(m+1)\widehat{\boldsymbol{\theta}}_{c,MLE}=\boldsymbol{\theta}^{(m+1)}.

4 Simulation study

In this section we consider various choices of the true underlying valuation distribution FF, e.g., uniform, piecewise uniform, pareto, gamma, and beta distributions, which are commonly used in marketing research. We then illustrate and compare the performance of the constrained (boundary corrected) MLE F^c,MLE\widehat{F}_{c,MLE} and the initial estimate F^init\widehat{F}_{init} with the corresponding true valuation distribution FF. Note that the Bayesian methodology in George and Hui (2012) (which uses only the final selling prices in each auction) requires the knowledge of the the total number of consumers accessing the auction. This does not hold in our motivating application, and hence we will also not assume such a knowledge in our synthetic data evaluations below. The estimator F^SP\widehat{F}_{SP} in (3.16) is based only on final selling prices, and does not need the knowledge of total number of customers accessing the auction. The estimator F^init\widehat{F}_{init}, which improves F^SP\widehat{F}_{SP} by combining it appropriately with the first non-reserve standing price based estimator F^FP\widehat{F}_{FP} in (3.19). Hence, F^init\widehat{F}_{init} will be used as a representative/adaptation of the final selling price based approach of George and Hui (2012) for the setting considered in this paper.

4.1 Data generation

We conducted five sets of simulation experiments, each using data simulated from a different choice of the underlying valuation distribution FF. The cumulative distribution functions corresponding to the five choices of true underlying valuation distributions are shown in Figure 4.

Refer to caption
Figure 4: “True” underlying valuation distribution functions used in the simulation studies.

For the first set of simulations, the underlying FF is a Uniform(1,20)\left(1,20\right) distribution. For the second set of simulations, the underlying FF is an equally weighted mixture of the Uniform(1,2)\left(1,2\right) and Uniform(3,4)\left(3,4\right) distributions. From a managerial/marketing perspective, this corresponds to a market with two distinct consumer segments with different average valuations. For the third set of simulations, the true underlying FF is a Pareto distribution with location parameter 33 and dispersion parameter 100100. For the fourth set of simulations, the true underlying FF is a Gamma distribution with shape parameter 1010 and rate parameter 22. For the last and fifth set of simulations, the underlying FF is a Beta distribution with its two positive shape parameters being equal to 22, i.e., Beta(2,2)(2,2) distribution.

From each of the five true underlying FF’s, we consider two settings, with K=100K=100 and K=1000K=1000 independent auctions of identical copies of an item. Varying the sample size here sheds light on the relationship between the sample size and the precision of the constrained MLE F^c,MLE\widehat{F}_{c,MLE} and the initial estimate F^init\widehat{F}_{init} in estimating the true valuation distribution FF. For each auction, we took the auction window (τ\tau) to be 100 units, and the constant rate (λ\lambda) of arrival of bidders to be equal to 11. We then simulated the inter-arrival times between bidders from an exponential distribution with rate parameter λ=1\lambda=1, and drew the bidders’ valuations from FF, keeping track of the entire sequence of standing prices and the intermediate times between jumps in the standing price for all independent auctions involved. This sequence of standing prices throughout the course of the auction, and the intermediate times between standing price changes are then treated as the observed dataset that is subsequently used to compute the initial estimator and the constrained MLE. For each choice of true FF and number of auctions KK, 100100 replicated datasets are generated.

Since the data is generated by consistent with the modeling assumptions, one expects the MLE which utilizes all available information, to have a superior performance than the initial estimator, which only uses the final selling price and first non-reserve standing price for each auction. The goal of these simulations is to examine extensively how much improvement can be obtained from our proposed method by incorporating the additional information in a variety of settings.

4.2 Simulation Results

For each replicated dataset generated (as described in the previous subsection), we apply our non-parametric methodology to obtain the initial estimate F^init\widehat{F}_{init} and the constrained MLE F^c,MLE\widehat{F}_{c,MLE}. The goal is to compare the accuracy of each of these estimators with respect to the respective true valuation distribution FF.

We first provide a visual illustration of the results by choosing a random replicate out of 100100 for each of the 1010 simulation settings (55 true valuation distributions, and 22 settings for the total number of auctions KK). For Figure 5, we consider a randomly chosen replicate from the setting where the true valuation distribution FF is Uniform(1,20)\left(1,20\right) and K=100K=100. The estimates F^c,MLE\widehat{F}_{c,MLE}, F^init\widehat{F}_{init}, and the true valuation distribution FF are plotted. We also provide the 90%90\% confidence intervals for both estimators based on the HulC approach developed in Kuchibhotla, Balakrishnan and Wasserman (2021). The HulC approach assumes median unbiasedness of the underlying estimators. Since F^init\widehat{F}_{init} and F^c,MLE\widehat{F}_{c,MLE} are not median unbiased, we correct for the median bias using a heuristic approach described in Appendix C. It can be seen that F^c,MLE\widehat{F}_{c,MLE} is much closer to the true FF compared to F^init\widehat{F}_{init} at almost all values in the interval (1,20)(1,20). Figure 6 provides a similar plot for a randomly chosen replicate generated from the Uniform(1,20)(1,20) and K=1000K=1000 setting. As expected, we see that the bias of both F^c,MLE\widehat{F}_{c,MLE} and F^init\widehat{F}_{init} reduces drastically when we increase the number of independent auctions KK from 100100 to 10001000, and F^c,MLE\widehat{F}_{c,MLE} overall provides a much more accurate estimate of the true valuation distribution FF.

Moreover, we observe from Figure 5 and Figure 6 that the 90%90\% HulC confidence regions for F^c,MLE\widehat{F}_{c,MLE} (denoted by red-colored step function plot) are in general narrower compared to that of F^init\widehat{F}_{init} (denoted by the blue-colored step function plot). In other words, F^c,MLE\widehat{F}_{c,MLE} has lower variance than F^init\widehat{F}_{init}. Again, as expected, the variance of both F^c,MLE\widehat{F}_{c,MLE} and F^init\widehat{F}_{init} decreases as we change the value of KK from 100100 to 10001000.

Refer to caption
Figure 5: Plot of the constrained MLE F^c,MLE\widehat{F}_{c,MLE} (red), initial estimator F^init\widehat{F}_{init} (blue) and the true valuation distribution FF (taken to be Uniform(1,20)(1,20)) for a random chosen replicate with K=100K=100 independent auctions. 90%90\%-HulC confidence intervals are also provided for both estimators (dotted lines, matching colors).
Refer to caption
Figure 6: Plot of the constrained MLE F^c,MLE\widehat{F}_{c,MLE} (red), initial estimator F^init\widehat{F}_{init} (blue) and the true valuation distribution FF (taken to be Uniform(1,20)(1,20)) for a random chosen replicate with K=1000K=1000 independent auctions. 90%90\%-HulC confidence intervals are also provided for both estimators (dotted lines, matching colors).

We provide similar plots for a randomly chosen replicate from the eight other settings (with true FF being piece-wise Uniform, Pareto, Gamma and Beta, and with K=100,1000K=100,1000) in Figures 7, 8, 9, and 10, and see that a similar phenomenon holds for all these settings: (a) F^c,MLE\widehat{F}_{c,MLE} is less biased than F^init\widehat{F}_{init}, (b) F^c,MLE\widehat{F}_{c,MLE} has narrower HulC confidence bands than F^init\widehat{F}_{init}, and (c) the bias and variance of both estimators decreases as KK increases from 100100 to 10001000.

The above plots based on single chosen replicates are illustrative, but need to be complemented with performance evaluation averaged over all the 100100 replicates in each of the 1010 simulation settings. In Table 1, for each simulation setting, we provide both the KS-distance and the Total variation distance (TV-distance) between the true valuation distribution FF and the two estimates F^c,MLE\widehat{F}_{c,MLE} and F^init\widehat{F}_{init} averaged over the 100100 respective replications. The results show that the constrained MLE (based on the entire collection of standing prices) uniformly outperforms the initial estimator (based only on final selling prices and first non-reserve standing prices) in all the simulation settings. This strongly suggests that if the additional assumptions of single bidding and constant arrival rate seem to largely hold, it is worth using the proposed methodology which incorporates the extra information available in the form of all standing prices within the auction period.

Refer to caption
Figure 7: Plot of the constrained MLE F^c,MLE\widehat{F}_{c,MLE} (red), initial estimator F^init\widehat{F}_{init} (blue) and the true valuation distribution FF (taken to be piece-wise Uniform) for a random chosen replicate with K=100K=100 (left) and K=1000K=1000 independent auctions (right). 90%90\%-HulC confidence intervals are also provided for both estimators (dotted lines, matching colors).
Refer to caption
Figure 8: Plot of the constrained MLE F^c,MLE\widehat{F}_{c,MLE} (red), initial estimator F^init\widehat{F}_{init} (blue) and the true valuation distribution FF (taken to be Pareto(location = 3,dispersion = 100)) for a random chosen replicate with K=100K=100 (left) and K=1000K=1000 independent auctions (right). 90%90\%-HulC confidence intervals are also provided for both estimators (dotted lines, matching colors).
Refer to caption
Figure 9: Plot of the constrained MLE F^c,MLE\widehat{F}_{c,MLE} (red), initial estimator F^init\widehat{F}_{init} (blue) and the true valuation distribution FF (taken to be Gamma(shape = 10, rate = 2)) for a random chosen replicate with K=100K=100 (left) and K=1000K=1000 independent auctions (right). 90%90\%-HulC confidence intervals are also provided for both estimators (dotted lines, matching colors).
Refer to caption
Figure 10: Plot of the constrained MLE F^c,MLE\widehat{F}_{c,MLE} (red), initial estimator F^init\widehat{F}_{init} (blue) and the true valuation distribution FF (taken to be Beta(2,2)) for a random chosen replicate with K=100K=100 (left) and K=1000K=1000 independent auctions (right). 90%90\%-HulC confidence intervals are also provided for both estimators (dotted lines, matching colors).
Setting KS distance (MLE) KS distance (initial) TV distance (MLE) TV distance (initial)
F=F= Uniform, K=100K=100 0.07000.0700 0.13100.1310 0.09750.0975 0.21700.2170
F=F= Uniform, K=1000K=1000 0.02670.0267 0.05120.0512 0.04060.0406 0.08950.0895
F=F= Piec.Unif., K=100K=100 0.06220.0622 0.10170.1017 0.11150.1115 0.16600.1660
F=F= Piec. Unif., K=1000K=1000 0.02050.0205 0.03560.0356 0.07700.0770 0.09200.0920
F=F= Pareto, K=100K=100 0.07060.0706 0.11800.1180 0.06850.0685 0.18030.1803
F=F= Pareto, K=1000K=1000 0.02560.0256 0.03920.0392 0.02470.0247 0.06760.0676
F=F= Gamma, K=100K=100 0.06600.0660 0.13020.1302 0.08330.0833 0.21730.2173
F=F= Gamma, K=1000K=1000 0.02360.0236 0.05010.0501 0.02850.0285 0.08480.0848
F=F= Beta, K=100K=100 0.07960.0796 0.15000.1500 0.09350.0935 0.23200.2320
F=F= Beta, K=1000K=1000 0.02670.0267 0.05780.0578 0.02980.0298 0.08930.0893
Table 1: Kolmogorov-Smirnoff (KS) distance and Total variation (TV) distance between each of the two estimators F^c,MLE\widehat{F}_{c,MLE} and F^init\widehat{F}_{init} and the true valuation distribution FF for all the 1010 simulation settings.

5 Empirical application

In this section we apply our method to estimate the true valuation distribution of an Xbox based on actual data obtained from second-price auctions on eBay. In Section 5.1, we provide an overview of the data, and discuss features and adjustments to ensure its suitability for the methodology developed in the paper. In Section 5.2, we apply our non-parametric methodology on the data set and present the findings, and perform additional performance analysis.

5.1 Data overview

The data set on eBay on online auctions of Xbox game consoles was obtained from the Modeling Online Auctions data repository. More specifically, we focus on a data set which provides information for 9393 online auctions of identical Xbox game consoles where each auction lasts for 77 days. For each auction, a user’s bid is recorded only if changes the standing price in the auction. For each such bid, the following information is provided: auctionid (unique auction identifier), bid (dollar value of the bid), bidtime (the time, in days, that the bid was placed), bidder (bidder eBay username), bidderrate (internal eBay rating of the bidder), openbid (the reserve price for the auction, set by the seller), and price (the final selling price for the auction). While the standing price values throughout the course of the auction were not directly provided, they can be easily inferred from the successful bid values from the bid column and the reserve price. Also, the bidtime column directly provides the sequence of times at which there is a change in the standing price.

As mentioned in the introduction, we found that a minor fraction of bidders (less than 10%10\% of the total) placed multiple bids. Many of these bids are consecutive bids by the same bidder to ensure that they become the leader in the option. Note that we observe only ’successful’ bids, i.e., bids which change the standing price of the auction. If a successful bidder (post bidding) observes that the standing price of the auction has changed to their bid (plus a small increment), it can be inferred that this bid is currently the second highest. Hence, through a proxy bidding system offered by eBay, the bidder could choose to incrementally push up their bid until they become the leader in the option (the standing price becomes less than their latest bid). The proxy system also needs to be provided with a ceiling value, above which no bids are to be submitted. This value is very likely the bidder’s true valuation of the product. With this in mind, and to adapt the data as much as possible to our single bidding assumption, we remove all the previous bids of such multiple bidders from the data, and keep only the final bid. Finally, there are a couple of auctions where the first successful bid values are same as the reserve prices (openbid) of the corresponding auctions. To ensure compliance with our requirement of no ties, and for uniformity, we added a small random noise from Uniform(0,0.01)\left(0,0.01\right) to all the bids across all the auctions. Since the total number of bidders accessing the auctions is not available, the final selling price based methodology in George and Hui (2012) is not applicable. As in the simulations, we will use the initial estimator F^init\widehat{F}_{init}, which is computed using only the final selling prices and first observed bids in all auctions, as a representative of this methodology in the current setting.

5.2 Analysis of Xbox data

Using the Xbox 7-day auctions dataset with slight modifications as mentioned in Section 5.1, we now compute the initial estimate F^init\widehat{F}_{init} and the constrained MLE F^c,MLE\widehat{F}_{c,MLE}. For the estimation of λ\lambda (see Section 2.3) we need to choose a subset of auctions whose reserve prices are negligible in the given context. We found that the smallest final selling price in all the auctions is $28 and the median final selling price in all the auctions is $120. Given this, we chose all auctions with reserve price less than $10 (3939 out of 9393) for obtaining the generalized method of moments based estimator of λ\lambda, and also for the computing the final selling price based estimator F^SP\widehat{F}_{SP} (see Step I in Section 3.2). Recall that F^SP\widehat{F}_{SP} is one of the components used to compute the initial estimator F^init\widehat{F}_{init}.

The plots of the initial estimate F^init\widehat{F}_{init} and the constrained MLE F^c,MLE\widehat{F}_{c,MLE} of the (unknown) true valuation distribution along with the corresponding 90%90\% HulC confidence regions are provided in Figure 11. Similar to the phenomenon observed in the simulations in Section 4.2, we notice that the HulC confidence region of F^c,MLE\widehat{F}_{c,MLE} is lesser in width than that of F^init\widehat{F}_{init}, indicating comparatively smaller variance of F^c,MLE\widehat{F}_{c,MLE}. We see that the two estimates are reasonably different: the total variation distance between them is 0.41400.4140 and the KS distance between them is 0.46860.4686. Another interesting observation is that the curves for these two estimates cross exactly once, with F^c,MLE(x)\widehat{F}_{c,MLE}(x) dominated by F^init(x)\widehat{F}_{init}(x) after the crossing point, and vice-versa before the crossing point. This implies that F^init\widehat{F}_{init} stochastically dominates F^c,MLE\widehat{F}_{c,MLE}. In other words, the final selling price/first non-reserve price based initial estimator signifies higher Xbox valuations than the MLE estimator based on the entire collection of standing prices throughout the course of the auctions.

Unlike the simulation setting, the true valuation distribution is obviously not known here. However, we still undertake a limited performance evaluation and comparison exercise for the two approaches. As discussed previously in Section 4.1, if the modeling assumptions are largely unviolated (which seems to be the case) one would expect the MLE to do better than the initial estimator. The goal of this limited evaluation is again to understand the amount of improvement, and also to examine the stability of both estimators. For this purpose, we split the entire Xbox dataset into training and test sets. In particular, we consider two choices of splits namely, 1:11:1 and 2:12:1, for the ratio of auctions in training vs. test data. For each split, using F^c,MLE,test\widehat{F}_{c,MLE,\ test} (MLE estimator using test data) as an approximation for the true valuation distribution FF, we evaluate both the initial estimate F^init,training\widehat{F}_{init,\ training} and the constrained MLE F^c,MLE,training\widehat{F}_{c,MLE,\ training} from the training set and compare each of them with F^c,MLE,test\widehat{F}_{c,MLE,\ test}. The results for one such 1:1 and 2:1 split each are shown in Figure 12. We can see that F^c,MLE,test\widehat{F}_{c,MLE,\ test} is significantly closer to F^c,MLE,training\widehat{F}_{c,MLE,\ training} as compared to F^init,training\widehat{F}_{init,\ training}. We also calculate the average total variation distance between F^c,MLE,test\widehat{F}_{c,MLE,\ test} and each of F^c,MLE,training\widehat{F}_{c,MLE,\ training} and F^init,training\widehat{F}_{init,\ training}, averaged over 10001000 replications of each random 1:1 and 2:1 split. The values are provided in Table 2.

Refer to caption
Figure 11: Plot of Initial estimate (solid blue line, based only on finals selling prices and first observed bids) vs. constrained MLE (solid red line, based on entire sequence of standing prices) of F and their corresponding 90%90\% HulC confidence regions (dotted blue and red lines) for the Xbox 7-day auctions dataset.
Refer to caption
Figure 12: Plot of F^init,training\widehat{F}_{init,\ training} (solid blue line) vs. F^c,MLE,training\widehat{F}_{c,MLE,\ training} (solid red line) vs. F^c,MLE,test\widehat{F}_{c,MLE,\ test} (dotted red line) for two different (1:1 and 2:1) splits into training and test sets of the Xbox dataset.
train : test AvgTV(F^init,training,F^c,MLE,test)\left(\widehat{F}_{init,\ training}\ ,\ \widehat{F}_{c,MLE,\ test}\right) AvgTV(F^c,MLE,training,F^c,MLE,test)\left(\widehat{F}_{c,MLE,\ training}\ ,\ \widehat{F}_{c,MLE,\ test}\right)
1:11:1 0.32570.3257 0.14540.1454
2:12:1 0.32720.3272 0.15860.1586
Table 2: Average total variation distance (AvgTV) between F^c,MLE,test\widehat{F}_{c,MLE,\ test} and each of F^init,training\widehat{F}_{init,\ training} and F^c,MLE,training\widehat{F}_{c,MLE,\ training} respectively, averaged over 10001000 replications of the random split with same proportion.

6 Discussion and future research

In this paper we have a developed a non-parametric methodology for estimating the consumer valuation distribution using second price auction data. Unlike the approach in George and Hui (2012), our methodology uses the collection of current selling price values throughout the course of the auctions, and does not require knowledge of the total number of bidders accessing the auction. Extensive simulations demonstrate that, when the modeling assumptions are true, using our approach can lead to significantly better performance than estimators based on just final selling prices and first observed bids. Two additional assumptions (compared to George and Hui (2012)) which preclude multiple bidding and postulate constant rate of arrival of the consumers to the auction are needed. Many real-life second price auctions see only minor departures from these assumptions, which are supported by economic theory. However, if there is evidence of major violation, results from the proposed methodology should be used cautiously. Generalizing our methodology by relaxing one or both of these assumptions is a topic of future research. One possible direction which we plan on exploring is allow two different rates for the bidder arrival process, with a transition between these two rates happening sometime during the auction period.

{funding}

Rohit Patra’s work was partially supported by NSF grant DMS-2210662.

References

  • Backus and Lewis (2020) {btechreport}[author] \bauthor\bsnmBackus, \bfnmM.\binitsM. and \bauthor\bsnmLewis, \bfnmG.\binitsG. (\byear2020). \btitleDynamic demand estimation in auction markets \btypeTechnical Report, \bpublisherNBER Working paper 22375. \endbibitem
  • P. Bajari and A. Hortacsu (2003) {barticle}[author] \bauthor\bsnmP. Bajari and A. Hortacsu (\byear2003). \btitleThe winner’s curse, reserve prices, and endogenous entry: Empirical insights from eBay auctions. \bjournalRand J. Econ. \bvolume34 \bpages329-355. \endbibitem
  • Barbaro and Bracht (2021) {barticle}[author] \bauthor\bsnmBarbaro, \bfnmSalvatore\binitsS. and \bauthor\bsnmBracht, \bfnmBernd\binitsB. (\byear2021). \btitleShilling, Squeezing, Sniping. A further explanation for late bidding in online second-price auctions. \bjournalJournal of Behavioral and Experimental Finance \bvolume31 \bpages100553. \bdoihttps://doi.org/10.1016/j.jbef.2021.100553 \endbibitem
  • Bose and Daripa (2017) {barticle}[author] \bauthor\bsnmBose, \bfnmSubir\binitsS. and \bauthor\bsnmDaripa, \bfnmArup\binitsA. (\byear2017). \btitleShills and snipes. \bjournalGames and Economic Behavior \bvolume104 \bpages507-516. \bdoihttps://doi.org/10.1016/j.geb.2017.05.010 \endbibitem
  • Chan, Kadiyali and Park (2007) {barticle}[author] \bauthor\bsnmChan, \bfnmT.\binitsT., \bauthor\bsnmKadiyali, \bfnmV.\binitsV. and \bauthor\bsnmPark, \bfnmY. H.\binitsY. H. (\byear2007). \btitleWillingness to pay and competition in online auctions. \bjournalJournal of Marketing Research \bvolume44 \bpages324–333. \endbibitem
  • (6) {bmisc}[author] \bauthor\bsnmeBay \btitleAutomatic Bidding. \endbibitem
  • George and Hui (2012) {barticle}[author] \bauthor\bsnmGeorge, \bfnmEdward I.\binitsE. I. and \bauthor\bsnmHui, \bfnmSam K.\binitsS. K. (\byear2012). \btitleOptimal pricing using online auction experiments: A Pólya tree approach. \bjournalThe Annals of Applied Statistics \bvolume6 \bpages55 – 82. \bdoi10.1214/11-AOAS503 \endbibitem
  • Hou and Rego (2007) {barticle}[author] \bauthor\bsnmHou, \bfnmJ.\binitsJ. and \bauthor\bsnmRego, \bfnmC.\binitsC. (\byear2007). \btitleA classification of online bidders in a private value auction: Evidence from eBay. \bjournalInternational Journal of Electronic Marketing and Retailing \bvolume1 \bpages322–338. \endbibitem
  • Kuchibhotla, Balakrishnan and Wasserman (2021) {bmisc}[author] \bauthor\bsnmKuchibhotla, \bfnmArun Kumar\binitsA. K., \bauthor\bsnmBalakrishnan, \bfnmSivaraman\binitsS. and \bauthor\bsnmWasserman, \bfnmLarry\binitsL. (\byear2021). \btitleThe HulC: Confidence Regions from Convex Hulls. \endbibitem
  • Milgrom (2004) {bbook}[author] \bauthor\bsnmMilgrom, \bfnmP.\binitsP. (\byear2004). \btitlePutting Auction Theory to Work. \bpublisherCambridge. \endbibitem
  • Murphy (1994) {barticle}[author] \bauthor\bsnmMurphy, \bfnmS. A.\binitsS. A. (\byear1994). \btitleConsistency in a Proportional Hazards Model Incorporating a Random Effect. \bjournalThe Annals of Statistics \bvolume22 \bpages712–731. \endbibitem
  • Ockenfels and Roth (2006) {barticle}[author] \bauthor\bsnmOckenfels, \bfnmAxel\binitsA. and \bauthor\bsnmRoth, \bfnmAlvin E.\binitsA. E. (\byear2006). \btitleLate and multiple bidding in second price Internet auctions: Theory and evidence concerning different rules for ending an auction. \bjournalGames and Economic Behavior \bvolume55 \bpages297-320. \bnoteMini Special Issue: Electronic Market Design. \bdoihttps://doi.org/10.1016/j.geb.2005.02.010 \endbibitem
  • Park and Bradlow (2005) {barticle}[author] \bauthor\bsnmPark, \bfnmY. H.\binitsY. H. and \bauthor\bsnmBradlow, \bfnmE.\binitsE. (\byear2005). \btitleAn integrated model for bidding behavior in internet auctions: Whether, who, when, and how much. \bjournalJournal of Marketing Research \bvolume42 \bpages470–482. \endbibitem
  • Song (2004) {btechreport}[author] \bauthor\bsnmSong, \bfnmUnjy\binitsU. (\byear2004). \btitleNonparametric estimation of an eBay auction model with an unknown number of bidders \btypeTechnical Report, \bpublisherUniversity of British Columbia. \endbibitem
  • Vardi (1982) {barticle}[author] \bauthor\bsnmVardi, \bfnmY.\binitsY. (\byear1982). \btitleNonparametric Estimation in the Presence of Length Bias. \bjournalThe Annals of Statistics \bvolume10 \bpages616 – 620. \bdoi10.1214/aos/1176345802 \endbibitem
  • Vickrey (1961) {barticle}[author] \bauthor\bsnmVickrey, \bfnmW.\binitsW. (\byear1961). \btitleCounter-speculation, auctions, and competitive sealed tenders. \bjournalJournal of Finance \bvolume16 \bpages8–37. \endbibitem
  • Yao and Mela (2008) {barticle}[author] \bauthor\bsnmYao, \bfnmS.\binitsS. and \bauthor\bsnmMela, \bfnmC.\binitsC. (\byear2008). \btitleOnline auction demand. \bjournalMarketing Science \bvolume27 \bpages861–885. \endbibitem

Appendix A Proof of Lemma 2.1

Proof.

We first introduce some additional notation. Let {Δi}i=1M1\{\Delta_{i}\}_{i=1}^{M-1} represent the number of bidders accessing the auction between the ithi^{th} and (i+1)th(i+1)^{th} changes in the selling price, and let Δ0\Delta_{0} represent the number of bidders accessing the auction until the first time when the selling price changes to a higher value from the reserve price rr. Also, let {Si}i=2N\{S_{i}\}_{i=2}^{N} represent the time between the arrival of (i1)th(i-1)^{th} and ithi^{th} bidders accessing the auction, and S1S_{1} let represents the waiting time of the arrival of the first bidder from the start of the auction. Recall that a bidder accessing the auction is allowed to place a bid only if the bid value is greater than the current selling price. We now consider three possible scenarios at the end of the auction.

Case I: When the item is sold above the reserve price (𝐌>𝟎,𝐎=𝟏)\mathbf{(M>0,O=1)}. In this case, the number of times the selling price changes throughout the course of the auction i.e., MM, is positive. Now, let us first derive the conditional density of the standing prices {Xi}i=1M\{X_{i}\}_{i=1}^{M} given {Δi}i=0M1\{\Delta_{i}\}_{i=0}^{M-1}, MM, O=1O=1, and N=nN=n.

  • Since, Δ0\Delta_{0} is the number of bidders until the first time that the standing price changes to a higher value than the reserve price rr, it means that there are (Δ02)(\Delta_{0}-2) many bids that are less than rr, and only two bids are higher than rr with X1X_{1} being the second highest bid. Also, the first bid which is higher than rr can occur at (Δ01)(\Delta_{0}-1) many places.

  • For X2X_{2} to be the next standing price After X1X_{1} being the current second highest bid, the next (Δ11)(\Delta_{1}-1) bids must be less than X1X_{1} and the (Δ0+Δ1)th(\Delta_{0}+\Delta_{1})^{\text{th}} bid should be higher than X1X_{1}.

  • Continuing on like this, we should have the last (ni=0M1Δi)(n-\sum_{i=0}^{M-1}\Delta_{i}) many bids less than XMX_{M} after XMX_{M} becomes the standing price (and the second highest bid of the entire auction) with the (unobserved) highest bid RR occurring somewhere before.

It follows that the conditional density of {Xi}i=1M\{X_{i}\}_{i=1}^{M} given {Δi}i=0M1\{\Delta_{i}\}_{i=0}^{M-1}, MM, O=1O=1, and N=nN=n is given by

=\displaystyle={} (Δ01)F(r)Δ02F(X1)Δ11F(X2)Δ21××F(XM)ni=0M1Δi\displaystyle(\Delta_{0}-1)F(r)^{\Delta_{0}-2}F(X_{1})^{\Delta_{1}-1}F(X_{2})^{\Delta_{2}-1}\times\ldots\times F(X_{M})^{n-\sum_{i=0}^{M-1}\Delta_{i}}
×(1F(XM))i=1Mf(Xi)\displaystyle\times\big{(}1-F(X_{M})\big{)}\prod_{i=1}^{M}f(X_{i})
=\displaystyle={} (Δ01)F(r)(1F(XM))F(XM)ni=0M1Δii=1Mf(Xi)i=0M1F(Xi)Δi1,\displaystyle\dfrac{(\Delta_{0}-1)}{F(r)}\ \big{(}1-F(X_{M})\big{)}F(X_{M})^{n-\sum_{i=0}^{M-1}\Delta_{i}}\prod_{i=1}^{M}f(X_{i})\prod_{i=0}^{M-1}F(X_{i})^{\Delta_{i}-1}, (A.1)

where X0=rX_{0}=r. Note that the above holds only if M(n1)M\leq(n-1), Δ02\Delta_{0}\geq 2, Δ1,Δ2,,ΔM11\Delta_{1},\Delta_{2},\ldots,\Delta_{M-1}\geq 1, and i=0M1Δin\sum_{i=0}^{M-1}\Delta_{i}\leq n.

For a collection of i.i.d. random variables Y1,Y2,,YnY_{1},Y_{2},\cdots,Y_{n}, the distribution of number of changes in the running second maximum, and location of these changes in the index set {1,2,,n}\{1,2,\cdots,n\} is invariant under any strictly monotone transformation on the YiY_{i}s. If FF is absolutely continuous, then F1F^{-1} exists and is strictly increasing. Note {F1(Yi)}i=1n\{F^{-1}(Y_{i})\}_{i=1}^{n} is a collection of i.i.d. Uniform[0,1][0,1] random variables. Applying the above conclusions to our context with YiY_{i} being the valuation of the ithi^{th} bidder accessing the auction, it follows that the distribution of {Δi}i=0M1\{\Delta_{i}\}_{i=0}^{M-1}, MM, OO given N=nN=n does not depend on FF. Using (A.1), it follows that the joint density of MM, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Δi}i=0M1\{\Delta_{i}\}_{i=0}^{M-1}, OO at values mm (with m>0m>0), o=1o=1, {xi}i=1m\{x_{i}\}_{i=1}^{m}, {δi}i=0m1\{\delta_{i}\}_{i=0}^{m-1} given N=nN=n is equal to

C1(δ01)F(r)(1F(xm))F(xm)ni=0m1δii=1mf(xi)i=0m1F(xi)δi1,\displaystyle C_{1}\dfrac{(\delta_{0}-1)}{F(r)}\ \big{(}1-F(x_{m})\big{)}F(x_{m})^{n-\sum_{i=0}^{m-1}\delta_{i}}\prod_{i=1}^{m}f(x_{i})\prod_{i=0}^{m-1}F(x_{i})^{\delta_{i}-1},

assuming that the arguments satisfy the constraints m(n1)m\leq(n-1), δ02\delta_{0}\geq 2, δ1,δ2,,δm11\delta_{1},\delta_{2},\ldots,\delta_{m-1}\geq 1, and i=0m1δin\sum_{i=0}^{m-1}\delta_{i}\leq n (otherwise the value of the joint density is 0). Here the term C1C_{1} is independent of FF.

Since bidders are assumed to arrive at the auction via a Poisson process with rate λ\lambda, it follows that the number of potential bidders NN in any auction follows a Poisson(λτ)(\lambda\tau) distribution. Also, conditional on N=nN=n, note that {Si}i=1n\{S_{i}\}_{i=1}^{n} are i.i.d. exponential random variables with rate λ\lambda. Hence, the joint density of the partial sum (S1,S1+S2,,i=1nSi)\bigg{(}S_{1},S_{1}+S_{2},\ldots,\sum_{i=1}^{n}S_{i}\bigg{)} given N=nN=n is

n!τn,whereSi0iandi=1nSiτ.\dfrac{n!}{\tau^{n}},\ \text{where}\ S_{i}\geq 0\ \forall\ i\ \text{and}\ \sum_{i=1}^{n}S_{i}\leq\tau. (A.2)

It follows that

(S1,S1+S2,,i=1nSi)d=(U(1),U(2),,U(n))\bigg{(}S_{1},S_{1}+S_{2},\ldots,\sum_{i=1}^{n}S_{i}\bigg{)}\ \underset{=}{d}\ \big{(}U_{(1)},U_{(2)},\ldots,U_{(n)}\big{)} (A.3)

given N=nN=n, where {Ui}i=1n\{U_{i}\}_{i=1}^{n} are i.i.d. Uniform[0,τ][0,\tau], and (U(1),U(2),,U(n))(U_{(1)},U_{(2)},\ldots,U_{(n)}) are the corresponding order statistics.

Since TiT_{i} denotes the intermediate time between the ithi^{th} and (i+1)th(i+1)^{th} changes in the standing price for 0iM10\leq i\leq M-1, it can be easily seen that

T0=i=1Δ0Si,T1=i=Δ0+1Δ0+Δ1Si,T2=i=Δ0+Δ1+1Δ0+Δ1+Δ2Si,,TM1=i=Δ0+Δ1++ΔM2+1Δ0+Δ1++ΔM1Si,T_{0}=\sum_{i=1}^{\Delta_{0}}S_{i},\ T_{1}=\sum_{i=\Delta_{0}+1}^{\Delta_{0}+\Delta_{1}}S_{i},\ T_{2}=\sum_{i=\Delta_{0}+\Delta_{1}+1}^{\Delta_{0}+\Delta_{1}+\Delta_{2}}S_{i},\ldots,\ T_{M-1}=\sum_{i=\Delta_{0}+\Delta_{1}+\ldots+\Delta_{M-2}+1}^{\Delta_{0}+\Delta_{1}+\ldots+\Delta_{M-1}}S_{i},

and TM=τi=0M1TiT_{M}=\tau-\sum_{i=0}^{M-1}T_{i}. Since {Si}i=1n\{S_{i}\}_{i=1}^{n} and (M,{Xi}i=1M,O,{Δi}i=0M1)(M,\{X_{i}\}_{i=1}^{M},O,\{\Delta_{i}\}_{i=0}^{M-1}) are independent given N=nN=n. it follows that

(T0,T0+T1,,i=0M1Ti)Td=(U(J0),U(J1),,U(JM1))T,\bigg{(}T_{0},T_{0}+T_{1},\ldots,\sum_{i=0}^{M-1}T_{i}\bigg{)}^{T}\ \underset{=}{d}\ \big{(}U_{(J_{0})},U_{(J_{1})},\ldots,U_{(J_{M-1})}\big{)}^{T}, (A.4)

given N=nN=n, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Δi}i=0M1\{\Delta_{i}\}_{i=0}^{M-1}, MM and OO. Here Jk=i=0kΔiJ_{k}=\sum_{i=0}^{k}\Delta_{i} for k=0,1,,M1k=0,1,\ldots,M-1.

From (A.2) and (A.3), joint density of (U(J0),U(J1),,U(JM1))T(U_{(J_{0})},U_{(J_{1})},\ldots,U_{(J_{M-1})})^{T} given N=nN=n, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Δi}i=0M1\{\Delta_{i}\}_{i=0}^{M-1}, MM and OO is equal to

f(U(J0),,U(JM1))(u0,,uM1)=(τuM1)ni=0M1ΔiB(Δ)τni=0M1(uiui1)Δi1,f_{(U_{(J_{0})},\ldots,U_{(J_{M-1})})}(u_{0},\ldots,u_{M-1})=\dfrac{(\tau-u_{M-1})^{n-\sum_{i=0}^{M-1}\Delta_{i}}}{B(\Delta)\tau^{n}}\prod_{i=0}^{M-1}(u_{i}-u_{i-1})^{\Delta_{i}-1}, (A.5)

where u1=0u_{-1}=0, and

B(Δ)=(ni=0M1Δi)!i=0M1(Δi1)!n!.B(\Delta)=\dfrac{\big{(}n-\sum_{i=0}^{M-1}\Delta_{i}\big{)}!\prod_{i=0}^{M-1}(\Delta_{i}-1)!}{n!}.

From (A.4) and (A.5), it follows that the conditional density of (T0,T0+T1,,i=0M1Ti)(T_{0},T_{0}+T_{1},\ldots,\sum_{i=0}^{M-1}T_{i}) given N=nN=n, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Δi}i=0M1\{\Delta_{i}\}_{i=0}^{M-1}, MM, and OO is equal to

(TM)ni=0M1ΔiB(Δ)τni=0M1TiΔi1,\dfrac{(T_{M})^{n-\sum_{i=0}^{M-1}\Delta_{i}}}{B({\Delta})\tau^{n}}\prod_{i=0}^{M-1}T_{i}^{\Delta_{i}-1}, (A.6)

where B(Δ)B({\Delta}) is as defined above.

Since the Jacobian of the transformation from (T0,T0+T1,,i=0M1Ti)T(T_{0},T_{0}+T_{1},\ldots,\sum_{i=0}^{M-1}T_{i})^{T} to (T0,T1,,TM1)T(T_{0},T_{1},\ldots,T_{M-1})^{T} is 11, combining (A.1) and (A.6) it follows that the joint density of MM, {Ti}i=0M1\{T_{i}\}_{i=0}^{M-1}, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Δi}i=0M1\{\Delta_{i}\}_{i=0}^{M-1}, OO at values mm (with m>0m>0), {ti}i=0m1\{t_{i}\}_{i=0}^{m-1}, {xi}i=1m\{x_{i}\}_{i=1}^{m}, {δi}i=0m1\{\delta_{i}\}_{i=0}^{m-1}, o=1o=1 given N=nN=n is equal to

C1(δ01)B(δ)τnF(r)(1F(xm))(F(xm)tm)ni=0m1δii=0m1tiδi1i=1mf(xi)i=0m1F(xi)δi1\displaystyle\frac{C_{1}(\delta_{0}-1)}{B(\delta)\tau^{n}F(r)}\big{(}1-F(x_{m})\big{)}\big{(}F(x_{m})t_{m}\big{)}^{n-\sum_{i=0}^{m-1}\delta_{i}}\prod_{i=0}^{m-1}t_{i}^{\delta_{i}-1}\prod_{i=1}^{m}f(x_{i})\prod_{i=0}^{m-1}F(x_{i})^{\delta_{i}-1}
=\displaystyle={} C1(δ01)(1F(xm))B(δ)τnF(r)(F(xm)tm)ni=0m1δii=0m1(F(xi)ti)δi1i=1mf(xi)\displaystyle\frac{C_{1}(\delta_{0}-1)\big{(}1-F(x_{m})\big{)}}{B({\delta})\tau^{n}F(r)}\big{(}F(x_{m})t_{m}\big{)}^{n-\sum_{i=0}^{m-1}\delta_{i}}\prod_{i=0}^{m-1}\big{(}F(x_{i})t_{i}\big{)}^{\delta_{i}-1}\prod_{i=1}^{m}f(x_{i})
=\displaystyle={} C1n!(1F(xm))(F(xm)tm)ni=0m1δi(δ01)(F(r)t0)δ01τnF(r)(ni=0m1δi)!(δ01)!i=1m1(F(xi)ti)δi1(δi1)!i=1mf(xi)\displaystyle\frac{C_{1}n!\big{(}1-F(x_{m})\big{)}\big{(}F(x_{m})t_{m}\big{)}^{n-\sum_{i=0}^{m-1}\delta_{i}}(\delta_{0}-1)\big{(}F(r)t_{0}\big{)}^{\delta_{0}-1}}{\tau^{n}F(r)\big{(}n-\sum_{i=0}^{m-1}\delta_{i}\big{)}!(\delta_{0}-1)!}\prod_{i=1}^{m-1}\frac{\big{(}F(x_{i})t_{i}\big{)}^{\delta_{i}-1}}{(\delta_{i}-1)!}\prod_{i=1}^{m}f(x_{i})
=\displaystyle={} C1n!t0(1F(xm))τn(F(xm)tm)ni=0m1δi(ni=0m1δi)!(F(r)t0)δ02(δ02)!i=1m1(F(xi)ti)δi1(δi1)!i=1mf(xi),\displaystyle\frac{C_{1}n!t_{0}\big{(}1-F(x_{m})\big{)}}{\tau^{n}}\frac{\big{(}F(x_{m})t_{m}\big{)}^{n-\sum_{i=0}^{m-1}\delta_{i}}}{\big{(}n-\sum_{i=0}^{m-1}\delta_{i}\big{)}!}\frac{\big{(}F(r)t_{0}\big{)}^{\delta_{0}-2}}{(\delta_{0}-2)!}\prod_{i=1}^{m-1}\frac{\big{(}F(x_{i})t_{i}\big{)}^{\delta_{i}-1}}{(\delta_{i}-1)!}\prod_{i=1}^{m}f(x_{i}), (A.7)

where x0=rx_{0}=r, and the arguments satisfy the constraints assuming that the arguments satisfy the constraints m(n1)m\leq(n-1), δ02\delta_{0}\geq 2, δ1,δ2,,δm11\delta_{1},\delta_{2},\ldots,\delta_{m-1}\geq 1, i=0m1δin\sum_{i=0}^{m-1}\delta_{i}\leq n, and i=0m1tiτ\sum_{i=0}^{m-1}t_{i}\leq\tau (otherwise the value of the joint density is 0).

Now, summing over δi\delta_{i}’s in (A.7) such that, δ02\delta_{0}\geq 2, δ1,δ2,,δm11\delta_{1},\delta_{2},\ldots,\delta_{m-1}\geq 1, i=0m1δin\sum_{i=0}^{m-1}\delta_{i}\leq n; the joint density of MM, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Ti}i=0M1\{T_{i}\}_{i=0}^{M-1}, and OO at values mm (with m>0m>0), {ti}i=0m1\{t_{i}\}_{i=0}^{m-1}, {xi}i=1m\{x_{i}\}_{i=1}^{m}, o=1o=1 given N=nN=n is equal to

C1n!t0(1F(xm))τn(nm1)!(i=0mF(xi)ti)nm1i=1mf(xi),\dfrac{C_{1}n!t_{0}\big{(}1-F(x_{m})\big{)}}{\tau^{n}(n-m-1)!}\bigg{(}\sum_{i=0}^{m}F(x_{i})t_{i}\bigg{)}^{n-m-1}\prod_{i=1}^{m}f(x_{i}), (A.8)

where x0=rx_{0}=r and the arguments satisfy the constraints m(n1)m\leq(n-1) and tm=τi=0m1ti0t_{m}=\tau-\sum_{i=0}^{m-1}t_{i}\geq 0. Moreover, since NPoisson(λτ)N\sim\text{Poisson}(\lambda\tau), we have

P(N=n)=exp(λτ)(λτ)nn!.P(N=n)=\exp(-\lambda\tau)\dfrac{(\lambda\tau)^{n}}{n!}. (A.9)

Combining (A.8) and (A.9), we get the joint density of MM, {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Ti}i=0M1\{T_{i}\}_{i=0}^{M-1}, OO, and NN at values mm (with m>0m>0), {ti}i=0m1\{t_{i}\}_{i=0}^{m-1}, {xi}i=1m\{x_{i}\}_{i=1}^{m}, o=1o=1, and nn is equal to

C1exp(λτ)λnt0(1F(xm))(nm1)!(i=0mF(xi)ti)nm1i=1mf(xi),C_{1}\exp(-\lambda\tau)\dfrac{\lambda^{n}t_{0}\big{(}1-F(x_{m})\big{)}}{(n-m-1)!}\bigg{(}\sum_{i=0}^{m}F(x_{i})t_{i}\bigg{)}^{n-m-1}\prod_{i=1}^{m}f(x_{i}), (A.10)

where x0=rx_{0}=r and the arguments satisfy the constraints m(n1)m\leq(n-1) and tm=τi=0m1ti0t_{m}=\tau-\sum_{i=0}^{m-1}t_{i}\geq 0. Finally, summing over nn in (A.10) such that n(m+1)n\geq(m+1), we get the joint density of {Xi}i=1M\{X_{i}\}_{i=1}^{M}, {Ti}i=0M1\{T_{i}\}_{i=0}^{M-1}, MM, and OO at values mm (with m>0m>0), {ti}i=0m1\{t_{i}\}_{i=0}^{m-1}, {xi}i=1m\{x_{i}\}_{i=1}^{m}, and o=1o=1 is equal to

C1exp(λτ)λm+1T0(1F(xm))exp(λi=0mF(xi)ti)i=1mf(xi)\displaystyle C_{1}\exp(-\lambda\tau)\lambda^{m+1}T_{0}\big{(}1-F(x_{m})\big{)}\exp\bigg{(}\lambda\sum_{i=0}^{m}F(x_{i})t_{i}\bigg{)}\prod_{i=1}^{m}f(x_{i})
=\displaystyle={} C1exp(λτ)(λm+1t0(1F(xm)))exp(λi=0MF(xi)ti)\displaystyle C_{1}\exp(-\lambda\tau)\Big{(}\lambda^{m+1}t_{0}\big{(}1-F(x_{m})\big{)}\Big{)}\exp\bigg{(}\lambda\sum_{i=0}^{M}F(x_{i})t_{i}\bigg{)}
×(i=1Mf(xi)),\displaystyle\times\bigg{(}\prod_{i=1}^{M}f(x_{i})\bigg{)}, (A.11)

where x0=rx_{0}=r, tm=τi=0m1ti0t_{m}=\tau-\sum_{i=0}^{m-1}t_{i}\geq 0.

Case II: When the item is sold at the reserve price (𝐌=𝟎,𝐎=𝟏)\mathbf{(M=0,O=1)}. In this case, the only bid which is higher than the reserve price remains unobserved and the value of M=0M=0. Moreover, we have X0=r=XMX_{0}=r=X_{M}, T0=τ=TMT_{0}=\tau=T_{M}, Δ0=N\Delta_{0}=N and N1N\geq 1. Since the probability that M=0,O=1M=0,O=1 given N=nN=n equals

n(F(r))n1(1F(r)).n\big{(}F(r)\big{)}^{n-1}\big{(}1-F(r)\big{)}. (A.12)

for n1n\geq 1, it follows using (A.9) and (A.12) that the joint density of MM, X0X_{0}, T0T_{0}, OO, and NN at values 0, rr, τ\tau, 11 and nn is equal to

n(F(r))n1(1F(r))exp(λτ)(λτ)nn!\displaystyle n\big{(}F(r)\big{)}^{n-1}\big{(}1-F(r)\big{)}\exp(-\lambda\tau)\dfrac{(\lambda\tau)^{n}}{n!}
=\displaystyle={} λτ(1F(r))exp(λτ)(λτF(r))n1(n1)!.\displaystyle\lambda\tau\big{(}1-F(r)\big{)}\exp(-\lambda\tau)\dfrac{\big{(}\lambda\tau F(r)\big{)}^{n-1}}{(n-1)!}. (A.13)

Summing over nn in (A.13) for n1n\geq 1, we get the joint density of MM, X0X_{0}, T0T_{0}, and OO at values 0, rr, τ\tau and 11 equals

exp(λτ)λτ(1F(r))exp(λτF(r))\displaystyle\exp(-\lambda\tau)\lambda\tau\big{(}1-F(r)\big{)}\exp\big{(}\lambda\tau F(r)\big{)} (A.14)

Case III: When the item is not sold (𝐌=𝟎,𝐎=𝟎)\mathbf{(M=0,O=0)}. This situation can occur if either all the bids are less than the reserve price or no bidding happened at all. In any case M=0M=0. Additionally, we have X0=r=XMX_{0}=r=X_{M}, T0=τ=TMT_{0}=\tau=T_{M}, Δ0=N\Delta_{0}=N and N0N\geq 0. Since the probability that M=0,O=0M=0,O=0 given N=nN=n equals

(F(r))n.\big{(}F(r)\big{)}^{n}. (A.15)

for n0n\geq 0, it follows using (A.9) and (A.15) that the joint density of MM, X0X_{0}, T0T_{0}, OO and NN at values 0, rr, τ\tau, 0 and nn is equal to

(F(r))nexp(λτ)(λτ)nn!\displaystyle\big{(}F(r)\big{)}^{n}\exp(-\lambda\tau)\dfrac{(\lambda\tau)^{n}}{n!}
=\displaystyle={} exp(λτ)(λτF(r))nn!.\displaystyle\exp(-\lambda\tau)\dfrac{\big{(}\lambda\tau F(r)\big{)}^{n}}{n!}. (A.16)

Summing over nn in (A.16) such that n0n\geq 0, we get the joint density of MM, X0X_{0}, T0T_{0}, and OO at values 0, rr, τ\tau and 0 equals

exp(λτ)exp(λτF(r))\displaystyle\exp(-\lambda\tau)\exp\big{(}\lambda\tau F(r)\big{)} (A.17)

Finally, the expressions in (A.11), (A.14), and (A.17) altogether conclude the proof of Lemma 2.1. ∎

Appendix B Proof of Lemma 2.2

For every 1l1\leq l\leq\ell, we define F~(x¯l):=F(x¯l)\widetilde{F}(\bar{x}_{l}):=F(\bar{x}_{l}). In other words, F~(zul):=F(zul)\widetilde{F}(z_{u_{l}}):=F(z_{u_{l}}) for every 1l1\leq l\leq\ell. Also, let F~(z0)=F(z0)=0\widetilde{F}(z_{0})=F(z_{0})=0 (with z0:=0z_{0}:=0 and u0:=0u_{0}:=0). Fix 1l1\leq l\leq\ell arbitrarily. We now define F~\widetilde{F} on (zul1,zul)(z_{u_{l-1}},z_{u_{l}}). Note that any element of 𝐳{\bf z} in this open interval has to be a reserve price for one of the auctions in the dataset. First,

F~(x):=F(zul1) for zul1<x<zul1+1.\widetilde{F}(x):=F(z_{u_{l-1}})\;\mbox{ for }z_{u_{l-1}}<x<z_{{u_{l-1}}+1}.

If ul1+1=ulu_{l-1}+1=u_{l}, the defining task is accomplished. Otherwise, for every ii such that ul1+1iul1u_{l-1}+1\leq i\leq u_{l}-1, we define

F~(x)=F(zi) for zix<zi+1.\widetilde{F}(x)=F(z_{i})\;\mbox{ for }z_{i}\leq x<z_{i+1}.

Hence, F~\widetilde{F} has now been defined on [0,zu][0,z_{u_{\ell}}].

We now consider two scenarios. If u=+Ku_{\ell}=\ell+K, then define F~(x)=F(x)\widetilde{F}(x)=F(x) for x>zux>z_{u_{\ell}}. It follows from the above construction that F~𝐳\widetilde{F}\in\mathcal{F}_{\bf z}. For every 1l1\leq l\leq\ell, note that

F~(x¯l)=F~(zul)=F(zul1)F(zul).\widetilde{F}(\bar{x}_{l}-)=\widetilde{F}(z_{u_{l}}-)=F(z_{u_{l}}-1)\leq F(z_{u_{l}}-).

Since F~(zul)=F(zul)\widetilde{F}(z_{u_{l}})=F(z_{u_{l}}), it follows that

F~(zul)F~(zul)F(zul)F(zul),\widetilde{F}(z_{u_{l}})-\widetilde{F}(z_{u_{l}}-)\geq F(z_{u_{l}})-F(z_{u_{l}}-),

or equivalently

F~(x¯l)F~(x¯l)F(x¯l)F(x¯l).\widetilde{F}(\bar{x}_{l})-\widetilde{F}(\bar{x}_{l}-)\geq F(\bar{x}_{l})-F(\bar{x}_{l}-).

Since F~\widetilde{F} and FF match on all elements of 𝐳{\bf z} by the above construction, we also have F~(rk)=F(rk)\widetilde{F}(r_{k})=F(r_{k}) for every 1kK1\leq k\leq K. It follows by Eq. (2.4) in the main paper that LikPA(F)LikPA(F~)Lik_{PA}(F)\leq Lik_{PA}(\widetilde{F}).

On the other hand, if u<+Ku_{\ell}<\ell+K, we define

F~(x)=F(zu) for zu<x<zu+1\widetilde{F}(x)=F(z_{u_{\ell}})\mbox{ for }z_{u_{\ell}}<x<z_{u_{\ell}+1}

and

F~(x)=1 for zu+1x.\widetilde{F}(x)=1\mbox{ for }z_{u_{\ell}+1}\leq x.

Hence, F~\widetilde{F} and FF match on all elements of {zi}i=1u\{z_{i}\}_{i=1}^{u_{\ell}}, and F~\widetilde{F} dominates FF on all elements of {zi}i=u+1+K\{z_{i}\}_{i=u_{\ell}+1}^{\ell+K}. By the exact same arguments as in the first scenario, it follows that F~(zul)F~(zul)F(zul)F(zul)\widetilde{F}(z_{u_{l}})-\widetilde{F}(z_{u_{l}}-)\geq F(z_{u_{l}})-F(z_{u_{l}}-) for every 1l1\leq l\leq\ell. It again follows by Eq. (2.4) in the main paper that LikPA(F)LikPA(F~)Lik_{PA}(F)\leq Lik_{PA}(\widetilde{F}).

The above analysis assumes that >0\ell>0. If =0\ell=0, then the vector 𝐱¯{\bf\bar{x}} is empty. It follows from Eq. (2.4) in the main paper that LikPA(F)Lik_{PA}(F) depends on FF only through {F(rk)}k=1K\{F(r_{k})\}_{k=1}^{K}, and is non-decreasing in each of these KK elements. In this case, let F~\widetilde{F} denote the CDF corresponding to the distribution which puts a point mass at zero. Then, F~𝐳\widetilde{F}\in\mathcal{F}_{\bf z} and LikPA(F)LikPA(F~)Lik_{PA}(F)\leq Lik_{PA}(\widetilde{F}). \Box

Appendix C An approach for estimating median bias for F^c,MLE\widehat{F}_{c,MLE} and F^init\widehat{F}_{init}

In all of our experiments and illustrations, the HulC approach in Kuchibhotla, Balakrishnan and Wasserman (2021) is used to obtain 90% confidence bands for the estimators F^c,MLE\widehat{F}_{c,MLE} and F^init\widehat{F}_{init}. This approach, however, assumes median unbiasedness of the underlying estimator. Since F^init\widehat{F}_{init} and F^c,MLE\widehat{F}_{c,MLE} are not median unbiased, estimates of their respective median biases, denoted by

BMLE:=supx|Median(F^c,MLE(x))F0(x)| and Binit:=supx|Median(F^init(x))F0(x)|B_{MLE}:=\sup_{x\in\mathbb{R}}|\text{Median}(\widehat{F}_{c,MLE}(x))-F_{0}(x)|\mbox{ and }B_{init}:=\sup_{x\in\mathbb{R}}|\text{Median}(\widehat{F}_{init}(x))-F_{0}(x)|

are needed for an accurate application of the HulC method. Here F0F_{0} denotes the true population valuation distribution for the product under consideration (assumed to be absolutely continuous).

To obtain approximations for BinitB_{init} and BMLEB_{MLE}, consider a scenario where we observe {{F0(xi,k)}i=1mk}k=1K\{\{F_{0}(x_{i,k})\}_{i=1}^{m_{k}}\}_{k=1}^{K} instead of {{xi,k}i=1mk}k=1K\{\{x_{i,k}\}_{i=1}^{m_{k}}\}_{k=1}^{K} (with the corresponding population valuation distribution now Uniform(0,1)(0,1)). Let H^init\widehat{H}_{init}, H^(0)\widehat{H}_{(0)}, H^MLE\widehat{H}_{MLE} and H^c,MLE\widehat{H}_{c,MLE}, denote the corresponding initial estimator, discrete/step initial estimator, unconstrained MLE and the constrained MLE obtained by applying our approach to the transformed data. Since F0F_{0} is a strictly monotone transformation, the relative ordering of the bid values is left intact. Hence, MkM_{k}, the number of selling price changes throughout the course of the kthk^{th} auction, is left unchanged for the transformed data. It follows that the procedure described after Equation (2.2) in the main paper produces the same estimate λ^\widehat{\lambda} for the original and transformed data. Since Xi,kxX_{i,k}\leq x if and only if F0(Xi,k)F0(x)F_{0}(X_{i,k})\leq F_{0}(x), it follows that H^SP(y)=F^SP(F01(y))\widehat{H}_{SP}(y)=\widehat{F}_{SP}(F_{0}^{-1}(y)) and H^FP(y)=F^FP(F01(y))\widehat{H}_{FP}(y)=\widehat{F}_{FP}(F_{0}^{-1}(y)) for every y[0,1]y\in[0,1]. Here H^SP\widehat{H}_{SP} and H^FP\widehat{H}_{FP} respectively denote the final selling price and first non-reserve standing price based initial estimates for the transformed data. Based on the procedure described in Step III of Section 3.2 in the main paper, it follows that

H^(0)(y)={F^FP(F01(y))ifyF0(c)F^SP(F01(y))ify>F0(p1)F^FP(c)+(F^SP(p1)F^FP(c)F0(p1)F0(c))(yF0(c))ifF0(c)<yF0(p1).\widehat{H}_{(0)}(y)=\begin{cases}\widehat{F}_{FP}(F_{0}^{-1}(y))&\text{if}\ \ y\leq F_{0}(c)\\ \widehat{F}_{SP}(F_{0}^{-1}(y))&\text{if}\ \ y>F_{0}(p_{1})\\ \widehat{F}_{FP}(c)+\Big{(}\frac{\widehat{F}_{SP}(p_{1})-\widehat{F}_{FP}(c)}{F_{0}(p_{1})-F_{0}(c)}\Big{)}(y-F_{0}(c))&\text{if}\ \ F_{0}(c)<y\leq F_{0}(p_{1}).\end{cases}

whereas

F^(0)(F01(y))={F^FP(F01(y))ifyF0(c)F^SP(x)ify>F0(p1)F^FP(c)+(F^SP(p1)F^FP(c)p1c)(F01(y)c)ifF0(c)<yF0(p1).\widehat{F}_{(0)}(F_{0}^{-1}(y))=\begin{cases}\widehat{F}_{FP}(F_{0}^{-1}(y))&\text{if}\ \ y\leq F_{0}(c)\\ \widehat{F}_{SP}(x)&\text{if}\ \ y>F_{0}(p_{1})\\ \widehat{F}_{FP}(c)+\Big{(}\frac{\widehat{F}_{SP}(p_{1})-\widehat{F}_{FP}(c)}{p_{1}-c}\Big{)}(F_{0}^{-1}(y)-c)&\text{if}\ \ F_{0}(c)<y\leq F_{0}(p_{1}).\end{cases}

It is clear that H^(0)(y)\widehat{H}_{(0)}(y) and F^(0)(F01(y))\widehat{F}_{(0)}(F_{0}^{-1}(y)) only differ in the interval (F0(c),F0(p1)](F_{0}(c),F_{0}(p_{1})]. The difference of values in this interval arises due to the different nature of interpolation used in the two functions (linear in yy vs. linear in F01(y)F_{0}^{-1}(y)). If the above interval length is reasonably small and the derivative of F0F_{0} is relatively well-behaved in this interval, then H^(0)()\widehat{H}_{(0)}(\cdot) and F^(0)(F01())\widehat{F}_{(0)}(F_{0}^{-1}(\cdot)) should be reasonably close. Since F^init\widehat{F}_{init} and H^init\widehat{H}_{init} are continuous versions (via linear interpolation) of F^(0)\widehat{F}_{(0)} and H^(0)\widehat{H}_{(0)} respectively, the arguments above lead us to the approximation

Binit=supy(0,1)|Median(F^init(F01(y)))y|supy(0,1)|Median(H^init(y))y|.B_{init}=\sup_{y\in(0,1)}|\text{Median}(\widehat{F}_{init}(F_{0}^{-1}(y)))-y|\approx\sup_{y\in(0,1)}|\text{Median}(\widehat{H}_{init}(y))-y|. (C.1)

Since the underlying bids for the transformed data are uniformly distributed (note that XF0X\sim F_{0} implies F0(X)Uniform[0,1]F_{0}(X)\sim\mbox{Uniform}[0,1] for absolutely continuous F0F_{0}), the rightmost expression in (C.1) can be estimated using Monte Carlo.

We now focus on the MLE. Again, given that the transformed data and the original data share the same relative ordering of the bid values and the arguments above, it follows that the profile likelihood LikPALik_{PA} for the transformed data is exactly same as the original data (see Equation (2.8) in the main paper), with only one difference. The variable θi\theta_{i} for the transformed data is defined now as (1H(F0(zi)))/(1H(F0(zi1)))(1-H(F_{0}(z_{i})))/(1-H(F_{0}(z_{i-1}))), as opposed to (1F(zi))/(1F(zi1))(1-F(z_{i}))/(1-F(z_{i-1})). It follows that

H^MLE(F0(zi))=F^MLE(zi)1i+K.\widehat{H}_{MLE}(F_{0}(z_{i}))=\widehat{F}_{MLE}(z_{i})\;\forall 1\leq i\leq\ell+K.

Since the constrained MLE (at the data points) is obtained by constraining the values for the u1u_{1} indices to be equal to the initial estimator, and linear interpolation is used to get values of the constrained MLE at non-data points, similar considerations as above lead us to the approximation

BMLE=supy(0,1)|Median(F^c,MLE(F01(y)))y|supy(0,1)|Median(H^c,MLE(y))y|.B_{MLE}=\sup_{y\in(0,1)}|\text{Median}(\widehat{F}_{c,MLE}(F_{0}^{-1}(y)))-y|\approx\sup_{y\in(0,1)}|\text{Median}(\widehat{H}_{c,MLE}(y))-y|. (C.2)

Again, since the underlying bids for the transformed data are uniformly distributed, the rightmost expression in (C.2) can be estimated using Monte Carlo.

We performed simulation studies with the number of auctions K{100,500}K\in\{100,500\}, and various choices of the true valuation distribution such as Beta, Gamma and Uniform. The above approximation to the median bias works generally well in most settings. Even when this approximate is not very accurate, the approximation error is O(1/K)O(1/K) and not significant enough to make a perceptible difference in the resulting confidence curves.