A non-parametric approach for estimating consumer valuation distributions using second price auctions
Abstract
We focus on online second price auctions, where bids are made sequentially, and the winning bidder pays the maximum of the second-highest bid and a seller specified reserve price. For many such auctions, the seller does not see all the bids or the total number of bidders accessing the auction, and only observes the current selling prices throughout the course of the auction. We develop a novel non-parametric approach to estimate the underlying consumer valuation distribution based on this data. Previous non-parametric approaches in the literature only use the final selling price and assume knowledge of the total number of bidders. The resulting estimate, in particular, can be used by the seller to compute the optimal profit-maximizing price for the product. Our approach is free of tuning parameters, and we demonstrate its computational and statistical efficiency in a variety of simulation settings, and also on an Xbox 7-day auction dataset on eBay.
keywords:
, , and
1 Introduction
In a second price auction with reserve price, the product on sale is awarded to the highest bidder if the corresponding bid is higher than a seller-specified reserve price . The price paid by the winner is however, the maximum of the reserve price and the second highest bid. These auctions have been the industry standard for a long time, and are attractive to sellers as they induce the bidders to bid their true “private value” for the product, i.e., the maximum price they wish to pay for it. While some platforms have recently moved to first-price auctions, the second-price auction is still widely used on E-commerce platforms such as eBay, Rokt and online ad exchanges such as Xandr. The analysis of data obtained from these auctions presents unique challenges. For a clear understanding of these challenges, we first discuss in detail the auction framework, the observed data and the quantity of interest that we want to estimate/extract.
Auction framework: We consider an auction setting where a single product is on sale for a fixed time window . The seller sets the reserve price , which is used as the current selling price at time . Any bidder who arrives subsequently is allowed to place a bid only if his/her bid value is higher than current selling price at that time. If the bid is placed, the current selling price is updated to the second-highest bid value among the set of all placed bids up to that time, including this latest bid (the reserve price is also treated as a placed bid). 111Typically, a small increment (e.g. ) is also added to the second highest bid, but this insignificant increment is unlikely to influence bidder’s behaviour, and we ignore it in our analysis for ease of exposition. For example, suppose the current selling price at a given time is and the highest placed bid value up to that time is . If a bidder comes and bids , the bid will not be placed. If the bidder were to bid , the bid would be placed and the current selling price would be updated to (since now the second largest placed bid is ). If the bidder were to bid , the bid would be placed and the current selling price would be updated to . At the end of the auction period, if no bid above the reserve price is placed, the item goes unsold, otherwise it is sold to the highest bidder at the selling price at time . This final selling price is the second highest placed bid (including the reserve price) throughout the course of the auction.
Observed data: The observed data is the sequence of current selling price values (also sometimes referred to as the standing price) throughout the course of the auction, and the times at which there is a change in the selling/standing price. Typically, such data is available for multiple auctions of the same product. For example, in Section 5, we analyze data with current selling prices for different eBay -day auctions for Xbox. A key observation to make here is that consumers who access the auction but have bids which are less than the current selling price (standing price) are not allowed to place their bids. In other words, instead of observing the bids of all the customers who access the auction, we only observe the running second maximum of such bids.
Quantity of interest: Each bidder in the consumer population is assumed to have an independent private valuation (IPV) of the product. The IPV assumption in particular makes sense for products that are used for personal use/consumption (such as watches, jewelry, gaming equipment etc.) and is commonly used in the modeling of internet auctions (see Song (2004); Hou and Rego (2007); George and Hui (2012) and the references therein). Economic theory suggests that the dominant strategy for a bidder in a second-price auction is to bid one’s true valuation (Vickrey (1961)). The quantity that we want to estimate from the above data is the distribution of the valuations of the product under consideration for the consumer population. We refer to this as the consumer valuation distribution, and denote it by . As noted in George and Hui (2012), knowledge of provides the consumer demand curve for the product, and hence can be used by the seller to identify the profit-maximizing price (see the discussion in Section 4.1 from George and Hui (2012)).

An illustration of a second price auction: Figure 1 provides a concrete illustration of how a second price auction works. The data for this auction has been simulated from a setting where the bids follow a Pareto distribution with location parameter and dispersion parameter . The waiting times between bids are generated from an exponential distribution with rate parameter . In total, bids are generated in a time period of around minutes. The reserve price for the auction is . In Figure 1, the bid values and the current selling prices during the course of the auction are represented by the blue vertical lines and the black horizontal lines, respectively. The black dots on the black horizontal lines represents the time points (in minutes) of the bids. As can be seen from Figure 1, the initial selling price is equal to the reserve price (). We have the first bid of around at minutes. Since it’s higher than the reserve price , the reserve price still remains the current selling price at minutes. However, when the second bid of occurs at minutes ( minutes after the first bid’s occurrence), the current selling price jumps to as it’s the second highest value among the reserve price () and the two existing placed bids ( and ). We don’t observe any jumps in the current selling price for the next few bids as they all are less than the current selling price of (and hence are not placed). Then we see another jump at minutes, where a bid of which makes the current selling price jump to . The next few bids again happen to be less than the current selling price (). At around minutes, we see a last jump in the current selling price to (based on a new bid of which exceeds ). The subsequent bids are all less than , it remains the current selling price throughout the rest of the auction period. This can be observed through the flat horizontal black line at in the time period of minutes. The final selling price for the auction is therefore . The observed data for the above auction is the sequence of current selling prices given by and the sequence of times at which there was a change in the current selling price, given by .
The problem of demand-curve/valuation distribution estimation using auction data has been tackled in the last two decades for a variety of auction frameworks, see Song (2004); Park and Bradlow (2005); Chan, Kadiyali and Park (2007); Yao and Mela (2008); George and Hui (2012); Backus and Lewis (2020) and the references therein. Some of these papers assume a parametric form for , but as George and Hui (2012) argue, available auction data may often not be rich enough to verify the validity of the underlying parametric forms. George and Hui (2012) consider the second-price ascending bid auction framework for a single (homogeneous) product described above, and develop a Bayesian non-parametric approach to estimate using only the final selling prices in multiple auctions for a jewelry item. Note that the final selling price is not only the second highest placed bid in the auction period, it is also the second maximum order statistic of the (potential or placed) bids of all consumers who access the auction. Using only the final selling prices leads to identifiability problems, and George and Hui (2012) address this by assuming that the total number of consumers accessing the auction is also known. This information is available for the particular jewelry dataset from a third-party vendor, but is generally not available for most such datasets. When the total number of consumers accessing the auction is not known, the identifiability issue can be solved by using another order statistic (Song (2004)) such as the largest placed bid value (if available) along with the final selling price. (Backus and Lewis, 2020, Section 5) use such an approach for analyzing a dataset containing compact camera auctions on eBay.
To the best of our knowledge, none of the existing methods use the entire sequence of current selling price values to estimate the consumer valuation distribution . This was the primary motivation for our work, as only using the final selling price (and possibly the maximum placed bid, if available) leaves out a lot of available information. Including the current selling price information throughout the course of the auction however, involves significant conceptual and methodological challenges. As observed previously, the final selling price (second largest placed bid) is also the second largest (potential or placed) bid of all the consumers who accessed the auction. Hence, under relevant assumptions (see beginning of Section 2), the final selling price can be interpreted as the second largest order statistic of i.i.d. samples from . This interpretation is central to the methodology developed in George and Hui (2012). However, as the authors in George and Hui (2012) point out, the second largest current selling price throughout the course of the auction is not necessarily the third largest order statistic among all (placed or potential) bids, unless some severely restrictive and unrealisitc assumptions are imposed on the order in which bidders arrive in the auction. Hence, extending the methodology in George and Hui (2012) to include the entire sequence of current selling prices is not feasible; a completely new method of attack is needed.
In this paper, we fill this gap in the literature and develop novel methodology for non-parametric estimation of the consumer valuation distribution in second-price ascending bid auctions which uses the entire sequence of current selling prices. Additionally, the total number of consumers accessing the auction is not assumed to be known (unlike George and Hui (2012)), and the highest placed bid in the auction is also not assumed to be known (unlike Backus and Lewis (2020)). Incorporating the above novel features is very challenging, and the methodological development (provided in Section 2) is quite involved. The extensive simulation results in Section 4 demonstrate the significant accuracy gains in the estimation of that can be obtained by using the entire sequence of current selling prices as opposed to just the final selling price. We note that incorporating the current selling prices during the entire auction does come with a cost. In particular, two additional assumptions (compared to those in George and Hui (2012)) that need to be made about the rate of bidder arrival and number of bids made by a single consumer. We find some deviations from these two assumptions in the eBay -day Xbox data that we analyze in Section 5, but these deviations are minor (see discussion at beginning of Section 2). In such settings, it is reasonable to expect that the advantage of incorporating substantial additional data/information outweighs the cost of these approximations/deviations. We conclude the paper with a discussion of future directions in Section 6.
2 Methodology for learning the consumer valuation distribution
We start by discussing the main assumptions needed for the subsequent methodological development. We consider a setting where we have data from several independent, non-overlapping auctions for a single (homogeneous) product. As mentioned previously, we work within the IPV framework which is quite reasonable for items/products that are used for personal consumption. Similar to George and Hui (2012), we will assume that the collection of bidders who access the auction is an i.i.d. sample from the consumer population for the corresponding product, and that the collection of private product valuations of these bidders is an i.i.d. sample from the consumer valuation distribution . As stated in the introduction, it can be shown that the dominant strategy for a bidder in a second-price auction is to bid his/her true valuation. Under this strategy, any consumer who accesses the auction would bid his/her valuation with no need for multiple bidding, and we assume this behavior. We do note that in practice, some consumers do not follow this strategy. For example, eBay provides an option called proxy bidding or automatic bidding which allows the computer to automatically place multiple incremental bids below a cutoff price on behalf of the consumer (see also Ockenfels and Roth (2006); P. Bajari and A. Hortacsu (2003)). Since George and Hui (2012) only use the final selling price, they use a weaker assumption which allows for multiple bidding and stipulates that a consumer bids his/her true valuation sometime before the end of the auction.
Our final assumption is regarding the arrival mechanism of bidders in the auction. We assume that consumers arrive at/access the auction according to a Poisson process with constant rate . Again, a rational bidder (based on economic theory) in a second-price auction should be indifferent to the timing of his/her bid (see Milgrom (2004); Barbaro and Bracht (2021)), and this assumption makes sense in such settings. Again, we note that late-bidding (sniping) has been observed in some eBay auctions (see Bose and Daripa (2017); Barbaro and Bracht (2021)).
To summarize, our assumptions are identical to those in George and Hui (2012) with the exception of the single bidding-assumption and the constant rate of arrival assumption. While these assumptions are supported in general by economic theory, deviations from these two assumptions have been observed in some online auctions. However, for datasets such as the Xbox dataset analyzed in Section 5, these deviations are minor/anomalies. For example, in the Xbox dataset, only around of the bidders with placed bids show a multiple bidding behavior. In such settings, there is certainly value in using the subsequent methodology which uses the entire current selling price/standing price profile and does not assume knowledge of the total number of consumers accessing the auctions. If there is strong evidence/suspicion that the assumptions are being extensively violated, of course the results from this methodology should be treated with due skepticism and caution.
The subsequent methodological development in this section is quite involved, and we have tried to make it accessible to the reader by dividing it into subsections based on the major steps, and then highlighting the key milestones within each subsection, wherever necessary. We start by finding the joint density of the observed data obtained from a single second price auction.
2.1 Joint density of the observed data in a single second price auction
Consider a given (single) second price auction with reserve price . Hence, the initial selling price, denoted by , is equal to . The first time a consumer with bid value greater than arrives at the auction, that bid is placed but the current selling price remains . Subsequently, the current selling price (standing price) changes whenever a bid greater than the existing selling price is placed. Let denote the number of times the selling price changes throughout the course of the auction. When , let denote the sequence of current selling prices observed throughout the course of the auction, with denoting the new selling/standing price after the change. When , let denote the intermediate time between the and changes in the selling/standing price for . In particular, it follows that when , denotes the waiting time from the start of the auction until the moment when for the the first time, the selling price changes to a higher value than the reserve price . When , we define . Finally, let be a binary random variable indicating whether the item is sold before the end of the auction, i.e., indicates the item is sold and indicates that the item is not sold. Our observed data comprises of , , , and .
We define as the time after the last selling price change and until the auction closes. As discussed earlier, the number of consumers/bidders accessing a given (single) second price auction, denoted by , remains unobserved in our setup. Based on our assumption regarding the arrival mechanism (Poisson process with constant rate of arrival ) of bidders in the auction, we note that .
Note that there are three scenarios at the end of the auction: (a) the item is sold above the reserve price (), (b) the item is sold at the reserve price (), and (c) the item is not sold (). The following lemma provides a unified formula for the joint density of the observed data encompassing all these three scenarios.
Lemma 2.1.
For the second price auction described above, the joint density of , , , at values , , , is given by
where does not depend on , represents the reserve price (), denotes the constant rate of arrival of the bidders throughout the course of the auction, and represents the density function corresponding to .
2.2 Likelihood based on the observed data from multiple identical, non-overlapping second price auctions
Suppose we consider independent second price auctions of identical copies of an item (with possibly different reserve prices ). The observed data is , where denotes the number of selling price changes for the auction, denotes the indicator of the item being sold at the end of the auction, denote the selling/standing price sequence for the auction, and denote the sequence of intermediate waiting times between successive changes in the standing prices. Finally, let , for all . Since the auctions are independent, it follows by Lemma 2.1 that the likelihood function of the unknown parameters and for observed data values is given by .
Ideally, one would like to obtain estimates of and by maximizing the function . However, this likelihood function is intractable for direct maximization. A natural direction to proceed is to use the alternative maximization approach, which produces a sequence of iterates by alternatively maximizing with respect to given the current value of and then maximizing with respect to given the current value of . However, we found that such an approach can suffer from instability issues, which is not very surprising given the highly complicated and non-convex setting. Hence, we pursue and develop a slightly different approach which consists of two major steps:
-
•
Directly obtain an estimator of using generalized method of moments.
-
•
Obtain an estimate of by maximizing the profile log-likelihood .
Both of the steps above, especially the maximization with respect to , require intricate analysis, and we careful describe the details in Sections 2.3 and 2.4 below. This approach is computationally much less expensive than alternative maximization of and , and as our extensive simulations in Section 4 show, also provides stable and accurate estimates.
2.3 Estimation of : Generalized method of moments
Consider first a single second price auction with reserve price , and recall that denotes the number of times the selling price changes throughout the course of the auction. Our goal is to find a function such that . To this end, we consider the process of consumers accessing the auction whose bid value is greater than or equal to . Since we are assuming that the consumers bid their true private value, it follows that the proportion of such consumers in the population of all customers is , and this “thinned” process of arriving consumers with bid values greater than is a Poisson process with rate . Let represent the total number of consumers who access the auction in the period and have bid values greater than the reserve price . Then . Moreover, given , let () represent the event that the current selling price changes after the consumer (with bid greater than ) accesses the auction. Let, be the indicator function of the occurrence of the event .
Note that , and for , we have
(2.1) |
Here follows from the fact that two bids above are needed for the first change in the standing/selling price. For , note that the arrival of consumer with bid greater than changes the selling price if and only if the corresponding bid is the highest or second highest among the reserve price exceeding bids. Note that these bid values are i.i.d. with distribution truncated above . There are possible choices for the joint positions of the highest and second highest bids. The bid is the highest bid for of these choices, and the second highest bid for another choices, leading us to . It follows from (2.1) that
Note that is increasing in , and is stochastically increasing in terms of its mean parameter . Hence is a strictly increasing function, and
(2.2) |
If the reserve price is negligible, for example compared to the smallest final selling price seen in the data set, then it is reasonable to assume that . Suppose now, we consider the data from independent second price auctions of identical copies of an item, with denoting the number of standing/selling price changes throughout the course of the auction, and with the reserve price for the auction for . Let be the collection of all auction indices with negligible reserve prices. Then, it follows that from (2.2) that
should be a reasonable generalized method of moments estimator for .
Of course, the function is not available in closed form and needs to be computed using numerical methods. A natural approach, given the definition of as a Poisson expectation, is Monte Carlo. Indeed, we computed for every on a fine grid (with spacing ) ranging from to . This Monte Carlo computation of is a one-time process that required minimal computational effort. The resulting plot of is provided in Figure 2.

2.4 Estimation of : Some new notation based on pooled standing prices across all auctions
With the estimator in our hand, we now obtain an estimate of the valuation distribution by maximizing the function , which can be thought of as a version of profile likelihood for . Here the nuisance parameter has been profiled out not by conditional maximization, but by substituting the generalized method of moments estimator .
Note that the function is constrained to be non-decreasing. A key transformation to a constraint-free parametrization (described in Section 2.5 below) is needed to facilitate the maximization of . A crucial precursor to this re-parametrization is introduction of some new notation obtained by merging the standing prices from all the different auctions together. Let denote the total number of standing/selling price changes in all the auctions. Recall that denote the observed data values, and for . Let denote the observed value of . We will denote by the arrangement/ordering of the pooled collection such that ; under the assumption that is a continuous cdf, there should be no ties in the entries of with probability one. In other words, we pool the standing prices from all the auctions (excluding the reserve prices) and then arrange them in ascending order as . Also, for , we define where and are such that . Further, let
item is sold above the reserve price, | |||
Consider, for example a setting with independent second price auctions, with reserve prices . Suppose that the first auction has standing price changes with , the second auction has standing price changes with , the item is sold at the reserve price in the third auction (), and the item is unsold in the fourth auction (). Pooling and rearranging the standing prices (excluding reserve prices) from all the auctions, we see that , and
Note that the auction item is sold above the reserve price in the first two auctions, and the final selling prices are and respectively. Examining the positions of these two prices in gives us . Finally, is the collection of auction indices where the item is sold.
Using the newly introduced notation above and Lemma 2.1, it follows that the function is
(2.3) |
where doesn’t depend on . Maximization of the above likelihood over absolutely continuous CDFs leads to one of the standard difficulties in non-parametric estimation. As moves closer and closer to a CDF with a jump discontinuity at any , the function converges to infinity. Hence, any absolutely continuous CDF with a density function cannot be a maximizer of the above profile likelihood function. Following widely used convention in the literature (see Murphy (1994), Vardi (1982)), we will extend the parameter space to allow for the MLE of to be a discrete distribution function. To allow for discrete CDFs, we replace by . Thus the adapted profile likelihood can be written as
(2.4) |
where . We now establish a final bit of notation necessary to introduce the constraint-free reparametrization of . We now pool the standing prices from all the auctions (including the reserve prices), i.e., , and arrange them in ascending order as , and denote . Under the assumption that is a continuous cdf, there should be no ties in the entries of with probability one. The only other entries in are the reserve prices. In practice, it is possible that there are ties in the reserve prices, in which case we add a very small noise to the reserve prices to ensure that there are no ties in the entries of . Similar to , we define where and are such that . Further, let
item is sold above the reserve price, | |||
Since the entries of and both are arranged in ascending order, it follows that . In the example considered earlier in this subsection with auctions, by pooling the reserve price values with entries of and rearranging them in ascending order, we obtain
Recall that the item is sold above the reserve price only in the first two auctions. By identifying the positions/ranks of and in , we obtain . Similarly, by identifying the positions/ranks of the entries of in , we obtain .
It is clear from (2.4) that for maximizing it is enough to search over the class of CDFs with jump discontinuities at elements of . The next lemma (proved in Appendix B) shows that the search for a maximizer can be further restricted to a certain class of CDFs with possible jump discontinuities at elements of .
Lemma 2.2.
Let denote the class of CDFs which are piece-wise constant in , such that the set of points of jump discontinuity (in ) is a superset of elements of and a subset of elements of . Then, given any cdf with jump discontinuities at elements of , there exists such that .
For any , note that depends on only through or equivalently through
(since only has jump discontinuities at elements of and is otherwise piece-wise constant). This is typical in a non-parametric setting, and we can hope/expect to only obtain estimates of the valuation distribution at the observed standing prices (including the reserve prices).
2.5 Estimation of : A constraint-free reparametrization
Note that the entries of the vector are non-decreasing, and this constraint complicates the maximization of . So, we transform to another dimensional parameter vector as follows:
(2.5) |
where
(2.6) |
Since is non-decreasing, and takes values in , it follows that (with the convention ). Focusing our search on the class of CDFs in leads to additional constraints. Since any cdf in this class has a jump discontinuity at each , it follows that for , and for every . In other words, we have for , and for . There are no other constraints on the elements of . Also, we can retrieve given using the following equality.
(2.7) |
Now, using (2.4), (2.6) and (2.5), we can rewrite the ‘adapted profile’ likelihood in terms of as follows:
where . Using (2.7), we get
(2.8) |
The goal now is to maximize with respect to , where each entry of is in , for , and for . We achieve this using the coordinate-wise ascent algorithm. The details of this algorithm are derived in Section 3.
3 Maximizing : Coordinate ascent algorithm
Applying natural logarithm on both sides of the equation in (2.8), we get
(3.1) |
where . We now introduce notation which allows for a more compact and accessible representation of . Recall that is obtained by pooling all the reserve prices and the ‘non-reserve’ standing prices (elements of ), and represents the position of the in for . In particular, is the position of , the largest ‘non-reserve’ standing price across all the auctions in . In other words, . It is possible that . For example, in settings where the reserve price in one of the auctions where the item is unsold is larger than , it follows that is not the largest entry in and . With this background, we define
(3.2) |
In other words, note that induce an ordered partition of the set into disjoint subsets
Hence, any belongs to one of the subsets in the above partition, and is defined to be one less than the position of that subset in the partition. For we define . In the example with auctions from Section 2.4, , and
Using the above notation, it follows from (3.1) that
(3.3) |
where
Note that
To maximize , we pursue the coordinate-wise ascent approach where each iteration of the algorithm cycles through maximization of with respect to the co-ordinate (with other entries of fixed at their current values) for every . We now show that each of these coordinate-wise maximizers are available in closed form.
3.1 Coordinate-wise maximizers for
Based on the algebraic structure of , we divide the coordinate-wise maximization steps into three groups: One with when , where is defined to be the set , the second with when and , and the third with when and . We discuss each case in detail separately below.
Case I: Maximization w.r.t. for . If is non-empty, then . For any , taking derivative of the expression for in (3.3) w.r.t. and equating it to zero gives us the following
(3.4) |
where
(3.5) |
Since and , it follows that
(3.6) |
Since , it follows that the discriminant of the quadratic equation (3.6), denoted by , satisfies
(3.7) |
Hence, the quadratic equation (3.6) has two real roots, namely,
(3.8) |
Since by (3.7), it follows that
since . Hence the larger root with the positive sign for the square-root discriminant always lies outside the set of allowable values for . The smaller root with the negative sign can be shown to be strictly positive since . Also, if , then
If , then using we get
It follows that the smaller root lies in . Since
it follows that the smaller root is the unique maximizer of with respect to . To conclude, the unique maximizer of with respect to is given by
(3.9) |
where and are as defined in (3.5).
Case II: Maximization w.r.t. for and . For any with and , the coefficient of , given by , is strictly positive. Again taking derivative of the log-likelihood expression in (3.3) w.r.t. and equating it to zero gives us
where is as defined in (3.5). Note that is positive but not guaranteed to be less than or equal to . However, since
it follows that , i.e., is an increasing function of for . Hence, the unique maximizer of with respect to is given by
(3.10) |
Case III: Maximization w.r.t. for and . In this case and . It follows from (3.3) that is maximized with respect to at . Hence, we set
(3.11) |
This amounts to estimating for by . Note that any such corresponds to a reserve price which is greater than the largest observed bid. Since the data offers no evidence that the support of the true valuation distribution extends up to , setting the estimate of to indeed seems a sensible choice in this non-parametric setting.
3.2 Constructing an initial estimate of (and of ) based exclusively on final selling prices and first observed bids
The details of all the steps of the coordinate ascent maximization algorithm for are explicitly derived above in Section 3.1. However, a crucial detail which needs to be worked out is a ‘good’ initial starting point for the algorithm. Especially for highly non-convex maximizations such as in the current setting, the choice of the initial/starting value can play a critical role in the quality of the final estimate produced by the coordinate ascent algorithm. In this section, we construct an initial estimate of based on the empirical distribution functions of both the final selling prices and the first observed bids (i.e., the price when for the first time the standing price jumps to a higher value from its respective reserve price), respectively. Note that the methodology developed in George and Hui (2012) also relies exclusively on the final selling prices. However, that approach requires the knowledge of the total number of consumers accessing each of the auctions. We do not assume this knowledge in the current setting, and need to overcome this additional challenge. Also, as stated above, we will use the first non-reserve standing prices (first observed bids) to improve the quality of our initial estimator of .
Once the initial estimate of is constructed, we can easily construct the initial estimate of using (2.5) as follows:
(3.12) |
We now describe in detail the various steps involved in construction of .
Step I: Construct an estimate of based on the empirical distribution function of only the final selling prices of auctions with relatively small reserve prices. First, consider a single second price auction with reserve price . As in Section 2.3 consider the process of consumers accessing the auction whose bid value is greater than or equal to , and let represents the number of such consumers that access the auction in the period . As observed in Section 2.3, this thinned process of arriving consumers is a Poisson process with rate , and . We derive the conditional cdf of the final selling price given as a function of . For this purpose, note that
(3.13) |
Recall that denotes the maximum placed bid during the course of the auction (we do not observe it), and that the valuation distribution of customers arriving in the thinned Poisson process discussed above is the truncated version of at , denoted by
Now, note that the event is equivalent to the constraint that the largest order statistic of the valuations of all the customers arriving via the thinned Poisson process is greater than , but second largest order statistic is less than or equal to . Similarly, the event is equivalent to the constraint that the largest order statistic of the valuations of all the customers arriving via the thinned Poisson process is less than or equal to . With , it follows from (3.13) that
where . Let,
(3.14) |
Note that
(3.15) |
It follows that is a strictly increasing function for .
Now, coming back to our setting with independent auctions, suppose that we have multiple auctions with a given reserve price (or close to ) where the item is sold above the reserve price. Then based on the Glivenko-Cantelli lemma, (3.14) and (3.15), we can use the function (with an appropriate estimate of ) applied to empirical cdf of the final selling prices of these auctions to estimate for . Setting the estimate of to be zero, we can then obtain an estimate of for . Clearly, we would like to choose to be as small as possible.
With this background, let denote smallest non-negative number such that there are at least a -fraction of the reserve prices among lie in . Here and are user-specified constants, and we denote the set of indices of reserve prices which lie within as . Ideally, one would like to have a reasonable number of auctions with very small/negligible reserve prices. For example, in the Xbox data analyzed in Section 5, roughly of the auctions have a reserve price less than (the smallest final selling price is ). Let
be the empirical distribution function of the final selling prices for auctions in . Based on the above discussion we construct the estimator of as
(3.16) |
Since , it follows that for . In fact, for all values below the smallest final selling price for auctions corresponding to ). There are likely many observed standing prices in the auctions which are below this smallest final selling price, and these values can/should be used to improve the estimator . This process is described in the next step.
Step II: Incorporate the first non-reserve standing prices into the construction of the initial estimate of . Consider again, to begin with, a single second price auction with reserve price , and the associated thinned Poisson process of arriving consumers with valuation greater than . Letting represent valuations of the first two arriving consumers in the thinned process, we have
(3.17) |
Similar to Step I, let
be the empirical distribution function of the first non-reserve standing prices for auctions in . Based on (3.17), we construct the estimator of as
(3.18) |
Note that , which implies that . However, when is larger than the smallest first non-reserve standing price among auctions in . This smallest non-reserve standing price is often much smaller than the smallest final selling price, and hence can be combined with of Step I, to get a better initial estimate of as follows.
Step III: Combining the two different initial estimates, namely, and . Let and respectively represent the largest non-reserve standing price and the smallest final selling price for auctions in . As discussed previously, underestimates below and overestimates above . Let be the largest real number such that . Then, we define a function based on and as follows:
(3.19) |
This function in (3.19) combines the strengths of the two estimators and , and gives a more balanced estimator of over all regions. Finally, since are step functions, so are . It follows based on (3.19) that is a step function as well, and has jumps only at the first non-reserve standing prices and final selling prices for auctions in . A continuous version of this estimator, denoted by can be obtained by linear interpolation of the values between the jump points.
3.3 The Coordinate ascent algorithm for maximizing
All the developments and derivations in the earlier subsections can now be compiled and summarized via the following coordinate ascent algorithm to maximize .
Algorithm 3.1.
Once we get , we can easily get the corresponding maximum likelihood estimator of as follows
(3.20) |
As explained above, the adapted objective function and the search space of CDFs with relevant jump discontinuities are artifacts of the non-parametric approach that we pursue. However, once the estimates of the valuation distribution at elements of are obtained using Algorithm 3.1, a continuous estimate on the entire valuation distribution can be constructed via interpolation. In particular, we use the values of at ’s, , and linear interpolation to construct a continuous estimator of the population valuation distribution over the entire real line.
3.4 Instability of near and boundary correction
From extensive simulation results using several choices of the true valuation distribution, we observed that seems to overestimate near , particularly, in the region below the minimum of all (non-reserve) standing prices from all the auctions. In other words, seems to overestimate for , where is the smallest (non-reserve) standing price obtained from all the auctions.
Instability of non-parametric MLE near the boundaries is a common phenomenon in the literature. In our case, it happens near . To illustrate this, we generate a data set containing independent auctions where each auction runs for time units. The true underlying valuation distribution is taken to be a gamma distribution with shape parameter and rate parameter . The plots of the corresponding true (black curve), (orange curve), and (blue curve) are shown in Figure 3. In this case, the minimum of standing prices across all the auctions is (green vertical line). It is easy to observe from Figure 3 that for , the graph of (orange curve) lies way above the graphs of both true and . It indicates that (orange curve) overestimates in the region . In comparison, we notice that the initial estimate is more stable in this region (this was consistently observed in a broad variety of simulation settings). We leverage this observation to make the following modification in the steps of the coordinate ascent algorithm: we only look for distribution functions which are constrained to be equal to the initial estimator in the region . The modified coordinate ascent algorithm is described below in Algorithm 3.2. Once we get the constrained MLE of , denoted by , from Algorithm 3.2; we can easily get the values of the corresponding constrained MLE of , denoted by , at all components of using (3.20). As discussed earlier, linear interpolation can be used to get the values of at all other positive real numbers. The plot of (red curve) for the simulated data discussed above is provided in Figure 3, and clearly illustrates the performance improvement near as compared to the earlier unconstrained MLE (orange curve).

Algorithm 3.2.
Modified coordinate ascent algorithm:
-
Step 1.
Start with initial value , and a user defined tolerance level .
-
Step 2.
Set .
- Step 3.
-
Step 4.
If
set . Otherwise, go to Step.
-
Step 5.
Set .
4 Simulation study
In this section we consider various choices of the true underlying valuation distribution , e.g., uniform, piecewise uniform, pareto, gamma, and beta distributions, which are commonly used in marketing research. We then illustrate and compare the performance of the constrained (boundary corrected) MLE and the initial estimate with the corresponding true valuation distribution . Note that the Bayesian methodology in George and Hui (2012) (which uses only the final selling prices in each auction) requires the knowledge of the the total number of consumers accessing the auction. This does not hold in our motivating application, and hence we will also not assume such a knowledge in our synthetic data evaluations below. The estimator in (3.16) is based only on final selling prices, and does not need the knowledge of total number of customers accessing the auction. The estimator , which improves by combining it appropriately with the first non-reserve standing price based estimator in (3.19). Hence, will be used as a representative/adaptation of the final selling price based approach of George and Hui (2012) for the setting considered in this paper.
4.1 Data generation
We conducted five sets of simulation experiments, each using data simulated from a different choice of the underlying valuation distribution . The cumulative distribution functions corresponding to the five choices of true underlying valuation distributions are shown in Figure 4.

For the first set of simulations, the underlying is a Uniform distribution. For the second set of simulations, the underlying is an equally weighted mixture of the Uniform and Uniform distributions. From a managerial/marketing perspective, this corresponds to a market with two distinct consumer segments with different average valuations. For the third set of simulations, the true underlying is a Pareto distribution with location parameter and dispersion parameter . For the fourth set of simulations, the true underlying is a Gamma distribution with shape parameter and rate parameter . For the last and fifth set of simulations, the underlying is a Beta distribution with its two positive shape parameters being equal to , i.e., Beta distribution.
From each of the five true underlying ’s, we consider two settings, with and independent auctions of identical copies of an item. Varying the sample size here sheds light on the relationship between the sample size and the precision of the constrained MLE and the initial estimate in estimating the true valuation distribution . For each auction, we took the auction window () to be 100 units, and the constant rate () of arrival of bidders to be equal to . We then simulated the inter-arrival times between bidders from an exponential distribution with rate parameter , and drew the bidders’ valuations from , keeping track of the entire sequence of standing prices and the intermediate times between jumps in the standing price for all independent auctions involved. This sequence of standing prices throughout the course of the auction, and the intermediate times between standing price changes are then treated as the observed dataset that is subsequently used to compute the initial estimator and the constrained MLE. For each choice of true and number of auctions , replicated datasets are generated.
Since the data is generated by consistent with the modeling assumptions, one expects the MLE which utilizes all available information, to have a superior performance than the initial estimator, which only uses the final selling price and first non-reserve standing price for each auction. The goal of these simulations is to examine extensively how much improvement can be obtained from our proposed method by incorporating the additional information in a variety of settings.
4.2 Simulation Results
For each replicated dataset generated (as described in the previous subsection), we apply our non-parametric methodology to obtain the initial estimate and the constrained MLE . The goal is to compare the accuracy of each of these estimators with respect to the respective true valuation distribution .
We first provide a visual illustration of the results by choosing a random replicate out of for each of the simulation settings ( true valuation distributions, and settings for the total number of auctions ). For Figure 5, we consider a randomly chosen replicate from the setting where the true valuation distribution is Uniform and . The estimates , , and the true valuation distribution are plotted. We also provide the confidence intervals for both estimators based on the HulC approach developed in Kuchibhotla, Balakrishnan and Wasserman (2021). The HulC approach assumes median unbiasedness of the underlying estimators. Since and are not median unbiased, we correct for the median bias using a heuristic approach described in Appendix C. It can be seen that is much closer to the true compared to at almost all values in the interval . Figure 6 provides a similar plot for a randomly chosen replicate generated from the Uniform and setting. As expected, we see that the bias of both and reduces drastically when we increase the number of independent auctions from to , and overall provides a much more accurate estimate of the true valuation distribution .
Moreover, we observe from Figure 5 and Figure 6 that the HulC confidence regions for (denoted by red-colored step function plot) are in general narrower compared to that of (denoted by the blue-colored step function plot). In other words, has lower variance than . Again, as expected, the variance of both and decreases as we change the value of from to .


We provide similar plots for a randomly chosen replicate from the eight other settings (with true being piece-wise Uniform, Pareto, Gamma and Beta, and with ) in Figures 7, 8, 9, and 10, and see that a similar phenomenon holds for all these settings: (a) is less biased than , (b) has narrower HulC confidence bands than , and (c) the bias and variance of both estimators decreases as increases from to .
The above plots based on single chosen replicates are illustrative, but need to be complemented with performance evaluation averaged over all the replicates in each of the simulation settings. In Table 1, for each simulation setting, we provide both the KS-distance and the Total variation distance (TV-distance) between the true valuation distribution and the two estimates and averaged over the respective replications. The results show that the constrained MLE (based on the entire collection of standing prices) uniformly outperforms the initial estimator (based only on final selling prices and first non-reserve standing prices) in all the simulation settings. This strongly suggests that if the additional assumptions of single bidding and constant arrival rate seem to largely hold, it is worth using the proposed methodology which incorporates the extra information available in the form of all standing prices within the auction period.




Setting | KS distance (MLE) | KS distance (initial) | TV distance (MLE) | TV distance (initial) |
---|---|---|---|---|
Uniform, | ||||
Uniform, | ||||
Piec.Unif., | ||||
Piec. Unif., | ||||
Pareto, | ||||
Pareto, | ||||
Gamma, | ||||
Gamma, | ||||
Beta, | ||||
Beta, |
5 Empirical application
In this section we apply our method to estimate the true valuation distribution of an Xbox based on actual data obtained from second-price auctions on eBay. In Section 5.1, we provide an overview of the data, and discuss features and adjustments to ensure its suitability for the methodology developed in the paper. In Section 5.2, we apply our non-parametric methodology on the data set and present the findings, and perform additional performance analysis.
5.1 Data overview
The data set on eBay on online auctions of Xbox game consoles was obtained from the Modeling Online Auctions data repository. More specifically, we focus on a data set which provides information for online auctions of identical Xbox game consoles where each auction lasts for days. For each auction, a user’s bid is recorded only if changes the standing price in the auction. For each such bid, the following information is provided: auctionid (unique auction identifier), bid (dollar value of the bid), bidtime (the time, in days, that the bid was placed), bidder (bidder eBay username), bidderrate (internal eBay rating of the bidder), openbid (the reserve price for the auction, set by the seller), and price (the final selling price for the auction). While the standing price values throughout the course of the auction were not directly provided, they can be easily inferred from the successful bid values from the bid column and the reserve price. Also, the bidtime column directly provides the sequence of times at which there is a change in the standing price.
As mentioned in the introduction, we found that a minor fraction of bidders (less than of the total) placed multiple bids. Many of these bids are consecutive bids by the same bidder to ensure that they become the leader in the option. Note that we observe only ’successful’ bids, i.e., bids which change the standing price of the auction. If a successful bidder (post bidding) observes that the standing price of the auction has changed to their bid (plus a small increment), it can be inferred that this bid is currently the second highest. Hence, through a proxy bidding system offered by eBay, the bidder could choose to incrementally push up their bid until they become the leader in the option (the standing price becomes less than their latest bid). The proxy system also needs to be provided with a ceiling value, above which no bids are to be submitted. This value is very likely the bidder’s true valuation of the product. With this in mind, and to adapt the data as much as possible to our single bidding assumption, we remove all the previous bids of such multiple bidders from the data, and keep only the final bid. Finally, there are a couple of auctions where the first successful bid values are same as the reserve prices (openbid) of the corresponding auctions. To ensure compliance with our requirement of no ties, and for uniformity, we added a small random noise from Uniform to all the bids across all the auctions. Since the total number of bidders accessing the auctions is not available, the final selling price based methodology in George and Hui (2012) is not applicable. As in the simulations, we will use the initial estimator , which is computed using only the final selling prices and first observed bids in all auctions, as a representative of this methodology in the current setting.
5.2 Analysis of Xbox data
Using the Xbox 7-day auctions dataset with slight modifications as mentioned in Section 5.1, we now compute the initial estimate and the constrained MLE . For the estimation of (see Section 2.3) we need to choose a subset of auctions whose reserve prices are negligible in the given context. We found that the smallest final selling price in all the auctions is $28 and the median final selling price in all the auctions is $120. Given this, we chose all auctions with reserve price less than $10 ( out of ) for obtaining the generalized method of moments based estimator of , and also for the computing the final selling price based estimator (see Step I in Section 3.2). Recall that is one of the components used to compute the initial estimator .
The plots of the initial estimate and the constrained MLE of the (unknown) true valuation distribution along with the corresponding HulC confidence regions are provided in Figure 11. Similar to the phenomenon observed in the simulations in Section 4.2, we notice that the HulC confidence region of is lesser in width than that of , indicating comparatively smaller variance of . We see that the two estimates are reasonably different: the total variation distance between them is and the KS distance between them is . Another interesting observation is that the curves for these two estimates cross exactly once, with dominated by after the crossing point, and vice-versa before the crossing point. This implies that stochastically dominates . In other words, the final selling price/first non-reserve price based initial estimator signifies higher Xbox valuations than the MLE estimator based on the entire collection of standing prices throughout the course of the auctions.
Unlike the simulation setting, the true valuation distribution is obviously not known here. However, we still undertake a limited performance evaluation and comparison exercise for the two approaches. As discussed previously in Section 4.1, if the modeling assumptions are largely unviolated (which seems to be the case) one would expect the MLE to do better than the initial estimator. The goal of this limited evaluation is again to understand the amount of improvement, and also to examine the stability of both estimators. For this purpose, we split the entire Xbox dataset into training and test sets. In particular, we consider two choices of splits namely, and , for the ratio of auctions in training vs. test data. For each split, using (MLE estimator using test data) as an approximation for the true valuation distribution , we evaluate both the initial estimate and the constrained MLE from the training set and compare each of them with . The results for one such 1:1 and 2:1 split each are shown in Figure 12. We can see that is significantly closer to as compared to . We also calculate the average total variation distance between and each of and , averaged over replications of each random 1:1 and 2:1 split. The values are provided in Table 2.


train : test | AvgTV | AvgTV |
---|---|---|
6 Discussion and future research
In this paper we have a developed a non-parametric methodology for estimating the consumer valuation distribution using second price auction data. Unlike the approach in George and Hui (2012), our methodology uses the collection of current selling price values throughout the course of the auctions, and does not require knowledge of the total number of bidders accessing the auction. Extensive simulations demonstrate that, when the modeling assumptions are true, using our approach can lead to significantly better performance than estimators based on just final selling prices and first observed bids. Two additional assumptions (compared to George and Hui (2012)) which preclude multiple bidding and postulate constant rate of arrival of the consumers to the auction are needed. Many real-life second price auctions see only minor departures from these assumptions, which are supported by economic theory. However, if there is evidence of major violation, results from the proposed methodology should be used cautiously. Generalizing our methodology by relaxing one or both of these assumptions is a topic of future research. One possible direction which we plan on exploring is allow two different rates for the bidder arrival process, with a transition between these two rates happening sometime during the auction period.
Rohit Patra’s work was partially supported by NSF grant DMS-2210662.
References
- Backus, M. and Lewis, G. (2020). Dynamic demand estimation in auction markets. Technical Report, NBER Working Paper 22375.
- Bajari, P. and Hortacsu, A. (2003). The winner's curse, reserve prices, and endogenous entry: Empirical insights from eBay auctions. Rand Journal of Economics 34, 329–355.
- Barbaro, S. and Bracht, B. (2021). Shilling, Squeezing, Sniping. A further explanation for late bidding in online second-price auctions. Journal of Behavioral and Experimental Finance 31, 100553. doi:10.1016/j.jbef.2021.100553.
- Bose, S. and Daripa, A. (2017). Shills and snipes. Games and Economic Behavior 104, 507–516. doi:10.1016/j.geb.2017.05.010.
- Chan, T., Kadiyali, V. and Park, Y. H. (2007). Willingness to pay and competition in online auctions. Journal of Marketing Research 44, 324–333.
- eBay. Automatic Bidding.
- George, E. I. and Hui, S. K. (2012). Optimal pricing using online auction experiments: A Pólya tree approach. The Annals of Applied Statistics 6, 55–82. doi:10.1214/11-AOAS503.
- Hou, J. and Rego, C. (2007). A classification of online bidders in a private value auction: Evidence from eBay. International Journal of Electronic Marketing and Retailing 1, 322–338.
- Kuchibhotla, A. K., Balakrishnan, S. and Wasserman, L. (2021). The HulC: Confidence Regions from Convex Hulls.
- Milgrom, P. (2004). Putting Auction Theory to Work. Cambridge.
- Murphy, S. A. (1994). Consistency in a Proportional Hazards Model Incorporating a Random Effect. The Annals of Statistics 22, 712–731.
- Ockenfels, A. and Roth, A. E. (2006). Late and multiple bidding in second price Internet auctions: Theory and evidence concerning different rules for ending an auction. Games and Economic Behavior 55, 297–320. doi:10.1016/j.geb.2005.02.010.
- Park, Y. H. and Bradlow, E. (2005). An integrated model for bidding behavior in internet auctions: Whether, who, when, and how much. Journal of Marketing Research 42, 470–482.
- Song, U. (2004). Nonparametric estimation of an eBay auction model with an unknown number of bidders. Technical Report, University of British Columbia.
- Vardi, Y. (1982). Nonparametric Estimation in the Presence of Length Bias. The Annals of Statistics 10, 616–620. doi:10.1214/aos/1176345802.
- Vickrey, W. (1961). Counter-speculation, auctions, and competitive sealed tenders. Journal of Finance 16, 8–37.
- Yao, S. and Mela, C. (2008). Online auction demand. Marketing Science 27, 861–885.
Appendix A Proof of Lemma 2.1
Proof.
We first introduce some additional notation. For each change in the selling price, let the corresponding count denote the number of bidders accessing the auction between the previous change and the current one, with the initial count denoting the number of bidders accessing the auction until the first time the selling price changes to a value above the reserve price. Similarly, let the inter-arrival times denote the times between the arrivals of successive bidders, with the first of these being the waiting time from the start of the auction until the arrival of the first bidder. Recall that a bidder accessing the auction is allowed to place a bid only if the bid value is greater than the current selling price. We now consider three possible scenarios at the end of the auction.
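To make the data-generating mechanism behind this notation concrete, here is a minimal simulation sketch of a single auction under the paper's assumptions (Poisson bidder arrivals, single truthful bids, a bid placed only when it exceeds the current standing price). The function name, the default parameter values, and the exponential valuation distribution are illustrative assumptions of ours, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_auction(T=7.0, rate=5.0, reserve=1.0, sample_valuation=None):
    """Simulate one second-price auction and return the observed data:
    the times at which the standing price changes and the new standing
    prices. The reserve price is treated as a placed bid, so the standing
    price first moves only once two bids exceed the reserve."""
    if sample_valuation is None:
        sample_valuation = lambda: rng.exponential(scale=2.0)
    n = rng.poisson(rate * T)                        # bidders accessing the auction
    arrivals = np.sort(rng.uniform(0.0, T, size=n))  # given n, arrivals are uniform order statistics
    placed = [reserve]                               # reserve counts as a placed bid
    standing = reserve
    change_times, standing_prices = [], []
    for t in arrivals:
        v = sample_valuation()                       # private valuation = truthful bid
        if v > standing:                             # a bid is placed only above the standing price
            placed.append(v)
            second_highest = sorted(placed)[-2]      # standing price = second-highest placed bid
            if second_highest > standing:
                standing = second_highest
                change_times.append(t)
                standing_prices.append(standing)
    return change_times, standing_prices
```

In terms of this sketch, Case I below corresponds to runs with at least one recorded change, Case II to runs where exactly one bid above the reserve was placed (empty change list, item sold at the reserve), and Case III to runs with no placed bid above the reserve.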
Case I: the item is sold above the reserve price. In this case, the number of times the selling price changes throughout the course of the auction is positive. We first derive the conditional density of the standing prices given the number of price changes and the bidder counts between successive changes.
• Since the first bidder count is the number of bidders until the first time the standing price changes to a value above the reserve price, all of the corresponding bids except two are below the reserve price, and exactly two bids exceed it, with the new standing price being the second highest bid. Also, the first bid exceeding the reserve price can occur at any of the earlier positions in this block.
• For the next observed value to become the standing price after the current one became the second highest bid, the intervening bids must all be below the current standing price, and the triggering bid must be higher than the new standing price.
• Continuing in this way, the final block of bids must all be below the last standing price after it becomes the standing price (and the second highest bid of the entire auction), with the (unobserved) highest bid occurring somewhere before.
It follows that the conditional density of the standing prices, given the number of price changes and the bidder counts between changes, is given by

(A.1)

Note that the above holds only if the arguments satisfy the natural ordering constraints (otherwise the density is zero).
For a collection of i.i.d. random variables, the distribution of the number of changes in the running second maximum, and of the locations of these changes in the index set, is invariant under any strictly monotone transformation of the variables. If the common CDF is absolutely continuous, applying it to each variable yields a collection of i.i.d. Uniform random variables. Applying these conclusions to our context, with the variables being the valuations of the successive bidders accessing the auction, it follows that the distribution of the number and locations of the standing price changes, given the total number of bidders, does not depend on the valuation distribution. Using (A.1), it follows that the joint density of the standing prices, the price-change locations, and the bidder counts, given the total number of bidders, is equal to

assuming that the arguments satisfy the constraints stated above (otherwise the value of the joint density is zero). Here the multiplicative count term is independent of the valuation distribution.
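The invariance claim can be checked directly: a strictly monotone transformation preserves the order of the sequence, so the indices at which the running second maximum changes are identical realization by realization. Below is a small sanity check, with a hypothetical helper of our own.

```python
import numpy as np

def second_max_change_indices(x):
    """Indices at which the running second maximum of the sequence x increases."""
    changes, best, second = [], -np.inf, -np.inf
    for i, v in enumerate(x):
        if v > best:
            best, new_second = v, best      # previous maximum drops to second place
        elif v > second:
            new_second = v                  # new value becomes the second maximum
        else:
            new_second = second
        if new_second > second:
            changes.append(i)
        second = new_second
    return changes

x = np.random.default_rng(2).normal(size=20)
# exp() is strictly increasing, so the change locations must coincide.
assert second_max_change_indices(x) == second_max_change_indices(np.exp(x))
```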
Since bidders are assumed to arrive at the auction via a Poisson process with a constant rate, the number of potential bidders in any auction follows a Poisson distribution. Also, the inter-arrival times are i.i.d. exponential random variables with this rate. Hence, the joint density of the corresponding partial sums (the arrival times) is

(A.2)
It follows that

(A.3)

given the number of arrivals in the auction window, where the right-hand side involves i.i.d. Uniform random variables and their corresponding order statistics.
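Since the displayed equations are not reproduced here, the following are the standard facts presumably underlying (A.2) and (A.3), written in our own notation: $S_k$ for the $k$-th arrival time (partial sum of inter-arrival times), $\lambda$ for the arrival rate, $N$ for the number of arrivals in $[0,T]$, and $U_{(1)} \le \dots \le U_{(n)}$ for uniform order statistics.

```latex
% Joint density of the first k arrival times of a Poisson(\lambda) process:
f_{S_1,\dots,S_k}(s_1,\dots,s_k) \;=\; \lambda^{k} e^{-\lambda s_k},
  \qquad 0 < s_1 < \dots < s_k .

% Conditional on N = n arrivals in [0,T], the arrival times are distributed
% as the order statistics of n i.i.d. Uniform(0,T) random variables:
\bigl(S_1,\dots,S_n\bigr) \,\big|\, \{N = n\}
  \;\stackrel{d}{=}\; \bigl(U_{(1)},\dots,U_{(n)}\bigr),
  \qquad U_1,\dots,U_n \stackrel{\text{iid}}{\sim} \mathrm{Uniform}(0,T).
```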
Since each intermediate waiting time denotes the time between consecutive changes in the standing price, it can easily be seen that these waiting times are partial sums of the relevant inter-arrival times. Since the bid values and the arrival times are independent given the bidder counts, it follows that

(A.4)

given the standing prices, price-change locations, bidder counts, and the total number of bidders. Here the corresponding partial sums are as defined above.
From (A.2) and (A.3), the joint density of the change times, given the number of price changes, the bidder counts, and the total number of bidders, is equal to

(A.5)

where the constituent terms are as defined above.
From (A.4) and (A.5), it follows that the conditional density of the change times, given the standing prices, price-change locations, bidder counts, and the total number of bidders, is equal to

(A.6)

where the notation is as defined above.
Since the Jacobian of the transformation from the arrival times to the inter-arrival times is one, combining (A.1) and (A.6) it follows that the joint density of the standing prices, price-change locations, bidder counts, and change times, given the total number of bidders, is equal to

(A.7)

where the arguments satisfy the constraints listed above (otherwise the value of the joint density is zero).
Now, summing over the bidder counts in (A.7), subject to the constraint that they add up to the total number of bidders, the joint density of the standing prices, price-change locations, change times, and the number of price changes, given the total number of bidders, is equal to

(A.8)

where the arguments satisfy the constraints above. Moreover, since the total number of bidders follows a Poisson distribution, we have

(A.9)

Combining (A.8) and (A.9), we get that the joint density of the standing prices, price-change locations, change times, the number of price changes, and the total number of bidders is equal to

(A.10)

where the arguments satisfy the same constraints. Finally, summing over the total number of bidders in (A.10), we get that the joint density of the standing prices, price-change locations, change times, and the number of price changes is equal to

(A.11)

where the remaining quantities are as defined above.
Case II: the item is sold at the reserve price. In this case, the only bid which is higher than the reserve price remains unobserved, and the number of standing price changes is zero. Moreover, no change times or intermediate standing prices are observed. Since the probability that exactly one bid exceeds the reserve price, given the total number of bidders, equals

(A.12)

it follows using (A.9) and (A.12) that the joint density of the observed data and the total number of bidders in this case is equal to

(A.13)

Summing over the total number of bidders in (A.13), we get that the joint density of the observed data equals

(A.14)
Case III: the item is not sold. This situation can occur if either all the bids are less than the reserve price or no bidding happened at all; in either case the number of standing price changes is zero. Additionally, no change times or intermediate standing prices are observed. Since the probability that no bid exceeds the reserve price, given the total number of bidders, equals

(A.15)

it follows using (A.9) and (A.15) that the joint density of the observed data and the total number of bidders in this case is equal to

(A.16)

Summing over the total number of bidders in (A.16), we get that the joint density of the observed data equals

(A.17)

∎
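As a remark on Cases II and III above: although the displayed expressions are not reproduced here, the two elided probabilities admit plausible reconstructions from the surrounding text, writing $F$ for the valuation CDF, $r$ for the reserve price, and $n$ for the total number of bidders (our notation, not necessarily the paper's).

```latex
% Case II: the item sells at the reserve price, i.e., exactly one of the
% n bids exceeds r (and that bid is never observed); for n >= 1,
\Pr(\text{exactly one bid} > r \mid N = n)
  \;=\; n\,\bigl(1 - F(r)\bigr)\,F(r)^{\,n-1}.

% Case III: the item goes unsold, i.e., none of the n bids exceeds r
% (n = 0 corresponds to no bidding at all):
\Pr(\text{no bid} > r \mid N = n) \;=\; F(r)^{\,n}.
```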
Appendix B Proof of Lemma 2.2
For every , we define . In other words, for every . Also, let (with and ). Fix arbitrarily. We now define on . Note that any element of in this open interval has to be a reserve price for one of the auctions in the dataset. First,
If , the defining task is accomplished. Otherwise, for every such that , we define
Hence, has now been defined on .
We now consider two scenarios. If , then define for . It follows from the above construction that . For every , note that
Since , it follows that
or equivalently
Since and match on all elements of by the above construction, we also have for every . It follows by Eq. (2.4) in the main paper that .
On the other hand, if , we define
and
Hence, and match on all elements of , and dominates on all elements of . By the exact same arguments as in the first scenario, it follows that for every . It again follows by Eq. (2.4) in the main paper that .
The above analysis assumes that . If , then the vector is empty. It follows from Eq. (2.4) in the main paper that depends on only through , and is non-decreasing in each of these elements. In this case, let denote the CDF corresponding to the distribution which puts a point mass at zero. Then, and .
Appendix C An approach for estimating median bias for and
In all of our experiments and illustrations, the HulC approach in Kuchibhotla, Balakrishnan and Wasserman (2021) is used to obtain 90% confidence bands for the estimators and . This approach, however, assumes median unbiasedness of the underlying estimator. Since and are not median unbiased, estimates of their respective median biases, denoted by
are needed for an accurate application of the HulC method. Here denotes the true population valuation distribution for the product under consideration (assumed to be absolutely continuous).
To obtain approximations for and , consider a scenario where we observe instead of (with the corresponding population valuation distribution now Uniform). Let , , and , denote the corresponding initial estimator, discrete/step initial estimator, unconstrained MLE and the constrained MLE obtained by applying our approach to the transformed data. Since is a strictly monotone transformation, the relative ordering of the bid values is left intact. Hence, , the number of selling price changes throughout the course of the auction, is left unchanged for the transformed data. It follows that the procedure described after Equation (2.2) in the main paper produces the same estimate for the original and transformed data. Since if and only if , it follows that and for every . Here and respectively denote the final selling price and first non-reserve standing price based initial estimates for the transformed data. Based on the procedure described in Step III of Section 3.2 in the main paper, it follows that
whereas
It is clear that and only differ in the interval . The difference of values in this interval arises due to the different nature of interpolation used in the two functions (linear in vs. linear in ). If the above interval length is reasonably small and the derivative of is relatively well-behaved in this interval, then and should be reasonably close. Since and are continuous versions (via linear interpolation) of and respectively, the arguments above lead us to the approximation
(C.1)
Since the underlying bids for the transformed data are uniformly distributed (note that implies for absolutely continuous ), the rightmost expression in (C.1) can be estimated using Monte Carlo.
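The parenthetical fact invoked here is the probability integral transform, stated below in our own notation ($X$ a valuation with absolutely continuous, strictly increasing CDF $F$):

```latex
% Probability integral transform: F(X) is standard uniform, since
\Pr\bigl(F(X) \le u\bigr)
  \;=\; \Pr\bigl(X \le F^{-1}(u)\bigr)
  \;=\; F\bigl(F^{-1}(u)\bigr)
  \;=\; u, \qquad u \in (0,1).
```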
We now focus on the MLE. Again, given that the transformed data and the original data share the same relative ordering of the bid values, and given the arguments above, it follows that the profile likelihood for the transformed data is exactly the same as for the original data (see Equation (2.8) in the main paper), with only one difference: the corresponding variable for the transformed data is now defined on the transformed scale. It follows that
Since the constrained MLE (at the data points) is obtained by constraining the values at certain indices to be equal to the initial estimator, and linear interpolation is used to obtain values of the constrained MLE at non-data points, considerations similar to those above lead us to the approximation
(C.2)
Again, since the underlying bids for the transformed data are uniformly distributed, the rightmost expression in (C.2) can be estimated using Monte Carlo.
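A minimal Monte Carlo sketch of this estimation step is given below, reusing the hypothetical simulate_auction helper from the sketch in Appendix A; the estimator interface, grid, and replication counts are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_median_bias(estimator, n_auctions=100, n_reps=500, grid=None):
    """Estimate the median bias of a CDF estimator under Uniform(0,1)
    valuations: over many simulated datasets, record how often the
    estimate at each grid point x falls at or below the true CDF value x,
    and report the deviation of that frequency from 1/2."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    below = np.zeros_like(grid)
    for _ in range(n_reps):
        data = [simulate_auction(sample_valuation=rng.uniform)
                for _ in range(n_auctions)]
        F_hat = estimator(data)          # callable CDF estimate fitted to the data
        below += (F_hat(grid) <= grid)
    return below / n_reps - 0.5          # estimated median bias at each grid point
```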
We performed simulation studies with varying numbers of auctions and various choices of the true valuation distribution, such as Beta, Gamma and Uniform. The above approximation to the median bias works well in most settings. Even when the approximation is not very accurate, the approximation error is not significant enough to make a perceptible difference in the resulting confidence curves.