Dynamic Pricing and Learning under the Bass Model
Abstract
We consider a novel formulation of the dynamic pricing and demand learning problem, where the evolution of demand in response to posted prices is governed by a stochastic variant of the popular Bass model with parameters $(\alpha, \beta)$ that are linked to the so-called “innovation” and “imitation” effects. Unlike the more commonly used i.i.d. and contextual demand models, in this model the posted price not only affects the demand and the revenue in the current round but also the future evolution of demand, and hence the fraction of the potential market that can ultimately be captured. Finding a revenue-maximizing dynamic pricing policy in this model is non-trivial even in the full information case, where model parameters are known. In this paper, we consider the more challenging incomplete information problem where dynamic pricing is applied in conjunction with learning the unknown model parameters, with the objective of optimizing the cumulative revenues over a given selling horizon of length $T$. Equivalently, the goal is to minimize the regret, which measures the revenue loss of the algorithm relative to the optimal expected revenue achievable under the stochastic Bass model with market size $m$ and time horizon $T$.
Our main contribution is the development of an algorithm that satisfies a high probability regret guarantee of order $\tilde{O}(m^{2/3})$, where the market size $m$ is known a priori. Moreover, we show that no algorithm can incur a smaller order of loss by deriving a matching lower bound. Unlike most regret analysis results, in the present problem the market size $m$ is the fundamental driver of the complexity; our lower bound, in fact, indicates that for any fixed $\alpha, \beta$, most non-trivial instances of the problem have constant $T$ and large $m$. We believe that this insight sets the problem of dynamic pricing under the Bass model apart from the typical i.i.d. setting and multi-armed bandit based models for dynamic pricing, which typically focus only on the asymptotics with respect to the time horizon $T$.
1 Introduction
Background and motivation
The dynamic pricing and learning literature, often referred to as “learning and earning,” has at its focal point the objective of maximizing revenues jointly with inferring the structure of a demand model that is a priori unknown to the decision maker. It is an extremely active area of current research that can essentially be traced back to two strands of work. Within the computer science community, the first paper on the topic is [26], which studies a posted-price auction with infinite inventory in which the seller does not know the willingness-to-pay (or valuation) of buyers and must learn it over repeated interactions. The problem is stateless, and demand is independent from period to period; as such, it is amenable to a multi-armed bandit-type approach and in fact can be solved near-optimally (in the sense of regret bounds) using an “explore-first-exploit-later” pricing and learning strategy. Various refinements and improvements have been proposed since, in what has become a very active field of study in economics, computer science, and operations research. The second strand of work originates in the operations research community with [7], which focuses on a similar finite-horizon regret criterion in the posted-price auction problem but with limited inventory. This problem is sometimes referred to as “bandits with knapsacks,” after the follow-up work of [2] and subsequent papers. In that problem, the learning objective is more subtle as the system “state” changes over time (in the dynamic programming full information solution, this state is given by the level of remaining inventory and the time). For further references and historical notes on the development of the topic see, e.g., the recent survey [9].
Most of the literature that has evolved from the inception points identified above has focused on a relatively simple setting where, given the current pricing decision, demand is independent of past actions and demand values. In addition, much of the literature has focused on the “stateless” problem setting, which provides further simplification and tractability. With the evolution of online platforms and marketplaces, the focus on such homogeneous modeling environments is becoming increasingly less realistic. For example, platforms now rely more and more on online reviews and ratings to inform and guide consumers. Product quality information is also increasingly available on online blogs, discussion forums, and social networks, which create further word-of-mouth effects. One clear implication for the dynamic pricing and learning problem is that the demand environment can no longer be assumed to be static, meaning the demand model may change from one time instance to the next; for example, in the context of online reviews, sales of the product trigger reviews and ratings, and these in turn influence subsequent demand behavior, and so on.
While it is possible to model demand shifts and non-stationarities in a reasonably flexible (nonparametric) manner within the dynamic pricing and learning problem (see, e.g., [25], and [6] for a general MAB formulation), this approach can be too broad and unstructured to be effective in practical dynamic pricing settings. To that end, product diffusion models, such as the popular Bass model [3, 4], are known to be extremely robust and parsimonious, capturing the aforementioned word-of-mouth and imitation effects on the growth in sales of a new product. The Bass diffusion model, originally proposed by Frank Bass in 1969 [3], has been extremely influential in marketing and management science, often described as one of the most celebrated empirical generalizations in marketing. It describes the process by which new products get adopted as an interaction between existing users and potential new users. The Bass model is attractive insofar as it creates a state-dependent evolution of market response, which is well aligned with the impact of recent technological developments, such as online review platforms, on customer purchase behavior. Along these lines, several recent empirical studies in marketing science and econometrics utilize abundant social data from online platforms to quantify the impact of the word-of-mouth effect on consumer purchase behavior and the new-product diffusion process (e.g., [34, 14, 16, 19, 23, 36]; also see [15, 29] for literature surveys).
Motivated by these observations, the objective of this paper is to investigate a novel formulation of the dynamic pricing and demand learning problem, where the evolution of demand is governed by the Bass diffusion model, and where the parameters of this model are unknown a priori and need to be learned from repeated interactions with the market.
The proposed model and main problem studied
The Bass diffusion model [3, 4] has two parameters: the “coefficient of innovation” $\alpha$, representing external influence or advertising effect; and the “coefficient of imitation” $\beta$, representing internal influence or word-of-mouth effect. The principle underlying the model is that “the probability of adopting by those who have not yet adopted is a linear function of those who had previously adopted.” More precisely, the likelihood that a potential buyer buys a new item at time $t$, given that he or she has not yet bought it, is a linear function of the proportion of buyers at time $t$:
$$\frac{f(t)}{1 - F(t)} \;=\; \alpha + \beta\,F(t) \qquad (1)$$
where $F$ and $f$ are the distribution and density functions of first-purchase times, respectively. Here, the parameter $\alpha$ is the above referenced “coefficient of innovation”, and $\beta$ is the “coefficient of imitation.” Let $m$ be the number of potential buyers, i.e., the market size, and let $F(t)$ be the fraction of customers who have already adopted the product by time $t$. Then, $m F(t)$ represents the cumulative sales (aka adoptions) up until time $t$, and $m(1 - F(t))$ is the size of the remaining market yet to be captured. The instantaneous sales rate at time $t$, $s(t)$, can then be expressed as
$$s(t) \;=\; m\,f(t) \;=\; m\,(\alpha + \beta F(t))\,(1 - F(t)) \qquad (2)$$
A generalization of the Bass diffusion model that can be harnessed for the dynamic pricing context was proposed by Robinson and Lakhani [32]. In the latter model, $p(t)$ denotes the price posted at time $t$. Then, given the previous adoption level $F(t)$, the rate of new adoptions at time instant $t$ is given by
$$m\,\frac{dF(t)}{dt} \;=\; m\,(\alpha + \beta F(t))\,(1 - F(t))\,e^{-p(t)} \qquad (3)$$
Thus, the current price affects not only the immediate new adoptions and revenue, but also future adoptions through its effect on the adoption level $F(t)$, which essentially forms the state at time $t$ in this stateful demand model. Finding a revenue-maximizing optimal pricing trajectory in such a dynamic demand model is non-trivial even when the model parameters are known, e.g., see [17] for some characterizations.
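To make the state dependence in (3) concrete, the following is a minimal simulation sketch (illustrative only: the parameter values and the Euler discretization are our assumptions, not part of the model specification). It integrates the deterministic dynamics under two constant prices and shows how the posted price changes the entire future adoption trajectory, not just instantaneous sales.

```python
import numpy as np

def simulate_bass(alpha, beta, price_fn, T=10.0, dt=1e-3):
    """Euler integration of the price-dependent Bass dynamics (3):
    dF/dt = (alpha + beta*F) * (1 - F) * exp(-p(t, F))."""
    F, steps = 0.0, int(T / dt)
    traj = np.empty(steps)
    for k in range(steps):
        p = price_fn(k * dt, F)
        F += (alpha + beta * F) * (1.0 - F) * np.exp(-p) * dt
        traj[k] = F
    return traj

alpha, beta = 0.01, 0.5          # assumed innovation/imitation coefficients
low  = simulate_bass(alpha, beta, lambda t, F: 1.0)   # constant low price
high = simulate_bass(alpha, beta, lambda t, F: 3.0)   # constant high price
print("final adoption fraction: low price %.3f, high price %.3f"
      % (low[-1], high[-1]))
```

Because the imitation term compounds, an early low price accelerates all subsequent adoptions, which is precisely what makes the pricing problem stateful.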
In this paper, we consider a stochastic variant of the above Bass model, where customers arrive sequentially and the number of customer arrivals (aka adoptions) until time $t$ is a non-homogeneous Poisson process with rate given by the right hand side of (3). More details on the stochastic model are provided in Section 2, where we describe the full problem formulation and performance objectives.
The problem of dynamic pricing under demand learning can then be described, informally, as follows: the learner (seller) is required to dynamically choose prices $p_t$ to be posted at times $t \in [0, T]$, where $p_t$ is chosen based on the past observations, which include the number of customers arrived so far, their times of arrival, and the price decisions made in the past. The number of customers arriving until any time $t$ is given by the stochastic Bass model. Note that we use the terms “customer arrival” and “customer adoption” interchangeably, i.e., every customer arriving at time $t$ adopts the product and pays the price $p_t$. We assume that the size of the market $m$ and the horizon $T$ are known to the learner, but the Bass model parameters (i.e., the coefficient of innovation $\alpha$ and the coefficient of imitation $\beta$) are unknown. The aim is to maximize the cumulative revenue over the horizon $T$, i.e., $\sum_{i=1}^{N_T} p^i$, via a suitable sequence of prices, where $p^i$ is the price paid by the $i$-th customer and $N_T$ is the total number of adopters until time $T$. In essence, we will not be directly maximizing this quantity but rather, much in line with the online learning and multi-armed bandits literature, will focus on evaluating a suitable notion of regret and establishing “good” regret performance of our proposed pricing and learning algorithm.
Main contributions
The paper makes two significant advances in the study of the aforementioned Bass model learning and earning problem. First, we present a learning and pricing algorithm that achieves a high probability regret bound of $\tilde{O}(m^{2/3})$. Second, under certain mild restrictions on algorithm design, we provide a matching lower bound, showing that any algorithm must incur regret of order $\Omega(m^{2/3})$ for this problem. The precise statements of these results are provided as Theorem 4.1 and Theorem 5.1, in Sections 4 and 5 respectively. Hence the “price” of incomplete information, and the “cost” of learning it on the fly, are reasonably small. Unlike most regret analysis results, in the present setting the market size $m$ is the fundamental driver of the problem complexity; our lower bound, in fact, indicates that for any fixed $\alpha, \beta$, most non-trivial instances of the problem have constant $T$ and large $m$. This is also reflected in the regret of our algorithm, which depends sublinearly on the market size $m$, and only logarithmically on the time horizon $T$. This insight sets the problem of dynamic pricing under the Bass model uniquely apart from the typical i.i.d. and multi-armed bandit-based models for dynamic pricing.
Other Related Work
Several other models of non-stationary demand have been considered in the recent literature on dynamic pricing and demand learning. [10] studies an additive demand model where the market condition is determined by the sum of an unknown market process and an adjustment term that is a known function of the price. The paper numerically studies several sliding-window and weight-decay based algorithms, including one experiment where their pricing strategy is tested on the Bass diffusion model; however, they do not provide any regret guarantees under the Bass model. [8] and [13] consider the pricing problem under a piecewise stationary demand model. [25] consider a two-parameter linear demand model whose parameters are indexed by time and whose total quadratic variation is bounded. Many of the existing dynamic pricing and learning algorithms borrow ideas from related settings in the multi-armed bandit literature. [21] and [11] study piecewise stationary bandits based on sliding-window and change-point detection techniques. [6, 33] and [30] study regret based on an appropriate measure of reward variation. In the rested and restless bandits literature, [35] and [5] also study a related problem where the reward distribution depends on a state which follows an unknown underlying stochastic process. However, the above-mentioned models of non-stationarity are typically very broad and unstructured, and fundamentally different from the stateful, structured Bass diffusion model studied here.
There is also much recent work on learning and regret minimization in stateful models using general MDP and reinforcement learning frameworks (for example [37, 1, 24]). However, these results typically rely on settings where each state can be visited many times over the learning process. This is ensured either through an episodic MDP setting (e.g., [37]), or through an assumption of a communicating or weakly communicating MDP with bounded diameter, i.e., a bound on the number of steps needed to visit any state from any starting state under an appropriate policy (e.g., see [24, 1]). Our setting is non-episodic, and every state is transient: the state is described by the number of cumulative adopters so far and the remaining time, both of which can only move in one direction. Therefore, the results in the above and related papers on learning general MDPs are not applicable to our problem setting.
In the marketing literature, there is significant work on using the Bass model to forecast demand before launching a product. Stochastic variants of the Bass model have also been considered previously, e.g., in [22, 31]. In addition to the work mentioned in the introduction, [28, 20, 38, 22] present empirical methods for using historical data from similar products or from past years to estimate Bass model parameters, in order to obtain reliable forecasts. In this context, we believe our proposed dynamic pricing and learning algorithm may provide an efficient method for adaptively estimating the Bass model parameters while optimizing revenue. However, the focus of this paper is on providing theoretical performance guarantees; the empirical performance of the proposed method has not been studied.
The work most closely related to ours is a parallel recent paper, [39]. In [39], the authors consider a dynamic pricing problem under a similar stochastic Bass model setting as the one studied here. They propose an algorithm based on Maximum Likelihood Estimation (MLE) that is claimed to achieve a regret bound logarithmic in the market size and horizon. At first glance this regret bound seems to contradict our lower bound of $\Omega(m^{2/3})$. (Technically, our lower bound only holds for algorithms that satisfy certain conditions (Assumptions 1 and 2) stated in Section 5. However, we believe it is still applicable to the algorithm in [39]: their algorithm, with its constant upper bound on prices, seems to satisfy both of our assumptions.) However, it appears the results in [39] may contain some mistakes (see, in particular, the current statement and proofs of Lemma 3, Theorem 5 and Theorem 6 in [39], which we believe contain the aforementioned inconsistencies). To the best of our understanding, these inconsistencies cannot be fixed without changing the current results in a significant way.
2 Problem formulation and preliminaries
The stochastic model and problem primitives
There is an unlimited supply of durable goods of a single type to be sold in a market with $m$ customers. We consider a dynamic pricing problem with sequential customer arrivals over a time interval $[0, T]$. We denote by $N_t$ the number of arrivals by time $t$, with $N_0 = 0$. At any given time $t$, the seller observes $N_t$, the number of arrivals so far, as well as their arrival times $t_1, \ldots, t_{N_t}$, and chooses a new price $p_t$ to be posted until the next update. The seller can use the past information to update the price any number of times until the end of the time horizon $T$. As mentioned earlier, in our formulation a customer arriving at time $t$ immediately adopts the product and pays the posted price $p_t$, and therefore the terms “adoption” and “arrival” are used interchangeably throughout the text.
The customer arrival process for any pricing policy is given by a stochastic Bass diffusion model with (unknown) parameters $(\alpha, \beta)$. In this model, the number of arrivals $N_t$ by time $t$ is a non-homogeneous Poisson point process (a counting process $\{N_t\}$ is called a non-homogeneous Poisson process with rate $\lambda_t$ if all the following conditions hold: (a) $N_0 = 0$; (b) it has independent increments; (c) for any $t \ge 0$, as $h \downarrow 0$, $P(N_{t+h} - N_t = 1) = \lambda_t h + o(h)$ and $P(N_{t+h} - N_t \ge 2) = o(h)$) with rate given by (3), the adoption rate in the deterministic Bass diffusion model [32]. That is,
$$\lambda_t \;=\; m\left(\alpha + \beta\,\frac{N_t}{m}\right)\left(1 - \frac{N_t}{m}\right)e^{-p_t},$$
where $p_t$ is the price posted at time $t$. For convenience of notation, we define the function
$$f(x) \;:=\; (\alpha + \beta x)(1 - x),$$
so that $\lambda_t = m\, f\!\left(\frac{N_t}{m}\right) e^{-p_t}$.
The seller’s total revenue is simply the sum of the prices paid by the customers who arrived before time $T$, under the dynamic pricing algorithm used by the seller. We denote by $p^i$ the price paid by the $i$-th customer, i.e., $p^i = p_{t_i}$. Then, the revenue over $[0, T]$ is given by:
$$\text{Rev}(T) \;=\; \sum_{i=1}^{N_T} p^i.$$
The optimal dynamic pricing policy is defined as the one that maximizes the total expected revenue $E[\text{Rev}(T)]$. We denote by $\text{OPT}$ the total expected revenue under the optimal dynamic pricing policy. Then, regret is defined as the difference between the optimal expected revenue and the seller’s revenue, i.e.,
$$\mathcal{R}(T) \;=\; \text{OPT} - \text{Rev}(T). \qquad (4)$$
In this paper, we aim to provide a dynamic learning and pricing algorithm with a high probability upper bound of $\tilde{O}(m^{2/3})$ on regret, as well as a closely matching lower bound. Instead of analyzing the regret directly, we define a notion of “pseudo-regret,” which measures regret against the optimal revenue under the deterministic Bass model. This is useful because, as we will show later in (6), there is a simple expression for the optimal prices in the deterministic Bass model. This can be leveraged using a fluid model approach, widely used in the analysis of stochastic systems, which targets a deterministic “skeleton” as a stepping stone toward developing insights for policy design and complexity drivers. Later, we show more formally that the pseudo-regret upper bounds the regret measure $\mathcal{R}(T)$ defined earlier, and further is within $\tilde{O}(\sqrt{m})$ of it. This justifies our focus on the pseudo-regret both for the purpose of developing lower bounds, as well as for the analysis and upper bounds on policy performance.
Next, we discuss the expressions for optimal pricing policy and optimal revenue under the deterministic Bass model, and use those to define the notion of pseudo-regret.
Optimal revenue and pricing policy under the deterministic Bass model
Recall (refer to the introduction) that the adoption process under the deterministic Bass model is described as follows: given the current adoption level $x_t$ and price $p_t$ at time $t$, new adoptions are generated deterministically at rate
$$m\,\frac{dx_t}{dt} \;=\; m\,(\alpha + \beta x_t)(1 - x_t)\,e^{-p_t} \;=\; m\,f(x_t)\,e^{-p_t}. \qquad (5)$$
(A subtle but important difference between this deterministic model and the stochastic Bass model introduced earlier is that in the deterministic model the increment in adoptions is fractional. In our stochastic model, by contrast, the number of customer arrivals (or adoptions) is a counting process with discrete increments. This difference will be taken into account later when we compare our pseudo-regret to the original definition of regret.)
The optimal price curve for the deterministic Bass model is then defined as the price trajectory that maximizes the total revenue under the above adoption process. We denote by $\text{OPT}^D$ the total revenue under the optimal price curve.
An analytic expression for the optimal price curve under the deterministic Bass model is in fact known. It can be derived using optimal control theory (see Equation (8) of [17]). For a Bass model with parameters $(\alpha, \beta)$, horizon $T$, and initial adoption level $x_0$, the price at adoption level $x$ in the optimal price curve is given by the following expression:
$$p^*(x) \;=\; 1 + \log\frac{f(x)}{f(\bar{x})} \qquad (6)$$
where $\bar{x}$, the adoption level at the end of horizon $T$ under the optimal price curve, is given by the following equations:
$$f(\bar{x})\,T \;=\; e\,(\bar{x} - x_0), \qquad (7)$$
or, more explicitly,
$$\bar{x} \;=\; \frac{\left(\beta - \alpha - \frac{e}{T}\right) + \sqrt{\left(\beta - \alpha - \frac{e}{T}\right)^2 + 4\beta\left(\alpha + \frac{e\,x_0}{T}\right)}}{2\beta}. \qquad (8)$$
For completeness, a derivation of the above expression for $\bar{x}$ is included in Appendix B. Using the above notation for the optimal price curve, we can write $\text{OPT}^D$, the optimal total revenue, as:
$$\text{OPT}^D \;=\; m\int_{x_0}^{\bar{x}} p^*(u)\,du.$$
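As a quick numerical companion to (6)–(8) (an illustrative sketch under assumed parameter values, not from the original), the code below solves the quadratic (8) for the terminal adoption level $\bar{x}$ and evaluates the price curve (6). Note that the computed terminal price $p^*(\bar{x})$ equals $1$, consistent with the terminal condition (7).

```python
import numpy as np

def f(x, alpha, beta):
    return (alpha + beta * x) * (1.0 - x)

def x_bar(alpha, beta, T, x0=0.0):
    """Terminal adoption level from (8): the positive root of
    f(x) * T = e * (x - x0)."""
    e = np.e
    b = beta - alpha - e / T
    return (b + np.sqrt(b * b + 4 * beta * (alpha + e * x0 / T))) / (2 * beta)

def p_star(x, alpha, beta, T, x0=0.0):
    """Optimal deterministic price curve from (6)."""
    xb = x_bar(alpha, beta, T, x0)
    return 1.0 + np.log(f(x, alpha, beta) / f(xb, alpha, beta))

alpha, beta, T = 0.01, 0.5, 10.0   # illustrative values
xb = x_bar(alpha, beta, T)
print("terminal adoption level:", xb)
print("terminal price (should be 1):", p_star(xb, alpha, beta, T))
```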
Pseudo-Regret
We define pseudo-regret as the difference between $\text{OPT}^D$, the optimal total revenue in the deterministic Bass model, and the algorithm’s total revenue $\text{Rev}(T)$. That is,
$$\mathcal{PR}(T) \;=\; \text{OPT}^D - \text{Rev}(T). \qquad (9)$$
Essentially, pseudo-regret replaces the benchmark of the stochastic optimal revenue $\text{OPT}$ used in the regret definition by the deterministic optimal revenue $\text{OPT}^D$. We show that the deterministic optimal revenue is in fact a stronger benchmark, in the sense that it is always larger than the stochastic optimal revenue. Furthermore, we show that it is within $\tilde{O}(\sqrt{m})$ of the stochastic optimal revenue. To prove this relation between the two benchmarks we demonstrate a concavity property of the deterministic optimal revenue which is crucial for our results. Specifically, we define an expanded notation $V(i, \tau)$ for the deterministic optimal revenue starting from adoption level $i$ with remaining time $\tau$. Note that $\text{OPT}^D = V(0, T)$. Then, we show that $V(i, \tau)$ is concave in $i$ for any $\tau$, and any market parameters $(\alpha, \beta)$. More precisely, we prove the following key lemma.
Lemma 2.1 (Concavity of deterministic optimal revenue).
For any deterministic Bass model, $V(i, \tau)$, defined as the optimal revenue starting from adoption level $i$ with remaining time $\tau$, is concave in $i$, for all $\tau \ge 0$ and all adoption levels $i \in [0, m]$.
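Lemma 2.1 can be illustrated numerically. The sketch below is ours: it assumes the closed-form revenue $V(i, \tau) = m\int_{i/m}^{\bar{x}} p^*(u)\,du$ implied by (6)–(8), together with illustrative parameter values, and checks that discrete second differences of $V$ in the adoption level are nonpositive.

```python
import numpy as np

def f(x, a, b):
    return (a + b * x) * (1.0 - x)

def V(i, tau, a, b, m):
    """Deterministic optimal revenue from adoption count i with time tau,
    computed as m * integral of the price curve (6) from i/m to xbar (8)."""
    e, x = np.e, i / m
    c = b - a - e / tau
    xbar = (c + np.sqrt(c * c + 4 * b * (a + e * x / tau))) / (2 * b)
    u = np.linspace(x, xbar, 2001)
    return m * np.trapz(1.0 + np.log(f(u, a, b) / f(xbar, a, b)), u)

a, b, m, tau = 0.01, 0.5, 10_000, 10.0    # illustrative values
vals = np.array([V(i, tau, a, b, m) for i in range(0, 5001, 250)])
second_diff = vals[2:] - 2 * vals[1:-1] + vals[:-2]
print("max second difference (should be <= 0): %.4f" % second_diff.max())
```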
Using these observations, we can prove the following relation between $\mathcal{R}(T)$ and $\mathcal{PR}(T)$; all proofs can be found in the appendix.
Lemma 2.2 (Pseudo-regret is close to regret).
For any $T$ and any pricing policy,
$$\mathcal{R}(T) \;\le\; \mathcal{PR}(T) \;\le\; \mathcal{R}(T) + \tilde{O}\left(\sqrt{m}\right).$$
Therefore, an upper bound on $\mathcal{PR}(T)$ implies the same upper bound on $\mathcal{R}(T)$. And a lower bound on $\mathcal{PR}(T)$ implies a lower bound on $\mathcal{R}(T)$ within an additive $\tilde{O}(\sqrt{m})$. In the rest of the paper, we therefore derive bounds on the pseudo-regret only.
Notation conventions
Throughout the paper, if a potentially fractional number (like $m^{2/3}$, $\bar{x}m$, etc.) is used as a whole number (for example, as a number of customers or as the boundary of a discrete sum) without a floor or ceiling operation, the reader should assume that it is rounded down to the nearest integer.
3 Algorithm Description
The concavity property of the deterministic optimal revenue and the implied relation between pseudo-regret and regret derived in Lemma 2.2 suggest that the deterministic optimal revenue provides a benchmark that is comparable to the stochastic optimal revenue. Further, this benchmark is more tractable than the stochastic optimum due to the known and relatively simple analytical expressions for the optimal pricing policy, as stated in Section 2. Using these insights, our algorithm is designed to essentially follow (an estimate of) the optimal price curve for the deterministic model at every time. We believe this approach could be applied to other finite horizon MDP problems where such a concavity property holds, which may be of independent interest. We now describe our algorithm.
Algorithm outline
Our algorithm is provided, as input, the market size $m$ and a constant upper bound $\bar{\beta}$ on the parameters $\alpha, \beta$. For many applications, it is reasonable to assume that $\alpha, \beta \le 1$, i.e., $\bar{\beta} = 1$, but we allow for more general settings. The market parameters $(\alpha, \beta)$ are unknown to the algorithm.
The algorithm alternates between using a low “exploratory” price aimed at observing demand in order to improve the estimates of the model parameters, and using (a lower confidence bound on) the deterministic optimal prices for the estimated market parameters. Setting the exploratory price to $0$ suffices for our analysis, but more generally it can be set as any lower bound on the deterministic optimal prices. Using a non-zero exploratory price could be better in practice.
Our algorithm changes the price only on arrival of a new customer, and holds the price constant between two arrivals. The prices are set as follows. The algorithm starts by using the exploratory price for the first $n_0 = m^{2/3}$ customers. The prices and the observed arrival times for these first $n_0$ customers are then used to obtain an estimate $\hat{\alpha}$ of $\alpha$, and a high probability error bound on this estimate. The estimate of the parameter $\alpha$ is not updated in subsequent time steps. The algorithm then proceeds in epochs $k = 1, 2, \ldots$. Let $n_k = 2^k n_0$. Epoch $k$ starts right after customer $n_{k-1}$ arrives and ends either when customer $n_k$ arrives, or when we reach the end of the planning horizon $T$. In the beginning of each epoch $k$, the exploratory price is again offered for the first $n_0$ customers in that epoch. The arrival times observed from these customers are used to update the estimate $\hat{\beta}_k$ of the parameter $\beta$ and its high probability error bound. For the remaining customers in epoch $k$, the algorithm offers a lower confidence bound on the deterministic optimal price computed using the current estimates of $\alpha, \beta$ and their error bounds. (A code sketch of this epoch schedule is given below.)
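The following skeleton (ours; the callback names and the exact exploration/doubling constants are illustrative assumptions, not the paper’s exact Algorithm 1) shows the control flow of the outlined scheme.

```python
def run_pricing(m, T, sample_next_arrival, estimate_alpha, estimate_beta, lcb_price):
    """Skeleton of the epoch-based explore/exploit scheme outlined above.
    `sample_next_arrival(price)` advances the market interface and returns
    the next inter-arrival time, or None once the horizon T is exhausted."""
    n0 = int(m ** (2.0 / 3.0))          # exploration length per phase
    history = []                         # (price, inter-arrival time) pairs

    # Initial exploration: exploratory price for first n0 customers, estimate alpha.
    for _ in range(n0):
        dt = sample_next_arrival(0.0)
        if dt is None:
            return history
        history.append((0.0, dt))
    alpha_hat, alpha_err = estimate_alpha(history[:n0])

    i, n_k = n0, 2 * n0                  # customers seen, end of current epoch
    while i < m:
        explore_until = i + n0           # explore for first n0 customers of epoch
        beta_hat, beta_err = None, None
        while i < min(n_k, m):
            if i < explore_until:
                price = 0.0
            else:
                if beta_hat is None:     # estimate beta from this epoch's exploration
                    beta_hat, beta_err = estimate_beta(history, alpha_hat)
                price = lcb_price(i, alpha_hat, alpha_err, beta_hat, beta_err)
            dt = sample_next_arrival(price)
            if dt is None:
                return history           # horizon T reached
            history.append((price, dt))
            i += 1
        n_k *= 2                         # epochs grow geometrically
    return history
```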
Estimation and price computation details
Since our algorithm fixes prices between two arrivals, the arrival rate of the (Poisson) adoption process is constant between arrivals, which in turn implies that the inter-arrival times are exponential random variables. This greatly simplifies the estimation procedure for $(\alpha, \beta)$, which we describe next.
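Concretely, while the price is held at $p$ and $i$ customers have adopted, the time to the next arrival is exponential with rate $m f(i/m)e^{-p}$. A minimal simulator built on this fact (our illustrative sketch, with assumed parameter values):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_arrivals(alpha, beta, m, T, price_fn):
    """Sample the stochastic Bass process: between arrivals the rate is
    constant, so inter-arrival times are exponential with rate
    m * f(i/m) * exp(-p), where i is the current number of adopters."""
    t, arrivals = 0.0, []
    for i in range(m):
        rate = m * (alpha + beta * i / m) * (1 - i / m) * np.exp(-price_fn(i))
        t += rng.exponential(1.0 / rate)
        if t > T:
            break
        arrivals.append(t)
    return arrivals

# Illustrative run with a constant price of 1.
times = simulate_arrivals(alpha=0.01, beta=0.5, m=1000, T=10.0,
                          price_fn=lambda i: 1.0)
print("adopters by time T:", len(times))
```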
The estimate $\hat{\alpha}$, and the estimate $\hat{\beta}_k$ for every epoch $k$, are calculated using the following equations, which match the observed inter-arrival times to the expected values of the corresponding exponential random variables:
$$\sum_{i=1}^{n_0} (t_i - t_{i-1}) \;=\; \sum_{i=1}^{n_0} \frac{1}{m\,\hat{\alpha}\left(1 - \frac{i-1}{m}\right)}, \qquad (10)$$
$$\sum_{i \in E_k} (t_i - t_{i-1}) \;=\; \sum_{i \in E_k} \frac{1}{m\left(\hat{\alpha} + \hat{\beta}_k\,\frac{i-1}{m}\right)\left(1 - \frac{i-1}{m}\right)}, \qquad (11)$$
where $E_k$ denotes the set of exploration customers in epoch $k$. Also, we define in (12) the corresponding high-probability error bounds $\ell_\alpha$ (for $\hat{\alpha}$) and $\ell_{\beta,k}$ (for $\hat{\beta}_k$).
Then, using concentration results for exponential random variables, we can show that the errors in the estimates $\hat{\alpha}$ and $\hat{\beta}_k$ are bounded by $\ell_\alpha$ and $\ell_{\beta,k}$ respectively, with high probability. Specifically, we prove the following result. The proof is in Appendix D.
Lemma 3.1.
Assume $m$ is sufficiently large. Then with probability $1 - \delta$, $|\hat{\alpha} - \alpha| \le \ell_\alpha$. And for any epoch $k$, with probability $1 - \delta$, $|\hat{\beta}_k - \beta| \le \ell_{\beta,k}$.
Note that only the estimate and error bound ($\hat{\beta}_k$ and $\ell_{\beta,k}$) for the parameter $\beta$ are updated in every epoch. The estimate and error bound ($\hat{\alpha}$ and $\ell_\alpha$) for $\alpha$ are computed once in the beginning and remain fixed throughout the algorithm. Given $\hat{\alpha}, \hat{\beta}_k$ and $\ell_\alpha, \ell_{\beta,k}$, we define the price to be used by the algorithm in epoch $k$ as follows. For any adoption level $i$,
$$p_k(i) \;=\; \mathrm{clamp}_{[0,\,\bar{p}]}\left( p^*_{\hat{\alpha},\hat{\beta}_k}(i) \;-\; \lambda_\alpha\,\ell_\alpha \;-\; \lambda_\beta\,\ell_{\beta,k} \right). \qquad (13)$$
Here $p^*_{\hat{\alpha},\hat{\beta}_k}(\cdot)$ is the optimal deterministic price curve as given by (6), computed at the estimated parameters, and $\lambda_\alpha, \lambda_\beta$ are the Lipschitz constants provided by Lemma 4.1 below. In the above, we clamped the price to the range $[0, \bar{p}]$. That is, if the computed price is less than $0$ we set $p_k(i)$ to $0$; and if it is above $\bar{p}$, which is an upper bound on the optimal prices (proven later in Lemma 4.4), we set $p_k(i)$ equal to this upper bound.
Later, in Lemma 4.1, we show that the quantities $\lambda_\alpha, \lambda_\beta$ play the role of Lipschitz constants for the optimal price curve: they provide high probability upper bounds on the derivatives of the optimal price curve with respect to $\alpha$ and $\beta$ respectively. Consequently (in Corollary 4.1), we can show that with high probability the price defined in (13) is lower than the corresponding deterministic optimal price. The reason that the algorithm uses a lower confidence bound on the optimal price is that we want to acquire at least as many customers in time $T$ as the optimal trajectory. The intuition here is that losing customers (and the entire revenue associated with those customers) can cost a lot more than losing a little bit of revenue from each customer.
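The following sketch illustrates the price computation in (13) (ours; `lam_a`, `lam_b`, and `p_max` are placeholders for the Lipschitz constants of Lemma 4.1 and the price bound of Lemma 4.4, whose exact expressions we do not reproduce here). It evaluates the price curve at the current state and remaining time, subtracts the error terms, and clamps.

```python
import numpy as np

def lcb_price(i, m, T_rem, a_hat, a_err, b_hat, b_err, lam_a, lam_b, p_max):
    """Lower confidence bound on the deterministic optimal price, as in (13):
    evaluate the price curve at the estimates, subtract Lipschitz-weighted
    error bounds, and clamp to [0, p_max]."""
    x, e = i / m, np.e
    c = b_hat - a_hat - e / T_rem
    xbar = (c + np.sqrt(c * c + 4 * b_hat * (a_hat + e * x / T_rem))) / (2 * b_hat)
    f = lambda u: (a_hat + b_hat * u) * (1 - u)
    p = 1 + np.log(f(x) / f(xbar))          # estimated optimal price
    p -= lam_a * a_err + lam_b * b_err      # lower confidence bound
    return min(max(p, 0.0), p_max)          # clamp to [0, p_max]
```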
Our algorithm is described in detail as Algorithm 1.
4 Regret Upper Bound
The main result of this section is the following upper bound on the pseudo-regret of the algorithm proposed in the previous section. Since the pseudo-regret upper bounds the regret (refer to Lemma 2.2), it directly implies the same upper bound on $\mathcal{R}(T)$.
Theorem 4.1 (Regret upper bound).
For any market with parameters $(\alpha, \beta)$, market size $m$, and time horizon $T$, Algorithm 1 achieves the following regret bound with probability $1 - \delta$:
$$\mathcal{PR}(T) \;\le\; \tilde{O}\left(m^{2/3}\right),$$
where $\tilde{O}(\cdot)$ hides factors polylogarithmic in $m$, $T$, and $1/\delta$.
Proof Intuition
We first give an intuitive explanation for why our algorithm works well. As mentioned earlier in the introductory sections, there is a simple closed form expression for the optimal prices in the deterministic model for any $(\alpha, \beta)$ (see (6)–(8)). Moreover, the definition of pseudo-regret and Lemma 2.2 allow us to replace the stochastic optimal revenue $\text{OPT}$ with the deterministic optimal revenue $\text{OPT}^D$. Our algorithmic strategy is then to follow (an estimate of) the deterministic optimal price trajectory, and show that the resulting revenue is close to the deterministic optimal revenue with high probability.
To prove this, we show that the prices used by our algorithm are set so that they lower bound the deterministic optimal prices with high probability. Intuitively, using a lower price ensures that the algorithm sees at least as many customer adoptions in horizon $T$ as the optimal trajectory, so that the gap in revenue can be bounded simply by the gap in prices paid by those customers. The final piece of the puzzle is to show that we can learn or estimate $(\alpha, \beta)$ at a fast enough rate, so that the estimated prices get increasingly close to the optimal prices and we do not lose too much revenue from learning.
Proof Outline
Step 1 (Bounding the estimation errors)
In Lemma 3.1 we provided a high probability upper bound on the estimation errors of $\hat{\alpha}$ and $\hat{\beta}_k$ in each epoch of the algorithm. Using these error bounds and the definition of the price in (13), we can show that the prices offered by the algorithm are close to optimal and lower bound the corresponding optimal prices with high probability. Specifically, we prove the following result.
Lemma 4.1 (Error bounds for estimated prices).
Given any market parameters $(\alpha, \beta)$ and estimates $(\hat{\alpha}, \hat{\beta})$ that satisfy $|\hat{\alpha} - \alpha| \le \ell_\alpha$ and $|\hat{\beta} - \beta| \le \ell_\beta$, for every adoption level $i$,
$$\left|\, p^*_{\hat{\alpha}, \hat{\beta}}(i) - p^*_{\alpha, \beta}(i) \,\right| \;\le\; \lambda_\alpha\,\ell_\alpha + \lambda_\beta\,\ell_\beta,$$
where $\lambda_\alpha, \lambda_\beta$ are as defined in (13).
From the expressions of $\ell_\alpha$ (the error bound for $\hat{\alpha}$) vs. $\ell_{\beta,k}$ (the error bound for $\hat{\beta}_k$ in epoch $k$) in (12), observe that $\ell_{\beta,k}$ carries an extra factor that scales inversely with the current adoption level. Therefore, the estimation of $\beta$ is the bottleneck here. This extra factor arises because the imitation factor $\beta$ is multiplied by the current adoption level in the definition of the Bass model (see (2)), which means that the estimation error on $\beta$ is likely to be large when the adoption level is low. This is why the algorithm needs to keep updating the estimate of $\beta$ in each epoch, but not that of $\alpha$.
The above lemma and the definition of the price in (13) immediately imply the following (Corollary 4.1): with high probability, the prices offered by the algorithm lower bound the corresponding deterministic optimal prices.
Step 2 (Bounding the revenue loss due to lower price).
Using the fact that there are $n_{k-1}$ customers in epoch $k$, and the price difference bound from Lemma 4.1 in Step 1, we can show that we lose at most $\tilde{O}(m^{2/3})$ revenue per epoch. This per-epoch loss bound is relative to the optimal revenue earned from the same customers, assuming that the stochastic trajectory is given as much time as needed to reach the same number of customers. More precisely, let $\text{Rev}_k$ denote the algorithm’s revenue from the customers in an epoch $k$ completed by the algorithm, and let $\text{Rev}^*_k$ denote the potential revenue from those customers if they paid the deterministic optimal price instead. Then, we prove the following bound.
Lemma 4.2.
For any epoch $k$ completed by the algorithm, with probability $1 - \delta$,
$$\text{Rev}^*_k - \text{Rev}_k \;\le\; \tilde{O}\left(m^{2/3}\right).$$
Step 3 (Bounding the revenue loss due to fewer adoptions).
In the previous step, we upper bounded the potential revenue loss for the customers that arrive in any epoch of the algorithm. However, it is possible that the algorithm reaches the end of the time horizon early and never even gets to observe some customers, i.e., its final adoption count falls short of the optimal one. In that case the algorithm would incur an additional regret due to fewer adoptions. Therefore, we need to show that our total number of adoptions in time $T$ cannot be much lower than that in the optimal trajectory. To show this, we use the observation made in Step 1 that in each epoch the algorithm offered a lower confidence bound on the optimal prices (note that the exploratory price is always a lower bound on the optimal prices). Due to the use of lower prices, we can show that with high probability our final number of adoptions is at most $\tilde{O}(m^{2/3})$ below the optimal number of adoptions in the deterministic Bass model. Further, we can prove an upper bound on the optimal prices, which allows us to easily bound the revenue loss that may result from this small gap in the number of adoptions. Lemma 4.3 and Lemma 4.4 make these results precise.
Lemma 4.3.
If the seller follows Algorithm 1, then with probability at least $1 - \delta$, the number of adoptions at the end of the time horizon is lower bounded as:
$$N_T \;\ge\; \bar{x}\,m \;-\; \tilde{O}\left(m^{2/3}\right).$$
Lemma 4.4 (Upper bound on optimal prices).
All prices in the optimal price curve for the deterministic Bass model are upper bounded as:
$$p^*(x) \;\le\; O\big(\log\big((\alpha + \beta)\,T\big)\big) \quad \text{for all } x.$$
These two results combined tell us that, compared to the optimal, we lose at most $\tilde{O}(m^{2/3})$ revenue due to fewer adoptions. The dominant term in the regret comes from the seller losing roughly $\tilde{O}(m^{-1/3})$ revenue on each customer, which led to the $\tilde{O}(m^{2/3})$ revenue loss per epoch in Step 2.
Step 4 (Putting it all together).
Finally, we put the previous steps together to prove Theorem 4.1. Since the number of customers in each epoch grows geometrically and there are at most $m$ customers in total, the number of epochs is bounded by $O(\log m)$. By Step 2, the algorithm loses at most $\tilde{O}(m^{2/3})$ revenue on adoptions in each epoch. By Step 3, it loses at most $\tilde{O}(m^{2/3})$ revenue due to missed adoptions. The total regret is therefore bounded by $\tilde{O}(m^{2/3})$.
5 Regret Lower Bound
We prove a lower bound on the regret of any dynamic pricing and learning algorithm under the following mild assumptions on algorithm design.
Assumption 1.
The algorithm can change price only on arrival of a new customer. The price is held constant between two arrivals.
Assumption 2.
Given a planning horizon $T$, the price offered by the algorithm at any time is upper bounded by $g(T)$, for some function $g$ of $T$.
The above assumptions are indeed satisfied by Algorithm 1, since it changes prices only on arrival of a new customer, and the prices offered are clamped to the range $[0, \bar{p}]$ (refer to (13)), where $\bar{p}$ is an upper bound on the optimal prices of the form allowed by Assumption 2. These assumptions are also satisfied by the optimal dynamic pricing policy for the deterministic Bass model, since the optimal prices are bounded as in Lemma 4.4. Note that since customer arrivals are continuous in the deterministic Bass model, Assumption 1 is vacuous in that setting.
We argue that these assumptions do not significantly handicap an algorithm and preserve the difficulty of the problem. Assuming an upper bound on the price is common practice in the dynamic pricing literature. Indeed, unlike most existing literature, which assumes a constant upper bound, Assumption 2 allows the price to potentially grow with the planning horizon. Intuitively, given enough time to sell the product, the seller should be able to increase prices in exchange for a slower adoption rate. (In Lemma F.1 in Appendix F, we show that if a constant upper bound is required on the price, then the problem becomes trivial for any sufficiently large $T$.) Such dynamics are observed in the optimal dynamic pricing policy for the deterministic Bass model, where the price can grow with the time horizon; however, the optimal prices are still uniformly upper bounded as in Lemma 4.4.
Furthermore, in the proof of Lemma 2.2 (specifically, refer to Lemma C.2 in Appendix C), we show that there exist pricing policies in the stochastic Bass model that satisfy both of the above assumptions and achieve an expected revenue that is at most an additive $\tilde{O}(\sqrt{m})$ factor away from the deterministic optimal revenue $\text{OPT}^D$. Since the lower bound provided in this section is of order $m^{2/3}$, this indicates that removing these assumptions from algorithm design is unlikely to provide an advantage significant enough to overcome the lower bound derived here.
Our main result in this section is the following regret lower bound on any algorithm under the above assumptions.
Theorem 5.1 (Regret lower bound).
Fix any sufficiently large market size $m$ and a horizon $T$ of constant order. Then, given any pricing algorithm satisfying Assumptions 1 and 2, there exist Bass model parameters $(\alpha, \beta)$ such that the expected regret of the algorithm in this market has the following lower bound:
$$E[\mathcal{PR}(T)] \;\ge\; \Omega\left(m^{2/3}\right).$$
Here the expectation is taken with respect to the stochasticity in the Bass model as well as any randomness in the algorithm.
Note that given the relation between pseudo-regret and regret in Lemma 2.2, the above theorem directly implies an $\Omega(m^{2/3})$ lower bound on the regret $\mathcal{R}(T)$ as well.
Lower bound implications
Theorem 5.1 highlights the regime of the horizon $T$ where this learning problem is most challenging. In this problem, we observe that if $T$ is large, the seller can simply offer a high price for the entire time horizon and still capture most of the market, making the problem trivial. Specifically, for $T$ above a threshold growing only logarithmically in $m$, under an additional assumption that there is a constant upper bound on the price, a trivial upper bound on regret can be achieved by offering the maximum price at all times (see Lemma F.1). On the other hand, when $T$ is very small (decaying with $m$), achieving a regret sublinear in $m$ is trivial for any algorithm. This is because from (7) we have that in this case the optimal number of adoptions $\bar{x}m$ is itself sublinear in $m$. Furthermore, from Lemma 4.4 we know that all prices in the optimal curve are bounded in this case. This means that the optimal revenue is itself sublinear in $m$. Intuitively, for such a small $T$, the word-of-mouth effect never comes into play (i.e., the $\beta F(t)$ term in the Bass model is dominated by $\alpha$), making the problem uninteresting. Our lower bound result therefore pinpoints the order of $T$, constant in $m$, where the difficult and interesting instances of this problem come from.
In the rest of this section, we describe the proof of Theorem 5.1. We first provide an intuition for our lower bound and then go over the proof outline.
Proof Intuition
We start by showing that the pseudo-regret of any algorithm can be lower bounded in terms of its cumulative pricing errors. Therefore, in order to achieve low regret, any algorithm must be able to estimate optimal prices accurately.
Next, we observe that in this problem, the main difficulty for any algorithm comes from the difficulty of estimating the market parameter $\beta$. Since $\beta$’s observed influence on the arrival rate of customers is proportional to the current adoption level, one cannot estimate $\beta$ accurately when the adoption level is small. This makes intuitive sense because when the number of adopters is small, we do not expect to be able to measure the word-of-mouth effect accurately. In fact, we demonstrate that before the adoption level exceeds order $m^{2/3}$, no algorithm can distinguish two markets whose imitation parameters differ by a constant factor. Further, we can show that in certain problem instances, the optimal prices for the first $m^{2/3}$ customers are very sensitive to the value of $\beta$. Therefore, if an algorithm’s estimate of $\beta$ is not accurate, it cannot accurately compute the optimal price for these customers. This presents an impossible challenge for any algorithm: it needs an accurate estimate of $\beta$ in order to compute an accurate enough optimal price for the first $m^{2/3}$ customers, but it cannot possibly obtain such an accurate estimate while the adoption level is that low; thus, it must incur pricing errors, resulting in the given lower bound on regret.
Proof Outline
Step 1.
First, we show that the pseudo-regret of an algorithm can be lower bounded in terms of the cumulative pricing errors made by the algorithm. Note that this result is not a priori obvious because, as we have discussed earlier, prices have long-term effects on the adoption curve: the immediate loss of revenue in the current round from offering too low a price might be offset by the fact that we saved some time for future rounds (a lower price means a faster arrival rate). On the other hand, if we offer a price that is higher than the optimal price, the resulting delay in customer arrivals (a higher price means a slower arrival rate) may leave less remaining time and fewer adoptions, which could be more harmful than the immediate extra revenue. The result proven here is crucial for precisely quantifying these tradeoffs and lower bounding the regret in terms of pricing errors.
To obtain this result, we first lower bound the impact of offering a suboptimal price at any time on the remaining value in the deterministic Bass model. Given any current adoption level $i$ and remaining time $\tau$, the impact or ‘disadvantage’ of a price $p$ is defined as the total decrease in value over the remaining time when $p$ is offered for an infinitesimal time and then the optimal policy is followed. That is,
$$\Delta(p;\, i, \tau) \;=\; \frac{\partial V}{\partial \tau}(i, \tau) \;-\; \lambda_p\left(p + \frac{\partial V}{\partial i}(i, \tau)\right), \qquad \lambda_p := m\,f\!\left(\tfrac{i}{m}\right)e^{-p}.$$
Note that by the HJB equation (20), $\Delta(p^*; i, \tau) = 0$ and $\Delta(p; i, \tau) \ge 0$ for every $p$.
We prove the following lemma lower bounding this quantity.
Lemma 5.1.
At any adoption level $i$ and remaining time $\tau$, the disadvantage of offering a suboptimal price $p$ in the deterministic Bass model is lower bounded as:
$$\Delta(p;\, i, \tau) \;\ge\; c\,\lambda_p\,\big(p - p^*(i, \tau)\big)^2,$$
for an absolute constant $c > 0$ (for prices in the range permitted by Assumption 2), where $p^*(i, \tau)$ denotes the optimal price at $(i, \tau)$.
Now recall that pseudo-regret is defined as the total difference between the revenue of the algorithm, which is potentially offering suboptimal prices, and the deterministic optimal revenue $\text{OPT}^D$. We use the above result to quantify the impact of offering suboptimal prices for the first $n = \Theta(m^{2/3})$ customers, along with a bound on the difference between the stochastic and deterministic optimal revenues, to obtain a lower bound on the pseudo-regret of the form
$$E[\mathcal{PR}(T)] \;\ge\; c\, E\left[\sum_{i=1}^{n} \big(p^i - \tilde{p}^i\big)^2\right] \;-\; \tilde{O}\left(\sqrt{m}\right).$$
Here, $p^i$ denotes the algorithm’s price for the $i$-th customer along the trajectory obtained on using the algorithm’s prices in the deterministic Bass model, and $\tilde{p}^i$ denotes the price that would minimize the disadvantage at that point of the trajectory. A more precise statement of this result is in Lemma E.2 in Appendix E.
The remaining proof focuses on lower bounding the cumulative pricing errors that any algorithm must make for the first $\Theta(m^{2/3})$ customers.
Step 2.
Consider two markets with parameters $(\alpha, \beta)$ and $(\alpha, c\beta)$, for some constant $c > 1$. We show that for the first $\Theta(m^{2/3})$ customers, any pricing algorithm will be “wrong” with a constant probability. Here, by being wrong we mean that the algorithm will set a price that is closer to the optimal price of the other market. Lemma E.3 and Corollary E.1 formalize this idea. The proof is based on a standard information-theoretic analysis using KL-divergence.
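To see why roughly $m^{2/3}$ observations are needed, note that the KL divergence between two exponential inter-arrival distributions with rates $\lambda_1, \lambda_2$ is $\log(\lambda_1/\lambda_2) + \lambda_2/\lambda_1 - 1$, which behaves like $\frac{1}{2}(1 - \lambda_2/\lambda_1)^2$ for nearby rates. The following sketch (ours, with assumed parameter values) accumulates this divergence customer by customer and reports when the two markets become statistically distinguishable:

```python
import numpy as np

def kl_exp(l1, l2):
    """KL divergence between Exponential(l1) and Exponential(l2)."""
    return np.log(l1 / l2) + l2 / l1 - 1.0

def customers_to_distinguish(alpha, beta1, beta2, m, price=1.0, kl_target=1.0):
    """Number of adoptions before the cumulative KL divergence between
    the two markets' observation sequences reaches kl_target."""
    total = 0.0
    for i in range(m):
        l1 = m * (alpha + beta1 * i / m) * (1 - i / m) * np.exp(-price)
        l2 = m * (alpha + beta2 * i / m) * (1 - i / m) * np.exp(-price)
        total += kl_exp(l1, l2)
        if total >= kl_target:
            return i + 1
    return m

for m in [10**4, 10**5, 10**6]:
    n = customers_to_distinguish(alpha=1.0, beta1=0.5, beta2=1.0, m=m)
    print("m = %8d: ~%d customers (m^(2/3) = %d)" % (m, n, int(m ** (2 / 3))))
```

Since the per-customer divergence scales as $(i/m)^2$, the cumulative divergence reaches a constant only after $n = \Theta(m^{2/3})$ adoptions, matching the claim above.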
Step 3.
Next, we show that in appropriately chosen instances the optimal prices of the two markets differ substantially for the first $\Theta(m^{2/3})$ customers, so that being “wrong” in the sense of Step 2 translates into a large per-customer pricing error.
Step 4.
Finally, we put together the observations made in the previous three steps to prove Theorem 5.1. We have shown that with constant probability any algorithm will make a pricing mistake for the first $\Theta(m^{2/3})$ customers [Step 2], and that this mistake will be large [Step 3]. We also have a lower bound on the pseudo-regret in terms of the total (squared) pricing errors made over the first $\Theta(m^{2/3})$ customers [Step 1]. Combining these observations with an appropriately chosen pair of market instances gives us that the pseudo-regret is lower bounded by $\Omega(m^{2/3})$.
All the missing details of this proof can be found in Appendix E.
6 Conclusions
In this paper we investigated a novel formulation of dynamic pricing and learning, with a non-stationary demand process governed by an unknown stochastic Bass model. Deriving an optimal policy in this stochastic model is challenging even when the model parameters are known. Here, we presented an online algorithm that learns to price from past observations without a priori knowledge of the model parameters. A key insight that we derive and utilize in our algorithm design is the concavity of the optimal value in the deterministic Bass model setting. Using this concavity property, we can show that the optimal value in the deterministic Bass model is always higher than in the stochastic model, and therefore can be used as a stronger benchmark to compete with. Based on this insight, the main algorithmic idea is to follow the well-known optimal price curve for the deterministic model but with estimated model parameters substituted in place of their true values.
Our main technical result is an upper bound of $\tilde{O}(m^{2/3})$ on the regret of our algorithm in a market of size $m$, along with a matching lower bound of $\Omega(m^{2/3})$ under mild restrictions on algorithm design. Thus, our algorithm is close to the best performance achievable by any algorithm for this problem. The derivation of our lower bound is especially involved, and requires deriving novel dynamic-programming based inequalities. These allow for lower bounding the loss in long-term revenue in terms of instantaneous pricing errors made by any non-anticipating algorithm.
An interesting and perhaps surprising aspect of our upper and lower bounds is the role of the horizon $T$ vs. the market size $m$. Our upper bound depends sublinearly on $m$ but only logarithmically on the horizon $T$. And in fact our lower bound indicates that for any fixed $\alpha, \beta$, the most interesting (and challenging) instances of this problem are characterized by a $T$ of constant order and large $m$. This insight highlights the distinct nature of pricing under stateful models like the Bass model when compared to independent demand models and multi-armed bandit based formulations, where asymptotics with respect to $T$ form the main focus of the analysis. Interesting directions for future research include the investigation of other stateful demand models where the concavity property and other dynamic programming based insights derived here may be useful.
References
- [1] Shipra Agrawal and Randy Jia “Optimistic posterior sampling for reinforcement learning: worst-case regret bounds” In Advances in Neural Information Processing Systems 30, 2017
- [2] Ashwinkumar Badanidiyuru, Robert Kleinberg and Aleksandrs Slivkins “Bandits with Knapsacks” In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science IEEE, 2013
- [3] Frank M. Bass “A New Product Growth for Model Consumer Durables” In Management Science 15.5, 1969, pp. 215–227
- [4] Frank M. Bass “A New Product Growth for Model Consumer Durables” In Management Science 50, 2004, pp. 1825–1832
- [5] Dimitris Bertsimas and José Niño-Mora “Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic” In Operations Research 48.1, 2000, pp. 80–90
- [6] Omar Besbes, Yonatan Gur and Assaf Zeevi “Optimal Exploration-Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards” In Stochastic Systems 9.4, 2019, pp. 319–337
- [7] Omar Besbes and Assaf Zeevi “Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms” In Operations Research 57.6 INFORMS, 2009, pp. 1407–1420
- [8] Omar Besbes and Assaf Zeevi “On the Minimax Complexity of Pricing in a Changing Environment” In Operations Research 59.1, 2011, pp. 66–79
- [9] Arnoud V. den Boer “Dynamic pricing and learning: Historical origins, current research, and new directions” In Surveys in Operations Research and Management Science 20.1, 2015, pp. 1–18
- [10] Arnoud V. den Boer “Tracking the market: Dynamic pricing and learning in a changing environment” In European Journal of Operational Research 247.3, 2015, pp. 914–927
- [11] Yang Cao, Zheng Wen, Branislav Kveton and Yao Xie “Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit” In Proceedings of Machine Learning Research 89 PMLR, 2019, pp. 418–427
- [12] Patrick Chareka, Ottilia Chareka and Sarah Kennedy “Locally sub-Gaussian random variables and the strong law of large numbers” In Atlantic Electronic Journal of Mathematics 1, 2006
- [13] Yiwei Chen, Zheng Wen and Yao Xie “Dynamic Pricing in an Evolving and Unknown Marketplace” In Available at SSRN, 2019
- [14] Judith A. Chevalier and Dina Mayzlin “The Effect of Word of Mouth on Sales: Online Book Reviews” In Journal of Marketing Research 43.3, 2006, pp. 345–354
- [15] Chrysanthos Dellarocas “The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms” In Management Science 49.10, 2003, pp. 1407–1424
- [16] Chrysanthos Dellarocas, Xiaoquan (Michael) Zhang and Neveen F. Awad “Exploring the value of online product reviews in forecasting sales: The case of motion pictures” In Journal of Interactive Marketing 21.4, 2007, pp. 23–45
- [17] Robert J. Dolan and Abel P. Jeuland “Experience Curves and Dynamic Demand Models: Implications for Optimal Pricing Strategies” In Journal of Marketing 45.1 American Marketing Association, 1981, pp. 52–62
- [18] Kenji Doya, Shin Ishii, Alexandre Pouget and Rajesh PN Rao “Bayesian brain: Probabilistic approaches to neural coding” MIT press, 2007
- [19] Robert East, Kathy Hammond and Wendy Lomax “Measuring the impact of positive and negative word of mouth on brand purchase probability” In International Journal of Research in Marketing 25.3, 2008, pp. 215–224
- [20] Zhi-Ping Fan, Yu-Jie Che and Zhen-Yu Chen “Product sales forecasting using online reviews and historical sales data: A method combining the Bass model and sentiment analysis” In Journal of Business Research 74, 2017, pp. 90–100
- [21] Aurélien Garivier and Eric Moulines “On Upper-Confidence Bound Policies for Switching Bandit Problems” In Algorithmic Learning Theory Springer Berlin Heidelberg, 2011, pp. 174–188
- [22] Johan Grasman and Marcel Kornelis “Forecasting product sales with a stochastic Bass model” In Journal of Mathematics in Industry 9, 2, 2019
- [23] Raghuram Iyengar, Christophe Van den Bulte and Thomas W. Valente “Opinion Leadership and Social Contagion in New Product Diffusion” In Marketing Science 30.2, 2011, pp. 195–212
- [24] Thomas Jaksch, Ronald Ortner and Peter Auer “Near-optimal Regret Bounds for Reinforcement Learning.” In Journal of Machine Learning Research 11.4, 2010
- [25] N. Bora Keskin and Assaf Zeevi “Chasing Demand: Learning and Earning in a Changing Environment” In Mathematics of Operations Research 42.2, 2017, pp. 277–307
- [26] Robert Kleinberg and Tom Leighton “The Value of Knowing a Demand Curve: Bounds on Regret for Online Posted-Price Auctions” In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, FOCS 03 USA: IEEE Computer Society, 2003, pp. 594
- [27] Tor Lattimore and Csaba Szepesvári “Bandit algorithms” Cambridge University Press, 2020
- [28] Hakyeon Lee, Sang Gook Kim, Hyun Park and Pilsung Kang “Pre-launch new product demand forecasting using the Bass model: A statistical and machine learning-based approach” In Technological Forecasting and Social Change 86, 2014, pp. 49–64
- [29] Donald Lehmann, Oded Netzer and Olivier Toubia “The Future of Quantitative Marketing: Results of a Survey” In Customer Needs and Solutions 2, 2015, pp. 5–18
- [30] Haipeng Luo, Chen-Yu Wei, Alekh Agarwal and John Langford “Efficient Contextual Bandits in Non-stationary Worlds” In Proceedings of the 31st Conference On Learning Theory 75 PMLR, 2018, pp. 1739–1776
- [31] Shun-Chen Niu “A stochastic formulation of the Bass model of new-product diffusion” In Mathematical Problems in Engineering 8, 2002
- [32] Bruce Robinson and Chet Lakhani “Dynamic Price Models for New-Product Planning” In Management Science 21.10, 1975, pp. 1113–1122
- [33] Yoan Russac, Claire Vernade and Olivier Cappé “Weighted Linear Bandits for Non-Stationary Environments” In Advances in Neural Information Processing Systems 32 Curran Associates, Inc., 2019
- [34] Sylvain Senecal and Jacques Nantel “The influence of online product recommendations on consumers’ online choices” In Journal of Retailing 80.2, 2004, pp. 159 –169
- [35] Cem Tekin and Mingyan Liu “Online Learning of Rested and Restless Bandits” In IEEE Transactions on Information Theory 58.8, Institute of Electrical and Electronics Engineers (IEEE), 2012, pp. 5588–5611
- [36] Olivier Toubia, Jacob Goldenberg and Rosanna Garcia “Improving penetration forecasts using social interactions data” In Management science 60.12, 2014, pp. 3049–3066
- [37] Lin Yang and Mengdi Wang “Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound” In International Conference on Machine Learning, 2020, pp. 10746–10756 PMLR
- [38] Peng Yin, Guowei Dou, Xudong Lin and Liangliang Liu “A hybrid method for forecasting new product sales based on fuzzy clustering and deep learning” In Kybernetes ahead-of-print, 2020
- [39] Mengzhenyu Zhang, Hyun-Soo Ahn and Joline Uichanco “Data-Driven Pricing for a New Product” In Available at SSRN 3545574, 2020
Appendix A A Concentration Result on Exponential Random Variables
Before moving on to the next sections, we state a concentration result that is used in many of the remaining proofs.
Lemma A.1.
Let $X_1, \ldots, X_n$ be a sequence of random variables with filtration $\{\mathcal{F}_i\}$ such that, conditioned on $\mathcal{F}_{i-1}$, $X_i$ is an exponential random variable with rate $\lambda_i$ measurable with respect to $\mathcal{F}_{i-1}$. Suppose that there exists $\rho < \infty$ such that $\max_i \lambda_i \le \rho \min_i \lambda_i$ almost surely. Then for any $\epsilon \in (0, 1)$, with probability at least $1 - 2e^{-c\,n\epsilon^2}$ for a constant $c > 0$ depending on $\rho$,
$$\left| \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \frac{1}{\lambda_i} \right| \;\le\; \epsilon \sum_{i=1}^{n} \frac{1}{\lambda_i}.$$
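A quick Monte Carlo illustration of this type of concentration (ours; the rates and the deviation measure are illustrative, not the exact quantities of Lemma A.1):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 500, 10_000
rates = 1.0 + rng.random(n)            # heterogeneous rates in [1, 2]
mean_total = (1.0 / rates).sum()       # E[sum of X_i]

# Empirical distribution of the relative deviation of sum(X_i) from its mean.
sums = rng.exponential(1.0 / rates, size=(trials, n)).sum(axis=1)
rel_dev = np.abs(sums - mean_total) / mean_total
for eps in [0.05, 0.10, 0.15]:
    print("P(|dev| >= %.2f) ~= %.4f" % (eps, (rel_dev >= eps).mean()))
```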
Appendix B Some Properties of the Deterministic Optimal Price Curve
B.1 Optimal pricing policy expression
The expressions in (6), (7) and (8) provided the optimal price curve and the total number of adoptions when the optimal pricing policy is followed from an initial adoption level $x_0$ at time $0$ to the end of the time horizon $T$. Here we derive a more general expression for the optimal pricing policy that gives the optimal price at any current adoption level $x$ and remaining planning horizon $\tau$ (irrespective of what pricing policy was followed, and for how long, to reach the adoption level $x$). Also, we derive an expression for $\bar{x}(x, \tau)$, the adoption level at the end of the horizon if the optimal pricing policy is followed for time $\tau$ starting from adoption level $x$. These expressions will be especially useful in our lower bound derivations.
Note that under this expanded notation, $\bar{x} = \bar{x}(x_0, T)$. We sometimes also use the notation $\bar{x}_{\alpha,\beta}(x, \tau)$ to emphasize the dependence of $\bar{x}$ on the market parameters. Note that by this definition, $\bar{x}(x_t, T - t) = \bar{x}(x_0, T)$ if $x_t$ is the adoption level reached at time $t$ on following the optimal price trajectory from time $0$.
The optimal price to offer at any given adoption level $x$ and remaining time $\tau$ can be derived using optimal control theory (see Equation (8) of [17]), and is given by the following pricing policy. Given adoption level $x$ and remaining time $\tau$, the optimal price is given by
$$p^*(x, \tau) \;=\; 1 + \log\frac{f(x)}{f\big(\bar{x}(x, \tau)\big)} \qquad (15)$$
When the value of $\tau$ is clear from the context, we sometimes drop the dependence on $\tau$, and use the shorter notations $p^*(x)$ and $\bar{x}(x)$ instead of $p^*(x, \tau)$ and $\bar{x}(x, \tau)$ respectively.
To see the connection (and distinction) between $p^*(x)$ in (6) and $p^*(x, \tau)$, note that (6) is the price trajectory if the optimal policy is followed from time $0$ to the end of the horizon $T$. That is, if $x_t$ denotes the adoption level and $p_t$ the price at time $t$ on following the optimal pricing policy from time $0$, then $p_t = p^*(x_t) = p^*(x_t, T - t)$.
Now consider the adoption process on $[0, T]$ starting from an initial adoption level $x_0$ at time $0$, and then following the optimal pricing policy. Again, $x_t$ denotes the adoption level at time $t$ in this process. Plugging $p^*(x_t)$ back into (5), it is easy to derive that
$$\frac{dx_t}{dt} \;=\; f(x_t)\,e^{-p^*(x_t)} \;=\; \frac{f(\bar{x})}{e}. \qquad (16)$$
This means that under the optimal policy, the rate of adoption is constant. We can integrate the above from $0$ to $\tau$, and also solve the resulting quadratic equation for $\bar{x}(x_0, \tau)$, to compute the final adoption level under the optimal pricing policy when starting from an initial adoption level $x_0$ at time $0$:
$$f\big(\bar{x}(x_0, \tau)\big)\,\tau \;=\; e\,\big(\bar{x}(x_0, \tau) - x_0\big), \qquad (17)$$
$$\bar{x}(x_0, \tau) \;=\; \frac{\left(\beta - \alpha - \frac{e}{\tau}\right) + \sqrt{\left(\beta - \alpha - \frac{e}{\tau}\right)^2 + 4\beta\left(\alpha + \frac{e\,x_0}{\tau}\right)}}{2\beta}. \qquad (18)$$
Note that on plugging $\tau = T$ in (18), we obtain the expression for $\bar{x}$ in (8).
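A numerical check of (16)–(18) (our illustrative sketch with assumed parameters): integrating (5) under the feedback price $p^*(x_t, T - t)$ from (15) should produce an essentially constant adoption rate, with the final level matching (18).

```python
import numpy as np

def f(x, a, b):
    return (a + b * x) * (1.0 - x)

def xbar(x, tau, a, b):
    """Terminal adoption level from (18)."""
    e = np.e
    c = b - a - e / tau
    return (c + np.sqrt(c * c + 4 * b * (a + e * x / tau))) / (2 * b)

a, b, T, dt = 0.01, 0.5, 10.0, 1e-4
x, rates = 0.0, []
for k in range(int(T / dt) - 1):
    tau = T - k * dt
    p = 1.0 + np.log(f(x, a, b) / f(xbar(x, tau, a, b), a, b))  # price (15)
    rate = f(x, a, b) * np.exp(-p)                               # rate from (5)
    rates.append(rate)
    x += rate * dt
print("adoption rate: min %.5f, max %.5f (should be ~constant)"
      % (min(rates), max(rates)))
print("final level %.4f vs predicted xbar %.4f" % (x, xbar(0.0, T, a, b)))
```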
B.2 Price Lipschitz Bound
We start with some Lipschitz bounds on how the optimal price $p^*(x)$ offered at adoption level $x$ can change with respect to the market parameters $\alpha, \beta$. Note that here we are assuming that the entire optimal price curve from the beginning changes if we change $\alpha, \beta$ (i.e., we use (6), not (15)). In the following lemma, $\bar{x}$ denotes the final adoption level on following the optimal pricing policy for all times starting from adoption level $x_0$ under the deterministic Bass model with parameters $(\alpha, \beta)$, as given in (8). Using the expanded notation introduced in the previous subsection, it can also be written as $\bar{x}_{\alpha,\beta}(x_0, T)$.
Lemma B.1.
The partial derivatives $\left|\frac{\partial}{\partial \alpha}\, p^*(x)\right|$ and $\left|\frac{\partial}{\partial \beta}\, p^*(x)\right|$ are bounded by explicit functions of $(\alpha, \beta, T, x)$; these bounds define the Lipschitz constants $\lambda_\alpha$ and $\lambda_\beta$ used in (13).
Proof.
In our proofs we will sometimes need to compare the optimal final adoption level under two different market parameters. To derive the results comparing two different market parameters, instead of $\bar{x}$, we use the notation $\bar{x}_{\alpha,\beta}$ that makes the dependence on $(\alpha, \beta)$ explicit.
Corollary B.1.
$\bar{x}_{\alpha,\beta}(x_0, T)$ is non-decreasing in each of the parameters $\alpha$ and $\beta$.
Proof.
Lemma B.2.
For any $\tau \le T$ and any $x$, the optimal price $p^*(x, \tau)$ satisfies Lipschitz bounds in $(\alpha, \beta)$ analogous to those of Lemma B.1.
Appendix C Proof of Lemma 2.1 and Lemma 2.2
We first prove the concavity property of the deterministic optimal value stated as Lemma 2.1.
(Lemma 2.1, restated.)
Proof.
For simplicity of notation, in this proof we use $\tau$ to denote the remaining time. The optimal value function $V(i, \tau)$ for the continuous deterministic Bass model when the remaining time is $\tau$ can be expressed using the following dynamic programming formulation (for all $i, \tau$),
$$V(i, \tau) \;=\; \max_{\{p_s\}_{s \in [0, \tau]}} \int_0^{\tau} p_s\,\lambda_s\,ds, \qquad (19)$$
where $\lambda_s = m\,f(i_s/m)\,e^{-p_s}$, $\frac{di_s}{ds} = \lambda_s$, and $i_0 = i$.
Using the Hamilton-Jacobi-Bellman equation for the deterministic Bass model (see equation (12.8) in [18]):
$$\frac{\partial V}{\partial \tau}(i, \tau) \;=\; \max_p \; m\,f\!\left(\frac{i}{m}\right)e^{-p}\left(p + \frac{\partial V}{\partial i}(i, \tau)\right). \qquad (20)$$
And the optimal price is the price that achieves the maximum in the above expression (see equation (12.9) in [18]), i.e.,
$$p^*(i, \tau) \;=\; \arg\max_p \; e^{-p}\left(p + \frac{\partial V}{\partial i}(i, \tau)\right).$$
Solving the above maximization problem gives us an expression for the optimal price at state $i$ with time $\tau$ left:
$$p^*(i, \tau) \;=\; 1 - \frac{\partial V}{\partial i}(i, \tau). \qquad (21)$$
Differentiating (21) with respect to the adoption level yields an expression, (22), for $\frac{\partial^2 V}{\partial i^2}(i, \tau)$. We now split (22) into two parts and bound each by zero individually. First note that differentiating $\bar{x}(x, \tau)$ with respect to $x$ (using (17)) gives us
$$\frac{\partial \bar{x}}{\partial x} \;=\; \frac{e}{e - \tau f'(\bar{x})}. \qquad (23)$$
Then, we first show that the first part of (22) is nonpositive. Rewriting this claim using (17) and then (23), and substituting the expression for $\bar{x}$ from (18), the required inequality reduces to showing that a certain fraction appearing in the last step is at least $1$; since $\bar{x} \le 1$, this holds, and therefore the last inequality holds.
Now we bound the remaining terms on the RHS of (22) by zero. This again reduces, using (17) and (23), to an elementary inequality that holds for all adoption levels in $[x_0, 1]$.
Therefore the sum of all the terms on the right hand side of (22) is bounded by zero. This proves the lemma statement. ∎
Now we use the above concavity property to show that for any starting point and remaining time, the optimal revenue in the deterministic model is at least the optimal expected revenue in the stochastic model. Let $V^S(i, \tau)$ be the optimal expected revenue one can achieve in the stochastic Bass model with $i$ current adopters and time $\tau$ remaining.
Lemma C.1.
For any , , and any :
Proof.
Given any , let be the random number of adoptions that take place in the next time in the discrete stochastic Bass model when the current price is and current number of adopters is . Let , then . Using dynamic programming, we have:
Also,
We prove the lemma by induction, working backward from the end of the planning horizon (no time remaining). We know that for all , . Suppose the inequality holds for and any , . Then,
(using concavity from Lemma 2.1)
(inductive assumption on )
Then, taking , we obtain the lemma statement. ∎
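The direction of this inequality can also be seen numerically. The sketch below simulates the stochastic Bass model under a single fixed price and compares the mean revenue collected by time T with the deterministic fluid revenue under the same price. The arrival-rate form lambda_i(p) = (m - i)(alpha + beta*i/m)*exp(-p) is an illustrative assumption standing in for the model's demand form; the dominance observed reflects the same concavity of the rate in the adoption level that drives the induction above.

    import numpy as np

    rng = np.random.default_rng(0)
    # Illustrative parameters; the rate form is a hypothetical stand-in for (5).
    m, alpha, beta, p, T = 200, 0.1, 2.0, 1.0, 1.0

    def stochastic_revenue():
        t, i, rev = 0.0, 0, 0.0
        while i < m:
            rate = (m - i) * (alpha + beta * i / m) * np.exp(-p)
            t += rng.exponential(1.0 / rate)   # exponential inter-arrival time
            if t > T:
                break
            i, rev = i + 1, rev + p
        return rev

    # Deterministic counterpart: integrate x' = (1-x)(alpha+beta*x)*exp(-p);
    # fluid revenue is m * p * x(T).
    x, dt = 0.0, 1e-4
    for _ in range(int(T / dt)):
        x += (1 - x) * (alpha + beta * x) * np.exp(-p) * dt
    det_rev = m * p * x

    sto_rev = np.mean([stochastic_revenue() for _ in range(2000)])
    print(det_rev, sto_rev)   # deterministic weakly dominates the stochastic mean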
Finally, we prove the following lemma that will allow us to establish an upper bound on the deterministic optimal revenue compared to the stochastic optimal revenue.
Lemma C.2.
Fix any such that
then
Proof.
The proof constructs a fixed price sequence such that the expected revenue in the stochastic Bass model under these prices is within of the optimal deterministic revenue. Since the optimal expected stochastic revenue is at least that obtained under this price sequence, the lemma statement follows.
Consider the following pricing scheme: for all times after the arrival of the customer and until the arrival of the customer, post the price given by for . Here and are given by the optimal price curve and adoption levels for the deterministic Bass model, as defined in (6) and (8).
Now, since in our price sequence the prices are fixed between consecutive arrivals, we know that (in the stochastic Bass model) the inter-arrival time between customers and is an exponential random variable , where . Note that from (16) and (17) we have that , and that for any .
Let denote the time of arrival of the customer in the stochastic model. Set , , , for some to be specified later. Assume for now; then, by Lemma A.1, we have with probability :
Therefore, with probability at least , , which means that the total number of adoptions observed in the stochastic Bass model in time is at least . Let , . Then,
Note that the third step holds with probability . When it does not hold (with probability at most ), we can bound the gap between deterministic and stochastic revenue by a trivial upper bound of on the deterministic revenue. We set to get that
Finally, one can verify that the condition on implies that . Let , and assume that :
If , then the first line is implied by , which is satisfied by .
∎
The proof of Lemma 2.2 can now be completed using the upper and lower bounds on deterministic optimal revenue compared to the stochastic optimal revenue proven above.
Appendix D Upper Bound Proofs
D.1 Step 1: Bounding the estimation errors (Proof of Lemma 3.1, Lemma 4.1)
We prove the estimation bounds on and separately, in Lemma D.1 and Lemma D.2, respectively. Lemma 3.1 follows directly from these two results.
Lemma D.1.
Assume that . Then, with probability , .
Proof.
From (10) we have the following expression for the estimator error of :
Recall that denotes the (stochastic) time of arrival of the customer in the stochastic Bass model under the pricing decisions made by the algorithm. Note that in our algorithm prices do not change between customer arrivals. Therefore, the inter-arrival time between the and customers follows an exponential distribution. Specifically, since the prices were fixed as for the first customers, we have for , where and for .
Therefore, , and . Using Lemma A.1 we have:
where we set . Since , the condition for Lemma A.1 is satisfied. To bound the estimation error of :
(with probability )
Denote the above bound by . Plugging this and into the expression above, we have with probability :
where in the second inequality we used the fact that .
∎
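The concentration step invoked here, and repeatedly below, is simply that a sum of independent exponentials concentrates around the sum of its means. A quick empirical sketch, with hypothetical rates standing in for the arrival rates under the exploration price:

    import numpy as np

    rng = np.random.default_rng(1)
    mus = 1.0 + 0.05 * np.arange(200)      # hypothetical per-arrival rates
    mean_total = np.sum(1.0 / mus)         # sum of expected inter-arrival times

    totals = np.array([np.sum(rng.exponential(1.0 / mus)) for _ in range(5000)])
    # Deviations of the total arrival time from its mean are small relative to
    # the mean, matching the high-probability statements from Lemma A.1.
    print(mean_total, totals.mean(), np.quantile(np.abs(totals - mean_total), 0.95))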
Lemma D.2.
Assume that . Then, for any , with probability , .
Proof.
Note that . From (11) we can bound the estimation error of as follows. We have
Therefore,
As in the estimation bound for , the main step is to bound the arrival times. Note that for , where we used the fact that, by the construction of Algorithm 1, for all . Using Lemma A.1 we have:
where we set . Since , the condition for Lemma A.1 is satisfied. To bound the estimation error of :
Let denote this bound. Plugging this result back into the bound on :
(*)
where for the (*) step we used the fact that .
∎
See 4.1
Proof.
D.2 Step 2: Proof of Lemma 4.2
Since the prices that we offer in the stochastic Bass model are based on a discretized version of the continuous price curve in the deterministic Bass model, we first need to show that this discretization does not introduce much error. Lemma D.3 below shows that it introduces only a logarithmic (in ) amount of error, for any fixed .
Lemma D.3.
For any fixed and ,
where denotes an upper bound on the optimal prices for all .
Proof.
Using (6),
In the following, we abbreviate as .
The first inequality follows because is bounded below and above by and , respectively. Therefore, the difference between evaluated at two different values is at most . ∎
Corollary D.1.
For any fixed ,
Proof.
Let . Since rounding introduces at most an additional difference in revenue, the result follows immediately from Lemma D.3. ∎
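The logarithmic scaling in Lemma D.3 is easy to see numerically. For a price curve whose derivative grows like 1/(1-x) near full adoption (the specific curve below is a hypothetical stand-in; the paper's optimal curve is the one in (6)), the cumulative gap across the adoption grid i/m grows like log m rather than m:

    import numpy as np

    def p_curve(x):
        return 1.0 - np.log(1.0 - x)   # hypothetical curve with |p'(x)| ~ 1/(1-x)

    for m in [100, 1000, 10000]:
        i = np.arange(1, m)            # stop short of x = 1
        gaps = np.abs(p_curve(i / m) - p_curve((i - 1) / m))
        print(m, gaps.sum())           # grows like log(m), not like m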
For the lemma below, we define
See 4.2
Proof.
Let be the bound on the estimation error stated in Lemma D.1 and Lemma D.2. Recall that the first customers in every epoch are offered an exploration price . Let be the price paid by customer as specified in Algorithm 1 and (13). Recall also that is the optimal price curve as specified in (6). We use the shorthand notations in the following calculations.
(Corollary D.1)
(Lemma 4.1, Lemma D.1, Lemma D.2)
(Lemma 4.4)
where are defined in (13).
The second-to-last step follows from the definition of in (13) and Lemma 4.1, together with the fact that, by Lemma D.1 and Lemma D.2, the error bounds , hold with probability .
Assuming that and , one can check that this implies and , which in turn implies that and . Applying these bounds to (13), we have:
Plugging in the expressions for and , as well as the above bounds on , we have with probability ,
When , or , the regret can be trivially bounded by , where we used Lemma 4.4 to bound . Since the first component is a constant with respect to and the second is no larger than the expression from before, we are done.
∎
D.3 Step 3: Proof of Lemma 4.3 and Lemma 4.4
See 4.3
Proof.
Let be the prices that the algorithm offers for customer . Since Algorithm 1 offers either or the lower confidence bound price defined in (13), we know from Corollary 4.1, as well as Lemma D.1, Lemma D.2, that with probability , , where is the total number of epochs. This means that . Let , which by (7) is equal to .
Set ; then, by Lemma A.1, we have with probability :
Hence, with probability , , so the total number of adoptions observed in the stochastic Bass model is at least . The result follows by observing that there can be at most epochs. ∎
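The epoch count used in the last step comes from the geometric growth of the epoch lengths. A doubling schedule, assumed here purely for illustration (the exact schedule is set by Algorithm 1), covers m customers in O(log m) epochs:

    import math

    def num_epochs(m, first_len=1, growth=2):
        covered, k, length = 0, 0, first_len
        while covered < m:
            covered += length
            length *= growth
            k += 1
        return k

    for m in [10**3, 10**6, 10**9]:
        print(m, num_epochs(m), math.ceil(math.log2(m + 1)))   # same order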
See 4.4
Proof.
Here we use the expanded notation of introduced in Appendix B. Using (6), we have for any :
The last step follows from the following upper bound on :
The first inequality is easy to see from (7): replacing with increases , and since this quantity is strictly monotone in , must be larger. The last equality follows from solving (7) after replacing with . ∎
D.4 Step 4: Putting it all together for proof of Theorem 4.1
Proof of Theorem 4.1.
Pseudo-Regret
where the per-epoch regrets are bounded using Lemma 4.2 and the “uncaptured revenue” term is bounded using Lemmas 4.3 and 4.4.
The inequality holds with probability , and is the total number of epochs, which is bounded by since the epoch length is defined with respect to the number of customers and increases geometrically. ∎
Appendix E Lower Bound Proofs
E.1 Step 1: missing lemmas and proofs
See 5.1
Proof.
Let denote for the remainder of this proof. Using (19), we can rewrite the left-hand side of the lemma:
(24)
where we define
(25)
Splitting the limit in this way is valid since both limits exist. The new quantity will help us quantify the instantaneous impact, or (dis)advantage, of offering a suboptimal price.
From above, note that . We derived the expression for earlier in the proof of Lemma 2.1 (equation (21)) as
We can now bound (24) using the derivative of :
To see the last step, consider , which is a convex function minimized at . For , it is easy to show that . So, by a standard strong-convexity argument, . For , we have . ∎
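For completeness, the generic form of the strong-convexity step used above is a second-order expansion around the minimizer: if f is twice differentiable, minimized at p^*, and f''(q) \ge \mu for all q between p and p^*, then

    f(p) - f(p^*) \;=\; \underbrace{f'(p^*)}_{=0}\,(p - p^*) \;+\; \tfrac{1}{2}\,f''(\xi)\,(p - p^*)^2 \;\ge\; \tfrac{\mu}{2}\,(p - p^*)^2

for some \xi between p and p^*. The constants in the lemma statement come from the specific f defined above and from handling prices far from the minimizer separately, as in the proof.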
Before we translate the above result into a regret bound in terms of cumulative pricing error, we explain the proof idea in some more detail. Given an arbitrary pricing algorithm, let
be the first observations (tuples of prices and inter-arrival times between customers and ) in the stochastic Bass model, on following the algorithm’s pricing policy. Here we assume that the algorithm is allowed to continue running even after the planning horizon has passed; if the algorithm is undefined after time , we assume that it offers the price . We use these observations to define a continuous price trajectory as follows: set . We call the induced price trajectory of . Let denote the time when the adoption level hits in the deterministic Bass model on following this induced pricing trajectory . In other words, . Note that is a stochastic quantity because it depends on the stochastic trajectory , which in turn depends on the prices offered to the first customers in the stochastic model.
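One natural reading of this piecewise-constant construction (the holding convention below is our illustrative assumption for the elided definition) is sketched here: hold the price offered to each customer over that customer's inter-arrival interval, and hold a default price after the last observation.

    import numpy as np

    def induced_trajectory(prices, interarrivals, p_default):
        arrival_times = np.cumsum(interarrivals)   # t_1 < t_2 < ... < t_n

        def pi(t):
            # Index of the interval containing t; default price after t_n.
            idx = np.searchsorted(arrival_times, t, side="right")
            return prices[idx] if idx < len(prices) else p_default

        return pi

    pi = induced_trajectory(prices=[2.0, 1.5, 1.0],
                            interarrivals=[0.3, 0.5, 0.4], p_default=3.0)
    print([pi(t) for t in (0.1, 0.4, 1.0, 5.0)])   # -> [2.0, 1.5, 1.0, 3.0]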
Recall that denotes the arrival time of customer in the stochastic model. First we show that the total time for customer arrivals in the deterministic vs. stochastic model (i.e., vs. ) under the two price trajectories ( vs. ) is roughly the same.
Lemma E.1.
Fix such that . Then, for any algorithm satisfying Assumption 1, with probability at least ,
where is an upper bound on the prices offered by the algorithm.
Proof.
Since the algorithm’s prices are fixed between the arrivals of consecutive customers (by Assumption 1), the inter-arrival times follow an exponential distribution. That is, for any , given , the price paid by the customer, follows the exponential distribution , where . Note that for all . Set ; then Lemma A.1 in Appendix A gives that with probability :
Note that the condition on in Lemma A.1 is satisfied since .
On the other hand, for any , . It is easy to check that . So
In the second-to-last step we canceled the first and third terms from the previous step. Combining this with the above, we have with probability
∎
Using and the induced pricing trajectory as defined right before Lemma E.1, we can now obtain the following result.
Lemma E.2.
Fix any . Then for any such that , and any algorithm satisfying Assumptions 1 and 2,
where denotes the deterministic optimal price at adoption level , denotes the price offered by the algorithm for customer , and is the time at which adoption level reaches on following the price trajectory in the deterministic Bass model.
Proof.
Let be the trajectory of adoption levels in the deterministic Bass model on following price curve . Recall that by (5), . Let be the same price trajectory as but indexed by time.
First, note that for large enough we have . This is because the last inequality holds for , i.e., for any satisfying . Here we used the upper bound from Assumption 2. Therefore, for the rest of the proof we can assume that .
(26)
Given a particular sequence of for , and its induced continuous version as defined in the lemma statement, the quantity inside the first expectation brackets is the cumulative “disadvantage” that the incurs on the deterministic Bass model, where disadvantage is defined in Lemma 5.1. Therefore, we can bound it as follows:
where in the last step we applied the change of variables
The second part of (26) can be bounded using Lemma C.1 and bounding the difference between and :
Now recall from (20) and (21) that and
(27)
Using Lemma E.1, which bounds the difference between and , we have with probability :
Since , we can set and use Assumption 2 to conclude that
(28)
Here we also assumed that , which implies that . This satisfies the condition for Lemma E.1. If this assumption on does not hold, then since the expected pseudo-regret is lower bounded by zero, and the first term in the lemma statement is at most , we have that the lemma statement still holds for . ∎
E.2 Step 2: missing lemmas and proofs
Lemma E.3.
For any , let be the sequence of prices offered and inter-arrival times () for the first customers, and let be the filtration generated by these random variables. Here, the prices could have been chosen adaptively using an arbitrary strategy, as long as . Let be any two fixed prices. Fix any deterministic algorithm that takes the past observations as input and outputs a single price . Then, for any and such that , at least one of the following holds:
or,
where denotes the probability distribution under the stochastic Bass model with parameters . Note that the only random quantity is , which depends on the first observations.
Proof.
Since , the probability of observing the sequence is the product of the probabilities of observing given .
The subscript denotes the fact that the ’s are generated according to the stochastic Bass model with parameters . Since customer arrivals are Poisson, we know that given price , the inter-arrival time between customers and is exponentially distributed. Specifically, in the market, and in the market, where . We can now bound the KL divergence between the joint distributions of the first observations under the two markets:
(29)
where the second equality follows from the standard decomposition of KL divergence (see, for example, Lemma 15.1 of [27]) and the inequality follows from the following bound on the KL divergence of the two exponential distributions:
The KL divergence for a general pair of exponentials is
where in the last equality is some value between and . Now let be a sequence of observations such that the policy outputs a price that is closer to . Using Pinsker’s inequality and (29), we can bound the difference in the probability of observing this sequence in the two markets:
(30)
Plugging the upper bound on from the lemma statement into (30) gives .
However, suppose neither inequality in the lemma statement holds. Then, by the definition of , we have for , which is a contradiction. ∎
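The two analytic ingredients of this proof, the closed form for the KL divergence between exponential distributions and Pinsker's inequality, can be checked numerically; the rates below are hypothetical.

    import numpy as np

    # KL(Exp(l1) || Exp(l2)) = log(l1/l2) + l2/l1 - 1, and TV <= sqrt(KL/2).
    def kl_exp(l1, l2):
        return np.log(l1 / l2) + l2 / l1 - 1.0

    l1, l2 = 1.0, 1.2
    kl = kl_exp(l1, l2)

    # Monte Carlo confirmation of the closed form: average log-density ratio
    # under Exp(l1). Density of Exp(l) is l*exp(-l*x), so log-density is
    # log(l) - l*x.
    rng = np.random.default_rng(2)
    x = rng.exponential(1.0 / l1, size=200_000)
    log_ratio = (np.log(l1) - l1 * x) - (np.log(l2) - l2 * x)
    print(kl, log_ratio.mean())     # closed form vs. empirical average
    print(np.sqrt(kl / 2))          # Pinsker bound on TV for one observation

Since the KL of independent observations adds up term by term, two markets whose rates differ by order eps require on the order of 1/eps^2 observations to distinguish, which is the mechanism behind the lower bound.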
The above lemma holds for any two prices . The reader should think of as the optimal prices for customer in the and markets, respectively. To reduce clutter in the following corollary statement, let , and let be the optimal price for market , i.e., if and otherwise.
Corollary E.1.
Consider a market sampled uniformly at random from , and let be the other market. Suppose is such that . Let be a sequence of observations generated from the market , where . Fix any pricing algorithm that outputs based on the first observations. Then for any and any such that :
where the probability is taken with respect to both the randomness of the stochastic arrival times and the uniform sampling of the market parameters.
Proof.
This follows directly from Lemma E.3. The extra factor of in the probability comes from the fact that we randomly picked one market out of two. ∎
E.3 Step 3: Lipschitz bound on the optimal pricing policy
Lemma E.4.
For any remaining time and
Proof.
Differentiating both sides of (17) with respect to gives us
In the following proof, we omit the initial-state argument and write .
In the last equality, we replaced with (18).
The last inequality follows from the fact that
In particular, if , then . Also,
The first inequality is easy to see from (18), the second follows from Corollary B.1, and the last follows from the assumption on . So the above can be simplified to
Then, it is easy to verify that for , . ∎
E.4 Step 4: Putting it all together for proof of Theorem 5.1
We are now ready to prove our main lower bound result, Theorem 5.1. In the following proof, as defined earlier in Step 1, denotes the induced price trajectory from the algorithm’s offered prices, and is the time when the adoption level hits in the deterministic Bass model on following (both were defined in detail in the paragraphs before Lemma E.1).
Proof of Theorem 5.1.
Let , , . Randomly draw the Bass model parameters from with equal probabilities. We denote the chosen market by and the other market by . In the calculations that follow, we use to indicate that the expectation is taken with respect to both the random choice of Bass model parameters and the randomness in the stochastic Bass model.
To reduce clutter, let , and let be the optimal price, i.e., if and otherwise.
Using the fact that the maximum expected regret between the two markets must be at least their average, we have that for at least one of and ,
Finally, we still need to check that the assumptions of Lemma E.2, Lemma E.4, and Corollary E.1 are satisfied for large enough .
To apply Lemma E.2, we need . Using the expanded notation introduced in Appendix B, we know from Corollary B.1 that . It is easy to verify that for large enough , specifically when and , .
To apply Lemma E.4 we need the conditions and for all . For any fixed , and for large enough (specifically, for , for , where is a function of as defined in Assumption 2), we have that . The last inequality follows from simple algebraic calculations using the assumption on and Assumption 2. Therefore, for all . This satisfies the first condition. Furthermore, for , . This satisfies the second condition.
Finally, to apply Corollary E.1 we need the condition . By plugging in our choice of , it is easy to verify that this condition is satisfied.
∎
Appendix F Auxiliary Results
Lemma F.1.
Let be a constant upper bound on the prices that the seller can offer: . Then there exist constants independent of such that for , there is a trivial policy that achieves regret.
Proof.
Consider the trivial pricing policy where the seller sets the price to the highest possible value for the entire planning horizon. The arrival rate of the customer is given by
We now divide the market of customers into a sequence of segments with geometrically decreasing lengths. Let . Let be such that . We refer to customers for as the customers in segment .
Since the price is constant, we use the following shorthand notation: . Note that by construction, . So for , .
Let be the stochastic inter-arrival time between customers and . Then, using Lemma A.1, we can obtain the following bound on the time it takes for all customers in segment to arrive:
Setting this to and solving for , one can verify that with probability ,
Note that we needed in order to apply Lemma A.1. This condition is satisfied for because:
Applying a union bound over all segments, we have that with probability at least ,
This means that if , then with probability , the realized revenue is at least . Since the maximum possible revenue is , the regret is at most . ∎
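The qualitative content of this argument, namely that a strong imitation effect lets even the fixed-price policy capture most of the market well within the horizon, is visible in simulation. The rate form and all parameters below are the same illustrative assumptions used in the earlier sketches.

    import numpy as np

    rng = np.random.default_rng(3)
    # Hypothetical rate form: lambda_i = (m - i)(alpha + beta*i/m) * exp(-p_max).
    m, alpha, p_max, T = 1000, 0.1, 2.0, 25.0

    for beta in [1.0, 2.0, 4.0]:
        t, i = 0.0, 0
        while i < m:
            rate = (m - i) * (alpha + beta * i / m) * np.exp(-p_max)
            t += rng.exponential(1.0 / rate)
            if t > T:
                break
            i += 1
        # Larger beta -> faster late-stage adoption -> smaller uncaptured market.
        print(beta, i, m - i)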