
Dynamic Pricing and Learning under the Bass Model

Shipra Agrawal
Columbia University
Industrial Engineering and Operations Research, e-mail: [email protected]
   Steven Yin
Columbia University
Industrial Engineering and Operations Research, e-mail: [email protected]
   Assaf Zeevi
Columbia University
Graduate School of Business, e-mail: [email protected]
Abstract

We consider a novel formulation of the dynamic pricing and demand learning problem, where the evolution of demand in response to posted prices is governed by a stochastic variant of the popular Bass model with parameters $(\alpha,\beta)$ that are linked to the so-called “innovation” and “imitation” effects. Unlike the more commonly used i.i.d. and contextual demand models, in this model the posted price not only affects the demand and the revenue in the current round but also the future evolution of demand, and hence the fraction of the potential market size $m$ that can ultimately be captured. Finding a revenue-maximizing dynamic pricing policy in this model is non-trivial even in the full information case, where model parameters are known. In this paper, we consider the more challenging incomplete information problem where dynamic pricing is applied in conjunction with learning the unknown model parameters, with the objective of optimizing the cumulative revenues over a given selling horizon of length $T$. Equivalently, the goal is to minimize the regret, which measures the revenue loss of the algorithm relative to the optimal expected revenue achievable under the stochastic Bass model with market size $m$ and time horizon $T$.

Our main contribution is the development of an algorithm that satisfies a high probability regret guarantee of order $\tilde{O}(m^{2/3})$, where the market size $m$ is known a priori. Moreover, we show that no algorithm can incur a smaller order of loss, by deriving a matching lower bound. Unlike most regret analysis results, in the present problem the market size $m$ is the fundamental driver of the complexity; our lower bound, in fact, indicates that for any fixed $\alpha,\beta$, most non-trivial instances of the problem have constant $T$ and large $m$. We believe that this insight sets the problem of dynamic pricing under the Bass model apart from the typical i.i.d. setting and multi-armed bandit based models for dynamic pricing, which typically focus only on the asymptotics with respect to the time horizon $T$.

1 Introduction

Background and motivation

The dynamic pricing and learning literature, often referred to as “learning and earning,” has at its focal point the objective of maximizing revenues jointly with inferring the structure of a demand model that is a priori unknown to the decision maker. It is an extremely active area of current research that can essentially be traced back to two strands of work. Within the computer science community, the first paper on the topic is [26], which studies a posted price auction with infinite inventory in which the seller does not know the willingness-to-pay (or valuation) of buyers and must learn it over repeated interactions. The problem is stateless, and demand is independent from period to period; as such, it is amenable to a multi-armed bandit-type approach and in fact can be solved near-optimally (in the sense of regret bounds) using an “explore-first-exploit-later” pricing and learning strategy. Various refinements and improvements have been proposed since then in what has become a very active field of study in economics, computer science and operations research. The second strand of work originates in the operations research community with [7], which focuses on the same finite horizon regret criterion in the posted-price auction problem but with limited inventory. This problem is sometimes referred to as “bandits with knapsacks” due to the follow-up work of [2] and subsequent papers. In that problem, the learning objective is more subtle as the system “state” is changing over time (in the dynamic programming full information solution, this state is given by the level of remaining inventory and the time). For further references and historical notes on the development of the topic see, e.g., the recent survey [9].

Most of the literature that has evolved from the inception points identified above has focused on a relatively simple setting where, given the current pricing decision, demand is independent of past actions and demand values. In addition, much of the literature has focused on the “stateless” problem setting, which provides further simplification and tractability. With the evolution of online platforms and marketplaces, the focus on such homogeneous modeling environments is becoming increasingly less realistic. For example, platforms now rely more and more on online reviews and ratings to inform and guide consumers. Product quality information is also increasingly available on online blogs, discussion forums, and social networks, which create further word-of-mouth effects. One clear implication for the dynamic pricing and learning problem is that the demand environment can no longer be assumed to be static, meaning the demand model may change from one time instance to the next; for example, in the context of online reviews, sales of the product trigger reviews/ratings, and these in turn influence subsequent demand behavior, and so on.

While it is possible to model demand shifts and non-stationarities in a reasonably flexible (nonparametric) manner within the dynamic pricing and learning problem (see, e.g., [25], and [6] for a general MAB formulation), this approach can be too broad and unstructured to be effective in practical dynamic pricing settings. To that end, product diffusion models, such as the popular Bass model [3, 4], are known to be extremely robust and parsimonious, capturing the aforementioned word-of-mouth and imitation effects on the growth in sales of a new product. The Bass diffusion model, originally proposed by Frank Bass in 1969 [3], has been extremely influential in marketing and management science, often described as one of the most celebrated empirical generalizations in marketing. It describes the process by which new products get adopted as an interaction between existing users and potential new users. The Bass model is attractive insofar as it creates a state-dependent evolution of market response, which is well aligned with the impact of recent technological developments, such as online review platforms, on customer purchase behavior. To that end, several recent empirical studies in marketing science and econometrics utilize abundant social data from online platforms to quantify the impact of the word-of-mouth effect on consumer purchase behavior and the new product diffusion process (e.g., [34, 14, 16, 19, 23, 36]; also see [15, 29] for literature surveys).

Motivated by these observations, the objective of this paper is to investigate a novel formulation of the dynamic pricing and demand learning problem, where the evolution of demand is governed by the Bass diffusion model, and where the parameters of this model are unknown a priori and need to be learned from repeated interactions with the market.

The proposed model and main problem studied

The Bass diffusion model [3, 4] has two parameters: the “coefficient of innovation,” representing external influence or advertising effect; and the “coefficient of imitation,” representing internal influence or word-of-mouth effect. The behavioral principle underlying this model is that “the probability of adopting by those who have not yet adopted is a linear function of those who had previously adopted.” More precisely, the likelihood that a potential buyer buys a new item at time $t$, given that he or she has not yet bought it, is a linear function of the proportion of buyers at time $t$:

\frac{f(t)}{1-F(t)} = \alpha + \beta F(t)   (1)

where $F(t)$ and $f(t)$ are the distribution and density functions of first-purchase times, respectively. Here, the parameter $\alpha$ is the above referenced “coefficient of innovation,” and $\beta$ is the “coefficient of imitation.” Let $m$ be the number of potential buyers, i.e., the market size, and let $X_t$ be the fraction of customers who have already adopted the product by time $t$. Then, $mX_t = mF(t)$ represents the cumulative sales (aka adoptions) up until time $t$, and $m(1-X_t)$ is the size of the remaining market yet to be captured. The instantaneous sales at time $t$, $m\frac{dX_t}{dt} = mf(t)$, can then be expressed as

m\frac{dX_t}{dt} = \underbrace{m\alpha(1-X_t)}_{\text{sales due to external influence}} + \underbrace{m\beta X_t(1-X_t)}_{\text{sales due to internal influence or imitation}}   (2)

A generalization of the Bass diffusion model that can be harnessed for the dynamic pricing context was proposed by Robinson and Lakhani [32]. In the latter model, $p_t$ denotes the price posted at time $t$. Then, given the previous adoption level $X_t$, the number of new adoptions at time instant $t$ is given by

m\frac{dX_t}{dt} = m e^{-p_t}(\alpha + \beta X_t)(1 - X_t).   (3)

Thus, the current price affects not only the immediate new adoptions and revenue, but also future adoptions through its effect on the adoption level $X_t$, which essentially forms the state at time $t$ in this stateful demand model. Finding a revenue-maximizing optimal pricing trajectory in such a dynamic demand model is non-trivial even when the model parameters are known; see, e.g., [17] for some characterizations.
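To make the dynamics concrete, here is a minimal forward-Euler simulation sketch of the controlled model (3) in Python; this is our own illustration, and the parameter values and the constant price policy are assumptions, not taken from the paper.

import numpy as np

def simulate_deterministic_bass(alpha, beta, m, price_fn, T, n_steps=10_000):
    # Forward-Euler integration of (3): dX/dt = e^{-p_t} (alpha + beta*X)(1 - X).
    dt = T / n_steps
    X, revenue = 0.0, 0.0
    for k in range(n_steps):
        p = price_fn(k * dt, X)
        dX = np.exp(-p) * (alpha + beta * X) * (1.0 - X) * dt
        revenue += m * p * dX  # accumulates m * integral of p_t dX_t
        X += dX
    return X, revenue

# Example with assumed values alpha=0.1, beta=0.3, m=10_000, constant price 1:
X_T, rev = simulate_deterministic_bass(0.1, 0.3, 10_000, lambda t, X: 1.0, T=10.0)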

In this paper, we consider a stochastic variant of the above Bass model, where customers arrive sequentially and the number of customer arrivals (aka adoptions) $d_t$ until time $t$ is a non-homogeneous Poisson process with rate $\lambda_t$ given by the right hand side of (3). More details on the stochastic model are provided in Section 2, where we describe the full problem formulation and performance objectives.
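Because the arrival rate is constant whenever the price is held fixed between consecutive arrivals, the stochastic model can be sampled directly via exponential inter-arrival times (a fact used again in Section 3). A minimal sketch of ours, with assumed illustrative parameters:

import numpy as np

def simulate_stochastic_bass(alpha, beta, m, price_fn, T, seed=0):
    # Samples arrivals of the counting process d_t with rate
    # lambda(p, x) = m e^{-p} (alpha + beta*x)(1 - x); the price is held
    # constant between arrivals, so inter-arrival times are Exponential(lambda).
    rng = np.random.default_rng(seed)
    t, d, arrival_times, prices_paid = 0.0, 0, [], []
    while d < m:
        x = d / m
        p = price_fn(d, x)  # price posted after the d-th arrival
        lam = m * np.exp(-p) * (alpha + beta * x) * (1.0 - x)
        t += rng.exponential(1.0 / lam)
        if t > T:
            break
        d += 1
        arrival_times.append(t)
        prices_paid.append(p)
    return arrival_times, prices_paid  # Rev(T) = sum(prices_paid)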

The problem of dynamic pricing under demand learning can then be described, informally, as follows: the learner (seller) is required to dynamically choose prices $\{p_t\}$ to be posted at times $t \in [0,T]$, where $p_t$ is chosen based on the past observations that include the number of customers arrived so far, their times of arrival, and the price decisions made in the past. The number of customers $d_t$ arriving until any time $t$ is given by the stochastic Bass model. Note that we use the terms “customer arrival” and “customer adoption” interchangeably, i.e., every customer arriving at time $t$ adopts the product and pays the price $p_t$. We assume that the size of the market $m$ and the horizon $T$ are known to the learner, but the Bass model parameters $\alpha,\beta$ (i.e., the coefficients of innovation and imitation) are unknown. The aim is to maximize the cumulative revenue over the horizon $T$, i.e., $\sum_{d=1}^{d_T} p_d$, via a suitable sequence of prices, where $p_d$ is the price paid by the $d^{\text{th}}$ customer and $d_T$ is the total number of adopters until time $T$. In essence, we will not be directly maximizing this quantity; rather, much in line with the online learning and multi-armed bandits literature, we will focus on evaluating a suitable notion of regret and establishing “good” regret performance of our proposed pricing and learning algorithm.

Main contributions

The paper makes two significant advances in the study of the aforementioned Bass model learning and earning problem. First, we present a learning and pricing algorithm that achieves a high probability $\tilde{O}(m^{2/3})$ regret bound. Second, under certain mild restrictions on algorithm design, we provide a matching lower bound, showing that any algorithm must incur regret of order $\Omega(m^{2/3})$ for this problem. The precise statements of these results are provided as Theorem 4.1 and Theorem 5.1, in Sections 4 and 5 respectively. Hence the “price” of incomplete information, and the “cost” of learning on the fly, are reasonably small. Unlike most regret analysis results, in the present setting the market size $m$ is the fundamental driver of the problem complexity; our lower bound, in fact, indicates that for any fixed $\alpha,\beta$, most non-trivial instances of the problem have constant $T$ and large $m$. This is also reflected in the regret of our algorithm, which depends sublinearly on the market size $m$, and only logarithmically on the time horizon $T$. This insight sets the problem of dynamic pricing under the Bass model uniquely apart from the typical i.i.d. and multi-armed bandit-based models for dynamic pricing.

Other Related Work

Several other models of non-stationary demand have been considered in the recent literature on dynamic pricing and demand learning. [10] studies an additive demand model where the market condition is determined by the sum of an unknown market process and an adjustment term that is a known function of the price. The paper numerically studies several sliding window and weight decay based algorithms, including one experiment where their pricing strategy is tested on the Bass diffusion model; however, they do not provide any regret guarantees under the Bass model. [8] and [13] consider the pricing problem under a piecewise stationary demand model. [25] considers a two-parameter linear demand model whose parameters are indexed by time and whose total quadratic variation is bounded. Many of the existing dynamic pricing and learning algorithms borrow ideas from related settings in the multi-armed bandit literature. [21] and [11] study piecewise stationary bandits based on sliding window and change point detection techniques. [6, 33] and [30] study regret based on an appropriate measure of reward variation. In the rested and restless bandits literature, [35] and [5] also study a related problem where the reward distribution depends on a state which follows an unknown underlying stochastic process. However, the above-mentioned models of non-stationarity are typically very broad and unstructured, and fundamentally different from the stateful, structured Bass diffusion model studied here.

There is also much recent work on learning and regret minimization in stateful models using general MDP and reinforcement learning frameworks (for example [37, 1, 24]). However, these results typically rely on having settings where each state can be visited many times over the learning process. This is ensured either because of an episodic MDP setting (e.g., [37]), or through an assumption of communicating or weakly communicating MDP with bounded diameter, i.e., a bound on the number of steps to visit any state from any starting state under the optimal policy (e.g., see [24, 1]). Our setting is non-episodic, and every state is transient – the state is described by the number of cumulative adopters so far and the remaining time, both of which can only move in one direction. Therefore, the results in the above and related papers on learning general MDPs are not applicable to our problem setting.

In the marketing literature, there is significant work on using the Bass model to forecast demand before launching a product. Stochastic variants of Bass model have also been considered previously, e.g., in [22, 31]. In addition to the work mentioned in the introduction, [28, 20, 38, 22] present empirical methods for using historical data from similar products or from past years to estimate Bass model parameters, in order to obtain reliable forecasts. In this context, we believe our proposed dynamic pricing and learning algorithm may provide an efficient method for adaptively estimating the Bass model parameters while optimizing revenue. However, the focus of this paper is on providing theoretical performance guarantees; the empirical performance of the proposed method has not been studied.

The work most closely related to ours is a parallel recent paper by [39]. In [39], the authors consider a dynamic pricing problem under a similar stochastic Bass model setting as the one studied here. They propose an algorithm based on Maximum Likelihood Estimation (MLE) that is claimed to achieve a logarithmic regret bound of $\tilde{O}(\log(mT))$. At first glance this regret bound seems to contradict our lower bound of $\Omega(m^{2/3})$. (Technically, our lower bound only holds for algorithms that satisfy certain conditions, namely Assumptions 1 and 2 stated in Section 5. However, we believe it is still applicable to the algorithm in [39]: their algorithm, with its constant upper bound on prices, appears to satisfy both of our assumptions.) However, it appears the results in [39] may contain some mistakes (see, in particular, the current statements and proofs of Lemma 3, Theorem 5 and Theorem 6 in [39], which we believe contain the aforementioned inconsistencies). To the best of our understanding, these inconsistencies cannot be fixed without changing the current results in a significant way.

2 Problem formulation and preliminaries

The stochastic model and problem primitives

There is an unlimited supply of durable goods of a single type to be sold in a market with $m$ customers. We consider a dynamic pricing problem with sequential customer arrivals over a time interval $[0,T]$. We denote by $d_t$ the number of arrivals by time $t$, with $d_0 = 0$. At any given time $t$, the seller observes $d_t$, the number of arrivals so far, as well as their arrival times $\{\tau_1, \tau_2, \dots, \tau_{d_t}\}$, and chooses a new price $p_t$ to be posted for times $t' > t$. The seller can use the past information to update the price any number of times until the end of the time horizon $T$. As mentioned earlier, in our formulation a customer arriving at time $t$ immediately adopts the product and pays the posted price $p_t$; therefore, the terms “adoption” and “arrival” are used interchangeably throughout the text.

The customer arrival process for any pricing policy is given by a stochastic Bass diffusion model with (unknown) parameters $\alpha,\beta$. In this model, the number of arrivals $d_t$ by time $t$ is a non-homogeneous Poisson point process with rate $\lambda_t$ given by (3), the adoption rate in the deterministic Bass diffusion model [32]. (A counting process $\{d_t, t\geq 0\}$ is called a non-homogeneous Poisson process with rate $\lambda_t$ if all of the following conditions hold: (a) $d_0 = 0$; (b) $d_t$ has independent increments; (c) for any $t \geq 0$, $\Pr(d_{t+\delta} - d_t = 0) = 1 - \lambda_t\delta + o(\delta)$, $\Pr(d_{t+\delta} - d_t = 1) = \lambda_t\delta + o(\delta)$, and $\Pr(d_{t+\delta} - d_t \geq 2) = o(\delta)$.) That is,

\lambda_t = m e^{-p_t}(\alpha + \beta X_t)(1 - X_t),

where $X_t = d_t/m$. For convenience of notation, we define the function

\lambda(p, x) := m e^{-p}(\alpha + \beta x)(1 - x),

so that $\lambda_t = \lambda(p_t, X_t)$, with $X_t = d_t/m$.

The seller’s total revenue is simply the sum of the prices paid by the customers who arrived before time $T$, under the dynamic pricing algorithm used by the seller. We denote by $p_d$ the price paid by the $d^{\text{th}}$ customer, i.e., $p_d = p_{\tau_d}$. Then, the revenue over $[0,T]$ is given by:

\text{Rev}(T) = \sum_{d=1}^{d_T} p_d.

The optimal dynamic pricing policy is defined as the one that maximizes the total expected revenue $E[\text{Rev}(T)]$. We denote by $V^{\text{stoch}}(T)$ the total expected revenue under the optimal dynamic pricing policy. Then, regret is defined as the difference between the optimal expected revenue $V^{\text{stoch}}(T)$ and the seller’s revenue, i.e.,

\text{Regret}(T) = V^{\text{stoch}}(T) - \text{Rev}(T).   (4)

In this paper, we aim to provide a dynamic learning and pricing algorithm with a high probability upper bound of $\tilde{O}(m^{2/3})$ on regret, as well as a closely matching lower bound. Instead of analyzing the regret directly, we define a notion of “pseudo-regret,” which measures regret against the optimal revenue under the deterministic Bass model ($V^{\text{det}}(T)$). This is useful because, as we show later in (6), there is a simple expression for the optimal prices in the deterministic Bass model. This can be leveraged using a fluid model approach, widely used in the analysis of stochastic systems, which targets a deterministic “skeleton” as a stepping stone toward developing insights for policy design and complexity drivers. Later, we show more formally that the pseudo-regret upper bounds the regret measure $\text{Regret}(T)$ defined earlier, and further is within $\tilde{O}(\sqrt{m})$ of $\text{Regret}(T)$. This justifies our focus on the pseudo-regret, both for the purpose of developing lower bounds, and for the analysis and upper bounds on policy performance.

Next, we discuss the expressions for optimal pricing policy and optimal revenue under the deterministic Bass model, and use those to define the notion of pseudo-regret.

Optimal revenue and pricing policy under the deterministic Bass model

Recall (refer to the introduction) that the adoption process under the deterministic Bass model is described as follows: given the current adoption level $X_t$ and price $p_t$ at time $t$, new adoptions are generated deterministically with the rate given in (5) below. (A subtle but important difference between this deterministic model and the stochastic Bass model introduced earlier is that in the deterministic model, the increment $m\,dX_t$ in adoptions is fractional. On the other hand, in our stochastic model, the number of customer arrivals (or adoptions) $d_t$ is a counting process with discrete increments. This difference will be taken into account later when we compare our pseudo-regret to the original definition of regret.)

m\frac{dX_t}{dt} = m e^{-p_t}(\alpha + \beta X_t)(1 - X_t).   (5)

The optimal price curve for the deterministic Bass model is then defined as the price trajectory $\{p_t\}$ that maximizes the total revenue $m\int_0^T p_t \, dX_t$ under the above adoption process. We denote by $V^{\text{det}}(T)$ the total revenue under the optimal price curve.

An analytic expression for the optimal price curve under the deterministic Bass model is in fact known. It can be derived using optimal control theory (see Equation (8) of [17]). For a Bass model with parameters $\alpha,\beta$, horizon $T$, and initial adoption level 0, the price at adoption level $x$ on the optimal price curve is given by the following expression:

p^*(x, \alpha, \beta) = 1 + \log\left(\frac{(\alpha + \beta x)(1 - x)}{(\alpha + \beta X^*_T)(1 - X^*_T)}\right),   (6)

where $X^*_T$, the adoption level at the end of horizon $T$, is given by the following equations:

X^*_T = \frac{1}{e}(\alpha + \beta X^*_T)(1 - X^*_T)T,   (7)
or, more explicitly,
X^*_T = \frac{T(\beta - \alpha) - e + \sqrt{[T(\beta - \alpha) - e]^2 + 4\alpha\beta T^2}}{2\beta T}.   (8)

For completeness, a derivation of the above expression for $X^*_T$ is included in Appendix B. Using the above notation for the optimal price curve, we can write $V^{\text{det}}(T)$, the optimal total revenue, as:

V^{\text{det}}(T) = m \int_0^{X^*_T} p^*(x, \alpha, \beta) \, dx.
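These expressions are straightforward to evaluate numerically. The following Python sketch (our own illustration, with hypothetical helper names) computes $X^*_T$ from (8), the price curve (6), and $V^{\text{det}}(T)$ via midpoint-rule quadrature:

import numpy as np

def x_star(alpha, beta, T):
    # Closed-form solution (8) of the fixed-point equation (7).
    disc = (T * (beta - alpha) - np.e) ** 2 + 4 * alpha * beta * T ** 2
    return (T * (beta - alpha) - np.e + np.sqrt(disc)) / (2 * beta * T)

def p_star(x, alpha, beta, T):
    # Optimal deterministic price curve (6) at adoption level x.
    xT = x_star(alpha, beta, T)
    return 1 + np.log((alpha + beta * x) * (1 - x) / ((alpha + beta * xT) * (1 - xT)))

def v_det(alpha, beta, T, m, n_grid=100_000):
    # V^det(T) = m * integral of p*(x) over [0, X*_T], via the midpoint rule.
    xT = x_star(alpha, beta, T)
    xs = (np.arange(n_grid) + 0.5) * (xT / n_grid)
    return m * float(np.mean(p_star(xs, alpha, beta, T))) * xT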

Pseudo-Regret

We define pseudo-regret as the difference between $V^{\text{det}}(T)$, the optimal total revenue in the deterministic Bass model, and the algorithm’s total revenue $\text{Rev}(T)$. That is,

\text{Pseudo-Regret}(T) = V^{\text{det}}(T) - \text{Rev}(T).   (9)

Essentially, pseudo-regret replaces the benchmark of the stochastic optimal revenue $V^{\text{stoch}}(T)$ used in the regret definition by the deterministic optimal revenue $V^{\text{det}}(T)$. We show that the deterministic optimal revenue is in fact a stronger benchmark, in the sense that it is always larger than the stochastic optimal revenue. Furthermore, we show that it is within $\tilde{O}(\sqrt{m})$ of the stochastic optimal revenue. To prove this relation between the two benchmarks, we demonstrate a concavity property of the deterministic optimal revenue which is crucial for our results. Specifically, we define an expanded notation $V^{\text{det}}(x, T)$ as the deterministic optimal revenue starting from adoption level $x$ with remaining time $T$. Note that $V^{\text{det}}(T) = V^{\text{det}}(0, T)$. Then, we show that $V^{\text{det}}(x, T)$ is concave in $x$ for any $T$, and any market parameters $m,\alpha,\beta$. More precisely, we prove the following key lemma.

Lemma 2.1 (Concavity of deterministic optimal revenue).

For any deterministic Bass model, $V^{\text{det}}(x, T)$, defined as the optimal revenue starting from adoption level $x$ with remaining time $T$, is concave in $x$, for all $T \geq 0$ and all adoption levels $x \in [0, 1]$.

Using these observations, we can prove the following relation between $\text{Pseudo-Regret}(T)$ and $\text{Regret}(T)$; all proofs can be found in the appendix.

Lemma 2.2 (Pseudo-regret is close to regret).

For any $T \geq 0$,

\text{Pseudo-Regret}(T) \geq \text{Regret}(T),
\text{Pseudo-Regret}(T) \leq \text{Regret}(T) + O\left(\sqrt{m}\log^2(m) + \log((\alpha+\beta)T)\log^2(m\log((\alpha+\beta)T))\right).

A simpler summary of this result can be stated as

\text{Regret}(T) \leq \text{Pseudo-Regret}(T) \leq \text{Regret}(T) + \tilde{O}(\sqrt{m}).

Therefore, an upper bound on $\text{Pseudo-Regret}(T)$ implies the same upper bound on $\text{Regret}(T)$. And a lower bound on $\text{Pseudo-Regret}(T)$ implies a lower bound on $\text{Regret}(T)$ within $\tilde{O}(\sqrt{m})$. In the rest of the paper, we therefore derive bounds on the pseudo-regret only.

Notation conventions

Throughout the paper, if a potentially fractional number (like $mX^*_T$, $m^{2/3}$, $\gamma m$, etc.) is used as a whole number (for example, as a number of customers or as the boundary of a discrete sum) without a floor or ceiling operation, the reader should assume that it is rounded down to the nearest integer.

3 Algorithm Description

The concavity property of the deterministic optimal revenue and the implied relation between pseudo-regret and regret derived in Lemma 2.2 suggest that the deterministic optimal revenue provides a benchmark that is comparable to the stochastic optimal revenue. Further, this benchmark is more tractable than the stochastic optimum due to the known and relatively simple analytical expressions for the optimal pricing policy, as stated in Section 2. Using these insights, our algorithm is designed to essentially follow (an estimate of) the optimal price curve for the deterministic model at every time. We believe this approach, which may be of independent interest, could be applied to other finite horizon MDP problems where such a concavity property holds. We now describe our algorithm.

Algorithm outline

Our algorithm is provided, as input, the market size $m$ and a constant upper bound $\phi$ on $\alpha+\beta$. For many applications, it is reasonable to assume that $\alpha+\beta \leq 1$, i.e., $\phi = 1$, but we allow for more general settings. The market parameters $\alpha,\beta$ are unknown to the algorithm.

The algorithm alternates between using a low “exploratory” price aimed at observing demand in order to improve the estimates of the model parameters, and using (a lower confidence bound on) the deterministic optimal prices for the estimated market parameters. Setting the exploratory price $p_0$ as 0 suffices for our analysis, but more generally it can be set as any lower bound on the deterministic optimal prices, i.e., we just need $0 \leq p_0 \leq p^*(x, \alpha, \beta)$ for all $x$. Using a non-zero exploratory price could be better in practice.

Our algorithm changes the price only on arrival of a new customer, and holds the price constant between two arrivals. The prices are set as follows. The algorithm starts by using the exploratory price $p_0$ for the first $\gamma m$ customers, where $\gamma = m^{-1/3}$. The prices $p_1, \ldots, p_{\gamma m}$ and the observed arrival times $\tau_1, \ldots, \tau_{\gamma m}$ of the first $\gamma m$ customers are then used to obtain an estimate of $\alpha$, and a high probability error bound $A$ on this estimate. The estimate of the parameter $\alpha$ is not updated in subsequent time steps. The algorithm then proceeds in epochs $i = 1, 2, \ldots, K$. Let $\gamma_i = 2^{i-1}\gamma$. Epoch $i$ starts right after customer $\gamma_i m$ arrives and ends either when customer $2\gamma_i m$ arrives, or when we reach the end of the planning horizon $T$. At the beginning of each epoch $i$, the exploratory price $p_0$ is again offered for the first $\gamma m$ customers in that epoch. The arrival times observed from these customers are used to update the estimate of the $\beta$ parameter and its high probability error bound $B_i$. For the remaining customers in epoch $i$, the algorithm offers a lower confidence bound $\underline{p_i}$ on the deterministic optimal price $p^*\left(\frac{d-1}{m}, \alpha, \beta\right)$ computed using the current estimates of $\alpha,\beta$ and their error bounds.

Input: Horizon $T$, market size $m$, $\delta \in (0,1)$, a constant upper bound $\phi$ on $\alpha+\beta$.
Initialize: $X_0 = 0$, $\gamma = m^{-1/3}$, $\gamma_i = 2^{i-1}\gamma$ for $i = 1, 2, \ldots$, and exploratory price $p_0 = 0$.
Offer $p_0$ for the first $\gamma m$ customers.
Use the observed arrival times of the first $\gamma m$ customers to compute $\hat{\alpha}$ according to (10). Also, obtain a high probability estimation bound $|\alpha - \hat{\alpha}| \leq A$, where $A$ is as defined in (12).
for $i \leftarrow 1, 2, 3, \ldots$ do
       Post price $p_0$ for customers $d = \gamma_i m + 1, \ldots, \gamma_i m + \gamma m$.
       Use $\hat{\alpha}$ along with the observed arrival times of the first $\gamma m$ customers in the current epoch to calculate a new estimate $\hat{\beta}_i$ according to (11). Also update the high probability bound $|\beta - \hat{\beta}_i| \leq B_i$, with $B_i$ as defined in (12).
       Use $\hat{\alpha}, A, \hat{\beta}_i, B_i$ to compute the price $\underline{p_i}(\frac{d-1}{m})$ according to (13). Post price $\underline{p_i}(\frac{d-1}{m})$ for $d = \gamma_i m + \gamma m + 1, \ldots, 2\gamma_i m$, or until the end of the time horizon $T$.
       If $\gamma_i \geq \frac{1}{3}$ then extend the current epoch until the end of the planning horizon.
end for
ALGORITHM 1: Dynamic Learning and Pricing under the Bass Model

Estimation and price computation details

Since our algorithm fixes prices between two arrivals, the arrival rate of the (Poisson) adoption process is constant in between arrivals, which in turn implies that the inter-arrival times are exponential random variables. This greatly simplifies the estimation procedure for $\alpha,\beta$, which we describe next.

The estimates $\hat{\alpha}$ and $\hat{\beta}_i$ for every epoch $i$ are calculated using the following equations, which match the observed inter-arrival times to the expected values of the corresponding exponential random variables:

\frac{1}{e^{-p_0}\hat{\alpha}m}\lfloor\gamma m\rfloor = \tau_{\gamma m}   (10)
\frac{\lfloor\gamma m\rfloor}{e^{-p_0}(\hat{\alpha}+\hat{\beta}_i\gamma_i)(1-\gamma_i)m} = \tau_{\gamma_i m+\gamma m} - \tau_{\gamma_i m}.   (11)

Also, we define

A \coloneqq \frac{8\phi}{(1-\gamma)^2}\sqrt{8\log(\tfrac{2}{\delta})}\, m^{-1/3}, \qquad B_i \coloneqq \frac{16\phi}{\gamma_i(1/3-\gamma)^2}\sqrt{8\log(\tfrac{2}{\delta})}\, m^{-1/3}.   (12)

Then, using concentration results for exponential random variables, we can show that the errors in the estimates $\hat{\alpha}$ and $\hat{\beta}_i$ are bounded by $A$ and $B_i$ respectively. Specifically, we prove the following result. The proof is in Appendix D.

Lemma 3.1.

Assume $m^{1/3} \geq 64\frac{(\alpha+\beta)^2}{\alpha^2}\sqrt{8\log(\frac{2}{\delta})}$. Then with probability $1-\delta$, $|\alpha - \hat{\alpha}| \leq A$. And for any $i = 1, \ldots, K$, with probability $1-\delta$, $|\beta - \hat{\beta}_i| \leq B_i$.
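In code, the moment-matching estimators (10) and (11) amount to only a few lines. The sketch below is our own illustration; it assumes the arrival times are stored as a 0-indexed array tau, so that tau[d-1] is the arrival time of customer $d$:

import numpy as np

def estimate_alpha(tau, m, gamma, p0=0.0):
    # Solves (10): floor(gamma*m) / (e^{-p0} * alpha_hat * m) = tau_{gamma*m}.
    n = int(np.floor(gamma * m))
    return n / (np.exp(-p0) * m * tau[n - 1])

def estimate_beta(tau, m, gamma, gamma_i, alpha_hat, p0=0.0):
    # Solves (11) using the gamma*m exploratory arrivals at the start of epoch i
    # (customers gamma_i*m + 1, ..., gamma_i*m + gamma*m).
    n = int(np.floor(gamma * m))
    start = int(gamma_i * m)
    elapsed = tau[start + n - 1] - tau[start - 1]
    rate_hat = n / (np.exp(-p0) * (1 - gamma_i) * m * elapsed)  # = alpha_hat + beta_hat * gamma_i
    return (rate_hat - alpha_hat) / gamma_i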

Note that only the estimate and error bound ($\hat{\beta}_i$ and $B_i$) for the parameter $\beta$ are updated in every epoch. The estimate and error bound ($\hat{\alpha}$ and $A$) for $\alpha$ are computed once in the beginning and remain fixed throughout the algorithm. Given $\hat{\alpha}, A, \hat{\beta}_i, B_i$, we define the price $\underline{p_i}(d/m)$ to be used by the algorithm in epoch $i$ as follows. For any $d \leq m$,

\underline{p_i}\left(\tfrac{d}{m}\right) = \text{clamp}\left(p^*\left(\tfrac{d}{m}, \hat{\alpha}, \hat{\beta}_i\right) - L_\alpha A - L_{\beta i} B_i,\ \left[0, \log(e+\phi T)\right]\right),   (13)
where L_\alpha = \frac{2}{\hat{\alpha}-A} + \frac{\hat{\beta}_i+B_i}{(\hat{\alpha}-A)^2}, \qquad L_{\beta i} = \frac{3}{\hat{\alpha}-A} + \frac{3}{\hat{\beta}_i-B_i}.

Here $p^*(\cdot,\cdot,\cdot)$ is the optimal deterministic price curve given by (6). In the above, we clamp the price to the range $[0, \log(e+\phi T)]$. That is, if the computed price is less than 0, we set $\underline{p_i}$ to 0; and if it is above $\log(e+\phi T)$, which is an upper bound on the optimal price (proven later in Lemma 4.4), we set $\underline{p_i}$ equal to this upper bound.

Later, in Lemma 4.1, we show that the quantities $L_\alpha, L_{\beta i}$ play the role of Lipschitz constants for the optimal price curve: they provide high probability upper bounds on the derivatives of the optimal price curve with respect to $\alpha,\beta$ respectively. Consequently (in Corollary 4.1), we can show that with high probability the price defined in (13) will be lower than the corresponding deterministic optimal price. The reason the algorithm uses a lower confidence bound on the optimal price is that we want to acquire at least as many customers in time $T$ as the optimal trajectory does. The intuition here is that losing customers (and the entire revenue associated with those customers) can cost a lot more than losing a little bit of revenue from each customer.
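For concreteness, a possible implementation of (13), reusing the hypothetical p_star helper from the earlier sketch:

import numpy as np

def lcb_price(d, m, alpha_hat, A, beta_hat, B_i, T, phi):
    # Lower confidence bound (13) on the optimal price at adoption level d/m,
    # clamped to [0, log(e + phi*T)] as in the algorithm.
    L_alpha = 2 / (alpha_hat - A) + (beta_hat + B_i) / (alpha_hat - A) ** 2
    L_beta = 3 / (alpha_hat - A) + 3 / (beta_hat - B_i)
    p = p_star(d / m, alpha_hat, beta_hat, T) - L_alpha * A - L_beta * B_i
    return float(np.clip(p, 0.0, np.log(np.e + phi * T)))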

Our algorithm is described in detail as Algorithm 1.

4 Regret Upper Bound

The main result of this section is the following upper bound on the pseudo-regret of the algorithm proposed in the previous section. Since the pseudo-regret upper bounds the regret (refer to Lemma 2.2), it directly implies the same upper bound on $\text{Regret}(T)$.

Theorem 4.1 (Regret upper bound).

For any market with parameters $\alpha,\beta$ satisfying $\alpha+\beta \leq \phi$, market size $m$, and time horizon $T$, Algorithm 1 achieves the following regret bound with probability $1-\delta$,

\text{Pseudo-Regret}(T) \leq O\left(m^{2/3}\log(m)\log\left(\tfrac{T}{\delta}\right)\left(\tfrac{1}{\alpha}+\tfrac{1}{\beta}+\tfrac{\beta}{\alpha^2}\right)\phi\right) = \tilde{O}(m^{2/3}).

Proof Intuition

We first give an intuitive explanation of why our algorithm works well. As mentioned earlier in the introductory sections, there is a simple closed form expression for the optimal prices in the deterministic model for any $\alpha, \beta, T$ (see (6)–(8)). Moreover, the definition of pseudo-regret and Lemma 2.2 allow us to replace the stochastic optimal revenue $V^{\text{stoch}}(T)$ with the deterministic optimal revenue $V^{\text{det}}(T)$. Our algorithmic strategy is then to follow (an estimate of) the deterministic optimal price trajectory, and show that the resulting revenue is close to the deterministic optimal revenue with high probability.

To prove this, we show that the prices $\underline{p_i}$ used by our algorithm are set so that they lower bound the deterministic optimal price with high probability. Intuitively, using a lower price ensures that the algorithm sees at least as many customer adoptions within horizon $T$ as the optimal trajectory, so that the gap in revenue can be bounded simply by the gap in prices paid by those customers. The final piece of the puzzle is to show that we can learn or estimate $\alpha,\beta$ at a fast enough rate that the estimated prices become increasingly close to the optimal price and we do not lose too much revenue from learning.

Proof Outline

In the remainder of this section we outline the proof of Theorem 4.1 in four steps. All the missing proofs from this section can be found in Appendix D.

Step 1 (Bounding the estimation errors)

In Lemma 3.1 we provided high probability upper bounds on the estimation errors in $\alpha,\beta$ in each epoch of the algorithm. Using these error bounds and the definition of the price $\underline{p_i}$ in (13), we can show that the prices offered by the algorithm are close to optimal and lower bound the corresponding optimal price with high probability. Specifically, we prove the following result.

Lemma 4.1 (Error bounds for estimated prices).

Given any market parameters $\alpha,\beta$ and estimates $\hat{\alpha}, \hat{\beta}_i$ that satisfy $|\alpha - \hat{\alpha}| \leq A$, $|\beta - \hat{\beta}_i| \leq B_i$, $\hat{\alpha} - A > 0$, and $\hat{\beta}_i - B_i > 0$, we have for every $d = 0, \ldots, m-1$,

\left|p^*\left(\tfrac{d}{m}, \hat{\alpha}, \hat{\beta}_i\right) - p^*\left(\tfrac{d}{m}, \alpha, \beta\right)\right| \leq L_\alpha A + L_{\beta i} B_i,

where $L_\alpha, L_{\beta i}$ are as defined in (13).

From the expressions for $A$ (the error bound for $\hat{\alpha}$) and $B_i$ (the error bound for $\hat{\beta}_i$ in epoch $i$) in (12), observe that $A = \tilde{O}(m^{-1/3})$ and $B_i = \tilde{O}(\frac{1}{\gamma_i} m^{-1/3})$. Therefore, the estimation of $\beta$ is the bottleneck here: it has an extra $\frac{1}{\gamma_i}$ factor in the error bound. This is because the imitation factor $\beta$ is multiplied by the current adoption level in the definition of the Bass model (see (2)), which means the estimation error in $\beta$ is likely to be large when the adoption level is low. This is why the algorithm needs to keep updating the estimate of $\beta$ in each epoch, but not that of $\alpha$.

The above lemma and the definition of $\underline{p_i}$ immediately imply the following.

Corollary 4.1 (Lower confidence bound on optimal price).

Under the same assumptions as in Lemma 4.1, and given the definition of $\underline{p_i}$ in (13), we have that for every $d = 0, \ldots, m-1$, $\underline{p_i}(\frac{d}{m}) \leq p^*(\frac{d}{m}, \alpha, \beta)$.

Step 2 (Bounding the revenue loss due to lower price).

Using the fact that there are $\gamma_i m$ customers in epoch $i$, and the price difference bound in Lemma 4.1 from Step 1, we can show that we lose at most $\tilde{O}\left(\gamma_i m\left(L_\alpha m^{-1/3} + L_{\beta i}\frac{1}{\gamma_i} m^{-1/3}\right)\right) = \tilde{O}(m^{2/3})$ revenue per epoch. This per-epoch loss bound is relative to the optimal revenue earned from the same $\gamma_i m$ customers, assuming that the stochastic trajectory is given as much time as needed to reach the same number of customers. More precisely, let $\text{Rev}_i$ denote the algorithm’s revenue from the $(2\gamma_i m \wedge mX^*_T) - \gamma_i m$ customers in an epoch $i$ completed by the algorithm, and let $V^{\text{det}}_i$ denote the potential revenue from those customers if they had instead paid the deterministic optimal price. Then, we prove the following bound.

Lemma 4.2.

For any epoch $i$ in the algorithm, with probability $1-\delta$,

V^{\text{det}}_i - \text{Rev}_i \leq O\left(m^{2/3}\log\left(\tfrac{T}{\delta}\right)\left(\tfrac{1}{\alpha}+\tfrac{1}{\beta}+\tfrac{\beta}{\alpha^2}\right)\phi\right) = \tilde{O}(m^{2/3}).

Step 3 (Bounding the revenue loss due to fewer adoptions).

In the previous step, we upper bounded the potential revenue loss from the $\gamma_i m$ customers that arrive in any epoch $i$ of the algorithm. However, it is possible that the algorithm reaches the end of the time horizon $T$ early and never even gets to observe some customers, i.e., $d_T < X^*_T m$. In that case the algorithm would incur an additional regret due to fewer adoptions. Therefore, we need to show that our total number of adoptions in time $T$ cannot be much lower than that of the optimal trajectory. To show this, we use the observation made in Step 1 that in each epoch the algorithm offers a lower confidence bound on the optimal prices (note that the exploration price $p_0 = 0$ is always a lower bound on the optimal prices). Due to the use of lower prices, we can show that with high probability our final number of adoptions is at most $\tilde{O}(\sqrt{m})$ below the optimal number of adoptions in the deterministic Bass model. Further, we can prove an $O(\log(T))$ upper bound on the optimal price, which allows us to easily bound the revenue loss that may result from the small gap in the number of adoptions. Lemma 4.3 and Lemma 4.4 make these results precise.

Lemma 4.3.

If the seller follows Algorithm 1, then with probability at least $1 - \delta\log(m)$, the number of adoptions at the end of the time horizon $T$ is lower bounded as:

d_T \geq mX^*_T - \sqrt{8mX^*_T\log(\tfrac{4}{\delta})}.
Lemma 4.4 (Upper bound on optimal prices).

All prices on the optimal price curve for the deterministic Bass model are upper bounded as:

p^*(x, \alpha, \beta) \leq \log\left(e + (\alpha+\beta)T\right).

These two results combined tell us that, compared to the optimum, we lose at most $\tilde{O}(\sqrt{m}\log(T))$ revenue due to fewer adoptions. The dominant term in the regret comes from the seller losing roughly $\tilde{O}(m^{-1/3})$ revenue on each customer, which led to the $\tilde{O}(m^{2/3})$ revenue loss per epoch in Step 2.

Step 4 (Putting it all together).

Finally, we put the previous steps together to prove Theorem 4.1. Since the number of customers in each epoch grows geometrically and there are at most $m$ customers in total, the number of epochs is bounded by $O(\log(m))$. By Step 2, the algorithm loses at most $\approx m^{2/3}$ revenue on the adoptions in each epoch. By Step 3, it loses at most $\approx \sqrt{m}\log(T)$ revenue due to missed adoptions. The total regret is therefore bounded by $\approx \log(m)\, m^{2/3} + \sqrt{m}\log(T)$.
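Schematically, suppressing the $(\frac{1}{\alpha}+\frac{1}{\beta}+\frac{\beta}{\alpha^2})\phi$ factor and the dependence on $\delta$, this accounting reads:

\text{Pseudo-Regret}(T) \;\lesssim\; \underbrace{\sum_{i=1}^{K}\left(V^{\text{det}}_i - \text{Rev}_i\right)}_{\text{Step 2: } K = O(\log m) \text{ epochs, each } \tilde{O}(m^{2/3})} \;+\; \underbrace{p^*_{\max}\cdot\left(mX^*_T - d_T\right)}_{\text{Step 3: } O(\log T)\cdot\tilde{O}(\sqrt{m})} \;=\; \tilde{O}(m^{2/3}).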

5 Regret Lower Bound

We prove a lower bound on the regret of any dynamic pricing and learning algorithm under the following mild assumptions on algorithm design.

Assumption 1.

The algorithm can change price only on arrival of a new customer. The price is held constant between two arrivals.

Assumption 2.

Given a planning horizon $T$, the price offered by the algorithm at any time $t \in [0,T]$ is upper bounded by $p_{\max} \coloneqq \log(T) + C(\alpha, \beta)$, for some function $C$ of $\alpha,\beta$.

The above assumptions are indeed satisfied by Algorithm 1, since it changes prices only on arrival of a new customer, and the prices offered are clamped to the range $[0, \log(e+\phi T)]$ (refer to (13)), where $\phi$ is a constant upper bound on $\alpha+\beta$. These assumptions are also satisfied by the optimal dynamic pricing policy for the deterministic Bass model, since the optimal prices are bounded by $\log(e+(\alpha+\beta)T)$ (refer to Lemma 4.4). Note that since customer arrivals are continuous in the deterministic Bass model, Assumption 1 is vacuous in that setting.

We argue that these assumptions do not significantly handicap an algorithm and preserve the difficulty of the problem. Assuming an upper bound on the price is common practice in the dynamic pricing literature. Indeed, unlike most existing literature, which assumes a constant upper bound, Assumption 2 allows the price to grow with the planning horizon. Intuitively, given enough time to sell the product, the seller should be able to increase prices in exchange for a slower adoption rate. (In Lemma F.1 in Appendix F, we show that if a constant upper bound is required on the price, then the problem becomes trivial for any $T \geq \Omega(\log(m))$.) Such dynamics are observed in the optimal dynamic pricing policy for the deterministic Bass model, where the price can grow with the time horizon; however, the optimal prices are still uniformly upper bounded by $\log(e+(\alpha+\beta)T)$.

Furthermore, in the proof of Lemma 2.2 (specifically, refer to Lemma C.2 in Appendix C), we show that there exist pricing policies in the stochastic Bass model that satisfy both of the above assumptions and achieve an expected revenue that is at most an additive $\tilde{O}(\sqrt{m})$ factor away from the deterministic optimal revenue $V^{\text{det}}(T)$. Since the lower bound provided in this section is of the order $m^{2/3}$, this indicates that removing these assumptions from algorithm design is unlikely to provide an advantage significant enough to overcome the lower bound derived here.

Our main result in this section is the following regret lower bound for any algorithm under the above assumptions.

Theorem 5.1 (Regret lower bound).

Fix any $\alpha > 0, \beta > 0$, and $T = \frac{2(1+\sqrt{2})e}{\alpha+\beta}$. Then, given any pricing algorithm satisfying Assumptions 1 and 2, there exist Bass model parameters $(\alpha, \beta')$ with $\beta' \in [\beta, \beta + \frac{(\alpha+\beta)^2}{\alpha}]$ such that the expected regret of the algorithm in this market has the following lower bound:

\mathbb{E}[\text{Pseudo-Regret}(T)] \geq \Omega\left(\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3} m^{2/3}\right).

Here the expectation is taken with respect to the stochasticity in the Bass model as well as any randomness in the algorithm.

Note that given the relation between pseudo-regret and regret in Lemma 2.2, the above theorem directly implies an $\Omega(m^{2/3})$ lower bound on $\mathbb{E}[\text{Regret}(T)]$.

Lower bound implications

Theorem 5.1 highlights the regime of the horizon $T$ where this learning problem is most challenging. In this problem, we observe that if $T$ is large, the seller can simply offer a high price for the entire time horizon and still capture most of the market, making the problem trivial. Specifically, for $T \geq \Omega(\log(m))$, under the additional assumption that there is a constant upper bound on the price, we can show that an $O(\log(m))$ upper bound on regret can be trivially achieved by offering the maximum price at all times (see Lemma F.1). On the other hand, when $T = o(1)$, achieving a sublinear (in $m$) regret is trivial for any algorithm. This is because from (7) we have that in this case the optimal number of adoptions satisfies $X^*_T \leq \frac{\alpha+\beta}{e}T = o(1)$. Furthermore, from Lemma 4.4 we know that all prices on the optimal curve can be bounded by $p^*_{\max} = O(1)$ in this case. This means that the optimal revenue is at most $mX^*_T p^*_{\max} = o(m)$, i.e., sublinear in $m$. Intuitively, for such a small $T$, the word-of-mouth effect never comes into play (i.e., the $\alpha+\beta x$ term in the Bass model is dominated by $\alpha$), making the problem uninteresting. Our lower bound result therefore pinpoints the exact order of $T$, i.e., $T = \Theta(1/(\alpha+\beta))$, where the difficult and interesting instances of this problem lie.

In the rest of this section, we describe the proof of Theorem 5.1. We first provide an intuition for our lower bound and then go over the proof outline.

Proof Intuition

We start by showing that the pseudo-regret of any algorithm can be lower bounded in terms of its cumulative pricing errors. Therefore, in order to achieve low regret, any algorithm must be able to estimate the optimal prices accurately.

Next, we observe that in this problem, the main difficulty for any algorithm comes from the difficulty of estimating the market parameter $\beta$. Since $\beta$’s observed influence on the arrival rate of customers is proportional to the current adoption level $x$, one cannot estimate $\beta$ accurately when $x$ is small. This makes intuitive sense: when the number of adopters is small, we do not expect to be able to measure the word-of-mouth effect accurately. In fact, we demonstrate that for any $\varepsilon$, before the adoption level exceeds $\left(\frac{m}{\varepsilon}\right)^{2/3}$, no algorithm can distinguish two markets with Bass model parameters $(\alpha, \beta)$ vs. $(\alpha, \beta+\varepsilon)$. Further, we can show that in some problem instances (specifically, for instances with $T = \Theta(1/(\alpha+\beta))$), the optimal prices for the first $m^{2/3}$ customers are very sensitive to the value of $\beta$. Therefore, if an algorithm’s estimate of $\beta$ is not accurate, it cannot accurately compute the optimal price for these customers. This presents an impossible challenge for any algorithm: it needs an accurate estimate of $\beta$ in order to compute an accurate enough optimal price for the first $m^{2/3}$ customers, but it cannot possibly obtain such an accurate estimate while the adoption level is that low; thus, it must incur pricing errors, resulting in the given lower bound on regret.

Proof Outline

A formal proof of Theorem 5.1 is obtained through the following four steps. All the missing proofs from this section can be found in Appendix E.

Step 1.

First, we show that the pseudo-regret of an algorithm can be lower bounded in terms of the cumulative pricing errors made by the algorithm. Note that this result is not a priori obvious because, as we have discussed earlier, prices have long term effects on the adoption curve: the immediate loss of revenue in the current round from offering too low a price might be offset by the fact that we saved some time for future rounds (a lower price means a faster arrival rate). On the other hand, if we offer a price that is higher than the optimal price, the resulting delay in customer arrivals (a higher price means a slower arrival rate) may lead to less remaining time and fewer adoptions, which could be more harmful than the immediate extra revenue. The result proven here is crucial for precisely quantifying these tradeoffs and lower bounding the regret in terms of pricing errors.

To obtain this result, we first lower bound the impact of offering a suboptimal price at any time on the remaining value in the deterministic Bass model. Given any current adoption level $x$ and remaining time $T'$, the impact or ‘disadvantage’ $A^{\text{det}}(x, T', p)$ of price $p$ is defined as the total decrease in value over the remaining time when $p$ is offered for an infinitesimal time and the optimal policy is followed thereafter. That is,

A^{\text{det}}(x, T', p) := \lim_{\delta \rightarrow 0} \frac{V^{\text{det}}(x, T') - p\lambda(p, x)\delta - V^{\text{det}}(x + \lambda(p, x)\delta/m,\, T' - \delta)}{\delta}.

We prove the following lemma lower bounding this quantity.

Lemma 5.1.

At any adoption level $x$ and remaining time $T'$, the disadvantage of offering a suboptimal price $p$ in the deterministic Bass model is lower bounded as:

A^{\text{det}}(x, T', p) \geq \lambda(p, x)\min\left(\frac{1}{4}(\pi^*(x, T') - p)^2, \frac{1}{10}\right),

where $\pi^*(x, T') = \arg\min_p A^{\text{det}}(x, T', p)$ denotes the optimal price at $x, T'$.

Now recall that pseudo-regret is defined as the total difference between the revenue of the algorithm, which is potentially offering suboptimal prices, and the deterministic optimal revenue $V^{\text{det}}(T)$. We use the above result to quantify the impact of offering suboptimal prices for the first $n \approx m^{2/3}$ customers, along with a bound on the difference between the stochastic and deterministic optimal revenues, to obtain the following lower bound on the pseudo-regret:

\mathbb{E}\left[\text{Pseudo-Regret}(T)\right] \geq \mathbb{E}\left[m\int_0^{n/m} \min\left(\frac{1}{4}(\pi^*_x - p_x)^2, \frac{1}{10}\right)dx\right] - \tilde{O}\left(m^{1/3}\right).

Here, $p_x$ denotes the price trajectory obtained by using the algorithm’s prices in the deterministic Bass model, and $\pi^*_x$ denotes the prices that would minimize the disadvantage at each point along this trajectory. A more precise statement of this result is in Lemma E.2 in Appendix E.

The remaining proof focuses on lower bounding the cumulative pricing errors $(\pi^*_x - p_x)^2$ that any algorithm must make for the first $n \approx m^{2/3}$ customers.

Step 2.

Consider two markets with parameters $(\alpha, \beta)$ and $(\alpha, \beta+\epsilon)$ for some constant $\epsilon$. We show that for the first $n \approx \left(\frac{m}{\epsilon}\right)^{2/3}$ customers, any pricing algorithm will be “wrong” with a constant probability. Here, by being wrong we mean that the algorithm will set a price that is closer to the optimal price of the other market. Lemma E.3 and Corollary E.1 formalize this idea. The proof is based on a standard information theoretic analysis using KL-divergence.
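The order of $n$ here can be anticipated by a heuristic calculation (our own gloss on the formal argument in Appendix E). Since prices are held constant between arrivals (Assumption 1), each observation is an exponential inter-arrival time, and for nearby rates the per-observation KL divergence satisfies

\mathrm{KL}\left(\mathrm{Exp}(\lambda_d)\,\middle\|\,\mathrm{Exp}(\lambda'_d)\right) \approx \frac{(\lambda_d - \lambda'_d)^2}{2\lambda_d^2} \approx \frac{1}{2}\left(\frac{\epsilon x_d}{\alpha + \beta x_d}\right)^2, \qquad x_d = \frac{d}{m},

so the total divergence over the first $n$ arrivals is of order $\sum_{d=1}^{n} \epsilon^2 d^2/(\alpha^2 m^2) \approx \epsilon^2 n^3/(\alpha^2 m^2)$, which stays $O(1)$ (so the two markets remain statistically indistinguishable) precisely while $n \lesssim (m/\epsilon)^{2/3}$.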

Step 3.

Next, we show that for $T = \Theta(\frac{1}{\alpha+\beta})$, the difference between the optimal prices of the two markets ($(\alpha, \beta)$ and $(\alpha, \beta+\varepsilon)$) is large ($\approx \frac{\alpha\varepsilon}{(\alpha+\beta)^2}$). We show this by proving a bound on the derivative of the optimal price with respect to $\beta$. Lemma E.4 in Appendix E gives the precise statement.

Step 4.

Finally, we put together the observations made in the previous three steps to prove Theorem 5.1. We have shown that, with constant probability, any algorithm will make a pricing mistake for the first $\approx (\frac{m}{\epsilon})^{2/3}$ customers [Step 2] and that this mistake will be large (on the order of $\epsilon$) [Step 3] under the condition that $T = \Theta(1/(\alpha+\beta))$. We also have a lower bound on the pseudo-regret in terms of the total (squared) pricing errors made over the first $n \approx m^{2/3}$ customers [Step 1]. Combining these observations with an appropriately chosen $\epsilon$ gives us that the pseudo-regret is lower bounded by $\Omega(\epsilon^2(\frac{m}{\epsilon})^{2/3}) \approx \Omega(m^{2/3})$.

All the missing details of this proof can be found in Appendix E.

6 Conclusions

In this paper we investigated a novel formulation of dynamic pricing and learning, with a non-stationary demand process governed by an unknown stochastic Bass model. Deriving an optimal policy in this stochastic model is challenging even when the model parameters are known. Here, we presented an online algorithm that learns to price from past observations without a priori knowledge of the model parameters. A key insight that we derive and utilize in our algorithm design is the concavity of the optimal value in the deterministic Bass model setting. Using this concavity property, we can show that the optimal value in the deterministic Bass model is always higher than in the stochastic model, and therefore can be used as a stronger benchmark to compete with. Based on this insight, the main algorithmic idea is to follow the well-known optimal price curve for the deterministic model but with estimated model parameters substituted in place of their true values.

Our main technical result is an upper bound of $\tilde{O}(m^{2/3})$ on the regret of our algorithm in a market of size $m$, along with a matching lower bound of $\Omega(m^{2/3})$ under mild restrictions on algorithm design. Thus, our algorithm has close to the best performance achievable by any algorithm for this problem. The derivation of our lower bound is especially involved, and requires deriving novel dynamic-programming based inequalities. These allow for lower bounding the loss in long-term revenue in terms of the instantaneous pricing errors made by any non-anticipating algorithm.

An interesting and perhaps surprising aspect of our upper and lower bounds is the role of the horizon TT vs. market size mm. Our upper bound depends sublinearly on mm but only logarithmically on the horizon TT. And in fact our lower bound indicates that for any fixed α,β\alpha,\beta the most interesting (and challenging) instances of this problem are characterized by TT which is of constant order, and large mm. This insight highlights the distinct nature of pricing under stateful models like the Bass model when compared to the independent demand models and multi-armed bandit based formulations where asymptotics with respect to TT form the main focus of the analysis. Interesting directions for future research include investigation of other stateful demand models where the concavity property and other new dynamic programming based insights derived here may be useful.

References

  • [1] Shipra Agrawal and Randy Jia “Optimistic posterior sampling for reinforcement learning: worst-case regret bounds” In Advances in Neural Information Processing Systems 30, 2017
  • [2] Ashwinkumar Badanidiyuru, Robert Kleinberg and Aleksandrs Slivkins “Bandits with Knapsacks” In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science IEEE, 2013
  • [3] Frank M. Bass “A New Product Growth for Model Consumer Durables” In Management Science 15.5, 1969, pp. 215–227
  • [4] Frank M. Bass “A New Product Growth for Model Consumer Durables” In Management Science 50, 2004, pp. 1825–1832
  • [5] Dimitris Bertsimas and José Niño-Mora “Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic” In Operations Research 48.1, 2000, pp. 80–90
  • [6] Omar Besbes, Yonatan Gur and Assaf Zeevi “Optimal Exploration-Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards” In Stochastic Systems 9.4, 2019, pp. 319–337
  • [7] Omar Besbes and Assaf Zeevi “Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms” In Operations Research 57.6 INFORMS, 2009, pp. 1407–1420
  • [8] Omar Besbes and Assaf Zeevi “On the Minimax Complexity of Pricing in a Changing Environment” In Operations Research 59.1, 2011, pp. 66–79
  • [9] Arnoud V. den Boer “Dynamic pricing and learning: Historical origins, current research, and new directions” In Surveys in Operations Research and Management Science 20.1, 2015, pp. 1–18
  • [10] Arnoud V. den Boer “Tracking the market: Dynamic pricing and learning in a changing environment” In European Journal of Operational Research 247.3, 2015, pp. 914–927
  • [11] Yang Cao, Zheng Wen, Branislav Kveton and Yao Xie “Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit” In Proceedings of Machine Learning Research 89 PMLR, 2019, pp. 418–427
  • [12] Patrick Chareka, Ottilia Chareka and Sarah Kennedy “Locally sub-Gaussian random variables and the strong law of large numbers” In Atlantic Electronic Journal of Mathematics 1, 2006
  • [13] Yiwei Chen, Zheng Wen and Yao Xie “Dynamic Pricing in an Evolving and Unknown Marketplace” In Available at SSRN, 2019
  • [14] Judith A. Chevalier and Dina Mayzlin “The Effect of Word of Mouth on Sales: Online Book Reviews” In Journal of Marketing Research 43.3, 2006, pp. 345–354
  • [15] Chrysanthos Dellarocas “The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms” In Management Science 49.10, 2003, pp. 1407–1424
  • [16] Chrysanthos Dellarocas, Xiaoquan (Michael) Zhang and Neveen F. Awad “Exploring the value of online product reviews in forecasting sales: The case of motion pictures” In Journal of Interactive Marketing 21.4, 2007, pp. 23–45
  • [17] Robert J. Dolan and Abel P. Jeuland “Experience Curves and Dynamic Demand Models: Implications for Optimal Pricing Strategies” In Journal of Marketing 45.1 American Marketing Association, 1981, pp. 52–62
  • [18] Kenji Doya, Shin Ishii, Alexandre Pouget and Rajesh PN Rao “Bayesian brain: Probabilistic approaches to neural coding” MIT Press, 2007
  • [19] Robert East, Kathy Hammond and Wendy Lomax “Measuring the impact of positive and negative word of mouth on brand purchase probability” In International Journal of Research in Marketing 25.3, 2008, pp. 215–224
  • [20] Zhi-Ping Fan, Yu-Jie Che and Zhen-Yu Chen “Product sales forecasting using online reviews and historical sales data: A method combining the Bass model and sentiment analysis” In Journal of Business Research 74, 2017, pp. 90–100
  • [21] Aurélien Garivier and Eric Moulines “On Upper-Confidence Bound Policies for Switching Bandit Problems” In Algorithmic Learning Theory Springer Berlin Heidelberg, 2011, pp. 174–188
  • [22] Johan Grasman and Marcel Kornelis “Forecasting product sales with a stochastic Bass model” In Journal of Mathematics in Industry 9.2, 2019
  • [23] Raghuram Iyengar, Christophe Van den Bulte and Thomas W. Valente “Opinion Leadership and Social Contagion in New Product Diffusion” In Marketing Science 30.2, 2011, pp. 195–212
  • [24] Thomas Jaksch, Ronald Ortner and Peter Auer “Near-optimal Regret Bounds for Reinforcement Learning.” In Journal of Machine Learning Research 11.4, 2010
  • [25] N. Bora Keskin and Assaf Zeevi “Chasing Demand: Learning and Earning in a Changing Environment” In Mathematics of Operations Research 42.2, 2017, pp. 277–307
  • [26] Robert Kleinberg and Tom Leighton “The Value of Knowing a Demand Curve: Bounds on Regret for Online Posted-Price Auctions” In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’03) IEEE Computer Society, 2003, pp. 594
  • [27] Tor Lattimore and Csaba Szepesvári “Bandit algorithms” Cambridge University Press, 2020
  • [28] Hakyeon Lee, Sang Gook Kim, Hyun Park and Pilsung Kang “Pre-launch new product demand forecasting using the Bass model: A statistical and machine learning-based approach” In Technological Forecasting and Social Change 86, 2014, pp. 49–64
  • [29] Donald Lehmann, Oded Netzer and Olivier Toubia “The Future of Quantitative Marketing: Results of a Survey” In Customer Needs and Solutions 2, 2015, pp. 5–18
  • [30] Haipeng Luo, Chen-Yu Wei, Alekh Agarwal and John Langford “Efficient Contextual Bandits in Non-stationary Worlds” In Proceedings of the 31st Conference On Learning Theory 75 PMLR, 2018, pp. 1739–1776
  • [31] Shun-Chen Niu “A stochastic formulation of the Bass model of new-product diffusion” In Mathematical Problems in Engineering 8, 2002
  • [32] Bruce Robinson and Chet Lakhani “Dynamic Price Models for New-Product Planning” In Management Science 21.10, 1975, pp. 1113–1122
  • [33] Yoan Russac, Claire Vernade and Olivier Cappé “Weighted Linear Bandits for Non-Stationary Environments” In Advances in Neural Information Processing Systems 32 Curran Associates, Inc., 2019
  • [34] Sylvain Senecal and Jacques Nantel “The influence of online product recommendations on consumers’ online choices” In Journal of Retailing 80.2, 2004, pp. 159–169
  • [35] Cem Tekin and Mingyan Liu “Online Learning of Rested and Restless Bandits” In IEEE Transactions on Information Theory 58.8 Institute of Electrical and Electronics Engineers (IEEE), 2012, pp. 5588–5611
  • [36] Olivier Toubia, Jacob Goldenberg and Rosanna Garcia “Improving penetration forecasts using social interactions data” In Management Science 60.12, 2014, pp. 3049–3066
  • [37] Lin Yang and Mengdi Wang “Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound” In International Conference on Machine Learning, 2020, pp. 10746–10756 PMLR
  • [38] Peng Yin, Guowei Dou, Xudong Lin and Liangliang Liu “A hybrid method for forecasting new product sales based on fuzzy clustering and deep learning” In Kybernetes ahead-of-print, 2020
  • [39] Mengzhenyu Zhang, Hyun-Soo Ahn and Joline Uichanco “Data-Driven Pricing for a New Product” In Available at SSRN 3545574, 2020

Appendix A A Concentration Result on Exponential Random Variables

Before moving on to the next sections, we state a concentration result that is used in many of the remaining proofs.

Lemma A.1.

Let (I_{d})_{d=1,\ldots,n} be a sequence of random variables adapted to a filtration \mathcal{F}_{d}, such that I_{d}\,|\,\mathcal{F}_{d-1} is an exponential random variable with rate \lambda_{d}\in\mathcal{F}_{d-1} (i.e., \lambda_{d} is \mathcal{F}_{d-1}-measurable). Suppose that there exists \underline{\lambda}>0 such that \lambda_{d}\geq\underline{\lambda} almost surely. Then for any \epsilon\leq\frac{2n}{\underline{\lambda}},

\mathbb{P}\left(\sum\limits_{d=1}^{n}\left(I_{d}-\frac{1}{\lambda_{d}}\right)\leq-\epsilon\right)\leq\exp\left(\frac{-\epsilon^{2}\underline{\lambda}^{2}}{8n}\right)
\mathbb{P}\left(\sum\limits_{d=1}^{n}\left(I_{d}-\frac{1}{\lambda_{d}}\right)\geq\epsilon\right)\leq\exp\left(\frac{-\epsilon^{2}\underline{\lambda}^{2}}{8n}\right)
Proof.

An exponential random variable X with rate \lambda satisfies (see [12])

𝔼[esX]=11s/λesλ+2s2λ2s[λ2,λ2]\mathbb{E}[e^{sX}]=\frac{1}{1-s/\lambda}\leq e^{\frac{s}{\lambda}+2\frac{s^{2}}{\lambda^{2}}}\quad\forall s\in[-\frac{\lambda}{2},\frac{\lambda}{2}] (14)
(d=1nId1λdϵ)\displaystyle\mathbb{P}\left(\sum\limits_{d=1}^{n}I_{d}-\frac{1}{\lambda_{d}}\leq-\epsilon\right) =(esd=1n1λdIdesϵ)\displaystyle=\mathbb{P}\left(e^{s\sum\limits_{d=1}^{n}\frac{1}{\lambda_{d}}-I_{d}}\geq e^{s\epsilon}\right)
(Markov’s Inequality) 1esϵ𝔼[es(d=1n1λdId)]\displaystyle\leq\frac{1}{e^{s\epsilon}}\mathbb{E}\left[e^{s\left(\sum\limits_{d=1}^{n}\frac{1}{\lambda_{d}}-I_{d}\right)}\right]
1esϵ𝔼[es(d=1n11λdId)𝔼[es(1λnIn)|n1]]\displaystyle\leq\frac{1}{e^{s\epsilon}}\mathbb{E}\left[e^{s\left(\sum\limits_{d=1}^{n-1}\frac{1}{\lambda_{d}}-I_{d}\right)}\mathbb{E}\left[e^{s(\frac{1}{\lambda_{n}}-I_{n})}\big{|}\mathcal{F}_{n-1}\right]\right]
(Using (14) and λnλ¯)\displaystyle(\text{Using \eqref{eq: MGF of exponential} and $\lambda_{n}\geq\underline{\lambda}$}) 1esϵ𝔼[es(d=1n11λdId)e2s2λ¯2]\displaystyle\leq\frac{1}{e^{s\epsilon}}\mathbb{E}\left[e^{s\left(\sum\limits_{d=1}^{n-1}\frac{1}{\lambda_{d}}-I_{d}\right)}e^{2\frac{s^{2}}{\underline{\lambda}^{2}}}\right]
\displaystyle(\text{Repeating the above argument})\quad\leq e^{s^{2}\frac{2n}{\underline{\lambda}^{2}}-s\epsilon}
\displaystyle(\text{choosing the optimal }s=\tfrac{\epsilon\underline{\lambda}^{2}}{4n})\quad=\exp\left(\frac{-\epsilon^{2}\underline{\lambda}^{2}}{8n}\right)

Note that to use (14) in the last inequality we needed that s=ϵλ¯24nλd2s=\frac{\epsilon\underline{\lambda}^{2}}{4n}\leq\frac{\lambda_{d}}{2} for every dd, which is satisfied if ϵ2nλ¯\epsilon\leq\frac{2n}{\underline{\lambda}}. (d=1nId1λdϵ)\mathbb{P}\left(\sum\limits_{d=1}^{n}I_{d}-\frac{1}{\lambda_{d}}\geq\epsilon\right) can be bounded similarly. ∎
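As a quick numerical sanity check of Lemma A.1, the following Python sketch compares the empirical lower-tail probability with the stated bound; the rate sequence, sample sizes, and \epsilon are illustrative choices, not quantities from the paper.

```python
import numpy as np

# Monte Carlo check of Lemma A.1. The adapted rates lambda_d may be any
# F_{d-1}-measurable sequence with lambda_d >= lambda_lower; a fixed
# deterministic sequence is used here purely for illustration.
rng = np.random.default_rng(0)
n, lam_lower, trials = 500, 2.0, 10_000
lam = lam_lower + rng.random(n)                  # rates in [2, 3]

samples = rng.exponential(1.0 / lam, size=(trials, n))
dev = samples.sum(axis=1) - (1.0 / lam).sum()    # sum_d (I_d - 1/lambda_d)

eps = 60.0
assert eps <= 2 * n / lam_lower                  # condition of Lemma A.1
empirical = np.mean(dev <= -eps)
bound = np.exp(-eps**2 * lam_lower**2 / (8 * n))
print(f"empirical tail {empirical:.1e} <= bound {bound:.1e}")
```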

Appendix B Some Properties of the Deterministic Optimal Price Curve

B.1 Optimal pricing policy expression

The expressions in (6), (7), and (8) provided the optimal price curve p^{*}(x,\alpha,\beta), x\in[0,X^{*}_{T}], and the total number of adoptions X^{*}_{T} when the optimal pricing policy is followed from an initial adoption level X_{0}=0 at t=0 to the end of the time horizon T. Here we derive a more general expression for the optimal pricing policy \pi^{*}(x,\alpha,\beta,T), which gives the optimal price at any current adoption level x and remaining planning horizon T (irrespective of which pricing policy was followed, and for how long, to reach the adoption level x). We also derive an expression for X^{*}_{T}(x), the adoption level at the end of time T if the optimal pricing policy is followed for time T starting from the adoption level X_{0}=x at t=0. These expressions will be especially useful in our lower bound derivations.

Note that under this expanded notation, X^{*}_{T}=X^{*}_{T}(0). We sometimes also use the notation X^{*}_{T}(x,\alpha,\beta) to emphasize the dependence of X^{*}_{T} on the market parameters. Note that by this definition, X^{*}_{T-t}(X_{t})=X^{*}_{T}(X_{0}) if X_{t} is the adoption level reached at time t on following the optimal price trajectory from time 0.

The optimal price to offer at any given adoption level x and remaining time T can be derived using optimal control theory (see Equation (8) of [17]), and is given by the following pricing policy. Given adoption level x\in[0,1) and remaining time T>0, the optimal price is given by

π(x,α,β,T):=1+log((α+βx)(1x)(α+βXT(x))(1XT(x)))\pi^{*}(x,\alpha,\beta,T):=1+\log\left(\frac{(\alpha+\beta x)(1-x)}{(\alpha+\beta X^{*}_{T}(x))(1-X^{*}_{T}(x))}\right) (15)

When the value of α,β\alpha,\beta is clear from the context, we sometimes drop the dependence on α,β\alpha,\beta, and use shorter notations p(x)p^{*}(x) and π(x,T)\pi^{*}(x,T) instead of p(x,α,β)p^{*}(x,\alpha,\beta) and π(x,α,β,T)\pi^{*}(x,\alpha,\beta,T) respectively.

To see the connection (and distinction) between p(x)p^{*}(x) and π(x,T)\pi^{*}(x,T), note that p(x)p^{*}(x) is the price trajectory if the optimal policy π\pi^{*} is followed from t=0,X0=0t=0,X_{0}=0 to the end of horizon TT. That is, if Xt,ptX_{t},p_{t} denotes the adoption level and price at time t[0,T]t\in[0,T] on following optimal pricing policy from t=0,X0=0t=0,X_{0}=0, then pt=p(Xt,α,β)=π(Xt,α,β,Tt)p_{t}=p^{*}(X_{t},\alpha,\beta)=\pi^{*}(X_{t},\alpha,\beta,T-t).

Now consider the adoption process starting from an initial adoption level x at time t=0, and then following the optimal pricing policy. Again, X_{t} denotes the adoption level at time t in this process. Plugging p_{t}=\pi^{*}(X_{t},\alpha,\beta,T-t) back into (5), it is straightforward to derive that

dXtdt=1e(α+βXT(x))(1XT(x))\frac{dX_{t}}{dt}=\frac{1}{e}(\alpha+\beta X^{*}_{T}(x))(1-X^{*}_{T}(x)) (16)

This means that under the optimal policy, the rate of adoption is constant. We can integrate the above from t=0t=0 to TT, and also solve the resulting quadratic equation for t=Tt=T to compute the final adoption level XT(x)X^{*}_{T}(x) under the optimal pricing policy when starting from an initial adoption level xx at t=0t=0:

XT(x)x=1e(α+βXT(x))(1XT(x))T\displaystyle X^{*}_{T}(x)-x=\frac{1}{e}(\alpha+\beta X^{*}_{T}(x))(1-X^{*}_{T}(x))T (17)
XT(x)=T(βα)e+[T(βα)e]2+4αβT2+4eβxT2βT\displaystyle X^{*}_{T}(x)=\frac{T(\beta-\alpha)-e+\sqrt{[T(\beta-\alpha)-e]^{2}+4\alpha\beta T^{2}+4e\beta xT}}{2\beta T} (18)

Note that on plugging x=0x=0 in (18), we obtain the expression for XTX^{*}_{T} in (8).
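For concreteness, the following Python sketch evaluates X^{*}_{T}(x) via (18) and \pi^{*}(x,\alpha,\beta,T) via (15), and verifies the fixed-point relation (17); the parameter values are arbitrary illustrations.

```python
import numpy as np

def X_star(x, alpha, beta, T):
    """Final adoption level X*_T(x) from (18)."""
    e = np.e
    disc = (T*(beta - alpha) - e)**2 + 4*alpha*beta*T**2 + 4*e*beta*x*T
    return (T*(beta - alpha) - e + np.sqrt(disc)) / (2*beta*T)

def pi_star(x, alpha, beta, T):
    """Optimal price pi*(x, alpha, beta, T) from (15)."""
    XT = X_star(x, alpha, beta, T)
    return 1 + np.log((alpha + beta*x)*(1 - x) / ((alpha + beta*XT)*(1 - XT)))

# Illustrative parameters; check the fixed-point relation (17).
alpha, beta, T, x = 0.1, 0.3, 5.0, 0.2
XT = X_star(x, alpha, beta, T)
residual = (XT - x) - (T/np.e)*(alpha + beta*XT)*(1 - XT)
print(XT, pi_star(x, alpha, beta, T), abs(residual))   # residual ~ 0
```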

B.2 Price Lipschitz Bound

We start with some Lipschitz bounds on how the optimal price offered at adoption level xx can change with respect to α,β\alpha,\beta. Note that here we are assuming X0=0X_{0}=0 and that the entire optimal price curve from the beginning changes if we change α,β\alpha,\beta (i.e., we use (6) not (15)). In the following lemma, XTX^{*}_{T} denotes the adoption curve on following the optimal pricing policy for all times t[0,T]t\in[0,T] starting from 0 adoption level under the deterministic Bass model with parameters (α,β)(\alpha,\beta) as given in (8). Using the expanded notation introduced in the previous subsection, it can also be called XT(0,α,β)X^{*}_{T}(0,\alpha,\beta).

Lemma B.1.

011XTXTα1α0\leq\frac{1}{1-X^{*}_{T}}\frac{\partial X^{*}_{T}}{\partial\alpha}\leq\frac{1}{\alpha}, and 011XTXTβ1β0\leq\frac{1}{1-X^{*}_{T}}\frac{\partial X^{*}_{T}}{\partial\beta}\leq\frac{1}{\beta}

Proof.

Below, we use XTX_{T} instead of XTX^{*}_{T} to denote the adoption at time TT, in the deterministic optimal trajectory starting at X0=0X_{0}=0. Differentiating (7) with respect to α\alpha:

βTXT2+eXTβTXT+αTXTαT\displaystyle\beta TX_{T}^{2}+eX_{T}-\beta TX_{T}+\alpha TX_{T}-\alpha T =0\displaystyle=0
2βTXTαXT+eXTαβTXTα+TXT+αTXTαT\displaystyle 2\beta T\frac{\partial X_{T}}{\partial\alpha}X_{T}+e\frac{\partial X_{T}}{\partial\alpha}-\beta T\frac{\partial X_{T}}{\partial\alpha}+TX_{T}+\alpha T\frac{\partial X_{T}}{\partial\alpha}-T =0\displaystyle=0
XTα(2TXTβ+eβT+αT)\displaystyle\frac{\partial X_{T}}{\partial\alpha}(2TX_{T}\beta+e-\beta T+\alpha T) =T(1XT)\displaystyle=T(1-X_{T})

Rearranging and substituting XTX_{T} using (8):

11XTXTα\displaystyle\frac{1}{1-X_{T}}\frac{\partial X_{T}}{\partial\alpha} =T2βTXT+e+(αβ)T\displaystyle=\frac{T}{2\beta TX_{T}+e+(\alpha-\beta)T}
=TT(βα)e+4T2αβ+(T(βα)e)2+(αβ)T+e\displaystyle=\frac{T}{T(\beta-\alpha)-e+\sqrt{4T^{2}\alpha\beta+\left(T\left(\beta-\alpha\right)-e\right)^{2}}+(\alpha-\beta)T+e}
=T4T2αβ+(T(βα)e)20.\displaystyle=\frac{T}{\sqrt{4T^{2}\alpha\beta+\left(T(\beta-\alpha)-e\right)^{2}}}\geq 0.

Note the denominator can be bounded in two ways:

4T2αβ+(T(βα)e)2|T(βα)e| and,\displaystyle\sqrt{4T^{2}\alpha\beta+\left(T(\beta-\alpha)-e\right)^{2}}\geq\left|T(\beta-\alpha)-e\right|\text{ and,}
\displaystyle\sqrt{4T^{2}\alpha\beta+\left(T(\beta-\alpha)-e\right)^{2}}=\sqrt{T^{2}(\alpha+\beta)^{2}+2e(\alpha-\beta)T+e^{2}}\geq\left|T(\alpha+\beta)-e\right|.

So

\frac{1}{1-X_{T}}\frac{\partial X_{T}}{\partial\alpha}\leq\min\left(\left|\frac{1}{\alpha+\beta-\frac{e}{T}}\right|,\left|\frac{1}{\alpha-\beta+\frac{e}{T}}\right|\right)=\min\left(\left|\frac{1}{\alpha+(\beta-\frac{e}{T})}\right|,\left|\frac{1}{\alpha-(\beta-\frac{e}{T})}\right|\right)\leq\frac{1}{\alpha}

Following a similar procedure (differentiating (7) with respect to \beta and using (8)), we can also get the following bound:

11XTXTβ\displaystyle\frac{1}{1-X_{T}}\frac{\partial X_{T}}{\partial\beta} =TXT2βTXT+e+(αβ)T\displaystyle=\frac{TX_{T}}{2\beta TX_{T}+e+(\alpha-\beta)T}
=T(βα)e2β4T2αβ+(T(α+β)e)2+12β\displaystyle=\frac{T(\beta-\alpha)-e}{2\beta\sqrt{4T^{2}\alpha\beta+\left(T\left(-\alpha+\beta\right)-e\right)^{2}}}+\frac{1}{2\beta}
011XTXTβ\displaystyle 0\leq\frac{1}{1-X_{T}}\frac{\partial X_{T}}{\partial\beta} 1β\displaystyle\leq\frac{1}{\beta}

where in the last step we used the fact that |T(βα)e|4T2αβ+(T(α+β)e)21\frac{|T(\beta-\alpha)-e|}{\sqrt{4T^{2}\alpha\beta+\left(T\left(-\alpha+\beta\right)-e\right)^{2}}}\leq 1. ∎
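A simple finite-difference check of Lemma B.1 (a sketch with arbitrary illustrative parameters):

```python
import numpy as np

def X_T(alpha, beta, T):
    """X*_T = X*_T(0, alpha, beta) from (8)."""
    e = np.e
    disc = (T*(beta - alpha) - e)**2 + 4*alpha*beta*T**2
    return (T*(beta - alpha) - e + np.sqrt(disc)) / (2*beta*T)

alpha, beta, T, h = 0.2, 0.5, 4.0, 1e-6
X = X_T(alpha, beta, T)
dX_da = (X_T(alpha + h, beta, T) - X) / h   # numerical dX*_T/d(alpha)
dX_db = (X_T(alpha, beta + h, T) - X) / h   # numerical dX*_T/d(beta)
print(0 <= dX_da/(1 - X) <= 1/alpha)        # expect True
print(0 <= dX_db/(1 - X) <= 1/beta)         # expect True
```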

In our proofs we will sometimes need to compare the optimal final adoption level under two different market parameters. To derive the results comparing two different market parameters, instead of XT(x)X^{*}_{T}(x), we use the notation XT(x,α,β)X^{*}_{T}(x,\alpha,\beta) that makes the dependence on α,β\alpha,\beta explicit.

Corollary B.1.

XT(0,α,β)XT(0,α,0)=αTαT+eX^{*}_{T}(0,\alpha,\beta)\geq X^{*}_{T}(0,\alpha,0)=\frac{\alpha T}{\alpha T+e}.

Proof.

The inequality follows from the non-negativity of Xβ\frac{\partial X^{*}}{\partial\beta} proved in Lemma B.1. The equality can be easily derived by solving for XTX^{*}_{T} in (7) after plugging in β=0\beta=0. ∎

Lemma B.2.

For any 0\leq x\leq X^{*}_{T}, \left|\frac{\partial p^{*}(x,\alpha,\beta)}{\partial\alpha}\right|\leq\frac{2+\beta/\alpha}{\alpha}, \left|\frac{\partial p^{*}(x,\alpha,\beta)}{\partial\beta}\right|\leq\frac{3}{\min(\alpha,\beta)}.

Proof.

Below, we use XTX_{T} instead of XTX^{*}_{T} to denote the optimal final adoption level, assuming that the initial adoption level is 0. Taking the derivatives of (6), and by using Lemma B.1:

|p(x,α,β)α|\displaystyle\left|\frac{\partial p^{*}(x,\alpha,\beta)}{\partial\alpha}\right| =|log(α+βx)α+log(1x)αlog(α+βXT)αlog(1XT)α|\displaystyle=\left|\frac{\partial\log(\alpha+\beta x)}{\partial\alpha}+\frac{\partial\log(1-x)}{\partial\alpha}-\frac{\partial\log(\alpha+\beta X_{T})}{\partial\alpha}-\frac{\partial\log(1-X_{T})}{\partial\alpha}\right|
=|1α+βx1α+βXT(1+βXTα)+11XTXTα|\displaystyle=\left|\frac{1}{\alpha+\beta x}-\frac{1}{\alpha+\beta X_{T}}(1+\beta\frac{\partial X_{T}}{\partial\alpha})+\frac{1}{1-X_{T}}\frac{\partial X_{T}}{\partial\alpha}\right|
max(|1α+βx+11XTXTα|,|1α+βXT(1+βXTα)|)\displaystyle\leq\max\left(\left|\frac{1}{\alpha+\beta x}+\frac{1}{1-X_{T}}\frac{\partial X_{T}}{\partial\alpha}\right|,\left|\frac{1}{\alpha+\beta X_{T}}(1+\beta\frac{\partial X_{T}}{\partial\alpha})\right|\right)
max(2α,1α+β/α(1XT)α)\displaystyle\leq\max\left(\frac{2}{\alpha},\frac{1}{\alpha}+\frac{\beta/\alpha(1-X_{T})}{\alpha}\right)
2+β/αα\displaystyle\leq\frac{2+\beta/\alpha}{\alpha}
\displaystyle\left|\frac{\partial p^{*}(x,\alpha,\beta)}{\partial\beta}\right| =\left|\frac{\partial\log(\alpha+\beta x)}{\partial\beta}-\frac{\partial\log(\alpha+\beta X_{T})}{\partial\beta}-\frac{\partial\log(1-X_{T})}{\partial\beta}\right|
=|xα+βx1α+βXT(XT+βXTβ)+11XTXTβ|\displaystyle=\left|\frac{x}{\alpha+\beta x}-\frac{1}{\alpha+\beta X_{T}}(X_{T}+\beta\frac{\partial X_{T}}{\partial\beta})+\frac{1}{1-X_{T}}\frac{\partial X_{T}}{\partial\beta}\right|
|xα+βx1α+βXT(XT+βXTβ)|+|11XTXTβ|\displaystyle\leq\left|\frac{x}{\alpha+\beta x}-\frac{1}{\alpha+\beta X_{T}}(X_{T}+\beta\frac{\partial X_{T}}{\partial\beta})\right|+\left|\frac{1}{1-X_{T}}\frac{\partial X_{T}}{\partial\beta}\right|
1α(1+βXTβ)+1β\displaystyle\leq\frac{1}{\alpha}(1+\beta\frac{\partial X_{T}}{\partial\beta})+\frac{1}{\beta}
2α+1β\displaystyle\leq\frac{2}{\alpha}+\frac{1}{\beta}
3min(α,β)\displaystyle\leq\frac{3}{\min(\alpha,\beta)}

Appendix C Proof of Lemma 2.1 and Lemma 2.2

We first prove the concavity property of the deterministic optimal value stated as Lemma 2.1.

See 2.1

Proof.

For simplicity of notation, in this proof we use TT to denote the remaining time. The optimal value function for the continuous deterministic Bass model when the remaining time is TT can be expressed using the following dynamic programming equation (for all δ0\delta\geq 0),

Vdet(x,T)\displaystyle V^{\text{det}}(x,T) =maxppλ(p,x)δ+Vdet(x+λ(p,x)δ/m,Tδ)+o(δ)\displaystyle=\max\limits_{p}p\lambda(p,x)\delta+V^{\text{det}}(x+\lambda(p,x)\delta/m,T-\delta)+o(\delta) (19)

where λ(p,x)=mep(α+βx)(1x)\lambda(p,x)=me^{-p}(\alpha+\beta x)(1-x).

Using the Hamilton-Jacobi-Bellman equation for the deterministic Bass model (see equation (12.8) in [18]):

Vdet(x,T)T=maxppλ(p,x)+λ(p,x)mVdet(x,T)x.\frac{\partial V^{\text{det}}(x,T)}{\partial T}=\max_{p}p\lambda(p,x)+\frac{\lambda(p,x)}{m}\frac{\partial V^{\text{det}}(x,T)}{\partial x}. (20)

And the optimal price is the price that achieves the maximum in the above expression (see equation (12.9) in [18]), i.e.,

π(x,α,β,T)=argmaxppλ(p,x)+λ(p,x)mVdet(x,T)x\pi^{*}(x,\alpha,\beta,T)=\operatorname*{arg\,max}_{p}p\lambda(p,x)+\frac{\lambda(p,x)}{m}\frac{\partial V^{\text{det}}(x,T)}{\partial x}

Solving the above maximization problem gives us an expression for the optimal price at state xx with TT time left.

π(x,α,β,T)=11mVdet(x,T)x\pi^{*}(x,\alpha,\beta,T)=1-\frac{1}{m}\frac{\partial V^{\text{det}}(x,T)}{\partial x} (21)
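To verify (21), note that with \lambda(p,x)=me^{-p}(\alpha+\beta x)(1-x), the objective in the above maximization equals \lambda(p,x)\left(p+\frac{1}{m}\frac{\partial V^{\text{det}}(x,T)}{\partial x}\right); its derivative in p is \lambda(p,x)\left(1-p-\frac{1}{m}\frac{\partial V^{\text{det}}(x,T)}{\partial x}\right), which vanishes exactly at the price in (21), and the second-order condition confirms that this critical point is a maximum.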

From (15), the optimal price is also given by

π(x,α,β,T)=1+log((α+βx)(1x)(α+βXT(x))(1XT(x)))\pi^{*}(x,\alpha,\beta,T)=1+\log\left(\frac{(\alpha+\beta x)(1-x)}{(\alpha+\beta X^{*}_{T}(x))(1-X^{*}_{T}(x))}\right)\\

where (refer to (17), (18))

XT(x)=x+1e(α+βXT(x))(1XT(x))T\displaystyle X^{*}_{T}(x)=x+\frac{1}{e}(\alpha+\beta X^{*}_{T}(x))(1-X^{*}_{T}(x))T
XT(x)=T(βα)e+[T(βα)e]2+4αβT2+4eβxT2βT\displaystyle X^{*}_{T}(x)=\frac{T(\beta-\alpha)-e+\sqrt{[T(\beta-\alpha)-e]^{2}+4\alpha\beta T^{2}+4e\beta xT}}{2\beta T}

Therefore, substituting,

1mVdet(x,T)x\displaystyle\frac{1}{m}\frac{\partial V^{\text{det}}(x,T)}{\partial x} =log((α+βx)(1x)(α+βXT(x))(1XT(x)))\displaystyle=-\log\left(\frac{(\alpha+\beta x)(1-x)}{(\alpha+\beta X^{*}_{T}(x))(1-X^{*}_{T}(x))}\right)
1m2Vdet(x,T)x2\displaystyle\frac{1}{m}\frac{\partial^{2}V^{\text{det}}(x,T)}{\partial x^{2}} =βα+βx+11x+βXT(x)xα+βXT(x)11XT(x)XT(x)x\displaystyle=\frac{-\beta}{\alpha+\beta x}+\frac{1}{1-x}+\frac{\beta\frac{\partial X^{*}_{T}(x)}{\partial x}}{\alpha+\beta X^{*}_{T}(x)}-\frac{1}{1-X^{*}_{T}(x)}\frac{\partial X^{*}_{T}(x)}{\partial x} (22)

Now we split (22) into two parts and bound each above by zero. First note that differentiating X^{*}_{T}(x) with respect to x (using (17)) gives us

XT(x)x=ee+(αβ)T+2βTXT(x)\frac{\partial X^{*}_{T}(x)}{\partial x}=\frac{e}{e+(\alpha-\beta)T+2\beta TX^{*}_{T}(x)} (23)

First, we show that

11x\displaystyle\frac{1}{1-x} 11XT(x)XT(x)x0,\displaystyle-\frac{1}{1-X^{*}_{T}(x)}\frac{\partial X^{*}_{T}(x)}{\partial x}\leq 0,

which is equivalent to showing that

1XT(x)\displaystyle 1-X^{*}_{T}(x) XT(x)x(1x)\displaystyle\leq\frac{\partial X^{*}_{T}(x)}{\partial x}(1-x)
Using (17) \displaystyle\iff 1XT(x)\displaystyle 1-X^{*}_{T}(x) XT(x)x(1XT(x))+XT(x)xTe(α+βXT(x))(1XT(x))\displaystyle\leq\frac{\partial X^{*}_{T}(x)}{\partial x}(1-X^{*}_{T}(x))+\frac{\partial X^{*}_{T}(x)}{\partial x}\frac{T}{e}(\alpha+\beta X^{*}_{T}(x))(1-X^{*}_{T}(x))
Using (23) \displaystyle\iff 1XT(x)\displaystyle 1-X^{*}_{T}(x) (1XT(x))(ee+(αβ)T+2βTXT(x))(e+T(α+βXT(x))e)\displaystyle\leq(1-X^{*}_{T}(x))(\frac{e}{e+(\alpha-\beta)T+2\beta TX^{*}_{T}(x)})(\frac{e+T(\alpha+\beta X^{*}_{T}(x))}{e})
\displaystyle\iff 1XT(x)\displaystyle 1-X^{*}_{T}(x) (1XT(x))e+αT+βTXT(x)e+(αβ)T+2βTXT(x)\displaystyle\leq(1-X^{*}_{T}(x))\cdot\frac{e+\alpha T+\beta TX^{*}_{T}(x)}{e+(\alpha-\beta)T+2\beta TX^{*}_{T}(x)}

Using expression for XT(x)X^{*}_{T}(x) from (18), we have

e+(αβ)T+2βTXT(x)=[T(βα)e]2+4αβT2+4eβxT0e+(\alpha-\beta)T+2\beta TX^{*}_{T}(x)=\sqrt{[T(\beta-\alpha)-e]^{2}+4\alpha\beta T^{2}+4e\beta xT}\geq 0.

Then, since β0,T0,0XT(x)1\beta\geq 0,T\geq 0,0\leq X^{*}_{T}(x)\leq 1, we have that the fraction in the last inequality is at least 11, and therefore the last inequality holds.

Now we bound the remaining terms on the RHS of (22) above by 0. This requires showing that

βα+βx+βXT(x)xα+βXT(x)\displaystyle\frac{-\beta}{\alpha+\beta x}+\frac{\beta\frac{\partial X^{*}_{T}(x)}{\partial x}}{\alpha+\beta X^{*}_{T}(x)} 0\displaystyle\leq 0
\displaystyle\iff (α+βx)XT(x)x\displaystyle(\alpha+\beta x)\frac{\partial X^{*}_{T}(x)}{\partial x} α+βXT(x)\displaystyle\leq\alpha+\beta X^{*}_{T}(x)
Using (17) \displaystyle\iff XT(x)x(α+βXT(x)βTe(α+βXT(x))(1XT(x)))\displaystyle\frac{\partial X^{*}_{T}(x)}{\partial x}\left(\alpha+\beta X^{*}_{T}(x)-\beta\frac{T}{e}(\alpha+\beta X^{*}_{T}(x))(1-X^{*}_{T}(x))\right) α+βXT(x)\displaystyle\leq\alpha+\beta X^{*}_{T}(x)
\displaystyle\iff (α+βXT(x))XT(x)x(1βTe(1XT(x)))\displaystyle(\alpha+\beta X^{*}_{T}(x))\frac{\partial X^{*}_{T}(x)}{\partial x}(1-\frac{\beta T}{e}(1-X^{*}_{T}(x))) α+βXT(x)\displaystyle\leq\alpha+\beta X^{*}_{T}(x)
Using (23) \displaystyle\iff eβT+βXT(x)Te+(αβ)T+2βXT(x)T\displaystyle\frac{e-\beta T+\beta X^{*}_{T}(x)T}{e+(\alpha-\beta)T+2\beta X^{*}_{T}(x)T} 1\displaystyle\leq 1

Since α,β0,T0,XT(x)0\alpha,\beta\geq 0,T\geq 0,X^{*}_{T}(x)\geq 0, the last inequality holds.

Therefore the sum of all the terms in the right hand side of (22) is bounded by 0. This proves the lemma statement. ∎
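The inequality can also be checked numerically. The sketch below evaluates the right-hand side of (22), using (18) for X^{*}_{T}(x) and (23) for its derivative, on a grid of adoption levels (illustrative parameters):

```python
import numpy as np

def X_star(x, alpha, beta, T):
    e = np.e
    disc = (T*(beta - alpha) - e)**2 + 4*alpha*beta*T**2 + 4*e*beta*x*T
    return (T*(beta - alpha) - e + np.sqrt(disc)) / (2*beta*T)

def d2V_dx2_over_m(x, alpha, beta, T):
    """Right-hand side of (22): (1/m) * d^2 V^det(x, T) / dx^2."""
    e = np.e
    X = X_star(x, alpha, beta, T)
    Xp = e / (e + (alpha - beta)*T + 2*beta*T*X)   # dX*_T(x)/dx from (23)
    return -beta/(alpha + beta*x) + 1/(1 - x) + beta*Xp/(alpha + beta*X) - Xp/(1 - X)

alpha, beta, T = 0.2, 0.5, 4.0
xs = np.linspace(0.0, 0.9, 91)
print(d2V_dx2_over_m(xs, alpha, beta, T).max() <= 0)   # expect True: concavity
```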

Now we use the above concavity property to show that for any starting point and remaining time, the optimal revenue in the deterministic model is at least the optimal expected revenue in the stochastic model. Let Vstoch(d,T)V^{\text{stoch}}(d,T) be the optimal expected revenue one can achieve in the stochastic Bass model with dd current adopters and TT time remaining.

Lemma C.1.

For any d{0,,m}d\in\{0,\ldots,m\}, x=dmx=\frac{d}{m}, and any T0T\geq 0:

Vdet(x,T)Vstoch(d,T)V^{\text{det}}(x,T)\geq V^{\text{stoch}}(d,T)
Proof.

Given any \delta\geq 0, let \Delta(p,d) be the random number of adoptions that take place in the next \delta units of time in the discrete stochastic Bass model when the current price is p and the current number of adopters is d. Let x=\frac{d}{m}; then \mathbb{E}[\Delta(p,d)]=\lambda(p,x)\delta+o(\delta). Using dynamic programming, we have:

Vstoch(d,T)\displaystyle V^{\text{stoch}}(d,T) =maxp𝔼[pΔ(p,d)+Vstoch(d+Δ(p,d),Tδ)]+o(δ)\displaystyle=\max\limits_{p}\mathbb{E}\left[p\Delta(p,d)+V^{\text{stoch}}(d+\Delta(p,d),T-\delta)\right]+o(\delta)
=maxpp𝔼[Δ(p,d)]+𝔼[Vstoch(d+Δ(p,d),Tδ)]+o(δ).\displaystyle=\max\limits_{p}p\mathbb{E}[\Delta(p,d)]+\mathbb{E}\left[V^{\text{stoch}}(d+\Delta(p,d),T-\delta)\right]+o(\delta).

Also,

Vdet(x,T)\displaystyle V^{\text{det}}(x,T) =maxpp𝔼[Δ(p,d)]+Vdet(x+𝔼[Δ(p,d)]/m,Tδ)+o(δ)\displaystyle=\max\limits_{p}p\mathbb{E}[\Delta(p,d)]+V^{\text{det}}(x+\mathbb{E}[\Delta(p,d)]/m,T-\delta)+o(\delta)

We use induction to prove the lemma by working from the end of the planning horizon (no time remaining). We know Vdet(x,0)=Vstoch(d,0)=0V^{\text{det}}(x,0)=V^{\text{stoch}}(d,0)=0 for all dd, x=dmx=\frac{d}{m}. Suppose the inequality holds for TδT-\delta, and any dd, x=dmx=\frac{d}{m}. Then,

Vdet(x,T)\displaystyle V^{\text{det}}(x,T) =maxpp𝔼[Δ(p,d)]+Vdet(x+𝔼[Δ(p,d)]/m,Tδ)+o(δ)\displaystyle=\max\limits_{p}p\mathbb{E}[\Delta(p,d)]+V^{\text{det}}(x+\mathbb{E}[\Delta(p,d)]/m,T-\delta)+o(\delta)
(using concavity from Lemma 2.1)\displaystyle\geq\max\limits_{p}p\mathbb{E}[\Delta(p,d)]+\mathbb{E}\left[V^{\text{det}}(x+\Delta(p,d)/m,T-\delta)\right]+o(\delta)
(inductive assumption on T-\delta)\displaystyle\geq\max\limits_{p}p\mathbb{E}[\Delta(p,d)]+\mathbb{E}\left[V^{\text{stoch}}(d+\Delta(p,d),T-\delta)\right]+o(\delta)
=Vstoch(d,T)+o(δ)\displaystyle=V^{\text{stoch}}(d,T)+o(\delta)

Then, taking δ0\delta\rightarrow 0, we obtain the lemma statement. ∎

Finally, we prove the following lemma that will allow us to establish an upper bound on the deterministic optimal revenue compared to the stochastic optimal revenue.

Lemma C.2.

Fix any α,β,T\alpha,\beta,T such that

mXT8log2(4mlog(e+(α+β)T))+32mX^{*}_{T}\geq 8\log^{2}\left(4m\log(e+(\alpha+\beta)T)\right)+32

then

Vdet(0,T)Vstoch(0,T)O(log((α+β)T)mXTlog(m))V^{\text{det}}(0,T)-V^{\text{stoch}}(0,T)\leq O\left(\log\left((\alpha+\beta)T\right)\sqrt{mX^{*}_{T}\log(m)}\right)
Proof.

The proof constructs a fixed price sequence such that the expected revenue in the stochastic Bass model under these prices is within \tilde{O}(\sqrt{mX^{*}_{T}}) of the optimal deterministic revenue. Then, since the optimal expected stochastic revenue is at least as much as that obtained under the given price sequence, we will obtain the lemma statement.

Consider the following pricing scheme: for all time instances after the arrival of the (d-1)^{th} customer, and until the arrival of the d^{th} customer, post the price p^{*}_{d}\coloneqq p^{*}\left(\frac{d-1}{m},\alpha,\beta\right), for d=1,\ldots,\lfloor mX^{*}_{T}\rfloor. Here p^{*}(\cdot,\alpha,\beta) and X^{*}_{T} are given by the optimal price curve and adoption levels for the deterministic Bass model, as defined in (6) and (8).

Now, since in our price sequence, the prices are fixed between two arrivals, we know that (in the stochastic Bass model) the inter-arrival time between customer d1d-1 and dd is an exponential random variable IdExp(λ(pd,d1m))I_{d}\sim Exp\left(\lambda(p^{*}_{d},\frac{d-1}{m})\right), where λ(p,x)=ep(α+βx)(1x)m\lambda(p,x)=e^{-p}(\alpha+\beta x)(1-x)m. Note that from (16) and (17) we have that λ(pd,d1m)=mXTT\lambda(p^{*}_{d},\frac{d-1}{m})=\frac{mX^{*}_{T}}{T} , and that d=1n1λ(pd,d1m)=nTmXT\sum\limits_{d=1}^{n}\frac{1}{\lambda(p^{*}_{d},\frac{d-1}{m})}=\frac{nT}{mX^{*}_{T}} for any nmXTn\leq\lfloor mX^{*}_{T}\rfloor.

Let \tau_{n} denote the time of arrival of the n^{th} customer in the stochastic model. Set n=\lfloor mX^{*}_{T}-\sqrt{8mX^{*}_{T}\log(\frac{2}{\delta})}\rfloor, \underline{\lambda}=\frac{mX^{*}_{T}}{T}, and \epsilon=\sqrt{\frac{8n}{\underline{\lambda}^{2}}\log(\frac{2}{\delta})}, for some \delta\in(0,1) to be specified later. Assume for now that n\geq 2\log(2/\delta); then by Lemma A.1, we have with probability 1-\delta:

|τn(18log(2δ)mXT)T|T8XTmlog(2δ)\left|\tau_{n}-\left(1-\sqrt{\frac{8\log(\frac{2}{\delta})}{mX^{*}_{T}}}\right)T\right|\leq T\sqrt{\frac{8}{X^{*}_{T}m}\log(\frac{2}{\delta})}

Therefore, with probability at least 1δ1-\delta, τnT\tau_{n}\leq T, which means that the total number of adoptions dTd_{T} observed in the stochastic Bass model in time TT is at least nn. Let px=p(x,α,β)p^{*}_{x}=p^{*}(x,\alpha,\beta), pmax=maxxp(x,α,β)p^{*}_{max}=\max\limits_{x}p^{*}(x,\alpha,\beta). Then,

Vdet(0,T)Vstoch(0,T)\displaystyle V^{\text{det}}(0,T)-V^{\text{stoch}}(0,T)\leq m0XTpx𝑑xd=1dTpd1m\displaystyle m\int_{0}^{X^{*}_{T}}p^{*}_{x}dx-\sum\limits_{d=1}^{d_{T}}p^{*}_{\frac{d-1}{m}}
\text{(Corollary D.1)}\leq\sum\limits_{d=1}^{\lfloor mX^{*}_{T}\rfloor}p^{*}_{\frac{d-1}{m}}-\sum\limits_{d=1}^{d_{T}}p^{*}_{\frac{d-1}{m}}+\frac{\beta}{2\alpha}+\frac{1}{2}\log\left(m\right)+3p^{*}_{max}
\leq\sum\limits_{d=n+1}^{\lfloor mX^{*}_{T}\rfloor}p^{*}_{\frac{d-1}{m}}+\frac{\beta}{2\alpha}+\frac{1}{2}\log\left(m\right)+3p^{*}_{max}\quad\text{w.p. }1-\delta
(Lemma 4.4)=\displaystyle\text{(Lemma~{}\ref{lem:priceupperbound})}= O(log((α+β)T)mXTlog(1δ))w.p. 1δ\displaystyle O\left(\log\left((\alpha+\beta)T\right)\sqrt{mX^{*}_{T}\log(\frac{1}{\delta})}\right)\quad\text{w.p. }1-\delta

where we borrowed Corollary D.1 from Appendix D.2, and used the price upper bound pmaxO(log((α+β)T))p^{*}_{max}\leq O\left(\log\left((\alpha+\beta)T\right)\right) from Lemma 4.4.

Note that the third step holds with probability 1δ1-\delta. When it does not hold (with probability at most δ\delta), we can bound the gap between deterministic and stochastic revenue by a trivial upper bound of pmaxmXTp^{*}_{max}mX^{*}_{T} on the deterministic revenue. We set δ=1mXTpmax\delta=\frac{1}{\sqrt{mX^{*}_{T}p^{*}_{max}}} to get that

Vdet(0,T)Vstoch(0,T)\displaystyle V^{\text{det}}(0,T)-V^{\text{stoch}}(0,T)
\displaystyle\leq (1δ)O(log((α+β)T)mXTlog(1δ))+δpmaxmXT\displaystyle(1-\delta)O\left(\log\left((\alpha+\beta)T\right)\sqrt{mX^{*}_{T}\log(\frac{1}{\delta})}\right)+\delta p^{*}_{max}mX^{*}_{T}
\displaystyle\leq O(log((α+β)T)mXTlog(m)).\displaystyle O\left(\log\left((\alpha+\beta)T\right)\sqrt{mX^{*}_{T}\log\left(m\right)}\right).

Finally, one can verify that the condition on mXTmX^{*}_{T} implies that n2log(2/δ)n\geq 2\log(2/\delta). Let M=mXTM=mX^{*}_{T}, and assume that log(2Mpmax)1\log(2\sqrt{Mp^{*}_{max}})\geq 1:

n2log(2/δ)\displaystyle n\geq 2\log(2/\delta)\iff M22Mlog(2Mpmax)+2log(2Mpmax)\displaystyle M\geq 2\sqrt{2M\log(2\sqrt{Mp^{*}_{max}})}+2\log(2\sqrt{Mp^{*}_{max}})
\displaystyle\impliedby M(22M+2)log(2Mpmax)\displaystyle M\geq\left(2\sqrt{2M}+2\right)\log(2\sqrt{Mp^{*}_{max}})
\displaystyle\impliedby M22log(4Mpmax)\displaystyle\sqrt{M}\geq 2\sqrt{2}\log(4Mp^{*}_{max})
(Lemma 4.4)\displaystyle\text{(Lemma~{}\ref{lem:priceupperbound})}\impliedby M22log(4Mlog(e+(α+β)T))\displaystyle\sqrt{M}\geq 2\sqrt{2}\log\left(4M\log(e+(\alpha+\beta)T)\right)

If log(2Mpmax)<1\log(2\sqrt{Mp^{*}_{max}})<1, then the first line is implied by M22M+2M\geq 2\sqrt{2M}+2, which is satisfied by M32M\geq 32.
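The following Python sketch simulates the fixed-price scheme used in this proof: under the prices p^{*}_{d}, inter-arrival times are i.i.d. Exp(mX^{*}_{T}/T), so the number of adoptions by time T concentrates around mX^{*}_{T}. All numerical parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha, beta, T = 10_000, 0.2, 0.5, 4.0
e = np.e
disc = (T*(beta - alpha) - e)**2 + 4*alpha*beta*T**2
XT = (T*(beta - alpha) - e + np.sqrt(disc)) / (2*beta*T)   # X*_T from (8)
rate = m * XT / T        # constant arrival rate under p*_d, see (16)-(17)

counts = []
for _ in range(200):
    gaps = rng.exponential(1.0/rate, size=int(m*XT) + 500)
    arrivals = np.cumsum(gaps)
    counts.append(int(np.searchsorted(arrivals, T)))   # adoptions by time T
print(XT, np.mean(counts) / m)   # mean adoption fraction ~ X*_T
```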

The proof of Lemma 2.2 can now be completed using the bounds, proven above, that compare the deterministic optimal revenue with the stochastic optimal revenue.

Proof of Lemma 2.2.

The first part of Lemma 2.2 follows directly from Lemma C.1 because

Pseudo-Regret(T)Regret(T)Vdet(0,T)Vstoch(0,T).\text{Pseudo-Regret}(T)\geq\text{Regret}(T)\iff V^{\text{det}}(0,T)\geq V^{\text{stoch}}(0,T).

Similarly, the second part follows from Lemma C.2. Note that if the condition on mXTmX^{*}_{T} does not hold, then the gap Vdet(0,T)Vstoch(0,T)V^{\text{det}}(0,T)-V^{\text{stoch}}(0,T) can be trivially bounded by O(logTlog2(mlogT))O\left(\log T\log^{2}\left(m\log T\right)\right):

Vdet(0,T)Vstoch(0,T)\displaystyle V^{\text{det}}(0,T)-V^{\text{stoch}}(0,T)\leq mXTpmax0\displaystyle mX^{*}_{T}p^{*}_{max}-0
(Lemma 4.4)\displaystyle\text{(Lemma~{}\ref{lem:priceupperbound})}\leq [8log2(4mlog(e+(α+β)T))+32]O(log((α+β)T))\displaystyle\left[8\log^{2}\left(4m\log(e+(\alpha+\beta)T)\right)+32\right]O(\log((\alpha+\beta)T))
=\displaystyle= O(log((α+β)T)log2(mlog((α+β)T)))\displaystyle O\left(\log((\alpha+\beta)T)\log^{2}\left(m\log((\alpha+\beta)T)\right)\right)

Appendix D Upper Bound Proofs

D.1 Step 1: Bounding the estimation errors (Proof of Lemma 3.1, Lemma 4.1)

We prove the estimation bounds on \hat{\alpha} and \hat{\beta}_{i} separately, in Lemma D.1 and Lemma D.2 respectively. Lemma 3.1 follows directly from these two results.

Lemma D.1.

Assuming that m1/316(α+β)α8log(2δ)m^{1/3}\geq\frac{16(\alpha+\beta)}{\alpha}\sqrt{8\log(\frac{2}{\delta})}, then with probability 1δ1-\delta, |αα^|A|\alpha-\hat{\alpha}|\leq A.

Proof.

From (10) we have the following expression for the estimation error of \hat{\alpha}:

|αα^|=γep0|1γm𝔼[τ1]1τγm||\alpha-\hat{\alpha}|=\gamma e^{p_{0}}|\frac{1}{\gamma m\mathbb{E}[\tau_{1}]}-\frac{1}{\tau_{\gamma m}}|

Recall that \tau_{d} denotes the (stochastic) time of arrival of the d^{th} customer in the stochastic Bass model under the pricing decisions made by the algorithm. Note that in our algorithm prices do not change between customer arrivals. Therefore, the inter-arrival time I_{d}=\tau_{d}-\tau_{d-1} between customers d-1 and d follows an exponential distribution. Specifically, since the price was fixed at p_{0} for the first \gamma m customers, we have I_{d}\sim Exp(\lambda(p_{0},\frac{d-1}{m})) for d\in\{1,\ldots,\gamma m\}, where \lambda(p,x)=me^{-p}(\alpha+\beta x)(1-x) and \lambda(p_{0},\frac{d-1}{m})\geq\underline{\lambda}\coloneqq e^{-p_{0}}\alpha(1-\gamma)m for d\in\{1,\ldots,\gamma m\}.

Therefore, 𝔼[τ1]=𝔼[I1]=1ep0αm\mathbb{E}[\tau_{1}]=\mathbb{E}[I_{1}]=\frac{1}{e^{-p_{0}}\alpha m}, and 𝔼[τγm]=𝔼[d=1γmId]\mathbb{E}[\tau_{\gamma m}]=\mathbb{E}[\sum_{d=1}^{\gamma m}I_{d}]. Using Lemma A.1 we have:

(|τγmd=1γm𝔼[Id]|ϵ)\displaystyle\mathbb{P}\left(\left|\tau_{\gamma m}-\sum\limits_{d=1}^{\gamma m}\mathbb{E}[I_{d}]\right|\geq\epsilon\right)\leq 2exp(ϵ2λ¯28γm),\displaystyle 2\exp(-\frac{\epsilon^{2}\underline{\lambda}^{2}}{8\gamma m}),
\displaystyle\text{so that }\left|\tau_{\gamma m}-\sum\limits_{d=1}^{\gamma m}\frac{1}{\lambda(p_{0},\frac{d-1}{m})}\right|\leq\frac{e^{p_{0}}}{\alpha(1-\gamma)}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}\quad\text{with probability }1-\delta,

where we set ϵ=8log(2δ)γmλ¯2\epsilon=\sqrt{\frac{8\log(\frac{2}{\delta})\gamma m}{\underline{\lambda}^{2}}}. Since m1/32log(2δ)ϵ2γmλ¯m^{1/3}\geq\sqrt{2\log(\frac{2}{\delta})}\implies\epsilon\leq\frac{2\gamma m}{\underline{\lambda}}, the condition for Lemma A.1 is satisfied. To bound the estimation error of α\alpha:

\left|\tau_{\gamma m}-\gamma m\mathbb{E}[\tau_{1}]\right|\leq\left|\tau_{\gamma m}-\sum\limits_{d=1}^{\gamma m}\frac{1}{\lambda(p_{0},\frac{d-1}{m})}\right|+\left|\sum\limits_{d=1}^{\gamma m}\frac{1}{\lambda(p_{0},\frac{d-1}{m})}-\gamma m\mathbb{E}[\tau_{1}]\right|
(with probability 1δ1-\delta) ep0α(1γ)8log(2δ)γm+|d=1γm(1λ(p0,d1m)1ep0αm)|\displaystyle\leq\frac{e^{p_{0}}}{\alpha(1-\gamma)}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}+\left|\sum\limits_{d=1}^{\gamma m}(\frac{1}{\lambda(p_{0},\frac{d-1}{m})}-\frac{1}{e^{-p_{0}}\alpha m})\right|
ep0α(1γ)8log(2δ)γm+|d=1γm|ep0αmep0(α+βd1m)(md+1)|(ep0α(1γ)m)2|\displaystyle\leq\frac{e^{p_{0}}}{\alpha(1-\gamma)}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}+\left|\sum\limits_{d=1}^{\gamma m}\frac{|e^{-p_{0}}\alpha m-e^{-p_{0}}(\alpha+\beta\frac{d-1}{m})(m-d+1)|}{(e^{-p_{0}}\alpha(1-\gamma)m)^{2}}\right|
ep0α(1γ)8log(2δ)γm+|d=1γmmax(α,β)dep0α2(1γ)2m2|\displaystyle\leq\frac{e^{p_{0}}}{\alpha(1-\gamma)}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}+\left|\sum\limits_{d=1}^{\gamma m}\frac{\max(\alpha,\beta)d}{e^{-p_{0}}\alpha^{2}(1-\gamma)^{2}m^{2}}\right|
ep0α(1γ)8log(2δ)γm+ep0max(α,β)γ2α2(1γ)2\displaystyle\leq\frac{e^{p_{0}}}{\alpha(1-\gamma)}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}+\frac{e^{p_{0}}\max(\alpha,\beta)\gamma^{2}}{\alpha^{2}(1-\gamma)^{2}}
2ep0(α+β)α2(1γ)28log(2δ)m2/3\displaystyle\leq\frac{2e^{p_{0}}(\alpha+\beta)}{\alpha^{2}(1-\gamma)^{2}}\sqrt{8\log(\frac{2}{\delta})}m^{-2/3}

Denote the above bound by \mathcal{B}_{\alpha}. Plugging this and p_{0}=0 into the |\hat{\alpha}-\alpha| expression above, we have with probability 1-\delta:

|α^α|\displaystyle|\hat{\alpha}-\alpha| γα(γm𝔼[τ1]α)2\displaystyle\leq\gamma\frac{\mathcal{B}_{\alpha}}{\left(\gamma m\mathbb{E}[\tau_{1}]-\mathcal{B}_{\alpha}\right)^{2}}
γ4αγ2m2𝔼[τ1]2\displaystyle\leq\gamma\frac{4\mathcal{B}_{\alpha}}{\gamma^{2}m^{2}\mathbb{E}[\tau_{1}]^{2}}
=8(α+β)(1γ)28log(2δ)m1/3\displaystyle=\frac{8(\alpha+\beta)}{(1-\gamma)^{2}}\sqrt{8\log(\frac{2}{\delta})}m^{-1/3}

where in the second inequality we used the fact that m1/316(α+β)α8log(2δ)α12γm𝔼[τ1]m^{1/3}\geq\frac{16(\alpha+\beta)}{\alpha}\sqrt{8\log(\frac{2}{\delta})}\implies\mathcal{B}_{\alpha}\leq\frac{1}{2}\gamma m\mathbb{E}[\tau_{1}].
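As an illustration of this estimation step, the sketch below simulates the exploration phase and computes \hat{\alpha}=\gamma e^{p_{0}}/\tau_{\gamma m}, the estimator implied by the error expression above (cf. (10)); we assume \gamma=m^{-1/3}, consistent with the m^{-2/3} rates in the display, and all other values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
m, alpha, beta, p0 = 1_000_000, 0.2, 0.5, 0.0
gamma = m ** (-1/3)                  # assumed exploration fraction
n_explore = int(gamma * m)           # first gamma*m customers at price p0

tau = 0.0
for d in range(1, n_explore + 1):
    x = (d - 1) / m
    lam = m * np.exp(-p0) * (alpha + beta*x) * (1 - x)   # rate lambda(p0, x)
    tau += rng.exponential(1.0 / lam)                    # inter-arrival I_d

alpha_hat = gamma * np.exp(p0) / tau
print(alpha, alpha_hat)              # |alpha - alpha_hat| should be O(m^{-1/3})
```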

Lemma D.2.

Assuming that m1/364(α+β)2α28log(2δ)m^{1/3}\geq 64\frac{(\alpha+\beta)^{2}}{\alpha^{2}}\sqrt{8\log(\frac{2}{\delta})}, then for any i=1,,Ki=1,\ldots,K, with probability 1δ1-\delta, |ββ^i|Bi|\beta-\hat{\beta}_{i}|\leq B_{i}.

Proof.

Note that 𝔼[Iγim+1]=𝔼[τγim+1τγim]=1λ(p0,γi)=1ep0(α+βγi)(1γi)m\mathbb{E}[I_{\gamma_{i}m+1}]=\mathbb{E}[\tau_{\gamma_{i}m+1}-\tau_{\gamma_{i}m}]=\frac{1}{\lambda(p_{0},\gamma_{i})}=\frac{1}{e^{-p_{0}}(\alpha+\beta\gamma_{i})(1-\gamma_{i})m}. From (11) we can bound the estimation error of β^\hat{\beta} as follows. We have

α^+β^iγi\displaystyle\hat{\alpha}+\hat{\beta}_{i}\gamma_{i} =γmep0(1γi)m(τ(γi+γ)mτγim),\displaystyle=\frac{\gamma m}{e^{-p_{0}}(1-\gamma_{i})m(\tau_{(\gamma_{i}+\gamma)m}-\tau_{\gamma_{i}m})},
α+βγi\displaystyle\alpha+\beta\gamma_{i} =γmep0(1γi)m𝔼[τγim+1τγim]γm.\displaystyle=\frac{\gamma m}{e^{-p_{0}}(1-\gamma_{i})m\mathbb{E}[\tau_{\gamma_{i}m+1}-\tau_{\gamma_{i}m}]\gamma m}.

Therefore,

|\beta-\hat{\beta}_{i}|\leq\frac{|\hat{\alpha}-\alpha|}{\gamma_{i}}+\frac{\gamma m}{\gamma_{i}e^{-p_{0}}(1-\gamma_{i})m}\left|\frac{1}{\tau_{(\gamma_{i}+\gamma)m}-\tau_{\gamma_{i}m}}-\frac{1}{\gamma m\mathbb{E}[\tau_{\gamma_{i}m+1}-\tau_{\gamma_{i}m}]}\right|.

Similar to the estimation bound of α^\hat{\alpha}, the main step is to bound the arrival times. Note that λ(p0,d1m)ep0(α+βγi)(1γiγ)mλ¯ep0(α+βγi)(1/3γ)m\lambda(p_{0},\frac{d-1}{m})\geq e^{-p_{0}}(\alpha+\beta\gamma_{i})(1-\gamma_{i}-\gamma)m\geq\underline{\lambda}\coloneqq e^{-p_{0}}(\alpha+\beta\gamma_{i})(1/3-\gamma)m for d{γim+1,,(γi+γ)m}d\in\{\gamma_{i}m+1,\ldots,(\gamma_{i}+\gamma)m\}, where we used the fact that by the construction of Algorithm 1, γi2/3\gamma_{i}\leq 2/3 for all i=1,,Ki=1,\ldots,K. Using Lemma A.1 we have:

\displaystyle\mathbb{P}\left(\left|\sum\limits_{d=\gamma_{i}m+1}^{(\gamma_{i}+\gamma)m}\left(I_{d}-\mathbb{E}[I_{d}]\right)\right|\geq\epsilon\right)\leq 2\exp\left(-\frac{\epsilon^{2}\underline{\lambda}^{2}}{8\gamma m}\right)
\displaystyle\text{so that }\left|\sum\limits_{d=\gamma_{i}m+1}^{(\gamma_{i}+\gamma)m}\left(I_{d}-\frac{1}{\lambda(p_{0},\frac{d-1}{m})}\right)\right|\leq\frac{e^{p_{0}}}{(\alpha+\beta\gamma_{i})(1/3-\gamma)}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}\quad\text{with probability }1-\delta

where we set ϵ=8log(2δ)γmλ¯2\epsilon=\sqrt{\frac{8\log(\frac{2}{\delta})\gamma m}{\underline{\lambda}^{2}}}. Since m1/32log(2δ)ϵ2γmλ¯m^{1/3}\geq\sqrt{2\log(\frac{2}{\delta})}\implies\epsilon\leq\frac{2\gamma m}{\underline{\lambda}}, the condition for Lemma A.1 is satisfied. To bound the estimation error of β\beta:

|(τ(γi+γ)mτγim)γm𝔼[τγim+1τγim]|=|d=γim+1(γi+γ)mIdγmλ(p0,γi)|\displaystyle\left|\left(\tau_{(\gamma_{i}+\gamma)m}-\tau_{\gamma_{i}m}\right)-\gamma m\mathbb{E}[\tau_{\gamma_{i}m+1}-\tau_{\gamma_{i}m}]\right|=\left|\sum\limits_{d=\gamma_{i}m+1}^{(\gamma_{i}+\gamma)m}I_{d}-\frac{\gamma m}{\lambda(p_{0},\gamma_{i})}\right|
\displaystyle\leq |d=γim+1(γi+γ)m(Id1λ(p0,d1m))|+|d=γim+1(γi+γ)m1λ(p0,d1m)γmλ(p0,γi)|\displaystyle\left|\sum\limits_{d=\gamma_{i}m+1}^{(\gamma_{i}+\gamma)m}\left(I_{d}-\frac{1}{\lambda(p_{0},\frac{d-1}{m})}\right)\right|+\left|\sum\limits_{d=\gamma_{i}m+1}^{(\gamma_{i}+\gamma)m}\frac{1}{\lambda(p_{0},\frac{d-1}{m})}-\frac{\gamma m}{\lambda(p_{0},\gamma_{i})}\right|
\text{(w.p. }1-\delta\text{)}\leq\frac{e^{p_{0}}}{(1/3-\gamma)(\alpha+\beta\gamma_{i})}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}+\left|\sum\limits_{d=\gamma_{i}m+1}^{\gamma_{i}m+\gamma m}\left(\frac{1}{\lambda(p_{0},\frac{d-1}{m})}-\frac{1}{\lambda(p_{0},\gamma_{i})}\right)\right|
\displaystyle\leq ep0(1/3γ)(α+βγi)8log(2δ)γm\displaystyle\frac{e^{p_{0}}}{(1/3-\gamma)(\alpha+\beta\gamma_{i})}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}
+|d=γim+1γim+γm|ep0(α+βd1m)(md+1)ep0(α+βγi)(1γi)m|(ep0(α+βγi)(1γiγ)m)2|\displaystyle+\left|\sum\limits_{d=\gamma_{i}m+1}^{\gamma_{i}m+\gamma m}\frac{|e^{-p_{0}}(\alpha+\beta\frac{d-1}{m})(m-d+1)-e^{-p_{0}}(\alpha+\beta\gamma_{i})(1-\gamma_{i})m|}{(e^{-p_{0}}(\alpha+\beta\gamma_{i})(1-\gamma_{i}-\gamma)m)^{2}}\right|
\displaystyle\leq ep0(1/3γ)(α+βγi)8log(2δ)γm+|d=γim+1γim+γm((α+β)γep0(α+βγi)2(1γiγ)2m)|\displaystyle\frac{e^{p_{0}}}{(1/3-\gamma)(\alpha+\beta\gamma_{i})}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}+\left|\sum\limits_{d=\gamma_{i}m+1}^{\gamma_{i}m+\gamma m}(\frac{(\alpha+\beta)\gamma}{e^{-p_{0}}(\alpha+\beta\gamma_{i})^{2}(1-\gamma_{i}-\gamma)^{2}m})\right|
\displaystyle\leq ep0(1/3γ)(α+βγi)8log(2δ)γm+ep0(α+β)γ2(α+βγi)2(1/3γ)2\displaystyle\frac{e^{p_{0}}}{(1/3-\gamma)(\alpha+\beta\gamma_{i})}\sqrt{8\log(\frac{2}{\delta})\frac{\gamma}{m}}+\frac{e^{p_{0}}(\alpha+\beta)\gamma^{2}}{(\alpha+\beta\gamma_{i})^{2}(1/3-\gamma)^{2}}
\displaystyle\leq 2ep0(α+β)(α+βγi)2(1/3γ)28log(2δ)m2/3\displaystyle\frac{2e^{p_{0}}(\alpha+\beta)}{(\alpha+\beta\gamma_{i})^{2}(1/3-\gamma)^{2}}\sqrt{8\log(\frac{2}{\delta})}m^{-2/3}

Let β\mathcal{B}_{\beta} denote this bound. Plug this result back into the bound on |ββ^i||\beta-\hat{\beta}_{i}|:

\displaystyle|\beta-\hat{\beta}_{i}|\leq\frac{|\hat{\alpha}-\alpha|}{\gamma_{i}}+\frac{\gamma m}{\gamma_{i}e^{-p_{0}}(1-\gamma_{i})m}\left|\frac{1}{\tau_{(\gamma_{i}+\gamma)m}-\tau_{\gamma_{i}m}}-\frac{1}{\gamma m\mathbb{E}[\tau_{\gamma_{i}m+1}-\tau_{\gamma_{i}m}]}\right|
|α^α|γi+ep0γγi(1γi)(β(γm𝔼[Iγim+1]β)2)\displaystyle\leq\frac{|\hat{\alpha}-\alpha|}{\gamma_{i}}+\frac{e^{p_{0}}\gamma}{\gamma_{i}(1-\gamma_{i})}\left(\frac{\mathcal{B}_{\beta}}{(\gamma m\mathbb{E}[I_{\gamma_{i}m+1}]-\mathcal{B}_{\beta})^{2}}\right)
(*) |α^α|γi+ep0γγi(1γi)(4βγ2m2𝔼[Iγim+1]2)\displaystyle\leq\frac{|\hat{\alpha}-\alpha|}{\gamma_{i}}+\frac{e^{p_{0}}\gamma}{\gamma_{i}(1-\gamma_{i})}\left(\frac{4\mathcal{B}_{\beta}}{\gamma^{2}m^{2}\mathbb{E}[I_{\gamma_{i}m+1}]^{2}}\right)
|α^α|γi+8(α+β)(1γi)γi(1/3γ)28log(2δ)m1/3\displaystyle\leq\frac{|\hat{\alpha}-\alpha|}{\gamma_{i}}+\frac{8(\alpha+\beta)(1-\gamma_{i})}{\gamma_{i}(1/3-\gamma)^{2}}\sqrt{8\log(\frac{2}{\delta})}m^{-1/3}
16(α+β)(1γi)γi(1/3γ)28log(2δ)m1/3\displaystyle\leq\frac{16(\alpha+\beta)(1-\gamma_{i})}{\gamma_{i}(1/3-\gamma)^{2}}\sqrt{8\log(\frac{2}{\delta})}m^{-1/3}

where for the (*) step we used the fact that m1/364(α+β)2α28log(2δ)β12γm𝔼[Iγim+1]m^{1/3}\geq 64\frac{(\alpha+\beta)^{2}}{\alpha^{2}}\sqrt{8\log(\frac{2}{\delta})}\implies\mathcal{B}_{\beta}\leq\frac{1}{2}\gamma m\mathbb{E}[I_{\gamma_{i}m+1}].

See 4.1

Proof.

Clearly, α^Aα\hat{\alpha}-A\leq\alpha, β^kBkβ\hat{\beta}_{k}-B_{k}\leq\beta, and β^k+Bkβ\hat{\beta}_{k}+B_{k}\geq\beta. Then using Lemma B.2 we have

|p(x,α^,β^k)p(x,α,β)|\displaystyle|p^{*}(x,\hat{\alpha},\hat{\beta}_{k})-p^{*}(x,\alpha,\beta)|\leq maxα[α,α^] or α[α^,α]{|p(x,α,β)α|α=α}|αα^|\displaystyle\max\limits_{\alpha^{\prime}\in[\alpha,\hat{\alpha}]\text{ or }\alpha^{\prime}\in[\hat{\alpha},\alpha]}\left\{\left|\frac{\partial p^{*}(x,\alpha,\beta)}{\partial\alpha}\right|_{\alpha=\alpha^{\prime}}\right\}|\alpha-\hat{\alpha}|
+maxβ[β,β^] or β[β^,β]{|p(x,α,β)β|β=β}|ββ^k|\displaystyle+\max\limits_{\beta^{\prime}\in[\beta,\hat{\beta}]\text{ or }\beta^{\prime}\in[\hat{\beta},\beta]}\left\{\left|\frac{\partial p^{*}(x,\alpha,\beta)}{\partial\beta}\right|_{\beta=\beta^{\prime}}\right\}|\beta-\hat{\beta}_{k}|
\displaystyle\leq LαA+LβiBk\displaystyle L_{\alpha}A+L_{\beta i}B_{k}

D.2 Step 2: Proof of Lemma 4.2

Since the prices that we offer in the stochastic Bass model are based on a discretized version of the continuous price curve in the deterministic Bass model, we first need to show that this discretization does not introduce much error. Lemma D.3 below shows that it introduces only a logarithmic (in m) amount of error, for any fixed \alpha,\beta,T.

Lemma D.3.

For any fixed T,α,βT,\alpha,\beta and n=1,,mXTn=1,\ldots,mX^{*}_{T},

|d=1np(d1m,α,β)m0n/mp(x,α,β)𝑑x|n2mβα+12log(m)+2pmax\left|\sum\limits_{d=1}^{n}p^{*}\left(\frac{d-1}{m},\alpha,\beta\right)-m\int_{0}^{n/m}p^{*}(x,\alpha,\beta)dx\right|\leq\frac{n}{2m}\frac{\beta}{\alpha}+\frac{1}{2}\log\left(m\right)+2p^{*}_{max}

where pmaxp^{*}_{max} denotes an upper bound on the optimal prices p(x,α,β)p^{*}(x,\alpha,\beta) for all xx.

Proof.

Using (6),

|p(x,α,β)x|βα+11x.\left|\frac{\partial p^{*}(x,\alpha,\beta)}{\partial x}\right|\leq\frac{\beta}{\alpha}+\frac{1}{1-x}.

Below, we abbreviate p^{*}(x,\alpha,\beta) as p^{*}_{x}.

|d=1np(d1m,α,β)m0n/mp(x,α,β)𝑑x|\displaystyle\left|\sum\limits_{d=1}^{n}p^{*}\left(\frac{d-1}{m},\alpha,\beta\right)-m\int_{0}^{n/m}p^{*}(x,\alpha,\beta)dx\right|
\displaystyle\leq d=1n2|pd1mmd1md/mpx𝑑x|+2pmax\displaystyle\sum\limits_{d=1}^{n-2}\left|p^{*}_{\frac{d-1}{m}}-m\int_{\frac{d-1}{m}}^{d/m}p^{*}_{x}dx\right|+2p^{*}_{max}
\displaystyle\leq d=1n2[md1md/m(βα+11d/m)(xd1m)𝑑x]+2pmax\displaystyle\sum\limits_{d=1}^{n-2}\left[m\int_{\frac{d-1}{m}}^{d/m}\left(\frac{\beta}{\alpha}+\frac{1}{1-d/m}\right)(x-\frac{d-1}{m})dx\right]+2p^{*}_{max}
=\displaystyle= n2mβα+d=1n212(1d/m)m+2pmax\displaystyle\frac{n}{2m}\frac{\beta}{\alpha}+\sum\limits_{d=1}^{n-2}\frac{1}{2(1-d/m)m}+2p^{*}_{max}
\displaystyle\leq n2mβα+12m1n11(1g/m)𝑑g+2pmax\displaystyle\frac{n}{2m}\frac{\beta}{\alpha}+\frac{1}{2m}\int_{1}^{n-1}\frac{1}{(1-g/m)}dg+2p^{*}_{max}
\displaystyle\leq n2mβα+12log(11n1m)+2pmax\displaystyle\frac{n}{2m}\frac{\beta}{\alpha}+\frac{1}{2}\log\left(\frac{1}{1-\frac{n-1}{m}}\right)+2p^{*}_{max}
\displaystyle\leq n2mβα+12log(m)+2pmax\displaystyle\frac{n}{2m}\frac{\beta}{\alpha}+\frac{1}{2}\log\left(m\right)+2p^{*}_{max}

The first inequality follows because we know p(x,α,β)p^{*}(x,\alpha,\beta) is bounded below and above by 0 and pmaxp^{*}_{max}. Therefore the difference between pp^{*} evaluated at two different xx values is at most pmaxp^{*}_{max}. ∎
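A numerical check of the discretization bound in Lemma D.3 (a sketch; the parameter values are illustrative, and the integral is approximated by the trapezoidal rule):

```python
import numpy as np

alpha, beta, T, m = 0.2, 0.5, 4.0, 10_000
e = np.e
disc = (T*(beta - alpha) - e)**2 + 4*alpha*beta*T**2
XT = (T*(beta - alpha) - e + np.sqrt(disc)) / (2*beta*T)   # X*_T from (8)

def p_star(x):
    """Optimal price curve (6), with final adoption level X*_T."""
    return 1 + np.log((alpha + beta*x)*(1 - x) / ((alpha + beta*XT)*(1 - XT)))

n = int(m * XT)
lhs = p_star((np.arange(1, n + 1) - 1) / m).sum()            # sum side
xs = np.linspace(0.0, n/m, 100_001)
ys = p_star(xs)
integral = m * ((ys[:-1] + ys[1:]) / 2 * np.diff(xs)).sum()  # integral side
bound = n/(2*m)*beta/alpha + 0.5*np.log(m) + 2*ys.max()
print(abs(lhs - integral) <= bound)   # expect True
```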

Corollary D.1.

For any fixed T,α,βT,\alpha,\beta,

|d=1mXTp(d1m,α,β)m0XTp(x,α,β)𝑑x|β2α+12log(m)+3pmax\left|\sum\limits_{d=1}^{\lfloor mX^{*}_{T}\rfloor}p^{*}\left(\frac{d-1}{m},\alpha,\beta\right)-m\int_{0}^{X^{*}_{T}}p^{*}(x,\alpha,\beta)dx\right|\leq\frac{\beta}{2\alpha}+\frac{1}{2}\log\left(m\right)+3p^{*}_{max}
Proof.

Apply Lemma D.3 with n=\lfloor mX^{*}_{T}\rfloor. Since rounding mX^{*}_{T} introduces at most an additional p^{*}_{max} difference in revenue, the result immediately follows from Lemma D.3. ∎

For the lemma below, we define

Videt(T)mγi2γiXTp(x,α,β)𝑑xV^{\text{det}}_{i}(T)\coloneqq m\int_{\gamma_{i}}^{2\gamma_{i}\wedge X^{*}_{T}}p^{*}(x,\alpha,\beta)\,dx
Revid=γim+12γimXTmpd\text{Rev}_{i}\coloneqq\sum_{d=\lfloor\gamma_{i}m\rfloor+1}^{\lfloor 2\gamma_{i}m\rfloor\wedge\lfloor X^{*}_{T}m\rfloor}p_{d}

See 4.2

Proof.

Let A,B_{i} be the bounds on the estimation error stated in Lemma D.1 and Lemma D.2. Recall that the first \gamma m customers in every epoch are offered an exploration price p_{0}=0. Let p_{d} be the price paid by customer d as specified in Algorithm 1 and (13). Recall also that p^{*}(x,\alpha,\beta) is the optimal price curve as specified in (6). We use the shorthand notation p^{*}_{x}\coloneqq p^{*}(x,\alpha,\beta), p^{*}_{d}\coloneqq p^{*}(\frac{d-1}{m},\alpha,\beta), p^{*}_{max}\coloneqq\max\limits_{x\in[0,1)}p^{*}_{x} in the following calculations.

VidetRevi\displaystyle V^{\text{det}}_{i}-\text{Rev}_{i} =mγi2γiXTpx𝑑xd=γim+12γimXTmpd\displaystyle=m\int_{\gamma_{i}}^{2\gamma_{i}\wedge X^{*}_{T}}p^{*}_{x}\,dx-\sum\limits_{d=\gamma_{i}m+1}^{2\gamma_{i}m\wedge\lfloor X^{*}_{T}m\rfloor}p_{d}
\displaystyle\leq\sum\limits_{d=\gamma_{i}m+1}^{2\gamma_{i}m\wedge\lfloor X^{*}_{T}m\rfloor}\left[p^{*}_{d}-p_{d}\right]+\left|\sum\limits_{d=\gamma_{i}m+1}^{2\gamma_{i}m\wedge\lfloor X^{*}_{T}m\rfloor}p^{*}_{d}-m\int_{\gamma_{i}}^{2\gamma_{i}\wedge X^{*}_{T}}p^{*}_{x}dx\right|
(Corollary D.1) d=γim+12γimXTm[pdpd]+β2α+12log(m)+3pmax\displaystyle\leq\sum\limits_{d=\gamma_{i}m+1}^{2\gamma_{i}m\wedge\lfloor X^{*}_{T}m\rfloor}\left[p^{*}_{d}-p_{d}\right]+\frac{\beta}{2\alpha}+\frac{1}{2}\log\left(m\right)+3p^{*}_{max}
(γm+3)pmax+d=(γi+γ)m+12γimXTm[pdpd]+β2α+12log(m)\displaystyle\leq(\gamma m+3)p^{*}_{max}+\sum\limits_{d=(\gamma_{i}+\gamma)m+1}^{2\gamma_{i}m\wedge\lfloor X^{*}_{T}m\rfloor}\left[p^{*}_{d}-p_{d}\right]+\frac{\beta}{2\alpha}+\frac{1}{2}\log\left(m\right)
(Lemma 4.1, Lemma D.1, Lemma D.2)\displaystyle\leq(\gamma m+3)p^{*}_{max}+2\gamma_{i}m(L_{\alpha}A+L_{\beta i}B_{i})+\frac{\beta}{2\alpha}+\frac{1}{2}\log\left(m\right)\quad\text{w.p. }1-2\delta
(Lemma 4.4) O(m2/3log((α+β)T))+2γim(LαA+LβiBi)\displaystyle\leq O\left(m^{2/3}\log\left((\alpha+\beta)T\right)\right)+2\gamma_{i}m(L_{\alpha}A+L_{\beta i}B_{i})

where Lα,LβiL_{\alpha},L_{\beta i} are defined in (13).

The second-to-last step follows because we know that |\underline{p^{*}_{i}}(\frac{d-1}{m})-p^{*}_{d}|\leq 2(L_{\alpha}A+L_{\beta i}B_{i}) using the definition of \underline{p^{*}_{i}} in (13) and Lemma 4.1, and because from Lemma D.1 and Lemma D.2 we know that the error bounds A, B_{i} hold with probability 1-2\delta.

Assuming that m^{1/3}\geq\frac{64\phi}{\alpha}\sqrt{8\log(\frac{2}{\delta})} and \gamma_{i}m^{1/3}\geq\frac{640\phi}{\beta}\sqrt{8\log(\frac{2}{\delta})}, one can check that this implies A\leq\frac{\alpha}{4} and B_{i}\leq\frac{\beta}{4}, which in turn implies that \hat{\alpha}-A\geq\frac{\alpha}{2} and \hat{\beta}-B_{i}\geq\frac{\beta}{2}. Applying these bounds to (13), we have:

Lα4α+6βα2Lβi6(1α+1β)L_{\alpha}\leq\frac{4}{\alpha}+\frac{6\beta}{\alpha^{2}}\qquad L_{\beta i}\leq 6\left(\frac{1}{\alpha}+\frac{1}{\beta}\right)

Plugging in the expressions for A and B_{i}, as well as the above bounds on L_{\alpha},L_{\beta i}, we have with probability 1-\delta,

Videt(T)ReviO(m2/3log(Tδ)(1α+1β+βα2)ϕ)V^{\text{det}}_{i}(T)-\text{Rev}_{i}\leq O\left(m^{2/3}\log\left(\frac{T}{\delta}\right)\left(\frac{1}{\alpha}+\frac{1}{\beta}+\frac{\beta}{\alpha^{2}}\right)\phi\right)

When m1/364ϕα8log(2δ)m^{1/3}\leq\frac{64\phi}{\alpha}\sqrt{8\log(\frac{2}{\delta})}, or γim1/3640ϕβ8log(2δ)\gamma_{i}m^{1/3}\leq\frac{640\phi}{\beta}\sqrt{8\log(\frac{2}{\delta})}, the regret can be trivially bounded by pmaxγimO(log((α+β)T)(ϕαlog(1δ))3+log((α+β)T)ϕβlog(1δ)m2/3)p^{*}_{max}\gamma_{i}m\leq O\left(\log((\alpha+\beta)T)\left(\frac{\phi}{\alpha}\sqrt{\log(\frac{1}{\delta})}\right)^{3}+\log((\alpha+\beta)T)\frac{\phi}{\beta}\sqrt{\log(\frac{1}{\delta})}m^{2/3}\right), where we used Lemma 4.4 to bound pmaxp^{*}_{max}. Since the first component is a constant with respect to mm and the second is no larger than the expression from before, we are done.

D.3 Step 3: Proof of Lemma 4.3 and Lemma 4.4

See 4.3

Proof.

Let p_{d} be the price that the algorithm offers to customer d. Since Algorithm 1 offers either p_{0}=0 or the lower confidence bound price defined in (13), we know from Corollary 4.1, together with Lemma D.1 and Lemma D.2, that with probability 1-\delta K, p_{d}\leq p^{*}(\frac{d-1}{m},\alpha,\beta) for all d\leq d_{T}\wedge mX^{*}_{T}, where K is the total number of epochs. This means that \lambda(p_{d},\frac{d-1}{m})\geq\lambda(p^{*}(\frac{d-1}{m},\alpha,\beta),\frac{d-1}{m})=\frac{1}{e}(\alpha+\beta X^{*}_{T})(1-X^{*}_{T})m. Let \underline{\lambda}\coloneqq\frac{1}{e}(\alpha+\beta X^{*}_{T})(1-X^{*}_{T})m, which by (7) is equal to \frac{X^{*}_{T}m}{T}.

Set n=mXT8mXTlog(2δ),λ¯=mXTT,ϵ=8nλ¯2log(2δ)n=\lfloor mX^{*}_{T}-\sqrt{8mX^{*}_{T}\log(\frac{2}{\delta})}\rfloor,\underline{\lambda}=\frac{mX^{*}_{T}}{T},\epsilon=\sqrt{\frac{8n}{\underline{\lambda}^{2}}log(\frac{2}{\delta})}, then by Lemma A.1, we have with probability 1δ1-\delta:

\left|\tau_{n}-\left(1-\sqrt{\frac{8\log(\frac{2}{\delta})}{mX^{*}_{T}}}\right)T\right|\leq T\sqrt{\frac{8}{X^{*}_{T}m}\log\left(\frac{2}{\delta}\right)}

This means that with probability $1-\delta(K+1)$, $\tau_{n}\leq T$, so the total number of adoptions observed in the stochastic Bass model is at least $n$. The result follows by observing that there can be at most $\log(m)$ epochs. ∎

See Lemma 4.4.

Proof.

Here we use the expanded notation of XT(x,α,β)X^{*}_{T}(x,\alpha,\beta) introduced in Appendix B. Using (6), we have for any xXT(0,α,β)x\leq X^{*}_{T}(0,\alpha,\beta):

p(x,α,β)\displaystyle p^{*}(x,\alpha,\beta)\leq 1+log(1x1XT(0,α,β))\displaystyle 1+\log\left(\frac{1-x}{1-X^{*}_{T}(0,\alpha,\beta)}\right)
\displaystyle\leq 1+log(11XT(0,α,β))\displaystyle 1+\log\left(\frac{1}{1-X^{*}_{T}(0,\alpha,\beta)}\right)
\displaystyle\leq log(e+(α+β)T)\displaystyle\log\left(e+(\alpha+\beta)T\right)

The last step follows from the following upper bound on XTX^{*}_{T}:

XT(0,α,β)XT(0,α+β,0)=(α+β)T(α+β)T+eX^{*}_{T}(0,\alpha,\beta)\leq X^{*}_{T}(0,\alpha+\beta,0)=\frac{(\alpha+\beta)T}{(\alpha+\beta)T+e}

The first inequality is easy to see from (7): replacing $\alpha+\beta X^{*}_{T}$ with $\alpha+\beta$ increases $X^{*}_{T}/(1-X^{*}_{T})$, and since this quantity is strictly monotone in $X^{*}_{T}$, $X^{*}_{T}$ must be larger. The last equality follows from solving (7) after replacing $\alpha+\beta X^{*}_{T}$ with $\alpha+\beta$, as spelled out below. ∎
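For completeness, the last equality can be verified directly. By (7) (see the proof of Lemma 4.3), $X^{*}_{T}$ solves $\frac{1}{e}(\alpha+\beta X^{*}_{T})(1-X^{*}_{T})=\frac{X^{*}_{T}}{T}$; replacing $\alpha+\beta X^{*}_{T}$ with $\alpha+\beta$ gives

\frac{1}{e}(\alpha+\beta)(1-X)=\frac{X}{T}\iff(\alpha+\beta)T=\left((\alpha+\beta)T+e\right)X\iff X=\frac{(\alpha+\beta)T}{(\alpha+\beta)T+e}.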

D.4 Step 4: Putting it all together for proof of Theorem 4.1

Proof of Theorem 4.1.
Pseudo-Regret $\displaystyle=\sum_{i=1}^{K}\left[V^{\text{det}}_{i}-\text{Rev}_{i}\right]+\sum\limits_{d=d_{T}+1}^{\lfloor mX^{*}_{T}\rfloor}p^{*}_{d}$
log(m)O(m2/3log(Tδ)(1α+1β+βα2)ϕ)w.p. 1(K+1)δ\displaystyle\leq\log(m)O\left(m^{2/3}\log\left(\frac{T}{\delta}\right)\left(\frac{1}{\alpha}+\frac{1}{\beta}+\frac{\beta}{\alpha^{2}}\right)\phi\right)\quad\text{w.p. $1-(K+1)\delta$}
+O(log((α+β)T)m)\displaystyle\quad+O\left(\log\left((\alpha+\beta)T\right)\sqrt{m}\right)
=O(m2/3log(m)log(Tδ)(1α+1β+βα2)ϕ)\displaystyle=O\left(m^{2/3}\log(m)\log(\frac{T}{\delta})\left(\frac{1}{\alpha}+\frac{1}{\beta}+\frac{\beta}{\alpha^{2}}\right)\phi\right)

where the per-epoch regrets are bounded using Lemma 4.2, and the “uncaptured revenue” term is bounded using Lemma 4.3 and Lemma 4.4.

The inequality holds with probability $1-(K+1)\delta$, where $K$ is the total number of epochs; $K$ is bounded by $\log(m)$ since the epoch lengths are defined with respect to the number of customers and increase geometrically. ∎
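To get a feel for the resulting bound, the following snippet (illustrative only; constants are suppressed, and the values of $\alpha,\beta,\phi,\delta,T$ below are placeholders) evaluates the dominant $m^{2/3}\log(m)$ growth of the Theorem 4.1 bound for a few market sizes:

```python
import math

# Illustrative evaluation of the Theorem 4.1 bound's m^{2/3} log(m) growth
# (constants suppressed; all parameter values below are placeholders).
alpha, beta, phi, delta, T = 0.5, 1.0, 1.0, 0.01, 10.0
factor = (1 / alpha + 1 / beta + beta / alpha ** 2) * phi * math.log(T / delta)
for m in (10 ** 4, 10 ** 6, 10 ** 8):
    print(f"m = {m:>11,}: bound ~ {m ** (2 / 3) * math.log(m) * factor:,.0f}")
```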

Appendix E Lower Bound Proofs

E.1 Step 1: missing lemmas and proofs

See Lemma 5.1.

Proof.

Let π\pi^{*} denote π(x,T)\pi^{*}(x,T^{\prime}) for the remainder of this proof. Using (19), we can rewrite the left hand side of the lemma:

Adet(x,T,p)\displaystyle A^{det}(x,T^{\prime},p)
=\displaystyle= limδ0Vdet(x,T)pλ(p,x)δVdet(x+λ(p,x)δ/m,Tδ)δ\displaystyle\lim_{\delta\rightarrow 0}\frac{V^{\text{det}}(x,T^{\prime})-p\lambda(p,x)\delta-V^{\text{det}}(x+\lambda(p,x)\delta/m,T^{\prime}-\delta)}{\delta}
=\displaystyle= limδ0πλ(π,x)δ+Vdet(x+λ(π,x)δ/m,Tδ)pλ(p,x)δVdet(x+λ(p,x)δ/m,Tδ)δ\displaystyle\lim_{\delta\rightarrow 0}\frac{\pi^{*}\lambda(\pi^{*},x)\delta+V^{\text{det}}(x+\lambda(\pi^{*},x)\delta/m,T^{\prime}-\delta)-p\lambda(p,x)\delta-V^{\text{det}}(x+\lambda(p,x)\delta/m,T^{\prime}-\delta)}{\delta}
=\displaystyle= Gdet(x,T,π)Gdet(x,T,p)\displaystyle G^{\text{det}}(x,T^{\prime},\pi^{*})-G^{\text{det}}(x,T^{\prime},p) (24)

where we define

Gdet(x,T,p)\displaystyle G^{\text{det}}(x,T^{\prime},p) =limδ0pλ(p,x)+Vdet(x+λ(p,x)δ/m,Tδ)Vdet(x,Tδ)δ\displaystyle=\lim\limits_{\delta\rightarrow 0}p\lambda(p,x)+\frac{V^{\text{det}}(x+\lambda(p,x)\delta/m,T^{\prime}-\delta)-V^{\text{det}}(x,T^{\prime}-\delta)}{\delta}
=pλ(p,x)+λ(p,x)mVdet(x,T)x\displaystyle=p\lambda(p,x)+\frac{\lambda(p,x)}{m}\frac{\partial V^{\text{det}}(x,T^{\prime})}{\partial x} (25)

Splitting the limit in this way is valid since both limits exist. The new quantity $G^{\text{det}}(x,T^{\prime},p)$ will help us quantify the instantaneous impact, or (dis)advantage, of offering a suboptimal price.

From above, note that π(x,T):=argminpAdet(x,T,p)=argmaxpGdet(x,T,p)\pi^{*}(x,T^{\prime}):=\operatorname*{arg\,min}_{p}A^{det}(x,T^{\prime},p)=\arg\max_{p}G^{det}(x,T^{\prime},p). We derived the expression for π(x,T)\pi^{*}(x,T^{\prime}) earlier in the proof of Lemma 2.1 (equation (21)) as

π(x,T)=11mVdet(x,T)x.\pi^{*}(x,T^{\prime})=1-\frac{1}{m}\frac{\partial V^{\text{det}}(x,T^{\prime})}{\partial x}.

We can now bound (24) using the derivative of GdetG^{\text{det}}:

Gdet(x,T,p)p\displaystyle\frac{\partial G^{\text{det}}(x,T^{\prime},p)}{\partial p} =λ(p,x)pλ(p,x)λ(p,x)mVdet(x,T)x\displaystyle=\lambda(p,x)-p\lambda(p,x)-\frac{\lambda(p,x)}{m}\frac{\partial V^{\text{det}}(x,T^{\prime})}{\partial x}
=(πp)λ(p,x)\displaystyle=(\pi^{*}-p)\lambda(p,x)
Gdet(x,T,π)Gdet(x,T,p)\displaystyle G^{\text{det}}(x,T^{\prime},\pi^{*})-G^{\text{det}}(x,T^{\prime},p) =pπ(πρ)λ(ρ,x)𝑑ρ\displaystyle=\int_{p}^{\pi^{*}}(\pi^{*}-\rho)\lambda(\rho,x)d\rho
=λ(p,x)(πp)+λ(π,x)λ(p,x)\displaystyle=\lambda(p,x)(\pi^{*}-p)+\lambda(\pi^{*},x)-\lambda(p,x)
=λ(p,x)(epπ1(pπ))\displaystyle=\lambda(p,x)(e^{p-\pi^{*}}-1-(p-\pi^{*}))
λ(p,x)min(110,14(pπ)2)\displaystyle\geq\lambda(p,x)\min\left(\frac{1}{10},\frac{1}{4}(p-\pi^{*})^{2}\right)

To see the last step, consider $f(x)=e^{x}-1-x$, a convex function minimized at $x=0$. For $x\geq-\frac{1}{2}$, we have $f^{\prime\prime}(x)=e^{x}\geq e^{-1/2}\geq\frac{1}{2}$, so by a standard strong-convexity argument $f(x)\geq\frac{1}{4}x^{2}$. For $x\leq-\frac{1}{2}$, since $f$ is decreasing on $(-\infty,0]$, we have $f(x)\geq f(-\frac{1}{2})\geq\frac{1}{10}$. ∎
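As a quick numerical sanity check of this inequality (illustrative only, not part of the proof), one can confirm $e^{x}-1-x\geq\min(\frac{1}{10},\frac{1}{4}x^{2})$ on a fine grid:

```python
import numpy as np

# Check: f(x) = e^x - 1 - x dominates min(1/10, x^2/4) on a dense grid.
x = np.linspace(-10.0, 10.0, 200001)
f = np.exp(x) - 1.0 - x
lower = np.minimum(0.1, 0.25 * x ** 2)
assert np.all(f >= lower - 1e-12)
print("min slack on the grid:", (f - lower).min())  # nonnegative
```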

Before we translate the above result into a regret bound in terms of the cumulative pricing error, we explain the proof idea in some more detail. Given any arbitrary pricing algorithm, let

[(p1,I1),,(pn,In)][(p_{1},I_{1}),\ldots,(p_{n},I_{n})]

be the first $n$ observations (tuples of the price $p_{d}$ and the inter-arrival time $I_{d}$ between customers $d-1$ and $d$) in the stochastic Bass model, obtained by following the algorithm's pricing policy. Here we assume that the algorithm is allowed to continue running even after the planning horizon $T$ has passed; if the algorithm is undefined after time $T$, we assume that it offers price $0$. We use these observations to define a continuous price trajectory $p_{x},0\leq x\leq n/m$ as follows: set $p_{x}=p_{d},\forall\,x\in[\frac{d-1}{m},\frac{d}{m}),d=1,\ldots,n$. We call $p_{x}$ the induced price trajectory of $p_{d}$. Let $t^{det}_{x}$ denote the time when the adoption level hits $x$ in the deterministic Bass model following this induced price trajectory; in other words, $t^{det}_{x}=\int_{0}^{x}\frac{m}{\lambda(x^{\prime},p_{x^{\prime}})}dx^{\prime}$. Note that $t^{det}_{n/m}$ is a stochastic quantity because it depends on the stochastic trajectory $p_{x}$, which in turn depends on the prices $p_{d}$ offered to the first $n$ customers in the stochastic model. A small construction sketch follows below.
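As a concrete illustration of this construction (a minimal sketch with hypothetical parameter values and placeholder prices; nothing here is part of the proof), one can build the step-function trajectory $p_{x}$ and evaluate $t^{det}_{n/m}$ numerically using the rate $\lambda(p,x)=e^{-p}(\alpha+\beta x)(1-x)m$:

```python
import numpy as np

# Sketch: induced price trajectory p_x and deterministic hitting time t^det_{n/m}.
alpha, beta, m = 0.5, 1.0, 10_000
rng = np.random.default_rng(0)
p = rng.uniform(0.5, 1.5, size=200)  # placeholder offered prices p_1, ..., p_n
n = len(p)

def lam(price, x):
    # Bass arrival rate: lambda(p, x) = e^{-p} (alpha + beta x)(1 - x) m
    return np.exp(-price) * (alpha + beta * x) * (1.0 - x) * m

# p_x = p_d on [(d-1)/m, d/m), so the integral defining t^det splits into n pieces,
# each approximated here by the midpoint rule on its interval of length 1/m.
t_det = sum((1.0 / m) * m / lam(p[d - 1], (d - 0.5) / m) for d in range(1, n + 1))
print(f"t_det at adoption level {n / m:.4f}: {t_det:.4f}")
```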

Recall that τn=d=1nId\tau_{n}=\sum\limits_{d=1}^{n}I_{d} denotes the arrival time of customer nn in the stochastic model. First we show that the total time for nn customer arrivals in the deterministic vs. stochastic model (i.e., tn/mdett^{det}_{n/m} vs. τn\tau_{n}) under the two price trajectories (pxp_{x} vs. pdp_{d}) is roughly the same.

Lemma E.1.

Fix $n\leq m$ and $\delta\in(0,1)$ such that $n\geq 2\log(\frac{2}{\delta})$. Then for any algorithm satisfying Assumption 1, with probability at least $1-\delta$,

|t^{det}_{n/m}-\tau_{n}|\leq\frac{e^{p_{max}}}{\alpha(m-n)}\sqrt{8n\log(\frac{2}{\delta})}+\frac{e^{p_{max}}(\alpha+\beta)n}{2\alpha^{2}(m-n-1)^{2}}

where $p_{max}$ is an upper bound on the prices $p_{d},1\leq d\leq d_{T}$ offered by the algorithm.

Proof.

Since the algorithm's prices are fixed between the arrivals of two consecutive customers (Assumption 1), the inter-arrival times follow an exponential distribution. That is, for any $d$, given $p_{d+1}$, the price offered to the $(d+1)$-th customer, $\tau_{d+1}-\tau_{d}$ follows the exponential distribution $Exp(\lambda(p_{d+1},\frac{d}{m}))$, where $\lambda(p,x)=e^{-p}(\alpha+\beta x)(1-x)m$. Note that $\lambda(p_{d+1},\frac{d}{m})\geq\underline{\lambda}\coloneqq e^{-p_{max}}\alpha(m-n)$ for all $d\leq n$. Set $\epsilon=\sqrt{\frac{8n}{\underline{\lambda}^{2}}\log(\frac{2}{\delta})}$; then Lemma A.1 in Appendix A provides that with probability $1-\delta$:

|τnd=1n𝔼[τdτd1|d1]|8nλ¯2log(2δ)|\tau_{n}-\sum\limits_{d=1}^{n}\mathbb{E}[\tau_{d}-\tau_{d-1}|\mathcal{F}_{d-1}]|\leq\sqrt{\frac{8n}{\underline{\lambda}^{2}}\log(\frac{2}{\delta})}

Note that the condition on ϵ\epsilon in Lemma A.1 is satisfied since 8nλ¯2log(2δ)2nλ¯n2log(2δ)\sqrt{\frac{8n}{\underline{\lambda}^{2}}\log(\frac{2}{\delta})}\leq\frac{2n}{\underline{\lambda}}\iff n\geq 2\log(\frac{2}{\delta}).

On the other hand, for any dnd\leq n, td+1mdettdmdet=md/md+1m1λ(pd+1,x)𝑑xt^{det}_{\frac{d+1}{m}}-t^{det}_{\frac{d}{m}}=m\int_{d/m}^{\frac{d+1}{m}}\frac{1}{\lambda(p_{d+1},x)}dx. It’s easy to check that |x1λ(p,x)|=|1mep[β(1x)(α+βx)](α+βx)2(1x)2|ep(α+β)α2(1x)2m|\frac{\partial}{\partial x}\frac{1}{\lambda(p,x)}|=|\frac{1}{m}\frac{e^{p}[\beta(1-x)-(\alpha+\beta x)]}{(\alpha+\beta x)^{2}(1-x)^{2}}|\leq\frac{e^{p}(\alpha+\beta)}{\alpha^{2}(1-x)^{2}m}. So

|td+1mdettd/mdet𝔼[τd+1τd|d]|=\displaystyle\left|t^{det}_{\frac{d+1}{m}}-t^{det}_{d/m}-\mathbb{E}[\tau_{d+1}-\tau_{d}|\mathcal{F}_{d}]\right|= |md/md+1m1λ(pd+1,x)𝑑x1λ(pd+1,d/m)|\displaystyle\left|m\int_{d/m}^{\frac{d+1}{m}}\frac{1}{\lambda(p_{d+1},x)}dx-\frac{1}{\lambda(p_{d+1},d/m)}\right|
\displaystyle\leq |md/md+1m1λ(pd+1,dm)+epd+1(α+β)α2(1d+1m)2m(xdm)dx1λ(pd+1,d/m)|\displaystyle\left|m\int_{d/m}^{\frac{d+1}{m}}\frac{1}{\lambda(p_{d+1},\frac{d}{m})}+\frac{e^{p_{d+1}}(\alpha+\beta)}{\alpha^{2}(1-\frac{d+1}{m})^{2}m}(x-\frac{d}{m})dx-\frac{1}{\lambda(p_{d+1},d/m)}\right|
\displaystyle\leq |epd+1(α+β)α2(1d+1m)2d/md+1m(xdm)𝑑x|\displaystyle\left|\frac{e^{p_{d+1}}(\alpha+\beta)}{\alpha^{2}(1-\frac{d+1}{m})^{2}}\int_{d/m}^{\frac{d+1}{m}}(x-\frac{d}{m})dx\right|
\displaystyle\leq epd+1(α+β)2α2(1d+1m)2m2\displaystyle\frac{e^{p_{d+1}}(\alpha+\beta)}{2\alpha^{2}(1-\frac{d+1}{m})^{2}m^{2}}

In the second-to-last step we canceled the first and third terms from the previous step. Summing these per-arrival errors over $d<n$ (each is bounded by its value at $d=n-1$) and combining with the concentration bound above, we have with probability $1-\delta$

|t^{det}_{n/m}-\tau_{n}|\leq\frac{e^{p_{max}}}{\alpha(m-n)}\sqrt{8n\log(\frac{2}{\delta})}+\frac{e^{p_{max}}(\alpha+\beta)n}{2\alpha^{2}(m-n-1)^{2}} ∎
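As an illustrative Monte Carlo check of this lemma (hypothetical parameter values, and a constant price for simplicity), the stochastic arrival time $\tau_{n}$ indeed concentrates around its deterministic counterpart:

```python
import numpy as np

# Monte Carlo sketch for Lemma E.1 (hypothetical parameters, constant price p).
rng = np.random.default_rng(1)
alpha, beta, p, m, n = 0.5, 1.0, 1.0, 50_000, 5_000

d = np.arange(1, n + 1)
lam = np.exp(-p) * (alpha + beta * (d - 1) / m) * (1 - (d - 1) / m) * m
t_det = np.sum(1.0 / lam)  # deterministic time to n adoptions (up to the lemma's error terms)
tau_n = np.cumsum(rng.exponential(1.0 / lam, size=(200, n)), axis=1)[:, -1]
print(f"t_det = {t_det:.3f}, tau_n: mean {tau_n.mean():.3f}, sd {tau_n.std():.3f}")
```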

Using txdett^{det}_{x} and the induced pricing trajectory pxp_{x} as defined right before Lemma E.1, we can now obtain the following result.

Lemma E.2.

Fix any $\alpha,\beta$. Then for any $m,T$ such that $mX^{*}_{T}\geq n:=\left\lfloor\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}m^{2/3}\right\rfloor$, and any algorithm satisfying Assumptions 1 and 2,

𝔼[Pseudo-Regret(T)]𝔼[m0n/mmin(14(πxpx)2,110)𝑑x]O((α+β)1/3TeCα1/3m1/3log(m))\mathbb{E}\left[\text{Pseudo-Regret}(T)\right]\geq\mathbb{E}\left[m\int_{0}^{n/m}\min\left(\frac{1}{4}(\pi^{*}_{x}-p_{x})^{2},\frac{1}{10}\right)dx\right]-O\left({\frac{(\alpha+\beta)^{1/3}Te^{C}}{\alpha^{1/3}}}m^{1/3}\sqrt{\log(m)}\right)

where πx:=π(x,α,β,Ttxdet)\pi^{*}_{x}:=\pi^{*}(x,\alpha,\beta,T-t^{det}_{x}) denotes the deterministic optimal price at adoption level xx, px:=pdp_{x}:=p_{d} denotes the price offered by the algorithm for customer d=mx+1d={\lfloor mx\rfloor+1}, and txdett^{det}_{x} is the time at which adoption level reaches xx on following the price trajectory px,xxp_{x^{\prime}},\forall x^{\prime}\leq x in the deterministic Bass model.

Proof.

Let XtX_{t} be the trajectory of adoption levels in the deterministic Bass model on following price curve pxp_{x}. Recall that by (5), mdXtdt=λ(pt,Xt)m\frac{dX_{t}}{dt}=\lambda(p_{t},X_{t}). Let ptpXtp_{t}\coloneqq p_{X_{t}} be the same price trajectory as pxp_{x} but indexed by time.

First, note that for $m$ large enough we have $t^{det}_{n/m}\leq\frac{n}{e^{-p_{max}}\alpha(m-n)}\leq\frac{Te^{C}n}{\alpha(m-n)}\leq T$. This is because the last inequality holds for $m\geq n+\frac{n}{\alpha}e^{C}$, i.e., for any $m$ satisfying $m^{1/3}\geq\left(\frac{(\alpha+1)\alpha^{1/3}}{(\alpha+\beta)^{4/3}}\right)e^{C}$. Here we used the upper bound $p_{max}:=\log(T)+C$ from Assumption 2. Therefore, for the rest of the proof we can assume that $t^{det}_{n/m}\leq T$.

𝔼[Pseudo-Regret(T)]\displaystyle\mathbb{E}[\text{Pseudo-Regret}(T)]
=\displaystyle= 𝔼[Vdet(0,T)d=1npdd=n+1dTpd]\displaystyle\mathbb{E}\left[V^{\text{det}}(0,T)-\sum\limits_{d=1}^{n}p_{d}-\sum\limits_{d=n+1}^{d_{T}}p_{d}\right]
=\displaystyle= 𝔼[Vdet(0,T)d=1npd]𝔼[𝔼[d=n+1dTpd|n]]\displaystyle\mathbb{E}\left[V^{\text{det}}(0,T)-\sum\limits_{d=1}^{n}p_{d}\right]-\mathbb{E}\left[\mathbb{E}\left[\left.\sum\limits_{d=n+1}^{d_{T}}p_{d}\right|\mathcal{F}_{n}\right]\right]
\displaystyle\geq 𝔼[Vdet(0,T)d=1npd]𝔼[Vstoch(nm,Tτn)]\displaystyle\mathbb{E}\left[V^{\text{det}}(0,T)-\sum\limits_{d=1}^{n}p_{d}\right]-\mathbb{E}\left[V^{\text{stoch}}(\frac{n}{m},T-\tau_{n})\right]
\displaystyle=\mathbb{E}\left[V^{\text{det}}(0,T)-\sum\limits_{d=1}^{n}p_{d}-V^{\text{det}}(\frac{n}{m},T-t^{det}_{n/m})\right]+\mathbb{E}\left[V^{\text{det}}(\frac{n}{m},T-t^{det}_{n/m})-V^{\text{stoch}}(\frac{n}{m},T-\tau_{n})\right] (26)

Given a particular sequence of $p_{d}$ for $d=1,\ldots,n$, and its induced continuous version $p_{x}$ as defined in the lemma statement, the quantity inside the first expectation is the cumulative “disadvantage” that $p_{x}$ incurs in the deterministic Bass model, where disadvantage is defined as in Lemma 5.1. Therefore we can bound it as follows:

𝔼[Vdet(0,T)d=1npdVdet(n/m,Ttn/mdet)]\displaystyle\mathbb{E}\left[V^{\text{det}}(0,T)-\sum\limits_{d=1}^{n}p_{d}-V^{\text{det}}(n/m,T-t^{det}_{n/m})\right]
=\displaystyle= 𝔼[0tn/mdetVdet(Xt,Tt)Qdet(Xt,Tt,pt,dt)]\displaystyle\mathbb{E}\left[\int_{0}^{t^{det}_{n/m}}V^{\text{det}}(X_{t},T-t)-Q^{\text{det}}(X_{t},T-t,p_{t},dt)\right]
\displaystyle=\mathbb{E}\left[\int_{0}^{t^{det}_{n/m}}\frac{V^{\text{det}}(X_{t},T-t)-p_{t}\lambda(p_{t},X_{t})dt-V^{\text{det}}(X_{t}+\lambda(p_{t},X_{t})dt/m,T-t-dt)}{dt}\,dt\right]
\displaystyle\geq\mathbb{E}\left[\int_{0}^{t^{det}_{n/m}}\lambda(p_{t},X_{t})\min\left(\frac{1}{4}(\pi^{*}_{X_{t}}-p_{t})^{2},\frac{1}{10}\right)dt\right]
=\displaystyle= m𝔼[0n/mmin(14(πxpx)2,110)𝑑x]\displaystyle m\mathbb{E}\left[\int_{0}^{n/m}\min\left(\frac{1}{4}(\pi^{*}_{x}-p_{x})^{2},\frac{1}{10}\right)dx\right]

where in the last step we applied the change of variables $\lambda(p_{t},X_{t})dt=m\,dX_{t}$.

The second part of (26) can be bounded using Lemma C.1 and by bounding the difference between $\tau_{n}$ and $t^{det}_{n/m}$:

\displaystyle\mathbb{E}\left[V^{\text{det}}(\frac{n}{m},T-t^{det}_{n/m})-V^{\text{stoch}}(\frac{n}{m},T-\tau_{n})\right]
\displaystyle=\mathbb{E}\left[V^{\text{det}}(\frac{n}{m},T-\tau_{n})-V^{\text{stoch}}(\frac{n}{m},T-\tau_{n})\right]+\mathbb{E}\left[V^{\text{det}}(\frac{n}{m},T-t^{det}_{n/m})-V^{\text{det}}(\frac{n}{m},T-\tau_{n})\right]
\displaystyle\geq\mathbb{E}\left[V^{\text{det}}(\frac{n}{m},T-t^{det}_{n/m})-V^{\text{det}}(\frac{n}{m},T-\tau_{n})\right]

Now recall from (20) and (21) that $\pi^{*}(x,\alpha,\beta,T^{\prime})=1-\frac{1}{m}\frac{\partial V^{\text{det}}(x,T^{\prime})}{\partial x}$ and

Vdet(x,T)T\displaystyle\frac{\partial V^{\text{det}}(x,T^{\prime})}{\partial T^{\prime}} =πλ(π,x)+λ(π,x)(1π)=λ(π,x)(α+β)m\displaystyle=\pi^{*}\lambda(\pi^{*},x)+\lambda(\pi^{*},x)(1-\pi^{*})=\lambda(\pi^{*},x)\leq(\alpha+\beta)m (27)

Using Lemma E.1, which bounds the difference between Ttn/mdetT-t^{det}_{n/m} and TτnT-\tau_{n}, we have with probability 1δ1-\delta:

|Vdet(nm,Ttn/mdet)Vdet(nm,Tτn)|\displaystyle|V^{\text{det}}(\frac{n}{m},T-t_{n/m}^{\text{det}})-V^{\text{det}}(\frac{n}{m},T-\tau_{n})|\leq O(epmax(α+β)1/3α1/3m1/3log(2δ))\displaystyle O\left(\frac{e^{p_{max}}(\alpha+\beta)^{1/3}}{\alpha^{1/3}}m^{1/3}\sqrt{\log(\frac{2}{\delta})}\right)

Since Vdet(nm,Tτn)Vdet(nm,T)mpmax=O(mlog((α+β)T))V^{\text{det}}(\frac{n}{m},T-\tau_{n})\leq V^{\text{det}}(\frac{n}{m},T)\leq mp^{*}_{max}=O(m\log((\alpha+\beta)T)), we can set δ=1m\delta=\frac{1}{m} and use Assumption 2 to conclude that

𝔼[|Vdet(nm,Ttn/mdet)Vdet(nm,Tτn)|]O((α+β)1/3TeCα1/3m1/3log(m))\mathbb{E}\left[\left|V^{\text{det}}(\frac{n}{m},T-t_{n/m}^{\text{det}})-V^{\text{det}}(\frac{n}{m},T-\tau_{n})\right|\right]\leq O\left(\frac{(\alpha+\beta)^{1/3}Te^{C}}{\alpha^{1/3}}m^{1/3}\sqrt{\log(m)}\right) (28)

Here we also assumed that $m\geq\left(2\log(\frac{2}{\delta})\right)^{3/2}\left(\frac{\alpha+\beta}{\alpha}\right)^{2}$, which implies that $n\geq 2\log(\frac{2}{\delta})$ and thus satisfies the condition of Lemma E.1. If this assumption on $m$ does not hold, then, since the expected pseudo-regret is lower bounded by zero and the first term in the lemma statement is at most $\frac{n}{10}\leq\frac{1}{5}\log(\frac{2}{\delta})$, the lemma statement still holds for $\delta=\Theta(\frac{1}{m})$. ∎

E.2 Step 2: missing lemmas and proofs

Lemma E.3.

For any $i$, let $[(p_{1},I_{1}),\ldots,(p_{i},I_{i})]$ be the sequence of prices offered and inter-arrival times ($I_{d}:=\tau_{d}-\tau_{d-1}$) for the first $i$ customers, and let $\mathcal{F}_{i}$ be the filtration generated by these random variables. Here, prices may be chosen adaptively by an arbitrary strategy as long as $p_{i}\in\mathcal{F}_{i-1}$. Let $\pi^{*}_{1}\neq\pi^{*}_{2}$ be any two fixed prices. Fix any deterministic algorithm that takes the past $n$ observations as input and outputs a single price $\pi\in\mathcal{F}_{n}$. Then, for any $\epsilon>0$ and $n$ such that $n\leq\left(\frac{\alpha m}{\epsilon}\right)^{2/3}$, at least one of the following holds:

\displaystyle\mathbb{P}_{\alpha,\beta}\left(\left|\pi-\pi^{*}_{2}\right|\leq\left|\pi-\pi^{*}_{1}\right|\right)\geq\frac{1}{4}, or,
α,β+ϵ(|ππ1||ππ2|)14,\displaystyle\mathbb{P}_{\alpha,\beta+\epsilon}\left(\left|\pi-\pi^{*}_{1}\right|\leq\left|\pi-\pi^{*}_{2}\right|\right)\geq\frac{1}{4},

where α,β\mathbb{P}_{\alpha,\beta} denotes the probability distribution under the stochastic Bass model with parameters α,β\alpha,\beta. Note that the only random quantity is π\pi, which depends on the first nn observations.

Proof.

Since pii1p_{i}\in\mathcal{F}_{i-1}, the probability of observing the sequence [(p1,I1),(pn,In)][(p_{1},I_{1}),\ldots(p_{n},I_{n})] is the product of the probabilities of observing IiI_{i} given i1\mathcal{F}_{i-1}.

α,β([(p1,I1),(pn,In)])\displaystyle\mathbb{P}_{\alpha,\beta}\left([(p_{1},I_{1}),\ldots(p_{n},I_{n})]\right) =i=1nα,β(Ii|i1)\displaystyle=\prod\limits_{i=1}^{n}\mathbb{P}_{\alpha,\beta}\left(I_{i}|\mathcal{F}_{i-1}\right)
α,β+ϵ([(p1,I1),(pn,In)])\displaystyle\mathbb{P}_{\alpha,\beta+\epsilon}\left([(p_{1},I_{1}),\ldots(p_{n},I_{n})]\right) =i=1nα,β+ϵ(Ii|i1)\displaystyle=\prod\limits_{i=1}^{n}\mathbb{P}_{\alpha,\beta+\epsilon}\left(I_{i}|\mathcal{F}_{i-1}\right)

The (α,β)(\alpha,\beta) subscript denotes the fact that IiI_{i}’s are generated according to the stochastic Bass model with parameters (α,β)(\alpha,\beta). Since customer arrivals are Poisson, we know that given price pip_{i}, the arrival time IiI_{i} between customer i1i-1 and ii is exponentially distributed. Specifically, IiExp(λα,β(pi,i1m))I_{i}\sim\text{Exp}\left(\lambda_{\alpha,\beta}(p_{i},\frac{i-1}{m})\right) in the (α,β)(\alpha,\beta) market, and IiExp(λα,β+ϵ(pi,i1m))I_{i}\sim\text{Exp}\left(\lambda_{\alpha,\beta+\epsilon}(p_{i},\frac{i-1}{m})\right) in the (α,β+ϵ)(\alpha,\beta+\epsilon) market, where λα,β(p,x)=ep(α+βx)(1x)m\lambda_{\alpha,\beta}(p,x)=e^{-p}(\alpha+\beta x)(1-x)m. We can now bound the KL divergence between the joint distributions of the first nn observations between the two markets:

KL(α,β+ϵ,α,β)\displaystyle\text{KL}\left(\mathbb{P}_{\alpha,\beta+\epsilon},\mathbb{P}_{\alpha,\beta}\right) =KL(i=1nα,β+ϵ(Ii|i1),i=1nα,β(Ii|i1))\displaystyle=\text{KL}\left(\prod\limits_{i=1}^{n}\mathbb{P}_{\alpha,\beta+\epsilon}\left(I_{i}|\mathcal{F}_{i-1}\right),\prod\limits_{i=1}^{n}\mathbb{P}_{\alpha,\beta}\left(I_{i}|\mathcal{F}_{i-1}\right)\right)
=d=1nKL(Exp(λα,β+ϵ(pd,d1m)),Exp(λα,β(pd,d1m))|)\displaystyle=\sum\limits_{d=1}^{n}\text{KL}\left(\left.\text{Exp}\left(\lambda_{\alpha,\beta+\epsilon}(p_{d},\frac{d-1}{m})\right),\text{Exp}\left(\lambda_{\alpha,\beta}(p_{d},\frac{d-1}{m})\right)\right|\right)
d=1n(ϵd1m)22(α+βd1m)2\displaystyle\leq\sum\limits_{d=1}^{n}\frac{(\epsilon\frac{d-1}{m})^{2}}{2(\alpha+\beta\frac{d-1}{m})^{2}}
n(ϵnm)22α2\displaystyle\leq n\frac{(\epsilon\frac{n}{m})^{2}}{2\alpha^{2}} (29)

where the second equality follows from the standard decomposition of KL divergence (see for example Lemma 15.1 of [27]) and the inequality follows from the following bound on the KL divergence of the two exponential distributions:

Recall that the KL divergence between two exponential distributions with rates $\lambda$ and $\lambda_{0}$ is $\text{KL}(\text{Exp}(\lambda),\text{Exp}(\lambda_{0}))=\log(\frac{\lambda}{\lambda_{0}})+\frac{\lambda_{0}}{\lambda}-1$. Applying this with $\lambda=\lambda_{\alpha,\beta+\epsilon}(p_{d},\frac{d-1}{m})$ and $\lambda_{0}=\lambda_{\alpha,\beta}(p_{d},\frac{d-1}{m})$ (the common factor $e^{-p_{d}}(1-\frac{d-1}{m})m$ cancels in the ratio), and writing $s_{d}\coloneqq\frac{\epsilon\frac{d-1}{m}}{\alpha+\beta\frac{d-1}{m}}$:

\text{KL}\left(\text{Exp}\left(\lambda_{\alpha,\beta+\epsilon}(p_{d},\tfrac{d-1}{m})\right),\text{Exp}\left(\lambda_{\alpha,\beta}(p_{d},\tfrac{d-1}{m})\right)\right)
\displaystyle=\log\left(1+s_{d}\right)+\frac{1}{1+s_{d}}-1
\displaystyle\leq\frac{s_{d}^{2}}{2}=\frac{(\epsilon\frac{d-1}{m})^{2}}{2(\alpha+\beta\frac{d-1}{m})^{2}}

The last inequality holds for every $s_{d}\geq 0$: the function $g(s)\coloneqq\frac{s^{2}}{2}-\log(1+s)-\frac{1}{1+s}+1$ satisfies $g(0)=0$ and $g^{\prime}(s)=s\left(1-\frac{1}{(1+s)^{2}}\right)\geq 0$. Now let $A_{n}$ be a sequence of $n$ observations such that the policy outputs a price that is closer to $\pi^{*}_{1}$. Using Pinsker's inequality and (29), we can bound the difference in the probability of observing this sequence in the two markets:

2(α,β(An)α,β+ϵ(An))2\displaystyle 2(\mathbb{P}_{\alpha,\beta}(A_{n})-\mathbb{P}_{\alpha,\beta+\epsilon}(A_{n}))^{2} KL(α,β+ϵ,α,β)\displaystyle\leq\text{KL}(\mathbb{P}_{\alpha,\beta+\epsilon},\mathbb{P}_{\alpha,\beta})
|α,β(An)α,β+ϵ(An)|\displaystyle|\mathbb{P}_{\alpha,\beta}(A_{n})-\mathbb{P}_{\alpha,\beta+\epsilon}(A_{n})| n(ϵnm)24α2=ϵn3/22αm\displaystyle\leq\sqrt{n\frac{(\epsilon\frac{n}{m})^{2}}{4\alpha^{2}}}=\frac{\epsilon n^{3/2}}{2\alpha m} (30)

Plugging the upper bound on $n$ from the lemma statement into (30) gives $|\mathbb{P}_{\alpha,\beta}(A_{n})-\mathbb{P}_{\alpha,\beta+\epsilon}(A_{n})|\leq\frac{\epsilon}{2\alpha m}\cdot\frac{\alpha m}{\epsilon}=\frac{1}{2}$.

However, suppose neither inequality in the lemma statement holds. Then $\mathbb{P}_{\alpha,\beta}(A_{n})>\frac{3}{4}$ while $\mathbb{P}_{\alpha,\beta+\epsilon}(A_{n})<\frac{1}{4}$, so by the definition of $A_{n}$ we would have $|\mathbb{P}_{\alpha,\beta}(A_{n})-\mathbb{P}_{\alpha,\beta+\epsilon}(A_{n})|>\frac{3}{4}-\frac{1}{4}=\frac{1}{2}$ for $n^{3/2}\leq\frac{\alpha m}{\epsilon}$, which is a contradiction. ∎
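As an aside, the exponential KL formula used above admits a quick numerical check (illustrative only; the two rates below are arbitrary placeholders):

```python
import numpy as np

# Monte Carlo check of KL(Exp(lam) || Exp(lam0)) = log(lam/lam0) + lam0/lam - 1
# (rate parametrization; lam, lam0 are arbitrary placeholder rates).
rng = np.random.default_rng(0)
lam, lam0 = 2.3, 1.1
closed_form = np.log(lam / lam0) + lam0 / lam - 1.0
x = rng.exponential(1.0 / lam, size=2_000_000)  # samples from Exp(lam)
log_ratio = (np.log(lam) - lam * x) - (np.log(lam0) - lam0 * x)
print(f"closed form: {closed_form:.5f}, Monte Carlo: {log_ratio.mean():.5f}")
```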

The above lemma holds for any two prices $\pi^{*}_{1}\neq\pi^{*}_{2}$. Readers should think of $\pi^{*}_{1},\pi^{*}_{2}$ as the optimal prices for customer $n+1$ in the $(\alpha,\beta)$ and $(\alpha,\beta+\epsilon)$ markets, respectively. To reduce clutter in the following corollary statement, let $\pi^{*}_{1}=\pi^{*}(x,\alpha,\beta,T^{\prime})$, $\pi^{*}_{2}=\pi^{*}(x,\alpha,\beta+\epsilon,T^{\prime})$, and let $\pi^{*}_{M}(x)$ be the optimal price for market $M$, i.e., $\pi^{*}_{M}=\pi^{*}_{1}$ if $M=(\alpha,\beta)$ and $\pi^{*}_{M}=\pi^{*}_{2}$ otherwise.

Corollary E.1.

Consider a market $M$ sampled uniformly at random from $\{(\alpha,\beta),(\alpha,\beta+\epsilon)\}$, and let $M^{\prime}$ be the other market. Suppose $n$ is such that $n^{3/2}\leq\frac{\alpha m}{\epsilon}$. Let $[(p_{1},I_{1}),\ldots,(p_{n},I_{n})]$ be a sequence of $n$ observations generated from market $M$, where $p_{i}\in\mathcal{F}_{i-1}$. Fix any pricing algorithm that outputs $\pi$ based on the first $n$ observations. Then for any $x\in[0,1)$ and any $T^{\prime}$ such that $\pi^{*}_{1}\neq\pi^{*}_{2}$:

(|ππM(x)||ππM(x)|)18\mathbb{P}\left(\left|\pi-\pi^{*}_{M^{\prime}}(x)\right|\leq\left|\pi-\pi^{*}_{M}(x)\right|\right)\geq\frac{1}{8}

where the probability is taken both with respect to the randomness from the stochastic arrival times, as well as the uniform sampling of the market parameters.

Proof.

This follows directly from Lemma E.3. The extra factor of $1/2$ in the probability comes from the fact that we picked one market out of two uniformly at random. ∎

E.3 Step 3: Lipschitz bound on the optimal pricing policy

Lemma E.4.

For any remaining time $T\geq\frac{(1+\sqrt{2})e}{\alpha+\beta}$ and any $x\leq\frac{\alpha^{2}e}{4(\alpha+\beta)^{3}T}$,

π(x,α,β,T)βαe4(α+β)3T\frac{\partial\pi^{*}(x,\alpha,\beta,T)}{\partial\beta}\leq\frac{-\alpha e}{4(\alpha+\beta)^{3}T}
Proof.

Differentiating both sides of (17) with respect to β\beta gives us

11XT(x)XT(x)β=TXT(x)2βTXT(x)+(αβ)T+e\frac{1}{1-X^{*}_{T}(x)}\frac{\partial X^{*}_{T}(x)}{\partial\beta}=\frac{TX^{*}_{T}(x)}{2\beta TX^{*}_{T}(x)+(\alpha-\beta)T+e}

We omit the initial state argument xx and denote XT=XT(x)X^{*}_{T}=X^{*}_{T}(x) in the following proof.

π(x,α,β,T)β\displaystyle\frac{\partial\pi^{*}(x,\alpha,\beta,T)}{\partial\beta} =11XTXTβXTα+βXTβXTβα+βXT+xα+βx\displaystyle=\frac{1}{1-X^{*}_{T}}\frac{\partial X^{*}_{T}}{\partial\beta}-\frac{X^{*}_{T}}{\alpha+\beta X^{*}_{T}}-\frac{\beta\frac{\partial X^{*}_{T}}{\partial\beta}}{\alpha+\beta X^{*}_{T}}+\frac{x}{\alpha+\beta x}
=TXT2βTXT+(αβ)T+eXTα+βXTβα+βXT(1XT)TXT2βTXT+(αβ)T+e+xα+βx\displaystyle=\frac{TX^{*}_{T}}{2\beta TX^{*}_{T}+(\alpha-\beta)T+e}-\frac{X^{*}_{T}}{\alpha+\beta X^{*}_{T}}-\frac{\beta}{\alpha+\beta X^{*}_{T}}\frac{(1-X^{*}_{T})TX^{*}_{T}}{2\beta TX^{*}_{T}+(\alpha-\beta)T+e}+\frac{x}{\alpha+\beta x}
=TXT(α+βXT)XT(2βTXT+(αβ)T+e)(1XT)βTXT(α+βXT)(2βTXT+(αβ)T+e)+xα+βx\displaystyle=\frac{TX^{*}_{T}(\alpha+\beta X^{*}_{T})-X^{*}_{T}(2\beta TX^{*}_{T}+(\alpha-\beta)T+e)-(1-X^{*}_{T})\beta TX^{*}_{T}}{(\alpha+\beta X^{*}_{T})(2\beta TX^{*}_{T}+(\alpha-\beta)T+e)}+\frac{x}{\alpha+\beta x}
=eXT(α+βXT)(2βTXT+(αβ)T+e)+xα+βx\displaystyle=\frac{-eX^{*}_{T}}{(\alpha+\beta X^{*}_{T})(2\beta TX^{*}_{T}+(\alpha-\beta)T+e)}+\frac{x}{\alpha+\beta x}
=eXT(α+βXT)(α+β)2T2+2e(αβ)T+e2+4eβxT+xα+βx\displaystyle=\frac{-eX^{*}_{T}}{(\alpha+\beta X^{*}_{T})\sqrt{(\alpha+\beta)^{2}T^{2}+2e(\alpha-\beta)T+e^{2}+4e\beta xT}}+\frac{x}{\alpha+\beta x}
eXT(α+βXT)((α+β)T+e)+xα+βx\displaystyle\leq\frac{-eX^{*}_{T}}{(\alpha+\beta X^{*}_{T})((\alpha+\beta)T+e)}+\frac{x}{\alpha+\beta x}

We replaced XTX^{*}_{T} with (18) in the last equality. The last inequality follows from the fact that
(α+β)2T2+2e(αβ)T+e2+4eβxT(α+β)T+e\sqrt{(\alpha+\beta)^{2}T^{2}+2e(\alpha-\beta)T+e^{2}+4e\beta xT}\leq(\alpha+\beta)T+e. In particular, if T(1+2)eα+βT\geq\frac{(1+\sqrt{2})e}{\alpha+\beta}, then
(α+β)2T2+2e(αβ)T+e2+4eβxT2(α+β)T\sqrt{(\alpha+\beta)^{2}T^{2}+2e(\alpha-\beta)T+e^{2}+4e\beta xT}\leq\sqrt{2}(\alpha+\beta)T. Also

XT(x,α,β)XT(0,α,β)XT(0,α,0)=αTαT+e1+22+2αα+βX^{*}_{T}(x,\alpha,\beta)\geq X^{*}_{T}(0,\alpha,\beta)\geq X^{*}_{T}(0,\alpha,0)=\frac{\alpha T}{\alpha T+e}\geq\frac{1+\sqrt{2}}{2+\sqrt{2}}\frac{\alpha}{\alpha+\beta}

The first inequality is easy to see from (18), and the second follows from Corollary B.1. The last inequality follows from the assumption on $T$: since $u\mapsto\frac{u}{u+e}$ is increasing and $\alpha T\geq\frac{(1+\sqrt{2})e\alpha}{\alpha+\beta}$, we have $\frac{\alpha T}{\alpha T+e}\geq\frac{(1+\sqrt{2})\alpha}{(1+\sqrt{2})\alpha+(\alpha+\beta)}\geq\frac{1+\sqrt{2}}{2+\sqrt{2}}\cdot\frac{\alpha}{\alpha+\beta}$. So the above can be simplified to

π(x,α,β,T)βe1+22+2αα+β(α+β)2T2+xα+βx=αe2(α+β)3T+xα+βx\frac{\partial\pi^{*}(x,\alpha,\beta,T)}{\partial\beta}\leq\frac{-e\frac{1+\sqrt{2}}{2+\sqrt{2}}\frac{\alpha}{\alpha+\beta}}{(\alpha+\beta)^{2}T\sqrt{2}}+\frac{x}{\alpha+\beta x}=\frac{-\alpha e}{2(\alpha+\beta)^{3}T}+\frac{x}{\alpha+\beta x}

Then, it is easy to verify that for $x\leq\frac{\alpha^{2}e}{4(\alpha+\beta)^{3}T}$, $\frac{\partial\pi^{*}(x,\alpha,\beta,T)}{\partial\beta}\leq\frac{-\alpha e}{4(\alpha+\beta)^{3}T}$. ∎

E.4 Step 4: Putting it all together for proof of Theorem 5.1

We are now ready to prove our main lower bound result Theorem 5.1. In the following proof, as defined earlier in Step 1, pxp_{x} denotes the induced price trajectory from the algorithm’s offered prices, and txdett^{det}_{x} is the time when adoption level hits xx in the deterministic Bass model on following pxp_{x} (both were defined in detail in the paragraphs before Lemma E.1).

Proof of Theorem 5.1.

Let $T=\frac{2(1+\sqrt{2})e}{\alpha+\beta}$, $\epsilon=\frac{(\alpha+\beta)^{2}}{\alpha}$, $n=\left\lfloor\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}m^{2/3}\right\rfloor$. Draw the Bass model parameters uniformly at random from $\{(\alpha,\beta),(\alpha,\beta+\epsilon)\}$. We denote the chosen market by $M$ and the other market by $M^{\prime}$. In the calculations that follow, we use $\mathbb{E}_{M}$ to indicate that the expectation is taken with respect to both the random choice of Bass model parameters and the randomness in the stochastic Bass model.

To reduce clutter, let $\pi^{*}_{1}(x)=\pi^{*}(x,\alpha,\beta,T-t^{det}_{x})$, $\pi^{*}_{2}(x)=\pi^{*}(x,\alpha,\beta+\epsilon,T-t^{det}_{x})$, and let $\pi^{*}_{M}(x)$ be the optimal price for market $M$, i.e., $\pi^{*}_{M}(x)=\pi^{*}_{1}(x)$ if $M=(\alpha,\beta)$ and $\pi^{*}_{M}(x)=\pi^{*}_{2}(x)$ otherwise.

𝔼M[Pseudo-Regret(T)]\displaystyle\mathbb{E}_{M}[\text{Pseudo-Regret}(T)]
\displaystyle\text{(Lemma E.2)}\geq\mathbb{E}_{M}\left[m\int_{0}^{n/m}\min\left(\frac{1}{4}(\pi^{*}_{M}(x)-p_{x})^{2},\frac{1}{10}\right)dx\right]-O\left(\frac{(\alpha+\beta)^{1/3}Te^{C}}{\alpha^{1/3}}m^{1/3}\sqrt{\log(m)}\right)
\displaystyle\geq m0n/m𝔼M[min(14(π1(x)π2(x)2)2,110)𝟙(|pxπM(x)||pxπM(x)|)]𝑑x\displaystyle m\int_{0}^{n/m}\mathbb{E}_{M}\left[\min\left(\frac{1}{4}\left(\frac{\pi^{*}_{1}(x)-\pi^{*}_{2}(x)}{2}\right)^{2},\frac{1}{10}\right)\mathbbm{1}\left(\left|p_{x}-\pi^{*}_{M^{\prime}}(x)\right|\leq\left|p_{x}-\pi^{*}_{M}(x)\right|\right)\right]dx
O(m1/3eC(α+β)2/3α1/3log(m))\displaystyle-O\left(\frac{m^{1/3}e^{C}}{(\alpha+\beta)^{2/3}\alpha^{1/3}}\sqrt{\log(m)}\right)
\displaystyle\text{(Lemma E.4)}\geq m\int_{0}^{n/m}\mathbb{E}_{M}\left[\min\left(\left(\frac{\alpha e\epsilon}{16(\alpha+\beta)^{3}(T-t^{det}_{x})}\right)^{2},\frac{1}{10}\right)\mathbbm{1}\left(\left|p_{x}-\pi^{*}_{M^{\prime}}(x)\right|\leq\left|p_{x}-\pi^{*}_{M}(x)\right|\right)\right]dx
O(m1/3eC(α+β)2/3α1/3log(m))\displaystyle-O\left(\frac{m^{1/3}e^{C}}{(\alpha+\beta)^{2/3}\alpha^{1/3}}\sqrt{\log(m)}\right)
\displaystyle\text{($T\geq T-t^{det}_{x}$)}\geq m\int_{0}^{n/m}\min\left(\left(\frac{1}{32(1+\sqrt{2})}\right)^{2},\frac{1}{10}\right)\mathbb{E}_{M}\left[\mathbbm{1}\left(\left|p_{x}-\pi^{*}_{M^{\prime}}(x)\right|\leq\left|p_{x}-\pi^{*}_{M}(x)\right|\right)\right]dx
O(m1/3eC(α+β)2/3α1/3log(m))\displaystyle-O\left(\frac{m^{1/3}e^{C}}{(\alpha+\beta)^{2/3}\alpha^{1/3}}\sqrt{\log(m)}\right)
\displaystyle\text{(Corollary E.1)}\geq\frac{n}{8}\min\left(\left(\frac{1}{32(1+\sqrt{2})}\right)^{2},\frac{1}{10}\right)-O\left(\frac{m^{1/3}e^{C}}{(\alpha+\beta)^{2/3}\alpha^{1/3}}\sqrt{\log(m)}\right)
=\displaystyle= Ω((αα+β)4/3m2/3)O(m1/3eC(α+β)2/3α1/3log(m))\displaystyle\Omega\left(\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}m^{2/3}\right)-O\left(\frac{m^{1/3}e^{C}}{(\alpha+\beta)^{2/3}\alpha^{1/3}}\sqrt{\log(m)}\right)
=\displaystyle= Ω((αα+β)4/3m2/3)\displaystyle\Omega\left(\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}m^{2/3}\right)

Using the fact that the maximum expected regret between the two markets must be at least the average, we have that for at least one of (α,β)(\alpha,\beta) and (α,β+ϵ)(\alpha,\beta+\epsilon),

𝔼[Pseudo-Regret(T)]Ω((αα+β)4/3m2/3)\displaystyle\mathbb{E}[\text{Pseudo-Regret}(T)]\geq\Omega\left(\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}m^{2/3}\right)

Finally, we still need to check that the assumptions of Lemma E.2, Lemma E.4 and Corollary E.1 are satisfied for large enough $m$.

To apply Lemma E.2, we need XT(αα+β)4/3m1/3=nmX^{*}_{T}\geq\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}m^{-1/3}=\frac{n}{m}. Using the expanded notation X(x,α,β)X^{*}(x,\alpha,\beta) introduced in Appendix B, we know from Corollary B.1 that XT(0,α,β)XT(0,α,0)=αTαT+eX^{*}_{T}(0,\alpha,\beta)\geq X^{*}_{T}(0,\alpha,0)=\frac{\alpha T}{\alpha T+e}. It is easy to verify that for large enough mm, specifically when m1/3(αα+β)4/3(2+e(α+β)α(1+2))m^{1/3}\geq\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}\left(2+\frac{e(\alpha+\beta)}{\alpha(1+\sqrt{2})}\right) and T=2(1+2)e/(α+β)T=2(1+\sqrt{2})e/(\alpha+\beta), nmαTαT+eXT(0,α,β)=XT\frac{n}{m}\leq\frac{\alpha T}{\alpha T+e}\leq X^{*}_{T}(0,\alpha,\beta)=X^{*}_{T}.

To apply Lemma E.4 we need the conditions $T^{\prime}_{x}\coloneqq T-t^{det}_{x}\geq\frac{(1+\sqrt{2})e}{\alpha+\beta}$ and $x\leq\frac{\alpha^{2}e}{4(\alpha+\beta)^{3}T^{\prime}_{x}}$ for all $x\leq\frac{n}{m}$. For any fixed $\alpha,\beta$ and large enough $m$, specifically for $m^{1/3}\geq\frac{\alpha^{1/3}}{(\alpha+\beta)^{4/3}}(\alpha+e^{C})$, where $C$ is a function of $\alpha,\beta$ as defined in Assumption 2, we have that $t^{det}_{n/m}\leq\frac{n}{e^{-p_{max}}\alpha(m-n)}=\frac{e^{p_{max}}n}{\alpha(m-n)}\leq\frac{(1+\sqrt{2})e}{\alpha+\beta}$. The last inequality follows from simple algebraic calculations using the assumption on $m$ and Assumption 2. Therefore, $T-t^{det}_{x}\geq\frac{(1+\sqrt{2})e}{\alpha+\beta}$ for all $x\leq\frac{n}{m}$, which satisfies the first condition. Furthermore, for $m^{1/3}\geq 8(1+\sqrt{2})\left(\frac{\alpha+\beta}{\alpha}\right)^{2/3}$, $x\leq n/m=\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}m^{-1/3}\leq\frac{\alpha^{2}}{8(1+\sqrt{2})(\alpha+\beta)^{2}}=\frac{\alpha^{2}e}{4(\alpha+\beta)^{3}T}\leq\frac{\alpha^{2}e}{4(\alpha+\beta)^{3}T^{\prime}_{x}}$, which satisfies the second condition.

Finally, to apply Corollary E.1 we need the condition $n^{3/2}\leq\frac{\alpha m}{\epsilon}$. Indeed, with our choices, $n^{3/2}\leq\left(\left(\frac{\alpha}{\alpha+\beta}\right)^{4/3}m^{2/3}\right)^{3/2}=\frac{\alpha^{2}m}{(\alpha+\beta)^{2}}=\frac{\alpha m}{\epsilon}$. ∎

Appendix F Auxiliary Results

Lemma F.1.

Let $\bar{p}$ be a constant upper bound on the prices that the seller can offer: $p_{t}\leq\bar{p}$. Then there exist constants $C_{0},C_{1}$, independent of $m$, such that for $T\geq C_{0}\log(m)+C_{1}$, there is a trivial policy that achieves $O\left(\log(\frac{1}{\delta})\right)$ regret with probability at least $1-\delta\log_{2}(m)$.

Proof.

Consider the trivial pricing policy where the seller sets the price to the highest possible value $\bar{p}$ for the entire planning horizon. The arrival rate for customer $d$ is then $\lambda(\bar{p},\frac{d-1}{m})=e^{-\bar{p}}(\alpha+\beta\frac{d-1}{m})(m-d+1)\geq e^{-\bar{p}}\alpha(m-d+1)$.

We now divide the market of mm customers into a sequence of segments with geometrically decreasing lengths. Let m0=0,m1=m/2,m2=m1+m/4,,mi=mi1+m/2im_{0}=0,m_{1}=\lfloor m/2\rfloor,m_{2}=m_{1}+\lfloor m/4\rfloor,\ldots,m_{i}=m_{i-1}+\lfloor m/2^{i}\rfloor. Let K=log2(m)log2log(1δ)2K=\lfloor\log_{2}(m)-\log_{2}\log(\frac{1}{\delta})-2\rfloor such that mK=mO(log(1δ))m_{K}=m-O(\log(\frac{1}{\delta})). We call customers dd for mi1<dmim_{i-1}<d\leq m_{i} the customers in segment ii.

Since the price is constant, we use the shorthand notation $\lambda_{d}\coloneqq\lambda(\bar{p},\frac{d-1}{m})$. Note that by construction, $m-m_{i}\geq m/2^{i}$. So for $m_{i-1}<d\leq m_{i}$, $\lambda_{d}\geq\underline{\lambda}_{i}\coloneqq e^{-\bar{p}}\alpha\frac{m}{2^{i}}$.

Let IdI_{d} be the stochastic inter-arrival time between customer d1d-1 and dd, then using Lemma A.1, we can obtain the following bound on the time it takes for all customers in segment ii to arrive:

\displaystyle\mathbb{P}\left(\sum\limits_{d=m_{i-1}+1}^{m_{i}}\left(I_{d}-\frac{1}{\lambda_{d}}\right)\geq\epsilon_{i}\right)\leq\exp\left(\frac{-\epsilon_{i}^{2}\underline{\lambda}_{i}^{2}}{8(m_{i}-m_{i-1})}\right)

Setting the right-hand side to $\delta$ and solving for $\epsilon_{i}$, one can verify that with probability $1-\delta$,

\sum\limits_{d=m_{i-1}+1}^{m_{i}}\left(I_{d}-\frac{1}{\lambda_{d}}\right)\leq\epsilon_{i}\coloneqq\frac{4e^{\bar{p}}}{\alpha}\sqrt{\log(\frac{1}{\delta})\frac{2^{i}}{m}}

Note that we needed ϵi2(mimi1)λ¯i\epsilon_{i}\leq\frac{2(m_{i}-m_{i-1})}{\underline{\lambda}_{i}} to apply Lemma A.1. This condition is satisfied for i=1,,Ki=1,\ldots,K because:

\displaystyle\epsilon_{i}\leq\frac{2(m_{i}-m_{i-1})}{\underline{\lambda}_{i}}\iff\sqrt{\log(\frac{1}{\delta})\frac{2^{i}}{m}}\leq\frac{1}{2}\iff i\leq 2\log_{2}\left(\frac{1}{2}\sqrt{\frac{m}{\log(1/\delta)}}\right)=\log_{2}(m)-\log_{2}\log(\frac{1}{\delta})-2

Applying a union bound over all segments, we have that with probability at least $1-\delta\log_{2}(m)$:

d=1mKId\displaystyle\sum\limits_{d=1}^{m_{K}}I_{d}\leq d=1mK1λd+i=1Kϵi\displaystyle\sum\limits_{d=1}^{m_{K}}\frac{1}{\lambda_{d}}+\sum_{i=1}^{K}\epsilon_{i}
\displaystyle\leq i=1Kmimi1λ¯i+i=1K4ep¯αlog(1δ)2im\displaystyle\sum_{i=1}^{K}\frac{m_{i}-m_{i-1}}{\underline{\lambda}_{i}}+\sum_{i=1}^{K}\frac{4e^{\bar{p}}}{\alpha}\sqrt{\log(\frac{1}{\delta})\frac{2^{i}}{m}}
=\displaystyle= 2Kep¯α+4ep¯αlog(1δ)1m2(2K/21)21\displaystyle 2K\frac{e^{\bar{p}}}{\alpha}+\frac{4e^{\bar{p}}}{\alpha}\sqrt{\log(\frac{1}{\delta})\frac{1}{m}}\frac{\sqrt{2}(2^{K/2}-1)}{\sqrt{2}-1}
\displaystyle\leq ep¯α(2log2(m)+4221log(1δ))\displaystyle\frac{e^{\bar{p}}}{\alpha}\left(2\log_{2}(m)+\frac{4\sqrt{2}}{\sqrt{2}-1}\sqrt{\log(\frac{1}{\delta})}\right)
=\displaystyle= C0log(m)+C1\displaystyle C_{0}\log(m)+C_{1}

This means that if TC0log(m)+C1T\geq C_{0}\log(m)+C_{1}, then with probability 1δlog2(m)1-\delta\log_{2}(m), the realized revenue is at least mKp¯m_{K}\bar{p}. Since the maximum possible revenue is mp¯m\bar{p}, the regret is at most p¯(mmK)=O(log(1δ))\bar{p}(m-m_{K})=O(\log(\frac{1}{\delta})). ∎
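To illustrate the lemma (purely as a sanity check, with hypothetical values for $\alpha$, $\beta$, $\bar{p}$, $m$), the following minimal simulation draws the exponential inter-arrival times under the fixed price $\bar{p}$ and confirms that all but the last few customers arrive within time that grows only logarithmically in $m$:

```python
import numpy as np

# Simulation sketch of the fixed-price policy in Lemma F.1 (hypothetical parameters).
rng = np.random.default_rng(0)
alpha, beta, pbar, m = 0.5, 1.0, 2.0, 100_000

d = np.arange(1, m + 1)
# Arrival rate for customer d: lambda_d = e^{-pbar} (alpha + beta (d-1)/m)(m - d + 1).
lam = np.exp(-pbar) * (alpha + beta * (d - 1) / m) * (m - d + 1)
arrival_times = np.cumsum(rng.exponential(1.0 / lam))  # tau_d as a sum of exponentials

# The first m - k customers arrive quickly; only the last few take long.
for k in (m // 2, 100, 10):
    print(f"time until {m - k} adoptions: {arrival_times[m - k - 1]:8.2f}")
```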