Social Learning with Bounded Rationality:
Negative Reviews Persist under Newest First

Jackie Baek Stern School of Business, New York University, [email protected]    Atanas Dinev Massachusetts Institute of Technology, [email protected]    Thodoris Lykouris Massachusetts Institute of Technology, [email protected]
(First version: June 2024
Current version: August 2024. An extended abstract appeared at the ACM Conference on Economics and Computation (EC) 2024.)
Abstract

We study a model of social learning from reviews where customers are computationally limited and make purchases based on reading only the first few reviews displayed by the platform. Under this bounded rationality, we establish that the review ordering policy can have a significant impact. In particular, the popular Newest First ordering induces a negative review to persist as the most recent review longer than a positive review. This phenomenon, which we term the Cost of Newest First, can make the long-term revenue unboundedly lower than a counterpart where reviews are exogenously drawn for each customer.

We show that the impact of the Cost of Newest First can be mitigated under dynamic pricing, which allows the price to depend on the set of displayed reviews. Under the optimal dynamic pricing policy, the revenue loss is at most a factor of 2. On the way, we identify a structural property for this optimal dynamic pricing: the prices should ensure that the probability of a purchase is always the same, regardless of the state of reviews. We also study an extension of the model where customers put more weight on more recent reviews (and discount older reviews based on their time of posting), and we show that Newest First is still not the optimal ordering policy if customers discount slowly.

Lastly, we corroborate our theoretical findings using a real-world review dataset. We find that the average rating of the first page of reviews is statistically significantly smaller than the overall average rating, which is in line with our theoretical results.

1 Introduction

The use of product reviews to inform customer purchase decisions has become ubiquitous in a variety of online platforms, ranging from electronic commerce to accommodation and recommendation platforms. While the online nature of such platforms may hinder the ability of customers to confidently evaluate the product compared to an in-person experience, reviews written by previous customers can shed light on the product’s quality. It is well established that product reviews play a significant role in customer purchase decisions [CM06, ZZ10, Luc16].

The process in which reviews impact product purchases can be seen as a problem of social learning, which generically studies how agents update their beliefs for an unknown quantity of interest (e.g., product quality) based on observing actions of past agents (e.g., reading reviews written by past customers). The typical assumption in the literature of social learning with reviews is that, when deciding whether to purchase a product, customers consider either all reviews provided by previous customers [CIMS17, IMSZ19, GHKV23] or a summary statistic such as their average rating [BS18, CLT21, AMMO22]. The motivation for the latter assumption is that customers have limited time and computational power and thus rely on a summary statistic, often provided by the platform (see Section 1.2 for a further discussion on these lines of work).

However, in practice, a common scenario may be somewhere “in between” the above two assumptions: customers read a small number of reviews in detail. Existing works have found that the textual content of a review contains important information that goes beyond its numeric score and that such information can heavily influence purchase decisions [GI10, AGI11, LDRF+13, LLS19, LYMZ22]. Customers therefore look beyond the average review rating; in particular, [Kav21] finds that 76% of customers read between 1 and 9 reviews before making a purchase. This motivates the main questions of our work:

When customers read a limited number of reviews, how does this impact social learning?
Are there operational decisions that should be reconsidered due to this bounded rationality?

To answer these questions, we study a model for a single product (formalized in Section 2), where a platform makes decisions regarding how reviews are ordered and how the product is priced. Customers arrive sequentially and each customer takes only the first $c$ reviews into account to inform their purchase decision, where $c$ is a small constant. We assume that the $t$-th customer’s valuation can be decomposed as the sum of a) an idiosyncratic valuation $\Theta_{t}$ that is known to them and b) a product quality $\mu_{t}$ that has a fixed mean $\mu$; the latter quantity is unknown to the customer and can only be inferred via the reviews. We assume that when a customer reads a review written by customer $s$, they observe $\mu_{s}$ (see Section 1.2 for a discussion of this assumption). Each customer uses $c$ reviews to update their belief about $\mu$, and makes a purchase if their estimate of their valuation is higher than the price. In the event of a purchase, they leave a review that future customers can read.

1.1 Our contribution

A popular review ordering policy is to display reviews in reverse chronological order (newest to oldest); we refer to this policy as $\sigma^{\textsc{newest}}$. This is the default option in platforms such as Airbnb, Tripadvisor and Macy’s, as it allows customers to get access to the most up-to-date reviews. [Footnote 2: This statement is based on access on Feb 7, 2024. Many other platforms such as Amazon and Yelp list newest as the second default and have their own ordering mechanism as the default option.] In the context of our model, under the $\sigma^{\textsc{newest}}$ ordering, a customer considers the $c$ most recent reviews. The set of these $c$ reviews evolves as a stochastic process over customer arrivals: when a new purchase happens and thus a new review is provided, this review replaces the $c$-th most recent review.

Cost of Newest First.

By analyzing the steady state of the aforementioned stochastic process, we observe that the $\sigma^{\textsc{newest}}$ ordering policy induces an undesirable behavior where negative reviews are read more than positive reviews, leading to a significant loss in overall revenue (Section 3).

To illustrate this phenomenon, consider a simple setting where customers only read the first review ($c=1$) and the probability of a purchase is higher when the review is positive. When the $t$-th customer arrives, if the first review is positive, this customer is likely to buy the product and subsequently leave a review; the new review from the $t$-th customer then becomes the “first review” for the $(t+1)$-th customer. On the other hand, if the first review is negative for the $t$-th customer, they are less likely to buy the product, and hence the same negative review remains as the “first review” for the $(t+1)$-th customer. Therefore, negative reviews persist under the newest first ordering: a review will stay longer as the first review if it is negative compared to if it is positive.
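The persistence effect for $c=1$ can be reproduced with a short simulation. The sketch below is illustrative only: the function name, the purchase probabilities `q_pos` and `q_neg`, and the parameter values are assumptions chosen for the example, not quantities taken from the paper.

```python
import random


def fraction_negative_on_top(mu, q_pos, q_neg, rounds, seed=0):
    """Simulate newest-first with c = 1 and return the fraction of rounds
    in which the displayed (most recent) review is negative.

    mu    : probability that a newly written review is positive
    q_pos : purchase probability when the displayed review is positive (assumed)
    q_neg : purchase probability when the displayed review is negative (assumed)
    """
    rng = random.Random(seed)
    top = 1 if rng.random() < mu else 0  # initial review drawn from Bernoulli(mu)
    negative_rounds = 0
    for _ in range(rounds):
        if top == 0:
            negative_rounds += 1
        buy_prob = q_pos if top == 1 else q_neg
        if rng.random() < buy_prob:
            # A purchase produces a fresh review that replaces the displayed one.
            top = 1 if rng.random() < mu else 0
        # Otherwise no review is written and the same review stays on top.
    return negative_rounds / rounds


if __name__ == "__main__":
    frac = fraction_negative_on_top(mu=0.5, q_pos=0.8, q_neg=0.2, rounds=200_000)
    print(f"share of rounds with a negative review on top: {frac:.3f}")
```

With these assumed values only half of the written reviews are negative, yet the balance equations of the two-state chain give a long-run share of 0.8 for the negative review being on top; the simulated share should be close to that value.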

We show that this arises due to the endogeneity of the stochastic process that $\sigma^{\textsc{newest}}$ induces and results in a loss of long-term revenue. To formalize this notion, we compare $\sigma^{\textsc{newest}}$ to an exogenous process where each arriving customer sees an independently drawn random set of reviews; we refer to this ordering policy as $\sigma^{\textsc{random}}$. We establish that the long-term revenue under $\sigma^{\textsc{newest}}$ is strictly smaller than that of $\sigma^{\textsc{random}}$ under any non-degenerate instance (Theorem 3.1) and that the revenue under $\sigma^{\textsc{newest}}$ can be arbitrarily smaller, in a multiplicative sense, compared to $\sigma^{\textsc{random}}$ (Theorem 3.2). We refer to this phenomenon as the Cost of Newest First (CoNF).

Dynamic pricing mitigates CoNF.

Seeking to mitigate this phenomenon, we consider the impact of optimizing the product’s price (Section 4). We show that even under the optimal static price, the CoNF remains arbitrarily large (Theorem 4.1). However, if we allow for dynamic pricing, where the price can depend on the state of the reviews, we show that the CoNF is upper bounded by a factor of $2$ when the idiosyncratic valuation distribution is non-negative, and that the bound degrades gracefully with the negative mass of this distribution otherwise (Theorem 4.2).

This improvement stems from the fact that dynamic pricing allows us to change the steady state distribution of the stochastic process. Recall that, under $\sigma^{\textsc{newest}}$, the stochastic process spends more time on states with negative reviews than states with positive reviews. The optimal dynamic pricing policy sets prices so that the purchase probability is equal across all states (Theorem 4.4); this ensures that the steady state distribution under $\sigma^{\textsc{newest}}$ is the same as that of $\sigma^{\textsc{random}}$.
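The link between equal purchase probabilities and the steady state can be checked numerically. The following is a minimal sketch, assuming the newest-first chain described above (no purchase leaves the displayed reviews unchanged; a purchase shifts in a fresh $\mathrm{Bernoulli}(\mu)$ review). The function names and the `buy_prob` argument, which stands in for the state-dependent purchase probability, are hypothetical, and the stationary distribution is obtained by plain power iteration rather than by any method from the paper.

```python
import itertools


def newest_first_stationary(mu, c, buy_prob, tol=1e-13, max_iter=100_000):
    """Stationary distribution over the c most recent reviews under newest-first.

    buy_prob maps the number of positive displayed reviews n (0..c) to a
    purchase probability in (0, 1]. States are tuples with the newest review first.
    """
    states = list(itertools.product([0, 1], repeat=c))
    # transitions[s] lists (next_state, probability) pairs.
    transitions = {}
    for s in states:
        q = buy_prob(sum(s))
        transitions[s] = [
            (s, 1.0 - q),                    # no purchase: state unchanged
            ((1,) + s[:-1], q * mu),         # purchase, positive review enters
            ((0,) + s[:-1], q * (1.0 - mu)), # purchase, negative review enters
        ]
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iter):
        nxt = {s: 0.0 for s in states}
        for s, mass in pi.items():
            for target, prob in transitions[s]:
                nxt[target] += mass * prob
        if max(abs(nxt[s] - pi[s]) for s in states) < tol:
            return nxt
        pi = nxt
    return pi


def iid_distribution(mu, c):
    """Distribution of c independent Bernoulli(mu) reviews (the random ordering)."""
    dist = {}
    for s in itertools.product([0, 1], repeat=c):
        prob = 1.0
        for r in s:
            prob *= mu if r == 1 else 1.0 - mu
        dist[s] = prob
    return dist
```

With a constant `buy_prob`, the computed stationary distribution coincides with the i.i.d. product distribution; with a `buy_prob` that increases in `n`, the all-negative state receives more mass than $(1-\mu)^c$.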

A broader implication of this result is that, when purchase decisions depend on the state of the first reviews, platforms that offer state-dependent prices can be arbitrarily better off than platforms that are unaware of this phenomenon and statically optimize prices (Theorem 4.6).

Time-discounting customers.

Having identified the potential inefficiency of the $\sigma^{\textsc{newest}}$ ordering policy, we extend our model to incorporate the main reason behind its popularity: customers often prefer to read more recent reviews as they contain more up-to-date content. A recent survey [Mur19] shows that 48% of the participants only look at reviews within the last two weeks and over 85% of the participants disregard any review that was not posted in the past three months.

To capture this recency-awareness, we extend our model to allow for customers to place more weight on more recent reviews when they update their beliefs (Section 5). Despite the presence of CoNF, when customers severely time-discount, we show that $\sigma^{\textsc{newest}}$ yields the highest revenue (Theorem 5.1); in the other extreme where customers do not discount at all, $\sigma^{\textsc{random}}$ maximizes revenue (Theorem 5.2). Interestingly, when customers discount slightly, neither $\sigma^{\textsc{newest}}$ nor $\sigma^{\textsc{random}}$ maximizes revenue; it is better to consider a finite window of the most recent $w$ reviews, and select $c$ reviews at random from this set (Theorem 5.3). Finally, we show that the CoNF interacts with the discount factor in a non-trivial way. For a random set of reviews, if higher weights imply a higher purchase probability, then one would expect that non-discounting customers purchase more than discounting customers. However, due to the CoNF, we show that there are cases in which discounting customers yield a higher purchase rate than non-discounting customers (Theorem 5.4).

Empirical evidence from Tripadvisor data.

Finally in Section 6, we corroborate our theoretical findings using real-world review data from Tripadvisor, an online platform where the default review ordering policy is newest first. We evaluate 109 hotel pages, and we find that for 79 of them, the average review rating of the first 10 reviews is lower than the average review rating of all reviews. These empirical findings support our main theoretical results with statistical significance.
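As a rough sanity check on the 79-out-of-109 count, one can compute an exact one-sided sign test under the null hypothesis that a hotel's first-page average is equally likely to fall above or below its overall average. This is only a sketch: the choice of test, the treatment of every hotel as an independent trial without ties, and the function name are assumptions made here, and the paper's own statistical analysis in Section 6 may differ.

```python
from math import comb


def sign_test_p_value(successes, trials):
    """One-sided exact binomial p-value P[X >= successes] for X ~ Bin(trials, 1/2)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


if __name__ == "__main__":
    # 79 of the 109 hotel pages have a lower first-page average (from the text).
    print(sign_test_p_value(79, 109))
```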

1.2 Related Work and Comparison of Key Modeling Assumptions

Social learning and incentivized exploration.

Classical models of social learning from [Ban92] and [BHW92] study a setting in which there is an unknown state of the world and each agent observes an independent, noisy signal about the state as well as the actions of past agents. The agent uses this information to update their beliefs and then takes an action. In this setting, undesirable “herding” behavior can arise: agents may converge to taking the wrong action. Conceptually closer to our work, [Say18] shows that dynamic pricing can mitigate the aforementioned herding behavior. Subsequent works study how social learning is affected by the agent’s signal distribution [SS00], prior for the state [CDO22], heterogeneous preferences [GPR06, LS16], as well as the structure of their observations [AMMO22]. From a different perspective, there is a stream of literature that aims to design mechanisms to help the learning process, either by modifying the information structure [KMP14, MSS20, BPS18] or by incentivizing exploration through payments [FKKK14, KKM+17].

Social learning with reviews.

Closer to our work, several papers focus on the setting where customers learn about a product’s quality through reviews [HPZ17, CIMS17, BS18, SR18, IMSZ19, CLT21, AMMO22, GHKV23, Bon23, CSS24]. This literature induces several modeling differences compared to classical social learning. First, agents do not receive independent signals of the unknown state (the product quality). Second, agents not only observe the binary purchase decision of previous agents, but also the reviews of previous agents who purchased the product. We highlight the key modeling assumptions of our work and how they relate to existing works.

No self-selection bias.

In prior works on social learning with reviews, the main difficulty stems from the self-selection bias, the idea that only customers who value the product highly will buy the product and hence these customers leave reviews with higher ratings. In the presence of self-selection bias, [CIMS17] and [IMSZ19] study conditions under which customers eventually learn the quality of a product, where customers update their beliefs based on the entire history of reviews. [AMMO22, BS18, SR18, CLT21] consider models in which customers only incorporate summary statistics of prior reviews (e.g., average rating) into their beliefs. [Bon23] analyzes how the magnitude of the self-selection bias depends on the product’s quality and polarization. [CSS24] consider a model where the platform’s pricing decisions affect the review ratings and characterize the impact of the price on the average rating. This is also empirically supported by [BMZ12], which shows that Groupon discounts lead to lower ratings. [HPZ17] study a two-stage model which quantifies both self-selection bias and under-reporting bias (reviews are provided only by customers with extreme experiences); see references within for further related work.

In contrast, our work studies a model where self-selection bias does not arise. Specifically, we assume that customer $t$’s valuation can be decomposed as $V_{t}=\Theta_{t}+\mu_{t}$, where $\Theta_{t}$ is customer-specific, and $\mu_{t}$ has a fixed mean $\mu$ shared across customers. The quantity $\mu$ is the unknown quantity of interest for all customers. Our model assumes that a review reveals $\mu_{t}$. In contrast, prior works assume that a review reveals $\Theta_{t}+\mu_{t}$ and one cannot separate the contribution from each term. This means that, in our model, the customer-specific valuation and the pricing decision affect the purchase probability but do not affect the review itself conditioning on a purchase. Although our assumption makes it “easier” for customers to learn $\mu$, we study a new phenomenon that arises due to the fact that customers only read a small number of reviews.

The practical motivation for our modeling assumption is the following. On most online platforms, a review is composed of both a numeric score (e.g., 4 out of 5) and a textual description that further explains the reviewer’s thoughts. Within our model, one interpretation is that the numeric score reveals $\Theta_{t}+\mu_{t}$, but one can use the textual content of the review to separate $\Theta_{t}$ from $\mu_{t}$. Therefore, we assume that reading the text of the review reveals $\mu_{t}$, but we also assume that each customer only reads a small number of reviews since reading the text takes time. Even though most platforms provide the average score of the numeric ratings, this will suffer from self-selection bias (as shown in [BS18, CLT21]) and hence customers must read reviews in detail to learn $\mu$.

[GHKV23] also study a setting with no self-selection bias (without bounded rationality), focusing on dynamic pricing. Their model assumes that customers are partitioned into a finite number of types and only read reviews written by customers of the same type. This overcomes self-selection bias as customers of the same type can be thought of as having the same value of $\Theta_{t}$ in our model.

Belief convergence vs. stochastic process.

In the existing literature, social learning is deemed “successful” if the customer’s estimate of product quality converges to the true quality. This convergence can either be that their belief distribution converges to a single point [IMSZ19, AMMO22], or that the customer’s scalar estimate of the product quality (e.g., average rating) converges to the true quality [CIMS17, BS18]. In our setting, customers update their beliefs based on the first $c$ reviews and, as discussed above, these $c$ reviews evolve across customers as a stochastic process. Therefore, customer beliefs do not converge but rather oscillate based on the state of those $c$ reviews, even as the number of customers goes to infinity.

Closer to our work, [PSX21] study a similar model (without bounded rationality) and show that the initial review can have an effect on the proclivity of customer purchases and the number of reviews. This bias introduced by initial reviews is also empirically observed by the work of [LKAK18]. Unlike our model, this effect diminishes over time as the product acquires more reviews and the initial review becomes less salient. Our results on the Cost of Newest First can thus be viewed as a stronger version of the result in [PSX21] as we show that, in the presence of bounded rationality, the effect of negative reviews persists even in the steady-state of the system.

Fully Bayesian vs. non-Bayesian.

Existing papers differ in whether customers incorporate information from reviews in a fully Bayesian or non-Bayesian manner. For example, [IMSZ19] and [AMMO22] study a fully Bayesian setting where all distributions (prior on $\mu$, distribution of $\Theta_{t}$) and purchasing behaviors are common knowledge and each customer forms a posterior belief on $\mu$ using the information given to them. In contrast, [CIMS17] and [CLT21] assume that customers use a simple non-Bayesian rule when making their purchase decision. Moreover, [BS18] study both Bayesian and non-Bayesian update rules and compare them.

In our paper, customers use a Bayesian framework but their update rule is not fully Bayesian. Specifically, customers start with a Beta prior on $\mu$. This prior need not be correct as we assume that $\mu$ is a fixed number. We assume reviews are binary (0 or 1), and customers read $c$ of them and update their beliefs assuming that these reviews are independent draws from $\mathrm{Bernoulli}(\mu)$. However, this is not necessarily the “correct” Bayesian update rule for the customers, due to the endogeneity of the stochastic process of the first $c$ reviews. That is, under the $\sigma^{\textsc{newest}}$ ordering rule, negative reviews are more likely to persist as the most recent review; hence, even if $\mu=1/2$, the most recent review is more likely to be negative than positive. Therefore, a fully Bayesian customer should take this phenomenon into account when updating their beliefs. We assume that customers do not account for this (and hence are not fully Bayesian), and we study how this endogeneity impacts the steady state of the process. Our model also implicitly assumes that customers do not use additional information about previous customers’ non-purchase decisions (which are typically non-observable) and the platform’s pricing policy (which is often opaque).

Finally, our model is flexible in that it allows customers to map their belief distribution to a scalar estimate of $\mu$ in an arbitrary manner, e.g., the mean of the belief distribution (which is considered in most prior work) as well as a pessimistic estimate thereof (as studied in [GHKV23]).

Other work on social learning with reviews.

[HC24] study the design of rating systems motivated by the idea that older reviews become less relevant. In a setting where the product’s quality changes, they show that a moving average rating system is optimal in reflecting the true quality. Social learning with reviews has also been studied for ranking [MSSV23], incorporating quality variability [DLT21], dealing with non-stationary environments [BPS22], and has been applied to green technology adoption [RHP23].

2 Model

We consider a platform that repeatedly offers a product to customers that arrive in consecutive rounds $t=1,2,\ldots$. The customer makes a purchase decision based on a finite number of reviews and the price; if a purchase occurs, they leave a new review for the product. We consider the platform’s decisions regarding the ordering of reviews as well as the price.

Customer valuation. The customer at round $t$ has a realized valuation $V_{t}=\mu_{t}+\Theta_{t}\in\mathbb{R}$ for the product, where $\mu_{t}$ and $\Theta_{t}$ represent the contribution from the product’s unobservable and observable parts respectively. Specifically, $\mu_{t}\in\{0,1\}$ is drawn independently from $\mathrm{Bernoulli}(\mu)$ for each customer $t$, where $\mu\in(0,1)$ is the same across all customers and is unknown to them. In contrast, the quantity $\Theta_{t}$ is customer-specific and is known to customer $t$ before they purchase. We assume that, at every round $t$, $\Theta_{t}$ is drawn independently from a distribution $\mathcal{F}$ with bounded support. The platform knows the distribution $\mathcal{F}$ but not $\Theta_{t}$.

If the customer at round $t$ knew their exact valuation $V_{t}$, then they would purchase the product if and only if $V_{t}\geq p_{t}$, where $p_{t}$ is the price of the product at time $t$. However, $\mu_{t}$ is unknown and hence so is $V_{t}$. We assume that the customers read reviews to learn about $\mu$, and their purchase decision depends on their belief about $\mu$ after reading the reviews. Note that customers cannot aim to estimate $\mu_{t}$ beyond estimating $\mu$, since $\mu_{t}$ is drawn independently for each customer.

Review generation. If the customer at round $t$ purchases the product, they write a review that future customers may read. The review rating given by customer $t$ is $\mu_{t}\in\{0,1\}$ (see Section 1.2 for a discussion of this assumption). We often refer to a review with $\mu_{t}=1$ as positive and to a review with $\mu_{t}=0$ as negative. We denote by $X_{t}\in\{0,1,\perp\}$ the rating of customer $t$’s review, where $X_{t}=\perp$ if customer $t$ did not purchase the product.

2.1 Customer Purchase Behavior

We describe the customer purchase behavior at one round, taking the price and the review ordering as fixed. Customers have a prior $\mathrm{Beta}(a,b)$ for the value of $\mu$, for some fixed $a,b>0$. This prior need not be correct and could be based on information about the features of the product or summary statistics of all reviews which are subject to self-selection bias (see discussion in Section 1.2). Customers read the first $c$ reviews that are shown to them to update their prior. Formally, letting $\bm{Z}_{t}=(Z_{t,1},\ldots,Z_{t,c})\in\{0,1\}^{c}$ denote the ratings of the first $c$ reviews shown, the customer creates the following posterior for the unobservable quality $\mu$:

\[\Phi_{t}\coloneqq\mathrm{Beta}\left(a+\sum_{i=1}^{c}Z_{t,i},\; b+c-\sum_{i=1}^{c}Z_{t,i}\right).\]

This corresponds to the natural posterior update for $\mu$ if each $Z_{t,i}$ is an independent draw from $\mathrm{Bernoulli}(\mu)$. [Footnote 3: The reviews $\bm{Z}_{t}$ are not necessarily independent draws from $\mathrm{Bernoulli}(\mu)$, hence the customers are not completely Bayesian. See the last point in Section 1.2 for a detailed discussion.] Note that the customer places equal weight on the first $c$ reviews (we also consider time-discounted weights in Section 5). Based on this posterior, the customer creates an estimate $\hat{V}_{t}$ for their valuation. We assume that there is a mapping $h(\Phi_{t})\in(0,1)$ from their posterior to a real number that represents an estimate of the fixed valuation $\mu$. For example, the mapping $h(\Phi_{t})={\mathbb{E}}[\Phi_{t}]$ represents risk-neutral customers, while if $h(\Phi_{t})$ corresponds to the $\phi$-quantile of $\Phi_{t}$ for $\phi<0.5$, this represents pessimistic customers (see Section 5.1). The customer then forms their estimated valuation $\hat{V}_{t}\coloneqq\Theta_{t}+h(\Phi_{t})$ and buys the product at price $p_{t}$ if and only if $\hat{V}_{t}\geq p_{t}$. Finally, the customer leaves a review $X_{t}\sim\mathrm{Bernoulli}(\mu)$ if they bought the product, otherwise $X_{t}=\perp$.
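The belief update and purchase rule can be written out in a few lines. The sketch below assumes risk-neutral customers, i.e. $h$ is the posterior mean; the function names and the default prior $a=b=1$ are illustrative choices, not values fixed by the paper.

```python
def posterior_params(a, b, shown):
    """Beta posterior parameters after reading the displayed 0/1 ratings."""
    n = sum(shown)  # number of positive reviews among those shown
    return a + n, b + len(shown) - n


def mean_estimate(alpha, beta):
    """Risk-neutral mapping h: the mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)


def buys(theta, shown, price, a=1.0, b=1.0, h=mean_estimate):
    """Purchase rule: buy iff theta + h(posterior) >= price."""
    alpha, beta = posterior_params(a, b, shown)
    return theta + h(alpha, beta) >= price


if __name__ == "__main__":
    # Prior Beta(1, 1), three displayed reviews with two positives:
    # the posterior is Beta(3, 2) and the mean estimate is 0.6.
    print(posterior_params(1.0, 1.0, [1, 0, 1]))
    print(buys(theta=0.2, shown=[1, 0, 1], price=0.7))
```

A pessimistic customer could be modeled by passing a different `h`, for instance a lower quantile of the Beta posterior, without changing the purchase rule itself.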

To ease exposition, we often use $n$ to denote the number of positive ratings (i.e., $n=\sum_{i=1}^{c}Z_{t,i}$), and we overload notation by writing $h(n)$ for $h(\mathrm{Beta}(a+n,b+c-n))$. We make the natural assumption that a higher number of positive ratings leads to a higher purchase probability.

Assumption 2.1.

The estimate $h(n)$ is strictly increasing in the number of positive reviews $n$.

We also assume that the idiosyncratic valuation has positive mass on non-negative values.

Assumption 2.2.

The distribution $\mathcal{F}$ has positive mass on non-negative values: ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]>0$.

We denote the above problem instance as $\mathcal{E}(\mu,\mathcal{F},a,b,c,h)$ for product quality $\mu$, idiosyncratic distribution $\mathcal{F}$, prior parameters $a,b$, customers’ attention budget $c$, and an estimate mapping $h$.

2.2 Platform Decisions

We consider two platform decisions, review ordering and pricing.

Review ordering. With respect to ordering, since customers only take the first $c$ reviews into account, choosing an ordering is equivalent to selecting a set of $c$ reviews to show. Let $I_{t}=\{\tau<t:X_{\tau}\neq\perp\}$ be the set of previous rounds in which a review was submitted and let $\mathcal{H}_{t}=\{I_{t},\{X_{\tau}\}_{\tau<t},\{p_{\tau}\}_{\tau<t},\{\bm{Z}_{\tau}\}_{\tau<t}\}$ be the observed history before round $t$. At round $t$, the platform maps (possibly in a randomized way) its observed history $\mathcal{H}_{t}$ to the set of review ratings $\sigma(\mathcal{H}_{t})$ corresponding to the $c$ reviews shown. We study the steady-state distribution of the system; to avoid initialization corner cases, we assume that at time $t=1$, there is an infinite pool of reviews $\{X_{\tau}\}_{\tau=-\infty}^{-1}$ where $X_{\tau}\stackrel{\text{i.i.d.}}{\sim}\mathrm{Bernoulli}(\mu)$. We consider the following review ordering policies:

  • $\sigma^{\textsc{newest}}$ selects the $c$ newest reviews. This is formally defined as $\sigma^{\textsc{newest}}(\mathcal{H}_{t})=(Z_{t,i})_{i=1}^{c}$ where $Z_{t,i}$ is the rating of the $i$-th most recent review.

  • $\sigma^{\textsc{random}(w)}$ selects $c$ reviews uniformly at random (without replacement) from the most recent $w\geq c$ reviews, independently at each round $t$. Note that $\sigma^{\textsc{random}(c)}=\sigma^{\textsc{newest}}$.

  • $\sigma^{\textsc{random}}$ shows $c$ random reviews. This corresponds to ratings being drawn independently from $\mathrm{Bernoulli}(\mu)$ at each round $t$; i.e., $\sigma^{\textsc{random}}(\mathcal{H}_{t})=(Z_{t,i})_{i=1}^{c}$ where $Z_{t,i}\stackrel{\text{i.i.d.}}{\sim}\mathrm{Bernoulli}(\mu)$.

Formally, $\sigma^{\textsc{random}}$ is defined as the limit of $\sigma^{\textsc{random}(w)}$ as $w\to\infty$ (see Appendix A for details). Note that, under $\sigma^{\textsc{random}}$, the customers’ and platform’s actions at the current round do not influence the reviews shown in future rounds (i.e., the reviews are exogenous). In contrast, under $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}(w)}$ for any $w\in\mathbb{N}$, the customers’ and platform’s actions influence what reviews are shown in future rounds (i.e., the reviews are endogenous to the underlying stochastic process).
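The three ordering policies can be sketched as small selection functions. This is an illustration under stated assumptions: `reviews` is a hypothetical list of 0/1 ratings from past purchases in chronological order (oldest first) that holds at least $w$ entries, and the function names are not from the paper.

```python
import random


def newest(reviews, c):
    """sigma^newest: the c most recent ratings, newest first."""
    return reviews[-c:][::-1]


def random_window(reviews, c, w, rng=random):
    """sigma^random(w): c ratings sampled without replacement from the w newest."""
    if w < c:
        raise ValueError("window size w must be at least c")
    return rng.sample(reviews[-w:], c)


def random_exogenous(mu, c, rng=random):
    """sigma^random: c fresh i.i.d. Bernoulli(mu) ratings, independent of history."""
    return [1 if rng.random() < mu else 0 for _ in range(c)]


if __name__ == "__main__":
    history = [0, 1, 1, 0, 1]  # oldest ... newest
    print(newest(history, 2))               # the two newest ratings
    print(random_window(history, 2, 4))     # two of the four newest ratings
    print(random_exogenous(0.5, 2))         # ignores the history entirely
```

Only `random_exogenous` ignores the history, which is what makes the displayed reviews exogenous; the other two read from the tail of `reviews`, so every purchase changes what later customers see.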

Pricing. The platform also decides on the pricing policy, where the price at each round can depend on the set of $c$ displayed reviews. We denote a pricing policy by a function $\rho:\{0,1\}^{c}\to\mathbb{R}$, which maps the set of displayed review ratings $\bm{z}\in\{0,1\}^{c}$ to a price. We study two classes of pricing policies: static and dynamic. We let $\Pi^{\textsc{static}}$ be the set of pricing policies $\rho$ that assign a fixed price $p$, i.e., $\rho(\bm{z})=p$ for any $\bm{z}$. Similarly, $\Pi^{\textsc{dynamic}}$ includes the set of pricing policies $\rho$ where $\rho(\bm{z})$ can depend on the review ratings $\bm{z}$.
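To make the two policy classes concrete, the sketch below represents a pricing policy as a function from the displayed ratings to a price. The `offset_policy` is one illustrative member of $\Pi^{\textsc{dynamic}}$, chosen here because it makes the purchase probability identical in every review state; it is not claimed to be the paper's optimal policy. The uniform distribution for $\Theta$, the posterior-mean estimate, and all names and parameter values are assumptions.

```python
def static_policy(price):
    """A member of Pi^static: the same price for every set of displayed ratings."""
    return lambda shown: price


def offset_policy(base_price, h):
    """A member of Pi^dynamic (illustrative choice): rho(z) = base_price + h(n)."""
    return lambda shown: base_price + h(sum(shown))


def purchase_probability(policy, shown, h, survival):
    """P[Theta + h(n) >= rho(z)], where survival(x) = P[Theta >= x]."""
    return survival(policy(shown) - h(sum(shown)))


def uniform_survival(x):
    """P[Theta >= x] for Theta ~ Uniform[0, 1] (an assumed distribution)."""
    return min(1.0, max(0.0, 1.0 - x))


def posterior_mean(n, a=1.0, b=1.0, c=2):
    """h(n) for a Beta(a, b) prior and c displayed reviews (assumed values)."""
    return (a + n) / (a + b + c)


if __name__ == "__main__":
    for name, policy in [("static", static_policy(0.9)),
                         ("offset", offset_policy(0.5, posterior_mean))]:
        probs = [purchase_probability(policy, z, posterior_mean, uniform_survival)
                 for z in [(0, 0), (1, 0), (1, 1)]]
        print(name, probs)
```

Under the static price the purchase probability rises with the number of positive reviews, whereas under the offset policy it equals $\mathbb{P}[\Theta\geq\texttt{base\_price}]$ in every state.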

2.3 Revenue and the Cost of Newest First

For an ordering policy $\sigma$ and pricing policy $\rho$, we define the revenue as the steady-state revenue [Footnote 4: The policies we consider have a stationary distribution so, in our analysis, we replace the $\liminf$ with a $\lim$.]:

\[\textsc{Rev}(\sigma,\rho)\coloneqq\liminf_{T\to\infty}{\mathbb{E}}\Big[\frac{\sum_{t=1}^{T}\rho(\bm{Z}_{t})\mathbbm{1}_{\Theta_{t}+h(\Phi_{t})\geq\rho(\bm{Z}_{t})}}{T}\Big]. \tag{1}\]

Our main focus lies in understanding the effect of the ordering policy on the revenue. Specifically, we compare the revenues of the ordering policies $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}}$. For a pricing policy $\rho$, we define the Cost of Newest First (CoNF) as follows [Footnote 5: For all the policies $\rho$ we consider, $\textsc{Rev}(\sigma^{\textsc{newest}},\rho)>0$. This is formalized for $\rho\in\Pi^{\textsc{static}}$ in Definition 3.1.]:

\[\chi(\rho)\coloneqq\frac{\textsc{Rev}(\sigma^{\textsc{random}},\rho)}{\textsc{Rev}(\sigma^{\textsc{newest}},\rho)}.\]

If $\rho\in\Pi^{\textsc{static}}$, i.e., $\rho(\bm{z})=p$ for any $\bm{z}\in\{0,1\}^{c}$, we use $\textsc{Rev}(\sigma,p)$ and $\chi(p)$ as shorthand for the corresponding steady-state revenue and CoNF.
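For $c=1$ and a static price, the ratio $\chi(p)$ can be computed by hand from a two-state chain, and the sketch below does exactly that. It is a worked illustration rather than the paper's derivation: `q_pos` and `q_neg` are assumed purchase probabilities when the displayed review is positive or negative (both positive, so the price is non-absorbing), and the price itself cancels in the ratio.

```python
def conf_c1(mu, q_pos, q_neg):
    """Cost of Newest First chi for c = 1 and a static price.

    mu    : probability that a newly written review is positive
    q_pos : purchase probability when the displayed review is positive (> 0)
    q_neg : purchase probability when the displayed review is negative (> 0)
    """
    # sigma^random: the displayed review is positive with probability mu.
    rate_random = mu * q_pos + (1.0 - mu) * q_neg
    # sigma^newest: two-state chain with balance equation
    #   pi_pos * q_pos * (1 - mu) = pi_neg * q_neg * mu.
    w_pos = mu / q_pos
    w_neg = (1.0 - mu) / q_neg
    pi_pos = w_pos / (w_pos + w_neg)
    rate_newest = pi_pos * q_pos + (1.0 - pi_pos) * q_neg
    # Revenue is price times purchase rate, so the price cancels in the ratio.
    return rate_random / rate_newest


if __name__ == "__main__":
    print(conf_c1(0.5, 0.8, 0.2))    # unequal purchase probabilities
    print(conf_c1(0.5, 0.5, 0.5))    # equal purchase probabilities
    print(conf_c1(0.5, 0.8, 0.001))  # nearly absorbing negative state
```

In this example the ratio equals 1 when the two purchase probabilities coincide and grows without bound as `q_neg` approaches zero, which is consistent with the statements that $\chi(p)>1$ and that $\chi(p)$ can be arbitrarily large.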

In Section 4, we study the CoNF when the platform can optimize its pricing policy over a class of policies. For a class of pricing policies $\Pi$, we define similarly the optimal revenue within-class with respect to an ordering policy $\sigma$ and the corresponding CoNF as:

\[\textsc{Rev}(\sigma,\Pi)\coloneqq\sup_{\rho\in\Pi}\textsc{Rev}(\sigma,\rho)\qquad\text{and}\qquad\chi(\Pi)\coloneqq\frac{\textsc{Rev}(\sigma^{\textsc{random}},\Pi)}{\textsc{Rev}(\sigma^{\textsc{newest}},\Pi)}\qquad\text{respectively}.\]

To ease exposition, we make the mild assumption that $0<\textsc{Rev}(\sigma,\Pi)<\infty$. [Footnote 6: For the policies we consider, $\textsc{Rev}(\sigma,\Pi)>0$ by Assumption 2.2 and the fact that $h(0)>0$; $\textsc{Rev}(\sigma,\Pi)<\infty$ is satisfied if the maximum revenue from a customer’s idiosyncratic valuation is finite, i.e., $\sup_{p}p\,{\mathbb{P}}_{\Theta\sim\mathcal{F}}(\Theta\geq p)<+\infty$.]

Finally, we note that, unlike classical revenue maximization works, our focus is not on identifying policies that maximize revenue but rather on comparing the performance of different ordering policies ($\sigma^{\textsc{newest}}$ vs. $\sigma^{\textsc{random}}$) and different pricing policies ($\Pi^{\textsc{static}}$ vs. $\Pi^{\textsc{dynamic}}$).

Remark 2.1.

Our assumption of Bernoulli reviews and Beta prior is made for ease of exposition. Our results extend to a more general model where reviews come from an arbitrary distribution with finite support (not only $\{0,1\}$) and the estimator $h$ arbitrarily maps reviews to an estimate for the fixed valuation $\mu$; see Appendix B for details.

3 Cost of Newest First with a Fixed Static Price

Throughout this section, we assume a static pricing policy where the price $p$ is fixed and given. We establish the main phenomenon of the Cost of Newest First by showing that $\chi(p)>1$ under very mild conditions on the price $p$. We then show that $\chi(p)$ can be arbitrarily large.

Recall that $h(n)$ refers to $h(\mathrm{Beta}(a+n,b+c-n))$, where $n\in\{0,\ldots,c\}$ refers to the number of positive reviews. We first introduce two natural conditions that the price should satisfy.

Definition 3.1 (Non-absorbing price).

A price $p$ is non-absorbing if the purchase probability is positive for any displayed review ratings; i.e., for all $n\in\{0,1,\dots,c\}$,

\[{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]>0.\]

A non-absorbing price is required to guarantee that $\sigma^{\textsc{newest}}$ does not get "stuck" in a zero-revenue state. Specifically, under an absorbing price, there exists a set of review ratings such that the purchase probability is zero. In such a state, there will be no new subsequent review, and the set of most recent reviews will never be updated, resulting in zero revenue for $\sigma^{\textsc{newest}}$. This is formalized in the following proposition (whose proof is provided in Appendix C.1).

Proposition 3.1.

Any absorbing price $p$ yields $0$ revenue under $\sigma^{\textsc{newest}}$, i.e., $\textsc{Rev}(\sigma^{\textsc{newest}},p)=0$.

Definition 3.2 (Non-degenerate price).

A price $p$ is non-degenerate if there exist $n_{1},n_{2}\in\{0,\ldots,c\}$, $n_{1}\neq n_{2}$, such that the purchase probability is different given $n_{1}$ and $n_{2}$ positive review ratings:

\[{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n_{1})\geq p]\neq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n_{2})\geq p].\]

A non-degenerate price implies that the review ratings "matter", since there exist distinct review ratings where the purchase probability differs. In contrast, under a degenerate price, the review ratings have no impact on the purchase probability, hence the review ordering policy has no impact on revenue under such prices. Note that there exist prices which are absorbing and non-degenerate and there also exist prices which are degenerate and non-absorbing.\footnote{E.g., let $\mathcal{F}=\mathcal{U}[0,1]$, $c=a=b=1$, and $h(n)=\frac{n+1}{3}$ (mean). Then the price $p=\frac{4}{3}$ is non-degenerate but absorbing, while the price $p=\frac{1}{3}$ is degenerate but non-absorbing.}

3.1 Existence of Cost of Newest First

Our main result is that the revenue under $\sigma^{\textsc{random}}$ is strictly higher than that of $\sigma^{\textsc{newest}}$; that is, $\chi(p)>1$. As a building block towards this result, we first provide simple and interpretable closed-form expressions for $\textsc{Rev}(\sigma^{\textsc{random}},p)$ and $\textsc{Rev}(\sigma^{\textsc{newest}},p)$ for a static price $p$, which are given in Propositions 3.2 and 3.3 respectively. Let $\mathrm{Bern}(\mu)$ denote the Bernoulli distribution with success probability $\mu$ and $\mathrm{Binom}(c,\mu)$ denote the Binomial distribution with $c$ i.i.d. $\mathrm{Bern}(\mu)$ trials.

Proposition 3.2 (Revenue of $\sigma^{\textsc{random}}$).

For any fixed price $p$,

\[\textsc{Rev}(\sigma^{\textsc{random}},p)=p\,{\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\Big[{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]\Big].\]
Proof.

By definition, $\sigma^{\textsc{random}}$ displays $c$ i.i.d. $\mathrm{Bern}(\mu)$ reviews at every round $t$. As a result, the number of positive reviews is distributed as $\mathrm{Binom}(c,\mu)$, yielding expected revenue, at every round $t$, equal to the right-hand side of the proposition. Given that this quantity does not depend on $t$, recalling Eq. (1), it equals the steady-state revenue. ∎

Unlike $\sigma^{\textsc{random}}$, which displays $c$ i.i.d. $\mathrm{Bern}(\mu)$ reviews at every round, the reviews displayed by $\sigma^{\textsc{newest}}$ are an endogenous function of the history. The proof of the next result is the technical crux of this section and is presented in Section 3.2.

Proposition 3.3 (Revenue of $\sigma^{\textsc{newest}}$).

For any non-absorbing fixed price $p$,

\[\textsc{Rev}(\sigma^{\textsc{newest}},p)=\frac{p}{{\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\Big[\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\Big]}.\]

Intuitively, when the newest reviews are positive, the customer is more likely to buy the product and leave a new review, which then updates the set of newest reviews. On the other hand, when the newest reviews are negative, the customer is less likely to buy, and hence the set of newest reviews is less likely to be updated. This implies that $\sigma^{\textsc{newest}}$ spends more time in a state with negative reviews (which yield lower revenue) compared to $\sigma^{\textsc{random}}$. This phenomenon is the driver of our main result and we refer to it as the Cost of Newest First (CoNF).

Theorem 3.1 (Cost of Newest First).

For any non-degenerate price $p>0$, the revenue of $\sigma^{\textsc{newest}}$ is strictly smaller than that of $\sigma^{\textsc{random}}$, i.e., $\textsc{Rev}(\sigma^{\textsc{random}},p)>\textsc{Rev}(\sigma^{\textsc{newest}},p)$.

Proof.

If $p$ is absorbing (Definition 3.1), by Proposition 3.1, $\textsc{Rev}(\sigma^{\textsc{newest}},p)=0$ as there is a state of reviews in which there will never be another purchase. However, $\textsc{Rev}(\sigma^{\textsc{random}},p)>0$ as $p$ is non-degenerate and hence there is a state of reviews with a positive purchase probability.

If $p$ is non-absorbing, we next show that the expression of Proposition 3.2 is higher than the one of Proposition 3.3. By Jensen's inequality, ${\mathbb{E}}[X]\geq\frac{1}{{\mathbb{E}}[1/X]}$ for any positive random variable $X$, and equality is achieved if and only if $X$ is a constant. Letting $X(N)\coloneqq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]$ be the purchase probability in a state with $N$ positive reviews (which is positive since $p$ is non-absorbing), we apply the inequality for $X=X(N)$:

\[{\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\big[{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]\big]\geq\frac{1}{{\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\Big[\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\Big]}.\]

Multiplying both sides by $p>0$, we obtain that $\textsc{Rev}(\sigma^{\textsc{random}},p)\geq\textsc{Rev}(\sigma^{\textsc{newest}},p)$. It remains to show that when $p$ is non-degenerate, the inequality is strict. Note that since $p$ is non-degenerate, we have that ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]<{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(c)\geq p]$ and thus ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]$ is not a constant random variable when $N\sim\mathrm{Binom}(c,\mu)$. Therefore, the inequality is strict. ∎
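The gap between the two closed forms can be checked numerically. The following is a minimal sketch, not part of the original analysis, that evaluates the expressions of Propositions 3.2 and 3.3. It assumes, purely for illustration, $\mathcal{F}=\mathcal{U}[0,1]$, a $\mathrm{Beta}(1,1)$ prior with $h$ the posterior mean, and the hypothetical values $c=3$, $\mu=0.6$, $p=1$; all function names are illustrative.

```python
from math import comb

def purchase_prob(p, n, c, a=1.0, b=1.0):
    """P[Theta + h(n) >= p], assuming Theta ~ U[0,1] and h(n) = Beta posterior mean."""
    h = (a + n) / (a + b + c)
    return min(1.0, max(0.0, 1.0 - (p - h)))

def binom_pmf(n, c, mu):
    return comb(c, n) * mu**n * (1 - mu) ** (c - n)

def rev_random(p, c, mu):
    # Proposition 3.2: p * E[X(N)] (arithmetic mean of purchase probabilities)
    return p * sum(binom_pmf(n, c, mu) * purchase_prob(p, n, c) for n in range(c + 1))

def rev_newest(p, c, mu):
    # Proposition 3.3: p / E[1 / X(N)] (harmonic mean), valid for non-absorbing p
    probs = [purchase_prob(p, n, c) for n in range(c + 1)]
    if min(probs) <= 0.0:
        return 0.0  # Proposition 3.1: an absorbing price yields zero revenue
    return p / sum(binom_pmf(n, c, mu) / q for n, q in enumerate(probs))

c, mu, p = 3, 0.6, 1.0          # hypothetical instance
r_rand, r_new = rev_random(p, c, mu), rev_newest(p, c, mu)
chi = r_rand / r_new            # Cost of Newest First for this price
print(r_rand, r_new, chi)
```

In this instance the price is non-degenerate and non-absorbing, so the ratio exceeds one; at a degenerate price such as $p=0.1$ the two revenues coincide.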

We describe a simple example that provides intuition on the CoNF established in Theorem 3.1, which is also illustrated in Figure 1.

Example 3.1.

Suppose $\mu=\frac{1}{2}$, $\mathcal{F}=\mathcal{U}[0,1]$, $c=a=b=1$, $h(\Phi_{t})=\mathbb{E}[\Phi_{t}]=\frac{N+1}{3}$, and $p=1$. Under $\sigma^{\textsc{random}}$ the probability that the review shown is positive is $0.5$ (see Figure 1(a)). Under $\sigma^{\textsc{newest}}$ the purchase probability is $\frac{2}{3}$ when the review is positive and $\frac{1}{3}$ when it is negative. Thus, under $\sigma^{\textsc{newest}}$, transitioning from a positive to a negative review is twice as likely as transitioning from a negative to a positive review (see Figure 1(b)). Hence, in steady state, a negative rating is twice as likely as a positive rating. This leads to lower revenue in steady state for $\sigma^{\textsc{newest}}$ compared to $\sigma^{\textsc{random}}$.

Figure 1: The state transitions in the instance of Example 3.1 under $\sigma^{\textsc{random}}$ and $\sigma^{\textsc{newest}}$. (a) $\sigma^{\textsc{random}}$ transitions: from either state, the next displayed review is positive with probability $1/2$ and negative with probability $1/2$. (b) $\sigma^{\textsc{newest}}$ transitions: positive to negative with probability $1/3$, negative to positive with probability $1/6$, positive to positive with probability $2/3$, and negative to negative with probability $5/6$.
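The numbers in Example 3.1 can be reproduced with exact rational arithmetic. The sketch below is illustrative only; it uses the parameters stated in the example and the two-state balance equation, and the variable names are our own choice rather than notation from the paper.

```python
from fractions import Fraction as Fr

mu, p = Fr(1, 2), Fr(1)
h = {0: Fr(1, 3), 1: Fr(2, 3)}                 # h(N) = (N + 1) / 3
# Theta ~ U[0,1]: P[Theta + h(n) >= p] = 1 - (p - h(n)) for these values
buy = {n: 1 - (p - h[n]) for n in (0, 1)}      # buy[1] = 2/3, buy[0] = 1/3

# A transition needs a purchase followed by a review of the opposite sign.
pos_to_neg = buy[1] * (1 - mu)                 # 1/3
neg_to_pos = buy[0] * mu                       # 1/6

# Two-state balance: pi_pos * pos_to_neg = pi_neg * neg_to_pos
pi_neg = pos_to_neg / (pos_to_neg + neg_to_pos)
pi_pos = 1 - pi_neg

rev_newest = p * (pi_pos * buy[1] + pi_neg * buy[0])
rev_random = p * (mu * buy[1] + (1 - mu) * buy[0])
chi = rev_random / rev_newest
print(pi_neg, rev_newest, rev_random, chi)     # 2/3 4/9 1/2 9/8
```

The negative state carries stationary mass $2/3$, and the steady-state revenue drops from $1/2$ under $\sigma^{\textsc{random}}$ to $4/9$ under $\sigma^{\textsc{newest}}$.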

3.2 Characterization of Revenue under Newest (Proof of Proposition 3.3)

We provide the proof of Proposition 3.3, which contains the main technical crux of this section. We first introduce some notation. Recalling that $Z_{t,i}$ denotes the rating of the $i$-th most recent review at round $t$, we refer to the $c$ most recent reviews by the vector $\bm{Z}_{t}=(Z_{t,1},\ldots,Z_{t,c})$. We note that $\bm{Z}_{t}$ is a time-homogeneous Markov chain with a finite state space $\{0,1\}^{c}$. Given that we assume an infinite pool of initial reviews, $\bm{Z}_{1}=(X_{-1},\ldots,X_{-c})$ where $X_{i}\sim\mathrm{Bern}(\mu)$ for $i\in\{-1,\ldots,-c\}$.

With respect to the transition dynamics of this Markov chain, for the state $\bm{Z}_{t}=(z_{1},\ldots,z_{c})\in\{0,1\}^{c}$, the purchase probability is ${\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(\sum_{i=1}^{c}z_{i})\geq p\big]$. If there is no purchase, no review is given and the state remains $\bm{Z}_{t+1}=(z_{1},\ldots,z_{c})$. If there is a purchase, $\bm{Z}_{t}$ transitions to the state $\bm{Z}_{t+1}=(1,z_{1},\ldots,z_{c-1})$ if the review is positive (with probability $\mu$) and to the state $\bm{Z}_{t+1}=(0,z_{1},\ldots,z_{c-1})$ if the review is negative (with probability $1-\mu$).

Because the price $p$ is non-absorbing, for every state of reviews $(z_{1},\ldots,z_{c})\in\{0,1\}^{c}$, the purchase probability is positive, and a new review is positive or negative with strictly positive probability each (since $\mu\in(0,1)$). Then, $\bm{Z}_{t}$ can reach every state from every other state with positive probability (i.e., it is a single-recurrence-class Markov chain with no transient states), and hence $\bm{Z}_{t}$ has a unique stationary distribution, which we denote by $\pi$. Our next lemma exactly characterizes the form of this stationary distribution.

Lemma 3.1.

The stationary distribution $\pi$ of $\bm{Z}_{t}$ under any non-absorbing price $p$ is

\[\pi_{(z_{1},\ldots,z_{c})}=\kappa\cdot\frac{\mu^{\sum_{i=1}^{c}z_{i}}(1-\mu)^{c-\sum_{i=1}^{c}z_{i}}}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big[\Theta+h(\sum_{i=1}^{c}z_{i})\geq p\Big]},\]

where $\kappa=1/{\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\Big[\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\Big]$ is a normalizing constant.

Proof sketch.

If the reviews were drawn i.i.d. at each round, the probability of state $(z_{1},\ldots,z_{c})$ would be exactly $\mu^{\sum_{i=1}^{c}z_{i}}(1-\mu)^{c-\sum_{i=1}^{c}z_{i}}$, which is the numerator. However, the set of newest reviews is only updated when there is a purchase, which occurs with probability ${\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(\sum_{i=1}^{c}z_{i})\geq p\big]$. Therefore, we multiply the numerator by $1/{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(\sum_{i=1}^{c}z_{i})\geq p\big]$, which is the expected number of rounds until there is a new review under state $(z_{1},\dots,z_{c})$; in fact, we show such a property holds for general Markov chains (Lemma C.3). A formal proof is provided in Appendix C.2. ∎
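As a sanity check of Lemma 3.1 (not a substitute for the formal proof), one can build the transition matrix described above on $\{0,1\}^{c}$ and verify that the claimed distribution satisfies $\pi P=\pi$. The sketch assumes, for illustration only, $\mathcal{F}=\mathcal{U}[0,1]$, a $\mathrm{Beta}(1,1)$ prior with posterior-mean $h$, and hypothetical values $c=3$, $\mu=0.6$, $p=1$.

```python
from itertools import product

def purchase_prob(p, n, c, a=1.0, b=1.0):
    """P[Theta + h(n) >= p], assuming Theta ~ U[0,1] and h(n) = Beta posterior mean."""
    h = (a + n) / (a + b + c)
    return min(1.0, max(0.0, 1.0 - (p - h)))

def transition_matrix(p, c, mu):
    states = list(product((0, 1), repeat=c))
    index = {z: i for i, z in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for z in states:
        q, i = purchase_prob(p, sum(z), c), index[z]
        P[i][i] += 1.0 - q                           # no purchase: state unchanged
        P[i][index[(1,) + z[:-1]]] += q * mu         # purchase, positive review
        P[i][index[(0,) + z[:-1]]] += q * (1 - mu)   # purchase, negative review
    return states, P

def lemma_stationary(p, c, mu, states):
    # Unnormalized weights: i.i.d. probability divided by purchase probability
    w = [mu ** sum(z) * (1 - mu) ** (c - sum(z)) / purchase_prob(p, sum(z), c)
         for z in states]
    total = sum(w)                                   # equals 1 / kappa
    return [x / total for x in w]

c, mu, p = 3, 0.6, 1.0                               # hypothetical instance
states, P = transition_matrix(p, c, mu)
pi = lemma_stationary(p, c, mu, states)
k = len(states)
pi_next = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
max_err = max(abs(x - y) for x, y in zip(pi, pi_next))
print(max_err)
```

The residual `max_err` should be at the level of floating-point rounding, reflecting that the product-form weights, reweighted by the expected waiting time $1/{\mathbb{P}}[\cdot]$, are invariant under the chain.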

Equipped with Lemma 3.1, we now prove Proposition 3.3.

Proof of Proposition 3.3.

Using Eq. (1), the revenue of $\sigma^{\textsc{newest}}$ can be written as

\begin{align*}
\textsc{Rev}(\sigma^{\textsc{newest}},p) &= p\liminf_{T\to\infty}\frac{{\mathbb{E}}\left[\sum_{t=1}^{T}{\mathbb{P}}_{\Theta\sim\mathcal{F}}\left[\Theta+h(\sum_{i=1}^{c}Z_{t,i})\geq p\right]\right]}{T}\\
&= p\sum_{(z_{1},\ldots,z_{c})\in\{0,1\}^{c}}\pi_{(z_{1},\ldots,z_{c})}{\mathbb{P}}_{\Theta\sim\mathcal{F}}\left[\Theta+h(\sum_{i=1}^{c}z_{i})\geq p\right],
\end{align*}

where the second step expresses the revenue under the stationary distribution via the Ergodic theorem. Expanding $\pi_{(z_{1},\ldots,z_{c})}$ based on Lemma 3.1, the ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\sum_{i=1}^{c}z_{i})\geq p]$ term cancels out and

\[\textsc{Rev}(\sigma^{\textsc{newest}},p)=p\cdot\kappa\cdot\Big(\sum_{(z_{1},\ldots,z_{c})\in\{0,1\}^{c}}\mu^{\sum_{i=1}^{c}z_{i}}(1-\mu)^{c-\sum_{i=1}^{c}z_{i}}\Big).\]

Note that the term in the parentheses equals 1, since it is the sum of the probabilities of all $2^{c}$ outcomes of $c$ i.i.d. $\mathrm{Bern}(\mu)$ draws. This yields $\textsc{Rev}(\sigma^{\textsc{newest}},p)=p\cdot\kappa$, which gives the expression in the proposition. ∎
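The closed form $p\cdot\kappa$ can also be compared against a direct simulation of the review dynamics under $\sigma^{\textsc{newest}}$. The following is a rough Monte Carlo sketch under illustrative assumptions ($\mathcal{F}=\mathcal{U}[0,1]$, $\mathrm{Beta}(1,1)$ prior with posterior-mean $h$, hypothetical $c=3$, $\mu=0.6$, $p=1$, a fixed seed); it draws the purchase event directly from its probability rather than sampling $\Theta_{t}$.

```python
import random
from math import comb

def purchase_prob(p, n, c, a=1.0, b=1.0):
    """P[Theta + h(n) >= p], assuming Theta ~ U[0,1] and h(n) = Beta posterior mean."""
    h = (a + n) / (a + b + c)
    return min(1.0, max(0.0, 1.0 - (p - h)))

def simulate_newest(p, c, mu, rounds, seed=0):
    rng = random.Random(seed)
    # Z_1: c i.i.d. Bern(mu) initial reviews, most recent first
    z = [1 if rng.random() < mu else 0 for _ in range(c)]
    revenue = 0.0
    for _ in range(rounds):
        if rng.random() < purchase_prob(p, sum(z), c):   # customer buys
            revenue += p
            new_review = 1 if rng.random() < mu else 0   # and leaves a review
            z = [new_review] + z[:-1]                    # newest review enters first
    return revenue / rounds

def closed_form_newest(p, c, mu):
    # Proposition 3.3: p * kappa with kappa = 1 / E[1 / X(N)]
    return p / sum(comb(c, n) * mu**n * (1 - mu) ** (c - n) / purchase_prob(p, n, c)
                   for n in range(c + 1))

c, mu, p = 3, 0.6, 1.0                                   # hypothetical instance
estimate = simulate_newest(p, c, mu, rounds=200_000)
exact = closed_form_newest(p, c, mu)
print(estimate, exact)
```

With a long enough horizon the time-averaged revenue should be close to the closed form, which is what the Ergodic theorem step in the proof asserts.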

3.3 Cost of Newest First can be arbitrarily bad

Theorem 3.1 implies that, for any non-degenerate and non-absorbing price $p>0$, the CoNF is strictly greater than $1$, i.e., $\chi(p)>1$.\footnote{For a non-absorbing price $p$, the denominator $\textsc{Rev}(\sigma^{\textsc{newest}},p)$ of $\chi(p)$ is strictly positive.} We now show that it can be arbitrarily large. We first provide a closed-form expression for $\chi(p)$ by dividing the expressions in Propositions 3.2 and 3.3 (see Appendix C.3 for proof details).

Lemma 3.2.

For any non-absorbing price $p\geq 0$, the CoNF is given by:

\[\chi(p)=\sum_{i,j\in\{0,\ldots,c\}}\mu^{i+j}(1-\mu)^{2c-i-j}\binom{c}{i}\binom{c}{j}\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(i)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(j)\geq p]}.\]
Theorem 3.2.

For any continuous value distribution $\mathcal{F}$ with positive mass on a bounded support, and any $M>0$, there exists a non-degenerate and non-absorbing price $p$ such that $\chi(p)>M$.

Proof.

One summand in the right-hand side of Lemma 3.2 has a term corresponding to the ratio of the purchase probability when all reviews are positive to the purchase probability when all reviews are negative:

\[\beta(p)\coloneqq\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(c)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]},\]

which quantifies how much the reviews affect the purchase probability. Since all other terms are non-negative, the CoNF is lower bounded by this summand (the one with $i=c$ and $j=0$), i.e., $\chi(p)\geq\mu^{c}(1-\mu)^{c}\beta(p)$.

Since $\mathcal{F}$ is bounded, suppose that its support is $[\underline{\theta},\overline{\theta}]$. When all reviews are negative, selecting a price of $h(0)+\overline{\theta}$ results in a purchase probability of $0$. Combined with the continuity of $\mathcal{F}$, this implies that, when $p\to h(0)+\overline{\theta}$, the purchase probability goes to $0$. If, on the other hand, all reviews were positive, then using $h(c)>h(0)$ and that $\mathcal{F}$ is continuous and has positive mass on its support, the purchase probability is positive; i.e., $\lim_{p\to h(0)+\overline{\theta}}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(c)\geq p]>0$. Therefore $\lim_{p\to h(0)+\overline{\theta}}\beta(p)=+\infty$, which implies that $\lim_{p\to h(0)+\overline{\theta}}\chi(p)=+\infty$ since $\chi(p)\geq\mu^{c}(1-\mu)^{c}\beta(p)$.

Lastly, any price $p\in(\underline{\theta}+h(0),\overline{\theta}+h(0))$ is non-absorbing and non-degenerate because for such prices, $0<{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]<{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(c)\geq p]$. Hence, for every $M>0$ there exists $\epsilon(M)>0$ such that $p=\overline{\theta}+h(0)-\epsilon(M)$ is non-absorbing, non-degenerate, and $\chi(p)>M$. ∎
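The blow-up of $\chi(p)$ as $p\to h(0)+\overline{\theta}$ can be observed numerically from the double sum of Lemma 3.2. The sketch below is illustrative and assumes $\mathcal{F}=\mathcal{U}[0,1]$ (so $\overline{\theta}=1$), a $\mathrm{Beta}(1,1)$ prior with posterior-mean $h$, and hypothetical values $c=2$, $\mu=0.5$. It also reports the lower bound $\mu^{c}(1-\mu)^{c}\beta(p)$ used in the proof and $\beta(p)$ itself.

```python
from math import comb

def purchase_prob(p, n, c, a=1.0, b=1.0):
    """P[Theta + h(n) >= p], assuming Theta ~ U[0,1] and h(n) = Beta posterior mean."""
    h = (a + n) / (a + b + c)
    return min(1.0, max(0.0, 1.0 - (p - h)))

def chi(p, c, mu):
    """Double sum of Lemma 3.2; only meaningful for a non-absorbing price p."""
    pmf = [comb(c, n) * mu**n * (1 - mu) ** (c - n) for n in range(c + 1)]
    x = [purchase_prob(p, n, c) for n in range(c + 1)]
    return sum(pmf[i] * pmf[j] * x[i] / x[j]
               for i in range(c + 1) for j in range(c + 1))

c, mu = 2, 0.5                      # hypothetical instance
p_limit = 1.0 / (2 + c) + 1.0       # h(0) + theta_bar, with a = b = 1, theta_bar = 1
rows = []
for eps in (1e-1, 1e-2, 1e-3):
    p = p_limit - eps
    beta = purchase_prob(p, c, c) / purchase_prob(p, 0, c)
    rows.append((eps, chi(p, c, mu), mu**c * (1 - mu) ** c * beta, beta))

for eps, value, lower, beta in rows:
    print(f"eps={eps:g}  chi={value:.3f}  lower={lower:.3f}  beta={beta:.1f}")
```

As `eps` shrinks, the all-negative state has vanishing purchase probability, so $\chi(p)$ grows without bound while staying between the lower bound from the proof and $\beta(p)$.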

Remark 3.1.

In Appendix C.4, we complement Theorem 3.2 by showing that, if $\mathcal{F}$ has monotone hazard rate, $\chi(p)$ is monotonically increasing in $p$. In Appendix C.5, we present an explicit expression when $c=1$ and $\mathcal{F}=\mathcal{U}[0,\overline{\theta}]$ is uniform on $[0,\overline{\theta}]$; in particular, $\chi(p)=\Theta(k)$ for $p=\overline{\theta}+h(0)-\Theta(\frac{1}{k})$.

Remark 3.2.

In Appendix C.6, we show that $\chi(p)\leq\beta(p)$. This suggests that, when $\beta(p)$ is small, the Cost of Newest First is also small. The former occurs when review ratings have small impact on purchases. For example, when $\mathcal{F}=\mathcal{U}[0,\overline{\theta}]$ and $\overline{\theta}\to\infty$, the idiosyncratic variability dominates the variability from estimating $\mu$ through reviews, yielding $\beta(p)\to 1$ and thus $\chi(p)\to 1$.

3.4 Generalizing the insight beyond revenue

We generalize our main insights beyond revenue loss by analyzing the distribution of the number of positive reviews among the $c$ reviews. We prove a structural result on this distribution (Proposition 3.4) and show that its expectation is smaller under $\sigma^{\textsc{newest}}$ compared to $\sigma^{\textsc{random}}$ (Theorem 3.3). These theoretical results are used as a basis for comparison in Section 6, where we analyze a real-world review dataset. We first strengthen the non-degeneracy condition (Definition 3.2) to hold for each pair of review ratings.

Definition 3.3 (Strongly non-degenerate price).

A price $p$ is strongly non-degenerate if the purchase probability is different for any two distinct numbers of positive review ratings, i.e., for all $n_{1},n_{2}\in\{0,1,\ldots,c\}$ with $n_{1}\neq n_{2}$:

\[{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n_{1})\geq p]\neq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n_{2})\geq p].\]

Lemma 3.1 implies that the stationary probability of seeing $n$ positive reviews under $\sigma^{\textsc{newest}}$ is

\[\pi_{n}^{\textsc{newest}}=\kappa\binom{c}{n}\frac{\mu^{n}(1-\mu)^{c-n}}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]}\qquad\text{where}\qquad\kappa=1/{\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\Big[\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\Big].\]

By Proposition 3.3, $\textsc{Rev}(\sigma^{\textsc{newest}},p)=\kappa\cdot p$, so $\kappa$ can be interpreted as the rate at which customers purchase under $\sigma^{\textsc{newest}}$. In contrast, the corresponding stationary distribution under $\sigma^{\textsc{random}}$ is

\[\pi_{n}^{\textsc{random}}=\binom{c}{n}\mu^{n}(1-\mu)^{c-n}.\]

To compare $\pi_{n}^{\textsc{newest}}$ and $\pi_{n}^{\textsc{random}}$, we show that there is a critical threshold $n^{\star}$ such that the steady-state probability of having $n$ positive reviews is higher under $\sigma^{\textsc{newest}}$ than under $\sigma^{\textsc{random}}$ if $n<n^{\star}$, and smaller if $n>n^{\star}$. This phenomenon is illustrated in Figure 2. Formally, $n^{\star}$ is the largest number of positive review ratings where the purchase probability is at most the average purchase rate $\kappa$, i.e., ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n^{\star})\geq p]\leq\kappa$.

Proposition 3.4.

For any strongly non-degenerate and non-absorbing price $p$, $\pi^{\textsc{newest}}_{n}<\pi^{\textsc{random}}_{n}$ if $n>n^{\star}$ and $\pi^{\textsc{newest}}_{n}>\pi^{\textsc{random}}_{n}$ if $n<n^{\star}$.

Proof.

Observe that $\pi^{\textsc{random}}_{n}=\pi^{\textsc{newest}}_{n}\cdot\frac{1}{\kappa}\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]$. As $p$ is strongly non-degenerate, the purchase probability ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]$ is strictly increasing in $n$. By the definition of $n^{\star}$, if $n<n^{\star}$, then ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]<\kappa$ and thus $\pi^{\textsc{newest}}_{n}>\pi^{\textsc{random}}_{n}$. The other case is analogous. ∎

Figure 2: Contrasting $\pi^{\textsc{random}}_{n}$ and $\pi^{\textsc{newest}}_{n}$. The former is the distribution of a sum of $c$ Bernoulli random variables with probability $\mu$, while the latter has higher probability for review ratings $n<n^{\star}$ and lower probability for $n>n^{\star}$. The solid vertical lines reflect the means of the distributions; the mean is higher under $\pi^{\textsc{random}}$. In Section 6 we show that review ratings from Tripadvisor data exhibit similar patterns.

Next, we use Proposition 3.4 to show that the mean of $\pi^{\textsc{newest}}_{n}$ is strictly smaller than that of $\pi^{\textsc{random}}_{n}$; the proof of this result is in Appendix C.7.

Theorem 3.3.

For any non-degenerate and non-absorbing price $p$, the average number of positive reviews under $\sigma^{\textsc{newest}}$ is smaller than under $\sigma^{\textsc{random}}$. Formally, ${\mathbb{E}}_{N\sim\pi^{\textsc{newest}}_{n}}[N]<{\mathbb{E}}_{N\sim\pi^{\textsc{random}}_{n}}[N]$.
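The threshold structure of Proposition 3.4 and the mean comparison of Theorem 3.3 can be illustrated on a small instance. The sketch below is not from the original analysis; it assumes $\mathcal{F}=\mathcal{U}[0,1]$, a $\mathrm{Beta}(1,1)$ prior with posterior-mean $h$, and hypothetical values $c=5$, $\mu=0.6$, $p=1$, for which the price is strongly non-degenerate and non-absorbing.

```python
from math import comb

def purchase_prob(p, n, c, a=1.0, b=1.0):
    """P[Theta + h(n) >= p], assuming Theta ~ U[0,1] and h(n) = Beta posterior mean."""
    h = (a + n) / (a + b + c)
    return min(1.0, max(0.0, 1.0 - (p - h)))

c, mu, p = 5, 0.6, 1.0                                   # hypothetical instance
x = [purchase_prob(p, n, c) for n in range(c + 1)]       # purchase prob. per state
pi_random = [comb(c, n) * mu**n * (1 - mu) ** (c - n) for n in range(c + 1)]

# kappa is the harmonic mean of the purchase probabilities under Binom(c, mu)
kappa = 1.0 / sum(w / q for w, q in zip(pi_random, x))
pi_newest = [kappa * w / q for w, q in zip(pi_random, x)]

# n_star: largest n whose purchase probability is at most kappa
n_star = max(n for n in range(c + 1) if x[n] <= kappa)

mean_random = sum(n * w for n, w in enumerate(pi_random))
mean_newest = sum(n * w for n, w in enumerate(pi_newest))
print(n_star, mean_random, mean_newest)
```

States with fewer than `n_star` positive reviews are over-represented under $\sigma^{\textsc{newest}}$ and states with more are under-represented, which pulls the mean number of positive reviews below $c\mu$.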

In Section 6, we track the distribution of review ratings for a set of hotels from Tripadvisor (which uses $\sigma^{\textsc{newest}}$ as its default ordering policy). The results from this data analysis support our theoretical findings (Proposition 3.4 and Theorem 3.3).

4 Dynamic Pricing Mitigates the Cost of Newest First

In this section, we allow the platform to optimize the pricing policy $\rho$, while the review ordering policy is either $\sigma^{\textsc{newest}}$ or $\sigma^{\textsc{random}}$. We assume that the platform knows the true underlying quality $\mu$.\footnote{We assume $\mu$ is fixed over time and the platform has access to enough reviews to estimate $\mu$ arbitrarily well.} Recall that $\Pi^{\textsc{static}}$ and $\Pi^{\textsc{dynamic}}$ are the classes of static and dynamic pricing policies respectively, and that the revenue and CoNF for a class $\Pi$ are defined respectively as

\[\textsc{Rev}(\sigma,\Pi)\coloneqq\sup_{\rho\in\Pi}\textsc{Rev}(\sigma,\rho)\qquad\text{and}\qquad\chi(\Pi)\coloneqq\frac{\textsc{Rev}(\sigma^{\textsc{random}},\Pi)}{\textsc{Rev}(\sigma^{\textsc{newest}},\Pi)}.\]

The main results of this section (Section 4.1) establish that the CoNF can be arbitrarily large for the optimal static pricing policy (Theorem 4.1) but that it is bounded by a small constant for the optimal dynamic pricing policy (Theorem 4.2). The main technical challenge of this section is in proving Theorem 4.2. To do this, we first characterize the optimal dynamic pricing policies under both $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}}$ and derive exact expressions for their long-term revenue (Section 4.2). In doing so, we derive a structural property of the optimal dynamic pricing policy under $\sigma^{\textsc{newest}}$: the prices ensure that the purchase probability is always equal regardless of the state of reviews.

4.1 Cost of Newest First under Optimal Static and Dynamic Pricing

We first establish that when optimizing over static prices, the CoNF can be arbitrarily large for any number of reviews $c$. Note that this is not implied by Theorem 3.2, since here we assume the platform always chooses the optimal static price for a given instance.

Theorem 4.1.

For any instance where the support of $\mathcal{F}$ is $[0,\overline{\theta}]$, it holds that $\chi(\Pi^{\textsc{static}})\geq\frac{\mu^{c}h(c)}{h(0)+\overline{\theta}}$.

This implies that $\chi(\Pi^{\textsc{static}})\to+\infty$ if $h(c)$ is held constant while $\overline{\theta}\to 0$ and $h(0)\to 0$. Intuitively, this means that when the variability in the customer's idiosyncratic valuation $\Theta_{t}$ is negligible compared to the variability in review-inferred quality estimates, $\sigma^{\textsc{newest}}$ spends a disproportionate amount of time in the state with no positive reviews, which leads to unbounded CoNF.
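To see the bound of Theorem 4.1 at work, one can search over static prices on a grid and compare the best revenues under the two orderings. The sketch below is illustrative and rests on assumptions that are not in the text: $\mathcal{F}=\mathcal{U}[0,\overline{\theta}]$ with $\overline{\theta}=0.01$, a $\mathrm{Beta}(0.01,0.01)$ prior with posterior-mean $h$ (so that $h(0)$ is small and $h(c)$ is close to one), $c=1$, $\mu=0.5$, and a uniform price grid augmented with $p=h(c)$.

```python
from math import comb

def make_instance(c, a, b, theta_bar):
    h = [(a + n) / (a + b + c) for n in range(c + 1)]   # posterior means h(n)
    def buy(p, n):
        # P[Theta + h(n) >= p], assuming Theta ~ U[0, theta_bar]
        return min(1.0, max(0.0, 1.0 - (p - h[n]) / theta_bar))
    return h, buy

def best_static(c, mu, h, buy, theta_bar, grid=2000):
    pmf = [comb(c, n) * mu**n * (1 - mu) ** (c - n) for n in range(c + 1)]
    top = h[c] + theta_bar                               # no sale above this price
    prices = [top * k / grid for k in range(1, grid + 1)] + [h[c]]
    best_rand = best_new = 0.0
    for p in prices:
        x = [buy(p, n) for n in range(c + 1)]
        best_rand = max(best_rand, p * sum(w * q for w, q in zip(pmf, x)))
        if min(x) > 0.0:                                 # non-absorbing prices only
            best_new = max(best_new, p / sum(w / q for w, q in zip(pmf, x)))
    return best_rand, best_new

c, mu, theta_bar = 1, 0.5, 0.01                          # hypothetical instance
h, buy = make_instance(c, a=0.01, b=0.01, theta_bar=theta_bar)
best_rand, best_new = best_static(c, mu, h, buy, theta_bar)
chi_static = best_rand / best_new                        # grid estimate of chi(Pi^static)
bound = mu**c * h[c] / (h[0] + theta_bar)                # lower bound of Theorem 4.1
print(chi_static, bound)
```

Because the grid contains $p=h(c)$ and every non-absorbing price is below $h(0)+\overline{\theta}$, the grid estimate of the ratio is at least the bound of the theorem in this instance.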

Proof of Theorem 4.1.

Observe that any price $p>h(0)+\overline{\theta}$ is absorbing since ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]=0$. By Proposition 3.1, such prices induce $\textsc{Rev}(\sigma^{\textsc{newest}},p)=0$. For any non-absorbing price $p$, the revenue satisfies $\textsc{Rev}(\sigma^{\textsc{newest}},p)\leq p$, since the purchase probability is at most one. As a result, $\max_{p\in\mathbb{R}}\textsc{Rev}(\sigma^{\textsc{newest}},p)\leq h(0)+\overline{\theta}$.

Under $\sigma^{\textsc{random}}$, if all reviews are positive, the non-negativity of the value distribution implies that a price $p=h(c)$ induces a purchase with probability one. The probability of this event is $\mu^{c}$, which implies that $\max_{p\in\mathbb{R}}\textsc{Rev}(\sigma^{\textsc{random}},p)\geq\mu^{c}h(c)$. Combining the two inequalities we obtain

\[\chi(\Pi^{\textsc{static}})=\frac{\max_{p\in\mathbb{R}}\textsc{Rev}(\sigma^{\textsc{random}},p)}{\max_{p\in\mathbb{R}}\textsc{Rev}(\sigma^{\textsc{newest}},p)}\geq\frac{\mu^{c}h(c)}{h(0)+\overline{\theta}}.\]

Next, we show an upper bound under dynamic pricing, which is the main result of this section.

Theorem 4.2.

For any instance, it holds that $\chi(\Pi^{\textsc{dynamic}})\leq\frac{2}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}$.

In contrast to static pricing where the CoNF can be arbitrarily bad, Theorem 4.2 shows that its negative impact can be mitigated via dynamic pricing. If the idiosyncratic valuation $\Theta_{t}$ is always non-negative, the upper bound on $\chi(\Pi^{\textsc{dynamic}})$ is $2$. Even when $\Theta_{t}$ can be negative, for example if it is non-negative with probability at least $1/2$, Theorem 4.2 implies that $\chi(\Pi^{\textsc{dynamic}})\leq 4$.

The proof of Theorem 4.2 (Section 4.3) relies on characterizing the optimal dynamic pricing policies under both $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}}$ and their corresponding revenues (Section 4.2). Recall that with static pricing, $\sigma^{\textsc{newest}}$ spends a disproportionate amount of time in a negative review state compared to $\sigma^{\textsc{random}}$. In contrast, the optimal dynamic pricing sets prices so that the purchase probability is equal across all review states, leading to $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}}$ spending the same amount of time in each review state (Section 4.2). This allows us to bound the ratio of demands under $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}}$ by a factor of ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]$. Finally, we bound the ratio of the optimal prices under $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}}$, which we decompose into two terms stemming from the customer's belief about $\mu$ and the customer-specific valuation; each term is bounded by $1$ (Section 4.3).

We complement this result by a lower bound (proof in Appendix D.1) which shows that the Cost of Newest First still exists even under optimal dynamic pricing.

Proposition 4.1.

For any $\alpha<4/3$, there exists an instance such that $\chi(\Pi^{\textsc{dynamic}})>\alpha$.

Remark 4.1.

Even under optimal dynamic pricing, it is still the case that $\sigma^{\textsc{random}}$ induces no smaller revenue than $\sigma^{\textsc{newest}}$, i.e., $\chi(\Pi^{\textsc{dynamic}})\geq 1$ (see Appendix D.2).

Remark 4.2.

If we have the additional knowledge that $h(n)\leq u$ for some $u\geq 0$, then we can improve the result of Theorem 4.2 to $\chi(\Pi^{\textsc{dynamic}})\leq\frac{2{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-u]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}$ (see Appendix D.3).

4.2 Characterization of Optimal Dynamic Pricing

A useful quantity in our characterization results is the optimal revenue for a given valuation distribution represented by a random variable $V$; in particular, the optimal revenue $r^{\star}(V)$ is\footnote{We note that $r^{\star}(V)<\infty$ for our valuation distributions $V=\Theta+x$ as $\Theta$ has bounded support and $x\in[0,1]$.}

\[r^{\star}(V)\coloneqq\max_{p\in\mathbb{R}}p\,{\mathbb{P}}_{V}[V\geq p].\]

First, we derive the optimal pricing policy for $\sigma^{\textsc{random}}$. Specifically, for every state of reviews, the optimal price is the revenue-maximizing price for that state. For a state of reviews $\bm{z}=(z_{1},\ldots,z_{c})\in\{0,1\}^{c}$, we denote by $N_{\bm{z}}=\sum_{i=1}^{c}z_{i}$ the number of positive review ratings.

Theorem 4.3.

For every review state $\bm{z}\in\{0,1\}^{c}$, any optimal pricing policy under $\sigma^{\textsc{random}}$ sets $\rho^{\textsc{random}}(\bm{z})\in\operatorname*{arg\,max}_{p}p\,{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq p\big]$. This implies that

\[\textsc{Rev}(\sigma^{\textsc{random}},\Pi^{\textsc{dynamic}})={\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\Big[r^{\star}(\Theta+h(N))\Big].\]
Proof.

By definition, $\sigma^{\textsc{random}}$ displays $c$ i.i.d. $\mathrm{Bern}(\mu)$ reviews at every round $t$. If the displayed reviews are $\bm{z}=(z_{1},\ldots,z_{c})\in\{0,1\}^{c}$, the revenue obtained by offering price $\rho(\bm{z})$ is equal to

\[\rho(\bm{z})\,{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})\big].\]

Thus the optimal price is any revenue-maximizing price $\rho^{\star}(\bm{z})\in\operatorname*{arg\,max}_{p\in\mathbb{R}}p\,{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq p\big]$. Since the number of positive reviews $N_{\bm{z}}$ is distributed as $\mathrm{Binom}(c,\mu)$, the optimal revenue is equal to ${\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\Big[\max_{p\in\mathbb{R}}p\,{\mathbb{P}}[\Theta+h(N)\geq p]\Big]={\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}\Big[r^{\star}(\Theta+h(N))\Big]$. ∎
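For a concrete instance, the expression of Theorem 4.3 can be evaluated in closed form. The sketch below assumes, for illustration only, $\mathcal{F}=\mathcal{U}[0,1]$, for which $p\,{\mathbb{P}}[\Theta+x\geq p]=p(1+x-p)$ on $[x,1+x]$ is maximized at $p=(1+x)/2$ when $x\in[0,1]$, giving $r^{\star}(\Theta+x)=(1+x)^{2}/4$. It also assumes a $\mathrm{Beta}(1,1)$ prior with posterior-mean $h$ and hypothetical values $c=3$, $\mu=0.6$; a grid search is included as a cross-check of the closed form.

```python
from math import comb

def r_star_uniform(x):
    """max_p p * P[Theta + x >= p] for Theta ~ U[0,1] and 0 <= x <= 1."""
    return (1.0 + x) ** 2 / 4.0          # attained at p = (1 + x) / 2

def r_star_grid(x, grid=2000):
    # Brute-force cross-check over prices in [0, 1 + x]
    best = 0.0
    for k in range(grid + 1):
        p = (1.0 + x) * k / grid
        best = max(best, p * min(1.0, max(0.0, 1.0 - (p - x))))
    return best

c, mu, a, b = 3, 0.6, 1.0, 1.0           # hypothetical instance
h = [(a + n) / (a + b + c) for n in range(c + 1)]
pmf = [comb(c, n) * mu**n * (1 - mu) ** (c - n) for n in range(c + 1)]

# Theorem 4.3: expected per-state optimal revenue under Binom(c, mu)
rev_random_dynamic = sum(w * r_star_uniform(x) for w, x in zip(pmf, h))
print(rev_random_dynamic)
```

Under these assumptions the state-by-state optimal prices are $(1+h(n))/2$, and the resulting revenue is the binomial average of $(1+h(n))^{2}/4$.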

Next, we characterize the optimal dynamic pricing policy under σnewest\sigma^{\textsc{newest}}. We show that it satisfies a structural property: the purchase probability is equal regardless of the review state. We define the policies that satisfy this property as review-offsetting policies.

Definition 4.1.

A dynamic pricing policy $\rho$ is review-offsetting if there exists an offset $a\in\mathbb{R}$ such that $\rho(\bm{z})=h(N_{\bm{z}})+a$ for all $\bm{z}\in\{0,1\}^{c}$.

Note that for a review-offsetting policy $\rho$, the purchase probability at any state $\bm{z}\in\{0,1\}^{c}$ is $\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq h(N_{\bm{z}})+a\big]=\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta\geq a]$, where the last term does not depend on $\bm{z}$. Hence, review-offsetting policies induce equal purchase probability regardless of the state of reviews.

The main result of this section establishes that, under $\sigma^{\textsc{newest}}$, some review-offsetting dynamic pricing policy maximizes revenue, and it characterizes the corresponding offset. [Footnote 11: We note that this characterization is the only place where we require the platform to know the true quality $\mu$.]

Theorem 4.4.

Let $p^{\star}\in\operatorname*{arg\,max}_{p\in\mathbb{R}}\,p\,\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]\geq p\big]$. Under $\sigma^{\textsc{newest}}$, the review-offsetting policy with offset $p^{\star}-\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]$ is an optimal dynamic pricing policy.

Intuitively, the term $p^{\star}$ is the optimal price when facing a single customer with a random valuation $\Theta+\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]$ for $\Theta\sim\mathcal{F}$. The selected offset makes the purchase probability equal to the purchase probability under the “single customer” setting with the optimal price $p^{\star}$. This intuition enables us to characterize the optimal revenue of dynamic policies (proof in Appendix D.4).

Theorem 4.5.

The revenue of the optimal dynamic pricing policy under $\sigma^{\textsc{newest}}$ equals the optimal revenue from selling to a single customer with valuation $\Theta+\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]$. That is,

$$\max_{\rho\in\Pi^{\textsc{dynamic}}}\textsc{Rev}(\sigma^{\textsc{newest}},\rho)=r^{\star}\big(\Theta+\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]\big).$$
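To make Theorems 4.4 and 4.5 concrete, the sketch below builds the review-offsetting policy for a small hypothetical instance. It assumes $\Theta\sim\mathcal{U}[\texttt{lo},\texttt{hi}]$ and a posterior-mean estimate $h$ under a $\mathrm{Beta}(1,1)$ prior; these choices and the helper names are illustrative assumptions, not the paper's specification.

```python
from math import comb

def optimal_price_uniform(shift, lo, hi):
    """argmax_p p * P[Theta + shift >= p] for Theta ~ Uniform[lo, hi], shift + hi > 0."""
    return max(shift + lo, (shift + hi) / 2)

def purchase_prob(threshold, lo, hi):
    """P[Theta >= threshold] for Theta ~ Uniform[lo, hi]."""
    return min(1.0, max(0.0, (hi - threshold) / (hi - lo)))

def review_offsetting_policy(c, mu, h, lo, hi):
    """Return (offset, prices indexed by number of positive reviews, revenue)."""
    h_bar = sum(comb(c, n) * mu**n * (1 - mu) ** (c - n) * h(n) for n in range(c + 1))
    p_star = optimal_price_uniform(h_bar, lo, hi)     # single-customer optimal price
    offset = p_star - h_bar                           # offset from Theorem 4.4
    prices = [h(n) + offset for n in range(c + 1)]    # rho(z) = h(N_z) + offset
    revenue = p_star * purchase_prob(offset, lo, hi)  # r*(Theta + h_bar), Theorem 4.5
    return offset, prices, revenue

c = 2
h = lambda n: (1 + n) / (2 + c)  # assumed estimate: posterior mean with a Beta(1, 1) prior
offset, prices, revenue = review_offsetting_policy(c, mu=0.5, h=h, lo=0.0, hi=1.0)
print(offset, prices, revenue)
```

Here the price rises with the number of positive reviews while the purchase probability $\mathbb{P}[\Theta\geq a]$ stays fixed at every state, which is the defining feature of a review-offsetting policy.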
Remark 4.3.

In Appendix D.5, we also characterize the complete set of optimal policies for $\sigma^{\textsc{newest}}$; the stated policy is the unique optimal policy under mild regularity conditions (Appendix D.6).

To prove Theorem 4.4, for any dynamic pricing policy $\rho$ and any state $\bm{z}\in\{0,1\}^{c}$, we define a corresponding policy $\tilde{\rho}_{\bm{z}}$ to be the review-offsetting policy with offset $\rho(\bm{z})-h(N_{\bm{z}})$. The policy $\tilde{\rho}_{\bm{z}}$ has the same purchase probability as $\rho$ at state $\bm{z}$, i.e., $\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq\tilde{\rho}_{\bm{z}}(\bm{z})\big]=\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})\big]$. We show that (1) $\rho$ can be improved by one of the policies $\tilde{\rho}_{\bm{z}}$ for $\bm{z}\in\{0,1\}^{c}$ (Lemma 4.1) and (2) using such a review-offsetting policy $\tilde{\rho}_{\bm{z}}$ reduces the problem to facing a single customer with valuation $\Theta+\overline{h}$, where $\overline{h}=\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]$ (Lemma 4.2).

Lemma 4.1.

The revenue of $\rho$ is at most the highest revenue of $\tilde{\rho}_{\bm{z}}$ over $\bm{z}\in\{0,1\}^{c}$, i.e.,

$$\textsc{Rev}(\sigma^{\textsc{newest}},\rho)\leq\max_{\bm{z}\in\{0,1\}^{c}}\textsc{Rev}(\sigma^{\textsc{newest}},\tilde{\rho}_{\bm{z}}).$$

Equality holds if and only if $\textsc{Rev}(\sigma^{\textsc{newest}},\tilde{\rho}_{\bm{z}})=\textsc{Rev}(\sigma^{\textsc{newest}},\tilde{\rho}_{\bm{z}^{\prime}})$ for all $\bm{z},\bm{z}^{\prime}\in\{0,1\}^{c}$.

Lemma 4.2.

Let $\overline{h}=\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]$. For a review-offsetting policy $\tilde{\rho}$ with offset $a\in\mathbb{R}$:

$$\textsc{Rev}(\sigma^{\textsc{newest}},\tilde{\rho})=\big(a+\overline{h}\big)\,\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+\overline{h}\geq a+\overline{h}\big].$$
Proof of Theorem 4.4.

By Lemma 4.2, a review-offsetting policy $\tilde{\rho}$ with offset $a$ has revenue

$$\big(a+\overline{h}\big)\,\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+\overline{h}\geq a+\overline{h}\big].$$

This is the revenue of selecting a price $a+\overline{h}$ when facing a single customer with valuation $\Theta+\overline{h}$. The revenue-maximizing price against such a customer is $p^{\star}$. By Lemma 4.2, the review-offsetting policy (of the theorem) with offset $a^{\star}=p^{\star}-\overline{h}$ attains this optimal revenue and is thus the revenue-maximizing review-offsetting policy. What remains is to show that dynamic pricing policies that are not review-offsetting do not obtain higher revenue; this follows from Lemma 4.1. ∎

To prove Lemmas 4.1 and 4.2, we provide a closed-form expression for the revenue of any dynamic pricing policy $\rho$ (proof in Appendix D.7), which is an analogue of Proposition 3.3. Similar to Definition 3.1, we define a non-absorbing dynamic pricing policy as one where the purchase probability for every state of reviews is positive, i.e., $\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})\big]>0$ for all $\bm{z}\in\{0,1\}^{c}$.

Proposition 4.2.

For a non-absorbing dynamic pricing policy $\rho$, [Footnote 12: In this proposition and the following proofs we again use the notation $\bm{Y}=(Y_{1},\ldots,Y_{c})$ and $N_{\bm{Y}}=\sum_{i=1}^{c}Y_{i}$.]

$$\textsc{Rev}(\sigma^{\textsc{newest}},\rho)=\frac{\mathbb{E}_{Y_{1},\ldots,Y_{c}\sim_{i.i.d.}\mathrm{Bern}(\mu)}[\rho(\bm{Y})]}{\mathbb{E}_{Y_{1},\ldots,Y_{c}\sim_{i.i.d.}\mathrm{Bern}(\mu)}\Big[\frac{1}{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+h(N_{\bm{Y}})\geq\rho(\bm{Y})]}\Big]}.$$
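The closed form can be checked numerically against the steady state of the review process. The sketch below does so for $c=2$ under assumed dynamics: in state $\bm{z}$ a customer buys with probability $q(\bm{z})$, and each buyer posts a $\mathrm{Bern}(\mu)$ review that becomes the newest one and pushes out the oldest. The instance ($\Theta\sim\mathcal{U}[0,1]$, posterior-mean $h$, the particular price rule) and all names are hypothetical choices for illustration.

```python
from itertools import product

def closed_form_revenue(c, mu, price, buy_prob):
    """Proposition 4.2: E[rho(Y)] / E[1 / q(Y)] with Y_1..Y_c i.i.d. Bern(mu)."""
    num = den = 0.0
    for z in product((0, 1), repeat=c):
        alpha = mu ** sum(z) * (1 - mu) ** (c - sum(z))  # probability of drawing z
        num += alpha * price(z)
        den += alpha / buy_prob(z)
    return num / den

def chain_revenue(c, mu, price, buy_prob, steps=5000):
    """Steady-state revenue of the assumed Newest First chain, via power iteration."""
    states = list(product((0, 1), repeat=c))
    dist = {z: 1.0 / len(states) for z in states}
    for _ in range(steps):
        nxt = {z: 0.0 for z in states}
        for z, mass in dist.items():
            q = buy_prob(z)
            nxt[z] += mass * (1 - q)                   # no purchase: state unchanged
            nxt[(1,) + z[:-1]] += mass * q * mu        # purchase, positive review
            nxt[(0,) + z[:-1]] += mass * q * (1 - mu)  # purchase, negative review
        dist = nxt
    return sum(dist[z] * price(z) * buy_prob(z) for z in states)

c, mu = 2, 0.5
h = lambda n: (1 + n) / (2 + c)                    # assumed estimate (posterior mean)
price = lambda z: 0.5 + 0.2 * sum(z)               # an arbitrary dynamic pricing policy
buy_prob = lambda z: 1.0 - (price(z) - h(sum(z)))  # P[Theta >= rho - h], Theta ~ U[0, 1]
print(closed_form_revenue(c, mu, price, buy_prob), chain_revenue(c, mu, price, buy_prob))
```

The two numbers agree because the chain spends time in state $\bm{z}$ in proportion to $\alpha_{\bm{z}}/q(\bm{z})$: states with low purchase probability are left less often, which is exactly the persistence effect behind the Cost of Newest First.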
Proof of Lemma 4.2.

Since $\tilde{\rho}(\bm{Y})=a+h(N_{\bm{Y}})$, its expected price is $\mathbb{E}[\tilde{\rho}(\bm{Y})]=a+\overline{h}$. The purchase probability in every state is the same and equal to $\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{Y}})\geq\tilde{\rho}(\bm{Y})\big]=\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+\overline{h}\geq a+\overline{h}\big]$. As a result, by Proposition 4.2, $\textsc{Rev}(\sigma^{\textsc{newest}},\tilde{\rho})=\big(a+\overline{h}\big)\,\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+\overline{h}\geq a+\overline{h}\big]$. ∎

To prove Lemma 4.1 we use the following natural convexity property (proof in Section D.8).

Lemma 4.3.

Let $\{\alpha_{i},A_{i},B_{i}\}_{i\in\mathcal{S}}$ be such that $\alpha_{i},B_{i}>0$ for all $i$. Then $\frac{\sum_{i\in\mathcal{S}}\alpha_{i}A_{i}}{\sum_{i\in\mathcal{S}}\alpha_{i}B_{i}}\leq\max_{i\in\mathcal{S}}\frac{A_{i}}{B_{i}}$. Equality is achieved if and only if $\frac{A_{i}}{B_{i}}=\frac{A_{j}}{B_{j}}$ for all $i,j\in\mathcal{S}$.
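A short derivation sketch of this bound (the formal proof is the one in Section D.8), writing $m=\max_{j\in\mathcal{S}}A_{j}/B_{j}$ for a finite index set $\mathcal{S}$:

```latex
\[
\sum_{i\in\mathcal{S}}\alpha_{i}A_{i}
=\sum_{i\in\mathcal{S}}\alpha_{i}B_{i}\cdot\frac{A_{i}}{B_{i}}
\;\leq\; m\sum_{i\in\mathcal{S}}\alpha_{i}B_{i},
\qquad m=\max_{j\in\mathcal{S}}\frac{A_{j}}{B_{j}}.
\]
```

Dividing both sides by $\sum_{i\in\mathcal{S}}\alpha_{i}B_{i}>0$ gives the inequality. Since every $\alpha_{i}B_{i}$ is strictly positive, equality requires $A_{i}/B_{i}=m$ for every $i$.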

Proof of Lemma 4.1.

Given that $\tilde{\rho}_{\bm{z}}$ is the review-offsetting policy with offset $\rho(\bm{z})-h(N_{\bm{z}})$, by Lemma 4.2,

$$\textsc{Rev}(\sigma^{\textsc{newest}},\tilde{\rho}_{\bm{z}})=\underbrace{\Big(\rho(\bm{z})-h(N_{\bm{z}})+\overline{h}\Big)}_{A_{\bm{z}}}\cdot\underbrace{\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})\big]}_{(B_{\bm{z}})^{-1}},$$

where $\overline{h}=\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]$. Given that $Y_{1},\ldots,Y_{c}\sim_{i.i.d.}\mathrm{Bern}(\mu)$ and thus $N_{\bm{Y}}\sim\mathrm{Binom}(c,\mu)$, we have $\mathbb{E}[h(N_{\bm{Y}})]=\overline{h}$, so Proposition 4.2 implies that the revenue of $\rho$ can be expressed as:

$$\textsc{Rev}(\sigma^{\textsc{newest}},\rho)=\frac{\mathbb{E}_{Y_{1},\ldots,Y_{c}\sim_{i.i.d.}\mathrm{Bern}(\mu)}\big[\overline{h}+\rho(\bm{Y})-h(N_{\bm{Y}})\big]}{\mathbb{E}_{Y_{1},\ldots,Y_{c}\sim_{i.i.d.}\mathrm{Bern}(\mu)}\Big[\frac{1}{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+h(N_{\bm{Y}})\geq\rho(\bm{Y})]}\Big]}.$$

Letting $\mathcal{S}=\{0,1\}^{c}$ and $\alpha_{\bm{z}}=\mu^{N_{\bm{z}}}(1-\mu)^{c-N_{\bm{z}}}$ be the probability that $c$ i.i.d. $\mathrm{Bern}(\mu)$ trials result in $\bm{z}$, Lemma 4.3 implies that the maximum revenue among the policies $\tilde{\rho}_{\bm{z}}$ is no smaller than that of $\rho$:

$$\textsc{Rev}(\sigma^{\textsc{newest}},\rho)=\frac{\sum_{\bm{z}\in\mathcal{S}}\alpha_{\bm{z}}A_{\bm{z}}}{\sum_{\bm{z}\in\mathcal{S}}\alpha_{\bm{z}}B_{\bm{z}}}\leq\max_{\bm{z}\in\mathcal{S}}\frac{A_{\bm{z}}}{B_{\bm{z}}}=\max_{\bm{z}\in\mathcal{S}}\textsc{Rev}(\sigma^{\textsc{newest}},\tilde{\rho}_{\bm{z}}).$$

By Lemma 4.3, equality holds if and only if $\frac{A_{\bm{z}}}{B_{\bm{z}}}=\textsc{Rev}(\sigma^{\textsc{newest}},\tilde{\rho}_{\bm{z}})$ is the same for all $\bm{z}\in\{0,1\}^{c}$. ∎

Remark 4.4.

In Section D.9, we compare the optimal dynamic pricing policies under $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}}$. We show that, under mild regularity conditions, states $\bm{z}$ with $h(N_{\bm{z}})>\overline{h}$ result in a higher price under $\sigma^{\textsc{newest}}$, while states $\bm{z}$ with $h(N_{\bm{z}})<\overline{h}$ result in a lower price under $\sigma^{\textsc{newest}}$.

4.3 Cost of Newest First is Bounded under Dynamic Pricing (Theorem 4.2)

We now prove Theorem 4.2, leveraging the results of Section 4.2 that characterize the optimal dynamic pricing policies. For convenience, we denote $\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}$ by $\mathbb{E}_{N}$ and $\overline{h}=\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}[h(N)]$. By Theorem 4.3 and Theorem 4.5, we can express the Cost of Newest First as:

$$\chi(\Pi^{\textsc{dynamic}})=\frac{\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}\big[r^{\star}(\Theta+h(N))\big]}{r^{\star}(\Theta+\overline{h})}=\mathbb{E}_{N\sim\mathrm{Binom}(c,\mu)}\Big[\frac{r^{\star}(\Theta+h(N))}{r^{\star}(\Theta+\overline{h})}\Big],\tag{2}$$

where the denominator does not depend on $N$ and can thus move inside the expectation. We now focus on the quantity inside the expectation for a particular realization $n$ of $N$. For any price $p_{n}>0$,

$$\begin{aligned}
\frac{r^{\star}(\Theta+h(n))}{r^{\star}(\Theta+\overline{h})}
&=\frac{p^{\star}(\Theta+h(n))\cdot\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p^{\star}(\Theta+h(n))]}{p^{\star}(\Theta+\overline{h})\cdot\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p^{\star}(\Theta+\overline{h})]}\\
&\leq\underbrace{\frac{p^{\star}(\Theta+h(n))}{p_{n}}}_{\textsc{Price Ratio}(p_{n},n)}\cdot\underbrace{\frac{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p^{\star}(\Theta+h(n))]}{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p_{n}]}}_{\textsc{Demand Ratio}(p_{n},n)}.
\end{aligned}\tag{3}$$

The inequality replaces the revenue-maximizing price $p^{\star}(\Theta+\overline{h})$ by another price $p_{n}$, which can only increase the ratio. Given that we operate with dynamic prices, we are allowed to select a different price for any number of positive reviews $N$.

In particular, a price of $\overline{h}+p^{\star}(\Theta+h(n))-h(n)$ ensures that the demand ratio is one. However, if $p^{\star}(\Theta+h(n))<h(n)$, the denominator in the price ratio can be unboundedly small. To simultaneously bound the expected price and demand ratios, we select $\tilde{p}_{n}=\overline{h}+\max(p^{\star}(\Theta+h(n))-h(n),0)$.

Lemma 4.4.

The expected price ratio satisfies $\mathbb{E}\big[\textsc{Price Ratio}(\tilde{p}_{N},N)\big]\leq 2$.

Lemma 4.5.

For any $n\in\{0,1,\ldots,c\}$, the demand ratio satisfies $\textsc{Demand Ratio}(\tilde{p}_{n},n)\leq\frac{1}{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}$.

Proof of Theorem 4.2.

The proof directly combines (2), (3), and Lemmas 4.4 and 4.5. ∎
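Spelled out, the combination can be sketched as the following chain, taking $p_{n}=\tilde{p}_{n}$ in (3). The statement of Theorem 4.2 lies outside this section, so the final expression is simply the bound that the two lemmas yield together.

```latex
\begin{align*}
\chi(\Pi^{\textsc{dynamic}})
&=\mathbb{E}_{N}\Big[\frac{r^{\star}(\Theta+h(N))}{r^{\star}(\Theta+\overline{h})}\Big]
 && \text{by (2)}\\
&\leq\mathbb{E}_{N}\Big[\textsc{Price Ratio}(\tilde{p}_{N},N)\cdot\textsc{Demand Ratio}(\tilde{p}_{N},N)\Big]
 && \text{by (3)}\\
&\leq\frac{1}{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}\,\mathbb{E}_{N}\Big[\textsc{Price Ratio}(\tilde{p}_{N},N)\Big]
 && \text{by Lemma 4.5}\\
&\leq\frac{2}{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}
 && \text{by Lemma 4.4.}
\end{align*}
```

The third line uses that the price ratio is nonnegative, so the pointwise bound on the demand ratio can be pulled out of the expectation.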

What is left is to prove the lemmas that bound the expected price ratio and the demand ratio.

Proof of Lemma 4.4.

Given that customer valuations are additive with a belief and an idiosyncratic component, the optimal price (numerator of price ratio) can be similarly decomposed as:

$$p^{\star}(\Theta+h(n))=\underbrace{h(n)}_{\text{Belief about }\mu}+\underbrace{p^{\star}(\Theta+h(n))-h(n)}_{\text{Idiosyncratic valuation}}.$$

The expected price ratio $\mathbb{E}\big[\textsc{Price Ratio}(\tilde{p}_{N},N)\big]$ for $\tilde{p}_{n}=\overline{h}+\max(p^{\star}(\Theta+h(n))-h(n),0)$ is:

$$\mathbb{E}_{N}\Bigg[\frac{h(N)}{\overline{h}+\max(p^{\star}(\Theta+h(N))-h(N),0)}\Bigg]+\mathbb{E}_{N}\Bigg[\frac{p^{\star}(\Theta+h(N))-h(N)}{\overline{h}+\max(p^{\star}(\Theta+h(N))-h(N),0)}\Bigg].$$

Given that the denominators in both terms are positive and $\overline{h}=\mathbb{E}_{N}[h(N)]$, each of those terms can be upper bounded by $1$: in the first term the denominator is at least $\overline{h}$, so (using that $h$ is nonnegative) the term is at most $\mathbb{E}_{N}[h(N)]/\overline{h}=1$; in the second term the numerator never exceeds the denominator. This concludes the proof. ∎

Proof of Lemma 4.5.

For every number of positive reviews $n$, we distinguish two cases based on where the maximum in $\tilde{p}_{n}$ lies. If $p^{\star}(\Theta+h(n))\geq h(n)$, the demand ratio is equal to $1$. Otherwise, $\tilde{p}_{n}=\overline{h}$ and

$$\textsc{Demand Ratio}(\tilde{p}_{n},n)=\frac{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta\geq p^{\star}(\Theta+h(n))-h(n)]}{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}\leq\frac{1}{\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}.$$
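For intuition about how loose or tight the resulting bound is, the following sketch evaluates $\chi(\Pi^{\textsc{dynamic}})$ from Eq. (2) together with the quantity $2/\mathbb{P}[\Theta\geq 0]$ on one hypothetical instance. The uniform distribution for $\Theta$, the posterior-mean estimate $h$, and the function names are assumptions chosen for illustration.

```python
from math import comb

def optimal_revenue_uniform(shift, lo, hi):
    """r*(Theta + shift) for Theta ~ Uniform[lo, hi]; assumes shift + hi > 0."""
    price = max(shift + lo, (shift + hi) / 2)
    return price * min(1.0, (shift + hi - price) / (hi - lo))

def cost_of_newest_first_dynamic(c, mu, h, lo, hi):
    """Return (chi from Eq. (2), the bound 2 / P[Theta >= 0]); assumes hi > 0."""
    pmf = [comb(c, n) * mu**n * (1 - mu) ** (c - n) for n in range(c + 1)]
    h_bar = sum(w * h(n) for n, w in enumerate(pmf))
    rev_random = sum(w * optimal_revenue_uniform(h(n), lo, hi) for n, w in enumerate(pmf))
    rev_newest = optimal_revenue_uniform(h_bar, lo, hi)
    prob_nonneg = min(1.0, hi / (hi - lo))  # P[Theta >= 0] for Uniform[lo, hi]
    return rev_random / rev_newest, 2 / prob_nonneg

c = 5
h = lambda n: (1 + n) / (2 + c)  # assumed estimate: posterior mean with a Beta(1, 1) prior
chi, bound = cost_of_newest_first_dynamic(c, mu=0.5, h=h, lo=-0.5, hi=0.5)
print(chi, bound)
```

Because $r^{\star}(\Theta+s)$ is a maximum of functions that are linear in the shift $s$, it is convex in $s$, so Jensen's inequality gives $\chi\geq 1$; on instances like this one the ratio sits far below the worst-case bound.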

4.4 Broader Implication: Cost of Ignoring State-dependent Customer Behavior

Suppose a platform uses $\sigma^{\textsc{newest}}$ but is not aware that customers' purchase decisions depend on the state of reviews; it instead assumes that the purchase behavior is constant (i.e., the purchase probability does not depend on the state of the newest $c$ reviews). Then, if the platform uses a standard data-driven approach to optimize prices (e.g., running price experiments and estimating demand from data), the optimal revenue is $\textsc{Rev}(\sigma^{\textsc{newest}},\Pi^{\textsc{static}})$. In contrast, a platform can estimate separate demands for each state of reviews and employ a dynamic pricing policy to earn $\textsc{Rev}(\sigma^{\textsc{newest}},\Pi^{\textsc{dynamic}})$. We show, by comparing $\textsc{Rev}(\sigma^{\textsc{newest}},\Pi^{\textsc{dynamic}})$ with $\textsc{Rev}(\sigma^{\textsc{newest}},\Pi^{\textsc{static}})$, that the revenue loss from not accounting for this state-dependent behavior can be arbitrarily large. The proof of the following result is provided in Appendix D.10.

Theorem 4.6.

For any $M>0$, there exists an instance such that $\frac{\textsc{Rev}(\sigma^{\textsc{newest}},\Pi^{\textsc{dynamic}})}{\textsc{Rev}(\sigma^{\textsc{newest}},\Pi^{\textsc{static}})}>M$.

5 Cost of Newest First under Time-Discounting Customers

As discussed in the introduction, customers prefer to read more recent reviews; this explains the practical popularity of $\sigma^{\textsc{newest}}$. We thus extend the customer model to account for this behavior of preferring newer reviews and analyze the revenue of review ordering policies under this model.

5.1 Model of Time-Discounting Customers

We first define the new behavioral model. For $\gamma\in[0,1]$, $\gamma$-time-discounting customers have the following purchase behavior. At round $t$, when presented with reviews $(z_{1},\ldots,z_{c})\in\{0,1\}^{c}$ which appeared at rounds $(t_{1},\ldots,t_{c})$ in the past ($t_{1}<t_{2}<\ldots<t_{c}<t$), the posterior they form is

$$\Phi_{t}=\mathrm{Beta}\Big(a+\sum_{i=1}^{c}\gamma^{t-t_{i}-1}z_{i},\;b+\sum_{i=1}^{c}\gamma^{t-t_{i}-1}(1-z_{i})\Big).$$

Compared to the model in Section 2, the customers' weight on the $i$-th review $z_{i}$ depends on the number of rounds $t-t_{i}$ since the posting of this review. Intuitively, this means that the significance of a review relative to the prior decreases as the review becomes more stale. Formally, a review from $s$ rounds ago is discounted by $\gamma^{s-1}$. Note that $\gamma=1$ corresponds to our original model while $\gamma=0$ corresponds to customers who only consider the review from the immediately previous round.
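The posterior update above can be sketched in a few lines of Python. The function and argument names are hypothetical; only the weighting rule $\gamma^{t-t_{i}-1}$ comes from the definition.

```python
def discounted_posterior(a, b, ratings, post_rounds, t, gamma):
    """Beta parameters of Phi_t for gamma-time-discounting customers.

    ratings[i] in {0, 1} was posted at round post_rounds[i] < t and receives
    weight gamma ** (t - post_rounds[i] - 1).
    """
    alpha, beta = float(a), float(b)
    for z, t_i in zip(ratings, post_rounds):
        weight = gamma ** (t - t_i - 1)  # Python evaluates 0 ** 0 as 1, matching gamma = 0
        alpha += weight * z
        beta += weight * (1 - z)
    return alpha, beta

print(discounted_posterior(1, 1, ratings=(1, 0, 1), post_rounds=(7, 8, 9), t=10, gamma=0.5))
```

With $\gamma=1$ the same call returns the undiscounted parameters $(3, 2)$, and with $\gamma=0$ only the review from round $t-1$ contributes.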

Customers map this posterior to an estimate $h(\Phi_{t})$ of the fixed valuation $\mu$. Using their personal preference $\Theta_{t}$ and their estimate $h(\Phi_{t})$ for the fixed valuation, they form their estimated valuation $\hat{V}_{t}=\Theta_{t}+h(\Phi_{t})$ and purchase the product if and only if $\hat{V}_{t}\geq p_{t}$, where $p_{t}$ is the price offered by the platform. We assume that $\mathcal{F}$ is a continuous distribution and $h(\mathrm{Beta}(x,y))$ is continuous in $(x,y)$.

We now discuss how the random ordering policy $\sigma^{\textsc{random}}$ behaves when $\gamma<1$. Recall (Section 2.2) that we start with an infinite pool of reviews: $\{X_{\tau}\}_{\tau=-\infty}^{-1}$ where $X_{\tau}\stackrel{\text{i.i.d.}}{\sim}\mathrm{Bern}(\mu)$. Then, a review chosen at random from this pool will be “infinitely stale” (i.e., $t_{i}\to-\infty$ almost surely), and such a review will have a weight of $0$ almost surely. Therefore, $\sigma^{\textsc{random}}$ is equivalent to showing no reviews; i.e., $\Phi_{t}=\mathrm{Beta}(a,b)$.

We also analyze the $\sigma^{\textsc{random}(w)}$ policy, which selects $c$ reviews uniformly at random from the $w$ newest reviews for a window $w\geq c$. Intuitively, a larger window $w$ corresponds to more randomness in the displayed reviews, and $\sigma^{\textsc{random}(w)}$ interpolates between $\sigma^{\textsc{newest}}$ and $\sigma^{\textsc{random}}$.
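A minimal sketch of the selection step of $\sigma^{\textsc{random}(w)}$ follows. The representation of the review history as a list ordered oldest to newest, the newest-first display order, and the function name are assumptions for illustration.

```python
import random

def random_window_display(history, c, w, rng=random):
    """Pick c of the w newest reviews uniformly at random, shown newest first.

    history is ordered oldest to newest; requires c <= w <= len(history).
    """
    if not c <= w <= len(history):
        raise ValueError("need c <= w <= len(history)")
    newest_w = history[-w:][::-1]             # the w newest reviews, newest first
    chosen = sorted(rng.sample(range(w), c))  # c distinct positions within the window
    return [newest_w[i] for i in chosen]

history = list(range(20))                     # stand-in reviews labelled by posting round
print(random_window_display(history, c=3, w=3))  # w = c reproduces Newest First
print(random_window_display(history, c=3, w=8, rng=random.Random(0)))
```

Setting $w=c$ recovers $\sigma^{\textsc{newest}}$ exactly, while growing $w$ spreads the displayed reviews over an ever older window.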

We denote by $\textsc{Rev}_{\gamma}(\sigma,\rho)$ the steady-state revenue under $\gamma$-time-discounting customers, review ordering policy $\sigma$, and pricing policy $\rho$, defined in Eq. (1). [Footnote 13: We note that the discount factor $\gamma$ only affects the customer’s belief and the revenue itself is not time-discounted.] Since reviews do not affect the customer’s belief under $\sigma^{\textsc{random}}$ when $\gamma<1$,

$$\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}},p)\coloneqq p\,\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a,b))\geq p]\tag{4}$$

and we can show that $\lim_{w\to\infty}\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}(w)},p)=\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}},p)$ (see Appendix E.1).

Lastly, we define a “review-benefiting” condition where customers are more likely to buy when the weights placed on reviews are higher (Definition 5.1). Without this, it could be better for the platform to intentionally show no reviews; hence this condition effectively states that reviews “help” on average.

Definition 5.1 (Review-benefiting price for an instance).

A price $p$ is review-benefiting for an instance $\mathcal{E}(\mu,\mathcal{F},a,b,c,h)$ if for any set of weights $\bm{w},\tilde{\bm{w}}\in[0,1]^{c}$ where $\bm{w}\succ\tilde{\bm{w}}$, [Footnote 14: $\bm{w}\succ\tilde{\bm{w}}$ refers to $w_{i}\geq\tilde{w}_{i}$ for all $i$, and the inequality is strict for at least one element.]

$$\begin{aligned}
&\mathbb{E}_{Y_{i}\stackrel{\text{i.i.d.}}{\sim}\mathrm{Bern}(\mu)}\Big[\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+h\big(\mathrm{Beta}\big(a+\sum_{i=1}^{c}w_{i}Y_{i},\;b+\sum_{i=1}^{c}w_{i}(1-Y_{i})\big)\big)\geq p\big]\Big]\\
&\quad>\mathbb{E}_{Y_{i}\stackrel{\text{i.i.d.}}{\sim}\mathrm{Bern}(\mu)}\Big[\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta+h\big(\mathrm{Beta}\big(a+\sum_{i=1}^{c}\tilde{w}_{i}Y_{i},\;b+\sum_{i=1}^{c}\tilde{w}_{i}(1-Y_{i})\big)\big)\geq p\big]\Big].
\end{aligned}$$

Each term refers to the expected purchase probability given the weight vector for the reviews, where the expectation is over the review ratings $Y_{i}$. Definition 5.1 states that when the weights are larger, the expected purchase probability should also be larger.

To provide intuition, we discuss two instances that induce review-benefiting prices. The first is when the prior mean is correct ($\frac{a}{a+b}=\mu$) but customers are pessimistic, i.e., their estimate $h(\Phi)$ for the fixed valuation is the $\phi$-quantile of $\Phi$ for $\phi<0.5$. In this case, incorporating reviews reduces the variance of the customer’s posterior and hence a higher weight on reviews increases $h(\Phi)$ on average. The second example is when customers are risk-neutral, i.e., $h(\Phi)$ is the mean of $\Phi$, but the prior mean is negatively biased ($\frac{a}{a+b}<\mu$). Incorporating more reviews drawn from $\mathrm{Bern}(\mu)$, on average, increases the mean of the posterior $h(\Phi)$. We formalize these instances in Appendix E.2.

5.2 (Non-)Monotonicity in Revenue with Respect to ww

We analyze how $\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}(w)},p)$ depends on the window size $w$ for different values of the discount factor $\gamma$. We first consider customers who discount extremely: $\gamma=0$; i.e., only the previous round’s review is incorporated in the customer’s belief. In Theorem 5.1 (proof in Section E.3), we establish that the revenue of $\sigma^{\textsc{random}(w)}$ is strictly decreasing in $w$. Since $\lim_{w\to\infty}\textsc{Rev}_{0}(\sigma^{\textsc{random}(w)},p)=\textsc{Rev}_{0}(\sigma^{\textsc{random}},p)$, this implies that $\textsc{Rev}_{0}(\sigma^{\textsc{newest}},p)>\textsc{Rev}_{0}(\sigma^{\textsc{random}(w)},p)>\textsc{Rev}_{0}(\sigma^{\textsc{random}},p)$ for any $w>c$. Therefore, if customers heavily discount, $\sigma^{\textsc{newest}}$ is indeed the best ordering policy. To ensure that the process of the $w$ newest reviews does not get absorbed into any state of reviews, we say that a price is strongly non-absorbing if the purchase probability for any state of reviews and any discount factor $\gamma\in[0,1]$ lies in $(0,1)$ (see Definition E.1 for a formal statement).

Theorem 5.1.

For a review-benefiting and strongly non-absorbing price $p>0$, $\textsc{Rev}_{0}(\sigma^{\textsc{random}(w)},p)$ is strictly decreasing in $w\geq c$.

Next, we consider the other extreme of $\gamma=1$; this is equivalent to our original model where all reviews are weighted equally. By Theorem 3.1 we have already established that $\textsc{Rev}_{1}(\sigma^{\textsc{random}},p)>\textsc{Rev}_{1}(\sigma^{\textsc{newest}},p)$. The following theorem (proof in Section E.4) extends this result by showing that $\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p)$ is strictly increasing in the window $w$.

Theorem 5.2.

For any non-degenerate and non-absorbing static price $p>0$, $\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p)$ is strictly increasing in $w\geq c$.

Slightly discounting customers.

We have established that the revenue depends on $w$ in opposite ways under $\gamma=0$ and $\gamma=1$. It is natural to ask what happens when $\gamma\in(0,1)$. We consider the case when customers employ a discount factor close to 1; i.e., $\gamma=1-\epsilon$ where $\epsilon$ is small. In the following theorem (proof in Section E.5), we establish that the revenue is no longer monotonic in $w$; the maximum revenue is achieved by a finite $w$.

Theorem 5.3.

For any problem instance and any non-degenerate, non-absorbing, review-benefiting, and strongly non-absorbing price $p>0$, there exists $\varepsilon>0$ such that, for all discount factors $\gamma\in(1-\varepsilon,1)$, there exists a finite $w>0$ with

$$\textsc{Rev}_{\gamma}\big(\sigma^{\textsc{random}(w)},p\big)>\max\big(\textsc{Rev}_{\gamma}(\sigma^{\textsc{newest}},p),\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}},p)\big).$$

5.3 Interaction between CoNF and the Discount Factor

Recall that a review-benefiting price for an instance implies that when the review ratings are drawn independently ($Y_{i}\stackrel{\text{i.i.d.}}{\sim}\mathrm{Bern}(\mu)$), higher weights result in a higher purchase probability. Therefore, one may expect that the revenue would be higher for non-discounting customers ($\gamma=1$) than for those that discount ($\gamma<1$). However, we show below that, under $\sigma^{\textsc{newest}}$, this intuition does not always hold due to the interaction with the Cost of Newest First.

We focus on a class of instances $\mathcal{C}^{\textsc{PessimisticEstimate}}$ where a single review is shown ($c=1$), the customer-specific valuation is uniform ($\mathcal{F}=\mathcal{U}[\underline{\theta},\overline{\theta}]$ with $\overline{\theta}-\underline{\theta}\geq 1$), the fixed valuation is $\mu=0.5$, the prior parameters are $a=b=1$ (correct prior mean), and $h(\Phi)$ is the $\phi$-quantile of the customer’s belief $\Phi$ for $\phi\in(0,0.5)$, i.e., the customer’s estimate for the fixed valuation is pessimistic.

Theorem 5.4.

For any instance in the class $\mathcal{C}^{\textsc{PessimisticEstimate}}$ and any $\gamma<1$, there exists a non-absorbing and review-benefiting price $p_{\gamma}$ such that $\textsc{Rev}_{\gamma}(\sigma^{\textsc{newest}},p_{\gamma})>\textsc{Rev}(\sigma^{\textsc{newest}},p_{\gamma})$.

The high-level idea behind this result (proof in Section E.9) is that the impact of the Cost of Newest First is higher when the weights are larger. To see this, suppose a negative review is posted at time $t$, which decreases the purchase probability for the future customers who see this review as the first review. If customers are time-discounting ($\gamma<1$), then even if this review is the newest review, the weight of this negative review eventually vanishes. On the other hand, if customers do not discount ($\gamma=1$), then the weight of this review stays at 1 as long as it remains the newest review. As a result, the negative impact of the Cost of Newest First can outweigh the positive impact of higher weights that a review-benefiting instance offers.

6 Evidence supporting CoNF from Tripadvisor

Tripadvisor is an online platform whose default review ordering policy is $\sigma^{\textsc{newest}}$. We focused on 109 hotels in the region of Times Square, New York, and retrieved the newest 1000 reviews for each of these hotels. We searched “Times Square” on Tripadvisor (March 20, 2024) and collected data from the first 21 hotels with more than 1000 reviews. [Footnote 15: We initially retrieved 3 more hotels that had fewer than 1000 reviews (respectively 394, 770, and 957) but later removed them to have a consistent rule. All three of those hotels exhibit CoNF and thus including them would only strengthen our empirical findings.] Subsequently, we made the same query and collected 88 additional hotels (August 6, 2024). A hotel’s page on Tripadvisor displays the average numerical rating of all reviews, as well as details of the first 10 reviews. More reviews can be shown if the user clicks for the next page of reviews. Each review has a rating (an integer between $1$ and $5$), the date the review was written, information on the user who wrote the review, as well as the text of the review.

We validate our theoretical findings using this dataset. Recall that Theorem 3.3 states that the average number of positive reviews from the first $c$ reviews is smaller under $\sigma^{\textsc{newest}}$ compared to $\sigma^{\textsc{random}}$. We corroborate this result by computing the empirical distributions for each hotel (similar to Figure 2). Since the hotel’s page on Tripadvisor shows 10 reviews by default, we use $c=10$.

For each hotel, we compute two empirical distributions, which we call $\hat{\pi}^{\textsc{newest}}$ and $\hat{\pi}^{\textsc{random}}$. Let $d_{1}$ and $d_{2}$ be the dates on which the earliest and the latest review were written (among the reviews retrieved for this hotel). For each date between $d_{1}$ and $d_{2}$, we compute what the 10 newest reviews were on that date and compute their average rating; this becomes a sample in our empirical distribution $\hat{\pi}^{\textsc{newest}}$. The number of samples is thus simply the number of days between $d_{1}$ and $d_{2}$, which, on average, corresponds to $5.95$ years. The empirical distribution $\hat{\pi}^{\textsc{random}}$ is computed by randomly sampling 10 reviews out of all the reviews retrieved for the particular hotel and computing their average rating, where the number of samples is equal to that of $\hat{\pi}^{\textsc{newest}}$.
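The construction of the two empirical distributions can be sketched as follows. This is a minimal illustration under assumptions, not the authors' analysis code: reviews are taken to be `(date, rating)` pairs, days on which fewer than `c` reviews exist are skipped (the text does not say how such days are handled), and the function names and the fixed seed are hypothetical.

```python
import random
from datetime import date, timedelta

def newest_first_samples(reviews, c=10):
    """One sample per calendar day: mean rating of the c newest reviews on that day.

    reviews is a list of (date, rating). Days with fewer than c reviews written
    so far are skipped, which is an assumption of this sketch.
    """
    ordered = sorted(reviews, key=lambda r: r[0])
    day, last_day = ordered[0][0], ordered[-1][0]
    samples, seen = [], 0  # seen = number of reviews written on or before `day`
    while day <= last_day:
        while seen < len(ordered) and ordered[seen][0] <= day:
            seen += 1
        if seen >= c:
            window = ordered[seen - c:seen]
            samples.append(sum(rating for _, rating in window) / c)
        day += timedelta(days=1)
    return samples

def random_samples(reviews, n_samples, c=10, seed=0):
    """n_samples means of c reviews drawn without replacement from all reviews."""
    rng = random.Random(seed)
    ratings = [rating for _, rating in reviews]
    return [sum(rng.sample(ratings, c)) / c for _ in range(n_samples)]

toy = [(date(2024, 1, 1), 5), (date(2024, 1, 2), 1), (date(2024, 1, 4), 3)]
newest = newest_first_samples(toy, c=2)
rand = random_samples(toy, n_samples=len(newest), c=2)
print(newest, rand)
```

On the toy data, the one-star review stays in the displayed window for three consecutive days, so it is counted in three of the Newest First samples; this day-weighting is what lets persistence show up in $\hat{\pi}^{\textsc{newest}}$.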

Results.

To validate our theoretical findings, we consider a null hypothesis positing that the probability of $\hat{\pi}^{\textsc{random}}$ having a higher mean than $\hat{\pi}^{\textsc{newest}}$ is exactly $50\%$. In our data, we observe that, for 79 out of the 109 hotels, the mean of $\hat{\pi}^{\textsc{newest}}$ is strictly smaller than that of $\hat{\pi}^{\textsc{random}}$, which implies a p-value of $5.4\times 10^{-7}$ (and thus rejects the null hypothesis). On average, the mean of $\hat{\pi}^{\textsc{newest}}$ is smaller than that of $\hat{\pi}^{\textsc{random}}$ by $2.3\%$, or 0.092 points out of 5 in absolute terms. Figure 3 plots a histogram of the difference between these means for the 109 hotels.

Next, in Figure 4, we plot the full distributions of $\hat{\pi}^{\textsc{newest}}$ and $\hat{\pi}^{\textsc{random}}$ for four hotels (in Appendix F, we present the remaining 105 hotels); this is analogous to Figure 2. We observe that $\hat{\pi}^{\textsc{newest}}$ consistently has heavier left tails, while $\hat{\pi}^{\textsc{random}}$ has heavier right tails. This observation matches the theoretical implication of our model (Proposition 3.4).

Figure 3: Histogram of the difference of means of $\hat{\pi}^{\textsc{newest}}$ and $\hat{\pi}^{\textsc{random}}$ for the 109 hotels.

Figure 4: Empirical distributions for $\hat{\pi}^{\textsc{newest}}$ and $\hat{\pi}^{\textsc{random}}$ in four example hotels. Vertical lines represent means.

7 Conclusions

In this paper, we model the idea that customers read only a small number of reviews before making purchase decisions. This model gives rise to the Cost of Newest First, the idea that, when reviews are ordered by newest first, negative reviews will persist as the newest review longer than positive reviews. This phenomenon does not arise in models from the existing literature, since prior works assume that customers incorporate either all reviews or a summary statistic of all reviews into their beliefs. We show that incorporating randomness into the review ordering or using dynamic pricing can alleviate the negative impact of the Cost of Newest First.

Our work opens up a number of intriguing avenues for future research. First, the existing literature on social learning studies the self-selection bias, which we do not consider in our model: how does this self-selection bias interact with the Cost of Newest First? Second, in terms of operational decisions, a platform contains multiple products: should it take the state of reviews into consideration when making display or ranking decisions? Third, given this bounded rationality behavior, are there alternative methods of disseminating relevant information from reviews? For example, one could succinctly summarize information from all reviews (via, e.g., generative AI) to be the most helpful for each customer. Fourth, we assume that the true quality $\mu$ does not change over time (which enables us to characterize the closed-form expression of the corresponding steady state). Given that changes in the product quality are one of the reasons behind the ubiquitous use of Newest First, it would be interesting to revisit the Cost of Newest First in such dynamic environments. Lastly, on the theoretical side, our analysis fully characterizes the steady state of a stochastic process whose state remains unchanged with some state-dependent probability (Lemma C.3, which is the crux in the analysis of Lemma 3.1). It would be interesting to apply this result to other settings that exhibit a similar structure.

Acknowledgements.

We thank the anonymous reviewers from the 25th ACM Conference on Economics and Computation (EC 2024) for their thorough feedback that greatly improved the presentation of the paper. We are also grateful to the Simons Institute for the Theory of Computing as this work started during the Fall’22 semester-long program on Data Driven Decision Processes.

References

  • [AGI11] Nikolay Archak, Anindya Ghose, and Panagiotis G Ipeirotis. Deriving the pricing power of product features by mining consumer reviews. Management science, 57(8):1485–1509, 2011.
  • [AMMO22] Daron Acemoglu, Ali Makhdoumi, Azarakhsh Malekian, and Asuman Ozdaglar. Learning from reviews: The selection effect and the speed of learning. Econometrica, 90(6):2857–2899, 2022.
  • [Ban92] Abhijit V Banerjee. A simple model of herd behavior. The quarterly journal of economics, 107(3):797–817, 1992.
  • [BHW92] Sushil Bikhchandani, David Hirshleifer, and Ivo Welch. A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of political Economy, 100(5):992–1026, 1992.
  • [BMZ12] John W Byers, Michael Mitzenmacher, and Georgios Zervas. The groupon effect on yelp ratings: a root cause analysis. In Proceedings of the 13th ACM conference on electronic commerce, pages 248–265, 2012.
  • [Bon23] Tommaso Bondi. Alone, together: A model of social (mis)learning from consumer reviews. In Kevin Leyton-Brown, Jason D. Hartline, and Larry Samuelson, editors, Proceedings of the 24th ACM Conference on Economics and Computation, EC 2023, London, United Kingdom, July 9-12, 2023, page 296. ACM, 2023.
  • [BPS18] Kostas Bimpikis, Yiangos Papanastasiou, and Nicos Savva. Crowdsourcing exploration. Management Science, 64(4):1477–1973, 2018.
  • [BPS22] Etienne Boursier, Vianney Perchet, and Marco Scarsini. Social learning in non-stationary environments. In International Conference on Algorithmic Learning Theory, pages 128–129. PMLR, 2022.
  • [BS18] Omar Besbes and Marco Scarsini. On information distortions in online ratings. Operations Research, 66(3):597–610, 2018.
  • [CDO22] Ishita Chakraborty, Joyee Deb, and Aniko Oery. When do consumers talk? Available at SSRN 4155523, 2022.
  • [CIMS17] Davide Crapis, Bar Ifrach, Costis Maglaras, and Marco Scarsini. Monopoly pricing in the presence of social learning. Management Science, 63(11):3586–3608, 2017.
  • [CLT21] Ningyuan Chen, Anran Li, and Kalyan Talluri. Reviews and self-selection bias with operational implications. Management Science, 67(12):7472–7492, 2021.
  • [CM06] Judith A Chevalier and Dina Mayzlin. The effect of word of mouth on sales: Online book reviews. Journal of marketing research, 43(3):345–354, 2006.
  • [CSS24] Christoph Carnehl, André Stenzel, and Peter Schmidt. Pricing for the stars: Dynamic pricing in the presence of rating systems. Management Science, 70(3):1755–1772, 2024.
  • [DLT21] Gregory DeCroix, Xiaoyang Long, and Jordan Tong. How service quality variability hurts revenue when customers learn: Implications for dynamic personalized pricing. Operations Research, 69(3):683–708, 2021.
  • [FKKK14] Peter Frazier, David Kempe, Jon Kleinberg, and Robert Kleinberg. Incentivizing exploration. In Proceedings of the fifteenth ACM conference on Economics and computation, pages 5–22, 2014.
  • [Gal97] Robert G Gallager. Discrete stochastic processes. Journal of the Operational Research Society, 48(1):103–103, 1997.
  • [GHKV23] Wenshuo Guo, Nika Haghtalab, Kirthevasan Kandasamy, and Ellen Vitercik. Leveraging reviews: Learning to price with buyer and seller uncertainty. In Kevin Leyton-Brown, Jason D. Hartline, and Larry Samuelson, editors, Proceedings of the 24th ACM Conference on Economics and Computation, EC 2023, London, United Kingdom, July 9-12, 2023, page 816. ACM, 2023.
  • [GI10] Anindya Ghose and Panagiotis G Ipeirotis. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. IEEE transactions on knowledge and data engineering, 23(10):1498–1512, 2010.
  • [GPR06] Jacob K Goeree, Thomas R Palfrey, and Brian W Rogers. Social learning with private and common values. Economic theory, 28:245–264, 2006.
  • [HC24] Michael Hamilton and Titing Cui. Fresh rating systems: Structure, incentives, and fees. June 24, 2024.
  • [HPZ17] Nan Hu, Paul A Pavlou, and Jie Zhang. On self-selection biases in online product reviews. MIS quarterly, 41(2):449–475, 2017.
  • [IMSZ19] Bar Ifrach, Costis Maglaras, Marco Scarsini, and Anna Zseleva. Bayesian social learning from consumer reviews. Operations Research, 67(5):1209–1221, 2019.
  • [Kav21] M Kavanagh. The impact of customer reviews on purchase decisions. Bizrate Insights, available at: https://bizrateinsights.com/resources/the-impact-of-customer-reviews-on-purchase-decisions/ (accessed 4 Feb 2024), 2021.
  • [KKM+17] Sampath Kannan, Michael Kearns, Jamie Morgenstern, Mallesh Pai, Aaron Roth, Rakesh Vohra, and Zhiwei Steven Wu. Fairness incentives for myopic agents. In Proceedings of the 2017 ACM Conference on Economics and Computation, pages 369–386, 2017.
  • [KMP14] Ilan Kremer, Yishay Mansour, and Motty Perry. Implementing the “wisdom of the crowd”. Journal of Political Economy, 122(5):988–1012, 2014.
  • [LDRF+13] Stephan Ludwig, Ko De Ruyter, Mike Friedman, Elisabeth C Brüggen, Martin Wetzels, and Gerard Pfann. More than words: The influence of affective content and linguistic style matches in online reviews on conversion rates. Journal of marketing, 77(1):87–103, 2013.
  • [LKAK18] Gaël Le Mens, Balázs Kovács, Judith Avrahami, and Yaakov Kareev. How endogenous crowd formation undermines the wisdom of the crowd in online ratings. Psychological science, 29(9):1475–1490, 2018.
  • [LLS19] Xiao Liu, Dokyun Lee, and Kannan Srinivasan. Large-scale cross-category analysis of consumer review content on sales conversion leveraging deep learning. Journal of Marketing Research, 56(6):918–943, 2019.
  • [LS16] Ilan Lobel and Evan Sadler. Preferences, homophily, and social learning. Operations Research, 64(3):564–584, 2016.
  • [Luc16] Michael Luca. Reviews, reputation, and revenue: The case of Yelp.com. Harvard Business School NOM Unit Working Paper, (12-016), 2016.
  • [LYMZ22] Zhanfei Lei, Dezhi Yin, Sabyasachi Mitra, and Han Zhang. Swayed by the reviews: Disentangling the effects of average ratings and individual reviews in online word-of-mouth. Production and Operations Management, 31(6):2393–2411, 2022.
  • [MSS20] Yishay Mansour, Aleksandrs Slivkins, and Vasilis Syrgkanis. Bayesian incentive-compatible bandit exploration. Operations Research, 68(4):1132–1161, 2020.
  • [MSSV23] Costis Maglaras, Marco Scarsini, Dongwook Shin, and Stefano Vaccari. Product ranking in the presence of social learning. Operations Research, 71(4):1136–1153, 2023.
  • [Mur19] Rosie Murphy. Local consumer review survey 2019. Technical report, National Bureau of Economic Research, 2019.
  • [PSX21] Sungsik Park, Woochoel Shin, and Jinhong Xie. The fateful first consumer review. Marketing Science, 40(3):481–507, 2021.
  • [RHP23] Hang Ren, Tingliang Huang, and Georgia Perakis. Impact of social learning on consumer subsidies and supplier capacity for green technology adoption. Available at SSRN 4335284, 2023.
  • [Say18] Amin Sayedi. Pricing in a duopoly with observational learning. Available at SSRN 3131561, 2018.
  • [SR18] Sven Schmit and Carlos Riquelme. Human interaction with recommendation systems. In International Conference on Artificial Intelligence and Statistics, pages 862–870. PMLR, 2018.
  • [SS00] Lones Smith and Peter Sørensen. Pathological outcomes of observational learning. Econometrica, 68(2):371–398, 2000.
  • [ZZ10] Feng Zhu and Xiaoquan Zhang. Impact of online consumer reviews on sales: The moderating role of product and consumer characteristics. Journal of marketing, 74(2):133–148, 2010.

Appendix A Formal definition of the random ordering policy (Section 2.2)

We now show that the policy $\sigma^{\textsc{random}}(\mathcal{H}_{t})$ can be defined as the limit of $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t})$ as $w\to\infty$. For a vector of review ratings $\bm{z}\in\{0,1\}^{c}$, we denote $N(\bm{z})\coloneqq N_{\bm{z}}=\sum_{i=1}^{c}z_{i}$. Proposition A.1 implies that as $w\to\infty$, the draws from $\sigma^{\textsc{random}(w)}$ across different rounds are independent and the distribution of $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t})$ approaches the distribution of $\sigma^{\textsc{random}}$.

Proposition A.1.

For any review rating vectors $\bm{z}^{(1)},\bm{z}^{(2)},\ldots,\bm{z}^{(k)}\in\{0,1\}^{c}$ and rounds $t^{(1)}<t^{(2)}<\ldots<t^{(k)}$, the distribution of reviews by $\sigma^{\textsc{random}(w)}$ approaches the one of $\sigma^{\textsc{random}}$ as $w\to\infty$, i.e.,

\[\lim_{w\to\infty}{\mathbb{P}}\Big[\forall i\in\{1,\ldots,k\}:\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}^{(i)}\Big]=\prod_{i=1}^{k}{\mathbb{P}}\Big[\sigma^{\textsc{random}}(\mathcal{H}_{t^{(i)}})=\bm{z}^{(i)}\Big].\]
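To build intuition for why the joint probability factorizes in the limit, the following sketch works out a deliberately simplified special case that is not in the original text: $c=1$, $k=2$ rounds, and a shared window of $w$ ratings that are all i.i.d. $\mathrm{Bern}(\mu)$ (real reviews in the window are ignored). Under these assumptions the probability that both rounds display a positive review equals $\mu^{2}+\mu(1-\mu)/w$, so the correlation induced by the shared window vanishes at rate $1/w$. The function name is hypothetical.

```python
from math import comb

def joint_prob_both_positive(w: int, mu: float) -> float:
    """P[both rounds show a positive review] when each round independently
    draws one review uniformly from the same window of w i.i.d. Bern(mu) ratings."""
    total = 0.0
    for s in range(w + 1):
        # Probability that exactly s of the w ratings in the window are positive.
        p_s = comb(w, s) * mu**s * (1 - mu) ** (w - s)
        # Given s positives, the two independent draws are both positive w.p. (s/w)^2.
        total += p_s * (s / w) ** 2
    return total

mu = 0.7
# Gap to the product form mu^2 of the limiting i.i.d. policy.
gaps = {w: joint_prob_both_positive(w, mu) - mu**2 for w in (10, 100, 1000)}
for w, gap in gaps.items():
    print(w, gap)
```

The gap shrinks proportionally to $1/w$, which is the single-review analogue of the factorization claimed in Proposition A.1.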

We first show that as $w$ goes to infinity, the $c$ reviews selected by $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})$ at each round $t^{(i)}$ come from the infinite pool of reviews $\{X_{-i}\}_{i=1}^{\infty}$ with high probability. Let $n_{w}=\lfloor\sqrt{w-t^{(k)}}\rfloor$ be the largest integer such that $n_{w}^{2}\leq w-t^{(k)}$ and let $w$ be large enough so that $n_{w}\geq 1$. We define $\mathcal{E}^{\textsc{OrderInf}}$ as the event that for every $i\in\{1,\ldots,k\}$ the $c$ reviews selected by $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})$ come from the set $\{X_{-1},\ldots,X_{-n_{w}^{2}}\}$. Lemma A.1 shows that $\mathcal{E}^{\textsc{OrderInf}}$ happens with high probability.

Lemma A.1.

${\mathbb{P}}[\mathcal{E}^{\textsc{OrderInf}}]\geq\Big[\frac{\binom{n_{w}^{2}}{c}}{\binom{w}{c}}\Big]^{k}$. Furthermore, $\lim_{w\to\infty}{\mathbb{P}}[\mathcal{E}^{\textsc{OrderInf}}]=1$.
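As a quick numerical illustration of Lemma A.1, the sketch below evaluates the lower bound for growing window sizes. The parameter values ($c=5$, $k=3$, $t^{(k)}=20$) and the function name are assumptions chosen only for illustration.

```python
from math import comb, isqrt

def order_inf_lower_bound(w: int, c: int, k: int, t_k: int) -> float:
    """Lower bound of Lemma A.1: [C(n_w^2, c) / C(w, c)]^k with n_w = floor(sqrt(w - t_k))."""
    n_w = isqrt(w - t_k)  # exact integer floor of the square root
    ratio = comb(n_w * n_w, c) / comb(w, c)
    return ratio ** k

# Illustrative parameters (not from the paper): c = 5 displayed reviews, k = 3 rounds.
bounds = [order_inf_lower_bound(w, c=5, k=3, t_k=20) for w in (10**3, 10**5, 10**7)]
for w, b in zip((10**3, 10**5, 10**7), bounds):
    print(w, round(b, 4))
```

For these values the bound increases toward 1 as $w$ grows, consistent with the second part of the lemma.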

Given that $\sigma^{\textsc{random}(w)}$ selects from the infinite pool of reviews with negative indices with high probability (if $\mathcal{E}^{\textsc{OrderInf}}$ holds), we now derive a concentration bound for the reviews in this pool. Recall that $X_{-\ell}\sim_{i.i.d.}{\mathrm{Bern}}(\mu)$ for $\ell\in\{1,\ldots,n_{w}^{2}\}$. We partition those reviews into $n_{w}$ groups each containing $n_{w}$ reviews. Let $\mathcal{E}^{\textsc{Conc}}$ be the event that, for each group, the sum of the review ratings concentrates around its mean $\mu n_{w}$, i.e.,

\[\mathcal{E}^{\textsc{Conc}}=\Big\{\forall j\in\{1,\ldots,n_{w}\}:\sum_{\ell=(j-1)\cdot n_{w}+1}^{j\cdot n_{w}}X_{-\ell}\in[\mu(n_{w}-n_{w}^{2/3}),\mu(n_{w}+n_{w}^{2/3})]\Big\}.\]

Our next lemma shows that the concentration event $\mathcal{E}^{\textsc{Conc}}$ happens with high probability.

Lemma A.2.

${\mathbb{P}}\Big[\mathcal{E}^{\textsc{Conc}}\Big]\geq 1-2n_{w}\exp\Big(-\frac{n_{w}^{1/3}\mu}{3}\Big)$. Furthermore, $\lim_{w\to\infty}{\mathbb{P}}\Big[\mathcal{E}^{\textsc{Conc}}\Big]=1$.
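The bound in Lemma A.2 is asymptotic in nature: it is vacuous for moderate $n_{w}$ and only becomes informative once $n_{w}^{1/3}$ is large. The following sketch evaluates it for an assumed value $\mu=0.5$; the function name is hypothetical.

```python
from math import exp

def conc_lower_bound(n_w: int, mu: float) -> float:
    """Lower bound of Lemma A.2: 1 - 2 * n_w * exp(-n_w^(1/3) * mu / 3)."""
    return 1.0 - 2.0 * n_w * exp(-(n_w ** (1.0 / 3.0)) * mu / 3.0)

# mu = 0.5 is an illustrative choice, not a value from the paper.
for n_w in (10**3, 10**6, 10**9):
    print(n_w, conc_lower_bound(n_w, 0.5))
```

With these inputs the bound is negative (hence uninformative) at $n_{w}=10^{3}$ and is close to 1 at $n_{w}=10^{9}$, which is all the limit statement requires.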

In order to decompose the left-hand-side of Proposition A.1 into a product of probabilities, we show the following independence lemma.

Lemma A.3.

Let $(y_{1},\ldots,y_{n_{w}^{2}})\in\{0,1\}^{n_{w}^{2}}$. Then conditioned on $(X_{-1},\ldots,X_{-n_{w}^{2}})=(y_{1},\ldots,y_{n_{w}^{2}})$ and $\mathcal{E}^{\textsc{OrderInf}}$, the events $\{\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}^{(i)}\}$ for $i=1,\ldots,k$ are independent.

Having shown independence across rounds, we prove that, for any round $t^{(i)}$, $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})$ is close to $\sigma^{\textsc{random}}$. Let $\mathcal{A}$ be the set of review rating sequences for which event $\mathcal{E}^{\textsc{Conc}}$ holds, i.e.,

\[\mathcal{A}=\Big\{(y_{1},\ldots,y_{n_{w}^{2}})\in\{0,1\}^{n_{w}^{2}}:\forall j\in\{1,\ldots,n_{w}\},\sum_{\ell=(j-1)\cdot n_{w}+1}^{j\cdot n_{w}}y_{\ell}\in[\mu(n_{w}-n_{w}^{2/3}),\mu(n_{w}+n_{w}^{2/3})]\Big\}.\]

Our next lemma shows that if the events $\mathcal{E}^{\textsc{OrderInf}}$ and $\mathcal{E}^{\textsc{Conc}}$ hold, then for any round $t^{(i)}$, the distribution of $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})$ is close to the distribution of $\sigma^{\textsc{random}}(\mathcal{H}_{t^{(i)}})$. To ease notation, we let $X_{-1:n_{w}^{2}}\coloneqq(X_{-1},\ldots,X_{-n_{w}^{2}})$ and $y_{1:n_{w}^{2}}\coloneqq(y_{1},\ldots,y_{n_{w}^{2}})$.

Lemma A.4.

There exists some function $f:\mathbb{N}\to\mathbb{R}$ satisfying $\lim_{n\to\infty}f(n)=0$ such that, assuming that $\mathcal{E}^{\textsc{OrderInf}}$ and $\mathcal{E}^{\textsc{Conc}}$ hold, for any $\bm{z}=(z_{1},\ldots,z_{c})\in\{0,1\}^{c}$ and $(y_{1},\ldots,y_{n_{w}^{2}})\in\mathcal{A}$:

\[\Big|{\mathbb{P}}\Big[\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}\,\Big|\,X_{-1:n_{w}^{2}}=y_{1:n_{w}^{2}}\Big]-{\mathbb{P}}\Big[\sigma^{\textsc{random}}(\mathcal{H}_{t^{(i)}})=\bm{z}\Big]\Big|\leq f(n_{w}).\]
Proof of Proposition A.1.

By Lemma A.1 and Lemma A.2, we can upper bound the probability that at least one of $\mathcal{E}^{\textsc{OrderInf}}$ and $\mathcal{E}^{\textsc{Conc}}$ fails to hold by

\[{\mathbb{P}}[(\neg\mathcal{E}^{\textsc{OrderInf}})\cup(\neg\mathcal{E}^{\textsc{Conc}})]\leq 1-\Big[\frac{\binom{n_{w}^{2}}{c}}{\binom{w}{c}}\Big]^{k}+2n_{w}\exp\Big(-\frac{n_{w}^{1/3}\mu}{3}\Big)\to_{w\to\infty}0. \tag{5}\]

We thus assume that $\mathcal{E}^{\textsc{OrderInf}}$ and $\mathcal{E}^{\textsc{Conc}}$ hold, which means we focus on sequences $(X_{-1},\ldots,X_{-n_{w}^{2}})\in\mathcal{A}$. By the law of total probability and the independence of $\sigma^{\textsc{random}(w)}$ across rounds (i.e., Lemma A.3):

\begin{align*}
&{\mathbb{P}}\Big[\forall i\in\{1,\ldots,k\}:\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}^{(i)}\,\Big|\,\mathcal{E}^{\textsc{OrderInf}},\mathcal{E}^{\textsc{Conc}}\Big] \tag{6}\\
&=\sum_{(y_{1},\ldots,y_{n_{w}^{2}})\in\mathcal{A}}\underbrace{\prod_{i=1}^{k}{\mathbb{P}}\Big[\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}^{(i)}\,\Big|\,\mathcal{E}^{\textsc{OrderInf}},X_{-1:n_{w}^{2}}=y_{1:n_{w}^{2}}\Big]}_{(\textsc{Dec})}\cdot\underbrace{{\mathbb{P}}\big[X_{-1:n_{w}^{2}}=y_{1:n_{w}^{2}}\,\big|\,\mathcal{E}^{\textsc{OrderInf}},\mathcal{E}^{\textsc{Conc}}\big]}_{\textsc{Seq}(y_{1:n_{w}^{2}})}.
\end{align*}

By Lemma A.4, we can upper and lower bound this decomposition by

\[\prod_{i=1}^{k}\Bigg({\mathbb{P}}\Big[\sigma^{\textsc{random}}(\mathcal{H}_{t^{(i)}})=\bm{z}^{(i)}\Big]-f(n_{w})\Bigg)\leq(\textsc{Dec})\leq\prod_{i=1}^{k}\Bigg({\mathbb{P}}\Big[\sigma^{\textsc{random}}(\mathcal{H}_{t^{(i)}})=\bm{z}^{(i)}\Big]+f(n_{w})\Bigg). \tag{7}\]

Recall that $\mathcal{A}$ consists of all sequences $(y_{1},\ldots,y_{n_{w}^{2}})$ that satisfy the event $\mathcal{E}^{\textsc{Conc}}$. Summing across those sequences and using the independence between the choice of $\sigma^{\textsc{random}(w)}$ (which determines $\mathcal{E}^{\textsc{OrderInf}}$) and the values of $X_{-1:n_{w}^{2}}$ (which determine $\mathcal{E}^{\textsc{Conc}}$), we obtain

\[\sum_{(y_{1},\ldots,y_{n_{w}^{2}})\in\mathcal{A}}\textsc{Seq}(y_{1:n_{w}^{2}})={\mathbb{P}}[\mathcal{E}^{\textsc{Conc}}|\mathcal{E}^{\textsc{OrderInf}}]=\underbrace{{\mathbb{P}}[\mathcal{E}^{\textsc{Conc}}]\to_{w\to\infty}1}_{\text{Lemma A.2}}. \tag{8}\]

We conclude the proof by combining (5), (6), (7), (8) and using the fact that $f(n_{w})\to_{w\to\infty}0$. ∎
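The final combination step is stated tersely. One way to spell it out is sketched below; the shorthand $A_{w}$, $E_{w}$, $p_{i}$ and $S_{w}$ is introduced here purely for illustration and does not appear in the original proof.

```latex
% Shorthand assumed for this sketch:
%   A_w = { for all i : sigma^{random(w)}(H_{t^(i)}) = z^(i) },
%   E_w = E^{OrderInf} \cap E^{Conc},
%   p_i = P[ sigma^{random}(H_{t^(i)}) = z^(i) ],   S_w = the sum in (8).
\begin{align*}
\big|\mathbb{P}[A_w]-\mathbb{P}[A_w\mid E_w]\big|
  &=\big|\mathbb{P}[A_w\cap\neg E_w]-\mathbb{P}[A_w\mid E_w]\,\mathbb{P}[\neg E_w]\big|
  \leq\mathbb{P}[\neg E_w]\to_{w\to\infty}0
  && \text{by (5)},\\
S_w\prod_{i=1}^{k}\big(p_i-f(n_w)\big)
  &\leq\mathbb{P}[A_w\mid E_w]
  \leq S_w\prod_{i=1}^{k}\big(p_i+f(n_w)\big)
  && \text{by (6) and (7)}.
\end{align*}
% The second line uses w large enough that f(n_w) <= min_i p_i, so all factors are nonnegative.
% Since f(n_w) -> 0 and S_w -> 1 by (8), both outer terms converge to prod_i p_i,
% and hence P[A_w] -> prod_i p_i, which is the statement of Proposition A.1.
```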

A.1 High probability bound on $\mathcal{E}^{\textsc{OrderInf}}$ (Proof of Lemma A.1)

Proof of Lemma A.1.

Fix a particular round $t^{(i)}$. Note that since $n_{w}=\lfloor\sqrt{w-t^{(k)}}\rfloor$, the $w$ most recent reviews that $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})$ considers contain $(X_{-1},\ldots,X_{-n_{w}^{2}})$. The probability that all $c$ reviews come from that set is $\frac{\binom{n_{w}^{2}}{c}}{\binom{w}{c}}$. For each round in $t^{(1)},\ldots,t^{(k)}$, the subset of $c$ reviews is independently drawn by $\sigma^{\textsc{random}(w)}$. Thus, ${\mathbb{P}}[\mathcal{E}^{\textsc{OrderInf}}]\geq\Big[\frac{\binom{n_{w}^{2}}{c}}{\binom{w}{c}}\Big]^{k}$. Next, notice that $n_{w}>\sqrt{w-t^{(k)}}-1$ gives $n_{w}^{2}\geq w-t^{(k)}-2\sqrt{w-t^{(k)}}$, so $n_{w}^{2}/w\to 1$ as $w\to\infty$. Since $c$ is fixed, $\frac{\binom{n_{w}^{2}}{c}}{\binom{w}{c}}=\prod_{i=0}^{c-1}\frac{n_{w}^{2}-i}{w-i}\to_{w\to\infty}1$, showing the second part. ∎

A.2 Concentration for infinite pool (Proof of Lemma A.2)

Proof of Lemma A.2.

For a fixed $j\in\{1,\ldots,n_{w}\}$, by the Chernoff bound, it holds that

\[{\mathbb{P}}\Big[\sum_{\ell=(j-1)n_{w}+1}^{jn_{w}}X_{-\ell}\in[\mu(n_{w}-n_{w}^{2/3}),\mu(n_{w}+n_{w}^{2/3})]\Big]\geq 1-2\exp\Big(-\frac{n_{w}^{1/3}\mu}{3}\Big).\]

By a union bound over the $n_{w}$ groups, the event $\mathcal{E}^{\textsc{Conc}}$ has probability at least $1-2n_{w}\exp\Big(-\frac{n_{w}^{1/3}\mu}{3}\Big)$. Note that $n_{w}\exp\Big(-\frac{n_{w}^{1/3}\mu}{3}\Big)\to_{w\to\infty}0$, which yields the second part. ∎
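For completeness, the Chernoff step can be unpacked as follows. The sketch assumes the standard two-sided multiplicative form of the bound; the symbols $S_{n}$ and $\delta$ are introduced here only for illustration.

```latex
% Standard multiplicative Chernoff bound: for S_n a sum of n independent
% Bern(mu) random variables and 0 < delta < 1,
\[
\mathbb{P}\big[\,|S_n-\mu n|\geq\delta\mu n\,\big]\leq 2\exp\Big(-\frac{\delta^{2}\mu n}{3}\Big).
\]
% Choose n = n_w and delta = n_w^{-1/3} (so that delta < 1 once n_w >= 2). Then
\[
\delta\mu n_w=\mu n_w^{2/3},
\qquad
\delta^{2}\mu n_w=\mu n_w^{1/3},
\]
% so each group sum lies in [mu(n_w - n_w^{2/3}), mu(n_w + n_w^{2/3})]
% with probability at least 1 - 2 exp(-n_w^{1/3} mu / 3).
```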

A.3 Independence of $\sigma^{\textsc{random}(w)}$ across rounds (Proof of Lemma A.3)

Proof of Lemma A.3.

By definition, $\sigma^{\textsc{random}(w)}$ samples a $c$-sized subset of reviews independently at every round. Thus, conditioning on the subset being sampled from $(X_{-1},\ldots,X_{-n_{w}^{2}})$, i.e., $\mathcal{E}^{\textsc{OrderInf}}$, and also conditioning on the exact values of these ratings, i.e., $(X_{-1},\ldots,X_{-n_{w}^{2}})=(y_{1},\ldots,y_{n_{w}^{2}})$, the draws of $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})$ are independent for $i=1,\ldots,k$. ∎

A.4 Single round approximation (Proof of Lemma A.4)

Let $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=(Z_{1},\ldots,Z_{c})\in\{0,1\}^{c}$ be the $c$ review ratings chosen from the most recent $w$ reviews. Recall that when $\mathcal{E}^{\textsc{OrderInf}}$ holds, $\sigma^{\textsc{random}(w)}$ selects reviews from $(X_{-1},\ldots,X_{-n_{w}^{2}})$ and that $(X_{-1},\ldots,X_{-n_{w}^{2}})=(y_{1},\ldots,y_{n_{w}^{2}})$. Let $S_{j}=\{y_{\ell}\}_{\ell=(j-1)\cdot n_{w}+1}^{j\cdot n_{w}}$ for $j\in\{1,\ldots,n_{w}\}$ be a partition of all reviews $(y_{1},\ldots,y_{n_{w}^{2}})$ into $n_{w}$ groups of $n_{w}$ reviews each. We first show that the reviews drawn by $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})$ come from different groups $S_{j}$ with high probability. Let $\mathcal{E}^{\textsc{diff\_group}}$ be the event that no two indices $m_{1}<m_{2}$ in $\{1,\ldots,c\}$ are such that $Z_{m_{1}}$ and $Z_{m_{2}}$ belong to the same group $S_{j}$ for some $j\in\{1,\ldots,n_{w}\}$. Our next lemma lower bounds the probability of $\mathcal{E}^{\textsc{diff\_group}}$.

Lemma A.5.

Assume that $\mathcal{E}^{\textsc{OrderInf}}$ and $\mathcal{E}^{\textsc{Conc}}$ hold. The probability that all selected reviews belong to different groups satisfies ${\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]\geq 1-c!\cdot c^{2}\frac{n_{w}^{2c-1}}{(n_{w}^{2}-c)^{c}}$ and thus $\lim_{w\to\infty}{\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]=1$.

Proof.

Let $\mathcal{E}_{m_{1},m_{2},j}$ be the event that $Z_{m_{1}},Z_{m_{2}}\in S_{j}$. Notice that $\mathcal{E}^{\textsc{diff\_group}}$ is exactly the event that none of the $\mathcal{E}_{m_{1},m_{2},j}$ hold. Further, note that $Z_{m_{1}},Z_{m_{2}}\in S_{j}$ implies that $Z_{m_{1}},Z_{m_{1}+1},\ldots,Z_{m_{2}}\in S_{j}$ since we output an ordered set of $c$ reviews by recency. There are thus $\binom{n_{w}}{m_{2}-m_{1}+1}$ ways to choose $Z_{m_{1}},Z_{m_{1}+1},\ldots,Z_{m_{2}}$ to be in the same group $S_{j}$. Hence the probability of $\mathcal{E}_{m_{1},m_{2},j}$ is at most

\[{\mathbb{P}}[\mathcal{E}_{m_{1},m_{2},j}]\leq\frac{\binom{n_{w}^{2}}{m_{1}-1}\binom{n_{w}}{m_{2}-m_{1}+1}\binom{n_{w}^{2}}{c-m_{2}}}{\binom{n_{w}^{2}}{c}}\]

as there are at most $\binom{n_{w}^{2}}{m_{1}-1}$ ways to choose $Z_{1},\ldots,Z_{m_{1}-1}$ and at most $\binom{n_{w}^{2}}{c-m_{2}}$ ways to choose $Z_{m_{2}+1},\ldots,Z_{c}$. Using the inequalities $\frac{(n-k)^{k}}{k!}\leq\binom{n}{k}\leq n^{k}$, we obtain that

\[{\mathbb{P}}[\mathcal{E}_{m_{1},m_{2},j}]\leq c!\cdot\frac{n_{w}^{2(m_{1}-1)}n_{w}^{m_{2}-m_{1}+1}n_{w}^{2(c-m_{2})}}{(n_{w}^{2}-c)^{c}}=c!\cdot\frac{n_{w}^{2c-(m_{2}-m_{1}+1)}}{(n_{w}^{2}-c)^{c}}\leq c!\cdot\frac{n_{w}^{2c-2}}{(n_{w}^{2}-c)^{c}}\]

since $m_{2}-m_{1}+1\geq 2$. Thus, via a union bound over $m_{1},m_{2}\in\{1,\ldots,c\}$ and $j\in\{1,\ldots,n_{w}\}$, i.e., at most $c^{2}n_{w}$ events, the probability of $\mathcal{E}^{\textsc{diff\_group}}$ is lower bounded as follows:

\[{\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]\geq 1-c^{2}n_{w}\cdot c!\cdot\frac{n_{w}^{2c-2}}{(n_{w}^{2}-c)^{c}}=1-c!\cdot c^{2}\frac{n_{w}^{2c-1}}{(n_{w}^{2}-c)^{c}}\to_{n_{w}\to\infty}1.\]
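To see how loose the union bound is, the sketch below compares it with the exact probability that a uniformly chosen $c$-subset of the $n_{w}^{2}$ pool reviews hits $c$ distinct groups. The closed form $\binom{n_{w}}{c}n_{w}^{c}/\binom{n_{w}^{2}}{c}$ is a direct count added here for illustration (choose the $c$ groups, then one review in each); it is not part of the original proof, and the function names are hypothetical.

```python
from math import comb, factorial

def exact_diff_group_prob(n_w: int, c: int) -> float:
    """Exact P[all c sampled reviews lie in distinct groups] for a uniform c-subset
    of n_w^2 reviews split into n_w groups of size n_w."""
    return comb(n_w, c) * n_w**c / comb(n_w * n_w, c)

def lemma_a5_lower_bound(n_w: int, c: int) -> float:
    """Lower bound of Lemma A.5: 1 - c! * c^2 * n_w^(2c-1) / (n_w^2 - c)^c."""
    return 1.0 - factorial(c) * c**2 * n_w ** (2 * c - 1) / (n_w * n_w - c) ** c

for n_w in (10, 100, 1000):
    for c in (2, 3):
        print(n_w, c, exact_diff_group_prob(n_w, c), lemma_a5_lower_bound(n_w, c))
```

Both quantities approach 1 as $n_{w}$ grows, with the exact probability converging noticeably faster than the bound.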

When the event $\mathcal{E}^{\textsc{diff\_group}}$ holds, the reviews $Z_{m}$ for $m=1,\ldots,c$ come from different groups $\{S_{j}\}_{j=1}^{n_{w}}$. We show that when this happens the values of the reviews $Z_{m}$ are independent. Let $\mathcal{E}_{j_{1},\ldots,j_{c}}$ be the event that review $Z_{m}$ comes from group $S_{j_{m}}$ for $m=1,\ldots,c$. The next lemma shows that conditioned on the event $\mathcal{E}_{j_{1},\ldots,j_{c}}$, the values of $Z_{1},\ldots,Z_{c}$ are independent. (Footnote 16: This is not the case without conditioning on $\mathcal{E}_{j_{1},\ldots,j_{c}}$. As an example, consider two ordered reviews $(Z_{1},Z_{2})$ drawn uniformly from the four ordered review ratings $(1,0,1,1)$. Conditioned on $Z_{1}=0$, we have $Z_{2}=1$ deterministically, while conditioned on $Z_{1}=1$, we have ${\mathbb{P}}[Z_{2}=0|Z_{1}=1]=\frac{1}{4}$. As a result, $Z_{1}$ and $Z_{2}$ are correlated.) Recall that $\mathcal{E}^{\textsc{Conc}}$ implies that $(y_{1},\ldots,y_{n_{w}^{2}})\in\mathcal{A}$.
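The counterexample in the footnote can be checked by brute-force enumeration. The sketch below lists all ways of choosing two positions among the ratings $(1,0,1,1)$, keeps them ordered by position, and computes the conditional probabilities exactly; the helper name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

ratings = (1, 0, 1, 1)
# All equally likely ordered pairs (Z1, Z2): choose 2 positions, keep their order.
pairs = [(ratings[i], ratings[j]) for i, j in combinations(range(len(ratings)), 2)]

def cond_prob_second_is_zero(first: int) -> Fraction:
    """Exact P[Z2 = 0 | Z1 = first] under uniform sampling of ordered pairs."""
    matching = [p for p in pairs if p[0] == first]
    return Fraction(sum(1 for p in matching if p[1] == 0), len(matching))

print(cond_prob_second_is_zero(0))  # Z1 = 0 forces Z2 = 1
print(cond_prob_second_is_zero(1))
```

Since the two conditional probabilities differ, $Z_{1}$ and $Z_{2}$ are not independent without conditioning on the groups.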

Lemma A.6.

Let $j_{1},j_{2},\ldots,j_{c}\in\{1,\ldots,n_{w}\}$ be group indices with $j_{1}<j_{2}<\ldots<j_{c}$. Conditioned on $\mathcal{E}_{j_{1},\ldots,j_{c}}$, $\mathcal{E}^{\textsc{Conc}}$, $\mathcal{E}^{\textsc{OrderInf}}$, and $(y_{1},\ldots,y_{n_{w}^{2}})\in\mathcal{A}$, for any vector of review ratings $\bm{z}\in\{0,1\}^{c}$, the events $\{Z_{m}=z_{m}\}$ for $m=1,\ldots,c$ are independent. Furthermore,

\[{\mathbb{P}}\big[Z_{m}=z_{m}\big]=\Big(\frac{\sum_{\ell\in S_{j_{m}}}y_{\ell}}{n_{w}}\Big)^{z_{m}}\Big(1-\frac{\sum_{\ell\in S_{j_{m}}}y_{\ell}}{n_{w}}\Big)^{1-z_{m}}.\]
Proof.

For any specific review ratings $y_{\ell_{1}}\in S_{j_{1}},\ldots,y_{\ell_{c}}\in S_{j_{c}}$, applying Bayes' rule,

\[{\mathbb{P}}\big[Z_{1}=y_{\ell_{1}},\ldots,Z_{c}=y_{\ell_{c}}|\mathcal{E}_{j_{1},\ldots,j_{c}}\big]=\frac{{\mathbb{P}}\big[Z_{1}=y_{\ell_{1}},\ldots,Z_{c}=y_{\ell_{c}},\mathcal{E}_{j_{1},\ldots,j_{c}}\big]}{{\mathbb{P}}\big[\mathcal{E}_{j_{1},\ldots,j_{c}}\big]}.\]

There are $(n_{w})^{c}$ choices for the set of $c$ ratings $(Z_{1},\ldots,Z_{c})$ so that $Z_{m}\in S_{j_{m}}$ for $m=1,\ldots,c$ and $\binom{(n_{w})^{2}}{c}$ total number of choices. Hence $\mathcal{E}_{j_{1},\ldots,j_{c}}$ holds with probability ${\mathbb{P}}\big[\mathcal{E}_{j_{1},\ldots,j_{c}}\big]=\frac{(n_{w})^{c}}{\binom{(n_{w})^{2}}{c}}$. Given that there is exactly one choice for the $c$ reviews $(Z_{1},\ldots,Z_{c})$ which satisfies $Z_{1}=y_{\ell_{1}},\ldots,Z_{c}=y_{\ell_{c}}$:

\[{\mathbb{P}}\big[Z_{1}=y_{\ell_{1}},\ldots,Z_{c}=y_{\ell_{c}}|\mathcal{E}_{j_{1},\ldots,j_{c}}\big]=\frac{\frac{1}{\binom{(n_{w})^{2}}{c}}}{\frac{(n_{w})^{c}}{\binom{(n_{w})^{2}}{c}}}=\frac{1}{(n_{w})^{c}}. \tag{9}\]

Since this holds for any $y_{\ell_{1}}\in S_{j_{1}},\ldots,y_{\ell_{c}}\in S_{j_{c}}$, we obtain independence of $\{Z_{m}=z_{m}\}$ for $m=1,\ldots,c$. By summing (9) over the reviews chosen for the other $c-1$ positions, for any particular $y_{\ell_{m}}\in S_{j_{m}}$, we have ${\mathbb{P}}[Z_{m}=y_{\ell_{m}}|\mathcal{E}_{j_{1},\ldots,j_{c}}]=\frac{1}{n_{w}}$. Therefore, the probability that $Z_{m}$ has a particular value $z_{m}\in\{0,1\}$ is equal to the fraction of ratings in $S_{j_{m}}$ which have value $z_{m}$, i.e.,

\[{\mathbb{P}}\big[Z_{m}=z_{m}\big]=\Big(\frac{\sum_{\ell\in S_{j_{m}}}y_{\ell}}{n_{w}}\Big)^{z_{m}}\Big(1-\frac{\sum_{\ell\in S_{j_{m}}}y_{\ell}}{n_{w}}\Big)^{1-z_{m}}.\]
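The factorization in Lemma A.6 can be verified exactly on a toy instance. The sketch below uses assumed sizes $n_{w}=3$, $c=2$ and a hypothetical rating sequence; it enumerates all $c$-subsets, conditions on the pair of groups, and checks that the conditional joint distribution equals the product of the per-group fractions.

```python
from fractions import Fraction
from itertools import combinations

N_W, C = 3, 2                      # toy sizes, chosen only for illustration
Y = (1, 0, 1, 0, 0, 1, 1, 1, 1)    # hypothetical ratings y_1..y_9, three groups of three

def group_of(index: int) -> int:
    return index // N_W

def conditional_joint(j1: int, j2: int) -> dict:
    """Exact joint law of (Z1, Z2) given that Z1 is in group j1 and Z2 in group j2."""
    subsets = [s for s in combinations(range(N_W * N_W), C)
               if (group_of(s[0]), group_of(s[1])) == (j1, j2)]
    joint = {}
    for s in subsets:
        key = (Y[s[0]], Y[s[1]])
        joint[key] = joint.get(key, Fraction(0)) + Fraction(1, len(subsets))
    return joint

def group_fraction(j: int) -> Fraction:
    """Fraction of positive ratings in group j."""
    return Fraction(sum(Y[j * N_W:(j + 1) * N_W]), N_W)

def factorizes(j1: int, j2: int) -> bool:
    joint = conditional_joint(j1, j2)
    f1, f2 = group_fraction(j1), group_fraction(j2)
    marginal = lambda f, z: f if z == 1 else 1 - f
    return all(joint.get((a, b), Fraction(0)) == marginal(f1, a) * marginal(f2, b)
               for a in (0, 1) for b in (0, 1))

print(all(factorizes(j1, j2) for j1 in range(N_W) for j2 in range(j1 + 1, N_W)))
```

The check succeeds because, once the groups are fixed, every combination of one review per group is equally likely, which is exactly what equation (9) expresses.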

We next show that conditioned on the events $\mathcal{E}^{\textsc{diff\_group}}$, $\mathcal{E}^{\textsc{Conc}}$, and $\mathcal{E}^{\textsc{OrderInf}}$, the distribution of $\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})$ is close to the distribution of $\sigma^{\textsc{random}}$. Given that the latter consists of i.i.d. ${\mathrm{Bern}}(\mu)$ ratings, ${\mathbb{P}}(\sigma^{\textsc{random}}=\bm{z})=\mu^{N(\bm{z})}(1-\mu)^{c-N(\bm{z})}$ for any vector of review ratings $\bm{z}\in\{0,1\}^{c}$. Recall that $N(\bm{z})\coloneqq N_{\bm{z}}=\sum_{i=1}^{c}z_{i}$. Using the independence across different groups (Lemma A.6), the following lemma shows that ${\mathbb{P}}[\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}]$ has a similar decomposition as ${\mathbb{P}}[\sigma^{\textsc{random}}=\bm{z}]$.

Lemma A.7.

Conditioned on $\mathcal{E}_{j_{1},\ldots,j_{c}}$, $\mathcal{E}^{\textsc{Conc}}$, $\mathcal{E}^{\textsc{OrderInf}}$, and $(y_{1},\ldots,y_{n_{w}^{2}})\in\mathcal{A}$, for any vector of review ratings $\bm{z}\in\{0,1\}^{c}$, ${\mathbb{P}}[\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}]$ is lower bounded by $(\mu(1-n_{w}^{-\frac{1}{3}}))^{N(\bm{z})}(1-\mu(1+n_{w}^{-\frac{1}{3}}))^{c-N(\bm{z})}$ and upper bounded by $(\mu(1+n_{w}^{-\frac{1}{3}}))^{N(\bm{z})}(1-\mu(1-n_{w}^{-\frac{1}{3}}))^{c-N(\bm{z})}$.

Proof.

Using Lemma A.6:

\[{\mathbb{P}}[\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}|\mathcal{E}_{j_{1},\ldots,j_{c}}]=\prod_{m=1}^{c}\Big(\frac{\sum_{\ell\in S_{j_{m}}}y_{\ell}}{n_{w}}\Big)^{z_{m}}\Big(1-\frac{\sum_{\ell\in S_{j_{m}}}y_{\ell}}{n_{w}}\Big)^{1-z_{m}}.\]

As $(y_{1},\ldots,y_{n_{w}^{2}})\in\mathcal{A}$, we have that $\mu(1-n_{w}^{-\frac{1}{3}})\leq\frac{\sum_{\ell\in S_{j_{m}}}y_{\ell}}{n_{w}}\leq\mu(1+n_{w}^{-\frac{1}{3}})$ for all $m=1,\ldots,c$. Applying these inequalities for each $m$ yields

\begin{align*}
&{\mathbb{P}}[\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}|\mathcal{E}_{j_{1},\ldots,j_{c}}]\\
&\in\Big[(\mu(1-n_{w}^{-\frac{1}{3}}))^{N(\bm{z})}(1-\mu(1+n_{w}^{-\frac{1}{3}}))^{c-N(\bm{z})},(\mu(1+n_{w}^{-\frac{1}{3}}))^{N(\bm{z})}(1-\mu(1-n_{w}^{-\frac{1}{3}}))^{c-N(\bm{z})}\Big]
\end{align*}

for any $\mathcal{E}_{j_{1},\ldots,j_{c}}$. As $\mathcal{E}^{\textsc{diff\_group}}$ is the union of the events $\mathcal{E}_{j_{1},\ldots,j_{c}}$ over all distinct indices $j_{1}<\ldots<j_{c}$, this implies

\begin{align*}
&{\mathbb{P}}[\sigma^{\textsc{random}(w)}(\mathcal{H}_{t^{(i)}})=\bm{z}|\mathcal{E}^{\textsc{diff\_group}}]\\
&\in\Big[(\mu(1-n_{w}^{-\frac{1}{3}}))^{N(\bm{z})}(1-\mu(1+n_{w}^{-\frac{1}{3}}))^{c-N(\bm{z})},(\mu(1+n_{w}^{-\frac{1}{3}}))^{N(\bm{z})}(1-\mu(1-n_{w}^{-\frac{1}{3}}))^{c-N(\bm{z})}\Big].
\end{align*}

Proof of Lemma A.4.

The law of total probability yields the following decomposition

[(Z1,,Zc)=𝒛]\displaystyle{\mathbb{P}}[(Z_{1},\ldots,Z_{c})=\bm{z}] =[(Z1,,Zc)=𝒛|diff_group][diff_group]\displaystyle={\mathbb{P}}[(Z_{1},\ldots,Z_{c})=\bm{z}|\mathcal{E}^{\textsc{diff\_group}}]{\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]
+[(Z1,,Zc)=𝒛|¬diff_group][¬diff_group].\displaystyle+{\mathbb{P}}[(Z_{1},\ldots,Z_{c})=\bm{z}|\neg\mathcal{E}^{\textsc{diff\_group}}]{\mathbb{P}}[\neg\mathcal{E}^{\textsc{diff\_group}}].

By this decomposition and Lemma A.7, we can lower bound [(Z1,,Zc)=𝒛]{\mathbb{P}}[(Z_{1},\ldots,Z_{c})=\bm{z}] by

[(Z1,,Zc)=𝒛|diff_group][diff_group](μ(1nw13))N(𝒛)(1μ(1+nw13))cN(𝒛)[diff_group]LB(nw,𝒛)\displaystyle{\mathbb{P}}[(Z_{1},\ldots,Z_{c})=\bm{z}|\mathcal{E}^{\textsc{diff\_group}}]{\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]\geq\underbrace{(\mu(1-n_{w}^{-\frac{1}{3}}))^{N(\bm{z})}(1-\mu(1+n_{w}^{-\frac{1}{3}}))^{c-N(\bm{z})}{\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]}_{\textsc{LB}(n_{w},\bm{z})}

and upper bound the same probability by

[(Z1,,Zc)=𝒛|diff_group]\displaystyle{\mathbb{P}}[(Z_{1},\ldots,Z_{c})=\bm{z}|\mathcal{E}^{\textsc{diff\_group}}] +1[diff_group]\displaystyle+1-{\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]
(μ(1+nw13))N(𝒛)(1μ(1nw13))cN(𝒛)+1[diff_group]UB(nw,𝒛)\displaystyle\leq\underbrace{(\mu(1+n_{w}^{-\frac{1}{3}}))^{N(\bm{z})}(1-\mu(1-n_{w}^{-\frac{1}{3}}))^{c-N(\bm{z})}+1-{\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]}_{\textsc{UB}(n_{w},\bm{z})}

Therefore,

|[(Z1,,Zc)=𝒛]μN(𝒛)(1μ)cN(𝒛)|f(𝒛,nw)\displaystyle|{\mathbb{P}}[(Z_{1},\ldots,Z_{c})=\bm{z}]-\mu^{N(\bm{z})}(1-\mu)^{c-N(\bm{z})}|\leq f(\bm{z},n_{w})

where

f(𝒛,nw)=max(|LB(nw,𝒛)μN(𝒛)(1μ)cN(𝒛)|,|UB(nw,𝒛)μN(𝒛)(1μ)cN(𝒛)|).\displaystyle f(\bm{z},n_{w})=\max\Big{(}|\textsc{LB}(n_{w},\bm{z})-\mu^{N(\bm{z})}(1-\mu)^{c-N(\bm{z})}|,|\textsc{UB}(n_{w},\bm{z})-\mu^{N(\bm{z})}(1-\mu)^{c-N(\bm{z})}|\Big{)}.

Recall that ${\mathbb{P}}[\sigma^{\textsc{random}}=\bm{z}]=\mu^{N(\bm{z})}(1-\mu)^{c-N(\bm{z})}$. Since ${\mathbb{P}}[\mathcal{E}^{\textsc{diff\_group}}]\to_{n_{w}\to\infty}1$ (Lemma A.5) and $n_{w}^{-\frac{1}{3}}\to 0$, we obtain that $f(\bm{z},n_{w})\to_{n_{w}\to\infty}0$ for any $\bm{z}\in\{0,1\}^{c}$ because

UB(nw,𝒛),LB(nw,𝒛)nwμN(𝒛)(1μ)cN(𝒛).\textsc{UB}(n_{w},\bm{z}),\textsc{LB}(n_{w},\bm{z})\to_{n_{w}\to\infty}\mu^{N(\bm{z})}(1-\mu)^{c-N(\bm{z})}.

As a result, f(nw)=max𝒛{0,1}cf(𝒛,nw)f(n_{w})=\max_{\bm{z}\in\{0,1\}^{c}}f(\bm{z},n_{w}) satisfies the desired property and concludes the proof. ∎

Appendix B Generalizing beyond Beta-Bernoulli distributions (Remark 2.1)

We consider a generalization of the customer behavior model in Section 2. We only point to the modeling assumptions that change; everything else remains as in Section 2:

  1. 1.

    With respect to the customer valuation, the product’s unobservable part $\mu_{t}$ is drawn from an arbitrary distribution $\mathcal{D}$ with mean $\mu$ and finite support $S=\{r_{1},\ldots,r_{s}\}\subset\mathbb{R}$ where $r_{1}<\ldots<r_{s}$ and $s\geq 2$. That is, $\mathcal{D}$ is no longer restricted to be Bernoulli with $S=\{0,1\}$ and $\mu$ in $(0,1)$. This can capture a system where $\mu_{t}$ is the number of stars ($S=\{1,\ldots,5\}$), which is common on online platforms such as Tripadvisor, Airbnb, and Amazon.

  2. 2.

    With respect to the customer purchase behavior, when presented with a vector of cc reviews 𝒁t=(Zt,1,,Zt,c)Sc\bm{Z}_{t}=(Z_{t,1},\ldots,Z_{t,c})\in S^{c}, the customer maps them to an estimated valuation V^t=Θt+h^(𝒁t)\hat{V}_{t}=\Theta_{t}+\hat{h}(\bm{Z}_{t}), where h^:Sc\hat{h}:S^{c}\to\mathbb{R} is an arbitrary fixed mapping. The model of Section 2.1 is a special case where the estimate h^\hat{h} is created via a two-stage process: the customer initially creates a posterior belief Φt=Beta(a+i=1cZt,i,b+ci=1cZt,i)\Phi_{t}=\mathrm{Beta}\big{(}a+\sum_{i=1}^{c}Z_{t,i},b+c-\sum_{i=1}^{c}Z_{t,i}\big{)} and maps this posterior to an estimate h^(𝒁t)=h(Φt)\hat{h}(\bm{Z}_{t})=h(\Phi_{t}) via a mapping hh. Unlike this special case (where the estimate can only depend on the number of positive reviews in the cc displayed reviews), our generalization here allows an arbitrary mapping from 𝒁t\bm{Z}_{t} that can also take the order of reviews into consideration.

  3. 3.

    With respect to the customer review generation, the review is $X_{t}=\mu_{t}$ in the event of a purchase and $X_{t}=\perp$ otherwise. The difference from Section 2 is that $\mu_{t}$ is not restricted to be Bernoulli.

  4. 4.

    We assume that the estimator h^(𝒛)\hat{h}(\bm{z}) is strictly increasing in each coordinate of the review ratings 𝒛\bm{z}. This extends Assumption 2.1 in a way that can capture the order of the reviews.

  5. 5.

    We assume that the customer’s smallest estimated valuation $\underline{\hat{V}}=\Theta+\underline{h}$, where $\underline{h}=\hat{h}(r_{1},\ldots,r_{1})$, has nonzero mass on non-negative numbers, i.e., ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\underline{\hat{V}}\geq 0]>0$. This extends Assumption 2.2 and ensures that there exist non-negative prices inducing positive purchase probability.

The main driver behind all results in Sections 3 and 4 is the characterization of the stationary distribution of the Markov chain 𝒁t=(Zt,1,,Zt,c)\bm{Z}_{t}=(Z_{t,1},\ldots,Z_{t,c}) of the newest cc reviews.

  • Estimator $\hat{h}$ generalizes the customer’s purchase behavior. The purchase probability at a state with review ratings $\bm{z}\in S^{c}$ is ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\hat{h}(\bm{z})\geq\rho(\bm{z})]$ where $\rho$ is the pricing policy.

  • The review generation process changes the transition dynamics of 𝒁t\bm{Z}_{t} upon a purchase, i.e., a new review takes one of the values {r1,,rs}\{r_{1},\ldots,r_{s}\} (as opposed to {0,1}\{0,1\} in the original model).

  • Letting g𝒟g_{\mathcal{D}} be the probability mass function of 𝒟\mathcal{D}, the stationary distribution of 𝒁t\bm{Z}_{t} is

    \[\pi_{\bm{z}}=\kappa\cdot\frac{\prod_{i=1}^{c}g_{\mathcal{D}}(z_{i})}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\hat{h}(\bm{z})\geq\rho(\bm{z})]}\quad\text{where }\kappa=\frac{1}{{\mathbb{E}}_{Z_{1},\ldots,Z_{c}\sim_{i.i.d.}\mathcal{D}}\Big[\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\hat{h}(Z_{1},\ldots,Z_{c})\geq\rho(Z_{1},\ldots,Z_{c})]}\Big]}.\]
  • This is analogous to Lemma 3.1 and Lemma D.5 for static and dynamic pricing in the Beta-Bernoulli model. In particular, Lemma D.5 corresponds to replacing $g_{\mathcal{D}}(z_{i})$ and $\hat{h}(\bm{z})$ by $\mu^{z_{i}}(1-\mu)^{1-z_{i}}$ and $h(N_{\bm{z}})$ in the expression for $\pi$, and the expectation ${\mathbb{E}}_{Z_{1},\ldots,Z_{c}\sim_{i.i.d.}\mathcal{D}}$ and $\hat{h}(Z_{1},\ldots,Z_{c})$ by ${\mathbb{E}}_{Y_{1},\ldots,Y_{c}\sim{\mathrm{Bern}}(\mu)}$ and $h(\sum_{i=1}^{c}Y_{i})$ in the expression for $\kappa$. The proof is completely analogous.

Using this stationary distribution and similar steps as in the proofs of Proposition 3.3 and Proposition 4.2, the revenue of ${\sigma^{\textsc{newest}}}$ in the generalized model becomes:

\[\textsc{Rev}(\sigma^{\textsc{newest}},\rho)=\frac{{\mathbb{E}}_{Z_{1},\ldots,Z_{c}\sim_{i.i.d.}\mathcal{D}}[\rho(\bm{Z})]}{{\mathbb{E}}_{Z_{1},\ldots,Z_{c}\sim_{i.i.d.}\mathcal{D}}\Big[\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\hat{h}(\bm{Z})\geq\rho(\bm{Z})]}\Big]}.\]

The revenue of σrandom{\sigma^{\textsc{random}}} (which shows cc i.i.d. reviews from 𝒟\mathcal{D}) directly follows by adapting the purchase probability in Proposition 3.2 and Theorem 4.3:

Rev(σrandom,ρ)=𝔼Z1,,Zci.i.d.𝒟[ρ(𝒁)Θ[Θ+h^(Z1,,Zc)ρ(𝒁)]].{\textsc{Rev}}({\sigma^{\textsc{random}}},\rho)={\mathbb{E}}_{Z_{1},\ldots,Z_{c}\sim_{i.i.d.}\mathcal{D}}\Big{[}\rho(\bm{Z})\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\hat{h}(Z_{1},\ldots,Z_{c})\geq\rho(\bm{Z})]\Big{]}.
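As an informal numerical check of the stationary distribution and the two revenue expressions above, the following Python sketch builds the Newest First chain for a small hypothetical instance and verifies them with exact rational arithmetic. The rating set, the pmf `g`, the order-sensitive purchase probability `q`, and the price `rho` are all made-up illustrative assumptions and are not taken from the paper.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical instance: ratings S = {1, 2, 3} with pmf g, c = 2 displayed reviews.
S = (1, 2, 3)
g = {1: Fr(1, 5), 2: Fr(3, 10), 3: Fr(1, 2)}
c = 2
states = list(product(S, repeat=c))  # z = (newest, ..., oldest)

def q(z):
    """Assumed purchase probability P[Theta + h_hat(z) >= rho(z)] (order-sensitive)."""
    return Fr(2 * z[0] + z[1], 12)   # the newest review weighs twice as much

def rho(z):
    """Assumed dynamic price charged at state z."""
    return Fr(1) + Fr(z[0], 10)

def prod_g(z):
    """Product of g over the coordinates of z."""
    out = Fr(1)
    for r in z:
        out *= g[r]
    return out

# Transition matrix: no purchase -> stay; purchase -> prepend a fresh review.
P = {z: {w: Fr(0) for w in states} for z in states}
for z in states:
    P[z][z] += 1 - q(z)
    for r in S:
        P[z][(r,) + z[:-1]] += q(z) * g[r]

# Candidate stationary distribution: pi_z = kappa * prod_i g(z_i) / q(z).
kappa = 1 / sum(prod_g(z) / q(z) for z in states)
pi = {z: kappa * prod_g(z) / q(z) for z in states}
pi_next = {w: sum(pi[z] * P[z][w] for z in states) for w in states}

# Revenue under Newest First, computed from the chain and from the closed form.
rev_newest_chain = sum(pi[z] * rho(z) * q(z) for z in states)
rev_newest_formula = kappa * sum(prod_g(z) * rho(z) for z in states)
# Revenue under Random (c i.i.d. reviews from D).
rev_random = sum(prod_g(z) * rho(z) * q(z) for z in states)

print(pi_next == pi, rev_newest_chain == rev_newest_formula)
```

Exact fractions are used so that stationarity can be tested with equality rather than a numerical tolerance.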

Theorems 3.1 and 4.2 then extend to our generalized model by analogously adapting their proofs. With respect to the negative results (Theorem 3.2 and Theorem 4.1), given that the model in the main body is a special case of the generalized model, they also directly extend. With respect to Section 5, similar extensions can be derived under appropriate assumptions; given that those results are about a specific time-discounting model, we do not focus on this extension.

Remark B.1.

The generalized model allows for customers who are fully Bayesian in estimating the fixed valuation given the ordering policy σ\sigma and use an estimator mapping h^=h^FullyBayesian\hat{h}=\hat{h}_{\textsc{FullyBayesian}}. Such customers could be reactive to σ\sigma and account for the effect of CoNF; the extension of the CoNF result may thus seem surprising. The reason why this occurs is that CoNF evaluates the customers assuming that they follow the same behavioral model under σrandom{\sigma^{\textsc{random}}} and under σnewest{\sigma^{\textsc{newest}}} (and does not consider the setting where customers are reactive to the ordering policy).

Appendix C Supplementary material for Section 3

C.1 Absorbing prices cause newest first to get “stuck” (Proposition 3.1)

Proposition 3.1 posits that ${\textsc{Rev}}({\sigma^{\textsc{newest}}},p)=0$ for absorbing prices $p$. Let $\bm{Z}_{t}=(Z_{t,i})_{i=1}^{c}$ denote the vector of the $c$ most recent reviews at round $t$. As argued in Section 3.2, $\bm{Z}_{t}$ is a time-homogeneous Markov chain with a finite state space $\{0,1\}^{c}$. We say that a state $(z_{1},\ldots,z_{c})\in\{0,1\}^{c}$ of this Markov chain is absorbing if its purchase probability is zero, i.e., if $\sum_{i=1}^{c}z_{i}<n_{0}$, where $n_{0}$ is the smallest number of positive reviews that induces a positive purchase probability. Given that $p$ is absorbing, an absorbing state will always exist. Once $\bm{Z}_{t}$ enters an absorbing state, no purchase is made thereafter and it stays at the absorbing state forever. Let $\tau(\bm{Z})$ be the first round that $\bm{Z}_{t}$ enters an absorbing state; after this round no purchase is ever made.

To simplify the analysis, we define a fictitious Markov chain $\tilde{\bm{Z}}_{t}$ on the same state space that makes transitions with probability $\eta=0$ if ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(c)\geq p]=0$ and $\eta={\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n_{0})\geq p]$ otherwise, where $n_{0}$ is the smallest number of positive reviews that induces a positive purchase probability. If $\tilde{\bm{Z}}_{t}$ is at state $(z_{1},\ldots,z_{c})\in\{0,1\}^{c}$, then with probability $1-\eta$, there is no purchase and it remains at the same state. With probability $\eta$, there is a purchase (and a review); the review is positive with probability $\mu$ (and the state transitions to $(1,z_{1},\ldots,z_{c-1})$) and the review is negative with probability $1-\mu$ (and the state transitions to $(0,z_{1},\ldots,z_{c-1})$). Notice that $\tilde{\bm{Z}}_{t}$ has the same transitions as $\bm{Z}_{t}$ with the only difference being that the purchase probability is always $\eta$. Let $\nu(\tilde{\bm{Z}})$ be the first time that $\tilde{\bm{Z}}_{t}$ enters the state $(0,\ldots,0)$. Our proof relies on the following lemmas.

Lemma C.1.

For η>0\eta>0, 𝐙\bm{Z} enters an absorbing state before 𝐙~\tilde{\bm{Z}} enters (0,,0)(0,\ldots,0): 𝔼[τ(𝐙)]𝔼[ν(𝐙~)]{\mathbb{E}}[\tau(\bm{Z})]\leq{\mathbb{E}}[\nu(\tilde{\bm{Z}})].

Proof.

Consider the following coupling of 𝒁t\bm{Z}_{t} and 𝒁~t\tilde{\bm{Z}}_{t}. Let {Ri}i=1\{R_{i}\}_{i=1}^{\infty} where Rii.i.d.Bern(μ)R_{i}\sim_{i.i.d.}{\mathrm{Bern}}(\mu) be an infinite sequence of i.i.d. Bern(μ){\mathrm{Bern}}(\mu) reviews. We will couple the processes 𝒁t\bm{Z}_{t} and 𝒁~t\tilde{\bm{Z}}_{t} by setting 𝒁t=(Rit,,Rit+c1)\bm{Z}_{t}=(R_{i_{t}},\ldots,R_{i_{t}+c-1}) and 𝒁t~=(Ri~t,,Ri~t+c1)\tilde{\bm{Z}_{t}}=(R_{\tilde{i}_{t}},\ldots,R_{\tilde{i}_{t}+c-1}) where we will update the indices iti_{t} and i~t\tilde{i}_{t} as described below. Initially, i1=i~1=1i_{1}=\tilde{i}_{1}=1. Let qt=Θ[Θ+h(i=0c1Rit+i)p]q_{t}={\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\sum_{i=0}^{c-1}R_{i_{t}+i})\geq p] be the purchase probability at time tt of the process 𝒁t\bm{Z}_{t}. To ensure a purchase occurs in 𝒁~t\tilde{\bm{Z}}_{t} only if it also occurs in 𝒁t\bm{Z}_{t} as long as 𝒁t\bm{Z}_{t} is not at an absorbing state, we draw a random number Xt𝒰[0,1]X_{t}\sim\mathcal{U}[0,1] and use the following coupling:

  • Case 1: Xt<ηX_{t}<\eta (both processes get a purchase). it+1=it+1i_{t+1}=i_{t}+1 and i~t+1=i~t+1\tilde{i}_{t+1}=\tilde{i}_{t}+1.

  • Case 2: ηXtqt\eta\leq X_{t}\leq q_{t} (only 𝒁t\bm{Z}_{t} gets a purchase). it+1=it+1i_{t+1}=i_{t}+1 and i~t+1=i~t\tilde{i}_{t+1}=\tilde{i}_{t}.

  • Case 3: Xt>qtX_{t}>q_{t}. it+1=iti_{t+1}=i_{t} and i~t+1=i~t\tilde{i}_{t+1}=\tilde{i}_{t}.

If $\bm{Z}_{t}$ is at an absorbing state, then we only update $\tilde{\bm{Z}}_{t}$ (i.e., $\tilde{i}_{t+1}=\tilde{i}_{t}+1$ with probability $\eta$ and otherwise $\tilde{i}_{t+1}=\tilde{i}_{t}$). This coupling ensures that $i_{t}\geq\tilde{i}_{t}$ as long as $\bm{Z}_{t}$ is not at an absorbing state. Since the state $(0,\ldots,0)$ is absorbing, it cannot be the case that $\tilde{\bm{Z}}_{t}$ enters $(0,\ldots,0)$ before $\bm{Z}_{t}$ enters an absorbing state. Thus $\tau(\bm{Z})\leq\nu(\tilde{\bm{Z}})$, which yields the result after taking expectations. ∎

Lemma C.2.

For $\eta>0$, $\tilde{\bm{Z}}_{t}$ enters $(0,0,\ldots,0)$ after finitely many rounds in expectation, i.e., ${\mathbb{E}}[\nu(\tilde{\bm{Z}})]<\infty$.

Proof.

Consider the process $\bm{Y}_{t}=\tilde{\bm{Z}}_{c\cdot t}$ for $t=0,1,\ldots$. For any $\bm{z}\in\{0,1\}^{c}$, the state $(0,\ldots,0)$ is reached within the next $c$ rounds whenever these rounds consist of $c$ consecutive purchases with negative reviews, so:

[𝒀t+1=(0,,0)|𝒀t=𝒛](η(1μ))c.{\mathbb{P}}[\bm{Y}_{t+1}=(0,\ldots,0)|\bm{Y}_{t}=\bm{z}]\geq(\eta(1-\mu))^{c}. (10)

Let $\nu(\bm{Y})=\min\{t\text{ s.t. }\bm{Y}_{t}=(0,\ldots,0)\}$. By (10), $\nu(\bm{Y})$ is stochastically dominated by a Geometric random variable with success probability $(\eta(1-\mu))^{c}$, implying ${\mathbb{E}}[\nu(\bm{Y})]\leq\frac{1}{(\eta(1-\mu))^{c}}$. Since $\bm{Y}_{t}$ only keeps track of every $c$-th value of $\tilde{\bm{Z}}_{t}$, $c\cdot\nu(\bm{Y})\geq\nu(\tilde{\bm{Z}})$ and thus ${\mathbb{E}}[\nu(\tilde{\bm{Z}})]\leq c\,{\mathbb{E}}[\nu(\bm{Y})]\leq\frac{c}{(\eta(1-\mu))^{c}}$. ∎

Proof of Proposition 3.1.

If ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]=0$ for all $n\in\{0,1,\ldots,c\}$, then no purchase is ever made and ${\textsc{Rev}}({\sigma^{\textsc{newest}}},p)=0$. We thus focus on the case that there exists some number of positive review ratings $n$ such that ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]>0$, i.e., $\eta>0$. Combining Lemmas C.1 and C.2 yields ${\mathbb{E}}[\tau(\bm{Z})]<\infty$. For $t>\tau(\bm{Z})$, $\bm{Z}_{t}$ is absorbing and there is no purchase, i.e., $\Theta_{t}+h(\Phi_{t})<p$. Hence, ${\mathbb{E}}\Big[\frac{\sum_{t=1}^{T}p\mathbbm{1}\{\Theta_{t}+h(\Phi_{t})\geq p\}}{T}\Big]\leq p\frac{{\mathbb{E}}[\tau(\bm{Z})]}{T}\to 0$ as $T\to\infty$ and ${\textsc{Rev}}({\sigma^{\textsc{newest}}},p)=0$. ∎

C.2 Stationary Distribution under Newest First (Lemma 3.1)

We first state a general property characterizing how the stationary distribution of a Markov chain changes if we modify it so that in every state, the process remains there with some probability.

Lemma C.3.

Let $\mathcal{M}$ be a Markov chain on a finite state space $\mathcal{S}$ ($|\mathcal{S}|=m$) with a transition probability matrix $M\in\mathbb{R}^{m\times m}$ and a stationary distribution $\pi\in\mathbb{R}^{m}$. For any function $f:\mathcal{S}\to(0,1]$, we define a new Markov chain $\mathcal{M}_{f}$ on the same state space $\mathcal{S}$. At state $s\in\mathcal{S}$, $\mathcal{M}_{f}$ transitions according to the matrix $M$ with probability $f(s)$ and remains in $s$ with probability $1-f(s)$. Then $\pi_{f}(s)=\kappa\cdot\frac{\pi(s)}{f(s)}$ is a stationary distribution of $\mathcal{M}_{f}$, where $\kappa=1/\sum_{s\in\mathcal{S}}\frac{\pi(s)}{f(s)}$ is a normalizing constant.

Proof of Lemma C.3.

For any state s𝒮s\in\mathcal{S}, the probability of a self-transition under f\mathcal{M}_{f} is Mf(s,s)=f(s)M(s,s)+1f(s)M_{f}(s,s)=f(s)M(s,s)+1-f(s) since there are two ways to transition from ss back to itself: (1) the f(s)f(s) transition followed by a transition back to ss via \mathcal{M} and (2) the 1f(s)1-f(s) transition that does not alter the current state. For all states sss\neq s^{\prime}, Mf(s,s)=f(s)M(s,s)M_{f}(s,s^{\prime})=f(s)M(s,s^{\prime}) since the only way for f\mathcal{M}_{f} to transition from ss to ss^{\prime} is to take a f(s)f(s) transition at ss and follow the transitions of \mathcal{M} to get to ss^{\prime}.

The distribution {πf(s)=κπ(s)f(s)}s𝒮\Big{\{}\pi_{f}(s)=\kappa\cdot\frac{\pi(s)}{f(s)}\Big{\}}_{s\in\mathcal{S}} is a probability distribution by the definition of the normalizing constant κ=1/s𝒮π(s)f(s)\kappa=1/\sum_{s\in\mathcal{S}}\frac{\pi(s)}{f(s)}. Using the transitions of f\mathcal{M}_{f}, it holds that:

s𝒮πf(s)Mf(s,s)\displaystyle\sum_{s^{\prime}\in\mathcal{S}}\pi_{f}(s^{\prime})M_{f}(s^{\prime},s) =πf(s)Mf(s,s)+ssπf(s)Mf(s,s)\displaystyle=\pi_{f}(s)M_{f}(s,s)+\sum_{s^{\prime}\neq s}\pi_{f}(s^{\prime})M_{f}(s^{\prime},s)
=πf(s)(1f(s)+f(s)M(s,s))+ssπf(s)(f(s)M(s,s))\displaystyle=\pi_{f}(s)\cdot\big{(}1-f(s)+f(s)M(s,s)\big{)}+\sum_{s^{\prime}\neq s}\pi_{f}(s^{\prime})\cdot\big{(}f(s^{\prime})M(s^{\prime},s)\big{)}
=κπ(s)f(s)(1f(s)+f(s)M(s,s))+ssκπ(s)f(s)(f(s)M(s,s))\displaystyle=\kappa\cdot\frac{\pi(s)}{f(s)}\cdot\big{(}1-f(s)+f(s)M(s,s)\big{)}+\sum_{s^{\prime}\neq s}\kappa\cdot\frac{\pi(s^{\prime})}{f(s^{\prime})}\cdot\big{(}f(s^{\prime})M(s^{\prime},s)\big{)}
=κπ(s)f(s)κπ(s)+κπ(s)M(s,s)+ssκπ(s)M(s,s)=κπ(s) (as π is stationary for )=πf(s).\displaystyle=\kappa\cdot\frac{\pi(s)}{f(s)}-\kappa\cdot\pi(s)+\underbrace{\kappa\cdot\pi(s)M(s,s)+\sum_{s^{\prime}\neq s}\kappa\cdot\pi(s^{\prime})M(s^{\prime},s)}_{=\kappa\cdot\pi(s)\text{ (as $\pi$ is stationary for $\mathcal{M}$)}}=\pi_{f}(s).

Hence, $\pi_{f}$ is a stationary distribution of $\mathcal{M}_{f}$ as $\pi_{f}(s)=\sum_{s^{\prime}\in\mathcal{S}}\pi_{f}(s^{\prime})M_{f}(s^{\prime},s)$ for all states $s$. ∎
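The statement of Lemma C.3 can be sanity-checked numerically. The sketch below uses a hypothetical 3-state chain `M` with stationary distribution `pi` and an arbitrary slowdown function `f` (all values are illustrative assumptions), builds the lazy chain, and confirms that the reweighted distribution is stationary.

```python
from fractions import Fraction as Fr

def lazy_chain(M, f):
    """Transition matrix of M_f: follow M with probability f[s], otherwise stay at s."""
    m = len(M)
    return [[f[s] * M[s][t] + ((1 - f[s]) if s == t else 0) for t in range(m)]
            for s in range(m)]

def step(dist, M):
    """One step of a distribution: (dist M)(t) = sum_s dist(s) M(s, t)."""
    m = len(M)
    return [sum(dist[s] * M[s][t] for s in range(m)) for t in range(m)]

# Hypothetical base chain and its stationary distribution.
M = [[Fr(1, 2), Fr(1, 2), Fr(0)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(0),    Fr(1, 2), Fr(1, 2)]]
pi = [Fr(1, 4), Fr(1, 2), Fr(1, 4)]

# Hypothetical probabilities of moving (values in (0, 1]).
f = [Fr(1, 10), Fr(1, 2), Fr(1)]

kappa = 1 / sum(pi[s] / f[s] for s in range(3))
pi_f = [kappa * pi[s] / f[s] for s in range(3)]
M_f = lazy_chain(M, f)

print(step(pi, M) == pi, step(pi_f, M_f) == pi_f)
```

States where the chain rarely moves (small `f`) receive proportionally more stationary mass, which is the mechanism behind negative reviews persisting under Newest First.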

Proof of Lemma 3.1.

We will show that the stationary distribution of the newest cc reviews 𝒁t\bm{Z}_{t} is

π(z1,,zc)=κμi=1czi(1μ)ci=1cziΘ[Θ+h(i=1czi)p]\pi_{(z_{1},\ldots,z_{c})}=\kappa\cdot\frac{\mu^{\sum_{i=1}^{c}z_{i}}(1-\mu)^{c-\sum_{i=1}^{c}z_{i}}}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h(\sum_{i=1}^{c}z_{i})\geq p\Big{]}}

where κ=1/𝔼NBinom(c,μ)[1Θ[Θ+h(N)p]]\kappa=1/{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}\big{[}\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\big{]} is the normalizing constant. In the language of Lemma C.3, 𝒁t\bm{Z}_{t} corresponds to f\mathcal{M}_{f}, the state space 𝒮\mathcal{S} to {0,1}c\{0,1\}^{c}, and ff is a function that expresses the purchase probability at a given state, i.e., f(z1,,zc)=Θ[Θ+h(i=1czi)p]f(z_{1},\ldots,z_{c})={\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h(\sum_{i=1}^{c}z_{i})\geq p\Big{]}. Note that with probability 1f(z1,,zc)1-f(z_{1},\ldots,z_{c}), 𝒁t\bm{Z}_{t} remains at the same state (as there is no purchase).

To apply Lemma C.3, we need to show that whenever there is a purchase, $\bm{Z}_{t}$ transitions according to a Markov chain with stationary distribution $\mu^{\sum_{i=1}^{c}z_{i}}(1-\mu)^{c-\sum_{i=1}^{c}z_{i}}$. Consider the Markov chain $\mathcal{M}$ which, at every step, drops the oldest of the $c$ reviews and adds a new ${\mathrm{Bern}}(\mu)$ review as the most recent one. This process has stationary distribution equal to the above numerator and $\bm{Z}_{t}$ transitions according to $\mathcal{M}$ upon a purchase, i.e., with probability $f(z_{1},\ldots,z_{c})$. As a result, by Lemma C.3, $\pi$ is a stationary distribution for $\bm{Z}_{t}$. As $\bm{Z}_{t}$ is irreducible and aperiodic, this is the unique stationary distribution. ∎
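As a small illustration of Lemma 3.1, the following sketch constructs the transition matrix of the newest $c$ reviews explicitly and checks that the claimed distribution is stationary. The values of `c`, `mu`, and the purchase probabilities `q[n]` (standing in for ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]$) are hypothetical choices for a non-absorbing price.

```python
from fractions import Fraction as Fr
from itertools import product

def newest_first_matrix(c, mu, q):
    """Chain of the c newest reviews; q[n] is the purchase probability with n positives."""
    states = list(product((0, 1), repeat=c))
    idx = {z: k for k, z in enumerate(states)}
    P = [[Fr(0)] * len(states) for _ in states]
    for z in states:
        buy = q[sum(z)]
        P[idx[z]][idx[z]] += 1 - buy                      # no purchase: state unchanged
        P[idx[z]][idx[(1,) + z[:-1]]] += buy * mu         # purchase, positive review
        P[idx[z]][idx[(0,) + z[:-1]]] += buy * (1 - mu)   # purchase, negative review
    return states, P

def claimed_stationary(states, mu, q):
    """pi_z = kappa * mu^{sum z} (1 - mu)^{c - sum z} / q[sum z]."""
    c = len(states[0])
    w = [mu ** sum(z) * (1 - mu) ** (c - sum(z)) / q[sum(z)] for z in states]
    kappa = 1 / sum(w)
    return [kappa * x for x in w]

# Hypothetical parameters.
c, mu = 3, Fr(2, 3)
q = [Fr(1, 5), Fr(2, 5), Fr(3, 5), Fr(4, 5)]

states, P = newest_first_matrix(c, mu, q)
pi = claimed_stationary(states, mu, q)
pi_next = [sum(pi[s] * P[s][t] for s in range(len(states))) for t in range(len(states))]

print(pi_next == pi)
```

In this instance the all-negative state has stationary mass `pi[0]` equal to 1/10, compared with $(1-\mu)^{3}=1/27$ under i.i.d. reviews.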

C.3 Closed-form expression for Cost of Newest First (Lemma 3.2)

Proof of Lemma 3.2.

Dividing the expressions given in Propositions 3.2 and 3.3, the price term pp cancels out and the CoNF can be expressed as:

χ(p)\displaystyle\chi(p) =𝔼NBinom(c,μ)[Θ[Θ+h(N)p]]𝔼NBinom(c,μ)[1Θ[Θ+h(N)p]]\displaystyle={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}\big{[}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]\big{]}\cdot{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}\Big{[}\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\Big{]}
=(i=0cμi(1μ)ci(ci)Θ[Θ+h(i)p])(j=0cμj(1μ)cj(cj)1Θ[Θ+h(j)p])\displaystyle=\Big{(}\sum_{i=0}^{c}\mu^{i}(1-\mu)^{c-i}\binom{c}{i}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(i)\geq p]\Big{)}\cdot\Big{(}\sum_{j=0}^{c}\mu^{j}(1-\mu)^{c-j}\binom{c}{j}\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(j)\geq p]}\Big{)}
=i,j{0,,c}μi+j(1μ)2cij(ci)(cj)Θ[Θ+h(i)p]Θ[Θ+h(j)p].\displaystyle=\sum_{i,j\in\{0,\ldots,c\}}\mu^{i+j}(1-\mu)^{2c-i-j}\binom{c}{i}\binom{c}{j}\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(i)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(j)\geq p]}.
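The two forms of $\chi(p)$ in this derivation (product of two binomial expectations and double sum) can be compared directly. The sketch below does so for hypothetical purchase probabilities `q[n]`; the function names are illustrative and not part of the paper.

```python
from fractions import Fraction as Fr
from math import comb

def binom_pmf(c, mu):
    """Probabilities of N = 0, ..., c for N ~ Binom(c, mu)."""
    return [comb(c, n) * mu ** n * (1 - mu) ** (c - n) for n in range(c + 1)]

def conf_product(c, mu, q):
    """chi(p) as E[q(N)] * E[1 / q(N)]."""
    w = binom_pmf(c, mu)
    return (sum(w[n] * q[n] for n in range(c + 1))
            * sum(w[n] / q[n] for n in range(c + 1)))

def conf_double_sum(c, mu, q):
    """chi(p) as the double sum over pairs (i, j) from Lemma 3.2."""
    w = binom_pmf(c, mu)
    return sum(w[i] * w[j] * q[i] / q[j]
               for i in range(c + 1) for j in range(c + 1))

# Hypothetical single-review instance: q[0] = 1/4, q[1] = 3/4, mu = 1/2.
chi_small = conf_product(1, Fr(1, 2), [Fr(1, 4), Fr(3, 4)])
print(chi_small)  # 4/3

# Hypothetical three-review instance.
q3 = [Fr(1, 5), Fr(2, 5), Fr(3, 5), Fr(4, 5)]
print(conf_product(3, Fr(2, 3), q3) == conf_double_sum(3, Fr(2, 3), q3))
```

When all purchase probabilities coincide, both expressions equal 1, which matches the intuition that there is no Cost of Newest First if reviews do not affect purchases.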

C.4 Monotonicity of CoNF under monotone hazard rate (Remark 3.1)

Definition C.1.

A continuous distribution $\mathcal{G}$ with probability density function $g$ and cumulative distribution function $G$ has Monotone Hazard Rate (MHR) if the hazard rate function $\frac{g(u)}{1-G(u)}$ is non-decreasing in $u$.

To complement Theorem 3.2, we also show a structural result for the behavior of χ(p)\chi(p) as a function of pp when \mathcal{F} has MHR.

Proposition C.1.

Suppose that $\mathcal{F}$ is a continuous distribution with support $[\underline{\theta},\overline{\theta}]$ and has MHR. Then $\chi(p)$ is non-decreasing for $p\in(\underline{\theta}+h(c),\overline{\theta}+h(0))$. (Footnote 17: The same proof also extends to cases when $\underline{\theta}=-\infty$ and/or $\overline{\theta}=+\infty$.)

Proof.

By Lemma 3.2, the CoNF is given by

χ(p)\displaystyle\chi(p) =i,j{0,,c}μi+j(1μ)2cij(ci)(cj)Θ[Θ+h(i)p]Θ[Θ+h(j)p]\displaystyle=\sum_{i,j\in\{0,\ldots,c\}}\mu^{i+j}(1-\mu)^{2c-i-j}\binom{c}{i}\binom{c}{j}\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(i)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(j)\geq p]}
=i=0cμ2i(1μ)2(ci)(ci)2\displaystyle=\sum_{i=0}^{c}\mu^{2i}(1-\mu)^{2(c-i)}\binom{c}{i}^{2}
+i<jμi+j(1μ)2cij(ci)(cj)(Θ[Θ+h(i)p]Θ[Θ+h(j)p]+Θ[Θ+h(j)p]Θ[Θ+h(i)p]).\displaystyle\quad+\sum_{i<j}\mu^{i+j}(1-\mu)^{2c-i-j}\binom{c}{i}\binom{c}{j}\Bigg{(}\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(i)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(j)\geq p]}+\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(j)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(i)\geq p]}\Bigg{)}.

Letting $F$ be the cumulative distribution function of $\mathcal{F}$ and $\overline{F}(u)\coloneqq 1-F(u)$ be the survival function:

χ(p)=i=0cμ2i(1μ)2(ci)(ci)2+i<jμi+j(1μ)2cij(ci)(cj)(F¯(ph(i))F¯(ph(j))+F¯(ph(j))F¯(ph(i))).\chi(p)=\sum_{i=0}^{c}\mu^{2i}(1-\mu)^{2(c-i)}\binom{c}{i}^{2}+\sum_{i<j}\mu^{i+j}(1-\mu)^{2c-i-j}\binom{c}{i}\binom{c}{j}\Bigg{(}\frac{\overline{F}(p-h(i))}{\overline{F}(p-h(j))}+\frac{\overline{F}(p-h(j))}{\overline{F}(p-h(i))}\Bigg{)}.

We denote the ratio of the purchase probability with jj and ii positive reviews by ui,j(p)F¯(ph(j))F¯(ph(i))u_{i,j}(p)\coloneqq\frac{\overline{F}(p-h(j))}{\overline{F}(p-h(i))}. For i<ji<j, this ratio is ui,j(p)1u_{i,j}(p)\geq 1 because ph(i)>ph(j)p-h(i)>p-h(j) due to the monotonicity of hh (Assumption 2.1). Given that \mathcal{F} is continuous, F¯(x)\overline{F}(x) is differentiable for any xx such that F¯(x)(0,1)\overline{F}(x)\in(0,1). Furthermore for p(θ¯+h(c),θ¯+h(0))p\in(\underline{\theta}+h(c),\overline{\theta}+h(0)), F¯(ph(i))(0,1)\overline{F}(p-h(i))\in(0,1) for all i{0,1,,c}i\in\{0,1,\ldots,c\}, and thus ui,j(p)u_{i,j}(p) is differentiable at pp.

We now show that ui,j(p)u_{i,j}(p) is non-decreasing in pp. Letting ff be the probability density function of \mathcal{F} and taking the derivative of ui,j(p)u_{i,j}(p) with respect to pp:

\begin{align*}
\frac{d}{dp}u_{i,j}(p)&=\frac{f(p-h(i))\overline{F}(p-h(j))-f(p-h(j))\overline{F}(p-h(i))}{\overline{F}(p-h(i))^{2}}\\
&=\frac{\Big(\frac{f(p-h(i))}{\overline{F}(p-h(i))}-\frac{f(p-h(j))}{\overline{F}(p-h(j))}\Big)\overline{F}(p-h(j))\overline{F}(p-h(i))}{\overline{F}(p-h(i))^{2}}.
\end{align*}

Observe that $p-h(i)>p-h(j)$ since $i<j$ and $h$ is strictly monotone. By the MHR property of $\mathcal{F}$, $\frac{f(p-h(i))}{\overline{F}(p-h(i))}\geq\frac{f(p-h(j))}{\overline{F}(p-h(j))}$, implying that $\frac{d}{dp}u_{i,j}(p)\geq 0$, and thus $u_{i,j}(p)$ is non-decreasing in $p$.

To finish the proof of the proposition, we rewrite $\chi(p)$ as a function of $\{u_{i,j}(p)\}_{i<j}$ as

χ(p)=i=0cμ2i(1μ)2(ci)(ci)2+i<jμi+j(1μ)2cij(ci)(cj)(ui,j(p)+1ui,j(p))\chi(p)=\sum_{i=0}^{c}\mu^{2i}(1-\mu)^{2(c-i)}\binom{c}{i}^{2}+\sum_{i<j}\mu^{i+j}(1-\mu)^{2c-i-j}\binom{c}{i}\binom{c}{j}\Bigg{(}u_{i,j}(p)+\frac{1}{u_{i,j}(p)}\Bigg{)}

Note that the function $u+\frac{1}{u}$ is non-decreasing for $u\geq 1$. Since $u_{i,j}(p)\geq 1$ is non-decreasing in $p$, $u_{i,j}(p)+\frac{1}{u_{i,j}(p)}$ is also non-decreasing in $p$ for every $i<j$. Thus, $\chi(p)$ is non-decreasing in $p$. ∎
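To illustrate Proposition C.1, the sketch below evaluates $\chi(p)$ on a grid of prices for a uniform idiosyncratic valuation (which has MHR) and checks that the values are non-decreasing. The choice $\mathcal{F}=\mathcal{U}[0,1]$, the values of `h`, and the grid are hypothetical assumptions made for illustration.

```python
from fractions import Fraction as Fr
from math import comb

def uniform_survival(x):
    """Survival function of U[0, 1]: P[Theta >= x]."""
    return min(Fr(1), max(Fr(0), 1 - x))

def chi(p, c, mu, h, survival):
    """CoNF at price p: E[q(N)] * E[1 / q(N)] with q(n) = survival(p - h[n])."""
    w = [comb(c, n) * mu ** n * (1 - mu) ** (c - n) for n in range(c + 1)]
    q = [survival(p - h[n]) for n in range(c + 1)]
    return (sum(w[n] * q[n] for n in range(c + 1))
            * sum(w[n] / q[n] for n in range(c + 1)))

# Hypothetical instance: c = 2, mu = 1/2, strictly increasing h.
c, mu = 2, Fr(1, 2)
h = [Fr(0), Fr(1, 5), Fr(2, 5)]

# Prices strictly inside (theta_low + h(c), theta_high + h(0)) = (2/5, 1).
grid = [Fr(2, 5) + Fr(3, 5) * Fr(k, 50) for k in range(1, 50)]
values = [chi(p, c, mu, h, uniform_survival) for p in grid]

is_monotone = all(a <= b for a, b in zip(values, values[1:]))
print(is_monotone, float(values[0]), float(values[-1]))
```

The grid stays strictly inside the interval of the proposition so that every purchase probability lies in $(0,1)$ and no division by zero occurs.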

C.5 Structure of CoNF for c=1c=1 and uniform distributions (Remark 3.1)

To study how $\chi(p)$ depends on the price $p$, we consider a set of instances where a single review is shown ($c=1$) and $\mathcal{F}=\mathcal{U}[0,\overline{\theta}]$ is the uniform distribution on $[0,\overline{\theta}]$. The set of non-degenerate and non-absorbing prices is $(h(0),h(0)+\overline{\theta})$. First, we provide a closed-form expression for the CoNF.

Proposition C.2.

Let =𝒰[0,θ¯]\mathcal{F}=\mathcal{U}[0,\overline{\theta}] and c=1c=1. For p(h(0),h(0)+θ¯)p\in(h(0),h(0)+\overline{\theta}), the CoNF is given by

χ(p)=1+μ(1μ)(θ¯+h(1)max(h(1),p)θ¯p+h(0)+θ¯p+h(0)θ¯+h(1)max(h(1),p)2).\chi(p)=1+\mu(1-\mu)\Big{(}\frac{\overline{\theta}+h(1)-\max(h(1),p)}{\overline{\theta}-p+h(0)}+\frac{\overline{\theta}-p+h(0)}{\overline{\theta}+h(1)-\max(h(1),p)}-2\Big{)}.
Proof.

By Lemma 3.2 when c=1c=1, the CoNF is given by

χ(p)\displaystyle\chi(p) =μ2+(1μ)2+μ(1μ)(Θ[Θ+h(1)p]Θ[Θ+h(0)p]+Θ[Θ+h(0)p]Θ[Θ+h(1)p])\displaystyle=\mu^{2}+(1-\mu)^{2}+\mu(1-\mu)\Bigg{(}\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(1)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]}+\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(1)\geq p]}\Bigg{)}
=1+μ(1μ)(Θ[Θ+h(1)p]Θ[Θ+h(0)p]+Θ[Θ+h(0)p]Θ[Θ+h(1)p]2).\displaystyle=1+\mu(1-\mu)\Bigg{(}\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(1)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]}+\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(1)\geq p]}-2\Bigg{)}. (11)

For p(h(0),h(0)+θ¯)p\in(h(0),h(0)+\overline{\theta}) the purchase probabilities with zero and one positive reviews are given by

Θ[Θ+h(0)p]=θ¯p+h(0)θ¯ and Θ[Θ+h(1)p]=θ¯+h(1)max(h(1),p)θ¯.{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]=\frac{\overline{\theta}-p+h(0)}{\overline{\theta}}\qquad\text{ and }\qquad{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(1)\geq p]=\frac{\overline{\theta}+h(1)-\max(h(1),p)}{\overline{\theta}}.

Plugging in these probabilities in the expression for CoNF (11) concludes the proof. ∎

We next quantify how fast the CoNF increases as ph(0)+θ¯p\to h(0)+\overline{\theta}: we show that a price of pk=θ¯+h(0)Θ(1k)p_{k}=\overline{\theta}+h(0)-\Theta(\frac{1}{k}) results in a CoNF of Θ(k)\Theta(k). In particular, for k>1k>1 we define a target price pk=θ¯+h(0)min(θ¯k,h(1)h(0)k1)p_{k}=\overline{\theta}+h(0)-\min(\frac{\overline{\theta}}{k},\frac{h(1)-h(0)}{k-1}) and show the following corollary.

Corollary C.1.

Let =𝒰[0,θ¯]\mathcal{F}=\mathcal{U}[0,\overline{\theta}] and c=1c=1. For k>1k>1 and p>pkp>p_{k}, the CoNF is lower bounded by

χ(p)1+μ(1μ)(k+1k2).\chi(p)\geq 1+\mu(1-\mu)(k+\frac{1}{k}-2).
Proof.

We first verify that p=pkp=p_{k} is a solution to the equation

θ¯+h(1)max(h(1),p)θ¯p+h(0)=k\frac{\overline{\theta}+h(1)-\max(h(1),p)}{\overline{\theta}-p+h(0)}=k (12)

which is equivalent to

pkmax(h(1),p)=(k1)θ¯+kh(0)h(1).pk-\max(h(1),p)=(k-1)\overline{\theta}+kh(0)-h(1). (13)

If $p\leq h(1)$, the solution of (13) is $p=\overline{\theta}+h(0)-\frac{\overline{\theta}}{k}$ and in order for $p\leq h(1)$ to hold we need $\overline{\theta}+h(0)-\frac{\overline{\theta}}{k}\leq h(1)$, which can be rewritten as $\frac{\overline{\theta}}{k}\leq\frac{h(1)-h(0)}{k-1}$. As a result, $p_{k}=\overline{\theta}+h(0)-\frac{\overline{\theta}}{k}$ and hence $p_{k}$ satisfies (13).

If ph(1)p\geq h(1), the solution is p=θ¯+h(0)h(1)h(0)k1p=\overline{\theta}+h(0)-\frac{h(1)-h(0)}{k-1} and in order for ph(1)p\geq h(1) to hold we need θ¯+h(0)h(1)h(0)k1h(1)\overline{\theta}+h(0)-\frac{h(1)-h(0)}{k-1}\geq h(1) which can be rewritten as θ¯kh(1)h(0)k1\frac{\overline{\theta}}{k}\geq\frac{h(1)-h(0)}{k-1}. As a result, pk=θ¯+h(0)h(1)h(0)k1p_{k}=\overline{\theta}+h(0)-\frac{h(1)-h(0)}{k-1} and hence pkp_{k} satisfies (13).

As the first fraction in Proposition C.2 corresponds to (12), we obtain that χ(pk)=1+μ(1μ)(k+1k2)\chi(p_{k})=1+\mu(1-\mu)(k+\frac{1}{k}-2). Since =𝒰[0,θ¯]\mathcal{F}=\mathcal{U}[0,\overline{\theta}] has MHR, χ(p)\chi(p) is non-decreasing in pp (Proposition C.1) which finishes the proof of the corollary. ∎
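The closed form of Proposition C.2 and the target price $p_{k}$ can be checked on a concrete instance. In the sketch below, $\overline{\theta}=1$, $h(0)=0$, $h(1)=1/2$, and $\mu=1/3$ are hypothetical values; the code verifies that $\chi(p_{k})=1+\mu(1-\mu)(k+\frac{1}{k}-2)$ for several $k$, covering both branches of the minimum in the definition of $p_{k}$.

```python
from fractions import Fraction as Fr

def chi_uniform(p, mu, theta_bar, h0, h1):
    """Closed-form CoNF for c = 1 and F = U[0, theta_bar], valid for p in (h0, h0 + theta_bar)."""
    ratio = (theta_bar + h1 - max(h1, p)) / (theta_bar - p + h0)
    return 1 + mu * (1 - mu) * (ratio + 1 / ratio - 2)

def target_price(k, theta_bar, h0, h1):
    """p_k = theta_bar + h0 - min(theta_bar / k, (h1 - h0) / (k - 1)) for k > 1."""
    return theta_bar + h0 - min(theta_bar / k, (h1 - h0) / (k - 1))

# Hypothetical instance.
mu, theta_bar, h0, h1 = Fr(1, 3), Fr(1), Fr(0), Fr(1, 2)

results = {}
for k in (Fr(3, 2), Fr(2), Fr(4), Fr(10)):
    p_k = target_price(k, theta_bar, h0, h1)
    results[k] = (p_k, chi_uniform(p_k, mu, theta_bar, h0, h1),
                  1 + mu * (1 - mu) * (k + 1 / k - 2))

for k, (p_k, lhs, rhs) in results.items():
    print(k, p_k, lhs == rhs)
```

For $k=3/2$ the price satisfies $p_{k}\leq h(1)$ and for $k\in\{4,10\}$ it satisfies $p_{k}\geq h(1)$, so both cases of the proof are exercised; $k=2$ sits on the boundary $p_{k}=h(1)$.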

C.6 Upper bounding CoNF by the effect of reviews on purchase (Remark 3.2)

Recall that β(p)\beta(p) quantifies how much the review ratings affect the purchase probability. It is expected that when the review ratings have a small effect on the purchase probability then the CoNF is also small (i.e. small β(p)\beta(p) would imply a small χ(p)\chi(p)). We formalize this below.

Proposition C.3.

For all non-absorbing prices pp, the CoNF is upper bounded by χ(p)β(p)\chi(p)\leq\beta(p).

Proof.

The monotonicity of $h$ implies that for all $n\in\{0,1,\ldots,c\}$:

Θ[Θ+h(c)p]Θ[Θ+h(n)p]Θ[Θ+h(0)p].{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(c)\geq p]\geq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]\geq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p].

Taking expectation over the number of positive reviews NBinom(c,μ)N\sim{\mathrm{Binom}}(c,\mu), the first inequality implies

𝔼NBinom(c,μ)[Θ[Θ+h(N)p]]Θ[Θ+h(c)p].{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}\big{[}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]\big{]}\leq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(c)\geq p]. (14)

Similarly, taking expectation of the reciprocal of the second inequality (which is well-defined as $p$ is non-absorbing) implies that

𝔼NBinom(c,μ)[1Θ[Θ+h(N)p]]1Θ[Θ+h(0)p].{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}\Big{[}\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\Big{]}\leq\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]}. (15)

Expressing $\chi(p)$ as the ratio of the expressions in Propositions 3.2 and 3.3 and using (14) and (15):

χ(p)\displaystyle\chi(p) =𝔼NBinom(c,μ)[Θ[Θ+h(N)p]]𝔼NBinom(c,μ)[1Θ[Θ+h(N)p]]\displaystyle={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]]\cdot{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}\Big{[}\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\Big{]}
Θ[Θ+h(c)p]Θ[Θ+h(0)p]=β(p)\displaystyle\leq\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(c)\geq p]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]}=\beta(p)

which concludes the proof. ∎

C.7 Average rating under σnewest{\sigma^{\textsc{newest}}} is smaller than under σrandom{\sigma^{\textsc{random}}} (Theorem 3.3)

Theorem 3.3 states that the average review rating of the cc reviews displayed by σnewest{\sigma^{\textsc{newest}}} is strictly smaller than the average rating of the cc reviews displayed by σrandom{\sigma^{\textsc{random}}}. To prove the theorem we first compare the behavior of σnewest{\sigma^{\textsc{newest}}} and σrandom{\sigma^{\textsc{random}}} based on n=max{n|Θ[Θ+h(n)p]κ}n^{\star}=\max\{n|{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]\leq\kappa\} (i.e., the largest number of positive review ratings where the purchase probability is at most the average purchase rate κ=1/𝔼NBinom(c,μ)[1Θ[Θ+h(N)p]]\kappa=1/{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}\Big{[}\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N)\geq p]}\Big{]}).

Lemma C.4.

For non-degenerate and non-absorbing price pp, πnnewestπnrandom\pi^{\textsc{newest}}_{n}\leq\pi^{\textsc{random}}_{n} if n>nn>n^{\star} and πnnewestπnrandom\pi^{\textsc{newest}}_{n}\geq\pi^{\textsc{random}}_{n} if nnn\leq n^{\star}.

Proof of Lemma C.4.

By Lemma 3.1, it holds that πnrandom=πnnewest1κΘ[Θ+h(n)p]\pi^{\textsc{random}}_{n}=\pi^{\textsc{newest}}_{n}\cdot\frac{1}{\kappa}\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]. By the definition of nn^{\star} and the monotonicity of h(n)h(n), if nnn\leq n^{\star}, Θ[Θ+h(n)p]κ{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]\leq\kappa and thus πnnewestπnrandom\pi^{\textsc{newest}}_{n}\geq\pi^{\textsc{random}}_{n}. The other case is analogous. ∎

Lemma C.5.

For any non-degenerate and non-absorbing price pp, the probability of showing at most nn^{\star} positive review ratings is strictly larger under σnewest{\sigma^{\textsc{newest}}} than under σrandom{\sigma^{\textsc{random}}}. Formally, Nπnnewest[Nn]>Nπnrandom[Nn]{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{newest}}}[N\leq n^{\star}]>{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{random}}}[N\leq n^{\star}].

Proof of Lemma C.5.

By Lemma C.4, $\pi^{\textsc{newest}}_{n}\geq\pi^{\textsc{random}}_{n}$ for $n\leq n^{\star}$. To show the claim, it is enough to show that there is some $n\leq n^{\star}$ such that $\pi^{\textsc{newest}}_{n}>\pi^{\textsc{random}}_{n}$. We show that the state in which all displayed reviews are negative is strictly more likely under ${\sigma^{\textsc{newest}}}$ than under ${\sigma^{\textsc{random}}}$, i.e., $\pi^{\textsc{newest}}_{0}>\pi^{\textsc{random}}_{0}$. By the monotonicity of $h$, ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]\leq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]$ for all $n\in\{0,1,\ldots,c\}$. Since $p$ is non-degenerate, the inequality is strict when $n=c$. Taking the reciprocal and then the expectation over $N\sim{\mathrm{Binom}}(c,\mu)$ of the last inequality yields ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(0)\geq p]<\kappa$. Hence $0\leq n^{\star}$ and, by Lemma 3.1, $\pi^{\textsc{newest}}_{0}>\pi^{\textsc{random}}_{0}$. ∎

Proof of Theorem 3.3.

To show that the average rating is smaller under ${\sigma^{\textsc{newest}}}$ than under ${\sigma^{\textsc{random}}}$, we decompose the difference in the expected number of positive review ratings:

𝔼Nπnrandom[N]𝔼Nπnnewest[N]\displaystyle{\mathbb{E}}_{N\sim\pi^{\textsc{random}}_{n}}[N]-{\mathbb{E}}_{N\sim\pi^{\textsc{newest}}_{n}}[N] =m=0cm(πmrandomπmnewest)\displaystyle=\sum_{m=0}^{c}m(\pi^{\textsc{random}}_{m}-\pi^{\textsc{newest}}_{m})
=m=n+1cm(πmrandomπmnewest)(1)m=0nm(πmnewestπmrandom)(2)\displaystyle=\underbrace{\sum_{m=n^{\star}+1}^{c}m(\pi^{\textsc{random}}_{m}-\pi^{\textsc{newest}}_{m})}_{(1)}-\underbrace{\sum_{m=0}^{n^{\star}}m(\pi^{\textsc{newest}}_{m}-\pi^{\textsc{random}}_{m})}_{(2)}

Lemma C.4 yields (πmrandomπmnewest)0(\pi^{\textsc{random}}_{m}-\pi^{\textsc{newest}}_{m})\geq 0 for mn+1m\geq n^{\star}+1 and thus

(1)\displaystyle(1) (n+1)m=n+1c(πmrandomπmnewest)\displaystyle\geq(n^{\star}+1)\sum_{m=n^{\star}+1}^{c}(\pi^{\textsc{random}}_{m}-\pi^{\textsc{newest}}_{m})
=(n+1)(Nπnrandom[N>n]Nπnnewest[N>n])\displaystyle=(n^{\star}+1)\big{(}{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{random}}}[N>n^{\star}]-{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{newest}}}[N>n^{\star}]\big{)}
=(n+1)(Nπnnewest[Nn]Nπnrandom[Nn])\displaystyle=(n^{\star}+1)\big{(}{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{newest}}}[N\leq n^{\star}]-{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{random}}}[N\leq n^{\star}]\big{)}

where in the last equality we used that [X>n]=1[Xn]{\mathbb{P}}[X>n^{\star}]=1-{\mathbb{P}}[X\leq n^{\star}] for any random variable XX. Using Lemma C.4 again yields (πmnewestπmrandom)0(\pi^{\textsc{newest}}_{m}-\pi^{\textsc{random}}_{m})\geq 0 for mnm\leq n^{\star} and thus

(2)nm=0n(πmnewestπmrandom)=n(Nπnnewest[Nn]Nπnrandom[Nn]).(2)\leq n^{\star}\sum_{m=0}^{n^{\star}}(\pi^{\textsc{newest}}_{m}-\pi^{\textsc{random}}_{m})=n^{\star}\big{(}{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{newest}}}[N\leq n^{\star}]-{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{random}}}[N\leq n^{\star}]\big{)}.

Therefore by Lemma C.5,

(1)(2)(Nπnnewest[Nn]Nπnrandom[Nn])>0.(1)-(2)\geq\big{(}{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{newest}}}[N\leq n^{\star}]-{\mathbb{P}}_{N\sim\pi_{n}^{\textsc{random}}}[N\leq n^{\star}]\big{)}>0.
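The comparison in Lemma C.4, Lemma C.5, and Theorem 3.3 can be illustrated numerically. The sketch below builds both distributions of the number of positive displayed reviews from the relation in the proof of Lemma C.4, $\pi^{\textsc{random}}_{n}=\pi^{\textsc{newest}}_{n}\cdot\frac{1}{\kappa}\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]$. The instance ($h(n)=n/c$, $\Theta\sim\mathcal{U}[0,1]$, and the parameter values) is an assumption made purely for illustration.

```python
from math import comb

def purchase_prob(n, c, p):
    # Assumed instance: h(n) = n / c and Theta ~ Uniform[0, 1].
    return min(1.0, max(0.0, 1.0 - (p - n / c)))

def stationary_counts(c, mu, p):
    """Distributions of the number of positive displayed reviews."""
    probs = [purchase_prob(n, c, p) for n in range(c + 1)]
    pi_random = [comb(c, n) * mu**n * (1 - mu)**(c - n) for n in range(c + 1)]
    # kappa is the average purchase rate 1 / E[1 / P(N)] from Section C.7.
    kappa = 1.0 / sum(w / q for w, q in zip(pi_random, probs))
    # Lemma 3.1 rearranged: pi_newest[n] = kappa * pi_random[n] / probs[n].
    pi_newest = [kappa * w / q for w, q in zip(pi_random, probs)]
    return probs, pi_random, pi_newest, kappa

c, mu, p = 5, 0.6, 0.9
probs, pi_random, pi_newest, kappa = stationary_counts(c, mu, p)

# n_star: largest count whose purchase probability is at most kappa.
n_star = max(n for n in range(c + 1) if probs[n] <= kappa)
mean_random = sum(n * w for n, w in enumerate(pi_random))
mean_newest = sum(n * w for n, w in enumerate(pi_newest))
print(n_star, mean_newest, mean_random)
```

In this example the mass of $\pi^{\textsc{newest}}$ is shifted toward counts at or below $n^{\star}$, so its mean is strictly smaller than the mean $c\mu$ under $\sigma^{\textsc{random}}$.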

Appendix D Supplementary material for Section 4

D.1 CoNF still exists under optimal dynamic pricing (Proposition 4.1)

Proof of Proposition 4.1.

For any $\alpha<4/3$, we provide an instance with $\chi(\Pi^{\textsc{dynamic}})>\alpha$. The instance consists of a single review $(c=1)$, true quality $\mu<\frac{1}{3}$, estimate mappings $h(0)=\mu^{2}$, $h(1)=1-\mu^{2}$, and customer-specific valuation distribution $\mathcal{F}=\mathcal{U}[0,2\overline{h}]$ where $\overline{h}\coloneqq{\mathbb{E}}_{N}[h(N)]$.

First, for any A>0A>0 and any price pp the revenue of selling to a customer with valuation V=Y+xV=Y+x where Y𝒰[0,A]Y\sim\mathcal{U}[0,A] is given by p[Vp]=p(A+xp)Ap\cdot{\mathbb{P}}[V\geq p]=\frac{p(A+x-p)}{A} if p[x,A+x]p\in[x,A+x]. Thus, the revenue maximizing price is p=max(A+x2,x)=max(x,A)+x2p^{\star}=\max(\frac{A+x}{2},x)=\frac{\max(x,A)+x}{2} yielding an optimal revenue of

maxp{p[Vp]}=max(x,A)+x22A+xmax(A,x)2A=(max(A,x)+x)24max(A,x).\max_{p}\{p\cdot{\mathbb{P}}[V\geq p]\}=\frac{\max(x,A)+x}{2}\cdot\frac{2A+x-\max(A,x)}{2A}=\frac{(\max(A,x)+x)^{2}}{4\max(A,x)}. (16)
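The closed form (16) can be sanity-checked by brute force. The sketch below is an illustration rather than part of the original proof: it compares (16) against a grid search over prices for $V=Y+x$ with $Y\sim\mathcal{U}[0,A]$. The helper names, the test values of $(x,A)$, and the grid resolution are assumptions.

```python
def revenue(p, x, A):
    # p * P[Y + x >= p] with Y ~ Uniform[0, A]; the probability is clipped to [0, 1].
    buy = min(1.0, max(0.0, (A + x - p) / A))
    return p * buy

def closed_form(x, A):
    # Right-hand side of equation (16).
    m = max(A, x)
    return (m + x) ** 2 / (4 * m)

def grid_max(x, A, steps=20_000):
    # Search prices in [0, A + x]; prices above A + x never sell.
    hi = A + x
    return max(revenue(hi * k / steps, x, A) for k in range(steps + 1))

for x, A in [(0.2, 1.0), (1.5, 1.0), (0.0, 2.0)]:
    print(x, A, grid_max(x, A), closed_form(x, A))
```

The second test point has $x>A$, which exercises the branch where the optimal price is $p^{\star}=x$ and every customer purchases.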

Theorems 4.3 and 4.5 imply that χ(Πdynamic)=𝔼N[r(Θ+h(N))]r(Θ+h¯)\chi(\Pi^{\textsc{dynamic}})=\frac{{\mathbb{E}}_{N}[r^{\star}(\Theta+h(N))]}{r^{\star}(\Theta+\overline{h})}. Using (16) with A=2h¯A=2\overline{h}, we obtain

r(Θ+x)=(max(2h¯,x)+x)24max(2h¯,x)={98h¯if x=h¯h(1)if x=h(1)(h(0)+2h¯)28h¯if x=h(0).r^{\star}(\Theta+x)=\frac{(\max(2\overline{h},x)+x)^{2}}{4\max(2\overline{h},x)}=\begin{cases}\frac{9}{8}\overline{h}&\text{if }x=\overline{h}\\ h(1)&\text{if }x=h(1)\\ \frac{(h(0)+2\overline{h})^{2}}{8\overline{h}}&\text{if }x=h(0)\end{cases}. (17)

where max(2h¯,h(1))=h(1)\max(2\overline{h},h(1))=h(1) because 2μ(1μ)(1+2μ)=2μ(1+μ2μ2)<2μ(1+μ)<89<1μ22\mu(1-\mu)(1+2\mu)=2\mu(1+\mu-2\mu^{2})<2\mu(1+\mu)<\frac{8}{9}<1-\mu^{2} for μ<13\mu<\frac{1}{3} and thus

2h¯=2(μh(1)+(1μ)h(0))=2(μ(1μ2)+(1μ)μ2)=2μ(1μ)(1+2μ)<1μ2=h(1).2\overline{h}=2(\mu h(1)+(1-\mu)h(0))=2\big{(}\mu(1-\mu^{2})+(1-\mu)\mu^{2}\big{)}=2\mu(1-\mu)(1+2\mu)<1-\mu^{2}=h(1).

Using (17) , the CoNF can be expressed as:

\begin{align*}
\chi(\Pi^{\textsc{dynamic}}) &= \frac{{\mathbb{E}}_{N}[r^{\star}(\Theta+h(N))]}{r^{\star}(\Theta+\overline{h})}=\frac{\mu r^{\star}(\Theta+h(1))+(1-\mu)r^{\star}(\Theta+h(0))}{r^{\star}(\Theta+\overline{h})} \\
&\overset{(1)}{=}\frac{\mu h(1)+(1-\mu)\frac{(h(0)+2\overline{h})^{2}}{8\overline{h}}}{\frac{9}{8}\overline{h}}=\frac{8}{9}\cdot\frac{\mu h(1)}{\overline{h}}+\frac{1-\mu}{9}\Big[\Big(\frac{h(0)}{\overline{h}}\Big)^{2}+4\frac{h(0)}{\overline{h}}+4\Big] \\
&\overset{(2)}{=}\frac{8}{9}-\frac{8}{9}\cdot\frac{(1-\mu)h(0)}{\overline{h}}+\frac{1-\mu}{9}\cdot\Big(\frac{h(0)}{\overline{h}}\Big)^{2}+\frac{1-\mu}{9}\cdot 4\frac{h(0)}{\overline{h}}+\frac{1-\mu}{9}\cdot 4 \\
&\overset{(3)}{=}\frac{8}{9}+\frac{1-\mu}{9}\Big[\Big(\frac{h(0)}{\overline{h}}\Big)^{2}-4\frac{h(0)}{\overline{h}}+4\Big]=\frac{8}{9}+\frac{1-\mu}{9}\Big(\frac{h(0)}{\overline{h}}-2\Big)^{2} \\
&\overset{(4)}{=}\frac{8}{9}+\frac{1-\mu}{9}\Big(\frac{\mu}{1+\mu-2\mu^{2}}-2\Big)^{2},
\end{align*}

where (1) follows by the expression for the optimal revenue (17), (2) follows by μh(1)h¯=1(1μ)h(0)h¯\frac{\mu h(1)}{\overline{h}}=1-\frac{(1-\mu)h(0)}{\overline{h}} which follows by the definition of h¯\overline{h}, (3) follows by 89(1μ)h(0)h¯+1μ94h(0)h¯=1μ94h(0)h¯-\frac{8}{9}\cdot\frac{(1-\mu)h(0)}{\overline{h}}+\frac{1-\mu}{9}\cdot 4\frac{h(0)}{\overline{h}}=-\frac{1-\mu}{9}\cdot 4\frac{h(0)}{\overline{h}}, and (4) follows by h(0)h¯=μ2μ(1μ)(1+2μ)=μ1+μ2μ2\frac{h(0)}{\overline{h}}=\frac{\mu^{2}}{\mu(1-\mu)(1+2\mu)}=\frac{\mu}{1+\mu-2\mu^{2}} because h(0)=μ2h(0)=\mu^{2} and h¯=μ(1μ)(1+2μ)\overline{h}=\mu(1-\mu)(1+2\mu).

Thus, χ(Πdynamic)=89+1μ9(μ1+μ2μ22)2\chi(\Pi^{\textsc{dynamic}})=\frac{8}{9}+\frac{1-\mu}{9}(\frac{\mu}{1+\mu-2\mu^{2}}-2)^{2}. The second term is 1μ9(μ1+μ2μ22)249\frac{1-\mu}{9}(\frac{\mu}{1+\mu-2\mu^{2}}-2)^{2}\to\frac{4}{9} as μ0\mu\to 0, and thus χ(Πdynamic)89+49=43\chi(\Pi^{\textsc{dynamic}})\to\frac{8}{9}+\frac{4}{9}=\frac{4}{3} as μ0\mu\to 0. Therefore, for any α<4/3\alpha<4/3 there is some sufficiently small μ>0\mu>0 such that χ(Πdynamic)>α\chi(\Pi^{\textsc{dynamic}})>\alpha. ∎
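The limit can also be observed numerically. The following sketch evaluates $\chi(\Pi^{\textsc{dynamic}})$ for the instance of this proof in two ways: directly from (16) with $A=2\overline{h}$, and through the closed form derived above. The function names and the sampled values of $\mu$ are illustrative assumptions.

```python
def opt_revenue(x, A):
    # Equation (16): optimal revenue for V = Y + x with Y ~ Uniform[0, A].
    m = max(A, x)
    return (m + x) ** 2 / (4 * m)

def conf_dynamic(mu):
    """Ratio E_N[r*(Theta + h(N))] / r*(Theta + h_bar) for the instance of D.1."""
    h0, h1 = mu**2, 1 - mu**2
    h_bar = mu * h1 + (1 - mu) * h0
    A = 2 * h_bar  # Theta ~ Uniform[0, 2 * h_bar]
    numerator = mu * opt_revenue(h1, A) + (1 - mu) * opt_revenue(h0, A)
    return numerator / opt_revenue(h_bar, A)

def closed_form(mu):
    # Final expression of the derivation: 8/9 + (1 - mu)/9 * (mu/(1 + mu - 2 mu^2) - 2)^2.
    return 8 / 9 + (1 - mu) / 9 * (mu / (1 + mu - 2 * mu**2) - 2) ** 2

for mu in (0.3, 0.1, 0.01, 0.001, 1e-4):
    print(mu, conf_dynamic(mu), closed_form(mu))
```

Both columns agree for every sampled $\mu<\frac{1}{3}$, and the values increase toward $\frac{4}{3}$ as $\mu$ shrinks.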

D.2 σrandom\sigma^{\textsc{random}} is no worse than σnewest\sigma^{\textsc{newest}} under optimal dynamic pricing (Remark 4.1)

Theorem 3.1 established that $\chi(p)>1$ for any fixed non-degenerate and non-absorbing price $p$. Here we show that if the platform optimizes over dynamic prices we have $\chi(\Pi^{\textsc{dynamic}})\geq 1$, i.e., ${\sigma^{\textsc{random}}}$ has no smaller revenue than ${\sigma^{\textsc{newest}}}$ under optimal dynamic prices. Recall that $\chi(\Pi^{\textsc{dynamic}})=\frac{\max_{\rho\in\Pi^{\textsc{dynamic}}}\textsc{Rev}(\sigma^{\textsc{random}},\rho)}{\max_{\rho\in\Pi^{\textsc{dynamic}}}\textsc{Rev}(\sigma^{\textsc{newest}},\rho)}$. We show the following result.

Theorem D.1.

For any problem instance, χ(Πdynamic)1\chi(\Pi^{\textsc{dynamic}})\geq 1.

Proof.

By definition of χ(Πdynamic)\chi(\Pi^{\textsc{dynamic}}), it is sufficient to show that

maxρΠdynamicRev(σnewest,ρ)maxρΠdynamicRev(σrandom,ρ).\max_{\rho\in\Pi^{\textsc{dynamic}}}\textsc{Rev}(\sigma^{\textsc{newest}},\rho)\leq\max_{\rho\in\Pi^{\textsc{dynamic}}}\textsc{Rev}(\sigma^{\textsc{random}},\rho). (18)

Letting h¯=𝔼NBinom(c,μ)[h(N)]\overline{h}={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)] and pargmaxppΘ[Θ+h¯p]p^{\star}\in\operatorname*{arg\,max}_{p\in\mathbb{R}}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p], Theorem 4.4 yields that the pricing policy given by ρnewest(𝒛)=h(N𝒛)+ph¯\rho^{\textsc{newest}}(\bm{z})=h(N_{\bm{z}})+p^{\star}-\overline{h} is optimal under σnewest\sigma^{\textsc{newest}} and is thus the maximizer of the left-hand side of (18). To prove (18) it is sufficient to show that offering prices ρnewest(𝒛)\rho^{\textsc{newest}}(\bm{z}) under σrandom\sigma^{\textsc{random}} yields the same revenue as offering prices ρnewest(𝒛)\rho^{\textsc{newest}}(\bm{z}) under σnewest\sigma^{\textsc{newest}} i.e. Rev(σrandom,ρnewest)=Rev(σnewest,ρnewest)\textsc{Rev}(\sigma^{\textsc{random}},\rho^{\textsc{newest}})=\textsc{Rev}(\sigma^{\textsc{newest}},\rho^{\textsc{newest}}).

As σrandom\sigma^{\textsc{random}} shows cc i.i.d. Bern(μ){\mathrm{Bern}}(\mu) reviews, the revenue of any pricing policy ρ\rho is given by

Rev(σrandom,ρ)=𝔼Z1,,Zci.i.dBern(μ)[ρ(Z1,,Zc)Θ[Θ+h(i=1cZi)ρ(Z1,,Zc)]].{\textsc{Rev}}({\sigma^{\textsc{random}}},\rho)={\mathbb{E}}_{Z_{1},\ldots,Z_{c}\sim_{i.i.d}{\mathrm{Bern}}(\mu)}\Big{[}\rho(Z_{1},\ldots,Z_{c}){\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\sum_{i=1}^{c}Z_{i})\geq\rho(Z_{1},\ldots,Z_{c})]\Big{]}. (19)

For any reviews (Z1,,Zc)(Z_{1},\ldots,Z_{c}) the purchase probability under ρnewest(Z1,,Zc)\rho^{\textsc{newest}}(Z_{1},\ldots,Z_{c}) equals

Θ[Θ+h(i=1cZi)h(i=1cZi)+ph¯]=Θ[Θ+h¯p]{\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h(\sum_{i=1}^{c}Z_{i})\geq h(\sum_{i=1}^{c}Z_{i})+p^{\star}-\overline{h}\Big{]}={\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p^{\star}] (20)

and is independent of (Z1,,Zc)(Z_{1},\ldots,Z_{c}). As Z1,,Zci.i.d.Bern(μ)Z_{1},\ldots,Z_{c}\sim_{i.i.d.}{\mathrm{Bern}}(\mu), the number of positive review ratings i=1cZi\sum_{i=1}^{c}Z_{i} is distributed as NBinom(c,μ)N\sim{\mathrm{Binom}}(c,\mu); the expected price of ρnewest(Z1,,Zc)\rho^{\textsc{newest}}(Z_{1},\ldots,Z_{c}) is thus

𝔼Z1,,Zci.i.dBern(μ)[ρnewest(Z1,,Zc)]=𝔼NBinom(c,μ)[h(N)+ph¯]=p{\mathbb{E}}_{Z_{1},\ldots,Z_{c}\sim_{i.i.d}{\mathrm{Bern}}(\mu)}\Big{[}\rho^{\textsc{newest}}(Z_{1},\ldots,Z_{c})\Big{]}={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)+p^{\star}-\overline{h}]=p^{\star} (21)

where the last equality uses the definition $\overline{h}={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)]$. Combining (19), (20), and (21), we obtain ${\textsc{Rev}}({\sigma^{\textsc{random}}},\rho^{\textsc{newest}})=p^{\star}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p^{\star}]=r^{\star}(\Theta+\overline{h})$. The last expression is exactly equal to $\max_{\rho\in\Pi^{\textsc{dynamic}}}\textsc{Rev}(\sigma^{\textsc{newest}},\rho)$ by Theorem 4.5, which concludes the proof. ∎
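The three identities (19), (20), and (21) can be traced on a small example. The sketch below is a hypothetical instance with $h(n)=n/c$ and $\Theta\sim\mathcal{U}[0,1]$ (both assumptions, as are the variable names). It offers the review-offsetting prices $h(n)+p^{\star}-\overline{h}$ under $\sigma^{\textsc{random}}$ and compares the resulting revenue with $p^{\star}{\mathbb{P}}[\Theta+\overline{h}\geq p^{\star}]$.

```python
from math import comb

def survival(t):
    # P[Theta >= t] for the assumed Theta ~ Uniform[0, 1].
    return min(1.0, max(0.0, 1.0 - t))

c, mu = 4, 0.7
pmf = [comb(c, n) * mu**n * (1 - mu)**(c - n) for n in range(c + 1)]
h = [n / c for n in range(c + 1)]           # assumed estimate mapping
h_bar = sum(w * v for w, v in zip(pmf, h))  # E[h(N)]

# For uniform Theta on [0, 1] and h_bar >= 0, the maximizer of
# p * P[Theta + h_bar >= p] is max((1 + h_bar) / 2, h_bar).
p_star = max((1 + h_bar) / 2, h_bar)

prices = [h[n] + p_star - h_bar for n in range(c + 1)]       # review-offsetting prices
buy_probs = [survival(prices[n] - h[n]) for n in range(c + 1)]

expected_price = sum(w * q for w, q in zip(pmf, prices))
rev_random = sum(pmf[n] * prices[n] * buy_probs[n] for n in range(c + 1))
target = p_star * survival(p_star - h_bar)
print(buy_probs, expected_price, rev_random, target)
```

The purchase probability is the same in every state, the expected price equals $p^{\star}$, and the revenue under $\sigma^{\textsc{random}}$ matches the target value.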

Another way to prove Theorem D.1 stems from using the convexity of r(Θ+x)r^{\star}(\Theta+x), defined in the beginning of Section 4.2. This is shown in the lemma below.

Lemma D.1.

r(Θ+x)r^{\star}(\Theta+x) is a convex function in xx.

Alternative proof of Theorem D.1.

By Theorem 4.3 and Theorem 4.5, we know that

maxρΠdynamicRev(σrandom,ρ)=𝔼NBinom(c,μ)[r(Θ+h(N))]\max_{\rho\in\Pi^{\textsc{dynamic}}}\textsc{Rev}(\sigma^{\textsc{random}},\rho)={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[r^{\star}(\Theta+h(N))]

and

maxρΠdynamicRev(σnewest,ρ)=r(Θ+𝔼NBinom(c,μ)[h(N)]).\max_{\rho\in\Pi^{\textsc{dynamic}}}\textsc{Rev}(\sigma^{\textsc{newest}},\rho)=r^{\star}(\Theta+{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)]).

By Lemma D.1 r(Θ+x)r^{\star}(\Theta+x) is convex. Thus by Jensen’s inequality the theorem follows as

r(Θ+𝔼NBinom(c,μ)[h(N)])𝔼NBinom(c,μ)[r(Θ+h(N))].r^{\star}(\Theta+{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)])\leq{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[r^{\star}(\Theta+h(N))].
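A small numerical illustration of this Jensen argument follows. It assumes $\Theta\sim\mathcal{U}[0,1]$ and $h(n)=n/c$ (hypothetical choices made for this example only), so that $r^{\star}(\Theta+x)$ is given by the closed form (16) with $A=1$.

```python
from math import comb

def opt_revenue(x, A=1.0):
    # Equation (16) for V = Y + x with Y ~ Uniform[0, A].
    m = max(A, x)
    return (m + x) ** 2 / (4 * m)

def dynamic_revenues(c, mu):
    # Assumed instance: h(n) = n / c and Theta ~ Uniform[0, 1].
    pmf = [comb(c, n) * mu**n * (1 - mu)**(c - n) for n in range(c + 1)]
    h = [n / c for n in range(c + 1)]
    h_bar = sum(w * v for w, v in zip(pmf, h))
    rev_random = sum(w * opt_revenue(v) for w, v in zip(pmf, h))  # E[r*(Theta + h(N))]
    rev_newest = opt_revenue(h_bar)                               # r*(Theta + E[h(N)])
    return rev_random, rev_newest

rev_random, rev_newest = dynamic_revenues(c=5, mu=0.6)
print(rev_random, rev_newest, rev_random / rev_newest)
```

Because $x\mapsto(1+x)^{2}/4$ is convex on $[0,1]$, the expectation of $r^{\star}$ exceeds $r^{\star}$ of the expectation, which is exactly the inequality used in the proof.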

Proof of Lemma D.1.

Let $x,y\in\mathbb{R}$ and $\alpha\in[0,1]$. To show that $r^{\star}(\Theta+x)$ is convex, it suffices to prove that

maxppΘ[Θ+αx+(1α)yp]αmaxppΘ[Θ+xp]+(1α)maxppΘ[Θ+yp]\max_{p\in\mathbb{R}}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\alpha x+(1-\alpha)y\geq p]\leq\alpha\max_{p\in\mathbb{R}}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p]+(1-\alpha)\max_{p\in\mathbb{R}}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+y\geq p]

Let pargmaxpΘ[Θ+αx+(1α)yp]p\in\operatorname*{arg\,max}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\alpha x+(1-\alpha)y\geq p] be an optimal price for selling to a customer with valuation Θ+αx+(1α)y\Theta+\alpha x+(1-\alpha)y. We define two prices p1p(1α)(yx)p_{1}\coloneqq p-(1-\alpha)(y-x) and p2pα(xy)p_{2}\coloneqq p-\alpha(x-y). The probability that a customer with valuation Θ+x\Theta+x purchases under p1p_{1} equals the probability that a customer with valuation Θ+αx+(1α)y\Theta+\alpha x+(1-\alpha)y purchases under pp and thus

p1Θ[Θ+xp1]=p1Θ[Θ+αx+(1α)yp].p_{1}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p_{1}]=p_{1}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\alpha x+(1-\alpha)y\geq p].

Similarly, the probability that a customer with valuation $\Theta+y$ purchases under $p_{2}$ equals the probability that a customer with valuation $\Theta+\alpha x+(1-\alpha)y$ purchases under $p$ and thus

\[
p_{2}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+y\geq p_{2}]=p_{2}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\alpha x+(1-\alpha)y\geq p].
\]

Therefore, rearranging and using that $\alpha p_{1}+(1-\alpha)p_{2}=p$ we obtain

\[
\alpha p_{1}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p_{1}]+(1-\alpha)p_{2}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+y\geq p_{2}]=p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\alpha x+(1-\alpha)y\geq p].
\]

Upper bounding the left-hand side, the proof of the lemma concludes as

αmaxp1p1Θ[Θ+xp1]+(1α)maxp2p2Θ[Θ+yp2]pΘ[Θ+αx+(1α)yp].\alpha\max_{p_{1}\in\mathbb{R}}p_{1}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p_{1}]+(1-\alpha)\max_{p_{2}\in\mathbb{R}}p_{2}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+y\geq p_{2}]\geq p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\alpha x+(1-\alpha)y\geq p].
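The price construction in this proof can be traced on concrete numbers. The sketch below assumes $\Theta\sim\mathcal{U}[0,1]$ and the values $x=0.2$, $y=0.6$, $\alpha=0.5$; these, and the helper names, are illustrative assumptions. It builds $p_{1}$ and $p_{2}$ from the optimal price $p$ for the mixed valuation and checks each step.

```python
def survival(t):
    # P[Theta >= t] for the assumed Theta ~ Uniform[0, 1].
    return min(1.0, max(0.0, 1.0 - t))

def revenue(p, x):
    # Revenue p * P[Theta + x >= p].
    return p * survival(p - x)

def best_price(x):
    # Maximizer of revenue(p, x) for uniform Theta on [0, 1] and x >= 0.
    return max((1 + x) / 2, x)

x, y, alpha = 0.2, 0.6, 0.5
mix = alpha * x + (1 - alpha) * y

p = best_price(mix)                 # optimal price for valuation Theta + mix
p1 = p - (1 - alpha) * (y - x)      # shifted price offered to valuation Theta + x
p2 = p - alpha * (x - y)            # shifted price offered to valuation Theta + y

lhs = revenue(p, mix)                                          # r*(Theta + mix)
split = alpha * revenue(p1, x) + (1 - alpha) * revenue(p2, y)  # same revenue, split
upper = alpha * revenue(best_price(x), x) + (1 - alpha) * revenue(best_price(y), y)
print(lhs, split, upper)
```

The shifted prices induce the same purchase probability as $p$, so the split revenue equals $r^{\star}(\Theta+\alpha x+(1-\alpha)y)$, and replacing $p_{1},p_{2}$ by the respective optimal prices can only increase it.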

D.3 Stronger CoNF upper bound under further assumptions on hh (Remark 4.2)

Theorem D.2.

For any instance with h(n)uh(n)\leq u for all n{0,1,,c}n\in\{0,1,\ldots,c\}: χ(Πdynamic)2Θ[Θu]Θ[Θ0]\chi(\Pi^{\textsc{dynamic}})\leq\frac{2{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-u]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}

Proof.

Similar to the proof of Theorem 4.2, we define p~n=h¯+max(p(Θ+h(n))h(n),0)\tilde{p}_{n}=\overline{h}+\max(p^{\star}(\Theta+h(n))-h(n),0) and let p(n)=p(Θ+h(n))p(n)=p^{\star}(\Theta+h(n)) for convenience. We refine the analysis of Lemma 4.5, to show that for any n{0,1,,c}n\in\{0,1,\ldots,c\}, Demand Ratio(p~n,n)Θ[Θu]Θ[Θ0]\textsc{Demand Ratio}(\tilde{p}_{n},n)\leq\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-u]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}. We consider two cases:

  • If p(n)h(n)p(n)\geq h(n), then max(p(n)h(n),0)=p(n)h(n)\max(p(n)-h(n),0)=p(n)-h(n) and thus

    Demand Ratio(p~n,n)=Θ[Θp(n)h(n)]Θ[Θmax(p(n)h(n),0)]=1Θ[Θu]Θ[Θ0].\textsc{Demand Ratio}(\tilde{p}_{n},n)=\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta\geq p(n)-h(n)\big{]}}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta\geq\max(p(n)-h(n),0)\big{]}}=1\leq\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-u]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}.

    since ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-u]\geq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]$ as $u>0$.

  • If $p(n)<h(n)$, then $\max(p(n)-h(n),0)=0$ and thus

    \begin{align*}
    \textsc{Demand Ratio}(\tilde{p}_{n},n) &=\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq p(n)-h(n)]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq\max(p(n)-h(n),0)]} \\
    &=\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq p(n)-h(n)]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}\leq\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-h(n)]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}\leq\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-u]}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq 0]}.
    \end{align*}

    The first inequality uses Θ[Θp(n)h(n)]Θ[Θh(n)]{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq p(n)-h(n)]\leq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-h(n)] as the optimal price p(n)0p(n)\geq 0. The second inequality uses Θ[Θh(n)]Θ[Θu]{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-h(n)]\leq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq-u] as h(n)uh(n)\leq u.

By Lemma 4.4, the expected price ratio is upper bounded as 𝔼[Price Ratio(p~N,N)]2{\mathbb{E}}\Big{[}\textsc{Price Ratio}(\tilde{p}_{N},N)\Big{]}\leq 2. Combining this with the aforementioned bound on the demand ratio the proof follows. ∎

D.4 Characterization of optimal revenue under Newest First (Theorem 4.5)

Proof of Theorem 4.5.

Letting h¯=𝔼NBinom(c,μ)[h(N)]\overline{h}={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)], Theorem 4.4 establishes that the optimal dynamic pricing policy is review-offsetting with offset ph¯p^{\star}-\overline{h}. Using the expression for the revenue of review-offsetting policies with offset a=ph¯a^{\star}=p^{\star}-\overline{h} in Lemma 4.2:

maxρΠdynamicRev(σnewest,ρ)=(a+h¯)Θ[Θ+h¯a+h¯]=pΘ[Θ+h¯p]=r(Θ+h¯).\max_{\rho\in\Pi^{\textsc{dynamic}}}{\textsc{Rev}}({\sigma^{\textsc{newest}}},\rho)=(a^{\star}+\overline{h}){\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq a^{\star}+\overline{h}]=p^{\star}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p^{\star}]=r^{\star}(\Theta+\overline{h}).

D.5 All optimal dynamic policies under Newest First (Remark 4.3)

Theorem D.3.

Suppose that the platform uses σnewest\sigma^{\textsc{newest}} as the review ordering policy. Let p𝐳argmaxppΘ[Θ+𝔼NBinom(c,μ)[h(N)]p]p_{\bm{z}}\in\operatorname*{arg\,max}_{p}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta+{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)]\geq p\big{]} be any revenue-maximizing price for 𝐳{0,1}c\bm{z}\in\{0,1\}^{c}. A dynamic pricing policy ρnewest\rho^{\textsc{newest}} is optimal if and only if it has the form

ρnewest(𝒛)=h(N𝒛)+p𝒛𝔼NBinom(c,μ)[h(N)].\rho^{\textsc{newest}}(\bm{z})=h(N_{\bm{z}})+p_{\bm{z}}-{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)].

For an arbitrary dynamic pricing policy $\rho$, letting $\overline{h}={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)]$, the optimality condition of the theorem can be rewritten as\footnote{If the customer-specific distribution $\mathcal{F}$ is strictly regular, there is a unique optimal dynamic pricing policy.}

ρ(𝒛)h(N𝒛)+h¯argmaxppΘ[Θ+h¯p]for all 𝒛{0,1}c\rho(\bm{z})-h(N_{\bm{z}})+\overline{h}\in\operatorname*{arg\,max}_{p}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta+\overline{h}\geq p\big{]}\quad\text{for all $\bm{z}\in\{0,1\}^{c}$} (22)

Recall that by Theorem 4.5, the optimal dynamic pricing revenue under σnewest{\sigma^{\textsc{newest}}} is equal to the optimal revenue of selling to a customer with valuation Θ+h¯\Theta+\overline{h}, i.e., maxρΠdynamicRev(σnewest,ρ)=r(Θ+h¯)\max_{\rho\in\Pi^{\textsc{dynamic}}}{\textsc{Rev}}({\sigma^{\textsc{newest}}},\rho)=r^{\star}(\Theta+\overline{h}). Lemma 4.1 establishes that the revenue of ρ\rho is upper bounded by the revenue of one of the policies ρ~𝒛\tilde{\rho}_{\bm{z}} and that equality is achieved if and only if each of ρ~𝒛\tilde{\rho}_{\bm{z}} have the same revenue. To characterize all optimal dynamic pricing policies, we show that ρ~𝒛\tilde{\rho}_{\bm{z}} is optimal if and only if ρ(𝒛)h(N𝒛)+h¯\rho(\bm{z})-h(N_{\bm{z}})+\overline{h} is a revenue maximizing price when selling to a customer with valuation Θ+h¯\Theta+\overline{h} (Lemma D.2).

Lemma D.2.

The policy ρ~𝐳\tilde{\rho}_{\bm{z}} is optimal if and only if ρ(𝐳)h(N𝐳)+h¯\rho(\bm{z})-h(N_{\bm{z}})+\overline{h} is a revenue maximizing price when selling to a customer with valuation Θ+h¯\Theta+\overline{h} i.e. ρ(𝐳)h(N𝐳)+h¯argmaxppΘ[Θ+h¯p]\rho(\bm{z})-h(N_{\bm{z}})+\overline{h}\in\operatorname*{arg\,max}_{p}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p].

Proof.

Given that ρ~𝒛\tilde{\rho}_{\bm{z}} is a review-offsetting policy with offset ρ(𝒛)h(N𝒛)\rho(\bm{z})-h(N_{\bm{z}}), Lemma 4.2 implies

Rev(σnewest,ρ~𝒛)=(h¯+ρ(𝒛)h(N𝒛))p𝒛Θ[Θ+h(N𝒛)ρ(𝒛)].{\textsc{Rev}}(\sigma^{\textsc{newest}},\tilde{\rho}_{\bm{z}})=\underbrace{\Big{(}\overline{h}+\rho(\bm{z})-h(N_{\bm{z}})\Big{)}}_{p_{\bm{z}}}\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})\big{]}.

By adding and subtracting h¯\overline{h} from each side the purchase probability can be rewritten as

Θ[Θ+h(N𝒛)ρ(𝒛)]=Θ[Θ+h¯h¯+ρ(𝒛)h(N𝒛)]=Θ[Θ+h¯p𝒛].{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})\big{]}={\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta+\overline{h}\geq\overline{h}+\rho(\bm{z})-h(N_{\bm{z}})\big{]}={\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta+\overline{h}\geq p_{\bm{z}}].

Thus, the revenue of $\tilde{\rho}_{\bm{z}}$ is equal to the revenue of offering a price of $p_{\bm{z}}$ to a customer with valuation $\Theta+\overline{h}$, which is maximized if and only if $p_{\bm{z}}=\rho(\bm{z})-h(N_{\bm{z}})+\overline{h}\in\operatorname*{arg\,max}_{p}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p]$. ∎

Proof of Theorem D.3.

For an optimal policy ρ\rho, Lemma 4.1 implies that

Rev(ρ,σnewest)max𝒛{0,1}cRev(ρ~𝒛,σnewest){\textsc{Rev}}(\rho,{\sigma^{\textsc{newest}}})\leq\max_{\bm{z}\in\{0,1\}^{c}}{\textsc{Rev}}(\tilde{\rho}_{\bm{z}},{\sigma^{\textsc{newest}}})

where equality holds if and only if Rev(ρ~𝒛,σnewest)=Rev(ρ~𝒛,σnewest){\textsc{Rev}}(\tilde{\rho}_{\bm{z}},{\sigma^{\textsc{newest}}})={\textsc{Rev}}(\tilde{\rho}_{\bm{z}^{\prime}},{\sigma^{\textsc{newest}}}) for all 𝒛,𝒛{0,1}c\bm{z},\bm{z}^{\prime}\in\{0,1\}^{c}. The optimality of ρ\rho thus implies that ρ~𝒛\tilde{\rho}_{\bm{z}} must be optimal for all 𝒛{0,1}c\bm{z}\in\{0,1\}^{c}. By Lemma D.2, ρ(𝒛)h(N𝒛)+h¯argmaxppΘ[Θ+h¯p]\rho(\bm{z})-h(N_{\bm{z}})+\overline{h}\in\operatorname*{arg\,max}_{p}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\overline{h}\geq p] for all 𝒛{0,1}c\bm{z}\in\{0,1\}^{c} i.e. (22) holds.

Conversely, suppose that ρ\rho satisfies (22). Lemma D.2 implies that each of ρ~𝒛\tilde{\rho}_{\bm{z}} is optimal and thus yields the same revenue, i.e., Rev(σnewest,ρ~𝒛)=r(Θ+h¯){\textsc{Rev}}({\sigma^{\textsc{newest}}},\tilde{\rho}_{\bm{z}})=r^{\star}(\Theta+\overline{h}) for all 𝒛\bm{z}. By the equality condition of Lemma 4.1, Rev(ρ,σnewest)=max𝒛{0,1}cRev(ρ~𝒛,σnewest)=r(Θ+h¯){\textsc{Rev}}(\rho,{\sigma^{\textsc{newest}}})=\max_{\bm{z}\in\{0,1\}^{c}}{\textsc{Rev}}(\tilde{\rho}_{\bm{z}},{\sigma^{\textsc{newest}}})=r^{\star}(\Theta+\overline{h}) and thus ρ\rho is optimal. ∎

D.6 Uniqueness of optimal dynamic pricing for Newest and Random (Remark 4.3)

We show that the optimal dynamic pricing policies under σnewest{\sigma^{\textsc{newest}}} and σrandom{\sigma^{\textsc{random}}} are unique assuming mild regularity conditions on the customer-specific value distribution \mathcal{F}. For a continuous random variable VV with bounded support, let F¯V(p)=[Vp]\overline{F}_{V}(p)={\mathbb{P}}[V\geq p] be its survival function. As VV has a continuous distribution, the inverse survival function GV(q)=(F¯V)1(q)G_{V}(q)=(\overline{F}_{V})^{-1}(q) is well-defined for q(0,1)q\in(0,1), i.e., for any quantile q(0,1)q\in(0,1), GV(q)G_{V}(q) is the unique price which induces a purchase probability qq. Let r(q)qGV(q)r(q)\coloneqq qG_{V}(q) be the revenue as a function of the quantile qq. We require a notion of strict regularity.

Definition D.1 (Strict Regularity).

A random variable VV has a strictly regular distribution if the revenue function r(q)=qGV(q)r(q)=qG_{V}(q) is strictly concave in the quantile q(0,1)q\in(0,1).

In order to show uniqueness of the optimal dynamic pricing policies, we extend the strict regularity assumption by imposing a few mild further conditions.

Definition D.2 (Well-behavedness).

A random variable VV is well-behaved if: (1) VV is continuous with bounded support; (2) VV is strictly regular; (3) [V>0]>0{\mathbb{P}}[V>0]>0.

Condition (3) implies that we can achieve a strictly positive revenue by selling to a customer with a well-behaved valuation VV.

Lemma D.3.

For any x0x\geq 0 and well-behaved random variable VV, VxV+xV_{x}\coloneqq V+x is also well-behaved.

Proof of Lemma D.3.

To prove that $V_{x}$ is well-behaved we establish all three conditions of Definition D.2. First, the continuity and boundedness of $V$ imply that $V_{x}=V+x$ also has these properties. Second, $qG_{V}(q)$ is strictly concave and $qx$ is linear in $q$; thus $qG_{V+x}(q)=qG_{V}(q)+qx$ is strictly concave. Third, as ${\mathbb{P}}[V>0]>0$, adding a non-negative scalar $x$ yields ${\mathbb{P}}[V_{x}>0]\geq{\mathbb{P}}[V>0]>0$. ∎

Lemma D.4.

For a well-behaved random variable V~\tilde{V}, the revenue-maximizing price pp^{\star} is unique.

Proof of Lemma D.4.

Letting the support of V~\tilde{V} be [v¯,v¯][\underline{v},\overline{v}] implies that GV~(q)(v¯,v¯)G_{\tilde{V}}(q)\in(\underline{v},\overline{v}) for all q(0,1)q\in(0,1), limq0GV~(q)=v¯\lim_{q\to 0}G_{\tilde{V}}(q)=\overline{v}, and limq1GV~(q)=v¯\lim_{q\to 1}G_{\tilde{V}}(q)=\underline{v}. Thus,

limq0r(q)=limq0qGV~(q)=limq0GV~(q)limq0q=0.\lim_{q\to 0}r(q)=\lim_{q\to 0}qG_{\tilde{V}}(q)=\lim_{q\to 0}G_{\tilde{V}}(q)\cdot\lim_{q\to 0}q=0.

By the third condition of well-behavedness, ${\mathbb{P}}_{\tilde{V}}[\tilde{V}>0]>0$, which implies that $\overline{v}>0$ and thus there exists some quantile $\tilde{q}$ such that $r(\tilde{q})>0$. In particular, consider the price $\tilde{p}=\frac{\overline{v}+\max(0,\underline{v})}{2}\in(\underline{v},\overline{v})$, which is strictly positive, and let $\tilde{q}={\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq\tilde{p}]\in(0,1)$ be its quantile; then $r(\tilde{q})=\tilde{q}G_{\tilde{V}}(\tilde{q})=\tilde{q}\tilde{p}>0$. Since $\lim_{q\to 0}r(q)=0$, $r(\tilde{q})>0$, and $r(q)$ is strictly concave on $(0,1)$, it holds that either (a) $r(q)$ has a unique maximizer at $q=q^{\star}\in(0,1)$ or (b) $r(q)$ is strictly increasing for $q\in(0,1)$. We consider these two cases separately below.

For case (a), the optimal price pp^{\star} has a quantile q=V~[V~p](0,1)q^{\star}={\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p^{\star}]\in(0,1) such that r(q)>r(q)r(q^{\star})>r(q) for all q(0,1){q}q\in(0,1)\setminus\{q^{\star}\}. The strict concavity of rr implies

r(q)>max(limq1r(q),limq0r(q))=max(v¯,0).r(q^{\star})>\max(\lim_{q\to 1}r(q),\lim_{q\to 0}r(q))=\max(\underline{v},0). (23)

We show that p=GV~(q)p^{\star}=G_{\tilde{V}}(q^{\star}) is the unique revenue-maximizing price. It suffices to show that this price provides strictly higher revenue than any other price ppp\neq p^{\star}, i.e., r(q)=pV~[V~p]>pV~[V~p]r(q^{\star})=p^{\star}{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p^{\star}]>p{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]. Let q=V~[V~p]q={\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p] be the quantile associated with the price pp.

  • If p(v¯,v¯)p\in(\underline{v},\overline{v}) the continuity of V~\tilde{V} implies that q=V~[V~p](0,1)q={\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]\in(0,1) and thus the revenue of pp^{\star} is strictly larger than the revenue of pp by the assumption of case (a) that r(q)>r(q)r(q^{\star})>r(q) for all q(0,1){q}q\in(0,1)\setminus\{q^{\star}\}.

  • If pv¯p\geq\overline{v}, then pV~[V~p]=0p{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]=0 which combined with r(q)>0r(q^{\star})>0 yields that the revenue of pp^{\star} is strictly larger than the revenue of pp.

  • Lastly, if pv¯p\leq\underline{v}, then pV~[V~p]v¯<r(q)p{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]\leq\underline{v}<r(q^{\star}) by (23).

For case (b), $r(q)$ is strictly increasing for $q\in(0,1)$. Let $p^{\star}=\underline{v}$. Since $r(1)=\underline{v}$, $r(0)=0$, and $r(1)>r(0)$, we obtain that $\underline{v}>0$. To show that $p^{\star}=\underline{v}$ is the unique revenue-maximizing price, it suffices to prove that $\underline{v}=p^{\star}{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p^{\star}]>p{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]$ for any $p\neq p^{\star}$.

  • If p(v¯,v¯)p\in(\underline{v},\overline{v}) the continuity of V~\tilde{V} implies that q=V~[V~p](0,1)q={\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]\in(0,1) and thus the revenue of pp^{\star} is strictly larger than the revenue of pp by the assumption of case (b) that r(q)r(q) is increasing on (0,1)(0,1).

  • If pv¯p\geq\overline{v} then V~[V~p]=0{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]=0 and thus pV~[V~p]=0p{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]=0 which is strictly smaller than p=v¯p^{\star}=\underline{v}.

  • Lastly if p<v¯p<\underline{v}, then pV~[V~p]p<v¯=p=pV~[V~p]p{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p]\leq p<\underline{v}=p^{\star}=p^{\star}{\mathbb{P}}_{\tilde{V}}[\tilde{V}\geq p^{\star}].

As a result, in both cases, we established that there exists a unique revenue-maximizing pp^{\star}. ∎

Proposition D.1.

For a well-behaved customer-specific valuation Θ\Theta, the optimal dynamic pricing policies under σnewest{\sigma^{\textsc{newest}}} and σrandom{\sigma^{\textsc{random}}} are unique.

Proof.

The valuation of a customer at a state with review ratings 𝒛\bm{z} is Θ+h(N𝒛)\Theta+h(N_{\bm{z}}). By Theorem 4.3, any optimal dynamic pricing policy under σrandom{\sigma^{\textsc{random}}} outputs, at each state of review ratings 𝒛\bm{z}, a revenue-maximizing price for a customer with valuation Θ+h(N𝒛)\Theta+h(N_{\bm{z}}). Combining Lemma D.3 with x=h(N𝒛)x=h(N_{\bm{z}}) and Lemma D.4, this price is unique for every state of review ratings 𝒛\bm{z}.

By Theorem D.3, any optimal dynamic pricing policy under σnewest{\sigma^{\textsc{newest}}} outputs, at each state of review ratings 𝒛\bm{z}, a price of ρnewest(𝒛)=h(N𝒛)+p𝒛h¯\rho^{\textsc{newest}}(\bm{z})=h(N_{\bm{z}})+p_{\bm{z}}-\overline{h} where p𝒛p_{\bm{z}} is a revenue-maximizing price for a customer with valuation Θ+h¯\Theta+\overline{h}. Combining Lemma D.3 with x=h¯x=\overline{h} and Lemma D.4, the price ρnewest(𝒛)\rho^{\textsc{newest}}(\bm{z}) is unique for every state of review ratings 𝒛\bm{z}. ∎

D.7 Dynamic pricing revenue of Newest First (Proposition 4.2)

As an analogue of Lemma 3.1, let $\bm{Z}_{t}=(Z_{t,1},\ldots,Z_{t,c})\in\{0,1\}^{c}$ denote the process of the newest $c$ reviews. We note that $\bm{Z}_{t}$ is a time-homogeneous Markov chain on the finite state space $\{0,1\}^{c}$.

If 𝒁t\bm{Z}_{t} is at state 𝒛=(z1,,zc)\bm{z}=(z_{1},\ldots,z_{c}) it stays at that state if there is no purchase (with probability 1Θ[Θ+h(N𝒛)ρ(𝒛)]1-{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})]). If there is a purchase with a positive review, it transitions to (1,z1,,zc1)(1,z_{1},\ldots,z_{c-1}) (with probability μΘ[Θ+h(N𝒛)ρ(𝒛)]\mu{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})]). If there is a purchase with a negative review, it transitions to (0,z1,,zc1)(0,z_{1},\ldots,z_{c-1}) (with probability (1μ)Θ[Θ+h(N𝒛)ρ(𝒛)](1-\mu){\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})]). If μ(0,1)\mu\in(0,1) and ρ\rho is non-absorbing, 𝒁t\bm{Z}_{t} is a single-recurrence-class Markov chain with no transient states. Thus, 𝒁t\bm{Z}_{t} admits a unique stationary distribution characterized in the following lemma.

Lemma D.5.

The stationary distribution of $\bm{Z}_{t}\in\{0,1\}^{c}$ under any non-absorbing policy $\rho$ is

\[\pi_{\bm{z}}=\kappa\cdot\frac{\mu^{N_{\bm{z}}}(1-\mu)^{c-N_{\bm{z}}}}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big[\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})\Big]}\text{ for }\bm{z}\in\{0,1\}^{c},\]

where $\kappa=1/{\mathbb{E}}_{Y_{1},\ldots,Y_{c}\sim_{i.i.d.}{\mathrm{Bern}}(\mu)}\Bigg[\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(\sum_{i=1}^{c}Y_{i})\geq\rho(Y_{1},\ldots,Y_{c})\big]}\Bigg]$ is a normalizing constant. (Footnote 19: Recall that for a state of $c$ reviews $\bm{z}\in\{0,1\}^{c}$, $N_{\bm{z}}=\sum_{i=1}^{c}z_{i}$ denotes the number of positive review ratings.)

Proof of Lemma D.5.

Similar to the proof of Lemma 3.1, we invoke Lemma C.3. In the language of Lemma C.3, 𝒁t\bm{Z}_{t} corresponds to f\mathcal{M}_{f}, the state space 𝒮\mathcal{S} to {0,1}c\{0,1\}^{c}, and ff is a function that expresses the purchase probability at a given state, i.e., f(z1,,zc)=Θ[Θ+h(N𝒛)ρ(𝒛)]f(z_{1},\ldots,z_{c})={\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h(N_{\bm{z}})\geq\rho(\bm{z})\Big{]}. Note that with probability 1f(z1,,zc)1-f(z_{1},\ldots,z_{c}), 𝒁t\bm{Z}_{t} remains at the same state (as there is no purchase).

To apply Lemma C.3, we need to show that whenever there is a purchase, $\bm{Z}_{t}$ transitions according to a Markov chain with stationary distribution $\mu^{N_{\bm{z}}}(1-\mu)^{c-N_{\bm{z}}}$. Consider the Markov chain $\mathcal{M}$ which always drops the oldest of the $c$ reviews and inserts a new ${\mathrm{Bern}}(\mu)$ review as the newest one. After $c$ steps the state of $\mathcal{M}$ consists of $c$ i.i.d. ${\mathrm{Bern}}(\mu)$ reviews, so its stationary distribution equals the numerator of $\pi_{\bm{z}}$ in the lemma statement. Moreover, $\bm{Z}_{t}$ transitions according to $\mathcal{M}$ upon a purchase, i.e., with probability $f(z_{1},\ldots,z_{c})$. As a result, by Lemma C.3, $\pi$ is a stationary distribution for $\bm{Z}_{t}$. As $\bm{Z}_{t}$ is irreducible and aperiodic, this is the unique stationary distribution. ∎
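The stationarity claim of Lemma D.5 can be checked numerically on a small instance. The sketch below is illustrative only: the names `stationary_formula`, `step`, and `toy_purchase_prob`, as well as the parameter values, are assumptions and do not come from the paper. It builds the candidate distribution from the lemma and applies one transition of $\bm{Z}_{t}$ to confirm that the distribution is left unchanged.

```python
from itertools import product


def stationary_formula(mu, c, purchase_prob):
    """Candidate from Lemma D.5: pi_z proportional to mu^N (1-mu)^(c-N) / f(z)."""
    weights = {}
    for z in product((0, 1), repeat=c):
        n_pos = sum(z)
        weights[z] = mu**n_pos * (1 - mu) ** (c - n_pos) / purchase_prob(z)
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}


def step(dist, mu, purchase_prob):
    """Apply one transition of Z_t to a distribution over review states."""
    new = {z: 0.0 for z in dist}
    for z, mass in dist.items():
        f = purchase_prob(z)
        new[z] += mass * (1 - f)                    # no purchase: state unchanged
        new[(1,) + z[:-1]] += mass * f * mu         # purchase, positive review
        new[(0,) + z[:-1]] += mass * f * (1 - mu)   # purchase, negative review
    return new


def toy_purchase_prob(z):
    """Hypothetical purchase probability, increasing in the number of positives."""
    return 0.2 + 0.2 * sum(z)


pi = stationary_formula(0.7, 3, toy_purchase_prob)
pi_next = step(pi, 0.7, toy_purchase_prob)
max_gap = max(abs(pi[z] - pi_next[z]) for z in pi)
print(max_gap)  # should be at floating-point noise level
```

Any strictly positive purchase probability function can be substituted for `toy_purchase_prob`; positivity plays the role of the non-absorbing assumption.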

Proof of Proposition 4.2.

By Eq. (1) and the Ergodic theorem, we can express the revenue as

\begin{align*}
\textsc{Rev}(\sigma^{\textsc{newest}},\rho) &=\liminf_{T\to\infty}\frac{{\mathbb{E}}\left[\sum_{t=1}^{T}\rho(Z_{t,1},\ldots,Z_{t,c}){\mathbb{P}}_{\Theta\sim\mathcal{F}}\left[\Theta+h(\sum_{i=1}^{c}Z_{t,i})\geq\rho(Z_{t,1},\ldots,Z_{t,c})\right]\right]}{T}\\
&=\sum_{(z_{1},\ldots,z_{c})\in\{0,1\}^{c}}\pi_{(z_{1},\ldots,z_{c})}\rho(z_{1},\ldots,z_{c}){\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(N_{\bm{z}})\geq\rho(z_{1},\ldots,z_{c})\big]\\
&=\kappa\cdot\sum_{(z_{1},\ldots,z_{c})\in\{0,1\}^{c}}\mu^{\sum_{i=1}^{c}z_{i}}(1-\mu)^{c-\sum_{i=1}^{c}z_{i}}\rho(z_{1},\ldots,z_{c})\\
&=\kappa\cdot{\mathbb{E}}_{Y_{1},\ldots,Y_{c}\sim_{i.i.d.}{\mathrm{Bern}}(\mu)}[\rho(Y_{1},\ldots,Y_{c})]\\
&=\frac{{\mathbb{E}}_{Y_{1},\ldots,Y_{c}\sim_{i.i.d.}{\mathrm{Bern}}(\mu)}[\rho(Y_{1},\ldots,Y_{c})]}{{\mathbb{E}}_{Y_{1},\ldots,Y_{c}\sim_{i.i.d.}{\mathrm{Bern}}(\mu)}\Big[\frac{1}{{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(\sum_{i=1}^{c}Y_{i})\geq\rho(Y_{1},\ldots,Y_{c})\big]}\Big]}.
\end{align*}

The third equality applies Lemma D.5 and cancels the term Θ[Θ+h(N𝒛)ρ(z1,,zc)]{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta+h(N_{\bm{z}})\geq\rho(z_{1},\ldots,z_{c})\big{]}. ∎
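As a rough numerical illustration of this closed form, the sketch below compares it with the per-round revenue obtained by iterating the state distribution of $\bm{Z}_{t}$ until it settles. Everything about the instance is an assumption made for illustration: $\Theta\sim\mathcal{U}[0,1]$, the toy estimator $h(n)=n/c$, a static price of $0.9$, $\mu=0.6$, and $c=2$; the function names are likewise hypothetical.

```python
from itertools import product

MU, C, PRICE = 0.6, 2, 0.9  # hypothetical instance parameters


def purchase_prob(z):
    """P[Theta + h(N_z) >= PRICE] for Theta ~ U[0,1] and the toy estimator h(n) = n / C."""
    h = sum(z) / C
    return min(1.0, max(0.0, 1.0 - (PRICE - h)))


def closed_form_revenue():
    """E[rho(Y)] / E[1 / purchase_prob(Y)] with Y_1, ..., Y_C i.i.d. Bern(MU)."""
    num = den = 0.0
    for y in product((0, 1), repeat=C):
        weight = MU ** sum(y) * (1 - MU) ** (C - sum(y))
        num += weight * PRICE
        den += weight / purchase_prob(y)
    return num / den


def iterated_revenue(rounds=2000):
    """Expected per-round revenue after evolving the state distribution for many rounds."""
    states = list(product((0, 1), repeat=C))
    dist = {z: 1.0 / len(states) for z in states}
    for _ in range(rounds):
        new = {z: 0.0 for z in states}
        for z, mass in dist.items():
            f = purchase_prob(z)
            new[z] += mass * (1 - f)                    # no purchase
            new[(1,) + z[:-1]] += mass * f * MU         # positive review arrives
            new[(0,) + z[:-1]] += mass * f * (1 - MU)   # negative review arrives
        dist = new
    return sum(dist[z] * PRICE * purchase_prob(z) for z in states)


closed = closed_form_revenue()
iterated = iterated_revenue()
print(closed, iterated)
```

A state-dependent price can be tried by replacing `PRICE` with a function of the state in both routines, provided every state keeps a positive purchase probability.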

D.8 Ratio of averages is bounded by maximum ratio (Lemma 4.3)

Proof of Lemma 4.3.

We show the inequality by contradiction and assume that i𝒮αiAii𝒮αiBi>AjBj\frac{\sum_{i\in\mathcal{S}}\alpha_{i}A_{i}}{\sum_{i\in\mathcal{S}}\alpha_{i}B_{i}}>\frac{A_{j}}{B_{j}} for all j𝒮j\in\mathcal{S}. Given that the denominators are positive, this implies that for any j𝒮j\in\mathcal{S}

(i𝒮αiAi)(αjBj)>(αjAj)(i𝒮αiBi).\big{(}\sum_{i\in\mathcal{S}}\alpha_{i}A_{i}\big{)}\big{(}\alpha_{j}B_{j}\big{)}>\big{(}\alpha_{j}A_{j}\big{)}\big{(}\sum_{i\in\mathcal{S}}\alpha_{i}B_{i}\big{)}.

Summing over j𝒮j\in\mathcal{S} we obtain the following which is a contradiction:

(i𝒮αiAi)(j𝒮αjBj)>(j𝒮αjAj)(i𝒮αiBi)\big{(}\sum_{i\in\mathcal{S}}\alpha_{i}A_{i}\big{)}\big{(}\sum_{j\in\mathcal{S}}\alpha_{j}B_{j}\big{)}>\big{(}\sum_{j\in\mathcal{S}}\alpha_{j}A_{j}\big{)}\big{(}\sum_{i\in\mathcal{S}}\alpha_{i}B_{i}\big{)}

As a result, $\frac{\sum_{i\in\mathcal{S}}\alpha_{i}A_{i}}{\sum_{i\in\mathcal{S}}\alpha_{i}B_{i}}\leq\max_{i\in\mathcal{S}}\frac{A_{i}}{B_{i}}$. With respect to equality, let $j\in\operatorname*{arg\,max}_{k}\frac{A_{k}}{B_{k}}$. The bound then holds with equality if and only if

i𝒮αiAii𝒮αiBi=AjBj.\frac{\sum_{i\in\mathcal{S}}\alpha_{i}A_{i}}{\sum_{i\in\mathcal{S}}\alpha_{i}B_{i}}=\frac{A_{j}}{B_{j}}. (24)

Multiplying by the denominators and rearranging the above can be rewritten as

i𝒮αiBjBi(AiBiAjBj)=0.\sum_{i\in\mathcal{S}}\alpha_{i}B_{j}B_{i}\big{(}\frac{A_{i}}{B_{i}}-\frac{A_{j}}{B_{j}}\big{)}=0. (25)

Since αi>0\alpha_{i}>0, Bi>0B_{i}>0 for all i𝒮i\in\mathcal{S}, and AiBiAjBj\frac{A_{i}}{B_{i}}\leq\frac{A_{j}}{B_{j}} for all i𝒮i\in\mathcal{S} , (25) holds only if AiBi=AjBj\frac{A_{i}}{B_{i}}=\frac{A_{j}}{B_{j}} for all i𝒮i\in\mathcal{S}. For the “if” direction, suppose that AiBi=AjBj\frac{A_{i}}{B_{i}}=\frac{A_{j}}{B_{j}} for all i,ji,j. In particular this holds, when jargmaxkAkBkj\in\operatorname*{arg\,max}_{k}\frac{A_{k}}{B_{k}} is a maximizing index and i𝒮i\in\mathcal{S} is an arbitrary index. This implies that (25) holds for any jargmaxkAkBkj\in\operatorname*{arg\,max}_{k}\frac{A_{k}}{B_{k}} and therefore so does (24), which concludes the proof. ∎
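The inequality and its equality condition can be illustrated with exact rational arithmetic. The sketch below uses made-up weights and values (positive $\alpha_{i}$ and $B_{i}$, as in the lemma); the helper names `weighted_ratio` and `max_ratio` are hypothetical.

```python
from fractions import Fraction


def weighted_ratio(alpha, A, B):
    """Return (sum_i alpha_i A_i) / (sum_i alpha_i B_i)."""
    num = sum(a * x for a, x in zip(alpha, A))
    den = sum(a * y for a, y in zip(alpha, B))
    return num / den


def max_ratio(A, B):
    """Return max_i A_i / B_i."""
    return max(x / y for x, y in zip(A, B))


alpha = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
A = [Fraction(3), Fraction(1), Fraction(4)]
B = [Fraction(2), Fraction(5), Fraction(1)]

# Strict inequality when the ratios A_i / B_i are not all equal.
print(weighted_ratio(alpha, A, B), max_ratio(A, B))

# Equality when every ratio A_i / B_i is the same (here, equal to 2).
A_prop = [2 * b for b in B]
print(weighted_ratio(alpha, A_prop, B) == max_ratio(A_prop, B))
```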

D.9 Comparing optimal dynamic pricing for Newest and Random (Remark 4.4)

We compare the optimal dynamic pricing policies under σnewest{\sigma^{\textsc{newest}}} and σrandom{\sigma^{\textsc{random}}} assuming that the customer-specific valuation Θ\Theta is well-behaved (Definition D.2). Recall that Proposition D.1 implies that when Θ\Theta is well-behaved, the optimal dynamic pricing policies under σnewest{\sigma^{\textsc{newest}}} and σrandom{\sigma^{\textsc{random}}} are unique. Let ρnewest\rho^{\textsc{newest}} and ρrandom\rho^{\textsc{random}} be those pricing policies.

Proposition D.2.

Let h¯=𝔼NBinom(c,μ)[h(N)]\overline{h}={\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[h(N)]. For any well-behaved customer-specific valuation Θ\Theta, the unique dynamic pricing policies ρnewest\rho^{\textsc{newest}} and ρrandom\rho^{\textsc{random}} satisfy:

  • $\rho^{\textsc{newest}}(\bm{z})\geq\rho^{\textsc{random}}(\bm{z})$ for review states $\bm{z}\in\{0,1\}^{c}$ with $h(N_{\bm{z}})>\overline{h}$

  • $\rho^{\textsc{newest}}(\bm{z})\leq\rho^{\textsc{random}}(\bm{z})$ for review states $\bm{z}\in\{0,1\}^{c}$ with $h(N_{\bm{z}})<\overline{h}$

  • $\rho^{\textsc{newest}}(\bm{z})=\rho^{\textsc{random}}(\bm{z})$ for review states $\bm{z}\in\{0,1\}^{c}$ with $h(N_{\bm{z}})=\overline{h}$.

Intuitively, Proposition D.2 suggests ρnewest\rho^{\textsc{newest}} charges higher prices in review states 𝒛\bm{z} with “high” ratings and lower prices in review states 𝒛\bm{z} with “low” ratings compared to ρrandom\rho^{\textsc{random}} in order to induce the same purchase probability in every review state.

To prove Proposition D.2, we introduce the revenue maximizing price of selling to a customer with valuation Θ+x\Theta+x for a non-negative scalar x0x\geq 0 i.e. p(Θ+x)=argmaxppΘ[Θ+xp]p^{\star}(\Theta+x)=\operatorname*{arg\,max}_{p\in\mathbb{R}}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p].

We also introduce the function g(x)p(Θ+x)xg(x)\coloneqq p^{\star}(\Theta+x)-x which intuitively captures the smallest idiosyncratic valuation a customer needs to have to purchase the product and is also a proxy for the purchase probability of a customer with valuation Θ+x\Theta+x (since Θ[Θ+xp]=Θ[Θpx]{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p]={\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq p-x]).

Lemma D.6.

For any well-behaved customer-specific value distribution Θ\Theta, g(x)g(x) is weakly decreasing for x0x\geq 0.

Proof of Proposition D.2.

Theorem 4.3 implies that the optimal dynamic pricing policy under σrandom{\sigma^{\textsc{random}}} is ρrandom(𝒛)=p(Θ+h(N𝒛))\rho^{\textsc{random}}(\bm{z})=p^{\star}(\Theta+h(N_{\bm{z}})) for 𝒛{0,1}c\bm{z}\in\{0,1\}^{c}. Theorem 4.4 implies that the optimal dynamic pricing policy under σnewest{\sigma^{\textsc{newest}}} is ρnewest(𝒛)=h(N𝒛)+p(Θ+h¯)h¯\rho^{\textsc{newest}}(\bm{z})=h(N_{\bm{z}})+p^{\star}(\Theta+\overline{h})-\overline{h}. As a result:

ρnewest(𝒛)ρrandom(𝒛)=(p(Θ+h¯)h¯)(p(Θ+h(N𝒛))h(N𝒛))=g(h¯)g(h(N𝒛)).\rho^{\textsc{newest}}(\bm{z})-\rho^{\textsc{random}}(\bm{z})=(p^{\star}(\Theta+\overline{h})-\overline{h})-(p^{\star}(\Theta+h(N_{\bm{z}}))-h(N_{\bm{z}}))=g(\overline{h})-g(h(N_{\bm{z}})).

By Lemma D.6, gg is monotonically decreasing which implies the result of the proposition. ∎

To prove Lemma D.6, we use an auxiliary lemma which shows that for large enough xx, the revenue-maximizing price p(Θ+x)p^{\star}(\Theta+x) is equal to θ¯+x\underline{\theta}+x (and thus induces a purchase probability of one). Given that Θ\Theta is well-behaved, we let its support be [θ¯,θ¯][\underline{\theta},\overline{\theta}].

Lemma D.7.

For any well-behaved customer-specific value distribution Θ\Theta and x0x\geq 0, there exists some threshold M0M\geq 0 such that p(Θ+x)(θ¯+x,θ¯+x)p^{\star}(\Theta+x)\in(\underline{\theta}+x,\overline{\theta}+x) for x[0,M)x\in[0,M) and p(Θ+x)=θ¯+xp^{\star}(\Theta+x)=\underline{\theta}+x for xMx\geq M.

Proof of Lemma D.7.

First observe that p(Θ+x)θ¯+xp^{\star}(\Theta+x)\geq\underline{\theta}+x since setting a price below θ¯+x\underline{\theta}+x is never optimal. Indeed, prices p<θ¯+xp<\underline{\theta}+x induce a purchase probability of one. Setting a price of p+ϵp+\epsilon for a small enough ϵ\epsilon still induces a purchase probability of one and achieves a strictly higher revenue.

Second, observe that any price pθ¯+xp\geq\overline{\theta}+x induces a purchase probability of zero and thus zero revenue. As Θ\Theta is well-behaved, then so is Θ+x\Theta+x (Lemma D.3). This implies that Θ[Θ+x>0]>0{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x>0]>0 and thus the optimal revenue is strictly positive (as one can find a price p>0p>0 such that Θ[Θ+xp]>0{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p]>0). Thus, p(Θ+x)<θ¯+xp^{\star}(\Theta+x)<\overline{\theta}+x for all x0x\geq 0.

Finally, let S={x0|p(Θ+x)=θ¯+x}S=\{x\geq 0|p^{\star}(\Theta+x)=\underline{\theta}+x\} be the set of increments x0x\geq 0 such that the revenue-maximizing price of selling to a customer with valuation Θ+x\Theta+x is equal to θ¯+x\underline{\theta}+x. Observe that setting a price of θ¯+x\underline{\theta}+x induces a purchase probability of one and thus a revenue of θ¯+x\underline{\theta}+x. Thus, xSx\in S if and only if the optimal revenue of selling to a customer with valuation Θ+x\Theta+x is θ¯+x\underline{\theta}+x. Given that p(Θ+x)[θ¯+x,θ¯+x)p^{\star}(\Theta+x)\in[\underline{\theta}+x,\overline{\theta}+x), to conclude the proof it is sufficient to show that there exists some M0M\geq 0 such that S=[M,+)S=[M,+\infty). To prove this, it suffices to show that SS satisfies two properties: (a) If xSx\in S then xSx^{\prime}\in S for all xxx^{\prime}\geq x; (b) If {xi}i=1\{x_{i}\}_{i=1}^{\infty} is a decreasing sequence with xiSx_{i}\in S and limixi=x\lim_{i\to\infty}x_{i}=x_{\infty}, then xSx_{\infty}\in S.

For an increment xx and price pp, let R(x,p)=pΘ[Θ+xp]R(x,p)=p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p] be the revenue of offering a price pp to a customer with valuation Θ+x\Theta+x. For property (a), let xSx\in S and xxx^{\prime}\geq x. To show that xSx^{\prime}\in S, it is sufficient to show R(x,p)θ¯+xR(x^{\prime},p)\leq\underline{\theta}+x^{\prime} for all prices pp. Expanding we have

R(x,p)\displaystyle R(x^{\prime},p) =pΘ[Θ+xp]=(p(xx)+(xx))Θ[Θ+x+(xx)p]\displaystyle=p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x^{\prime}\geq p]=\big{(}p-(x^{\prime}-x)+(x^{\prime}-x)\big{)}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x+(x^{\prime}-x)\geq p]
=(p(xx))Θ[Θ+xp(xx)]+(xx)Θ[Θ+xp]\displaystyle=\big{(}p-(x^{\prime}-x)\big{)}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p-(x^{\prime}-x)]+(x^{\prime}-x){\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x^{\prime}\geq p]
=R(x,p(xx))+(xx)Θ[Θ+xp]\displaystyle=R(x,p-(x^{\prime}-x))+(x^{\prime}-x){\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x^{\prime}\geq p]
x+θ¯+(xx)=θ¯+xxS\displaystyle\leq x+\underline{\theta}+(x^{\prime}-x)=\underline{\theta}+x^{\prime}\Rightarrow x^{\prime}\in S

where the last inequality uses R(x,p(xx))θ¯+xR(x,p-(x^{\prime}-x))\leq\underline{\theta}+x since xSx\in S and Θ[Θ+xp]1{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x^{\prime}\geq p]\leq 1.

For property (b), let {xi}i=1\{x_{i}\}_{i=1}^{\infty} be a decreasing sequence with xiSx_{i}\in S and such that limixi=x\lim_{i\to\infty}x_{i}=x_{\infty}. We show that xSx_{\infty}\in S. The continuity of Θ\Theta implies that revenue function R(x,p)R(x,p) is continuous in xx for any fixed price pp. Combining this with the fact that R(xi,p)θ¯+xiR(x_{i},p)\leq\underline{\theta}+x_{i} for all ii, we obtain

R(x,p)=limiR(xi,p)limiθ¯+xi=θ¯+xxS.R(x_{\infty},p)=\lim_{i\to\infty}R(x_{i},p)\leq\lim_{i\to\infty}\underline{\theta}+x_{i}=\underline{\theta}+x_{\infty}\Rightarrow x_{\infty}\in S.

Proof of Lemma D.6.

Lemma D.7 shows that p(Θ+x)(θ¯+x,θ¯+x)p^{\star}(\Theta+x)\in(\underline{\theta}+x,\overline{\theta}+x) for x[0,M)x\in[0,M) and p(Θ+x)=θ¯+xp^{\star}(\Theta+x)=\underline{\theta}+x for xMx\geq M for some threshold M0M\geq 0. Thus, g(x)(θ¯,θ¯)g(x)\in(\underline{\theta},\overline{\theta}) for x[0,M)x\in[0,M) and g(x)=θ¯g(x)=\underline{\theta} for x[M,+)x\in[M,+\infty). It thus suffices to show that g(x)g(x) is monotonically decreasing for x[0,M)x\in[0,M). The continuity of Θ\Theta implies that the function pΘ[Θ+xp]p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p] is differentiable for p(θ¯+x,θ¯+x)p\in(\underline{\theta}+x,\overline{\theta}+x) and thus the first-order conditions must be satisfied at p=p(Θ+x)(θ¯+x,θ¯+x)p=p^{\star}(\Theta+x)\in(\underline{\theta}+x,\overline{\theta}+x). Denoting the survival and density functions of Θ\Theta by F¯\overline{F} and ff respectively, this yields

ddppΘ[Θ+xp]=ddppF¯(px)=F¯(px)pf(px)=0\frac{d}{dp}p{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p]=\frac{d}{dp}p\overline{F}(p-x)=\overline{F}(p-x)-pf(p-x)=0

for p=p(Θ+x)p=p^{\star}(\Theta+x). Rearranging the last equation yields

p(Θ+x)xF¯(p(Θ+x)x)f(p(Θ+x)x)=xg(x)F¯(g(x))f(g(x))=xp^{\star}(\Theta+x)-x-\frac{\overline{F}(p^{\star}(\Theta+x)-x)}{f(p^{\star}(\Theta+x)-x)}=-x\quad\iff\quad g(x)-\frac{\overline{F}(g(x))}{f(g(x))}=-x (26)

for any x[0,M)x\in[0,M). The strict regularity of Θ\Theta implies that the function uuF¯(u)f(u)u\to u-\frac{\overline{F}(u)}{f(u)} is strictly increasing. Thus, the left-hand side of (26) is increasing in u=g(x)u=g(x) while the right-hand side is strictly decreasing in xx. Hence, u=g(x)u=g(x) is decreasing for x[0,M)x\in[0,M) concluding the proof. ∎
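As an illustrative sanity check of Lemmas D.6 and D.7, consider $\Theta\sim\mathcal{U}[0,1]$, so that $\underline{\theta}=0$, $\overline{\theta}=1$, $\overline{F}(u)=1-u$ and $f(u)=1$ on $[0,1]$. This choice is an assumption made here for concreteness (it presumes that the uniform distribution satisfies Definition D.2) and is not taken from the paper.

```latex
\begin{align*}
R(x,p) &= p\,{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p] = p\,(1+x-p), \qquad p\in[x,1+x],\\
\frac{\partial R}{\partial p} &= 1+x-2p = 0 \;\Longrightarrow\; p=\tfrac{1+x}{2},\\
p^{\star}(\Theta+x) &= \begin{cases}\tfrac{1+x}{2}, & x\in[0,1),\\ x, & x\geq 1,\end{cases}
\qquad
g(x) = p^{\star}(\Theta+x)-x = \begin{cases}\tfrac{1-x}{2}, & x\in[0,1),\\ 0, & x\geq 1.\end{cases}
\end{align*}
```

In this example the threshold of Lemma D.7 is $M=1$, since the interior solution $\tfrac{1+x}{2}$ exceeds $\underline{\theta}+x=x$ exactly when $x<1$. The function $g$ is decreasing on $[0,1)$ and equals $\underline{\theta}=0$ afterwards, and (26) can be verified directly: $g(x)-\frac{\overline{F}(g(x))}{f(g(x))}=2g(x)-1=-x$ for $x\in[0,1)$.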

D.10 Platforms unaware of state-depending behavior (Theorem 4.6)

Proof of Theorem 4.6.

Suppose that \mathcal{F} has support on [0,θ¯][0,\overline{\theta}]. Then Theorem 4.1 gives

Rev(σrandom,Πstatic)μch(c)h(0)+θ¯Rev(σnewest,Πstatic).{\textsc{Rev}}(\sigma^{\textsc{random}},\Pi^{\textsc{static}})\geq\frac{\mu^{c}h(c)}{h(0)+\overline{\theta}}{\textsc{Rev}}(\sigma^{\textsc{newest}},\Pi^{\textsc{static}}). (27)

Since \mathcal{F} is non-negative, Theorem 4.2 gives

Rev(σnewest,Πdynamic)12Rev(σrandom,Πdynamic).{\textsc{Rev}}(\sigma^{\textsc{newest}},\Pi^{\textsc{dynamic}})\geq\frac{1}{2}{\textsc{Rev}}(\sigma^{\textsc{random}},\Pi^{\textsc{dynamic}}). (28)

Combining (27) and (28) with the fact that Rev(σrandom,Πdynamic)Rev(σrandom,Πstatic){\textsc{Rev}}(\sigma^{\textsc{random}},\Pi^{\textsc{dynamic}})\geq{\textsc{Rev}}(\sigma^{\textsc{random}},\Pi^{\textsc{static}}),

Rev(σnewest,Πdynamic)Rev(σnewest,Πstatic)μch(c)2(h(0)+θ¯).\frac{{\textsc{Rev}}(\sigma^{\textsc{newest}},\Pi^{\textsc{dynamic}})}{{\textsc{Rev}}(\sigma^{\textsc{newest}},\Pi^{\textsc{static}})}\geq\frac{\mu^{c}h(c)}{2(h(0)+\overline{\theta})}.

The right-hand side grows without bound as $h(0)+\overline{\theta}\to 0$ with $\mu$, $c$, and $h(c)$ held fixed. Hence, for any $M>0$ there exists $\epsilon(M)>0$ such that, when $\overline{\theta},h(0)<\epsilon(M)$, $\frac{{\textsc{Rev}}(\sigma^{\textsc{newest}},\Pi^{\textsc{dynamic}})}{{\textsc{Rev}}(\sigma^{\textsc{newest}},\Pi^{\textsc{static}})}>M$. ∎

Appendix E Supplementary material for Section 5

E.1 Limiting behavior of σrandom(w)\sigma^{\textsc{random}(w)} for discounting customers (Section 5.1)

Theorem E.1.

For any discount factor γ<1\gamma<1, limwRevγ(σrandom(w),p)=Revγ(σrandom,p)\lim_{w\to\infty}\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}(w)},p)={\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}},p).

To prove the theorem, for the $i$-th most recent review we denote by $y_{i}=(z_{i},s_{i})$ its information, where $z_{i}\in\{0,1\}$ is the review rating and $s_{i}\in\mathbb{N}$ is the number of rounds elapsed since the review was posted. For a vector $\bm{y}=(y_{1},\ldots,y_{w})$ of the most recent $w$ reviews and their information, let $\bm{R}(\bm{y})=(y_{i_{1}},\ldots,y_{i_{c}})$ be the random variable of the information from the $c$ reviews chosen by $\sigma^{\textsc{random}(w)}$, i.e., uniformly at random without replacement from $\{y_{1},\ldots,y_{w}\}$. Given $c$ reviews with their information $\bm{R}(\bm{y})=(y_{i_{1}},\ldots,y_{i_{c}})$, the purchase probability of the customer is

fγ(𝑹(𝒚))=Θ[Θ+h(Beta(a+j=1cγsij1zij,b+j=1cγsij1(1zij)))p].f_{\gamma}\big{(}\bm{R}(\bm{y})\big{)}={\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h\big{(}\mathrm{Beta}(a+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}z_{i_{j}},b+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}(1-z_{i_{j}}))\big{)}\geq p\Big{]}.

For any round $t$, let $\bm{Y}_{t}=\big((Z_{t,1},S_{t,1}),\ldots,(Z_{t,w},S_{t,w})\big)\in(\{0,1\}\times\mathbb{N})^{w}$ denote the most recent $w$ reviews, where $Z_{t,i}\in\{0,1\}$ is the rating of the $i$-th most recent review and $S_{t,i}\in\mathbb{N}$ is the number of rounds elapsed since it was posted. Notice that if $\bm{Y}_{t}$ is at state $\bm{y}=(y_{1},\ldots,y_{w})\in(\{0,1\}\times\mathbb{N})^{w}$, the ex-ante purchase probability is equal to ${\mathbb{E}}\big[f_{\gamma}\big(\bm{R}(\bm{y})\big)\big]$, where the expectation is over the $c$-sized subset $(i_{1},\ldots,i_{c})\subseteq\{1,\ldots,w\}$ chosen uniformly at random by $\sigma^{\textsc{random}(w)}$, which determines the random variable $\bm{R}(\bm{y})=(y_{i_{1}},\ldots,y_{i_{c}})$. The next lemma shows that, for any state of the $w$ most recent reviews $\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}$, the ex-ante purchase probability at $\bm{y}$ concentrates around the purchase probability of $\sigma^{\textsc{random}}$, which we denote by $q^{\textsc{random}}\coloneqq{\mathbb{P}}_{\Theta\sim\mathcal{F}}\big[\Theta+h(\mathrm{Beta}(a,b))\geq p\big]$.

Lemma E.1.

For any discount factor γ<1\gamma<1 and any ϵ>0\epsilon>0, there exists some threshold k(ϵ)k(\epsilon) such that for any window size w>k(ϵ)w>k(\epsilon) and state of reviews 𝐲{0,1}w×w\bm{y}\in\{0,1\}^{w}\times\mathbb{N}^{w}, it holds that

(qrandomϵ)(wk(ϵ)c)(wc)𝔼[fγ(𝑹(𝒚))](qrandom+ϵ)(wk(ϵ)c)(wc)+(1(wk(ϵ)c)(wc)).\big{(}q^{\textsc{random}}-\epsilon\big{)}\cdot\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}\leq{\mathbb{E}}\big{[}f_{\gamma}\big{(}\bm{R}(\bm{y})\big{)}\big{]}\leq\big{(}q^{\textsc{random}}+\epsilon\big{)}\cdot\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}+\Big{(}1-\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}\Big{)}.
Proof of Theorem E.1.

By the law of total expectation, the revenue of σrandom(w)\sigma^{\textsc{random}(w)} is given by (1):

Revγ(σrandom(w),p)\displaystyle\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}(w)},p) =plim infT𝔼[t=1T𝔼[fγ(𝑹(𝒀t))]T].\displaystyle=p\liminf_{T\to\infty}{\mathbb{E}}\Bigg{[}\frac{\sum_{t=1}^{T}{\mathbb{E}}\big{[}f_{\gamma}\big{(}\bm{R}(\bm{Y}_{t})\big{)}\big{]}}{T}\Bigg{]}.

By Lemma E.1 for 𝒚=𝒀t\bm{y}=\bm{Y}_{t}, summing for t=1,,Tt=1,\ldots,T, and taking the outer expectation yields:

(qrandomϵ)(wk(ϵ)c)(wc)𝔼[t=1T𝔼[fγ(𝑹(𝒀t))]T](qrandom+ϵ)(wk(ϵ)c)(wc)+(1(wk(ϵ)c)(wc)).\big{(}q^{\textsc{random}}-\epsilon\big{)}\cdot\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}\leq{\mathbb{E}}\Bigg{[}\frac{\sum_{t=1}^{T}{\mathbb{E}}\big{[}f_{\gamma}\big{(}\bm{R}(\bm{Y}_{t})\big{)}\big{]}}{T}\Bigg{]}\leq\big{(}q^{\textsc{random}}+\epsilon\big{)}\cdot\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}+\Big{(}1-\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}\Big{)}.

Taking the limit infimum and applying the above bounds, we get:

p(qrandomϵ)(wk(ϵ)c)(wc)Revγ(σrandom(w),p)p((qrandom+ϵ)(wk(ϵ)c)(wc)+(1(wk(ϵ)c)(wc))).p\cdot\big{(}q^{\textsc{random}}-\epsilon\big{)}\cdot\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}\leq\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}(w)},p)\leq p\cdot\Bigg{(}\big{(}q^{\textsc{random}}+\epsilon\big{)}\cdot\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}+\Big{(}1-\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}\Big{)}\Bigg{)}.

Given that $\lim_{w\to\infty}\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}=1$, we obtain $p\cdot\big(q^{\textsc{random}}-\epsilon\big)\leq\lim_{w\to+\infty}\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}(w)},p)\leq p\cdot\big(q^{\textsc{random}}+\epsilon\big)$. Taking $\epsilon\to 0$ establishes that $\lim_{w\to+\infty}\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}(w)},p)=p\cdot q^{\textsc{random}}$, which is equal to $\textsc{Rev}_{\gamma}(\sigma^{\textsc{random}},p)$ as the reviews shown by $\sigma^{\textsc{random}}$ do not affect the customer’s posterior belief. ∎

Proof of Lemma E.1.

As $\gamma<1$, for any review information vector $\bm{r}=(y_{i_{1}},\ldots,y_{i_{c}})$ the contribution $\gamma^{s_{i_{j}}-1}z_{i_{j}}$ of the $j$-th review to the customer’s belief goes to $0$ as $s_{i_{j}}\to+\infty$. Therefore the customer’s posterior parameters $(a+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}z_{i_{j}},b+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}(1-z_{i_{j}}))$ converge to the prior $(a,b)$ as $s_{i_{1}},\ldots,s_{i_{c}}\to\infty$. The continuity of the customer-specific value distribution $\mathcal{F}$ implies that the purchase probability ${\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+x\geq p]$ is continuous in $x$. Together with the continuity of $h(\mathrm{Beta}(x,y))$ in $(x,y)$, this implies that for any review ratings $(z_{i_{1}},\ldots,z_{i_{c}})\in\{0,1\}^{c}$, the purchase probability for any set of $c$ reviews shown converges to the purchase probability under $\sigma^{\textsc{random}}$, i.e.,

fγ(𝒓)=Θ[Θ+h(Beta(a+j=1cγsij1zij,b+j=1cγsij1(1zij)))p]qrandom\displaystyle f_{\gamma}\big{(}\bm{r}\big{)}={\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h\big{(}\mathrm{Beta}(a+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}z_{i_{j}},b+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}(1-z_{i_{j}}))\big{)}\geq p\Big{]}\to q^{\textsc{random}}

as si1,,sics_{i_{1}},\ldots,s_{i_{c}}\to\infty. Thus, for any ϵ>0\epsilon>0, there exists a threshold k(ϵ)k(\epsilon), such that for any cc reviews with information 𝒓=((zi1,si1),(zic,sic))\bm{r}=\big{(}(z_{i_{1}},s_{i_{1}})\ldots,(z_{i_{c}},s_{i_{c}})\big{)} if si1,,sic>k(ϵ)s_{i_{1}},\ldots,s_{i_{c}}>k(\epsilon) it holds that |fγ(𝒓)qrandom|ϵ\Big{|}f_{\gamma}\big{(}\bm{r}\big{)}-q^{\textsc{random}}\Big{|}\leq\epsilon.

Let $\bm{y}=\big((z_{1},s_{1}),\ldots,(z_{w},s_{w})\big)\in(\{0,1\}\times\mathbb{N})^{w}$ where $1\leq s_{1}<\ldots<s_{w}$. We define the event that all of the indices of the reviews $\bm{R}(\bm{y})=(y_{i_{1}},\ldots,y_{i_{c}})$ chosen by $\sigma^{\textsc{random}(w)}$ are greater than the threshold $k(\epsilon)$, i.e., $\mathcal{E}^{\textsc{old}}(\epsilon)=\{\forall j\in\{1,\ldots,c\}:i_{j}>k(\epsilon)\}$. When this event holds, the time elapsed since the posting of each selected review is greater than $k(\epsilon)$, as $s_{i_{j}}\geq i_{j}>k(\epsilon)$, where the first inequality follows from $1\leq s_{1}<\ldots<s_{w}$. As a result, for the chosen $\epsilon$ and for realizations $\bm{R}(\bm{y})$ such that $\mathcal{E}^{\textsc{old}}(\epsilon)$ holds we have:

\[q^{\textsc{random}}-\epsilon\leq f_{\gamma}(\bm{R}(\bm{y}))\leq q^{\textsc{random}}+\epsilon\Rightarrow q^{\textsc{random}}-\epsilon\leq{\mathbb{E}}\Big[f_{\gamma}(\bm{R}(\bm{y}))|\mathcal{E}^{\textsc{old}}(\epsilon)\Big]\leq q^{\textsc{random}}+\epsilon. \tag{29}\]

Using the law of total expectation,

\[{\mathbb{E}}\Big[f_{\gamma}(\bm{R}(\bm{y}))\Big]=\underbrace{{\mathbb{E}}\Big[f_{\gamma}(\bm{R}(\bm{y}))|\mathcal{E}^{\textsc{old}}(\epsilon)\Big]}_{\in[q^{\textsc{random}}-\epsilon,q^{\textsc{random}}+\epsilon]\text{ by (29)}}{\mathbb{P}}\Big[\mathcal{E}^{\textsc{old}}(\epsilon)\Big]+\underbrace{{\mathbb{E}}\Big[f_{\gamma}(\bm{R}(\bm{y}))|\neg\mathcal{E}^{\textsc{old}}(\epsilon)\Big]}_{\in[0,1]\text{ as }f_{\gamma}(\bm{R}(\bm{y}))\in[0,1]}\Big(1-{\mathbb{P}}\Big[\mathcal{E}^{\textsc{old}}(\epsilon)\Big]\Big),\]

where ${\mathbb{P}}\Big[\mathcal{E}^{\textsc{old}}(\epsilon)\Big]=\frac{\binom{w-k(\epsilon)}{c}}{\binom{w}{c}}$, as there are $\binom{w}{c}$ ways to choose $c$ distinct indices from $\{1,\ldots,w\}$, of which exactly $\binom{w-k(\epsilon)}{c}$ satisfy $i_{j}>k(\epsilon)$ for all $j\in\{1,\ldots,c\}$. The proof follows by taking the lower (resp. upper) bounds on ${\mathbb{E}}\Big[f_{\gamma}(\bm{R}(\bm{y}))|\mathcal{E}^{\textsc{old}}(\epsilon)\Big]$ and ${\mathbb{E}}\Big[f_{\gamma}(\bm{R}(\bm{y}))|\neg\mathcal{E}^{\textsc{old}}(\epsilon)\Big]$. ∎
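The role of the ratio $\binom{w-k(\epsilon)}{c}/\binom{w}{c}$ in the argument is that it tends to one as the window $w$ grows with $k(\epsilon)$ and $c$ fixed. A minimal sketch, with hypothetical values $k=20$ and $c=5$ and a helper name `prob_all_old` chosen here for illustration:

```python
from math import comb


def prob_all_old(w, k, c):
    """Probability that c indices drawn without replacement from {1, ..., w} all exceed k."""
    return comb(w - k, c) / comb(w, c)


K, C = 20, 5  # hypothetical threshold and number of displayed reviews
for w in (50, 500, 5_000, 50_000):
    print(w, prob_all_old(w, K, C))
```

The printed values increase toward one, which is what allows the upper and lower bounds of Lemma E.1 to collapse to $q^{\textsc{random}}\pm\epsilon$ in the limit.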

E.2 Instances inducing review-benefiting prices (Discussion on Definition 5.1)

In this section we consider a class of instances 𝒞\mathcal{C} where the platform shows a single review (c=1)(c=1), the customer-specific distribution is uniform, i.e., =𝒰[θ¯,θ¯]\mathcal{F}=\mathcal{U}[\underline{\theta},\overline{\theta}], and the estimator mapping h(Beta(x,y))h\big{(}\mathrm{Beta}(x,y)\big{)} is increasing in the mean xx+y\frac{x}{x+y} of the Beta(x,y)\mathrm{Beta}(x,y) distribution. We say that the instance (μ,,a,b,c,h)\mathcal{E}(\mu,\mathcal{F},a,b,c,h) is review-monotonic if for any w>w~w>\tilde{w} with w,w~[0,1]w,\tilde{w}\in[0,1]

μh(Beta(a+w,b))+(1μ)h(Beta(a,b+w))>μh(Beta(a+w~,b))+(1μ)h(Beta(a,b+w~)).\mu h\big{(}\mathrm{Beta}(a+w,b)\big{)}+(1-\mu)h\big{(}\mathrm{Beta}(a,b+w)\big{)}>\mu h\big{(}\mathrm{Beta}(a+\tilde{w},b)\big{)}+(1-\mu)h\big{(}\mathrm{Beta}(a,b+\tilde{w})\big{)}.
Lemma E.2.

For any review-monotonic problem instance (μ,,a,b,c,h)\mathcal{E}(\mu,\mathcal{F},a,b,c,h) in the class 𝒞\mathcal{C}, and any price p[θ¯+h(Beta(a+1,b)),θ¯+h(Beta(a,b+1))]p\in\Big{[}\underline{\theta}+h\big{(}\mathrm{Beta}(a+1,b)\big{)},\overline{\theta}+h\big{(}\mathrm{Beta}(a,b+1)\big{)}\Big{]}, pp is review-benefiting for (μ,,a,b,c,h)\mathcal{E}(\mu,\mathcal{F},a,b,c,h).

Proof of Lemma E.2.

To show that p[θ¯+h(Beta(a+1,b)),θ¯+h(Beta(a,b+1))]p\in\Big{[}\underline{\theta}+h\big{(}\mathrm{Beta}(a+1,b)\big{)},\overline{\theta}+h\big{(}\mathrm{Beta}(a,b+1)\big{)}\Big{]} is review-benefiting, expanding Definition 5.1 (for c=1c=1), it suffices that for any w>w~w>\tilde{w} with w,w~[0,1]w,\tilde{w}\in[0,1],

μΘ[Θ+h(Beta(a+w,b))p]+(1μ)Θ[Θ+h(Beta(a,b+w))p]\displaystyle\mu{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a+w,b))\geq p]+(1-\mu){\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a,b+w))\geq p]
>μΘ[Θ+h(Beta(a+w~,b))p]+(1μ)Θ[Θ+h(Beta(a,b+w~))p].\displaystyle>\mu{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a+\tilde{w},b))\geq p]+(1-\mu){\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a,b+\tilde{w}))\geq p]. (30)

Let H={h(Beta(a+w,b)),h(Beta(a+w~,b)),h(Beta(a,b+w~)),h(Beta(a,b+w))}H=\{h\big{(}\mathrm{Beta}(a+w,b)\big{)},h\big{(}\mathrm{Beta}(a+\tilde{w},b)\big{)},h\big{(}\mathrm{Beta}(a,b+\tilde{w})\big{)},h\big{(}\mathrm{Beta}(a,b+w)\big{)}\}. Since the instance is in 𝒞\mathcal{C}, h(Beta(x,y))h(\mathrm{Beta}(x,y)) is increasing in the mean xx+y\frac{x}{x+y} and w,w~1w,\tilde{w}\leq 1, it holds that h(Beta(a,b+1))h^h(Beta(a+1,b))h\big{(}\mathrm{Beta}(a,b+1)\big{)}\leq\hat{h}\leq h\big{(}\mathrm{Beta}(a+1,b)\big{)} and thus θ¯+h^pθ¯+h^\underline{\theta}+\hat{h}\leq p\leq\overline{\theta}+\hat{h} for any estimate h^H\hat{h}\in H. As the support of Θ+h^\Theta+\hat{h} is [θ¯+h^,θ¯+h^][\underline{\theta}+\hat{h},\overline{\theta}+\hat{h}], the purchase probability of selling to a customer with valuation Θ+h^\Theta+\hat{h} is given by Θ[Θ+h^p]=θ¯p+h^θ¯θ¯{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+\hat{h}\geq p]=\frac{\overline{\theta}-p+\hat{h}}{\overline{\theta}-\underline{\theta}}. Substituting this expression for h^H\hat{h}\in H in (30) and cancelling the common terms on both sides of the equation, we obtain the following inequality,

μh(Beta(a+w,b))+(1μ)h(Beta(a,b+w))>μh(Beta(a+w~,b))+(1μ)h(Beta(a,b+w~)).\mu h\big{(}\mathrm{Beta}(a+w,b)\big{)}+(1-\mu)h\big{(}\mathrm{Beta}(a,b+w)\big{)}>\mu h\big{(}\mathrm{Beta}(a+\tilde{w},b)\big{)}+(1-\mu)h\big{(}\mathrm{Beta}(a,b+\tilde{w})\big{)}.

which holds as the problem instance is review-monotonic. ∎

Let $\mathcal{C}^{\textsc{PessimisticEstimate}}$ be the subclass of instances in $\mathcal{C}$ where customers (a) are pessimistic in estimating the fixed valuation, i.e., $h(\mathrm{Beta}(x,y))$ is the $\phi$-quantile of $\mathrm{Beta}(x,y)$ for $\phi\in(0,0.5)$, and (b) have a correct prior mean, i.e., $\mu=0.5$ and $a=b=1$.

Proposition E.1.

For any instance (μ,,a,b,c,h)\mathcal{E}(\mu,\mathcal{F},a,b,c,h) in the class 𝒞PessimisticEstimate\mathcal{C}^{\textsc{PessimisticEstimate}} and any price p[θ¯+h(Beta(a+1,b)),θ¯+h(Beta(a,b+1))]p\in\Big{[}\underline{\theta}+h\big{(}\mathrm{Beta}(a+1,b)\big{)},\overline{\theta}+h\big{(}\mathrm{Beta}(a,b+1)\big{)}\Big{]}, pp is review-benefiting for (μ,,a,b,c,h)\mathcal{E}(\mu,\mathcal{F},a,b,c,h).

Recalling that for any instance in 𝒞PessimisticEstimate\mathcal{C}^{\textsc{PessimisticEstimate}}, h(Beta(x,y))h(\mathrm{Beta}(x,y)) is the ϕ\phi-quantile of Beta(x,y)\mathrm{Beta}(x,y), the following lemma shows a closed form expression for h(Beta(1+w,1))h\big{(}\mathrm{Beta}(1+w,1)\big{)} and h(Beta(1,1+w))h\big{(}\mathrm{Beta}(1,1+w)\big{)}.

Lemma E.3.

For any quantile ϕ(0,0.5)\phi\in(0,0.5) and weight w[0,1]w\in[0,1], the ϕ\phi-quantiles of Beta(1+w,1)\mathrm{Beta}(1+w,1) and Beta(1,1+w)\mathrm{Beta}(1,1+w) are given by h(Beta(1+w,1))=ϕ1w+1h\big{(}\mathrm{Beta}(1+w,1)\big{)}=\phi^{\frac{1}{w+1}} and h(Beta(1,1+w))=1(1ϕ)1w+1h\big{(}\mathrm{Beta}(1,1+w)\big{)}=1-(1-\phi)^{\frac{1}{w+1}}.

Proof.

For any w[0,1]w\in[0,1], the density of Beta(1+w,1)\mathrm{Beta}(1+w,1) is fBeta(1+w,1)(x)=(w+1)xwf_{\mathrm{Beta}(1+w,1)}(x)=(w+1)x^{w}, and its ϕ\phi-quantile is thus given by solving the equation

0h(Beta(1+w,1))(w+1)xw𝑑x=ϕh(Beta(1+w,1))w+1=ϕh(Beta(1+w,1))=ϕ1w+1.\int_{0}^{h\big{(}\mathrm{Beta}(1+w,1)\big{)}}(w+1)x^{w}dx=\phi\iff h\big{(}\mathrm{Beta}(1+w,1)\big{)}^{w+1}=\phi\iff h\big{(}\mathrm{Beta}(1+w,1)\big{)}=\phi^{\frac{1}{w+1}}.

Similarly, the density of $\mathrm{Beta}(1,1+w)$ is $f_{\mathrm{Beta}(1,1+w)}(x)=(w+1)(1-x)^{w}$ and its $\phi$-quantile is thus given by solving the equation

\[\int_{0}^{h(\mathrm{Beta}(1,1+w))}(w+1)(1-x)^{w}\,dx=\phi\iff 1-\big(1-h(\mathrm{Beta}(1,1+w))\big)^{w+1}=\phi\iff h\big(\mathrm{Beta}(1,1+w)\big)=1-(1-\phi)^{\frac{1}{w+1}}.\] ∎
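The closed forms of Lemma E.3 can be sanity-checked numerically. The sketch below is illustrative only: the function names, the midpoint-rule integrator, and the test grid are assumptions, not part of the paper. It integrates the two densities $(w+1)x^{w}$ and $(w+1)(1-x)^{w}$ up to the claimed quantiles and compares the result with $\phi$.

```python
def integrate_from_zero(density, upper, steps=20_000):
    """Midpoint-rule approximation of the integral of `density` over [0, upper]."""
    width = upper / steps
    return sum(density((i + 0.5) * width) for i in range(steps)) * width


def quantile_positive(phi, w):
    """Closed-form phi-quantile of Beta(1 + w, 1) from Lemma E.3."""
    return phi ** (1.0 / (w + 1.0))


def quantile_negative(phi, w):
    """Closed-form phi-quantile of Beta(1, 1 + w) from Lemma E.3."""
    return 1.0 - (1.0 - phi) ** (1.0 / (w + 1.0))


def max_cdf_error(phis, ws):
    """Largest gap between phi and the numerically integrated CDF at the closed-form quantile."""
    worst = 0.0
    for phi in phis:
        for w in ws:
            # Density of Beta(1 + w, 1) is (w + 1) x^w.
            up = integrate_from_zero(lambda x: (w + 1) * x ** w, quantile_positive(phi, w))
            # Density of Beta(1, 1 + w) is (w + 1) (1 - x)^w.
            down = integrate_from_zero(lambda x: (w + 1) * (1 - x) ** w, quantile_negative(phi, w))
            worst = max(worst, abs(up - phi), abs(down - phi))
    return worst
```

On a small grid of quantiles $\phi$ and weights $w$, the returned error should be at the level of the integration error only.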

Proof of Proposition E.1.

To prove the proposition, by Lemma E.2 it suffices to show that any instance in the class 𝒞PessimisticEstimate\mathcal{C}^{\textsc{PessimisticEstimate}} is review-monotonic. The rest of the proof focuses on that.

Using the closed form for the $\phi$-quantiles of $\mathrm{Beta}(1+w,1)$ and $\mathrm{Beta}(1,1+w)$ given by Lemma E.3 and the customer's correct prior mean (i.e., $\mu=0.5$ and $a=b=1$), the instance is review-monotonic if and only if for any $1\geq w>\tilde{w}\geq 0$:

0.5h(Beta(1+w,1))+0.5h(Beta(1,1+w))\displaystyle 0.5\cdot h\big{(}\mathrm{Beta}(1+w,1)\big{)}+0.5\cdot h\big{(}\mathrm{Beta}(1,1+w)\big{)} >0.5h(Beta(1+w~,1))+0.5h(Beta(1,1+w~))\displaystyle>0.5\cdot h\big{(}\mathrm{Beta}(1+\tilde{w},1)\big{)}+0.5\cdot h\big{(}\mathrm{Beta}(1,1+\tilde{w})\big{)}
ϕ1w+1+1(1ϕ)1w+1\displaystyle\iff\phi^{\frac{1}{w+1}}+1-(1-\phi)^{\frac{1}{w+1}} >ϕ1w~+1+1(1ϕ)1w~+1\displaystyle>\phi^{\frac{1}{\tilde{w}+1}}+1-(1-\phi)^{\frac{1}{\tilde{w}+1}}
ϕ1w+1(1ϕ)1w+1\displaystyle\iff\phi^{\frac{1}{w+1}}-(1-\phi)^{\frac{1}{w+1}} >ϕ1w~+1(1ϕ)1w~+1.\displaystyle>\phi^{\frac{1}{\tilde{w}+1}}-(1-\phi)^{\frac{1}{\tilde{w}+1}}.

As 1w+1<1w~+1\frac{1}{w+1}<\frac{1}{\tilde{w}+1}, to show the above it suffices to prove that for any ϕ(0,0.5)\phi\in(0,0.5), the function g(x)ϕx(1ϕ)xg(x)\coloneqq\phi^{x}-(1-\phi)^{x} is strictly decreasing for x[0,1]x\in[0,1], or equivalently its derivative ddxg(x)<0\frac{d}{dx}g(x)<0 for x[0,1]x\in[0,1]. The derivative of gg is given by

ddxg(x)=ϕxlog(ϕ)(1ϕ)xlog(1ϕ)=ϕxlog(ϕ)(1(1ϕϕ)xlog(1ϕ)log(ϕ)).\frac{d}{dx}g(x)=\phi^{x}\log(\phi)-(1-\phi)^{x}\log(1-\phi)=\phi^{x}\log(\phi)\Bigg{(}1-(\frac{1-\phi}{\phi})^{x}\frac{\log(1-\phi)}{\log(\phi)}\Bigg{)}.

Since log(ϕ)<0\log(\phi)<0, to show that ddxg(x)<0\frac{d}{dx}g(x)<0 it suffices to prove that 1>(1ϕϕ)xlog(1ϕ)log(ϕ)1>(\frac{1-\phi}{\phi})^{x}\frac{\log(1-\phi)}{\log(\phi)} for all x[0,1]x\in[0,1]. Given that the right-hand side is increasing in x[0,1]x\in[0,1] for ϕ(0,0.5)\phi\in(0,0.5), it suffices to prove the above holds for x=1x=1. Letting u(ϕ)(1ϕ)log(1ϕ)ϕlog(ϕ)u(\phi)\coloneqq(1-\phi)\log(1-\phi)-\phi\log(\phi) this holds when u(ϕ)>0u(\phi)>0 for ϕ(0,0.5)\phi\in(0,0.5). The derivative ddϕu(ϕ)=log(1ϕ(1ϕ)e2)\frac{d}{d\phi}u(\phi)=\log(\frac{1}{\phi(1-\phi)e^{2}}) is decreasing for ϕ(0,0.5)\phi\in(0,0.5) and thus u(ϕ)u(\phi) is concave for ϕ(0,0.5)\phi\in(0,0.5). The concavity of u(ϕ)u(\phi), combined with limϕ0+u(ϕ)=0\lim_{\phi\to 0^{+}}u(\phi)=0 and limϕ12u(ϕ)=0\lim_{\phi\to\frac{1}{2}}u(\phi)=0 yields that u(ϕ)>0u(\phi)>0 for all ϕ(0,0.5)\phi\in(0,0.5) as desired. ∎
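As a quick numerical companion to this argument, the following sketch (hypothetical helper names, grid sizes chosen arbitrarily) checks on a grid that $g(x)=\phi^{x}-(1-\phi)^{x}$ is strictly decreasing on $[0,1]$, that $u(\phi)>0$, and that the average estimate $0.5\,\phi^{\frac{1}{w+1}}+0.5\,(1-(1-\phi)^{\frac{1}{w+1}})$ is strictly increasing in the weight $w$. It is a sanity check under these assumptions, not a substitute for the proof.

```python
import math


def g(x, phi):
    """g(x) = phi^x - (1 - phi)^x from the proof of Proposition E.1."""
    return phi ** x - (1.0 - phi) ** x


def u(phi):
    """u(phi) = (1 - phi) log(1 - phi) - phi log(phi)."""
    return (1.0 - phi) * math.log(1.0 - phi) - phi * math.log(phi)


def g_is_strictly_decreasing(phi, grid=1000):
    """Check g(x_{i+1}) < g(x_i) on an evenly spaced grid over [0, 1]."""
    xs = [i / grid for i in range(grid + 1)]
    return all(g(xs[i + 1], phi) < g(xs[i], phi) for i in range(grid))


def average_estimate(phi, w):
    """0.5 * h(Beta(1 + w, 1)) + 0.5 * h(Beta(1, 1 + w)) with h the phi-quantile."""
    exponent = 1.0 / (w + 1.0)
    return 0.5 * phi ** exponent + 0.5 * (1.0 - (1.0 - phi) ** exponent)


def is_review_monotonic(phi, grid=100):
    """Check that the average estimate strictly increases with the weight w in [0, 1]."""
    ws = [i / grid for i in range(grid + 1)]
    return all(average_estimate(phi, ws[i + 1]) > average_estimate(phi, ws[i]) for i in range(grid))
```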

Let $\mathcal{C}^{\textsc{NegBiasPrior}}$ be the subclass of instances in $\mathcal{C}$ where customers a) are risk-neutral in estimating the fixed valuation, i.e., $h(\mathrm{Beta}(x,y))=\frac{x}{x+y}$, and b) have a negatively biased prior mean, i.e., $\frac{a}{a+b}<\mu$.

Proposition E.2.

For any instance (μ,,a,b,c,h)\mathcal{E}(\mu,\mathcal{F},a,b,c,h) in the class 𝒞NegBiasPrior\mathcal{C}^{\textsc{NegBiasPrior}} and any price p[θ¯+h(Beta(a+1,b)),θ¯+h(Beta(a,b+1))]p\in\Big{[}\underline{\theta}+h\big{(}\mathrm{Beta}(a+1,b)\big{)},\overline{\theta}+h\big{(}\mathrm{Beta}(a,b+1)\big{)}\Big{]}, pp is review-benefiting for (μ,,a,b,c,h)\mathcal{E}(\mu,\mathcal{F},a,b,c,h).

Proof of Proposition E.2.

By Lemma E.2 it suffices to show that any instance in the class 𝒞NegBiasPrior\mathcal{C}^{\textsc{NegBiasPrior}} is review-monotonic. Given that customers are risk-neutral, h(Beta(x,y))=xx+yh\big{(}\mathrm{Beta}(x,y)\big{)}=\frac{x}{x+y} for (x,y){(a+w,b),(a,b+w),(a+w~,b),(a,b+w~)}(x,y)\in\{(a+w,b),(a,b+w),(a+\tilde{w},b),(a,b+\tilde{w})\}. As a result, to prove that the instance is review-monotonic it suffices to show for all w>w~w>\tilde{w} with w,w~[0,1]w,\tilde{w}\in[0,1]:

μh(Beta(a+w,b))\displaystyle\mu h\big{(}\mathrm{Beta}(a+w,b)\big{)} +(1μ)h(Beta(a,b+w))>μh(Beta(a+w~,b))+(1μ)h(Beta(a,b+w~))\displaystyle+(1-\mu)h\big{(}\mathrm{Beta}(a,b+w)\big{)}>\mu h\big{(}\mathrm{Beta}(a+\tilde{w},b)\big{)}+(1-\mu)h\big{(}\mathrm{Beta}(a,b+\tilde{w})\big{)}
μa+wa+b+w+(1μ)aa+b+w>μa+w~a+b+w~+(1μ)aa+b+w~\displaystyle\iff\mu\frac{a+w}{a+b+w}+(1-\mu)\frac{a}{a+b+w}>\mu\frac{a+\tilde{w}}{a+b+\tilde{w}}+(1-\mu)\frac{a}{a+b+\tilde{w}}
a+μwa+b+w>a+μw~a+b+w~aw~+μw(a+b)>aw+μw~(a+b)\displaystyle\iff\frac{a+\mu w}{a+b+w}>\frac{a+\mu\tilde{w}}{a+b+\tilde{w}}\iff a\tilde{w}+\mu w(a+b)>aw+\mu\tilde{w}(a+b)
μ(a+b)(ww~)>a(ww~)μ>aa+b.\displaystyle\iff\mu(a+b)(w-\tilde{w})>a(w-\tilde{w})\iff\mu>\frac{a}{a+b}.

The last condition ($\mu>\frac{a}{a+b}$) holds as the customer's prior mean is strictly negatively biased, concluding the proof. ∎
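The chain of equivalences above can be checked with exact rational arithmetic. In the sketch below, the function names and the grid of weights are illustrative assumptions. It evaluates $\mu\frac{a+w}{a+b+w}+(1-\mu)\frac{a}{a+b+w}$ for a risk-neutral customer and tests strict monotonicity in $w$. The boundary case $\mu=\frac{a}{a+b}$ makes the expression constant in $w$, which is why the strict inequality matters.

```python
from fractions import Fraction


def expected_estimate(mu, a, b, w):
    """mu * h(Beta(a + w, b)) + (1 - mu) * h(Beta(a, b + w)) with h equal to the mean."""
    return mu * (a + w) / (a + b + w) + (1 - mu) * a / (a + b + w)


def is_review_monotonic(mu, a, b, grid=20):
    """Check the strict inequality for every pair w > w_tilde on a grid over [0, 1]."""
    ws = [Fraction(i, grid) for i in range(grid + 1)]
    return all(
        expected_estimate(mu, a, b, ws[j]) > expected_estimate(mu, a, b, ws[i])
        for i in range(grid + 1)
        for j in range(i + 1, grid + 1)
    )
```

For example, with $a=b=1$ the prior mean is $\frac{1}{2}$, so the check should pass for $\mu=\frac{3}{5}$ and fail for $\mu=\frac{1}{2}$ and $\mu=\frac{2}{5}$.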

E.3 Newest First maximizes revenue with extreme discounting (Theorem 5.1)

Definition E.1.

Let yi=(zi,si)y_{i}=(z_{i},s_{i}) be the information of the ii-th review shown by the platform, where zi{0,1}z_{i}\in\{0,1\} is the review rating and sis_{i}\in\mathbb{N} is the number of rounds elapsed since posting. A price pp is strongly non-absorbing if for any information vector (y1,,yc)({0,1}×)c(y_{1},\ldots,y_{c})\in(\{0,1\}\times\mathbb{N})^{c} and any discount factor γ[0,1]\gamma\in[0,1], the purchase probability lies in (0,1)(0,1), i.e.,

0<Θ[Θ+h(Beta(a+i=1cγsi1zi,b+i=1cγsi1(1zi)))p]<1.0<{\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h\Big{(}\mathrm{Beta}\big{(}a+\sum_{i=1}^{c}\gamma^{s_{i}-1}z_{i},b+\sum_{i=1}^{c}\gamma^{s_{i}-1}(1-z_{i})\big{)}\Big{)}\geq p\Big{]}<1.

When $\gamma=0$, only the review from the last round counts and thus the expression simplifies to $0<\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+\hat{h}\geq p]<1$ where $\hat{h}\in\Big\{h\big(\mathrm{Beta}(a,b)\big),h\big(\mathrm{Beta}(a+1,b)\big),h\big(\mathrm{Beta}(a,b+1)\big)\Big\}$.

For a fixed window $w\geq c$ and any round $t$, let $\bm{Y_{t}}=\big((Z_{t,1},S_{t,1}),\ldots,(Z_{t,w},S_{t,w})\big)\in(\{0,1\}\times\mathbb{N})^{w}$ denote the information of the $w$ most recent reviews, where the $i$-th most recent review has rating $Z_{t,i}\in\{0,1\}$ and was posted $S_{t,i}\in\mathbb{N}$ rounds ago. We define the random process $U_{t}$ to capture the state of reviews. If the newest review came from the previous round ($S_{t,1}=1$), then $U_{t}=1$ if $Z_{t,1}=1$ (positive review) and $U_{t}=0$ if $Z_{t,1}=0$ (negative review). If no review was posted in the previous round ($S_{t,1}\geq 2$), then the customer does not take reviews into account and thus $U_{t}=\perp$. In the latter case, the $c$ reviews shown by $\sigma^{\textsc{random}(w)}$ do not contain a review from the previous round and hence the purchase probability is

q=Θ[Θ+h(Beta(a,b))p].q_{\perp}={\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a,b))\geq p].

When $U_{t}\in\{0,1\}$, the purchase probability can differ from $q_{\perp}$ only if the newest review is among the $c$ reviews that were selected. This happens with probability $\frac{\binom{w-1}{c-1}}{\binom{w}{c}}=\frac{c}{w}$ as there are $\binom{w}{c}$ ways to choose $c$ reviews from the most recent $w$ but only $\binom{w-1}{c-1}$ of them contain the newest review. Letting $h_{0}=h(\mathrm{Beta}(a,b+1))$ and $h_{1}=h(\mathrm{Beta}(a+1,b))$, the purchase probability when $U_{t}=z$ for $z\in\{0,1\}$ is thus

qz=cwΘ[Θ+hzp]+(1cw)q.q_{z}=\frac{c}{w}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h_{z}\geq p]+(1-\frac{c}{w})q_{\perp}.

The next lemma characterizes the stationary distribution of the Markov chain UtU_{t}.

Lemma E.4.

The process $U_{t}$ is a time-homogeneous Markov chain with stationary distribution:

π\displaystyle\pi_{\perp} =μ(1q1)+(1μ)(1q0)q+μ(1q1)+(1μ)(1q0)\displaystyle=\frac{\mu(1-q_{1})+(1-\mu)(1-q_{0})}{q_{\perp}+\mu(1-q_{1})+(1-\mu)(1-q_{0})}
π1\displaystyle\pi_{1} =μqq+μ(1q1)+(1μ)(1q0),\displaystyle=\frac{\mu q_{\perp}}{q_{\perp}+\mu(1-q_{1})+(1-\mu)(1-q_{0})},
π0\displaystyle\pi_{0} =(1μ)qq+μ(1q1)+(1μ)(1q0).\displaystyle=\frac{(1-\mu)q_{\perp}}{q_{\perp}+\mu(1-q_{1})+(1-\mu)(1-q_{0})}.
Proof of Lemma E.4.

We describe the transition dynamics of the process $U_{t}$. If $U_{t}=\perp$, then a purchase occurs with probability $q_{\perp}$. If a purchase occurs, a new review is posted and the state transitions to $U_{t+1}=1$ if the review is positive (with probability $\mu$) and to $U_{t+1}=0$ if the review is negative (with probability $1-\mu$). If no purchase occurs, the state remains the same ($U_{t+1}=\perp$). If $U_{t}=z\in\{0,1\}$, a purchase occurs with probability $q_{z}$. If a purchase occurs, a new review is posted and the state transitions to $U_{t+1}=1$ if the review is positive (with probability $\mu$) and to $U_{t+1}=0$ if the review is negative (with probability $1-\mu$). If no purchase occurs, the newest review is no longer from the previous round and the state transitions to $U_{t+1}=\perp$. As $U_{t+1}$ is obtained from $U_{t}$ through transition dynamics which are independent of the time $t$, $U_{t}$ is a time-homogeneous Markov chain. Given that $p$ is strongly non-absorbing and $\mu\in(0,1)$, every state of $U_{t}$ can be reached from every other with positive probability. Hence, $U_{t}$ is a single-recurrence-class Markov chain with no transient states and therefore it has a unique stationary distribution.

We now describe the steady-state equations for the process $U_{t}$ and solve for the stationary distribution $\pi$. The stationary distribution at $\perp$ must satisfy the equation $\pi_{\perp}=\pi_{\perp}(1-q_{\perp})+\pi_{1}(1-q_{1})+\pi_{0}(1-q_{0})$ as there are three ways to end up in state $\perp$: the process $U_{t}$ was in a state $s\in\{0,1,\perp\}$ and no purchase was made (with probability $1-q_{s}$). The stationary distribution at state $z\in\{0,1\}$ must satisfy the equation $\pi_{z}=\mu^{z}(1-\mu)^{1-z}(q_{\perp}\pi_{\perp}+q_{1}\pi_{1}+q_{0}\pi_{0})$ as there are three ways to end up in $z$: the process $U_{t}$ was in some state $s\in\{0,1,\perp\}$, a purchase was made, and the corresponding review was positive ($z=1$, with probability $\mu q_{s}$) or negative ($z=0$, with probability $(1-\mu)q_{s}$). Thus, the steady-state equations of the Markov chain $U_{t}$ are given by

π\displaystyle\pi_{\perp} =π(1q)+π1(1q1)+π0(1q0),\displaystyle=\pi_{\perp}(1-q_{\perp})+\pi_{1}(1-q_{1})+\pi_{0}(1-q_{0}),
π1\displaystyle\pi_{1} =μ(qπ+q1π1+q0π0),\displaystyle=\mu(q_{\perp}\pi_{\perp}+q_{1}\pi_{1}+q_{0}\pi_{0}),
π0\displaystyle\pi_{0} =(1μ)(qπ+q1π1+q0π0).\displaystyle=(1-\mu)(q_{\perp}\pi_{\perp}+q_{1}\pi_{1}+q_{0}\pi_{0}).

The stationary distribution stated in the lemma follows by solving this system of equations. ∎
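The solution of the steady-state system can be verified mechanically. The sketch below uses hypothetical function names and treats the purchase probabilities $q_{\perp},q_{1},q_{0}$ as free inputs. It builds the three-state transition kernel described in the proof and applies one step of the chain to the closed-form distribution of Lemma E.4.

```python
def stationary_closed_form(mu, q_perp, q1, q0):
    """Stationary distribution of U_t as stated in Lemma E.4."""
    no_buy = mu * (1 - q1) + (1 - mu) * (1 - q0)
    denom = q_perp + no_buy
    return {
        "perp": no_buy / denom,
        "1": mu * q_perp / denom,
        "0": (1 - mu) * q_perp / denom,
    }


def transition_kernel(mu, q_perp, q1, q0):
    """From any state s: no purchase leads to perp, a purchase leads to 1 (prob. mu) or 0."""
    q = {"perp": q_perp, "1": q1, "0": q0}
    return {s: {"perp": 1 - q[s], "1": mu * q[s], "0": (1 - mu) * q[s]} for s in q}


def one_step(pi, kernel):
    """Distribution after one transition, starting from pi."""
    return {t: sum(pi[s] * kernel[s][t] for s in kernel) for t in pi}
```

If the closed form is indeed stationary, `one_step` should return the same distribution up to floating-point error.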

Proof of Theorem 5.1.

As the ex-ante expected revenue in state z{0,1,}z\in\{0,1,\perp\} is equal to pqzp\cdot q_{z}, using the closed form for the stationary distribution in Lemma E.4 and the Ergodic theorem yields:

Rev0(σrandom(w),p)\displaystyle{\textsc{Rev}}_{0}(\sigma^{\textsc{random}(w)},p) =p(πq+π1q1+π0q0)\displaystyle=p\cdot(\pi_{\perp}q_{\perp}+\pi_{1}q_{1}+\pi_{0}q_{0})
=pqμ(1q1)+(1μ)(1q0)+μq1+(1μ)q0q+μ(1q1)+(1μ)(1q0)\displaystyle=p\cdot q_{\perp}\cdot\frac{\mu(1-q_{1})+(1-\mu)(1-q_{0})+\mu q_{1}+(1-\mu)q_{0}}{q_{\perp}+\mu(1-q_{1})+(1-\mu)(1-q_{0})}
=pq1q+μ(1q1)+(1μ)(1q0)=pq11+qμq1(1μ)q0Q(w).\displaystyle=p\cdot q_{\perp}\cdot\frac{1}{q_{\perp}+\mu(1-q_{1})+(1-\mu)(1-q_{0})}=p\cdot q_{\perp}\cdot\frac{1}{1+\underbrace{q_{\perp}-\mu q_{1}-(1-\mu)q_{0}}_{Q(w)}}.

Substituting the expressions for qq_{\perp}, q0q_{0}, q1q_{1} transforms the second term Q(w)Q(w) in the denominator:

Q(w)=cw(Θ[Θ+h(Beta(a,b))p]μΘ[Θ+h1p](1μ)Θ[Θ+h0p])Diff.Q(w)=\frac{c}{w}\underbrace{\big{(}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a,b))\geq p]-\mu{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h_{1}\geq p]-(1-\mu){\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h_{0}\geq p]\big{)}}_{\textsc{Diff}}.

Diff is independent of ww and, as pp is review-benefiting, Diff<0\textsc{Diff}<0. Together with the fact that cw\frac{c}{w} is strictly decreasing in ww this implies that Q(w)Q(w) is strictly increasing in ww and thus Rev0(σrandom(w),p){\textsc{Rev}}_{0}(\sigma^{\textsc{random}(w)},p) is strictly decreasing in ww. ∎
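To illustrate the monotonicity in $w$, the sketch below evaluates the revenue in two ways: through the stationary distribution of Lemma E.4 and through the closed form $p\cdot q_{\perp}/(1+Q(w))$. The three purchase probabilities passed as arguments (`buy_prior`, `buy_pos`, `buy_neg`) are hypothetical inputs standing in for $\mathbb{P}[\Theta+h(\mathrm{Beta}(a,b))\geq p]$, $\mathbb{P}[\Theta+h_{1}\geq p]$ and $\mathbb{P}[\Theta+h_{0}\geq p]$; they are not derived from a specific instance.

```python
def revenue_via_stationary(p, mu, c, w, buy_prior, buy_pos, buy_neg):
    """p * (pi_perp q_perp + pi_1 q_1 + pi_0 q_0) using Lemma E.4."""
    share = c / w  # probability that the newest review is among the c shown
    q_perp = buy_prior
    q1 = share * buy_pos + (1 - share) * q_perp
    q0 = share * buy_neg + (1 - share) * q_perp
    no_buy = mu * (1 - q1) + (1 - mu) * (1 - q0)
    denom = q_perp + no_buy
    pi_perp, pi1, pi0 = no_buy / denom, mu * q_perp / denom, (1 - mu) * q_perp / denom
    return p * (pi_perp * q_perp + pi1 * q1 + pi0 * q0)


def revenue_closed_form(p, mu, c, w, buy_prior, buy_pos, buy_neg):
    """p * q_perp / (1 + Q(w)) with Q(w) = (c / w) * Diff."""
    diff = buy_prior - mu * buy_pos - (1 - mu) * buy_neg
    return p * buy_prior / (1 + (c / w) * diff)
```

With inputs for which Diff is negative (for instance `buy_prior=0.4`, `buy_pos=0.7`, `buy_neg=0.3`, `mu=0.5`), both functions should agree and decrease as the window $w$ grows.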

E.4 Random maximizes revenue with no discounting (Theorem 5.2)

To prove Theorem 5.2, we start with a more general class of review ordering policies. Let 𝒮w\mathcal{S}_{w} denote the family of all cc-sized subsets of {1,,w}\{1,\ldots,w\}. We denote by σrandom(w,ψ)\sigma^{\textsc{random}(w,\psi)} the review ordering policy that selects cc reviews from the most recent ww according to a fixed probability distribution ψΔ(𝒮w)\psi\in\Delta(\mathcal{S}_{w}); when ψ=U(𝒮w)\psi^{\prime}=\pazocal{U}(\mathcal{S}_{w}) is the uniform distribution over 𝒮w\mathcal{S}_{w}, σrandom(w,ψ)=σrandom(w)\sigma^{\textsc{random}(w,\psi^{\prime})}=\sigma^{\textsc{random}(w)}.

The next lemma shows that $\mathcal{U}(\mathcal{S}_{w})$ maximizes the revenue over all distributions $\psi$ over $\mathcal{S}_{w}$, which suggests that, for any fixed window $w$, more randomness in the ordering implies more revenue.

Lemma E.5.

For any non-degenerate and non-absorbing price pp the uniform distribution U(𝒮w)\pazocal{U}(\mathcal{S}_{w}) maximizes the revenue over Δ(𝒮w)\Delta(\mathcal{S}_{w}), i.e., Rev1(σrandom(w),p)=maxψΔ(𝒮w)Rev1(σrandom(w,ψ),p)\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p)=\max_{\psi\in\Delta(\mathcal{S}_{w})}\textsc{Rev}_{1}(\sigma^{\textsc{random}(w,\psi)},p).

We also define the distribution ψwΔ(𝒮w+1)\psi_{w}\in\Delta(\mathcal{S}_{w+1}) which ignores the (w+1)(w+1)-th most recent review and places equal probability on every cc-sized subset of {1,,w}\{1,\ldots,w\}. The next lemma shows that ψw\psi_{w} is strictly sub-optimal.

Lemma E.6.

For any non-degenerate and non-absorbing price pp, the revenue of ψw\psi_{w} is strictly suboptimal, i.e., Rev1(σrandom(w+1,ψw),p)<maxψΔ(𝒮w+1)Rev1(σrandom(w+1,ψ),p)\textsc{Rev}_{1}(\sigma^{\textsc{random}(w+1,\psi_{w})},p)<\max_{\psi\in\Delta(\mathcal{S}_{w+1})}\textsc{Rev}_{1}(\sigma^{\textsc{random}(w+1,\psi)},p).

Proof of Theorem 5.2.

Note that σrandom(w+1,ψw)=σrandom(w)\sigma^{\textsc{random}(w+1,\psi_{w})}=\sigma^{\textsc{random}(w)}. By Lemma E.5 and Lemma E.6, Rev1(σrandom(w),p)\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p) is strictly increasing in ww as for any window wcw\geq c,

Rev1(σrandom(w),p)\displaystyle\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p) =Rev1(σrandom(w+1,ψw),p)\displaystyle=\textsc{Rev}_{1}(\sigma^{\textsc{random}(w+1,\psi_{w})},p)
<maxψΔ(𝒮w+1)Rev1(σrandom(w+1,ψ),p)=Rev1(σrandom(w+1),p).\displaystyle<\max_{\psi\in\Delta(\mathcal{S}_{w+1})}\textsc{Rev}_{1}(\sigma^{\textsc{random}(w+1,\psi)},p)=\textsc{Rev}_{1}(\sigma^{\textsc{random}(w+1)},p).

To prove Lemma E.5, we first characterize the stationary distribution of the reviews generated by $\sigma^{\textsc{random}(w,\psi)}$ in a way analogous to Lemma 3.1. In particular, for any state of the $w$ most recent review ratings $\bm{z}\in\{0,1\}^{w}$, the purchase probability is $q_{\bm{z}}^{\psi}\coloneqq\sum_{S\in\mathcal{S}_{w}}\psi(S)\,\mathbb{P}_{\Theta\sim\mathcal{F}}[\Theta+h(\sum_{i\in S}z_{i})\geq p]$.

A useful quantity is the inverse purchase rate conditioned on kk positive reviews, i.e.,

ιkψ=1(wk)𝒛{0,1}w:N𝒛=k1q𝒛ψ.\iota_{k}^{\psi}=\frac{1}{\binom{w}{k}}\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}\frac{1}{q_{\bm{z}}^{\psi}}.

Intuitively, $\iota_{k}^{\psi}$ is the average number of rounds the process of the most recent $w$ reviews spends at a review state with $k$ positive reviews. The following lemma shows that the uniform distribution minimizes this inverse purchase rate for any $k$ and characterizes the equality conditions. The proof is provided at the end of this section.

Lemma E.7.

For any $k\in\{0,\ldots,w\}$ and any probability distribution $\psi\in\Delta(\mathcal{S}_{w})$, the inverse purchase rate under $\psi$ is at least the inverse purchase rate under $\mathcal{U}(\mathcal{S}_{w})$, i.e., $\iota^{\psi}_{k}\geq\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}$. Equality $\iota^{\psi}_{k}=\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}$ is achieved if and only if $q_{\bm{z}}^{\psi}=q_{\bm{z}^{\prime}}^{\psi}$ for all states $\bm{z},\bm{z}^{\prime}\in\{0,1\}^{w}$ with $N_{\bm{z}}=N_{\bm{z}^{\prime}}=k$.

Denoting the most recent $w$ reviews at round $t$ by $\bm{Z}_{t}=(Z_{t,1},\ldots,Z_{t,w})$, we show that $\bm{Z}_{t}$ is a time-homogeneous Markov chain on $\{0,1\}^{w}$. If $\bm{Z}_{t}$ is at state $\bm{z}\in\{0,1\}^{w}$, a purchase occurs with probability $q_{\bm{z}}^{\psi}$. If a purchase occurs, a new review is left and the state transitions to $\bm{Z}_{t+1}=(1,z_{1},\ldots,z_{w-1})$ if the review is positive (with probability $\mu$) and to $\bm{Z}_{t+1}=(0,z_{1},\ldots,z_{w-1})$ if the review is negative (with probability $1-\mu$). If there is no purchase, the state remains the same ($\bm{Z}_{t+1}=\bm{Z}_{t}$). Given that $p$ is a non-absorbing price, for every state of reviews $\bm{z}\in\{0,1\}^{w}$, the purchase probability $q_{\bm{z}}^{\psi}$ is positive and the probability of any new review is strictly positive (since $\mu\in(0,1)$). Then $\bm{Z}_{t}$ can reach every state from every other state with positive probability (i.e., it is a single-recurrence-class Markov chain with no transient states), and hence $\bm{Z}_{t}$ has a unique stationary distribution denoted by $\pi$. Our next lemma characterizes the form of this stationary distribution. For convenience, let $N_{\bm{z}}=\sum_{i=1}^{w}z_{i}$ be the number of positive review ratings among a vector of review ratings $\bm{z}\in\{0,1\}^{w}$.

Lemma E.8.

For any non-absorbing price pp and any distribution ψ\psi, the stationary distribution of the Markov chain 𝐙t\bm{Z}_{t} under the review ordering policy σrandom(w,ψ)\sigma^{\textsc{random}(w,\psi)} is given by

π𝒛ψ=κψμN𝒛(1μ)wN𝒛q𝒛ψ where κψ=1𝔼KBinom(w,μ)[ιKψ].\pi_{\bm{z}}^{\psi}=\kappa^{\psi}\cdot\frac{\mu^{N_{\bm{z}}}(1-\mu)^{w-N_{\bm{z}}}}{q_{\bm{z}}^{\psi}}\quad\text{ where }\kappa^{\psi}=\frac{1}{{\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\big{[}\iota_{K}^{\psi}\big{]}}.
Proof of Lemma E.8.

Similar to the proof of Lemma 3.1, we invoke Lemma C.3. Recall that Lemma C.3 starts with a Markov chain \mathcal{M} with state space 𝒮\mathcal{S}, transition matrix MM, and stationary distribution {π(s)}s𝒮\{\pi(s)\}_{s\in\mathcal{S}}. For any function ff on 𝒮\mathcal{S}, it then transforms \mathcal{M} into a new Markov chain f\mathcal{M}_{f} which remains at every state ss with probability 1f(s)1-f(s) and transitions according to MM otherwise. The lemma establishes that the stationary distribution of f\mathcal{M}_{f} is given by πf={κπ(s)f(s)}s𝒮\pi_{f}=\{\kappa\cdot\frac{\pi(s)}{f(s)}\}_{s\in\mathcal{S}} where κ=1/s𝒮π(s)f(s)\kappa=1/\sum_{s\in\mathcal{S}}\frac{\pi(s)}{f(s)}.

In the language of Lemma C.3, 𝒁t\bm{Z}_{t} corresponds to f\mathcal{M}_{f}, the state space 𝒮\mathcal{S} to {0,1}w\{0,1\}^{w}, and ff is a function that expresses the purchase probability at a given state, i.e., f(𝒛)=q𝒛ψf(\bm{z})=q_{\bm{z}}^{\psi}. Note that with probability 1f(𝒛)1-f(\bm{z}), 𝒁t\bm{Z}_{t} remains at the same state (as there is no purchase).

To apply Lemma C.3, we need to show that whenever there is a purchase, 𝒁t\bm{Z}_{t} transitions according to a Markov chain with stationary distribution μN𝒛(1μ)wN𝒛\mu^{N_{\bm{z}}}(1-\mu)^{w-N_{\bm{z}}}. Consider the Markov chain \mathcal{M} which always replaces the ww-th last review with a new Bern(μ){\mathrm{Bern}}(\mu) review. This process has stationary distribution equal to the above numerator and 𝒁t\bm{Z}_{t} transitions according to \mathcal{M} upon a purchase, i.e., with probability f(𝒛)f(\bm{z}). Hence, by Lemma C.3, a stationary distribution for 𝒁t\bm{Z}_{t} is

π𝒛ψ=κψμN𝒛(1μ)wN𝒛q𝒛ψ where κψ=1𝔼Z1,,Zwi.i.d.Bern(μ)[1q(Z1,,Zw)ψ].\pi_{\bm{z}}^{\psi}=\kappa^{\psi}\cdot\frac{\mu^{N_{\bm{z}}}(1-\mu)^{w-N_{\bm{z}}}}{q_{\bm{z}}^{\psi}}\quad\text{ where }\kappa^{\psi}=\frac{1}{{\mathbb{E}}_{Z_{1},\ldots,Z_{w}\sim_{i.i.d.}{\mathrm{Bern}}(\mu)}\Big{[}\frac{1}{q_{(Z_{1},\ldots,Z_{w})}^{\psi}}\Big{]}}.

This is the unique stationary distribution as 𝒁t\bm{Z}_{t} is irreducible and aperiodic. Expanding over the number of positive reviews k{0,,w}k\in\{0,\ldots,w\}, the lemma follows as the expectation in the denominator of κψ\kappa^{\psi} can be expressed as

k=0w[𝒛{0,1}w:N𝒛=k1q𝒛ψ]μk(1μ)wk=k=0wιkψ(wk)μk(1μ)wk=𝔼KBinom(w,μ)[ιKψ].\sum_{k=0}^{w}\Bigg{[}\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}\frac{1}{q_{\bm{z}}^{\psi}}\Bigg{]}\mu^{k}(1-\mu)^{w-k}=\sum_{k=0}^{w}\iota^{\psi}_{k}\binom{w}{k}\mu^{k}(1-\mu)^{w-k}={\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\big{[}\iota_{K}^{\psi}\big{]}.

Having established the stationary distribution of 𝒁t\bm{Z}_{t} we give an expression for the revenue of σrandom(w,ψ)\sigma^{\textsc{random}(w,\psi)} (similar to Proposition 3.3).

Lemma E.9.

For any non-absorbing price pp and any distribution ψΔ(𝒮w)\psi\in\Delta(\mathcal{S}_{w}), the revenue of σrandom(w,ψ)\sigma^{\textsc{random}(w,\psi)} is given by

Rev1(σrandom(w,ψ),p)=pκψwhereκψ=1𝔼KBinom(w,μ)[ιKψ].\textsc{Rev}_{1}(\sigma^{\textsc{random}(w,\psi)},p)=p\cdot\kappa^{\psi}\quad\text{where}\quad\kappa^{\psi}=\frac{1}{{\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\big{[}\iota_{K}^{\psi}\big{]}}.
Proof of Lemma E.9.

Using Eq. (1) and the Ergodic theorem, the revenue of σrandom(w,ψ)\sigma^{\textsc{random}(w,\psi)} is

Rev1(σrandom(w,ψ),p)=lim infT𝔼[t=1Tp𝟙Θt+h(Φt)p]T=plim infT𝔼[t=1Tq𝒁tψ]T=p𝒛{0,1}wπ𝒛ψq𝒛ψ.\textsc{Rev}_{1}(\sigma^{\textsc{random}(w,\psi)},p)=\liminf_{T\to\infty}\frac{{\mathbb{E}}[\sum_{t=1}^{T}p\mathbbm{1}_{\Theta_{t}+h(\Phi_{t})\geq p}]}{T}=p\liminf_{T\to\infty}\frac{{\mathbb{E}}[\sum_{t=1}^{T}q_{\bm{Z}_{t}}^{\psi}]}{T}=p\sum_{\bm{z}\in\{0,1\}^{w}}\pi_{\bm{z}}^{\psi}q_{\bm{z}}^{\psi}.

The second equality uses that the ex-ante purchase probability for review state 𝒁t\bm{Z}_{t} is 𝔼[𝟙Θt+h(Φt)p]=q𝒁𝒕ψ{\mathbb{E}}[\mathbbm{1}_{\Theta_{t}+h(\Phi_{t})\geq p}]=q_{\bm{Z_{t}}}^{\psi} and the law of iterated expectation. The third equality expresses the revenue of the stationary distribution via the Ergodic theorem. Expanding π𝒛ψ\pi_{\bm{z}}^{\psi} by Lemma E.8, the q𝒛ψq_{\bm{z}}^{\psi} term cancels out:

Rev1(σrandom(w,ψ),p)\displaystyle\textsc{Rev}_{1}(\sigma^{\textsc{random}(w,\psi)},p) =pκψ[𝒛{0,1}wμN𝒛(1μ)wN𝒛].\displaystyle=p\cdot\kappa^{\psi}\cdot\Big{[}\sum_{\bm{z}\in\{0,1\}^{w}}\mu^{N_{\bm{z}}}(1-\mu)^{w-N_{\bm{z}}}\Big{]}.

The proof is concluded by noting that the term in the square brackets equals 1, since it is the sum over all probabilities of Binom(w,μ){\mathrm{Binom}}(w,\mu). ∎
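For small windows, Lemma E.8 and Lemma E.9 can be checked by brute force. The sketch below is an illustration under assumed inputs: positions are 0-indexed, `psi` is a dictionary mapping $c$-sized index tuples to probabilities, and `q[n]` is a made-up purchase probability given $n$ positive displayed reviews. It builds the candidate stationary distribution, applies one transition of $\bm{Z}_{t}$, and computes the average purchase rate $\sum_{\bm{z}}\pi_{\bm{z}}^{\psi}q_{\bm{z}}^{\psi}$, which should equal $\kappa^{\psi}$.

```python
from itertools import product


def purchase_prob(z, psi, q):
    """q_z^psi: average over displayed subsets S of q(number of positives in S)."""
    return sum(weight * q[sum(z[i] for i in subset)] for subset, weight in psi.items())


def stationary_candidate(w, mu, psi, q):
    """Distribution proportional to mu^N (1 - mu)^(w - N) / q_z^psi, and its constant kappa."""
    unnormalized = {}
    for z in product((0, 1), repeat=w):
        n_pos = sum(z)
        unnormalized[z] = mu ** n_pos * (1 - mu) ** (w - n_pos) / purchase_prob(z, psi, q)
    kappa = 1.0 / sum(unnormalized.values())
    return {z: kappa * value for z, value in unnormalized.items()}, kappa


def one_step(pi, mu, psi, q):
    """One transition of Z_t: stay without a purchase, otherwise prepend the new rating."""
    nxt = {z: 0.0 for z in pi}
    for z, mass in pi.items():
        buy = purchase_prob(z, psi, q)
        nxt[z] += mass * (1 - buy)
        nxt[(1,) + z[:-1]] += mass * buy * mu
        nxt[(0,) + z[:-1]] += mass * buy * (1 - mu)
    return nxt


def purchase_rate(pi, psi, q):
    """Long-run purchase probability; the revenue is p times this quantity."""
    return sum(mass * purchase_prob(z, psi, q) for z, mass in pi.items())
```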

We now prove Lemma E.5 and Lemma E.6.

Proof of Lemma E.5.

For κψ=1𝔼KBinom(w,μ)[ιKψ]\kappa^{\psi}=\frac{1}{{\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\big{[}\iota_{K}^{\psi}\big{]}}, the expectation in 1κψ\frac{1}{\kappa^{\psi}} can be expressed by summing over the number of positive review ratings k{0,1,,w}k\in\{0,1,\ldots,w\}:

1κψ=k=0wιkψ(wk)μk(1μ)wkk=0wιk𝒰(𝒮w)(wk)μk(1μ)wk=1κ𝒰(𝒮w).\frac{1}{\kappa^{\psi}}=\sum_{k=0}^{w}\iota^{\psi}_{k}\binom{w}{k}\mu^{k}(1-\mu)^{w-k}\geq\sum_{k=0}^{w}\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}\binom{w}{k}\mu^{k}(1-\mu)^{w-k}=\frac{1}{\kappa^{\mathcal{U}(\mathcal{S}_{w})}}.

where the inequality follows from Lemma E.7. Thus, κψκ𝒰(𝒮w)\kappa^{\psi}\leq\kappa^{\mathcal{U}(\mathcal{S}_{w})} for any distribution ψ\psi. By Lemma E.9 the proof is concluded as

Rev1(σrandom(w,ψ),p)=pκψpκ𝒰(𝒮w)=Rev1(σrandom(w),p).\textsc{Rev}_{1}(\sigma^{\textsc{random}(w,\psi)},p)=p\cdot\kappa^{\psi}\leq p\cdot\kappa^{\mathcal{U}(\mathcal{S}_{w})}=\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p).
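Lemma E.5 can also be probed numerically for a small window. The sketch below relies on assumed names and made-up purchase probabilities `q[n]`, with positions 0-indexed. It computes $\kappa^{\psi}$ by enumerating $\{0,1\}^{w}$ and compares the uniform distribution with a point mass on the $c$ newest reviews and with randomly drawn distributions $\psi$.

```python
from itertools import combinations, product
import random


def kappa(w, mu, psi, q):
    """kappa^psi = 1 / E[1 / q_Z^psi] with Z made of w i.i.d. Bernoulli(mu) ratings."""
    total = 0.0
    for z in product((0, 1), repeat=w):
        buy = sum(weight * q[sum(z[i] for i in subset)] for subset, weight in psi.items())
        n_pos = sum(z)
        total += mu ** n_pos * (1 - mu) ** (w - n_pos) / buy
    return 1.0 / total


def uniform_psi(w, c):
    """Uniform distribution over all c-sized subsets of the w most recent positions."""
    subsets = list(combinations(range(w), c))
    return {subset: 1.0 / len(subsets) for subset in subsets}


def random_psi(w, c, rng):
    """A randomly drawn distribution over c-sized subsets (for testing only)."""
    subsets = list(combinations(range(w), c))
    weights = [rng.random() for _ in subsets]
    total = sum(weights)
    return {subset: weight / total for subset, weight in zip(subsets, weights)}
```

Since the revenue equals $p\cdot\kappa^{\psi}$, comparing the values of `kappa` is equivalent to comparing revenues at a fixed price.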

Proof of Lemma E.6.

By Lemma E.9 applied with window $w+1$, the revenue of $\psi_{w}$ can be expressed as:

\[\textsc{Rev}_{1}(\sigma^{\textsc{random}(w+1,\psi_{w})},p)=p\cdot\kappa^{\psi_{w}}=\frac{p}{\sum_{k=0}^{w+1}\iota^{\psi_{w}}_{k}\binom{w+1}{k}\mu^{k}(1-\mu)^{w+1-k}}.\]

By Lemma E.7, it is thus sufficient to show that there is some number of positive ratings $k\in\{0,\ldots,c\}$ such that $\iota^{\psi_{w}}_{k}>\iota^{\mathcal{U}(\mathcal{S}_{w+1})}_{k}$. For the sake of contradiction assume this is not the case, i.e., $\iota^{\psi_{w}}_{k}=\iota^{\mathcal{U}(\mathcal{S}_{w+1})}_{k}$ for all $k\in\{0,\ldots,c\}$. The equality condition of Lemma E.7 applied for each $k\in\{0,\ldots,c\}$ implies that

\[q_{\bm{z}}^{\psi_{w}}=q_{\bm{z}^{\prime}}^{\psi_{w}}\text{ for any }\bm{z},\bm{z}^{\prime}\in\{0,1\}^{w+1}\text{ with }N_{\bm{z}}=N_{\bm{z}^{\prime}}\leq c.\tag{31}\]

Letting q(n)Θ[Θ+h(n)p]q(n)\coloneqq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p] denote the purchase probability given nn positive reviews, we show below (by induction on nn) that (31) implies q(n)=q(0)q(n)=q(0) for any number of positive review ratings n{0,,c}n\in\{0,\ldots,c\}, which contradicts the fact that the price pp is non-degenerate.

For the base case of the induction ($n=1$), consider two settings with one positive review: in the first one the positive review is the $(w+1)$-th most recent review, while in the second one the positive review is the $w$-th most recent review. Formally, we apply Eq. 31 for $\bm{z}=(0,\ldots,0,0,1)$ and $\bm{z}^{\prime}=(0,\ldots,0,1,0)$. As $\psi_{w}$ places zero mass on any subset containing the $(w+1)$-th most recent review, the purchase probability at $\bm{z}$ is $q_{\bm{z}}^{\psi_{w}}=q(0)$. The purchase probability at $\bm{z}^{\prime}$ equals $q(1)$ when the $w$-th review is chosen and $q(0)$ otherwise, implying $q_{\bm{z}^{\prime}}^{\psi_{w}}=q(1)\,\mathbb{P}_{S\sim\psi_{w}}[w\in S]+q(0)\,\mathbb{P}_{S\sim\psi_{w}}[w\not\in S]$. Observing that $\mathbb{P}_{S\sim\psi_{w}}[w\in S]>0$ and solving the equality $q_{\bm{z}}^{\psi_{w}}=q_{\bm{z}^{\prime}}^{\psi_{w}}$ implies $q(1)=q(0)$.

For the induction step ($n-1\rightarrow n$), suppose that $q(n^{\prime})=q(0)$ for all $n^{\prime}<n$. We apply Eq. 31 for $\bm{z}=(0,\ldots,0,1,\ldots,1)$ ($w+1-n$ zeros followed by $n$ ones) and $\bm{z}^{\prime}=(0,\ldots,0,1,\ldots,1,0)$ ($w-n$ zeros, followed by $n$ ones, followed by one zero). If the state of the reviews is $\bm{z}$, as $\psi_{w}$ never selects the $(w+1)$-th most recent review, any $c$ reviews selected by $\psi_{w}$ contain at most $n-1$ positive review ratings and the induction hypothesis implies that the purchase probability is equal to $q(0)$ regardless of the choice of the $c$ reviews. Thus, $q_{\bm{z}}^{\psi_{w}}=q(0)$. Suppose the state of the reviews is $\bm{z}^{\prime}$. If the set of $c$ selected reviews contains all the reviews at indices $I_{n}=\{w-n+1,\ldots,w\}$, the number of positive reviews is $n$ and the purchase probability is $q(n)$. Otherwise, the $c$ selected reviews contain at most $n-1$ positive review ratings and by the induction hypothesis the purchase probability is $q(0)$. Thus, $q_{\bm{z}^{\prime}}^{\psi_{w}}=q(n)\,\mathbb{P}_{S\sim\psi_{w}}[I_{n}\subseteq S]+q(0)\,\mathbb{P}_{S\sim\psi_{w}}[I_{n}\not\subseteq S]$. Observing that $\mathbb{P}_{S\sim\psi_{w}}[I_{n}\subseteq S]>0$ and solving the equality $q_{\bm{z}}^{\psi_{w}}=q_{\bm{z}^{\prime}}^{\psi_{w}}$ implies $q(n)=q(0)$, which finishes the induction step and the proof. ∎

We complete this subsection by proving Lemma E.7. An important quantity towards that proof is the inverse purchase rate for kk positive review ratings under the uniform distribution, i.e., ιk𝒰(𝒮w)\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}, which we characterize in the following lemma. To ease notation, let q(n)Θ[Θ+h(n)p]q(n)\coloneqq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p] be the purchase probability given n{0,1,c}n\in\{0,1\ldots,c\} positive reviews.

Lemma E.10.

For any number of positive review ratings k{0,,w}k\in\{0,\ldots,w\}, the inverse purchase rate for kk under the uniform distribution on 𝒮w\mathcal{S}_{w} is given by ιk𝒰(𝒮w)=(wk)n=max(0,c+kw)min(c,k)q(n)(cn)(wckn)\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}=\frac{\binom{w}{k}}{\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\binom{c}{n}\binom{w-c}{k-n}}.

Proof of Lemma E.10.

Recall that $\mathcal{U}(\mathcal{S}_{w})$ places a probability of $\frac{1}{\binom{w}{c}}$ on every subset in $\mathcal{S}_{w}$. Thus, by counting the number of occurrences of each term $q(n)$, the purchase probability at any state $\bm{z}\in\{0,1\}^{w}$ with $N_{\bm{z}}=k$ can be expressed as:

q𝒛𝒰(𝒮w)=S𝒮w1(wc)q(iSzi)=n=max(0,c+kw)min(c,k)q(n)(kn)(wkcn)(wc)=n=max(0,c+kw)min(c,k)q(n)(cn)(wckn)(wk).q_{\bm{z}}^{\mathcal{U}(\mathcal{S}_{w})}=\sum_{S\in\mathcal{S}_{w}}\frac{1}{\binom{w}{c}}q(\sum_{i\in S}z_{i})=\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\frac{\binom{k}{n}\binom{w-k}{c-n}}{\binom{w}{c}}=\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\frac{\binom{c}{n}\binom{w-c}{k-n}}{\binom{w}{k}}. (32)

The second equality uses that for any number of positive reviews n[max(0,c+kw),min(c,k)]n\in[\max(0,c+k-w),\min(c,k)] there are (kn)(wkcn)\binom{k}{n}\binom{w-k}{c-n} ways to choose S𝒮wS\in\mathcal{S}_{w} with iSzi=n\sum_{i\in S}z_{i}=n: (kn)\binom{k}{n} ways to choose the nn indices iSi\in S from the kk indices ii where zi=1z_{i}=1, and (wkcn)\binom{w-k}{c-n} ways to choose the remaining cnc-n indices iSi\in S from the wkw-k indices where zi=0z_{i}=0. The third equality uses the binomial coefficient identity

(kn)(wkcn)(wc)=k!n!(kn)!(wk)!(cn)!(wkc+n)!w!c!(wc)!=c!n!(cn)!(wc)!(kn)!(wkc+n)!w!k!(wk)!=(cn)(wckn)(wk).\frac{\binom{k}{n}\binom{w-k}{c-n}}{\binom{w}{c}}=\frac{\frac{k!}{n!(k-n)!}\cdot\frac{(w-k)!}{(c-n)!(w-k-c+n)!}}{\frac{w!}{c!(w-c)!}}=\frac{\frac{c!}{n!(c-n)!}\cdot\frac{(w-c)!}{(k-n)!(w-k-c+n)!}}{\frac{w!}{k!(w-k)!}}=\frac{\binom{c}{n}\binom{w-c}{k-n}}{\binom{w}{k}}.

Given that the right-hand side of Eq. 32 does not depend on the review ratings 𝒛\bm{z},

ιk𝒰(𝒮w)=1(wk)𝒛{0,1}w:N𝒛=k1q𝒛𝒰(𝒮w)=1n=max(0,c+kw)min(c,k)q(n)(cn)(wckn)(wk)=(wk)n=max(0,c+kw)min(c,k)q(n)(cn)(wckn)\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}=\frac{1}{\binom{w}{k}}\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}\frac{1}{q_{\bm{z}}^{\mathcal{U}(\mathcal{S}_{w})}}=\frac{1}{\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\frac{\binom{c}{n}\binom{w-c}{k-n}}{\binom{w}{k}}}=\frac{\binom{w}{k}}{\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\binom{c}{n}\binom{w-c}{k-n}}

which completes the proof. ∎
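The counting argument behind Eq. 32 can be checked by brute force on a small instance. The sketch below is only an illustration: the window `W = 5`, the number of shown reviews `C = 2`, and the purchase probabilities in `Q` are hypothetical values chosen for the check, not quantities from the paper. It enumerates every state $\bm{z}$ and every $c$-subset $S$, and compares the exact average with the closed form.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def q_bruteforce(z, c, q):
    """Average q(#positives in S) over all c-subsets S of the w positions."""
    w = len(z)
    total = sum(q[sum(z[i] for i in S)] for S in combinations(range(w), c))
    return total / comb(w, c)

def q_closed_form(k, w, c, q):
    """Right-hand side of Eq. 32; it depends on z only through k = N_z."""
    lo, hi = max(0, c + k - w), min(c, k)
    return sum(q[n] * Fraction(comb(c, n) * comb(w - c, k - n), comb(w, k))
               for n in range(lo, hi + 1))

# Hypothetical instance: w = 5 recent reviews, c = 2 shown, illustrative q(n).
W, C = 5, 2
Q = [Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)]

# Every state z with N_z = k yields the same purchase probability.
for z in product([0, 1], repeat=W):
    assert q_bruteforce(z, C, Q) == q_closed_form(sum(z), W, C, Q)

# Inverse purchase rate iota_k of Lemma E.10 for k = 2 positive reviews.
print(1 / q_closed_form(2, W, C, Q))
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is an equality rather than a floating-point approximation.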

Proof of Lemma E.7.

By Jensen’s inequality applied to the convex function x1xx\to\frac{1}{x},

ιkψ=1(wk)𝒛{0,1}w:N𝒛=k1q𝒛ψ(wk)𝒛{0,1}w:N𝒛=kq𝒛ψ.\iota^{\psi}_{k}=\frac{1}{\binom{w}{k}}\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}\frac{1}{q_{\bm{z}}^{\psi}}\geq\frac{\binom{w}{k}}{\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}q_{\bm{z}}^{\psi}}. (33)

To show the lemma, it suffices to prove that for any probability distribution $\psi\in\Delta(\mathcal{S}_{w})$, the lower bound $\frac{\binom{w}{k}}{\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}q_{\bm{z}}^{\psi}}$ on the right-hand side of Eq. 33 is independent of $\psi$ and equal to $\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}$. We start by analyzing the sum in the denominator, $\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}q_{\bm{z}}^{\psi}$. Letting $q(n)\coloneqq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]$ be the purchase probability given $n\in\{0,1,\ldots,c\}$ positive reviews, recalling that $q_{\bm{z}}^{\psi}=\sum_{S\in\mathcal{S}_{w}}\psi(S)q(\sum_{i\in S}z_{i})$, and rearranging, we express the sum of interest as:

𝒛{0,1}w:N𝒛=kq𝒛ψ=𝒛{0,1}w:N𝒛=kS𝒮wψ(S)q(iSzi)=S𝒮wψ(S)𝒛{0,1}w:N𝒛=kq(iSzi).\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}q_{\bm{z}}^{\psi}=\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}\sum_{S\in\mathcal{S}_{w}}\psi(S)q(\sum_{i\in S}z_{i})=\sum_{S\in\mathcal{S}_{w}}\psi(S)\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}q(\sum_{i\in S}z_{i}).

By counting the number of occurrences of each term q(n)q(n), the inner sum can be expressed as:

𝒛{0,1}w:N𝒛=kq(iSzi)=n=max(0,c+kw)min(c,k)q(n)(cn)(wckn)\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}q(\sum_{i\in S}z_{i})=\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\binom{c}{n}\binom{w-c}{k-n} (34)

since for any number of positive reviews n[max(0,c+kw),min(c,k)]n\in[\max(0,c+k-w),\min(c,k)] there are (cn)(wckn)\binom{c}{n}\binom{w-c}{k-n} ways to choose sequences 𝒛{0,1}w\bm{z}\in\{0,1\}^{w} with N𝒛=kN_{\bm{z}}=k such that iSzi=n\sum_{i\in S}z_{i}=n: there are (cn)\binom{c}{n} ways to choose the subsequence (zi)iS{0,1}c(z_{i})_{i\in S}\in\{0,1\}^{c} such that iSzi=n\sum_{i\in S}z_{i}=n and (wckn)\binom{w-c}{k-n} ways to choose the complementary subsequence (zi)iS{0,1}wc(z_{i})_{i\not\in S}\in\{0,1\}^{w-c} such that iSzi=kn\sum_{i\not\in S}z_{i}=k-n.

Observing that the right-hand side of (34) is independent of SS, using the fact that S𝒮wψ(S)=1\sum_{S\in\mathcal{S}_{w}}\psi(S)=1 (as ψ\psi is a probability distribution) as well as inequality Eq. 33 and Lemma E.10, we obtain:

ιkψ(wk)𝒛{0,1}w:N𝒛=kq𝒛ψ=(wk)n=max(0,c+kw)min(c,k)q(n)(cn)(wckn)=ιk𝒰(𝒮w).\iota^{\psi}_{k}\geq\frac{\binom{w}{k}}{\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}q_{\bm{z}}^{\psi}}=\frac{\binom{w}{k}}{\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\binom{c}{n}\binom{w-c}{k-n}}=\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}.

This completes the proof of the inequality statement of the lemma. With respect to the equality statement, for any k{0,,c}k\in\{0,\ldots,c\}, the equality conditions of Jensen’s inequality imply that Eq. 33 holds with equality if and only if q𝒛ψ=q𝒛ψq_{\bm{z}}^{\psi}=q_{\bm{z}^{\prime}}^{\psi} for any 𝒛,𝒛{0,1}w\bm{z},\bm{z}^{\prime}\in\{0,1\}^{w} with N𝒛=N𝒛=kN_{\bm{z}}=N_{\bm{z}^{\prime}}=k. ∎
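As a sanity check of the lemma, the following sketch compares the inverse purchase rate $\iota^{\psi}_{k}$ under two selection distributions on a toy instance: the uniform distribution over all $c$-subsets and a point mass on the $c$ newest positions. The instance (`W = 4`, `C = 2`, the values in `Q`) and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def states_with_k_positives(w, k):
    """Yield every rating vector z in {0,1}^w with exactly k ones."""
    for ones in combinations(range(w), k):
        yield tuple(1 if i in ones else 0 for i in range(w))

def iota(psi, k, w, q):
    """Inverse purchase rate: average of 1/q_z^psi over states with N_z = k."""
    total = Fraction(0)
    for z in states_with_k_positives(w, k):
        q_z = sum(prob * q[sum(z[i] for i in S)] for S, prob in psi.items())
        total += 1 / q_z
    return total / comb(w, k)

# Hypothetical instance for illustration.
W, C = 4, 2
Q = [Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)]

subsets = list(combinations(range(W), C))
uniform = {S: Fraction(1, len(subsets)) for S in subsets}
newest = {tuple(range(C)): Fraction(1)}  # always show the C newest reviews

for k in range(W + 1):
    print(k, iota(newest, k, W, Q), iota(uniform, k, W, Q))
```

On this instance the two quantities coincide for $k=0$ and $k=w$, where all states with $N_{\bm{z}}=k$ are identical, and the uniform distribution is strictly smaller otherwise, in line with the equality condition of Jensen's inequality.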

E.5 When customers discount slightly, a finite $w$ is the best (Theorem 5.3)

The proof of Theorem 5.3 uses the following two lemmas that characterize the behavior of the revenue of σrandom(w)\sigma^{\textsc{random}(w)} as a function of the window ww and the discount factor γ\gamma. The proofs of those lemmas are provided in Appendices E.6 and E.7 respectively.

Lemma E.11.

For any non-degenerate and non-absorbing price p>0p>0 and any 11-time-discounting customers, the revenue of σrandom(w)\sigma^{\textsc{random}(w)} converges to the revenue of σrandom\sigma^{\textsc{random}} as the window size ww goes to ++\infty, i.e., limwRev1(σrandom(w),p)=Rev1(σrandom,p)\lim_{w\to\infty}\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p)=\textsc{Rev}_{1}(\sigma^{\textsc{random}},p).

Lemma E.12.

For any window wcw\geq c and any non-degenerate and strongly non-absorbing price p>0p>0, the revenue Revγ(σrandom(w),p){\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w)},p) is a continuous function of the discount factor γ[0,1]\gamma\in[0,1].

Proof of Theorem 5.3.

For any review-benefiting price p>0p>0, by Proposition 3.2 and Eq. (4):

Rev1(σrandom,p)\displaystyle{\textsc{Rev}}_{1}(\sigma^{\textsc{random}},p) =p𝔼NBinom(c,μ)[Θ[Θ+h(Beta(a+N,b+cN))p]]\displaystyle=p\cdot{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}\big{[}{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a+N,b+c-N))\geq p]\big{]}
>pΘ[Θ+h(Beta(a,b))p]=Revγ(σrandom,p)for any γ<1.\displaystyle>p\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(\mathrm{Beta}(a,b))\geq p]={\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}},p)\quad\text{for any $\gamma<1$.}

By Lemma E.11, limwRev1(σrandom(w),p)=Rev1(σrandom,p)>Revγ(σrandom,p)\lim_{w\to\infty}{\textsc{Rev}}_{1}(\sigma^{\textsc{random}(w)},p)={\textsc{Rev}}_{1}(\sigma^{\textsc{random}},p)>{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}},p). Since Rev1(σrandom(w),p){\textsc{Rev}}_{1}(\sigma^{\textsc{random}(w)},p) is strictly increasing in ww (Theorem 5.2), there exists a window w>cw^{\star}>c with

Rev1(σrandom(w),p)>Revγ(σrandom,p) for any γ<1.{\textsc{Rev}}_{1}(\sigma^{\textsc{random}(w^{\star})},p)>{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}},p)\quad\text{ for any $\gamma<1$.}

Given that σnewest=σrandom(c){\sigma^{\textsc{newest}}}=\sigma^{\textsc{random}(c)}, again invoking Theorem 5.2 the above implies that

Rev1(σrandom(w),p)>max(Rev1(σnewest,p),Revγ(σrandom,p)) for any γ<1.{\textsc{Rev}}_{1}(\sigma^{\textsc{random}(w^{\star})},p)>\max\big{(}{\textsc{Rev}}_{1}({\sigma^{\textsc{newest}}},p),{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}},p)\big{)}\quad\text{ for any $\gamma<1$}.

By Lemma E.12, limγ1Revγ(σrandom(w),p)=Rev1(σrandom(w),p)\lim_{\gamma\to 1}{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w^{\star})},p)={\textsc{Rev}}_{1}(\sigma^{\textsc{random}(w^{\star})},p) and limγ1Revγ(σnewest,p)=Rev1(σnewest,p)\lim_{\gamma\to 1}{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{newest}},p)={\textsc{Rev}}_{1}(\sigma^{\textsc{newest}},p). As a result there exists some ϵ>0\epsilon>0 such that for all discount factors γ(1ϵ,1)\gamma\in(1-\epsilon,1),

Revγ(σrandom(w),p)>max(Revγ(σnewest,p),Revγ(σrandom,p)){\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w^{\star})},p)>\max\big{(}{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{newest}},p),{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}},p)\big{)}

which completes the proof. ∎

E.6 Limiting behavior of σrandom(w)\sigma^{\textsc{random}(w)} under no discounting (Lemma E.11)

We next provide the proof of Lemma E.11. Given that $\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p)$ can be expressed as a function of $w$ i.i.d. ${\mathrm{Bern}}(\mu)$ trials (Lemma E.9), we first show that $K\sim{\mathrm{Binom}}(w,\mu)$, with $w$ trials and success probability $\mu$, is well-concentrated around its mean. Formally, the event $\mathcal{E}^{\textsc{Conc}}(w)=\{K\in[\mu(w-w^{\frac{2}{3}}),\mu(w+w^{\frac{2}{3}})]\}$ occurs with high probability.

Lemma E.13.

For any $w\geq c$, ${\mathbb{P}}[\mathcal{E}^{\textsc{Conc}}(w)]\geq 1-2\exp\Big(-\frac{w^{\frac{1}{3}}\mu}{3}\Big)$ and thus $\lim_{w\to\infty}{\mathbb{P}}[\mathcal{E}^{\textsc{Conc}}(w)]=1$.

Proof.

By the multiplicative Chernoff bound with relative deviation $\delta=w^{-\frac{1}{3}}$ around the mean $\mu w$, ${\mathbb{P}}_{K\sim{\mathrm{Binom}}(w,\mu)}\Big[|K-\mu w|\leq\mu w^{\frac{2}{3}}\Big]\geq 1-2\exp\Big(-\frac{\delta^{2}\mu w}{3}\Big)=1-2\exp\Big(-\frac{w^{\frac{1}{3}}\mu}{3}\Big)$. Taking the limit as $w\rightarrow\infty$ yields the result. ∎
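To see how loose or tight this concentration statement is, one can compare the exact probability of the event $\{|K-\mu w|\leq\mu w^{2/3}\}$ with the standard multiplicative Chernoff bound ${\mathbb{P}}[|K-\mu w|\geq\delta\mu w]\leq 2\exp(-\delta^{2}\mu w/3)$ evaluated at $\delta=w^{-1/3}$. The sketch below does this numerically; the value $\mu=0.5$ and the window sizes are arbitrary choices for illustration.

```python
from math import exp, lgamma, log

def binom_pmf(k, w, mu):
    """Binomial(w, mu) probability mass at k, computed in log-space."""
    log_p = (lgamma(w + 1) - lgamma(k + 1) - lgamma(w - k + 1)
             + k * log(mu) + (w - k) * log(1 - mu))
    return exp(log_p)

def prob_concentration(w, mu):
    """Exact probability that |K - mu*w| <= mu * w^(2/3)."""
    radius = mu * w ** (2 / 3)
    return sum(binom_pmf(k, w, mu) for k in range(w + 1)
               if abs(k - mu * w) <= radius)

def chernoff_lower_bound(w, mu):
    """1 - 2*exp(-delta^2 * mu * w / 3) with delta = w^(-1/3)."""
    return 1 - 2 * exp(-mu * w ** (1 / 3) / 3)

for w in (10, 100, 1000, 5000):
    print(w, prob_concentration(w, 0.5), chernoff_lower_bound(w, 0.5))
```

For small windows the Chernoff bound is vacuous (negative), while the exact probability is already close to one; both tend to one as $w$ grows, which is all the proof requires.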

Recall that $\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}=\frac{1}{\binom{w}{k}}\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}\frac{1}{q_{\bm{z}}^{\mathcal{U}(\mathcal{S}_{w})}}$ is the inverse purchase rate conditioned on $k$ positive review ratings and $q(n)\coloneqq{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta+h(n)\geq p]$ denotes the purchase probability given $n\in\{0,1,\ldots,c\}$ positive reviews. The following lemma connects the inverse purchase rate of $\sigma^{\textsc{random}(w)}$ (assuming that the above concentration holds) with the revenue of $\sigma^{\textsc{random}}$.

Lemma E.14.

There exists a threshold M>0M>0 such that for any window w>Mw>M, it holds that

Rev1(σrandom,p)=plimw(1𝔼KBinom(w,μ)[ιK𝒰(𝒮w)|Conc(w)]).{\textsc{Rev}}_{1}(\sigma^{\textsc{random}},p)=p\cdot\lim_{w\to\infty}\Bigg{(}\frac{1}{{\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\Big{[}\iota_{K}^{\mathcal{U}(\mathcal{S}_{w})}\Big{|}\mathcal{E}^{\textsc{Conc}}(w)\Big{]}}\Bigg{)}.
Proof of Lemma E.14.

Let $M$ be large enough so that for $w>M$, $\mu(w-w^{\frac{2}{3}})\geq c$ and $w\geq c+\mu(w+w^{\frac{2}{3}})$, which implies that $\min(c,k)=c$ and $\max(0,c+k-w)=0$ for $k\in[\mu(w-w^{\frac{2}{3}}),\mu(w+w^{\frac{2}{3}})]$. By Lemma E.10, for $w>M$ and $k\in[\mu(w-w^{\frac{2}{3}}),\mu(w+w^{\frac{2}{3}})]$, it holds that

ιk𝒰(𝒮w)=(wk)n=max(0,c+kw)min(c,k)q(n)(cn)(wckn)=1n=max(0,c+kw)min(c,k)q(n)(cn)(wckn)(wk)=1n=0cq(n)(cn)(wckn)(wk).\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}=\frac{\binom{w}{k}}{\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\binom{c}{n}\binom{w-c}{k-n}}=\frac{1}{\sum_{n=\max(0,c+k-w)}^{\min(c,k)}q(n)\frac{\binom{c}{n}\binom{w-c}{k-n}}{\binom{w}{k}}}=\frac{1}{\sum_{n=0}^{c}q(n)\frac{\binom{c}{n}\binom{w-c}{k-n}}{\binom{w}{k}}}.

One of the terms in the denominator is the binomial coefficient ratio $\frac{\binom{w-c}{k-n}}{\binom{w}{k}}$. Letting $\underline{\mu}_{w}=\mu-w^{-\frac{1}{3}}$ and $\overline{\mu}_{w}=\mu+w^{-\frac{1}{3}}$, we show below that

\[
\Big(\frac{w}{w-c+1}\Big)^{c}(\overline{\mu}_{w})^{n}(1-\underline{\mu}_{w})^{c-n}\geq\frac{\binom{w-c}{k-n}}{\binom{w}{k}}\geq\Big(\underline{\mu}_{w}-\frac{n-1}{w}\Big)^{n}\Big(1-\overline{\mu}_{w}-\frac{c-n-1}{w}\Big)^{c-n} \tag{35}
\]

which implies that ${\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\Big[\iota_{K}^{\mathcal{U}(\mathcal{S}_{w})}\Big|\mathcal{E}^{\textsc{Conc}}(w)\Big]$ is lower and upper bounded respectively by

\[
\frac{\big(\frac{w-c+1}{w}\big)^{c}}{\sum_{n=0}^{c}q(n)\binom{c}{n}(\overline{\mu}_{w})^{n}(1-\underline{\mu}_{w})^{c-n}}\text{ and }\frac{1}{\sum_{n=0}^{c}q(n)\binom{c}{n}\big(\underline{\mu}_{w}-\frac{n-1}{w}\big)^{n}\big(1-\overline{\mu}_{w}-\frac{c-n-1}{w}\big)^{c-n}}.
\]

Given that $\underline{\mu}_{w},\overline{\mu}_{w}\to_{w\to\infty}\mu$ (as $w^{-\frac{1}{3}}\to_{w\to\infty}0$), $\frac{w-c+1}{w}\to_{w\to\infty}1$, and $\frac{n-1}{w},\frac{c-n-1}{w}\to_{w\to\infty}0$, these lower and upper bounds converge to $\frac{1}{{\mathbb{E}}_{N\sim{\mathrm{Binom}}(c,\mu)}[q(N)]}$ as $w\to\infty$, which is equal to $\frac{p}{{\textsc{Rev}}_{1}({\sigma^{\textsc{random}}},p)}$ (by Proposition 3.2) and yields the result. We conclude the proof by showing Eq. (35). We expand the binomial coefficient ratio in the middle of Eq. (35) as:

\[
\frac{\binom{w-c}{k-n}}{\binom{w}{k}}=\frac{\frac{(w-c)!}{(k-n)!(w-c-k+n)!}}{\frac{w!}{k!(w-k)!}}=\frac{\prod_{i=1}^{n}(k-i+1)\prod_{j=1}^{c-n}(w-k-j+1)}{\prod_{l=1}^{c}(w-l+1)}.
\]

Observing that $k\geq k-i+1\geq k-n+1$ for $i\in[1,n]$, $w-k\geq w-k-j+1\geq w-k-c+n+1$ for $j\in[1,c-n]$, and $w\geq w-l+1\geq w-c+1$ for $l\in[1,c]$, we can bound the binomial coefficient ratio as

\[
\underbrace{\Big(\frac{k}{w-c+1}\Big)^{n}\Big(\frac{w-k}{w-c+1}\Big)^{c-n}}_{=(\frac{w}{w-c+1})^{c}(\frac{k}{w})^{n}(1-\frac{k}{w})^{c-n}}\geq\frac{\binom{w-c}{k-n}}{\binom{w}{k}}\geq\underbrace{\Big(\frac{k-n+1}{w}\Big)^{n}\Big(\frac{w-k-c+n+1}{w}\Big)^{c-n}}_{=\big(\frac{k}{w}-\frac{n-1}{w}\big)^{n}\big(1-\frac{k}{w}-\frac{c-n-1}{w}\big)^{c-n}}.
\]

Given that $\frac{k}{w}\in[\underline{\mu}_{w},\overline{\mu}_{w}]$ (as $k\in[\mu(w-w^{\frac{2}{3}}),\mu(w+w^{\frac{2}{3}})]$ and $\mu\leq 1$), the above proves (35). ∎
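The factor-by-factor bounds on the binomial coefficient ratio, obtained from its product expansion, can be verified exactly for a concrete window. The sketch below is a numerical illustration only: `w = 200`, `c = 3`, and $\mu=0.3$ are hypothetical values. It checks the sandwich for every admissible $k$ and $n$ and then shows that, for $k\approx\mu w$, the ratio approaches $\mu^{n}(1-\mu)^{c-n}$.

```python
from fractions import Fraction
from math import comb

def ratio(w, c, k, n):
    """Binomial coefficient ratio C(w-c, k-n) / C(w, k)."""
    return Fraction(comb(w - c, k - n), comb(w, k))

def upper(w, c, k, n):
    """Each numerator factor is at most k or w-k; each denominator factor is at least w-c+1."""
    return Fraction(k, w - c + 1) ** n * Fraction(w - k, w - c + 1) ** (c - n)

def lower(w, c, k, n):
    """Each numerator factor is at least k-n+1 or w-k-c+n+1; each denominator factor is at most w."""
    return Fraction(k - n + 1, w) ** n * Fraction(w - k - c + n + 1, w) ** (c - n)

w, c = 200, 3  # hypothetical window size and number of displayed reviews
for k in range(c, w - c + 1):
    for n in range(c + 1):
        assert lower(w, c, k, n) <= ratio(w, c, k, n) <= upper(w, c, k, n)

# With k = mu * w, the ratio approaches mu^n (1 - mu)^(c - n) as w grows.
mu, n = 0.3, 1
for w_big in (100, 1000, 10000):
    k = round(mu * w_big)
    print(w_big, float(ratio(w_big, c, k, n)), mu ** n * (1 - mu) ** (c - n))
```

Restricting to $c\leq k\leq w-c$ mirrors the role of the threshold $M$ in the proof: in that range all $n\in\{0,\ldots,c\}$ are feasible and every factor in the bounds is positive.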

Proof of Lemma E.11.

Lemma E.9 connects the revenue of $\sigma^{\textsc{random}(w)}$ to ${\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\Big[\iota^{\mathcal{U}(\mathcal{S}_{w})}_{K}\Big]$. By the law of total expectation, the latter term can be expanded to:

\[
\underbrace{{\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\Big[\iota^{\mathcal{U}(\mathcal{S}_{w})}_{K}\Big|\mathcal{E}^{\textsc{Conc}}(w)\Big]}_{\to_{w\to\infty}\frac{p}{{\textsc{Rev}}_{1}({\sigma^{\textsc{random}}},p)}\text{ (Lemma E.14)}}\;\underbrace{{\mathbb{P}}[\mathcal{E}^{\textsc{Conc}}(w)]}_{\to_{w\to\infty}1\text{ (Lemma E.13)}}+{\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\Big[\iota^{\mathcal{U}(\mathcal{S}_{w})}_{K}\Big|\neg\mathcal{E}^{\textsc{Conc}}(w)\Big]\;\underbrace{{\mathbb{P}}[\neg\mathcal{E}^{\textsc{Conc}}(w)]}_{\to_{w\to\infty}0\text{ (Lemma E.13)}}
\]

We now show that ιk𝒰(𝒮w)1q(0)\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}\leq\frac{1}{q(0)}, which combined with Lemma E.13 implies that the second term goes to 0 as ww\to\infty. As h(n)h(n) is increasing, q(n)q(0)q(n)\geq q(0) for all number of positive review ratings nn and thus q𝒛𝒰(𝒮w)=1(wc)S𝒮wq(iSzi)q(0)q_{\bm{z}}^{\mathcal{U}(\mathcal{S}_{w})}=\frac{1}{\binom{w}{c}}\sum_{S\in\mathcal{S}_{w}}q(\sum_{i\in S}z_{i})\geq q(0) for all review rating states 𝒛{0,1}w\bm{z}\in\{0,1\}^{w}. Hence,

ιk𝒰(𝒮w)=1(wk)𝒛{0,1}w:N𝒛=k1q𝒛𝒰(𝒮w)1q(0).\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}=\frac{1}{\binom{w}{k}}\sum_{\bm{z}\in\{0,1\}^{w}:N_{\bm{z}}=k}\frac{1}{q_{\bm{z}}^{\mathcal{U}(\mathcal{S}_{w})}}\leq\frac{1}{q(0)}.

The proof is concluded by invoking Lemma E.9 and bounding the limit as ww\rightarrow\infty for the expansion of 𝔼KBinom(w,μ)[ιK𝒰(𝒮w)]{\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\Big{[}\iota^{\mathcal{U}(\mathcal{S}_{w})}_{K}\Big{]} (using Lemmas E.13 and E.14 as well as the boundedness of ιk𝒰(𝒮w)\iota^{\mathcal{U}(\mathcal{S}_{w})}_{k}):

Rev1(σrandom(w),p)=p𝔼KBinom(w,μ)[ιK𝒰(𝒮w)]wppRev1(σrandom,p)=Rev1(σrandom,p).\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p)=\frac{p}{{\mathbb{E}}_{K\sim{\mathrm{Binom}}(w,\mu)}\Big{[}\iota^{\mathcal{U}(\mathcal{S}_{w})}_{K}\Big{]}}\to_{w\to\infty}\frac{p}{\frac{p}{{\textsc{Rev}}_{1}({\sigma^{\textsc{random}}},p)}}={\textsc{Rev}}_{1}({\sigma^{\textsc{random}}},p).
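The convergence can also be observed numerically from the two closed forms used above: $\textsc{Rev}_{1}(\sigma^{\textsc{random}(w)},p)=p/{\mathbb{E}}_{K}[\iota_{K}^{\mathcal{U}(\mathcal{S}_{w})}]$ with $\iota_{k}^{\mathcal{U}(\mathcal{S}_{w})}$ from Lemma E.10, and $\textsc{Rev}_{1}(\sigma^{\textsc{random}},p)=p\cdot{\mathbb{E}}_{N\sim\mathrm{Binom}(c,\mu)}[q(N)]$. The sketch below evaluates both for a hypothetical instance; the values of `c`, `mu`, `p`, and the purchase probabilities `q` are illustrative assumptions, not taken from the paper.

```python
from math import comb

def iota_uniform(k, w, c, q):
    """Inverse purchase rate of Lemma E.10 for k positives among w reviews."""
    lo, hi = max(0, c + k - w), min(c, k)
    denom = sum(q[n] * comb(c, n) * comb(w - c, k - n) for n in range(lo, hi + 1))
    return comb(w, k) / denom

def rev_random_window(w, c, mu, p, q):
    """p divided by the Binomial(w, mu) expectation of iota_K."""
    expected_iota = sum(comb(w, k) * mu ** k * (1 - mu) ** (w - k)
                        * iota_uniform(k, w, c, q) for k in range(w + 1))
    return p / expected_iota

def rev_random(c, mu, p, q):
    """p times the Binomial(c, mu) expectation of q(N)."""
    return p * sum(comb(c, n) * mu ** n * (1 - mu) ** (c - n) * q[n]
                   for n in range(c + 1))

# Hypothetical instance: c = 2 reviews shown, mu = 0.6, price p = 1.
c, mu, p, q = 2, 0.6, 1.0, [0.2, 0.5, 0.8]
for w in (2, 5, 20, 100, 400):
    print(w, rev_random_window(w, c, mu, p, q))
print("limit", rev_random(c, mu, p, q))
```

Here $w=c$ corresponds to Newest First. On this instance the revenue increases with the window and approaches the revenue of $\sigma^{\textsc{random}}$ from below.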

E.7 Continuity of revenue in the discount factor γ\gamma (Lemma E.12)

To prove Lemma E.12 we analyze the process of the most recent ww reviews under σrandom(w)\sigma^{\textsc{random}(w)}. Let yi=(zi,si)y_{i}=(z_{i},s_{i}) be the information of the ii-th most recent review where zi{0,1}z_{i}\in\{0,1\} is the review rating and sis_{i}\in\mathbb{N} is the staleness of the review (i.e., the number of rounds elapsed since posting the review). For a vector 𝒚=(y1,,yw)\bm{y}=(y_{1},\ldots,y_{w}) of the most recent ww reviews and their information, let 𝑹(𝒚)=(yi1,yic)\bm{R}(\bm{y})=(y_{i_{1}},\ldots y_{i_{c}}) be the random variable of the information from the cc reviews chosen by σrandom(w)\sigma^{\textsc{random}(w)}, i.e., uniformly at random without replacement from {y1,,yw}\{y_{1},\ldots,y_{w}\}. We also denote the staleness s(𝒚)s(\bm{y}) of a state 𝒚\bm{y} by the staleness of its oldest review, i.e., s(𝒚)=sws(\bm{y})=s_{w}. Given cc reviews with their information 𝑹(𝒚)=(yi1,yic)\bm{R}(\bm{y})=(y_{i_{1}},\ldots y_{i_{c}}), the purchase probability of the customer is

fγ(𝑹(𝒚))=Θ[Θ+h(Beta(a+j=1cγsij1zij,b+j=1cγsij1(1zij)))p].f_{\gamma}\big{(}\bm{R}(\bm{y})\big{)}={\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h\big{(}\mathrm{Beta}(a+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}z_{i_{j}},b+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}(1-z_{i_{j}}))\big{)}\geq p\Big{]}. (36)

For any round $t$, let $\bm{Y}_{t}=\big((Z_{t,1},S_{t,1}),\ldots,(Z_{t,w},S_{t,w})\big)\in(\{0,1\}\times\mathbb{N})^{w}$ denote the most recent $w$ reviews, where the $i$-th most recent review has rating $Z_{t,i}\in\{0,1\}$ and staleness $S_{t,i}\in\mathbb{N}$. For any two states $\bm{y},\tilde{\bm{y}}$, let $T_{\tilde{\bm{y}},\bm{y}}=\min\{n\in\mathbb{N}:\bm{Y}_{n}=\bm{y}\,|\,\bm{Y}_{0}=\tilde{\bm{y}}\}$ be the first round that $\bm{Y}_{t}$ reaches $\bm{y}$ conditioned on starting in $\tilde{\bm{y}}$. The following lemma shows that the long-range transition probabilities of $\bm{Y}_{t}$ between two states decay geometrically with the staleness of the destination state. Formally, for two review states $\bm{y},\tilde{\bm{y}}\in(\{0,1\}\times\mathbb{N})^{w}$, the lemma shows that if $\bm{Y}_{t}$ starts in $\tilde{\bm{y}}$, the probability that it ends up in $\bm{y}$ after exactly $s(\bm{y})$ rounds decays geometrically with the staleness $s(\bm{y})$. Further, the lemma shows that the first round in which $\bm{Y}_{t}$ enters the state $\bm{y}$, i.e., $T_{\tilde{\bm{y}},\bm{y}}$, has geometrically decaying tails (its proof is provided in Section E.8).

Lemma E.15.

For any problem instance (μ,,a,b,c,h)\mathcal{E}(\mu,\mathcal{F},a,b,c,h) and bounded price p>0p>0, there exist two constants η,ξ(0,1)\eta,\xi\in(0,1) such that for any discount factor γ[0,1]\gamma\in[0,1] and any two review states 𝐲,𝐲~({0,1}×)w\bm{y},\tilde{\bm{y}}\in(\{0,1\}\times\mathbb{N})^{w}, a) ξs(𝐲)[𝐘s(𝐲)=𝐲|𝐘0=𝐲~]ηs(𝐲)\xi^{s(\bm{y})}\geq{\mathbb{P}}[\bm{Y}_{s(\bm{y})}=\bm{y}|\bm{Y}_{0}=\tilde{\bm{y}}]\geq\eta^{s(\bm{y})} and b) [T𝐲~,𝐲qs(𝐲)+1](1ηs(𝐲))q{\mathbb{P}}[T_{\tilde{\bm{y}},\bm{y}}\geq q\cdot s(\bm{y})+1]\leq\big{(}1-\eta^{s(\bm{y})}\big{)}^{q}.

Equipped with this lemma, we decompose the revenue of σrandom(w)\sigma^{\textsc{random}(w)} into the contribution from states with different staleness ss. For a state 𝒚({0,1}×)w\bm{y}\in(\{0,1\}\times\mathbb{N})^{w} of the ww most recent reviews, let qγ(𝒚)=𝔼[fγ(𝑹(𝒚))]q_{\gamma}(\bm{y})={\mathbb{E}}[f_{\gamma}(\bm{R(y)})] be the ex-ante purchase probability, where the randomness in the expectation is taken over the cc reviews chosen by σrandom(w)\sigma^{\textsc{random}(w)} from the ww most recent reviews.

Lemma E.16.

The process 𝐘t\bm{Y}_{t} is a time-homogeneous Markov chain. Letting πγ\pi_{\gamma} be its stationary distribution, the revenue of σrandom(w)\sigma^{\textsc{random}(w)} can be expressed as Revγ(σrandom(w),p)=s=w+Σγ(s){\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w)},p)=\sum_{s=w}^{+\infty}\Sigma_{\gamma}(s) where Σγ(s)=p𝐲({0,1}×)w:s(𝐲)=sπγ(𝐲)qγ(𝐲)\Sigma_{\gamma}(s)=p\cdot\sum_{\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}:s(\bm{y})=s}\pi_{\gamma}(\bm{y})q_{\gamma}(\bm{y}) is the contribution from states with staleness ss.

Proof of Lemma E.16.

The proof has three components: (1) 𝒀t\bm{Y}_{t} is a time-homogeneous Markov chain; (2) 𝒀t\bm{Y}_{t} is irreducible, i.e., any state can be reached from any other state with positive probability; (3) 𝒀t\bm{Y}_{t} is positive-recurrent, i.e., 𝔼[T𝒚,𝒚]<{\mathbb{E}}[T_{\bm{y},\bm{y}}]<\infty for any state 𝒚\bm{y}. Theorem 3 in [Gal97] then implies that 𝒀t\bm{Y}_{t} has a stationary distribution πγ\pi_{\gamma} and the Ergodic theorem thus yields

\[
{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w)},p)=p\cdot\liminf_{T\to\infty}{\mathbb{E}}\Big[\frac{\sum_{t=1}^{T}q_{\gamma}(\bm{Y}_{t})}{T}\Big]=p\cdot{\mathbb{E}}_{\bm{y}\sim\pi_{\gamma}}\big[q_{\gamma}(\bm{y})\big]=\sum_{s=w}^{+\infty}\underbrace{p\cdot\sum_{\bm{y}:s(\bm{y})=s}\pi_{\gamma}(\bm{y})q_{\gamma}(\bm{y})}_{\Sigma_{\gamma}(s)}.
\]

The last step expands and regroups 𝔼𝒚πγ[qγ(𝒚)]{\mathbb{E}}_{\bm{y}\sim\pi_{\gamma}}\big{[}q_{\gamma}(\bm{y})\big{]} over states 𝒚\bm{y} with staleness s(𝒚)=ss(\bm{y})=s. The remainder of the proof shows the aforementioned three components.

(1) We first argue that $\bm{Y}_{t}$ is a time-homogeneous Markov chain. If $\bm{Y}_{t}$ is at state $\bm{y}=(y_{1},\ldots,y_{w})\in(\{0,1\}\times\mathbb{N})^{w}$ where $y_{i}=(z_{i},s_{i})$ for $i\in\{1,\ldots,w\}$, the ex-ante purchase probability is equal to $q_{\gamma}(\bm{y})$. If a purchase occurs, a review with staleness $1$ is left and the next state is $\bm{Y}_{t+1}=\big((z,1),(z_{1},s_{1}+1),\ldots,(z_{w-1},s_{w-1}+1)\big)$, where $z=1$ with probability $\mu$ (positive review) and $z=0$ with probability $1-\mu$ (negative review). If a purchase does not occur, the review ratings of the $w$ most recent reviews remain the same while the staleness of each review increases by one, i.e., the next state is $\bm{Y}_{t+1}=\big((z_{1},s_{1}+1),\ldots,(z_{w},s_{w}+1)\big)$. Since these transition probabilities depend only on the current state $\bm{y}$ and not on the round $t$, $\bm{Y}_{t}$ is a time-homogeneous Markov chain.

(2) Second, we show that $\bm{Y}_{t}$ is irreducible. By part a) of Lemma E.15, ${\mathbb{P}}[\bm{Y}_{s(\bm{y})}=\bm{y}|\bm{Y}_{0}=\bm{y}^{\prime}]\geq\eta^{s(\bm{y})}>0$ for any two states $\bm{y},\bm{y}^{\prime}$, implying the irreducibility of $\bm{Y}_{t}$.

(3) We finally show that 𝒀t\bm{Y}_{t} is positive-recurrent. The expected return time for any state 𝒚\bm{y} is

𝔼[T𝒚,𝒚]\displaystyle{\mathbb{E}}[T_{\bm{y},\bm{y}}] =n=1+[T𝒚,𝒚n]=q=0+n=qs(𝒚)+1(q+1)s(𝒚)[T𝒚,𝒚n]\displaystyle=\sum_{n=1}^{+\infty}{\mathbb{P}}[T_{\bm{y},\bm{y}}\geq n]=\sum_{q=0}^{+\infty}\sum_{n=q\cdot s(\bm{y})+1}^{(q+1)\cdot s(\bm{y})}{\mathbb{P}}[T_{\bm{y},\bm{y}}\geq n]
q=0+s(𝒚)[T𝒚,𝒚qs(𝒚)+1]s(𝒚)q=0+(1ηs(𝒚))q=s(𝒚)ηs(𝒚)<+.\displaystyle\leq\sum_{q=0}^{+\infty}s(\bm{y}){\mathbb{P}}[T_{\bm{y},\bm{y}}\geq q\cdot s(\bm{y})+1]\leq s(\bm{y})\sum_{q=0}^{+\infty}\big{(}1-\eta^{s(\bm{y})}\big{)}^{q}=\frac{s(\bm{y})}{\eta^{s(\bm{y})}}<+\infty.

The second inequality uses b) of Lemma E.15. The last equality uses q=0+(1ηs(𝒚))q=1ηs(𝒚)\sum_{q=0}^{+\infty}\big{(}1-\eta^{s(\bm{y})}\big{)}^{q}=\frac{1}{\eta^{s(\bm{y})}}. ∎
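The transition rule of $\bm{Y}_{t}$ described in component (1) can be sketched as a small simulator. Several modelling choices below are assumptions made purely for illustration and are not specified in this appendix: $\Theta$ is taken to be uniform on $[0,1]$, $h(\mathrm{Beta}(x,y))$ is taken to be the mean $x/(x+y)$, and the default parameters `a`, `b`, `p` are arbitrary. The function names are hypothetical as well.

```python
import random

def purchase_prob(shown, gamma, a=1.0, b=1.0, p=1.0):
    """f_gamma of Eq. 36, ASSUMING Theta ~ Uniform[0,1] and h = Beta mean."""
    pos = sum(gamma ** (s - 1) * z for z, s in shown)
    neg = sum(gamma ** (s - 1) * (1 - z) for z, s in shown)
    h = (a + pos) / (a + b + pos + neg)
    # P[Theta >= p - h] for Theta uniform on [0, 1], clipped to [0, 1].
    return min(1.0, max(0.0, 1.0 - (p - h)))

def step(state, purchased, rating=None):
    """One transition: every review ages by one round; on a purchase, a new
    review with staleness 1 enters and the oldest review drops out."""
    aged = [(z, s + 1) for z, s in state]
    if not purchased:
        return tuple(aged)
    return ((rating, 1),) + tuple(aged[:-1])

def simulate_revenue(w, c, mu, gamma, p, rounds, seed=0):
    """Monte Carlo estimate of the long-run revenue of random(w)."""
    rng = random.Random(seed)
    state = tuple((int(rng.random() < mu), i + 1) for i in range(w))
    sales = 0
    for _ in range(rounds):
        shown = rng.sample(state, c)  # c reviews, uniformly without replacement
        buy = rng.random() < purchase_prob(shown, gamma, p=p)
        rating = int(rng.random() < mu) if buy else None
        state = step(state, buy, rating)
        sales += buy
    return p * sales / rounds

print(simulate_revenue(w=4, c=2, mu=0.6, gamma=0.9, p=1.0, rounds=20000))
```

Because `step` depends only on the current state and the outcome of the current round, the simulated process is time-homogeneous, matching the argument in component (1).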

We next show that for any staleness ss, the contribution Σγ(s)\Sigma_{\gamma}(s) (from Lemma E.16) is bounded.

Lemma E.17.

For any strongly non-absorbing price p>0p>0, there exists some constant ξ(0,1)\xi\in(0,1) such that the contribution from states with staleness ss is bounded by maxγ[0,1]Σγ(s)p(2s)wξs\max_{\gamma\in[0,1]}\Sigma_{\gamma}(s)\leq p\cdot(2s)^{w}\xi^{s}.

Proof of Lemma E.17.

The proof has three components: (1) there exists some ξ(0,1)\xi\in(0,1) such that πγ(𝒚)ξs(𝒚)\pi_{\gamma}(\bm{y})\leq\xi^{s(\bm{y})} for any state 𝒚\bm{y}, (2) the ex-ante purchase probability is qγ(𝒚)1q_{\gamma}(\bm{y})\leq 1 for any state 𝒚\bm{y}; (3) Σγ(s)\Sigma_{\gamma}(s) has at most (2s)w(2s)^{w} summands. Using (1), (2), and (3) the proof is concluded as

Σγ(s)=p𝒚:s(𝒚)=sπγ(𝒚)qγ(𝒚)p𝒚:s(𝒚)=sπγ(𝒚)p(2s)wξs.\Sigma_{\gamma}(s)=p\cdot\sum_{\bm{y}:s(\bm{y})=s}\pi_{\gamma}(\bm{y})q_{\gamma}(\bm{y})\leq p\cdot\sum_{\bm{y}:s(\bm{y})=s}\pi_{\gamma}(\bm{y})\leq p\cdot(2s)^{w}\cdot\xi^{s}.

The remainder of the proof shows the aforementioned three components.

(1) We first show that 𝒀t\bm{Y}_{t} is aperiodic at any state 𝒚\bm{y}. The lower bound in part a) of Lemma E.15 implies that [𝒀s(𝒚)=𝒚|𝒀0=𝒚]ηs(𝒚){\mathbb{P}}[\bm{Y}_{s(\bm{y})}=\bm{y}|\bm{Y}_{0}=\bm{y}]\geq\eta^{s(\bm{y})}. Invoking the lower bound in part a) of Lemma E.15 again and the time-homogeneity of 𝒀t\bm{Y}_{t}, for any state 𝒚𝒩(𝒚)={𝒚|[𝒀1=𝒚|𝒀0=𝒚]>0}\bm{y^{\prime}}\in\mathcal{N}(\bm{y})=\{\bm{y}^{\prime}|{\mathbb{P}}[\bm{Y}_{1}=\bm{y}^{\prime}|\bm{Y}_{0}=\bm{y}]>0\},

[𝒀s(𝒚)+1=𝒚|𝒀0=𝒚]\displaystyle{\mathbb{P}}[\bm{Y}_{s(\bm{y})+1}=\bm{y}|\bm{Y}_{0}=\bm{y}] [𝒀s(𝒚)+1=𝒚|𝒀1=𝒚][𝒀1=𝒚|𝒀0=𝒚]\displaystyle\geq{\mathbb{P}}[\bm{Y}_{s(\bm{y})+1}=\bm{y}|\bm{Y}_{1}=\bm{y}^{\prime}]{\mathbb{P}}[\bm{Y}_{1}=\bm{y}^{\prime}|\bm{Y}_{0}=\bm{y}]
=[𝒀s(𝒚)=𝒚|𝒀0=𝒚][𝒀1=𝒚|𝒀0=𝒚]ηs(𝒚)[𝒀1=𝒚|𝒀0=𝒚]>0.\displaystyle={\mathbb{P}}[\bm{Y}_{s(\bm{y})}=\bm{y}|\bm{Y}_{0}=\bm{y}^{\prime}]{\mathbb{P}}[\bm{Y}_{1}=\bm{y}^{\prime}|\bm{Y}_{0}=\bm{y}]\geq\eta^{s(\bm{y})}{\mathbb{P}}[\bm{Y}_{1}=\bm{y}^{\prime}|\bm{Y}_{0}=\bm{y}]>0.

Given that the greatest common divisor of s(𝒚)+1s(\bm{y})+1 and s(𝒚)s(\bm{y}) is 1, 𝒀t\bm{Y}_{t} is aperiodic at the state 𝒚\bm{y}.

Let $\bm{y}^{\prime}\in(\{0,1\}\times\mathbb{N})^{w}$ be any state of review information. For any number of rounds $n\geq s(\bm{y})$, expressing ${\mathbb{P}}[\bm{Y}_{n}=\bm{y}|\bm{Y}_{0}=\bm{y}^{\prime}]$ over the possible states of the Markov chain at round $n-s(\bm{y})$, and using the time-homogeneity of $\bm{Y}_{n}$ and the upper bound of part a) of Lemma E.15, we obtain

[𝒀n=𝒚|𝒀0=𝒚]\displaystyle{\mathbb{P}}[\bm{Y}_{n}=\bm{y}|\bm{Y}_{0}=\bm{y}^{\prime}] =y′′({0,1}×)w[𝒀n=𝒚|𝒀ns(𝒚)=𝒚′′][𝒀ns(𝒚)=𝒚′′|𝒀0=𝒚]\displaystyle=\sum_{y^{\prime\prime}\in(\{0,1\}\times\mathbb{N})^{w}}{\mathbb{P}}[\bm{Y}_{n}=\bm{y}|\bm{Y}_{n-s(\bm{y})}=\bm{y}^{\prime\prime}]{\mathbb{P}}[\bm{Y}_{n-s(\bm{y})}=\bm{y}^{\prime\prime}|\bm{Y}_{0}=\bm{y}^{\prime}]
=y′′({0,1}×)w[𝒀s(𝒚)=𝒚|𝒀0=𝒚′′][𝒀ns(𝒚)=𝒚′′|𝒀0=𝒚]\displaystyle=\sum_{y^{\prime\prime}\in(\{0,1\}\times\mathbb{N})^{w}}{\mathbb{P}}[\bm{Y}_{s(\bm{y})}=\bm{y}|\bm{Y}_{0}=\bm{y}^{\prime\prime}]{\mathbb{P}}[\bm{Y}_{n-s(\bm{y})}=\bm{y}^{\prime\prime}|\bm{Y}_{0}=\bm{y}^{\prime}]
ξs(𝒚)y′′[𝒀ns(𝒚)=𝒚′′|𝒀0=𝒚]=1=ξs(𝒚).\displaystyle\leq\xi^{s(\bm{y})}\underbrace{\sum_{y^{\prime\prime}}{\mathbb{P}}[\bm{Y}_{n-s(\bm{y})}=\bm{y}^{\prime\prime}|\bm{Y}_{0}=\bm{y}^{\prime}]}_{=1}=\xi^{s(\bm{y})}.

As 𝒀t\bm{Y}_{t} is aperiodic at 𝒚\bm{y}, Theorem 1 in [Gal97] yields πγ(𝒚)=limn[𝒀n=𝒚|𝒀0=𝒚]ξs(𝒚)\pi_{\gamma}(\bm{y})=\lim_{n\to\infty}{\mathbb{P}}[\bm{Y}_{n}=\bm{y}|\bm{Y}_{0}=\bm{y}^{\prime}]\leq\xi^{s(\bm{y})}.

(2) Given that qγ(𝒚)q_{\gamma}(\bm{y}) is the ex-ante purchase probability, it is at most 11.

(3) It remains to show that $\Sigma_{\gamma}(s)$ has at most $(2s)^{w}$ summands, or equivalently $|\{\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}:s(\bm{y})=s\}|\leq(2s)^{w}$. For any staleness $s$, there are at most $(2s)^{w}$ review states $\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}$ whose oldest review has staleness $s$: since $s\geq s_{i}\geq 1$ and $z_{i}\in\{0,1\}$ for all $i$, there are at most $2s$ possibilities for each information $y_{i}=(z_{i},s_{i})$, $i=1,\ldots,w$. ∎

To show continuity of ${\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w)},p)$, it remains to show continuity of each $\Sigma_{\gamma}(s)$ term as well as of their sum. This is done in the following lemmas (proofs in Section E.8).

Lemma E.18.

For any strongly non-absorbing price p>0p>0, the contribution from states with staleness ss, Σγ(s)\Sigma_{\gamma}(s), is continuous in the discount factor γ\gamma.

Lemma E.19.

Let {fn}n=1\{f_{n}\}_{n=1}^{\infty} be a sequence of continuous functions fn:[0,1]0f_{n}:[0,1]\to\mathbb{R}_{\geq 0}. Suppose that n=1maxγ[0,1]fn(γ)<+\sum_{n=1}^{\infty}\max_{\gamma\in[0,1]}f_{n}(\gamma)<+\infty. Then f(γ)n=1fn(γ)f(\gamma)\coloneqq\sum_{n=1}^{\infty}f_{n}(\gamma) is continuous in γ[0,1]\gamma\in[0,1].

Proof of Lemma E.12.

To prove that ${\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w)},p)$ is continuous in $\gamma$, we express it as a sum over the contributions of different staleness levels $s$ (Lemma E.16):

Revγ(σrandom(w),p)=s=w+Σγ(s).{\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w)},p)=\sum_{s=w}^{+\infty}\Sigma_{\gamma}(s).

By Lemma E.17, each summand is upper bounded by $\max_{\gamma\in[0,1]}\Sigma_{\gamma}(s)\leq p\cdot(2s)^{w}\xi^{s}$, which implies that $\sum_{s=w}^{\infty}\max_{\gamma\in[0,1]}\Sigma_{\gamma}(s)\leq\sum_{s=w}^{\infty}p\cdot(2s)^{w}\xi^{s}<\infty$ since $\xi\in(0,1)$. By Lemma E.18, each summand is also continuous; Lemma E.19 then implies that ${\textsc{Rev}}_{\gamma}(\sigma^{\textsc{random}(w)},p)$ is continuous in $\gamma\in[0,1]$. ∎
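The summability of the dominating series $\sum_{s\geq w}(2s)^{w}\xi^{s}$, which is what allows Lemma E.19 to be applied, follows from the ratio test: consecutive terms have ratio $\big(\frac{s+1}{s}\big)^{w}\xi\to\xi<1$. The sketch below illustrates this numerically for hypothetical values `w = 3` and `xi = 0.5`.

```python
def dominating_term(s, w, xi):
    """Upper bound (2s)^w * xi^s on the contribution of staleness s (up to the factor p)."""
    return (2 * s) ** w * xi ** s

def partial_sum(w, xi, upto):
    """Partial sum of the dominating series from s = w to s = upto."""
    return sum(dominating_term(s, w, xi) for s in range(w, upto + 1))

w, xi = 3, 0.5  # hypothetical values for illustration
for upto in (10, 50, 200, 400):
    print(upto, partial_sum(w, xi, upto))

# Ratio of consecutive terms approaches xi, so the series converges.
print(dominating_term(501, w, xi) / dominating_term(500, w, xi))
```

The polynomial factor $(2s)^{w}$ grows, but it is eventually overwhelmed by the geometric factor $\xi^{s}$, so the partial sums stabilize.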

E.8 Auxiliary lemmas for the proof of Lemma E.12 (Lemmas E.15, E.18, E.19)

Proof of Lemma E.15.

Lemma E.15 states that the long-range transition probabilities between any two states decay geometrically with the staleness of the destination state, i.e., there are constants ξ,η(0,1)\xi,\eta\in(0,1) such that for any states 𝒚,𝒚~\bm{y},\tilde{\bm{y}}: a) ξs(𝒚)[𝒀s(𝒚)=𝒚|𝒀0=𝒚~]ηs(𝒚)\xi^{s(\bm{y})}\geq{\mathbb{P}}[\bm{Y}_{s(\bm{y})}=\bm{y}|\bm{Y}_{0}=\tilde{\bm{y}}]\geq\eta^{s(\bm{y})} and b) [T𝒚~,𝒚qs(𝒚)+1](1ηs(𝒚))q{\mathbb{P}}[T_{\tilde{\bm{y}},\bm{y}}\geq q\cdot s(\bm{y})+1]\leq\big{(}1-\eta^{s(\bm{y})}\big{)}^{q}. Let η=min(μq¯,(1μ)q¯,1q¯)\eta=\min\big{(}\mu\underline{q},(1-\mu)\underline{q},1-\overline{q}\big{)} and ξ=max(μq¯,(1μ)q¯,1q¯)\xi=\max(\mu\overline{q},(1-\mu)\overline{q},1-\underline{q}) where q¯\underline{q} and q¯\overline{q} are the smallest and largest (over review states and discount factor) purchase probabilities:

q¯\displaystyle\underline{q} =inf(s1,,sc)cinf(z1,,zc){0,1}cinfγ[0,1]Θ[Θ+h(Beta(a+i=1cγsi1zi,b+i=1cγsi1(1zi)))p]\displaystyle=\inf_{(s_{1},\ldots,s_{c})\in\mathbb{N}^{c}}\inf_{(z_{1},\ldots,z_{c})\in\{0,1\}^{c}}\inf_{\gamma\in[0,1]}{\mathbb{P}}_{\Theta\sim\mathcal{F}}\Bigg{[}\Theta+h\Big{(}\mathrm{Beta}\big{(}a+\sum_{i=1}^{c}\gamma^{s_{i}-1}z_{i},b+\sum_{i=1}^{c}\gamma^{s_{i}-1}(1-z_{i})\big{)}\Big{)}\geq p\Bigg{]}
q¯\displaystyle\overline{q} =sup(s1,,sc)csup(z1,,zc){0,1}csupγ[0,1]Θ[Θ+h(Beta(a+i=1cγsi1zi,b+i=1cγsi1(1zi)))p].\displaystyle=\sup_{(s_{1},\ldots,s_{c})\in\mathbb{N}^{c}}\sup_{(z_{1},\ldots,z_{c})\in\{0,1\}^{c}}\sup_{\gamma\in[0,1]}{\mathbb{P}}_{\Theta\sim\mathcal{F}}\Bigg{[}\Theta+h\Big{(}\mathrm{Beta}\big{(}a+\sum_{i=1}^{c}\gamma^{s_{i}-1}z_{i},b+\sum_{i=1}^{c}\gamma^{s_{i}-1}(1-z_{i})\big{)}\Big{)}\geq p\Bigg{]}.

For any state of review information $\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}$, the ex-ante purchase probability is thus $q_{\gamma}(\bm{y})\in[\underline{q},\overline{q}]$, and the non-zero transition probabilities exiting $\bm{y}$ are equal to $\mu q_{\gamma}(\bm{y})$ (purchase and positive review), $(1-\mu)q_{\gamma}(\bm{y})$ (purchase and negative review), or $1-q_{\gamma}(\bm{y})$ (no purchase). As a result, each of these one-round, non-zero transition probabilities is at least $\eta$ and at most $\xi$. Given that $p$ is strongly non-absorbing, $\mathcal{F}$ is continuous, and $h\big(\mathrm{Beta}(x,y)\big)$ is continuous in $(x,y)$, we have $0<\underline{q}<\overline{q}<1$, and thus $\eta,\xi\in(0,1)$.

Having established that the one-round, non-zero transition probabilities lie in $[\eta,\xi]$, we now show that the $s_{w}$-round transition probability from $\tilde{\bm{y}}$ to $\bm{y}$ lies in $[\eta^{s_{w}},\xi^{s_{w}}]$, where $s_{w}=s(\bm{y})$ is the staleness of the oldest review in the state $\bm{y}$. We first argue that, conditioning on $\bm{Y}_{0}=\tilde{\bm{y}}$ and $\bm{Y}_{s_{w}}=\bm{y}$, the information states $\bm{Y}_{1},\ldots,\bm{Y}_{s_{w}}$ are fixed. Let $X_{t}=z$ if there was a purchase and a review $z\in\{0,1\}$ at round $t$, and $X_{t}=\perp$ otherwise. Letting $\bm{y}=((z_{i},s_{i}))_{i=1}^{w}$, where the $i$-th most recent review has rating $z_{i}$ and staleness $s_{i}$, the event $\bm{Y}_{s_{w}}=\bm{y}$ implies that review rating $z_{i}$ was left at round $\tau_{i}=s_{w}-s_{i}$, i.e., $X_{\tau_{i}}=z_{i}$, and no purchase was made in any other round, i.e., $X_{t}=\perp$ for $t\not\in\{\tau_{1},\ldots,\tau_{w}\}$. Therefore, the values $X_{1},\ldots,X_{s_{w}}$ are deterministically fixed. Together with the fact that $\bm{Y}_{0}=\tilde{\bm{y}}$, this implies that the states $\bm{Y}_{1},\ldots,\bm{Y}_{s_{w}}$ are also deterministically fixed, and thus the $s_{w}$-round transition probability ${\mathbb{P}}[\bm{Y}_{s_{w}}=\bm{y}|\bm{Y}_{0}=\tilde{\bm{y}}]$ decomposes into a product of $s_{w}$ one-round transition probabilities from $\bm{Y}_{i}$ to $\bm{Y}_{i+1}$, where $i=0,\ldots,s_{w}-1$. As each of those $s_{w}$ transition probabilities is in the interval $[\eta,\xi]$, ${\mathbb{P}}[\bm{Y}_{s_{w}}=\bm{y}|\bm{Y}_{0}=\tilde{\bm{y}}]$ is in the interval $[\eta^{s_{w}},\xi^{s_{w}}]$, yielding part a).

To prove b), let 𝒁q=𝒀qs(𝒚)\bm{Z}_{q}=\bm{Y}_{q\cdot s(\bm{y})} for q=0,1,q=0,1,\ldots denote the process of every s(𝒚)s(\bm{y})-th state of 𝒀t\bm{Y}_{t}. Using the time-homogeneity of 𝒀t\bm{Y}_{t} and the lower bound from part a), for any state 𝒚({0,1}×)w\bm{y}^{\prime}\in(\{0,1\}\times\mathbb{N})^{w},

[𝒁q+1=𝒚|𝒁q=𝒚]=[𝒀(q+1)s(𝒚)=𝒚|𝒀qs(𝒚)=𝒚]=[𝒀s(𝒚)=𝒚|𝒀0=𝒚]ηs(𝒚).{\mathbb{P}}[\bm{Z}_{q+1}=\bm{y}|\bm{Z}_{q}=\bm{y}^{\prime}]={\mathbb{P}}[\bm{Y}_{(q+1)\cdot s(\bm{y})}=\bm{y}|\bm{Y}_{q\cdot s(\bm{y})}=\bm{y}^{\prime}]={\mathbb{P}}[\bm{Y}_{s(\bm{y})}=\bm{y}|\bm{Y}_{0}=\bm{y}^{\prime}]\geq\eta^{s(\bm{y})}. (37)

Letting T~𝒚~,𝒚=min{n s.t. 𝒁n=𝒚|𝒁0=𝒚~}\tilde{T}_{\tilde{\bm{y}},\bm{y}}=\min\{n\in\mathbb{N}\text{ s.t. }\bm{Z}_{n}=\bm{y}|\bm{Z}_{0}=\tilde{\bm{y}}\}, Eq. 37 implies that T~𝒚~,𝒚\tilde{T}_{\tilde{\bm{y}},\bm{y}} is stochastically dominated by a Geom(ηs(𝒚))Geom(\eta^{s(\bm{y})}) random variable and [T~𝒚~,𝒚q+1](1ηs(𝒚))q{\mathbb{P}}[\tilde{T}_{\tilde{\bm{y}},\bm{y}}\geq q+1]\leq(1-\eta^{s(\bm{y})})^{q}. The definition of 𝒁q\bm{Z}_{q} implies that s(𝒚)T~𝒚~,𝒚T𝒚~,𝒚s(\bm{y})\cdot\tilde{T}_{\tilde{\bm{y}},\bm{y}}\geq T_{\tilde{\bm{y}},\bm{y}} where T𝒚~,𝒚=min{n s.t. 𝒀n=𝒚|𝒀0=𝒚~}T_{\tilde{\bm{y}},\bm{y}}=\min\{n\in\mathbb{N}\text{ s.t. }\bm{Y}_{n}=\bm{y}|\bm{Y}_{0}=\tilde{\bm{y}}\}. Hence

[T𝒚~,𝒚qs(𝒚)+1][s(𝒚)T~𝒚~,𝒚qs(𝒚)+1][T~𝒚~,𝒚q+1](1ηs(𝒚))q{\mathbb{P}}[T_{\tilde{\bm{y}},\bm{y}}\geq q\cdot s(\bm{y})+1]\leq{\mathbb{P}}[s(\bm{y})\cdot\tilde{T}_{\tilde{\bm{y}},\bm{y}}\geq q\cdot s(\bm{y})+1]\leq{\mathbb{P}}[\tilde{T}_{\tilde{\bm{y}},\bm{y}}\geq q+1]\leq(1-\eta^{s(\bm{y})})^{q}

where the first inequality uses $s(\bm{y})\cdot\tilde{T}_{\tilde{\bm{y}},\bm{y}}\geq T_{\tilde{\bm{y}},\bm{y}}$, and the second inequality uses that the event $s(\bm{y})\cdot\tilde{T}_{\tilde{\bm{y}},\bm{y}}\geq q\cdot s(\bm{y})+1$ implies $\tilde{T}_{\tilde{\bm{y}},\bm{y}}>q$ and hence $\tilde{T}_{\tilde{\bm{y}},\bm{y}}\geq q+1$ (as $\tilde{T}_{\tilde{\bm{y}},\bm{y}}$ is an integer). This concludes the proof. ∎

Proof of Lemma E.18.

Recall that Σγ(s)=p𝒚:s(𝒚)=sπγ(𝒚)qγ(𝒚)\Sigma_{\gamma}(s)=p\cdot\sum_{\bm{y}:s(\bm{y})=s}\pi_{\gamma}(\bm{y})q_{\gamma}(\bm{y}). To prove the lemma, we show the following three components: (1) For any 𝒚\bm{y}, qγ(𝒚)q_{\gamma}(\bm{y}) is continuous in γ\gamma; (2) For any 𝒚\bm{y}, πγ(𝒚)\pi_{\gamma}(\bm{y}) is continuous in γ\gamma; (3) the number of summands in Σγ(s)\Sigma_{\gamma}(s) is at most (2s)w(2s)^{w}. Those components imply that Σγ(s)\Sigma_{\gamma}(s) is continuous in γ\gamma. We now prove the three components.

(1) We first show that qγ(𝒚)q_{\gamma}(\bm{y}) is continuous in γ\gamma. For any state of reviews 𝒚({0,1}×)w\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}, the ex-ante purchase probability qγ(𝒚)q_{\gamma}(\bm{y}) is qγ(𝒚)=(i1,,ic)𝒮w1(wc)fγ(yi1,,yic)q_{\gamma}(\bm{y})=\sum_{(i_{1},\ldots,i_{c})\in\mathcal{S}_{w}}\frac{1}{\binom{w}{c}}f_{\gamma}(y_{i_{1}},\ldots,y_{i_{c}}), where

fγ(yi1,,yic)=Θ[Θ+h(Beta(a+j=1cγsij1zij,b+j=1cγsij1(1zij)))p].f_{\gamma}(y_{i_{1}},\ldots,y_{i_{c}})={\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h\big{(}\mathrm{Beta}(a+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}z_{i_{j}},b+\sum_{j=1}^{c}\gamma^{s_{i_{j}}-1}(1-z_{i_{j}}))\big{)}\geq p\Big{]}.

The continuity of the customer-specific valuation Θ\Theta and the function (x,y)h(Beta(x,y))(x,y)\to h(\mathrm{Beta}(x,y)) imply that fγ(yi1,,yic)f_{\gamma}(y_{i_{1}},\ldots,y_{i_{c}}) is continuous in γ[0,1]\gamma\in[0,1] for any (i1,,ic)(i_{1},\ldots,i_{c}), and thus the same holds for qγ(𝒚)q_{\gamma}(\bm{y}).

(2) We next show that for any state 𝒚({0,1}×)w\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}, the stationary distribution πγ(𝒚)\pi_{\gamma}(\bm{y}) is continuous in γ[0,1]\gamma\in[0,1]. Given that 𝒀t\bm{Y}_{t} is positive recurrent (see proof of Lemma E.16), Theorem 3 in [Gal97] yields that its stationary distribution at any state of review information 𝒚({0,1}×)w\bm{y}\in(\{0,1\}\times\mathbb{N})^{w} is given by πγ(𝒚)=1𝔼[T𝒚,𝒚]\pi_{\gamma}(\bm{y})=\frac{1}{{\mathbb{E}}[T_{\bm{y},\bm{y}}]}. Thus, it suffices to show the continuity of 𝔼[T𝒚,𝒚]=n=1+[T𝒚,𝒚n]{\mathbb{E}}[T_{\bm{y},\bm{y}}]=\sum_{n=1}^{+\infty}{\mathbb{P}}[T_{\bm{y},\bm{y}}\geq n].

Let $\mathcal{N}(\bm{y})=\big\{\bm{y}^{\prime}\in(\{0,1\}\times\mathbb{N})^{w}\,\big|\,\mathbb{P}[\bm{Y}_{1}=\bm{y}^{\prime}\mid\bm{Y}_{0}=\bm{y}]>0\big\}$ be the set of states that can be reached from $\bm{y}$ within one round. Notice that any $\bm{y}^{\prime}\in\mathcal{N}(\bm{y})$ is obtained from $\bm{y}$ after either a purchase (with a positive or negative review) or no purchase. The respective probabilities are $\mu q_{\gamma}(\bm{y})$, $(1-\mu)q_{\gamma}(\bm{y})$, and $1-q_{\gamma}(\bm{y})$. As $q_{\gamma}(\bm{y})$ is continuous in $\gamma$, each of the one-round transition probabilities is continuous in $\gamma\in[0,1]$. The tail probability $\mathbb{P}[T_{\bm{y},\bm{y}}\geq n]$ can be expanded as

[T𝒚,𝒚n]=[𝒀n1𝒚,,𝒀1𝒚|𝒀0=𝒚]\displaystyle{\mathbb{P}}[T_{\bm{y},\bm{y}}\geq n]={\mathbb{P}}[\bm{Y}_{n-1}\neq\bm{y},\ldots,\bm{Y}_{1}\neq\bm{y}|\bm{Y}_{0}=\bm{y}]
=𝒚1𝒩(𝒚)𝒚𝒚2𝒩(𝒚1)𝒚𝒚n1𝒩(𝒚n2)𝒚[𝒀1=𝒚1|𝒀0=𝒚][𝒀n1=𝒚n1|𝒀n2=𝒚n2]\displaystyle=\sum_{\bm{y}_{1}\in\mathcal{N}(\bm{y})\setminus\bm{y}}\sum_{\bm{y}_{2}\in\mathcal{N}(\bm{y}_{1})\setminus\bm{y}}\ldots\sum_{\bm{y}_{n-1}\in\mathcal{N}(\bm{y}_{n-2})\setminus\bm{y}}{\mathbb{P}}[\bm{Y}_{1}=\bm{y}_{1}|\bm{Y}_{0}=\bm{y}]\ldots{\mathbb{P}}[\bm{Y}_{n-1}=\bm{y}_{n-1}|\bm{Y}_{n-2}=\bm{y}_{n-2}]

There are three events at state 𝒚\bm{y} (purchase with a positive/negative review, or no purchase) yielding |𝒩(𝒚)|3|\mathcal{N}(\bm{y})|\leq 3. Given that each of the one-round transition probabilities is continuous in γ\gamma and |𝒩(𝒚)|3|\mathcal{N}(\bm{y})|\leq 3, [T𝒚,𝒚n]{\mathbb{P}}[T_{\bm{y},\bm{y}}\geq n] sums over at most 3n3^{n} continuous terms and is thus also continuous. Using part b) in Lemma E.15, we bound the tail probability [T𝒚,𝒚n]{\mathbb{P}}[T_{\bm{y},\bm{y}}\geq n] as

[T𝒚,𝒚n][T𝒚,𝒚s(𝒚)(ns(𝒚)1)+1](1ηs(𝒚))ns(𝒚)1(1ηs(𝒚))ns(𝒚)1{\mathbb{P}}[T_{\bm{y},\bm{y}}\geq n]\leq{\mathbb{P}}\Big{[}T_{\bm{y},\bm{y}}\geq s(\bm{y})\cdot\big{(}\lceil\frac{n}{s(\bm{y})}\rceil-1\big{)}+1\Big{]}\leq(1-\eta^{s(\bm{y})})^{\lceil\frac{n}{s(\bm{y})}\rceil-1}\leq(1-\eta^{s(\bm{y})})^{\frac{n}{s(\bm{y})}-1}

where the first inequality uses that $n\geq s(\bm{y})\cdot\big(\lceil\frac{n}{s(\bm{y})}\rceil-1\big)+1$, the second inequality uses part b) of Lemma E.15, and the last inequality uses that $1-\eta^{s(\bm{y})}<1$ and $\frac{n}{s(\bm{y})}\leq\lceil\frac{n}{s(\bm{y})}\rceil$. Given that the right-hand side does not depend on $\gamma$, $\sup_{\gamma\in[0,1]}\mathbb{P}[T_{\bm{y},\bm{y}}\geq n]\leq(1-\eta^{s(\bm{y})})^{\frac{n}{s(\bm{y})}-1}$. Further, as

n=1(1ηs(𝒚))ns(𝒚)1=(1ηs(𝒚))1s(𝒚)1n=0(1ηs(𝒚))ns(𝒚)=(1ηs(𝒚))1s(𝒚)11(1ηs(𝒚))1s(𝒚)<\sum_{n=1}^{\infty}(1-\eta^{s(\bm{y})})^{\frac{n}{s(\bm{y})}-1}=(1-\eta^{s(\bm{y})})^{\frac{1}{s(\bm{y})}-1}\sum_{n=0}^{\infty}(1-\eta^{s(\bm{y})})^{\frac{n}{s(\bm{y})}}=\frac{(1-\eta^{s(\bm{y})})^{\frac{1}{s(\bm{y})}-1}}{1-(1-\eta^{s(\bm{y})})^{\frac{1}{s(\bm{y})}}}<\infty

Lemma E.19 implies 𝔼[T𝒚,𝒚]=n=1+[T𝒚,𝒚n]{\mathbb{E}}[T_{\bm{y},\bm{y}}]=\sum_{n=1}^{+\infty}{\mathbb{P}}[T_{\bm{y},\bm{y}}\geq n] is continuous in γ[0,1]\gamma\in[0,1], and thus so is πγ(𝒚)\pi_{\gamma}(\bm{y}).

(3) It remains to show that the number of terms that $\Sigma_{\gamma}(s)$ sums over is at most $(2s)^{w}$, or equivalently $|\{\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}:s(\bm{y})=s\}|\leq(2s)^{w}$. For any staleness $s$, every review state $\bm{y}\in(\{0,1\}\times\mathbb{N})^{w}$ whose oldest review has staleness $s$ satisfies $s\geq s_{i}\geq 1$ and $z_{i}\in\{0,1\}$ for all $i$. There are thus at most $2s$ possibilities for each entry $y_{i}=(z_{i},s_{i})$, $i=1,\ldots,w$, and hence at most $(2s)^{w}$ such states. ∎
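The counting bound in component (3) can be sanity-checked by brute force. The Python sketch below is only an illustration: it assumes that a state lists reviews from most to least recent with strictly increasing stalenesses (an assumption made here, since at most one review arrives per round), and the helper name `count_states` is hypothetical.

```python
from itertools import combinations, product


def count_states(w, s):
    """Count states ((z_1, s_1), ..., (z_w, s_w)) whose oldest review has staleness s.

    Illustrative assumption: 1 <= s_1 < ... < s_w = s and each z_i is in {0, 1}.
    """
    count = 0
    # Pick the stalenesses of the w - 1 newer reviews from {1, ..., s - 1}.
    for _newer in combinations(range(1, s), w - 1):
        # Each of the w reviews can be negative (0) or positive (1).
        for _ratings in product((0, 1), repeat=w):
            count += 1
    return count


# The crude bound (2s)^w from the proof dominates the exact count.
for w in (1, 2, 3):
    for s in range(w, 8):
        assert count_states(w, s) <= (2 * s) ** w
```

Under this assumption the exact count is $2^{w}\binom{s-1}{w-1}$, so the bound $(2s)^{w}$ is loose, but it is all that the continuity argument needs: a finite sum of continuous functions is continuous.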

Proof of Lemma E.19.

Let ϵ>0\epsilon>0. Since n=1maxγ[0,1]fn(γ)<+\sum_{n=1}^{\infty}\max_{\gamma\in[0,1]}f_{n}(\gamma)<+\infty there exists some MM\in\mathbb{N} such that n=M+1maxγ[0,1]fn(γ)<ϵ3\sum_{n=M+1}^{\infty}\max_{\gamma\in[0,1]}f_{n}(\gamma)<\frac{\epsilon}{3}. Thus, for every γ[0,1]\gamma\in[0,1],

\[
\Big|f(\gamma)-\sum_{n=1}^{M}f_{n}(\gamma)\Big|=\Big|\sum_{n=M+1}^{\infty}f_{n}(\gamma)\Big|\leq\sum_{n=M+1}^{\infty}\max_{\gamma\in[0,1]}f_{n}(\gamma)\leq\frac{\epsilon}{3}.
\]

Thus, for any γ1,γ2[0,1]\gamma_{1},\gamma_{2}\in[0,1], triangle inequality implies

|f(γ1)f(γ2)|\displaystyle|f(\gamma_{1})-f(\gamma_{2})| |f(γ1)n=1Mfn(γ1)|+|f(γ2)n=1Mfn(γ2)|+|n=1Mfn(γ2)n=1Mfn(γ1)|\displaystyle\leq|f(\gamma_{1})-\sum_{n=1}^{M}f_{n}(\gamma_{1})|+|f(\gamma_{2})-\sum_{n=1}^{M}f_{n}(\gamma_{2})|+|\sum_{n=1}^{M}f_{n}(\gamma_{2})-\sum_{n=1}^{M}f_{n}(\gamma_{1})|
2ϵ3+|n=1Mfn(γ2)n=1Mfn(γ1)|\displaystyle\leq\frac{2\epsilon}{3}+|\sum_{n=1}^{M}f_{n}(\gamma_{2})-\sum_{n=1}^{M}f_{n}(\gamma_{1})|

By the continuity of fn(γ)f_{n}(\gamma) for each n[1,M]n\in[1,M], there exists some δ(ϵ)>0\delta(\epsilon)>0 s.t. |n=1Mfn(γ2)n=1Mfn(γ1)|ϵ3|\sum_{n=1}^{M}f_{n}(\gamma_{2})-\sum_{n=1}^{M}f_{n}(\gamma_{1})|\leq\frac{\epsilon}{3} whenever |γ1γ2|<δ(ϵ)|\gamma_{1}-\gamma_{2}|<\delta(\epsilon), and thus |f(γ1)f(γ2)|ϵ|f(\gamma_{1})-f(\gamma_{2})|\leq\epsilon. ∎

E.9 Non-monotonicity in the discount factor (Theorem 5.4)

We first show that under a more general class of instances, for any discount factor γ\gamma there exists a non-absorbing price for which the existence of CoNF results in smaller revenue under σnewest{\sigma^{\textsc{newest}}} with discounting customers as opposed to non-discounting ones. Let 𝒞\mathcal{C} be the class of instances where a single review is shown (c=1)(c=1), \mathcal{F} is a continuous distribution with support on [θ¯,θ¯][\underline{\theta},\overline{\theta}], and the fixed quality estimator h(Beta(x,y))h(\mathrm{Beta}(x,y)) is strictly increasing in the mean xx+y\frac{x}{x+y} of Beta(x,y)\mathrm{Beta}(x,y).

Note that 𝒞\mathcal{C} generalizes the class 𝒞PessimisticEstimate\mathcal{C}^{\textsc{PessimisticEstimate}} because the customer-specific valuation \mathcal{F} allows for continuous distributions other than uniform 𝒰[θ¯,θ¯]\mathcal{U}[\underline{\theta},\overline{\theta}] and the fixed quality estimator h(Beta(x,y))h(\mathrm{Beta}(x,y)) allows for a richer class of mappings other than the ϕ\phi-quantile for ϕ(0,0.5)\phi\in(0,0.5).

Lemma E.20.

For any problem instance in 𝒞\mathcal{C} and discount factor γ[0,1)\gamma\in[0,1), there exists εγ>0\varepsilon_{\gamma}>0 such that for any ε(0,εγ)\varepsilon\in(0,\varepsilon_{\gamma}), the price p¯ε=θ¯+h(Beta(a,b+1))ε\overline{p}_{\varepsilon}=\overline{\theta}+h\big{(}\mathrm{Beta}(a,b+1)\big{)}-\varepsilon is non-absorbing and induces higher revenue when γ<1\gamma<1 compared to non-discounting, i.e., Revγ(σnewest,p¯ε)>Rev1(σnewest,p¯ε){\textsc{Rev}}_{\gamma}(\sigma^{\textsc{newest}},\overline{p}_{\varepsilon})>{\textsc{Rev}}_{1}(\sigma^{\textsc{newest}},\overline{p}_{\varepsilon}).

Proof of Theorem 5.4.

Consider the interval $J=[\underline{\theta}+h(\mathrm{Beta}(a+1,b)),\overline{\theta}+h(\mathrm{Beta}(a,b+1))]$. Given that $h(\mathrm{Beta}(x,y))\in(0,1)$ for any $(x,y)$ and $\overline{\theta}-\underline{\theta}\geq 1$, the interval $J$ is non-empty. Proposition E.1 implies that any price $p\in J$ is review-benefiting for any instance in the class $\mathcal{C}^{\textsc{PessimisticEstimate}}$. Given that $J$ is non-empty, there exists $\varepsilon_{J}>0$ such that for all $\varepsilon\in(0,\varepsilon_{J})$, the price $\overline{p}_{\varepsilon}=\overline{\theta}+h(\mathrm{Beta}(a,b+1))-\varepsilon$ lies in $J$ and is therefore review-benefiting for any instance in $\mathcal{C}^{\textsc{PessimisticEstimate}}$. By Lemma E.20, there exists $\varepsilon_{\gamma}>0$ such that for all $\varepsilon\in(0,\varepsilon_{\gamma})$, the price $\overline{p}_{\varepsilon}$ is also non-absorbing and $\textsc{Rev}_{\gamma}(\sigma^{\textsc{newest}},\overline{p}_{\varepsilon})>\textsc{Rev}_{1}(\sigma^{\textsc{newest}},\overline{p}_{\varepsilon})$. Taking $\varepsilon<\min(\varepsilon_{\gamma},\varepsilon_{J})$ concludes the proof. ∎

To prove Lemma E.20, consider the process $\bm{Y}_{t}=(Z_{t},S_{t})\in\{0,1\}\times\mathbb{N}$, where the most recent review has rating $Z_{t}\in\{0,1\}$ and staleness $S_{t}\in\mathbb{N}$. We argue that $\bm{Y}_{t}$ is a time-homogeneous Markov chain. Let $q_{\gamma}(z,s,p)$ be the purchase probability given price $p$, review rating $z\in\{0,1\}$, and review staleness $s$, i.e.,

qγ(z,s,p)=Θ[Θ+h(Beta(a+zγs1,b+(1z)γs1))p].q_{\gamma}(z,s,p)={\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h\big{(}\mathrm{Beta}(a+z\gamma^{s-1},b+(1-z)\gamma^{s-1})\big{)}\geq p\Big{]}.

If $\bm{Y}_{t}$ is at state $y=(z,s)$, a purchase occurs with probability $q_{\gamma}(z,s,p)$. If a purchase occurs, a review with staleness $1$ is left and the next state is $\bm{Y}_{t+1}=(z^{\prime},1)$, where $z^{\prime}=1$ with probability $\mu$ (positive review) and $z^{\prime}=0$ with probability $1-\mu$ (negative review). If a purchase does not occur, the most recent review remains the same while its staleness increases by one, i.e., the next state is $\bm{Y}_{t+1}=(z,s+1)$. Thus, $\bm{Y}_{t}$ is a time-homogeneous Markov chain. The next lemma characterizes the stationary distribution of $\bm{Y}_{t}$. For review rating $z\in\{0,1\}$, let $r(z,t)$ be the probability that, after obtaining a review with rating $z$, this review remains the most recent one for at least the next $t$ rounds. Given that the probability of no purchase at state $(z,s)$ is $1-q_{\gamma}(z,s,p)$, we have $r(z,t)=\prod_{s=1}^{t}(1-q_{\gamma}(z,s,p))$. For review rating $z\in\{0,1\}$, let $A(z)=1+\sum_{t=1}^{\infty}r(z,t)$ be the expected number of rounds for which the most recent review remains $z$.

Lemma E.21.

For any non-absorbing price p>0p>0, the stationary distribution of 𝐘t\bm{Y}_{t} at rating zz and staleness tt is given by

π(z,t)=κμz(1μ)1zr(z,t1) whereκ=1μA(1)+(1μ)A(0).\pi(z,t)=\kappa\cdot\mu^{z}(1-\mu)^{1-z}\cdot r(z,t-1)\text{ where}\quad\kappa=\frac{1}{\mu A(1)+(1-\mu)A(0)}.
Proof.

By Theorem 3 in [Gal97], it suffices to show that $\pi$ satisfies the stationarity equations of $\bm{Y}_{t}$. We now describe those equations. Consider a state $(z,t)$ with rating $z$ and staleness $t$. If the staleness is $t\geq 2$, the only way to reach $(z,t)$ is to first reach state $(z,1)$ and then have $t-1$ consecutive rounds without a purchase. Therefore, the stationary equation for $(z,t)$ is:

π(z,t)=π(z,1)r(z,t1).\pi^{\star}(z,t)=\pi^{\star}(z,1)\cdot r(z,t-1). (38)

If the staleness is t=1t=1, then (z,1)(z,1) can be reached from any (z,t)(z^{\prime},t^{\prime}) after a purchase and a review rating of zz. Therefore, the stationary equation at (z,1)(z,1) is

π(z,1)=z{0,1}t=1μz(1μ)1zqγ(z,t,p)π(z,t)\pi^{\star}(z,1)=\sum_{z^{\prime}\in\{0,1\}}\sum_{t^{\prime}=1}^{\infty}\mu^{z}(1-\mu)^{1-z}q_{\gamma}(z^{\prime},t^{\prime},p)\pi^{\star}(z^{\prime},t^{\prime}) (39)

First, $\pi$ in the lemma statement satisfies Eq. 38 as:

π(z,t)=κμz(1μ)1zr(z,t1)=κμz(1μ)1zr(z,0)=1r(z,t1)=π(z,1)r(z,t1).\pi(z,t)=\kappa\cdot\mu^{z}(1-\mu)^{1-z}\cdot r(z,t-1)=\kappa\cdot\mu^{z}(1-\mu)^{1-z}\cdot\underbrace{r(z,0)}_{=1}\cdot r(z,t-1)=\pi(z,1)\cdot r(z,t-1).

Second, to see that $\pi$ in the lemma statement satisfies Eq. 39, we substitute $\pi(z^{\prime},t^{\prime})=\kappa\cdot\mu^{z^{\prime}}(1-\mu)^{1-z^{\prime}}\cdot r(z^{\prime},t^{\prime}-1)$ in the right-hand side and expand

z{0,1}t=1μz(1μ)1zqγ(z,t,p)π(z,t)\displaystyle\sum_{z^{\prime}\in\{0,1\}}\sum_{t^{\prime}=1}^{\infty}\mu^{z}(1-\mu)^{1-z}q_{\gamma}(z^{\prime},t^{\prime},p)\pi(z^{\prime},t^{\prime})
=κμz(1μ)1zz{0,1}t=1qγ(z,t,p)μz(1μ)1zr(z,t1)\displaystyle\qquad=\kappa\cdot\mu^{z}(1-\mu)^{1-z}\cdot\sum_{z^{\prime}\in\{0,1\}}\sum_{t^{\prime}=1}^{\infty}q_{\gamma}(z^{\prime},t^{\prime},p)\cdot\mu^{z^{\prime}}(1-\mu)^{1-z^{\prime}}\cdot r(z^{\prime},t^{\prime}-1)
=κμz(1μ)1z(μt=1qγ(1,t,p)r(1,t1)Γ(1,t,p)+(1μ)t=1qγ(0,t,p)r(0,t1)Γ(0,t,p))=π(z,1)\displaystyle\qquad=\kappa\cdot\mu^{z}(1-\mu)^{1-z}\cdot\Big{(}\mu\sum_{t^{\prime}=1}^{\infty}\underbrace{q_{\gamma}(1,t^{\prime},p)\cdot r(1,t^{\prime}-1)}_{\Gamma(1,t^{\prime},p)}+(1-\mu)\sum_{t^{\prime}=1}^{\infty}\underbrace{q_{\gamma}(0,t^{\prime},p)\cdot r(0,t^{\prime}-1)}_{\Gamma(0,t^{\prime},p)}\Big{)}=\pi(z,1)

The last equality holds as r(z,0)=1r(z,0)=1 and the term inside the parenthesis is equal to 11. The latter holds as Γ(z,t,p)\Gamma(z,t^{\prime},p) corresponds to the probability that, when starting from a state with review rating zz, we receive a new review after tt^{\prime} rounds; given that the price pp is non-absorbing, we eventually receive a new review and thus t=1Γ(z,t,p)=1\sum_{t^{\prime}=1}^{\infty}\Gamma(z,t^{\prime},p)=1 for any z{0,1}z\in\{0,1\}.

Finally, it remains to show that π\pi is a valid probability distribution, which follows as

z{0,1}t=1π(z,t)\displaystyle\sum_{z\in\{0,1\}}\sum_{t=1}^{\infty}\pi(z,t) =κz{0,1}t=1μz(1μ)1zr(z,t1)\displaystyle=\kappa\cdot\sum_{z\in\{0,1\}}\sum_{t=1}^{\infty}\mu^{z}(1-\mu)^{1-z}\cdot r(z,t-1)
=κ(μt=1r(1,t1)+(1μ)t=1r(0,t1))=κ(μA(1)+(1μ)A(0))=1.\displaystyle=\kappa\cdot\Big{(}\mu\sum_{t=1}^{\infty}r(1,t-1)+(1-\mu)\sum_{t=1}^{\infty}r(0,t-1)\Big{)}=\kappa\cdot(\mu A(1)+(1-\mu)A(0))=1.

Lemma E.22.

For any discount factor γ\gamma, the revenue of σnewest{\sigma^{\textsc{newest}}} is given by

Rev(σnewest,p)=κp whereκ=1μA(1)+(1μ)A(0).{\textsc{Rev}}({\sigma^{\textsc{newest}}},p)=\kappa\cdot p\text{ where}\quad\kappa=\frac{1}{\mu A(1)+(1-\mu)A(0)}.
Proof.

By the ergodic theorem and substituting the stationary distribution $\pi$ (Lemma E.21):

\begin{align*}
\textsc{Rev}_{\gamma}(\sigma^{\textsc{newest}},p)&=p\liminf_{T\to\infty}\mathbb{E}\Big[\frac{\sum_{t=1}^{T}q(\bm{Y}_{t},p)}{T}\Big]=p\cdot\sum_{z\in\{0,1\}}\sum_{s=1}^{+\infty}\pi(z,s)q_{\gamma}(z,s,p)\\
&=p\cdot\kappa\cdot\underbrace{\Big(\mu\sum_{t=1}^{\infty}\underbrace{q_{\gamma}(1,t,p)\cdot r(1,t-1)}_{=\Gamma(1,t,p)}+(1-\mu)\sum_{t=1}^{\infty}\underbrace{q_{\gamma}(0,t,p)\cdot r(0,t-1)}_{=\Gamma(0,t,p)}\Big)}_{=1}=\kappa\cdot p.
\end{align*}

The last equality holds as t=1Γ(z,t,p)=1\sum_{t^{\prime}=1}^{\infty}\Gamma(z,t^{\prime},p)=1 for any z{0,1}z\in\{0,1\} (see proof of Lemma E.21). ∎
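Lemmas E.21 and E.22 can be checked numerically on a concrete instance. The Python sketch below is a hedged illustration, not part of the proofs: it assumes $\Theta\sim\mathcal{U}[0,1]$, prior parameters $a=b=1$, and takes $h$ to be the mean of the Beta distribution (one admissible choice in $\mathcal{C}$); the function names, the truncation `horizon`, and the parameter values are all hypothetical.

```python
def purchase_prob(z, s, p, gamma, a=1.0, b=1.0, theta_lo=0.0, theta_hi=1.0):
    """q_gamma(z, s, p) for Theta ~ Uniform[theta_lo, theta_hi] and h = Beta mean (assumed)."""
    weight = gamma ** (s - 1)                      # discount applied to a review of staleness s
    x, y = a + z * weight, b + (1 - z) * weight
    h = x / (x + y)                                # mean of Beta(x, y)
    frac = (theta_hi + h - p) / (theta_hi - theta_lo)
    return min(1.0, max(0.0, frac))                # P[Theta + h >= p]


def stationary(p, gamma, mu, horizon=400):
    """Return (pi, kappa) from Lemma E.21, truncating the staleness at `horizon`."""
    r = {}
    for z in (0, 1):
        vals = [1.0]                               # r(z, 0) = 1
        for s in range(1, horizon + 1):
            vals.append(vals[-1] * (1.0 - purchase_prob(z, s, p, gamma)))
        r[z] = vals
    A = {z: sum(r[z]) for z in (0, 1)}             # A(z) = 1 + sum_{t >= 1} r(z, t)
    kappa = 1.0 / (mu * A[1] + (1.0 - mu) * A[0])
    review_prob = {1: mu, 0: 1.0 - mu}
    pi = {(z, t): kappa * review_prob[z] * r[z][t - 1]
          for z in (0, 1) for t in range(1, horizon + 1)}
    return pi, kappa


p, gamma, mu = 0.8, 0.5, 0.7                       # illustrative non-absorbing instance
pi, kappa = stationary(p, gamma, mu)
total_mass = sum(pi.values())
purchase_rate = sum(prob * purchase_prob(z, t, p, gamma) for (z, t), prob in pi.items())
revenue = p * purchase_rate
print(total_mass, revenue, kappa * p)
```

On this instance the truncated distribution should sum to one up to numerical error, the balance equation at $(z,1)$ should hold, and the revenue should match $\kappa\cdot p$. The truncation is harmless here because every purchase probability is bounded away from zero, so $r(z,t)$ decays geometrically.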

Proof of Lemma E.20.

We first show that, when γ<1\gamma<1, any non-absorbing price pp induces revenue that is lower bounded by a positive quantity. For such a price, the purchase probability for a negative review is increasing in the staleness, i.e., qγ(0,s,p)qγ(0,s+1,p)q_{\gamma}(0,s,p)\leq q_{\gamma}(0,s+1,p). This holds as h(Beta(x,y))h(\mathrm{Beta}(x,y)) is increasing in xx+y\frac{x}{x+y} and enables us to upper bound the expected time A(0)A(0) needed to escape a negative-rating state. In particular, using that qγ(0,2,p)qγ(0,s,p)q_{\gamma}(0,2,p)\leq q_{\gamma}(0,s,p) for any s2s\geq 2,

\[
A(0)=1+\sum_{t=1}^{\infty}\prod_{s=1}^{t}(1-q_{\gamma}(0,s,p))\leq 1+(1-q_{\gamma}(0,1,p))\sum_{t=1}^{\infty}(1-q_{\gamma}(0,2,p))^{t-1}=1+\frac{1-q_{\gamma}(0,1,p)}{q_{\gamma}(0,2,p)}.
\]

This bound on A(0)A(0) enables us to show a lower bound on the revenue of σnewest{\sigma^{\textsc{newest}}}. In particular, the monotonicity of h()h(\cdot) implies that the purchase probability under positive review is larger than under negative review, i.e., qγ(1,s,p)qγ(0,s,p)q_{\gamma}(1,s,p)\geq q_{\gamma}(0,s,p) and thus A(0)A(1)A(0)\geq A(1). Combining this with Lemma E.22:

Revγ(σnewest,p)=pμA(1)+(1μ)A(0)pA(0)pqγ(0,2,p)1qγ(0,1,p)+qγ(0,2,p)pqγ(0,2,p)2LB(p).{\textsc{Rev}}_{\gamma}({\sigma^{\textsc{newest}}},p)=\frac{p}{\mu A(1)+(1-\mu)A(0)}\geq\frac{p}{A(0)}\geq\frac{p\cdot q_{\gamma}(0,2,p)}{1-q_{\gamma}(0,1,p)+q_{\gamma}(0,2,p)}\geq\underbrace{\frac{p\cdot q_{\gamma}(0,2,p)}{2}}_{\textsc{LB(p)}}.

The price $\overline{p}=\overline{\theta}+h(\mathrm{Beta}(a,b+1))$ is absorbing (as the purchase probability is zero when the review is negative) and thus $\textsc{Rev}_{1}(\sigma^{\textsc{newest}},\overline{p})=0$ by Proposition 3.1. The remainder of the proof establishes that there exists a quantity $Q>0$ such that for non-absorbing prices $p$ sufficiently close to $\overline{p}$, $\textsc{Rev}_{\gamma}(\sigma^{\textsc{newest}},p)>Q>\textsc{Rev}_{1}(\sigma^{\textsc{newest}},p)$, which proves the lemma.

First observe that the price $\overline{p}_{\varepsilon}=\overline{\theta}+h(\mathrm{Beta}(a,b+1))-\varepsilon$ is non-absorbing for any $\varepsilon>0$; this holds by the continuity of $\Theta$, as the purchase probability for any review is at least $\mathbb{P}_{\Theta\sim\mathcal{F}}\big[\Theta\in[\overline{\theta}-\varepsilon,\overline{\theta}]\big]>0$. For this price, the purchase probability given a negative review with staleness two is

qγ(0,2,p¯ε)=Θ[Θ+h(Beta(a,b+γ))p¯ε]=Θ[Θθ¯+Δγε]q_{\gamma}(0,2,\overline{p}_{\varepsilon})={\mathbb{P}}_{\Theta\sim\mathcal{F}}\Big{[}\Theta+h\big{(}\mathrm{Beta}(a,b+\gamma)\big{)}\geq\overline{p}_{\varepsilon}\Big{]}={\mathbb{P}}_{\Theta\sim\mathcal{F}}\big{[}\Theta\geq\overline{\theta}+\Delta_{\gamma}-\varepsilon\big{]} (40)

where Δγ=h(Beta(a,b+1))h(Beta(a,b+γ))\Delta_{\gamma}=h(\mathrm{Beta}(a,b+1))-h(\mathrm{Beta}(a,b+\gamma)) is the difference in the fixed value estimate given negative review between non-discounting and discounting. The strict monotonicity of h()h(\cdot) implies that Δγ<0\Delta_{\gamma}<0, and therefore Θ[Θθ¯+Δγ]>0{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq\overline{\theta}+\Delta_{\gamma}]>0. Combining with the continuity of Θ\Theta and the fact that p¯>0\overline{p}>0, Eq. 40 implies that there exists some εγ,1\varepsilon_{\gamma,1} such that for any ε<εγ,1\varepsilon<\varepsilon_{\gamma,1}, qγ(0,2,p¯ε)Θ[Θθ¯+Δγ]2q_{\gamma}(0,2,\overline{p}_{\varepsilon})\geq\frac{{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq\overline{\theta}+\Delta_{\gamma}]}{2} and p¯ε>p¯2\overline{p}_{\varepsilon}>\frac{\overline{p}}{2}, yielding LB(p¯ε)=p¯εqγ(0,2,p¯ε)2>p¯Θ[Θθ¯+Δγ]8\textsc{LB}(\overline{p}_{\varepsilon})=\frac{\overline{p}_{\varepsilon}\cdot q_{\gamma}(0,2,\overline{p}_{\varepsilon})}{2}>\frac{\overline{p}\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq\overline{\theta}+\Delta_{\gamma}]}{8}. As a result, the revenue under γ<1\gamma<1 is lower bounded by Qp¯Θ[Θθ¯+Δγ]8Q\coloneqq\frac{\overline{p}\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq\overline{\theta}+\Delta_{\gamma}]}{8}:

Revγ(σnewest,p¯ε)LB(p¯ε)=p¯εqγ(0,2,p¯ε)2>p¯Θ[Θθ¯+Δγ]8=Q.{\textsc{Rev}}_{\gamma}({\sigma^{\textsc{newest}}},\overline{p}_{\varepsilon})\geq\textsc{LB}(\overline{p}_{\varepsilon})=\frac{\overline{p}_{\varepsilon}\cdot q_{\gamma}(0,2,\overline{p}_{\varepsilon})}{2}>\frac{\overline{p}\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq\overline{\theta}+\Delta_{\gamma}]}{8}=Q.

Second, for a non-discounting customer, the continuity of $\Theta$ implies that $\textsc{Rev}_{1}(\sigma^{\textsc{newest}},p)$ is continuous in the price $p$. Combined with $\textsc{Rev}_{1}(\sigma^{\textsc{newest}},\overline{p})=0$ (the price $\overline{p}$ is absorbing), this implies that there exists $\varepsilon_{\gamma,2}>0$ such that for $\varepsilon<\varepsilon_{\gamma,2}$,

Rev1(σnewest,p¯ε)<p¯Θ[Θθ¯+Δγ]8=Q.{\textsc{Rev}}_{1}({\sigma^{\textsc{newest}}},\overline{p}_{\varepsilon})<\frac{\overline{p}\cdot{\mathbb{P}}_{\Theta\sim\mathcal{F}}[\Theta\geq\overline{\theta}+\Delta_{\gamma}]}{8}=Q.

Letting $\varepsilon_{\gamma}=\min(\varepsilon_{\gamma,1},\varepsilon_{\gamma,2})$, any price $\overline{p}_{\varepsilon}$ with $0<\varepsilon<\varepsilon_{\gamma}$ satisfies the claim of the lemma. ∎
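The gap between discounting and non-discounting customers near the absorbing price $\overline{p}$ can be illustrated numerically. The Python sketch below uses the closed form $\textsc{Rev}_{\gamma}=p/(\mu A(1)+(1-\mu)A(0))$ from Lemma E.22 on an assumed instance: $\Theta\sim\mathcal{U}[0,1]$, $a=b=1$, $h$ equal to the Beta mean, $\mu=0.7$, and $\gamma=0.5$. These choices and all function names are hypothetical, and the numbers only illustrate the lemma on one instance rather than prove it.

```python
def purchase_prob(z, s, p, gamma, a=1.0, b=1.0, theta_lo=0.0, theta_hi=1.0):
    """q_gamma(z, s, p) for Theta ~ Uniform[theta_lo, theta_hi] and h = Beta mean (assumed)."""
    weight = gamma ** (s - 1)
    x, y = a + z * weight, b + (1 - z) * weight
    h = x / (x + y)
    frac = (theta_hi + h - p) / (theta_hi - theta_lo)
    return min(1.0, max(0.0, frac))


def persistence(z, p, gamma, tol=1e-13, max_rounds=10**6):
    """A(z) = 1 + sum_{t >= 1} r(z, t), truncated once r(z, t) drops below tol."""
    total, r, s = 1.0, 1.0, 1
    while r > tol and s <= max_rounds:
        r *= 1.0 - purchase_prob(z, s, p, gamma)   # r(z, s) = r(z, s - 1) * (1 - q(z, s))
        total += r
        s += 1
    return total


def revenue(p, gamma, mu):
    """Rev_gamma(newest first, p) = p / (mu * A(1) + (1 - mu) * A(0)) for a non-absorbing p."""
    return p / (mu * persistence(1, p, gamma) + (1.0 - mu) * persistence(0, p, gamma))


mu = 0.7
p_bar = 1.0 + 1.0 / 3.0        # theta_hi + h(Beta(a, b + 1)) with a = b = 1 and the mean estimator
for eps in (0.05, 0.01, 0.002):
    p_eps = p_bar - eps
    print(eps, revenue(p_eps, 0.5, mu), revenue(p_eps, 1.0, mu))
```

As $\varepsilon$ shrinks, the non-discounting revenue tends to zero because a negative review persists for roughly $1/\varepsilon$ rounds, whereas with $\gamma<1$ the negative review fades and $A(0)$ stays bounded, in line with the bound $A(0)\leq 1+\frac{1-q_{\gamma}(0,1,p)}{q_{\gamma}(0,2,p)}$ used in the proof.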

Appendix F Supplementary material for Section 6
