
Demand Estimation from Sales Transaction Data - Practical Extensions
Submission to Journal of Revenue and Pricing Management

Norbert Remenyi
Xiaodong Luo
Abstract

In this paper we discuss practical limitations of the standard choice-based demand models used in the literature to estimate demand from sales transaction data. We present modifications and extensions of the models and discuss data preprocessing and solution techniques which are useful for practitioners dealing with sales transaction data. Among these, we present an algorithm to split sales transaction data observed under partial availability, we extend a popular Expectation Maximization (EM) algorithm to non-homogeneous product sets, and we develop two iterative optimization algorithms which can handle many of the extensions discussed in the paper.

Sabre

May 31, 2020

Keywords. Demand estimation. Demand untruncation. Multinomial logit model. EM algorithm. MM algorithm. Frank-Wolfe method. Revenue management

1 Introduction

Demand estimation using censored sales transaction data has many applications in the airline commercial planning process. We refer readers to the survey paper by Sharif Azadeh et al. (2014) for a brief introduction to this topic. In this paper we discuss some practical limitations and extensions of a particular choice-based demand model popular in the literature for estimating demand from sales transaction data. Discrete choice models (e.g., Ben-Akiva and Lerman (1994), Train (2003)) have provided a popular approach for estimating demand for different products in a set of substitutable items, especially in transportation and revenue management applications. We look at a common demand model, which appeared in several papers, including Dai et al. (2014), Vulcano et al. (2012), and Abdallah and Vulcano (2016). The motivation for this work came from observing a few shortcomings of these models when applied in practice to airline revenue management data.

We will build on and extend the work presented in Vulcano et al. (2012) and Abdallah and Vulcano (2016). They combine a multinomial logit (MNL) choice model with non-homogeneous Poisson arrivals over multiple periods. The MNL model has been used by many practitioners and researchers to represent the underlying choice behavior. Although its property of independence of irrelevant alternatives (IIA) is somewhat restrictive, the model is simple, leading to tractable estimation and assortment optimization (Talluri and van Ryzin, 2004). The problem is to jointly estimate the arrival rates of customers and the preference weights of the products by maximizing the likelihood functions. There are two different likelihood functions. The first one is the incomplete data likelihood function (see (3) below and also Eq. (2) in Vulcano et al. (2012)) and the second one is the log-likelihood function based on the primary demand (see (13) below and also Eq. (13) in Vulcano et al. (2012)). The inputs are observed historical sales, availability of the products, and market share information.

Our contribution is to discuss practical limitations of the specific model above, and present some interesting extensions. We will discuss partial availability of products, some relaxation of the IIA assumption, constrained parameter space, non-homogeneous product set, and the interpretation of the no-purchase option and related market share. We hope this discussion can facilitate more research and extension of these models.

We will present an algorithm to split sales transaction data observed under partial availability, and an extension of the EM algorithm for the case when we observe a non-homogeneous product set. We develop two iterative optimization algorithms which incorporate partial availability information, a non-homogeneous product set, the ability to control the availability of the outside alternative, and an upper bound on the arrival rates of customers. In the first formulation we use a market share constraint at each time period, and incorporate these constraints into the objective function through the preference weights of the outside alternative. The formulation is solved using the Frank-Wolfe algorithm, leading to a simple coordinate descent algorithm. In the second formulation we use a single, aggregate market share constraint over the time horizon, and assume knowledge of the preference weights of the outside alternative. Using this formulation we develop a fast, iterative minorization-maximization (MM) algorithm building on the work in Abdallah and Vulcano (2016).

While the EM algorithm focuses on solving the complete data log-likelihood function, the two new algorithms (like Abdallah and Vulcano (2016)) aim to solve the incomplete likelihood function directly. We remark that both likelihood functions render solutions of similar quality, but they involve different intermediate decision variables and require different solution approaches. It is not known whether the two approaches are mathematically equivalent.

2 Practical limitations of existing models

In this section we discuss in detail some practical limitations of the choice-based demand model discussed in Vulcano et al. (2012) and in Abdallah and Vulcano (2016). The model combines non-homogeneous Poisson arrivals over multiple periods with a multinomial logit (MNL) choice model. The model assumes a retailer offers a fixed number of $n$ products over a time horizon $T$, and the products are either available or not available for sale in a time period. From the observed sales data we estimate the demand, that is, the sales we would have observed if all the products had been available for purchase. The total demand of all the products at time $t$ (including outside alternatives and the no-purchase option) is modeled as a Poisson distributed random variable with parameter $\lambda_{t}$. Hence we model the total demand as a non-homogeneous Poisson model, since different time periods are allowed to have different mean demands.

The demand for individual products is modeled using a multinomial distribution, given the total demand of all products. The preferences for different products are assumed to be fixed over time, and the probability that a customer chooses product $i$ when $i\in S_{t}$ is modeled through the simple multinomial logit (MNL) model, that is

$$P_{i}(S_{t},\bm{v})=\frac{v_{i}}{v_{0}+\sum_{j\in S_{t}}v_{j}} \qquad (1)$$

When $i\notin S_{t}$, then $P_{i}(S_{t},\bm{v})=0$. Here $v_{i},~i=1,\ldots,n$ is the positive preference weight for product $i$, and $S_{t}$ is the set of products available for sale at time $t$. The preference weight of the outside alternatives and the no-purchase option (OA) is embedded into the coefficient $v_{0}$. This parameter is set to $v_{0}=1$ in the references above, following a standard normalization.

The incomplete data likelihood function of the model is defined as

$$L_{I}(\bm{v},\bm{\lambda}) = \prod_{t=1}^{T}\left[\left(P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda})\,\frac{m_{t}!}{z_{1t}!z_{2t}!\cdots z_{nt}!}\right)\prod_{j\in S_{t}}\left[\frac{P_{j}(S_{t},\bm{v})}{\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v})}\right]^{z_{jt}}\right]$$

where

$$P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda}) = \frac{\left[\lambda_{t}\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v})\right]^{m_{t}}\exp\left(-\lambda_{t}\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v})\right)}{m_{t}!}$$

In the above equations $z_{it}$ denotes the number of purchases of product $i$ at time period $t$, and $m_{t}=\sum_{i=1}^{n}z_{it}$ denotes the total number of purchases in period $t$. After some algebra the log-likelihood function can be written as

$$l_{I}(\bm{v},\bm{\lambda})=\sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0}+\sum_{i\in S_{t}}v_{i}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}}{v_{0}+\sum_{i\in S_{t}}v_{i}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \qquad (2)$$

One approach is to directly maximize the log-likelihood function and jointly estimate the preference weights of the products and the arrival rates of customers. The above log-likelihood function, however, is hard to maximize in general, therefore the research literature discusses different approaches to estimate the parameters of this model, given sales data and information on what was available for sale. Vulcano et al. (2012) developed an elegant EM algorithm by looking at the problem in terms of primary (first choice) demand, and treating the observed demand as incomplete observations of the primary demand. Abdallah and Vulcano (2016) solve the estimation problem by specializing the minorization-maximization (MM) procedure, an iterative algorithm that maximizes an objective function by successively maximizing a simpler function that minorizes the true objective.

It is also interesting to note that the objective function has a continuum of maximizers. For this reason, Vulcano et al. (2012) and Abdallah and Vulcano (2016) imposed an additional constraint on the preference weights as a function of the market share

$$s=\frac{\sum_{i=1}^{n}v_{i}}{v_{0}+\sum_{i=1}^{n}v_{i}}$$

There are a number of limiting assumptions and possible extensions of the model presented above, when we apply it to real data.

  1. Products are fully open or closed for sale

    • The discussed model assumes that a product is either available or not available for sale. Practitioners often work with aggregated data, and are unable to capture the sales at a very granular level, every time the assortment changes. It is therefore practical to extend the model to consume data where we observe partial availability. For example, a product can be 80% open for sale in a time period.

  2. MNL assumption

    • Customers choose from the available products according to an MNL model. One of the properties of the MNL model is the independence of irrelevant alternatives (IIA), which can be unrealistic in real applications. If customers, for instance, always purchase the product with the lowest price, the algorithm in Vulcano et al. (2012) does not converge. If the sell-down is strong, the demand is grossly overestimated.

  3. Unconstrained parameter space

    • In practice we can encounter parameter estimates which are unreasonable in a business scenario. This may be due to the fact that the underlying assumptions of the model (such as MNL) do not exactly fit the true data generating process. Instead of using a more sophisticated model and developing its solution algorithm, we might want to simply constrain the parameters of the simpler model. Abdallah and Vulcano (2016) discuss regularization as an elegant solution to the problem, and they develop algorithms for $L_1$ regularization (Lasso regression) and $L_2$ regularization (Ridge regression). It can also be of interest to solve the problem by putting an upper bound on the arrival rate of customers ($\lambda_t$). This can help regularize the model by placing an interpretable business fence on these parameters.

  4. Homogeneous product set

    • The model assumes that over different time periods we have the same set of existing products. Some might be unavailable for sale, but there exists an underlying demand for them. In revenue management, we often encounter changing product IDs (unique flight identifiers) due to schedule changes, hence we need to be able to model a non-homogeneous product set over time. Instead of handling the homogeneous parts separately, we would like to use the data over a larger time interval to borrow power to estimate the parameters.

  5. No-purchase option is always available

    • The model above assumes that the no-purchase option is always available, that is, fully open. If we include competitors' products in the no-purchase option, this can be an unrealistic assumption. For instance, airline competitors likely control their inventory similarly to the host airline, gradually closing down products as we get closer to departure. This assumption has implications for how the host market share is interpreted in the models.

In this paper we address some of these points, and discuss how we could relax these assumptions, incorporate them into the model, and estimate the parameters. These ideas might foster further extensions of the existing models and rigorous research in the future.

2.1 Partial availability

Practitioners can encounter data at an aggregate level, where we can observe partial availability of products. As an example, in a revenue management system we store information on sales and availability at certain pre-departure time points. It is possible that a product becomes closed for sale during the time period, and we observe partial availability. For instance, a booking class on a flight can be open 60% of the time in a time interval, and the bookings we observe are matched to this partial availability. It would be of interest to extend the algorithms to work with this type of data and handle the full spectrum of availability in $[0,1]$, as opposed to being restricted to either open (1) or closed (0) products. Another possible avenue is to modify the data to fit the algorithms already developed in the literature.

2.1.1 Extending the attraction model

A natural way to incorporate partial availability is to extend the MNL purchase probabilities as

$$P^{\star}_{jt}(S_{t},\bm{v},\bm{o})=\frac{v_{j}\cdot o_{jt}}{v_{0}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}$$

where $o_{it}\in\left[0,1\right]$ represents the availability of product $i$ at time $t$. Note that $o_{it}>0$ is equivalent to $i\in S_{t}$. This simple formulation modifies the purchase probabilities by linearly adjusting the preference weight $v_{i}$ with the open percentage $o_{it}$. If the open percentage is zero or one, we recover the original formulation.

We have not derived the EM algorithm of Vulcano et al. (2012) with the extended purchase probability definition, but this could be an interesting avenue for research. For demonstration purposes we can resort to directly maximizing the log-likelihood using a solver. The modified likelihood function becomes

$$L_{I}(\bm{v},\bm{\lambda}) = \prod_{t=1}^{T}\left[\left(P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda})\,\frac{m_{t}!}{z_{1t}!z_{2t}!\cdots z_{nt}!}\right)\prod_{j\in S_{t}}\left[\frac{P^{\star}_{j}(S_{t},\bm{v},\bm{o})}{\sum_{i\in S_{t}}P^{\star}_{i}(S_{t},\bm{v},\bm{o})}\right]^{z_{jt}}\right] \qquad (3)$$

where

$$P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda}) = \frac{\left[\lambda_{t}\sum_{i\in S_{t}}P^{\star}_{i}(S_{t},\bm{v},\bm{o})\right]^{m_{t}}\exp\left(-\lambda_{t}\sum_{i\in S_{t}}P^{\star}_{i}(S_{t},\bm{v},\bm{o})\right)}{m_{t}!}$$

and, after omitting the constant terms, the log-likelihood function modifies to

$$l_{I}(\bm{v},\bm{\lambda})=\sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \qquad (4)$$
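As an illustration of this direct maximization, the following is a minimal Python sketch using `scipy.optimize.minimize`. The function and variable names (`neg_log_lik`, `fit`, `z`, `o`) are ours, not from the references, and the scale of $\bm{v}$ is pinned down by imposing the market share constraint (7) with $v_0=1$ through a softmax reparametrization, which is only one of several possible normalizations.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(x, z, o, s, v0=1.0):
    """Negative of the partial-availability log-likelihood (4).

    x packs log(lambda_t) for t = 1..T followed by beta_1..beta_n;
    the preference weights are v_i = s/(1-s) * softmax(beta)_i, so the
    market share constraint (7) holds with v0 = 1 by construction.
    """
    n, T = z.shape
    lam = np.exp(x[:T])
    beta = x[T:]
    w = np.exp(beta - beta.max())
    v = (s / (1.0 - s)) * w / w.sum()

    ll = 0.0
    for t in range(T):
        open_t = o[:, t] > 0                     # S_t: partially or fully open products
        m_t = z[open_t, t].sum()
        sold = np.sum(v[open_t] * o[open_t, t])  # sum of v_i * o_it over S_t
        denom = v0 + sold
        ll += (m_t * np.log(lam[t] / denom)
               - lam[t] * sold / denom
               + np.sum(z[open_t, t] * np.log(v[open_t])))
    return -ll

def fit(z, o, s):
    """z, o: n x T arrays of sales and open percentages (e.g., Table 1)."""
    n, T = z.shape
    # Start from naive per-period demand estimates and uniform weights
    x0 = np.concatenate([np.log(z.sum(axis=0) / s + 1.0), np.zeros(n)])
    res = minimize(neg_log_lik, x0, args=(z, o, s), method="L-BFGS-B")
    lam_hat = np.exp(res.x[:T])
    w = np.exp(res.x[T:] - res.x[T:].max())
    v_hat = (s / (1 - s)) * w / w.sum()
    return lam_hat, v_hat
```

This is a sketch under the stated assumptions rather than the algorithm used to produce the results below; any off-the-shelf nonlinear solver could be substituted.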

2.1.2 Data disaggregation

In the previous section we discussed a simple formulation which could potentially be used to incorporate partial availability information into the purchase probability definition. Another approach to handle partial availability is to disaggregate the data to fully open and closed assortments, and use existing algorithms for the estimation. We can split the observed sales under partial availability by making a simple assumption that the sales are distributed uniformly over time.

To demonstrate this idea on a simple example, let us assume that the observed sales $\bm{b}$ and open percentages $\bm{o}$ for three available products are

$$\bm{b}=\begin{bmatrix}1\\ 2\\ 5\end{bmatrix},\quad\bm{o}=\begin{bmatrix}1.0\\ 0.8\\ 0.5\end{bmatrix}$$

We assumed that the elements of $\bm{o}$ are non-increasing from top to bottom. Then we can represent $\bm{o}$ as the sum of three fully open and closed assortments with weights $\alpha_{i}$ as

$$\bm{o}=\begin{bmatrix}1.0\\ 0.8\\ 0.5\end{bmatrix}=\alpha_{1}\begin{bmatrix}1\\ 0\\ 0\end{bmatrix}+\alpha_{2}\begin{bmatrix}1\\ 1\\ 0\end{bmatrix}+\alpha_{3}\begin{bmatrix}1\\ 1\\ 1\end{bmatrix}=0.2\begin{bmatrix}1\\ 0\\ 0\end{bmatrix}+0.3\begin{bmatrix}1\\ 1\\ 0\end{bmatrix}+0.5\begin{bmatrix}1\\ 1\\ 1\end{bmatrix}$$

Note that $\alpha_{i},~i=1,\ldots,n$ represent time proportions and $\sum_{i=1}^{n}\alpha_{i}=1$. The results indicate that all products were available for sale 50% of the time, two products were available 30% of the time, and one product was available 20% of the time. A graphical representation of this example is shown in Figure 1.

Figure 1: Split example. The time horizon is divided into proportions $\alpha_{1}=0.2$ (only product 1 open), $\alpha_{2}=0.3$ (products 1 and 2 open), and $\alpha_{3}=0.5$ (all three products open).

For the general case, the elements of $\bm{\alpha}$ can be calculated as the consecutive differences in the open percentages, that is

$$\alpha_{i} = o_{i}-o_{i+1},\quad i=1,\ldots,n-1$$
$$\alpha_{n} = o_{n}$$

Following this simple idea we can split the observed sales under partial availability using the following identity

$$\bm{b} = \begin{bmatrix}\frac{\alpha_{1}}{\sum_{i=1}^{n}\alpha_{i}}\\ 0\\ \vdots\\ 0\\ 0\end{bmatrix}\circ\begin{bmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\end{bmatrix}+\begin{bmatrix}\frac{\alpha_{2}}{\sum_{i=1}^{n}\alpha_{i}}\\ \frac{\alpha_{2}}{\sum_{i=2}^{n}\alpha_{i}}\\ \vdots\\ 0\\ 0\end{bmatrix}\circ\begin{bmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\end{bmatrix}+\ldots+\begin{bmatrix}\frac{\alpha_{n-1}}{\sum_{i=1}^{n}\alpha_{i}}\\ \frac{\alpha_{n-1}}{\sum_{i=2}^{n}\alpha_{i}}\\ \vdots\\ \frac{\alpha_{n-1}}{\sum_{i=n-1}^{n}\alpha_{i}}\\ 0\end{bmatrix}\circ\begin{bmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\end{bmatrix}+\begin{bmatrix}\frac{\alpha_{n}}{\sum_{i=1}^{n}\alpha_{i}}\\ \frac{\alpha_{n}}{\sum_{i=2}^{n}\alpha_{i}}\\ \vdots\\ \frac{\alpha_{n}}{\sum_{i=n-1}^{n}\alpha_{i}}\\ \frac{\alpha_{n}}{\alpha_{n}}\end{bmatrix}\circ\begin{bmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\end{bmatrix}$$

where $\circ$ denotes element-wise multiplication, or Hadamard product. Note that if $\alpha_{i}=0$, we do not need a split to create a new assortment. The sales splitting algorithm under partial availability is described in Algorithm 1.

Algorithm 1 Sales splitting algorithm for partial availability
1: $\bm{b}$: vector of observed sales
2: $\bm{o}$: vector of observed open percentages
3: Sort $\bm{o}$ in decreasing order and apply the same order to $\bm{b}$
4: Calculate the time proportion vector $\bm{\alpha}$
5: for $i=1,\ldots,n$ do
6:     if $i=n$ then
7:         $\alpha_{i}=o_{i}$
8:     else
9:         $\alpha_{i}=o_{i}-o_{i+1}$
10:    end if
11: end for
12: Split sales
13: for $i=1,\ldots,n$ do
14:     if $\alpha_{i}=0$ then
15:         Continue
16:     else
17:         Calculate partial sales vector $\bm{b}^{(i)}$ and open percentage vector $\bm{o}^{(i)}$
18:         for $j=1,\ldots,n$ do
19:             if $j\leq i$ then
20:                 $b^{(i)}_{j}=\frac{\alpha_{i}}{o_{j}}b_{j}$
21:                 $o^{(i)}_{j}=1$
22:             else
23:                 $b^{(i)}_{j}=0$
24:                 $o^{(i)}_{j}=0$
25:             end if
26:         end for
27:     end if
28: end for

The algorithm results in partial sales vectors $\bm{b}^{(i)}$ and fully open and closed assortment vectors $\bm{o}^{(i)}$ for $i\in\{j:\alpha_{j}>0\}$. The splitting logic ensures that

$$\bm{b}=\sum_{i\in\{j:\alpha_{j}>0\}}\bm{b}^{(i)}$$

After splitting, we could combine intervals from different times $t$ which have exactly the same product offerings and product availabilities, to improve performance.
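For reference, here is a minimal Python sketch of Algorithm 1; the function name `split_sales` and the array-based interface are ours, and the routine simply reproduces the splitting identity above.

```python
import numpy as np

def split_sales(b, o, tol=1e-12):
    """Sales splitting for partial availability (Algorithm 1).

    b : observed sales per product, o : open percentages in [0, 1].
    Returns a list of (partial sales, fully open/closed availability) pairs
    whose sales vectors sum back to b.
    """
    b = np.asarray(b, dtype=float)
    o = np.asarray(o, dtype=float)
    n = len(b)

    order = np.argsort(-o)       # sort products by decreasing open percentage
    inv = np.argsort(order)      # to undo the sort at the end
    b_s, o_s = b[order], o[order]

    # Time proportions: consecutive differences of the sorted open percentages
    alpha = np.empty(n)
    alpha[:-1] = o_s[:-1] - o_s[1:]
    alpha[-1] = o_s[-1]

    splits = []
    for i in range(n):
        if alpha[i] <= tol:      # no time was spent in this assortment
            continue
        b_i, o_i = np.zeros(n), np.zeros(n)
        # Products 1..i (in sorted order) are fully open in this assortment
        b_i[: i + 1] = alpha[i] / o_s[: i + 1] * b_s[: i + 1]
        o_i[: i + 1] = 1.0
        splits.append((b_i[inv], o_i[inv]))
    return splits

# Example from the text: b = (1, 2, 5), o = (1.0, 0.8, 0.5)
splits = split_sales([1, 2, 5], [1.0, 0.8, 0.5])
assert np.allclose(sum(s for s, _ in splits), [1, 2, 5])
```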

2.1.3 Comparison on a simulated example

To demonstrate the estimation from sales data with partial availability, we extended the example from Vulcano et al. (2012) by adding partial availability data to the observed sales. The observed data is presented in Table 1.

Sales Period
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 10 15 11 14 0 0 0 0 0 0 0 0 0 0 0
2 11 6 11 8 20 16 0 0 0 0 0 0 0 0 0
3 5 6 1 11 4 5 14 7 11 0 0 0 0 0 0
4 4 4 4 1 6 4 3 5 9 9 6 9 0 0 0
5 0 2 0 0 1 0 1 3 0 3 3 5 2 3 3
Open percentage 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 0.7 0.3 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.8 0.5 1.0 1.0 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.9 0.9 1.0 1.0 0.3 0.5 0.9 0.7 0.7 0.0 0.0 0.0 0.2 0.2 0.0
4 1.0 0.9 1.0 1.0 0.5 1.0 1.0 0.8 0.9 0.6 0.5 0.2 0.3 0.5 0.0
5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.9 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Table 1: Example with partial availability

We first use the extended attraction model discussed in Section 2.1.1. Direct maximization of the log-likelihood (4) by a nonlinear solver results in a total primary demand of 1194.6. The detailed estimated demands and parameters are presented in Table 2.

Estimates Period $v_{i}$
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 15.02 20.07 12.47 15.70 33.60 22.73 19.60 19.33 24.84 38.14 32.27 84.35 5.74 7.22 35.06 1.000
2 11.23 15.01 9.32 11.74 25.13 17.00 14.66 14.45 18.58 28.53 24.14 63.08 4.29 5.40 26.22 0.748
3 3.91 5.22 3.24 4.09 8.74 5.92 5.10 5.03 6.46 9.93 8.40 21.95 1.49 1.88 9.13 0.260
4 1.97 2.63 1.63 2.06 4.41 2.98 2.57 2.53 3.26 5.00 4.23 11.06 0.75 0.95 4.60 0.131
5 0.40 0.53 0.33 0.41 0.89 0.60 0.52 0.51 0.66 1.01 0.85 2.23 0.15 0.19 0.93 0.026
$\lambda_{t}$ 46.48 62.10 38.57 48.57 103.95 70.32 60.65 59.79 76.84 118.01 99.84 260.94 17.76 22.34 108.48
Table 2: Estimated demand and parameters using the extended attraction model

Second, we demonstrate the data disaggregation procedure from Section 2.1.2. We first split the observed sales data with partial availability into fully open and closed assortments, apply the EM algorithm of Vulcano et al. (2012), and then aggregate the solution back. To demonstrate this, let us apply Algorithm 1 to the data presented in Table 1. The disaggregated sales are shown in Table 3.

Sales Period
15 14 13 12 11 10 9
1 10.00 15.00 11.00 14.00
2 1.38 9.62 2.40 3.60 11.00 8.00 20.00 16.00
3 0.56 0.56 3.89 2.67 1.33 2.00 1.00 11.00 4.00 2.00 3.00 14.00
4 0.40 0.40 0.40 2.80 1.78 0.89 1.33 4.00 1.00 2.40 3.60 2.00 0.80 1.20 0.30 2.70
5 0.00 0.00 0.00 0.00 0.20 0.80 0.40 0.60 0.00 0.00 0.50 0.20 0.30 0.00 0.00 0.00 0.10 0.90
8 7 6 5 4 3 2 1
1
2
3 7.00 11.00 0.00 0.00
4 0.63 4.37 2.00 7.00 9.00 6.00 9.00 0.00 0.00 0.00 0.00
5 0.33 0.33 2.33 0.00 0.00 0.00 1.20 1.80 1.50 1.50 4.00 1.00 1.40 0.20 0.40 1.50 0.90 0.60 3.00
Table 3: Disaggregated sales of Table 1

After applying the EM algorithm on the disaggregated data, we end up with the disaggregated solution, presented in Table 4.

Estimates Period $v_{i}$
15 14 13 12 11 10 9
1 0.81 0.91 1.33 10.00 2.41 5.02 2.86 15.00 11.00 14.00 6.03 5.27 15.88 4.05 2.68 11.50 0.81 16.85 1.000
2 0.59 0.67 0.94 9.62 1.76 3.65 1.64 3.60 11.00 8.00 4.39 3.83 13.63 2.95 1.95 10.90 0.59 12.26 0.727
3 0.24 0.25 0.38 3.89 0.71 1.20 0.91 2.00 1.00 11.00 1.77 1.55 2.73 1.19 0.90 2.04 0.24 6.29 0.294
4 0.14 0.18 0.27 2.80 0.36 0.80 0.61 1.33 4.00 1.00 0.91 0.85 2.45 0.71 0.36 0.82 0.11 1.21 0.150
5 0.00 0.00 0.00 0.00 0.06 0.36 0.27 0.60 0.00 0.00 0.15 0.07 0.20 0.00 0.00 0.00 0.04 0.40 0.026
$\lambda_{t}$ 2.55 2.87 4.16 37.59 7.57 15.76 8.97 32.19 38.57 48.57 18.94 16.54 49.85 12.73 8.41 36.09 2.55 52.89
8 7 6 5 4 3 2 1 $v_{i}$
1 4.02 1.94 13.13 0.00 4.05 17.23 14.48 21.90 18.10 15.21 48.27 20.27 16.89 0.41 0.38 18.10 1.82 0.57 36.20 1.000
2 2.93 1.41 9.55 0.00 2.95 12.54 10.53 15.93 13.17 11.06 35.11 14.75 12.29 0.29 0.28 13.17 1.33 0.42 26.33 0.727
3 1.18 0.57 3.15 0.00 1.19 4.95 4.26 6.44 5.32 4.47 14.19 5.96 4.97 0.12 0.00 5.32 0.54 0.00 10.64 0.294
4 0.60 0.22 1.97 0.00 0.71 3.15 2.17 3.20 2.72 2.14 7.24 3.20 2.53 0.00 0.00 2.72 0.00 0.00 5.43 0.150
5 0.10 0.12 1.05 0.00 0.00 0.00 0.37 0.64 0.46 0.53 1.23 0.36 0.43 0.07 0.18 0.46 0.32 0.27 0.92 0.026
$\lambda_{t}$ 12.62 6.10 41.19 0.00 12.73 54.09 45.45 68.72 56.81 47.72 151.49 63.63 53.02 1.27 1.20 56.81 5.73 1.80 113.62
Table 4: Estimated demand and parameters of disaggregated sales data

Note that the estimated primary demands do not preserve the split proportions of the sales. For instance, in period 9, $\lambda_{9_{1}}=2.55$ and $\lambda_{9_{2}}=52.89$, while the sales were split by proportions $\alpha_{1}=0.1$ and $\alpha_{2}=0.9$. The split proportions of sales do not need to match the proportions of demand, because the varying assortments imply varying market shares and spilled demand. However, to reduce the number of parameters to be estimated, it would be possible to modify the EM algorithm and incorporate constraints on $\lambda_{t_{i}}$ to preserve the split proportions. Aggregating the results back, we end up with the final results presented in Table 5.

Estimates Period $v_{i}$
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 13.05 25.29 11.00 14.00 27.19 18.23 17.66 19.09 21.29 36.38 33.31 68.54 17.68 20.50 36.20 1.000
2 11.82 10.64 11.00 8.00 21.85 15.80 12.85 13.89 15.48 26.46 24.22 49.85 12.86 14.91 26.33 0.727
3 4.76 4.82 1.00 11.00 6.05 4.14 6.53 4.90 6.14 10.70 9.79 20.15 5.09 5.86 10.64 0.294
4 3.39 3.10 4.00 1.00 4.21 1.89 1.32 2.79 3.86 5.38 4.85 10.45 2.53 2.72 5.43 0.150
5 0.00 1.29 0.00 0.00 0.43 0.00 0.44 1.27 0.00 1.01 1.00 1.59 0.68 1.05 0.92 0.026
$\lambda_{t}$ 47.17 64.50 38.57 48.57 85.33 57.23 55.43 59.92 66.82 114.17 104.53 215.12 55.50 64.34 113.62
Table 5: Estimated demand and parameters using data disaggregation

The results are fairly similar to those in Table 2, where we incorporated partial availability into the attraction model. The total primary demand is 1190.8, as opposed to 1194.6. To benchmark these solutions, a naive approach would be to use the projection method to preprocess the observed sales using the open percentage information, such as

$$\widehat{\textnormal{sales}}=\frac{\textnormal{sales}}{\textnormal{openpct}},$$

and then apply the EM algorithm to the preprocessed data. With this approach we estimate the overall primary demand as 1519.9. This shows that incorporating partial availability natively into the attraction model, or disaggregating the sales data, are much more robust approaches. If we observe small outliers in the partial availability data, the projection method produces very large preprocessed sales. Experiments showed that the disaggregation method dampens the effect of outliers the most.

2.2 Strong sell-down

The demand model discussed in Vulcano et al. (2012) and Abdallah and Vulcano (2016) assumes that customers choose from the available products based on the Basic Attraction Model (BAM) with the IIA property. The probability of product selection is governed by (1). This model cannot fit scenarios with strong sell-down, and the same applies to the Generalized Attraction Model (GAM) discussed in Gallego et al. (2015). Strong sell-down can be a common phenomenon in practice, because customers often seek to purchase the cheapest available product. A 100% sell-down example is presented in Table 6.

Sales Period
6 5 4 3 2 1
1 2 6 0 0 0 0
2 13 15 0 0
3 20 22
Table 6: Example with 100% sell-down

The EM algorithm developed in Vulcano et al. (2012) does not converge on this example, because the attraction model is unable to fit the scenario presented in the data. The estimated first choice demands grow without bound.

A natural way to handle sell-down is by introducing an additional parameter $l$ for the lowest available product. The attraction model (1) would be extended as

$$P_{i}(S_{t},\bm{v},L_{t},l)=\frac{v_{i}+\mathbbm{1}(i\in L_{t})\cdot l}{v_{0}+\sum_{j\in S_{t}}\left(v_{j}+\mathbbm{1}(j\in L_{t})\cdot l\right)} \qquad (5)$$

Parameter $l$ is an additional preference weight for the product with the lowest price, or the product with excess attraction in the assortment. In practice we could add additional preference weights by group of products, depending on the structure of the problem. $L$ is the set of indices of the lowest available product over time, and $\mathbbm{1}(i\in L_{t})$ is an indicator function, representing whether product $i$ is the lowest available product at time $t$.

For the example in Table 6, $L=(1,1,2,2,3,3)$. $L_{3}=2$ means that at $t=3$ the lowest available class is the second class from the top.

To present a motivating example with strong sell-down, assume that the true parameters are $v_{0}=1$, $v_{1}=0.4$, $v_{2}=0.7$, $v_{3}=0.1$ and $l=10$. The purchase probabilities for various assortments, induced by (5), are shown in Table 7.

Probability Assortment
$L_{1}=1$ $L_{2}=2$ $L_{3}=3$
$P_{0}$ 8.8% 8.3% 8.2%
$P_{1}$ 91.2% 3.3% 3.3%
$P_{2}$ 88.4% 5.7%
$P_{3}$ 82.8%
Table 7: Purchase probabilities with additional preference for the lowest available product

These purchase probabilities can describe a realistic scenario, where the lowest available product receives the majority of the demand. The products available above it, however, still maintain the IIA property. This extended model better fits data with strong sell-down, which cannot be described by the basic MNL model.
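To make the construction of Table 7 concrete, here is a small Python sketch (the function name `selldown_probs` is ours) that evaluates the purchase probabilities (5) for the three assortments of the motivating example.

```python
import numpy as np

def selldown_probs(v, v0, l, avail, lowest):
    """Purchase probabilities under the extended attraction model (5).

    v      : preference weights of the n products
    v0     : weight of the outside alternative
    l      : extra preference weight for the lowest available product
    avail  : boolean availability indicator per product
    lowest : index of the lowest available product
    Returns (P0, P1, ..., Pn); unavailable products get probability 0.
    """
    v = np.asarray(v, dtype=float)
    w = np.where(avail, v, 0.0)
    w[lowest] += l                      # spike on the lowest open product
    denom = v0 + w.sum()
    return np.concatenate(([v0 / denom], w / denom))

# Parameters of the motivating example: v0=1, v=(0.4, 0.7, 0.1), l=10
v, v0, l = [0.4, 0.7, 0.1], 1.0, 10.0
print(selldown_probs(v, v0, l, [True, True, True],  lowest=2))  # ~ 0.082, 0.033, 0.057, 0.828
print(selldown_probs(v, v0, l, [True, True, False], lowest=1))  # ~ 0.083, 0.033, 0.884, 0
print(selldown_probs(v, v0, l, [True, False, False], lowest=0)) # ~ 0.088, 0.912, 0, 0
```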

Essentially the same idea, with more parameters than the single scalar $l$ and a slightly more general formulation, appears as the spiked-MNL model in Cao et al. (2019). The spiked-MNL model extends the classical MNL model by having a separate attractiveness parameter for the cheapest available fare class on each flight.

The likelihood function for the extended model with parameter $l$ becomes

$$L_{I}(\bm{\lambda},\bm{v},l) = \prod_{t=1}^{T}\left[\left(P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda},l)\,\frac{m_{t}!}{z_{1t}!z_{2t}!\cdots z_{nt}!}\right)\prod_{j\in S_{t}}\left[\frac{P_{j}(S_{t},\bm{v},L_{t},l)}{\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v},L_{t},l)}\right]^{z_{jt}}\right]$$

where

$$P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda},l) = \frac{\left[\lambda_{t}\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v},L_{t},l)\right]^{m_{t}}\exp\left(-\lambda_{t}\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v},L_{t},l)\right)}{m_{t}!}$$

and the log-likelihood function modifies to

$$l_{I}(\bm{v},\bm{\lambda},l) = \sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0}+\sum_{i\in S_{t}}\left(v_{i}+\mathbbm{1}(i\in L_{t})\cdot l\right)}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}\left(v_{i}+\mathbbm{1}(i\in L_{t})\cdot l\right)}{v_{0}+\sum_{i\in S_{t}}\left(v_{i}+\mathbbm{1}(i\in L_{t})\cdot l\right)}+\sum_{i\in S_{t}}z_{it}\log\left(v_{i}+\mathbbm{1}(i\in L_{t})\cdot l\right)\right] \qquad (6)$$

Note that in this paper we are not providing an algorithm for maximizing the log-likelihood (6), but we can rely on available nonlinear solvers, if needed. It would be of practical interest, however, to extend the EM (Vulcano et al., 2012) and MM (Abdallah and Vulcano, 2016) algorithms to estimate the parameters of this extended model.

2.3 Constrained parameter space

Algorithms based on the model assumptions above can lead to unreasonable estimates in practice. For instance, on airline data we observed estimated demands which were unreasonably high in a business scenario. This can happen due to data sparsity, outliers, data preprocessing steps, and, most likely, the fact that the assumptions of the MNL model do not fit the true data generating process. Therefore, in practical applications, we might want to constrain the parameters of the model. Abdallah and Vulcano (2016) discuss regularization as an elegant solution to the problem, and they develop algorithms for $L_1$ regularization (Lasso regression) and $L_2$ regularization (Ridge regression). It can also be of interest to apply constraints on the arrival rate of customers ($\lambda_t$), providing an interpretable business fence on these parameters.

In practice we could solve a constrained maximum likelihood problem, such as

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & l_{I} \\
\textnormal{s.t.}\quad & \sum_{t=1}^{T}\lambda_{t}\leq L
\end{aligned}$$

where the constraint ensures that the overall arrival rate, or mean total demand over the time horizon, is less than an upper bound $L$. $L$ could be defined, for instance, as $C$ times the total observed sales, that is

$$L=C\,\frac{1}{s}\sum_{t=1}^{T}\sum_{i=1}^{n}z_{it}$$

$C$ is a regularization parameter which has to be set based on practical considerations. Note that $\lambda_{t}$ includes the no-purchase alternative, so we scale the total sales by the market share $s$. Alternatively, we could constrain the arrival rates at each time period and solve the constrained maximum likelihood problem

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & l_{I} \\
\textnormal{s.t.}\quad & \lambda_{t}\leq L_{t},\quad t=1,\ldots,T
\end{aligned}$$

Similarly, we could set $L_{t}=C\,\frac{1}{s}\sum_{i=1}^{n}z_{it}$, a constant times the observed sales in time period $t$. It would be of practical interest to extend the EM algorithm of Vulcano et al. (2012) to solve this constrained problem. In Section 4 we will extend the MM algorithm in Abdallah and Vulcano (2016) to this constrained problem and also discuss how to solve the optimization problem using the Frank-Wolfe algorithm.
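As a small illustration of how these bounds can be supplied to a generic solver, the sketch below builds the per-period box bounds and the aggregate linear constraint with scipy. The names (`z`, `C`, `s`) are illustrative, and it assumes the arrival rates are optimized directly rather than log-transformed.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint

# Toy sales matrix z (n products x T periods), market share s and fence C
z = np.array([[10., 15., 11., 14.],
              [11.,  6., 11.,  8.]])
s, C = 0.5, 2.0
n, T = z.shape

# Per-period caps  L_t = C * (1/s) * sum_i z_it  ->  box bounds on lambda_t
L_t = C / s * z.sum(axis=0)
lam_bounds = Bounds(lb=np.zeros(T), ub=L_t)

# Aggregate cap  sum_t lambda_t <= L = C * (1/s) * sum_{t,i} z_it
L = C / s * z.sum()
agg_constraint = LinearConstraint(np.ones((1, T)), lb=0.0, ub=L)

# lam_bounds / agg_constraint can be passed to scipy.optimize.minimize via the
# bounds= and constraints= arguments (extended with the v-part of the parameter
# vector, which only needs positivity).
```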

2.4 Non-homogeneous product set

In revenue management, we often encounter changing products due to changes in the system, hence we need to deal with a non-homogeneous product set over time. Instead of dividing the observed data into homogeneous parts, we want to extend the algorithms to handle non-homogeneous product offerings, allowing us to use the whole data set and borrow power to accurately estimate the parameters.

Let us look at the hypothetical airline sales example in Table 8, which was created from the synthetic example in Vulcano et al. (2012). We did this so that the estimated results are easy to compare.

Sales Period
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1 flt1-prod1 10 15 11 14
2 flt1-prod2 11 6 11 8 20 16
3 flt1-prod3 5 6 1 11 4 5 14 7 11
4 flt1-prod4 4 4 4 1 6 4 3 5 9 9 6 9
5 flt1-prod5 0 2 0 0 1 0 1 3 0 3 3 5 2 3 3
6 flt2-prod1 10 15 11 14
7 flt2-prod2 11 6 11 8 20 16
8 flt2-prod3 5 6 1 11 4 5 14 7 11
9 flt2-prod4 4 4 4 1 6 4 3 5 9 9 6 9
10 flt2-prod5 0 2 0 0 1 0 1 3 0 3 3 5 2 3 3
11 flt3-prod1 20 30 22 28 20 30 22 28
12 flt3-prod2 22 12 22 16 40 32 22 12 22 16 40 32
13 flt3-prod3 10 12 2 22 8 10 28 14 22 10 12 2 22 8 10 28 14 22
14 flt3-prod4 8 8 8 2 12 8 6 10 18 18 12 18 8 8 8 2 12 8 6 10 18 18 12 18
15 flt3-prod5 0 4 0 0 2 0 2 6 0 6 6 10 4 6 6 0 4 0 0 2 0 2 6 0 6 6 10 4 6 6
Table 8: Schedule change example

We have 3 flights, each of them having 5 different products, a total of $n=15$ products. The products of flight 1 only exist in the first 15 time periods, and after that the label changes to flight 2. This can happen for various reasons, for example a schedule change. It is not always possible to preprocess the data and match the products of flight 1 to flight 2. We can see that the product offerings are non-homogeneous over time, or unbalanced. Product 1 of flight 1 is not available for sale in time periods 5-15, which is distinguished from not being part of the product offer set in time periods 16-30. Notice that the observed sales data of flight 1 for time periods 1-15 is the same as the synthetic example created in Vulcano et al. (2012), and the observed sales for flight 3 are just twice that, like a flight with double the demand and capacity.

To introduce the non-homogeneous product set into the notation, let $I_{t}$ denote the set of existing products of the retailer at time $t$. Note that this is different from $S_{t}$, which is the set of products available for purchase at time $t$, therefore $S_{t}\subseteq I_{t}$. For instance, in Table 8, $I_{7}=\{1,2,3,4,5,11,12,13,14,15\}$ and $S_{7}=\{3,4,5,13,14,15\}$. The set of all products is $I=\bigcup_{t=1}^{T}I_{t}=\left\{1,2,\ldots,15\right\}$.

In Section 3 we will extend the EM algorithm of Vulcano et al. (2012) for non-homogeneous product offerings. The MM and Frank-Wolfe algorithms discussed in Section 4 will also be able to natively handle this product structure.

2.5 Handling market share constraint and no-purchase option

We mentioned that the objective function (2) has a continuum of maximizers, therefore an additional constraint is applied in Vulcano et al. (2012) on the preference weights as a function of the market share

$$s=\frac{\sum_{i=1}^{n}v_{i}}{v_{0}+\sum_{i=1}^{n}v_{i}} \qquad (7)$$

using the standard scaling $v_{0}=1$. The market share constraint is linear but the objective function is non-convex, therefore Abdallah and Vulcano (2016) formulate the problem using $v_{i}=\exp(\beta_{i})$ in the objective function, so the market share constraint becomes

$$s=\frac{\sum_{i=1}^{n}\exp(\beta_{i})}{1+\sum_{i=1}^{n}\exp(\beta_{i})}.$$

They establish that the objective function in this space is a concave function, and they show how to deal with the non-convex market share constraint by solving the problem without the constraint and then using a transformation to satisfy the constraint.

The outside alternative could represent the competitor's product, or both the competitor's product and the no-purchase option, which would lead to different interpretations of $s$ and $\bm{\lambda}$ (Vulcano et al., 2012). The outside alternative is treated as a separate product that is always available, therefore $v_{0}$ is constant and the standard scaling $v_{0}=1$ is used.

To look further, let $I$ denote the set of all products of the retailer, that is $I=\left\{1,2,\ldots,n\right\}$, so it follows that $S_{t}\subseteq I$. With the market share constraint above, the probability of selecting one of the retailer's products, when everything is available for sale, is $s$, that is

$$P_{I}(I,\bm{v})=\frac{\sum_{j\in I}v_{j}}{v_{0}+\sum_{j\in I}v_{j}}=s$$

When the retailer offers an assortment $S_{t}$ with a subset of all the products, it follows that

$$\tilde{s}_{t}=P_{S_{t}}(S_{t},\bm{v})=\frac{\sum_{j\in S_{t}}v_{j}}{v_{0}+\sum_{j\in S_{t}}v_{j}}<s$$

given that $v_{0}$ is fixed. This means that the model induced market share ($\tilde{s}_{t}$) of the retailer at time $t$ is less than $s$, and the share of the outside alternative is larger than $1-s$. If we think of the outside alternative as the competitor's product, one could argue that the competitor's product might not always be available for purchase either, so the preference weight $v_{0}$ is not constant over time. Also, the estimated market share $s$ is calculated over all possible sets of assortments, not only when all the retailer's products are available for sale.

To extend this model, we can relax the assumption of a constant, time-independent outside alternative and introduce $v_{0t}$, allowing the preference weight to change over time. We also need this extension to be able to incorporate non-homogeneous product sets into the market share constraint, by using $I_{t}$, the offer set of the retailer at time $t$ (Section 2.4). The market share constraint modifies to a set of constraints

$$s=\frac{\sum_{i\in I_{t}}v_{i}}{v_{0t}+\sum_{i\in I_{t}}v_{i}},\quad t=1,\ldots,T \qquad (8)$$

which ensures that the share of the outside alternative is $1-s$, for all time points $t$, when all the retailer's products are available for sale. That is

$$P_{0}(I,\bm{v})=\frac{v_{0t}}{v_{0t}+\sum_{i\in I_{t}}v_{i}}=\frac{\frac{1-s}{s}\sum_{i\in I_{t}}v_{i}}{\frac{1-s}{s}\sum_{i\in I_{t}}v_{i}+\sum_{i\in I_{t}}v_{i}}=1-s.$$

Now let us consider an edge case scenario where the retailer's model induced market share is constant over time, regardless of the assortment it offers. This could be achieved with the set of market share constraints

$$s=\frac{\sum_{i\in S_{t}}v_{i}}{v_{0t}+\sum_{i\in S_{t}}v_{i}},\quad t=1,\ldots,T \qquad (9)$$

leading to

$$\tilde{s}_{t}=P_{S_{t}}(S_{t},\bm{v})=\frac{\sum_{j\in S_{t}}v_{j}}{v_{0t}+\sum_{j\in S_{t}}v_{j}}=\frac{\sum_{j\in S_{t}}v_{j}}{\frac{1-s}{s}\sum_{j\in S_{t}}v_{j}+\sum_{j\in S_{t}}v_{j}}=s.$$

The retailer's share remains $s$ and the share of the outside alternative remains $1-s$, at each time $t$, regardless of what products are available for purchase. This edge case might not make sense for all applications, but it is interesting to consider as an alternative to the assumption that the outside alternative is always fully available. In the airline retail context we could think of the outside alternative as the competitor's products, whose availability can be as limited as the host airline's products.

To introduce a continuum of cases between (8) and (9), let us introduce a parameter $\alpha\in[0,1]$ controlling the availability of the outside alternative. $\alpha=0$ represents the case where the outside alternative is always available for sale (8), and naturally, $\alpha=1$ represents the case where the outside alternative limits availability in the same fashion as the retailer. This is more of a hypothetical case, given that the outside alternative could include the no-purchase option, which is always an available choice. Combining (8) and (9), the set of market share constraints becomes

$$s=(1-\alpha)\frac{\sum_{i\in I_{t}}v_{i}}{v_{0t}+\sum_{i\in I_{t}}v_{i}}+\alpha\frac{\sum_{i\in S_{t}}v_{i}}{v_{0t}+\sum_{i\in S_{t}}v_{i}},\quad t=1,\ldots,T \qquad (10)$$

Using $\alpha=0$, $I_{t}=I=\{1,\ldots,n\}$ and $v_{0t}=1$, the constraints simplify to (7).

The models require knowledge of the market share $s$, which in practice can be difficult to acquire, and the estimate itself can be inaccurate. Abdallah and Vulcano (2016) included a study on the sensitivity of the model estimates to the input market share. We mentioned that, in practice, the market share $s$ is most likely estimated from data observed over various sets of assortments, not only when all the retailer's products are available for sale. Therefore, it could also make sense to use a market share constraint aggregated over the time horizon $T$, that is

$$s=\frac{\sum_{t=1}^{T}\sum_{i\in I_{t}}v_{i}}{\sum_{t=1}^{T}\left(v_{0t}+\sum_{i\in I_{t}}v_{i}\right)} \qquad (11)$$

This constraint ensures that the market share over the time horizon $T$ is $s$, taking into account the changing offer set $I_{t}$ over time. This is less restrictive and could be a more reasonable assumption than using (7), which forces the market share at each time point $t$ with a set of constraints. Adding $\alpha$ to control the availability of the outside alternative, we would use

$$s=\frac{\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\right]}{\sum_{t=1}^{T}\left[v_{0t}+(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\right]} \qquad (12)$$

Finally, we want to point out that instead of solving the market share constrained optimization problem

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & \sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0}+\sum_{i\in S_{t}}v_{i}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}}{v_{0}+\sum_{i\in S_{t}}v_{i}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \\
\textnormal{s.t.}\quad & s=\frac{\sum_{i=1}^{n}v_{i}}{v_{0}+\sum_{i=1}^{n}v_{i}} \\
& v_{0}=1
\end{aligned}$$

we can directly incorporate the market share constraint into the objective function through $v_{0}=r\sum_{i=1}^{n}v_{i}$, with $r=(1-s)/s$, and solve

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & \sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{r\sum_{i=1}^{n}v_{i}+\sum_{i\in S_{t}}v_{i}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}}{r\sum_{i=1}^{n}v_{i}+\sum_{i\in S_{t}}v_{i}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \\
\textnormal{s.t.}\quad & \sum_{i=1}^{n}v_{i}=1
\end{aligned}$$

or by setting $v_{1}=1$. The scaling constraint on $\bm{v}$ is required to avoid multiple solutions.

Incorporating a non-homogeneous product set and the control of the availability of the outside alternative, we would need to solve

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & \sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}}{v_{0t}+\sum_{i\in S_{t}}v_{i}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \\
\textnormal{s.t.}\quad & v_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\right] \\
& \sum_{i=1}^{n}v_{i}=1
\end{aligned}$$

We will see in Section 4.2 that this formulation with upper bound constraints on the arrival rate is easy to solve using the Frank-Wolfe method.
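For concreteness, the following minimal Python sketch shows how the constraint on $v_{0t}$ can be folded into the per-period objective term; the helper names (`outside_weight`, `loglik_term`) and the index-array interface are ours, and the normalization $\sum_{i}v_{i}=1$ and the maximization itself are left to the surrounding solver.

```python
import numpy as np

def outside_weight(v, I_t, S_t, s, alpha):
    """Preference weight v_{0t} of the outside alternative at time t,
    folding the market share constraint into the objective."""
    r = (1.0 - s) / s
    return r * ((1.0 - alpha) * v[I_t].sum() + alpha * v[S_t].sum())

def loglik_term(lam_t, v, z_t, I_t, S_t, s, alpha):
    """Period-t contribution to the objective with v_{0t} substituted in.

    v   : preference weights (length n), z_t : sales at time t (length n)
    I_t : indices of existing products, S_t : indices of purchasable products
    """
    v0t = outside_weight(v, I_t, S_t, s, alpha)
    m_t = z_t[S_t].sum()
    vS = v[S_t].sum()
    denom = v0t + vS
    return (m_t * np.log(lam_t / denom)
            - lam_t * vS / denom
            + np.sum(z_t[S_t] * np.log(v[S_t])))
```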

3 Extended EM algorithm

In this section we extend the EM algorithm of Vulcano et al. (2012) with non-homogeneous product sets (Section 2.4) and the ability to control the availability of the outside alternative (Section 2.5).

We incorporate $I_{t}$, the offer set of the retailer at time $t$, and $v_{0t}$, the time-dependent preference weight of the outside alternative, into the algorithm. We use

$$v_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\right]$$

where $r=\frac{1-s}{s}$ and $\alpha\in[0,1]$ controls the availability of the outside alternative. Adding these into the formulation, we can modify the key E-step equations of the EM algorithm as

$$\hat{X}_{jt} = \begin{cases}\dfrac{v_{j}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}\,\dfrac{\sum_{h\in S_{t}}v_{h}+v_{0t}}{\sum_{h\in S_{t}}v_{h}}\sum_{h\in S_{t}}z_{ht},&\text{if $j\notin S_{t}\cup\{0\}$}\\[2ex] \dfrac{\sum_{h\in S_{t}}v_{h}+v_{0t}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}\,z_{jt},&\text{if $j\in S_{t}$}\end{cases}$$

$$\hat{Y}_{jt} = \frac{\sum_{h\notin S_{t}\cup\{0\}}v_{h}-v_{0t}+r\sum_{i\in I_{t}}v_{i}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}\,z_{jt},\quad j\in S_{t}$$

$$\hat{X}_{0t} = r\sum_{i\in I_{t}}\hat{X}_{it}$$

$$\hat{Y}_{0t} = \frac{v_{0t}}{\sum_{i\in S_{t}}v_{i}+v_{0t}}\sum_{h\notin S_{t}\cup\{0\}}\hat{X}_{ht}$$

For the M-step we need to maximize the conditional expectation of the complete data log-likelihood function, which becomes

$${\mathcal{L}}(\bm{v})=\sum_{t=1}^{T}\sum_{j\in I_{t}}\hat{X}_{jt}\log\left(\frac{v_{j}}{(1+r)\sum_{i\in I_{t}}v_{i}}\right)+\sum_{t=1}^{T}\hat{X}_{0t}\log\left(s\right) \qquad (13)$$

To find the maxima, the first order conditions become

$$\frac{\partial{\mathcal{L}}}{\partial v_{i}}=\sum_{t=1}^{T}\mathbbm{1}\left[i\in I_{t}\right]\left(\frac{\hat{X}_{it}}{v_{i}}-\frac{\sum_{k\in I_{t}}\hat{X}_{kt}}{\sum_{k\in I_{t}}v_{k}}\right)=0,\quad i=1,\ldots,n$$

which is a system of $n$ nonlinear equations in $v_{i}$. The solution can be obtained with existing implementations of iterative methods, such as Newton-Raphson. We can also use a simple fixed point iteration algorithm by rewriting $\nabla\mathcal{L}=F(\bm{v})=0$ as $\Phi\left(\bm{v}\right)=\bm{v}$ and then using the fixed point iteration $\bm{v}^{(l+1)}:=\Phi\left(\bm{v}^{(l)}\right)$. In our case the update becomes

$$v_{i}^{(l+1)}:=\frac{\sum_{t=1}^{T}\mathbbm{1}\left[i\in I_{t}\right]X_{it}}{\sum_{t=1}^{T}\mathbbm{1}\left[i\in I_{t}\right]\frac{\sum_{k\in I_{t}}X_{kt}}{\sum_{k\in I_{t}}v_{k}^{(l)}}},\quad i=1,\ldots,n$$

In the case of a homogeneous product set, so that $I_{t}=\{1,\ldots,n\}~\forall t$, we get a closed form solution to the M-step, that is

$$v_{i}=\frac{\sum_{t=1}^{T}X_{it}}{\sum_{j=1}^{n}\sum_{t=1}^{T}X_{jt}},\quad i=1,\ldots,n$$

The result is easy to obtain by assuming $\sum_{i=1}^{n}v_{i}=1$. Note that this closed-form equation is different from the one derived in Vulcano et al. (2012), since they used a different parametrization by setting $v_{0}=1$.
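The fixed point update above can be implemented in a few lines. The sketch below (the function name `m_step_fixed_point` is ours) assumes the E-step estimates are stored in an $n\times T$ array with zeros for products outside $I_t$.

```python
import numpy as np

def m_step_fixed_point(X, I_sets, v_init, n_iter=500, tol=1e-10):
    """Fixed point iteration for the M-step with a non-homogeneous product set.

    X       : n x T array of E-step estimates X_it (0 where i is not in I_t)
    I_sets  : list of length T; I_sets[t] = index array of products in I_t
    v_init  : starting preference weights (length n, positive)
    """
    n, T = X.shape
    v = np.array(v_init, dtype=float)
    member = np.zeros((n, T))                 # membership indicator 1[i in I_t]
    for t, I_t in enumerate(I_sets):
        member[I_t, t] = 1.0

    num = (X * member).sum(axis=1)            # sum_t 1[i in I_t] X_it
    for _ in range(n_iter):
        ratio = np.array([X[I_t, t].sum() / v[I_t].sum()
                          for t, I_t in enumerate(I_sets)])
        v_new = num / (member @ ratio)        # denominator: sum_t 1[i in I_t] ratio_t
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```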

Algorithm 2 presents the extended EM algorithm with a non-homogeneous product set and the ability to control the availability of the outside alternative.

Algorithm 2 EM algorithm with non-homogeneous product set and control of the availability of the outside alternative (OA)
1: $I_{t}$: set of offered products at time $t$ (product set)
2: $S_{t}$: set of products available for purchase at time $t$ ($S_{t}\subseteq I_{t}$)
3: $r=\frac{1-s}{s}$, where $s$ is the market share of the retailer
4: $z_{jt}$: observed sales at time $t$ for product $j$
5: $\alpha$: control parameter ($\alpha=1$: OA availability follows the retailer; $\alpha=0$: OA fully available)
6: E-step
7: for $t=1,\ldots,T$ do
8:     $v_{0t}=r\left[\left(1-\alpha\right)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{h\in S_{t}}v_{h}\right]$ $\triangleright$ OA preference weight at $t$
9:     for $j\in I_{t}$ do
10:        if $j\notin S_{t}$ then
11:            $X_{jt}=\frac{v_{j}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}\frac{\sum_{h\in S_{t}}v_{h}+v_{0t}}{\sum_{h\in S_{t}}v_{h}}\sum_{h\in S_{t}}z_{ht}$
12:        else ($j\in S_{t}$)
13:            $Y_{jt}=\frac{\sum_{h\notin S_{t}}v_{h}-v_{0t}+r\sum_{i\in I_{t}}v_{i}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}z_{jt}$
14:            $X_{jt}=z_{jt}-Y_{jt}$
15:        end if
16:    end for
17:    $X_{0t}=r\sum_{i\in I_{t}}X_{it}$
18:    $Y_{0t}=\frac{v_{0t}}{\sum_{h\in S_{t}}v_{h}+v_{0t}}\sum_{h\notin S_{t}}X_{ht}$
19: end for
20: M-step
21: Find $v_{i},~i=1,\ldots,n$ as the solution to the system of nonlinear equations $F(\bm{v})=0$:
22: $\sum_{t=1}^{T}\mathbbm{1}\left[i\in I_{t}\right]\left(\frac{X_{it}}{v_{i}}-\frac{\sum_{k\in I_{t}}X_{kt}}{\sum_{k\in I_{t}}v_{k}}\right)=0,~~i=1,\ldots,n$
23: Special case (homogeneous product set), if $I_{t}=\{1,\ldots,n\}~\forall t$:
24: $v_{i}=\frac{\sum_{t=1}^{T}X_{it}}{\sum_{j=1}^{n}\sum_{t=1}^{T}X_{jt}},~~i=1,\ldots,n$

If, in practice, we observe data with partial availability, we recommend splitting the sales into fully open and closed assortments (Algorithm 1), applying the extended EM algorithm (Algorithm 2), and aggregating the solution back. It would be an interesting future research topic to further extend the EM algorithm and incorporate constraints on the arrival rates (Section 2.3).
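For completeness, a minimal Python sketch of the E-step of Algorithm 2 is given below. The function name `e_step` and the index-array interface are ours, and the outside-alternative quantities $X_{0t}$, $Y_{0t}$ are omitted since the M-step update of $\bm{v}$ does not need them.

```python
import numpy as np

def e_step(v, z, I_sets, S_sets, s, alpha):
    """E-step of the extended EM algorithm (Algorithm 2).

    v       : current preference weights (length n)
    z       : n x T array of observed sales
    I_sets  : per-period index arrays of existing products I_t
    S_sets  : per-period index arrays of purchasable products S_t
    Returns the n x T array of primary demand estimates X.
    """
    n, T = z.shape
    r = (1.0 - s) / s
    X = np.zeros((n, T))
    for t in range(T):
        I_t, S_t = I_sets[t], S_sets[t]
        vI, vS = v[I_t].sum(), v[S_t].sum()
        v0t = r * ((1.0 - alpha) * vI + alpha * vS)   # OA weight at time t
        closed = np.setdiff1d(I_t, S_t)               # existing but unavailable
        # Unavailable products: demand spilled from the observed sales in S_t
        X[closed, t] = (v[closed] / ((1 + r) * vI)
                        * (vS + v0t) / vS * z[S_t, t].sum())
        # Available products: observed sales minus the recaptured part Y_jt
        Y = (v[closed].sum() - v0t + r * vI) / ((1 + r) * vI) * z[S_t, t]
        X[S_t, t] = z[S_t, t] - Y
    return X
```

Combined with the fixed point M-step sketch above, this gives one full iteration of the extended EM algorithm under the stated assumptions.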

3.1 Example: limited OA availability

In this section we apply Algorithm 2 to the simulated example from Vulcano et al. (2012) and demonstrate what happens when we limit the availability of the outside alternative simultaneously with the retailer's availability ($\alpha=1$). The observed sales data is presented in Table 9.

Sales Period
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 10 15 11 14
2 11 6 11 8 20 16
3 5 6 1 11 4 5 14 7 11
4 4 4 4 1 6 4 3 5 9 9 6 9
5 0 2 0 0 1 0 1 3 0 3 3 5 2 3 3
Table 9: Simulated example of Vulcano et al. (2012)

First let us look at the solution, using $\alpha=0$, presented in Table 10. We closely recover the results of Vulcano et al. (2012), since we make the same assumption that the outside alternative is always available.

Estimates Period $v_{i}$
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 10 15 11 14 15.03 12.12 13.04 10.87 14.49 15.91 11.93 18.56 11.51 17.27 17.27 1.000
2 11 6 11 8 14.35 11.48 10.45 8.71 11.61 12.75 9.56 14.88 9.23 13.84 13.84 0.801
3 5 6 1 11 2.87 3.59 6.88 3.44 5.41 6.22 4.67 7.26 4.50 6.75 6.75 0.391
4 4 4 4 1 4.31 2.87 1.47 2.46 4.42 3.43 2.29 3.43 2.68 4.02 4.02 0.233
5 0 2 0 0 0.72 0.00 0.49 1.47 0.00 1.14 1.14 1.91 0.63 0.95 0.95 0.055
$\lambda_{t}$ 42.86 47.14 38.57 48.57 53.26 42.95 46.19 38.50 51.33 56.37 42.28 65.76 40.78 61.18 61.18
Table 10: Estimated demand and parameters using the EM algorithm with $\alpha=0$

Using $\alpha=1$, the estimated primary demand at each time period is equal to the observed purchases, because the availability of the outside alternative is restricted in the same way as the retailer's availability, preserving the model induced market share $s$ at each time $t$. The results are presented in Table 11.

Estimates Period $v_{i}$
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 10 15 11 14 12.50 10.08 7.26 6.05 8.06 4.84 3.63 5.65 0.81 1.21 1.21 1.000
2 11 6 11 8 11.94 9.55 5.75 4.79 6.39 3.83 2.87 4.47 0.64 0.96 0.96 0.792
3 5 6 1 11 2.39 2.98 3.88 1.94 3.05 1.92 1.44 2.24 0.32 0.48 0.48 0.396
4 4 4 4 1 3.58 2.39 0.83 1.39 2.50 1.06 0.71 1.06 0.20 0.30 0.30 0.245
5 0 2 0 0 0.60 0.00 0.28 0.83 0.00 0.35 0.35 0.59 0.04 0.06 0.06 0.046
$\lambda_{t}$ 42.86 47.14 38.57 48.57 44.29 35.71 25.71 21.43 28.57 17.14 12.86 20.00 2.86 4.29 4.29
Table 11: Estimated demand and parameters using the EM algorithm with $\alpha=1$

Notice that the estimated preference weights in Tables 10 and 11 are close to each other; the estimates are not sensitive to the value of $\alpha$. However, the estimated arrival rates changed drastically, due to the change in the model induced market shares.

Parameter $\alpha$ can be used as a tool for the practitioner to control the availability of the outside alternative between these two edge cases. We would like to emphasize, however, that competitor matching can be risky and the value of $\alpha$ should ideally be inferred from exogenous data. The conservative practice is to use $\alpha=0$.

3.2 Example: non-homogeneous product set

To demonstrate the extended EM algorithm on a non-homogeneous product set, let us revisit the example presented in Table 8 (Section 2.4). This is a hypothetical airline sales example with a schedule change. We observe 3 flights, where the sales of the products of flight 1 are discontinued and then repeated as flight 2, and flight 3 has twice the observed sales of flights 1 and 2. The solution using $\alpha=0$ is presented in Table 12.

Estimates Period $v_{i}$
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
flt1-prod1 10 15 11 14 15.03 12.12 13.04 10.87 14.49 15.91 11.93 18.56 11.51 17.27 17.27 1.000
flt1-prod2 11 6 11 8 14.35 11.48 10.45 8.71 11.61 12.75 9.56 14.88 9.23 13.84 13.84 0.801
flt1-prod3 5 6 1 11 2.87 3.59 6.88 3.44 5.41 6.22 4.67 7.26 4.50 6.75 6.75 0.391
flt1-prod4 4 4 4 1 4.31 2.87 1.47 2.46 4.42 3.43 2.29 3.43 2.68 4.02 4.02 0.233
flt1-prod5 0 2 0 0 0.72 0.00 0.49 1.47 0.00 1.14 1.14 1.91 0.63 0.95 0.95 0.055
flt2-prod1 1.000
flt2-prod2 0.801
flt2-prod3 0.391
flt2-prod4 0.233
flt2-prod5 0.055
flt3-prod1 20 30 22 28 30.07 24.25 26.08 21.73 28.98 31.82 23.87 37.12 23.02 34.54 34.54 2.000
flt3-prod2 22 12 22 16 28.71 22.97 20.90 17.42 23.22 25.50 19.13 29.75 18.45 27.68 27.68 1.603
flt3-prod3 10 12 2 22 5.74 7.18 13.76 6.88 10.81 12.44 9.33 14.52 9.00 13.51 13.51 0.782
flt3-prod4 8 8 8 2 8.61 5.74 2.95 4.92 8.85 6.86 4.57 6.86 5.36 8.04 8.04 0.465
flt3-prod5 0 4 0 0 1.44 0.00 0.98 2.95 0.00 2.29 2.29 3.81 1.26 1.89 1.89 0.110
λt\mathbf{\lambda}_{t} 128.57 141.43 115.71 145.71 159.79 128.86 138.58 115.49 153.98 169.10 126.83 197.29 122.35 183.53 183.53
Period 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
flt1-prod1 1.000
flt1-prod2 0.801
flt1-prod3 0.391
flt1-prod4 0.233
flt1-prod5 0.055
flt2-prod1 10 15 11 14 15.03 12.12 13.04 10.87 14.49 15.91 11.93 18.56 11.51 17.27 17.27 1.000
flt2-prod2 11 6 11 8 14.35 11.48 10.45 8.71 11.61 12.75 9.56 14.88 9.23 13.84 13.84 0.801
flt2-prod3 5 6 1 11 2.87 3.59 6.88 3.44 5.41 6.22 4.67 7.26 4.50 6.75 6.75 0.391
flt2-prod4 4 4 4 1 4.31 2.87 1.47 2.46 4.42 3.43 2.29 3.43 2.68 4.02 4.02 0.233
flt2-prod5 0 2 0 0 0.72 0.00 0.49 1.47 0.00 1.14 1.14 1.91 0.63 0.95 0.95 0.055
flt3-prod1 20 30 22 28 30.07 24.25 26.08 21.73 28.98 31.82 23.87 37.12 23.02 34.54 34.54 2.000
flt3-prod2 22 12 22 16 28.71 22.97 20.90 17.42 23.22 25.50 19.13 29.75 18.45 27.68 27.68 1.603
flt3-prod3 10 12 2 22 5.74 7.18 13.76 6.88 10.81 12.44 9.33 14.52 9.00 13.51 13.51 0.782
flt3-prod4 8 8 8 2 8.61 5.74 2.95 4.92 8.85 6.86 4.57 6.86 5.36 8.04 8.04 0.465
flt3-prod5 0 4 0 0 1.44 0.00 0.98 2.95 0.00 2.29 2.29 3.81 1.26 1.89 1.89 0.110
λt\mathbf{\lambda}_{t} 128.57 141.43 115.71 145.71 159.79 128.86 138.58 115.49 153.98 169.10 126.83 197.29 122.35 183.53 183.53
Table 12: Schedule change example: estimated demand and parameters (α=0\alpha=0)

For flights 1 and 2 we get the same estimated primary demand and preference weights as in Table 10, since we duplicated that example over the time horizon. For flight 3 we estimate twice the primary demand and preference weights, which was expected, since we artificially doubled the numbers. The method is consistent, and can handle a non-homogeneous product set in a mathematically formal way.

It is interesting to note here that the EM algorithm of Vulcano et al. (2012) cannot distinguish between product i not being available for sale (i∉S_t) and product i not being part of the product set (i∉I_t). Because of this, a naive application of the EM algorithm would estimate primary demand for the non-existing products of flights 1 and 2, grossly overestimating the demand. The total estimated arrival rate is \sum_{t=1}^{30}\lambda_{t}=5324.10, while using the extended EM algorithm we get \sum_{t=1}^{30}\lambda_{t}=4421.53.

It is also interesting to mention that using the extended EM algorithm with α=1 we get \sum_{t=1}^{30}\lambda_{t}=2365.71, and, just like before, the total primary demand at each time period t is equal to the observed purchases. It is easy to show that in the case α=1 we have \lambda_{t}=\left(\sum_{i=1}^{n}z_{it}\right)/s=m_{t}/s. If we apply the algorithm with other values of α∈[0,1], we observe a linear decrease of the total estimated arrival rate as a function of α: the model-induced market share of the retailer increases with α, which decreases the estimated demand.
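
As a quick numerical check of this identity, the short sketch below recomputes the first four arrival rates of Table 11 from the observed sales in Table 9. The market share value s=0.7 is an assumption on our part, inferred because it reproduces the reported rates.

# Periods 15-12 of Table 9 (all five products open); with alpha = 1 we have lambda_t = m_t / s.
m = [10 + 11 + 5 + 4 + 0,    # m_15
     15 + 6 + 6 + 4 + 2,     # m_14
     11 + 11 + 1 + 4 + 0,    # m_13
     14 + 8 + 11 + 1 + 0]    # m_12
s = 0.7                      # assumed market share (inferred from the reported rates)
print([round(m_t / s, 2) for m_t in m])   # [42.86, 47.14, 38.57, 48.57], cf. Table 11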

Algorithm 2 is a simple yet powerful extension of the EM algorithm which can natively handle a non-homogeneous product set, and which can control the availability of the outside alternative through a single parameter. The price of this extension is that in the M-step we need to solve for the roots of a system of nonlinear equations, instead of having a closed-form solution. However, a simple fixed point iteration can be used as the M-step.

4 Constrained optimization

In this section we develop algorithms to solve the estimation problem with constrained arrival rates (Section 2.3). We will extend the minorization-maximization (MM) algorithm of Abdallah and Vulcano (2016) and also present a solution utilizing the Frank-Wolfe algorithm. We will also incorporate partial availability, a non-homogeneous product set, and the ability to control the availability of the outside alternative into the model, and show how to estimate the parameters using iterative algorithms.

Consider the incomplete log-likelihood

lI(𝒗,𝝀)=t=1T[mtlog(λtv0t+iStvioit)λtiStvioitv0t+iStvioit+iStzitlog(vioit)]\displaystyle l_{I}(\bm{v},\bm{\lambda})=\sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+\sum_{i\in S_{t}}z_{it}\log(v_{i}\cdot o_{it})\right] (14)

which extends the log-likelihood in Abdallah and Vulcano (2016) with partial availability, as discussed in Section 2.1.1. When o_{it}=1 for all i∈S_t, the log-likelihood simplifies to the one considering only fully open and closed assortments. Note that we are using preference weights \bm{v} in the model, but we could express the model in the utility space by using \bm{v}=\exp(\bm{\beta}).
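
To make the notation concrete, the following is a minimal NumPy sketch of how (14) can be evaluated. The array encoding (products in rows, periods in columns, with o_{it}=0 whenever product i is not offered in period t) and the function name are our own choices, not part of the model specification.

import numpy as np

def incomplete_loglik(v, lam, z, o, v0):
    # v: preference weights (n,), lam: arrival rates (T,), z: observed sales (n, T),
    # o: availability fractions (n, T) with zeros for closed products, v0: OA weights (T,).
    sv = o.T @ v                              # sum_{i in S_t} v_i * o_it, one value per period
    m = z.sum(axis=0)                         # m_t, total purchases in period t
    ll = np.sum(m * np.log(lam / (v0 + sv)))  # first term of (14)
    ll -= np.sum(lam * sv / (v0 + sv))        # second term of (14)
    open_ij = o > 0                           # entries with i in S_t
    ll += np.sum(z[open_ij] * np.log((v[:, None] * o)[open_ij]))  # third term of (14)
    return ll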

The constrained optimization problem we need to solve is given by

max𝒗,𝝀\displaystyle\max_{\bm{v},\bm{\lambda}} lI(𝒗,𝝀)\displaystyle l_{I}(\bm{v},\bm{\lambda}) (15)
s.t.
λtLt,t=1,,T\displaystyle\lambda_{t}\leq L_{t},~{}t=1,\ldots,T

where L_t is an upper bound on the arrival rate at time t. The motivation behind putting an upper bound on the arrival rates was explained in Section 2.3, where we discussed two specific ways of constraining the arrival rates. To solve the constrained maximum likelihood estimation problem, we follow the idea in Abdallah and Vulcano (2016). We express the optimization problem as a function of \bm{v} by applying part of the Karush–Kuhn–Tucker (KKT) conditions to the Lagrangian function, removing \bm{\lambda} from the problem. The Lagrangian function of (15) becomes

(𝒗,𝝀,𝝁)=lI(𝒗,𝝀)t=1Tμtv0t+iStvioit(λtLt)\displaystyle{\mathcal{L}}(\bm{v},\bm{\lambda},\bm{\mu})=l_{I}(\bm{v},\bm{\lambda})-\sum_{t=1}^{T}\frac{\mu_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}(\lambda_{t}-L_{t}) (16)

Note that we used a simple re-scaling of the Lagrange multipliers \bm{\mu} (by \frac{1}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}} for time t). After applying the KKT conditions to remove \bm{\lambda} from the problem, and some algebra (see the Appendix), the problem becomes

max𝒗\displaystyle\max_{\bm{v}} i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit+C3((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v})\right)

where C_{3}\left(\mathcal{B}(\bm{v})\right) is defined in (26), K_{i}=\sum_{t=1}^{T}z_{it}, and \mathcal{B}(\bm{v}) represents the set of time periods where the upper bound constraints on the arrival rates are binding, that is

\mathcal{B}(\bm{v})=\left\{t~\bigg|~L_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right\} (18)
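
For illustration, the binding set is cheap to evaluate once the availability-weighted preference totals are computed; a minimal sketch under the same array encoding as above (the function name is ours):

import numpy as np

def binding_periods(v, o, v0, L, z):
    # Periods in which the upper bound L_t on the arrival rate is binding, cf. (18).
    sv = o.T @ v                  # sum_{i in S_t} v_i * o_it
    m = z.sum(axis=0)             # m_t
    return np.where(L < m * (v0 + sv) / sv)[0]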

Through the definition of v_{0t} and the market share constraint we can incorporate a non-homogeneous product set into the model, together with the ability to control the availability of the outside alternative. These details were discussed in Sections 2.4 and 2.5. We will consider two formulations here. In the first one we use the market share constraint

\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]=\tilde{s}\sum_{t=1}^{T}v_{0t} (19)

which is equivalent to the aggregate constraint (11) discussed in Section 2.5, using \tilde{s}=s/(1-s). We assume that the OA preference weights v_{0t} are known, and we can use the standard scaling v_{0t}=1. In Section 4.1 we show how to use the MM algorithm to solve this problem.

In the second formulation we incorporate the market share constraint (10) into the objective function and constrain the preference weights by using

v_{0t} = r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right] (20)
\sum_{\forall i}v_{i} = 1

The first equation removes v_{0t} from the problem, while the second equation avoids multiple solutions by fixing the scale of \bm{v}. In Section 4.2 we show how the Frank-Wolfe method leads to a simple coordinate descent algorithm for this problem.
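
A small sketch of (20) under the same array encoding (names ours); the normalization \sum_{i} v_{i} = 1 then simply fixes the scale of \bm{v}:

import numpy as np

def outside_weights(v, o, in_set, r, alpha):
    # v_{0t} implied by the per-period market share constraint, cf. (20);
    # in_set[i, t] is True when product i belongs to the product set I_t.
    sv = o.T @ v                               # sum_{i in S_t} v_i * o_it
    iv = in_set.astype(float).T @ v            # sum_{i in I_t} v_i
    return r * ((1.0 - alpha) * iv + alpha * sv)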

4.1 Solution using MM algorithm

In this section we present a solution to the optimization problem (15) with constraint (19) using the MM algorithm, building on Abdallah and Vulcano (2016). The idea behind MM algorithms is to find a surrogate function that minorizes the original objective function, maximize the surrogate function, and continue this iteratively. For more information on the MM algorithm, in general, see Hunter and Lange (2000a, 2004).

After removing \bm{\lambda} from the problem using the KKT conditions, we arrive at (4). If we rearrange the last term of the objective to aid the minorization, the optimization problem becomes

max𝒗\displaystyle\max_{\bm{v}} i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)+t(𝒗)Ltv0tv0t+iStvioit+C3((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)+\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{v_{0t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v})\right)
s.t.
t=1T[(1α)iItvi+αiStvioit]=s~t=1Tv0t\displaystyle\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]=\tilde{s}\sum_{t=1}^{T}v_{0t}

where the v_{0t} are known. A common technique to find a minorizer is to use supporting hyperplanes (Hunter and Lange, 2004). Since the second, third, and fourth terms in (4.1) are convex, we can use first-order Taylor approximations of the convex functions -\log(x) and \frac{1}{x}; that is, for all x,y>0

log(y)\displaystyle-\log(y) \displaystyle\geq log(x)1x(yx)\displaystyle-\log(x)-\frac{1}{x}(y-x)
1y\displaystyle\frac{1}{y} \displaystyle\geq 1x1x2(yx)\displaystyle\frac{1}{x}-\frac{1}{x^{2}}(y-x)

In our specific case we will use

log(y)\displaystyle-\log(y) \displaystyle\geq 1log(x)y/x\displaystyle 1-\log(x)-y/x
1y\displaystyle\frac{1}{y} \displaystyle\geq yx2+2x\displaystyle-\frac{y}{x^{2}}+\frac{2}{x}

with y=\sum_{i\in S_{t}}v_{i}\cdot o_{it} and x=\sum_{i\in S_{t}}v^{(k)}_{i}\cdot o_{it}, or y=v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it} and x=v_{0t}+\sum_{i\in S_{t}}v^{(k)}_{i}\cdot o_{it}, where \bm{v}^{(k)} is the value of \bm{v} at iteration k. Therefore, the minorizer function becomes

g(𝒗|𝒗(k))\displaystyle g(\bm{v}|\bm{v}^{(k)}) =\displaystyle= i=1nKilog(vi)t(𝒗(k))mtiStvioitiStvi(k)oit\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}
t(𝒗(k))mtiStvioitv0t+iStvi(k)oitt(𝒗(k))Ltv0t(iStvioit)(v0t+iStvi(k)oit)2+C0\displaystyle-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}\frac{L_{t}v_{0t}(\sum_{i\in S_{t}}v_{i}\cdot o_{it})}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}+C_{0}

where C_{0} contains the constant terms independent of \bm{v}. Although the minorizer g(\bm{v}|\bm{v}^{(k)}) is developed locally for a fixed \bm{v}^{(k)}, it can be shown to be globally dominated by the Lagrangian {\mathcal{L}}(\bm{v},\bm{\lambda},\bm{\mu}). Computational results show that it is actually a fairly tight minorizer as well. We now need to solve an optimization problem with a single market share constraint:

max𝒗\displaystyle\max_{\bm{v}} i=1nKilog(vi)t(𝒗(k))mtiStvioitiStvi(k)oit\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}
t(𝒗(k))mtiStvioitv0t+iStvi(k)oitt(𝒗(k))Ltv0t(iStvioit)(v0t+iStvi(k)oit)2\displaystyle-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}\frac{L_{t}v_{0t}(\sum_{i\in S_{t}}v_{i}\cdot o_{it})}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}
s.t.
t=1T[(1α)iItvi+αiStvioit]=s~t=1Tv0t\displaystyle\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]=\tilde{s}\sum_{t=1}^{T}v_{0t}

Since the objective function is concave, the equality constraint is linear, and we are maximizing, there exists a single scalar η such that the first-order optimality condition holds for the Lagrangian

(𝒗,η)=g(𝒗|𝒗(𝒌))η(t=1T[(1α)iItvi+αiStvioit]s~t=1Tv0t)\displaystyle\mathcal{L}(\bm{v},\eta)=g(\bm{v}|\bm{v^{(k)}})-\eta\left(\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]-\tilde{s}\sum_{t=1}^{T}v_{0t}\right)

The first order optimality condition is

Kj\displaystyle K_{j} =\displaystyle= vj(t(𝒗(k))mt𝟙(jSt)ojtiStvi(k)oit+t(𝒗(k))mt𝟙(jSt)ojtv0t+iStvi(k)oit+\displaystyle v_{j}\left(\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\right.
t(𝒗(k))Ltv0t𝟙(jSt)ojt(v0t+iStvi(k)oit)2+η[(1α)t=1T𝟙(jIt)+αt=1T𝟙(jSt)ojt])\displaystyle\left.\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{v_{0t}\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}+\eta\left[(1-\alpha)\sum_{t=1}^{T}\mathbbm{1}(j\in I_{t})+\alpha\sum_{t=1}^{T}\mathbbm{1}(j\in S_{t})\cdot o_{jt}\right]\right)

which leads to the MM update of \bm{v} summarized in Algorithm 3. In the update step we use Newton’s method to find η in every MM iteration, summarized in Algorithm 4. For more details, please see the Appendix.

Algorithm 3 MM algorithm to estimate (𝝀,𝒗)(\bm{\lambda},\bm{v}) in (15) with market share constraint (19)
1:Input: {(𝒛t,𝒐t,St,It,v0t,Lt)}t=1T\left\{\left(\bm{z}_{t},\bm{o}_{t},S_{t},I_{t},v_{0t},L_{t}\right)\right\}_{t=1}^{T}, ss, α\alpha
2:Let Kj=t=1TzjtK_{j}=\sum_{t=1}^{T}z_{jt}, nj=t=1T𝟙(jIt)n_{j}=\sum_{t=1}^{T}\mathbbm{1}(j\in I_{t}) and oj=t=1Tojt,j=1,,no_{j}=\sum_{t=1}^{T}o_{jt},~{}j=1,\ldots,n
3:Let mt=j=1nzjt,t=1,,Tm_{t}=\sum_{j=1}^{n}z_{jt},~{}t=1,\ldots,T and s~=s/(1s)\tilde{s}=s/(1-s)
4:Let k=0k=0, initialize 𝒗(0)\bm{v}^{(0)}
5:while Stopping criteria is not satisfied do
6:    (𝒗(k))={t|Lt<mtv0t+iStvi(k)oitiStvi(k)oit}\mathcal{B}(\bm{v}^{(k)})=\left\{t\bigg{|}L_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right\}
7:    for j=1,,nj=1,\ldots,n do
8:        Aj=t(𝒗(k))mt𝟙(jSt)ojtiStvi(k)oit+t(𝒗(k))mt𝟙(jSt)ojtv0t+iStvi(k)oit+t(𝒗(k))Ltv0t𝟙(jSt)ojt(v0t+iStvi(k)oit)2A_{j}=\sum_{t\notin\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{v_{0t}\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}
9:    end for
10:    Find η\eta using Algorithm 4
11:    for j=1,,nj=1,\ldots,n do
12:        vj(k+1)=KjAj+η[(1α)nj+αoj]v_{j}^{(k+1)}=\frac{K_{j}}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}
13:    end for
14:    k=k+1k=k+1
15:end while
16:for t=1,,Tt=1,\ldots,T do
17:    λt=min(Lt,mtv0t+iStvi(k)oitiStvi(k)oit)\lambda_{t}=\min\left(L_{t},m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right)
18:end for
Algorithm 4 Newton’s method to find η\eta
1:Let k=0k=0, η0=0\eta^{0}=0, and f0=1f^{0}=1
2:while fk>ϵf^{k}>\epsilon do
3:    fk=j=1nKj[(1α)nj+αoj]Aj+ηk[(1α)nj+αoj]s~t=1Tv0tf^{k}=\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}{A_{j}+\eta^{k}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}-\tilde{s}\sum_{t=1}^{T}v_{0t}
4:    gk=j=1nKj[(1α)nj+αoj]2(Aj+ηk[(1α)nj+αoj])2g^{k}=-\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]^{2}}{\left(A_{j}+\eta^{k}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]\right)^{2}}
5:    ηk+1=ηkfkgk\eta^{k+1}=\eta^{k}-\frac{f^{k}}{g^{k}}
6:    k=k+1k=k+1
7:end while
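
To make the update steps concrete, below is a minimal NumPy sketch of Algorithms 3 and 4 under the same array encoding as the earlier sketches. The function and variable names, the simplistic stopping rules, and the assumptions that every period offers at least one product and that the update denominators stay positive are ours.

import numpy as np

def mm_estimate(z, o, in_set, v0, L, s, alpha, iters=500, tol=1e-10):
    # z, o, in_set: (n, T) arrays; v0, L: (T,) arrays; s: market share; alpha in [0, 1].
    n, T = z.shape
    K = z.sum(axis=1)                          # K_j, total sales of product j
    nj = in_set.sum(axis=1)                    # n_j, periods with j in I_t
    oj = o.sum(axis=1)                         # o_j, cumulative availability of j
    m = z.sum(axis=0)                          # m_t
    s_tilde = s / (1.0 - s)
    w = (1.0 - alpha) * nj + alpha * oj        # multiplier of eta for each product
    v = np.ones(n)
    for _ in range(iters):
        sv = o.T @ v                           # sum_{i in S_t} v_i * o_it
        binding = L < m * (v0 + sv) / sv       # B(v^(k)), step 6
        # A_j of step 8 (non-binding terms, binding terms, bound terms)
        A = (o @ (m / np.where(binding, np.inf, sv))
             + o @ (m / np.where(binding, v0 + sv, np.inf))
             + o @ np.where(binding, L * v0 / (v0 + sv) ** 2, 0.0))
        eta = 0.0                              # Newton's method for eta (Algorithm 4)
        for _ in range(100):
            f = np.sum(K * w / (A + eta * w)) - s_tilde * v0.sum()
            if abs(f) < 1e-12:
                break
            g = -np.sum(K * w ** 2 / (A + eta * w) ** 2)
            eta -= f / g
        v_new = K / (A + eta * w)              # step 12
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    sv = o.T @ v
    lam = np.minimum(L, m * (v0 + sv) / sv)    # steps 16-18
    return v, lam

Each MM iteration needs only O(nT) work plus the scalar Newton solve for η.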

4.2 Solution using Frank-Wolfe method

In this section we present a solution to the optimization problem (15) with constraint (20) using the Frank-Wolfe algorithm. The Frank-Wolfe, or conditional gradient, method is an iterative algorithm for constrained convex optimization: in each iteration it considers a linear approximation of the objective function and moves towards an optimizer of this linear function. For more information on the Frank-Wolfe algorithm, see Frank and Wolfe (1956) and Bertsekas (1999).

After removing \bm{\lambda} from the problem using the KKT conditions, we arrive at (4). Therefore, we need to solve the constrained optimization problem

max𝒗\displaystyle\max_{\bm{v}} i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit+C3((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v})\right)
s.t. (24)
i=1nvi=1\displaystyle\sum_{i=1}^{n}v_{i}=1

where we plug in v0t=r[(1α)iItvi+αiStvioit]v_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right] to include the market share constraints into the objective function.

The first step of the Frank-Wolfe algorithm is the direction finding subproblem, which becomes

max𝒚\displaystyle\max_{\bm{y}} f(𝒗(k))T𝒚\displaystyle\nabla f\left(\bm{v}^{(k)}\right)^{T}\bm{y} (25)
s.t.
i=1nyi=1\displaystyle\sum_{i=1}^{n}y_{i}=1

where f(𝒗(k))\nabla f\left(\bm{v}^{(k)}\right) is the gradient vector of the objective function (4.2) evaluated at the solution of iteration kk. The elements of f(𝒗)\nabla f\left(\bm{v}\right) are calculated as

fvj=\displaystyle\frac{\partial f}{\partial v_{j}}= Kjvjt(𝒗)mt𝟙(jSt)ojtiStvioit\displaystyle\frac{K_{j}}{v_{j}}-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-
t(𝒗)mt𝟙(jSt)ojt(rα+1)v0t+iStvioitt(𝒗)Lt𝟙(jSt)ojtr(1α)(iItvi)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}(r\alpha+1)}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}r(1-\alpha)(\sum_{i\in I_{t}}v_{i})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}-
t(𝒗)mt𝟙(jIt)r(1α)v0t+iStvioit+t(𝒗)Lt𝟙(jIt)r(1α)(iStvioit)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(j\in I_{t})r(1-\alpha)}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(j\in I_{t})r(1-\alpha)(\sum_{i\in S_{t}}v_{i}\cdot o_{it})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}
where
v0t=r[(1α)iItvi+αiStvioit]\displaystyle v_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]

For the detailed derivation and a computational formula, please see the Appendix. The direction-finding subproblem in (25) is a fractional knapsack problem (Korte and Vygen, 2012), whose solution is

yi={1,if i=argmax{fvi|𝒗=𝒗(k),i=1,,n}0,otherwise\displaystyle y_{i}=\begin{cases}1,&\text{if }i=\arg\max\left\{\left.\frac{\partial f}{\partial v_{i}}\right|_{\bm{v}=\bm{v}^{(k)}},~{}i=1,\ldots,n\right\}\\ 0,&\text{otherwise}\end{cases}

We take a step in the direction of the maximum element of the gradient, only changing that variable in the current step. The Frank-Wolfe algorithm reduces to a coordinate descent algorithm, successively maximizing along coordinate directions determined by the largest value of the gradient vector. The update step of Frank-Wolfe is

𝒗(k+1)=𝒗(k)+γk(𝒚𝒗(k))\displaystyle\bm{v}^{(k+1)}=\bm{v}^{(k)}+\gamma_{k}\left(\bm{y}-\bm{v}^{(k)}\right)

which simplifies to

vi(k+1)={γk+(1γk)vi(k),if i=argmax{fvi|𝒗=𝒗(k),i=1,,n}vi(k),otherwise\displaystyle v_{i}^{(k+1)}=\begin{cases}\gamma_{k}+(1-\gamma_{k})v_{i}^{(k)},&\text{if }i=\arg\max\left\{\left.\frac{\partial f}{\partial v_{i}}\right|_{\bm{v}=\bm{v}^{(k)}},~{}i=1,\ldots,n\right\}\\ v_{i}^{(k)},&\text{otherwise}\end{cases}

For step size γ_k we can use the default choice γ_k=2/(k+2) or perform a line search to find the γ_k that maximizes f\left(\bm{v}^{(k)}+\gamma_{k}\left(\bm{y}-\bm{v}^{(k)}\right)\right) subject to 0\leq\gamma_{k}\leq 1. In practice, we implemented a backtracking line search using Armijo’s rule (Nocedal and Wright, 2006). The Frank-Wolfe algorithm is summarized in Algorithm 5, and the backtracking line search in Algorithm 6.

Algorithm 5 Frank-Wolfe algorithm to estimate (𝝀,𝒗)(\bm{\lambda},\bm{v}) in (15) with constraint (20)
1:Input: {(𝒛t,𝒐t,St,It,Lt)}t=1T\left\{\left(\bm{z}_{t},\bm{o}_{t},S_{t},I_{t},L_{t}\right)\right\}_{t=1}^{T}, ss, α\alpha
2:Let Kj=t=1Tzjt,j=1,,nK_{j}=\sum_{t=1}^{T}z_{jt},~{}j=1,\ldots,n
3:Let mt=j=1nzjt,t=1,,Tm_{t}=\sum_{j=1}^{n}z_{jt},~{}t=1,\ldots,T and r=(1s)/sr=(1-s)/s
4:Let k=0k=0, initialize 𝒗(0)\bm{v}^{(0)}
5:while Stopping criteria is not satisfied do
6:    for t=1,,Tt=1,\ldots,T do
7:        v0t(k)=r[(1α)iItvi(k)+αiStvi(k)oit]v_{0t}^{(k)}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}^{(k)}+\alpha\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right]
8:    end for
9:    (𝒗(k))={t|Lt<mtv0t(k)+iStvi(k)oitiStvi(k)oit}\mathcal{B}(\bm{v}^{(k)})=\left\{t\bigg{|}L_{t}<m_{t}\frac{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right\}
10:    for j=1,,nj=1,\ldots,n do
11:        
gj=\displaystyle g_{j}= Kjvj(k)t(𝒗(k))mt𝟙(jSt)ojtiStvi(k)oit\displaystyle\frac{K_{j}}{v_{j}^{(k)}}-\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}-
t(𝒗(k))mt𝟙(jSt)ojt(rα+1)v0t(k)+iStvi(k)oitt(𝒗(k))Lt𝟙(jSt)ojtr(1α)(iItvi(k))(v0t(k)+iStvi(k)oit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}(r\alpha+1)}{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}r(1-\alpha)\left(\sum_{i\in I_{t}}v_{i}^{(k)}\right)}{\left(v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}-
t(𝒗(k))mt𝟙(jIt)r(1α)v0t(k)+iStvi(k)oit+t(𝒗(k))Lt𝟙(jIt)r(1α)(iStvi(k)oit)(v0t(k)+iStvi(k)oit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in I_{t})r(1-\alpha)}{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{\mathbbm{1}(j\in I_{t})r(1-\alpha)\left(\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)}{\left(v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}
12:    end for
13:    l=argmax{gj,j=1,,n}l=\arg\max\left\{g_{j},~{}j=1,\ldots,n\right\} \triangleright Find direction
14:    𝒚=𝒆l\bm{y}=\bm{e}_{l} \triangleright 𝒆i\bm{e}_{i} is iith unit vector
15:    Use γk=2/(k+2)\gamma_{k}=2/(k+2) or find γk\gamma_{k} using Algorithm 6 \triangleright Find step size
16:    𝒗(k+1)=𝒗(k)+γk(𝒚𝒗(k))\bm{v}^{(k+1)}=\bm{v}^{(k)}+\gamma_{k}\left(\bm{y}-\bm{v}^{(k)}\right) \triangleright Frank-Wolfe update
17:    k=k+1k=k+1
18:end while
19:for t=1,,Tt=1,\ldots,T do
20:    v0t(k)=r[(1α)iItvi(k)+αiStvi(k)oit]v_{0t}^{(k)}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}^{(k)}+\alpha\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right]
21:    λt=min(Lt,mtv0t(k)+iStvi(k)oitiStvi(k)oit)\lambda_{t}=\min\left(L_{t},m_{t}\frac{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right)
22:end for
Algorithm 6 Backtracking linesearch by Armijo’s rule to find step size
1:Initialize β\beta, τ\tau, γ(0)\gamma^{(0)} \triangleright e.g. β=0.001\beta=0.001, τ=0.5\tau=0.5, γ(0)=1\gamma^{(0)}=1
2:Let h=0h=0, a=1a=1, b=2b=2
3:Let \bm{s}=\bm{y}-\bm{v}^{(k)}
4:Let
fv=\displaystyle f_{v}= i=1nKilog(vi(k))t(𝒗(k))mtlog(iStvi(k)oit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v^{(k)}_{i})-\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)-
t(𝒗(k))mtlog(v0t(k)+iStvi(k)oit)t(𝒗(k))LtiStvi(k)oitv0t(k)+iStvi(k)oit+C3((𝒗(k)))\displaystyle\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\log\left({v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v}^{(k)})\right)
5:Let g_{vs}=\sum_{i=1}^{n}g_{i}s_{i}
6:while a<ba<b do
7:    \bm{w}=\bm{v}^{(k)}+\gamma^{(h)}\bm{s}
8:    w0t=r[(1α)iItwi+αiStwioit],t=1,,Tw_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}w_{i}+\alpha\sum_{i\in S_{t}}w_{i}\cdot o_{it}\right],~{}t=1,\ldots,T
9:    
a\displaystyle a =\displaystyle= i=1nKilog(wi)t(𝒘)mtlog(iStwioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(w_{i})-\sum_{t\not\in\mathcal{B}(\bm{w})}m_{t}\log\left(\sum_{i\in S_{t}}w_{i}\cdot o_{it}\right)-
t(𝒘)mtlog(w0t+iStwioit)t(𝒘)LtiStwioitw0t+iStwioit+C3((𝒘))\displaystyle\sum_{t\in\mathcal{B}(\bm{w})}m_{t}\log\left({w_{0t}+\sum_{i\in S_{t}}w_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{w})}L_{t}\frac{\sum_{i\in S_{t}}w_{i}\cdot o_{it}}{w_{0t}+\sum_{i\in S_{t}}w_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{w})\right)
10:    b=f_{v}+\gamma^{(h)}\beta g_{vs}
11:    γ(h+1)=τγ(h)\gamma^{(h+1)}=\tau\gamma^{(h)}
12:    h=h+1h=h+1
13:end while
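
A compact sketch of Algorithm 5, again under the array encoding of the earlier sketches and with names of our choosing, is given below. For simplicity it uses an open-loop step size; we damp the default 2/(k+2) slightly (to 2/(k+3)) so that \bm{v} stays strictly positive, since K_j/v_j appears in the gradient. The Armijo backtracking of Algorithm 6 can be substituted for that line.

import numpy as np

def fw_estimate(z, o, in_set, L, s, alpha, iters=2000):
    # z, o, in_set: (n, T) arrays; L: (T,) bounds; s: market share; alpha in [0, 1].
    n, T = z.shape
    K = z.sum(axis=1)
    m = z.sum(axis=0)
    r = (1.0 - s) / s
    Imat = in_set.astype(float)
    v = np.full(n, 1.0 / n)                      # start in the interior of the simplex
    for k in range(iters):
        sv = o.T @ v                             # sum_{i in S_t} v_i * o_it
        iv = Imat.T @ v                          # sum_{i in I_t} v_i
        v0 = r * ((1.0 - alpha) * iv + alpha * sv)     # step 7
        b = (L < m * (v0 + sv) / sv).astype(float)     # step 9, binding indicator
        d = v0 + sv
        grad = (K / v                                  # step 11, gradient elements
                - o @ (m / sv * (1.0 - b))
                - o @ ((r * alpha + 1.0) * m / d * b)
                - o @ (r * (1.0 - alpha) * L * iv / d ** 2 * b)
                - Imat @ (r * (1.0 - alpha) * m / d * b)
                + Imat @ (r * (1.0 - alpha) * L * sv / d ** 2 * b))
        l = int(np.argmax(grad))                 # step 13, direction finding
        gamma = 2.0 / (k + 3.0)                  # damped open-loop step size (see note above)
        y = np.zeros(n); y[l] = 1.0              # step 14
        v = v + gamma * (y - v)                  # step 16, Frank-Wolfe update
    sv = o.T @ v
    iv = Imat.T @ v
    v0 = r * ((1.0 - alpha) * iv + alpha * sv)
    lam = np.minimum(L, m * (v0 + sv) / sv)      # steps 19-22
    return v, lam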

4.3 Example: non-homogeneous product set with constraint

To demonstrate Algorithms 3 and 5, let us revisit the example presented in Table 8 by adding constraints to the problem. We constrain the arrival rates to be at most twice the observed sales, that is, L_t=2m_t. We use α=0, assuming that the OA product is always available, and set v_{0t}=1 for the MM algorithm. Note that o_{it}=1 for i∈S_t, since we have fully open and closed assortments. The results, using the MM algorithm, are presented in Table 13.
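
Assuming the sales, availability, and product-set arrays for this example have been assembled as in the earlier sketches (z, o, in_set, and the market share s), the constrained runs can be reproduced along the following lines, where mm_estimate and fw_estimate refer to the hypothetical sketches given above:

import numpy as np
L = 2.0 * z.sum(axis=0)                       # L_t = 2 m_t, cap at twice the observed sales
T = z.shape[1]
v_mm, lam_mm = mm_estimate(z, o, in_set, v0=np.ones(T), L=L, s=s, alpha=0.0)
v_fw, lam_fw = fw_estimate(z, o, in_set, L=L, s=s, alpha=0.0)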

Estimates (rows: product; columns: selling period; last column: preference weight v_i)
Period 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
flt1-prod1 10.41 11.45 9.37 11.80 12.47 10.06 8.74 7.29 9.72 5.83 4.37 6.80 0.97 1.46 1.46 1.000
flt1-prod2 9.40 10.34 8.46 10.65 11.26 9.08 7.89 6.58 8.77 5.26 3.95 6.14 0.88 1.32 1.32 0.903
flt1-prod3 5.11 5.62 4.60 5.79 6.12 4.94 4.29 3.58 4.77 2.86 2.15 3.34 0.48 0.72 0.72 0.491
flt1-prod4 3.71 4.08 3.33 4.20 4.44 3.58 3.11 2.59 3.46 2.07 1.56 2.42 0.35 0.52 0.52 0.356
flt1-prod5 1.38 1.52 1.24 1.56 1.65 1.33 1.16 0.97 1.29 0.77 0.58 0.90 0.13 0.19 0.19 0.133
flt2-prod1 1.000
flt2-prod2 0.903
flt2-prod3 0.491
flt2-prod4 0.356
flt2-prod5 0.133
flt3-prod1 20.82 22.90 18.74 23.59 24.94 20.11 17.49 14.57 19.43 11.66 8.74 13.60 1.94 2.91 2.91 2.000
flt3-prod2 18.79 20.67 16.91 21.30 22.52 18.16 15.79 13.16 17.54 10.52 7.89 12.28 1.75 2.63 2.63 1.806
flt3-prod3 10.22 11.24 9.20 11.58 12.24 9.87 8.58 7.15 9.54 5.72 4.29 6.68 0.95 1.43 1.43 0.982
flt3-prod4 7.41 8.15 6.67 8.40 8.88 7.16 6.22 5.19 6.92 4.15 3.11 4.84 0.69 1.04 1.04 0.712
flt3-prod5 2.76 3.03 2.48 3.13 3.31 2.67 2.32 1.93 2.57 1.54 1.16 1.80 0.26 0.39 0.39 0.265
λt\mathbf{\lambda}_{t} 128.57 141.43 115.71 145.71 154.04 124.22 108.00 90.00 120.00 72.00 54.00 84.00 12.00 18.00 18.00
Period 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
flt1-prod1 1.000
flt1-prod2 0.903
flt1-prod3 0.491
flt1-prod4 0.356
flt1-prod5 0.133
flt2-prod1 10.41 11.45 9.37 11.80 12.47 10.06 8.74 7.29 9.72 5.83 4.37 6.80 0.97 1.46 1.46 1.000
flt2-prod2 9.40 10.34 8.46 10.65 11.26 9.08 7.89 6.58 8.77 5.26 3.95 6.14 0.88 1.32 1.32 0.903
flt2-prod3 5.11 5.62 4.60 5.79 6.12 4.94 4.29 3.58 4.77 2.86 2.15 3.34 0.48 0.72 0.72 0.491
flt2-prod4 3.71 4.08 3.33 4.20 4.44 3.58 3.11 2.59 3.46 2.07 1.56 2.42 0.35 0.52 0.52 0.356
flt2-prod5 1.38 1.52 1.24 1.56 1.65 1.33 1.16 0.97 1.29 0.77 0.58 0.90 0.13 0.19 0.19 0.133
flt3-prod1 20.82 22.90 18.74 23.59 24.94 20.11 17.49 14.57 19.43 11.66 8.74 13.60 1.94 2.91 2.91 2.000
flt3-prod2 18.79 20.67 16.91 21.30 22.52 18.16 15.79 13.16 17.54 10.52 7.89 12.28 1.75 2.63 2.63 1.806
flt3-prod3 10.22 11.24 9.20 11.58 12.24 9.87 8.58 7.15 9.54 5.72 4.29 6.68 0.95 1.43 1.43 0.982
flt3-prod4 7.41 8.15 6.67 8.40 8.88 7.16 6.22 5.19 6.92 4.15 3.11 4.84 0.69 1.04 1.04 0.712
flt3-prod5 2.76 3.03 2.48 3.13 3.31 2.67 2.32 1.93 2.57 1.54 1.16 1.80 0.26 0.39 0.39 0.265
λt\mathbf{\lambda}_{t} 128.57 141.43 115.71 145.71 154.04 124.22 108.00 90.00 120.00 72.00 54.00 84.00 12.00 18.00 18.00
Table 13: Schedule change example with constraint: estimated demand and parameters using MM algorithm

We observe the expected symmetry in the results: the estimated demands and preference weights are the same for flights 1 and 2, and twice as large for flight 3. We can see that the estimated value of λ_t equals the upper bound L_t for time periods 7-15 and 22-30. Using the Frank-Wolfe algorithm, we converge to the same solution. Note that Frank-Wolfe solves a problem with different constraints, but for this symmetric example, and in the case α=0, the two formulations are equivalent. It is also interesting to note that using a built-in nonlinear optimization routine in R (sequential quadratic programming), we can solve the problem in 141 iterations, taking 2.53 seconds, converging to the same solution. The Frank-Wolfe algorithm with Armijo’s rule converges in 267 iterations taking 0.12 seconds, and the MM algorithm converges in 11 iterations taking only 0.006 seconds. The convergence of the Frank–Wolfe algorithm is sublinear in general, and using the default step size γ_k=2/(k+2) results in much slower convergence. Note, however, that the Frank-Wolfe algorithm solves an optimization problem with the market share constraint at each time period t substituted into the objective function. For this problem we were not able to develop an MM algorithm, due to the difficulty of finding a suitable minorizer. The formulation solved by the MM algorithm uses an aggregate market share constraint, where we assume that the v_{0t} are known. The two solutions are not equivalent in general.

5 Conclusions and Future Research

In this paper we discussed some of the practical limitations of the standard choice-based demand models used in the literature to estimate demand from sales transaction data. We presented modifications and extensions of the models and discussed data preprocessing and solution techniques which can be useful for practitioners dealing with sales transaction data. We hope that these discussions can facilitate further methodological progress and more rigorous theoretical research in this domain. We presented an algorithm to split sales transaction data observed under partial availability, and we extended an EM algorithm for the case where we observe a non-homogeneous product set. We developed two iterative optimization algorithms which incorporate partial availability information, a non-homogeneous product set, the ability to control the availability of the outside alternative, and an upper bound on the arrival rates. In one formulation we used a market share constraint at each time period and incorporated it into the objective function through the preference weights of the outside alternative. This formulation was solved using the Frank-Wolfe algorithm, leading to a simple coordinate descent algorithm. We discussed another formulation, which uses a single, aggregate market share constraint over the time horizon and assumes knowledge of the preference weights of the outside alternative. Using this formulation we could develop a very fast iterative minorization-maximization algorithm building on the work of Abdallah and Vulcano (2016).

Future extensions of these methods are possible. For instance, after using the sales splitting algorithm, it would be interesting to group the arrival rates in the EM algorithm and avoid having too many parameters to estimate. Similarly, it would be of practical interest to extend the EM algorithm to the constrained optimization case or to introduce other regularization on the parameters. The MM and Frank-Wolfe algorithms developed in this paper could be extended to include covariates and additional preference weights for the product with the lowest fare. The Frank-Wolfe algorithm could be further improved by taking the gradient descent direction instead of the coordinate descent direction. Finally, in practical applications we often encounter sparse demand distributions with an excess number of zeros and heavy tails. A natural extension of the currently used models would be to use a zero-inflated Poisson or negative binomial distribution to describe the customer arrival process.

6 Appendix

Using KKT conditions to remove 𝝀\bm{\lambda}

The KKT conditions for the Lagrangian (16) are

(𝒗,𝝀,𝝁)λt=mtλtμt+iStvioitv0t+iStvioit=0,t=1,,T\displaystyle\frac{\partial{\mathcal{L}}(\bm{v},\bm{\lambda},\bm{\mu})}{\partial\lambda_{t}}=\frac{m_{t}}{\lambda_{t}}-\frac{\mu_{t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}=0,~{}t=1,\ldots,T
μt(λtLt)=0,t=1,,T\displaystyle\mu_{t}(\lambda_{t}-L_{t})=0,~{}t=1,\ldots,T
μt0,t=1,,T\displaystyle\mu_{t}\geq 0,~{}t=1,\ldots,T
λtLt,t=1,,T\displaystyle\lambda_{t}\leq L_{t},~{}t=1,\ldots,T

Expressing \lambda_{t} and \mu_{t} from the first equation results in

λt\displaystyle\lambda_{t} =\displaystyle= mtv0t+iStvioitμt+iStvioit\displaystyle m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\mu_{t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}
μt\displaystyle\mu_{t} =\displaystyle= mtλt(v0t+iStvioit)iStvioit\displaystyle\frac{m_{t}}{\lambda_{t}}(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})-{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}

Plugging back μt\mu_{t}, the complementary slackness condition becomes

(mtλt(v0t+iStvioit)iStvioit)(λtLt)=0\displaystyle\left(\frac{m_{t}}{\lambda_{t}}(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})-{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)(\lambda_{t}-L_{t})=0

Therefore, we have that one of the following conditions should hold:

λt1\displaystyle\lambda_{t}^{1} =\displaystyle= Lt\displaystyle L_{t}
λt2\displaystyle\lambda_{t}^{2} =\displaystyle= mtv0t+iStvioitiStvioit\displaystyle m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}

So, the partial KKT condition stated earlier reduces to the following conditions:
If Lt<mtv0t+iStvioitiStvioitL_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}} then

λt\displaystyle\lambda_{t} =\displaystyle= Lt\displaystyle L_{t}
μt\displaystyle\mu_{t} =\displaystyle= mtLt(v0t+iStvioit)iStvioit\displaystyle\frac{m_{t}}{L_{t}}(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})-{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}

else

λt\displaystyle\lambda_{t} =\displaystyle= mtv0t+iStvioitiStvioit\displaystyle m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}
μt\displaystyle\mu_{t} =\displaystyle= 0\displaystyle 0

Let us define \mathcal{B}(\bm{v})=\left\{t~\bigg|~L_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right\}. Then the Lagrangian function can be simplified to

(𝒗,𝝀,𝝁)\displaystyle{\mathcal{L}}(\bm{v},\bm{\lambda},\bm{\mu}) =\displaystyle= lI(𝒗,𝝀)t=1T(μtv0t+iStvioit)(λtLt)\displaystyle l_{I}(\bm{v},\bm{\lambda})-\sum_{t=1}^{T}\left(\frac{\mu_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)(\lambda_{t}-L_{t})
=\displaystyle= t=1TiStzitlog(vioit)t(𝒗)mtlog(iStvioit)+\displaystyle\sum_{t=1}^{T}\sum_{i\in S_{t}}z_{it}\log(v_{i}\cdot o_{it})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)+
t(𝒗)mtlog(Ltv0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit+C1((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left(\frac{L_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{1}\left(\mathcal{B}(\bm{v})\right)
=\displaystyle= t=1TiStzitlog(vioit)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{t=1}^{T}\sum_{i\in S_{t}}z_{it}\log(v_{i}\cdot o_{it})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit+C2((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{2}\left(\mathcal{B}(\bm{v})\right)
=\displaystyle= i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}
=\displaystyle= i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)+t(𝒗)Ltv0tv0t+iStvioit+C3((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)+\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{v_{0t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v})\right)

where

C1((𝒗))\displaystyle C_{1}\left(\mathcal{B}(\bm{v})\right) =\displaystyle= t(𝒗)(mtlogmtmt)\displaystyle\sum_{t\not\in\mathcal{B}(\boldsymbol{v})}\left(m_{t}\log m_{t}-m_{t}\right)
C2((𝒗))\displaystyle C_{2}\left(\mathcal{B}(\bm{v})\right) =\displaystyle= t(𝒗)(mtlogmtmt)+t(𝒗)mtlogLt\displaystyle\sum_{t\not\in\mathcal{B}(\boldsymbol{v})}\left(m_{t}\log m_{t}-m_{t}\right)+\sum_{t\in\mathcal{B}(\boldsymbol{v})}m_{t}\log L_{t}
C3((𝒗))\displaystyle C_{3}\left(\mathcal{B}(\bm{v})\right) =\displaystyle= t(𝒗)(mtlogmtmt)+t(𝒗)(mtlogLtLt).\displaystyle\sum_{t\not\in\mathcal{B}(\boldsymbol{v})}\left(m_{t}\log m_{t}-m_{t}\right)+\sum_{t\in\mathcal{B}(\boldsymbol{v})}\left(m_{t}\log L_{t}-L_{t}\right). (26)

Note that

t=1TiStzitlog(vioit)=t=1TiStzitlog(vi)+C4=i=1nKilog(vi)+C4\sum_{t=1}^{T}\sum_{i\in S_{t}}z_{it}\log(v_{i}\cdot o_{it})=\sum_{t=1}^{T}\sum_{i\in S_{t}}z_{it}\log(v_{i})+C_{4}=\sum_{i=1}^{n}K_{i}\log(v_{i})+C_{4}

for some constant C4C_{4}, because zit=0z_{it}=0 when iSti\notin S_{t}.
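
The KKT case split above (λ_t = L_t when the bound binds, the unconstrained maximizer otherwise) translates directly into a per-period computation; a small sketch, with function and argument names of our choosing:

def arrival_rate_and_multiplier(m_t, L_t, v0_t, sv_t):
    # sv_t = sum_{i in S_t} v_i * o_it for the period in question.
    lam_unconstrained = m_t * (v0_t + sv_t) / sv_t
    if L_t < lam_unconstrained:               # t in B(v): the bound binds
        return L_t, m_t / L_t * (v0_t + sv_t) - sv_t
    return lam_unconstrained, 0.0             # bound is slack, multiplier is zero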

Newton’s method to find η\eta

To simplify notation, let us define

A_{j} = \sum_{t\not\in\mathcal{B}_{j}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}_{j}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+
\sum_{t\in\mathcal{B}_{j}(\bm{v}^{(k)})}L_{t}\frac{v_{0t}\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}
nj\displaystyle n_{j} =\displaystyle= t=1T𝟙(jIt)\displaystyle\sum_{t=1}^{T}\mathbbm{1}(j\in I_{t})
oj\displaystyle o_{j} =\displaystyle= t=1T𝟙(jSt)ojt=t=1Tojt\displaystyle\sum_{t=1}^{T}\mathbbm{1}(j\in S_{t})\cdot o_{jt}=\sum_{t=1}^{T}o_{jt}

where \mathcal{B}_{j}(\bm{v}^{(k)})=\mathcal{B}(\bm{v}^{(k)})\cap\{t~|~j\in S_{t}\}, n_{j} is the number of time periods in which product j belongs to the product set I_{t} over the time horizon T, and o_{j} is the cumulative availability of product j (the sum of its per-period availability fractions o_{jt}) over the time horizon T. Using the notation above, equation (4.1) simplifies to

Kj\displaystyle K_{j} =\displaystyle= vjAj+vjη[(1α)nj+αoj]\displaystyle v_{j}A_{j}+v_{j}\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]

Expressing v_{j} from the above equation and plugging it into the market share constraint (19) leads to

(1α)t=1TjItKjAj+η[(1α)nj+αoj]+αt=1TjStKjojtAj+η[(1α)nj+αoj]=s~t=1Tv0t\displaystyle(1-\alpha)\sum_{t=1}^{T}\sum_{j\in I_{t}}\frac{K_{j}}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}+\alpha\sum_{t=1}^{T}\sum_{j\in S_{t}}\frac{K_{j}\cdot o_{jt}}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}=\tilde{s}\sum_{t=1}^{T}v_{0t}

and finally to

j=1nKj[(1α)nj+αoj]Aj+η[(1α)nj+αoj]=s~t=1Tv0t\displaystyle\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}=\tilde{s}\sum_{t=1}^{T}v_{0t}

where we used general equalities

t=1TjItKj\displaystyle\sum_{t=1}^{T}\sum_{j\in I_{t}}K_{j} =\displaystyle= j=1nnjKj\displaystyle\sum_{j=1}^{n}n_{j}K_{j}
t=1TjStKjojt\displaystyle\sum_{t=1}^{T}\sum_{j\in S_{t}}K_{j}\cdot o_{jt} =\displaystyle= j=1nojKj\displaystyle\sum_{j=1}^{n}o_{j}K_{j}

This leads to

f=j=1nKj[(1α)nj+αoj]Aj+η[(1α)nj+αoj]s~t=1Tv0t\displaystyle f=\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}-\tilde{s}\sum_{t=1}^{T}v_{0t}

and

g=dfdη=j=1nKj[(1α)nj+αoj]2(Aj+η[(1α)nj+αoj])2\displaystyle g=\frac{df}{d\eta}=-\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]^{2}}{\left(A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]\right)^{2}}

and to Newton’s method summarized in Algorithm 4.

Computation of gradient in Frank-Wolfe algorithm

Here we will derive the gradient vector of the objective function (4.2), and show a computational formula. The elements of f(𝒗)\nabla f\left(\bm{v}\right) can be derived as

fvi=\displaystyle\frac{\partial f}{\partial v_{i}}= Kivit(𝒗)mt𝟙(iSt)oitiStvioitt(𝒗)mtr(1α)𝟙(iIt)+(rα+1)𝟙(iSt)oitv0t+iStvioit\displaystyle\frac{K_{i}}{v_{i}}-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{r(1-\alpha)\mathbbm{1}(i\in I_{t})+(r\alpha+1)\mathbbm{1}(i\in S_{t})o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-
t(𝒗)Lt𝟙(iSt)oit(v0t+iStvioit)(iStvioit)(r(1α)𝟙(iIt)+(rα+1)𝟙(iSt)oit)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})-(\sum_{i\in S_{t}}v_{i}\cdot o_{it})(r(1-\alpha)\mathbbm{1}(i\in I_{t})+(r\alpha+1)\mathbbm{1}(i\in S_{t})o_{it})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}
=\displaystyle= Kivit(𝒗)mt𝟙(iSt)oitiStvioitt(𝒗)mtr(1α)𝟙(iIt)+(rα+1)𝟙(iSt)oitv0t+iStvioit\displaystyle\frac{K_{i}}{v_{i}}-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{r(1-\alpha)\mathbbm{1}(i\in I_{t})+(r\alpha+1)\mathbbm{1}(i\in S_{t})o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-
t(𝒗)Ltr(1α)(iItvi)𝟙(iSt)oitr(1α)(iStvioit)𝟙(iIt)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{r(1-\alpha)(\sum_{i\in I_{t}}v_{i})\mathbbm{1}(i\in S_{t})o_{it}-r(1-\alpha)(\sum_{i\in S_{t}}v_{i}\cdot o_{it})\mathbbm{1}(i\in I_{t})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}
=\displaystyle= Kivit(𝒗)mt𝟙(iSt)oitiStvioit\displaystyle\frac{K_{i}}{v_{i}}-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-
t(𝒗)mt𝟙(iSt)oit(rα+1)v0t+iStvioitt(𝒗)Lt𝟙(iSt)oitr(1α)(iItvi)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}(r\alpha+1)}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}r(1-\alpha)(\sum_{i\in I_{t}}v_{i})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}-
t(𝒗)mt𝟙(iIt)r(1α)v0t+iStvioit+t(𝒗)Lt𝟙(iIt)r(1α)(iStvioit)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in I_{t})r(1-\alpha)}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(i\in I_{t})r(1-\alpha)(\sum_{i\in S_{t}}v_{i}\cdot o_{it})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}

To implement the gradient with matrix-vector operations let us define

𝑲+n\displaystyle\bm{K}\in\mathbb{R}^{n}_{+} \displaystyle- Ki is the total purchases of product i over the selling horizon\displaystyle K_{i}\textnormal{ is the total purchases of product $i$ over the selling horizon}
𝒗+n\displaystyle\bm{v}\in\mathbb{R}^{n}_{+} \displaystyle- vi is the preference weight for product i\displaystyle v_{i}\textnormal{ is the preference weight for product $i$}
𝒗0+T\displaystyle\bm{v}_{0}\in\mathbb{R}^{T}_{+} \displaystyle- v0t is the preference weight for outside alternative at time t\displaystyle v_{0t}\textnormal{ is the preference weight for outside alternative at time $t$}
r+\displaystyle r\in\mathbb{R}_{+} \displaystyle- r=(1s)/s , where s is the host market share\displaystyle r=(1-s)/s\textnormal{ , where $s$ is the host market share}
𝑰{0,1}nxT\displaystyle\bm{I}\in\{0,1\}^{nxT} \displaystyle- Iit=1 if product i is in the offer set at time t\displaystyle I_{it}=1\textnormal{ if product $i$ is in the offer set at time $t$}
𝑺{0,1}nxT\displaystyle\bm{S}\in\{0,1\}^{nxT} \displaystyle- Sit=1 if product i is available for sale at time t\displaystyle S_{it}=1\textnormal{ if product $i$ is available for sale at time $t$}
𝑶[0,1]nxT\displaystyle\bm{O}\in[0,1]^{nxT} \displaystyle- oit is percentage of time product i is available for sale during time period t\displaystyle o_{it}\textnormal{ is percentage of time product $i$ is available for sale during time period $t$}
𝒎+T\displaystyle\bm{m}\in\mathbb{R}^{T}_{+} \displaystyle- mt is the total purchases of all products at time t\displaystyle m_{t}\textnormal{ is the total purchases of all products at time $t$}
𝑳+T\displaystyle\bm{L}\in\mathbb{R}^{T}_{+} \displaystyle- Lt is the upper bound on λt\displaystyle L_{t}\textnormal{ is the upper bound on $\lambda_{t}$}
𝑩{0,1}T\displaystyle\bm{B}\in\{0,1\}^{T} \displaystyle- Bt=1 if bound Lt is violated at time t(Lt<mtv0t+iStvioitiStvioit)\displaystyle B_{t}=1\textnormal{ if bound $L_{t}$ is violated at time $t$}\left(L_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)
α[0,1]\displaystyle\alpha\in[0,1] \displaystyle- parameter to control availability of outside alternative

Then

\nabla f\left(\bm{v}\right)=\bm{K}\oslash\bm{v}+(\bm{S}\circ\bm{O})\left[-\bm{m}\oslash\left((\bm{S}\circ\bm{O})^{T}\bm{v}\right)\circ(\mathbf{1}-\bm{B})\right.
\left.-(r\alpha+1)\,\bm{m}\oslash\left(\bm{v}_{0}+(\bm{S}\circ\bm{O})^{T}\bm{v}\right)\circ\bm{B}\right.
\left.-r(1-\alpha)\,\bm{L}\circ\left(\bm{I}^{T}\bm{v}\right)\circ\bm{B}\oslash\left(\bm{v}_{0}+(\bm{S}\circ\bm{O})^{T}\bm{v}\right)^{\circ 2}\right]+
\bm{I}\left[-r(1-\alpha)\,\bm{m}\oslash\left(\bm{v}_{0}+(\bm{S}\circ\bm{O})^{T}\bm{v}\right)\circ\bm{B}\right.
\left.+r(1-\alpha)\,\bm{L}\circ\left((\bm{S}\circ\bm{O})^{T}\bm{v}\right)\circ\bm{B}\oslash\left(\bm{v}_{0}+(\bm{S}\circ\bm{O})^{T}\bm{v}\right)^{\circ 2}\right]

where \oslash, \circ, and \circ 2 denote the elementwise division, multiplication (Hadamard product), and square operations.
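
The expression above maps one-to-one onto array operations; a minimal NumPy transliteration (⊘ as '/', ∘ as '*', names ours), consistent with the elementwise gradient derived earlier:

import numpy as np

def grad_matrix_form(K, v, S, O, Imat, m, L, v0, r, alpha, B):
    # K, v: (n,); S, O, Imat: (n, T); m, L, v0, B: (T,); B is the 0/1 binding indicator.
    SO = S * O                      # S ∘ O
    sv = SO.T @ v                   # (S ∘ O)^T v
    iv = Imat.T @ v                 # I^T v
    d = v0 + sv
    return (K / v
            + SO @ (-m / sv * (1.0 - B)
                    - (r * alpha + 1.0) * m / d * B
                    - r * (1.0 - alpha) * L * iv * B / d ** 2)
            + Imat @ (-r * (1.0 - alpha) * m / d * B
                      + r * (1.0 - alpha) * L * sv * B / d ** 2))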

Acknowledgments

The authors gratefully acknowledge Guillermo Gallego, who suggested the idea of splitting the sales in the case of partial availability.

References

  • Abdallah and Vulcano (2016) Abdallah, T. and Vulcano, G. (2016), Demand estimation under the multinomial logit model from sales transaction data. Working Paper. Available at https://www.researchgate.net/publication/303408073.
  • Ben-Akiva and Lerman (1994) Ben-Akiva, M. and Lerman, S. (1994), Discrete Choice Analysis: Theory and Applications to Travel Demand, 6 edn, The MIT Press, Cambridge, MA.
  • Bertsekas (1999) Bertsekas, D. (1999), Nonlinear Programming, Athena Scientific.
  • Cao et al. (2019) Cao, Y., Kleywegt, A. and Wang, H. (2019), Network revenue management under a spiked multinomial logit choice model. Working Paper. Available at SSRN:https://ssrn.com/abstract=3200531.
  • Dai et al. (2014) Dai, J., Ding, W., Kleywegt, A., Wang, X. and Zhang, Y. (2014), Choice-based revenue management for parallel flights. Working paper, Georgia Institute of Technology.
  • Frank and Wolfe (1956) Frank, M. and Wolfe, P. (1956), ‘An algorithm for quadratic programming’, Naval Research Logistics Quarterly 3, 95–110.
  • Gallego et al. (2015) Gallego, G., Ratliff, R. and Shebalov, S. (2015), ‘A general attraction model and sales-based linear program for network revenue management under customer choice’, Operations Research 63(1), 212 – 232.
  • Hunter and Lange (2000a) Hunter, D. R. and Lange, K. (2000a), ‘Rejoinder to discussion of “optimization transfer using surrogate objective functions”’, Journal of Computational and Graphical Statistics 9, 52–59.
  • Hunter and Lange (2004) Hunter, D. R. and Lange, K. (2004), ‘A tutorial on MM algorithms’, The American Statistician 58(1), 30–37.
  • Korte and Vygen (2012) Korte, B. H. and Vygen, J. (2012), Combinatorial Optimization: Theory and Algorithms, Springer-Verlag, New York, NY.
  • Nocedal and Wright (2006) Nocedal, J. and Wright, S. J. (2006), Numerical Optimization, Springer Series in Operations Research and Financial Engineering, 2 edn, Springer, New York.
  • Sharif Azadeh et al. (2014) Sharif Azadeh, S., Marcotte, P. and Savard, G. (2014), ‘A taxonomy of demand uncensoring methods in revenue management’, Journal of Revenue and Pricing Management 13, 440–456.
  • Train (2003) Train, K. (2003), Discrete choice methods with simulation, Cambridge University Press, New York, NY.
  • Vulcano et al. (2012) Vulcano, G., van Ryzin, G. and Ratliff, R. (2012), ‘Estimating primary demand for substitutable products from sales transaction data’, Operations Research 60(2), 313–334.
  • Talluri and van Ryzin (2004) Talluri, K. and van Ryzin, G. (2004), ‘Revenue Management Under a General Discrete Choice Model of Consumer Behavior’, Management Science 50(1), 15–33.