
Demand Estimation from Sales Transaction Data - Practical Extensions
Submission to Journal of Revenue and Pricing Management

Norbert Remenyi
Xiaodong Luo
Abstract

In this paper we discuss practical limitations of the standard choice-based demand models used in the literature to estimate demand from sales transaction data. We present modifications and extensions of the models and discuss data preprocessing and solution techniques which are useful for practitioners dealing with sales transaction data. Among these, we present an algorithm to split sales transaction data observed under partial availability, we extend a popular Expectation Maximization (EM) algorithm to non-homogeneous product sets, and we develop two iterative optimization algorithms which can handle many of the extensions discussed in the paper.

Sabre

May 31, 2020

Keywords. Demand estimation. Demand untruncation. Multinomial logit model. EM algorithm. MM algorithm. Frank-Wolfe method. Revenue management

1 Introduction

Demand estimation using censored sales transaction data has many applications in the airline commercial planning process. We refer readers to the survey paper by Sharif Azadeh et al. (2014) for a brief introduction to this topic. In this paper we discuss some practical limitations and extensions of a particular choice-based demand model popular in the literature for estimating demand from sales transaction data. Discrete choice models (e.g., Ben-Akiva and Lerman (1994), Train (2003)) have provided a popular approach for estimating demand for different products in a set of substitutable items, especially in transportation and revenue management applications. We look at a common demand model, which appeared in several papers, including Dai et al. (2014), Vulcano et al. (2012), and Abdallah and Vulcano (2016). The motivation for this work came from observing a few shortcomings of these models when applied in practice to airline revenue management data.

We will build on and extend the work presented in Vulcano et al. (2012) and Abdallah and Vulcano (2016). They combine a multinomial logit (MNL) choice model with non-homogeneous Poisson arrivals over multiple periods. The MNL model has been used by many practitioners and researchers to represent the underlying choice behavior. Although its property of independence of irrelevant alternatives (IIA) is somewhat restrictive, the model is simple, leading to tractable estimation and assortment optimization (Talluri and van Ryzin, 2004). The problem is to jointly estimate the arrival rates of customers and the preference weights of the products by maximizing the likelihood functions. There are two different likelihood functions. The first one is the incomplete data likelihood function (see (3) below and also Eq. (2) in Vulcano et al. (2012)) and the second one is the log-likelihood function based on the primary demand (see (13) below and also Eq. (13) in Vulcano et al. (2012)). The inputs are observed historical sales, availability of the products, and market share information.

Our contribution is to discuss practical limitations of the specific model above, and present some interesting extensions. We will discuss partial availability of products, some relaxation of the IIA assumption, constrained parameter space, non-homogeneous product set, and the interpretation of the no-purchase option and related market share. We hope this discussion can facilitate more research and extension of these models.

We will present an algorithm to split sales transaction data observed under partial availability, and an extension of the EM algorithm for the case when we observe a non-homogeneous product set. We develop two iterative optimization algorithms which incorporate partial availability information, a non-homogeneous product set, the ability to control the availability of the outside alternative, and an upper bound on the arrival rates of customers. In the first formulation we use a market share constraint at each time period, and incorporate these constraints into the objective function through the preference weights of the outside alternative. The formulation is solved using the Frank-Wolfe algorithm, leading to a simple coordinate descent algorithm. In the second formulation we use a single, aggregate market share constraint over the time horizon, and assume knowledge of the preference weights of the outside alternative. Using this formulation we develop a fast, iterative minorization-maximization (MM) algorithm building on the work in Abdallah and Vulcano (2016).

While the EM algorithm focuses on solving the complete data log-likelihood function, the two new algorithms (like Abdallah and Vulcano (2016)) aim to solve the incomplete likelihood function directly. We remark that both likelihood functions render solutions of similar quality, but they involve different intermediate decision variables and require different solution approaches. It is not known whether the two approaches are mathematically equivalent.

2 Practical limitations of existing models

In this section we discuss in detail some practical limitations of the choice-based demand model discussed in Vulcano et al. (2012) and in Abdallah and Vulcano (2016). The model combines non-homogeneous Poisson arrivals over multiple periods with a multinomial logit (MNL) choice model. The model assumes a retailer offers a fixed number of $n$ products over a time horizon $T$, and the products are either available or not available for sale in a time period. From the observed sales data we estimate the demand, that is, the sales we would have observed if all the products had been available for purchase. The total demand of all the products at time $t$ (including outside alternatives and the no-purchase option) is modeled as a Poisson distributed random variable with parameter $\lambda_{t}$. Hence we model the total demand as a non-homogeneous Poisson model, since different time periods are allowed to have different mean demands.

The demand for individual products is modeled using a multinomial distribution, given the total demand of all products. The preferences for different products are assumed to be fixed over time, and the probability that a customer chooses product $i$ when $i\in S_{t}$ is modeled through the simple multinomial logit (MNL) model, that is

$$P_{i}(S_{t},\bm{v})=\frac{v_{i}}{v_{0}+\sum_{j\in S_{t}}v_{j}} \qquad (1)$$

When $i\notin S_{t}$, then $P_{i}(S_{t},\bm{v})=0$. Here $v_{i},~i=1,\ldots,n$ is the positive preference weight for product $i$, and $S_{t}$ is the set of products available for sale at time $t$. The preference weight of the outside alternatives and the no-purchase option (OA) is embedded into the coefficient $v_{0}$. This parameter is set to $v_{0}=1$ in the references above, following a standard normalization.

The incomplete data likelihood function of the model is defined as

$$L_{I}(\bm{v},\bm{\lambda}) = \prod_{t=1}^{T}\left[\left(P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda})\,\frac{m_{t}!}{z_{1t}!z_{2t}!\cdots z_{nt}!}\right)\prod_{j\in S_{t}}\left[\frac{P_{j}(S_{t},\bm{v})}{\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v})}\right]^{z_{jt}}\right]$$

where

$$P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda}) = \frac{\left[\lambda_{t}\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v})\right]^{m_{t}}\exp\left(-\lambda_{t}\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v})\right)}{m_{t}!}$$

In the above equations $z_{it}$ denotes the number of purchases of product $i$ at time period $t$, and $m_{t}=\sum_{i=1}^{n}z_{it}$ denotes the total number of purchases in period $t$. After some algebra the log-likelihood function can be written as

$$l_{I}(\bm{v},\bm{\lambda})=\sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0}+\sum_{i\in S_{t}}v_{i}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}}{v_{0}+\sum_{i\in S_{t}}v_{i}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \qquad (2)$$

One approach is to directly maximize the log-likelihood function and jointly estimate the preference weights of the products and the arrival rates of customers. The above log-likelihood function, however, is hard to maximize in general, therefore the research literature discusses different approaches to estimate the parameters of this model, given sales data and information on what was available for sale. Vulcano et al. (2012) developed an elegant EM algorithm by looking at the problem in terms of primary (first choice) demand, and treating the observed demand as incomplete observations of the primary demand. Abdallah and Vulcano (2016) solve the estimation problem by specializing the minorization-maximization (MM) procedure, an iterative algorithm that maximizes an objective function by successively maximizing a simpler function that minorizes the true objective.

It is also interesting to note that the objective function has a continuum of maximizers. For this reason, Vulcano et al. (2012) and Abdallah and Vulcano (2016) imposed an additional constraint on the preference weights as a function of the market share

$$s=\frac{\sum_{i=1}^{n}v_{i}}{v_{0}+\sum_{i=1}^{n}v_{i}}$$

There are a number of limiting assumptions and possible extensions of the model presented above, when we apply it to real data.

  1. Products are fully open or closed for sale

    • The discussed model assumes that a product is either available or not available for sale. Practitioners often work with aggregated data, and are unable to capture the sales at a very granular level, every time the assortment changes. It is therefore practical to extend the model to consume data where we observe partial availability. For example, a product can be 80% open for sale in a time period.

  2. MNL assumption

    • Customers choose from the available products according to an MNL model. One of the properties of the MNL model is the independence of irrelevant alternatives (IIA), which can be unrealistic in real applications. If customers, for instance, always purchase the product with the lowest price, the algorithm in Vulcano et al. (2012) does not converge. If the sell-down is strong, the demand is grossly overestimated.

  3. Unconstrained parameter space

    • In practice we can encounter parameter estimates which are unreasonable in a business scenario. This may be due to the fact that the underlying assumptions of the model (such as MNL) do not exactly fit the true data generating process. Instead of using a more sophisticated model and developing its solution algorithm, we might want to simply constrain the parameters of the simpler model. Abdallah and Vulcano (2016) discuss regularization as an elegant solution to the problem, and they develop algorithms for $L_1$ regularization (Lasso regression) and $L_2$ regularization (Ridge regression). It can also be of interest to solve the problem by putting an upper bound on the arrival rate of customers ($\lambda_t$). This can help regularize the model by placing an interpretable business fence on these parameters.

  4. Homogeneous product set

    • The model assumes that over different time periods we have the same set of existing products. Some might be unavailable for sale, but there exists an underlying demand for them. In revenue management, we often encounter changing product IDs (unique flight identifiers) due to schedule changes, hence we need to be able to model a non-homogeneous product set over time. Instead of handling the homogeneous parts separately, we would like to use the data over a larger time interval to borrow power to estimate the parameters.

  5. No-purchase option is always available

    • The model above assumes that the no-purchase option is always available, that is, fully open. If we include competitors' products in the no-purchase option, this can be an unrealistic assumption. For instance, airline competitors likely control their inventory similarly to the host airline, gradually closing down products as we get closer to departure. This assumption has implications for how the host market share is interpreted in the models.

In this paper we address some of these points, and discuss how we could relax these assumptions, incorporate them into the model, and estimate the parameters. These ideas might foster further extensions of the existing models and rigorous research in the future.

2.1 Partial availability

Practitioners can encounter data at an aggregate level, where we can observe partial availability of products. As an example, in a revenue management system we store information on sales and availability at certain pre-departure time points. It is possible that a product becomes closed for sale during the time period, and we observe partial availability. For instance, a booking class on a flight can be open 60% of the time in a time interval, and the bookings we observe are matched to this partial availability. It would be of interest to extend the algorithms to work with this type of data and handle the full spectrum of availability in $[0,1]$, as opposed to being restricted to either open (1) or closed (0) products. Another possible avenue is to modify the data to fit the algorithms already developed in the literature.

2.1.1 Extending the attraction model

A natural way to incorporate partial availability is to extend the MNL purchase probabilities as

$$P^{\star}_{jt}(S_{t},\bm{v},\bm{o})=\frac{v_{j}\cdot o_{jt}}{v_{0}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}$$

where $o_{it}\in\left[0,1\right]$ represents the availability of product $i$ at time $t$. Note that $o_{it}>0$ is equivalent to $i\in S_{t}$. This simple formulation modifies the purchase probabilities by linearly adjusting the preference weight $v_{i}$ with the open percentage $o_{it}$. If the open percentage is zero or one, we recover the original formulation.

We have not derived the EM algorithm of Vulcano et al. (2012) with the extended purchase probability definition, but this could be an interesting avenue for research. For demonstration purposes we can resort to directly maximizing the log-likelihood using a solver. The modified likelihood function becomes

$$L_{I}(\bm{v},\bm{\lambda}) = \prod_{t=1}^{T}\left[\left(P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda})\,\frac{m_{t}!}{z_{1t}!z_{2t}!\cdots z_{nt}!}\right)\prod_{j\in S_{t}}\left[\frac{P^{\star}_{j}(S_{t},\bm{v},\bm{o})}{\sum_{i\in S_{t}}P^{\star}_{i}(S_{t},\bm{v},\bm{o})}\right]^{z_{jt}}\right] \qquad (3)$$

where

$$P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda}) = \frac{\left[\lambda_{t}\sum_{i\in S_{t}}P^{\star}_{i}(S_{t},\bm{v},\bm{o})\right]^{m_{t}}\exp\left(-\lambda_{t}\sum_{i\in S_{t}}P^{\star}_{i}(S_{t},\bm{v},\bm{o})\right)}{m_{t}!}$$

and, after omitting the constant terms, the log-likelihood function modifies to

$$l_{I}(\bm{v},\bm{\lambda})=\sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \qquad (4)$$
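As an illustration of this direct maximization, the following is a minimal Python sketch using `scipy.optimize.minimize`. The function and variable names (`neg_log_lik`, `fit`, `z`, `o`) are ours, not from the references, and the scale of $\bm{v}$ is pinned down by imposing the market share constraint (7) with $v_0=1$ through a softmax reparametrization, which is only one of several possible normalizations.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(x, z, o, s, v0=1.0):
    """Negative of the partial-availability log-likelihood (4).

    x packs log(lambda_t) for t = 1..T followed by beta_1..beta_n;
    the preference weights are v_i = s/(1-s) * softmax(beta)_i, so the
    market share constraint (7) holds with v0 = 1 by construction.
    """
    n, T = z.shape
    lam = np.exp(x[:T])
    beta = x[T:]
    w = np.exp(beta - beta.max())
    v = (s / (1.0 - s)) * w / w.sum()

    ll = 0.0
    for t in range(T):
        open_t = o[:, t] > 0                     # S_t: partially or fully open products
        m_t = z[open_t, t].sum()
        sold = np.sum(v[open_t] * o[open_t, t])  # sum of v_i * o_it over S_t
        denom = v0 + sold
        ll += (m_t * np.log(lam[t] / denom)
               - lam[t] * sold / denom
               + np.sum(z[open_t, t] * np.log(v[open_t])))
    return -ll

def fit(z, o, s):
    """z, o: n x T arrays of sales and open percentages (e.g., Table 1)."""
    n, T = z.shape
    # Start from naive per-period demand estimates and uniform weights
    x0 = np.concatenate([np.log(z.sum(axis=0) / s + 1.0), np.zeros(n)])
    res = minimize(neg_log_lik, x0, args=(z, o, s), method="L-BFGS-B")
    lam_hat = np.exp(res.x[:T])
    w = np.exp(res.x[T:] - res.x[T:].max())
    v_hat = (s / (1 - s)) * w / w.sum()
    return lam_hat, v_hat
```

This is a sketch under the stated assumptions rather than the algorithm used to produce the results below; any off-the-shelf nonlinear solver could be substituted.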

2.1.2 Data disaggregation

In the previous section we discussed a simple formulation which could potentially be used to incorporate partial availability information into the purchase probability definition. Another approach to handle partial availability is to disaggregate the data to fully open and closed assortments, and use existing algorithms for the estimation. We can split the observed sales under partial availability by making a simple assumption that the sales are distributed uniformly over time.

To demonstrate this idea on a simple example, let us assume that the observed sales $\bm{b}$ and open percentages $\bm{o}$ for three available products are

$$\bm{b}=\begin{bmatrix}1\\ 2\\ 5\end{bmatrix},\quad\bm{o}=\begin{bmatrix}1.0\\ 0.8\\ 0.5\end{bmatrix}$$

We assumed that the elements of $\bm{o}$ are non-increasing from top to bottom. Then we can represent $\bm{o}$ as the sum of three fully open and closed assortments with weights $\alpha_{i}$ as

$$\bm{o}=\begin{bmatrix}1.0\\ 0.8\\ 0.5\end{bmatrix}=\alpha_{1}\begin{bmatrix}1\\ 0\\ 0\end{bmatrix}+\alpha_{2}\begin{bmatrix}1\\ 1\\ 0\end{bmatrix}+\alpha_{3}\begin{bmatrix}1\\ 1\\ 1\end{bmatrix}=0.2\begin{bmatrix}1\\ 0\\ 0\end{bmatrix}+0.3\begin{bmatrix}1\\ 1\\ 0\end{bmatrix}+0.5\begin{bmatrix}1\\ 1\\ 1\end{bmatrix}$$

Note that $\alpha_{i},~i=1,\ldots,n$ represent time proportions and $\sum_{i=1}^{n}\alpha_{i}=1$. The results indicate that all products were available for sale 50% of the time, two products were available 30% of the time, and one product was available 20% of the time. A graphical representation of this example is shown in Figure 1.

Figure 1: Split example. The time horizon is divided into proportions $\alpha_{1}=0.2$ (only product 1 open), $\alpha_{2}=0.3$ (products 1 and 2 open), and $\alpha_{3}=0.5$ (all three products open).

For the general case, the elements of $\bm{\alpha}$ can be calculated as the consecutive differences in the open percentages, that is

$$\alpha_{i} = o_{i}-o_{i+1},\quad i=1,\ldots,n-1$$
$$\alpha_{n} = o_{n}$$

Following this simple idea we can split the observed sales under partial availability using the following identity

$$\bm{b} = \begin{bmatrix}\frac{\alpha_{1}}{\sum_{i=1}^{n}\alpha_{i}}\\ 0\\ \vdots\\ 0\\ 0\end{bmatrix}\circ\begin{bmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\end{bmatrix}+\begin{bmatrix}\frac{\alpha_{2}}{\sum_{i=1}^{n}\alpha_{i}}\\ \frac{\alpha_{2}}{\sum_{i=2}^{n}\alpha_{i}}\\ \vdots\\ 0\\ 0\end{bmatrix}\circ\begin{bmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\end{bmatrix}+\ldots+\begin{bmatrix}\frac{\alpha_{n-1}}{\sum_{i=1}^{n}\alpha_{i}}\\ \frac{\alpha_{n-1}}{\sum_{i=2}^{n}\alpha_{i}}\\ \vdots\\ \frac{\alpha_{n-1}}{\sum_{i=n-1}^{n}\alpha_{i}}\\ 0\end{bmatrix}\circ\begin{bmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\end{bmatrix}+\begin{bmatrix}\frac{\alpha_{n}}{\sum_{i=1}^{n}\alpha_{i}}\\ \frac{\alpha_{n}}{\sum_{i=2}^{n}\alpha_{i}}\\ \vdots\\ \frac{\alpha_{n}}{\sum_{i=n-1}^{n}\alpha_{i}}\\ \frac{\alpha_{n}}{\alpha_{n}}\end{bmatrix}\circ\begin{bmatrix}b_{1}\\ b_{2}\\ \vdots\\ b_{n-1}\\ b_{n}\end{bmatrix}$$

where $\circ$ denotes element-wise multiplication, or Hadamard product. Note that if $\alpha_{i}=0$, we do not need a split to create a new assortment. The sales splitting algorithm under partial availability is described in Algorithm 1.

Algorithm 1 Sales splitting algorithm for partial availability
1: $\bm{b}$: vector of observed sales
2: $\bm{o}$: vector of observed open percentages
3: Sort $\bm{o}$ in decreasing order and apply the same order to $\bm{b}$
4: Calculate the time proportion vector $\bm{\alpha}$
5: for $i=1,\ldots,n$ do
6:     if $i=n$ then
7:         $\alpha_{i}=o_{i}$
8:     else
9:         $\alpha_{i}=o_{i}-o_{i+1}$
10:    end if
11: end for
12: Split sales
13: for $i=1,\ldots,n$ do
14:     if $\alpha_{i}=0$ then
15:         Continue
16:     else
17:         Calculate partial sales vector $\bm{b}^{(i)}$ and open percentage vector $\bm{o}^{(i)}$
18:         for $j=1,\ldots,n$ do
19:             if $j\leq i$ then
20:                 $b^{(i)}_{j}=\frac{\alpha_{i}}{o_{j}}b_{j}$
21:                 $o^{(i)}_{j}=1$
22:             else
23:                 $b^{(i)}_{j}=0$
24:                 $o^{(i)}_{j}=0$
25:             end if
26:         end for
27:     end if
28: end for

The algorithm results in partial sales vectors $\bm{b}^{(i)}$ and fully open and closed assortment vectors $\bm{o}^{(i)}$ for $i\in\{j:\alpha_{j}>0\}$. The splitting logic ensures that

$$\bm{b}=\sum_{i\in\{j:\alpha_{j}>0\}}\bm{b}^{(i)}$$

After splitting, we could combine intervals from different times $t$ which have exactly the same product offerings and product availabilities, to improve performance.
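For reference, here is a minimal Python sketch of Algorithm 1; the function name `split_sales` and the array-based interface are ours, and the routine simply reproduces the splitting identity above.

```python
import numpy as np

def split_sales(b, o, tol=1e-12):
    """Sales splitting for partial availability (Algorithm 1).

    b : observed sales per product, o : open percentages in [0, 1].
    Returns a list of (partial sales, fully open/closed availability) pairs
    whose sales vectors sum back to b.
    """
    b = np.asarray(b, dtype=float)
    o = np.asarray(o, dtype=float)
    n = len(b)

    order = np.argsort(-o)       # sort products by decreasing open percentage
    inv = np.argsort(order)      # to undo the sort at the end
    b_s, o_s = b[order], o[order]

    # Time proportions: consecutive differences of the sorted open percentages
    alpha = np.empty(n)
    alpha[:-1] = o_s[:-1] - o_s[1:]
    alpha[-1] = o_s[-1]

    splits = []
    for i in range(n):
        if alpha[i] <= tol:      # no time was spent in this assortment
            continue
        b_i, o_i = np.zeros(n), np.zeros(n)
        # Products 1..i (in sorted order) are fully open in this assortment
        b_i[: i + 1] = alpha[i] / o_s[: i + 1] * b_s[: i + 1]
        o_i[: i + 1] = 1.0
        splits.append((b_i[inv], o_i[inv]))
    return splits

# Example from the text: b = (1, 2, 5), o = (1.0, 0.8, 0.5)
splits = split_sales([1, 2, 5], [1.0, 0.8, 0.5])
assert np.allclose(sum(s for s, _ in splits), [1, 2, 5])
```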

2.1.3 Comparison on a simulated example

To demonstrate the estimation from sales data with partial availability, we extended the example from Vulcano et al. (2012) by adding partial availability data to the observed sales. The observed data is presented in Table 1.

Sales Period
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 10 15 11 14 0 0 0 0 0 0 0 0 0 0 0
2 11 6 11 8 20 16 0 0 0 0 0 0 0 0 0
3 5 6 1 11 4 5 14 7 11 0 0 0 0 0 0
4 4 4 4 1 6 4 3 5 9 9 6 9 0 0 0
5 0 2 0 0 1 0 1 3 0 3 3 5 2 3 3
Open percentage 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 0.7 0.3 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.8 0.5 1.0 1.0 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.9 0.9 1.0 1.0 0.3 0.5 0.9 0.7 0.7 0.0 0.0 0.0 0.2 0.2 0.0
4 1.0 0.9 1.0 1.0 0.5 1.0 1.0 0.8 0.9 0.6 0.5 0.2 0.3 0.5 0.0
5 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.9 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Table 1: Example with partial availability

We first use the extended attraction model discussed in Section 2.1.1. Direct maximization of the log-likelihood (4) by a nonlinear solver results in a total primary demand of 1194.6. The detailed estimated demands and parameters are presented in Table 2.

Estimates Period $v_{i}$
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 15.02 20.07 12.47 15.70 33.60 22.73 19.60 19.33 24.84 38.14 32.27 84.35 5.74 7.22 35.06 1.000
2 11.23 15.01 9.32 11.74 25.13 17.00 14.66 14.45 18.58 28.53 24.14 63.08 4.29 5.40 26.22 0.748
3 3.91 5.22 3.24 4.09 8.74 5.92 5.10 5.03 6.46 9.93 8.40 21.95 1.49 1.88 9.13 0.260
4 1.97 2.63 1.63 2.06 4.41 2.98 2.57 2.53 3.26 5.00 4.23 11.06 0.75 0.95 4.60 0.131
5 0.40 0.53 0.33 0.41 0.89 0.60 0.52 0.51 0.66 1.01 0.85 2.23 0.15 0.19 0.93 0.026
$\lambda_{t}$ 46.48 62.10 38.57 48.57 103.95 70.32 60.65 59.79 76.84 118.01 99.84 260.94 17.76 22.34 108.48
Table 2: Estimated demand and parameters using the extended attraction model

Second, we demonstrate the data disaggregation procedure from Section 2.1.2. We first split the observed sales data with partial availability into fully open and closed assortments, apply the EM algorithm of Vulcano et al. (2012), and then aggregate the solution back. To demonstrate this, let us apply Algorithm 1 to the data presented in Table 1. The disaggregated sales are shown in Table 3.

Sales Period
15 14 13 12 11 10 9
1 10.00 15.00 11.00 14.00
2 1.38 9.62 2.40 3.60 11.00 8.00 20.00 16.00
3 0.56 0.56 3.89 2.67 1.33 2.00 1.00 11.00 4.00 2.00 3.00 14.00
4 0.40 0.40 0.40 2.80 1.78 0.89 1.33 4.00 1.00 2.40 3.60 2.00 0.80 1.20 0.30 2.70
5 0.00 0.00 0.00 0.00 0.20 0.80 0.40 0.60 0.00 0.00 0.50 0.20 0.30 0.00 0.00 0.00 0.10 0.90
8 7 6 5 4 3 2 1
1
2
3 7.00 11.00 0.00 0.00
4 0.63 4.37 2.00 7.00 9.00 6.00 9.00 0.00 0.00 0.00 0.00
5 0.33 0.33 2.33 0.00 0.00 0.00 1.20 1.80 1.50 1.50 4.00 1.00 1.40 0.20 0.40 1.50 0.90 0.60 3.00
Table 3: Disaggregated sales of Table 1

After applying the EM algorithm on the disaggregated data, we end up with the disaggregated solution, presented in Table 4.

Estimates Period $v_{i}$
15 14 13 12 11 10 9
1 0.81 0.91 1.33 10.00 2.41 5.02 2.86 15.00 11.00 14.00 6.03 5.27 15.88 4.05 2.68 11.50 0.81 16.85 1.000
2 0.59 0.67 0.94 9.62 1.76 3.65 1.64 3.60 11.00 8.00 4.39 3.83 13.63 2.95 1.95 10.90 0.59 12.26 0.727
3 0.24 0.25 0.38 3.89 0.71 1.20 0.91 2.00 1.00 11.00 1.77 1.55 2.73 1.19 0.90 2.04 0.24 6.29 0.294
4 0.14 0.18 0.27 2.80 0.36 0.80 0.61 1.33 4.00 1.00 0.91 0.85 2.45 0.71 0.36 0.82 0.11 1.21 0.150
5 0.00 0.00 0.00 0.00 0.06 0.36 0.27 0.60 0.00 0.00 0.15 0.07 0.20 0.00 0.00 0.00 0.04 0.40 0.026
$\lambda_{t}$ 2.55 2.87 4.16 37.59 7.57 15.76 8.97 32.19 38.57 48.57 18.94 16.54 49.85 12.73 8.41 36.09 2.55 52.89
8 7 6 5 4 3 2 1 $v_{i}$
1 4.02 1.94 13.13 0.00 4.05 17.23 14.48 21.90 18.10 15.21 48.27 20.27 16.89 0.41 0.38 18.10 1.82 0.57 36.20 1.000
2 2.93 1.41 9.55 0.00 2.95 12.54 10.53 15.93 13.17 11.06 35.11 14.75 12.29 0.29 0.28 13.17 1.33 0.42 26.33 0.727
3 1.18 0.57 3.15 0.00 1.19 4.95 4.26 6.44 5.32 4.47 14.19 5.96 4.97 0.12 0.00 5.32 0.54 0.00 10.64 0.294
4 0.60 0.22 1.97 0.00 0.71 3.15 2.17 3.20 2.72 2.14 7.24 3.20 2.53 0.00 0.00 2.72 0.00 0.00 5.43 0.150
5 0.10 0.12 1.05 0.00 0.00 0.00 0.37 0.64 0.46 0.53 1.23 0.36 0.43 0.07 0.18 0.46 0.32 0.27 0.92 0.026
$\lambda_{t}$ 12.62 6.10 41.19 0.00 12.73 54.09 45.45 68.72 56.81 47.72 151.49 63.63 53.02 1.27 1.20 56.81 5.73 1.80 113.62
Table 4: Estimated demand and parameters of disaggregated sales data

Note that the estimated primary demands do not preserve the split proportions of the sales. For instance, in period 9, $\lambda_{9_{1}}=2.55$ and $\lambda_{9_{2}}=52.89$, while the sales were split by proportions $\alpha_{1}=0.1$ and $\alpha_{2}=0.9$. The split proportions of sales do not need to match the proportions of demand, because the varying assortments imply varying market shares and spilled demand. However, to reduce the number of parameters to be estimated, it would be possible to modify the EM algorithm and incorporate constraints on $\lambda_{t_{i}}$ to preserve the split proportions. Aggregating the results back, we end up with the final results presented in Table 5.

Estimates Period $v_{i}$
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 13.05 25.29 11.00 14.00 27.19 18.23 17.66 19.09 21.29 36.38 33.31 68.54 17.68 20.50 36.20 1.000
2 11.82 10.64 11.00 8.00 21.85 15.80 12.85 13.89 15.48 26.46 24.22 49.85 12.86 14.91 26.33 0.727
3 4.76 4.82 1.00 11.00 6.05 4.14 6.53 4.90 6.14 10.70 9.79 20.15 5.09 5.86 10.64 0.294
4 3.39 3.10 4.00 1.00 4.21 1.89 1.32 2.79 3.86 5.38 4.85 10.45 2.53 2.72 5.43 0.150
5 0.00 1.29 0.00 0.00 0.43 0.00 0.44 1.27 0.00 1.01 1.00 1.59 0.68 1.05 0.92 0.026
$\lambda_{t}$ 47.17 64.50 38.57 48.57 85.33 57.23 55.43 59.92 66.82 114.17 104.53 215.12 55.50 64.34 113.62
Table 5: Estimated demand and parameters using data disaggregation

The results are fairly similar to those in Table 2, where we incorporated partial availability into the attraction model. The total primary demand is 1190.8, as opposed to 1194.6. To benchmark these solutions, a naive approach would be to use the projection method to preprocess the observed sales using the open percentage information, such as

$$\widehat{\textnormal{sales}}=\frac{\textnormal{sales}}{\textnormal{openpct}},$$

and then apply the EM algorithm to the preprocessed data. With this approach we estimate the overall primary demand as 1519.9. This shows that incorporating partial availability natively into the attraction model, or disaggregating the sales data, are much more robust approaches. If we observe small outliers in the partial availability data, the projection method produces very large preprocessed sales. Experiments showed that the disaggregation method dampens the effect of outliers the most.

2.2 Strong sell-down

The demand model discussed in Vulcano et al. (2012) and Abdallah and Vulcano (2016) assumes that customers choose from the available products based on the Basic Attraction Model (BAM) with the IIA property. The probability of product selection is governed by (1). This model cannot fit scenarios with strong sell-down, and the same applies to the Generalized Attraction Model (GAM) discussed in Gallego et al. (2015). Strong sell-down can be a common phenomenon in practice, because customers often seek to purchase the cheapest available product. A 100% sell-down example is presented in Table 6.

Sales Period
6 5 4 3 2 1
1 2 6 0 0 0 0
2 13 15 0 0
3 20 22
Table 6: Example with 100% sell-down

The EM algorithm developed in Vulcano et al. (2012) does not converge on this example, because the attraction model is unable to fit the scenario presented in the data. The estimated first choice demands grow without bound.

A natural way to handle sell-down is by introducing an additional parameter $l$ for the lowest available product. The attraction model (1) would be extended as

$$P_{i}(S_{t},\bm{v},L_{t},l)=\frac{v_{i}+\mathbbm{1}(i\in L_{t})\cdot l}{v_{0}+\sum_{j\in S_{t}}\left(v_{j}+\mathbbm{1}(j\in L_{t})\cdot l\right)} \qquad (5)$$

Parameter $l$ is an additional preference weight for the product with the lowest price, or the product with excess attraction in the assortment. In practice we could add additional preference weights by group of products, depending on the structure of the problem. $L$ is the set of indices of the lowest available product over time, and $\mathbbm{1}(i\in L_{t})$ is an indicator function, representing whether product $i$ is the lowest available product at time $t$.

For the example in Table 6, $L=(1,1,2,2,3,3)$. $L_{3}=2$ means that at $t=3$ the lowest available class is the second class from the top.

To present a motivating example with strong sell-down, assume that the true parameters are $v_{0}=1$, $v_{1}=0.4$, $v_{2}=0.7$, $v_{3}=0.1$ and $l=10$. The purchase probabilities for various assortments, induced by (5), are shown in Table 7.

Probability Assortment
$L_{1}=1$ $L_{2}=2$ $L_{3}=3$
$P_{0}$ 8.8% 8.3% 8.2%
$P_{1}$ 91.2% 3.3% 3.3%
$P_{2}$ 88.4% 5.7%
$P_{3}$ 82.8%
Table 7: Purchase probabilities with additional preference for the lowest available product

These purchase probabilities can describe a realistic scenario, where the lowest available product receives the majority of the demand. The products available above it, however, still maintain the IIA property. This extended model better fits data with strong sell-down, which cannot be described by the basic MNL model.
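To make the construction of Table 7 concrete, here is a small Python sketch (the function name `selldown_probs` is ours) that evaluates the purchase probabilities (5) for the three assortments of the motivating example.

```python
import numpy as np

def selldown_probs(v, v0, l, avail, lowest):
    """Purchase probabilities under the extended attraction model (5).

    v      : preference weights of the n products
    v0     : weight of the outside alternative
    l      : extra preference weight for the lowest available product
    avail  : boolean availability indicator per product
    lowest : index of the lowest available product
    Returns (P0, P1, ..., Pn); unavailable products get probability 0.
    """
    v = np.asarray(v, dtype=float)
    w = np.where(avail, v, 0.0)
    w[lowest] += l                      # spike on the lowest open product
    denom = v0 + w.sum()
    return np.concatenate(([v0 / denom], w / denom))

# Parameters of the motivating example: v0=1, v=(0.4, 0.7, 0.1), l=10
v, v0, l = [0.4, 0.7, 0.1], 1.0, 10.0
print(selldown_probs(v, v0, l, [True, True, True],  lowest=2))  # ~ 0.082, 0.033, 0.057, 0.828
print(selldown_probs(v, v0, l, [True, True, False], lowest=1))  # ~ 0.083, 0.033, 0.884, 0
print(selldown_probs(v, v0, l, [True, False, False], lowest=0)) # ~ 0.088, 0.912, 0, 0
```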

Essentially the same idea, with more parameters than the single scalar $l$ and a slightly more general formulation, appears as the spiked-MNL model in Cao et al. (2019). The spiked-MNL model extends the classical MNL model by having a separate attractiveness parameter for the cheapest available fare class on each flight.

The likelihood function for the extended model with parameter $l$ becomes

$$L_{I}(\bm{\lambda},\bm{v},l) = \prod_{t=1}^{T}\left[\left(P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda},l)\,\frac{m_{t}!}{z_{1t}!z_{2t}!\cdots z_{nt}!}\right)\prod_{j\in S_{t}}\left[\frac{P_{j}(S_{t},\bm{v},L_{t},l)}{\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v},L_{t},l)}\right]^{z_{jt}}\right]$$

where

$$P(m_{t}~\textnormal{customers buy in period}~t\,|\,\bm{v},\bm{\lambda},l) = \frac{\left[\lambda_{t}\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v},L_{t},l)\right]^{m_{t}}\exp\left(-\lambda_{t}\sum_{i\in S_{t}}P_{i}(S_{t},\bm{v},L_{t},l)\right)}{m_{t}!}$$

and the log-likelihood function modifies to

$$l_{I}(\bm{v},\bm{\lambda},l) = \sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0}+\sum_{i\in S_{t}}\left(v_{i}+\mathbbm{1}(i\in L_{t})\cdot l\right)}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}\left(v_{i}+\mathbbm{1}(i\in L_{t})\cdot l\right)}{v_{0}+\sum_{i\in S_{t}}\left(v_{i}+\mathbbm{1}(i\in L_{t})\cdot l\right)}+\sum_{i\in S_{t}}z_{it}\log\left(v_{i}+\mathbbm{1}(i\in L_{t})\cdot l\right)\right] \qquad (6)$$

Note that in this paper we are not providing an algorithm for maximizing the log-likelihood (6), but we can rely on available nonlinear solvers, if needed. It would be of practical interest, however, to extend the EM (Vulcano et al., 2012) and MM (Abdallah and Vulcano, 2016) algorithms to estimate the parameters of this extended model.

2.3 Constrained parameter space

Algorithms based on the model assumptions above can lead to unreasonable estimates in practice. For instance, on airline data we observed estimated demands which were unreasonably high in a business scenario. This can happen due to data sparsity, outliers, data preprocessing steps, and, most likely, the fact that the assumptions of the MNL model do not fit the true data generating process. Therefore, in practical applications, we might want to constrain the parameters of the model. Abdallah and Vulcano (2016) discuss regularization as an elegant solution to the problem, and they develop algorithms for $L_1$ regularization (Lasso regression) and $L_2$ regularization (Ridge regression). It can also be of interest to apply constraints on the arrival rate of customers ($\lambda_t$), providing an interpretable business fence on these parameters.

In practice we could solve a constrained maximum likelihood problem, such as

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & l_{I} \\
\textnormal{s.t.}\quad & \sum_{t=1}^{T}\lambda_{t}\leq L
\end{aligned}$$

where the constraint ensures that the overall arrival rate, or mean total demand over the time horizon, is less than an upper bound $L$. $L$ could be defined, for instance, as $C$ times the total observed sales, that is

$$L=C\,\frac{1}{s}\sum_{t=1}^{T}\sum_{i=1}^{n}z_{it}$$

$C$ is a regularization parameter which has to be set based on practical considerations. Note that $\lambda_{t}$ includes the no-purchase alternative, so we scale the total sales by the market share $s$. Alternatively, we could constrain the arrival rates at each time period and solve the constrained maximum likelihood problem

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & l_{I} \\
\textnormal{s.t.}\quad & \lambda_{t}\leq L_{t},\quad t=1,\ldots,T
\end{aligned}$$

Similarly, we could set $L_{t}=C\,\frac{1}{s}\sum_{i=1}^{n}z_{it}$, a constant times the observed sales in time period $t$. It would be of practical interest to extend the EM algorithm of Vulcano et al. (2012) to solve this constrained problem. In Section 4 we will extend the MM algorithm in Abdallah and Vulcano (2016) to this constrained problem and also discuss how to solve the optimization problem using the Frank-Wolfe algorithm.
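As a small illustration of how these bounds can be supplied to a generic solver, the sketch below builds the per-period box bounds and the aggregate linear constraint with scipy. The names (`z`, `C`, `s`) are illustrative, and it assumes the arrival rates are optimized directly rather than log-transformed.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint

# Toy sales matrix z (n products x T periods), market share s and fence C
z = np.array([[10., 15., 11., 14.],
              [11.,  6., 11.,  8.]])
s, C = 0.5, 2.0
n, T = z.shape

# Per-period caps  L_t = C * (1/s) * sum_i z_it  ->  box bounds on lambda_t
L_t = C / s * z.sum(axis=0)
lam_bounds = Bounds(lb=np.zeros(T), ub=L_t)

# Aggregate cap  sum_t lambda_t <= L = C * (1/s) * sum_{t,i} z_it
L = C / s * z.sum()
agg_constraint = LinearConstraint(np.ones((1, T)), lb=0.0, ub=L)

# lam_bounds / agg_constraint can be passed to scipy.optimize.minimize via the
# bounds= and constraints= arguments (extended with the v-part of the parameter
# vector, which only needs positivity).
```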

2.4 Non-homogeneous product set

In revenue management, we often encounter changing products due to changes in the system, hence we need to deal with a non-homogeneous product set over time. Instead of dividing the observed data into homogeneous parts, we want to extend the algorithms to handle non-homogeneous product offerings, allowing us to use the whole data set and borrow power to accurately estimate the parameters.

Let us look at the hypothetical airline sales example in Table 8, which was created from the synthetic example in Vulcano et al. (2012). We did this so that the estimated results are easy to compare.

Sales Period
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1 flt1-prod1 10 15 11 14
2 flt1-prod2 11 6 11 8 20 16
3 flt1-prod3 5 6 1 11 4 5 14 7 11
4 flt1-prod4 4 4 4 1 6 4 3 5 9 9 6 9
5 flt1-prod5 0 2 0 0 1 0 1 3 0 3 3 5 2 3 3
6 flt2-prod1 10 15 11 14
7 flt2-prod2 11 6 11 8 20 16
8 flt2-prod3 5 6 1 11 4 5 14 7 11
9 flt2-prod4 4 4 4 1 6 4 3 5 9 9 6 9
10 flt2-prod5 0 2 0 0 1 0 1 3 0 3 3 5 2 3 3
11 flt3-prod1 20 30 22 28 20 30 22 28
12 flt3-prod2 22 12 22 16 40 32 22 12 22 16 40 32
13 flt3-prod3 10 12 2 22 8 10 28 14 22 10 12 2 22 8 10 28 14 22
14 flt3-prod4 8 8 8 2 12 8 6 10 18 18 12 18 8 8 8 2 12 8 6 10 18 18 12 18
15 flt3-prod5 0 4 0 0 2 0 2 6 0 6 6 10 4 6 6 0 4 0 0 2 0 2 6 0 6 6 10 4 6 6
Table 8: Schedule change example

We have 3 flights, each of them having 5 different products, a total of $n=15$ products. The products of flight 1 only exist in the first 15 time periods, and after that the label changes to flight 2. This can happen for various reasons, for example a schedule change. It is not always possible to preprocess the data and match the products of flight 1 to flight 2. We can see that the product offerings are non-homogeneous over time, or unbalanced. Product 1 of flight 1 is not available for sale in time periods 5-15, which is distinguished from not being part of the product offer set in time periods 16-30. Notice that the observed sales data of flight 1 for time periods 1-15 is the same as the synthetic example created in Vulcano et al. (2012), and the observed sales for flight 3 are just twice that, like a flight with double the demand and capacity.

To introduce the non-homogeneous product set into the notation, let $I_{t}$ denote the set of existing products of the retailer at time $t$. Note that this is different from $S_{t}$, which is the set of products available for purchase at time $t$, therefore $S_{t}\subseteq I_{t}$. For instance, in Table 8, $I_{7}=\{1,2,3,4,5,11,12,13,14,15\}$ and $S_{7}=\{3,4,5,13,14,15\}$. The set of all products is $I=\bigcup_{t=1}^{T}I_{t}=\left\{1,2,\ldots,15\right\}$.

In Section 3 we will extend the EM algorithm of Vulcano et al. (2012) for non-homogeneous product offerings. The MM and Frank-Wolfe algorithms discussed in Section 4 will also be able to natively handle this product structure.

2.5 Handling market share constraint and no-purchase option

We mentioned that the objective function (2) has a continuum of maximizers, therefore an additional constraint is applied in Vulcano et al. (2012) on the preference weights as a function of the market share

$$s=\frac{\sum_{i=1}^{n}v_{i}}{v_{0}+\sum_{i=1}^{n}v_{i}} \qquad (7)$$

using the standard scaling $v_{0}=1$. The market share constraint is linear but the objective function is non-convex, therefore Abdallah and Vulcano (2016) formulate the problem using $v_{i}=\exp(\beta_{i})$ in the objective function, so the market share constraint becomes

$$s=\frac{\sum_{i=1}^{n}\exp(\beta_{i})}{1+\sum_{i=1}^{n}\exp(\beta_{i})}.$$

They establish that the objective function in this space is a concave function, and they show how to deal with the non-convex market share constraint by solving the problem without the constraint and then using a transformation to satisfy the constraint.

The outside alternative could represent the competitor's product, or both the competitor's product and the no-purchase option, which would lead to different interpretations of $s$ and $\bm{\lambda}$ (Vulcano et al., 2012). The outside alternative is treated as a separate product that is always available, therefore $v_{0}$ is constant and the standard scaling $v_{0}=1$ is used.

To look further, let $I$ denote the set of all products of the retailer, that is $I=\left\{1,2,\ldots,n\right\}$, so it follows that $S_{t}\subseteq I$. With the market share constraint above, the probability of selecting one of the retailer's products, when everything is available for sale, is $s$, that is

$$P_{I}(I,\bm{v})=\frac{\sum_{j\in I}v_{j}}{v_{0}+\sum_{j\in I}v_{j}}=s$$

When the retailer offers an assortment $S_{t}$ with a subset of all the products, it follows that

$$\tilde{s}_{t}=P_{S_{t}}(S_{t},\bm{v})=\frac{\sum_{j\in S_{t}}v_{j}}{v_{0}+\sum_{j\in S_{t}}v_{j}}<s$$

given that $v_{0}$ is fixed. This means that the model induced market share ($\tilde{s}_{t}$) of the retailer at time $t$ is less than $s$, and the share of the outside alternative is larger than $1-s$. If we think of the outside alternative as the competitor's product, one could argue that the competitor's product might not always be available for purchase either, so the preference weight $v_{0}$ is not constant over time. Also, the estimated market share $s$ is calculated over all possible sets of assortments, not only when all the retailer's products are available for sale.

To extend this model, we can relax the assumption of a constant, time-independent outside alternative and introduce $v_{0t}$, allowing the preference weight to change over time. We also need this extension to be able to incorporate non-homogeneous product sets into the market share constraint, by using $I_{t}$, the offer set of the retailer at time $t$ (Section 2.4). The market share constraint modifies to a set of constraints

$$s=\frac{\sum_{i\in I_{t}}v_{i}}{v_{0t}+\sum_{i\in I_{t}}v_{i}},\quad t=1,\ldots,T \qquad (8)$$

which ensures that the share of the outside alternative is $1-s$, for all time points $t$, when all the retailer's products are available for sale. That is

$$P_{0}(I,\bm{v})=\frac{v_{0t}}{v_{0t}+\sum_{i\in I_{t}}v_{i}}=\frac{\frac{1-s}{s}\sum_{i\in I_{t}}v_{i}}{\frac{1-s}{s}\sum_{i\in I_{t}}v_{i}+\sum_{i\in I_{t}}v_{i}}=1-s.$$

Now let us consider an edge case scenario where the retailer's model induced market share is constant over time, regardless of the assortment it offers. This could be achieved with the set of market share constraints

$$s=\frac{\sum_{i\in S_{t}}v_{i}}{v_{0t}+\sum_{i\in S_{t}}v_{i}},\quad t=1,\ldots,T \qquad (9)$$

leading to

$$\tilde{s}_{t}=P_{S_{t}}(S_{t},\bm{v})=\frac{\sum_{j\in S_{t}}v_{j}}{v_{0t}+\sum_{j\in S_{t}}v_{j}}=\frac{\sum_{j\in S_{t}}v_{j}}{\frac{1-s}{s}\sum_{j\in S_{t}}v_{j}+\sum_{j\in S_{t}}v_{j}}=s.$$

The retailer's share remains $s$ and the share of the outside alternative remains $1-s$, at each time $t$, regardless of what products are available for purchase. This edge case might not make sense for all applications, but it is interesting to consider as an alternative to the assumption that the outside alternative is always fully available. In the airline retail context we could think of the outside alternative as the competitor's products, whose availability can be as limited as the host airline's products.

To introduce a continuum of cases between (8) and (9), let us introduce a parameter $\alpha\in[0,1]$ controlling the availability of the outside alternative. $\alpha=0$ represents the case where the outside alternative is always available for sale (8), and naturally, $\alpha=1$ represents the case where the outside alternative limits availability in the same fashion as the retailer. This is more of a hypothetical case, given that the outside alternative could include the no-purchase option, which is always an available choice. Combining (8) and (9), the set of market share constraints becomes

$$s=(1-\alpha)\frac{\sum_{i\in I_{t}}v_{i}}{v_{0t}+\sum_{i\in I_{t}}v_{i}}+\alpha\frac{\sum_{i\in S_{t}}v_{i}}{v_{0t}+\sum_{i\in S_{t}}v_{i}},\quad t=1,\ldots,T \qquad (10)$$

Using $\alpha=0$, $I_{t}=I=\{1,\ldots,n\}$ and $v_{0t}=1$, the constraints simplify to (7).

The models require knowledge of the market share $s$, which in practice can be difficult to acquire, and the estimate itself can be inaccurate. Abdallah and Vulcano (2016) included a study on the sensitivity of the model estimates to the input market share. We mentioned that, in practice, the market share $s$ is most likely estimated from data observed over various sets of assortments, not only when all the retailer's products are available for sale. Therefore, it could also make sense to use a market share constraint aggregated over the time horizon $T$, that is

$$s=\frac{\sum_{t=1}^{T}\sum_{i\in I_{t}}v_{i}}{\sum_{t=1}^{T}\left(v_{0t}+\sum_{i\in I_{t}}v_{i}\right)} \qquad (11)$$

This constraint ensures that the market share over the time horizon $T$ is $s$, taking into account the changing offer set $I_{t}$ over time. This is less restrictive and could be a more reasonable assumption than using (7), which forces the market share at each time point $t$ with a set of constraints. Adding $\alpha$ to control the availability of the outside alternative, we would use

$$s=\frac{\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\right]}{\sum_{t=1}^{T}\left[v_{0t}+(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\right]} \qquad (12)$$

Finally, we want to point out that instead of solving the market share constrained optimization problem

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & \sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0}+\sum_{i\in S_{t}}v_{i}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}}{v_{0}+\sum_{i\in S_{t}}v_{i}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \\
\textnormal{s.t.}\quad & s=\frac{\sum_{i=1}^{n}v_{i}}{v_{0}+\sum_{i=1}^{n}v_{i}} \\
& v_{0}=1
\end{aligned}$$

we can directly incorporate the market share constraint into the objective function through $v_{0}=r\sum_{i=1}^{n}v_{i}$, with $r=(1-s)/s$, and solve

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & \sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{r\sum_{i=1}^{n}v_{i}+\sum_{i\in S_{t}}v_{i}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}}{r\sum_{i=1}^{n}v_{i}+\sum_{i\in S_{t}}v_{i}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \\
\textnormal{s.t.}\quad & \sum_{i=1}^{n}v_{i}=1
\end{aligned}$$

or by setting $v_{1}=1$. The scaling constraint on $\bm{v}$ is required to avoid multiple solutions.

Incorporating a non-homogeneous product set and the control of the availability of the outside alternative, we would need to solve

$$\begin{aligned}
\max_{\bm{\lambda},\bm{v}}\quad & \sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}}{v_{0t}+\sum_{i\in S_{t}}v_{i}}+\sum_{i\in S_{t}}z_{it}\log(v_{i})\right] \\
\textnormal{s.t.}\quad & v_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\right] \\
& \sum_{i=1}^{n}v_{i}=1
\end{aligned}$$

We will see in Section 4.2 that this formulation with upper bound constraints on the arrival rate is easy to solve using the Frank-Wolfe method.
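For concreteness, the following minimal Python sketch shows how the constraint on $v_{0t}$ can be folded into the per-period objective term; the helper names (`outside_weight`, `loglik_term`) and the index-array interface are ours, and the normalization $\sum_{i}v_{i}=1$ and the maximization itself are left to the surrounding solver.

```python
import numpy as np

def outside_weight(v, I_t, S_t, s, alpha):
    """Preference weight v_{0t} of the outside alternative at time t,
    folding the market share constraint into the objective."""
    r = (1.0 - s) / s
    return r * ((1.0 - alpha) * v[I_t].sum() + alpha * v[S_t].sum())

def loglik_term(lam_t, v, z_t, I_t, S_t, s, alpha):
    """Period-t contribution to the objective with v_{0t} substituted in.

    v   : preference weights (length n), z_t : sales at time t (length n)
    I_t : indices of existing products, S_t : indices of purchasable products
    """
    v0t = outside_weight(v, I_t, S_t, s, alpha)
    m_t = z_t[S_t].sum()
    vS = v[S_t].sum()
    denom = v0t + vS
    return (m_t * np.log(lam_t / denom)
            - lam_t * vS / denom
            + np.sum(z_t[S_t] * np.log(v[S_t])))
```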

3 Extended EM algorithm

In this section we extend the EM algorithm of Vulcano et al. (2012) with non-homogeneous product sets (Section 2.4) and the ability to control the availability of the outside alternative (Section 2.5).

We incorporate $I_{t}$, the offer set of the retailer at time $t$, and $v_{0t}$, the time-dependent preference weight of the outside alternative, into the algorithm. We use

$$v_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\right]$$

where $r=\frac{1-s}{s}$ and $\alpha\in[0,1]$ controls the availability of the outside alternative. Adding these into the formulation, we can modify the key E-step equations of the EM algorithm as

$$\hat{X}_{jt} = \begin{cases}\dfrac{v_{j}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}\,\dfrac{\sum_{h\in S_{t}}v_{h}+v_{0t}}{\sum_{h\in S_{t}}v_{h}}\sum_{h\in S_{t}}z_{ht},&\text{if $j\notin S_{t}\cup\{0\}$}\\[2ex] \dfrac{\sum_{h\in S_{t}}v_{h}+v_{0t}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}\,z_{jt},&\text{if $j\in S_{t}$}\end{cases}$$

$$\hat{Y}_{jt} = \frac{\sum_{h\notin S_{t}\cup\{0\}}v_{h}-v_{0t}+r\sum_{i\in I_{t}}v_{i}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}\,z_{jt},\quad j\in S_{t}$$

$$\hat{X}_{0t} = r\sum_{i\in I_{t}}\hat{X}_{it}$$

$$\hat{Y}_{0t} = \frac{v_{0t}}{\sum_{i\in S_{t}}v_{i}+v_{0t}}\sum_{h\notin S_{t}\cup\{0\}}\hat{X}_{ht}$$

For the M-step we need to maximize the conditional expectation of the complete data log-likelihood function, which becomes

$${\mathcal{L}}(\bm{v})=\sum_{t=1}^{T}\sum_{j\in I_{t}}\hat{X}_{jt}\log\left(\frac{v_{j}}{(1+r)\sum_{i\in I_{t}}v_{i}}\right)+\sum_{t=1}^{T}\hat{X}_{0t}\log\left(s\right) \qquad (13)$$

To find the maxima, the first order conditions become

$$\frac{\partial{\mathcal{L}}}{\partial v_{i}}=\sum_{t=1}^{T}\mathbbm{1}\left[i\in I_{t}\right]\left(\frac{\hat{X}_{it}}{v_{i}}-\frac{\sum_{k\in I_{t}}\hat{X}_{kt}}{\sum_{k\in I_{t}}v_{k}}\right)=0,\quad i=1,\ldots,n$$

which is a system of $n$ nonlinear equations in $v_{i}$. The solution can be obtained with existing implementations of iterative methods, such as Newton-Raphson. We can also use a simple fixed point iteration algorithm by rewriting $\nabla\mathcal{L}=F(\bm{v})=0$ as $\Phi\left(\bm{v}\right)=\bm{v}$ and then using the fixed point iteration $\bm{v}^{(l+1)}:=\Phi\left(\bm{v}^{(l)}\right)$. In our case the update becomes

$$v_{i}^{(l+1)}:=\frac{\sum_{t=1}^{T}\mathbbm{1}\left[i\in I_{t}\right]X_{it}}{\sum_{t=1}^{T}\mathbbm{1}\left[i\in I_{t}\right]\frac{\sum_{k\in I_{t}}X_{kt}}{\sum_{k\in I_{t}}v_{k}^{(l)}}},\quad i=1,\ldots,n$$

In the case of a homogeneous product set, so that $I_{t}=\{1,\ldots,n\}~\forall t$, we get a closed form solution to the M-step, that is

$$v_{i}=\frac{\sum_{t=1}^{T}X_{it}}{\sum_{j=1}^{n}\sum_{t=1}^{T}X_{jt}},\quad i=1,\ldots,n$$

The result is easy to obtain by assuming $\sum_{i=1}^{n}v_{i}=1$. Note that this closed-form equation is different from the one derived in Vulcano et al. (2012), since they used a different parametrization by setting $v_{0}=1$.
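The fixed point update above can be implemented in a few lines. The sketch below (the function name `m_step_fixed_point` is ours) assumes the E-step estimates are stored in an $n\times T$ array with zeros for products outside $I_t$.

```python
import numpy as np

def m_step_fixed_point(X, I_sets, v_init, n_iter=500, tol=1e-10):
    """Fixed point iteration for the M-step with a non-homogeneous product set.

    X       : n x T array of E-step estimates X_it (0 where i is not in I_t)
    I_sets  : list of length T; I_sets[t] = index array of products in I_t
    v_init  : starting preference weights (length n, positive)
    """
    n, T = X.shape
    v = np.array(v_init, dtype=float)
    member = np.zeros((n, T))                 # membership indicator 1[i in I_t]
    for t, I_t in enumerate(I_sets):
        member[I_t, t] = 1.0

    num = (X * member).sum(axis=1)            # sum_t 1[i in I_t] X_it
    for _ in range(n_iter):
        ratio = np.array([X[I_t, t].sum() / v[I_t].sum()
                          for t, I_t in enumerate(I_sets)])
        v_new = num / (member @ ratio)        # denominator: sum_t 1[i in I_t] ratio_t
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```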

Algorithm 2 presents the extended EM algorithm with a non-homogeneous product set and the ability to control the availability of the outside alternative.

Algorithm 2 EM algorithm with non-homogeneous product set and control of the availability of the outside alternative (OA)
1: $I_{t}$: set of offered products at time $t$ (product set)
2: $S_{t}$: set of products available for purchase at time $t$ ($S_{t}\subseteq I_{t}$)
3: $r=\frac{1-s}{s}$, where $s$ is the market share of the retailer
4: $z_{jt}$: observed sales at time $t$ for product $j$
5: $\alpha$: control parameter ($\alpha=1$: OA availability follows the retailer; $\alpha=0$: OA fully available)
6: E-step
7: for $t=1,\ldots,T$ do
8:     $v_{0t}=r\left[\left(1-\alpha\right)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{h\in S_{t}}v_{h}\right]$ $\triangleright$ OA preference weight at $t$
9:     for $j\in I_{t}$ do
10:        if $j\notin S_{t}$ then
11:            $X_{jt}=\frac{v_{j}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}\frac{\sum_{h\in S_{t}}v_{h}+v_{0t}}{\sum_{h\in S_{t}}v_{h}}\sum_{h\in S_{t}}z_{ht}$
12:        else ($j\in S_{t}$)
13:            $Y_{jt}=\frac{\sum_{h\notin S_{t}}v_{h}-v_{0t}+r\sum_{i\in I_{t}}v_{i}}{\left(1+r\right)\sum_{i\in I_{t}}v_{i}}z_{jt}$
14:            $X_{jt}=z_{jt}-Y_{jt}$
15:        end if
16:    end for
17:    $X_{0t}=r\sum_{i\in I_{t}}X_{it}$
18:    $Y_{0t}=\frac{v_{0t}}{\sum_{h\in S_{t}}v_{h}+v_{0t}}\sum_{h\notin S_{t}}X_{ht}$
19: end for
20: M-step
21: Find $v_{i},~i=1,\ldots,n$ as the solution to the system of nonlinear equations $F(\bm{v})=0$:
22: $\sum_{t=1}^{T}\mathbbm{1}\left[i\in I_{t}\right]\left(\frac{X_{it}}{v_{i}}-\frac{\sum_{k\in I_{t}}X_{kt}}{\sum_{k\in I_{t}}v_{k}}\right)=0,~~i=1,\ldots,n$
23: Special case (homogeneous product set), if $I_{t}=\{1,\ldots,n\}~\forall t$:
24: $v_{i}=\frac{\sum_{t=1}^{T}X_{it}}{\sum_{j=1}^{n}\sum_{t=1}^{T}X_{jt}},~~i=1,\ldots,n$

If, in practice, we observe data with partial availability, we recommend splitting the sales into fully open and closed assortments (Algorithm 1), applying the extended EM algorithm (Algorithm 2), and aggregating the solution back. It would be an interesting future research topic to further extend the EM algorithm and incorporate constraints on the arrival rates (Section 2.3).
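For completeness, a minimal Python sketch of the E-step of Algorithm 2 is given below. The function name `e_step` and the index-array interface are ours, and the outside-alternative quantities $X_{0t}$, $Y_{0t}$ are omitted since the M-step update of $\bm{v}$ does not need them.

```python
import numpy as np

def e_step(v, z, I_sets, S_sets, s, alpha):
    """E-step of the extended EM algorithm (Algorithm 2).

    v       : current preference weights (length n)
    z       : n x T array of observed sales
    I_sets  : per-period index arrays of existing products I_t
    S_sets  : per-period index arrays of purchasable products S_t
    Returns the n x T array of primary demand estimates X.
    """
    n, T = z.shape
    r = (1.0 - s) / s
    X = np.zeros((n, T))
    for t in range(T):
        I_t, S_t = I_sets[t], S_sets[t]
        vI, vS = v[I_t].sum(), v[S_t].sum()
        v0t = r * ((1.0 - alpha) * vI + alpha * vS)   # OA weight at time t
        closed = np.setdiff1d(I_t, S_t)               # existing but unavailable
        # Unavailable products: demand spilled from the observed sales in S_t
        X[closed, t] = (v[closed] / ((1 + r) * vI)
                        * (vS + v0t) / vS * z[S_t, t].sum())
        # Available products: observed sales minus the recaptured part Y_jt
        Y = (v[closed].sum() - v0t + r * vI) / ((1 + r) * vI) * z[S_t, t]
        X[S_t, t] = z[S_t, t] - Y
    return X
```

Combined with the fixed point M-step sketch above, this gives one full iteration of the extended EM algorithm under the stated assumptions.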

3.1 Example: limited OA availability

In this section we apply Algorithm 2 to the simulated example from Vulcano et al. (2012) and demonstrate what happens when we limit the availability of the outside alternative simultaneously with the retailer's availability ($\alpha=1$). The observed sales data is presented in Table 9.

Sales Period
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 10 15 11 14
2 11 6 11 8 20 16
3 5 6 1 11 4 5 14 7 11
4 4 4 4 1 6 4 3 5 9 9 6 9
5 0 2 0 0 1 0 1 3 0 3 3 5 2 3 3
Table 9: Simulated example of Vulcano et al. (2012)

First let us look at the solution, using $\alpha=0$, presented in Table 10. We closely recover the results of Vulcano et al. (2012), since we make the same assumption that the outside alternative is always available.

Estimates Period $v_{i}$
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 10 15 11 14 15.03 12.12 13.04 10.87 14.49 15.91 11.93 18.56 11.51 17.27 17.27 1.000
2 11 6 11 8 14.35 11.48 10.45 8.71 11.61 12.75 9.56 14.88 9.23 13.84 13.84 0.801
3 5 6 1 11 2.87 3.59 6.88 3.44 5.41 6.22 4.67 7.26 4.50 6.75 6.75 0.391
4 4 4 4 1 4.31 2.87 1.47 2.46 4.42 3.43 2.29 3.43 2.68 4.02 4.02 0.233
5 0 2 0 0 0.72 0.00 0.49 1.47 0.00 1.14 1.14 1.91 0.63 0.95 0.95 0.055
$\lambda_{t}$ 42.86 47.14 38.57 48.57 53.26 42.95 46.19 38.50 51.33 56.37 42.28 65.76 40.78 61.18 61.18
Table 10: Estimated demand and parameters using the EM algorithm with $\alpha=0$

Using $\alpha=1$, the estimated primary demand at each time period is equal to the observed purchases, because the availability of the outside alternative is restricted in the same way as the retailer's availability, preserving the model induced market share $s$ at each time $t$. The results are presented in Table 11.

Estimates Period $v_{i}$
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
1 10 15 11 14 12.50 10.08 7.26 6.05 8.06 4.84 3.63 5.65 0.81 1.21 1.21 1.000
2 11 6 11 8 11.94 9.55 5.75 4.79 6.39 3.83 2.87 4.47 0.64 0.96 0.96 0.792
3 5 6 1 11 2.39 2.98 3.88 1.94 3.05 1.92 1.44 2.24 0.32 0.48 0.48 0.396
4 4 4 4 1 3.58 2.39 0.83 1.39 2.50 1.06 0.71 1.06 0.20 0.30 0.30 0.245
5 0 2 0 0 0.60 0.00 0.28 0.83 0.00 0.35 0.35 0.59 0.04 0.06 0.06 0.046
$\lambda_{t}$ 42.86 47.14 38.57 48.57 44.29 35.71 25.71 21.43 28.57 17.14 12.86 20.00 2.86 4.29 4.29
Table 11: Estimated demand and parameters using the EM algorithm with $\alpha=1$

Notice that the estimated preference weights in Tables 10 and 11 are close to each other; the estimates are not sensitive to the value of $\alpha$. However, the estimated arrival rates changed drastically, due to the change in the model induced market shares.

Parameter $\alpha$ can be used as a tool for the practitioner to control the availability of the outside alternative between these two edge cases. We would like to emphasize, however, that competitor matching can be risky and the value of $\alpha$ should ideally be inferred from exogenous data. The conservative practice is to use $\alpha=0$.

3.2 Example: non-homogeneous product set

To demonstrate the extended EM algorithm on a non-homogeneous product set, let us revisit the example presented in Table 8 (Section 2.4). This is a hypothetical airline sales example with a schedule change. We observe 3 flights, where the sales of the products of flight 1 are discontinued and then repeated as flight 2, and flight 3 has twice the observed sales of flights 1 and 2. The solution using $\alpha=0$ is presented in Table 12.

Estimates Period $v_{i}$
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
flt1-prod1 10 15 11 14 15.03 12.12 13.04 10.87 14.49 15.91 11.93 18.56 11.51 17.27 17.27 1.000
flt1-prod2 11 6 11 8 14.35 11.48 10.45 8.71 11.61 12.75 9.56 14.88 9.23 13.84 13.84 0.801
flt1-prod3 5 6 1 11 2.87 3.59 6.88 3.44 5.41 6.22 4.67 7.26 4.50 6.75 6.75 0.391
flt1-prod4 4 4 4 1 4.31 2.87 1.47 2.46 4.42 3.43 2.29 3.43 2.68 4.02 4.02 0.233
flt1-prod5 0 2 0 0 0.72 0.00 0.49 1.47 0.00 1.14 1.14 1.91 0.63 0.95 0.95 0.055
flt2-prod1 1.000
flt2-prod2 0.801
flt2-prod3 0.391
flt2-prod4 0.233
flt2-prod5 0.055
flt3-prod1 20 30 22 28 30.07 24.25 26.08 21.73 28.98 31.82 23.87 37.12 23.02 34.54 34.54 2.000
flt3-prod2 22 12 22 16 28.71 22.97 20.90 17.42 23.22 25.50 19.13 29.75 18.45 27.68 27.68 1.603
flt3-prod3 10 12 2 22 5.74 7.18 13.76 6.88 10.81 12.44 9.33 14.52 9.00 13.51 13.51 0.782
flt3-prod4 8 8 8 2 8.61 5.74 2.95 4.92 8.85 6.86 4.57 6.86 5.36 8.04 8.04 0.465
flt3-prod5 0 4 0 0 1.44 0.00 0.98 2.95 0.00 2.29 2.29 3.81 1.26 1.89 1.89 0.110
λt\mathbf{\lambda}_{t} 128.57 141.43 115.71 145.71 159.79 128.86 138.58 115.49 153.98 169.10 126.83 197.29 122.35 183.53 183.53
Period 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
flt1-prod1 1.000
flt1-prod2 0.801
flt1-prod3 0.391
flt1-prod4 0.233
flt1-prod5 0.055
flt2-prod1 10 15 11 14 15.03 12.12 13.04 10.87 14.49 15.91 11.93 18.56 11.51 17.27 17.27 1.000
flt2-prod2 11 6 11 8 14.35 11.48 10.45 8.71 11.61 12.75 9.56 14.88 9.23 13.84 13.84 0.801
flt2-prod3 5 6 1 11 2.87 3.59 6.88 3.44 5.41 6.22 4.67 7.26 4.50 6.75 6.75 0.391
flt2-prod4 4 4 4 1 4.31 2.87 1.47 2.46 4.42 3.43 2.29 3.43 2.68 4.02 4.02 0.233
flt2-prod5 0 2 0 0 0.72 0.00 0.49 1.47 0.00 1.14 1.14 1.91 0.63 0.95 0.95 0.055
flt3-prod1 20 30 22 28 30.07 24.25 26.08 21.73 28.98 31.82 23.87 37.12 23.02 34.54 34.54 2.000
flt3-prod2 22 12 22 16 28.71 22.97 20.90 17.42 23.22 25.50 19.13 29.75 18.45 27.68 27.68 1.603
flt3-prod3 10 12 2 22 5.74 7.18 13.76 6.88 10.81 12.44 9.33 14.52 9.00 13.51 13.51 0.782
flt3-prod4 8 8 8 2 8.61 5.74 2.95 4.92 8.85 6.86 4.57 6.86 5.36 8.04 8.04 0.465
flt3-prod5 0 4 0 0 1.44 0.00 0.98 2.95 0.00 2.29 2.29 3.81 1.26 1.89 1.89 0.110
λt\mathbf{\lambda}_{t} 128.57 141.43 115.71 145.71 159.79 128.86 138.58 115.49 153.98 169.10 126.83 197.29 122.35 183.53 183.53
Table 12: Schedule change example: estimated demand and parameters (α=0\alpha=0)

For flights 1 and 2 we get the same estimated primary demand and preference weights as in Table 10, since we duplicated that example over the time horizon. For flight 3 we estimate twice the primary demand and preference weights, which was expected, since we artificially doubled the numbers. The method is consistent, and can handle a non-homogeneous product set in a mathematically formal way.

It is interesting to note here that the EM algorithm of Vulcano et al. (2012) cannot distinguish between product i not being available for sale (i∉S_t) and product i not being part of the product set (i∉I_t). Because of this, a naive application of the EM algorithm would estimate primary demand for the non-existing products of flights 1 and 2, grossly overestimating the demand. The total estimated arrival rate is \sum_{t=1}^{30}\lambda_{t}=5324.10, while using the extended EM algorithm we get \sum_{t=1}^{30}\lambda_{t}=4421.53.

It is also interesting to mention that using the extended EM algorithm with α=1 we get \sum_{t=1}^{30}\lambda_{t}=2365.71, and, just like before, the total primary demand at each time period t is equal to the observed purchases. It is easy to show that in the case α=1 we have \lambda_{t}=\left(\sum_{i=1}^{n}z_{it}\right)/s=m_{t}/s. If we apply the algorithm with other values of α∈[0,1], we observe a linear decrease of the total estimated arrival rate as a function of α: the model-induced market share of the retailer increases with α, which decreases the estimated demand.
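
As a quick numerical check of this identity, the short sketch below recomputes the first four arrival rates of Table 11 from the observed sales in Table 9. The market share value s=0.7 is an assumption on our part, inferred because it reproduces the reported rates.

# Periods 15-12 of Table 9 (all five products open); with alpha = 1 we have lambda_t = m_t / s.
m = [10 + 11 + 5 + 4 + 0,    # m_15
     15 + 6 + 6 + 4 + 2,     # m_14
     11 + 11 + 1 + 4 + 0,    # m_13
     14 + 8 + 11 + 1 + 0]    # m_12
s = 0.7                      # assumed market share (inferred from the reported rates)
print([round(m_t / s, 2) for m_t in m])   # [42.86, 47.14, 38.57, 48.57], cf. Table 11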

Algorithm 2 is a simple yet powerful extension of the EM algorithm which can natively handle a non-homogeneous product set, and which can control the availability of the outside alternative through a single parameter. The price of this extension is that in the M-step we need to solve for the roots of a system of nonlinear equations, instead of having a closed-form solution. However, a simple fixed point iteration can be used as the M-step.

4 Constrained optimization

In this section we develop algorithms to solve the estimation problem with constrained arrival rates (Section 2.3). We will extend the minorization-maximization (MM) algorithm of Abdallah and Vulcano (2016) and also present a solution utilizing the Frank-Wolfe algorithm. We will also incorporate partial availability, a non-homogeneous product set, and the ability to control the availability of the outside alternative into the model, and show how to estimate the parameters using iterative algorithms.

Consider the incomplete log-likelihood

lI(𝒗,𝝀)=t=1T[mtlog(λtv0t+iStvioit)λtiStvioitv0t+iStvioit+iStzitlog(vioit)]\displaystyle l_{I}(\bm{v},\bm{\lambda})=\sum_{t=1}^{T}\left[m_{t}\log\left(\frac{\lambda_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\lambda_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+\sum_{i\in S_{t}}z_{it}\log(v_{i}\cdot o_{it})\right] (14)

which extends the log-likelihood in Abdallah and Vulcano (2016) with partial availability, as discussed in Section 2.1.1. When o_{it}=1 for all i∈S_t, the log-likelihood simplifies to the one considering only fully open and closed assortments. Note that we are using preference weights \bm{v} in the model, but we could express the model in the utility space by using \bm{v}=\exp(\bm{\beta}).
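
To make the notation concrete, the following is a minimal NumPy sketch of how (14) can be evaluated. The array encoding (products in rows, periods in columns, with o_{it}=0 whenever product i is not offered in period t) and the function name are our own choices, not part of the model specification.

import numpy as np

def incomplete_loglik(v, lam, z, o, v0):
    # v: preference weights (n,), lam: arrival rates (T,), z: observed sales (n, T),
    # o: availability fractions (n, T) with zeros for closed products, v0: OA weights (T,).
    sv = o.T @ v                              # sum_{i in S_t} v_i * o_it, one value per period
    m = z.sum(axis=0)                         # m_t, total purchases in period t
    ll = np.sum(m * np.log(lam / (v0 + sv)))  # first term of (14)
    ll -= np.sum(lam * sv / (v0 + sv))        # second term of (14)
    open_ij = o > 0                           # entries with i in S_t
    ll += np.sum(z[open_ij] * np.log((v[:, None] * o)[open_ij]))  # third term of (14)
    return ll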

The constrained optimization problem we need to solve is given by

max𝒗,𝝀\displaystyle\max_{\bm{v},\bm{\lambda}} lI(𝒗,𝝀)\displaystyle l_{I}(\bm{v},\bm{\lambda}) (15)
s.t.
λtLt,t=1,,T\displaystyle\lambda_{t}\leq L_{t},~{}t=1,\ldots,T

where L_t is an upper bound on the arrival rate at time t. The motivation behind putting an upper bound on the arrival rates was explained in Section 2.3, where we discussed two specific ways of constraining the arrival rates. To solve the constrained maximum likelihood estimation problem, we follow the idea in Abdallah and Vulcano (2016). We express the optimization problem as a function of \bm{v} by applying part of the Karush–Kuhn–Tucker (KKT) conditions to the Lagrangian function, removing \bm{\lambda} from the problem. The Lagrangian function of (15) becomes

(𝒗,𝝀,𝝁)=lI(𝒗,𝝀)t=1Tμtv0t+iStvioit(λtLt)\displaystyle{\mathcal{L}}(\bm{v},\bm{\lambda},\bm{\mu})=l_{I}(\bm{v},\bm{\lambda})-\sum_{t=1}^{T}\frac{\mu_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}(\lambda_{t}-L_{t}) (16)

Note that we used a simple re-scaling of the Lagrange multipliers \bm{\mu} (by \frac{1}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}} for time t). After applying the KKT conditions to remove \bm{\lambda} from the problem, and some algebra (see the Appendix), the problem becomes

max𝒗\displaystyle\max_{\bm{v}} i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit+C3((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v})\right)

where C_{3}\left(\mathcal{B}(\bm{v})\right) is defined in (26), K_{i}=\sum_{t=1}^{T}z_{it}, and \mathcal{B}(\bm{v}) represents the set of time periods where the upper bound constraints on the arrival rates are binding, that is

\mathcal{B}(\bm{v})=\left\{t~\bigg|~L_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right\} (18)
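
For illustration, the binding set is cheap to evaluate once the availability-weighted preference totals are computed; a minimal sketch under the same array encoding as above (the function name is ours):

import numpy as np

def binding_periods(v, o, v0, L, z):
    # Periods in which the upper bound L_t on the arrival rate is binding, cf. (18).
    sv = o.T @ v                  # sum_{i in S_t} v_i * o_it
    m = z.sum(axis=0)             # m_t
    return np.where(L < m * (v0 + sv) / sv)[0]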

Through the definition of v_{0t} and the market share constraint we can incorporate a non-homogeneous product set into the model, together with the ability to control the availability of the outside alternative. These details were discussed in Sections 2.4 and 2.5. We will consider two formulations here. In the first one we use the market share constraint

\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]=\tilde{s}\sum_{t=1}^{T}v_{0t} (19)

which is equivalent to the aggregate constraint (11) discussed in Section 2.5, using \tilde{s}=s/(1-s). We assume that the OA preference weights v_{0t} are known, and we can use the standard scaling v_{0t}=1. In Section 4.1 we show how to use the MM algorithm to solve this problem.

In the second formulation we incorporate the market share constraint (10) into the objective function and constrain the preference weights by using

v_{0t} = r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right] (20)
\sum_{\forall i}v_{i} = 1

The first equation removes v_{0t} from the problem, while the second equation avoids multiple solutions by fixing the scale of \bm{v}. In Section 4.2 we show how the Frank-Wolfe method leads to a simple coordinate descent algorithm for this problem.
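
A small sketch of (20) under the same array encoding (names ours); the normalization \sum_{i} v_{i} = 1 then simply fixes the scale of \bm{v}:

import numpy as np

def outside_weights(v, o, in_set, r, alpha):
    # v_{0t} implied by the per-period market share constraint, cf. (20);
    # in_set[i, t] is True when product i belongs to the product set I_t.
    sv = o.T @ v                               # sum_{i in S_t} v_i * o_it
    iv = in_set.astype(float).T @ v            # sum_{i in I_t} v_i
    return r * ((1.0 - alpha) * iv + alpha * sv)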

4.1 Solution using MM algorithm

In this section we present a solution to the optimization problem (15) with constraint (19) using the MM algorithm, building on Abdallah and Vulcano (2016). The idea behind MM algorithms is to find a surrogate function that minorizes the original objective function, maximize the surrogate function, and continue this iteratively. For more information on the MM algorithm, in general, see Hunter and Lange (2000a, 2004).

After removing \bm{\lambda} from the problem using the KKT conditions, we arrive at (4). If we rearrange the last term of the objective to aid the minorization, the optimization problem becomes

max𝒗\displaystyle\max_{\bm{v}} i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)+t(𝒗)Ltv0tv0t+iStvioit+C3((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)+\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{v_{0t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v})\right)
s.t.
t=1T[(1α)iItvi+αiStvioit]=s~t=1Tv0t\displaystyle\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]=\tilde{s}\sum_{t=1}^{T}v_{0t}

where the v_{0t} are known. A common technique to find a minorizer is to use supporting hyperplanes (Hunter and Lange, 2004). Since the second, third, and fourth terms in (4.1) are convex, we can use first-order Taylor approximations of the convex functions -\log(x) and \frac{1}{x}; that is, for all x,y>0

log(y)\displaystyle-\log(y) \displaystyle\geq log(x)1x(yx)\displaystyle-\log(x)-\frac{1}{x}(y-x)
1y\displaystyle\frac{1}{y} \displaystyle\geq 1x1x2(yx)\displaystyle\frac{1}{x}-\frac{1}{x^{2}}(y-x)

In our specific case we will use

log(y)\displaystyle-\log(y) \displaystyle\geq 1log(x)y/x\displaystyle 1-\log(x)-y/x
1y\displaystyle\frac{1}{y} \displaystyle\geq yx2+2x\displaystyle-\frac{y}{x^{2}}+\frac{2}{x}

with y=\sum_{i\in S_{t}}v_{i}\cdot o_{it} and x=\sum_{i\in S_{t}}v^{(k)}_{i}\cdot o_{it}, or y=v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it} and x=v_{0t}+\sum_{i\in S_{t}}v^{(k)}_{i}\cdot o_{it}, where \bm{v}^{(k)} is the value of \bm{v} at iteration k. Therefore, the minorizer function becomes

g(𝒗|𝒗(k))\displaystyle g(\bm{v}|\bm{v}^{(k)}) =\displaystyle= i=1nKilog(vi)t(𝒗(k))mtiStvioitiStvi(k)oit\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}
t(𝒗(k))mtiStvioitv0t+iStvi(k)oitt(𝒗(k))Ltv0t(iStvioit)(v0t+iStvi(k)oit)2+C0\displaystyle-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}\frac{L_{t}v_{0t}(\sum_{i\in S_{t}}v_{i}\cdot o_{it})}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}+C_{0}

where C_{0} contains the constant terms independent of \bm{v}. Although the minorizer g(\bm{v}|\bm{v}^{(k)}) is developed locally for a fixed \bm{v}^{(k)}, it can be shown to be globally dominated by the Lagrangian {\mathcal{L}}(\bm{v},\bm{\lambda},\bm{\mu}). Computational results show that it is actually a fairly tight minorizer as well. We now need to solve an optimization problem with a single market share constraint:

max𝒗\displaystyle\max_{\bm{v}} i=1nKilog(vi)t(𝒗(k))mtiStvioitiStvi(k)oit\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}
t(𝒗(k))mtiStvioitv0t+iStvi(k)oitt(𝒗(k))Ltv0t(iStvioit)(v0t+iStvi(k)oit)2\displaystyle-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}\frac{L_{t}v_{0t}(\sum_{i\in S_{t}}v_{i}\cdot o_{it})}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}
s.t.
t=1T[(1α)iItvi+αiStvioit]=s~t=1Tv0t\displaystyle\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]=\tilde{s}\sum_{t=1}^{T}v_{0t}

Since the objective function is concave, the equality constraint is linear, and we are maximizing, there exists a single scalar η such that the first-order optimality condition holds for the Lagrangian

(𝒗,η)=g(𝒗|𝒗(𝒌))η(t=1T[(1α)iItvi+αiStvioit]s~t=1Tv0t)\displaystyle\mathcal{L}(\bm{v},\eta)=g(\bm{v}|\bm{v^{(k)}})-\eta\left(\sum_{t=1}^{T}\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]-\tilde{s}\sum_{t=1}^{T}v_{0t}\right)

The first order optimality condition is

Kj\displaystyle K_{j} =\displaystyle= vj(t(𝒗(k))mt𝟙(jSt)ojtiStvi(k)oit+t(𝒗(k))mt𝟙(jSt)ojtv0t+iStvi(k)oit+\displaystyle v_{j}\left(\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\right.
t(𝒗(k))Ltv0t𝟙(jSt)ojt(v0t+iStvi(k)oit)2+η[(1α)t=1T𝟙(jIt)+αt=1T𝟙(jSt)ojt])\displaystyle\left.\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{v_{0t}\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}+\eta\left[(1-\alpha)\sum_{t=1}^{T}\mathbbm{1}(j\in I_{t})+\alpha\sum_{t=1}^{T}\mathbbm{1}(j\in S_{t})\cdot o_{jt}\right]\right)

which leads to the MM update of \bm{v} summarized in Algorithm 3. In the update step we use Newton’s method to find η in every MM iteration, summarized in Algorithm 4. For more details, please see the Appendix.

Algorithm 3 MM algorithm to estimate (𝝀,𝒗)(\bm{\lambda},\bm{v}) in (15) with market share constraint (19)
1:Input: {(𝒛t,𝒐t,St,It,v0t,Lt)}t=1T\left\{\left(\bm{z}_{t},\bm{o}_{t},S_{t},I_{t},v_{0t},L_{t}\right)\right\}_{t=1}^{T}, ss, α\alpha
2:Let Kj=t=1TzjtK_{j}=\sum_{t=1}^{T}z_{jt}, nj=t=1T𝟙(jIt)n_{j}=\sum_{t=1}^{T}\mathbbm{1}(j\in I_{t}) and oj=t=1Tojt,j=1,,no_{j}=\sum_{t=1}^{T}o_{jt},~{}j=1,\ldots,n
3:Let mt=j=1nzjt,t=1,,Tm_{t}=\sum_{j=1}^{n}z_{jt},~{}t=1,\ldots,T and s~=s/(1s)\tilde{s}=s/(1-s)
4:Let k=0k=0, initialize 𝒗(0)\bm{v}^{(0)}
5:while Stopping criteria is not satisfied do
6:    (𝒗(k))={t|Lt<mtv0t+iStvi(k)oitiStvi(k)oit}\mathcal{B}(\bm{v}^{(k)})=\left\{t\bigg{|}L_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right\}
7:    for j=1,,nj=1,\ldots,n do
8:        Aj=t(𝒗(k))mt𝟙(jSt)ojtiStvi(k)oit+t(𝒗(k))mt𝟙(jSt)ojtv0t+iStvi(k)oit+t(𝒗(k))Ltv0t𝟙(jSt)ojt(v0t+iStvi(k)oit)2A_{j}=\sum_{t\notin\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{v_{0t}\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}
9:    end for
10:    Find η\eta using Algorithm 4
11:    for j=1,,nj=1,\ldots,n do
12:        vj(k+1)=KjAj+η[(1α)nj+αoj]v_{j}^{(k+1)}=\frac{K_{j}}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}
13:    end for
14:    k=k+1k=k+1
15:end while
16:for t=1,,Tt=1,\ldots,T do
17:    λt=min(Lt,mtv0t+iStvi(k)oitiStvi(k)oit)\lambda_{t}=\min\left(L_{t},m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right)
18:end for
Algorithm 4 Newton’s method to find η\eta
1:Let k=0k=0, η0=0\eta^{0}=0, and f0=1f^{0}=1
2:while fk>ϵf^{k}>\epsilon do
3:    fk=j=1nKj[(1α)nj+αoj]Aj+ηk[(1α)nj+αoj]s~t=1Tv0tf^{k}=\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}{A_{j}+\eta^{k}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}-\tilde{s}\sum_{t=1}^{T}v_{0t}
4:    gk=j=1nKj[(1α)nj+αoj]2(Aj+ηk[(1α)nj+αoj])2g^{k}=-\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]^{2}}{\left(A_{j}+\eta^{k}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]\right)^{2}}
5:    ηk+1=ηkfkgk\eta^{k+1}=\eta^{k}-\frac{f^{k}}{g^{k}}
6:    k=k+1k=k+1
7:end while
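
To make the update steps concrete, below is a minimal NumPy sketch of Algorithms 3 and 4 under the same array encoding as the earlier sketches. The function and variable names, the simplistic stopping rules, and the assumptions that every period offers at least one product and that the update denominators stay positive are ours.

import numpy as np

def mm_estimate(z, o, in_set, v0, L, s, alpha, iters=500, tol=1e-10):
    # z, o, in_set: (n, T) arrays; v0, L: (T,) arrays; s: market share; alpha in [0, 1].
    n, T = z.shape
    K = z.sum(axis=1)                          # K_j, total sales of product j
    nj = in_set.sum(axis=1)                    # n_j, periods with j in I_t
    oj = o.sum(axis=1)                         # o_j, cumulative availability of j
    m = z.sum(axis=0)                          # m_t
    s_tilde = s / (1.0 - s)
    w = (1.0 - alpha) * nj + alpha * oj        # multiplier of eta for each product
    v = np.ones(n)
    for _ in range(iters):
        sv = o.T @ v                           # sum_{i in S_t} v_i * o_it
        binding = L < m * (v0 + sv) / sv       # B(v^(k)), step 6
        # A_j of step 8 (non-binding terms, binding terms, bound terms)
        A = (o @ (m / np.where(binding, np.inf, sv))
             + o @ (m / np.where(binding, v0 + sv, np.inf))
             + o @ np.where(binding, L * v0 / (v0 + sv) ** 2, 0.0))
        eta = 0.0                              # Newton's method for eta (Algorithm 4)
        for _ in range(100):
            f = np.sum(K * w / (A + eta * w)) - s_tilde * v0.sum()
            if abs(f) < 1e-12:
                break
            g = -np.sum(K * w ** 2 / (A + eta * w) ** 2)
            eta -= f / g
        v_new = K / (A + eta * w)              # step 12
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    sv = o.T @ v
    lam = np.minimum(L, m * (v0 + sv) / sv)    # steps 16-18
    return v, lam

Each MM iteration needs only O(nT) work plus the scalar Newton solve for η.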

4.2 Solution using Frank-Wolfe method

In this section we present a solution to the optimization problem (15) with constraint (20) using the Frank-Wolfe algorithm. The Frank-Wolfe, or conditional gradient, method is an iterative algorithm for constrained convex optimization: in each iteration it considers a linear approximation of the objective function and moves towards an optimizer of this linear function. For more information on the Frank-Wolfe algorithm, see Frank and Wolfe (1956) and Bertsekas (1999).

After removing \bm{\lambda} from the problem using the KKT conditions, we arrive at (4). Therefore, we need to solve the constrained optimization problem

max𝒗\displaystyle\max_{\bm{v}} i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit+C3((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v})\right)
s.t. (24)
i=1nvi=1\displaystyle\sum_{i=1}^{n}v_{i}=1

where we plug in v0t=r[(1α)iItvi+αiStvioit]v_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right] to include the market share constraints into the objective function.

The first step of the Frank-Wolfe algorithm is the direction finding subproblem, which becomes

max𝒚\displaystyle\max_{\bm{y}} f(𝒗(k))T𝒚\displaystyle\nabla f\left(\bm{v}^{(k)}\right)^{T}\bm{y} (25)
s.t.
i=1nyi=1\displaystyle\sum_{i=1}^{n}y_{i}=1

where f(𝒗(k))\nabla f\left(\bm{v}^{(k)}\right) is the gradient vector of the objective function (4.2) evaluated at the solution of iteration kk. The elements of f(𝒗)\nabla f\left(\bm{v}\right) are calculated as

fvj=\displaystyle\frac{\partial f}{\partial v_{j}}= Kjvjt(𝒗)mt𝟙(jSt)ojtiStvioit\displaystyle\frac{K_{j}}{v_{j}}-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-
t(𝒗)mt𝟙(jSt)ojt(rα+1)v0t+iStvioitt(𝒗)Lt𝟙(jSt)ojtr(1α)(iItvi)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}(r\alpha+1)}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}r(1-\alpha)(\sum_{i\in I_{t}}v_{i})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}-
t(𝒗)mt𝟙(jIt)r(1α)v0t+iStvioit+t(𝒗)Lt𝟙(jIt)r(1α)(iStvioit)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(j\in I_{t})r(1-\alpha)}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(j\in I_{t})r(1-\alpha)(\sum_{i\in S_{t}}v_{i}\cdot o_{it})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}
where
v0t=r[(1α)iItvi+αiStvioit]\displaystyle v_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}+\alpha\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right]

For the detailed derivation and a computational formula, please see the Appendix. The direction-finding subproblem in (25) is a fractional knapsack problem (Korte and Vygen, 2012), whose solution is

yi={1,if i=argmax{fvi|𝒗=𝒗(k),i=1,,n}0,otherwise\displaystyle y_{i}=\begin{cases}1,&\text{if }i=\arg\max\left\{\left.\frac{\partial f}{\partial v_{i}}\right|_{\bm{v}=\bm{v}^{(k)}},~{}i=1,\ldots,n\right\}\\ 0,&\text{otherwise}\end{cases}

We take a step in the direction of the maximum element of the gradient, only changing that variable in the current step. The Frank-Wolfe algorithm reduces to a coordinate descent algorithm, successively maximizing along coordinate directions determined by the largest value of the gradient vector. The update step of Frank-Wolfe is

𝒗(k+1)=𝒗(k)+γk(𝒚𝒗(k))\displaystyle\bm{v}^{(k+1)}=\bm{v}^{(k)}+\gamma_{k}\left(\bm{y}-\bm{v}^{(k)}\right)

which simplifies to

vi(k+1)={γk+(1γk)vi(k),if i=argmax{fvi|𝒗=𝒗(k),i=1,,n}vi(k),otherwise\displaystyle v_{i}^{(k+1)}=\begin{cases}\gamma_{k}+(1-\gamma_{k})v_{i}^{(k)},&\text{if }i=\arg\max\left\{\left.\frac{\partial f}{\partial v_{i}}\right|_{\bm{v}=\bm{v}^{(k)}},~{}i=1,\ldots,n\right\}\\ v_{i}^{(k)},&\text{otherwise}\end{cases}

For step size γ_k we can use the default choice γ_k=2/(k+2) or perform a line search to find the γ_k that maximizes f\left(\bm{v}^{(k)}+\gamma_{k}\left(\bm{y}-\bm{v}^{(k)}\right)\right) subject to 0\leq\gamma_{k}\leq 1. In practice, we implemented a backtracking line search using Armijo’s rule (Nocedal and Wright, 2006). The Frank-Wolfe algorithm is summarized in Algorithm 5, and the backtracking line search in Algorithm 6.

Algorithm 5 Frank-Wolfe algorithm to estimate (𝝀,𝒗)(\bm{\lambda},\bm{v}) in (15) with constraint (20)
1:Input: {(𝒛t,𝒐t,St,It,Lt)}t=1T\left\{\left(\bm{z}_{t},\bm{o}_{t},S_{t},I_{t},L_{t}\right)\right\}_{t=1}^{T}, ss, α\alpha
2:Let Kj=t=1Tzjt,j=1,,nK_{j}=\sum_{t=1}^{T}z_{jt},~{}j=1,\ldots,n
3:Let mt=j=1nzjt,t=1,,Tm_{t}=\sum_{j=1}^{n}z_{jt},~{}t=1,\ldots,T and r=(1s)/sr=(1-s)/s
4:Let k=0k=0, initialize 𝒗(0)\bm{v}^{(0)}
5:while Stopping criteria is not satisfied do
6:    for t=1,,Tt=1,\ldots,T do
7:        v0t(k)=r[(1α)iItvi(k)+αiStvi(k)oit]v_{0t}^{(k)}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}^{(k)}+\alpha\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right]
8:    end for
9:    (𝒗(k))={t|Lt<mtv0t(k)+iStvi(k)oitiStvi(k)oit}\mathcal{B}(\bm{v}^{(k)})=\left\{t\bigg{|}L_{t}<m_{t}\frac{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right\}
10:    for j=1,,nj=1,\ldots,n do
11:        
gj=\displaystyle g_{j}= Kjvj(k)t(𝒗(k))mt𝟙(jSt)ojtiStvi(k)oit\displaystyle\frac{K_{j}}{v_{j}^{(k)}}-\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}-
t(𝒗(k))mt𝟙(jSt)ojt(rα+1)v0t(k)+iStvi(k)oitt(𝒗(k))Lt𝟙(jSt)ojtr(1α)(iItvi(k))(v0t(k)+iStvi(k)oit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}(r\alpha+1)}{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{\mathbbm{1}(j\in S_{t})o_{jt}r(1-\alpha)\left(\sum_{i\in I_{t}}v_{i}^{(k)}\right)}{\left(v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}-
t(𝒗(k))mt𝟙(jIt)r(1α)v0t(k)+iStvi(k)oit+t(𝒗(k))Lt𝟙(jIt)r(1α)(iStvi(k)oit)(v0t(k)+iStvi(k)oit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in I_{t})r(1-\alpha)}{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{\mathbbm{1}(j\in I_{t})r(1-\alpha)\left(\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)}{\left(v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}
12:    end for
13:    l=argmax{gj,j=1,,n}l=\arg\max\left\{g_{j},~{}j=1,\ldots,n\right\} \triangleright Find direction
14:    𝒚=𝒆l\bm{y}=\bm{e}_{l} \triangleright 𝒆i\bm{e}_{i} is iith unit vector
15:    Use γk=2/(k+2)\gamma_{k}=2/(k+2) or find γk\gamma_{k} using Algorithm 6 \triangleright Find step size
16:    𝒗(k+1)=𝒗(k)+γk(𝒚𝒗(k))\bm{v}^{(k+1)}=\bm{v}^{(k)}+\gamma_{k}\left(\bm{y}-\bm{v}^{(k)}\right) \triangleright Frank-Wolfe update
17:    k=k+1k=k+1
18:end while
19:for t=1,,Tt=1,\ldots,T do
20:    v0t(k)=r[(1α)iItvi(k)+αiStvi(k)oit]v_{0t}^{(k)}=r\left[(1-\alpha)\sum_{i\in I_{t}}v_{i}^{(k)}+\alpha\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right]
21:    λt=min(Lt,mtv0t(k)+iStvi(k)oitiStvi(k)oit)\lambda_{t}=\min\left(L_{t},m_{t}\frac{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right)
22:end for
Algorithm 6 Backtracking linesearch by Armijo’s rule to find step size
1:Initialize β\beta, τ\tau, γ(0)\gamma^{(0)} \triangleright e.g. β=0.001\beta=0.001, τ=0.5\tau=0.5, γ(0)=1\gamma^{(0)}=1
2:Let h=0h=0, a=1a=1, b=2b=2
3:Let \bm{s}=\bm{y}-\bm{v}^{(k)}
4:Let
fv=\displaystyle f_{v}= i=1nKilog(vi(k))t(𝒗(k))mtlog(iStvi(k)oit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v^{(k)}_{i})-\sum_{t\not\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)-
t(𝒗(k))mtlog(v0t(k)+iStvi(k)oit)t(𝒗(k))LtiStvi(k)oitv0t(k)+iStvi(k)oit+C3((𝒗(k)))\displaystyle\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}m_{t}\log\left({v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v}^{(k)})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}{v_{0t}^{(k)}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v}^{(k)})\right)
5:Let g_{vs}=\sum_{i=1}^{n}g_{i}s_{i}
6:while a<ba<b do
7:    \bm{w}=\bm{v}^{(k)}+\gamma^{(h)}\bm{s}
8:    w0t=r[(1α)iItwi+αiStwioit],t=1,,Tw_{0t}=r\left[(1-\alpha)\sum_{i\in I_{t}}w_{i}+\alpha\sum_{i\in S_{t}}w_{i}\cdot o_{it}\right],~{}t=1,\ldots,T
9:    
a\displaystyle a =\displaystyle= i=1nKilog(wi)t(𝒘)mtlog(iStwioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(w_{i})-\sum_{t\not\in\mathcal{B}(\bm{w})}m_{t}\log\left(\sum_{i\in S_{t}}w_{i}\cdot o_{it}\right)-
t(𝒘)mtlog(w0t+iStwioit)t(𝒘)LtiStwioitw0t+iStwioit+C3((𝒘))\displaystyle\sum_{t\in\mathcal{B}(\bm{w})}m_{t}\log\left({w_{0t}+\sum_{i\in S_{t}}w_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{w})}L_{t}\frac{\sum_{i\in S_{t}}w_{i}\cdot o_{it}}{w_{0t}+\sum_{i\in S_{t}}w_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{w})\right)
10:    b=f_{v}+\gamma^{(h)}\beta g_{vs}
11:    γ(h+1)=τγ(h)\gamma^{(h+1)}=\tau\gamma^{(h)}
12:    h=h+1h=h+1
13:end while
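
A compact sketch of Algorithm 5, again under the array encoding of the earlier sketches and with names of our choosing, is given below. For simplicity it uses an open-loop step size; we damp the default 2/(k+2) slightly (to 2/(k+3)) so that \bm{v} stays strictly positive, since K_j/v_j appears in the gradient. The Armijo backtracking of Algorithm 6 can be substituted for that line.

import numpy as np

def fw_estimate(z, o, in_set, L, s, alpha, iters=2000):
    # z, o, in_set: (n, T) arrays; L: (T,) bounds; s: market share; alpha in [0, 1].
    n, T = z.shape
    K = z.sum(axis=1)
    m = z.sum(axis=0)
    r = (1.0 - s) / s
    Imat = in_set.astype(float)
    v = np.full(n, 1.0 / n)                      # start in the interior of the simplex
    for k in range(iters):
        sv = o.T @ v                             # sum_{i in S_t} v_i * o_it
        iv = Imat.T @ v                          # sum_{i in I_t} v_i
        v0 = r * ((1.0 - alpha) * iv + alpha * sv)     # step 7
        b = (L < m * (v0 + sv) / sv).astype(float)     # step 9, binding indicator
        d = v0 + sv
        grad = (K / v                                  # step 11, gradient elements
                - o @ (m / sv * (1.0 - b))
                - o @ ((r * alpha + 1.0) * m / d * b)
                - o @ (r * (1.0 - alpha) * L * iv / d ** 2 * b)
                - Imat @ (r * (1.0 - alpha) * m / d * b)
                + Imat @ (r * (1.0 - alpha) * L * sv / d ** 2 * b))
        l = int(np.argmax(grad))                 # step 13, direction finding
        gamma = 2.0 / (k + 3.0)                  # damped open-loop step size (see note above)
        y = np.zeros(n); y[l] = 1.0              # step 14
        v = v + gamma * (y - v)                  # step 16, Frank-Wolfe update
    sv = o.T @ v
    iv = Imat.T @ v
    v0 = r * ((1.0 - alpha) * iv + alpha * sv)
    lam = np.minimum(L, m * (v0 + sv) / sv)      # steps 19-22
    return v, lam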

4.3 Example: non-homogeneous product set with constraint

To demonstrate Algorithms 3 and 5, let us revisit the example presented in Table 8 by adding constraints to the problem. We constrain the arrival rates to be at most twice the observed sales, that is, L_t=2m_t. We use α=0, assuming that the OA product is always available, and set v_{0t}=1 for the MM algorithm. Note that o_{it}=1 for i∈S_t, since we have fully open and closed assortments. The results, using the MM algorithm, are presented in Table 13.
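
Assuming the sales, availability, and product-set arrays for this example have been assembled as in the earlier sketches (z, o, in_set, and the market share s), the constrained runs can be reproduced along the following lines, where mm_estimate and fw_estimate refer to the hypothetical sketches given above:

import numpy as np
L = 2.0 * z.sum(axis=0)                       # L_t = 2 m_t, cap at twice the observed sales
T = z.shape[1]
v_mm, lam_mm = mm_estimate(z, o, in_set, v0=np.ones(T), L=L, s=s, alpha=0.0)
v_fw, lam_fw = fw_estimate(z, o, in_set, L=L, s=s, alpha=0.0)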

Estimates (rows: product; columns: selling period; last column: preference weight v_i)
Period 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
flt1-prod1 10.41 11.45 9.37 11.80 12.47 10.06 8.74 7.29 9.72 5.83 4.37 6.80 0.97 1.46 1.46 1.000
flt1-prod2 9.40 10.34 8.46 10.65 11.26 9.08 7.89 6.58 8.77 5.26 3.95 6.14 0.88 1.32 1.32 0.903
flt1-prod3 5.11 5.62 4.60 5.79 6.12 4.94 4.29 3.58 4.77 2.86 2.15 3.34 0.48 0.72 0.72 0.491
flt1-prod4 3.71 4.08 3.33 4.20 4.44 3.58 3.11 2.59 3.46 2.07 1.56 2.42 0.35 0.52 0.52 0.356
flt1-prod5 1.38 1.52 1.24 1.56 1.65 1.33 1.16 0.97 1.29 0.77 0.58 0.90 0.13 0.19 0.19 0.133
flt2-prod1 1.000
flt2-prod2 0.903
flt2-prod3 0.491
flt2-prod4 0.356
flt2-prod5 0.133
flt3-prod1 20.82 22.90 18.74 23.59 24.94 20.11 17.49 14.57 19.43 11.66 8.74 13.60 1.94 2.91 2.91 2.000
flt3-prod2 18.79 20.67 16.91 21.30 22.52 18.16 15.79 13.16 17.54 10.52 7.89 12.28 1.75 2.63 2.63 1.806
flt3-prod3 10.22 11.24 9.20 11.58 12.24 9.87 8.58 7.15 9.54 5.72 4.29 6.68 0.95 1.43 1.43 0.982
flt3-prod4 7.41 8.15 6.67 8.40 8.88 7.16 6.22 5.19 6.92 4.15 3.11 4.84 0.69 1.04 1.04 0.712
flt3-prod5 2.76 3.03 2.48 3.13 3.31 2.67 2.32 1.93 2.57 1.54 1.16 1.80 0.26 0.39 0.39 0.265
λt\mathbf{\lambda}_{t} 128.57 141.43 115.71 145.71 154.04 124.22 108.00 90.00 120.00 72.00 54.00 84.00 12.00 18.00 18.00
Period 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
flt1-prod1 1.000
flt1-prod2 0.903
flt1-prod3 0.491
flt1-prod4 0.356
flt1-prod5 0.133
flt2-prod1 10.41 11.45 9.37 11.80 12.47 10.06 8.74 7.29 9.72 5.83 4.37 6.80 0.97 1.46 1.46 1.000
flt2-prod2 9.40 10.34 8.46 10.65 11.26 9.08 7.89 6.58 8.77 5.26 3.95 6.14 0.88 1.32 1.32 0.903
flt2-prod3 5.11 5.62 4.60 5.79 6.12 4.94 4.29 3.58 4.77 2.86 2.15 3.34 0.48 0.72 0.72 0.491
flt2-prod4 3.71 4.08 3.33 4.20 4.44 3.58 3.11 2.59 3.46 2.07 1.56 2.42 0.35 0.52 0.52 0.356
flt2-prod5 1.38 1.52 1.24 1.56 1.65 1.33 1.16 0.97 1.29 0.77 0.58 0.90 0.13 0.19 0.19 0.133
flt3-prod1 20.82 22.90 18.74 23.59 24.94 20.11 17.49 14.57 19.43 11.66 8.74 13.60 1.94 2.91 2.91 2.000
flt3-prod2 18.79 20.67 16.91 21.30 22.52 18.16 15.79 13.16 17.54 10.52 7.89 12.28 1.75 2.63 2.63 1.806
flt3-prod3 10.22 11.24 9.20 11.58 12.24 9.87 8.58 7.15 9.54 5.72 4.29 6.68 0.95 1.43 1.43 0.982
flt3-prod4 7.41 8.15 6.67 8.40 8.88 7.16 6.22 5.19 6.92 4.15 3.11 4.84 0.69 1.04 1.04 0.712
flt3-prod5 2.76 3.03 2.48 3.13 3.31 2.67 2.32 1.93 2.57 1.54 1.16 1.80 0.26 0.39 0.39 0.265
λt\mathbf{\lambda}_{t} 128.57 141.43 115.71 145.71 154.04 124.22 108.00 90.00 120.00 72.00 54.00 84.00 12.00 18.00 18.00
Table 13: Schedule change example with constraint: estimated demand and parameters using MM algorithm

We observe the expected symmetry in the results: the estimated demands and preference weights are the same for flights 1 and 2, and twice as large for flight 3. We can see that the estimated value of λ_t equals the upper bound L_t for time periods 7-15 and 22-30. Using the Frank-Wolfe algorithm, we converge to the same solution. Note that Frank-Wolfe solves a problem with different constraints, but for this symmetric example, and in the case α=0, the two formulations are equivalent. It is also interesting to note that using a built-in nonlinear optimization routine in R (sequential quadratic programming), we can solve the problem in 141 iterations, taking 2.53 seconds, converging to the same solution. The Frank-Wolfe algorithm with Armijo’s rule converges in 267 iterations taking 0.12 seconds, and the MM algorithm converges in 11 iterations taking only 0.006 seconds. The convergence of the Frank–Wolfe algorithm is sublinear in general, and using the default step size γ_k=2/(k+2) results in much slower convergence. Note, however, that the Frank-Wolfe algorithm solves an optimization problem with the market share constraint at each time period t substituted into the objective function. For this problem we were not able to develop an MM algorithm, due to the difficulty of finding a suitable minorizer. The formulation solved by the MM algorithm uses an aggregate market share constraint, where we assume that the v_{0t} are known. The two solutions are not equivalent in general.

5 Conclusions and Future Research

In this paper we discussed some of the practical limitations of the standard choice-based demand models used in the literature to estimate demand from sales transaction data. We presented modifications and extensions of the models and discussed data preprocessing and solution techniques which can be useful for practitioners dealing with sales transaction data. We hope that these discussions can facilitate further methodological progress and more rigorous theoretical research in this domain. We presented an algorithm to split sales transaction data observed under partial availability, and we extended an EM algorithm for the case where we observe a non-homogeneous product set. We developed two iterative optimization algorithms which incorporate partial availability information, a non-homogeneous product set, the ability to control the availability of the outside alternative, and an upper bound on the arrival rates. In one formulation we used a market share constraint at each time period and incorporated it into the objective function through the preference weights of the outside alternative. This formulation was solved using the Frank-Wolfe algorithm, leading to a simple coordinate descent algorithm. We discussed another formulation, which uses a single, aggregate market share constraint over the time horizon and assumes knowledge of the preference weights of the outside alternative. Using this formulation we could develop a very fast iterative minorization-maximization algorithm building on the work of Abdallah and Vulcano (2016).

Future extensions of these methods are possible. For instance, after using the sales splitting algorithm, it would be interesting to group the arrival rates in the EM algorithm and avoid having too many parameters to estimate. Similarly, it would be of practical interest to extend the EM algorithm to the constrained optimization case or to introduce other regularization on the parameters. The MM and Frank-Wolfe algorithms developed in this paper could be extended to include covariates and additional preference weights for the product with the lowest fare. The Frank-Wolfe algorithm could be further improved by taking the gradient descent direction instead of the coordinate descent direction. Finally, in practical applications we often encounter sparse demand distributions with an excess number of zeros and heavy tails. A natural extension of the currently used models would be to use a zero-inflated Poisson or negative binomial distribution to describe the customer arrival process.

6 Appendix

Using KKT conditions to remove 𝝀\bm{\lambda}

The KKT conditions for the Lagrangian (16) are

(𝒗,𝝀,𝝁)λt=mtλtμt+iStvioitv0t+iStvioit=0,t=1,,T\displaystyle\frac{\partial{\mathcal{L}}(\bm{v},\bm{\lambda},\bm{\mu})}{\partial\lambda_{t}}=\frac{m_{t}}{\lambda_{t}}-\frac{\mu_{t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}=0,~{}t=1,\ldots,T
μt(λtLt)=0,t=1,,T\displaystyle\mu_{t}(\lambda_{t}-L_{t})=0,~{}t=1,\ldots,T
μt0,t=1,,T\displaystyle\mu_{t}\geq 0,~{}t=1,\ldots,T
λtLt,t=1,,T\displaystyle\lambda_{t}\leq L_{t},~{}t=1,\ldots,T

Expressing \lambda_{t} and \mu_{t} from the first equation results in

λt\displaystyle\lambda_{t} =\displaystyle= mtv0t+iStvioitμt+iStvioit\displaystyle m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\mu_{t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}
μt\displaystyle\mu_{t} =\displaystyle= mtλt(v0t+iStvioit)iStvioit\displaystyle\frac{m_{t}}{\lambda_{t}}(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})-{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}

Plugging back μt\mu_{t}, the complementary slackness condition becomes

(mtλt(v0t+iStvioit)iStvioit)(λtLt)=0\displaystyle\left(\frac{m_{t}}{\lambda_{t}}(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})-{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)(\lambda_{t}-L_{t})=0

Therefore, we have that one of the following conditions should hold:

λt1\displaystyle\lambda_{t}^{1} =\displaystyle= Lt\displaystyle L_{t}
λt2\displaystyle\lambda_{t}^{2} =\displaystyle= mtv0t+iStvioitiStvioit\displaystyle m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}

So, the partial KKT condition stated earlier reduces to the following conditions:
If Lt<mtv0t+iStvioitiStvioitL_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}} then

λt\displaystyle\lambda_{t} =\displaystyle= Lt\displaystyle L_{t}
μt\displaystyle\mu_{t} =\displaystyle= mtLt(v0t+iStvioit)iStvioit\displaystyle\frac{m_{t}}{L_{t}}(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})-{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}

else

λt\displaystyle\lambda_{t} =\displaystyle= mtv0t+iStvioitiStvioit\displaystyle m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}
μt\displaystyle\mu_{t} =\displaystyle= 0\displaystyle 0

Let us define \mathcal{B}(\bm{v})=\left\{t~\bigg|~L_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right\}. Then the Lagrangian function can be simplified to

(𝒗,𝝀,𝝁)\displaystyle{\mathcal{L}}(\bm{v},\bm{\lambda},\bm{\mu}) =\displaystyle= lI(𝒗,𝝀)t=1T(μtv0t+iStvioit)(λtLt)\displaystyle l_{I}(\bm{v},\bm{\lambda})-\sum_{t=1}^{T}\left(\frac{\mu_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)(\lambda_{t}-L_{t})
=\displaystyle= t=1TiStzitlog(vioit)t(𝒗)mtlog(iStvioit)+\displaystyle\sum_{t=1}^{T}\sum_{i\in S_{t}}z_{it}\log(v_{i}\cdot o_{it})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)+
t(𝒗)mtlog(Ltv0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit+C1((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left(\frac{L_{t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{1}\left(\mathcal{B}(\bm{v})\right)
=\displaystyle= t=1TiStzitlog(vioit)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{t=1}^{T}\sum_{i\in S_{t}}z_{it}\log(v_{i}\cdot o_{it})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit+C2((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{2}\left(\mathcal{B}(\bm{v})\right)
=\displaystyle= i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)t(𝒗)LtiStvioitv0t+iStvioit\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}
=\displaystyle= i=1nKilog(vi)t(𝒗)mtlog(iStvioit)\displaystyle\sum_{i=1}^{n}K_{i}\log(v_{i})-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\log\left(\sum_{i\in S_{t}}v_{i}\cdot o_{it}\right)-
t(𝒗)mtlog(v0t+iStvioit)+t(𝒗)Ltv0tv0t+iStvioit+C3((𝒗))\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\log\left({v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)+\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{v_{0t}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+C_{3}\left(\mathcal{B}(\bm{v})\right)

where

C1((𝒗))\displaystyle C_{1}\left(\mathcal{B}(\bm{v})\right) =\displaystyle= t(𝒗)(mtlogmtmt)\displaystyle\sum_{t\not\in\mathcal{B}(\boldsymbol{v})}\left(m_{t}\log m_{t}-m_{t}\right)
C2((𝒗))\displaystyle C_{2}\left(\mathcal{B}(\bm{v})\right) =\displaystyle= t(𝒗)(mtlogmtmt)+t(𝒗)mtlogLt\displaystyle\sum_{t\not\in\mathcal{B}(\boldsymbol{v})}\left(m_{t}\log m_{t}-m_{t}\right)+\sum_{t\in\mathcal{B}(\boldsymbol{v})}m_{t}\log L_{t}
C3((𝒗))\displaystyle C_{3}\left(\mathcal{B}(\bm{v})\right) =\displaystyle= t(𝒗)(mtlogmtmt)+t(𝒗)(mtlogLtLt).\displaystyle\sum_{t\not\in\mathcal{B}(\boldsymbol{v})}\left(m_{t}\log m_{t}-m_{t}\right)+\sum_{t\in\mathcal{B}(\boldsymbol{v})}\left(m_{t}\log L_{t}-L_{t}\right). (26)

Note that

t=1TiStzitlog(vioit)=t=1TiStzitlog(vi)+C4=i=1nKilog(vi)+C4\sum_{t=1}^{T}\sum_{i\in S_{t}}z_{it}\log(v_{i}\cdot o_{it})=\sum_{t=1}^{T}\sum_{i\in S_{t}}z_{it}\log(v_{i})+C_{4}=\sum_{i=1}^{n}K_{i}\log(v_{i})+C_{4}

for some constant C4C_{4}, because zit=0z_{it}=0 when iSti\notin S_{t}.
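
The KKT case split above (λ_t = L_t when the bound binds, the unconstrained maximizer otherwise) translates directly into a per-period computation; a small sketch, with function and argument names of our choosing:

def arrival_rate_and_multiplier(m_t, L_t, v0_t, sv_t):
    # sv_t = sum_{i in S_t} v_i * o_it for the period in question.
    lam_unconstrained = m_t * (v0_t + sv_t) / sv_t
    if L_t < lam_unconstrained:               # t in B(v): the bound binds
        return L_t, m_t / L_t * (v0_t + sv_t) - sv_t
    return lam_unconstrained, 0.0             # bound is slack, multiplier is zero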

Newton’s method to find η\eta

To simplify notation, let us define

A_{j} = \sum_{t\not\in\mathcal{B}_{j}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+\sum_{t\in\mathcal{B}_{j}(\bm{v}^{(k)})}m_{t}\frac{\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}}+
\sum_{t\in\mathcal{B}_{j}(\bm{v}^{(k)})}L_{t}\frac{v_{0t}\mathbbm{1}(j\in S_{t})\cdot o_{jt}}{\left(v_{0t}+\sum_{i\in S_{t}}v_{i}^{(k)}\cdot o_{it}\right)^{2}}
nj\displaystyle n_{j} =\displaystyle= t=1T𝟙(jIt)\displaystyle\sum_{t=1}^{T}\mathbbm{1}(j\in I_{t})
oj\displaystyle o_{j} =\displaystyle= t=1T𝟙(jSt)ojt=t=1Tojt\displaystyle\sum_{t=1}^{T}\mathbbm{1}(j\in S_{t})\cdot o_{jt}=\sum_{t=1}^{T}o_{jt}

where \mathcal{B}_{j}(\bm{v}^{(k)})=\mathcal{B}(\bm{v}^{(k)})\cap\{t~|~j\in S_{t}\}, n_{j} is the number of time periods in which product j belongs to the product set I_{t} over the time horizon T, and o_{j} is the cumulative availability of product j (the sum of its per-period availability fractions o_{jt}) over the time horizon T. Using the notation above, equation (4.1) simplifies to

Kj\displaystyle K_{j} =\displaystyle= vjAj+vjη[(1α)nj+αoj]\displaystyle v_{j}A_{j}+v_{j}\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]

Expressing v_{j} from the above equation and plugging it into the market share constraint (19) leads to

(1α)t=1TjItKjAj+η[(1α)nj+αoj]+αt=1TjStKjojtAj+η[(1α)nj+αoj]=s~t=1Tv0t\displaystyle(1-\alpha)\sum_{t=1}^{T}\sum_{j\in I_{t}}\frac{K_{j}}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}+\alpha\sum_{t=1}^{T}\sum_{j\in S_{t}}\frac{K_{j}\cdot o_{jt}}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}=\tilde{s}\sum_{t=1}^{T}v_{0t}

and finally to

j=1nKj[(1α)nj+αoj]Aj+η[(1α)nj+αoj]=s~t=1Tv0t\displaystyle\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}=\tilde{s}\sum_{t=1}^{T}v_{0t}

where we used general equalities

t=1TjItKj\displaystyle\sum_{t=1}^{T}\sum_{j\in I_{t}}K_{j} =\displaystyle= j=1nnjKj\displaystyle\sum_{j=1}^{n}n_{j}K_{j}
t=1TjStKjojt\displaystyle\sum_{t=1}^{T}\sum_{j\in S_{t}}K_{j}\cdot o_{jt} =\displaystyle= j=1nojKj\displaystyle\sum_{j=1}^{n}o_{j}K_{j}

This leads to

f=j=1nKj[(1α)nj+αoj]Aj+η[(1α)nj+αoj]s~t=1Tv0t\displaystyle f=\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}{A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]}-\tilde{s}\sum_{t=1}^{T}v_{0t}

and

g=dfdη=j=1nKj[(1α)nj+αoj]2(Aj+η[(1α)nj+αoj])2\displaystyle g=\frac{df}{d\eta}=-\sum_{j=1}^{n}\frac{K_{j}\left[(1-\alpha)n_{j}+\alpha o_{j}\right]^{2}}{\left(A_{j}+\eta\left[(1-\alpha)n_{j}+\alpha o_{j}\right]\right)^{2}}

and to Newton’s method summarized in Algorithm 4.

Computation of gradient in Frank-Wolfe algorithm

Here we will derive the gradient vector of the objective function (4.2), and show a computational formula. The elements of f(𝒗)\nabla f\left(\bm{v}\right) can be derived as

fvi=\displaystyle\frac{\partial f}{\partial v_{i}}= Kivit(𝒗)mt𝟙(iSt)oitiStvioitt(𝒗)mtr(1α)𝟙(iIt)+(rα+1)𝟙(iSt)oitv0t+iStvioit\displaystyle\frac{K_{i}}{v_{i}}-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{r(1-\alpha)\mathbbm{1}(i\in I_{t})+(r\alpha+1)\mathbbm{1}(i\in S_{t})o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-
t(𝒗)Lt𝟙(iSt)oit(v0t+iStvioit)(iStvioit)(r(1α)𝟙(iIt)+(rα+1)𝟙(iSt)oit)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})-(\sum_{i\in S_{t}}v_{i}\cdot o_{it})(r(1-\alpha)\mathbbm{1}(i\in I_{t})+(r\alpha+1)\mathbbm{1}(i\in S_{t})o_{it})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}
=\displaystyle= Kivit(𝒗)mt𝟙(iSt)oitiStvioitt(𝒗)mtr(1α)𝟙(iIt)+(rα+1)𝟙(iSt)oitv0t+iStvioit\displaystyle\frac{K_{i}}{v_{i}}-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{r(1-\alpha)\mathbbm{1}(i\in I_{t})+(r\alpha+1)\mathbbm{1}(i\in S_{t})o_{it}}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-
t(𝒗)Ltr(1α)(iItvi)𝟙(iSt)oitr(1α)(iStvioit)𝟙(iIt)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{r(1-\alpha)(\sum_{i\in I_{t}}v_{i})\mathbbm{1}(i\in S_{t})o_{it}-r(1-\alpha)(\sum_{i\in S_{t}}v_{i}\cdot o_{it})\mathbbm{1}(i\in I_{t})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}
=\displaystyle= Kivit(𝒗)mt𝟙(iSt)oitiStvioit\displaystyle\frac{K_{i}}{v_{i}}-\sum_{t\not\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-
t(𝒗)mt𝟙(iSt)oit(rα+1)v0t+iStvioitt(𝒗)Lt𝟙(iSt)oitr(1α)(iItvi)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}(r\alpha+1)}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}-\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(i\in S_{t})o_{it}r(1-\alpha)(\sum_{i\in I_{t}}v_{i})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}-
t(𝒗)mt𝟙(iIt)r(1α)v0t+iStvioit+t(𝒗)Lt𝟙(iIt)r(1α)(iStvioit)(v0t+iStvioit)2\displaystyle\sum_{t\in\mathcal{B}(\bm{v})}m_{t}\frac{\mathbbm{1}(i\in I_{t})r(1-\alpha)}{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}+\sum_{t\in\mathcal{B}(\bm{v})}L_{t}\frac{\mathbbm{1}(i\in I_{t})r(1-\alpha)(\sum_{i\in S_{t}}v_{i}\cdot o_{it})}{(v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it})^{2}}

To implement the gradient with matrix-vector operations let us define

𝑲+n\displaystyle\bm{K}\in\mathbb{R}^{n}_{+} \displaystyle- Ki is the total purchases of product i over the selling horizon\displaystyle K_{i}\textnormal{ is the total purchases of product $i$ over the selling horizon}
𝒗+n\displaystyle\bm{v}\in\mathbb{R}^{n}_{+} \displaystyle- vi is the preference weight for product i\displaystyle v_{i}\textnormal{ is the preference weight for product $i$}
𝒗0+T\displaystyle\bm{v}_{0}\in\mathbb{R}^{T}_{+} \displaystyle- v0t is the preference weight for outside alternative at time t\displaystyle v_{0t}\textnormal{ is the preference weight for outside alternative at time $t$}
r+\displaystyle r\in\mathbb{R}_{+} \displaystyle- r=(1s)/s , where s is the host market share\displaystyle r=(1-s)/s\textnormal{ , where $s$ is the host market share}
𝑰{0,1}nxT\displaystyle\bm{I}\in\{0,1\}^{nxT} \displaystyle- Iit=1 if product i is in the offer set at time t\displaystyle I_{it}=1\textnormal{ if product $i$ is in the offer set at time $t$}
𝑺{0,1}nxT\displaystyle\bm{S}\in\{0,1\}^{nxT} \displaystyle- Sit=1 if product i is available for sale at time t\displaystyle S_{it}=1\textnormal{ if product $i$ is available for sale at time $t$}
𝑶[0,1]nxT\displaystyle\bm{O}\in[0,1]^{nxT} \displaystyle- oit is percentage of time product i is available for sale during time period t\displaystyle o_{it}\textnormal{ is percentage of time product $i$ is available for sale during time period $t$}
𝒎+T\displaystyle\bm{m}\in\mathbb{R}^{T}_{+} \displaystyle- mt is the total purchases of all products at time t\displaystyle m_{t}\textnormal{ is the total purchases of all products at time $t$}
𝑳+T\displaystyle\bm{L}\in\mathbb{R}^{T}_{+} \displaystyle- Lt is the upper bound on λt\displaystyle L_{t}\textnormal{ is the upper bound on $\lambda_{t}$}
𝑩{0,1}T\displaystyle\bm{B}\in\{0,1\}^{T} \displaystyle- Bt=1 if bound Lt is violated at time t(Lt<mtv0t+iStvioitiStvioit)\displaystyle B_{t}=1\textnormal{ if bound $L_{t}$ is violated at time $t$}\left(L_{t}<m_{t}\frac{v_{0t}+\sum_{i\in S_{t}}v_{i}\cdot o_{it}}{\sum_{i\in S_{t}}v_{i}\cdot o_{it}}\right)
α[0,1]\displaystyle\alpha\in[0,1] \displaystyle- parameter to control availability of outside alternative

Then

\nabla f\left(\bm{v}\right)=\bm{K}\oslash\bm{v}+(\bm{S}\circ\bm{O})\left[-\bm{m}\oslash\left((\bm{S}\circ\bm{O})^{T}\bm{v}\right)\circ(\mathbf{1}-\bm{B})\right.
\left.-(r\alpha+1)\,\bm{m}\oslash\left(\bm{v}_{0}+(\bm{S}\circ\bm{O})^{T}\bm{v}\right)\circ\bm{B}\right.
\left.-r(1-\alpha)\,\bm{L}\circ\left(\bm{I}^{T}\bm{v}\right)\circ\bm{B}\oslash\left(\bm{v}_{0}+(\bm{S}\circ\bm{O})^{T}\bm{v}\right)^{\circ 2}\right]+
\bm{I}\left[-r(1-\alpha)\,\bm{m}\oslash\left(\bm{v}_{0}+(\bm{S}\circ\bm{O})^{T}\bm{v}\right)\circ\bm{B}\right.
\left.+r(1-\alpha)\,\bm{L}\circ\left((\bm{S}\circ\bm{O})^{T}\bm{v}\right)\circ\bm{B}\oslash\left(\bm{v}_{0}+(\bm{S}\circ\bm{O})^{T}\bm{v}\right)^{\circ 2}\right]

where \oslash, \circ, and \circ 2 denote the elementwise division, multiplication (Hadamard product), and square operations.
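
The expression above maps one-to-one onto array operations; a minimal NumPy transliteration (⊘ as '/', ∘ as '*', names ours), consistent with the elementwise gradient derived earlier:

import numpy as np

def grad_matrix_form(K, v, S, O, Imat, m, L, v0, r, alpha, B):
    # K, v: (n,); S, O, Imat: (n, T); m, L, v0, B: (T,); B is the 0/1 binding indicator.
    SO = S * O                      # S ∘ O
    sv = SO.T @ v                   # (S ∘ O)^T v
    iv = Imat.T @ v                 # I^T v
    d = v0 + sv
    return (K / v
            + SO @ (-m / sv * (1.0 - B)
                    - (r * alpha + 1.0) * m / d * B
                    - r * (1.0 - alpha) * L * iv * B / d ** 2)
            + Imat @ (-r * (1.0 - alpha) * m / d * B
                      + r * (1.0 - alpha) * L * sv * B / d ** 2))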

Acknowledgments

The authors gratefully acknowledge Guillermo Gallego, who suggested the idea of splitting the sales in the case of partial availability.

References

  • Abdallah and Vulcano (2016) Abdallah, T. and Vulcano, G. (2016), Demand estimation under the multinomial logit model from sales transaction data. Working Paper. Available at https://www.researchgate.net/publication/303408073.
  • Ben-Akiva and Lerman (1994) Ben-Akiva, M. and Lerman, S. (1994), Discrete Choice Analysis: Theory and Applications to Travel Demand, 6 edn, The MIT Press, Cambridge, MA.
  • Bertsekas (1999) Bertsekas, D. (1999), Nonlinear Programming, Athena Scientific.
  • Cao et al. (2019) Cao, Y., Kleywegt, A. and Wang, H. (2019), Network revenue management under a spiked multinomial logit choice model. Working Paper. Available at SSRN:https://ssrn.com/abstract=3200531.
  • Dai et al. (2014) Dai, J., Ding, W., Kleywegt, A., Wang, X. and Zhang, Y. (2014), Choice-based revenue management for parallel flights. Working paper, Georgia Institute of Technology.
  • Frank and Wolfe (1956) Frank, M. and Wolfe, P. (1956), ‘An algorithm for quadratic programming’, Naval Research Logistics Quarterly 3, 95–110.
  • Gallego et al. (2015) Gallego, G., Ratliff, R. and Shebalov, S. (2015), ‘A general attraction model and sales-based linear program for network revenue management under customer choice’, Operations Research 63(1), 212 – 232.
  • Hunter and Lange (2000a) Hunter, D. R. and Lange, K. (2000a), ‘Rejoinder to discussion of “optimization transfer using surrogate objective functions”’, Journal of Computational and Graphical Statistics 9, 52–59.
  • Hunter and Lange (2004) Hunter, D. R. and Lange, K. (2004), ‘A tutorial on MM algorithms’, The American Statistician 58(1), 30–37.
  • Korte and Vygen (2012) Korte, B. H. and Vygen, J. (2012), Combinatorial Optimization: Theory and Algorithms, Springer-Verlag, New York, NY.
  • Nocedal and Wright (2006) Nocedal, J. and Wright, S. J. (2006), Numerical Optimization, Springer Series in Operations Research and Financial Engineering, 2 edn, Springer, New York.
  • Sharif Azadeh et al. (2014) Sharif Azadeh, S., Marcotte, P. and Savard, G. (2014), ‘A taxonomy of demand uncensoring methods in revenue management’, Journal of Revenue and Pricing Management 13, 440–456.
  • Train (2003) Train, K. (2003), Discrete choice methods with simulation, Cambridge University Press, New York, NY.
  • Vulcano et al. (2012) Vulcano, G., van Ryzin, G. and Ratliff, R. (2012), ‘Estimating primary demand for substitutable products from sales transaction data’, Operations Research 60(2), 313–334.
  • Talluri and van Ryzin (2004) Talluri, K. and van Ryzin, G. (2004), ‘Revenue Management Under a General Discrete Choice Model of Consumer Behavior’, Management Science 50(1), 15–33.