Demand Estimation from Sales Transaction Data - Practical Extensions
Submission to Journal of Revenue and Pricing Management
Abstract
In this paper we discuss practical limitations of the standard choice-based demand models used in the literature to estimate demand from sales transaction data. We present modifications and extensions of the models and discuss data preprocessing and solution techniques which are useful for practitioners dealing with sales transaction data. Among these, we present an algorithm to split sales transaction data observed under partial availability, we extend a popular Expectation Maximization (EM) algorithm for non-homogeneous product sets, and we develop two iterative optimization algorithms which can handle much of the extensions discussed in the paper.
Sabre
May 31, 2020
Keywords. Demand estimation. Demand untruncation. Multinomial logit model. EM algorithm. MM algorithm. Frank-Wolfe method. Revenue management
1 Introduction
Demand estimation using censored sales transaction data has many applications in airline commercial planning process. We refer readers to a survey paper by Sharif Azadeh et al. (2014) for a brief introduction to this topic. In this paper we discuss some practical limitations and extensions of a particular choice-based demand model popular in the literature to estimate demand from sales transaction data. Discrete choice models (e.g., Ben-Akiva and Lerman (1994), Train (2003)) have provided a popular approach for estimating demand for different products in a set of substitutable items, especially in transportation and revenue management applications. We look at a common demand model, which appeared in several papers, including Dai et al. (2014), Vulcano et al. (2012), and Abdallah and Vulcano (2016). The motivation of this work came from observing a few shortcoming of these models when applied in practice, on airline revenue management data.
We will build on and extend the work presented in Vulcano et al. (2012) and Abdallah and Vulcano (2016). They combine a multinomial logit (MNL) choice model with non-homogeneous Poisson arrivals over multiple periods. The MNL model has been used by many practitioners and researchers to represent the underlying choice behavior. Although its property of independence of irrelevant alternatives (IIA) is somewhat restrictive, the model is simple, leading to tractable estimation and assortment optimization (Talluri and van Ryzin, 2004). The problem is how to jointly estimate the arrival rates of customers and the preference weights of the products via maximizing the likelihood functions. There are two different likelihood functions. The first one is the incomplete data likelihood function (see below and also Eq. (2) in Vulcano et al. (2012)) and the second one is the log-likelihood function which is based on the primary demand (see below and also Eq. (13) in Vulcano et al. (2012)). The inputs are observed historical sales, availability of the products, and market share information.
Our contribution is to discuss practical limitations of the specific model above, and present some interesting extensions. We will discuss partial availability of products, some relaxation of the IIA assumption, constrained parameter space, non-homogeneous product set, and the interpretation of the no-purchase option and related market share. We hope this discussion can facilitate more research and extension of these models.
We will present an algorithm to split sales transaction data observed under partial availability, and an extension of the EM algorithm for the case when we observe a non-homogeneous product set. We develop two iterative optimization algorithms which incorporate partial availability information, non-homogeneous product set, ability to control the availability of outside alternative, and an upper bound on the arrival rates of customers. In the first formulation we use a market share constraint at each time period, and incorporate them into the objective function through the preference weights of the outside alternative. The formulation is solved using the Frank-Wolfe algorithm, leading to a simple coordinate descent algorithm. In the second formulation we use a single, aggregate market share constraint over the time horizon, and assume knowledge of preference weights of the outside alternative. Using this formulation we develop a fast, iterative minorization-maximization algorithm (MM) building on the work in Abdallah and Vulcano (2016).
While the EM algorithm focuses on solving the complete data log-likelihood functions, the two new algorithms (like Abdallah and Vulcano (2016)) aim to solve the incomplete likelihood function directly. We remark that both likelihood functions render similar quality solutions, but they involve different intermediate decision variables and require different solution approaches. It is not known that these two methods are mathematically equivalent.
2 Practical limitations of existing models
In this section we discuss in detail some practical limitations of the choice-based demand model discussed in Vulcano et al. (2012) and in Abdallah and Vulcano (2016). The model combines non-homogeneous Poisson arrivals over multiple periods with a multinomial logit (MNL) choice model. The model assumes a retailer offers a fixed number of products over a time horizon , and the products are either available or not available for sale in a time period. From the observed sales data we estimate the demand, the sales we would have observed if all the products were available for purchase. The total demand of all the products at time (including outside alternatives and the no-purchase option) is modeled as a Poisson distributed random variable with parameter . Hence we model the total demand as a non-homogeneous Poisson model, since different time periods are allowed to have different mean demands.
The demand for individual products is modeled using a multinomial distribution, given the total demand of all products. The preferences for different products are assumed to be fixed over time, and the probability that customer chooses product when is modeled through the simple multinomial logit (MNL) model, that is
(1) |
When , then . Here is the positive preference weight for product , and is the set of products available for sale at time . The preference weight of outside alternatives and no-purchase option (OA) are embedded into the coefficient . The parameter is set to in the references above, following a standard approach of normalizing.
The incomplete data likelihood function of the model is defined as
where
In the above equations denotes the number of purchases of product at time period , and denotes the total number of purchases in period . After some algebra the log-likelihood function can be written as
(2) |
One approach is to directly maximize the log-likelihood function and jointly estimate the preference weights of the products and the arrival rates of customers. The above log-likelihood function, however, is hard to solve in general, therefore the research literature discusses different approaches to estimate the parameters of this model, given sales data and information on what was available for sale. Vulcano et al. (2012) developed an elegant EM algorithm by looking at the problem in terms of primary (first choice) demand, and treating the observed demand as incomplete observations of primary demand. Abdallah and Vulcano (2016) solves the estimation problem by specializing the minorization-maximization (MM) procedure, which is an iterative algorithm for maximizing an objective function by successively maximizing a simpler function that minorizes the true objective function.
It is also interesting to note that the objective function has a continuum of maximizers. For this reason, Vulcano et al. (2012) and Abdallah and Vulcano (2016) imposed additional constraint on the preference weights as a function of the market share
There are a number of limiting assumptions and possible extensions of the model presented above, when we apply it to real data.
-
1.
Products are fully open or closed for sale
-
The discussed model assumes that a product is either available or not for sale. Practitioners often work with aggregated data, and unable to capture the sales at a very granular level, every time the assortment changes. It is very practical to extend the model to consume data where we observe partial availability. For example, a product can be 80% open for sale in a time period.
-
-
2.
MNL assumption
-
Customers choose from the available products according to an MNL model. One of the properties of the MNL model is the independence of irrelevant alternatives (IIA), which can be unrealistic in real applications. If customers, for instance, always purchase the product with the lowest price, the algorithm in Vulcano et al. (2012) does not converge. If the sell-down is strong, the demand is grossly overestimated.
-
-
3.
Unconstrained parameter space
-
In practice we can encounter parameter estimates which are unreasonable in a business scenario. This may be due to the fact that the underlying assumptions of the model (such as MNL) do not exactly fit the true data generating model. Instead of using a more sophisticated model and develop its solution algorithm, we might want to simply constrain the parameters of the simpler model. Abdallah and Vulcano (2016) discusses regularization as an elegant solution to the problem, and they develop algorithms for regularization (Lasso regression) and regularization (Ridge regression). It can be also of interest to solve the problem by putting an upper bound on the arrival rate of customers (). This can be helpful to regularize the model by having an interpretable business fence on these parameters.
-
-
4.
Homogeneous product set
-
The model assumes that over different time periods we have the same set of existing products. Some might be unavailable for sale, but there exist an underlying demand for them. In revenue management, we often encounter changing product IDs (unique flight identifiers) due to schedule changes, hence we need to be able to model a non-homogeneous product set over time. Instead of handling the homogeneous parts separately, we would like to use the data over a larger time interval to borrow power to estimate the parameters.
-
-
5.
No-purchase option is always available
-
The model above assumes that the no-purchase option is always available, that is fully open. If we include competitor’s products into the no-purchase option, this can be an unrealistic assumption. For instance, airline competitors likely control their inventory similarly as the host airline, gradually closing down products as we get closer to departure. This assumption has implications on how the host market share is interpreted in the models.
-
In this paper we address some of these points, and discuss how we could relax these assumptions, incorporate them into the model, and estimate the parameters. These ideas might foster further extensions of the existing models and rigorous research in the future.
2.1 Partial availability
Practitioners can encounter data at an aggregate level, where we can observe partial availability of products. As an example, in a revenue management system we store information on sales and availability at certain pre-departure time points. It is possible that a product becomes closed for sale during the time period, and we observe partial availability. For instance, a booking class on a flight can be open 60% of the time in a time interval, and the bookings we observe are matched to this partial availability. It would be of interest to extend the algorithms to work with this type of data and handle the full spectrum of availability in , as opposed to be restricted to either open (1) or closed (0) products. Another possible venue is to modify the data to fit the algorithms already developed in the literature.
2.1.1 Extending the attraction model
A natural way to incorporate partial availability is to extend the MNL purchase probabilities as
where represents availability of product at time . It is obvious that is equivalent to . This simple formulation modifies the purchase probabilities by linearly adjusting preference weight with open percentage . If the open percentage is zero or one, we get back to the original formulation.
We have not derived the EM algorithm of Vulcano et al. (2012) with the extended purchase probability definition, but this could be an interesting venue for research. For demonstration purposes we can resort to directly maximizing the log-likelihood using a solver. The modified likelihood function becomes
(3) |
where
and, after omitting the constant terms, the log-likelihood function modifies to
(4) |
2.1.2 Data disaggregation
In the previous section we discussed a simple formulation which could potentially be used to incorporate partial availability information into the purchase probability definition. Another approach to handle partial availability is to disaggregate the data to fully open and closed assortments, and use existing algorithms for the estimation. We can split the observed sales under partial availability by making a simple assumption that the sales are distributed uniformly over time.
To demonstrate this idea on a simple example, let us assume that the observed sales and open percentages for three available products are
We assumed that the elements of are non-increasing from top to bottom. Then we can represent as the sum of three fully open and closed assortments with weights as
Note that represent time proportions and . The results indicate that all products were available for sale 50% of the time, two products were available 30% of the time, and one product was available 20% of the time. A graphical representation of this example is shown in Figure 1.
For the general case, the elements of can be calculated as the consecutive differences in the open percentages, that is
Following this simple idea we can split the observed sales under partial availability using the following identity
where denotes the element-wise multiplication, or Hadamard product. Note that if , we do not need a split to create a new assortment. The sales splitting algorithm under partial availability is described in Algorithm 1.
The algorithm results in partial sales vectors and fully open and closed assortment vectors for . The splitting logic ensures that
After the splitting logic, we could combine intervals from different time which have exactly the same product offerings and product availabilities, to improve performance.
2.1.3 Comparison on a simulated example
To demonstrate the estimation from sales data with partial availability, we extended the example from Vulcano et al. (2012) by adding partial availability data to the observed sales. The observed data is presented in Table 1.
Sales | Period | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | |
1 | 10 | 15 | 11 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 11 | 6 | 11 | 8 | 20 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 5 | 6 | 1 | 11 | 4 | 5 | 14 | 7 | 11 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 4 | 4 | 4 | 1 | 6 | 4 | 3 | 5 | 9 | 9 | 6 | 9 | 0 | 0 | 0 |
5 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 3 | 0 | 3 | 3 | 5 | 2 | 3 | 3 |
Open percentage | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |
1 | 0.7 | 0.3 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.8 | 0.5 | 1.0 | 1.0 | 0.3 | 0.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.9 | 0.9 | 1.0 | 1.0 | 0.3 | 0.5 | 0.9 | 0.7 | 0.7 | 0.0 | 0.0 | 0.0 | 0.2 | 0.2 | 0.0 |
4 | 1.0 | 0.9 | 1.0 | 1.0 | 0.5 | 1.0 | 1.0 | 0.8 | 0.9 | 0.6 | 0.5 | 0.2 | 0.3 | 0.5 | 0.0 |
5 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.9 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
We first use the extended attraction model discussed in Section 2.1.1. Direct maximization of the log-likelihood (4) by a nonlinear solver results in total primary demand of 1194.6. The detailed estimated demands and parameters are presented in Table 2.
Estimates | Period | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | ||
1 | 15.02 | 20.07 | 12.47 | 15.70 | 33.60 | 22.73 | 19.60 | 19.33 | 24.84 | 38.14 | 32.27 | 84.35 | 5.74 | 7.22 | 35.06 | 1.000 |
2 | 11.23 | 15.01 | 9.32 | 11.74 | 25.13 | 17.00 | 14.66 | 14.45 | 18.58 | 28.53 | 24.14 | 63.08 | 4.29 | 5.40 | 26.22 | 0.748 |
3 | 3.91 | 5.22 | 3.24 | 4.09 | 8.74 | 5.92 | 5.10 | 5.03 | 6.46 | 9.93 | 8.40 | 21.95 | 1.49 | 1.88 | 9.13 | 0.260 |
4 | 1.97 | 2.63 | 1.63 | 2.06 | 4.41 | 2.98 | 2.57 | 2.53 | 3.26 | 5.00 | 4.23 | 11.06 | 0.75 | 0.95 | 4.60 | 0.131 |
5 | 0.40 | 0.53 | 0.33 | 0.41 | 0.89 | 0.60 | 0.52 | 0.51 | 0.66 | 1.01 | 0.85 | 2.23 | 0.15 | 0.19 | 0.93 | 0.026 |
46.48 | 62.10 | 38.57 | 48.57 | 103.95 | 70.32 | 60.65 | 59.79 | 76.84 | 118.01 | 99.84 | 260.94 | 17.76 | 22.34 | 108.48 |
Second, we demonstrate the data disaggregation procedure from Section 2.1.2. We first split the observed sales data with partial availability to fully open and closed assortments, apply the EM algorithm of Vulcano et al. (2012), and then aggregate the solution back. To demonstrate this, let us apply Algorithm 1 on the data presented in Table 1. The disaggregated sales are shown in Table 3.
Sales | Period | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 14 | 13 | 12 | 11 | 10 | 9 | ||||||||||||
1 | 10.00 | 15.00 | 11.00 | 14.00 | ||||||||||||||
2 | 1.38 | 9.62 | 2.40 | 3.60 | 11.00 | 8.00 | 20.00 | 16.00 | ||||||||||
3 | 0.56 | 0.56 | 3.89 | 2.67 | 1.33 | 2.00 | 1.00 | 11.00 | 4.00 | 2.00 | 3.00 | 14.00 | ||||||
4 | 0.40 | 0.40 | 0.40 | 2.80 | 1.78 | 0.89 | 1.33 | 4.00 | 1.00 | 2.40 | 3.60 | 2.00 | 0.80 | 1.20 | 0.30 | 2.70 | ||
5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.20 | 0.80 | 0.40 | 0.60 | 0.00 | 0.00 | 0.50 | 0.20 | 0.30 | 0.00 | 0.00 | 0.00 | 0.10 | 0.90 |
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | |||||||||||||||||||
2 | |||||||||||||||||||
3 | 7.00 | 11.00 | 0.00 | 0.00 | |||||||||||||||
4 | 0.63 | 4.37 | 2.00 | 7.00 | 9.00 | 6.00 | 9.00 | 0.00 | 0.00 | 0.00 | 0.00 | ||||||||
5 | 0.33 | 0.33 | 2.33 | 0.00 | 0.00 | 0.00 | 1.20 | 1.80 | 1.50 | 1.50 | 4.00 | 1.00 | 1.40 | 0.20 | 0.40 | 1.50 | 0.90 | 0.60 | 3.00 |
After applying the EM algorithm on the disaggregated data, we end up with the disaggregated solution, presented in Table 4.
Estimates | Period | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 14 | 13 | 12 | 11 | 10 | 9 | |||||||||||||
1 | 0.81 | 0.91 | 1.33 | 10.00 | 2.41 | 5.02 | 2.86 | 15.00 | 11.00 | 14.00 | 6.03 | 5.27 | 15.88 | 4.05 | 2.68 | 11.50 | 0.81 | 16.85 | 1.000 |
2 | 0.59 | 0.67 | 0.94 | 9.62 | 1.76 | 3.65 | 1.64 | 3.60 | 11.00 | 8.00 | 4.39 | 3.83 | 13.63 | 2.95 | 1.95 | 10.90 | 0.59 | 12.26 | 0.727 |
3 | 0.24 | 0.25 | 0.38 | 3.89 | 0.71 | 1.20 | 0.91 | 2.00 | 1.00 | 11.00 | 1.77 | 1.55 | 2.73 | 1.19 | 0.90 | 2.04 | 0.24 | 6.29 | 0.294 |
4 | 0.14 | 0.18 | 0.27 | 2.80 | 0.36 | 0.80 | 0.61 | 1.33 | 4.00 | 1.00 | 0.91 | 0.85 | 2.45 | 0.71 | 0.36 | 0.82 | 0.11 | 1.21 | 0.150 |
5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.06 | 0.36 | 0.27 | 0.60 | 0.00 | 0.00 | 0.15 | 0.07 | 0.20 | 0.00 | 0.00 | 0.00 | 0.04 | 0.40 | 0.026 |
2.55 | 2.87 | 4.16 | 37.59 | 7.57 | 15.76 | 8.97 | 32.19 | 38.57 | 48.57 | 18.94 | 16.54 | 49.85 | 12.73 | 8.41 | 36.09 | 2.55 | 52.89 |
8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 4.02 | 1.94 | 13.13 | 0.00 | 4.05 | 17.23 | 14.48 | 21.90 | 18.10 | 15.21 | 48.27 | 20.27 | 16.89 | 0.41 | 0.38 | 18.10 | 1.82 | 0.57 | 36.20 | 1.000 |
2 | 2.93 | 1.41 | 9.55 | 0.00 | 2.95 | 12.54 | 10.53 | 15.93 | 13.17 | 11.06 | 35.11 | 14.75 | 12.29 | 0.29 | 0.28 | 13.17 | 1.33 | 0.42 | 26.33 | 0.727 |
3 | 1.18 | 0.57 | 3.15 | 0.00 | 1.19 | 4.95 | 4.26 | 6.44 | 5.32 | 4.47 | 14.19 | 5.96 | 4.97 | 0.12 | 0.00 | 5.32 | 0.54 | 0.00 | 10.64 | 0.294 |
4 | 0.60 | 0.22 | 1.97 | 0.00 | 0.71 | 3.15 | 2.17 | 3.20 | 2.72 | 2.14 | 7.24 | 3.20 | 2.53 | 0.00 | 0.00 | 2.72 | 0.00 | 0.00 | 5.43 | 0.150 |
5 | 0.10 | 0.12 | 1.05 | 0.00 | 0.00 | 0.00 | 0.37 | 0.64 | 0.46 | 0.53 | 1.23 | 0.36 | 0.43 | 0.07 | 0.18 | 0.46 | 0.32 | 0.27 | 0.92 | 0.026 |
12.62 | 6.10 | 41.19 | 0.00 | 12.73 | 54.09 | 45.45 | 68.72 | 56.81 | 47.72 | 151.49 | 63.63 | 53.02 | 1.27 | 1.20 | 56.81 | 5.73 | 1.80 | 113.62 |
Note that the estimated primary demands do not preserve the split proportions of the sales. For instance, in period 9, and , while the sales were split by proportions and . Note that the split proportions of sales do not need to match the proportion of demands, because the varying assortments imply varying market shares and spilled demand. However, to reduce the number of parameters to be estimated, it would be possible to modify the EM algorithm and incorporate constraints on to preserve the split proportions. Aggregating the results back, we end up with the final results presented in Table 5.
Estimates | Period | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | ||
1 | 13.05 | 25.29 | 11.00 | 14.00 | 27.19 | 18.23 | 17.66 | 19.09 | 21.29 | 36.38 | 33.31 | 68.54 | 17.68 | 20.50 | 36.20 | 1.000 |
2 | 11.82 | 10.64 | 11.00 | 8.00 | 21.85 | 15.80 | 12.85 | 13.89 | 15.48 | 26.46 | 24.22 | 49.85 | 12.86 | 14.91 | 26.33 | 0.727 |
3 | 4.76 | 4.82 | 1.00 | 11.00 | 6.05 | 4.14 | 6.53 | 4.90 | 6.14 | 10.70 | 9.79 | 20.15 | 5.09 | 5.86 | 10.64 | 0.294 |
4 | 3.39 | 3.10 | 4.00 | 1.00 | 4.21 | 1.89 | 1.32 | 2.79 | 3.86 | 5.38 | 4.85 | 10.45 | 2.53 | 2.72 | 5.43 | 0.150 |
5 | 0.00 | 1.29 | 0.00 | 0.00 | 0.43 | 0.00 | 0.44 | 1.27 | 0.00 | 1.01 | 1.00 | 1.59 | 0.68 | 1.05 | 0.92 | 0.026 |
47.17 | 64.50 | 38.57 | 48.57 | 85.33 | 57.23 | 55.43 | 59.92 | 66.82 | 114.17 | 104.53 | 215.12 | 55.50 | 64.34 | 113.62 |
The results are fairly similar to the results in Table 2, where we incorporated partial availability into the attraction model. The total primary demand is 1190.8 as opposed to 1194.6. To benchmark these solutions, a naive approach would be to use the projection method to preprocess the observed sales using the open percentage information, such as
and then apply EM algorithm on the preprocessed data. With this approach we estimate the overall primary demand as 1519.9. This shows that incorporating partial availability natively into the attraction model or disaggregating the sales data are much more robust approaches. In case we observe small outliers in partial availability data, we end up with very large preprocessed sales using the projection method. Experiments showed that the disaggregation method dampens the effect of outliers the most.
2.2 Strong sell-down
The demand model discussed in Vulcano et al. (2012) and Abdallah and Vulcano (2016) assumes that customers choose from the available products based on the Basic Attraction Model (BAM) with the IIA property. The probability of product selection is governed by (1). This model cannot fit to scenarios with strong sell-down, and the same applies to the Generalized Attraction Model (GAM) discussed in Gallego et al. (2015). Strong sell-down can be a common phenomena in practice, because customers often seek to purchase the cheapest available product. A 100% sell-down example is presented in Table 6.
Sales | Period | |||||
---|---|---|---|---|---|---|
6 | 5 | 4 | 3 | 2 | 1 | |
1 | 2 | 6 | 0 | 0 | 0 | 0 |
2 | 13 | 15 | 0 | 0 | ||
3 | 20 | 22 |
The EM algorithm developed in Vulcano et al. (2012) does not converge on this example, because the attraction model is unable to fit to the scenario presented in the data. The estimated first choice demands will converge to an unbounded solution.
A natural way to handle sell-down is by introducing an additional parameter for the lowest available product. The attraction model (1) would be extended as
(5) |
Parameter is an additional preference weight for the product with the lowest price, or the product with excess attraction in the assortment. In practice we could add additional preference weights by group of products, depending on the structure of the problem. is a set of the indices for the lowest available product over time, and is an indicator function, representing whether product is the lowest available product at time .
For the example in Table 6, . means that at the lowest available class is the second class from the top.
To present a motivating example with strong sell-down: assuming that the true parameters are , , , and , the purchase probabilities for various assortments, induced by (5), are shown in Table 7.
Probability | Assortment | ||
---|---|---|---|
8.8% | 8.3% | 8.2% | |
91.2% | 3.3% | 3.3% | |
88.4% | 5.7% | ||
82.8% |
These purchase probabilities can describe a realistic scenario, where the lowest available product receives majority of the demand. The products available above, however, still maintain the IIA property. This extended model fits data better with strong sell-down, which cannot be described by the basic MNL model.
Essentially the same idea, with more parameters than just the scalar and a slightly more general formulation, the spiked-MNL model was considered in Cao et al. (2019). The spiked-MNL model extends the classical MNL model by having a separate attractiveness parameter for the cheapest available fare class on each flight.
The likelihood function for the extended model with parameter becomes
where
and the log-likelihood function modifies to
(6) | |||||
Note that in this paper we are not providing an algorithm for maximizing the log-likelihood (6), but we can rely on available nonlinear solvers, if needed. It would be of practical interest, however, to extend the EM (Vulcano et al., 2012) and MM (Abdallah and Vulcano, 2016) algorithms to estimate the parameters of this extended model.
2.3 Constrained parameter space
Algorithms based on the model assumptions above can lead to unreasonable estimates in practice. For instance, on airline data we observed estimated demands which were high in a business scenario. This can happen due to data sparsity, outliers, data preprocessing steps, and most likely that the assumptions of the MNL model do not fit the true data generating model. Therefore, in practical applications, we might want to constrain the parameters of the model. Abdallah and Vulcano (2016) discusses regularization as an elegant solution to the problem, and they develop algorithms for regularization (Lasso regression) and regularization (Ridge regression). It can be also of interest to apply constraints on the arrival rate of customers (), having an interpretable business fence on these parameters.
In practice we could solve a constrained maximum likelihood problem, such as
s.t. | ||||
where the constraint ensures that the overall arrival rate or mean total demand over the time horizon is less than an upper bound . could be defined, for instance, as times the total observed sales, that is
is a regularization parameter which has to be set based on practical considerations. Note that includes the no purchase alternative, so we scale the total sales with market share . Alternatively, we could constrain the arrival rates at each time period and solve constrained maximum likelihood problem
s.t. | ||||
Similarly, we could set , a constant times the observed sales at time period . It would be of practical interest to extend the EM algorithm of Vulcano et al. (2012) to solve this constrained problem. In Section 4 we will extend the MM algorithm in Abdallah and Vulcano (2016) to this constrained problem and also discuss how to solve the optimization problem using the Frank-Wolfe algorithm.
2.4 Non-homogeneous product set
In revenue management, we often encounter changing products due to changes in the system, hence we need to deal with a non-homogeneous product set over time. Instead of dividing the observed data to homogeneous parts, we want to extend the algorithms to handle non-homogeneous product offerings, allowing us to use the whole data set and borrow power to accurately estimate the parameters.
Let us look at a hypothetical airline sales example in Table 8, which was created from the synthetic example in Vulcano et al. (2012). We did this, so the estimated results are easy to compare.
Sales | Period | ||||||||||||||||||||||||||||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | ||
1 | flt1-prod1 | 10 | 15 | 11 | 14 | ||||||||||||||||||||||||||
2 | flt1-prod2 | 11 | 6 | 11 | 8 | 20 | 16 | ||||||||||||||||||||||||
3 | flt1-prod3 | 5 | 6 | 1 | 11 | 4 | 5 | 14 | 7 | 11 | |||||||||||||||||||||
4 | flt1-prod4 | 4 | 4 | 4 | 1 | 6 | 4 | 3 | 5 | 9 | 9 | 6 | 9 | ||||||||||||||||||
5 | flt1-prod5 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 3 | 0 | 3 | 3 | 5 | 2 | 3 | 3 | |||||||||||||||
6 | flt2-prod1 | 10 | 15 | 11 | 14 | ||||||||||||||||||||||||||
7 | flt2-prod2 | 11 | 6 | 11 | 8 | 20 | 16 | ||||||||||||||||||||||||
8 | flt2-prod3 | 5 | 6 | 1 | 11 | 4 | 5 | 14 | 7 | 11 | |||||||||||||||||||||
9 | flt2-prod4 | 4 | 4 | 4 | 1 | 6 | 4 | 3 | 5 | 9 | 9 | 6 | 9 | ||||||||||||||||||
10 | flt2-prod5 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 3 | 0 | 3 | 3 | 5 | 2 | 3 | 3 | |||||||||||||||
11 | flt3-prod1 | 20 | 30 | 22 | 28 | 20 | 30 | 22 | 28 | ||||||||||||||||||||||
12 | flt3-prod2 | 22 | 12 | 22 | 16 | 40 | 32 | 22 | 12 | 22 | 16 | 40 | 32 | ||||||||||||||||||
13 | flt3-prod3 | 10 | 12 | 2 | 22 | 8 | 10 | 28 | 14 | 22 | 10 | 12 | 2 | 22 | 8 | 10 | 28 | 14 | 22 | ||||||||||||
14 | flt3-prod4 | 8 | 8 | 8 | 2 | 12 | 8 | 6 | 10 | 18 | 18 | 12 | 18 | 8 | 8 | 8 | 2 | 12 | 8 | 6 | 10 | 18 | 18 | 12 | 18 | ||||||
15 | flt3-prod5 | 0 | 4 | 0 | 0 | 2 | 0 | 2 | 6 | 0 | 6 | 6 | 10 | 4 | 6 | 6 | 0 | 4 | 0 | 0 | 2 | 0 | 2 | 6 | 0 | 6 | 6 | 10 | 4 | 6 | 6 |
We have 3 flights, each of them having 5 different products, a total of products. The products of flight 1 only exist at the first 15 time periods, and after that the label changes to flight 2. This can happen for various reasons, for example schedule change. It is not always possible to preprocess the data and match the products of flight 1 to flight 2. We can see that the product offerings are non-homogeneous over time, or unbalanced. Product 1 of flight 1 is not available for sale for time periods 5-15, which is distinguished from not being part of the product offer set at time periods 16-30. Notice that the observed sales data of flight 1 for time periods 1-15 is the same as the synthetic example created in Vulcano et al. (2012), and the observed sales for Flight 3 are just twice of that, like a flight with double the demand and capacity.
To introduce non-homogeneous product set into the notation, let denote the set of existing products of the retailer at time . Note that this is different from , which is the set of products available for purchase at time , therefore . For instance, in Table 8, and . The set of all products is .
2.5 Handling market share constraint and no-purchase option
We mentioned that the objective function (2) has an infinite number of maximum likelihood estimates, therefore an additional constraint is applied in Vulcano et al. (2012) on the preference weights as a function of the market share
(7) |
using the standard scaling of . The market share constraint is linear but the objective function is non-convex, therefore Abdallah and Vulcano (2016) formulates the problem using in the objective function, so the market share constraint becomes
They establish that the objective function in this space is a concave function, and they show how to deal with the non-convex market share constraint by solving the problem without the constraint and then using a transformation to satisfy the constraint.
The outside alternative could represent the competitor’s product, or both the competitor’s product and the no-purchase option, which would lead to different interpretations of and (Vulcano et al., 2012). The outside alternative is treated as a separate product that is always available, therefore is constant and the standard scaling is used.
To look further, let denote the set of all products of the retailer, that is , so it follows that . With the market share constraint above, the probability of selecting one of the retailer’s product, when everything is available for sale, is , that is
When the retailer offers an assortment with a subset of all the products, it follows that
given is fixed. This means that the model induced market share () of the retailer at time is less than , and the share of outside alternative is larger than . If we think of the outside alternative as the competitor’s product, one could argue that the competitor’s product might not always be available for purchase either, so the preference weight is not constant over time. Also, the estimated market share is calculated over all possible sets of assortments, not only when all the retailer’s products are available for sale.
To extend this model, we can relax the assumption of constant, time-independent outside alternative and introduce , allowing the preference weight to change over time. We also need this extension to be able to incorporate non-homogeneous product sets into the market share constraint, by using , the offer set of retailer at time (Section 2.4). The market share constraint modifies to a set of constraints
(8) |
which ensure that the share of outside alternative is , for all time points , when all the retailer’s products are available for sale. That is
Now let us consider an edge case scenario where the retailer’s model induced market share is constant over time regardless of the assortment he offers. This could be achieved with the set of market share constraints
(9) |
leading to
The retailers’s share remains and the share of the outside alternative remains , at each time , regardless of what products are available for purchase. This edge case might not make sense for all applications, but it is interesting to consider as an alternative to the assumption that the outside alternative is always fully available. In the airline retail context we could think of the outside alternative as the competitor’s products whose availability can be as limited as the host airline’s products.
To introduce a continuum of cases between (8) and (9), let us introduce parameter controlling the availability of outside alternative. represents the case where the outside alternative is always available for sale (8), and naturally, represents the case where the outside alternative limits availability the same fashion as the retailer. This is more of a hypothetical case, given the outside alternative could include the no purchase option, which is always an available choice. Combining (8) and (9), the set of market share constraints become
(10) |
Using , and , the constraints simplify to (7).
The models require the knowledge of market share , which in practice can be difficult to acquire, and the estimate itself can be inaccurate. Abdallah and Vulcano (2016) included a study on the sensitivity of model estimates to the input market share. We mentioned that, in practice, market share is most likely estimated from data observed over various sets of assortments, not only when all the retailer’s products are available for sale. Therefore, it could also make sense to use a market share constraint aggregated over time horizon , that is
(11) |
This constraint ensures that the market share over the time horizon is , taking into account the changing offer set over time. This is less restrictive and could be a more reasonable assumption than using (7), which forces the market share at each time point with a set of constraints. Adding to control availability of the outside alternative, we would use
(12) |
Finally, we want to point out that instead of solving the market share constrained optimization problem
s.t. | ||||
we can directly incorporate the market share constraint into the objective function through , with , and solve
s.t. | ||||
or by setting . The scaling constraint on is required to avoid multiple solutions.
Incorporating non-homogeneous product set and control of availability of outside alternative, we would need to solve
s.t. | ||||
We will see in Section 4.2 that this formulation with upper bound constraints on the arrival rate is easy to solve using the Frank-Wolfe method.
3 Extended EM algorithm
In this section we extend the EM algorithm of Vulcano et al. (2012) with non-homogeneous product sets (Section 2.4) and the ability to control the availability of outside alternative (Section 2.5).
We incorporate , the offer set of retailer at time , and , the time-dependent preference weight for the outside alternative into the algorithm. We use
where and controls the availability of the outside alternative. Adding these into the formulation, we can modify the key E-step equations of the EM algorithm as
For the M-step we need to maximize the conditional expected, complete data log-likelihood function, which becomes
(13) |
To find the maxima, the first order conditions become
which is a system of nonlinear equations in . The solution can be obtained with existing implementations of iterative methods, such as Newton-Raphson. We can also use a simple fixed point iteration algorithm by rewriting to and then using the fixed point iteration . In our case the update becomes
In case of homogeneous product set, so that , we get a closed form solution to the M-step, that is
The result is easy to obtain by assuming . Note that this closed-form equation is different from the one derived in Vulcano et al. (2012), since they used a different parametrization by setting .
Algorithm 2 presents the extended EM algorithm with non-homogeneous product set and the ability to control the availability of outside alternative.
If, in practice, we observe data with partial availability, we recommend to split the sales to fully open and closed assortments (Algorithm 1), apply extended EM (Algorithm 2), and aggregate the solution back. It would be an interesting future research topic to further extend the EM algorithm and incorporate constraints on the arrival rates (Section 2.3).
3.1 Example: limited OA availability
In this section we apply Algorithm 2 on the simulated example from Vulcano et al. (2012) and demonstrate what happens when we limit the availability of the outside alternative simultaneously with the retailer’s availability (). The observed sales data is presented in Table 9.
Sales | Period | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | |
1 | 10 | 15 | 11 | 14 | |||||||||||
2 | 11 | 6 | 11 | 8 | 20 | 16 | |||||||||
3 | 5 | 6 | 1 | 11 | 4 | 5 | 14 | 7 | 11 | ||||||
4 | 4 | 4 | 4 | 1 | 6 | 4 | 3 | 5 | 9 | 9 | 6 | 9 | |||
5 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 3 | 0 | 3 | 3 | 5 | 2 | 3 | 3 |
First let us look at the solution, using , presented in Table 10. We closely recover the results of Vulcano et al. (2012), since we make the same assumption that the outside alternative is always available.
Estimates | Period | |||||||||||||||
15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | ||
1 | 10 | 15 | 11 | 14 | 15.03 | 12.12 | 13.04 | 10.87 | 14.49 | 15.91 | 11.93 | 18.56 | 11.51 | 17.27 | 17.27 | 1.000 |
2 | 11 | 6 | 11 | 8 | 14.35 | 11.48 | 10.45 | 8.71 | 11.61 | 12.75 | 9.56 | 14.88 | 9.23 | 13.84 | 13.84 | 0.801 |
3 | 5 | 6 | 1 | 11 | 2.87 | 3.59 | 6.88 | 3.44 | 5.41 | 6.22 | 4.67 | 7.26 | 4.50 | 6.75 | 6.75 | 0.391 |
4 | 4 | 4 | 4 | 1 | 4.31 | 2.87 | 1.47 | 2.46 | 4.42 | 3.43 | 2.29 | 3.43 | 2.68 | 4.02 | 4.02 | 0.233 |
5 | 0 | 2 | 0 | 0 | 0.72 | 0.00 | 0.49 | 1.47 | 0.00 | 1.14 | 1.14 | 1.91 | 0.63 | 0.95 | 0.95 | 0.055 |
42.86 | 47.14 | 38.57 | 48.57 | 53.26 | 42.95 | 46.19 | 38.50 | 51.33 | 56.37 | 42.28 | 65.76 | 40.78 | 61.18 | 61.18 |
Using , the estimated primary demand at each time period is equal to the observed purchases, because the availability of outside alternative is restricted as the retailer’s availability, preserving the model induced market share at each time . The results are presented in Table 11.
Estimates | Period | |||||||||||||||
15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | ||
1 | 10 | 15 | 11 | 14 | 12.50 | 10.08 | 7.26 | 6.05 | 8.06 | 4.84 | 3.63 | 5.65 | 0.81 | 1.21 | 1.21 | 1.000 |
2 | 11 | 6 | 11 | 8 | 11.94 | 9.55 | 5.75 | 4.79 | 6.39 | 3.83 | 2.87 | 4.47 | 0.64 | 0.96 | 0.96 | 0.792 |
3 | 5 | 6 | 1 | 11 | 2.39 | 2.98 | 3.88 | 1.94 | 3.05 | 1.92 | 1.44 | 2.24 | 0.32 | 0.48 | 0.48 | 0.396 |
4 | 4 | 4 | 4 | 1 | 3.58 | 2.39 | 0.83 | 1.39 | 2.50 | 1.06 | 0.71 | 1.06 | 0.20 | 0.30 | 0.30 | 0.245 |
5 | 0 | 2 | 0 | 0 | 0.60 | 0.00 | 0.28 | 0.83 | 0.00 | 0.35 | 0.35 | 0.59 | 0.04 | 0.06 | 0.06 | 0.046 |
42.86 | 47.14 | 38.57 | 48.57 | 44.29 | 35.71 | 25.71 | 21.43 | 28.57 | 17.14 | 12.86 | 20.00 | 2.86 | 4.29 | 4.29 |
Notice that the estimated preference weights in Tables 10 and 11 are close to each other, the estimates are not sensitive to the value of . However, the estimated arrival rates changed drastically, due to the change in the model induced market shares.
Parameter can be used as a tool for the practitioner to control the availability of the outside alternative between these two edge cases. We would like to emphasize, however, that competitor matching can be risky and the value of should ideally be inferred from exogenous data. The conservative practice is to use .
3.2 Example: non-homogeneous product set
To demonstrate the extended EM algorithm on a non-homogeneous product set, let us revisit the example presented in Table 8 (Section 2.4). This is a hypothetical airline sales example with a schedule change. We observe 3 flights, where the sales of products of flight 1 are discontinued and the repeated as flight 2, and flight 3 has twice the observed sales of flights 1 and 2. The solution using is presented in Table 12.
Estimates | Period | |||||||||||||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | ||
flt1-prod1 | 10 | 15 | 11 | 14 | 15.03 | 12.12 | 13.04 | 10.87 | 14.49 | 15.91 | 11.93 | 18.56 | 11.51 | 17.27 | 17.27 | 1.000 |
flt1-prod2 | 11 | 6 | 11 | 8 | 14.35 | 11.48 | 10.45 | 8.71 | 11.61 | 12.75 | 9.56 | 14.88 | 9.23 | 13.84 | 13.84 | 0.801 |
flt1-prod3 | 5 | 6 | 1 | 11 | 2.87 | 3.59 | 6.88 | 3.44 | 5.41 | 6.22 | 4.67 | 7.26 | 4.50 | 6.75 | 6.75 | 0.391 |
flt1-prod4 | 4 | 4 | 4 | 1 | 4.31 | 2.87 | 1.47 | 2.46 | 4.42 | 3.43 | 2.29 | 3.43 | 2.68 | 4.02 | 4.02 | 0.233 |
flt1-prod5 | 0 | 2 | 0 | 0 | 0.72 | 0.00 | 0.49 | 1.47 | 0.00 | 1.14 | 1.14 | 1.91 | 0.63 | 0.95 | 0.95 | 0.055 |
flt2-prod1 | 1.000 | |||||||||||||||
flt2-prod2 | 0.801 | |||||||||||||||
flt2-prod3 | 0.391 | |||||||||||||||
flt2-prod4 | 0.233 | |||||||||||||||
flt2-prod5 | 0.055 | |||||||||||||||
flt3-prod1 | 20 | 30 | 22 | 28 | 30.07 | 24.25 | 26.08 | 21.73 | 28.98 | 31.82 | 23.87 | 37.12 | 23.02 | 34.54 | 34.54 | 2.000 |
flt3-prod2 | 22 | 12 | 22 | 16 | 28.71 | 22.97 | 20.90 | 17.42 | 23.22 | 25.50 | 19.13 | 29.75 | 18.45 | 27.68 | 27.68 | 1.603 |
flt3-prod3 | 10 | 12 | 2 | 22 | 5.74 | 7.18 | 13.76 | 6.88 | 10.81 | 12.44 | 9.33 | 14.52 | 9.00 | 13.51 | 13.51 | 0.782 |
flt3-prod4 | 8 | 8 | 8 | 2 | 8.61 | 5.74 | 2.95 | 4.92 | 8.85 | 6.86 | 4.57 | 6.86 | 5.36 | 8.04 | 8.04 | 0.465 |
flt3-prod5 | 0 | 4 | 0 | 0 | 1.44 | 0.00 | 0.98 | 2.95 | 0.00 | 2.29 | 2.29 | 3.81 | 1.26 | 1.89 | 1.89 | 0.110 |
128.57 | 141.43 | 115.71 | 145.71 | 159.79 | 128.86 | 138.58 | 115.49 | 153.98 | 169.10 | 126.83 | 197.29 | 122.35 | 183.53 | 183.53 |
16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | ||
flt1-prod1 | 1.000 | |||||||||||||||
flt1-prod2 | 0.801 | |||||||||||||||
flt1-prod3 | 0.391 | |||||||||||||||
flt1-prod4 | 0.233 | |||||||||||||||
flt1-prod5 | 0.055 | |||||||||||||||
flt2-prod1 | 10 | 15 | 11 | 14 | 15.03 | 12.12 | 13.04 | 10.87 | 14.49 | 15.91 | 11.93 | 18.56 | 11.51 | 17.27 | 17.27 | 1.000 |
flt2-prod2 | 11 | 6 | 11 | 8 | 14.35 | 11.48 | 10.45 | 8.71 | 11.61 | 12.75 | 9.56 | 14.88 | 9.23 | 13.84 | 13.84 | 0.801 |
flt2-prod3 | 5 | 6 | 1 | 11 | 2.87 | 3.59 | 6.88 | 3.44 | 5.41 | 6.22 | 4.67 | 7.26 | 4.50 | 6.75 | 6.75 | 0.391 |
flt2-prod4 | 4 | 4 | 4 | 1 | 4.31 | 2.87 | 1.47 | 2.46 | 4.42 | 3.43 | 2.29 | 3.43 | 2.68 | 4.02 | 4.02 | 0.233 |
flt2-prod5 | 0 | 2 | 0 | 0 | 0.72 | 0.00 | 0.49 | 1.47 | 0.00 | 1.14 | 1.14 | 1.91 | 0.63 | 0.95 | 0.95 | 0.055 |
flt3-prod1 | 20 | 30 | 22 | 28 | 30.07 | 24.25 | 26.08 | 21.73 | 28.98 | 31.82 | 23.87 | 37.12 | 23.02 | 34.54 | 34.54 | 2.000 |
flt3-prod2 | 22 | 12 | 22 | 16 | 28.71 | 22.97 | 20.90 | 17.42 | 23.22 | 25.50 | 19.13 | 29.75 | 18.45 | 27.68 | 27.68 | 1.603 |
flt3-prod3 | 10 | 12 | 2 | 22 | 5.74 | 7.18 | 13.76 | 6.88 | 10.81 | 12.44 | 9.33 | 14.52 | 9.00 | 13.51 | 13.51 | 0.782 |
flt3-prod4 | 8 | 8 | 8 | 2 | 8.61 | 5.74 | 2.95 | 4.92 | 8.85 | 6.86 | 4.57 | 6.86 | 5.36 | 8.04 | 8.04 | 0.465 |
flt3-prod5 | 0 | 4 | 0 | 0 | 1.44 | 0.00 | 0.98 | 2.95 | 0.00 | 2.29 | 2.29 | 3.81 | 1.26 | 1.89 | 1.89 | 0.110 |
128.57 | 141.43 | 115.71 | 145.71 | 159.79 | 128.86 | 138.58 | 115.49 | 153.98 | 169.10 | 126.83 | 197.29 | 122.35 | 183.53 | 183.53 |
For flights 1 and 2 we get the same estimated primary demand and preference weights as in Table 10, since we duplicated that example over the time horizon. For flight 3 we estimate twice the primary demand and preference weights, which was expected, since we artificially doubled the numbers. The method is consistent, and can handle non-homogeneous product set in a mathematically formal way.
It is interesting to note here that the EM algorithm of Vulcano et al. (2012) cannot distinguish between product not being available for sale () as opposed to not being part of the product set (). Because of this, naive application of the EM algorithm would estimate primary demand for non-existing products of flight 1 and 2, grossly overestimating the demand. The total estimated arrival rate is , while using the extended EM algorithm we get .
It is also interesting to mention that using the extended EM algorithm with we get , and just like before, the total primary demand at time period is equal to the observed purchases. It is easy to show in general that in case it follows that . If we apply the algorithm with other values of we observe a linear decrease of total estimated arrival rate as a function of . The model induced market share of the retailer increases as increases which decreases the estimated demand.
Algorithm 2 is a simple but yet powerful extension of the EM algorithm which can natively handle non-homogenous product set, and can control the availability of the outside alternative by a simple parameter. The price of this extension is that in the M-step we need to solve for the roots of a system of nonlinear equations, instead of having a closed form solution. However, we can use a simple fixed point iteration as the M-step.
4 Constrained optimization
In this section we develop algorithms to solve the estimation problem with constrained arrival rates (Section 2.3). We will extend the minorization-maximization (MM) algorithm of Abdallah and Vulcano (2016) and also present a solution utilizing the Frank-Wolfe algorithm. We will also incorporate partial availability, non-homogeneous product set, and the ability to control the availability of outside alternative into the model, and show how to estimate the parameters using iterative algorithms.
Consider the incomplete log-likelihood
(14) |
which is an extension of the log-likelihood in Abdallah and Vulcano (2016) with partial availability, as discussed in Section 2.1.1. In case when , the log-likelihood simplifies to the one considering only fully open and closed assortments. Note that we are using preference weights in the model, but we could express the model in the utility space by using .
The constrained optimization problem we need to solve is given by
(15) | |||||
s.t. | |||||
where is an upper bound on the arrival rate at time . The motivation behind putting an upper bound on the arrival rates was explained in Section 2.3, where we discussed two specific ways of constraining the arrival rates. To solve the constrained maximum likelihood estimation problem, we follow the idea in Abdallah and Vulcano (2016). We express the optimization problem as a function of by applying part of the Karush–Kuhn–Tucker (KKT) conditions on the Lagrangian function, removing from the problem. The Lagrangian function of (15) becomes
(16) |
Note that we used a simple re-scaling ( for time ) of the Lagrange multipliers . After we apply the KKT conditions to remove from the problem, and some algebra (see Appendix), the problem becomes
where is defined in , and represents the set of time periods where the upper bound constraints on the arrival rates are binding, that is
(18) |
Through the definitions of and market share constraint we can incorporate non-homogeneous product set in the model, and the ability to control the availability of outside alternative. These details were discussed in Sections 2.4 and 2.5. We will consider two formulations here. In the first one we use market share constraint
(19) |
which is equivalent to the aggregate constraint (11) discussed in Section 2.5, with using . We assume that OA preference weights are known, and we can use standard scaling . In Section 4.1 we will show how to use the MM algorithm to solve this problem.
In the second formulation we incorporate market share constraint (10) into the objective function and constrain the preference weights by using
(20) | |||||
This first equation removes from the problem, while the second equation avoids having multiple solutions by rescaling . In Section 4.2 we will show how the Frank-Wolfe method will lead to a simple coordinate descent algorithm to solve this problem.
4.1 Solution using MM algorithm
In this section we present a solution to the optimization problem (15) with constraint (19) using the MM algorithm, building on Abdallah and Vulcano (2016). The idea behind MM algorithms is to find a surrogate function that minorizes the original objective function, maximize the surrogate function, and continue this iteratively. For more information on the MM algorithm, in general, see Hunter and Lange (2000a, 2004).
After removing from the problem using the KKT conditions, we arrive to (4). If we rearrange the last term of the objective to aid the minorization, the optimization problem becomes
s.t. | ||||
where are known. A common technique to find a minorizer is to use supporting hyperplanes (Hunter and Lange, 2004). Since the second, third, and fourth terms in (4.1) are convex, we can use first-order Taylor approximation to the convex functions and , that is, for all
In our specific case we will use
with and or and , where is the value of at iteration . Therefore, the minorizer function becomes
where contains the constant terms independent of . Although the minorizer is developed locally for fixed , it can be shown to be globally dominated by the . Computational results show that it is actually a fairly tight minorizer as well. We now need to solve a single market share constraint optimization problem:
s.t. | ||||
Since the objective function is concave, the constraint is convex, and we are maximizing, there exists a single scalar such that the first order optimality condition holds for Lagrangian
The first order optimality condition is
which leads to the MM update of summarized in Algorithm 3. In the update step we use Newton’s method to find in every MM iteration, summarized in Algorithm 4. For more details, please see Appendix.
4.2 Solution using Frank-Wolfe method
In this section we present a solution to the optimization problem (15) with constraint (20) using the Frank-Wolfe algorithm. The Frank-Wolfe, or conditional gradient method is an iterative optimization algorithm for constrained convex optimization, where in each iteration, it considers a linear approximation of the objective function, and moves towards a minimizer of this linear function. For more information on the Frank-Wolfe algorithm, see Frank and Wolfe (1956) and Bertsekas (1999).
After removing from the problem using the KKT conditions, we arrive to (4). Therefore, we need to solve the constrained optimization problem
s.t. | (24) | ||||
where we plug in to include the market share constraints into the objective function.
The first step of the Frank-Wolfe algorithm is the direction finding subproblem, which becomes
(25) | |||||
s.t. | |||||
where is the gradient vector of the objective function (4.2) evaluated at the solution of iteration . The elements of are calculated as
where | ||||
For detailed derivation and computational formula, please see Appendix. The direction finding subproblem in (25) is a fractional knapsack problem (Korte and Vygen, 2012), which solution becomes
We take a step in the direction of the maximum element of the gradient, only changing that variable in the current step. The Frank-Wolfe algorithm reduces to a coordinate descent algorithm, successively maximizing along coordinate directions determined by the largest value of the gradient vector. The update step of Frank-Wolfe is
which simplifies to
For step size we can use the default choice or perform a line search to find that minimizes subject to . In practice, we implemented a backtracking linesearch using the Armijo’s rule (Nocedal and Wright, 2006). The Frank-Wolfe algorithm is summarized in Algorithm 5, while the backtracking linesearch is presented in Algorithm 6.
4.3 Example: non-homogeneous product set with constraint
To demonstrate Algorithms 3 and 5, let us revisit the example presented in Table 8 by adding constraints to the problem. We constrain the arrival rates to be less than twice the observed sales, that is . We use , assuming that the OA product is always available, and set for the MM algorithm. Note that , since we have fully open and closed assortments. The results, using the MM algorithm, are presented in Table 13.
Estimates | Period | |||||||||||||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | ||
flt1-prod1 | 10.41 | 11.45 | 9.37 | 11.80 | 12.47 | 10.06 | 8.74 | 7.29 | 9.72 | 5.83 | 4.37 | 6.80 | 0.97 | 1.46 | 1.46 | 1.000 |
flt1-prod2 | 9.40 | 10.34 | 8.46 | 10.65 | 11.26 | 9.08 | 7.89 | 6.58 | 8.77 | 5.26 | 3.95 | 6.14 | 0.88 | 1.32 | 1.32 | 0.903 |
flt1-prod3 | 5.11 | 5.62 | 4.60 | 5.79 | 6.12 | 4.94 | 4.29 | 3.58 | 4.77 | 2.86 | 2.15 | 3.34 | 0.48 | 0.72 | 0.72 | 0.491 |
flt1-prod4 | 3.71 | 4.08 | 3.33 | 4.20 | 4.44 | 3.58 | 3.11 | 2.59 | 3.46 | 2.07 | 1.56 | 2.42 | 0.35 | 0.52 | 0.52 | 0.356 |
flt1-prod5 | 1.38 | 1.52 | 1.24 | 1.56 | 1.65 | 1.33 | 1.16 | 0.97 | 1.29 | 0.77 | 0.58 | 0.90 | 0.13 | 0.19 | 0.19 | 0.133 |
flt2-prod1 | 1.000 | |||||||||||||||
flt2-prod2 | 0.903 | |||||||||||||||
flt2-prod3 | 0.491 | |||||||||||||||
flt2-prod4 | 0.356 | |||||||||||||||
flt2-prod5 | 0.133 | |||||||||||||||
flt3-prod1 | 20.82 | 22.90 | 18.74 | 23.59 | 24.94 | 20.11 | 17.49 | 14.57 | 19.43 | 11.66 | 8.74 | 13.60 | 1.94 | 2.91 | 2.91 | 2.000 |
flt3-prod2 | 18.79 | 20.67 | 16.91 | 21.30 | 22.52 | 18.16 | 15.79 | 13.16 | 17.54 | 10.52 | 7.89 | 12.28 | 1.75 | 2.63 | 2.63 | 1.806 |
flt3-prod3 | 10.22 | 11.24 | 9.20 | 11.58 | 12.24 | 9.87 | 8.58 | 7.15 | 9.54 | 5.72 | 4.29 | 6.68 | 0.95 | 1.43 | 1.43 | 0.982 |
flt3-prod4 | 7.41 | 8.15 | 6.67 | 8.40 | 8.88 | 7.16 | 6.22 | 5.19 | 6.92 | 4.15 | 3.11 | 4.84 | 0.69 | 1.04 | 1.04 | 0.712 |
flt3-prod5 | 2.76 | 3.03 | 2.48 | 3.13 | 3.31 | 2.67 | 2.32 | 1.93 | 2.57 | 1.54 | 1.16 | 1.80 | 0.26 | 0.39 | 0.39 | 0.265 |
128.57 | 141.43 | 115.71 | 145.71 | 154.04 | 124.22 | 108.00 | 90.00 | 120.00 | 72.00 | 54.00 | 84.00 | 12.00 | 18.00 | 18.00 |
16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | ||
flt1-prod1 | 1.000 | |||||||||||||||
flt1-prod2 | 0.903 | |||||||||||||||
flt1-prod3 | 0.491 | |||||||||||||||
flt1-prod4 | 0.356 | |||||||||||||||
flt1-prod5 | 0.133 | |||||||||||||||
flt2-prod1 | 10.41 | 11.45 | 9.37 | 11.80 | 12.47 | 10.06 | 8.74 | 7.29 | 9.72 | 5.83 | 4.37 | 6.80 | 0.97 | 1.46 | 1.46 | 1.000 |
flt2-prod2 | 9.40 | 10.34 | 8.46 | 10.65 | 11.26 | 9.08 | 7.89 | 6.58 | 8.77 | 5.26 | 3.95 | 6.14 | 0.88 | 1.32 | 1.32 | 0.903 |
flt2-prod3 | 5.11 | 5.62 | 4.60 | 5.79 | 6.12 | 4.94 | 4.29 | 3.58 | 4.77 | 2.86 | 2.15 | 3.34 | 0.48 | 0.72 | 0.72 | 0.491 |
flt2-prod4 | 3.71 | 4.08 | 3.33 | 4.20 | 4.44 | 3.58 | 3.11 | 2.59 | 3.46 | 2.07 | 1.56 | 2.42 | 0.35 | 0.52 | 0.52 | 0.356 |
flt2-prod5 | 1.38 | 1.52 | 1.24 | 1.56 | 1.65 | 1.33 | 1.16 | 0.97 | 1.29 | 0.77 | 0.58 | 0.90 | 0.13 | 0.19 | 0.19 | 0.133 |
flt3-prod1 | 20.82 | 22.90 | 18.74 | 23.59 | 24.94 | 20.11 | 17.49 | 14.57 | 19.43 | 11.66 | 8.74 | 13.60 | 1.94 | 2.91 | 2.91 | 2.000 |
flt3-prod2 | 18.79 | 20.67 | 16.91 | 21.30 | 22.52 | 18.16 | 15.79 | 13.16 | 17.54 | 10.52 | 7.89 | 12.28 | 1.75 | 2.63 | 2.63 | 1.806 |
flt3-prod3 | 10.22 | 11.24 | 9.20 | 11.58 | 12.24 | 9.87 | 8.58 | 7.15 | 9.54 | 5.72 | 4.29 | 6.68 | 0.95 | 1.43 | 1.43 | 0.982 |
flt3-prod4 | 7.41 | 8.15 | 6.67 | 8.40 | 8.88 | 7.16 | 6.22 | 5.19 | 6.92 | 4.15 | 3.11 | 4.84 | 0.69 | 1.04 | 1.04 | 0.712 |
flt3-prod5 | 2.76 | 3.03 | 2.48 | 3.13 | 3.31 | 2.67 | 2.32 | 1.93 | 2.57 | 1.54 | 1.16 | 1.80 | 0.26 | 0.39 | 0.39 | 0.265 |
128.57 | 141.43 | 115.71 | 145.71 | 154.04 | 124.22 | 108.00 | 90.00 | 120.00 | 72.00 | 54.00 | 84.00 | 12.00 | 18.00 | 18.00 |
We observe the expected symmetry in the results, that is the estimated demands and preference weights are same for flights 1 and 2, and twice for flight 3. We can see that the estimated value of is equal to the upper bound for time periods 7-15 and 22-30. Using the Frank-Wolfe algorithm, we converge to the same solution. Note that Frank-Wolfe is solving a problem with different constraints, but for this symmetric example and in the case of the two different formulations are equivalent. It is also interesting to note that using a built-in nonlinear optimization routine in R (sequential quadratic programming), we can solve the problem in 141 iterations, taking 2.53 seconds, converging to the same solution. The Frank-Wolfe algorithm with Armijo’s rule converges in 267 iterations taking 0.12 second, and the MM algorithm converges in 11 iterations taking only 0.006 seconds. The convergence of the Frank–Wolfe algorithm is sublinear, in general, and using the default step size results in even much slower convergence. Note, however, that the Frank-Wolfe algorithm solves an optimization problem with market share constraint at each time period substituted into the objective function. For this problem we were not able to develop the MM algorithm due to the difficulty of finding a suitable minorizer. The formulation of the MM algorithm uses an aggregate market share constraint, where we assume that are known. The solutions are not equivalent, in general.
5 Conclusions and Future Research
In this paper we discussed some of the practical limitations of the standard choice-based demand models used in the literature to estimate demand from sales transaction data. We presented modifications and extensions of the models and discussed data preprocessing and solution techniques which can be useful for practitioners dealing with sales transaction data. We hope that these discussions could facilitate further methodological progress and even more rigorous theoretical research in this domain. We presented an algorithm to split sales transaction data observed under partial availability, and we extended an EM algorithm for the case we observe a non-homogeneous product set. We developed two iterative optimization algorithms which incorporated partial availability information, non-homogeneous product set, ability to control the availability of outside alternative, and an upper bound on the arrival rates. In one formulation we used market share constraint at each time period, and incorporated them into the objective function through the preference weights of the outside alternative. This formulation was solved using the Frank-Wolfe algorithm, leading to a simple coordinate descent algorithm. We discussed another formulation, which used a single, aggregate market share constraint over the time horizon, and assumed the knowledge of preference weights of the outside alternative. Using this formulation we could develop a very fast, iterative minorization-maximization algorithm building on the work in Abdallah and Vulcano (2016).
Future extension of these methods are possible. For instance, after using the sales splitting algorithm, it would be interesting to group the arrival rates in the EM algorithm and avoid having too many parameters to be estimated. Similarly, it would be of practical interest to extend the EM algorithm to the constrained optimization case or introduce other regularization on the parameters. The MM and the Frank-Wolfe algorithms developed in this paper could be extended to include covariates and additional preference weights for the product with lowest fare. The Frank-Wolfe algorithm could be further improved by taking the gradient descent direction instead of coordinate descent. Finally, in practical applications, we often encounter sparse demand distribution with excess number of zeros and heavy tails. A natural extension of the currently used models would be to use a zero-inflated Poisson or negative binomial distribution to describe the customer arrival process.
6 Appendix
Using KKT conditions to remove
The KKT condition of Lagrangian (15) are
Expressing and in first equation results in
Plugging back , the complementary slackness condition becomes
Therefore, we have that one of the following conditions should hold:
So, the partial KKT condition stated earlier reduces to the following conditions:
If then
else
Let us define The Lagrangian function can be simplified to
where
(26) |
Note that
for some constant , because when .
Newton’s method to find
To simplify notation, let us define
where , represents the number of times product is in the offer set over time horizon , and represents the proportion of time product is available over time horizon . Using the notation above, equation (4.1) simplifies to
Expressing in the above equation and plugging it into the market share constraint (19) leads to
and finally to
where we used general equalities
This leads to
and
and to Newton’s method summarized in Algorithm (4).
Computation of gradient in Frank-Wolfe algorithm
Here we will derive the gradient vector of the objective function (4.2), and show a computational formula. The elements of can be derived as
To implement the gradient with matrix-vector operations let us define
parameter to control availability of outside alternative |
Then
where , , and denote the elementwise subtraction, addition, and square operations.
Acknowledgments
The authors would like to gratefully acknowledge Guillermo Gallego who suggested the idea of splitting the sales in case of partial availability.
References
- (1)
- Abdallah and Vulcano (2016) Abdallah, T. and Vulcano, G. (2016), Demand estimation under the multinomial logit model from sales transaction data. Working Paper. Available at https://www.researchgate.net/publication/303408073.
- Ben-Akiva and Lerman (1994) Ben-Akiva, M. and Lerman, S. (1994), Discrete Choice Analysis: Theory and Applications to Travel Demand, 6 edn, The MIT Press, Cambridge, MA.
- Bertsekas (1999) Bertsekas, D. (1999), Nonlinear Programming, Athena Scientific.
- Cao et al. (2019) Cao, Y., Kleywegt, A. and Wang, H. (2019), Network revenue management under a spiked multinomial logit choice model. Working Paper. Available at SSRN:https://ssrn.com/abstract=3200531.
- Dai et al. (2014) Dai, J., Ding, W., Kleywegt, A., Wang, X. and Zhang, Y. (2014), Choice-based revenue management for parallel flights. Working paper, Georgia Institute of Technology.
- Frank and Wolfe (1956) Frank, M. and Wolfe, P. (1956), ‘An algorithm for quadratic programming’, Naval Research Logistics Quarterly 3, 95–110.
- Gallego et al. (2015) Gallego, G., Ratliff, R. and Shebalov, S. (2015), ‘A general attraction model and sales-based linear program for network revenue management under customer choice’, Operations Research 63(1), 212 – 232.
- Hunter and Lange (2000a) Hunter, D. R. and Lange, K. (2000a), ‘Rejoinder to discussion of “optimization transfer using surrogate objective functions”’, Journal of Computational and Graphical Statistics 9, 52–59.
- Hunter and Lange (2004) Hunter, D. R. and Lange, K. (2004), ‘A tutorial on MM algorithms’, The American Statistician 58(1), 30–37.
- Korte and Vygen (2012) Korte, B. H. and Vygen, J. (2012), Combinatorial Optimization: Theory and Algorithms, Springer-Verlag, New York, NY.
- Nocedal and Wright (2006) Nocedal, J. and Wright, S. J. (2006), Numerical Optimization, Springer Series in Operations Research and Financial Engineering, 2 edn, Springer, New York.
- Sharif Azadeh et al. (2014) Sharif Azadeh, S., Marcotte, P. and Savard, G. (2014), ‘A taxonomy of demand uncensoring methods in revenue management’, Journal of Revenue and Pricing Management 13, 440–456.
- Train (2003) Train, K. (2003), Discrete choice methods with simulation, Cambridge University Press, New York, NY.
- Vulcano et al. (2012) Vulcano, G., van Ryzin, G. and Ratliff, R. (2012), ‘Estimating primary demand for substitutable products from sales transaction data’, Operations Research 60(2), 313–334.
- Talluri and van Ryzin (2004) Talluri, K. and van Ryzin, G. (2004), ‘Revenue Management Under a General Discrete Choice Model of Consumer Behavior’, Management Science 50(1), 15–33.