
Online Stochastic Allocation of Reusable Resources

Xilin Zhang and Wang Chi Cheung. Both authors are with the Department of Industrial and Systems Engineering, Faculty of Engineering, National University of Singapore (emails: [email protected] and [email protected]).
Abstract

We study a multi-objective model on the allocation of reusable resources under model uncertainty. Heterogeneous customers arrive sequentially according to a latent stochastic process, request certain amounts of resources, and occupy them for random durations. The decision maker's goal is to simultaneously maximize multiple types of rewards generated by the customers, while satisfying the resource capacity constraints in each time step. We develop models and algorithms for deciding on the allocation actions. We show that when the usage durations are relatively short compared with the length of the planning horizon, our policy achieves a $1-O(\epsilon)$ fraction of the optimal expected rewards, where $\epsilon$ decays to zero at a near-optimal rate as the resource capacities grow.

I Introduction

In the online optimization framework, information is revealed sequentially in time. Decisions are made without knowledge of future information, but they can depend on past observations. In this work, we study online optimization algorithms for reusable resource allocation, where a resource unit is returned to the system after a period of usage and can then be assigned to another customer. The decision-maker (DM) assigns limited inventories of reusable resources to sequentially arriving customers. In each time step, the DM's decision leads to a set of allocation outcomes, consisting of the amounts of rewards earned, the amounts of resources consumed, and the usage durations of the assigned resources. Our model captures a diversity of real-life applications, including hotel booking, rental of cars and fashion items, and cloud computing services. Our problem instance incorporates the following features:

  1. Multiple objectives. The DM's goal is to maximize multiple types of rewards.

  2. Customer heterogeneity. The customers are associated with different customer types.

  3. Online setting. In each time step, the arriving customer's type is drawn independently and identically from an unknown probability distribution.

  4. Reusability. Each type of resource is endowed with a stochastic usage duration, whose probability distribution is known to the DM. However, the DM does not know the realized usage duration of an allocation before the resource is returned.

Features 1-3 are shared by both non-reusable and reusable resource allocation problems, while feature 4 is distinct to reusable resource allocation problems. Without feature 4, our problem reduces to the non-reusable setting as in [1]. On feature 3, we remark that in many applications, given a customer type, its mean allocation outcome is accessible by machine learning (ML) approaches in a data-efficient manner ([2, 3, 4]). In much of the existing resource allocation research ([21, 5, 6, 7, 8]), the mean allocation outcomes are assumed to be prior knowledge acquired through ML models. However, in applications where customer types are represented as high-dimensional feature vectors, the number of types can be exponential in the dimension of the feature vectors, or even unbounded. Such a curse of dimensionality hinders estimation of the proportion of each customer type. Therefore, we treat the probability distribution over customer types as the unknown object. We further remark on feature 4 that our usage duration is defined in the same manner as in [6] and [9], which are recent works on offline reusable resource allocation problems. We highlight that the probability distribution of each usage duration can be arbitrary, which means our result does not depend on specific structural assumptions on the usage distributions, such as exponentially distributed durations.

Traditional resource allocation problems [10, 11] concern the allocation of non-reusable resources. Online algorithms for allocating non-reusable resources have been extensively studied in [12, 13, 14, 1, 15, 16, 17]. These algorithms involve adaptive weighing processes that balance the trade-off between the rewards earned and the resources consumed. Their analyses largely depend on the monotonically decreasing inventories. However, in our reusable model, the effect of an allocation may differ across future time steps, contingent on whether the allocated resources have been returned, causing the resource consumption to fluctuate across consecutive time steps.

To our knowledge, this is the first paper to address reusable resource allocation problems in an online stochastic setting and demonstrate a near-optimal performance guarantee. Some studies focus on assortment planning problems in adversarial settings ([20, 18, 19]) and achieve non-trivial competitive ratios. Offline pricing and assortment planning problems have been studied in [6, 9, 20], where near-optimality is achieved in the form of approximation ratios under full model certainty. The main contributions of our paper can be summarized as follows.

  • Model generality. We propose a general reusable resource allocation model which allows for various decision settings (such as admission control, matching, pricing and assortment planning), multiple objectives (such as revenue, market share and service level), and large numbers of customer types or allocation types (the algorithm’s performance is independent of these sizes).

  • Near-optimal algorithm performance. We develop an adaptive weighing algorithm that trades off not only the resources occupied and the rewards earned, but also the usage durations. In the regime where each usage duration is short compared with the length of the planning horizon, our algorithm achieves near-optimal performance matching the online non-reusable resource setting ([1]), as well as the state-of-the-art offline reusable setting ([20]).

The remainder of the paper is organized as follows. In Section II, we present our model and highlight some of its applications. An online algorithm and the corresponding performance analysis are presented in Section III. Numerical experiments are provided in Section IV.

II Model

Notation. The reward types and the resource types are respectively indexed by two finite sets $\mathcal{I}_{r}$ and $\mathcal{I}_{c}$. A generic reward type or resource type is denoted as $i$. For each $i\in\mathcal{I}_{c}$, the DM has $c_{i}\in\mathbb{R}_{>0}$ units of resource $i$ for allocation. Each customer is associated with a customer type $j\in\mathcal{J}$, which reflects the customer's features. We denote the set of possible allocation actions as $\mathcal{K}$, and each element $k\in\mathcal{K}$ as an action. The action set can model a broad range of decisions, which is elaborated at the end of Section II.

The DM allocates the resources in $T$ discrete time steps. In time step $t\in\{1,\ldots,T\}$, at most one customer arrives. We denote the customer type of the arrival at time $t$ as $j(t)$. In particular, we designate the type $j_{\textsf{null}}\in\mathcal{J}$ to represent the case of no arrival. We assume that $j(1),\ldots,j(T)$ are independently and identically distributed (i.i.d.) random variables over $\mathcal{J}$. We denote $p_{j}=\Pr(j(1)=j)$, and $\mathbf{p}=\{p_{j}\}_{j\in\mathcal{J}}$.

When a customer (denote his type as $j$) arrives, the DM chooses an action $k\in\mathcal{K}$. The choice leads to an array of stochastic outcomes, consisting of the amount of rewards earned $W_{jk}=(W_{ijk})_{i\in\mathcal{I}_{r}}$, the amount of resources occupied $A_{jk}=(A_{ijk})_{i\in\mathcal{I}_{c}}$, and the usage durations $\{D_{i}\}_{i\in\mathcal{I}_{c}}$. For the no-arrival customer type $j_{\textsf{null}}$, we stipulate that $\Pr(W_{i^{\prime},j_{\textsf{null}},k}=A_{i,j_{\textsf{null}},k}=0)=1$ for all $i^{\prime}\in\mathcal{I}_{r},i\in\mathcal{I}_{c},k\in\mathcal{K}$, since there should be no reward earned and no resource occupied in the case of no arrival. To ensure feasibility in our resource-constrained model, we assume that there exists a null action $k_{\textsf{null}}\in\mathcal{K}$ that satisfies $\Pr(W_{i^{\prime},j,k_{\textsf{null}}}=A_{i,j,k_{\textsf{null}}}=0)=1$ for all $i^{\prime}\in\mathcal{I}_{r},i\in\mathcal{I}_{c},j\in\mathcal{J}$. Selecting the null action is equivalent to rejecting a customer.

For each $j,k$, the stochastic outcomes follow the joint distribution $\mathcal{O}_{jk}$, namely $(W_{jk},A_{jk})\sim\mathcal{O}_{jk}$. We allow $W_{jk},A_{jk}$ to be arbitrarily correlated. For each $i\in\mathcal{I}_{c}$, the random usage duration $D_{i}$ is independent of $W_{jk},A_{jk}$. This assumption is also made in related works on offline reusable resource allocation, such as [6] and [9], since the usage duration reflects more of a customer's intrinsic need for each resource. We assume that $W_{ijk}\in[0,w_{\max}]$ for each $i\in\mathcal{I}_{r},j\in\mathcal{J},k\in\mathcal{K}$, $A_{ijk}\in[0,a_{\max}]$ almost surely for each $i\in\mathcal{I}_{c},j\in\mathcal{J},k\in\mathcal{K}$, and $D_{i}\in\{0,1,\ldots,d_{\max}\}$ almost surely for each $i\in\mathcal{I}_{c}$. Additionally, we denote $w_{ijk}=\mathbb{E}[W_{ijk}]$, $a_{ijk}=\mathbb{E}[A_{ijk}]$, and $d_{i}=\mathbb{E}[D_{i}]$.

Model uncertainty and dynamics. We assume that the DM knows the horizon length $T$, the values of $w_{\max}$, $a_{\max}$, $d_{\max}$, as well as $\Pr(D_{i}\geq t)$ for every $i\in\mathcal{I}_{c},t\in\{1,\ldots,T\}$. However, the DM does not know the probability distribution $\mathbf{p}$ over customer types. At each time step $t\in\{1,\ldots,T\}$, the DM observes the type $j(t)\sim\mathbf{p}$ of the arriving customer, and the mean outcomes $\{(w_{j(t),k},a_{j(t),k})\}_{k\in\mathcal{K}}$ specific to the type $j(t)$. Then, the DM chooses an action $k(t)\in\mathcal{K}$, and observes the stochastic outcomes of rewards $\{W_{i,j(t),k(t)}(t)\}_{i\in\mathcal{I}_{r}}$ and resources $\{A_{i,j(t),k(t)}(t)\}_{i\in\mathcal{I}_{c}}$ at time $t$. Our model uncertainty scenario includes the case when the DM knows the mean outcomes $a_{ijk},w_{ijk}$ in advance. For example, the DM could obtain estimates of $a_{ijk},w_{ijk},\Pr(D_{i}\geq t)$ by constructing supervised learning models [2, 3, 4] on a pool of customer demand data.

An integer programming formulation. We let binary decision variables $X^{\pi}_{k}(t)$ be the DM's decision under a non-anticipatory algorithm $\pi$, where $X^{\pi}_{k}(t)=1$ if action $k$ is taken at time $t$, and $X^{\pi}_{k}(t)=0$ otherwise. Under a non-anticipatory algorithm, $\{X^{\pi}_{k}(t)\}_{k\in\mathcal{K}}$ depends only on the historical observations $\{j(s)\}^{t}_{s=1}\cup\{W_{i,j(s),k(s)}(s)\}_{i\in\mathcal{I}_{r},1\leq s\leq t-1}\cup\{A_{i,j(s),k(s)}(s)\}_{i\in\mathcal{I}_{c},1\leq s\leq t-1}$. The DM aims to maximize $\mathbb{E}[\min_{i\in\mathcal{I}_{r}}\sum^{T}_{t=1}\sum_{k\in\mathcal{K}}W_{i,j(t),k}(t)X^{\pi}_{k}(t)]$, which achieves the simultaneous maximization of all the reward types by ensuring max-min fairness. Here we maximize the rewards to keep in line with the resource allocation literature, instead of minimizing the regret as in the classical online convex optimization literature, but we remark that the two are essentially equivalent. For each $i\in\mathcal{I}_{c}$ and $t\in\{1,\ldots,T\}$, we require that the resource constraint

\sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\mathbf{1}(D_{i}(\tau)\geq t-\tau+1)A_{i,j(\tau),k}(\tau)X^{\pi}_{k}(\tau)\leq c_{i} \qquad (1)

holds with certainty. The left-hand side of (1) represents the amount of type $i$ resources occupied at time step $t$. Our goal can be formulated as the following binary integer program:

(IP-C) \max_{\text{non-anticipatory }\pi}~\mathbb{E}[\hat{\lambda}]
\text{s.t.}~\sum^{T}_{t=1}\sum_{k\in\mathcal{K}}W_{i,j(t),k}(t)X^{\pi}_{k}(t)\geq T\hat{\lambda} \quad \forall i\in\mathcal{I}_{r}
\sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\mathbf{1}(D_{i}(\tau)\geq t-\tau+1)A_{i,j(\tau),k}(\tau)X^{\pi}_{k}(\tau)\leq c_{i} \quad \forall i\in\mathcal{I}_{c},~t\in\{1,\ldots,T\}
\sum_{k\in\mathcal{K}}X^{\pi}_{k}(t)=1 \quad \forall t\in[T]
X^{\pi}_{k}(t)\in\{0,1\} \quad \forall k\in\mathcal{K},~t\in[T].

We remark that the term $\mathbf{1}(D_{i}(\tau)\geq t-\tau+1)$ induces non-stationarity in resource consumption: even when the DM selects the same action $k$ at $\tau_{1}<\tau_{2}$, the amounts of resource consumption $\mathbf{1}(D_{i}(\tau_{1})\geq t-\tau_{1}+1)A_{i,j(\tau_{1}),k}(\tau_{1})$ and $\mathbf{1}(D_{i}(\tau_{2})\geq t-\tau_{2}+1)A_{i,j(\tau_{2}),k}(\tau_{2})$ at time $t$ are differently distributed. Existing works on non-reusable resources crucially hinge on model stationarity, which does not hold in our setting.
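For intuition, the sketch below (our own toy simulation, not part of the model's formal development) tracks the left-hand side of (1) for a single resource type: an allocation occupies capacity for a random number of future steps and is then returned, so the occupied amount fluctuates rather than decreasing monotonically.

import numpy as np

rng = np.random.default_rng(0)
T, c_i = 50, 10.0            # horizon and capacity of one resource type (toy values)
occupied = np.zeros(T + 1)   # occupied[t] = left-hand side of (1) at time t

for tau in range(1, T + 1):
    A = rng.uniform(0.0, 1.0)     # realized consumption A_{i,j(tau),k}(tau)
    D = rng.integers(1, 6)        # realized usage duration D_i(tau) in {1,...,5}
    if occupied[tau] + A <= c_i:  # allocate only if constraint (1) still holds
        # the allocation occupies the resource during time steps tau, ..., tau + D - 1
        occupied[tau:min(tau + D, T + 1)] += A

print(occupied.max() <= c_i)      # True: constraint (1) holds at every time step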

A tractable benchmark. Constructing a non-anticipatory algorithm that achieves the optimal value of (IP-C) is analytically intractable due to the curse of dimensionality. The intractability motivates us to consider an alternative linear program (LP), dubbed (LP-E), where the realizations of the customer arrivals, their usage durations, and the outcomes exactly follow their expectations:

(LP-E) \max~\lambda
\text{s.t.}~\sum^{T}_{t=1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}w_{ijk}y_{jk}(t)\geq T\lambda \quad \forall i\in\mathcal{I}_{r}
\sum^{t}_{\tau=1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}\Pr(D_{i}\geq t-\tau+1)a_{ijk}y_{jk}(\tau)\leq c_{i} \quad \forall i\in\mathcal{I}_{c},~t\in\{1,\ldots,T\}
\sum_{k\in\mathcal{K}}y_{jk}(t)\leq 1 \quad \forall j\in\mathcal{J},~t\in\{1,\ldots,T\}
y_{jk}(t)\geq 0 \quad \forall j\in\mathcal{J},~k\in\mathcal{K},~t\in[T].

Define the optimal objective value of (LP-E) to be $\lambda^{*}$, and let the optimal objective value of (IP-C) be $\hat{\lambda}^{*}$. The following lemma shows that $\lambda^{*}$ is a tractable upper bound on the expected reward of any online algorithm.

Lemma 1.

$\lambda^{*}\geq\mathbb{E}[\hat{\lambda}^{*}]$.

For the algorithm design, we further introduce a “steady state” benchmark, assuming the decision variables are invariant across time:

(LP-SS): \max_{x_{jk}}~\tilde{\lambda}
\text{s.t.}~\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}w_{ijk}x_{jk}\geq\tilde{\lambda} \quad \forall i\in\mathcal{I}_{r}
\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}a_{ijk}d_{i}x_{jk}\leq c_{i} \quad \forall i\in\mathcal{I}_{c}
\sum_{k\in\mathcal{K}}x_{jk}\leq 1 \quad \forall j\in\mathcal{J}
x_{jk}\geq 0 \quad \forall j\in\mathcal{J},~k\in\mathcal{K}.

We denote an optimal solution of (LP-SS) as $x^{*}_{jk}$, and the optimal value of (LP-SS) as $\tilde{\lambda}^{*}$. We further define

\gamma=\min\left\{\min_{i\in\mathcal{I}_{c}}\frac{c_{i}}{a_{\max}},~\frac{T\tilde{\lambda}^{*}}{w_{\max}}\right\}.
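To make the benchmark concrete, the following is a minimal sketch (a toy instance of our own making, not from the paper) that solves (LP-SS) and evaluates $\gamma$ with scipy.optimize.linprog; all numbers and variable names are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

p = np.array([0.6, 0.4])                  # customer type distribution (|J| = 2)
w = np.array([[[1.0, 0.0], [2.0, 0.0]]])  # w[i_r][j][k]: reward means, k = 1 is the null action
a = np.array([[[1.0, 0.0], [1.5, 0.0]]])  # a[i_c][j][k]: consumption means
d = np.array([3.0])                       # d[i_c]: mean usage durations
c = np.array([2.0])                       # c[i_c]: capacities
nJ, nK, nIr, nIc = 2, 2, 1, 1
nx = nJ * nK                              # x variables, followed by lambda_tilde

cost = np.zeros(nx + 1); cost[-1] = -1.0  # maximize lambda_tilde
A_ub, b_ub = [], []
for i in range(nIr):                      # reward constraints: lambda_tilde - sum(...) <= 0
    row = np.zeros(nx + 1); row[-1] = 1.0
    for j in range(nJ):
        for k in range(nK):
            row[j * nK + k] = -p[j] * w[i][j][k]
    A_ub.append(row); b_ub.append(0.0)
for i in range(nIc):                      # capacity constraints: sum p*a*d*x <= c_i
    row = np.zeros(nx + 1)
    for j in range(nJ):
        for k in range(nK):
            row[j * nK + k] = p[j] * a[i][j][k] * d[i]
    A_ub.append(row); b_ub.append(c[i])
for j in range(nJ):                       # sum_k x_jk <= 1 for each customer type
    row = np.zeros(nx + 1); row[j * nK:(j + 1) * nK] = 1.0
    A_ub.append(row); b_ub.append(1.0)

res = linprog(cost, A_ub=np.array(A_ub), b_ub=b_ub, bounds=[(0, None)] * (nx + 1))
lam_ss = -res.fun
a_max, w_max, T = 1.5, 2.0, 1000          # toy constants used in the definition of gamma
gamma = min((c / a_max).min(), T * lam_ss / w_max)
print("lambda_tilde* =", lam_ss, " gamma =", gamma)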
Assumption 1.

There exist $\delta\in(0,1)$ and $\bar{d}(\delta)\leq T$ such that $\sum^{\infty}_{t=\bar{d}(\delta)+1}\Pr(D_{i}\geq t)\leq\delta$ for all $i,j,k$.

This assumption indicates that our algorithm does not apply to large $D_{i}$, say for non-reusable resources where $D_{i}=T$ with certainty. In the next lemma, we show that $\tilde{\lambda}^{*}$ is close to $\lambda^{*}$.

Lemma 2.

$\left(1-\frac{\delta}{\gamma}\right)\left(T\lambda^{*}-\bar{d}(\delta)w_{\max}\right)\leq T\tilde{\lambda}^{*}\leq T\lambda^{*}$.

We remark that under a wide range of usage durations, we can use (LP-SS) as a benchmark to gauge the performance of our algorithm. For instance, for light-tailed $D_{i}$ (for example, when there exists $u>0$ such that $\lim_{t\rightarrow\infty}\Pr(D_{i}\geq t)t^{u}=0$), we can take $\delta=\epsilon/T$ and $\bar{d}(\delta)=d_{\max}\log(d_{\max}T/\epsilon)$. If $D_{i}$ has bounded support, i.e., $D_{i}\in[0,d_{\text{UB}}]$ almost surely, we can take $\delta=0$ and $\bar{d}(\delta)=d_{\text{UB}}$.
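As a small illustration (our own example, with an assumed geometric tail $\Pr(D_{i}\geq t)=q^{t-1}$), $\bar{d}(\delta)$ can be computed numerically as the smallest integer whose tail sum falls below $\delta$:

q, delta = 0.8, 1e-3
d_bar = 0
while q ** d_bar / (1 - q) > delta:   # tail sum: sum_{t > d_bar} Pr(D >= t) = q**d_bar / (1 - q)
    d_bar += 1
print(d_bar)                          # smallest d_bar(delta) satisfying Assumption 1 here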

Applications. Before proceeding to our algorithm development, we highlight the generality of our model by discussing some of its applications, where the reward type set $\mathcal{I}_{r}$, the customer type set $\mathcal{J}$ and the action set $\mathcal{K}$ can be chosen to model a variety of decisions.

  • Admission control. In this setting, the DM either admits or rejects each arriving customer [21]. Real-life examples include patient inflow control in an emergency department or an ICU. The admission control setting can be modeled by letting the action set $\mathcal{K}=\{\text{accept},\text{reject}\}$. The reward of a resource is fixed at $r_{i}$ for $i\in\mathcal{I}_{c}$. Upon taking an action $k$ for a type $j$ customer, an array of stochastic demands $\{A_{ijk}\}_{i\in\mathcal{I}_{c}}$ is generated. Our model captures different reward settings; we list some examples. For simultaneously maximizing the revenue/social welfare of each type of resource, define $\mathcal{I}_{r}=\mathcal{I}_{c}$ and $W_{ijk}=r_{i}A_{ijk}$. For maximizing the total revenue/social welfare of all resources, let $\mathcal{I}_{r}=\{1\}$ be a singleton, and define $W_{1jk}=\sum_{i\in\mathcal{I}_{c}}r_{i}A_{ijk}$. For maximizing the service level of each resource, define $\mathcal{I}_{r}=\mathcal{I}_{c}$ and $W_{ijk}=A_{ijk}$. We remark that multiple kinds of rewards can be modeled simultaneously (see the short sketch after this list).

  • Assortment planning. In assortment planning problems, one unit of resource $i$ is associated with a fixed price $r_{i}$. The DM influences the customers' demands through offering different assortments of resources. Real-life assortment planning examples with reusable resources include the rental of fashion items and vehicles. Contingent upon the arrival of a customer, say of type $j$, the DM decides the assortment $k\in\mathcal{K}$ to display, where $\mathcal{K}$ is a collection of subsets of $\mathcal{I}_{c}$ ([9, 22]). Let $q_{ijk}$ denote the probability for customer type $j$ to choose product $i$ in assortment $k$. In the revenue management literature, the probability $q_{ijk}$ is modeled by a random utility choice model. The assortment planning problem (simultaneously maximizing the revenue of each resource) can be incorporated in our model by setting $\mathcal{I}_{r}=\mathcal{I}_{c}$, setting $A_{ijk}$ to be the Bernoulli random variable with mean $q_{ijk}$, and setting $W_{ijk}=r_{i}A_{ijk}$.
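As a small illustration of the reward constructions in the admission-control example above (a toy sketch of our own, with hypothetical numbers):

import numpy as np

r = np.array([5.0, 3.0])            # fixed rewards r_i for two resources
A = np.array([1.0, 2.0])            # one realized demand vector {A_ijk}_{i in I_c}

W_per_resource = r * A              # I_r = I_c,  W_ijk = r_i A_ijk      (revenue of each resource)
W_total = np.array([r @ A])         # I_r = {1},  W_1jk = sum_i r_i A_ijk (total revenue)
W_service = A.copy()                # I_r = I_c,  W_ijk = A_ijk          (service level of each resource)
print(W_per_resource, W_total, W_service)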

III Online algorithm and performance analysis

Main results. In this section, we propose a multi-stage adaptive weighing algorithm (dubbed Algorithm $A$ as in "Adaptive", and displayed in Algorithm 1). We first provide our main results. We assume there exists $\epsilon$ satisfying $\gamma=\Omega(\log(|\mathcal{I}|T/\epsilon)/\epsilon^{2})$, where $\mathcal{I}=\mathcal{I}_{c}\cup\mathcal{I}_{r}$. Let

W_{i}(t)=\sum_{k\in\mathcal{K}}W_{i,j(t),k}(t)X^{A}_{k}(t)

denote the type $i\in\mathcal{I}_{r}$ reward earned by Algorithm $A$ at time $t$. Our main result is the following theorem.

Theorem 1.

Let $\epsilon>0$ be an arbitrary constant satisfying $\gamma=\Omega(\log(|\mathcal{I}|T/\epsilon)/\epsilon^{2})$. Without knowing $\boldsymbol{p}$,

\sum^{T}_{t=1}W_{i}(t)\geq\left(1-\frac{\delta}{\gamma}\right)T\lambda^{*}(1-O(\epsilon))-\bar{d}(\delta)\tilde{O}(\epsilon)

for every $i\in\mathcal{I}_{r}$ with probability at least $1-\epsilon(1+\epsilon)^{\delta}$.

We remark that if $D_{i}$ has bounded support, i.e., $D_{i}\in[0,d_{\text{UB}}]$ almost surely for all $i\in\mathcal{I}_{c}$, then the above bound simplifies to $\sum^{T}_{t=1}W_{i}(t)\geq T\lambda^{*}(1-O(\epsilon))-d_{\text{UB}}\tilde{O}(\epsilon)$ for every $i\in\mathcal{I}_{r}$ with probability at least $1-\epsilon$. In the case when $\delta\leq\log(|\mathcal{I}|T)/\epsilon$ and $\bar{d}(\delta)=o(T)$, our algorithm achieves a reward of at least $T\lambda^{*}(1-O(\epsilon))$. This result nearly matches [1], which studies an online non-reusable resource allocation problem, as well as [20], the state-of-the-art work on offline assortment planning with reusable resources.

The assumption $\gamma=\Omega(\log(|\mathcal{I}|T/\epsilon)/\epsilon^{2})$ means that the amount of resource $c_{i}$ for each $i$ is sufficiently large, and the planning horizon $T$ is sufficiently long. Crucially, the performance guarantee in Theorem 1 does not deteriorate even when $|\mathcal{J}|$ or $|\mathcal{K}|$ grows. Consequently, our Algorithm $A$ is applicable to complex scenarios when $|\mathcal{J}|>T$.

High-level description and comparison against [1]. Our Algorithm $A$ extends the adaptive weighing idea of [1] from the non-reusable setting to the reusable setting. [1] proposes a multi-stage adaptive weighing algorithm to trade off between the rewards and the resources. For each reward and resource constraint of a fluid LP (corresponding to our (LP-E)), a penalty weight is defined. The weight on each constraint progressively grows as the reward generated or the resource consumed gets closer to its total capacity (the capacity of each reward is approximated). The approach of [1] does not apply to the reusable setting, as their penalty weights and algorithm design depend on the monotonically decreasing resources.

Somewhat surprisingly, we can still utilize their adaptive weighing idea. We approximate (LP-E) with the knapsack-constrained (LP-SS), and define penalty weights that incorporate the usage durations as well as the resource consumption. In a series of lemmas that eventually leads to Theorem 1, we show that our algorithm effectively capitalizes on the reusability of the resources to maximize the total rewards. We overcome the technical difficulty posed by non-monotonic and time-correlated resource levels, and still achieve near-optimal performance. Nevertheless, we remark that the closeness of (LP-E) and (LP-SS) builds upon Assumption 1, and hence our algorithm is not applicable to the non-reusable setting.

A multistage online algorithm. In Algorithm 1, we divide the time horizon into $l$ stages $\{-1,0,1,\ldots,l-1\}$, where $l$ satisfies $\epsilon 2^{l}=1$ for some $\epsilon\in[\bar{d}(\delta)/T,1/2]$. Stage $-1$ consists of $t^{(-1)}=\epsilon T$ time periods. This stage is solely for exploration of the latent $\{p_{j}\}_{j\in\mathcal{J}}$, and the first $\epsilon T$ customers are served with random actions. Stage $r\in\{0,\ldots,l-1\}$ consists of $t^{(r)}=\epsilon T2^{r}$ time periods. The assumption $\epsilon\in[\bar{d}(\delta)/T,1/2]$ ensures that $l\geq 1$ (there is at least one stage) and $\epsilon T\geq\bar{d}(\delta)$ (each stage consists of at least $\bar{d}(\delta)$ time periods). We denote by $j^{(r)}(s)$ the type of the customer who arrives at the $s$-th time step in stage $r$ (where $s\in\{1,\ldots,t^{(r)}\}$), meaning that $j^{(r)}(s)=j(t^{(r)}+s)$.

In each stage $r\geq 0$, Algorithm $A$ consists of two steps. In Step 1, we estimate the optimum of (LP-SS). In Step 2, we define "penalty weights" on the constraints of (LP-SS), and choose the action that balances between the constraints.

Step 1: Estimate the value of $\tilde{\lambda}^{*}$ (Line 3 of Algorithm 1). We first derive $\mu^{(r)*}$, the optimal objective value of the linear program $\text{(LP-RSS)}^{(r)}$:

\max_{x^{(r)}_{jk}}~\mu^{(r)}
\text{s.t.}~\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\hat{p}^{(r)}_{j}w_{ijk}x^{(r)}_{jk}\geq\mu^{(r)} \quad \forall i\in\mathcal{I}_{r}
\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\hat{p}^{(r)}_{j}a_{ijk}d_{i}x^{(r)}_{jk}\leq c_{i} \quad \forall i\in\mathcal{I}_{c}
\sum_{k\in\mathcal{K}}x^{(r)}_{jk}\leq 1 \quad \forall j\in\mathcal{J}
x^{(r)}_{jk}\geq 0 \quad \forall j\in\mathcal{J},~k\in\mathcal{K},

where $\hat{p}^{(r)}_{j}=\frac{1}{t^{(r-1)}}\sum^{t^{(r-1)}}_{t=1}\mathbf{1}(j^{(r-1)}(t)=j)$ denotes the empirical customer distribution based on the customer arrivals in stage $r-1$. $\text{(LP-RSS)}^{(r)}$ is a sample average approximation (SAA) of (LP-SS). It is worth mentioning that both (LP-SS) and $\text{(LP-RSS)}^{(r)}$ are highly tractable, even in the assortment planning application when $|\mathcal{K}|$ is exponential in $|\mathcal{I}_{c}|$ ([23]). In addition, the knapsack-type constraints in $\text{(LP-RSS)}^{(r)}$ allow us to apply the adaptive weighing in Step 2 in a similar manner to the non-reusable setting. Given that $\mu^{(r)*}$ is easily obtained in Step 1, we define $\lambda^{(r)}$ in the following lemma, and show that it is a progressively more accurate estimate of $\tilde{\lambda}^{*}$ as $r$ grows.

Lemma 3.

Define $\epsilon^{(r)}_{x}=\sqrt{\frac{4T\log\frac{2|\mathcal{I}|}{\eta}}{t^{(r)}\gamma}}$ for $r\in\{-1,0,1,\ldots,l-1\}$. For any $\eta\in(0,1)$, with probability at least $1-2\eta$,

\tilde{\lambda}^{*}(1-3\epsilon^{(r-1)}_{x})\leq\lambda^{(r)}\leq\tilde{\lambda}^{*}

where $\lambda^{(r)}=\frac{\mu^{(r)*}}{1+\epsilon^{(r-1)}_{x}}$.
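For concreteness, the empirical distribution $\hat{p}^{(r)}$ that defines $\text{(LP-RSS)}^{(r)}$ is obtained by simple counting over the stage-$(r-1)$ arrivals; below is a minimal sketch with our own helper names. The resulting LP is then solved exactly as (LP-SS), with $p$ replaced by $\hat{p}^{(r)}$.

from collections import Counter

def empirical_distribution(stage_arrivals, types):
    """stage_arrivals: observed types j^{(r-1)}(1), ..., j^{(r-1)}(t^{(r-1)})."""
    counts = Counter(stage_arrivals)
    n = len(stage_arrivals)
    return {j: counts.get(j, 0) / n for j in types}

# e.g. with three customer types, the null type included:
p_hat = empirical_distribution(["j1", "j2", "j1", "null", "j1"], ["j1", "j2", "null"])
print(p_hat)   # {'j1': 0.6, 'j2': 0.2, 'null': 0.2}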

Step 2: Run an online algorithm given $\lambda^{(r)}$ (Lines 4-10 of Algorithm 1). With a slight abuse of notation, we write $A_{i,j(t^{(r)}+s),k}(t^{(r)}+s)$ as $A^{(r)}_{i,j^{(r)}(s),k}(s)$, $W_{i,j(t^{(r)}+s),k}(t^{(r)}+s)$ as $W^{(r)}_{i,j^{(r)}(s),k}(s)$, and $D_{i}(t^{(r)}+s)$ as $D^{(r)}_{i}(s)$. In addition, we denote $X^{(r)}_{j(t^{(r)}+s),k}(t^{(r)}+s)$ as $X^{(r)}_{j^{(r)}(s),k}(s)$. Define, at time $t$ in stage $r$, $Y^{(r)}_{i\tau t}=\sum_{k\in\mathcal{K}}\mathbf{1}(D^{(r)}_{i}(\tau)\geq t-\tau+1)A^{(r)}_{i,j^{(r)}(\tau),k}(\tau)X^{(r)}_{j^{(r)}(\tau),k}(\tau)$ and $Z^{(r)}_{it}=\sum_{k\in\mathcal{K}}W^{(r)}_{i,j^{(r)}(t),k}(t)X^{(r)}_{j^{(r)}(t),k}(t)$, respectively the amount of resource $i$ occupied at time $t$ due to the customer served at time $\tau$, and the type $i$ reward earned at time $t$.

At the $s$-th time step of stage $r$, after observing the customer type $j^{(r)}(s)$, we take action $k^{(r)}(s)$ according to Line 7. The parameter $\phi^{(r)}_{i,s,t}$ represents a "penalty weight" for the resource constraint $i\in\mathcal{I}_{c}$ in (LP-SS). If the allocation decisions during time steps $1,\ldots,s-1$ of stage $r$ lead to a high amount of resource $i$ occupied at time $t$, the penalty $\phi^{(r)}_{i,s,t}$ is also high. Similarly, a lower amount of accrued type $i\in\mathcal{I}_{r}$ reward during $1,\ldots,s-1$ of stage $r$ leads to a higher value of the weight $\psi^{(r)}_{i,s}$. Both weights quantify the DM's emphasis on resources and rewards.

Algorithm 1 Online Algorithm $A$

Input: the number of time periods $T$, the capacities $c_{i}$ for each resource, the values of $\gamma$ and $\epsilon\in[\bar{d}(\delta)/T,1/2]$.
Output: actions $k^{(r)}(t)$ to take, for $r=0,\ldots,l-1$ and $t=1,\ldots,t^{(r)}$.

1: Set $l=\log_{2}(1/\epsilon)$. Initialize $t^{(-1)}=\epsilon T$.
2: for $r=0,\ldots,l-1$ do
3:     Compute $\lambda^{(r)}$ by solving $\text{(LP-RSS)}^{(r)}$.
4:     Set
\epsilon^{(r-1)}_{x}=\sqrt{\frac{4T\log\frac{2|\mathcal{I}|}{\eta}}{t^{(r-1)}\gamma}},\qquad\epsilon^{(r)}_{z}=\sqrt{\frac{2w_{\max}(1+\epsilon)\log\frac{2|\mathcal{I}|l}{\eta}}{t^{(r)}\lambda^{(r)}}}.
5:     Set
\phi^{(r)}_{i,1,t}=\begin{cases}\frac{\epsilon\gamma}{c_{i}(1+\epsilon)^{\gamma-\delta}}, & t=1\\ \frac{\epsilon\gamma}{c_{i}(1+\epsilon)^{\gamma-\delta}}\prod^{t}_{\tau=2}\left(1+\epsilon\frac{\gamma\Pr(D_{i}\geq t-\tau+1)}{d_{i}(1+\epsilon)}\right), & t=2,\ldots,t^{(r)}\end{cases}
for each $i\in\mathcal{I}_{c}$, and
\psi^{(r)}_{i,1}=-\frac{\epsilon^{(r)}_{z}\prod^{t^{(r)}}_{\tau=2}\left(1-\epsilon^{(r)}_{z}\frac{\lambda^{(r)}}{w_{\max}(1+\epsilon)}\right)}{w_{\max}\left(1-\epsilon^{(r)}_{z}\right)^{\frac{(1-\epsilon^{(r)}_{z})t^{(r)}\lambda^{(r)}}{w_{\max}}}}
for each $i\in\mathcal{I}_{r}$.
6:     for $s=1,\ldots,t^{(r)}$ do
7:          Observe customer type $j^{(r)}(s)$ and take the action
k^{(r)}(s)\in\arg\min_{k\in\mathcal{K}}\left\{\sum^{t^{(r)}}_{t=s}\sum_{i\in\mathcal{I}_{c}}a_{i,j^{(r)}(s),k}\Pr(D_{i}\geq t-s+1)\phi^{(r)}_{i,s,t}+\sum_{i\in\mathcal{I}_{r}}w_{i,j^{(r)}(s),k}\psi^{(r)}_{i,s}\right\}.
8:         Set $Y^{(r)}_{ist}=a_{i,j^{(r)}(s),k^{(r)}(s)}\Pr(D_{i}\geq t-s+1)$ for each $i\in\mathcal{I}_{c}$ and $t\in\{s,\ldots,t^{(r)}\}$, and $Z^{(r)}_{is}=w_{i,j^{(r)}(s),k^{(r)}(s)}$ for each $i\in\mathcal{I}_{r}$.
9:          Set, for each $i\in\mathcal{I}_{c}$,
\phi^{(r)}_{i,s+1,t}=\frac{\phi^{(r)}_{i,s,t}(1+\epsilon)^{\frac{\gamma}{c_{i}}Y^{(r)}_{ist}}}{1+\epsilon\frac{\gamma\Pr(D_{i}\geq t-s)}{d_{i}(1+\epsilon)}},\qquad t=s+1,\ldots,t^{(r)},
and, for each $i\in\mathcal{I}_{r}$,
\psi^{(r)}_{i,s+1}=\frac{\psi^{(r)}_{i,s}\left(1-\epsilon^{(r)}_{z}\right)^{\frac{1}{w_{\max}}Z^{(r)}_{is}}}{1-\epsilon^{(r)}_{z}\frac{\lambda^{(r)}}{w_{\max}(1+\epsilon)}}.
10:     end for
11:end for
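The per-step computation in Lines 7-9 can be sketched in code as follows. This is a toy, self-contained rendering under our own data layout (the weight initialization of Line 5 and the outer stage loop are omitted); all names and array shapes are illustrative assumptions rather than the paper's notation.

import numpy as np

def choose_and_update(s, j, a, w, surv, phi, psi, params):
    """a[i_c][j][k], w[i_r][j][k]: mean outcomes; surv[i_c][u] = Pr(D_i >= u+1);
    phi[i_c][t], psi[i_r]: current penalty weights (0-based time index within the stage);
    params = (eps, eps_z, gamma, c, d, w_max, lam)."""
    eps, eps_z, gamma, c, d, w_max, lam = params
    t_stage = phi.shape[1]                      # number of time steps in this stage
    n_act = a.shape[2]
    scores = np.zeros(n_act)
    for k in range(n_act):                      # Line 7: penalised score of each action
        for t in range(s, t_stage):
            scores[k] += np.sum(a[:, j, k] * surv[:, t - s] * phi[:, t])
        scores[k] += np.sum(w[:, j, k] * psi)
    k_star = int(np.argmin(scores))
    for i in range(a.shape[0]):                 # Line 9: resource weight update
        for t in range(s + 1, t_stage):
            Y = a[i, j, k_star] * surv[i, t - s]
            phi[i, t] *= (1 + eps) ** (gamma / c[i] * Y)
            phi[i, t] /= 1 + eps * gamma * surv[i, t - s - 1] / (d[i] * (1 + eps))
    Z = w[:, j, k_star]                         # Line 9: reward weight update
    psi *= (1 - eps_z) ** (Z / w_max) / (1 - eps_z * lam / (w_max * (1 + eps)))
    return k_star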

Performance guarantee of Algorithm $A$. Our analysis involves bounding the total probability of violating each constraint of (IP-C) across all time steps. Each violation leads to unserved customers and lost rewards. We show that under Algorithm $A$, all constraints of (IP-C) are satisfied with high probability. To understand the choice of $k^{(r)}(s+1)$ in Line 7 of Algorithm 1, we introduce an auxiliary offline static algorithm (dubbed Algorithm $S$ as in "Static"). Algorithm $S$ requires knowing $x^{*}=\{x^{*}_{jk}\}_{jk}$, an optimal solution to (LP-SS), and a tuning parameter $\epsilon\in(0,1)$ for preserving capacities in anticipation of constraint violations. At a time $t$, if a customer of type $j$ arrives, the DM selects action $k\in\mathcal{K}\setminus\{k_{\text{null}}\}$ with probability $\frac{x_{jk}^{*}}{1+\epsilon}$, and selects the null action $k_{\text{null}}$ with probability $\frac{\epsilon}{1+\epsilon}+\frac{x_{j,k_{\text{null}}}^{*}}{1+\epsilon}$. We denote $X^{S}_{j(t^{(r)}+s),k}(t^{(r)}+s)$ as $\tilde{X}^{(r)}_{j^{(r)}(s),k}(s)$. Define $\tilde{Y}^{(r)}_{i\tau t}=\sum_{k\in\mathcal{K}}\mathbf{1}(D^{(r)}_{i}(\tau)\geq t-\tau+1)A^{(r)}_{i,j^{(r)}(\tau),k}(\tau)\tilde{X}^{(r)}_{j^{(r)}(\tau),k}(\tau)$ and $\tilde{Z}^{(r)}_{it}=\sum_{k\in\mathcal{K}}W^{(r)}_{i,j^{(r)}(t),k}(t)\tilde{X}^{(r)}_{j^{(r)}(t),k}(t)$, where $\Pr(\tilde{X}^{(r)}_{j^{(r)}(\tau),k}(\tau)=1)=\frac{x^{*}_{jk}}{1+\epsilon}$ for each $\tau$ in stage $r$. A performance guarantee of Algorithm $S$ is provided in the following lemma.

Lemma 4.

Let $\eta=\epsilon/(5l)$. Algorithm $S$ achieves a total reward of at least

\sum^{l-1}_{r=0}\sum^{t^{(r)}}_{t=1}\sum_{k\in\mathcal{K}}W^{(r)}_{i,j^{(r)}(t),k}(t)\tilde{X}^{(r)}_{j^{(r)}(t),k}(t)\geq T\tilde{\lambda}^{*}(1-O(\epsilon))-\bar{d}(\delta)w_{\max}O\left(\epsilon+\log\frac{1}{\epsilon}\right)

for every $i\in\mathcal{I}_{r}$ with probability at least $1-\epsilon(1+\epsilon)^{\delta}$.

For the sake of analysis, we consider a hybrid Algorithm $A_{s}S_{t^{(r)}-s}$ in each stage $r$. Under Algorithm $A_{s}S_{t^{(r)}-s}$, the DM makes allocation decisions based on Algorithm $A$ in time steps $\{1,\ldots,s\}$, and based on Algorithm $S$ in time steps $\{s+1,\ldots,t^{(r)}\}$. We show that $A_{s+1}S_{t^{(r)}-s-1}$ outperforms $A_{s}S_{t^{(r)}-s}$, which inductively leads to the conclusion that the performance of the online adaptive Algorithm $A$ is no worse than that of the offline static Algorithm $S$ (see Lemma 4). This induction technique is introduced in [1] for the non-reusable setting. We need more refined techniques to disentangle the time-correlation between the resource constraints at different time steps, and finally prove the following lemma.

Lemma 5.

Algorithm $A$ achieves a total reward of at least

\sum^{l-1}_{r=0}\sum^{t^{(r)}}_{t=1}\sum_{k\in\mathcal{K}}W^{(r)}_{i,j^{(r)}(t),k}(t)X^{(r)}_{j^{(r)}(t),k}(t)\geq T\tilde{\lambda}^{*}(1-O(\epsilon))-\bar{d}(\delta)w_{\max}O\left(\epsilon+\log\frac{1}{\epsilon}\right)

for every $i\in\mathcal{I}_{r}$ with probability at least $1-\epsilon(1+\epsilon)^{\delta}$.

Putting Lemmas 2 and 5 together, we have Theorem 1.

IV Numerical experiments

We consider an assortment planning problem. One unit of resource $i$ is associated with a fixed price $r_{i}$. Contingent upon the arrival of a customer, say of type $j$, the DM decides the assortment $k\in\mathcal{K}$ to display, where $\mathcal{K}$ is a collection of subsets of $\mathcal{I}_{c}$. Let $q_{ijk}$ denote the probability for customer type $j$ to choose product $i$ in assortment $k$. Therefore, the assortment planning problem (simultaneously maximizing the revenue of each resource) can be incorporated in our model by setting $\mathcal{I}_{r}=\mathcal{I}_{c}$, setting $A_{ijk}$ to be the Bernoulli random variable with mean $q_{ijk}$, and setting $W_{ijk}=r_{i}A_{ijk}$. In our test, the probability $q_{ijk}$ is modeled by the multinomial logit (MNL) choice model. Each resource $i\in\mathcal{I}_{c}$ is associated with a feature vector $\boldsymbol{f}_{i}\in\mathbb{R}^{m}$, and each customer type $j\in\mathcal{J}$ is associated with a set of feature vectors $\{\boldsymbol{b}_{ij}\}_{i\in\mathcal{I}_{c}}$, where $\boldsymbol{b}_{ij}\in\mathbb{R}^{m}$ for each $i\in\mathcal{I}_{c}$. The feature vector $\boldsymbol{f}_{i}$ could involve the fixed price $r_{i}$, and $q_{ijk}=\frac{\exp(\boldsymbol{b}_{ij}^{\top}\boldsymbol{f}_{i})}{1+\sum_{\ell\in k}\exp(\boldsymbol{b}_{\ell j}^{\top}\boldsymbol{f}_{\ell})}$ if $i\in k$, and $q_{ijk}=0$ if $i\not\in k$. In complement, the probability of no purchase is $\frac{1}{1+\sum_{\ell\in k}\exp(\boldsymbol{b}_{\ell j}^{\top}\boldsymbol{f}_{\ell})}$. Notice that the size of the action set $\mathcal{K}$ scales exponentially with the number of products. Nevertheless, for MNL models, $k^{(r)}(t)$ can be computed efficiently by solving a simple LP whose computational time is polynomial in $|\mathcal{I}_{c}|$ [24].
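As a concrete illustration of the MNL formulas above, the following is a small sketch (our own helper, with hypothetical feature values) that computes the choice probabilities of a displayed assortment.

import numpy as np

def mnl_probabilities(assortment, b_j, f):
    """assortment: list of product indices; b_j[i], f[i]: feature vectors in R^m.
    Returns (probabilities of each offered product, no-purchase probability)."""
    utilities = np.array([np.dot(b_j[i], f[i]) for i in assortment])
    denom = 1.0 + np.exp(utilities).sum()
    return np.exp(utilities) / denom, 1.0 / denom

# toy example with m = 2 features and an assortment of two products
f = {0: np.array([1.0, 0.5]), 1: np.array([0.2, 1.0])}
b_j = {0: np.array([0.3, -0.1]), 1: np.array([0.4, 0.2])}
probs, no_purchase = mnl_probabilities([0, 1], b_j, f)
print(probs, no_purchase)   # the probabilities and no_purchase sum to 1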

We consider a synthetic data set with $14$ types of resources indexed by $\mathcal{I}_{c}=\{1,2,\ldots,14\}$, and $1000$ types of customers indexed by $\mathcal{J}=\{1,2,\ldots,1000\}$. We allow offering any assortment of at most $5$ products, and hence the assortment set is of size $\sum^{5}_{i=1}\binom{14}{i}=3472$. We let $\boldsymbol{p}$ follow a discrete distribution with a support of size $|\mathcal{J}|$. For any $n\geq 1$, we set $T=1000n$, $c_{i}=20n$, and let $D_{i}$ follow a randomly generated probability distribution with bounded support $[1,200n]$. Theoretically, the regret of both Algorithms $A$ and $S$ should grow sublinearly in the scale $n$, roughly at the rate of $\tilde{O}(\sqrt{n})$. For each $T$ value, we run $10$ simulations using a column generation approach and report the average as well as the standard deviation.

T      ε      UB        Algo S Mean   Algo S Std   Algo A Mean   Algo A Std   Gap S     Gap A
1000   0.3    644.90    507.06        42.1689      331.03        2.531798     21.37%    48.67%
2000   0.22   1289.80   1097.47       50.88593     887.28        7.681146     14.91%    31.21%
3000   0.185  1934.70   1702.97       75.61902     1419.54       8.537564     11.98%    26.63%
4000   0.162  2579.60   2327.51       86.79659     1966.83       6.663332     9.77%     23.75%
5000   0.148  3224.50   2936.00       98.77243     2522.00       10.44462     8.95%     21.79%
6000   0.136  3869.40   3557.05       74.11313     3086.28       20.91315     8.07%     20.24%
7000   0.127  4514.30   4200.24       67.91298     3647.95       9.508417     6.96%     19.19%
8000   0.12   5159.20   4804.85       68.66264     4217.17       14.06983     6.87%     18.26%
TABLE I: Results with 14 resource types and 1000 customer types. "UB" is the optimal value of (LP-SS); "Mean" and "Std" are the mean and standard deviation of the total revenue over the 10 simulations; "Gap S" and "Gap A" are the percentage gaps of the mean revenues of Algorithms S and A from UB.

In Table I, the third column reports the upper bounds, which are the optimal values of (LP-SS). The offline Algorithm $S$ performs better than the online Algorithm $A$. Algorithm $A$ achieves rewards within a $1-2\epsilon$ fraction of the upper bounds.

References

  • [1] N. R. Devanur, K. Jain, B. Sivan, and C. A. Wilkens, “Near optimal online algorithms and fast approximation algorithms for resource allocation problems,” Journal of the ACM (JACM), vol. 66, no. 1, pp. 1–41, 2019.
  • [2] X. Chen, Z. Owen, C. Pixton, and D. Simchi-Levi, “A statistical learning approach to personalization in revenue management,” Management Science, vol. 68, no. 3, pp. 1923–1937, 2022.
  • [3] Y. Han, F. C. Pereira, M. Ben-Akiva, and C. Zegras, “A neural-embedded choice model: Tastenet-mnl modeling taste heterogeneity with flexibility and interpretability,” arXiv preprint arXiv:2002.00922, 2020.
  • [4] A. Aouad and A. Désir, “Representing random utility choice models with neural networks,” arXiv preprint arXiv:2207.12877, 2022.
  • [5] Y. Chen, R. Levi, and C. Shi, “Revenue management of reusable resources with advanced reservations,” Production and Operations Management, vol. 26, no. 5, pp. 836–859, 2017.
  • [6] Y. Lei and S. Jasin, “Real-time dynamic pricing for revenue management with reusable resources, advance reservation, and deterministic service time requirements,” Operations Research, vol. 68, no. 3, pp. 676–685, 2020.
  • [7] J. Baek and W. Ma, “Bifurcating constraints to improve approximation ratios for network revenue management with reusable resources,” Operations Research, 2022.
  • [8] O. Besbes, A. N. Elmachtoub, and Y. Sun, “Static pricing: Universal guarantees for reusable resources,” Operations Research, 2021.
  • [9] P. Rusmevichientong, M. Sumida, and H. Topaloglu, “Dynamic assortment optimization for reusable products with random usage durations,” Management Science, vol. 66, no. 7, pp. 2820–2844, 2020.
  • [10] D. Adelman, “Dynamic bid prices in revenue management,” Operations Research, vol. 55, no. 4, pp. 647–661, 2007.
  • [11] S. Alaei, M. Hajiaghayi, and V. Liaghat, “Online prophet-inequality matching with applications to ad allocation,” in Proceedings of the 13th ACM Conference on Electronic Commerce, pp. 18–35, 2012.
  • [12] S. Agrawal and N. R. Devanur, “Fast algorithms for online stochastic convex programming,” in Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms, pp. 1405–1424, SIAM, 2014.
  • [13] S. Agrawal, Z. Wang, and Y. Ye, “A dynamic near-optimal algorithm for online linear programming,” Operations Research, vol. 62, no. 4, pp. 876–890, 2014.
  • [14] S. Balseiro, H. Lu, and V. Mirrokni, “Dual mirror descent for online allocation problems,” in International Conference on Machine Learning, pp. 613–628, PMLR, 2020.
  • [15] J. Feldman, M. Henzinger, N. Korula, V. S. Mirrokni, and C. Stein, “Online stochastic packing applied to display ad allocation,” in Algorithms – ESA 2010 (M. de Berg and U. Meyer, eds.), (Berlin, Heidelberg), pp. 182–194, Springer Berlin Heidelberg, 2010.
  • [16] H. Yu, M. Neely, and X. Wei, “Online convex optimization with stochastic constraints,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  • [17] J. Yuan and A. Lamperski, “Online convex optimization for cumulative constraints,” Advances in Neural Information Processing Systems, vol. 31, 2018.
  • [18] X. Y. Gong, V. Goyal, G. Iyengar, D. Simchi-Levi, R. Udwani, and S. Wang, “Online assortment optimization with reusable resources,” Available at SSRN 3334789, 2019.
  • [19] V. Goyal, G. Iyengar, and R. Udwani, “Asymptotically optimal competitive ratio for online allocation of reusable resources,” arXiv preprint arXiv:2002.02430, 2020.
  • [20] Y. Feng, R. Niazadeh, and A. Saberi, “Near-optimal bayesian online assortment of reusable resources,” Proceedings of the 23rd ACM Conference on Economics and Computation, 2022.
  • [21] R. Levi and A. Radovanović, “Provably near-optimal lp-based policies for revenue management in systems with reusable resources,” Operations Research, vol. 58, no. 2, pp. 503–507, 2010.
  • [22] Y. Feng, R. Niazadeh, and A. Saberi, “Linear programming based online policies for real-time assortment of reusable resources,” Available at SSRN 3421227, 2019.
  • [23] J. J. M. Bront, I. Méndez-Díaz, and G. Vulcano, “A column generation algorithm for choice-based network revenue management,” Operations research, vol. 57, no. 3, pp. 769–784, 2009.
  • [24] J. Davis, G. Gallego, and H. Topaloglu, “Assortment planning under the multinomial logit model with totally unimodular constraint structures,” Work in Progress, 2013.

Appendix

IV-A Proof of Lemma 1

Proof.

Take the expectation of the left-hand side of the resource constraints of (IP-C), $\sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\mathbf{1}(D_{i}(\tau)\geq t-\tau+1)A_{i,j(\tau),k}(\tau)X^{\pi}_{k}(\tau)$, over $X^{\pi}_{k}(\tau)$, $D_{i}(\tau)$, $A_{i,j(\tau),k}(\tau)$ and $j(\tau)$ for $\tau=1,\ldots,t$:

\mathbb{E}\left[\sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\mathbf{1}(D_{i}(\tau)\geq t-\tau+1)A_{i,j(\tau),k}(\tau)X^{\pi}_{k}(\tau)\right]
= \sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\sum_{j(1),\ldots,j(t)}\mathbb{E}\left[\mathbf{1}(D_{i}(\tau)\geq t-\tau+1)A_{i,j(\tau),k}(\tau)X^{\pi}_{k}(\tau)\,|\,j(1),\ldots,j(t)\right]\Pr(j(1),\ldots,j(t))
= \sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\sum_{j(1),\ldots,j(t)}\mathbb{E}\left[\mathbf{1}(D_{i}(\tau)\geq t-\tau+1)\,|\,j(1),\ldots,j(t)\right]\cdot\mathbb{E}\left[A_{i,j(\tau),k}(\tau)\,|\,j(1),\ldots,j(t)\right]\cdot\mathbb{E}\left[X^{\pi}_{k}(\tau)\,|\,j(1),\ldots,j(t)\right]\cdot\Pr(j(1),\ldots,j(t))
= \sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\sum_{j(1),\ldots,j(t)}\mathbb{E}\left[\mathbf{1}(D_{i}(\tau)\geq t-\tau+1)\,|\,j(\tau)\right]\cdot\mathbb{E}\left[A_{i,j(\tau),k}(\tau)\,|\,j(\tau)\right]\cdot\mathbb{E}\left[X^{\pi}_{k}(\tau)\,|\,j(1),\ldots,j(t)\right]\cdot\Pr(j(1),\ldots,j(t))
= \sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\sum_{j(\tau)}p_{j(\tau)}a_{i,j(\tau),k}\Pr(D_{i}\geq t-\tau+1)\sum_{j(1),\ldots,j(\tau-1),j(\tau+1),\ldots,j(t)}\mathbb{E}\left[X^{\pi}_{k}(\tau)\,|\,j(1),\ldots,j(t)\right]\Pr(j(1),\ldots,j(\tau-1),j(\tau+1),\ldots,j(t))
= \sum^{t}_{\tau=1}\sum_{k\in\mathcal{K}}\sum_{j\in\mathcal{J}}p_{j}a_{ijk}\Pr(D_{i}\geq t-\tau+1)\sum_{j(1),\ldots,j(\tau-1),j(\tau+1),\ldots,j(t)}\mathbb{E}\left[X^{\pi}_{k}(\tau)\,|\,j(1),\ldots,j(t)\right]\Pr(j(1),\ldots,j(\tau-1),j(\tau+1),\ldots,j(t))
\leq c_{i}.

The first equality follows from the law of total expectation over the arrival sequence. The second equality holds because $D_{i}(\tau)$ is independent of $A_{i,j(\tau),k}(\tau)$, and the decision is made before the realizations of $A_{i,j(\tau),k}(\tau)$ and $D_{i}(\tau)$. The third equality is valid since $D_{i}(\tau)$ and $A_{i,j(\tau),k}(\tau)$ depend only on $j(\tau)$. The fourth equality follows by separately considering $j(\tau)$ and the customers at the other time periods. The fifth equality holds since $j(1),\ldots,j(T)$ are i.i.d. across time, so we can simply denote $j(\tau)$ as $j$. The last inequality holds since the clairvoyant's program (IP-C) is feasible along every sample path of $j(1),\ldots,j(T)$. Letting $x_{jk}(\tau)=\sum_{j(1),\ldots,j(\tau-1),j(\tau+1),\ldots,j(t)}\mathbb{E}[X^{\pi}_{k}(\tau)\,|\,j(1),\ldots,j(t)]\Pr(j(1),\ldots,j(\tau-1),j(\tau+1),\ldots,j(t))$, we obtain a feasible solution for the online DM's expected LP (LP-E). For the reward constraints, $\mathbb{E}[\sum^{T}_{t=1}\sum_{k\in\mathcal{K}}W_{i,j(t),k}(t)X^{\pi}_{k}(t)]\geq\mathbb{E}[T\hat{\lambda}^{*}]$. Since the solution $x_{jk}(\tau)$ achieves an objective value of at least $\mathbb{E}[T\hat{\lambda}^{*}]$ and is feasible for the DM's expected LP, we have $\lambda^{*}\geq\mathbb{E}[\hat{\lambda}^{*}]$. ∎

IV-B Proof of Lemma 2

Proof.

For each $i$ we have $d_{i}=\sum^{\infty}_{t=1}\Pr(D_{i}\geq t)$. Hence, the capacity constraints of (LP-SS) can be expressed as:

c_{i} \geq \sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}a_{ijk}d_{i}x^{*}_{jk}
= \sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}a_{ijk}\sum^{\infty}_{t=1}\Pr(D_{i}\geq t)x^{*}_{jk}
\geq \sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}a_{ijk}\sum^{t}_{\tau=1}\Pr(D_{i}\geq t-\tau+1)x^{*}_{jk} \quad \forall i\in\mathcal{I}_{c},~t\in\{1,\ldots,T\}.

From the above, it is evident that the optimal solution $x^{*}_{jk}$ of (LP-SS) is feasible for (LP-E) upon letting $y_{jk}(t)=x^{*}_{jk}$ for all $t\in\{1,\ldots,T\}$, and therefore $\tilde{\lambda}^{*}\leq\lambda^{*}$. In contrast, the optimal solution $y^{*}_{jk}(t)$ of (LP-E) may not be feasible for (LP-SS).

We consider a truncated version (LP-TE) of the DM's expected LP (LP-E), in which each resource constraint only involves the allocations made in the most recent $\bar{d}(\delta)+1$ time steps:

(LP-TE): \max~\lambda
\text{s.t.}~\sum^{T}_{t=1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}w_{ijk}y_{jk}(t)\geq T\lambda \quad \forall i\in\mathcal{I}_{r}
\sum^{t}_{\tau=\max(t-\bar{d}(\delta),1)}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}\Pr(D_{i}\geq t-\tau+1)a_{ijk}y_{jk}(\tau)\leq c_{i} \quad \forall i\in\mathcal{I}_{c},~t\in\{1,\ldots,T\}
\sum_{k\in\mathcal{K}}y_{jk}(t)\leq 1 \quad \forall j\in\mathcal{J},~t\in\{1,\ldots,T\}
y_{jk}(t)\geq 0 \quad \forall j\in\mathcal{J},~k\in\mathcal{K},~t\in[T].

Denote the optimal objective value of (LP-TE) as $\lambda^{\prime}$ and its optimal solution as $y^{\prime}_{jk}(t)$ for $t\in\{1,\ldots,T\}$. Since the feasible region of (LP-TE) contains the feasible region of (LP-E) and both LPs have the same objective, we have $\lambda^{\prime}\geq\lambda^{*}$. Note that $\{y^{\prime}_{jk}(t)\}_{j,k,t}$ may not be feasible for (LP-E). Despite that, we can show that $y_{jk}(t)=\left(1-\frac{\delta}{\gamma}\right)y^{\prime}_{jk}(t)$ is feasible for (LP-E):

\sum^{t}_{\tau=1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}\Pr(D_{i}\geq t-\tau+1)a_{ijk}\left(1-\frac{\delta}{\gamma}\right)y^{\prime}_{jk}(\tau)
\leq \sum^{t}_{\tau=\max(t-\bar{d}(\delta),1)}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}\Pr(D_{i}\geq t-\tau+1)a_{ijk}\left(1-\frac{\delta}{\gamma}\right)y^{\prime}_{jk}(\tau)+\sum^{\max(0,t-\bar{d}(\delta)-1)}_{\tau=1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}\Pr(D_{i}\geq t-\tau+1)a_{ijk}\left(1-\frac{\delta}{\gamma}\right)y^{\prime}_{jk}(\tau)
\leq \left(1-\frac{\delta}{\gamma}\right)c_{i}+a_{\max}\delta
= \left(1-\frac{\delta}{\gamma}+\frac{\delta a_{\max}}{c_{i}}\right)c_{i}
\leq c_{i}.

By letting $y_{jk}(t)=\left(1-\frac{\delta}{\gamma}\right)y^{\prime}_{jk}(t)$, the objective value is at least $\sum^{T}_{t=1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}w_{ijk}\left(1-\frac{\delta}{\gamma}\right)y^{\prime}_{jk}(t)\geq\left(1-\frac{\delta}{\gamma}\right)T\lambda^{\prime}$. Hence $\lambda^{*}\geq\left(1-\frac{\delta}{\gamma}\right)\lambda^{\prime}$. We formulate a truncated version (LP-TSS) of the steady-state LP (LP-SS) in a similar fashion:

(LP-TSS): \max_{x_{jk}}~\tilde{\lambda}
\text{s.t.}~\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}p_{j}w_{ijk}x_{jk}\geq\tilde{\lambda} \quad \forall i\in\mathcal{I}_{r}
\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\sum^{\bar{d}(\delta)}_{\tau=1}p_{j}a_{ijk}\Pr(D_{i}\geq\tau)x_{jk}\leq c_{i} \quad \forall i\in\mathcal{I}_{c}
\sum_{k\in\mathcal{K}}x_{jk}\leq 1 \quad \forall j\in\mathcal{J}
x_{jk}\geq 0 \quad \forall j\in\mathcal{J},~k\in\mathcal{K}.

Denote the optimal objective value of (LP-TSS) as $\tilde{\lambda}^{\prime}$ and its optimal solution as $x^{\prime}_{jk}$. Similar to (LP-TE), we have $\tilde{\lambda}^{\prime}\geq\tilde{\lambda}^{*}$ and $\tilde{\lambda}^{*}\geq\left(1-\frac{\delta}{\gamma}\right)\tilde{\lambda}^{\prime}$. It is also clear that $\lambda^{\prime}\geq\tilde{\lambda}^{\prime}$, since we can let $y_{jk}(\tau)=x^{\prime}_{jk}$ for all $\tau$ and obtain a feasible solution for (LP-TE). Next, we show that $\lambda^{\prime}$ is not far from $\tilde{\lambda}^{\prime}$.

The dual of the truncated DM’s expected LP is:

(LP-TE-D): \min_{\boldsymbol{\alpha}^{(1)},\boldsymbol{\beta}^{(1)},\boldsymbol{\rho}^{(1)}}~\sum^{T}_{t=1}\left(\sum_{j\in\mathcal{J}}p_{j}\beta^{(1)}_{jt}+\sum_{i\in\mathcal{I}_{c}}c_{i}\alpha^{(1)}_{it}\right)
\text{s.t.}~\beta^{(1)}_{jt}+\sum_{i\in\mathcal{I}_{c}}a_{ijk}\sum^{\min(t+\bar{d}(\delta),T)}_{\tau=t}\Pr(D_{i}\geq\tau-t+1)\alpha^{(1)}_{i\tau}-\sum_{i\in\mathcal{I}_{r}}w_{ijk}\rho^{(1)}_{i}\geq 0 \quad \forall j\in\mathcal{J},~k\in\mathcal{K},~t\in\{1,\ldots,T\}
\sum_{i\in\mathcal{I}_{r}}\rho^{(1)}_{i}\geq 1
\rho^{(1)}_{i}\geq 0 \quad \forall i\in\mathcal{I}_{r}
\alpha^{(1)}_{it}\geq 0 \quad \forall i\in\mathcal{I}_{c},~t\in\{1,\ldots,T\}
\beta^{(1)}_{jt}\geq 0 \quad \forall j\in\mathcal{J},~t\in\{1,\ldots,T\}.

The dual of the truncated steady-state LP is:

(LP-TSS-D): \min_{\boldsymbol{\alpha}^{(2)},\boldsymbol{\beta}^{(2)},\boldsymbol{\rho}^{(2)}}~\sum_{j\in\mathcal{J}}p_{j}\beta^{(2)}_{j}+\sum_{i\in\mathcal{I}_{c}}c_{i}\alpha^{(2)}_{i}
\text{s.t.}~\beta^{(2)}_{j}+\sum_{i\in\mathcal{I}_{c}}a_{ijk}\sum^{\bar{d}(\delta)}_{\tau=1}\Pr(D_{i}\geq\tau)\alpha^{(2)}_{i}-\sum_{i\in\mathcal{I}_{r}}w_{ijk}\rho^{(2)}_{i}\geq 0 \quad \forall j\in\mathcal{J},~k\in\mathcal{K}
\sum_{i\in\mathcal{I}_{r}}\rho^{(2)}_{i}\geq 1
\rho^{(2)}_{i}\geq 0 \quad \forall i\in\mathcal{I}_{r}
\alpha^{(2)}_{i}\geq 0 \quad \forall i\in\mathcal{I}_{c}
\beta^{(2)}_{j}\geq 0 \quad \forall j\in\mathcal{J}.

Obviously, the optimal objective value of (LP-TE-D), which equals $T\lambda^{\prime}$, is larger than or equal to $T$ times the optimal objective value of (LP-TSS-D), which is $T\tilde{\lambda}^{\prime}$. Let $\boldsymbol{\alpha}^{(2)*},\boldsymbol{\beta}^{(2)*},\boldsymbol{\rho}^{(2)*}$ denote an optimal solution of (LP-TSS-D). Since these are minimization problems, if for (LP-TE-D) we can let $\alpha^{(1)}_{it}=\alpha^{(2)*}_{i}$, $\beta^{(1)}_{jt}=\beta^{(2)*}_{j}$ for all $t$, and $\rho^{(1)}_{i}=\rho^{(2)*}_{i}$ and still have a feasible solution, then $\lambda^{\prime}=\tilde{\lambda}^{\prime}$. Consider the first set of constraints in (LP-TE-D). For $t\in\{1,\ldots,T-\bar{d}(\delta)\}$, by letting $\alpha^{(1)}_{it}=\alpha^{(2)*}_{i}$, $\beta^{(1)}_{jt}=\beta^{(2)*}_{j}$ and $\rho^{(1)}_{i}=\rho^{(2)*}_{i}$, we have:

\beta^{(1)}_{jt}+\sum_{i\in\mathcal{I}_{c}}a_{ijk}\sum^{\min(t+\bar{d}(\delta),T)}_{\tau=t}\Pr(D_{i}\geq\tau-t+1)\alpha^{(1)}_{i\tau}-\sum_{i\in\mathcal{I}_{r}}w_{ijk}\rho^{(1)}_{i}
= \beta^{(2)*}_{j}+\sum_{i\in\mathcal{I}_{c}}a_{ijk}\sum^{t+\bar{d}(\delta)}_{\tau=t}\Pr(D_{i}\geq\tau-t+1)\alpha^{(2)*}_{i}-\sum_{i\in\mathcal{I}_{r}}w_{ijk}\rho^{(2)*}_{i}
= \beta^{(2)*}_{j}+\sum_{i\in\mathcal{I}_{c}}a_{ijk}\sum^{\bar{d}(\delta)}_{\tau=1}\Pr(D_{i}\geq\tau)\alpha^{(2)*}_{i}-\sum_{i\in\mathcal{I}_{r}}w_{ijk}\rho^{(2)*}_{i}
\geq 0.

However, for t{Td¯(δ)+1,,T}t\in\{T-\bar{d}(\delta)+1,\ldots,T\}, the constraints may be violated:

βjt(1)+iaijkτ=tmin(t+d¯(δ),T)Pr(Diτt+1)αiτ(1)iwijkρi(1)\displaystyle\beta^{(1)}_{jt}+\sum_{i\in\mathcal{I}}a_{ijk}\sum^{\min(t+\bar{d}(\delta),T)}_{\tau=t}\Pr(D_{i}\geq\tau-t+1)\alpha^{(1)}_{i\tau}-\sum_{i\in\mathcal{I}}w_{ijk}\rho^{(1)}_{i}
=\displaystyle= βj(2)+iaijkτ=tTPr(Diτt+1)αi(2)iwijkρi(2)\displaystyle\beta^{(2)*}_{j}+\sum_{i\in\mathcal{I}}a_{ijk}\sum^{T}_{\tau=t}\Pr(D_{i}\geq\tau-t+1)\alpha^{(2)*}_{i}-\sum_{i\in\mathcal{I}}w_{ijk}\rho^{(2)*}_{i}
=\displaystyle= βj(2)+iaijkτ=1TtPr(Diτ)αi(2)iwijkρi(2)\displaystyle\beta^{(2)*}_{j}+\sum_{i\in\mathcal{I}}a_{ijk}\sum^{T-t}_{\tau=1}\Pr(D_{i}\geq\tau)\alpha^{(2)*}_{i}-\sum_{i\in\mathcal{I}}w_{ijk}\rho^{(2)*}_{i}

where T-t\leq\bar{d}(\delta). Notice that decreasing \rho^{(2)}_{i} only relaxes the constraints of (LP-TSS-D) and does not change its objective, so we may assume without loss of generality that \sum_{i\in\mathcal{I}_{r}}\rho^{(2)*}_{i}=1, which gives \sum_{i\in\mathcal{I}_{r}}w_{ijk}\rho^{(2)*}_{i}\leq w_{\max}. Hence, to make the above expression nonnegative, it suffices to let \beta^{(1)}_{jt}=w_{\max}. Altogether, by letting \alpha^{(1)}_{it}=\alpha^{(2)*}_{i} for t\in\{1,\ldots,T\}, \beta^{(1)}_{jt}=\beta^{(2)*}_{j} for t\in\{1,\ldots,T-\bar{d}(\delta)\}, \beta^{(1)}_{jt}=w_{\max} for t\in\{T-\bar{d}(\delta)+1,\ldots,T\} and \rho^{(1)}_{i}=\rho^{(2)*}_{i}, we obtain a feasible solution for (LP-TE-D). Then T\lambda^{\prime}-T\tilde{\lambda}^{\prime}\leq\sum^{T}_{t=T-\bar{d}(\delta)+1}\sum_{j\in\mathcal{J}}p_{j}\beta^{(1)}_{jt}\leq\bar{d}(\delta)w_{\max}. Combining the above analysis, we have:

TλTλT(1δγ)λ\displaystyle T\lambda^{\prime}\geq T\lambda^{*}\geq T\left(1-\frac{\delta}{\gamma}\right)\lambda^{\prime}
Tλ~Tλ~T(1δγ)λ~\displaystyle T\tilde{\lambda}^{\prime}\geq T\tilde{\lambda}^{*}\geq T\left(1-\frac{\delta}{\gamma}\right)\tilde{\lambda}^{\prime}
TλTλ~+d¯(δ)wmax.\displaystyle T\lambda^{\prime}\leq T\tilde{\lambda}^{\prime}+\bar{d}(\delta)w_{\max}.

Putting these together, since \tilde{\lambda}^{*}\geq\left(1-\frac{\delta}{\gamma}\right)\tilde{\lambda}^{\prime}, T\tilde{\lambda}^{\prime}\geq T\lambda^{\prime}-\bar{d}(\delta)w_{\max} and \lambda^{\prime}\geq\lambda^{*}, we have:

Tλ~(1δγ)(Tλd¯(δ)wmax).\displaystyle T\tilde{\lambda}^{*}\geq\left(1-\frac{\delta}{\gamma}\right)(T\lambda^{*}-\bar{d}(\delta)w_{\max}).
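To get a sense of the magnitude of this truncation loss, the following minimal Python sketch (purely illustrative; all parameter values are hypothetical placeholders, not taken from the paper) evaluates the right-hand side of the bound above.

# Illustrative evaluation of T*lambda_tilde_star >= (1 - delta/gamma)(T*lambda_star - d_bar*w_max).
# All values below are hypothetical placeholders.
T = 10_000            # length of the planning horizon
gamma = 200.0         # scaling parameter gamma
delta = 0.5           # truncation tail mass delta
d_bar = 50            # truncation level d_bar(delta)
w_max = 3.0           # maximum per-step reward w_max
lam_star = 1.0        # per-period rate lambda_star (placeholder)

lower_bound = (1 - delta / gamma) * (T * lam_star - d_bar * w_max)
relative_loss = 1 - lower_bound / (T * lam_star)
print(f"lower bound on T*lambda_tilde_star = {lower_bound:.1f}, relative loss = {relative_loss:.4f}")

As expected, the loss vanishes as the usage durations (and hence \bar{d}(\delta)) become small relative to T and as \gamma grows.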

IV-C Proof of Lemma 3

Before proceeding to the proof, we introduce the multiplicative Chernoff bounds.

Lemma 6 (Multiplicative Chernoff bounds).

Let X=iXiX=\sum_{i}X_{i}, where Xi[0,B]X_{i}\in[0,B] are independent random variables, and let 𝔼[X]=μ\mathbb{E}[X]=\mu.

(a) For all ϵ>0\epsilon>0,

Pr(X<μ(1ϵ))<exp(ϵ2μ2B).\displaystyle\Pr(X<\mu(1-\epsilon))<\exp\left(\frac{-\epsilon^{2}\mu}{2B}\right).

Therefore, for all δ>0\delta>0, with probability at least 1δ1-\delta,

Xμ2μBln(1/δ).\displaystyle X-\mu\geq-\sqrt{2\mu B\ln(1/\delta)}.

(b) For ϵ[0,2e1]\epsilon\in[0,2e-1],

Pr(X>μ(1+ϵ))<exp(ϵ2μ4B).\displaystyle\Pr(X>\mu(1+\epsilon))<\exp\left(\frac{-\epsilon^{2}\mu}{4B}\right).

Hence, for all δ>exp((2e1)2μ4B)\delta>\exp\left(\frac{-(2e-1)^{2}\mu}{4B}\right), with probability at least 1δ1-\delta,

Xμ4μBln(1/δ).\displaystyle X-\mu\leq\sqrt{4\mu B\ln(1/\delta)}.

For ϵ>2e1\epsilon>2e-1,

Pr(X>μ(1+ϵ))<2(1+ϵ)μ/B.\displaystyle\Pr(X>\mu(1+\epsilon))<2^{-(1+\epsilon)\mu/B}.
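As a quick numerical illustration of part (a), the following minimal Python sketch (hypothetical distribution and parameters) estimates the lower-tail probability for a sum of bounded i.i.d. variables and compares it with the stated bound.

import numpy as np

rng = np.random.default_rng(0)
n, B, eps, trials = 500, 2.0, 0.2, 5_000          # hypothetical toy parameters

samples = rng.uniform(0.0, B, size=(trials, n))   # X_i ~ Uniform[0, B], independent
X = samples.sum(axis=1)
mu = n * B / 2.0                                  # E[X]

empirical = np.mean(X < mu * (1 - eps))           # empirical Pr(X < mu(1 - eps))
bound = np.exp(-eps**2 * mu / (2 * B))            # bound from Lemma 6(a)
print(f"empirical lower tail = {empirical:.2e}, Chernoff bound = {bound:.2e}")

The bound is valid, though typically loose, for such bounded sums; the proof below only requires its stated form.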
Proof of Lemma 3.

Since \lambda^{(r)}=\frac{\mu^{(r)*}}{1+\epsilon^{(r-1)}_{x}}, to show \tilde{\lambda}^{*}(1-3\epsilon^{(r-1)}_{x})\leq\lambda^{(r)}\leq\tilde{\lambda}^{*} it suffices to show \tilde{\lambda}^{*}(1-2\epsilon^{(r-1)}_{x})\leq\mu^{(r)*}\leq\tilde{\lambda}^{*}(1+\epsilon^{(r-1)}_{x}). We first show that the lower bound \tilde{\lambda}^{*}(1-2\epsilon^{(r-1)}_{x})\leq\mu^{(r)*} holds with probability at least 1-\eta. The expected instance of \text{(LP-RSS)}^{(r)} has the same optimal solution and optimal objective value as (LP-SS). In \text{(LP-RSS)}^{(r)}, letting x^{(r)}_{jk}=\frac{x^{*}_{jk}}{1+\epsilon^{(r-1)}_{x}}, the probability of violating each resource constraint is bounded as:

Pr(j𝒥k𝒦p^j(r)aijkdixjk1+ϵx(r1)ci1+ϵx(r1)(1+ϵx(r1)))\displaystyle\Pr\left(\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\hat{p}^{(r)}_{j}a_{ijk}d_{i}\frac{x^{*}_{jk}}{1+\epsilon^{(r-1)}_{x}}\geq\frac{c_{i}}{1+\epsilon^{(r-1)}_{x}}\left(1+\epsilon^{(r-1)}_{x}\right)\right)
\displaystyle\leq Pr(j𝒥k𝒦p^j(r)dixjk1+ϵx(r1)γ1+ϵx(r1)(1+ϵx(r1)))\displaystyle\Pr\left(\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\hat{p}^{(r)}_{j}d_{i}\frac{x^{*}_{jk}}{1+\epsilon^{(r-1)}_{x}}\geq\frac{\gamma}{1+\epsilon^{(r-1)}_{x}}\left(1+\epsilon^{(r-1)}_{x}\right)\right)
\displaystyle\leq Pr(t=t(r1)+1t(r)j𝒥k𝒦𝟏(j(t)=j)t(r1)xjk1+ϵx(r1)γdi(1+ϵx(r1))(1+ϵx(r1)))\displaystyle\Pr\left(\sum^{t^{(r)}}_{t=t^{(r-1)}+1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\frac{\mathbf{1}\left(j\left(t\right)=j\right)}{t^{(r-1)}}\frac{x^{*}_{jk}}{1+\epsilon^{(r-1)}_{x}}\geq\frac{\gamma}{d_{i}\left(1+\epsilon^{(r-1)}_{x}\right)}\left(1+\epsilon^{(r-1)}_{x}\right)\right)
\displaystyle\leq exp((ϵx(r1))2t(r1)γ4di(1+ϵx(r1)))\displaystyle\exp\left(\frac{-\left(\epsilon^{(r-1)}_{x}\right)^{2}t^{(r-1)}\gamma}{4d_{i}\left(1+\epsilon^{(r-1)}_{x}\right)}\right)
\displaystyle\leq η2||.\displaystyle\frac{\eta}{2|\mathcal{I}|}.

The first and second inequalities follow from the definitions of \gamma and \hat{p}^{(r)}_{j}, respectively. For the third inequality, since the j(t)'s are independent, \sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\frac{\mathbf{1}\left(j\left(t\right)=j\right)}{t^{(r-1)}}\frac{x^{*}_{jk}}{1+\epsilon^{(r-1)}_{x}}\leq\frac{1}{t^{(r-1)}} and

𝔼[t=t(r1)+1t(r)j𝒥k𝒦𝟏(j(t)=j)t(r1)xjk1+ϵx(r1)]γdi(1+ϵx(r1)),\mathbb{E}\left[\sum^{t^{(r)}}_{t=t^{(r-1)}+1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\frac{\mathbf{1}\left(j\left(t\right)=j\right)}{t^{(r-1)}}\frac{x^{*}_{jk}}{1+\epsilon^{(r-1)}_{x}}\right]\leq\frac{\gamma}{d_{i}\left(1+\epsilon^{(r-1)}_{x}\right)},

we apply the multiplicative Chernoff bounds to derive the result. The fourth inequality follows if

γdi(1+ϵx(r1))log2||ηt(r1)(ϵx(r1))2=di(1+ϵx(r1))4Tγ.\gamma\geq\frac{d_{i}(1+\epsilon^{(r-1)}_{x})\log\frac{2|\mathcal{I}|}{\eta}}{t^{(r-1)}\left(\epsilon^{(r-1)}_{x}\right)^{2}}=\frac{d_{i}(1+\epsilon^{(r-1)}_{x})}{4T}\gamma.

This is indeed the case since we assume the usage duration diicd_{i}~{}\forall i\in\mathcal{I}_{c} is small compared with TT. Taking a union bound over all ici\in\mathcal{I}_{c}, the probability of violating any resource constraints is at most icη2||η2\sum_{i\in\mathcal{I}_{c}}\frac{\eta}{2|\mathcal{I}|}\leq\frac{\eta}{2}.

A similar process holds for bounding the probability of violating each reward constraint:

Pr(j𝒥k𝒦p^j(r)wijkxjk1+ϵx(r1)λ~1+ϵx(r1)(1ϵx(r1)))\displaystyle\Pr\left(\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\hat{p}^{(r)}_{j}w_{ijk}\frac{x^{*}_{jk}}{1+\epsilon^{(r-1)}_{x}}\leq\frac{\tilde{\lambda}^{*}}{1+\epsilon^{(r-1)}_{x}}\left(1-\epsilon^{(r-1)}_{x}\right)\right)
\displaystyle\leq Pr(t=t(r1)+1t(r)j𝒥k𝒦𝟏(j(t)=j)t(r1)wijkγλ~xjk1+ϵx(r1)γ1+ϵx(r1)(1ϵx(r1)))\displaystyle\Pr\left(\sum^{t^{(r)}}_{t=t^{(r-1)}+1}\sum_{j\in\mathcal{J}}\sum_{k\in\mathcal{K}}\frac{\mathbf{1}\left(j\left(t\right)=j\right)}{t^{(r-1)}}\frac{w_{ijk}\gamma}{\tilde{\lambda}^{*}}\frac{x^{*}_{jk}}{1+\epsilon^{(r-1)}_{x}}\leq\frac{\gamma}{1+\epsilon^{(r-1)}_{x}}\left(1-\epsilon^{(r-1)}_{x}\right)\right)
\displaystyle\leq exp((ϵx(r1))2t(r1)γ2T(1+ϵx(r1)))\displaystyle\exp\left(\frac{-\left(\epsilon^{(r-1)}_{x}\right)^{2}t^{(r-1)}\gamma}{2T\left(1+\epsilon^{(r-1)}_{x}\right)}\right)
\displaystyle\leq η2||.\displaystyle\frac{\eta}{2|\mathcal{I}|}.

Combining the union bounds for all resource constraints and reward constraints, with probability at least 1-\eta, the stage-r LP \text{(LP-RSS)}^{(r)} achieves an objective value of at least \tilde{\lambda}^{*}\frac{1-\epsilon^{(r-1)}_{x}}{1+\epsilon^{(r-1)}_{x}}\geq\tilde{\lambda}^{*}(1-2\epsilon^{(r-1)}_{x}).

Next we show that the upper bound \mu^{(r)*}\leq\tilde{\lambda}^{*}(1+\epsilon^{(r-1)}_{x}) holds with probability at least 1-\eta. Note that the dual of (LP-SS) is:

(LP-SS-D) :min𝜶,𝜷,𝝆\displaystyle\text{(LP-SS-D) }:\min\limits_{\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\rho}} j𝒥pjβj+icciαi\displaystyle~{}\sum_{j\in\mathcal{J}}p_{j}\beta_{j}+\sum_{i\in\mathcal{I}_{c}}c_{i}\alpha_{i}
s.t. βj+icaijkdiαiirwijkρi0\displaystyle~{}\beta_{j}+\sum_{i\in\mathcal{I}_{c}}a_{ijk}d_{i}\alpha_{i}-\sum_{i\in\mathcal{I}_{r}}w_{ijk}\rho_{i}\geq 0 j𝒥,k𝒦\displaystyle\forall j\in\mathcal{J},~{}k\in\mathcal{K}
irρi1\displaystyle\sum_{i\in\mathcal{I}_{r}}\rho_{i}\geq 1
ρi0\displaystyle\rho_{i}\geq 0 ir\displaystyle\forall i\in\mathcal{I}_{r}
αi0\displaystyle\alpha_{i}\geq 0 ic\displaystyle\forall i\in\mathcal{I}_{c}
βj0\displaystyle\beta_{j}\geq 0 j𝒥.\displaystyle\forall j\in\mathcal{J}.

The dual of (LP-RSS)(r)\text{(LP-RSS)}^{(r)} is:

(LP-RSS-D)(r):min𝜶(r),𝜷(r),𝝆(r)\displaystyle\text{(LP-RSS-D)}^{(r)}:~{}\min\limits_{\boldsymbol{\alpha}^{(r)},\boldsymbol{\beta}^{(r)},\boldsymbol{\rho}^{(r)}} j𝒥p^j(r)βj(r)+icciαi(r)\displaystyle~{}\sum_{j\in\mathcal{J}}\hat{p}^{(r)}_{j}\beta^{(r)}_{j}+\sum_{i\in\mathcal{I}_{c}}c_{i}\alpha^{(r)}_{i}
s.t. βj(r)+icaijkdiαi(r)irwijkρi(r)0\displaystyle~{}\beta^{(r)}_{j}+\sum_{i\in\mathcal{I}_{c}}a_{ijk}d_{i}\alpha^{(r)}_{i}-\sum_{i\in\mathcal{I}_{r}}w_{ijk}\rho^{(r)}_{i}\geq 0 j𝒥,k𝒦\displaystyle\forall j\in\mathcal{J},~{}k\in\mathcal{K}
irρi(r)1\displaystyle\sum_{i\in\mathcal{I}_{r}}\rho^{(r)}_{i}\geq 1
ρi(r)0\displaystyle\rho^{(r)}_{i}\geq 0 ir\displaystyle\forall i\in\mathcal{I}_{r}
αi(r)0\displaystyle\alpha^{(r)}_{i}\geq 0 ic\displaystyle\forall i\in\mathcal{I}_{c}
βj(r)0\displaystyle\beta^{(r)}_{j}\geq 0 j𝒥.\displaystyle\forall j\in\mathcal{J}.

Since the feasible regions of the two dual LPs are identical, the optimal solution of (LP-SS-D) is feasible for \text{(LP-RSS-D)}^{(r)}. Suppose the optimal solution to (LP-SS-D) is \alpha^{*}_{i}, \beta^{*}_{j} and \rho^{*}_{i}. The optimal objective value is then \tilde{\lambda}^{*}=\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}+\sum_{i\in\mathcal{I}_{c}}c_{i}\alpha^{*}_{i}. Letting \alpha^{(r)}_{i}=\alpha^{*}_{i}, \beta^{(r)}_{j}=\beta^{*}_{j} and \rho^{(r)}_{i}=\rho^{*}_{i}, since \text{(LP-RSS-D)}^{(r)} is a minimization problem, we have an upper bound for the optimal objective value \mu^{(r)*}, i.e. \mu^{(r)*}\leq\sum_{j\in\mathcal{J}}\hat{p}^{(r)}_{j}\beta^{*}_{j}+\sum_{i\in\mathcal{I}_{c}}c_{i}\alpha^{*}_{i}. If \sum_{i\in\mathcal{I}_{r}}\rho^{*}_{i}>1, lowering the \rho^{*}_{i} until the sum equals 1 preserves feasibility (it only relaxes the first set of constraints) and does not change the objective, so we may take \sum_{i\in\mathcal{I}_{r}}\rho^{*}_{i}=1. It can be seen that \beta^{*}_{j}+\sum_{i\in\mathcal{I}_{c}}a_{ijk}d_{i}\alpha^{*}_{i}\leq w_{\max}, since otherwise we could lower the value of \beta^{*}_{j} and still have a feasible solution. Hence if \sqrt{\frac{4w_{\max}\log(1/\eta)}{t^{(r-1)}(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})}}\in[0,2e-1], then with probability at least 1-\eta we have

j𝒥p^j(r)βjj𝒥pjβj\displaystyle\sum_{j\in\mathcal{J}}\hat{p}^{(r)}_{j}\beta^{*}_{j}-\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}
=\displaystyle= t=t(r1)+1t(r)j𝒥𝟏(j(t)=j)t(r1)βjj𝒥pjβj\displaystyle\sum^{t^{(r)}}_{t=t^{(r-1)}+1}\sum_{j\in\mathcal{J}}\frac{\mathbf{1}(j(t)=j)}{t^{(r-1)}}\beta^{*}_{j}-\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}
\displaystyle\leq 4(j𝒥pjβj)wmaxlog(1/η)t(r1).\displaystyle\sqrt{\frac{4(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})w_{\max}\log(1/\eta)}{t^{(r-1)}}}.

The inequality is derived by applying the Chernoff bounds to \sum_{j\in\mathcal{J}}\frac{\mathbf{1}(j(t)=j)}{t^{(r-1)}}\beta^{*}_{j}. We have \frac{w_{\max}}{t^{(r-1)}} under the square root since \beta^{*}_{j}\leq w_{\max} and therefore \sum_{j\in\mathcal{J}}\frac{\mathbf{1}(j(t)=j)}{t^{(r-1)}}\beta^{*}_{j}\leq\frac{w_{\max}}{t^{(r-1)}}. By the definition \gamma=\min\limits_{i\in\mathcal{I}_{c}}\left(\frac{c_{i}}{a_{\max}},\frac{T\tilde{\lambda}^{*}}{w_{\max}}\right)\leq\frac{T\tilde{\lambda}^{*}}{w_{\max}} and the definition of \epsilon^{(r-1)}_{x}, we have:

j𝒥pjβjλ~\displaystyle\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}\leq\tilde{\lambda}^{*}
\displaystyle\Rightarrow (j𝒥pjβj)wmax(λ~)2wmaxλ~\displaystyle\frac{(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})w_{\max}}{\left(\tilde{\lambda}^{*}\right)^{2}}\leq\frac{w_{\max}}{\tilde{\lambda}^{*}}
\displaystyle\Rightarrow (j𝒥pjβj)wmax(λ~)2Tγ\displaystyle\frac{(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})w_{\max}}{\left(\tilde{\lambda}^{*}\right)^{2}}\leq\frac{T}{\gamma}
\displaystyle\Rightarrow 4(j𝒥pjβj)wmaxlog(1/η)(λ~)2t(r1)4Tlog(2||/η)t(r1)γ=ϵx(r1).\displaystyle\sqrt{\frac{4(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})w_{\max}\log(1/\eta)}{\left(\tilde{\lambda}^{*}\right)^{2}t^{(r-1)}}}\leq\sqrt{\frac{4T\log(2|\mathcal{I}|/\eta)}{t^{(r-1)}\gamma}}=\epsilon^{(r-1)}_{x}.

Thus, with probability at least 1-\eta,

\displaystyle\sum_{j\in\mathcal{J}}\hat{p}^{(r)}_{j}\beta^{*}_{j}+\sum_{i\in\mathcal{I}_{c}}c_{i}\alpha^{*}_{i}
\displaystyle\leq\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}+\sum_{i\in\mathcal{I}_{c}}c_{i}\alpha^{*}_{i}+\sqrt{\frac{4(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})w_{\max}\log(1/\eta)}{t^{(r-1)}}}
\displaystyle\leq λ~(1+ϵx(r1)).\displaystyle\tilde{\lambda}^{*}(1+\epsilon^{(r-1)}_{x}).

If \sqrt{\frac{4w_{\max}\log(1/\eta)}{t^{(r-1)}(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})}}\geq 2e-1, then the above procedure does not hold. Notice that all we need is \sum_{j\in\mathcal{J}}\hat{p}^{(r)}_{j}\beta^{*}_{j}-\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}\leq\tilde{\lambda}^{*}\epsilon^{(r-1)}_{x} with high probability. Hence define \epsilon^{(r-1)}_{y}=\frac{\tilde{\lambda}^{*}\epsilon^{(r-1)}_{x}}{\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}}. Supposing \epsilon^{(r-1)}_{y}>2e-1 (which will be proved later), then using the Chernoff bound, we have:

Pr(j𝒥p^j(r)βjj𝒥pjβjλ~ϵx(r1))\displaystyle\Pr\left(\sum_{j\in\mathcal{J}}\hat{p}^{(r)}_{j}\beta^{*}_{j}-\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}\geq\tilde{\lambda}^{*}\epsilon^{(r-1)}_{x}\right)
=\displaystyle= Pr(j𝒥p^j(r)βjj𝒥pjβj(1+ϵy(r1)))\displaystyle\Pr\left(\sum_{j\in\mathcal{J}}\hat{p}^{(r)}_{j}\beta^{*}_{j}\geq\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}(1+\epsilon^{(r-1)}_{y})\right)
\displaystyle\leq 2(1+ϵy(r1))t(r1)j𝒥pjβjwmax\displaystyle 2^{-\frac{(1+\epsilon^{(r-1)}_{y})t^{(r-1)}\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}}{w_{\max}}}
=\displaystyle= 2t(r1)j𝒥pjβj+t(r1)λ~ϵx(r1)wmax\displaystyle 2^{-\frac{t^{(r-1)}\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}+t^{(r-1)}\tilde{\lambda}^{*}\epsilon^{(r-1)}_{x}}{w_{\max}}}
\displaystyle\leq 2t(r1)λ~ϵx(r1)wmax\displaystyle 2^{-\frac{t^{(r-1)}\tilde{\lambda}^{*}\epsilon^{(r-1)}_{x}}{w_{\max}}}
\displaystyle\leq 2t(r1)γϵx(r1)T\displaystyle 2^{-\frac{t^{(r-1)}\gamma\epsilon^{(r-1)}_{x}}{T}}
\displaystyle\leq 2ϵ2γ\displaystyle 2^{-\epsilon^{2}\gamma}
\displaystyle\leq η.\displaystyle\eta.

The third inequality follows from \gamma\leq\frac{T\tilde{\lambda}^{*}}{w_{\max}}. The fourth inequality holds by the definition of t^{(r-1)} and the fact that \epsilon^{(r)}_{x}\geq\epsilon for all r. The last inequality holds if \gamma\geq\frac{\log\frac{1}{\eta}}{\epsilon^{2}}. This is indeed the case since \gamma=\Omega\left(\frac{\log\frac{|\mathcal{I}|T}{\epsilon}}{\epsilon^{2}}\right) and we let \eta=O\left(\frac{\epsilon}{l}\right) (as demonstrated later in the proof of Lemma 5). Hence, the only thing left to prove is that given \sqrt{\frac{4w_{\max}\log(1/\eta)}{t^{(r-1)}(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})}}\geq 2e-1, we have \epsilon^{(r-1)}_{y}>2e-1. This would be a direct result if \sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}<\frac{\tilde{\lambda}^{*}\epsilon}{2e-1}. Next we show that given \sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j}\geq\frac{\tilde{\lambda}^{*}\epsilon}{2e-1}, we have \sqrt{\frac{4w_{\max}\log(1/\eta)}{t^{(r-1)}(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})}}<2e-1. This completes the proof.

4wmaxlog(1/η)t(r1)(j𝒥pjβj)\displaystyle\sqrt{\frac{4w_{\max}\log(1/\eta)}{t^{(r-1)}(\sum_{j\in\mathcal{J}}p_{j}\beta^{*}_{j})}}
\displaystyle\leq 22e1wmaxlog(1/η)t(r1)λ~ϵ\displaystyle 2\sqrt{2e-1}\sqrt{\frac{w_{\max}\log(1/\eta)}{t^{(r-1)}\tilde{\lambda}^{*}\epsilon}}
\displaystyle\leq 2\sqrt{2e-1}\sqrt{\frac{w_{\max}\log(1/\eta)}{\epsilon^{2}T2^{r-1}\tilde{\lambda}^{*}}}
\displaystyle\leq 22e1log(1/η)ϵ2γ\displaystyle 2\sqrt{2e-1}\sqrt{\frac{\log(1/\eta)}{\epsilon^{2}\gamma}}
\displaystyle\leq 22e1\displaystyle 2\sqrt{2e-1}
<\displaystyle< 2e1.\displaystyle 2e-1.

The second inequality holds since t^{(r-1)}=\epsilon T2^{r-1}. The third inequality follows from \gamma\leq\frac{T\tilde{\lambda}^{*}}{w_{\max}}. The fourth inequality holds since \gamma\geq\frac{\log\frac{1}{\eta}}{\epsilon^{2}}. ∎
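To make the comparison between \tilde{\lambda}^{*} and \mu^{(r)*} concrete, the following is a minimal Python sketch on a toy instance (all data are hypothetical). It solves the steady-state LP in the primal form consistent with the constraints used above, namely \sum_{j,k}p_{j}w_{ijk}x_{jk}\geq\lambda for i\in\mathcal{I}_{r}, \sum_{j,k}p_{j}a_{ijk}d_{i}x_{jk}\leq c_{i} for i\in\mathcal{I}_{c}, and \sum_{k}x_{jk}\leq 1, once with the true type distribution p and once with an empirical estimate \hat{p}^{(r)} built from simulated arrivals.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Hypothetical toy instance (illustrative only).
J, K, n_res, n_rew = 4, 3, 2, 2                 # customer types, actions, resources, reward types
p = np.full(J, 1.0 / J)                         # true type distribution
a = rng.integers(0, 2, size=(n_res, J, K)).astype(float)   # resource consumption a_{ijk}
d = np.array([3.0, 5.0])                        # mean usage durations d_i
c = np.array([2.0, 2.5])                        # capacities c_i
w = rng.uniform(0.5, 1.5, size=(n_rew, J, K))   # rewards w_{ijk}

def solve_steady_state(p_vec):
    """Maximize lambda subject to the steady-state reward, resource and action constraints."""
    n_x = J * K
    cost = np.zeros(n_x + 1)
    cost[-1] = -1.0                             # linprog minimizes, so use -lambda
    A_ub, b_ub = [], []
    for i in range(n_rew):                      # lambda - sum_{j,k} p_j w_ijk x_jk <= 0
        row = np.zeros(n_x + 1)
        row[:n_x] = -(p_vec[:, None] * w[i]).ravel()
        row[-1] = 1.0
        A_ub.append(row); b_ub.append(0.0)
    for i in range(n_res):                      # sum_{j,k} p_j a_ijk d_i x_jk <= c_i
        row = np.zeros(n_x + 1)
        row[:n_x] = (p_vec[:, None] * a[i] * d[i]).ravel()
        A_ub.append(row); b_ub.append(c[i])
    for j in range(J):                          # sum_k x_jk <= 1 for each customer type
        row = np.zeros(n_x + 1)
        row[j * K:(j + 1) * K] = 1.0
        A_ub.append(row); b_ub.append(1.0)
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * (n_x + 1))
    return -res.fun

lam_tilde = solve_steady_state(p)               # lambda_tilde^* under the true distribution
arrivals = rng.choice(J, size=200, p=p)         # one stage of simulated arrivals
p_hat = np.bincount(arrivals, minlength=J) / len(arrivals)
mu_r = solve_steady_state(p_hat)                # mu^{(r)*} under the empirical distribution
print(f"lambda_tilde* = {lam_tilde:.4f}, mu^(r)* = {mu_r:.4f}")

With more observed arrivals, the empirical value concentrates around \tilde{\lambda}^{*}, in line with the two-sided bound established in Lemma 3.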

IV-D Proof of Lemma 4

Before proving Lemma 4, we first establish the following lemma.

Lemma 7.

Suppose ϵ,η(0,1)\epsilon,\eta\in(0,1). Define ϵz(r)=2wmax(1+ϵ)log2||lηt(r)λ(r)\epsilon^{(r)}_{z}=\sqrt{\frac{2w_{\max}(1+\epsilon)\log\frac{2|\mathcal{I}|l}{\eta}}{t^{(r)}\lambda^{(r)}}} for r{0,1,,l1}r\in\{0,1,\ldots,l-1\}. In stage rr, Algorithm SS achieves a reward of at least

t=1t(r)k𝒦Wi,j(r)(t),k(r)(t)X~j(r)(t),k(r)(t)t(r)λ(r)(1ϵz(r))\sum^{t^{(r)}}_{t=1}\sum_{k\in\mathcal{K}}W^{(r)}_{i,j^{(r)}(t),k}(t)\tilde{X}^{(r)}_{j^{(r)}(t),k}(t)\geq t^{(r)}\lambda^{(r)}(1-\epsilon^{(r)}_{z})

for every iri\in\mathcal{I}_{r} with probability at least 1(1+ϵ)δ(t(r)2Tϵ+η2l)1-\left(1+\epsilon\right)^{\delta}(\frac{t^{(r)}}{2T}\epsilon+\frac{\eta}{2l}).

Proof of Lemma 7.

To aid analysis, we leave \bar{d}(\delta) customers unserved at the end of each stage, so the next stage starts nearly empty. This causes a total reward loss of at most \bar{d}(\delta)w_{\max}\log\frac{1}{\epsilon}. Suppose we run Algorithm S over the entire planning horizon. Then at any time t of stage r, the number of units of resource i occupied by customers from previous stages can be defined as \hat{Y}^{(r)}_{it}=\sum^{r-1}_{h=0}\sum^{t^{(h)}}_{\tau=1}\sum_{k\in\mathcal{K}}\mathbf{1}(D^{(h)}_{i}(\tau)\geq t^{(r)}+t-(t^{(h)}+\tau)+1)A^{(h)}_{i,j^{(h)}(\tau),k}(\tau)\tilde{X}^{(h)}_{j^{(h)}(\tau),k}(\tau). Since \sum^{\infty}_{t=\bar{d}(\delta)+1}\Pr(D_{i}\geq t)\leq\delta, it is evident that \mathbb{E}[\hat{Y}^{(r)}_{it}]\leq a_{\max}\delta. Then the probability of violating the resource constraint at the t-th time step of stage r is bounded as:

Pr(Y^it(r)+τ=1tY~iτt(r)ci1+ϵ(1+ϵ))=\displaystyle\Pr(\hat{Y}^{(r)}_{it}+\sum^{t}_{\tau=1}\tilde{Y}^{(r)}_{i\tau t}\geq\frac{c_{i}}{1+\epsilon}(1+\epsilon))= Pr(γci(Y^it(r)+τ=1tY~iτt(r))γ1+ϵ(1+ϵ))\displaystyle\Pr(\frac{\gamma}{c_{i}}(\hat{Y}^{(r)}_{it}+\sum^{t}_{\tau=1}\tilde{Y}^{(r)}_{i\tau t})\geq\frac{\gamma}{1+\epsilon}(1+\epsilon))
=\displaystyle= Pr((1+ϵ)γci(Y^it(r)+τ=1tY~iτt(r))(1+ϵ)γ)\displaystyle\Pr((1+\epsilon)^{\frac{\gamma}{c_{i}}(\hat{Y}^{(r)}_{it}+\sum^{t}_{\tau=1}\tilde{Y}^{(r)}_{i\tau t})}\geq(1+\epsilon)^{\gamma})
\displaystyle\leq 𝔼[(1+ϵ)γci(Y^it(r)+τ=1tY~iτt(r))]/(1+ϵ)γ\displaystyle\mathbb{E}[(1+\epsilon)^{\frac{\gamma}{c_{i}}(\hat{Y}^{(r)}_{it}+\sum^{t}_{\tau=1}\tilde{Y}^{(r)}_{i\tau t})}]/(1+\epsilon)^{\gamma}
\displaystyle\leq 𝔼[(1+ϵ)γamaxδci(1+ϵ)γciτ=1tY~iτt(r)]/(1+ϵ)γ\displaystyle\mathbb{E}[(1+\epsilon)^{\frac{\gamma a_{\max}\delta}{c_{i}}}(1+\epsilon)^{\frac{\gamma}{c_{i}}\sum^{t}_{\tau=1}\tilde{Y}^{(r)}_{i\tau t}}]/(1+\epsilon)^{\gamma}
\displaystyle\leq (1+ϵ)δγ𝔼[τ=1t(1+ϵ)γciY~iτt(r)]\displaystyle(1+\epsilon)^{\delta-\gamma}\mathbb{E}[\prod^{t}_{\tau=1}(1+\epsilon)^{\frac{\gamma}{c_{i}}\tilde{Y}^{(r)}_{i\tau t}}]
\displaystyle\leq (1+ϵ)δγ𝔼[τ=1t(1+ϵγciY~iτt(r))]\displaystyle(1+\epsilon)^{\delta-\gamma}\mathbb{E}[\prod^{t}_{\tau=1}(1+\epsilon\frac{\gamma}{c_{i}}\tilde{Y}^{(r)}_{i\tau t})]
\displaystyle\leq (1+ϵ)δγτ=1t(1+ϵγci𝔼[Y~iτt(r)])\displaystyle(1+\epsilon)^{\delta-\gamma}\prod^{t}_{\tau=1}(1+\epsilon\frac{\gamma}{c_{i}}\mathbb{E}[\tilde{Y}^{(r)}_{i\tau t}])
\displaystyle\leq (1+ϵ)δγτ=1texp(ϵγci𝔼[Y~iτt(r)])\displaystyle(1+\epsilon)^{\delta-\gamma}\prod^{t}_{\tau=1}\exp(\epsilon\frac{\gamma}{c_{i}}\mathbb{E}[\tilde{Y}^{(r)}_{i\tau t}])
\displaystyle\leq (1+ϵ)δexp(ϵγ1+ϵ)/(1+ϵ)γ\displaystyle(1+\epsilon)^{\delta}\exp(\frac{\epsilon\gamma}{1+\epsilon})/(1+\epsilon)^{\gamma}
\displaystyle\leq (1+ϵ)δexp(ϵ2γ4(1+ϵ))\displaystyle(1+\epsilon)^{\delta}\exp(\frac{-\epsilon^{2}\gamma}{4(1+\epsilon)})
\displaystyle\leq ϵ(1+ϵ)δ2||T.\displaystyle\frac{\epsilon(1+\epsilon)^{\delta}}{2|\mathcal{I}|T}.

The first inequality follows from Markov's inequality. The second inequality holds since \mathbb{E}[\hat{Y}^{(r)}_{it}]\leq a_{\max}\delta. The third inequality holds since \gamma\leq\frac{c_{i}}{a_{\max}} for all i\in\mathcal{I}_{c}. The fourth inequality stands since \frac{\gamma}{c_{i}}\tilde{Y}^{(r)}_{i\tau t}\leq 1 for all i\in\mathcal{I}_{c} and 1\leq\tau\leq t\leq T, and (1+\epsilon)^{x}\leq 1+\epsilon x for x\in[0,1]. The fifth inequality holds because the \tilde{Y}^{(r)}_{i\tau t} are independent across time periods \tau. The sixth inequality holds by the fact that 1+x\leq e^{x}. The seventh inequality follows from \sum^{t}_{\tau=1}\mathbb{E}\left[\tilde{Y}^{(r)}_{i\tau t}\right]\leq\frac{c_{i}}{1+\epsilon}. The eighth inequality stands for all \epsilon\in[0,1], and the last inequality follows by assuming \gamma\geq\frac{4(1+\epsilon)}{\epsilon^{2}}\log\frac{2|\mathcal{I}|T}{\epsilon}. Taking a union bound over all i\in\mathcal{I}_{c} and t=1,\ldots,t^{(r)}, the probability of satisfying all resource constraints for \text{(LP-RSS)}^{(r)} is at least

1τ=1t(r)icϵ(1+ϵ)δ2||T=1ϵ(1+ϵ)δ2||T|c|t(r)1t(r)2Tϵ(1+ϵ)δ.1-\sum^{t^{(r)}}_{\tau=1}\sum_{i\in\mathcal{I}_{c}}\frac{\epsilon(1+\epsilon)^{\delta}}{2|\mathcal{I}|T}=1-\frac{\epsilon(1+\epsilon)^{\delta}}{2|\mathcal{I}|T}|\mathcal{I}_{c}|t^{(r)}\geq 1-\frac{t^{(r)}}{2T}\epsilon(1+\epsilon)^{\delta}.

For the reward constraints, noting that \epsilon^{(r)}_{z}=\sqrt{\frac{2w_{\max}(1+\epsilon)\log\frac{2|\mathcal{I}|l}{\eta}}{t^{(r)}\lambda^{(r)}}}, we have:

Pr(t=1t(r)k𝒦Wi,j(r)(t),k(r)(t)X~j(r)(t),k(r)(t)t(r)λ(r)(1ϵz(r)))\displaystyle\Pr\left(\sum^{t^{(r)}}_{t=1}\sum_{k\in\mathcal{K}}W^{(r)}_{i,j^{(r)}(t),k}\left(t\right)\tilde{X}^{(r)}_{j^{(r)}(t),k}\left(t\right)\leq t^{(r)}\lambda^{(r)}\left(1-\epsilon^{(r)}_{z}\right)\right)
=\displaystyle= Pr(t=1t(r)k𝒦Wi,j(r)(t),k(r)(t)wmaxX~j(r)(t),k(r)(t)t(r)λ(r)(1ϵz(r))wmax)\displaystyle\Pr\left(\sum^{t^{(r)}}_{t=1}\sum_{k\in\mathcal{K}}\frac{W^{(r)}_{i,j^{(r)}(t),k}\left(t\right)}{w_{\max}}\tilde{X}^{(r)}_{j^{(r)}(t),k}\left(t\right)\leq\frac{t^{(r)}\lambda^{(r)}\left(1-\epsilon^{(r)}_{z}\right)}{w_{\max}}\right)
\displaystyle\leq exp((ϵz(r))2t(r)λ(r)2wmax(1+ϵ))\displaystyle\exp\left(\frac{-\left(\epsilon^{(r)}_{z}\right)^{2}t^{(r)}\lambda^{(r)}}{2w_{\max}\left(1+\epsilon\right)}\right)
\displaystyle\leq η2||l.\displaystyle\frac{\eta}{2|\mathcal{I}|l}.

The first inequality follows from applying the multiplicative Chernoff bounds to \sum_{k\in\mathcal{K}}\frac{W^{(r)}_{i,j^{(r)}(t),k}(t)}{w_{\max}}\tilde{X}^{(r)}_{j^{(r)}(t),k}(t). These summands are mutually independent across t and lie in the range [0,1], and \mathbb{E}\left[\sum^{t^{(r)}}_{t=1}\sum_{k\in\mathcal{K}}\frac{W^{(r)}_{i,j^{(r)}(t),k}(t)}{w_{\max}}\tilde{X}^{(r)}_{j^{(r)}(t),k}(t)\right]\geq\frac{t^{(r)}\tilde{\lambda}^{*}}{w_{\max}(1+\epsilon)}\geq\frac{t^{(r)}\lambda^{(r)}}{w_{\max}(1+\epsilon)}. The last inequality holds by the definition of \epsilon^{(r)}_{z}. By a union bound over the resource and reward constraints, we obtain the result of Lemma 7. ∎

Proof of Lemma 4.

Suppose \tilde{\lambda}^{*}(1-3\epsilon^{(r-1)}_{x})\leq\lambda^{(r)}\leq\tilde{\lambda}^{*} holds. Using \lambda^{(r)} as an approximation of \tilde{\lambda}^{*}, over the l stages we achieve a total reward of

r=0l1t(r)λ(r)(1ϵz(r))\displaystyle\sum^{l-1}_{r=0}t^{(r)}\lambda^{(r)}\left(1-\epsilon^{(r)}_{z}\right)
\displaystyle\geq r=0l1Tλ~t(r)T(13ϵx(r1))(1ϵz(r))\displaystyle\sum^{l-1}_{r=0}T\tilde{\lambda}^{*}\frac{t^{(r)}}{T}\left(1-3\epsilon^{(r-1)}_{x}\right)\left(1-\epsilon^{(r)}_{z}\right)
\displaystyle\geq Tλ~r=0l1t(r)T(13ϵx(r1)ϵz(r))\displaystyle T\tilde{\lambda}^{*}\sum^{l-1}_{r=0}\frac{t^{(r)}}{T}\left(1-3\epsilon^{(r-1)}_{x}-\epsilon^{(r)}_{z}\right)
\displaystyle\geq Tλ~(1O(r=0l1ϵt(r)T))\displaystyle T\tilde{\lambda}^{*}\left(1-O\left(\sum^{l-1}_{r=0}\epsilon\sqrt{\frac{t^{(r)}}{T}}\right)\right)
\displaystyle\geq Tλ~(1O(ϵϵr=0l12r))\displaystyle T\tilde{\lambda}^{*}\left(1-O\left(\epsilon\sqrt{\epsilon}\sum^{l-1}_{r=0}\sqrt{2}^{r}\right)\right)
\displaystyle\geq Tλ~(1O(ϵϵr=0log2(1ϵ)2r))\displaystyle T\tilde{\lambda}^{*}\left(1-O\left(\epsilon\sqrt{\epsilon}\sum^{\log_{\sqrt{2}}\left(\frac{1}{\sqrt{\epsilon}}\right)}_{r=0}\sqrt{2}^{r}\right)\right)
\displaystyle\geq Tλ~(1O(ϵ)).\displaystyle T\tilde{\lambda}^{*}\left(1-O\left(\epsilon\right)\right).

The first inequality holds since \tilde{\lambda}^{*}(1-3\epsilon^{(r-1)}_{x})\leq\lambda^{(r)}\leq\tilde{\lambda}^{*}. The third inequality holds by the definitions of \epsilon^{(r)}_{x}=\sqrt{\frac{4T\log\frac{2|\mathcal{I}|}{\eta}}{t^{(r)}\gamma}} and \epsilon^{(r)}_{z}=\sqrt{\frac{2w_{\max}(1+\epsilon)\log\frac{2|\mathcal{I}|l}{\eta}}{t^{(r)}\lambda^{(r)}}}, and the fact that \gamma\leq\frac{T\lambda^{(r)}}{w_{\max}(1-3\epsilon^{(r-1)}_{x})}. The fourth and fifth inequalities stem from t^{(r)}=\epsilon T2^{r} and \epsilon 2^{l}=1, respectively.

Notice that with probability at most 2\eta, the bound \tilde{\lambda}^{*}(1-3\epsilon^{(r-1)}_{x})\leq\lambda^{(r)}\leq\tilde{\lambda}^{*} fails. Therefore, Algorithm S gives us a reward of at least T\tilde{\lambda}^{*}(1-O(\epsilon)) with probability at least

1-\sum^{l-1}_{r=0}\frac{t^{(r)}}{2T}\epsilon(1+\epsilon)^{\delta}-\sum^{l-1}_{r=0}\frac{\eta}{2l}-2\eta\geq 1-\frac{\epsilon(1+\epsilon)^{\delta}}{2}-\frac{5\eta}{2}\geq 1-\epsilon(1+\epsilon)^{\delta},

where the first inequality uses \sum^{l-1}_{r=0}t^{(r)}\leq T and the last inequality uses \eta=O(\epsilon/l). ∎
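For intuition on the stage schedule and error terms appearing above, the following minimal Python sketch (hypothetical parameter values; \lambda^{(r)} is treated as a constant placeholder) tabulates t^{(r)}=\epsilon T2^{r}, \epsilon^{(r)}_{x} and \epsilon^{(r)}_{z}, together with the weighted sum \sum_{r}(t^{(r)}/T)(3\epsilon^{(r-1)}_{x}+\epsilon^{(r)}_{z}) that drives the 1-O(\epsilon) factor.

import math

# Hypothetical parameter values (illustrative only).
T = 1_000_000                     # planning horizon
eps = 1 / 64                      # target accuracy epsilon
n_rewards = 3                     # stand-in for |I| in the logarithms
w_max, lam = 2.0, 1.0             # w_max and a constant stand-in for lambda^(r)
l = round(math.log2(1 / eps))     # number of stages, so that eps * 2^l = 1
eta = eps / l                     # failure budget, eta = O(eps / l)
gamma = math.log(2 * n_rewards / eta) / eps**2    # gamma = Omega(log(.) / eps^2)

weighted_loss = 0.0
for r in range(l):
    t_r = eps * T * 2 ** r                        # stage length t^(r)
    t_prev = eps * T * 2 ** max(r - 1, 0)         # samples available entering stage r (t^(r-1); t^(0) for r = 0)
    eps_x = math.sqrt(4 * T * math.log(2 * n_rewards / eta) / (t_prev * gamma))
    eps_z = math.sqrt(2 * w_max * (1 + eps) * math.log(2 * n_rewards * l / eta) / (t_r * lam))
    weighted_loss += (t_r / T) * (3 * eps_x + eps_z)
    print(f"r={r}: t^(r)={t_r:>10.0f}  eps_x^(r-1)={eps_x:.3f}  eps_z^(r)={eps_z:.3f}")
print(f"sum_r (t^(r)/T)(3 eps_x^(r-1) + eps_z^(r)) = {weighted_loss:.3f}")

The per-stage errors shrink as the stages grow, which is why the weighted sum is dominated by the early, short stages and stays of order \epsilon.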

IV-E Proof of Lemma 5

Proof.

To aid the analysis, we leave \bar{d}(\delta) customers unserved at the end of each stage, so the next stage starts nearly empty. This causes a total reward loss of at most \bar{d}(\delta)w_{\max}l=\bar{d}(\delta)w_{\max}\log(1/\epsilon). Define the number of units of resource i occupied by customers from previous stages at time t of stage r as \hat{Y}^{(r)}_{it}. It is evident that \mathbb{E}[\hat{Y}^{(r)}_{it}]\leq a_{\max}\delta. Then at any time step t\in\{s+1,\ldots,t^{(r)}\}, by Markov's inequality, the probability of violating the capacity constraint of resource i is bounded as:

Pr(Y^it(r)+τ=1s+1Yiτt(r)+τ=s+2tY~iτt(r)ci1+ϵ(1+ϵ))\displaystyle\Pr\left(\hat{Y}^{(r)}_{it}+\sum^{s+1}_{\tau=1}Y^{(r)}_{i\tau t}+\sum^{t}_{\tau=s+2}\tilde{Y}^{(r)}_{i\tau t}\geq\frac{c_{i}}{1+\epsilon}\left(1+\epsilon\right)\right)
=\displaystyle= Pr(γci(Y^it(r)+τ=1s+1Yiτt(r)+τ=s+2tY~iτt(r))γ)\displaystyle\Pr\left(\frac{\gamma}{c_{i}}\left(\hat{Y}^{(r)}_{it}+\sum^{s+1}_{\tau=1}Y^{(r)}_{i\tau t}+\sum^{t}_{\tau=s+2}\tilde{Y}^{(r)}_{i\tau t}\right)\geq\gamma\right)
\displaystyle\leq 𝔼[(1+ϵ)γci(Y^it(r)+τ=1s+1Yiτt(r)+τ=s+2tY~iτt(r))]/(1+ϵ)γ\displaystyle\mathbb{E}\left[(1+\epsilon)^{\frac{\gamma}{c_{i}}\left(\hat{Y}^{(r)}_{it}+\sum^{s+1}_{\tau=1}Y^{(r)}_{i\tau t}+\sum^{t}_{\tau=s+2}\tilde{Y}^{(r)}_{i\tau t}\right)}\right]/(1+\epsilon)^{\gamma}
\displaystyle\leq 𝔼[(1+ϵ)γamaxδci(1+ϵ)γciτ=1sYiτt(r)(1+ϵγciYi,s+1,t(r))(1+ϵ)τ=s+2tγci𝔼[Y~iτt(r)]]/(1+ϵ)γ\displaystyle\mathbb{E}\left[(1+\epsilon)^{\frac{\gamma a_{\max}\delta}{c_{i}}}\left(1+\epsilon\right)^{\frac{\gamma}{c_{i}}\sum^{s}_{\tau=1}Y^{(r)}_{i\tau t}}\left(1+\epsilon\frac{\gamma}{c_{i}}Y^{(r)}_{i,s+1,t}\right)\left(1+\epsilon\right)^{\sum^{t}_{\tau=s+2}\frac{\gamma}{c_{i}}\mathbb{E}\left[\tilde{Y}^{(r)}_{i\tau t}\right]}\right]/(1+\epsilon)^{\gamma}
\displaystyle\leq(1+\epsilon)^{\delta-\gamma}\mathbb{E}\left[\left(1+\epsilon\right)^{\frac{\gamma}{c_{i}}\sum^{s}_{\tau=1}Y^{(r)}_{i\tau t}}\left(1+\epsilon\frac{\gamma}{c_{i}}Y^{(r)}_{i,s+1,t}\right)\prod^{t}_{\tau=s+2}\left(1+\epsilon\frac{\gamma}{c_{i}}\mathbb{E}\left[\tilde{Y}^{(r)}_{i\tau t}\right]\right)\right]. (33)

While it is desirable to upper bound the probabilities of violating any resource constraints without the term 𝔼[Y~iτt(r)]\mathbb{E}[\tilde{Y}^{(r)}_{i\tau t}], the values of 𝔼[Y~iτt(r)]\mathbb{E}[\tilde{Y}^{(r)}_{i\tau t}] are not known in the online setting. Nevertheless, we use the fact that k𝒦j𝒥pjaijkdixjkci\sum_{k\in\mathcal{K}}\sum_{j\in\mathcal{J}}p_{j}a_{ijk}d_{i}x^{*}_{jk}\leq c_{i} for the “steady-state” optimal solution, and upper bound 𝔼[Y~iτt(r)]\mathbb{E}[\tilde{Y}^{(r)}_{i\tau t}] as follows:

𝔼[Y~iτt(r)]\displaystyle\mathbb{E}\left[\tilde{Y}^{(r)}_{i\tau t}\right] =k𝒦j𝒥pjaijkPr(Ditτ+1)xjk1+ϵ\displaystyle=\sum_{k\in\mathcal{K}}\sum_{j\in\mathcal{J}}p_{j}a_{ijk}\Pr(D_{i}\geq t-\tau+1)\frac{x^{*}_{jk}}{1+\epsilon}
cidi(1+ϵ)Pr(Ditτ+1).\displaystyle\leq\frac{c_{i}}{d_{i}(1+\epsilon)}\Pr(D_{i}\geq t-\tau+1). (34)

Plugging (34) into (33), at time step s+1 we aim to minimize the following term, which, by a union bound over t\in\{s+1,\ldots,t^{(r)}\} and i\in\mathcal{I}_{c}, upper bounds the probability of violating at least one future resource constraint in stage r:

t=s+1t(r)ic𝔼[(1+ϵ)γciτ=1sYiτt(r)(1+ϵγciYi,s+1,t(r))τ=s+2t(1+ϵγPr(Ditτ+1)di(1+ϵ))](1+ϵ)δγ.\displaystyle\sum^{t^{(r)}}_{t=s+1}\sum_{i\in\mathcal{I}_{c}}\mathbb{E}\left[\left(1+\epsilon\right)^{\frac{\gamma}{c_{i}}\sum^{s}_{\tau=1}Y^{(r)}_{i\tau t}}\left(1+\epsilon\frac{\gamma}{c_{i}}Y^{(r)}_{i,s+1,t}\right)\prod^{t}_{\tau=s+2}\left(1+\epsilon\frac{\gamma\Pr\left(D_{i}\geq t-\tau+1\right)}{d_{i}\left(1+\epsilon\right)}\right)\right](1+\epsilon)^{\delta-\gamma}. (35)

Similarly, since 𝔼[Z~it]=k𝒦j𝒥pjwijkxjk1+ϵλ~1+ϵλ(r)1+ϵ\mathbb{E}[\tilde{Z}_{it}]=\sum_{k\in\mathcal{K}}\sum_{j\in\mathcal{J}}p_{j}w_{ijk}\frac{x^{*}_{jk}}{1+\epsilon}\geq\frac{\tilde{\lambda}^{*}}{1+\epsilon}\geq\frac{\lambda^{(r)}}{1+\epsilon}, by the Markov inequality, we upper bound the reward constraint violation probabilities in stage rr as:

irPr(τ=1s+1Ziτ(r)+τ=s+2t(r)Z~iτ(r)t(r)λ(r)(1ϵz(r)))\displaystyle\sum_{i\in\mathcal{I}_{r}}\Pr\left(\sum^{s+1}_{\tau=1}Z^{(r)}_{i\tau}+\sum^{t^{(r)}}_{\tau=s+2}\tilde{Z}^{(r)}_{i\tau}\leq t^{(r)}\lambda^{(r)}\left(1-\epsilon^{(r)}_{z}\right)\right)
ir𝔼[(1ϵz(r))1wmaxτ=1sZiτ(r)(1ϵz(r)1wmaxZi,s+1(r))τ=s+2t(r)(1ϵz(r)λ(r)wmax(1+ϵ))]/(1ϵz(r))(1ϵz(r))t(r)λ(r)wmax.\displaystyle\leq\sum_{i\in\mathcal{I}_{r}}\mathbb{E}\left[\left(1-\epsilon^{(r)}_{z}\right)^{\frac{1}{w_{\max}}\sum^{s}_{\tau=1}Z^{(r)}_{i\tau}}\left(1-\epsilon^{(r)}_{z}\frac{1}{w_{\max}}Z^{(r)}_{i,s+1}\right)\prod^{t^{(r)}}_{\tau=s+2}\left(1-\epsilon^{(r)}_{z}\frac{\lambda^{(r)}}{w_{\max}\left(1+\epsilon\right)}\right)\right]/\left(1-\epsilon^{(r)}_{z}\right)^{\frac{\left(1-\epsilon^{(r)}_{z}\right)t^{(r)}\lambda^{(r)}}{w_{\max}}}. (36)

Define the sum of terms (35) and (36) as \mathcal{F}_{r}(A^{s+1}S^{t^{(r)}-s-1}). In time step s+1\in\{1,\ldots,t^{(r)}\} of stage r, the DM takes the action k^{(r)}(s+1) that minimizes \mathcal{F}_{r}(A^{s+1}S^{t^{(r)}-s-1}), which has exactly the form of (7). We further show below that this choice of k^{(r)}(s+1) implies \mathcal{F}_{r}(A^{s+1}S^{t^{(r)}-s-1})\leq\mathcal{F}_{r}(A^{s}S^{t^{(r)}-s}), and inductively \mathcal{F}_{r}(A^{t^{(r)}})\leq\mathcal{F}_{r}(S^{t^{(r)}}).

Suppose Algorithm A is run in the first s time periods of stage r. Then at time period s+1, Algorithm A by construction minimizes \mathcal{F}_{r}\left(A^{s+1}S^{t^{(r)}-s-1}\right) over the action taken at time s+1. Hence, if we replace the action of Algorithm A at time s+1 by the choice of Algorithm S, the resulting upper bound on the probability of constraint violation is no smaller:

r(As+1St(r)s1)\displaystyle\mathcal{F}_{r}\left(A^{s+1}S^{t^{(r)}-s-1}\right)
\displaystyle\leq t=s+1t(r)ic𝔼[(1+ϵ)γciτ=1sYiτt(r)(1+ϵγciYi,s+1,t(r))τ=s+2t(1+ϵγPr(Ditτ+1)di(1+ϵ))](1+ϵ)γδ\displaystyle\frac{\sum^{t^{(r)}}_{t=s+1}\sum_{i\in\mathcal{I}_{c}}\mathbb{E}\left[\left(1+\epsilon\right)^{\frac{\gamma}{c_{i}}\sum^{s}_{\tau=1}Y^{(r)}_{i\tau t}}\left(1+\epsilon\frac{\gamma}{c_{i}}Y^{(r)}_{i,s+1,t}\right)\prod^{t}_{\tau=s+2}\left(1+\epsilon\frac{\gamma\Pr\left(D_{i}\geq t-\tau+1\right)}{d_{i}\left(1+\epsilon\right)}\right)\right]}{\left(1+\epsilon\right)^{\gamma-\delta}}
+ir𝔼[(1ϵz(r))1wmaxτ=1sZiτ(r)(1ϵz(r)1wmaxZi,s+1(r))τ=s+2t(r)(1ϵz(r)λ(r)wmax(1+ϵ))](1ϵz(r))(1ϵz(r))t(r)λ(r)wmax\displaystyle+\frac{\sum_{i\in\mathcal{I}_{r}}\mathbb{E}\left[\left(1-\epsilon^{(r)}_{z}\right)^{\frac{1}{w_{\max}}\sum^{s}_{\tau=1}Z^{(r)}_{i\tau}}\left(1-\epsilon^{(r)}_{z}\frac{1}{w_{\max}}Z^{(r)}_{i,s+1}\right)\prod^{t^{(r)}}_{\tau=s+2}\left(1-\epsilon^{(r)}_{z}\frac{\lambda^{(r)}}{w_{\max}\left(1+\epsilon\right)}\right)\right]}{\left(1-\epsilon^{(r)}_{z}\right)^{\frac{\left(1-\epsilon^{(r)}_{z}\right)t^{(r)}\lambda^{(r)}}{w_{\max}}}}
\displaystyle\leq t=s+1t(r)ic𝔼[(1+ϵ)γciτ=1sYiτt(r)(1+ϵγciY~i,s+1,t(r))τ=s+2t(1+ϵγPr(Ditτ+1)di(1+ϵ))](1+ϵ)γδ\displaystyle\frac{\sum^{t^{(r)}}_{t=s+1}\sum_{i\in\mathcal{I}_{c}}\mathbb{E}\left[\left(1+\epsilon\right)^{\frac{\gamma}{c_{i}}\sum^{s}_{\tau=1}Y^{(r)}_{i\tau t}}\left(1+\epsilon\frac{\gamma}{c_{i}}\tilde{Y}^{(r)}_{i,s+1,t}\right)\prod^{t}_{\tau=s+2}\left(1+\epsilon\frac{\gamma\Pr\left(D_{i}\geq t-\tau+1\right)}{d_{i}\left(1+\epsilon\right)}\right)\right]}{\left(1+\epsilon\right)^{\gamma-\delta}}
+ir𝔼[(1ϵz(r))1wmaxτ=1sZiτ(r)(1ϵz(r)1wmaxZ~i,s+1(r))τ=s+2t(r)(1ϵz(r)λ(r)wmax(1+ϵ))](1ϵz(r))(1ϵz(r))t(r)λ(r)wmax\displaystyle+\frac{\sum_{i\in\mathcal{I}_{r}}\mathbb{E}\left[\left(1-\epsilon^{(r)}_{z}\right)^{\frac{1}{w_{\max}}\sum^{s}_{\tau=1}Z^{(r)}_{i\tau}}\left(1-\epsilon^{(r)}_{z}\frac{1}{w_{\max}}\tilde{Z}^{(r)}_{i,s+1}\right)\prod^{t^{(r)}}_{\tau=s+2}\left(1-\epsilon^{(r)}_{z}\frac{\lambda^{(r)}}{w_{\max}\left(1+\epsilon\right)}\right)\right]}{\left(1-\epsilon^{(r)}_{z}\right)^{\frac{\left(1-\epsilon^{(r)}_{z}\right)t^{(r)}\lambda^{(r)}}{w_{\max}}}}
\displaystyle\leq t=s+1t(r)ic𝔼[(1+ϵ)γciτ=1sYiτt(r)τ=s+1t(1+ϵγPr(Ditτ+1)di(1+ϵ))](1+ϵ)γδ\displaystyle\frac{\sum^{t^{(r)}}_{t=s+1}\sum_{i\in\mathcal{I}_{c}}\mathbb{E}\left[\left(1+\epsilon\right)^{\frac{\gamma}{c_{i}}\sum^{s}_{\tau=1}Y^{(r)}_{i\tau t}}\prod^{t}_{\tau=s+1}\left(1+\epsilon\frac{\gamma\Pr\left(D_{i}\geq t-\tau+1\right)}{d_{i}\left(1+\epsilon\right)}\right)\right]}{\left(1+\epsilon\right)^{\gamma-\delta}}
\displaystyle+\frac{\sum_{i\in\mathcal{I}_{r}}\mathbb{E}\left[\left(1-\epsilon^{(r)}_{z}\right)^{\frac{1}{w_{\max}}\sum^{s}_{\tau=1}Z^{(r)}_{i\tau}}\prod^{t^{(r)}}_{\tau=s+1}\left(1-\epsilon^{(r)}_{z}\frac{\lambda^{(r)}}{w_{\max}\left(1+\epsilon\right)}\right)\right]}{\left(1-\epsilon^{(r)}_{z}\right)^{\frac{\left(1-\epsilon^{(r)}_{z}\right)t^{(r)}\lambda^{(r)}}{w_{\max}}}}
r(AsSt(r)s).\displaystyle\leq\mathcal{F}_{r}\left(A^{s}S^{t^{(r)}-s}\right).

By induction, we have \mathcal{F}_{r}\left(A^{t^{(r)}}\right)\leq\mathcal{F}_{r}\left(S^{t^{(r)}}\right). Since Algorithm A performs no worse than Algorithm S in each stage, following Lemma 4, the total probability of constraint violation over the entire planning horizon satisfies

r=0l1r(At(r))r=0l1r(St(r))ϵ(1+ϵ)δ.\sum^{l-1}_{r=0}\mathcal{F}_{r}\left(A^{t^{(r)}}\right)\leq\sum^{l-1}_{r=0}\mathcal{F}_{r}\left(S^{t^{(r)}}\right)\leq\epsilon(1+\epsilon)^{\delta}.

Therefore, Algorithm AA achieves a reward of at least Tλ~(1O(ϵ))T\tilde{\lambda}^{*}(1-O(\epsilon)) for every iri\in\mathcal{I}_{r} with probability at least 1ϵ(1+ϵ)δ1-\epsilon(1+\epsilon)^{\delta}. ∎

IV-F More on the numerical experiments

It is worth mentioning that for assortment planning applications, the size of the action set \mathcal{K} scales exponentially with the number of products, and therefore equation (1) is computationally intractable under a general choice model. However, in certain cases, such as under MNL models, we do not need to enumerate all possible assortments. For simultaneously maximizing the revenue of each resource, we set \mathcal{I}_{r}=\mathcal{I}_{c}, A_{ijk}\sim\text{Bernoulli}(q_{ijk}) and W_{ijk}=r_{i}A_{ijk}. In this case, equation (1) can be written as:

k(r)(s+1)=argmink𝒦(ic(t=s+1t(r)Pr(Dits)ϕi,s+1,t(r)+riψi,s+1(r))qi,j(r)(s+1),k).k^{(r)}(s+1)=\arg\min\limits_{k\in\mathcal{K}}\left(\sum_{i\in\mathcal{I}_{c}}\left(\sum^{t^{(r)}}_{t=s+1}\Pr(D_{i}\geq t-s)\phi^{(r)}_{i,s+1,t}+r_{i}\psi^{(r)}_{i,s+1}\right)q_{i,j^{(r)}(s+1),k}\right). (4)
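When the number of products is small, the minimization in (4) can also be carried out by brute-force enumeration of assortments, as in the minimal sketch below (hypothetical arrays; phi and psi stand for the resource-side and reward-side weights \phi^{(r)}_{i,s+1,t} and \psi^{(r)}_{i,s+1} maintained by the algorithm, whose values and signs here are placeholders, and the usage-duration survival probabilities are taken to be geometric purely for illustration). The MNL-based LP described next avoids this enumeration.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Hypothetical state at time s+1 of stage r (illustrative only).
n_items, max_size, horizon_left = 5, 2, 50
phi = rng.uniform(0.1, 1.0, size=(n_items, horizon_left))    # phi^{(r)}_{i,s+1,t}, t = s+1, ..., t^(r)
psi = rng.uniform(-1.0, 0.0, size=n_items)                    # psi^{(r)}_{i,s+1} (sign is a placeholder)
r_i = rng.uniform(1.0, 3.0, size=n_items)                     # per-product revenues r_i
v = rng.uniform(0.5, 2.0, size=n_items)                       # MNL preference weights of the arriving type
decay = rng.uniform(0.6, 0.9, size=n_items)
surv = decay[:, None] ** np.arange(1, horizon_left + 1)       # Pr(D_i >= t - s), geometric as an example

coef = (phi * surv).sum(axis=1) + r_i * psi                   # bracketed coefficient of q in (4)

def purchase_probs(assortment):
    """MNL purchase probabilities q_{i,j,k} for products i in assortment k."""
    denom = 1.0 + v[list(assortment)].sum()
    return {i: v[i] / denom for i in assortment}

best_k, best_score = (), 0.0                                  # the empty assortment scores 0
for size in range(1, max_size + 1):
    for k in combinations(range(n_items), size):
        score = sum(coef[i] * q for i, q in purchase_probs(k).items())
        if score < best_score:
            best_k, best_score = k, score
print(f"chosen assortment k^(r)(s+1) = {best_k}, score = {best_score:.3f}")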

Under an MNL model, q_{ijk}=\frac{\exp(\boldsymbol{b}_{ij}^{\top}\boldsymbol{f}_{i})}{1+\sum_{\ell\in k}\exp(\boldsymbol{b}_{\ell j}^{\top}\boldsymbol{f}_{\ell})} if i\in k, and q_{ijk}=0 if i\not\in k. It can be shown that for a type-j customer, (4) can be solved efficiently by solving the following LP ([24]):

min\displaystyle\min ic(t=s+1TPr(Dits)ϕi,s+1,t(r)+riψi,s+1(r))zi\displaystyle~{}\sum_{i\in\mathcal{I}_{c}}\left(\sum^{T}_{t=s+1}\Pr(D_{i}\geq t-s)\phi^{(r)}_{i,s+1,t}+r_{i}\psi^{(r)}_{i,s+1}\right)z_{i}
s.t. iczi+z0=1\displaystyle\sum_{i\in\mathcal{I}_{c}}z_{i}+z_{0}=1
\displaystyle\sum_{i\in\mathcal{I}_{c}}\frac{z_{i}}{\exp(\boldsymbol{b}_{ij}^{\top}\boldsymbol{f}_{i})}\leq nz_{0}
\displaystyle 0\leq\frac{z_{i}}{\exp(\boldsymbol{b}_{ij}^{\top}\boldsymbol{f}_{i})}\leq z_{0}\quad\forall i\in\mathcal{I}_{c}

where the decision variables are {zi:ic{0}}\{z_{i}:i\in\mathcal{I}_{c}\cup\{0\}\}.
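The following is a minimal sketch of the LP above using scipy.optimize.linprog (all coefficients are hypothetical). Here v_i denotes the MNL preference weight \exp(\boldsymbol{b}_{ij}^{\top}\boldsymbol{f}_{i}) of the arriving customer type, coef_i is the bracketed objective weight from (4), and n is interpreted as an assortment-size limit, an assumption consistent with the cardinality constraint above.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

# Hypothetical data for one arriving customer type (illustrative only).
m, n = 6, 3                                      # number of products and assortment-size limit
coef = rng.uniform(-1.0, 1.0, size=m)            # objective weights from (4); may be negative
v = rng.uniform(0.5, 2.0, size=m)                # MNL preference weights v_i = exp(b_ij^T f_i)

# Decision variables: z_0, z_1, ..., z_m.
c = np.concatenate(([0.0], coef))                # minimize sum_i coef_i z_i
A_eq = np.ones((1, m + 1)); b_eq = [1.0]         # z_0 + sum_i z_i = 1
A_ub = [np.concatenate(([-float(n)], 1.0 / v))]  # sum_i z_i / v_i <= n z_0
b_ub = [0.0]
for i in range(m):                               # z_i / v_i <= z_0
    row = np.zeros(m + 1); row[0] = -1.0; row[i + 1] = 1.0 / v[i]
    A_ub.append(row); b_ub.append(0.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (m + 1))
z0, z = res.x[0], res.x[1:]
offered = [i for i in range(m) if z[i] > 1e-8]
print(f"objective = {res.fun:.4f}, z_0 = {z0:.3f}, offered products = {offered}")

The products with strictly positive z_i in the optimal solution form the assortment to offer.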

In our numerical tests, when the number of customer types is large, the computation time is large despite the use of the column generation method. We can use the sample average approximation (SAA) to reduce the number of customer types. Specifically, we can sample 100 times from the 1000 customer types and approximate the value of \tilde{\lambda}^{*} on the 100 samples. This technique is adopted in Algorithm 3, where we have shown that with large enough sample sizes, the SAA works well. Hence, essentially, we are performing SAA on SAA when the size of \mathcal{J} is large. Table II shows the results and computational time comparisons between the original Algorithm A and Algorithm A with SAA.
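A minimal sketch of the SAA step described above (sizes are hypothetical; solve_steady_state is a placeholder for any routine that solves the steady-state LP for a given type distribution, such as the sketch given after the proof of Lemma 3).

import numpy as np

rng = np.random.default_rng(4)

n_types, n_samples = 1000, 100                    # hypothetical sizes matching the description above
p_full = rng.dirichlet(np.ones(n_types))          # stand-in for the distribution over the 1000 types

def solve_steady_state(p_vec):
    """Placeholder for the steady-state LP solver; returns a dummy value so the snippet runs."""
    return float(np.sum(p_vec ** 2))

# SAA: draw 100 types i.i.d. and weight them uniformly over the draws.
sampled = rng.choice(n_types, size=n_samples, p=p_full)
p_saa = np.bincount(sampled, minlength=n_types) / n_samples

approx = solve_steady_state(p_saa)                # approximate lambda_tilde^* on the reduced instance
print(f"{np.count_nonzero(p_saa)} distinct types retained out of {n_types}; objective ~ {approx:.4f}")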

T | (γ, ϵ) | UB | Total Revenue (A / A+SAA) | % Gap from UB (A / A+SAA) | Computational Time (A / A+SAA)
1000 | (300, 0.3) | 644.90 | 331.03 / 305.90 | 48.67% / 52.57% | 134.18 / 55.19
2000 | (600, 0.22) | 1289.80 | 887.28 / 838.60 | 31.21% / 34.98% | 260.01 / 103.77
3000 | (900, 0.185) | 1934.70 | 1419.54 / 1357.70 | 26.63% / 29.82% | 390.10 / 143.35
4000 | (1200, 0.162) | 2579.60 | 1966.83 / 1861.60 | 23.75% / 27.83% | 489.99 / 188.43
5000 | (1500, 0.148) | 3224.50 | 2522.00 / 2398.70 | 21.79% / 25.61% | 576.65 / 246.04
6000 | (1800, 0.136) | 3869.40 | 3086.28 / 2934.40 | 20.24% / 24.16% | 641.84 / 297.80
7000 | (2100, 0.127) | 4514.30 | 3647.95 / 3477.90 | 19.19% / 22.96% | 732.04 / 352.72
8000 | (2400, 0.12) | 5159.20 | 4217.17 / 4014.40 | 18.26% / 22.19% | 1114.00 / 687.89
TABLE II: Using SAA to reduce computational time.