This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Approximation Algorithm for Generalized Budgeted Assignment Problems and Applications in Transportation Systems

Hongyi Jianga, Samitha Samaranayakeb
( aDepartment of Systems Engineering, City University of Hong Kong
bSchool of Civil and Environmental Engineering, Cornell University, USA
[email protected], [email protected]
)
Abstract

Motivated by a transit line planning problem in transportation systems, we investigate the following capacitated assignment problem under a budget constraint. Our model involves LL bins and PP items. Each bin ll has a utilization cost clc_{l} and an nln_{l}-dimensional capacity vector. Each item pp has an nln_{l}-dimensional binary weight vector rlpr_{lp}, where the 11s in rlpr_{lp} (if any) appear in consecutive positions, and its assignment to bin ll yields a reward vlpv_{lp}. The objective is to maximize total rewards through an assignment that satisfies three constraints: (i) the total weights of assigned items do not violate any bin’s capacity; (ii) each item is assigned to at most one open bin; and (iii) the overall utilization costs remain within a total budget BB.

We propose the first randomized rounding algorithm with a constant approximation ratio for this problem. We then apply our framework to the motivating transit line planning problem, presenting corresponding models and conducting numerical experiments using real-world data. Our results demonstrate significant improvements over previous approaches in addressing this critical transportation challenge.

Keywords: Assignment problem, Approximation algorithm, Transportation systems

1 Introduction

This paper presents a framework for addressing a critical challenge in transportation systems: optimizing the selection of high-capacity fixed-route transit lines to maximize the total reward from passenger route coverage, subject to a specified operational budget constraint. This problem arises from urban transit systems’ pressures of satisfying passenger demand with inadequate resource allocation, necessitating innovative solutions for effective line planning. Our approach aims to enhance the overall performance and efficiency of public transportation networks through strategic line planning. The primary challenge lies in developing an algorithm that is both theoretically sound and practically efficient, capable of deriving high-quality solutions for large-scale problems, such as those encountered in complex urban environments like New York City. This research makes contributions in both discrete optimization and transportation planning, proposing a novel algorithmic framework that bridges the gap between theoretical guarantees in algorithm design and practical applicability in real-world transit planning scenarios.

To address this challenge, we study the transportation problem in the context of a new assignment problem, which we call the Generalized Budgeted Assignment Problem (GBAP). Prior to introducing the formal definition, we present a warm-up example related to the aforementioned transportation line planning problems. This simple example aims to provide context and facilitate understanding of the subsequent formal definitions of the GBAP.

For ease of explanation, this example is relatively restrictive compared to real-world applications. Thus, it is important to note that we will study a more generalized line planning problem in Section 6. This extended discussion will still be based on the GBAP and corresponding approximation algorithm we develop, and will utilize real-world data.

1.1 Warm-up Example

This example involves selecting a set of fixed-route transit lines to operate such that passenger coverage is maximized, while adhering to vehicle capacity constraints and satisfying a budget constraint. By framing this as an assignment-type integer program, we aim to find the optimal solution that balances cost and service efficiency. More specifically, consider a transit system with four stops: AA, BB, CC, and DD. There are three candidate transit lines: line 11, which travels from AA to CC to DD; line 22, which travels from AA to BB to CC to DD; and line 33, which travels from CC to AA to BB to DD. These lines cost 2020, 4040, and 3030 respectively to operate. Each transit line operates a bus with a capacity of 22 passengers. This capacity applies uniformly at all points along the route. We have six passengers: passengers 11 and 22 need to travel from CC to BB, passengers 33, 44, and 55 from CC to DD, and passenger 66 from AA to BB. These transit lines, their costs, and passenger routes are illustrated in Figure 1 and Figure 2.

Refer to caption
Figure 1: This figure displays the three candidate transit lines (Line 11, Line 22, and Line 33) with their routes, operational costs, and capacity vectors. The routes are shown as directed graphs connecting stations AA, BB, CC, and DD. The cost of operating each line is given, along with its capacity vector. The capacity vector indicates the maximum number of passengers that can be accommodated on each segment of the line.
Refer to caption
Figure 2: The figure shows the origin-destination pairs for passengers 161-6, the routes they take on each transit line (with green arrows indicating used segments), the weight vectors for each line (where 11 represents a used segment and 0 an unused segment), and the reward for assigning each passenger to the line, where 11 means the line fully covers the passenger’s origin-destination pair and 0 means it does not.

For simplicity, we assume each passenger must travel directly from their origin to their destination on a single bus line (i.e., no transfers). To serve a passenger, the line chosen must be open and cover the passenger’s entire route. The total cost of opened lines must stay within the given budget of 7070. Therefore, we can open at most 22 lines. Our goal is to choose a set of lines to open within the budget and assign passengers to these lines, maximizing the number of passengers served while adhering to these rules. If a line covers a passenger’s route, assigning this passenger to the line generates a reward of 11; otherwise, the reward is 0. The objective is to maximize the overall generated reward in this context.

To solve this problem, we can construct a capacity vector for each of the three lines, as shown in Figure 1. Line 11 has a capacity vector of (2,2)(2,2), while 22 and 33 have capacity vectors of (2,2,2)(2,2,2). Each entry represents the maximum number of passengers that can be accommodated on a segment of the line. The length of each vector corresponds to the number of edges in that line. Figure 2 illustrates how passengers utilize different lines and the corresponding weight vectors. For example, passengers 11 and 22, traveling from CC to BB, have weight vectors (0,0)(0,0) for line 11, (0,0,0)(0,0,0) for line 2, and (1,1,0)(1,1,0) for line 33. These vectors indicate which segments of each line are used by the passengers, with 11 representing a used segment and 0 an unused segment. The reward of assigning passengers to a line depends on whether the line covers their route. For instance, assigning passengers 11 and 22 to line 11 or 22 yields no reward as these lines do not cover their route, while assigning them to line 33 gives a reward of 11 per person. To obey capacity constraints, the sum of weight vectors for passengers assigned to a line cannot exceed that line’s capacity vector entry-wise. For example, we cannot assign passengers 11, 22, and 33 to line 33 as it would exceed the capacity (3>2)(3>2) for the edges from CC to AA and AA to BB.

Remark 1.1.

When a line covers a passenger’s route, the weight vector contains 1s appearing in adjacent positions, reflecting that passenger routes consist of consecutive edges on the lines. For lines not covering a passenger’s route, the weight vectors are defined as all zeros for consistency, though they could be arbitrary since the reward for these assignments is 0.

Based on these definitions, we can formulate this problem as an assignment-type integer program. This formulation is a specific case of GBAP, which we will present in details in Section 1.2, specifically in (1.1). It can be verified directly that in this example, the optimal solution is to operate lines 22 and 33, covering 5 passengers, with only one passenger (from C to D) left unserved.

1.2 Definition of Generalized Budgeted Assignment Problem

Building on the warm-up example, we now formally define GBAP. This problem involves selecting a set of bins within a given budget and assigning a set of items to these selected bins under multidimensional capacity constraints, with the objective of maximizing the assignment reward. To help understand this model, one can think of transit lines as bins and passengers as items to be allocated. Specifically, consider a system with LL bins and PP items. Each bin l[L]l\in[L]111Here, [n][n] denotes the set {1,2,,n}\{1,2,\ldots,n\} for any integer nn. is characterized by a utilization cost cl+c_{l}\in\mathbb{Q}_{+} and an nln_{l}-dimensional capacity vector fl=(fl(1),,fl(nl))+nlf_{l}=\left(f_{l}^{(1)},\ldots,f_{l}^{(n_{l})}\right)\in\mathbb{Z}^{n_{l}}_{+}. In our warm-up example, the capacity vector flf_{l} has uniform entries across all dimensions. However, in our generalized model, this capacity can vary across dimensions (see Remark 1.2 for further details). Additionally, the dimension nln_{l} corresponds to the number of edges involved in a line in the warm-up example. Each item p[P]p\in[P] has an nln_{l}-dimensional weight vector for each bin ll: rlp=(rlp(1),,rlp(nl)){0,1}nlr_{lp}=\left(r_{lp}^{(1)},\ldots,r_{lp}^{(n_{l})}\right)\in\{0,1\}^{n_{l}} , where the 11s in rlpr_{lp} (if any) appear in consecutive positions within the vector. This structure reflects the assumption that routes consist of consecutive edges, as motivated by the warm-up example where routes follow contiguous edges in a line. Assigning item pp to bin ll yields a reward vlp0v_{lp}\geq 0. The objective is to maximize the total reward through an assignment that satisfies three constraints: (i) the total weight of items assigned to each bin does not exceed the bin’s capacity in any dimension, (ii) each item is assigned to at most one bin, (iii) items can only be assigned to utilized bins, and (iv) the total cost of utilized bins does not exceed a given budget B+B\in\mathbb{Q}_{+}.

The GBAP problem can be formulated as a 0-11 linear integer programming problem:

maxy,x\displaystyle\max_{y,x}~{}~{} p[P]l[L]vlpxlp\displaystyle\sum_{p\in[P]}\sum_{l\in[L]}v_{lp}x_{lp} (1.1a)
s.t. l[L]clylB\displaystyle\sum_{l\in[L]}c_{l}y_{l}\leq B (1.1b)
p[P]xlprlp(i)fl(i)yll[L],i[nl]\displaystyle\sum_{p\in[P]}x_{lp}\cdot r^{(i)}_{lp}\leq f_{l}^{(i)}\cdot y_{l}~{}~{}~{}\forall l\in[L],i\in[n_{l}] (1.1c)
l[L]xlp1p[P]\displaystyle\sum_{l\in[L]}x_{lp}\leq 1~{}~{}~{}\forall p\in[P] (1.1d)
xlp{0,1}p[P],l[L]\displaystyle x_{lp}\in\{0,1\}~{}~{}~{}\forall p\in[P],~{}l\in[L] (1.1e)
yl{0,1}l[L].\displaystyle y_{l}\in\{0,1\}~{}~{}~{}\forall l\in[L]. (1.1f)

(1.1b) ensures that the budget constraint is respected. (1.1c) guarantees that the multidimensional capacity of bins is not breached by the assigned items. (1.1d) requires that item pp can be assigned at most one bin. If item pp is assigned to bin ll, then xlp=1x_{lp}=1. Otherwise, xlp=0x_{lp}=0. (1.1e) guarantees that each bin can have at most one copy of each item. Similarly, yl=1y_{l}=1 indicates that bin ll is utilized. Items can only be matched to utilized bins.

Remark 1.2.

While the warm-up example uses binary rewards (0 or 11), our model allows for any non-negative (rational) rewards depending on different passengers (items) and transit lines (bins). This flexibility enables the study of more complex transportation scenarios, such as maximizing social welfare defined according to passenger characteristics like income and age. For further details and discussions on applications, please refer to Section 6.

Additionally, while the capacity vector in our warm-up example contains uniform values, our model allows for different values across various dimensions. This feature provides enhanced flexibility, making the model applicable to a broader range of scenarios.

For the rest of this paper, we define kk as the ratio of the budget to the largest line cost:

k:=Bmaxl[L]cl\displaystyle k:=\frac{B}{\max_{l\in[L]}c_{l}} (1.2)

This notation will be used frequently throughout our subsequent discussion and analysis.

1.3 Linear Programming Relaxation of Integer Program (1.1)

We will develop a randomized rounding algorithm based on the solution to the linear programming (LP) relaxation of (1.1). While a natural approach to relaxing (1.1) would be to directly set xlpx_{lp} and yly_{l} as continuous variables, this method yields a loose solution, as noted in [11] and [17], thereby limiting the performance of LP-based randomized rounding algorithms. Consequently, we will employ the LP relaxation (1.4) initially proposed by [11], augmented with an additional budget constraint.

For each bin l[L]l\in[L], define

Il:={S[P]:pSrlp(i)fl(i)i[nl]}\displaystyle I_{l}:=\left\{S\subseteq[P]:\sum_{p\in S}r^{(i)}_{lp}\leq f_{l}^{(i)}~{}~{}~{}\forall i\in[n_{l}]\right\} (1.3)

In other words, IlI_{l} is the family of all nonempty feasible assignments of items to ll, where a feasible assignment is a set S[P]S\subset[P] such that bin ll has enough capacity to contain all the items in SS, and Il\emptyset\notin I_{l}. For instance, in the warm-up example, if ll refers to line 33, then S:={passenger 1,2,3}lS:=\{\text{passenger }1,2,3\}\notin\mathcal{I}_{l} because it violates the capacity constraints, whereas S:={passenger 1,2}lS:=\{\text{passenger }1,2\}\in\mathcal{I}_{l}.

Then the LP relaxation of (1.1) can be stated as follows:

max{XlS}\displaystyle\underset{\{X_{lS}\}}{\max}~{}~{}~{}~{}~{}~{}~{} p[P]l[L]SIl:pSvlpXlS\displaystyle\sum_{p\in[P]}\sum_{l\in[L]}\sum_{S\in I_{l}:p\in S}v_{lp}X_{lS} (1.4a)
s.t.\displaystyle s.t.~{}~{}~{}~{}~{}~{}~{}~{} SIlXlS1l[L]\displaystyle\sum_{S\in I_{l}}X_{lS}\leq 1~{}~{}~{}~{}\forall l\in[L] (1.4b)
l[L]SIl:pSXlS1p[P]\displaystyle\sum_{l\in[L]}\sum_{S\in I_{l}:p\in S}X_{lS}\leq 1~{}~{}~{}~{}\forall p\in[P] (1.4c)
l[L]SIlclXlSB\displaystyle\sum_{l\in[L]}\sum_{S\in I_{l}}c_{l}X_{lS}\leq B (1.4d)
XlS[0,1]\displaystyle X_{lS}\in[0,1] (1.4e)

When XlSX_{lS} is restricted to {0,1}\{0,1\}, (1.4b) enforces that each bin can be matched with at most one feasible assignment. Constraints (1.4c) and (1.4d) correspond to (1.1d) and (1.1b) respectively. This LP relaxation can be solved in polynomial time using the ellipsoid method as in [11, 17].

Paper Structure: The remainder of this paper is organized as follows: Section 2 presents related work, challenges, and our contributions. Section 3 provides a high-level overview of our approach. Sections 4 and 5 detail the technical foundations, algorithm, and its analysis. Section 6 presents numerical experiments using real-world data. Section 7 concludes the paper and discusses future research directions.

2 Related Work, Challenges and Contributions

2.1 Related Work

The GBAP is a variant of separable assignment problem (SAP). In SAP, there are LL bins and PP items. For each item pp and bin ll, there is a reward vlpv_{lp} generated when item pp is assigned to bin ll. Moreover, for each bin ll, there is a separate packing constraint, which means that only certain subsets of the items can be assigned to bin ll when the subset satisfies the constraint. The goal is to find an assignment of items to the bins such that all the sets of items assigned to bins do not violate packing constraints, each item is assigned to at most one bin, and the total reward of assignments is optimal. Fleisher et al. [11] propose a β(11e)\beta\left(1-\frac{1}{e}\right)-approximation for SAP assuming that there is a β\beta-approximation algorithm for the single bin subproblem. Moreover, they showed that a special case of SAP called the capacitated distributed caching problem (CapDC) cannot be approximated in polynomial time with an approximation factor better than (11e)\left(1-\frac{1}{e}\right) unless NP\subseteq DTIME(nO(loglogn))(n^{O(\log\log n)}). In CapDC, there are LL cache locations (bins), PP requests (items) and TT different files. Bin ll has capacity AlA_{l}, and file tt has size ata_{t}. Request pp is associated with some file tpt_{p} and a value RpR_{p}. If a request pp is connected to cache location ll, then there is a cost clpc_{lp} and the reward is RpclpR_{p}-c_{lp}. The total size of the files corresponding to connected requests should be no larger than the capacity of the bin.

The maximum generalized assignment problem (GAP) is an important special case of SAP, in which each bin ll has size AlA_{l}, each item pp has size alpa_{lp} in each bin ll and a feasible assignment means that the total size of assigned items for each bin is not larger than the bin’s capacity. GAP is proved to be APX- hard by Chekuri et al. [8]. The best known approximation algorithm for GAP is from [10] and achieves an approximation factor of (11e+δ)\left(1-\frac{1}{e}+\delta\right) for a small constant δ>0\delta>0.

Calinescu et al. [7] also give a (β(11e))\left(\beta(1-\frac{1}{e})\right)-approximation for SAP. Moreover, they give a (11eϵ)\left(1-\frac{1}{e}-\epsilon\right)-approximation algorithm for GAP with the additional restriction of a budget constraint that on the number of bins that can be used. [14] propose a (11eϵ)\left(1-\frac{1}{e}-\epsilon\right)-approximation method for GAP with knapsack constraints, where there is a multi-dimensional budget vector and each item has a multi-dimensional cost vector same for each bin. The total cost vector of assigned items in a feasible assignment is no larger than the budget vector entry-wise.

A generalization of SAP, which is called as ρ\rho-SAP, is studied by Bender et al. [3] (it is called kk-SAP by [3], where kk is the number of allowed bins there, but here we make it consistent with our notation to avoid confusion). In ρ\rho-SAP each item can be assigned to at most ρ\rho different bins, but at most once to each bin. The goal is to find an assignment of items to bins that maximizes total reward. They present a β(11eρ)\beta\left(1-\frac{1}{e^{\rho}}\right)-approximation algorithm for ρ\rho-SAP under the assumption that there is a β\beta-approximation algorithm for the single bin subproblem. If the single bin subproblem admits an FPTAS, then the approximation factor is (11eρ)\left(1-\frac{1}{e^{\rho}}\right). The scheme can also be extended to the different upper bounds of number of allowed bins for different items: if the upper bound is ρi\rho_{i} for item ii, and ρ=miniρi\rho=\min_{i}\rho_{i}, then their algorithm yields β(11eρ)\beta\left(1-\frac{1}{e^{\rho}}\right) approximation.

None of the aforementioned studies are applicable to scenarios that include budget constraints on bins.

In Périvier et al.’s study [17], the focus is on the Real-Time Line Planning Problem (RLPP), which can be modeled as GBAP. In this problem setting, each passenger is restricted to selecting a single subroute per line, and there is no cost associated with allocating passengers to these lines. To tackle the RLPP, the authors propose a randomized rounding algorithm, building upon the scheme initially introduced by Fleischer et al. [11]. This algorithm yields a 11eϵ1-\frac{1}{e}-\epsilon-approximate solution, with the probability of exceeding the budget constraint capped at ek3ϵ2e^{-\frac{k}{3}\epsilon^{2}}. Here, k=Bmaxl[L]clk=\frac{B}{\max_{l\in[L]}c_{l}}, where BB represents the budget, clc_{l} is the cost associated with line ll, and LL is the set of all lines. For a comprehensive discussion and additional related works in the domain of line planning, please see Section 6.

GAP in an online setting has also been studied by Alaei et al. [2]. They propose a (11k)\left(1-\frac{1}{\sqrt{k}}\right)-competitive algorithm under the assumption that no item takes up more than 1k\frac{1}{k} fraction of the capacity of any bin. In the setting of [2], items arrive in an online manner; upon arrival, an item can be assigned to a bin or discarded; the objective is to maximize the total reward of the assignment. The size of an item is revealed only after it has been assigned to a bin; the distribution of each item’s reward and size is known in advance. The online algorithm is developed based on a generalization of the magician’s problem in [1], where it is used to assign a set of items with limited supply to a set of buyers in order to maximize the expected value of revenue or welfare.

2.2 Challenges

Our objective is to devise an algorithm that maintains a constant approximation ratio while being practically efficient. Balancing theoretical rigor with practical efficiency poses a significant challenge. Moreover, conventional approaches fail to yield a constant approximation ratio for our specific problem while meeting all practical constraints.

Firstly, while our problem shares similarities with the capacitated facility location problem (CFLP), for which [20] provides a 11e1-\frac{1}{e} approximation algorithm based on the CFLP’s submodular structure, our problem lacks this submodular property [17, Proposition 4.10]. Consequently, methods aimed at maximizing submodular functions, as seen in works like [15, 13, 9], are not applicable to our context. This necessitates the development of innovative approaches tailored to the unique challenges discussed in this paper.

Secondly, the classical LP-based randomized rounding algorithms, proposed by [11, 17], while simple, can have a non-trivial probability to produce solutions that violate the budget constraint. More specifically, the approach in [17], an adaptation of the method from [11], attempts to mitigate this by scaling down the LP solutions to manage the probability of budget constraint violations. While this method achieves a 11eϵ1-\frac{1}{e}-\epsilon-approximate solution, there remains a non-negligible probability, bounded by ek3ϵ2e^{-\frac{k}{3}\epsilon^{2}}, that the budget constraints may still be breached. This probability fails to guarantee a constant approximation ratio of feasible solutions and can exceed 12\frac{1}{2} even when kk is as large as 1ϵ2\frac{1}{\epsilon^{2}}.

To control this probability of violation, one might consider setting ϵ\epsilon to a relatively high value. However, this adjustment would negatively impact the solution quality, a dilemma extensively discussed in Remark 6.1, which provides a detailed analysis and additional insights into this trade-off.

In summary, classical randomized rounding struggles to manage budget constraints and ensure that each item is assigned to at most one bin while still achieving a constant approximation ratio, especially when the algorithm must be computationally efficient to handle large-scale problems.

2.3 Contributions

We introduce a randomized approximation algorithm that is not only practically efficient but also maintains a constant approximation ratio. Our numerical experiments, conducted using real-world data, also outperform the previous work.

Our preliminary major findings were initially presented as an extended abstract at COCOON [12]. It featured randomized approximation algorithms but lacked formal proofs and did not offer a constant approximation ratio. In this extended work, we have refined our algorithm to achieve a constant approximation ratio.

In Section 5, we introduce our randomized approximation algorithm for solving GBAP, which achieves a constant approximation ratio. Specifically, our algorithm achieves an approximation ratio of 18\frac{1}{8} for k<3k<3, and k12k(11k)>18\frac{k-1}{2k}\cdot\left(1-\frac{1}{\sqrt{k}}\right)>\frac{1}{8} for k3k\geq 3, where kk is defined in (1.2). To the best of our knowledge, this is the first algorithm to achieve a constant approximation ratio for GBAP.

Our analysis leverages two online optimization mechanisms detailed in Section 4. The study of online knapsack problems in Section 4.2 may be of independent interest. Furthermore, to overcome the limitations of classical randomized rounding, which struggles to simultaneously satisfy multiple constraints, we introduce a novel approach. Our method utilizes the realizations from randomized rounding to simulate two distinct online mechanisms. This innovative technique enables us to produce solutions that concurrently satisfy different constraints while maintaining theoretical guarantees. Furthermore, this method offers new insights into algorithm design for related complex optimization problems.

In Section 6, we apply our GBAP approximation algorithms to the RLPP . Using real-world data from New York City, we conduct numerical experiments that demonstrate improvements over previous approaches. These results highlight the practical significance of our algorithms in addressing complex transportation challenges.

3 Overview of Algorithm Design and Analysis

To provide a clear road-map of the technical discussion to follow, we first present an overview of our algorithm design and analysis, before delving into the specifics.

In our algorithm design, we will consider two scenarios with respect to the ratio of the budget to the largest line cost (kk as defined in (1.2)): k<3k<3 and k3k\geq 3. For each scenario, we develop an alternative randomized procedure (see Lemma 5.1 and 5.3) that, while complex to implement, is theoretically analyzable. We derive an approximation ratio for each procedure and subsequently show that a simpler algorithm we design (see Algorithm 1) does at least as well (up to a 12\frac{1}{2} factor), thereby proving an approximation ratio for our simpler algorithm. Our proposed algorithm can be viewed as a simplified method that is derived from these two procedures. A general high level illustration of the techniques is given in the flow chart shown in in Figure 3.

Refer to caption
Figure 3: Overview of Algorithm Design and Analysis

The design of these procedures follows a similar approach. Both procedures begin with randomized rounding based on the LP relaxation of the problem to select at most one SIlS\in I_{l} for each l[L]l\in[L]. This selection is represented by the indicator variables {ZlS}SIl{}\left\{Z_{l}^{S}\right\}_{S\in I_{l}\cup\{\emptyset\}}. Specifically, let X~\widetilde{X} represent the LP solution. For each l[L]l\in[L], a set TlT_{l} is randomly selected from Il{}I_{l}\cup\{\emptyset\} according to the following distribution:

P(Tl=S)={1SIlX~lSif S=,X~lSif SIl,P(T_{l}=S)=\begin{cases}1-\sum_{S\in I_{l}}\widetilde{X}_{lS}&\text{if }S=\emptyset,\\ \widetilde{X}_{lS}&\text{if }S\in I_{l},\end{cases} (3.1)

where the decision variable ZlS=1Z_{l}^{S}=1 if S=TlS=T_{l} and ZlS=0Z_{l}^{S}=0 otherwise, for each SIl{}S\in I_{l}\cup\{\emptyset\}.

Subsequently, each item pp can only be assigned to bins where the corresponding selected sets contain pp, i.e., pTlp\in T_{l}. Since TlIl{}T_{l}\in I_{l}\cup\{\emptyset\}, where IlI_{l} represents the family of all nonempty feasible assignments of items to bin ll as defined in (1.3), this condition ensures that the capacity constraints are satisfied. We then focus on the other two types of constraints: (a) The budget constraint; (b) The restriction that each item can be assigned at most once. We will make two types of decisions: (a) Whether to utilize bin ll, represented by the indicator variable ξl\xi_{l}; (b) Whether to assign item pp to bin ll (not necessarily utilized), represented by the indicator variable ZlpZ_{l}^{p}. This assignment occurs only when three conditions are met simultaneously:

  1. 1.

    Bin ll is assigned a set SS containing pp (ZlS=1Z_{l}^{S}=1 and pSp\in S)

  2. 2.

    Bin ll is utilized (ξl=1\xi_{l}=1)

  3. 3.

    Item pp is assigned to bin ll (not necessarily utilized) (Zlp=1Z_{l}^{p}=1)

In other words, item pp is assigned to a utilized bin ll only when ZlpZlSξl=1Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}=1 for some SS such that pSp\in S. Importantly, in our algorithm design, ZlpZ_{l}^{p} is set to be independent of both ZlSZ_{l}^{S} and ξl\xi_{l}. The rationale and implications of this independence will be discussed in detail later in this section.

The expectation of the rewards can then be expressed as:

𝔼[l[L]SIlpSvlpZlpZlSξl]=l[L]SIlpSvlp𝔼[ZlpZlSξl]\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]=\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}\mathbb{E}\left[Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]

This equality is due to the linearity of expectation and the linear objective function. It allows us to analyze each term 𝔼[ZlpZlSξl]\mathbb{E}\left[Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right] respectively.

The realizations of ZlSZ_{l}^{S} have been determined through the previously described randomized rounding process. Subsequently, we determine ξl\xi_{l} and ZlpZ_{l}^{p} by simulating two separate online decision-making processes based on these realizations.

To determine ξl\xi_{l} for l[L]l\in[L], we simulate an online decision-making process using the realizations of Hl:=SIlZlS{0,1}H_{l}:=\sum_{S\in I_{l}}Z_{l}^{S}\in\{0,1\}, revealing results as ll goes from 11 to LL. This process is adapted from an online knapsack problem decision-making process (see Section 4.2). The process consists of LL stages. At the llth stage, the algorithm decides whether to utilize bin ll (set ξl=1\xi_{l}=1) based on the realizations of HlH_{l^{\prime}} for lll^{\prime}\leq l and the remaining budget. For k3k\geq 3, it ensures that the overall costs of bin utilization are within the budget, i.e. l[L]clξlB\sum_{l\in[L]}c_{l}\cdot\xi_{l}\leq B, while for k<3k<3, the costs are approximately within the budget. It is important to note that, in contrast to these two alternative procedures, our proposed algorithm ensures strict adherence to the budget constraint in both cases.

For each item pp, to determine ZlpZ_{l}^{p} for l[L]l\in[L], we simulate an online decision-making process using the realizations of Y~lp:=SIl:pSZlS{0,1}\widetilde{Y}_{lp}:=\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}Z_{l}^{S}\in\{0,1\}. This process iterates over ll in reverse order, from LL to 11, and is based on the magician’s mechanism (see Section 4.1). The process consists of LL stages. At the (Ll+1)(L-l+1)th stage222We use (Ll+1)(L-l+1) instead of ll to align with the descending order of ll, which simplifies subsequent index references of this sentence., the algorithm decides whether to assign item pp to bin ll (i.e., set Zlp=1Z_{l}^{p}=1) based on the realizations of Y~lp\widetilde{Y}_{l^{\prime}p} and the previous decisions ZlpZ_{l^{\prime}}^{p} for l>ll^{\prime}>l. It is important to observe that this iteration order for determining ZlpZ_{l}^{p} is inverse to the one used for determining ξl\xi_{l}. This inverse ordering plays a crucial role in ensuring the independence of these decision variables in our analysis.

The magician’s mechanism ensures that (i) l[L]S:pSZlpZlS1\sum_{l\in[L]}\sum_{S:p\in S}Z_{l}^{p}Z_{l}^{S}\leq 1, (ii) ZlpZ_{l}^{p} is independent of ZlSZ_{l}^{S}, and (iii) 𝔼[Zlp]12\mathbb{E}[Z_{l}^{p}]\geq\frac{1}{2}. Note that point (i) ensures that item pp will be assigned to at most one bin. Due to the inverse revealing order for determining ξl\xi_{l}, ZlpZ_{l}^{p} is also independent of ξl\xi_{l}. Consequently:

𝔼[ZlpZlSξl]=𝔼[Zlp]𝔼[ZlSξl]12𝔼[ZlSξl]\mathbb{E}\left[Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]=\mathbb{E}\left[Z_{l}^{p}\right]\cdot\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right]\geq\frac{1}{2}\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right]

.

Therefore,

𝔼[l[L]SIlpSvlpZlpZlSξl]l[L]SIlpSvlp2𝔼[ZlSξl]=12𝔼[l[L]SIlpSvlpZlSξl]\displaystyle\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]\geq\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}\frac{v_{lp}}{2}\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right]=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\cdot\xi_{l}\right] (3.2)

The last term is then comparable to the outcome of the online knapsack problem decision-making process defined in Section 4.2, which helps us derive the desired result for these two alternative procedures. Finally, we compare our algorithm (Algorithm 1) to these two procedures to establish the desired approximation ratio.

Having provided an overview of our approach, we now outline the structure of the two subsequent sections. In Section 4, we introduce two key online mechanisms: the magician’s mechanism and online fractional knapsack problems. These mechanisms are essential for developing the two alternative procedures, and we present their relevant results. Section 5 then details our proposed algorithm, along with the two alternative procedures and their associated theoretical outcomes.

4 Essential Mechanisms for Algorithm Design and Analysis

As discussed in Section 3, our algorithm’s analysis is based on two key online optimization mechanisms. The first is the generalized γ\gamma-conservative magician, originally proposed by Alaei et al. [1]. This mechanism serves a dual purpose: it determines ZlpZ_{l}^{p}, as outlined in Section 3, and provides a part of the theoretical foundation for proving results in the second mechanism. We present this mechanism here for completeness. The second mechanism, which we will prove the efficacy of subsequently, adapts concepts from online knapsack problems and may be of independent interest to the broader optimization community.

4.1 Generalized Magician’s Problem [1, Section 7]

Definition 4.1.

(the generalized magician’s problem). A magician is presented with a sequence of LL boxes one by one in an online fashion. The magician has δ\delta units of mana. The magician can only open box ii if she has at least 11 unit of mana. If box ii is opened, the magician loses a random amount of mana Xi[0,1]X_{i}\in[0,1] drawn from a distribution specified on box ii by its cumulative distribution function FXiF_{X_{i}}. It is assumed that 𝔼[i=1LXi]δ\mathbb{E}\left[\sum_{i=1}^{L}X_{i}\right]\leq\delta. The magician wants to open each box with ex ante probability at least γ\gamma, i.e. not conditioned on XiX_{i} for i=1,,Li=1,\ldots,L, for a constant γ[0,1]\gamma\in[0,1] as large as possible.

The problem can be solved with a near-optimal solution using the following mechanism.

Definition 4.2.

(generalized γ\gamma-conservative magician) The magician makes a decision about each box as follows: Let the random variable WiW_{i} denote the amount of mana lost prior to seeing the iith box, and FWi(w):=P[Wiw]F_{W_{i}}(w):=P[W_{i}\leq w] denote the ex ante CDF of random variable WiW_{i}. Define random binary variable SiS_{i} conditional on WiW_{i} as follows:

P[Si=1|Wi]\displaystyle P[S_{i}=1|W_{i}] ={1,Wi<θi(γFWi(θi))/(FWi(θi)FWi(θi)),Wi=θi0,Wi>θi.\displaystyle=\begin{cases}1,&W_{i}<\theta_{i}\\ (\gamma-F^{-}_{W_{i}}(\theta_{i}))/(F_{W_{i}}(\theta_{i})-F^{-}_{W_{i}}(\theta_{i})),&W_{i}=\theta_{i}\\ 0,&W_{i}>\theta_{i}.\end{cases}
θi\displaystyle\theta_{i} =min{w|FWi(w)γ} and FWi(w)=P[Wi<w].\displaystyle=\min\{w|F_{W_{i}}(w)\geq\gamma\}\mbox{ and }F^{-}_{W_{i}}(w)=P[W_{i}<w].

Then given WiW_{i} and before seeing the iith box, the magician will decide to open the iith box if Si=1S_{i}=1.

Note that W1=0W_{1}=0 with probability 11, and therefore θ1=0\theta_{1}=0. Then the CDF of Wi+1W_{i+1} and θi+1\theta_{i+1} can be computed based on WiW_{i} and θi\theta_{i} in with dynamic programming. For more details, please see [1, Section 7].

One observation is that SiS_{i} is independent to XiX_{i} for all ii.

Theorem 4.3.

[1, Theorem 7.3] For any γ11δ\gamma\leq 1-\frac{1}{\sqrt{\delta}}, we have θiδ1\theta_{i}\leq\delta-1 for all ii, and a generalized γ\gamma-conservative magician with kk units of mana opens each box with an ex ante probability of exactly γ\gamma, i.e., P(Si=1)=γP(S_{i}=1)=\gamma. If all XiX_{i} are Bernoulli random variables, i.e., Xi{0,1}X_{i}\in\{0,1\} for all ii, and δ\delta is an integer, then for any γ11δ+3\gamma\leq 1-\frac{1}{\sqrt{\delta+3}}, we have θiδ1\theta_{i}\leq\delta-1 for all ii, and a generalized γ\gamma-conservative magician with δ\delta units of mana opens each box with an ex ante probability of exactly γ\gamma.

Remark 4.4.

In Theorem 4.3, δ\delta can be non-integer when XiX_{i} is not restricted to be Bernoulli variable as mentioned in [2] and by its proof.

Application in determining ZlpZ_{l}^{p}: For each item pp, we construct a generalized 12\frac{1}{2}-conservative magician with one unit of mana. This magician faces LL boxes, where the mana cost for opening each box follows a Bernoulli distribution. This distribution is determined by the distribution of ZlSZ_{l}^{S} resulting from the randomized rounding process. The magician’s decisions on whether to open these boxes directly correspond to the determination of ZlpZ_{l}^{p}. For a detailed description of this process, please refer to Step 3 of the procedures outlined in Lemma 5.1 and Lemma 5.3.

4.2 Online Fractional Knapsack Problems

Definition 4.5 (Online Fractional Knapsack Problem with Pre-arranged Arrival Order).

Consider a sequence of nn independent rounds. In each round ii, an object is characterized by a positive reward rir_{i}, a non-negative weight wiw_{i}, and a positive appearance probability pip_{i}. Additionally, there is a specified capacity δ1\delta\geq 1, ensuring that the cumulative weight of the objects accepted does not exceed this capacity. The appearance of each object follows a Bernoulli distribution. The problem is subject to two constraints:

i=1nwipiδ,0<wi1.\sum_{i=1}^{n}w_{i}\cdot p_{i}\leq\delta,\quad 0<w_{i}\leq 1.

In this setting, we assume full knowledge of the object distribution. At the ii-th round, we observe the appearance of object ii before deciding a fraction βi[0,1]\beta_{i}\in[0,1] of object ii to accept. Acceptance is conditioned on the remaining capacity being at least βiwi\beta_{i}\cdot w_{i}. Accepting the object results in a reduction of the remaining capacity by βiwi\beta_{i}\cdot w_{i} and a secured reward of βiri\beta_{i}\cdot r_{i}. All decisions, once made, are irreversible.

It is assumed that the sequence of object appearances can be pre-determined before the initiation of the rounds.

The goal is to devise an algorithm that optimizes an efficiency coefficient α(0,1)\alpha\in(0,1) such that the expected reward, at the end of nn rounds, is αi=1nripi\alpha\cdot\sum_{i=1}^{n}r_{i}\cdot p_{i}.

Remark 4.6.

The term “object” is used here instead of “item” to provide a distinction from the “item” used in GBAP. Specifically, in the analysis presented in Section 5, the allocation of objects in this online knapsack problem will be compared to the utilization of bins in GBAP.

Lemma 4.7 (Expected Reward for Prearranged Arrival Order).

Consider the problem defined in Definition 4.5. Assume the order of object appearances is prearranged such that

r1w1r2w2rnwn.\displaystyle\frac{r_{1}}{w_{1}}\geq\frac{r_{2}}{w_{2}}\geq\ldots\geq\frac{r_{n}}{w_{n}}. (4.1)

Furthermore, let the strategy be to accept objects fully in order of appearance until the remaining capacity is insufficient to accept the next appearing object fully. At this point, accept this next appearing object fractionally to exactly exhaust the remaining capacity. Under these conditions, the expected reward generated at the end of nn rounds is

max{12,(11δ)}i[n]ripi.\max\left\{\frac{1}{2},\left(1-\frac{1}{\sqrt{\delta}}\right)\right\}\cdot\sum_{i\in[n]}r_{i}\cdot p_{i}.
Proof.

We begin by establishing the ratio of (11δ)\left(1-\frac{1}{\sqrt{\delta}}\right). According to Definition 4.2 and Theorem 4.3, the problem defined in Definition 4.5 admits a solution with this ratio when using the Magician’s mechanism. Note that in this case, the acceptance fraction βi{0,1}\beta_{i}\in\{0,1\} for all i[n]i\in[n] due to the definition of Magician’s mechanism. It is evident that the strategy outlined in Lemma 4.7 outperforms the Magician’s mechanism in terms of efficiency and reward due to the pre-arranged order of objects as in (4.1). Therefore, this concludes the proof for this part.

As for the ratio of 12\frac{1}{2}, we will conduct the similar comparison, but with the help of the method in Manshadi et al. [16]. In their work, they defined a method such that at the end of nn rounds, it achieves that 𝔼[mini[n]βi]12\mathbb{E}\left[\min_{i\in[n]}\beta_{i}\right]\geq\frac{1}{2}. For more details, please see Theorem 2 in [16] when μ1\mu\leq 1. This means that the method finally yields expected reward at least

12i[n]ripi.\frac{1}{2}\cdot\sum_{i\in[n]}r_{i}\cdot p_{i}.

Again, the greedy strategy outlined in Lemma 4.7 outperforms this method in terms of reward due to the pre-arranged order of objects as in (4.1). Therefore, this concludes the proof. ∎

Application in determining ξl\xi_{l} and deriving approximation ratio: To apply the results of Lemma 4.7 in determining whether to utilize a bin ll (i.e., determine ξl\xi_{l} as mentioned in Section 3), we simulate the online knapsack problem in Algorithm 1. Specifically, after a randomized rounding process to realize ZlSZ_{l}^{S}, we reindex the LL bins following the rule in (4.1) of Algorithm 1. We then simulate the online knapsack problem as described in Section 3. This approach serves two purposes: (1) it enables us to determine ξl\xi_{l} by simulating the decision-making process for the online knapsack problem, and (2) it allows us to compare the last term in (3.2) to the outcome of the online knapsack problem, enabling us to leverage the bound established in Lemma 4.7 to derive the desired approximation ratio.

5 Approximation Algorithm

In this section, we consider solving GBAP. Let kk be the ratio of the budget to the maximum bin cost:

k:=Bmaxl[L]cl.k:=\frac{B}{\max_{l\in[L]}c_{l}}.

For the sake of simplicity and without loss of generality, we can scale the budget and costs to adhere to the following assumption:

Assumption 1.

The given budget is kk, and for all l[L]l\in[L], cl1c_{l}\leq 1, with maxl[L]cl=1\max_{l\in[L]}c_{l}=1.

The LP relaxation can be solved in polynomial time by solving its dual with ellipsoid method as in [11, 17].

Algorithm 1.
  1. 1.

    Find a solution to the LP relaxation of (1.4) and let X~\widetilde{X} be the solution.

  2. 2.

    Reorder the bins so that

    SIl(X~lSpSvlp)clSIlX~lSSIl(X~lSpSvlpS)clSIlX~lS for 1l<lL.\displaystyle\frac{\sum_{S\in I_{l}}\left(\widetilde{X}_{lS}\sum_{p\in S}v_{lp}\right)}{c_{l}\sum_{S\in I_{l}}\widetilde{X}_{lS}}\geq\frac{\sum_{S\in I_{l^{\prime}}}\left(\widetilde{X}_{l^{\prime}S}\sum_{p\in S}v_{l^{\prime}p}^{S}\right)}{c_{l^{\prime}}\sum_{S\in I_{l^{\prime}}}\widetilde{X}_{l^{\prime}S}}~{}~{}~{}~{}~{}~{}\mbox{ for }1\leq l<l^{\prime}\leq L. (5.1)

    Note that here if the denominators are zeros, the corresponding numerators are also zeros, and the corresponding bins can be ignored in the following rounding process. Thus, we can define 00:=0\frac{0}{0}:=0 in (5.1).

  3. 3.

    Independently for each l[L]l\in[L], randomly select a set TlT_{l} from Il{}I_{l}\cup\{\emptyset\} following the distribution below:

    P(Tl=S)={1SIlX~lSif S=X~lSif SIl.P({T}_{l}=S)=\begin{cases}1-\sum_{S\in I_{l}}\widetilde{X}_{lS}&\text{if }S=\emptyset\\ \widetilde{X}_{lS}&\text{if }S\in I_{l}\end{cases}.

    Let the (random) decision variable be ZlSZ_{l}^{S}, i.e., ZlS=1Z_{l}^{S}=1 if S=TlS=T_{l}, and ZlS=0Z_{l}^{S}=0 otherwise for SIl{}S\in I_{l}\cup\{\emptyset\}.

  4. 4.

    Temporarily assign all the elements in TlT_{l} to bin ll for each l[L]l\in[L].

  5. 5.

    For each item pp, retain it in the bin where it yields the highest reward vlpv_{lp} and remove it from others.

  6. 6.

    Open all the bins that have been assigned with at least one item, and close all the other bins. If this assignment does not exceed the budget, then we take this assignment as the final solution.

  7. 7.

    Otherwise, discard this assignment. We construct a new assignment.

  8. 8.

    Let B0=kB_{0}=k. Let ll be from 11 to LL. At llth iteration, let Bl:=Bl1clSIlZlS0B_{l}:=B_{l-1}-{c}_{l}\sum_{S\in I_{l}}Z_{l}^{S}\geq 0.

  9. 9.

    If k3k\geq 3, let :={l:Bl0,SIlZlS>0}\mathcal{L}^{*}:=\{l:B_{l}\geq 0,~{}\sum_{S\in I_{l}}Z_{l}^{S}>0\}. Assign TlT_{l} to ll for each ll\in\mathcal{L}^{*} and open it. For each item pp, retain it in the bin where it yields the highest reward vlpv_{lp} and remove it from others.

  10. 10.

    If k<3k<3, let :={l:Bl1>0,SIlZlS>0}\mathcal{L}^{*}:=\{l:B_{l-1}>0,~{}\sum_{S\in I_{l}}Z_{l}^{S}>0\}. Assign TlT_{l} to ll for each ll\in\mathcal{L}^{*} and temporarily open it. For each item pp, retain it in the bin where it yields the highest reward vlpv_{lp} and remove it from others. Let l=maxlll^{*}=\max_{l\in\mathcal{L}^{*}}l. Then we open the bins in either \{l}\mathcal{L}^{*}\backslash\{l^{*}\} or in {l}\{l^{*}\}, whichever generated more rewards.

Our proof is divided into two cases: k3k\geq 3 and k<3k<3. For both scenarios, we employ a two-step approach. First, we analyze the expected reward generated by a specific procedure of assignment based on two mechanisms: the magician’s mechanism defined in Definition 4.2 and the mechanism adapted from the online fractional knapsack problem discussed in Section 4.2. Second, we compare the rewards generated by Algorithm 1 with those from this constructed procedure.

We begin by addressing the case where k3k\geq 3. For this scenario, we construct a specific procedure and provide its analysis in the following lemma.

Lemma 5.1.

Assume k3k\geq 3. With the same notations as in Algorithm  1, we define a procedure as follows:

  1. 1.

    For each l[L]l\in[L], let Hl=SIlZlSH_{l}=\sum_{S\in I_{l}}Z_{l}^{S}, which indicates whether any set SS has been selected for bin ll in the randomized rounding process. Define ξl=1\xi_{l}=1 if Bl0B_{l}\geq 0 and Hl=1H_{l}=1. Otherwise, set ξl=0\xi_{l}=0. Here BlB_{l} represents the remaining budget after considering the first ll bins. Thus, ξl=1\xi_{l}=1 indicates that bin ll is utilized, which occurs when there is sufficient budget (Bl0B_{l}\geq 0) and a set has been selected for this bin (Hl=1H_{l}=1).

  2. 2.

    For every l[L]l\in[L] and p[P]p\in[P], introduce a variable Y~lp\widetilde{Y}_{lp}, given by

    Y~lp=SIl:pSZlS.\widetilde{Y}_{lp}=\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}Z_{l}^{S}.

    It indicates whether item pp is included in the selected set for bin ll, and its probability distribution is

    P(Y~lp=y)={1SIl:pSX~lS,if y=0,SIl:pSX~lS,if y=1.P(\widetilde{Y}_{lp}=y)=\begin{cases}1-\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}\widetilde{X}_{lS},&\text{if }y=0,\\ \sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}\widetilde{X}_{lS},&\text{if }y=1.\end{cases}
  3. 3.

    For each item p[P]p\in[P], create a generalized 12\frac{1}{2}-conservative magician with 11 unit of mana (called type-p magician). The magician of item pp is presented with a sequence of LL type-pp boxes from box LL to box 11. The distribution of Y~lp\widetilde{Y}_{lp} is written on type-pp box ll. The amount of mana the magician will lose if open type-pp box ll is equal to the value of Y~lp{0,1}\widetilde{Y}_{lp}\in\{0,1\}. The magician will decide whether to open type-pp box ll following the mechanism defined in Definition 4.2. Let the (random) decision variable be ZlpZ_{l}^{p}, i.e., Zlp=1Z_{l}^{p}=1 if the magician for pp decides to open type-pp box ll, and Zlp=0Z_{l}^{p}=0 otherwise.

If we utilize bin ll when ξl=1\xi_{l}=1, and assign item pp to bin ll whenever ZlpY~lpξl=1Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}=1, then the assignment is feasible, which means that:

  1. (a)

    Each passenger is assigned to at most one utilized bin.

  2. (b)

    The total cost of all utilized bins is within the given budget kk.

  3. (c)

    The capacity constraints are respected.

Moreover, the expected approximation ratio derived from this assignment is at least (k12k)(11k)>18\left(\frac{k-1}{2k}\right)\cdot\left(1-\frac{1}{\sqrt{k}}\right)>\frac{1}{8}.

Proof.

We first prove its feasibility. We start with point (a). In Step 3, ZlpZ_{l}^{p} represents the decision of the type-pp magician on whether to open the type-pp box with index ll. Meanwhile, Y~lp\widetilde{Y}_{lp} indicates whether opening the box requires spending 1 unit of mana. Based on the definition of the magician’s mechanism in Definition 4.2 and the results stated in Theorem 4.3, the type-pp magician will spend at most one unit of mana during this process. Therefore, l=1LZlpY~lp1\sum_{l=1}^{L}Z_{l}^{p}\cdot\widetilde{Y}_{lp}\leq 1. Since ZlpY~lpξlZlpY~lpZ_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}\leq Z_{l}^{p}\cdot\widetilde{Y}_{lp}, so if we assign item pp to bin ll when ZlpY~lpξl=1Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}=1, it can can only be assigned at most once through this process. As for point (b) and (c), they can be checked directly by definition.

The expected reward derived from the procedure in Lemma 5.1 can be expressed as:

𝔼[l[L]p[P]vlpZlpY~lpξl]\displaystyle\mathbb{E}\left[\sum_{l\in[L]}\sum_{p\in[P]}v_{lp}Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}\right] =𝔼[l[L]SIlpSvlpZlpZlSξl]=l[L]SIlpSvlp𝔼[ZlpZlSξl]\displaystyle=\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]=\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}\mathbb{E}\left[Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]
=l[L]SIlpSvlp𝔼[Zlp]𝔼[ZlSξl]\displaystyle=\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}\mathbb{E}\left[Z_{l}^{p}\right]\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right] (5.2)
12l[L]SIlpSvlp𝔼[ZlSξl]\displaystyle\geq\frac{1}{2}\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right] (5.3)
=12𝔼[l[L]SIlpSvlpZlSξl]=12l[L]𝔼[SIlpSvlpZlS|ξl=1]P[ξl=1]\displaystyle=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\cdot\xi_{l}\right]=\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\Bigg{|}\xi_{l}=1\right]\cdot P\left[\xi_{l}=1\right]
=12l[L]𝔼[SIlpSvlpZlS|Hl=1]P[ξl=1]\displaystyle=\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\Bigg{|}H_{l}=1\right]\cdot P\left[\xi_{l}=1\right] (5.4)

Eq. (5.2) results from observing that ZlpZ_{l}^{p} is independent of ξl\xi_{l}, and ZlSZ_{l}^{S} for each l[L]l\in[L] and SIlS\in I_{l}. This independence arises because the magician’s mechanism ensures that ZlpZ_{l}^{p} only depends on Y~lp\widetilde{Y}_{l^{\prime}}^{p} and ZlpZ_{l^{\prime}}^{p} for l{l+1,l+2,,L}l^{\prime}\in\{l+1,l+2,\ldots,L\} as defined in Step 3 of the procedure, while ξl\xi_{l} depends on ZlSZ_{l^{\prime}}^{S} for l{1,2,,l}l^{\prime}\in\{1,2,\ldots,l\} and SIlS\in I_{l}. (5.3) occurs because ZlpZ_{l}^{p} represents the decision of a type-pp magician to open the box or not, constrained by having only one unit of mana, leading to E[Zlp]11+3=12E[Z_{l}^{p}]\geq\frac{1}{\sqrt{1+3}}=\frac{1}{2} according to Theorem 4.3.

As for Eq. (5.4), it is due to ξl\xi_{l} depending only on HlH_{l} and HlH_{l^{\prime}} for l<ll^{\prime}<l, and ZlSZ_{l}^{S} being independent of HlH_{l^{\prime}} for l<ll^{\prime}<l.

Let us define

rl:=𝔼[SIlpSvlpSZlSHl=1]=SIl(X~lSpSvlp)SIlX~lS.r_{l}:=\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v^{S}_{lp}Z_{l}^{S}\mid H_{l}=1\right]=\frac{\sum_{S\in I_{l}}\left(\widetilde{X}_{lS}\sum_{p\in S}v_{lp}\right)}{\sum_{S\in I_{l}}\widetilde{X}_{lS}}.

Subsequently, Eq. (5.4) can be expressed as

12l[L]𝔼[rlξl]=12𝔼[l[L]rlξl].\displaystyle\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[r_{l}\cdot\xi_{l}\right]=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]. (5.5)

To establish the lower bound of Eq. (5.4), we construct an instance of the online fractional knapsack problem, as in Definition 4.5, consisting of LL rounds. In each round ll, an object appears with a probability P(Hl=1)P(H_{l}=1), possessing a weight clc_{l} and a reward rlr_{l}. The initial capacity is equal to the budget kk. As specified in Step 2 of Algorithm 1, it satisfies that r1c1r2c2rLcL\frac{r_{1}}{c_{1}}\geq\frac{r_{2}}{c_{2}}\geq\ldots\geq\frac{r_{L}}{c_{L}}, allowing us to apply Lemma 4.7.

In the constructed instance, if we fully accept each appearing object in sequence until the remaining capacity is inadequate for the full acceptance of the next appearing object, and then accept the next object fractionally to precisely exhaust the remaining capacity, then, according to Lemma 4.7 with k3k\geq 3, the expected rewards obtained in this online knapsack problem are at least:

(11k)l[L]rlP(Hl=1)=(11k)l[L]p[P]SIl:pSvlpX^lS.\displaystyle\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}r_{l}\cdot P(H_{l}=1)=\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

If we instead only accept objects fully and stop when the remaining capacity is insufficient for the next appearing object (i.e., we do not accept any object fractionally), the reward equals l[L]rlξl\sum_{l\in[L]}r_{l}\cdot\xi_{l}. Since cl1c_{l}\leq 1, the capacity is kk, and r1c1r2c2rLcL\frac{r_{1}}{c_{1}}\geq\frac{r_{2}}{c_{2}}\geq\ldots\geq\frac{r_{L}}{c_{L}}, then we have that

𝔼[l[L]rlξl]\displaystyle\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right] k1k(11k)l[L]rlP(Hl=1)\displaystyle\geq\frac{k-1}{k}\cdot\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}r_{l}\cdot P(H_{l}=1)
=k1k(11k)l[L]p[P]SIl:pSvlpX^lS.\displaystyle=\frac{k-1}{k}\cdot\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}. (5.6)

Summarizing (5.4),(5.5) and (5.6), we conclude that

𝔼[l[L]SIlpSvlpZlpZlSξl]=12𝔼[l[L]rlξl]k12k(11k)l[L]p[P]SIl:pSvlpX^lS.\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]\geq\frac{k-1}{2k}\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

This completes the proof. ∎

Theorem 5.2.

Assume k3k\geq 3. The solution given by Algorithm 1 is feasible. The expected approximation ratio of the solution given by Algorithm 1 is at least (k12k)(11k)>18\left(\frac{k-1}{2k}\right)\cdot\left(1-\frac{1}{\sqrt{k}}\right)>\frac{1}{8}.

Proof.

The feasibility can be verified directly. Given the same realization of ZlSZ_{l}^{S} in Step 3 of Algorithm 1, both Algorithm 1 and the procedure defined in Lemma 5.1 will utilize identical bins.

Moreover, under this realization, item pp can only be assigned to bins where ZlS=1Z_{l}^{S}=1 and pSp\in S. This condition holds true for both Algorithm 1 and the procedure in Lemma 5.1.

Given the realizations of ZlSZ_{l}^{S}, the optimal reward is achieved by retaining each item pp in the bin where it yields the highest reward vlpv_{lp}, subject to ZlS=1Z_{l}^{S}=1 and pSp\in S. This is precisely what Step 9 of Algorithm 1 accomplishes.

Therefore, the expected reward of Algorithm 1 will be at least as high as that derived from the procedure defined in Lemma 5.1. ∎

The above proof is constructed by creating a procedure that produces a feasible assignment solution, then comparing Algorithm 1 with it to derive the desired result. For the case when k<3k<3, we will instead produce a procedure that generates an assignment solution that may violate the budget constraint. We will then compare this with Algorithm 1 to derive the desired result. Although the approaches differ in terms of solution feasibility of the constructed procedures, the key ideas remain similar. Both rely on the magician’s mechanism and the results from the online fractional knapsack problem presented in Section 4.

Lemma 5.3.

Assume k<3k<3. With the same notations as in Algorithm  1, we define a procedure as follows:

  1. 1.

    For each l[L]l\in[L], let Hl=SIlZlSH_{l}=\sum_{S\in I_{l}}Z_{l}^{S}. Set ξl=1\xi_{l}=1 if Bl10B_{l-1}\geq 0 and Hl=1H_{l}=1. Otherwise, set ξl=0\xi_{l}=0. This ensures that bin ll is utilized (ξl=1\xi_{l}=1) only when the remaining budget for ll is non-negative (Bl10B_{l-1}\geq 0) and a set has been selected for this bin (Hl=1H_{l}=1).

  2. 2.

    For every l[L]l\in[L] and p[P]p\in[P], introduce a random variable Y~lp\widetilde{Y}_{lp}, given by

    Y~lp=SIl:pSZlS.\widetilde{Y}_{lp}=\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}Z_{l}^{S}.

    Its probability distribution is

    P(Y~lp=y)={1SIl:pSX~lS,if y=0,SIl:pSX~lS,if y=1.P(\widetilde{Y}_{lp}=y)=\begin{cases}1-\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}\widetilde{X}_{lS},&\text{if }y=0,\\ \sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}\widetilde{X}_{lS},&\text{if }y=1.\end{cases}
  3. 3.

    For each item p[P]p\in[P], create a generalized 12\frac{1}{2}-conservative magician with 11 unit of mana (called type-p magician). The magician of item pp is presented with a sequence of LL type-pp boxes from box LL to box 11. The distribution of Y~lp\widetilde{Y}_{lp} is written on type-pp box ll. The amount of mana the magician will lose if open type-pp box ll is equal to the value of Y~lp{0,1}\widetilde{Y}_{lp}\in\{0,1\}. The magician will decide whether to open type-pp box ll following the mechanism defined in Definition 4.2. Let the (random) decision variable be ZlpZ_{l}^{p}, i.e., Zlp=1Z_{l}^{p}=1 if the magician for pp decides to open type-pp box ll, and Zlp=0Z_{l}^{p}=0 otherwise.

If we utilize bin ll when ξl=1\xi_{l}=1, and assign item pp to bin ll whenever ZlpY~lpξl=1Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}=1 for some SIlS\in I_{l} and pSp\in S, then the assignment satisfies:

  1. (a)

    Each passenger is assigned to at most one utilized bin.

  2. (b)

    Let the total cost of all utilized bins be kk^{\prime} and define l=max{l[L]:ξl=1}l^{*}=\max\{l\in[L]:\xi_{l}=1\}. Then we have kclkk^{\prime}-c_{l^{*}}\leq k. In other words, if we remove the utilized bin with the largest index, the total cost of the remaining bins is within the given budget kk.

  3. (c)

    The capacity constraints are respected.

Moreover, the expected approximation ratio derived from this assignment is at least 14\frac{1}{4}.

Proof.

As the proof employs reasoning similar to that used in Lemma 5.1, we have placed the full proof in Appendix A to maintain the flow of the main text. ∎

Theorem 5.4.

Assume k<3k<3. The solution given by Algorithm 1 is feasible. The expected approximation ratio of the solution given by Algorithm 1 is at least 18\frac{1}{8}.

Proof.

The feasibility can be verified directly. Given the same realization of ZlSZ_{l}^{S} in Step 3 of Algorithm 1, the algorithm will temporarily open the same bins as those utilized by the procedure defined in Lemma 5.3.

Moreover, under this realization, item pp can only be assigned to bins where ZlS=1Z_{l}^{S}=1 and pSp\in S. This condition holds true for both Algorithm 1 and the procedure in Lemma 5.3.

Given the realizations of ZlSZ_{l}^{S}, the optimal reward for these temporarily opened bins is achieved by retaining each item pp in the bin where it yields the highest reward vlpv_{lp}, subject to ZlS=1Z_{l}^{S}=1 and pSp\in S. This is precisely what Step 10 of Algorithm 1 does for those temporarily opened bins. Consequently, the total reward from these temporarily opened bins is at least as large as that derived from the procedure in Lemma 5.3.

The key difference from Theorem 5.1 is that the temporarily opened bins might violate the budget constraint. To address this, we compare the reward of the temporarily opened bin with the largest index (ll^{*} in Step 10 of Algorithm 1) to the reward of all other temporarily opened bins (\{l}\mathcal{L}^{*}\backslash\{l^{*}\} in Step 10 of Algorithm 1). Whichever is larger will retain at least half of the total reward.

Moreover, based on point (b) of Lemma 5.3, either choice will stay within the budget. This comparison and selection process is implemented in Step 10 of Algorithm 1.

Therefore, the expected reward of Algorithm 1 will be at least half of that derived from the procedure defined in Lemma 5.3, which yields the expected approximation ratio as 1214=18\frac{1}{2}\cdot\frac{1}{4}=\frac{1}{8}. ∎

6 Applications in transportation

In this section, we introduce an application of GBAP in a multi-modal Mobility-on-Demand (MoD) setting, illustrating how the framework can be adapted to address practical transportation problems. Consider a MoD operator managing a fleet of single-capacity cars and high-capacity buses. The operator aims to provide a multi-modal service with buses operating on fixed routes and cars functioning in a demand-responsive manner. An individual trip request can be served by a bus, a car, or a combination of both.

This scenario could be envisioned as a transit agency planning a multi-modal service where traditional buses are complemented by a demand-responsive service, or a private rideshare operator integrating fixed-route shuttles with taxis. This problem was investigated by [17]. There is a wide range of literature studying the traditional line-planning problem. For instance, [6] introduced a new multicommodity flow model for line planning. Their model aims to strike a balance between minimizing operating costs for transport companies and reducing travel times for passengers. On the other hand, [4] provides a holistic approach for transit line planning using column generation. For additional literature on line planning problems in public transportation, readers are directed to [18].

By applying GBAP to this MoD scenario, we can optimize resource allocation and route planning, maximizing the efficiency and service quality of the transportation system while adhering to budget constraints. This demonstrates the versatility and practical value of GBAP in addressing complex, real-world transportation challenges.

In this problem, the transit network is modeled as an undirected weighted graph G=(V,E)G=(V,E), where VV contains nn nodes as potential origin/destination nodes, and EE is the set of edges representing the shortest paths between the nodes. The weights of the edges are the cost required to cross an edge eEe\in E. A set \mathcal{L} of candidate bus routes/lines on GG is given to the operator. Each candidate route / line is defined as a fixed sequence of consecutive edges of GG, and it has an operating cost clc_{l}. There is a budget BB for operating bus lines. Each bus has capacity CC and each line ll has frequency flf_{l}. Thus, the overall capacity of a bus line is CflC\cdot f_{l}. The cars cover ’first-last mile’ travel. It is assumed that there are no inter-bus transfers for each passenger. If the passenger pp is assigned to line ll, then vlp0v_{lp}\geq 0 is defined as the welfare value contributed to the system by this matching. The operator is responsible for selecting a set of bus lines from \mathcal{L} to operate in a way that maximizes the overall welfare of the system. The value of vlpv_{lp} is calculated according to the rules given, e.g. binary based on whether the matching is feasible or the number of car miles saved due to the matching (for more details, please see [17]). The goal is to find a subset of lines to be operated and an assignment of passengers to opened lines to maximize the welfare of the system, such that: (a) the cost of opened lines does not exceed the budget; (b) the assignment of passengers and their trips to the opened line ll respect the capacity constraints of each edge on line ll; (c) a passenger is assigned to at most one line.

Now we can describe how this problem fits into the proposed framework. For each line ll with nln_{l} edges, we map it to a bin in GBAP with an nln_{l}-dimensional capacity vector. Each entry in the capacity vector of line ll is set to CflC\cdot f_{l}. Each passenger is taken as an item in GBAP with a 010-1 weight vector rlp:=(rlp(1),,rlp(nl)){0,1}nlr_{lp}:=(r_{lp}^{(1)},\ldots,r_{lp}^{(n_{l})})\in\{0,1\}^{n_{l}}. If the iith edge of line ll is used when passenger pp is matched to line ll, then we set the iith entry of rlpr_{lp} to 11, and 0 otherwise. Note that rlpr_{lp} has consecutive 11s according to the definition of a route in RLPP. The cost of utilizing bin ll is set to be clc_{l} while the cost of assigning item pp to bin ll is assumed to be 0. Contributed welfare values {vlp}\{v_{lp}\} are the coefficients in the objective of (1.1). Since each passenger can only be matched to at most one line, we let ρp=1\rho_{p}=1 for each p[P]p\in[P]. The RLPP is an instance of GBAP with an additional aggregation step that merges any subset of selected lines with the same route (and different frequencies) to a single line with the corresponding aggregated frequency. Since the aggregation step neither increases the cost of the solution nor decreases the objective values as shown in [17], and can be trivially incorporated into our framework, we omit it here for simplicity.

Périvier et al. [17] present an approximation algorithm (see Algorithm 2 in Appendix B) to solve the problem based on the randomized rounding scheme proposed by [11]. The corresponding approximation ratio is 11eϵ1-\frac{1}{e}-\epsilon, with a probability of violating the budget constraint bounded by eϵ2k/3e^{-\epsilon^{2}k/3}, where k:=Bmaxlclk:=\frac{B}{\max_{l\in\mathcal{L}}c_{l}} as we have assumed before.

Remark 6.1.

We note that the violation probability bound, given by ek3ϵ2e^{-\frac{k}{3}\epsilon^{2}}, may surpass 12\frac{1}{2}, even when kk is as large as 1ϵ2\frac{1}{\epsilon^{2}}. To address this, one may consider setting ϵ\epsilon to 1kγ\frac{1}{k^{\gamma}}, with γ<12\gamma<\frac{1}{2}. However, this implies that the method’s approximation ratio becomes (11e)(11kγ)<(11e)(11k)\left(1-\frac{1}{e}\right)\left(1-\frac{1}{k^{\gamma}}\right)<\left(1-\frac{1}{e}\right)\left(1-\frac{1}{\sqrt{k}}\right) with violation probability bound e13k12γe^{-\frac{1}{3}k^{1-2\gamma}}. We present Table 1 to illustrate this by taking k=25,50,100k=25,50,100, which are reasonable and relatively large parameters in real-world cases. Note that by definition, the number of the opened lines in the optimal solution can be significantly larger than kk. We calculate the approximation ratio as (11e)(1ϵ)\left(1-\frac{1}{e}\right)\left(1-\epsilon\right) since it is the original one derived by [17] and is higher than (11eϵ)\left(1-\frac{1}{e}-\epsilon\right). Furthermore, note that the rounding process tends to produce infeasible solutions with higher objective values than feasible ones, suggesting that the actual expected approximation ratio for feasible solutions might be significantly lower than (11e)(1ϵ)\left(1-\frac{1}{e}\right)\left(1-\epsilon\right), especially when the violation probability is non-negligible.

kk γ\gamma ϵ\epsilon (11e)(1ϵ)\left(1-\frac{1}{e}\right)\left(1-\epsilon\right) ek3ϵ2e^{-\frac{k}{3}\epsilon^{2}}
25 1/3 0.3420 0.4159 0.3773
25 1/4 0.4472 0.3494 0.1889
50 1/3 0.2714 0.4605 0.2929
50 1/4 0.3761 0.3944 0.0947
100 1/3 0.2154 0.4959 0.2128
100 1/4 0.3162 0.4322 0.0357
Table 1: Results for the given expressions with varying kk and γ\gamma values. Algorithm 1 achieve approximation ratios greater than 0.390.39, 0.40.4, and 0.440.44 for k=25,50,100k=25,50,100, respectively, by setting the ϵ\epsilon term to be a sufficiently small constant.

We note that when Algorithm 1, and Algorithm 2 (in Appendix B) utilize the same LP solution for their rounding process, Algorithm 1 are provably superior to Algorithm 2. The reason for this is that the feasible solutions produced by Algorithm 2 can also be generated by Algorithm 1 with the same probability distributions, while each infeasible solution generated by Algorithm 2 is converted into a feasible one by Algorithm 1. However, since the approximation ratio derived for Algorithm 2 encompasses the objective values of infeasible solutions, it is not applicable to Algorithm 1.

6.1 Numerical Experiments.

In this section, we conduct numerical experiments in the context of RLPP based on the data provided by [17] and Algorithm 1. We also compare its performance with Algorithm 2 with ϵ=0.05\epsilon=0.05. The underlying road network is derived from OpenStreetMap (OSM) geographical data from [5]. The size of the candidate set of lines is set to be 10001000, and the lines are generated based on the method proposed by [19]. The passenger data comes from records of for-hire vehicle trips in Manhattan using the New York City Open Data operator, with a time window between 5 pm and 6 pm on the first Tuesday of April 2018. There are 13,85113,851 trip requests. The bus capacity is set to be 3030. For practical purposes, we solve the corresponding LP via column generation and apply a timeout—the current LP solution will be returned once the time limit is exceeded.

We implement Algorithm 1, Algorithm 2, and the modified Algorithm 1 with rounding based on the same LP solution used by Algorithm 2. The experiments consider four different budgets, and we simulate the rounding 10410^{4} times. In Figure 4, the xx-axis represents the number of simulations, and the yy-axis indicates the objective value of the best solution found by the algorithms so far. We omit the first 100100 realizations in Figure 4 to make the display of the yy-axis more readable. The results333In the dataset we employed, the value of kk substantially exceeds 33. In this case, Algorithm 1 coincides with Algorithm 2 presented in our conference paper [12], yielding same numerical outcomes. show that when the budget is 8×1048\times 10^{4} and 10510^{5} respectively, our algorithms find significantly better solutions, while the three methods perform similarly when the budget is 2×1042\times 10^{4} and 5×1045\times 10^{4} respectively.

Refer to caption
Figure 4: We compared three methods: Algorithm 1 (GBAP), Algorithm 2 (RLPP) (with ϵ=0.05\epsilon=0.05), and Modified Algorithm 1 (which uses the same LP solution for rounding as Algorithm 2). Note that Modified Algorithm 1 always performs better than Algorithm 2 since they use the same realizations produced by the rounding process, which is based on the same LP solution.

7 Conclusion and Future Directions

In this paper, we have introduced a novel approach to the Generalized Budgeted Assignment Problem (GBAP) and demonstrated its application to the Real-Time Line Planning Problem (RLPP) in transportation systems. To the best of our knowledge, this is the first algorithm to achieve a constant approximation ratio for GBAP. Our approach not only enhances the theoretical understanding of GBAP but also provides practical solutions for complex transportation planning challenges.

Our contributions are threefold. First, we have developed a randomized approximation algorithm for GBAP with a proven constant approximation ratio. Second, we have introduced an innovative technique that employs randomized rounding realizations to simulate two distinct online mechanisms. This approach enables the simultaneous satisfaction of different constraints while maintaining a strong theoretical guarantee on algorithm performance. Third, we have applied our algorithm to the RLPP and demonstrated its effectiveness through numerical experiments using real-world data from New York City’s transportation system. This application underscores the algorithm’s potential for large-scale urban transit planning and bridges the gap between theoretical advancements and practical implementations in transportation systems.

While our results represent a significant step forward, several avenues for future research remain open.

As stated in out extended abstract at COCOON [12, Theorem 6], we have established that the approximation ratio cannot exceed 11e1-\frac{1}{e} by reducing the problem to the max kk-cover problem.444We have chosen to omit the detailed statement of this result to maintain focus on our primary contributions. An avenue for future research is to explore whether the upper bound can be tightened to a constant strictly less than 11e1-\frac{1}{e}.

The second potential direction for future research is to explore possible improvements to our current approximation ratio. While our algorithm (Algorithm 1) enhances practical efficiency and performance compared to the two alternative procedures presented in Lemma 5.1 and Lemma 5.3, its approximation ratio guarantee remains the same as these procedures since it essentially builds upon their bounds. It remains unclear whether there is a significant gap in the theoretical guarantees between our algorithm and these two alternative procedures. Therefore, refining the analysis to enhance the theoretical bound of Algorithm 1 may be worthwhile. Another avenue could be the development of new algorithms with a better approximation ratio guarantee. However, it is important to note that our primary goal has been to develop an algorithm that is both theoretically sound and practically efficient. This dual requirement presents significant challenges in substantially improving the approximation ratio without compromising computational efficiency. Any future improvements will need to carefully balance these competing objectives, making this a complex but potentially rewarding area for further investigation.

It is also interesting for future research to explore scenarios with stochastically appearing items, each following a known binary distribution. In this context, the objective would be to maximize expected rewards by strategically utilizing bins within the given budget constraint. This extension could enhance the model’s applicability to real-world situations where demand or item availability is uncertain but follows predictable patterns.

References

  • Alaei [2014] Saeed Alaei. Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers. SIAM Journal on Computing, 43(2):930–972, 2014.
  • Alaei et al. [2013] Saeed Alaei, MohammadTaghi Hajiaghayi, and Vahid Liaghat. The online stochastic generalized assignment problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 11–25. Springer, 2013.
  • Bender et al. [2015] Marco Bender, Clemens Thielen, and Stephan Westphal. Packing items into several bins facilitates approximating the separable assignment problem. Information Processing Letters, 115(6-8):570–575, 2015.
  • Bertsimas et al. [2021] Dimitris Bertsimas, Yee Sian Ng, and Julia Yan. Data-driven transit network design at scale. Operations Research, 69(4):1118–1133, 2021.
  • Boeing [2017] Geoff Boeing. Osmnx: A python package to work with graph-theoretic openstreetmap street networks. Journal of Open Source Software, 2(12), 2017.
  • Borndörfer et al. [2007] Ralf Borndörfer, Martin Grötschel, and Marc E Pfetsch. A column-generation approach to line planning in public transport. Transportation Science, 41(1):123–132, 2007.
  • Calinescu et al. [2011] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.
  • Chekuri and Khanna [2005] Chandra Chekuri and Sanjeev Khanna. A polynomial time approximation scheme for the multiple knapsack problem. SIAM Journal on Computing, 35(3):713–728, 2005.
  • Chekuri et al. [2012] Chandra Chekuri, Nitish Korula, and Martin Pál. Improved algorithms for orienteering and related problems. ACM Transactions on Algorithms (TALG), 8(3):1–27, 2012.
  • Feige and Vondrak [2006] Uriel Feige and Jan Vondrak. Approximation algorithms for allocation problems: Improving the factor of 1-1/e. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 667–676. IEEE, 2006.
  • Fleischer et al. [2011] Lisa Fleischer, Michel X Goemans, Vahab S Mirrokni, and Maxim Sviridenko. Tight approximation algorithms for maximum separable assignment problems. Mathematics of Operations Research, 36(3):416–431, 2011.
  • Jiang and Samaranayake [2022] Hongyi Jiang and Samitha Samaranayake. Approximation algorithms for capacitated assignment with budget constraints and applications in transportation systems. In International Computing and Combinatorics Conference, pages 94–105. Springer, 2022.
  • Kulik [2011] Ariel Kulik. Submodular and linear maximization with knapsack constraints. Master’s thesis, Computer Science Department, Technion, 2011.
  • Kulik et al. [2013] Ariel Kulik, Hadas Shachnai, and Tami Tamir. Approximations for monotone and nonmonotone submodular maximization with knapsack constraints. Mathematics of Operations Research, 38(4):729–739, 2013.
  • Lee et al. [2009] Jon Lee, Vahab S Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 323–332, 2009.
  • Manshadi et al. [2021] Vahideh Manshadi, Rad Niazadeh, and Scott Rodilitz. Fair dynamic rationing. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 694–695, 2021.
  • Périvier et al. [2021] Noémie Périvier, Chamsi Hssaine, Samitha Samaranayake, and Siddhartha Banerjee. Real-time approximate routing for smart transit systems. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 5(2):1–30, 2021.
  • Schöbel [2012] Anita Schöbel. Line planning in public transportation: models and methods. OR spectrum, 34(3):491–510, 2012.
  • Silman et al. [1974] Lionel Adrian Silman, Zeev Barzily, and Ury Passy. Planning the route system for urban buses. Computers & operations research, 1(2):201–211, 1974.
  • Wolsey [1982] Laurence A Wolsey. Maximising real-valued submodular functions: Primal and dual heuristics for location problems. Mathematics of Operations Research, 7(3):410–425, 1982.

Appendix A Proof of Lemma 5.3

Proof of Lemma 5.3.

Point (a) can be proven using the exact same way as in Lemma 5.3. Regarding point (b) and (c), they can be directly verified from its definition.

Following the same reasoning as in (5.4), the expected reward derived from the procedure in Lemma 5.3 can be expressed as:

𝔼[l[L]p[P]vlpZlpY~lpξl]\displaystyle\mathbb{E}\left[\sum_{l\in[L]}\sum_{p\in[P]}v_{lp}Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}\right] =𝔼[l[L]SIlpSvlpZlpZlSξl]\displaystyle=\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]
12l[L]𝔼[SIlpSvlpZlS|Hl=1]P[ξl=1]\displaystyle\geq\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\Bigg{|}H_{l}=1\right]\cdot P\left[\xi_{l}=1\right] (A.1)

Same as in Lemma 5.3, let us define

rl:=𝔼[SIlpSvlpSZlSHl=1]=SIl(X~lSpSvlp)SIlX~lS.r_{l}:=\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v^{S}_{lp}Z_{l}^{S}\mid H_{l}=1\right]=\frac{\sum_{S\in I_{l}}\left(\widetilde{X}_{lS}\sum_{p\in S}v_{lp}\right)}{\sum_{S\in I_{l}}\widetilde{X}_{lS}}.

Subsequently, Eq. (A.1) can be expressed as

12l[L]𝔼[rlξl]=12𝔼[l[L]rlξl].\displaystyle\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[r_{l}\cdot\xi_{l}\right]=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]. (A.2)

To establish the lower bound of Eq. (A.1), we again construct an instance of the online fractional knapsack problem, as in Definition 4.5, consisting of LL rounds. In each round ll, an object appears with a probability P(Hl=1)P(H_{l}=1), possessing a weight clc_{l} and a reward rlr_{l}. The initial capacity is equal to the budget kk. As specified in Step 2 of Algorithm 1, it satisfies that r1c1r2c2rLcL\frac{r_{1}}{c_{1}}\geq\frac{r_{2}}{c_{2}}\geq\ldots\geq\frac{r_{L}}{c_{L}}, allowing us to apply Lemma 4.7.

Following the same reasoning as in Lemma 5.3, expected rewards of this online knapsack problem instance are at least:

12l[L]rlP(Hl=1)=12l[L]p[P]SIl:pSvlpX^lS.\displaystyle\frac{1}{2}\sum_{l\in[L]}r_{l}\cdot P(H_{l}=1)=\frac{1}{2}\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

Unlike Lemma 5.1, the definition of ξl\xi_{l} provided in Lemma 5.3 indicates that the reward cannot exceed l[L]rlξl\sum_{l\in[L]}r_{l}\cdot\xi_{l}. Consequently, we establish:

𝔼[l[L]rlξl]12l[L]p[P]SIl:pSvlpX^lS.\displaystyle\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]\geq\frac{1}{2}\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}. (A.3)

Summarizing (A.1),(A.2) and (A.3), we conclude that

𝔼[l[L]SIlpSvlpZlpZlSξl]12𝔼[l[L]rlξl]14l[L]p[P]SIl:pSvlpX^lS.\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]\geq\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]\geq\frac{1}{4}\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

This completes the proof. ∎

Appendix B Method from [17]

Here we restate the method from [17] while omitting the aggregation step.

Algorithm 2.
  1. 1.

    Let X^\widehat{X} be the solution to the LP relaxation of (1.4).

  2. 2.

    Independently for each l[L]l\in[L], randomly select a set TlT_{l} from Il{}I_{l}\cup\{\emptyset\} following the distribution below:

    P(Tl=S)={1SIlX^lSif S=X^lSif SIl.P({T}_{l}=S)=\begin{cases}1-\sum_{S\in I_{l}}\widehat{X}_{lS}&\text{if }S=\emptyset\\ \widehat{X}_{lS}&\text{if }S\in I_{l}\end{cases}.

    Let the (random) decision variable be ZlSZ_{l}^{S}, i.e., ZlS=1Z_{l}^{S}=1 if S=TlS=T_{l}, and ZlS=0Z_{l}^{S}=0 otherwise for SIl{}S\in I_{l}\cup\{\emptyset\}.

  3. 3.

    Temporarily assign all the elements in TlT_{l} to bin ll for each l[L]l\in[L].

  4. 4.

    For p[P]p\in[P], we remove each item pp that is assigned to more than ρ\rho bins from all but the ρ\rho bins with highest rewards vlpv_{lp} it is assigned to.

  5. 5.

    Open all the lines that have been assigned with at least one passenger, and close all the other lines. If this assignment does not exceed the budget, then we take this assignment as the final solution.