Approximation Algorithm for Generalized Budgeted Assignment Problems and Applications in Transportation Systems

Hongyi Jiang^a, Samitha Samaranayake^b

( ^aDepartment of Systems Engineering, City University of Hong Kong
^bSchool of Civil and Environmental Engineering, Cornell University, USA
[email protected], [email protected]
)

Abstract

Motivated by a transit line planning problem in transportation systems, we investigate the following capacitated assignment problem under a budget constraint. Our model involves $L$ bins and $P$ items. Each bin $l$ has a utilization cost $c_{l}$ and an $n_{l}$ -dimensional capacity vector. Each item $p$ has an $n_{l}$ -dimensional binary weight vector $r_{lp}$ , where the $1$ s in $r_{lp}$ (if any) appear in consecutive positions, and its assignment to bin $l$ yields a reward $v_{lp}$ . The objective is to maximize total rewards through an assignment that satisfies three constraints: (i) the total weights of assigned items do not violate any bin’s capacity; (ii) each item is assigned to at most one open bin; and (iii) the overall utilization costs remain within a total budget $B$ .

We propose the first randomized rounding algorithm with a constant approximation ratio for this problem. We then apply our framework to the motivating transit line planning problem, presenting corresponding models and conducting numerical experiments using real-world data. Our results demonstrate significant improvements over previous approaches in addressing this critical transportation challenge.

Keywords: Assignment problem, Approximation algorithm, Transportation systems

1 Introduction

This paper presents a framework for addressing a critical challenge in transportation systems: optimizing the selection of high-capacity fixed-route transit lines to maximize the total reward from passenger route coverage, subject to a specified operational budget constraint. This problem arises from urban transit systems’ pressures of satisfying passenger demand with inadequate resource allocation, necessitating innovative solutions for effective line planning. Our approach aims to enhance the overall performance and efficiency of public transportation networks through strategic line planning. The primary challenge lies in developing an algorithm that is both theoretically sound and practically efficient, capable of deriving high-quality solutions for large-scale problems, such as those encountered in complex urban environments like New York City. This research makes contributions in both discrete optimization and transportation planning, proposing a novel algorithmic framework that bridges the gap between theoretical guarantees in algorithm design and practical applicability in real-world transit planning scenarios.

To address this challenge, we study the transportation problem in the context of a new assignment problem, which we call the Generalized Budgeted Assignment Problem (GBAP). Prior to introducing the formal definition, we present a warm-up example related to the aforementioned transportation line planning problems. This simple example aims to provide context and facilitate understanding of the subsequent formal definitions of the GBAP.

For ease of explanation, this example is relatively restrictive compared to real-world applications. Thus, it is important to note that we will study a more generalized line planning problem in Section 6. This extended discussion will still be based on the GBAP and corresponding approximation algorithm we develop, and will utilize real-world data.

1.1 Warm-up Example

This example involves selecting a set of fixed-route transit lines to operate such that passenger coverage is maximized, while adhering to vehicle capacity constraints and satisfying a budget constraint. By framing this as an assignment-type integer program, we aim to find the optimal solution that balances cost and service efficiency. More specifically, consider a transit system with four stops: $A$ , $B$ , $C$ , and $D$ . There are three candidate transit lines: line $1$ , which travels from $A$ to $C$ to $D$ ; line $2$ , which travels from $A$ to $B$ to $C$ to $D$ ; and line $3$ , which travels from $C$ to $A$ to $B$ to $D$ . These lines cost $20$ , $40$ , and $30$ respectively to operate. Each transit line operates a bus with a capacity of $2$ passengers. This capacity applies uniformly at all points along the route. We have six passengers: passengers $1$ and $2$ need to travel from $C$ to $B$ , passengers $3$ , $4$ , and $5$ from $C$ to $D$ , and passenger $6$ from $A$ to $B$ . These transit lines, their costs, and passenger routes are illustrated in Figure 1 and Figure 2.

Refer to caption — Figure 1: This figure displays the three candidate transit lines (Line $1$ , Line $2$ , and Line $3$ ) with their routes, operational costs, and capacity vectors. The routes are shown as directed graphs connecting stations $A$ , $B$ , $C$ , and $D$ . The cost of operating each line is given, along with its capacity vector. The capacity vector indicates the maximum number of passengers that can be accommodated on each segment of the line.

For simplicity, we assume each passenger must travel directly from their origin to their destination on a single bus line (i.e., no transfers). To serve a passenger, the line chosen must be open and cover the passenger’s entire route. The total cost of opened lines must stay within the given budget of $70$ . Therefore, we can open at most $2$ lines. Our goal is to choose a set of lines to open within the budget and assign passengers to these lines, maximizing the number of passengers served while adhering to these rules. If a line covers a passenger’s route, assigning this passenger to the line generates a reward of $1$ ; otherwise, the reward is $0$ . The objective is to maximize the overall generated reward in this context.

To solve this problem, we can construct a capacity vector for each of the three lines, as shown in Figure 1. Line $1$ has a capacity vector of $(2,2)$ , while $2$ and $3$ have capacity vectors of $(2,2,2)$ . Each entry represents the maximum number of passengers that can be accommodated on a segment of the line. The length of each vector corresponds to the number of edges in that line. Figure 2 illustrates how passengers utilize different lines and the corresponding weight vectors. For example, passengers $1$ and $2$ , traveling from $C$ to $B$ , have weight vectors $(0,0)$ for line $1$ , $(0,0,0)$ for line 2, and $(1,1,0)$ for line $3$ . These vectors indicate which segments of each line are used by the passengers, with $1$ representing a used segment and $0$ an unused segment. The reward of assigning passengers to a line depends on whether the line covers their route. For instance, assigning passengers $1$ and $2$ to line $1$ or $2$ yields no reward as these lines do not cover their route, while assigning them to line $3$ gives a reward of $1$ per person. To obey capacity constraints, the sum of weight vectors for passengers assigned to a line cannot exceed that line’s capacity vector entry-wise. For example, we cannot assign passengers $1$ , $2$ , and $3$ to line $3$ as it would exceed the capacity $(3>2)$ for the edges from $C$ to $A$ and $A$ to $B$ .

Remark 1.1.

When a line covers a passenger’s route, the weight vector contains 1s appearing in adjacent positions, reflecting that passenger routes consist of consecutive edges on the lines. For lines not covering a passenger’s route, the weight vectors are defined as all zeros for consistency, though they could be arbitrary since the reward for these assignments is $0$ .

Based on these definitions, we can formulate this problem as an assignment-type integer program. This formulation is a specific case of GBAP, which we will present in details in Section 1.2, specifically in (1.1). It can be verified directly that in this example, the optimal solution is to operate lines $2$ and $3$ , covering 5 passengers, with only one passenger (from C to D) left unserved.

1.2 Definition of Generalized Budgeted Assignment Problem

Building on the warm-up example, we now formally define GBAP. This problem involves selecting a set of bins within a given budget and assigning a set of items to these selected bins under multidimensional capacity constraints, with the objective of maximizing the assignment reward. To help understand this model, one can think of transit lines as bins and passengers as items to be allocated. Specifically, consider a system with $L$ bins and $P$ items. Each bin $l\in[L]$ ¹¹1Here, $[n]$ denotes the set $\{1,2,\ldots,n\}$ for any integer $n$ . is characterized by a utilization cost $c_{l}\in\mathbb{Q}_{+}$ and an $n_{l}$ -dimensional capacity vector $f_{l}=\left(f_{l}^{(1)},\ldots,f_{l}^{(n_{l})}\right)\in\mathbb{Z}^{n_{l}}_{+}$ . In our warm-up example, the capacity vector $f_{l}$ has uniform entries across all dimensions. However, in our generalized model, this capacity can vary across dimensions (see Remark 1.2 for further details). Additionally, the dimension $n_{l}$ corresponds to the number of edges involved in a line in the warm-up example. Each item $p\in[P]$ has an $n_{l}$ -dimensional weight vector for each bin $l$ : $r_{lp}=\left(r_{lp}^{(1)},\ldots,r_{lp}^{(n_{l})}\right)\in\{0,1\}^{n_{l}}$ , where the $1$ s in $r_{lp}$ (if any) appear in consecutive positions within the vector. This structure reflects the assumption that routes consist of consecutive edges, as motivated by the warm-up example where routes follow contiguous edges in a line. Assigning item $p$ to bin $l$ yields a reward $v_{lp}\geq 0$ . The objective is to maximize the total reward through an assignment that satisfies three constraints: (i) the total weight of items assigned to each bin does not exceed the bin’s capacity in any dimension, (ii) each item is assigned to at most one bin, (iii) items can only be assigned to utilized bins, and (iv) the total cost of utilized bins does not exceed a given budget $B\in\mathbb{Q}_{+}$ .

The GBAP problem can be formulated as a $0$ - $1$ linear integer programming problem:


$\displaystyle\max_{y,x}~{}~{}$	$\displaystyle\sum_{p\in[P]}\sum_{l\in[L]}v_{lp}x_{lp}$	(1.1a)
s.t.	$\displaystyle\sum_{l\in[L]}c_{l}y_{l}\leq B$	(1.1b)
	$\displaystyle\sum_{p\in[P]}x_{lp}\cdot r^{(i)}_{lp}\leq f_{l}^{(i)}\cdot y_{l}~{}~{}~{}\forall l\in[L],i\in[n_{l}]$	(1.1c)
	$\displaystyle\sum_{l\in[L]}x_{lp}\leq 1~{}~{}~{}\forall p\in[P]$	(1.1d)
	$\displaystyle x_{lp}\in\{0,1\}~{}~{}~{}\forall p\in[P],~{}l\in[L]$	(1.1e)
	$\displaystyle y_{l}\in\{0,1\}~{}~{}~{}\forall l\in[L].$	(1.1f)

(1.1b) ensures that the budget constraint is respected. (1.1c) guarantees that the multidimensional capacity of bins is not breached by the assigned items. (1.1d) requires that item $p$ can be assigned at most one bin. If item $p$ is assigned to bin $l$ , then $x_{lp}=1$ . Otherwise, $x_{lp}=0$ . (1.1e) guarantees that each bin can have at most one copy of each item. Similarly, $y_{l}=1$ indicates that bin $l$ is utilized. Items can only be matched to utilized bins.

Remark 1.2.

While the warm-up example uses binary rewards ( $0$ or $1$ ), our model allows for any non-negative (rational) rewards depending on different passengers (items) and transit lines (bins). This flexibility enables the study of more complex transportation scenarios, such as maximizing social welfare defined according to passenger characteristics like income and age. For further details and discussions on applications, please refer to Section 6.

Additionally, while the capacity vector in our warm-up example contains uniform values, our model allows for different values across various dimensions. This feature provides enhanced flexibility, making the model applicable to a broader range of scenarios.

For the rest of this paper, we define $k$ as the ratio of the budget to the largest line cost:

\displaystyle k:=\frac{B}{\max_{l\in[L]}c_{l}}

(1.2)

This notation will be used frequently throughout our subsequent discussion and analysis.

1.3 Linear Programming Relaxation of Integer Program (1.1)

We will develop a randomized rounding algorithm based on the solution to the linear programming (LP) relaxation of (1.1). While a natural approach to relaxing (1.1) would be to directly set $x_{lp}$ and $y_{l}$ as continuous variables, this method yields a loose solution, as noted in [11] and [17], thereby limiting the performance of LP-based randomized rounding algorithms. Consequently, we will employ the LP relaxation (1.4) initially proposed by [11], augmented with an additional budget constraint.

For each bin $l\in[L]$ , define

\displaystyle I_{l}:=\left\{S\subseteq[P]:\sum_{p\in S}r^{(i)}_{lp}\leq f_{l}^{(i)}~{}~{}~{}\forall i\in[n_{l}]\right\}

(1.3)

In other words, $I_{l}$ is the family of all nonempty feasible assignments of items to $l$ , where a feasible assignment is a set $S\subset[P]$ such that bin $l$ has enough capacity to contain all the items in $S$ , and $\emptyset\notin I_{l}$ . For instance, in the warm-up example, if $l$ refers to line $3$ , then $S:=\{\text{passenger }1,2,3\}\notin\mathcal{I}_{l}$ because it violates the capacity constraints, whereas $S:=\{\text{passenger }1,2\}\in\mathcal{I}_{l}$ .

Then the LP relaxation of (1.1) can be stated as follows:


$\displaystyle\underset{\{X_{lS}\}}{\max}~{}~{}~{}~{}~{}~{}~{}$	$\displaystyle\sum_{p\in[P]}\sum_{l\in[L]}\sum_{S\in I_{l}:p\in S}v_{lp}X_{lS}$	(1.4a)
$\displaystyle s.t.~{}~{}~{}~{}~{}~{}~{}~{}$	$\displaystyle\sum_{S\in I_{l}}X_{lS}\leq 1~{}~{}~{}~{}\forall l\in[L]$	(1.4b)
	$\displaystyle\sum_{l\in[L]}\sum_{S\in I_{l}:p\in S}X_{lS}\leq 1~{}~{}~{}~{}\forall p\in[P]$	(1.4c)
	$\displaystyle\sum_{l\in[L]}\sum_{S\in I_{l}}c_{l}X_{lS}\leq B$	(1.4d)
	$\displaystyle X_{lS}\in[0,1]$	(1.4e)

When $X_{lS}$ is restricted to $\{0,1\}$ , (1.4b) enforces that each bin can be matched with at most one feasible assignment. Constraints (1.4c) and (1.4d) correspond to (1.1d) and (1.1b) respectively. This LP relaxation can be solved in polynomial time using the ellipsoid method as in [11, 17].

Paper Structure: The remainder of this paper is organized as follows: Section 2 presents related work, challenges, and our contributions. Section 3 provides a high-level overview of our approach. Sections 4 and 5 detail the technical foundations, algorithm, and its analysis. Section 6 presents numerical experiments using real-world data. Section 7 concludes the paper and discusses future research directions.

2 Related Work, Challenges and Contributions

2.1 Related Work

The GBAP is a variant of separable assignment problem (SAP). In SAP, there are $L$ bins and $P$ items. For each item $p$ and bin $l$ , there is a reward $v_{lp}$ generated when item $p$ is assigned to bin $l$ . Moreover, for each bin $l$ , there is a separate packing constraint, which means that only certain subsets of the items can be assigned to bin $l$ when the subset satisfies the constraint. The goal is to find an assignment of items to the bins such that all the sets of items assigned to bins do not violate packing constraints, each item is assigned to at most one bin, and the total reward of assignments is optimal. Fleisher et al. [11] propose a $\beta\left(1-\frac{1}{e}\right)$ -approximation for SAP assuming that there is a $\beta$ -approximation algorithm for the single bin subproblem. Moreover, they showed that a special case of SAP called the capacitated distributed caching problem (CapDC) cannot be approximated in polynomial time with an approximation factor better than $\left(1-\frac{1}{e}\right)$ unless NP $\subseteq$ DTIME $(n^{O(\log\log n)})$ . In CapDC, there are $L$ cache locations (bins), $P$ requests (items) and $T$ different files. Bin $l$ has capacity $A_{l}$ , and file $t$ has size $a_{t}$ . Request $p$ is associated with some file $t_{p}$ and a value $R_{p}$ . If a request $p$ is connected to cache location $l$ , then there is a cost $c_{lp}$ and the reward is $R_{p}-c_{lp}$ . The total size of the files corresponding to connected requests should be no larger than the capacity of the bin.

The maximum generalized assignment problem (GAP) is an important special case of SAP, in which each bin $l$ has size $A_{l}$ , each item $p$ has size $a_{lp}$ in each bin $l$ and a feasible assignment means that the total size of assigned items for each bin is not larger than the bin’s capacity. GAP is proved to be APX- hard by Chekuri et al. [8]. The best known approximation algorithm for GAP is from [10] and achieves an approximation factor of $\left(1-\frac{1}{e}+\delta\right)$ for a small constant $\delta>0$ .

Calinescu et al. [7] also give a $\left(\beta(1-\frac{1}{e})\right)$ -approximation for SAP. Moreover, they give a $\left(1-\frac{1}{e}-\epsilon\right)$ -approximation algorithm for GAP with the additional restriction of a budget constraint that on the number of bins that can be used. [14] propose a $\left(1-\frac{1}{e}-\epsilon\right)$ -approximation method for GAP with knapsack constraints, where there is a multi-dimensional budget vector and each item has a multi-dimensional cost vector same for each bin. The total cost vector of assigned items in a feasible assignment is no larger than the budget vector entry-wise.

A generalization of SAP, which is called as $\rho$ -SAP, is studied by Bender et al. [3] (it is called $k$ -SAP by [3], where $k$ is the number of allowed bins there, but here we make it consistent with our notation to avoid confusion). In $\rho$ -SAP each item can be assigned to at most $\rho$ different bins, but at most once to each bin. The goal is to find an assignment of items to bins that maximizes total reward. They present a $\beta\left(1-\frac{1}{e^{\rho}}\right)$ -approximation algorithm for $\rho$ -SAP under the assumption that there is a $\beta$ -approximation algorithm for the single bin subproblem. If the single bin subproblem admits an FPTAS, then the approximation factor is $\left(1-\frac{1}{e^{\rho}}\right)$ . The scheme can also be extended to the different upper bounds of number of allowed bins for different items: if the upper bound is $\rho_{i}$ for item $i$ , and $\rho=\min_{i}\rho_{i}$ , then their algorithm yields $\beta\left(1-\frac{1}{e^{\rho}}\right)$ approximation.

None of the aforementioned studies are applicable to scenarios that include budget constraints on bins.

In Périvier et al.’s study [17], the focus is on the Real-Time Line Planning Problem (RLPP), which can be modeled as GBAP. In this problem setting, each passenger is restricted to selecting a single subroute per line, and there is no cost associated with allocating passengers to these lines. To tackle the RLPP, the authors propose a randomized rounding algorithm, building upon the scheme initially introduced by Fleischer et al. [11]. This algorithm yields a $1-\frac{1}{e}-\epsilon$ -approximate solution, with the probability of exceeding the budget constraint capped at $e^{-\frac{k}{3}\epsilon^{2}}$ . Here, $k=\frac{B}{\max_{l\in[L]}c_{l}}$ , where $B$ represents the budget, $c_{l}$ is the cost associated with line $l$ , and $L$ is the set of all lines. For a comprehensive discussion and additional related works in the domain of line planning, please see Section 6.

GAP in an online setting has also been studied by Alaei et al. [2]. They propose a $\left(1-\frac{1}{\sqrt{k}}\right)$ -competitive algorithm under the assumption that no item takes up more than $\frac{1}{k}$ fraction of the capacity of any bin. In the setting of [2], items arrive in an online manner; upon arrival, an item can be assigned to a bin or discarded; the objective is to maximize the total reward of the assignment. The size of an item is revealed only after it has been assigned to a bin; the distribution of each item’s reward and size is known in advance. The online algorithm is developed based on a generalization of the magician’s problem in [1], where it is used to assign a set of items with limited supply to a set of buyers in order to maximize the expected value of revenue or welfare.

2.2 Challenges

Our objective is to devise an algorithm that maintains a constant approximation ratio while being practically efficient. Balancing theoretical rigor with practical efficiency poses a significant challenge. Moreover, conventional approaches fail to yield a constant approximation ratio for our specific problem while meeting all practical constraints.

Firstly, while our problem shares similarities with the capacitated facility location problem (CFLP), for which [20] provides a $1-\frac{1}{e}$ approximation algorithm based on the CFLP’s submodular structure, our problem lacks this submodular property [17, Proposition 4.10]. Consequently, methods aimed at maximizing submodular functions, as seen in works like [15, 13, 9], are not applicable to our context. This necessitates the development of innovative approaches tailored to the unique challenges discussed in this paper.

Secondly, the classical LP-based randomized rounding algorithms, proposed by [11, 17], while simple, can have a non-trivial probability to produce solutions that violate the budget constraint. More specifically, the approach in [17], an adaptation of the method from [11], attempts to mitigate this by scaling down the LP solutions to manage the probability of budget constraint violations. While this method achieves a $1-\frac{1}{e}-\epsilon$ -approximate solution, there remains a non-negligible probability, bounded by $e^{-\frac{k}{3}\epsilon^{2}}$ , that the budget constraints may still be breached. This probability fails to guarantee a constant approximation ratio of feasible solutions and can exceed $\frac{1}{2}$ even when $k$ is as large as $\frac{1}{\epsilon^{2}}$ .

To control this probability of violation, one might consider setting $\epsilon$ to a relatively high value. However, this adjustment would negatively impact the solution quality, a dilemma extensively discussed in Remark 6.1, which provides a detailed analysis and additional insights into this trade-off.

In summary, classical randomized rounding struggles to manage budget constraints and ensure that each item is assigned to at most one bin while still achieving a constant approximation ratio, especially when the algorithm must be computationally efficient to handle large-scale problems.

2.3 Contributions

We introduce a randomized approximation algorithm that is not only practically efficient but also maintains a constant approximation ratio. Our numerical experiments, conducted using real-world data, also outperform the previous work.

Our preliminary major findings were initially presented as an extended abstract at COCOON [12]. It featured randomized approximation algorithms but lacked formal proofs and did not offer a constant approximation ratio. In this extended work, we have refined our algorithm to achieve a constant approximation ratio.

In Section 5, we introduce our randomized approximation algorithm for solving GBAP, which achieves a constant approximation ratio. Specifically, our algorithm achieves an approximation ratio of $\frac{1}{8}$ for $k<3$ , and $\frac{k-1}{2k}\cdot\left(1-\frac{1}{\sqrt{k}}\right)>\frac{1}{8}$ for $k\geq 3$ , where $k$ is defined in (1.2). To the best of our knowledge, this is the first algorithm to achieve a constant approximation ratio for GBAP.

Our analysis leverages two online optimization mechanisms detailed in Section 4. The study of online knapsack problems in Section 4.2 may be of independent interest. Furthermore, to overcome the limitations of classical randomized rounding, which struggles to simultaneously satisfy multiple constraints, we introduce a novel approach. Our method utilizes the realizations from randomized rounding to simulate two distinct online mechanisms. This innovative technique enables us to produce solutions that concurrently satisfy different constraints while maintaining theoretical guarantees. Furthermore, this method offers new insights into algorithm design for related complex optimization problems.

In Section 6, we apply our GBAP approximation algorithms to the RLPP . Using real-world data from New York City, we conduct numerical experiments that demonstrate improvements over previous approaches. These results highlight the practical significance of our algorithms in addressing complex transportation challenges.

3 Overview of Algorithm Design and Analysis

To provide a clear road-map of the technical discussion to follow, we first present an overview of our algorithm design and analysis, before delving into the specifics.

In our algorithm design, we will consider two scenarios with respect to the ratio of the budget to the largest line cost ( $k$ as defined in (1.2)): $k<3$ and $k\geq 3$ . For each scenario, we develop an alternative randomized procedure (see Lemma 5.1 and 5.3) that, while complex to implement, is theoretically analyzable. We derive an approximation ratio for each procedure and subsequently show that a simpler algorithm we design (see Algorithm 1) does at least as well (up to a $\frac{1}{2}$ factor), thereby proving an approximation ratio for our simpler algorithm. Our proposed algorithm can be viewed as a simplified method that is derived from these two procedures. A general high level illustration of the techniques is given in the flow chart shown in in Figure 3.

The design of these procedures follows a similar approach. Both procedures begin with randomized rounding based on the LP relaxation of the problem to select at most one $S\in I_{l}$ for each $l\in[L]$ . This selection is represented by the indicator variables $\left\{Z_{l}^{S}\right\}_{S\in I_{l}\cup\{\emptyset\}}$ . Specifically, let $\widetilde{X}$ represent the LP solution. For each $l\in[L]$ , a set $T_{l}$ is randomly selected from $I_{l}\cup\{\emptyset\}$ according to the following distribution:

P(T_{l}=S)=\begin{cases}1-\sum_{S\in I_{l}}\widetilde{X}_{lS}&\text{if }S=\emptyset,\\ \widetilde{X}_{lS}&\text{if }S\in I_{l},\end{cases}

(3.1)

where the decision variable $Z_{l}^{S}=1$ if $S=T_{l}$ and $Z_{l}^{S}=0$ otherwise, for each $S\in I_{l}\cup\{\emptyset\}$ .

Subsequently, each item $p$ can only be assigned to bins where the corresponding selected sets contain $p$ , i.e., $p\in T_{l}$ . Since $T_{l}\in I_{l}\cup\{\emptyset\}$ , where $I_{l}$ represents the family of all nonempty feasible assignments of items to bin $l$ as defined in (1.3), this condition ensures that the capacity constraints are satisfied. We then focus on the other two types of constraints: (a) The budget constraint; (b) The restriction that each item can be assigned at most once. We will make two types of decisions: (a) Whether to utilize bin $l$ , represented by the indicator variable $\xi_{l}$ ; (b) Whether to assign item $p$ to bin $l$ (not necessarily utilized), represented by the indicator variable $Z_{l}^{p}$ . This assignment occurs only when three conditions are met simultaneously:

1.

Bin $l$ is assigned a set $S$ containing $p$ ( $Z_{l}^{S}=1$ and $p\in S$ )
2.

Bin $l$ is utilized ( $\xi_{l}=1$ )
3.

Item $p$ is assigned to bin $l$ (not necessarily utilized) ( $Z_{l}^{p}=1$ )

In other words, item $p$ is assigned to a utilized bin $l$ only when $Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}=1$ for some $S$ such that $p\in S$ . Importantly, in our algorithm design, $Z_{l}^{p}$ is set to be independent of both $Z_{l}^{S}$ and $\xi_{l}$ . The rationale and implications of this independence will be discussed in detail later in this section.

The expectation of the rewards can then be expressed as:

\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]=\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}\mathbb{E}\left[Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]

This equality is due to the linearity of expectation and the linear objective function. It allows us to analyze each term $\mathbb{E}\left[Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]$ respectively.

The realizations of $Z_{l}^{S}$ have been determined through the previously described randomized rounding process. Subsequently, we determine $\xi_{l}$ and $Z_{l}^{p}$ by simulating two separate online decision-making processes based on these realizations.

To determine $\xi_{l}$ for $l\in[L]$ , we simulate an online decision-making process using the realizations of $H_{l}:=\sum_{S\in I_{l}}Z_{l}^{S}\in\{0,1\}$ , revealing results as $l$ goes from $1$ to $L$ . This process is adapted from an online knapsack problem decision-making process (see Section 4.2). The process consists of $L$ stages. At the $l$ th stage, the algorithm decides whether to utilize bin $l$ (set $\xi_{l}=1$ ) based on the realizations of $H_{l^{\prime}}$ for $l^{\prime}\leq l$ and the remaining budget. For $k\geq 3$ , it ensures that the overall costs of bin utilization are within the budget, i.e. $\sum_{l\in[L]}c_{l}\cdot\xi_{l}\leq B$ , while for $k<3$ , the costs are approximately within the budget. It is important to note that, in contrast to these two alternative procedures, our proposed algorithm ensures strict adherence to the budget constraint in both cases.

For each item $p$ , to determine $Z_{l}^{p}$ for $l\in[L]$ , we simulate an online decision-making process using the realizations of $\widetilde{Y}_{lp}:=\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}Z_{l}^{S}\in\{0,1\}$ . This process iterates over $l$ in reverse order, from $L$ to $1$ , and is based on the magician’s mechanism (see Section 4.1). The process consists of $L$ stages. At the $(L-l+1)$ th stage²²2We use $(L-l+1)$ instead of $l$ to align with the descending order of $l$ , which simplifies subsequent index references of this sentence., the algorithm decides whether to assign item $p$ to bin $l$ (i.e., set $Z_{l}^{p}=1$ ) based on the realizations of $\widetilde{Y}_{l^{\prime}p}$ and the previous decisions $Z_{l^{\prime}}^{p}$ for $l^{\prime}>l$ . It is important to observe that this iteration order for determining $Z_{l}^{p}$ is inverse to the one used for determining $\xi_{l}$ . This inverse ordering plays a crucial role in ensuring the independence of these decision variables in our analysis.

The magician’s mechanism ensures that (i) $\sum_{l\in[L]}\sum_{S:p\in S}Z_{l}^{p}Z_{l}^{S}\leq 1$ , (ii) $Z_{l}^{p}$ is independent of $Z_{l}^{S}$ , and (iii) $\mathbb{E}[Z_{l}^{p}]\geq\frac{1}{2}$ . Note that point (i) ensures that item $p$ will be assigned to at most one bin. Due to the inverse revealing order for determining $\xi_{l}$ , $Z_{l}^{p}$ is also independent of $\xi_{l}$ . Consequently:

\mathbb{E}\left[Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]=\mathbb{E}\left[Z_{l}^{p}\right]\cdot\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right]\geq\frac{1}{2}\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right]

Therefore,

\displaystyle\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]\geq\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}\frac{v_{lp}}{2}\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right]=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\cdot\xi_{l}\right]

(3.2)

The last term is then comparable to the outcome of the online knapsack problem decision-making process defined in Section 4.2, which helps us derive the desired result for these two alternative procedures. Finally, we compare our algorithm (Algorithm 1) to these two procedures to establish the desired approximation ratio.

Having provided an overview of our approach, we now outline the structure of the two subsequent sections. In Section 4, we introduce two key online mechanisms: the magician’s mechanism and online fractional knapsack problems. These mechanisms are essential for developing the two alternative procedures, and we present their relevant results. Section 5 then details our proposed algorithm, along with the two alternative procedures and their associated theoretical outcomes.

4 Essential Mechanisms for Algorithm Design and Analysis

As discussed in Section 3, our algorithm’s analysis is based on two key online optimization mechanisms. The first is the generalized $\gamma$ -conservative magician, originally proposed by Alaei et al. [1]. This mechanism serves a dual purpose: it determines $Z_{l}^{p}$ , as outlined in Section 3, and provides a part of the theoretical foundation for proving results in the second mechanism. We present this mechanism here for completeness. The second mechanism, which we will prove the efficacy of subsequently, adapts concepts from online knapsack problems and may be of independent interest to the broader optimization community.

4.1 Generalized Magician’s Problem [1, Section 7]

Definition 4.1.

(the generalized magician’s problem). A magician is presented with a sequence of $L$ boxes one by one in an online fashion. The magician has $\delta$ units of mana. The magician can only open box $i$ if she has at least $1$ unit of mana. If box $i$ is opened, the magician loses a random amount of mana $X_{i}\in[0,1]$ drawn from a distribution specified on box $i$ by its cumulative distribution function $F_{X_{i}}$ . It is assumed that $\mathbb{E}\left[\sum_{i=1}^{L}X_{i}\right]\leq\delta$ . The magician wants to open each box with ex ante probability at least $\gamma$ , i.e. not conditioned on $X_{i}$ for $i=1,\ldots,L$ , for a constant $\gamma\in[0,1]$ as large as possible.

The problem can be solved with a near-optimal solution using the following mechanism.

Definition 4.2.

(generalized $\gamma$ -conservative magician) The magician makes a decision about each box as follows: Let the random variable $W_{i}$ denote the amount of mana lost prior to seeing the $i$ th box, and $F_{W_{i}}(w):=P[W_{i}\leq w]$ denote the ex ante CDF of random variable $W_{i}$ . Define random binary variable $S_{i}$ conditional on $W_{i}$ as follows:

	$\displaystyle P[S_{i}=1\|W_{i}]$	$\displaystyle=\begin{cases}1,&W_{i}<\theta_{i}\\ (\gamma-F^{-}_{W_{i}}(\theta_{i}))/(F_{W_{i}}(\theta_{i})-F^{-}_{W_{i}}(\theta_{i})),&W_{i}=\theta_{i}\\ 0,&W_{i}>\theta_{i}.\end{cases}$
	$\displaystyle\theta_{i}$	$\displaystyle=\min\{w\|F_{W_{i}}(w)\geq\gamma\}\mbox{ and }F^{-}_{W_{i}}(w)=P[W_{i}<w].$

Then given $W_{i}$ and before seeing the $i$ th box, the magician will decide to open the $i$ th box if $S_{i}=1$ .

Note that $W_{1}=0$ with probability $1$ , and therefore $\theta_{1}=0$ . Then the CDF of $W_{i+1}$ and $\theta_{i+1}$ can be computed based on $W_{i}$ and $\theta_{i}$ in with dynamic programming. For more details, please see [1, Section 7].

One observation is that $S_{i}$ is independent to $X_{i}$ for all $i$ .

Theorem 4.3.

[1, Theorem 7.3] For any $\gamma\leq 1-\frac{1}{\sqrt{\delta}}$ , we have $\theta_{i}\leq\delta-1$ for all $i$ , and a generalized $\gamma$ -conservative magician with $k$ units of mana opens each box with an ex ante probability of exactly $\gamma$ , i.e., $P(S_{i}=1)=\gamma$ . If all $X_{i}$ are Bernoulli random variables, i.e., $X_{i}\in\{0,1\}$ for all $i$ , and $\delta$ is an integer, then for any $\gamma\leq 1-\frac{1}{\sqrt{\delta+3}}$ , we have $\theta_{i}\leq\delta-1$ for all $i$ , and a generalized $\gamma$ -conservative magician with $\delta$ units of mana opens each box with an ex ante probability of exactly $\gamma$ .

Remark 4.4.

In Theorem 4.3, $\delta$ can be non-integer when $X_{i}$ is not restricted to be Bernoulli variable as mentioned in [2] and by its proof.

Application in determining $Z_{l}^{p}$ : For each item $p$ , we construct a generalized $\frac{1}{2}$ -conservative magician with one unit of mana. This magician faces $L$ boxes, where the mana cost for opening each box follows a Bernoulli distribution. This distribution is determined by the distribution of $Z_{l}^{S}$ resulting from the randomized rounding process. The magician’s decisions on whether to open these boxes directly correspond to the determination of $Z_{l}^{p}$ . For a detailed description of this process, please refer to Step 3 of the procedures outlined in Lemma 5.1 and Lemma 5.3.

4.2 Online Fractional Knapsack Problems

Definition 4.5 (Online Fractional Knapsack Problem with Pre-arranged Arrival Order).

Consider a sequence of $n$ independent rounds. In each round $i$ , an object is characterized by a positive reward $r_{i}$ , a non-negative weight $w_{i}$ , and a positive appearance probability $p_{i}$ . Additionally, there is a specified capacity $\delta\geq 1$ , ensuring that the cumulative weight of the objects accepted does not exceed this capacity. The appearance of each object follows a Bernoulli distribution. The problem is subject to two constraints:

\sum_{i=1}^{n}w_{i}\cdot p_{i}\leq\delta,\quad 0<w_{i}\leq 1.

In this setting, we assume full knowledge of the object distribution. At the $i$ -th round, we observe the appearance of object $i$ before deciding a fraction $\beta_{i}\in[0,1]$ of object $i$ to accept. Acceptance is conditioned on the remaining capacity being at least $\beta_{i}\cdot w_{i}$ . Accepting the object results in a reduction of the remaining capacity by $\beta_{i}\cdot w_{i}$ and a secured reward of $\beta_{i}\cdot r_{i}$ . All decisions, once made, are irreversible.

It is assumed that the sequence of object appearances can be pre-determined before the initiation of the rounds.

The goal is to devise an algorithm that optimizes an efficiency coefficient $\alpha\in(0,1)$ such that the expected reward, at the end of $n$ rounds, is $\alpha\cdot\sum_{i=1}^{n}r_{i}\cdot p_{i}$ .

Remark 4.6.

The term “object” is used here instead of “item” to provide a distinction from the “item” used in GBAP. Specifically, in the analysis presented in Section 5, the allocation of objects in this online knapsack problem will be compared to the utilization of bins in GBAP.

Lemma 4.7 (Expected Reward for Prearranged Arrival Order).

Consider the problem defined in Definition 4.5. Assume the order of object appearances is prearranged such that

\displaystyle\frac{r_{1}}{w_{1}}\geq\frac{r_{2}}{w_{2}}\geq\ldots\geq\frac{r_{n}}{w_{n}}.

(4.1)

Furthermore, let the strategy be to accept objects fully in order of appearance until the remaining capacity is insufficient to accept the next appearing object fully. At this point, accept this next appearing object fractionally to exactly exhaust the remaining capacity. Under these conditions, the expected reward generated at the end of $n$ rounds is

\max\left\{\frac{1}{2},\left(1-\frac{1}{\sqrt{\delta}}\right)\right\}\cdot\sum_{i\in[n]}r_{i}\cdot p_{i}.

Proof.

We begin by establishing the ratio of $\left(1-\frac{1}{\sqrt{\delta}}\right)$ . According to Definition 4.2 and Theorem 4.3, the problem defined in Definition 4.5 admits a solution with this ratio when using the Magician’s mechanism. Note that in this case, the acceptance fraction $\beta_{i}\in\{0,1\}$ for all $i\in[n]$ due to the definition of Magician’s mechanism. It is evident that the strategy outlined in Lemma 4.7 outperforms the Magician’s mechanism in terms of efficiency and reward due to the pre-arranged order of objects as in (4.1). Therefore, this concludes the proof for this part.

As for the ratio of $\frac{1}{2}$ , we will conduct the similar comparison, but with the help of the method in Manshadi et al. [16]. In their work, they defined a method such that at the end of $n$ rounds, it achieves that $\mathbb{E}\left[\min_{i\in[n]}\beta_{i}\right]\geq\frac{1}{2}$ . For more details, please see Theorem 2 in [16] when $\mu\leq 1$ . This means that the method finally yields expected reward at least

\frac{1}{2}\cdot\sum_{i\in[n]}r_{i}\cdot p_{i}.

Again, the greedy strategy outlined in Lemma 4.7 outperforms this method in terms of reward due to the pre-arranged order of objects as in (4.1). Therefore, this concludes the proof. ∎

Application in determining $\xi_{l}$ and deriving approximation ratio: To apply the results of Lemma 4.7 in determining whether to utilize a bin $l$ (i.e., determine $\xi_{l}$ as mentioned in Section 3), we simulate the online knapsack problem in Algorithm 1. Specifically, after a randomized rounding process to realize $Z_{l}^{S}$ , we reindex the $L$ bins following the rule in (4.1) of Algorithm 1. We then simulate the online knapsack problem as described in Section 3. This approach serves two purposes: (1) it enables us to determine $\xi_{l}$ by simulating the decision-making process for the online knapsack problem, and (2) it allows us to compare the last term in (3.2) to the outcome of the online knapsack problem, enabling us to leverage the bound established in Lemma 4.7 to derive the desired approximation ratio.

5 Approximation Algorithm

In this section, we consider solving GBAP. Let $k$ be the ratio of the budget to the maximum bin cost:

k:=\frac{B}{\max_{l\in[L]}c_{l}}.

For the sake of simplicity and without loss of generality, we can scale the budget and costs to adhere to the following assumption:

Assumption 1.

The given budget is $k$ , and for all $l\in[L]$ , $c_{l}\leq 1$ , with $\max_{l\in[L]}c_{l}=1$ .

The LP relaxation can be solved in polynomial time by solving its dual with ellipsoid method as in [11, 17].

Algorithm 1.

1.

Find a solution to the LP relaxation of (1.4) and let $\widetilde{X}$ be the solution.

Reorder the bins so that

\displaystyle\frac{\sum_{S\in I_{l}}\left(\widetilde{X}_{lS}\sum_{p\in S}v_{lp}\right)}{c_{l}\sum_{S\in I_{l}}\widetilde{X}_{lS}}\geq\frac{\sum_{S\in I_{l^{\prime}}}\left(\widetilde{X}_{l^{\prime}S}\sum_{p\in S}v_{l^{\prime}p}^{S}\right)}{c_{l^{\prime}}\sum_{S\in I_{l^{\prime}}}\widetilde{X}_{l^{\prime}S}}~{}~{}~{}~{}~{}~{}\mbox{ for }1\leq l<l^{\prime}\leq L.

(5.1)

Note that here if the denominators are zeros, the corresponding numerators are also zeros, and the corresponding bins can be ignored in the following rounding process. Thus, we can define $\frac{0}{0}:=0$ in (5.1).

Independently for each $l\in[L]$ , randomly select a set $T_{l}$ from $I_{l}\cup\{\emptyset\}$ following the distribution below:

P({T}_{l}=S)=\begin{cases}1-\sum_{S\in I_{l}}\widetilde{X}_{lS}&\text{if }S=\emptyset\\ \widetilde{X}_{lS}&\text{if }S\in I_{l}\end{cases}.

Let the (random) decision variable be $Z_{l}^{S}$ , i.e., $Z_{l}^{S}=1$ if $S=T_{l}$ , and $Z_{l}^{S}=0$ otherwise for $S\in I_{l}\cup\{\emptyset\}$ .

4.

Temporarily assign all the elements in $T_{l}$ to bin $l$ for each $l\in[L]$ .
5.

For each item $p$ , retain it in the bin where it yields the highest reward $v_{lp}$ and remove it from others.
6.

Open all the bins that have been assigned with at least one item, and close all the other bins. If this assignment does not exceed the budget, then we take this assignment as the final solution.
7.

Otherwise, discard this assignment. We construct a new assignment.
8.

Let $B_{0}=k$ . Let $l$ be from $1$ to $L$ . At $l$ th iteration, let $B_{l}:=B_{l-1}-{c}_{l}\sum_{S\in I_{l}}Z_{l}^{S}\geq 0$ .
9.

If $k\geq 3$ , let $\mathcal{L}^{*}:=\{l:B_{l}\geq 0,~{}\sum_{S\in I_{l}}Z_{l}^{S}>0\}$ . Assign $T_{l}$ to $l$ for each $l\in\mathcal{L}^{*}$ and open it. For each item $p$ , retain it in the bin where it yields the highest reward $v_{lp}$ and remove it from others.
10.

If $k<3$ , let $\mathcal{L}^{*}:=\{l:B_{l-1}>0,~{}\sum_{S\in I_{l}}Z_{l}^{S}>0\}$ . Assign $T_{l}$ to $l$ for each $l\in\mathcal{L}^{*}$ and temporarily open it. For each item $p$ , retain it in the bin where it yields the highest reward $v_{lp}$ and remove it from others. Let $l^{*}=\max_{l\in\mathcal{L}^{*}}l$ . Then we open the bins in either $\mathcal{L}^{*}\backslash\{l^{*}\}$ or in $\{l^{*}\}$ , whichever generated more rewards.

Our proof is divided into two cases: $k\geq 3$ and $k<3$ . For both scenarios, we employ a two-step approach. First, we analyze the expected reward generated by a specific procedure of assignment based on two mechanisms: the magician’s mechanism defined in Definition 4.2 and the mechanism adapted from the online fractional knapsack problem discussed in Section 4.2. Second, we compare the rewards generated by Algorithm 1 with those from this constructed procedure.

We begin by addressing the case where $k\geq 3$ . For this scenario, we construct a specific procedure and provide its analysis in the following lemma.

Lemma 5.1.

Assume $k\geq 3$ . With the same notations as in Algorithm 1, we define a procedure as follows:

1.

For each $l\in[L]$ , let $H_{l}=\sum_{S\in I_{l}}Z_{l}^{S}$ , which indicates whether any set $S$ has been selected for bin $l$ in the randomized rounding process. Define $\xi_{l}=1$ if $B_{l}\geq 0$ and $H_{l}=1$ . Otherwise, set $\xi_{l}=0$ . Here $B_{l}$ represents the remaining budget after considering the first $l$ bins. Thus, $\xi_{l}=1$ indicates that bin $l$ is utilized, which occurs when there is sufficient budget ( $B_{l}\geq 0$ ) and a set has been selected for this bin ( $H_{l}=1$ ).

For every $l\in[L]$ and $p\in[P]$ , introduce a variable $\widetilde{Y}_{lp}$ , given by

\widetilde{Y}_{lp}=\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}Z_{l}^{S}.

It indicates whether item $p$ is included in the selected set for bin $l$ , and its probability distribution is

P(\widetilde{Y}_{lp}=y)=\begin{cases}1-\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}\widetilde{X}_{lS},&\text{if }y=0,\\ \sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}\widetilde{X}_{lS},&\text{if }y=1.\end{cases}

3.

For each item $p\in[P]$ , create a generalized $\frac{1}{2}$ -conservative magician with $1$ unit of mana (called type-p magician). The magician of item $p$ is presented with a sequence of $L$ type- $p$ boxes from box $L$ to box $1$ . The distribution of $\widetilde{Y}_{lp}$ is written on type- $p$ box $l$ . The amount of mana the magician will lose if open type- $p$ box $l$ is equal to the value of $\widetilde{Y}_{lp}\in\{0,1\}$ . The magician will decide whether to open type- $p$ box $l$ following the mechanism defined in Definition 4.2. Let the (random) decision variable be $Z_{l}^{p}$ , i.e., $Z_{l}^{p}=1$ if the magician for $p$ decides to open type- $p$ box $l$ , and $Z_{l}^{p}=0$ otherwise.

If we utilize bin $l$ when $\xi_{l}=1$ , and assign item $p$ to bin $l$ whenever $Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}=1$ , then the assignment is feasible, which means that:

(a)

Each passenger is assigned to at most one utilized bin.
(b)

The total cost of all utilized bins is within the given budget $k$ .
(c)

The capacity constraints are respected.

Moreover, the expected approximation ratio derived from this assignment is at least $\left(\frac{k-1}{2k}\right)\cdot\left(1-\frac{1}{\sqrt{k}}\right)>\frac{1}{8}$ .

Proof.

We first prove its feasibility. We start with point (a). In Step 3, $Z_{l}^{p}$ represents the decision of the type- $p$ magician on whether to open the type- $p$ box with index $l$ . Meanwhile, $\widetilde{Y}_{lp}$ indicates whether opening the box requires spending 1 unit of mana. Based on the definition of the magician’s mechanism in Definition 4.2 and the results stated in Theorem 4.3, the type- $p$ magician will spend at most one unit of mana during this process. Therefore, $\sum_{l=1}^{L}Z_{l}^{p}\cdot\widetilde{Y}_{lp}\leq 1$ . Since $Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}\leq Z_{l}^{p}\cdot\widetilde{Y}_{lp}$ , so if we assign item $p$ to bin $l$ when $Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}=1$ , it can can only be assigned at most once through this process. As for point (b) and (c), they can be checked directly by definition.

The expected reward derived from the procedure in Lemma 5.1 can be expressed as:

$\displaystyle\mathbb{E}\left[\sum_{l\in[L]}\sum_{p\in[P]}v_{lp}Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}\right]$	$\displaystyle=\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]=\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}\mathbb{E}\left[Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]$
	$\displaystyle=\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}\mathbb{E}\left[Z_{l}^{p}\right]\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right]$	(5.2)
	$\displaystyle\geq\frac{1}{2}\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}\mathbb{E}\left[Z_{l}^{S}\cdot\xi_{l}\right]$	(5.3)
	$\displaystyle=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\cdot\xi_{l}\right]=\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\Bigg{\|}\xi_{l}=1\right]\cdot P\left[\xi_{l}=1\right]$
	$\displaystyle=\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\Bigg{\|}H_{l}=1\right]\cdot P\left[\xi_{l}=1\right]$	(5.4)

Eq. (5.2) results from observing that $Z_{l}^{p}$ is independent of $\xi_{l}$ , and $Z_{l}^{S}$ for each $l\in[L]$ and $S\in I_{l}$ . This independence arises because the magician’s mechanism ensures that $Z_{l}^{p}$ only depends on $\widetilde{Y}_{l^{\prime}}^{p}$ and $Z_{l^{\prime}}^{p}$ for $l^{\prime}\in\{l+1,l+2,\ldots,L\}$ as defined in Step 3 of the procedure, while $\xi_{l}$ depends on $Z_{l^{\prime}}^{S}$ for $l^{\prime}\in\{1,2,\ldots,l\}$ and $S\in I_{l}$ . (5.3) occurs because $Z_{l}^{p}$ represents the decision of a type- $p$ magician to open the box or not, constrained by having only one unit of mana, leading to $E[Z_{l}^{p}]\geq\frac{1}{\sqrt{1+3}}=\frac{1}{2}$ according to Theorem 4.3.

As for Eq. (5.4), it is due to $\xi_{l}$ depending only on $H_{l}$ and $H_{l^{\prime}}$ for $l^{\prime}<l$ , and $Z_{l}^{S}$ being independent of $H_{l^{\prime}}$ for $l^{\prime}<l$ .

Let us define

r_{l}:=\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v^{S}_{lp}Z_{l}^{S}\mid H_{l}=1\right]=\frac{\sum_{S\in I_{l}}\left(\widetilde{X}_{lS}\sum_{p\in S}v_{lp}\right)}{\sum_{S\in I_{l}}\widetilde{X}_{lS}}.

Subsequently, Eq. (5.4) can be expressed as

\displaystyle\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[r_{l}\cdot\xi_{l}\right]=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right].

(5.5)

To establish the lower bound of Eq. (5.4), we construct an instance of the online fractional knapsack problem, as in Definition 4.5, consisting of $L$ rounds. In each round $l$ , an object appears with a probability $P(H_{l}=1)$ , possessing a weight $c_{l}$ and a reward $r_{l}$ . The initial capacity is equal to the budget $k$ . As specified in Step 2 of Algorithm 1, it satisfies that $\frac{r_{1}}{c_{1}}\geq\frac{r_{2}}{c_{2}}\geq\ldots\geq\frac{r_{L}}{c_{L}}$ , allowing us to apply Lemma 4.7.

In the constructed instance, if we fully accept each appearing object in sequence until the remaining capacity is inadequate for the full acceptance of the next appearing object, and then accept the next object fractionally to precisely exhaust the remaining capacity, then, according to Lemma 4.7 with $k\geq 3$ , the expected rewards obtained in this online knapsack problem are at least:

\displaystyle\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}r_{l}\cdot P(H_{l}=1)=\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

If we instead only accept objects fully and stop when the remaining capacity is insufficient for the next appearing object (i.e., we do not accept any object fractionally), the reward equals $\sum_{l\in[L]}r_{l}\cdot\xi_{l}$ . Since $c_{l}\leq 1$ , the capacity is $k$ , and $\frac{r_{1}}{c_{1}}\geq\frac{r_{2}}{c_{2}}\geq\ldots\geq\frac{r_{L}}{c_{L}}$ , then we have that

	$\displaystyle\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]$	$\displaystyle\geq\frac{k-1}{k}\cdot\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}r_{l}\cdot P(H_{l}=1)$
		$\displaystyle=\frac{k-1}{k}\cdot\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.$		(5.6)

Summarizing (5.4),(5.5) and (5.6), we conclude that

\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]\geq\frac{k-1}{2k}\left(1-\frac{1}{\sqrt{k}}\right)\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

This completes the proof. ∎

Theorem 5.2.

Assume $k\geq 3$ . The solution given by Algorithm 1 is feasible. The expected approximation ratio of the solution given by Algorithm 1 is at least $\left(\frac{k-1}{2k}\right)\cdot\left(1-\frac{1}{\sqrt{k}}\right)>\frac{1}{8}$ .

Proof.

The feasibility can be verified directly. Given the same realization of $Z_{l}^{S}$ in Step 3 of Algorithm 1, both Algorithm 1 and the procedure defined in Lemma 5.1 will utilize identical bins.

Moreover, under this realization, item $p$ can only be assigned to bins where $Z_{l}^{S}=1$ and $p\in S$ . This condition holds true for both Algorithm 1 and the procedure in Lemma 5.1.

Given the realizations of $Z_{l}^{S}$ , the optimal reward is achieved by retaining each item $p$ in the bin where it yields the highest reward $v_{lp}$ , subject to $Z_{l}^{S}=1$ and $p\in S$ . This is precisely what Step 9 of Algorithm 1 accomplishes.

Therefore, the expected reward of Algorithm 1 will be at least as high as that derived from the procedure defined in Lemma 5.1. ∎

The above proof is constructed by creating a procedure that produces a feasible assignment solution, then comparing Algorithm 1 with it to derive the desired result. For the case when $k<3$ , we will instead produce a procedure that generates an assignment solution that may violate the budget constraint. We will then compare this with Algorithm 1 to derive the desired result. Although the approaches differ in terms of solution feasibility of the constructed procedures, the key ideas remain similar. Both rely on the magician’s mechanism and the results from the online fractional knapsack problem presented in Section 4.

Lemma 5.3.

Assume $k<3$ . With the same notations as in Algorithm 1, we define a procedure as follows:

1.

For each $l\in[L]$ , let $H_{l}=\sum_{S\in I_{l}}Z_{l}^{S}$ . Set $\xi_{l}=1$ if $B_{l-1}\geq 0$ and $H_{l}=1$ . Otherwise, set $\xi_{l}=0$ . This ensures that bin $l$ is utilized ( $\xi_{l}=1$ ) only when the remaining budget for $l$ is non-negative ( $B_{l-1}\geq 0$ ) and a set has been selected for this bin ( $H_{l}=1$ ).

For every $l\in[L]$ and $p\in[P]$ , introduce a random variable $\widetilde{Y}_{lp}$ , given by

\widetilde{Y}_{lp}=\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}Z_{l}^{S}.

Its probability distribution is

P(\widetilde{Y}_{lp}=y)=\begin{cases}1-\sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}\widetilde{X}_{lS},&\text{if }y=0,\\ \sum_{\begin{subarray}{c}S\in I_{l}:p\in S\end{subarray}}\widetilde{X}_{lS},&\text{if }y=1.\end{cases}

3.

For each item $p\in[P]$ , create a generalized $\frac{1}{2}$ -conservative magician with $1$ unit of mana (called type-p magician). The magician of item $p$ is presented with a sequence of $L$ type- $p$ boxes from box $L$ to box $1$ . The distribution of $\widetilde{Y}_{lp}$ is written on type- $p$ box $l$ . The amount of mana the magician will lose if open type- $p$ box $l$ is equal to the value of $\widetilde{Y}_{lp}\in\{0,1\}$ . The magician will decide whether to open type- $p$ box $l$ following the mechanism defined in Definition 4.2. Let the (random) decision variable be $Z_{l}^{p}$ , i.e., $Z_{l}^{p}=1$ if the magician for $p$ decides to open type- $p$ box $l$ , and $Z_{l}^{p}=0$ otherwise.

If we utilize bin $l$ when $\xi_{l}=1$ , and assign item $p$ to bin $l$ whenever $Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}=1$ for some $S\in I_{l}$ and $p\in S$ , then the assignment satisfies:

(a)

Each passenger is assigned to at most one utilized bin.
(b)

Let the total cost of all utilized bins be $k^{\prime}$ and define $l^{*}=\max\{l\in[L]:\xi_{l}=1\}$ . Then we have $k^{\prime}-c_{l^{*}}\leq k$ . In other words, if we remove the utilized bin with the largest index, the total cost of the remaining bins is within the given budget $k$ .
(c)

The capacity constraints are respected.

Moreover, the expected approximation ratio derived from this assignment is at least $\frac{1}{4}$ .

Proof.

As the proof employs reasoning similar to that used in Lemma 5.1, we have placed the full proof in Appendix A to maintain the flow of the main text. ∎

Theorem 5.4.

Assume $k<3$ . The solution given by Algorithm 1 is feasible. The expected approximation ratio of the solution given by Algorithm 1 is at least $\frac{1}{8}$ .

Proof.

The feasibility can be verified directly. Given the same realization of $Z_{l}^{S}$ in Step 3 of Algorithm 1, the algorithm will temporarily open the same bins as those utilized by the procedure defined in Lemma 5.3.

Moreover, under this realization, item $p$ can only be assigned to bins where $Z_{l}^{S}=1$ and $p\in S$ . This condition holds true for both Algorithm 1 and the procedure in Lemma 5.3.

Given the realizations of $Z_{l}^{S}$ , the optimal reward for these temporarily opened bins is achieved by retaining each item $p$ in the bin where it yields the highest reward $v_{lp}$ , subject to $Z_{l}^{S}=1$ and $p\in S$ . This is precisely what Step 10 of Algorithm 1 does for those temporarily opened bins. Consequently, the total reward from these temporarily opened bins is at least as large as that derived from the procedure in Lemma 5.3.

The key difference from Theorem 5.1 is that the temporarily opened bins might violate the budget constraint. To address this, we compare the reward of the temporarily opened bin with the largest index ( $l^{*}$ in Step 10 of Algorithm 1) to the reward of all other temporarily opened bins ( $\mathcal{L}^{*}\backslash\{l^{*}\}$ in Step 10 of Algorithm 1). Whichever is larger will retain at least half of the total reward.

Moreover, based on point (b) of Lemma 5.3, either choice will stay within the budget. This comparison and selection process is implemented in Step 10 of Algorithm 1.

Therefore, the expected reward of Algorithm 1 will be at least half of that derived from the procedure defined in Lemma 5.3, which yields the expected approximation ratio as $\frac{1}{2}\cdot\frac{1}{4}=\frac{1}{8}$ . ∎

6 Applications in transportation

In this section, we introduce an application of GBAP in a multi-modal Mobility-on-Demand (MoD) setting, illustrating how the framework can be adapted to address practical transportation problems. Consider a MoD operator managing a fleet of single-capacity cars and high-capacity buses. The operator aims to provide a multi-modal service with buses operating on fixed routes and cars functioning in a demand-responsive manner. An individual trip request can be served by a bus, a car, or a combination of both.

This scenario could be envisioned as a transit agency planning a multi-modal service where traditional buses are complemented by a demand-responsive service, or a private rideshare operator integrating fixed-route shuttles with taxis. This problem was investigated by [17]. There is a wide range of literature studying the traditional line-planning problem. For instance, [6] introduced a new multicommodity flow model for line planning. Their model aims to strike a balance between minimizing operating costs for transport companies and reducing travel times for passengers. On the other hand, [4] provides a holistic approach for transit line planning using column generation. For additional literature on line planning problems in public transportation, readers are directed to [18].

By applying GBAP to this MoD scenario, we can optimize resource allocation and route planning, maximizing the efficiency and service quality of the transportation system while adhering to budget constraints. This demonstrates the versatility and practical value of GBAP in addressing complex, real-world transportation challenges.

In this problem, the transit network is modeled as an undirected weighted graph $G=(V,E)$ , where $V$ contains $n$ nodes as potential origin/destination nodes, and $E$ is the set of edges representing the shortest paths between the nodes. The weights of the edges are the cost required to cross an edge $e\in E$ . A set $\mathcal{L}$ of candidate bus routes/lines on $G$ is given to the operator. Each candidate route / line is defined as a fixed sequence of consecutive edges of $G$ , and it has an operating cost $c_{l}$ . There is a budget $B$ for operating bus lines. Each bus has capacity $C$ and each line $l$ has frequency $f_{l}$ . Thus, the overall capacity of a bus line is $C\cdot f_{l}$ . The cars cover ’first-last mile’ travel. It is assumed that there are no inter-bus transfers for each passenger. If the passenger $p$ is assigned to line $l$ , then $v_{lp}\geq 0$ is defined as the welfare value contributed to the system by this matching. The operator is responsible for selecting a set of bus lines from $\mathcal{L}$ to operate in a way that maximizes the overall welfare of the system. The value of $v_{lp}$ is calculated according to the rules given, e.g. binary based on whether the matching is feasible or the number of car miles saved due to the matching (for more details, please see [17]). The goal is to find a subset of lines to be operated and an assignment of passengers to opened lines to maximize the welfare of the system, such that: (a) the cost of opened lines does not exceed the budget; (b) the assignment of passengers and their trips to the opened line $l$ respect the capacity constraints of each edge on line $l$ ; (c) a passenger is assigned to at most one line.

Now we can describe how this problem fits into the proposed framework. For each line $l$ with $n_{l}$ edges, we map it to a bin in GBAP with an $n_{l}$ -dimensional capacity vector. Each entry in the capacity vector of line $l$ is set to $C\cdot f_{l}$ . Each passenger is taken as an item in GBAP with a $0-1$ weight vector $r_{lp}:=(r_{lp}^{(1)},\ldots,r_{lp}^{(n_{l})})\in\{0,1\}^{n_{l}}$ . If the $i$ th edge of line $l$ is used when passenger $p$ is matched to line $l$ , then we set the $i$ th entry of $r_{lp}$ to $1$ , and $0$ otherwise. Note that $r_{lp}$ has consecutive $1$ s according to the definition of a route in RLPP. The cost of utilizing bin $l$ is set to be $c_{l}$ while the cost of assigning item $p$ to bin $l$ is assumed to be $0$ . Contributed welfare values $\{v_{lp}\}$ are the coefficients in the objective of (1.1). Since each passenger can only be matched to at most one line, we let $\rho_{p}=1$ for each $p\in[P]$ . The RLPP is an instance of GBAP with an additional aggregation step that merges any subset of selected lines with the same route (and different frequencies) to a single line with the corresponding aggregated frequency. Since the aggregation step neither increases the cost of the solution nor decreases the objective values as shown in [17], and can be trivially incorporated into our framework, we omit it here for simplicity.

Périvier et al. [17] present an approximation algorithm (see Algorithm 2 in Appendix B) to solve the problem based on the randomized rounding scheme proposed by [11]. The corresponding approximation ratio is $1-\frac{1}{e}-\epsilon$ , with a probability of violating the budget constraint bounded by $e^{-\epsilon^{2}k/3}$ , where $k:=\frac{B}{\max_{l\in\mathcal{L}}c_{l}}$ as we have assumed before.

Remark 6.1.

We note that the violation probability bound, given by $e^{-\frac{k}{3}\epsilon^{2}}$ , may surpass $\frac{1}{2}$ , even when $k$ is as large as $\frac{1}{\epsilon^{2}}$ . To address this, one may consider setting $\epsilon$ to $\frac{1}{k^{\gamma}}$ , with $\gamma<\frac{1}{2}$ . However, this implies that the method’s approximation ratio becomes $\left(1-\frac{1}{e}\right)\left(1-\frac{1}{k^{\gamma}}\right)<\left(1-\frac{1}{e}\right)\left(1-\frac{1}{\sqrt{k}}\right)$ with violation probability bound $e^{-\frac{1}{3}k^{1-2\gamma}}$ . We present Table 1 to illustrate this by taking $k=25,50,100$ , which are reasonable and relatively large parameters in real-world cases. Note that by definition, the number of the opened lines in the optimal solution can be significantly larger than $k$ . We calculate the approximation ratio as $\left(1-\frac{1}{e}\right)\left(1-\epsilon\right)$ since it is the original one derived by [17] and is higher than $\left(1-\frac{1}{e}-\epsilon\right)$ . Furthermore, note that the rounding process tends to produce infeasible solutions with higher objective values than feasible ones, suggesting that the actual expected approximation ratio for feasible solutions might be significantly lower than $\left(1-\frac{1}{e}\right)\left(1-\epsilon\right)$ , especially when the violation probability is non-negligible.

$k$	$\gamma$	$\epsilon$	$\left(1-\frac{1}{e}\right)\left(1-\epsilon\right)$	$e^{-\frac{k}{3}\epsilon^{2}}$
25	1/3	0.3420	0.4159	0.3773
25	1/4	0.4472	0.3494	0.1889
50	1/3	0.2714	0.4605	0.2929
50	1/4	0.3761	0.3944	0.0947
100	1/3	0.2154	0.4959	0.2128
100	1/4	0.3162	0.4322	0.0357

Table 1: Results for the given expressions with varying

k

and

\gamma

values. Algorithm 1 achieve approximation ratios greater than

0.39

0.4

, and

0.44

for

k=25,50,100

, respectively, by setting the

\epsilon

term to be a sufficiently small constant.

We note that when Algorithm 1, and Algorithm 2 (in Appendix B) utilize the same LP solution for their rounding process, Algorithm 1 are provably superior to Algorithm 2. The reason for this is that the feasible solutions produced by Algorithm 2 can also be generated by Algorithm 1 with the same probability distributions, while each infeasible solution generated by Algorithm 2 is converted into a feasible one by Algorithm 1. However, since the approximation ratio derived for Algorithm 2 encompasses the objective values of infeasible solutions, it is not applicable to Algorithm 1.

6.1 Numerical Experiments.

In this section, we conduct numerical experiments in the context of RLPP based on the data provided by [17] and Algorithm 1. We also compare its performance with Algorithm 2 with $\epsilon=0.05$ . The underlying road network is derived from OpenStreetMap (OSM) geographical data from [5]. The size of the candidate set of lines is set to be $1000$ , and the lines are generated based on the method proposed by [19]. The passenger data comes from records of for-hire vehicle trips in Manhattan using the New York City Open Data operator, with a time window between 5 pm and 6 pm on the first Tuesday of April 2018. There are $13,851$ trip requests. The bus capacity is set to be $30$ . For practical purposes, we solve the corresponding LP via column generation and apply a timeout—the current LP solution will be returned once the time limit is exceeded.

We implement Algorithm 1, Algorithm 2, and the modified Algorithm 1 with rounding based on the same LP solution used by Algorithm 2. The experiments consider four different budgets, and we simulate the rounding $10^{4}$ times. In Figure 4, the $x$ -axis represents the number of simulations, and the $y$ -axis indicates the objective value of the best solution found by the algorithms so far. We omit the first $100$ realizations in Figure 4 to make the display of the $y$ -axis more readable. The results³³3In the dataset we employed, the value of $k$ substantially exceeds $3$ . In this case, Algorithm 1 coincides with Algorithm 2 presented in our conference paper [12], yielding same numerical outcomes. show that when the budget is $8\times 10^{4}$ and $10^{5}$ respectively, our algorithms find significantly better solutions, while the three methods perform similarly when the budget is $2\times 10^{4}$ and $5\times 10^{4}$ respectively.

7 Conclusion and Future Directions

In this paper, we have introduced a novel approach to the Generalized Budgeted Assignment Problem (GBAP) and demonstrated its application to the Real-Time Line Planning Problem (RLPP) in transportation systems. To the best of our knowledge, this is the first algorithm to achieve a constant approximation ratio for GBAP. Our approach not only enhances the theoretical understanding of GBAP but also provides practical solutions for complex transportation planning challenges.

Our contributions are threefold. First, we have developed a randomized approximation algorithm for GBAP with a proven constant approximation ratio. Second, we have introduced an innovative technique that employs randomized rounding realizations to simulate two distinct online mechanisms. This approach enables the simultaneous satisfaction of different constraints while maintaining a strong theoretical guarantee on algorithm performance. Third, we have applied our algorithm to the RLPP and demonstrated its effectiveness through numerical experiments using real-world data from New York City’s transportation system. This application underscores the algorithm’s potential for large-scale urban transit planning and bridges the gap between theoretical advancements and practical implementations in transportation systems.

While our results represent a significant step forward, several avenues for future research remain open.

As stated in out extended abstract at COCOON [12, Theorem 6], we have established that the approximation ratio cannot exceed $1-\frac{1}{e}$ by reducing the problem to the max $k$ -cover problem.⁴⁴4We have chosen to omit the detailed statement of this result to maintain focus on our primary contributions. An avenue for future research is to explore whether the upper bound can be tightened to a constant strictly less than $1-\frac{1}{e}$ .

The second potential direction for future research is to explore possible improvements to our current approximation ratio. While our algorithm (Algorithm 1) enhances practical efficiency and performance compared to the two alternative procedures presented in Lemma 5.1 and Lemma 5.3, its approximation ratio guarantee remains the same as these procedures since it essentially builds upon their bounds. It remains unclear whether there is a significant gap in the theoretical guarantees between our algorithm and these two alternative procedures. Therefore, refining the analysis to enhance the theoretical bound of Algorithm 1 may be worthwhile. Another avenue could be the development of new algorithms with a better approximation ratio guarantee. However, it is important to note that our primary goal has been to develop an algorithm that is both theoretically sound and practically efficient. This dual requirement presents significant challenges in substantially improving the approximation ratio without compromising computational efficiency. Any future improvements will need to carefully balance these competing objectives, making this a complex but potentially rewarding area for further investigation.

It is also interesting for future research to explore scenarios with stochastically appearing items, each following a known binary distribution. In this context, the objective would be to maximize expected rewards by strategically utilizing bins within the given budget constraint. This extension could enhance the model’s applicability to real-world situations where demand or item availability is uncertain but follows predictable patterns.

References

Alaei [2014] Saeed Alaei. Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers. SIAM Journal on Computing, 43(2):930–972, 2014.
Alaei et al. [2013] Saeed Alaei, MohammadTaghi Hajiaghayi, and Vahid Liaghat. The online stochastic generalized assignment problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 11–25. Springer, 2013.
Bender et al. [2015] Marco Bender, Clemens Thielen, and Stephan Westphal. Packing items into several bins facilitates approximating the separable assignment problem. Information Processing Letters, 115(6-8):570–575, 2015.
Bertsimas et al. [2021] Dimitris Bertsimas, Yee Sian Ng, and Julia Yan. Data-driven transit network design at scale. Operations Research, 69(4):1118–1133, 2021.
Boeing [2017] Geoff Boeing. Osmnx: A python package to work with graph-theoretic openstreetmap street networks. Journal of Open Source Software, 2(12), 2017.
Borndörfer et al. [2007] Ralf Borndörfer, Martin Grötschel, and Marc E Pfetsch. A column-generation approach to line planning in public transport. Transportation Science, 41(1):123–132, 2007.
Calinescu et al. [2011] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.
Chekuri and Khanna [2005] Chandra Chekuri and Sanjeev Khanna. A polynomial time approximation scheme for the multiple knapsack problem. SIAM Journal on Computing, 35(3):713–728, 2005.
Chekuri et al. [2012] Chandra Chekuri, Nitish Korula, and Martin Pál. Improved algorithms for orienteering and related problems. ACM Transactions on Algorithms (TALG), 8(3):1–27, 2012.
Feige and Vondrak [2006] Uriel Feige and Jan Vondrak. Approximation algorithms for allocation problems: Improving the factor of 1-1/e. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 667–676. IEEE, 2006.
Fleischer et al. [2011] Lisa Fleischer, Michel X Goemans, Vahab S Mirrokni, and Maxim Sviridenko. Tight approximation algorithms for maximum separable assignment problems. Mathematics of Operations Research, 36(3):416–431, 2011.
Jiang and Samaranayake [2022] Hongyi Jiang and Samitha Samaranayake. Approximation algorithms for capacitated assignment with budget constraints and applications in transportation systems. In International Computing and Combinatorics Conference, pages 94–105. Springer, 2022.
Kulik [2011] Ariel Kulik. Submodular and linear maximization with knapsack constraints. Master’s thesis, Computer Science Department, Technion, 2011.
Kulik et al. [2013] Ariel Kulik, Hadas Shachnai, and Tami Tamir. Approximations for monotone and nonmonotone submodular maximization with knapsack constraints. Mathematics of Operations Research, 38(4):729–739, 2013.
Lee et al. [2009] Jon Lee, Vahab S Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 323–332, 2009.
Manshadi et al. [2021] Vahideh Manshadi, Rad Niazadeh, and Scott Rodilitz. Fair dynamic rationing. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 694–695, 2021.
Périvier et al. [2021] Noémie Périvier, Chamsi Hssaine, Samitha Samaranayake, and Siddhartha Banerjee. Real-time approximate routing for smart transit systems. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 5(2):1–30, 2021.
Schöbel [2012] Anita Schöbel. Line planning in public transportation: models and methods. OR spectrum, 34(3):491–510, 2012.
Silman et al. [1974] Lionel Adrian Silman, Zeev Barzily, and Ury Passy. Planning the route system for urban buses. Computers & operations research, 1(2):201–211, 1974.
Wolsey [1982] Laurence A Wolsey. Maximising real-valued submodular functions: Primal and dual heuristics for location problems. Mathematics of Operations Research, 7(3):410–425, 1982.

Appendix A Proof of Lemma 5.3

Proof of Lemma 5.3.

Point (a) can be proven using the exact same way as in Lemma 5.3. Regarding point (b) and (c), they can be directly verified from its definition.

Following the same reasoning as in (5.4), the expected reward derived from the procedure in Lemma 5.3 can be expressed as:

	$\displaystyle\mathbb{E}\left[\sum_{l\in[L]}\sum_{p\in[P]}v_{lp}Z_{l}^{p}\cdot\widetilde{Y}_{lp}\cdot\xi_{l}\right]$	$\displaystyle=\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]$
		$\displaystyle\geq\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{S}\Bigg{\|}H_{l}=1\right]\cdot P\left[\xi_{l}=1\right]$		(A.1)

Same as in Lemma 5.3, let us define

r_{l}:=\mathbb{E}\left[\sum_{S\in I_{l}}\sum_{p\in S}v^{S}_{lp}Z_{l}^{S}\mid H_{l}=1\right]=\frac{\sum_{S\in I_{l}}\left(\widetilde{X}_{lS}\sum_{p\in S}v_{lp}\right)}{\sum_{S\in I_{l}}\widetilde{X}_{lS}}.

Subsequently, Eq. (A.1) can be expressed as

\displaystyle\frac{1}{2}\sum_{l\in[L]}\mathbb{E}\left[r_{l}\cdot\xi_{l}\right]=\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right].

(A.2)

To establish the lower bound of Eq. (A.1), we again construct an instance of the online fractional knapsack problem, as in Definition 4.5, consisting of $L$ rounds. In each round $l$ , an object appears with a probability $P(H_{l}=1)$ , possessing a weight $c_{l}$ and a reward $r_{l}$ . The initial capacity is equal to the budget $k$ . As specified in Step 2 of Algorithm 1, it satisfies that $\frac{r_{1}}{c_{1}}\geq\frac{r_{2}}{c_{2}}\geq\ldots\geq\frac{r_{L}}{c_{L}}$ , allowing us to apply Lemma 4.7.

Following the same reasoning as in Lemma 5.3, expected rewards of this online knapsack problem instance are at least:

\displaystyle\frac{1}{2}\sum_{l\in[L]}r_{l}\cdot P(H_{l}=1)=\frac{1}{2}\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

Unlike Lemma 5.1, the definition of $\xi_{l}$ provided in Lemma 5.3 indicates that the reward cannot exceed $\sum_{l\in[L]}r_{l}\cdot\xi_{l}$ . Consequently, we establish:

\displaystyle\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]\geq\frac{1}{2}\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

(A.3)

Summarizing (A.1),(A.2) and (A.3), we conclude that

\mathbb{E}\left[\sum_{l\in[L]}\sum_{S\in I_{l}}\sum_{p\in S}v_{lp}Z_{l}^{p}Z_{l}^{S}\cdot\xi_{l}\right]\geq\frac{1}{2}\mathbb{E}\left[\sum_{l\in[L]}r_{l}\cdot\xi_{l}\right]\geq\frac{1}{4}\sum_{l\in[L]}\sum_{p\in[P]}\sum_{S\in I_{l}:p\in S}v_{lp}\widehat{X}_{lS}.

This completes the proof. ∎

Appendix B Method from [17]

Here we restate the method from [17] while omitting the aggregation step.

Algorithm 2.

1.

Let $\widehat{X}$ be the solution to the LP relaxation of (1.4).

Independently for each $l\in[L]$ , randomly select a set $T_{l}$ from $I_{l}\cup\{\emptyset\}$ following the distribution below:

P({T}_{l}=S)=\begin{cases}1-\sum_{S\in I_{l}}\widehat{X}_{lS}&\text{if }S=\emptyset\\ \widehat{X}_{lS}&\text{if }S\in I_{l}\end{cases}.

Let the (random) decision variable be $Z_{l}^{S}$ , i.e., $Z_{l}^{S}=1$ if $S=T_{l}$ , and $Z_{l}^{S}=0$ otherwise for $S\in I_{l}\cup\{\emptyset\}$ .

3.

Temporarily assign all the elements in $T_{l}$ to bin $l$ for each $l\in[L]$ .
4.

For $p\in[P]$ , we remove each item $p$ that is assigned to more than $\rho$ bins from all but the $\rho$ bins with highest rewards $v_{lp}$ it is assigned to.
5.

Open all the lines that have been assigned with at least one passenger, and close all the other lines. If this assignment does not exceed the budget, then we take this assignment as the final solution.