Approximation Algorithm for Generalized Budgeted Assignment Problems and Applications in Transportation Systems
bSchool of Civil and Environmental Engineering, Cornell University, USA
[email protected], [email protected]
)
Abstract
Motivated by a transit line planning problem in transportation systems, we investigate the following capacitated assignment problem under a budget constraint. Our model involves bins and items. Each bin has a utilization cost and an -dimensional capacity vector. Each item has an -dimensional binary weight vector , where the s in (if any) appear in consecutive positions, and its assignment to bin yields a reward . The objective is to maximize total rewards through an assignment that satisfies three constraints: (i) the total weights of assigned items do not violate any bin’s capacity; (ii) each item is assigned to at most one open bin; and (iii) the overall utilization costs remain within a total budget .
We propose the first randomized rounding algorithm with a constant approximation ratio for this problem. We then apply our framework to the motivating transit line planning problem, presenting corresponding models and conducting numerical experiments using real-world data. Our results demonstrate significant improvements over previous approaches in addressing this critical transportation challenge.
Keywords: Assignment problem, Approximation algorithm, Transportation systems
1 Introduction
This paper presents a framework for addressing a critical challenge in transportation systems: optimizing the selection of high-capacity fixed-route transit lines to maximize the total reward from passenger route coverage, subject to a specified operational budget constraint. This problem arises from urban transit systems’ pressures of satisfying passenger demand with inadequate resource allocation, necessitating innovative solutions for effective line planning. Our approach aims to enhance the overall performance and efficiency of public transportation networks through strategic line planning. The primary challenge lies in developing an algorithm that is both theoretically sound and practically efficient, capable of deriving high-quality solutions for large-scale problems, such as those encountered in complex urban environments like New York City. This research makes contributions in both discrete optimization and transportation planning, proposing a novel algorithmic framework that bridges the gap between theoretical guarantees in algorithm design and practical applicability in real-world transit planning scenarios.
To address this challenge, we study the transportation problem in the context of a new assignment problem, which we call the Generalized Budgeted Assignment Problem (GBAP). Prior to introducing the formal definition, we present a warm-up example related to the aforementioned transportation line planning problems. This simple example aims to provide context and facilitate understanding of the subsequent formal definitions of the GBAP.
For ease of explanation, this example is relatively restrictive compared to real-world applications. Thus, it is important to note that we will study a more generalized line planning problem in Section 6. This extended discussion will still be based on the GBAP and corresponding approximation algorithm we develop, and will utilize real-world data.
1.1 Warm-up Example
This example involves selecting a set of fixed-route transit lines to operate such that passenger coverage is maximized, while adhering to vehicle capacity constraints and satisfying a budget constraint. By framing this as an assignment-type integer program, we aim to find the optimal solution that balances cost and service efficiency. More specifically, consider a transit system with four stops: , , , and . There are three candidate transit lines: line , which travels from to to ; line , which travels from to to to ; and line , which travels from to to to . These lines cost , , and respectively to operate. Each transit line operates a bus with a capacity of passengers. This capacity applies uniformly at all points along the route. We have six passengers: passengers and need to travel from to , passengers , , and from to , and passenger from to . These transit lines, their costs, and passenger routes are illustrated in Figure 1 and Figure 2.


For simplicity, we assume each passenger must travel directly from their origin to their destination on a single bus line (i.e., no transfers). To serve a passenger, the line chosen must be open and cover the passenger’s entire route. The total cost of opened lines must stay within the given budget of . Therefore, we can open at most lines. Our goal is to choose a set of lines to open within the budget and assign passengers to these lines, maximizing the number of passengers served while adhering to these rules. If a line covers a passenger’s route, assigning this passenger to the line generates a reward of ; otherwise, the reward is . The objective is to maximize the overall generated reward in this context.
To solve this problem, we can construct a capacity vector for each of the three lines, as shown in Figure 1. Line has a capacity vector of , while and have capacity vectors of . Each entry represents the maximum number of passengers that can be accommodated on a segment of the line. The length of each vector corresponds to the number of edges in that line. Figure 2 illustrates how passengers utilize different lines and the corresponding weight vectors. For example, passengers and , traveling from to , have weight vectors for line , for line 2, and for line . These vectors indicate which segments of each line are used by the passengers, with representing a used segment and an unused segment. The reward of assigning passengers to a line depends on whether the line covers their route. For instance, assigning passengers and to line or yields no reward as these lines do not cover their route, while assigning them to line gives a reward of per person. To obey capacity constraints, the sum of weight vectors for passengers assigned to a line cannot exceed that line’s capacity vector entry-wise. For example, we cannot assign passengers , , and to line as it would exceed the capacity for the edges from to and to .
Remark 1.1.
When a line covers a passenger’s route, the weight vector contains 1s appearing in adjacent positions, reflecting that passenger routes consist of consecutive edges on the lines. For lines not covering a passenger’s route, the weight vectors are defined as all zeros for consistency, though they could be arbitrary since the reward for these assignments is .
Based on these definitions, we can formulate this problem as an assignment-type integer program. This formulation is a specific case of GBAP, which we will present in details in Section 1.2, specifically in (1.1). It can be verified directly that in this example, the optimal solution is to operate lines and , covering 5 passengers, with only one passenger (from C to D) left unserved.
1.2 Definition of Generalized Budgeted Assignment Problem
Building on the warm-up example, we now formally define GBAP. This problem involves selecting a set of bins within a given budget and assigning a set of items to these selected bins under multidimensional capacity constraints, with the objective of maximizing the assignment reward. To help understand this model, one can think of transit lines as bins and passengers as items to be allocated. Specifically, consider a system with bins and items. Each bin 111Here, denotes the set for any integer . is characterized by a utilization cost and an -dimensional capacity vector . In our warm-up example, the capacity vector has uniform entries across all dimensions. However, in our generalized model, this capacity can vary across dimensions (see Remark 1.2 for further details). Additionally, the dimension corresponds to the number of edges involved in a line in the warm-up example. Each item has an -dimensional weight vector for each bin : , where the s in (if any) appear in consecutive positions within the vector. This structure reflects the assumption that routes consist of consecutive edges, as motivated by the warm-up example where routes follow contiguous edges in a line. Assigning item to bin yields a reward . The objective is to maximize the total reward through an assignment that satisfies three constraints: (i) the total weight of items assigned to each bin does not exceed the bin’s capacity in any dimension, (ii) each item is assigned to at most one bin, (iii) items can only be assigned to utilized bins, and (iv) the total cost of utilized bins does not exceed a given budget .
The GBAP problem can be formulated as a - linear integer programming problem:
(1.1a) | ||||
s.t. | (1.1b) | |||
(1.1c) | ||||
(1.1d) | ||||
(1.1e) | ||||
(1.1f) |
(1.1b) ensures that the budget constraint is respected. (1.1c) guarantees that the multidimensional capacity of bins is not breached by the assigned items. (1.1d) requires that item can be assigned at most one bin. If item is assigned to bin , then . Otherwise, . (1.1e) guarantees that each bin can have at most one copy of each item. Similarly, indicates that bin is utilized. Items can only be matched to utilized bins.
Remark 1.2.
While the warm-up example uses binary rewards ( or ), our model allows for any non-negative (rational) rewards depending on different passengers (items) and transit lines (bins). This flexibility enables the study of more complex transportation scenarios, such as maximizing social welfare defined according to passenger characteristics like income and age. For further details and discussions on applications, please refer to Section 6.
Additionally, while the capacity vector in our warm-up example contains uniform values, our model allows for different values across various dimensions. This feature provides enhanced flexibility, making the model applicable to a broader range of scenarios.
For the rest of this paper, we define as the ratio of the budget to the largest line cost:
(1.2) |
This notation will be used frequently throughout our subsequent discussion and analysis.
1.3 Linear Programming Relaxation of Integer Program (1.1)
We will develop a randomized rounding algorithm based on the solution to the linear programming (LP) relaxation of (1.1). While a natural approach to relaxing (1.1) would be to directly set and as continuous variables, this method yields a loose solution, as noted in [11] and [17], thereby limiting the performance of LP-based randomized rounding algorithms. Consequently, we will employ the LP relaxation (1.4) initially proposed by [11], augmented with an additional budget constraint.
For each bin , define
(1.3) |
In other words, is the family of all nonempty feasible assignments of items to , where a feasible assignment is a set such that bin has enough capacity to contain all the items in , and . For instance, in the warm-up example, if refers to line , then because it violates the capacity constraints, whereas .
Then the LP relaxation of (1.1) can be stated as follows:
(1.4a) | ||||
(1.4b) | ||||
(1.4c) | ||||
(1.4d) | ||||
(1.4e) |
When is restricted to , (1.4b) enforces that each bin can be matched with at most one feasible assignment. Constraints (1.4c) and (1.4d) correspond to (1.1d) and (1.1b) respectively. This LP relaxation can be solved in polynomial time using the ellipsoid method as in [11, 17].
Paper Structure: The remainder of this paper is organized as follows: Section 2 presents related work, challenges, and our contributions. Section 3 provides a high-level overview of our approach. Sections 4 and 5 detail the technical foundations, algorithm, and its analysis. Section 6 presents numerical experiments using real-world data. Section 7 concludes the paper and discusses future research directions.
2 Related Work, Challenges and Contributions
2.1 Related Work
The GBAP is a variant of separable assignment problem (SAP). In SAP, there are bins and items. For each item and bin , there is a reward generated when item is assigned to bin . Moreover, for each bin , there is a separate packing constraint, which means that only certain subsets of the items can be assigned to bin when the subset satisfies the constraint. The goal is to find an assignment of items to the bins such that all the sets of items assigned to bins do not violate packing constraints, each item is assigned to at most one bin, and the total reward of assignments is optimal. Fleisher et al. [11] propose a -approximation for SAP assuming that there is a -approximation algorithm for the single bin subproblem. Moreover, they showed that a special case of SAP called the capacitated distributed caching problem (CapDC) cannot be approximated in polynomial time with an approximation factor better than unless NP DTIME. In CapDC, there are cache locations (bins), requests (items) and different files. Bin has capacity , and file has size . Request is associated with some file and a value . If a request is connected to cache location , then there is a cost and the reward is . The total size of the files corresponding to connected requests should be no larger than the capacity of the bin.
The maximum generalized assignment problem (GAP) is an important special case of SAP, in which each bin has size , each item has size in each bin and a feasible assignment means that the total size of assigned items for each bin is not larger than the bin’s capacity. GAP is proved to be APX- hard by Chekuri et al. [8]. The best known approximation algorithm for GAP is from [10] and achieves an approximation factor of for a small constant .
Calinescu et al. [7] also give a -approximation for SAP. Moreover, they give a -approximation algorithm for GAP with the additional restriction of a budget constraint that on the number of bins that can be used. [14] propose a -approximation method for GAP with knapsack constraints, where there is a multi-dimensional budget vector and each item has a multi-dimensional cost vector same for each bin. The total cost vector of assigned items in a feasible assignment is no larger than the budget vector entry-wise.
A generalization of SAP, which is called as -SAP, is studied by Bender et al. [3] (it is called -SAP by [3], where is the number of allowed bins there, but here we make it consistent with our notation to avoid confusion). In -SAP each item can be assigned to at most different bins, but at most once to each bin. The goal is to find an assignment of items to bins that maximizes total reward. They present a -approximation algorithm for -SAP under the assumption that there is a -approximation algorithm for the single bin subproblem. If the single bin subproblem admits an FPTAS, then the approximation factor is . The scheme can also be extended to the different upper bounds of number of allowed bins for different items: if the upper bound is for item , and , then their algorithm yields approximation.
None of the aforementioned studies are applicable to scenarios that include budget constraints on bins.
In Périvier et al.’s study [17], the focus is on the Real-Time Line Planning Problem (RLPP), which can be modeled as GBAP. In this problem setting, each passenger is restricted to selecting a single subroute per line, and there is no cost associated with allocating passengers to these lines. To tackle the RLPP, the authors propose a randomized rounding algorithm, building upon the scheme initially introduced by Fleischer et al. [11]. This algorithm yields a -approximate solution, with the probability of exceeding the budget constraint capped at . Here, , where represents the budget, is the cost associated with line , and is the set of all lines. For a comprehensive discussion and additional related works in the domain of line planning, please see Section 6.
GAP in an online setting has also been studied by Alaei et al. [2]. They propose a -competitive algorithm under the assumption that no item takes up more than fraction of the capacity of any bin. In the setting of [2], items arrive in an online manner; upon arrival, an item can be assigned to a bin or discarded; the objective is to maximize the total reward of the assignment. The size of an item is revealed only after it has been assigned to a bin; the distribution of each item’s reward and size is known in advance. The online algorithm is developed based on a generalization of the magician’s problem in [1], where it is used to assign a set of items with limited supply to a set of buyers in order to maximize the expected value of revenue or welfare.
2.2 Challenges
Our objective is to devise an algorithm that maintains a constant approximation ratio while being practically efficient. Balancing theoretical rigor with practical efficiency poses a significant challenge. Moreover, conventional approaches fail to yield a constant approximation ratio for our specific problem while meeting all practical constraints.
Firstly, while our problem shares similarities with the capacitated facility location problem (CFLP), for which [20] provides a approximation algorithm based on the CFLP’s submodular structure, our problem lacks this submodular property [17, Proposition 4.10]. Consequently, methods aimed at maximizing submodular functions, as seen in works like [15, 13, 9], are not applicable to our context. This necessitates the development of innovative approaches tailored to the unique challenges discussed in this paper.
Secondly, the classical LP-based randomized rounding algorithms, proposed by [11, 17], while simple, can have a non-trivial probability to produce solutions that violate the budget constraint. More specifically, the approach in [17], an adaptation of the method from [11], attempts to mitigate this by scaling down the LP solutions to manage the probability of budget constraint violations. While this method achieves a -approximate solution, there remains a non-negligible probability, bounded by , that the budget constraints may still be breached. This probability fails to guarantee a constant approximation ratio of feasible solutions and can exceed even when is as large as .
To control this probability of violation, one might consider setting to a relatively high value. However, this adjustment would negatively impact the solution quality, a dilemma extensively discussed in Remark 6.1, which provides a detailed analysis and additional insights into this trade-off.
In summary, classical randomized rounding struggles to manage budget constraints and ensure that each item is assigned to at most one bin while still achieving a constant approximation ratio, especially when the algorithm must be computationally efficient to handle large-scale problems.
2.3 Contributions
We introduce a randomized approximation algorithm that is not only practically efficient but also maintains a constant approximation ratio. Our numerical experiments, conducted using real-world data, also outperform the previous work.
Our preliminary major findings were initially presented as an extended abstract at COCOON [12]. It featured randomized approximation algorithms but lacked formal proofs and did not offer a constant approximation ratio. In this extended work, we have refined our algorithm to achieve a constant approximation ratio.
In Section 5, we introduce our randomized approximation algorithm for solving GBAP, which achieves a constant approximation ratio. Specifically, our algorithm achieves an approximation ratio of for , and for , where is defined in (1.2). To the best of our knowledge, this is the first algorithm to achieve a constant approximation ratio for GBAP.
Our analysis leverages two online optimization mechanisms detailed in Section 4. The study of online knapsack problems in Section 4.2 may be of independent interest. Furthermore, to overcome the limitations of classical randomized rounding, which struggles to simultaneously satisfy multiple constraints, we introduce a novel approach. Our method utilizes the realizations from randomized rounding to simulate two distinct online mechanisms. This innovative technique enables us to produce solutions that concurrently satisfy different constraints while maintaining theoretical guarantees. Furthermore, this method offers new insights into algorithm design for related complex optimization problems.
In Section 6, we apply our GBAP approximation algorithms to the RLPP . Using real-world data from New York City, we conduct numerical experiments that demonstrate improvements over previous approaches. These results highlight the practical significance of our algorithms in addressing complex transportation challenges.
3 Overview of Algorithm Design and Analysis
To provide a clear road-map of the technical discussion to follow, we first present an overview of our algorithm design and analysis, before delving into the specifics.
In our algorithm design, we will consider two scenarios with respect to the ratio of the budget to the largest line cost ( as defined in (1.2)): and . For each scenario, we develop an alternative randomized procedure (see Lemma 5.1 and 5.3) that, while complex to implement, is theoretically analyzable. We derive an approximation ratio for each procedure and subsequently show that a simpler algorithm we design (see Algorithm 1) does at least as well (up to a factor), thereby proving an approximation ratio for our simpler algorithm. Our proposed algorithm can be viewed as a simplified method that is derived from these two procedures. A general high level illustration of the techniques is given in the flow chart shown in in Figure 3.

The design of these procedures follows a similar approach. Both procedures begin with randomized rounding based on the LP relaxation of the problem to select at most one for each . This selection is represented by the indicator variables . Specifically, let represent the LP solution. For each , a set is randomly selected from according to the following distribution:
(3.1) |
where the decision variable if and otherwise, for each .
Subsequently, each item can only be assigned to bins where the corresponding selected sets contain , i.e., . Since , where represents the family of all nonempty feasible assignments of items to bin as defined in (1.3), this condition ensures that the capacity constraints are satisfied. We then focus on the other two types of constraints: (a) The budget constraint; (b) The restriction that each item can be assigned at most once. We will make two types of decisions: (a) Whether to utilize bin , represented by the indicator variable ; (b) Whether to assign item to bin (not necessarily utilized), represented by the indicator variable . This assignment occurs only when three conditions are met simultaneously:
-
1.
Bin is assigned a set containing ( and )
-
2.
Bin is utilized ()
-
3.
Item is assigned to bin (not necessarily utilized) ()
In other words, item is assigned to a utilized bin only when for some such that . Importantly, in our algorithm design, is set to be independent of both and . The rationale and implications of this independence will be discussed in detail later in this section.
The expectation of the rewards can then be expressed as:
This equality is due to the linearity of expectation and the linear objective function. It allows us to analyze each term respectively.
The realizations of have been determined through the previously described randomized rounding process. Subsequently, we determine and by simulating two separate online decision-making processes based on these realizations.
To determine for , we simulate an online decision-making process using the realizations of , revealing results as goes from to . This process is adapted from an online knapsack problem decision-making process (see Section 4.2). The process consists of stages. At the th stage, the algorithm decides whether to utilize bin (set ) based on the realizations of for and the remaining budget. For , it ensures that the overall costs of bin utilization are within the budget, i.e. , while for , the costs are approximately within the budget. It is important to note that, in contrast to these two alternative procedures, our proposed algorithm ensures strict adherence to the budget constraint in both cases.
For each item , to determine for , we simulate an online decision-making process using the realizations of . This process iterates over in reverse order, from to , and is based on the magician’s mechanism (see Section 4.1). The process consists of stages. At the th stage222We use instead of to align with the descending order of , which simplifies subsequent index references of this sentence., the algorithm decides whether to assign item to bin (i.e., set ) based on the realizations of and the previous decisions for . It is important to observe that this iteration order for determining is inverse to the one used for determining . This inverse ordering plays a crucial role in ensuring the independence of these decision variables in our analysis.
The magician’s mechanism ensures that (i) , (ii) is independent of , and (iii) . Note that point (i) ensures that item will be assigned to at most one bin. Due to the inverse revealing order for determining , is also independent of . Consequently:
.
Therefore,
(3.2) |
The last term is then comparable to the outcome of the online knapsack problem decision-making process defined in Section 4.2, which helps us derive the desired result for these two alternative procedures. Finally, we compare our algorithm (Algorithm 1) to these two procedures to establish the desired approximation ratio.
Having provided an overview of our approach, we now outline the structure of the two subsequent sections. In Section 4, we introduce two key online mechanisms: the magician’s mechanism and online fractional knapsack problems. These mechanisms are essential for developing the two alternative procedures, and we present their relevant results. Section 5 then details our proposed algorithm, along with the two alternative procedures and their associated theoretical outcomes.
4 Essential Mechanisms for Algorithm Design and Analysis
As discussed in Section 3, our algorithm’s analysis is based on two key online optimization mechanisms. The first is the generalized -conservative magician, originally proposed by Alaei et al. [1]. This mechanism serves a dual purpose: it determines , as outlined in Section 3, and provides a part of the theoretical foundation for proving results in the second mechanism. We present this mechanism here for completeness. The second mechanism, which we will prove the efficacy of subsequently, adapts concepts from online knapsack problems and may be of independent interest to the broader optimization community.
4.1 Generalized Magician’s Problem [1, Section 7]
Definition 4.1.
(the generalized magician’s problem). A magician is presented with a sequence of boxes one by one in an online fashion. The magician has units of mana. The magician can only open box if she has at least unit of mana. If box is opened, the magician loses a random amount of mana drawn from a distribution specified on box by its cumulative distribution function . It is assumed that . The magician wants to open each box with ex ante probability at least , i.e. not conditioned on for , for a constant as large as possible.
The problem can be solved with a near-optimal solution using the following mechanism.
Definition 4.2.
(generalized -conservative magician) The magician makes a decision about each box as follows: Let the random variable denote the amount of mana lost prior to seeing the th box, and denote the ex ante CDF of random variable . Define random binary variable conditional on as follows:
Then given and before seeing the th box, the magician will decide to open the th box if .
Note that with probability , and therefore . Then the CDF of and can be computed based on and in with dynamic programming. For more details, please see [1, Section 7].
One observation is that is independent to for all .
Theorem 4.3.
[1, Theorem 7.3] For any , we have for all , and a generalized -conservative magician with units of mana opens each box with an ex ante probability of exactly , i.e., . If all are Bernoulli random variables, i.e., for all , and is an integer, then for any , we have for all , and a generalized -conservative magician with units of mana opens each box with an ex ante probability of exactly .
Remark 4.4.
Application in determining : For each item , we construct a generalized -conservative magician with one unit of mana. This magician faces boxes, where the mana cost for opening each box follows a Bernoulli distribution. This distribution is determined by the distribution of resulting from the randomized rounding process. The magician’s decisions on whether to open these boxes directly correspond to the determination of . For a detailed description of this process, please refer to Step 3 of the procedures outlined in Lemma 5.1 and Lemma 5.3.
4.2 Online Fractional Knapsack Problems
Definition 4.5 (Online Fractional Knapsack Problem with Pre-arranged Arrival Order).
Consider a sequence of independent rounds. In each round , an object is characterized by a positive reward , a non-negative weight , and a positive appearance probability . Additionally, there is a specified capacity , ensuring that the cumulative weight of the objects accepted does not exceed this capacity. The appearance of each object follows a Bernoulli distribution. The problem is subject to two constraints:
In this setting, we assume full knowledge of the object distribution. At the -th round, we observe the appearance of object before deciding a fraction of object to accept. Acceptance is conditioned on the remaining capacity being at least . Accepting the object results in a reduction of the remaining capacity by and a secured reward of . All decisions, once made, are irreversible.
It is assumed that the sequence of object appearances can be pre-determined before the initiation of the rounds.
The goal is to devise an algorithm that optimizes an efficiency coefficient such that the expected reward, at the end of rounds, is .
Remark 4.6.
The term “object” is used here instead of “item” to provide a distinction from the “item” used in GBAP. Specifically, in the analysis presented in Section 5, the allocation of objects in this online knapsack problem will be compared to the utilization of bins in GBAP.
Lemma 4.7 (Expected Reward for Prearranged Arrival Order).
Consider the problem defined in Definition 4.5. Assume the order of object appearances is prearranged such that
(4.1) |
Furthermore, let the strategy be to accept objects fully in order of appearance until the remaining capacity is insufficient to accept the next appearing object fully. At this point, accept this next appearing object fractionally to exactly exhaust the remaining capacity. Under these conditions, the expected reward generated at the end of rounds is
Proof.
We begin by establishing the ratio of . According to Definition 4.2 and Theorem 4.3, the problem defined in Definition 4.5 admits a solution with this ratio when using the Magician’s mechanism. Note that in this case, the acceptance fraction for all due to the definition of Magician’s mechanism. It is evident that the strategy outlined in Lemma 4.7 outperforms the Magician’s mechanism in terms of efficiency and reward due to the pre-arranged order of objects as in (4.1). Therefore, this concludes the proof for this part.
As for the ratio of , we will conduct the similar comparison, but with the help of the method in Manshadi et al. [16]. In their work, they defined a method such that at the end of rounds, it achieves that . For more details, please see Theorem 2 in [16] when . This means that the method finally yields expected reward at least
Again, the greedy strategy outlined in Lemma 4.7 outperforms this method in terms of reward due to the pre-arranged order of objects as in (4.1). Therefore, this concludes the proof. ∎
Application in determining and deriving approximation ratio: To apply the results of Lemma 4.7 in determining whether to utilize a bin (i.e., determine as mentioned in Section 3), we simulate the online knapsack problem in Algorithm 1. Specifically, after a randomized rounding process to realize , we reindex the bins following the rule in (4.1) of Algorithm 1. We then simulate the online knapsack problem as described in Section 3. This approach serves two purposes: (1) it enables us to determine by simulating the decision-making process for the online knapsack problem, and (2) it allows us to compare the last term in (3.2) to the outcome of the online knapsack problem, enabling us to leverage the bound established in Lemma 4.7 to derive the desired approximation ratio.
5 Approximation Algorithm
In this section, we consider solving GBAP. Let be the ratio of the budget to the maximum bin cost:
For the sake of simplicity and without loss of generality, we can scale the budget and costs to adhere to the following assumption:
Assumption 1.
The given budget is , and for all , , with .
The LP relaxation can be solved in polynomial time by solving its dual with ellipsoid method as in [11, 17].
Algorithm 1.
-
1.
Find a solution to the LP relaxation of (1.4) and let be the solution.
-
2.
Reorder the bins so that
(5.1) Note that here if the denominators are zeros, the corresponding numerators are also zeros, and the corresponding bins can be ignored in the following rounding process. Thus, we can define in (5.1).
-
3.
Independently for each , randomly select a set from following the distribution below:
Let the (random) decision variable be , i.e., if , and otherwise for .
-
4.
Temporarily assign all the elements in to bin for each .
-
5.
For each item , retain it in the bin where it yields the highest reward and remove it from others.
-
6.
Open all the bins that have been assigned with at least one item, and close all the other bins. If this assignment does not exceed the budget, then we take this assignment as the final solution.
-
7.
Otherwise, discard this assignment. We construct a new assignment.
-
8.
Let . Let be from to . At th iteration, let .
-
9.
If , let . Assign to for each and open it. For each item , retain it in the bin where it yields the highest reward and remove it from others.
-
10.
If , let . Assign to for each and temporarily open it. For each item , retain it in the bin where it yields the highest reward and remove it from others. Let . Then we open the bins in either or in , whichever generated more rewards.
Our proof is divided into two cases: and . For both scenarios, we employ a two-step approach. First, we analyze the expected reward generated by a specific procedure of assignment based on two mechanisms: the magician’s mechanism defined in Definition 4.2 and the mechanism adapted from the online fractional knapsack problem discussed in Section 4.2. Second, we compare the rewards generated by Algorithm 1 with those from this constructed procedure.
We begin by addressing the case where . For this scenario, we construct a specific procedure and provide its analysis in the following lemma.
Lemma 5.1.
Assume . With the same notations as in Algorithm 1, we define a procedure as follows:
-
1.
For each , let , which indicates whether any set has been selected for bin in the randomized rounding process. Define if and . Otherwise, set . Here represents the remaining budget after considering the first bins. Thus, indicates that bin is utilized, which occurs when there is sufficient budget () and a set has been selected for this bin ().
-
2.
For every and , introduce a variable , given by
It indicates whether item is included in the selected set for bin , and its probability distribution is
-
3.
For each item , create a generalized -conservative magician with unit of mana (called type-p magician). The magician of item is presented with a sequence of type- boxes from box to box . The distribution of is written on type- box . The amount of mana the magician will lose if open type- box is equal to the value of . The magician will decide whether to open type- box following the mechanism defined in Definition 4.2. Let the (random) decision variable be , i.e., if the magician for decides to open type- box , and otherwise.
If we utilize bin when , and assign item to bin whenever , then the assignment is feasible, which means that:
-
(a)
Each passenger is assigned to at most one utilized bin.
-
(b)
The total cost of all utilized bins is within the given budget .
-
(c)
The capacity constraints are respected.
Moreover, the expected approximation ratio derived from this assignment is at least .
Proof.
We first prove its feasibility. We start with point (a). In Step 3, represents the decision of the type- magician on whether to open the type- box with index . Meanwhile, indicates whether opening the box requires spending 1 unit of mana. Based on the definition of the magician’s mechanism in Definition 4.2 and the results stated in Theorem 4.3, the type- magician will spend at most one unit of mana during this process. Therefore, . Since , so if we assign item to bin when , it can can only be assigned at most once through this process. As for point (b) and (c), they can be checked directly by definition.
The expected reward derived from the procedure in Lemma 5.1 can be expressed as:
(5.2) | ||||
(5.3) | ||||
(5.4) |
Eq. (5.2) results from observing that is independent of , and for each and . This independence arises because the magician’s mechanism ensures that only depends on and for as defined in Step 3 of the procedure, while depends on for and . (5.3) occurs because represents the decision of a type- magician to open the box or not, constrained by having only one unit of mana, leading to according to Theorem 4.3.
As for Eq. (5.4), it is due to depending only on and for , and being independent of for .
(5.5) |
To establish the lower bound of Eq. (5.4), we construct an instance of the online fractional knapsack problem, as in Definition 4.5, consisting of rounds. In each round , an object appears with a probability , possessing a weight and a reward . The initial capacity is equal to the budget . As specified in Step 2 of Algorithm 1, it satisfies that , allowing us to apply Lemma 4.7.
In the constructed instance, if we fully accept each appearing object in sequence until the remaining capacity is inadequate for the full acceptance of the next appearing object, and then accept the next object fractionally to precisely exhaust the remaining capacity, then, according to Lemma 4.7 with , the expected rewards obtained in this online knapsack problem are at least:
If we instead only accept objects fully and stop when the remaining capacity is insufficient for the next appearing object (i.e., we do not accept any object fractionally), the reward equals . Since , the capacity is , and , then we have that
(5.6) |
This completes the proof. ∎
Theorem 5.2.
Proof.
The feasibility can be verified directly. Given the same realization of in Step 3 of Algorithm 1, both Algorithm 1 and the procedure defined in Lemma 5.1 will utilize identical bins.
Moreover, under this realization, item can only be assigned to bins where and . This condition holds true for both Algorithm 1 and the procedure in Lemma 5.1.
Given the realizations of , the optimal reward is achieved by retaining each item in the bin where it yields the highest reward , subject to and . This is precisely what Step 9 of Algorithm 1 accomplishes.
The above proof is constructed by creating a procedure that produces a feasible assignment solution, then comparing Algorithm 1 with it to derive the desired result. For the case when , we will instead produce a procedure that generates an assignment solution that may violate the budget constraint. We will then compare this with Algorithm 1 to derive the desired result. Although the approaches differ in terms of solution feasibility of the constructed procedures, the key ideas remain similar. Both rely on the magician’s mechanism and the results from the online fractional knapsack problem presented in Section 4.
Lemma 5.3.
Assume . With the same notations as in Algorithm 1, we define a procedure as follows:
-
1.
For each , let . Set if and . Otherwise, set . This ensures that bin is utilized () only when the remaining budget for is non-negative () and a set has been selected for this bin ().
-
2.
For every and , introduce a random variable , given by
Its probability distribution is
-
3.
For each item , create a generalized -conservative magician with unit of mana (called type-p magician). The magician of item is presented with a sequence of type- boxes from box to box . The distribution of is written on type- box . The amount of mana the magician will lose if open type- box is equal to the value of . The magician will decide whether to open type- box following the mechanism defined in Definition 4.2. Let the (random) decision variable be , i.e., if the magician for decides to open type- box , and otherwise.
If we utilize bin when , and assign item to bin whenever for some and , then the assignment satisfies:
-
(a)
Each passenger is assigned to at most one utilized bin.
-
(b)
Let the total cost of all utilized bins be and define . Then we have . In other words, if we remove the utilized bin with the largest index, the total cost of the remaining bins is within the given budget .
-
(c)
The capacity constraints are respected.
Moreover, the expected approximation ratio derived from this assignment is at least .
Proof.
Theorem 5.4.
Proof.
The feasibility can be verified directly. Given the same realization of in Step 3 of Algorithm 1, the algorithm will temporarily open the same bins as those utilized by the procedure defined in Lemma 5.3.
Moreover, under this realization, item can only be assigned to bins where and . This condition holds true for both Algorithm 1 and the procedure in Lemma 5.3.
Given the realizations of , the optimal reward for these temporarily opened bins is achieved by retaining each item in the bin where it yields the highest reward , subject to and . This is precisely what Step 10 of Algorithm 1 does for those temporarily opened bins. Consequently, the total reward from these temporarily opened bins is at least as large as that derived from the procedure in Lemma 5.3.
The key difference from Theorem 5.1 is that the temporarily opened bins might violate the budget constraint. To address this, we compare the reward of the temporarily opened bin with the largest index ( in Step 10 of Algorithm 1) to the reward of all other temporarily opened bins ( in Step 10 of Algorithm 1). Whichever is larger will retain at least half of the total reward.
6 Applications in transportation
In this section, we introduce an application of GBAP in a multi-modal Mobility-on-Demand (MoD) setting, illustrating how the framework can be adapted to address practical transportation problems. Consider a MoD operator managing a fleet of single-capacity cars and high-capacity buses. The operator aims to provide a multi-modal service with buses operating on fixed routes and cars functioning in a demand-responsive manner. An individual trip request can be served by a bus, a car, or a combination of both.
This scenario could be envisioned as a transit agency planning a multi-modal service where traditional buses are complemented by a demand-responsive service, or a private rideshare operator integrating fixed-route shuttles with taxis. This problem was investigated by [17]. There is a wide range of literature studying the traditional line-planning problem. For instance, [6] introduced a new multicommodity flow model for line planning. Their model aims to strike a balance between minimizing operating costs for transport companies and reducing travel times for passengers. On the other hand, [4] provides a holistic approach for transit line planning using column generation. For additional literature on line planning problems in public transportation, readers are directed to [18].
By applying GBAP to this MoD scenario, we can optimize resource allocation and route planning, maximizing the efficiency and service quality of the transportation system while adhering to budget constraints. This demonstrates the versatility and practical value of GBAP in addressing complex, real-world transportation challenges.
In this problem, the transit network is modeled as an undirected weighted graph , where contains nodes as potential origin/destination nodes, and is the set of edges representing the shortest paths between the nodes. The weights of the edges are the cost required to cross an edge . A set of candidate bus routes/lines on is given to the operator. Each candidate route / line is defined as a fixed sequence of consecutive edges of , and it has an operating cost . There is a budget for operating bus lines. Each bus has capacity and each line has frequency . Thus, the overall capacity of a bus line is . The cars cover ’first-last mile’ travel. It is assumed that there are no inter-bus transfers for each passenger. If the passenger is assigned to line , then is defined as the welfare value contributed to the system by this matching. The operator is responsible for selecting a set of bus lines from to operate in a way that maximizes the overall welfare of the system. The value of is calculated according to the rules given, e.g. binary based on whether the matching is feasible or the number of car miles saved due to the matching (for more details, please see [17]). The goal is to find a subset of lines to be operated and an assignment of passengers to opened lines to maximize the welfare of the system, such that: (a) the cost of opened lines does not exceed the budget; (b) the assignment of passengers and their trips to the opened line respect the capacity constraints of each edge on line ; (c) a passenger is assigned to at most one line.
Now we can describe how this problem fits into the proposed framework. For each line with edges, we map it to a bin in GBAP with an -dimensional capacity vector. Each entry in the capacity vector of line is set to . Each passenger is taken as an item in GBAP with a weight vector . If the th edge of line is used when passenger is matched to line , then we set the th entry of to , and otherwise. Note that has consecutive s according to the definition of a route in RLPP. The cost of utilizing bin is set to be while the cost of assigning item to bin is assumed to be . Contributed welfare values are the coefficients in the objective of (1.1). Since each passenger can only be matched to at most one line, we let for each . The RLPP is an instance of GBAP with an additional aggregation step that merges any subset of selected lines with the same route (and different frequencies) to a single line with the corresponding aggregated frequency. Since the aggregation step neither increases the cost of the solution nor decreases the objective values as shown in [17], and can be trivially incorporated into our framework, we omit it here for simplicity.
Périvier et al. [17] present an approximation algorithm (see Algorithm 2 in Appendix B) to solve the problem based on the randomized rounding scheme proposed by [11]. The corresponding approximation ratio is , with a probability of violating the budget constraint bounded by , where as we have assumed before.
Remark 6.1.
We note that the violation probability bound, given by , may surpass , even when is as large as . To address this, one may consider setting to , with . However, this implies that the method’s approximation ratio becomes with violation probability bound . We present Table 1 to illustrate this by taking , which are reasonable and relatively large parameters in real-world cases. Note that by definition, the number of the opened lines in the optimal solution can be significantly larger than . We calculate the approximation ratio as since it is the original one derived by [17] and is higher than . Furthermore, note that the rounding process tends to produce infeasible solutions with higher objective values than feasible ones, suggesting that the actual expected approximation ratio for feasible solutions might be significantly lower than , especially when the violation probability is non-negligible.
25 | 1/3 | 0.3420 | 0.4159 | 0.3773 |
---|---|---|---|---|
25 | 1/4 | 0.4472 | 0.3494 | 0.1889 |
50 | 1/3 | 0.2714 | 0.4605 | 0.2929 |
50 | 1/4 | 0.3761 | 0.3944 | 0.0947 |
100 | 1/3 | 0.2154 | 0.4959 | 0.2128 |
100 | 1/4 | 0.3162 | 0.4322 | 0.0357 |
We note that when Algorithm 1, and Algorithm 2 (in Appendix B) utilize the same LP solution for their rounding process, Algorithm 1 are provably superior to Algorithm 2. The reason for this is that the feasible solutions produced by Algorithm 2 can also be generated by Algorithm 1 with the same probability distributions, while each infeasible solution generated by Algorithm 2 is converted into a feasible one by Algorithm 1. However, since the approximation ratio derived for Algorithm 2 encompasses the objective values of infeasible solutions, it is not applicable to Algorithm 1.
6.1 Numerical Experiments.
In this section, we conduct numerical experiments in the context of RLPP based on the data provided by [17] and Algorithm 1. We also compare its performance with Algorithm 2 with . The underlying road network is derived from OpenStreetMap (OSM) geographical data from [5]. The size of the candidate set of lines is set to be , and the lines are generated based on the method proposed by [19]. The passenger data comes from records of for-hire vehicle trips in Manhattan using the New York City Open Data operator, with a time window between 5 pm and 6 pm on the first Tuesday of April 2018. There are trip requests. The bus capacity is set to be . For practical purposes, we solve the corresponding LP via column generation and apply a timeout—the current LP solution will be returned once the time limit is exceeded.
We implement Algorithm 1, Algorithm 2, and the modified Algorithm 1 with rounding based on the same LP solution used by Algorithm 2. The experiments consider four different budgets, and we simulate the rounding times. In Figure 4, the -axis represents the number of simulations, and the -axis indicates the objective value of the best solution found by the algorithms so far. We omit the first realizations in Figure 4 to make the display of the -axis more readable. The results333In the dataset we employed, the value of substantially exceeds . In this case, Algorithm 1 coincides with Algorithm 2 presented in our conference paper [12], yielding same numerical outcomes. show that when the budget is and respectively, our algorithms find significantly better solutions, while the three methods perform similarly when the budget is and respectively.

7 Conclusion and Future Directions
In this paper, we have introduced a novel approach to the Generalized Budgeted Assignment Problem (GBAP) and demonstrated its application to the Real-Time Line Planning Problem (RLPP) in transportation systems. To the best of our knowledge, this is the first algorithm to achieve a constant approximation ratio for GBAP. Our approach not only enhances the theoretical understanding of GBAP but also provides practical solutions for complex transportation planning challenges.
Our contributions are threefold. First, we have developed a randomized approximation algorithm for GBAP with a proven constant approximation ratio. Second, we have introduced an innovative technique that employs randomized rounding realizations to simulate two distinct online mechanisms. This approach enables the simultaneous satisfaction of different constraints while maintaining a strong theoretical guarantee on algorithm performance. Third, we have applied our algorithm to the RLPP and demonstrated its effectiveness through numerical experiments using real-world data from New York City’s transportation system. This application underscores the algorithm’s potential for large-scale urban transit planning and bridges the gap between theoretical advancements and practical implementations in transportation systems.
While our results represent a significant step forward, several avenues for future research remain open.
As stated in out extended abstract at COCOON [12, Theorem 6], we have established that the approximation ratio cannot exceed by reducing the problem to the max -cover problem.444We have chosen to omit the detailed statement of this result to maintain focus on our primary contributions. An avenue for future research is to explore whether the upper bound can be tightened to a constant strictly less than .
The second potential direction for future research is to explore possible improvements to our current approximation ratio. While our algorithm (Algorithm 1) enhances practical efficiency and performance compared to the two alternative procedures presented in Lemma 5.1 and Lemma 5.3, its approximation ratio guarantee remains the same as these procedures since it essentially builds upon their bounds. It remains unclear whether there is a significant gap in the theoretical guarantees between our algorithm and these two alternative procedures. Therefore, refining the analysis to enhance the theoretical bound of Algorithm 1 may be worthwhile. Another avenue could be the development of new algorithms with a better approximation ratio guarantee. However, it is important to note that our primary goal has been to develop an algorithm that is both theoretically sound and practically efficient. This dual requirement presents significant challenges in substantially improving the approximation ratio without compromising computational efficiency. Any future improvements will need to carefully balance these competing objectives, making this a complex but potentially rewarding area for further investigation.
It is also interesting for future research to explore scenarios with stochastically appearing items, each following a known binary distribution. In this context, the objective would be to maximize expected rewards by strategically utilizing bins within the given budget constraint. This extension could enhance the model’s applicability to real-world situations where demand or item availability is uncertain but follows predictable patterns.
References
- Alaei [2014] Saeed Alaei. Bayesian combinatorial auctions: Expanding single buyer mechanisms to many buyers. SIAM Journal on Computing, 43(2):930–972, 2014.
- Alaei et al. [2013] Saeed Alaei, MohammadTaghi Hajiaghayi, and Vahid Liaghat. The online stochastic generalized assignment problem. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, pages 11–25. Springer, 2013.
- Bender et al. [2015] Marco Bender, Clemens Thielen, and Stephan Westphal. Packing items into several bins facilitates approximating the separable assignment problem. Information Processing Letters, 115(6-8):570–575, 2015.
- Bertsimas et al. [2021] Dimitris Bertsimas, Yee Sian Ng, and Julia Yan. Data-driven transit network design at scale. Operations Research, 69(4):1118–1133, 2021.
- Boeing [2017] Geoff Boeing. Osmnx: A python package to work with graph-theoretic openstreetmap street networks. Journal of Open Source Software, 2(12), 2017.
- Borndörfer et al. [2007] Ralf Borndörfer, Martin Grötschel, and Marc E Pfetsch. A column-generation approach to line planning in public transport. Transportation Science, 41(1):123–132, 2007.
- Calinescu et al. [2011] Gruia Calinescu, Chandra Chekuri, Martin Pal, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM Journal on Computing, 40(6):1740–1766, 2011.
- Chekuri and Khanna [2005] Chandra Chekuri and Sanjeev Khanna. A polynomial time approximation scheme for the multiple knapsack problem. SIAM Journal on Computing, 35(3):713–728, 2005.
- Chekuri et al. [2012] Chandra Chekuri, Nitish Korula, and Martin Pál. Improved algorithms for orienteering and related problems. ACM Transactions on Algorithms (TALG), 8(3):1–27, 2012.
- Feige and Vondrak [2006] Uriel Feige and Jan Vondrak. Approximation algorithms for allocation problems: Improving the factor of 1-1/e. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pages 667–676. IEEE, 2006.
- Fleischer et al. [2011] Lisa Fleischer, Michel X Goemans, Vahab S Mirrokni, and Maxim Sviridenko. Tight approximation algorithms for maximum separable assignment problems. Mathematics of Operations Research, 36(3):416–431, 2011.
- Jiang and Samaranayake [2022] Hongyi Jiang and Samitha Samaranayake. Approximation algorithms for capacitated assignment with budget constraints and applications in transportation systems. In International Computing and Combinatorics Conference, pages 94–105. Springer, 2022.
- Kulik [2011] Ariel Kulik. Submodular and linear maximization with knapsack constraints. Master’s thesis, Computer Science Department, Technion, 2011.
- Kulik et al. [2013] Ariel Kulik, Hadas Shachnai, and Tami Tamir. Approximations for monotone and nonmonotone submodular maximization with knapsack constraints. Mathematics of Operations Research, 38(4):729–739, 2013.
- Lee et al. [2009] Jon Lee, Vahab S Mirrokni, Viswanath Nagarajan, and Maxim Sviridenko. Non-monotone submodular maximization under matroid and knapsack constraints. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 323–332, 2009.
- Manshadi et al. [2021] Vahideh Manshadi, Rad Niazadeh, and Scott Rodilitz. Fair dynamic rationing. In Proceedings of the 22nd ACM Conference on Economics and Computation, pages 694–695, 2021.
- Périvier et al. [2021] Noémie Périvier, Chamsi Hssaine, Samitha Samaranayake, and Siddhartha Banerjee. Real-time approximate routing for smart transit systems. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 5(2):1–30, 2021.
- Schöbel [2012] Anita Schöbel. Line planning in public transportation: models and methods. OR spectrum, 34(3):491–510, 2012.
- Silman et al. [1974] Lionel Adrian Silman, Zeev Barzily, and Ury Passy. Planning the route system for urban buses. Computers & operations research, 1(2):201–211, 1974.
- Wolsey [1982] Laurence A Wolsey. Maximising real-valued submodular functions: Primal and dual heuristics for location problems. Mathematics of Operations Research, 7(3):410–425, 1982.
Appendix A Proof of Lemma 5.3
Proof of Lemma 5.3.
Point (a) can be proven using the exact same way as in Lemma 5.3. Regarding point (b) and (c), they can be directly verified from its definition.
Following the same reasoning as in (5.4), the expected reward derived from the procedure in Lemma 5.3 can be expressed as:
(A.1) |
(A.2) |
To establish the lower bound of Eq. (A.1), we again construct an instance of the online fractional knapsack problem, as in Definition 4.5, consisting of rounds. In each round , an object appears with a probability , possessing a weight and a reward . The initial capacity is equal to the budget . As specified in Step 2 of Algorithm 1, it satisfies that , allowing us to apply Lemma 4.7.
Following the same reasoning as in Lemma 5.3, expected rewards of this online knapsack problem instance are at least:
Unlike Lemma 5.1, the definition of provided in Lemma 5.3 indicates that the reward cannot exceed . Consequently, we establish:
(A.3) |
This completes the proof. ∎
Appendix B Method from [17]
Here we restate the method from [17] while omitting the aggregation step.
Algorithm 2.
-
1.
Let be the solution to the LP relaxation of (1.4).
-
2.
Independently for each , randomly select a set from following the distribution below:
Let the (random) decision variable be , i.e., if , and otherwise for .
-
3.
Temporarily assign all the elements in to bin for each .
-
4.
For , we remove each item that is assigned to more than bins from all but the bins with highest rewards it is assigned to.
-
5.
Open all the lines that have been assigned with at least one passenger, and close all the other lines. If this assignment does not exceed the budget, then we take this assignment as the final solution.