
Data-driven Optimization for Drone Delivery Service Planning
with Online Demand

Aditya Paul aditya.paul@unsw.edu.au Michael W. Levin mlevin@umn.edu S. Travis Waller s.travis.waller@gmail.com David Rey david.rey@skema.edu
Abstract

In this study, we develop an innovative data-driven optimization approach to solve the drone delivery service planning problem with online demand. Drone-based logistics are expected to improve operations by enhancing flexibility and reducing congestion effects induced by last-mile deliveries. With rising digitalization and urbanization, however, logistics service providers are constantly grappling with the challenge of uncertain real-time demand. This study investigates the problem of planning drone delivery service through an urban air traffic network to fulfil online and stochastic demand. Customer requests – if accepted – generate profit and are serviced by individual drone flights as per request origins, destinations and time windows. We cast this stochastic optimization problem as a Markov decision process. We present a novel data-driven optimization approach which generates predictive prescriptions of parameters of a surrogate optimization formulation. Our solution method consists of synthesizing training data via lookahead simulations to train a supervised machine learning model for predicting relative link priority based on the state of the network. This knowledge is then leveraged to selectively create weighted reserve capacity in the network via a surrogate objective function that controls the trade-off between reserve capacity and immediate profit, with the goal of maximizing the cumulative profit earned. Using numerical experiments based on benchmark transportation networks, the resulting data-driven optimization policy is shown to outperform a myopic policy. Sensitivity analyses on learning parameters reveal insights into the design of efficient policies for drone delivery service planning with online demand.

keywords:
Stochastic processes, Data-driven optimization, Online demand, Drone delivery, Service planning
Affiliations:

[1] School of Civil and Environmental Engineering, UNSW Sydney, NSW 2052, Sydney, Australia

[2] Lighthouse Professorship “Transport Modelling and Simulation”, Faculty of Transport and Traffic Sciences, Technische Universität Dresden, Germany

[3] Department of Civil, Environmental, and Geo-Engineering, University of Minnesota, Minneapolis 55455, USA

[4] SKEMA Business School, Université Côte d’Azur, Sophia Antipolis, France

1 Introduction

Unmanned Aerial Vehicles (UAVs) have been growing in both prevalence and application. UAVs provide the advantage of performing a task aerially with limited or no human supervision. Additionally, most UAV designs are light-weight and can be deployed in large numbers. These features cause UAVs to be in high demand in many different domains, particularly logistics. UAV applications broadly fall into two categories: either some form of data acquisition and transmission (also called monitoring or reconnaissance), or payload pickup and delivery. This study falls into the second category and addresses the drone delivery service planning problem (DDSP) under stochastic online demand. The DDSP takes the perspective of a transportation network manager that plans service for drone deliveries. We consider discrete time steps grouped into time intervals. At each time step, customers may submit delivery requests that are to be served by a fleet of UAVs subject to time windows and link capacity constraints. The probability distributions behind request parameters, and the request arrival process, are unknown to the network manager and estimated from historical data. Decisions are made at every time interval to accept or reject the requests that arrive and to route accepted requests. We explore the potential of data-driven optimization methods for this online DDSP problem.

In stochastic optimization problems, some or all parameters are subject to uncertainty. These parameters are not deterministic, but instead emerge from probability distributions or uncertainty sets. Traditional solution methods to stochastic problems exploit a priori knowledge of these probability distributions. Stochastic problems can further be sub-divided into two categories: (a) static, where all problem parameters are available beforehand, and some problem-specific condition determines if and when the stochastic parameters are instantiated; (b) dynamic, where problem parameter data is revealed over time. In contrast, online optimization makes no assumption about the nature of the uncertain data and focuses on the immediacy and real-time nature of incoming problem parameters. Online problems are always dynamic in nature. Nevertheless, in many cases, it is reasonable to presume that the probability distribution, or an estimate thereof, can be extracted either through historical data or predictive models (Van Hentenryck et al., 2010). From a computational standpoint, stochastic and online optimization problems lead to significant challenges. The rise of e-commerce has compelled logistic service providers to tackle online demand. In this study, we consider the setting of the static DDSP in the context of online demand, i.e. user requests that are revealed over time.

Our solution methodology is based on an innovative data-driven optimization and lies at the intersection of predictive analytics (machine learning) and prescriptive analytics (stochastic optimization). The literature on the combination of machine learning (ML) and stochastic optimization reveals two distinct avenues for a successful collaboration:

  1. ML-enabled optimization: Combinatorial optimization problems can be very hard to solve, and special techniques or methods can be used to solve a particular kind of problem or problem instance. Instead of relying on experts to create such techniques, algorithms and models can be developed to automatically learn from the implicit distribution of problem instances and adapt to those instances. ML has been found to be useful here.

  2. Prediction and optimization: Many real-world practical applications involve uncertain parameters which directly or indirectly affect the objective function, but which must first be predicted before optimization. In such predict-then-optimize situations, ML is being recognized as a vital tool to make accurate data-driven predictions.

Research into transportation service planning under online demand, with a dedicated integration of ML and mathematical programming (MP), is still in its nascent stages. In this study, we incorporate strategies from both these avenues to develop a customized data-driven optimization approach for the DDSP with online demand. This study makes the following contributions to the field.

  1. We extend the integer linear programming (ILP) formulation of the deterministic DDSP to incorporate online demand, i.e. delivery requests are revealed over time and routing decisions must be taken to accommodate accepted requests, and cast this problem as a Markov Decision Process (MDP).

  2. We propose a novel solution method for the online DDSP where predictive prescriptions are computed over time. Historical demand data, including spatial and temporal distributions as well as the request arrival process, are used to determine link-based features that inform a surrogate optimization formulation.

  3. We demonstrate through numerical experiments that our data-driven optimization approach outperforms a myopic optimization approach. Sensitivity analyses reveal how learning parameters affect optimal policies for the online DDSP.

Since we base our solution method on link-based features, our approach is applicable to other network-based logistics systems. The rest of the paper is organized as follows. In Section 2, we review the relevant literature. In Section 3, we present the online DDSP formulation and outline the MDP framework. Section 4 develops the solution methodology and the proposed algorithms. Section 5 reports the numerical experiments and their results. Our findings and research perspectives are discussed in Section 6.

2 Literature review

We examine the state of the art relevant to the online DDSP. Section 2.1 reviews recent advances in data-driven optimization, notably the interplay between ML and optimization. Section 2.2 records the advent of stochastic and online demand as well as the latest solution approaches arising in urban logistics. Finally, Section 2.3 outlines the widespread adoption of UAVs in commercial and industrial sectors.

2.1 Data-driven optimization

Machine learning (ML) has proved useful for optimization purposes along two lines: to assist directly within the optimization process by exploiting particular problem characteristics, or as a data-driven predictive framework to estimate uncertain parameters that are involved in the optimization formulation. We review these two streams of literature hereafter.

2.1.1 Machine learning-enabled optimization

It is common in optimization that a solution algorithm is tuned to a specific problem scenario by refining its parameters or policy to that scenario. Such tuning (or adaptation) need not come from research expertise and intuition. ML can assist an optimization algorithm to this end (Bengio et al., 2021). This direct assistance provided by ML can be analysed based on its motivation or on its structure. In terms of motivation, ML can either: (a) replace some heavy computation with a fast, reliable approximation or (b) explore the decision space (defined by the optimization model) and discover better policies. As for structure, the collaboration between ML and optimization can be sequential, whereby the ML model supports the optimization algorithm with valuable information. Or it can be parallel, wherein the ML and MP components work alongside each other. If one of the components works within the framework of the other, then the solutions of the nested component help the external one make better decisions.

Kruber et al. (2017) provide an illustration of the sequential implementation. They use supervised ML to decide if a given Dantzig-Wolfe decomposition will be effective in solving instances of a mixed-integer linear programming (MILP) problem. Bonami et al. (2018) address a similar question. They employ supervised ML to determine if linearizing a given instance of a mixed integer quadratic programming problem will reduce solve time. An example of parallel implementation is found in Lodi and Zarpellon (2017). The ML model works within a branch-and-bound framework, and is used to make decisions about branching. Specifically, the goal of the ML component is to learn an efficient policy for selecting the branching variable and branching direction. Yilmaz and Büyüktahtakın (2024) address sequential combinatorial optimization problems in industrial contexts, where problem instances arise with slight parameter variations. A neural network generates a relaxed problem instance by predicting binding constraints, then uses this relaxed instance to eliminate infeasible predictions of decision variables through rapid feasibility checks. This reduces solution time by up to three orders of magnitude, while maintaining an optimality gap below 0.1%. Larsen et al. (2022) introduce a supervised ML model to quickly predict expected tactical descriptions of operational solutions in stochastic optimization. This addresses a common challenge in two-stage stochastic programming, where the second stage is computationally demanding. Their approach aids in solving the overall two-stage problem by circumventing the need for online generation of multiple second stage scenarios and solutions. Another implementation possibility is to leverage an MP formulation to aid an ML algorithm in the solution process. Tran-Thanh et al. (2012) employ an unbounded knapsack formulation to improve the performance of the $\epsilon$-greedy algorithm for multi-armed bandits, where the knapsack solution governs the probability of pulling a given arm of the bandit. A comprehensive overview of ML-enabled optimization methods is reported in Bengio et al. (2021).

2.1.2 Predict-then-optimize

Many decision-making problems deal with both prediction and optimization (or decision). These two tasks are complex on their own and are hence often treated separately and sequentially. The prediction model predicts the unknown parameters of the optimization model, which the latter uses to perform the optimization. ML has been a popular candidate for the prediction task in recent times. But the conventional approach is to train the ML prediction model to minimize the prediction error. This does not consider the nature and attributes of the successive optimization problem and may result in sub-optimal decisions (Elmachtoub and Grigas, 2022). As a brief but standard example, consider two possible paths between an origin O and a destination D. Travel times for both paths are unknown to the decision-maker, but may be predicted with contextual data (weather, congestion, etc.). Suppose the actual travel times are 10 and 20 minutes, respectively. ML model 1 predicts travel times on the two paths to be 15 and 13 minutes, respectively, while ML model 2 predicts them to be 30 and 40 minutes. Although model 1 has better accuracy, it leads to a worse decision.
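The gap between prediction accuracy and decision quality can be made concrete in a few lines of code. The Python sketch below reproduces the two-path example above; the error metric and model names are illustrative only.

# Two-path example: lower prediction error does not imply a better routing decision.
actual = [10, 20]        # true travel times (minutes) on paths 1 and 2
model_1 = [15, 13]       # smaller errors, but reverses the ranking of the paths
model_2 = [30, 40]       # larger errors, but preserves the ranking

def mean_abs_error(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)

def realized_travel_time(pred, true):
    chosen = min(range(len(pred)), key=lambda k: pred[k])  # pick the path predicted fastest
    return true[chosen]                                     # travel time actually incurred

for name, pred in [("model 1", model_1), ("model 2", model_2)]:
    print(name, mean_abs_error(pred, actual), realized_travel_time(pred, actual))
# model 1: MAE 6.0,  realized travel time 20  -> more accurate, worse decision
# model 2: MAE 20.0, realized travel time 10  -> less accurate, better decision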

Two frameworks have been proposed which integrate prediction with optimization by taking the downstream optimization model into account (Yan and Wang, 2022). The first framework is called smart “predict, then optimize” (SPO), and it evaluates the performance of the ML prediction model based on the final decision error induced by the prediction made, not on the initial prediction error (Elmachtoub and Grigas, 2022). The authors propose a design which applies to the optimization of linear objective functions over a convex feasible region. They introduce the SPO loss function which captures the decision error of a prediction by utilizing the objective function of the downstream optimization model. Since training with the SPO loss function can be cumbersome, a more manageable surrogate loss function called SPO+ loss is also introduced, which is equivalent to the original loss under specific conditions. The second framework, predictive prescriptions, aims to harness the joint distributions between decision variables and auxiliary data, both of which are uncertain (Bertsimas and Kallus, 2020). The principle of predictive prescriptions is to simulate various scenarios in which the values of the unknown parameters are estimated by the ML prediction model. These estimations lead to a cost in the optimization process, and the costs are weighted by the data-driven exploration of the scenarios. The optimal decision policy then minimizes the weighted sum of costs thus generated. Both the SPO and predictive prescriptions frameworks have been increasingly adopted by the scientific community, and a survey of contextual optimization methods for decision-making under uncertainty is available in Sadana et al. (2023).

2.2 Stochastic and online demand

Online demand, sometimes referred to as on-demand logistics, deals with that aspect of e-commerce which provides an affordable delivery service to satisfy the customer’s place and time requirements at short notice. It concerns the sale of immediately deliverable products, sometimes at the cost of short-term losses incurred by the e-commerce business or logistic service provider. With the consolidation of the requisite digital mechanisms (internet ubiquity, personal computers, mobile phones, online payment apps) and supply chain infrastructure (warehouses, highways, enterprise resource planning, GPS), e-commerce has experienced a consistent surge in online demand in the 21st century. E-commerce is now experiencing over 10% growth per annum worldwide, and B2C e-commerce deliveries rose by nearly 25% in 2020 alone (Lozzi et al., 2022). In addition, the disruption caused by COVID-19 has caused an acceleration in internet usage and a lowering of traditional offline purchases, both factors contributing further to e-commerce home deliveries (Han et al., 2022).

Even in the 2000s the research literature observed the benefits of transitioning from brick-and-mortar retail to online e-commerce. Particularly, engaging in both physical offline sales and e-commerce – what is called multi-channel retailing, or click-and-mortar – was identified as a crucial paradigm shift. In the USA, more than 50% of multi-channel retailers reported positive operating margins as early as 2001 (Kim and Park, 2005). Investing in e-commerce gave companies access to new markets at relatively low capital costs. Those customers who shopped both online and offline exhibited greater loyalty to the business. The stable customer base and retailer trust acquired from years of brick-and-mortar operation could be extended to the online domain with e-commerce expansions. Engaging with online demand also gives businesses low-cost direct contact with their customers (Fruhling and Digman, 2000). This not only helps build customer relationships faster, but also enables the construction of customer profiles, the analysis of customer interests and purchasing behaviour, and comparison of similar profiles to make accurate and relevant recommendations. The comprehensive and coherent advantages of online e-commerce have even led to the evolution of omni-channel retailing, which is the modification of multi-channel retailing such that customers can experience and exploit a seamless array of multiple purchase possibilities (Verhoef et al., 2015).

We next discuss the impact of online demand on vehicle routing, both in terms of emerging obstacles and recent solution approaches. We then describe solution strategies which incorporate ML, in line with the methodologies outlined in Section 2.1.

2.2.1 Online demand in vehicle routing

The benefits of e-commerce come with the challenges of online demand. By its dynamic and uncertain nature, online demand is difficult to fulfill. Failed first deliveries can account for up to 60% of all orders for a service provider (Song et al., 2009), the primary reason being the absence of the customer during package handover. This is partially due to modern lifestyle changes such as increase in single person households and flexible (but demanding) work patterns. But a major contributing factor also is that no delivery time slot is explicitly agreed upon between the customer and the e-commerce retailer. This leads to several detrimental effects. If packages are left unattended then lack of security is a concern. Repeated deliveries or customers personally travelling to collection points result in economic losses as well as excess carbon emissions (Edwards et al., 2010). A high rate of unsatisfied or partially satisfied demand also strains driver-customer relationships (Masorgo et al., 2023). However, if a delivery time window is agreed upon through some form of online demand management, then the success of last mile deliveries is shown to increase (Van Duin et al., 2016).

Many studies dealing with online demand employ some form of heuristic approach to modify routes as and when new demand requests emerge. Hong et al. (2019) consider a shipping company that faces online demand, but delivers products to pre-determined collection centres that the customer must visit on their own. They implement an ant colony based heuristic which changes the previously decided collection centre for a particular customer in light of incoming demand requests. Mobility services also fall under stochastic and online demand, where the commodity sold is the route itself. Bertsimas et al. (2019) analyse online taxi ride-sharing with time windows for pickups. They employ a cost metric to measure waiting or empty driving time between customers. The network is pruned by keeping, for each customer, the top $k$ subsequent customers with the lowest costs. A maxflow heuristic is applied by reducing pickup time windows to a single randomly selected time instant. The resulting route arcs are retained, and the process is iteratively repeated until a specified number of arcs is reached. Finally, a mixed-integer formulation is applied to the modified network for obtaining results. Consolidation of delivery requests can also save costs by revealing efficient routes. Muñoz-Villamizar et al. (2024) explore stochastic demand where customer requests arrive daily. The city of operation is divided into regions, each with an expected customer density, estimated average velocity and travel distance per order. A region-specific postponement cost function is proposed for quantifying the delay of a request in a given region to a later date. A tabu-search metaheuristic modifies vehicle routes by identifying better (lower cost) routes in their local neighbourhood. The neighbourhood is traversed by applying insertion, swap and inversion operators that transform the customer sequence of a vehicle route.

2.2.2 ML-based solution methods

Using ML to assist optimization was given much attention in the transport literature. The same-day delivery problem is a rich base for such joint implementations as mentioned in Section 2.1.1. Chen et al. (2023) study same-day delivery under fairness, dividing the service area into regions. Service availability is defined as the ratio of expected accepted requests per day to the expected overall number of requests. The objective is to maximize both the service availability in the entire area and the minimum service availability across all regions. The learning agent receives rewards for each fulfilled request, guiding it to improve its policy by considering changes in both components of the objective function. Feng et al. (2022) apply reinforcement learning to the real-time ride-sourcing problem with incorporation of public transport services. For every customer ride request, feasible routes are heuristically generated. For every driver capable of responding to the request, the learning agent estimates the perceived value of the route-driver pairs. Finally an integer linear program is solved to determine the optimal allocation of drivers to routes. If electric vehicles are involved, then charge depletion adds an additional factor of uncertainty. Basso et al. (2022) study the electric vehicle routing problem where both arrival of customer requests and energy consumption per unit road length are stochastic. Charging locations are deterministic. At each state of the MDP formulation, the reinforcement learning agent estimates the energy cost of moving to possible subsequent states and, in doing so, the probability of battery charge falling below a safety threshold.

The predict-then-optimize frameworks discussed in Section 2.1.2 have also been applied in stochastic transport optimization. Chu et al. (2021) implement the SPO method to solve the last-mile problem for an online food delivery service. They consider driver behaviour, traffic conditions and route lengths as input features to predict travel times. Simulated annealing is used to generate feasible routes, which are further updated with customer exchange operators to derive new routes. Baty et al. (2024) design a machine learning pipeline which relies on a downstream optimization component to improve predictions. The authors study the dynamic vehicle routing problem with time windows (VRPTW), where all requests must be fulfilled and fleet size is unlimited. They demonstrate that solving an equivalent prize-collecting VRPTW, where each unfulfilled customer is assigned a prize or profit, addresses this version of the VRP. An ML model determines the optimal allocation of prizes, yielding a near-optimal solution for the original problem. A hybrid genetic search algorithm solves the prize-collecting VRPTW based on input from the ML component. The ML model refines itself through a loss function, which measures the disparity between the target and acquired objective function values. The target objective value is derived by solving a static VRPTW from historical data, utilizing the same hybrid genetic search algorithm.

2.3 UAV for urban logistics

The increasing adoption of UAVs is due to their autonomy and efficiency. They are also versatile and affordable in relatively large numbers, and these attributes are improving with time. Recent advancements in drone technology exhibit improvements along multiple dimensions (Chen et al., 2021). Upgrades in battery and propulsion technology allow enhanced manoeuvrability, longer flight times, greater altitude capabilities and higher payload capacity. For example, Hecken et al. (2022) study the construction of a heavy-lift UAV for low-altitude flight. The incorporation of machine learning and swarm technology permits greater degrees of autonomy and coordination in large groups. Progress in materials science and UAV design creates scalability in drone manufacture, reduction in overall weight and better operational safety. Kornatowski et al. (2020) design a drone with collapsible rotor arms to improve safety while landing and aerodynamic efficiency during flight. Efforts are also being made to conceptualize novel UAV infrastructure, such as docking stations, precision landing sites, and innovative battery charging modes (Mohsan et al., 2023).

UAV deployment for traffic monitoring has witnessed growing attention from a research perspective. UAV patrols involve repeated deployments to specific parts of a city to monitor traffic flow based on volume (Chow, 2016). Roadside surveillance by UAVs also extends to the conception of intelligent transport systems. Working as accident report agents, UAVs can alert nearby autonomous vehicles and assist rescue teams to minimize and mitigate the damage caused (Menouar et al., 2017). Aerial data analysis can also reveal possible blockages at both traffic and railroad intersections (Congress et al., 2021). Visual and spatial information gathered by planned UAV flights can help make crucial decisions for disaster management. Glock and Meyer (2020) explore route planning for UAVs to detect sampling locations and collect atmospheric samples after a fire or hazardous chemical accident. This data is then used to predict the atmospheric distribution of toxic substances in the proximity of the affected region.

There are many models for payload delivery using UAVs in the literature. UAVs can either be deployed independently, or in association with conventional vehicles such as trucks (with or without synchronization). This paper focuses on the former, i.e. the deployment of a homogeneous fleet of UAVs without ground vehicular support. However, for completeness, we delve into the relevant literature for both drone-only and truck and drone logistics, in that order.

2.3.1 Drone-only routing

UAVs are cheaper to maintain, reduce labour and delay costs and have a lower carbon footprint compared to trucks. Amazon and Alibaba have been frontrunners in adopting UAVs for logistics, in moderate density urban regions and to deliver time-sensitive packages respectively. Many other companies across the world are following suit (Han et al., 2022). Additionally, a majority of customers in urban locations live close to a supermarket, and most delivery packages are relatively light-weight (Moshref-Javadi and Winkenbach, 2021). This includes food, beverages, groceries and also medical and pharmaceutical supplies, all of which have a perishability constraint which makes UAVs an attractive choice.

Levin and Rey (2023) explore the drone delivery service planning problem and propose a novel branch-and-price algorithm for its solution. They also design a reservation heuristic to generate feasible solutions expediently. Vlahovic et al. (2017) perform a case study of a UAV delivery model for pharmaceutical supplies, exhibiting cost reductions and improvements in service responsiveness. Retail facility placements can be improved when network design is conducted under the availability of a UAV delivery service (Baloch and Gzara, 2020). Liu (2019) propose a mixed integer programming formulation for online meal delivery using UAVs. The objective function penalizes excess battery depletion, motivates fast delivery and minimizes unnecessary movement for re-routing. The model is executed in a progressive rolling fashion across discrete time-steps. Chen et al. (2021) explore key strategic and tactical decisions for retailers implementing drone-based delivery operations, focusing on when to offer drone delivery and the appropriate pricing. Using a Markov decision process (MDP) framework, they introduce heuristic procedures to obtain near-optimal closed-form solutions efficiently. The aim is to assist online retailers in real-time decision-making regarding the feasibility and extent of offering drone-based delivery for specific product categories in various service zones. Additionally, they identify effective pricing strategies for drone-based deliveries. Payload delivery expands outside the realm of delivery to consumers. For example, UAVs can be used in precision agriculture, to simultaneously monitor crop sites and perform spraying tasks. This increases the efficiency of both pesticide and fertiliser use (Radoglou-Grammatikis et al., 2020).

The study of drone-only logistics also attracts ML-enriched solution strategies. Asadi et al. (2022) study drone delivery of medical supplies to hospitals, categorizing demand based on hospital distance from the drone hub. Each demand class includes a demand distribution and the required battery charge for successful delivery. The problem is formulated as a MDP where the system state is a vector containing the number of batteries at specific charge levels within the hub. The concise state expression allows reinforcement learning where a lookup table suffices for value function approximation. The action space involves charging batteries from lower to higher levels within decision epochs. This model optimizes battery usage, preserving charge for future deliveries and reducing replacement and charging costs. UAVs also have potential as a supplementary resource in two-echelon logistics networks. Li et al. (2024) study the last-mile delivery problem, where trucks transport goods from a primary depot to satellite depots, and UAVs complete the delivery to customers. Note that we categorize this study as drone-only logistics since trucks are treated as a stochastic externality. The stochastic nature of parcel arrival times at satellites is handled through Bayesian estimation, converting continuous arrival scenarios into discrete expected arrival intervals. A two-stage stochastic model is applied: parcel consolidation is first performed to release a subset for immediate delivery, and the remaining parcels are reserved for routing with future arrivals. Next, the route is treated as a particle in a particle swarm optimization problem. Reinforcement learning using search operators incrementally permutes the route in each iteration to discover better routes in the local neighbourhood.

2.3.2 Truck and drone logistics

The integration of UAVs into the logistic service fleet (together with conventional vehicles) can result in considerable benefits (Rejeb et al., 2023). Such a multi-modal delivery model decreases delivery times, overcomes traffic congestion and minimises human participation in the delivery process. Dayarian et al. (2020) investigate a same-day delivery model where a fleet of trucks is assisted by UAVs (through commodity resupply) at mutually determined meeting locations, allowing trucks to make longer trips before returning to the depot. Gu et al. (2023) investigate a single truck-and-drone scenario for online demand, consisting of deterministic delivery requests along with online pickup requests. The drone revisits the truck for resupply when needed, such revisits dividing the route into segments. An insertion heuristic compares the costs of placing pickup customers at various segments of the initial route. A 15% increase in profits is reported, with UAVs contributing a 50% increase in the acceptance rate of online requests. Yang et al. (2023) study a robust drone-truck delivery problem with customer time windows. They leverage the precision of drone routes to mitigate uncertainties in truck travel times caused by ground traffic congestion. Unlike Gu et al. (2023), here the truck remains at a customer location until the drone serves other customers and returns. Numerical experiments show that the drone-truck combination can safe-guard against the risk of failed deliveries due to unmet time windows. Rave et al. (2023) broaden the truck-drone route planning framework by introducing deployable drone launching stations known as microdepots. In this setup, trucks leave a central distribution centre equipped with both onboard drones and microdepots. The authors address a truck-drone fleet planning problem for a single logistics service provider using adaptive large neighbourhood search. Experiments demonstrate that mixed fleets provide cost-savings over a truck-only fleet, and that the cost-optimal choice of fleet composition and delivery method involves considering all possible combinations.

Heterogeneous fleets logistics, such as truck and drone models, also welcome ML-supported solutions. Chen et al. (2022) handle a mixed fleet of vehicles and UAVs to handle online requests. Network evolution is modelled as a MDP. Routing heuristics determine route feasibility by vehicle or drone, and deep reinforcement learning is then used to determine if a vehicle or a drone should be dispatched. Ghiasvand et al. (2024) tackle a two-echelon multi-trip truck-drone routing problem with uncertainty. Trucks make multiple trips from the depot to local distribution centres (LDCs), while UAVs make single trips from LDCs to customers. The research formulates a mixed-integer linear programming model to minimize total customer waiting times, managing uncertain customer demand through kernel-function based machine learning algorithms.

In an urban scenario, the operation environment of UAVs can mimic the road network. Buildings and other infrastructure can impede flight paths or require excessive energy to overcome aerially. But the airspace above roadways is generally free of obstructions. A possible drone flight network can be a map of the road traffic network itself, where links are above roads and nodes are above traffic intersections and several flight levels may be considered (Levin and Rey, 2023). We adopt this problem and context for the online DDSP.

3 Problem description and formulation

We first introduce the deterministic demand formulation for the DDSP in Section 3.1. We next present the online demand model in Section 3.2 and the MDP formulation in Section 3.3.

3.1 Deterministic DDSP

The deterministic (or static) version of the DDSP is examined by Levin and Rey (2023) through an integer linear programming (ILP) formulation. This formulation is discussed below and serves as the foundation for the model and solution approach designed in this study for the online DDSP.

Let $G=(N,A)$ be a network across which customers send transport requests. These requests, if accepted, are fulfilled by a fleet of drones. Every link $i\in A$ has a length $\rho_{i}$ and capacity $C_{i}$. All links are directed. The drones traverse the network through these links at a constant velocity $v$. Each node $n$ has a set of incoming and outgoing links $\Gamma^{-}_{n}$ and $\Gamma^{+}_{n}$, respectively. The time horizon is discretized into time steps $\{0,1,\ldots,t,\ldots,T\}$, where $T$ marks the end of the horizon. Customers submit requests to the network before the period of network operation $T$. Each customer request $r$ consists of (a) spatial parameters – the request origin $o_{r}$ and destination $d_{r}$ such that $o_{r},d_{r}\in N$, (b) temporal parameters – the earliest permissible time of departure $e_{r}$ and the arrival time window at the destination $[l_{r},u_{r}]$, and (c) a profit $p_{r}$ earned upon fulfilling this request. The set of all requests that arrive over the entire time horizon is denoted by $\mathcal{R}$. In the static version, the network manager is aware of the complete request set $\mathcal{R}$ before the time horizon begins.

Levin and Rey (2023) introduce a link-based network formulation. Every link is divided into an upstream and a downstream segment (Figure 1). The key decision variable $\chi^{r\uparrow}_{i}(t)\in\{0,1\}$ indicates whether request $r$ occupies the upstream segment of link $i\in A$ at time $t$. The counterpart of $\chi^{r\uparrow}_{i}(t)$ for the downstream segment is $\chi^{r\downarrow}_{i}(t)$.

Figure 1: Link-based formulation

Two successive links connected by a node form a turn. The decision variable $\gamma_{ij}^{r}(t)\in\{0,1\}$ indicates whether request $r$ occupies the turn from link $i$ to link $j$ at time $t$. Note that the movement of the drone from the downstream segment of link $i$ to the upstream segment of link $j$ through the connecting node is assumed to be instantaneous. A single node can serve as the junction for many turns. For a given turn $(i,j)$, the set of all other turns sharing the same junction node is denoted by $\mathcal{C}_{ij}$. The decision variable $\phi_{ij}(t)\in\{0,1\}$ resolves turn conflicts at every node by indicating which turn is active at a given time $t$, consequently deactivating all other turns which share that node as a junction.

Finally, the decision variable $z_{r}\in\{0,1\}$ indicates the acceptance or rejection of request $r$, and the objective is to maximize the total profit acquired. The ILP Formulation (1) is presented below:

$\max\ Z=\sum_{r\in\mathcal{R}}z_{r}p_{r}$ (1a)

s.t. $\sum_{i\in\Gamma_{o_{r}}^{+}}\sum_{t=e_{r}}^{T}\chi^{r\uparrow}_{i}(t)=z_{r} \quad \forall r\in\mathcal{R}$ (1b)

$\sum_{i\in\Gamma_{d_{r}}^{-}}\sum_{t=l_{r}}^{u_{r}}\chi^{r\downarrow}_{i}(t)=\sum_{i\in\Gamma_{o_{r}}^{+}}\sum_{t=e_{r}}^{T}\chi^{r\uparrow}_{i}(t) \quad \forall r\in\mathcal{R}$ (1c)

$\chi^{r\downarrow}_{i}\left(t+\frac{\rho_{i}}{v}\right)=\chi^{r\uparrow}_{i}(t) \quad \forall r\in\mathcal{R},\ \forall i\in A,\ \forall t\in\{0,\ldots,T\}$ (1d)

$\sum_{r\in\mathcal{R}}\chi^{r\uparrow}_{i}(t)\leq C_{i} \quad \forall i\in A,\ \forall t\in\{0,\ldots,T\}$ (1e)

$\sum_{r\in\mathcal{R}}\chi^{r\downarrow}_{i}(t)\leq C_{i} \quad \forall i\in A,\ \forall t\in\{0,\ldots,T\}$ (1f)

$\sum_{i\in\Gamma_{n}^{-}}\chi^{r\downarrow}_{i}(t)=\sum_{i\in\Gamma_{n}^{+}}\chi^{r\uparrow}_{i}(t) \quad \forall n\in N,\ \forall r\in\mathcal{R},\ \forall t\in\{0,\ldots,T\}$ (1g)

$\gamma^{r}_{ij}(t)\leq\chi^{r\downarrow}_{i}(t) \quad \forall r\in\mathcal{R},\ \forall(i,j)\in A^{2},\ \forall t\in\{0,\ldots,T\}$ (1h)

$\gamma^{r}_{ij}(t)\leq\chi^{r\uparrow}_{j}(t) \quad \forall r\in\mathcal{R},\ \forall(i,j)\in A^{2},\ \forall t\in\{0,\ldots,T\}$ (1i)

$\gamma^{r}_{ij}(t)\geq\chi^{r\downarrow}_{i}(t)+\chi^{r\uparrow}_{j}(t)-1 \quad \forall r\in\mathcal{R},\ \forall(i,j)\in A^{2},\ \forall t\in\{0,\ldots,T\}$ (1j)

$\phi_{ij}(t)\geq\frac{1}{|\mathcal{R}|}\sum_{r\in\mathcal{R}}\gamma^{r}_{ij}(t) \quad \forall(i,j)\in A^{2},\ \forall t\in\{0,\ldots,T\}$ (1k)

$\phi_{ij}(t)+\sum_{(i^{\prime},j^{\prime})\in\mathcal{C}_{ij}}\phi_{i^{\prime}j^{\prime}}(t)\leq 1 \quad \forall(i,j)\in A^{2},\ \forall t\in\{0,\ldots,T\}$ (1l)

$\gamma^{r}_{ij}(t)\in\{0,1\} \quad \forall r\in\mathcal{R},\ \forall(i,j)\in A^{2},\ \forall t\in\{0,\ldots,T\}$ (1m)

$\phi_{ij}(t)\in\{0,1\} \quad \forall(i,j)\in A^{2},\ \forall t\in\{0,\ldots,T\}$ (1n)

$\chi^{r\uparrow}_{i}(t),\chi^{r\downarrow}_{i}(t)\in\{0,1\} \quad \forall r\in\mathcal{R},\ \forall i\in A,\ \forall t\in\{0,\ldots,T\}$ (1o)

$z_{r}\in\{0,1\} \quad \forall r\in\mathcal{R}$ (1p)

Constraints (1b) and (1c) ensure that only accepted requests are routed and that each route reaches the destination $d_{r}$ within the time window. Constraint (1d) tracks link travel time, while constraints (1e)–(1f) prevent capacity violations. Constraint (1g) conserves link flow, and constraints (1h)–(1j) construct valid turns. Lastly, constraints (1k)–(1l) resolve conflicts at turn junctions (i.e., nodes with non-empty $\Gamma_{n}^{+}$ and $\Gamma_{n}^{-}$) in the network.
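For illustration, the conflict sets $\mathcal{C}_{ij}$ used in constraints (1k)–(1l) can be enumerated directly from the link list. The following Python sketch assumes links are given as (tail, head) node pairs; the function and variable names are ours, not the authors'.

from collections import defaultdict

def build_conflict_sets(links):
    # Group turns (i, j) by their junction node and derive, for each turn,
    # the set of other turns sharing that junction (the C_ij of constraints 1k-1l).
    # `links` maps a link id to its (tail, head) node pair; names are illustrative.
    turns_by_node = defaultdict(list)
    for i, (_, head_i) in links.items():
        for j, (tail_j, _) in links.items():
            if i != j and head_i == tail_j:
                turns_by_node[head_i].append((i, j))
    return {
        turn: [other for other in turns if other != turn]
        for turns in turns_by_node.values()
        for turn in turns
    }

# example: links 0 (a->b), 1 (b->c) and 2 (b->d) share junction node b
conflicts = build_conflict_sets({0: ("a", "b"), 1: ("b", "c"), 2: ("b", "d")})
# conflicts[(0, 1)] == [(0, 2)] and conflicts[(0, 2)] == [(0, 1)]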

3.2 Online demand model

We build upon the deterministic DDSP to establish the online demand model. The time horizon is divided into $I$ intervals, each of duration $D$, such that $DI=T$. The successive execution of all $I$ intervals concludes a single instance (or day) of network operation. The number of requests arriving at the network in each interval is governed by a Poisson process with rate $\lambda$. This implies that, on average, the expected total number of requests submitted to the network during the entire time horizon is $\lambda I$. Within a specific interval, the time of submission of each request (by the customer) is assumed to be distributed uniformly.
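To make the demand process concrete, the following sketch samples one interval of online demand: a Poisson number of requests with uniform submission times within the interval. The spatial, temporal and profit distributions below are placeholders (the paper does not fix their exact form), so all numbers and names are purely illustrative.

import numpy as np

rng = np.random.default_rng(seed=0)

def sample_interval_demand(i, D, lam, nodes):
    # Requests arriving in interval i of length D: count ~ Poisson(lam),
    # submission times uniform within the interval. Origin/destination,
    # time-window and profit distributions are illustrative placeholders.
    n_requests = rng.poisson(lam)
    requests = []
    for _ in range(n_requests):
        t_submit = (i - 1) * D + rng.uniform(0, D)        # uniform submission time
        o, d = rng.choice(len(nodes), size=2, replace=False)
        e_r = int(np.ceil(t_submit))                       # earliest departure e_r
        l_r = e_r + int(rng.integers(5, 15))               # arrival window [l_r, u_r]
        u_r = l_r + int(rng.integers(3, 10))
        p_r = float(rng.integers(1, 11))                   # profit p_r
        requests.append((nodes[o], nodes[d], e_r, l_r, u_r, p_r))
    return requests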

Customers submit requests to the network during each interval $i\in I$. The current incoming requests in interval $i$ (denoted by $\mathcal{R}_{C}^{i}\subset\mathcal{R}$) can either be accepted or rejected. Those which are accepted are added to the set $\mathcal{R}_{\checkmark}\subset\mathcal{R}$ and must then be routed through the network. Once a request is accepted (assigned a drone and a planned route), it cannot later be rejected or left unfulfilled. As soon as a drone initiates its route within the network, the corresponding request is termed active. Alternatively, if a request is accepted (and routed) but the assigned drone has not left the origin, the request is termed idle. Let $\mathcal{R}_{I}\subset\mathcal{R}_{\checkmark}$ and $\mathcal{R}_{A}\subset\mathcal{R}_{\checkmark}$ be the sets of idle and active requests, respectively. All idle requests in the current interval are carried over to the next one, where their planned routes are subject to change. Once departed, no route changes are permitted and the drone is assumed to complete the delivery and leave the network. Upon route completion the corresponding request is removed from $\mathcal{R}_{A}$ and $\mathcal{R}_{\checkmark}$. The problem objective is to maximize the total profit earned by servicing requests in the given time horizon. This demand model is illustrated in Figure 2.

Figure 2: Online demand model: the circles represent requests while $i$, $i+1$, $i+2$ are time intervals

3.3 Formulation as a Markov decision process

The dynamic evolution of the network (arrival and acceptance of requests, followed by drone traffic routing and optimization) can be formulated as a Markov Decision Process (MDP). In an MDP, the problem is decomposed into an environment and an agent. In the online DDSP, the network is the environment, and the network manager (who accepts and routes requests) is the agent. The environment produces states, based on which the agent takes actions, to receive rewards. The agent decides the best action at a given state. The environment subsequently controls the transition to the next state.

3.3.1 State space

The evolution of drone traffic in the network is represented as a sequential MDP moving through states. The state of the network captures all relevant information about its current structure. The state $s_{r}$ is constructed with the following components:

  1. $t_{r}$: the time step at which a new request arrives. This triggers the creation of a new state. Note that two successive requests may arrive several time steps apart, i.e., $t_{r+1}-t_{r}\geq 1$.

  2. $\vec{r}$: the request vector contains the details of the request: the request sequence number, origin node, destination node, earliest departure time, time window of arrival at the destination, and request profit. We write $\vec{r}=(r,o_{r},d_{r},e_{r},[l_{r},u_{r}],p_{r})$.

  3. $\vec{\Theta}_{r}^{pre}$: the route vector $\vec{\theta}_{r}$ contains the details of the drone route assigned to request $r$; the collection of all route vectors in state $s_{r}$ is $\vec{\Theta}_{r}^{pre}$. The route vector comprises the request sequence number, the time of drone departure, and the sequence of nodes in the route: $\vec{\theta}_{r}=(r,t_{o_{r}},[o_{r},\ldots,d_{r}])$. For the sake of brevity, the sequence of nodes in the route is also denoted by $\mu_{r}=[o_{r},\ldots,d_{r}]$. Whether request $r$ is active or idle can be determined from whether $t_{o_{r}}$ lies within the current interval or not. Once a request becomes active, its route vector is immune to any further change. If a request is fulfilled by the time the new state is triggered, its corresponding $\vec{\theta}_{r}$ is removed from $\vec{\Theta}_{r}^{pre}$.

Hence the state $s_{r}$ is represented as the tuple $s_{r}=(t_{r},\vec{r},\vec{\Theta}_{r}^{pre})$. Every interval witnesses the creation of as many states as there are request arrivals in that interval. Between the acceptance (and subsequent routing) or rejection of a request and the arrival of the successive request, the only change in the network is the movement of active drones as per their routes. Since this is a deterministic change (the routes of active drones are known and fixed), a new state need not be declared until the arrival of the next request.
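A minimal data structure for this state representation might look as follows; the class and field names are ours and purely illustrative.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Request:                      # request vector (r, o_r, d_r, e_r, [l_r, u_r], p_r)
    r: int
    origin: int
    destination: int
    earliest_departure: int
    window: Tuple[int, int]         # (l_r, u_r)
    profit: float

@dataclass
class Route:                        # route vector theta_r = (r, t_{o_r}, mu_r)
    r: int
    departure_time: int             # t_{o_r}
    nodes: List[int]                # mu_r = [o_r, ..., d_r]

    def is_active(self, interval_start: int) -> bool:
        # following Algorithm 1, a request is treated as active once t_{o_r} <= (i-1)D
        return self.departure_time <= interval_start

@dataclass
class State:                        # s_r = (t_r, request vector, Theta_r^pre)
    t_r: int                        # time step of the triggering request arrival
    request: Request
    routes: List[Route] = field(default_factory=list)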

3.3.2 Action space

After the arrival of a new request $r$ triggers the creation of state $s_{r}$, the agent must take an action $a_{r}$ from the feasible action space $\mathcal{A}(s_{r})$. The action space allows the rejection of request $r$, but also contains all possible ways in which request $r$ can be routed through the network, provided four conditions are met:

  1. The drone must not exceed the capacity of a link pre-occupied by other active drones.

  2. The drone must not, at any time-step, utilize a turn junction which is already utilized by an active drone at that time-step.

  3. The drone must not depart the request origin before $e_{r}$.

  4. The drone must reach the request destination within the time window $[l_{r},u_{r}]$.

The action space $\mathcal{A}(s_{r})$ also permits re-routing idle requests from previous intervals in order to accommodate request $r$ or to achieve a better objective value. The action taken is then represented as the tuple $a_{r}=(z_{r},\vec{\Theta}_{r}^{post})$, where $z_{r}$ is a binary decision regarding request $r$ and $\vec{\Theta}_{r}^{post}$ is the updated collective route vector, which would include a route $\vec{\theta}_{r}$ for request $r$ if $z_{r}=1$. At a given state, action selection is done by the agent’s policy, which maps states to actions. Conventionally, a policy is represented as $\pi(a_{r}|s_{r})$, and governs the probability of taking action $a_{r}$ when in state $s_{r}$.
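The four feasibility conditions above can be checked for a candidate route as in the following simplified sketch, which treats time as continuous and checks link capacity only at the entry time of each link; all names and the data layout are illustrative, not the authors' implementation.

def route_is_feasible(departure, path, e_r, l_r, u_r, cap, occupancy, junction_busy, v, rho):
    # Check the four routing conditions for one candidate action.
    # `occupancy[(link, t)]` counts active drones on a link at time t and
    # `junction_busy[(node, t)]` flags turn junctions used by active drones.
    if departure < e_r:                                        # condition 3: no early departure
        return False
    t = departure
    for a, b in zip(path, path[1:]):
        link = (a, b)
        if occupancy.get((link, t), 0) + 1 > cap[link]:        # condition 1: link capacity
            return False
        t += rho[link] / v                                     # traverse link (a, b)
        if b != path[-1] and junction_busy.get((b, t), False): # condition 2: junction conflict
            return False
    return l_r <= t <= u_r                                     # condition 4: arrival window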

3.3.3 Reward and value functions

Accepting a request increases the value of the objective function and should therefore indicate a reward. The reward received in state $s_{r}=(t_{r},\vec{r},\vec{\Theta}_{r}^{pre})$ with action $a_{r}=(z_{r},\vec{\Theta}_{r}^{post})$ is simply:

$R(s_{r},a_{r})=z_{r}p_{r}$ (2)

This gives a reward of $p_{r}$ for accepting a request and a reward of 0 for rejecting it. Consequently, the reward accumulated over the course of interval $i\in I$ is given by $\sum_{r\in\mathcal{R}_{C}^{i}}R(s_{r},a_{r})=\sum_{r\in\mathcal{R}_{C}^{i}}z_{r}p_{r}$. The overall cumulative reward is $\sum_{t_{r}\in[0,T]}z_{r}p_{r}$, which is equivalent to $\sum_{r\in\mathcal{R}}z_{r}p_{r}$.

The state-action pair $(s_{r},a_{r})$ captures the process of taking an action $a_{r}$ while in a state $s_{r}$. In the application of ML under the MDP framework, every state-action pair is assigned a relative value in comparison to all other pairs. This value indicates the expected final objective (or reward) that can be achieved given that the agent occupies state $s_{r}$ and takes action $a_{r}$. The learning model presented in this study seeks to estimate the value of taking a certain action while occupying a certain state. In other words, it seeks to discover and exploit the action value function $Q(s_{r},a_{r})$. Thus “learning” is equivalent to improving the accuracy of the value estimations of $Q(s_{r},a_{r})$ to a satisfactory degree.

3.3.4 Online information and state transition

The transition to the next state occurs under transition probabilities $P^{a_{r}}_{s_{r}\rightarrow s_{r}^{\prime}}$. The transition probability determines which state the environment jumps to once the agent undertakes a specific action in the current state. There are many application scenarios where the transition probability matrix is either unknown to the agent, or the size of the state space prohibits its use. In such scenarios the agent can only control its policy, and has no prior knowledge of how the environment will react to its actions. The drone delivery network with online demand falls under this scenario.

As mentioned, the arrival of requests across the time horizon follows a Poisson process with rate $\lambda$. The origins and destinations of the requests are generated via spatial distributions, the time windows of arrival at the destinations are governed by a temporal distribution, and the request profits are sourced from a profit distribution. However, the agent or network manager has no prior information about any of these distributions or the Poisson process. This makes the problem online, and the nature of request arrival becomes exogenous to the agent.

3.3.5 Objective function and optimal policy

The objective is to maximize the cumulative profit earned from requests served over the entire horizon. This is identical to maximizing the cumulative reward acquired in the MDP formulation. In MP terminology, this becomes:

$\max\ Z=\sum_{r\in\mathcal{R}}z_{r}p_{r}$ (3)

Given an (accurate) action value function $Q(s_{r},a_{r})$, an optimal policy $\pi^{*}$ would aim to choose action $a_{r}$ in state $s_{r}$ such that the state-action pair $(s_{r},a_{r})$ returns the maximum achievable $Q$-value in that state, i.e.:

$\pi^{*}(a_{r}|s_{r})=\begin{cases}1,&\text{if }a_{r}=\underset{a}{\arg\max}\ Q(s_{r},a)\\ 0,&\text{otherwise}\end{cases}$ (4)

This is equivalent to stating that the policy, out of the set of all policies $\Pi$, that maximizes the objective function, i.e. the expected total reward gained over the course of the time horizon, is $\pi^{*}$:

$\pi^{*}\in\underset{\pi\in\Pi}{\arg\max}\ \mathbf{E}\left[\sum_{r=1}^{|\mathcal{R}|}R(s_{r},A_{r}^{\pi}(s_{r}))\,\Big{|}\,s_{r-1}\right]$ (5)

Here $A_{r}^{\pi}(s_{r})$ is the action taken by policy $\pi$ in state $s_{r}$. The state $s_{0}$ depicts the initial empty network where no request has yet arrived. Note that the limits of summation in (3) and (5) are slightly different because, unlike in (3), in (5) we emphasize operating upon requests sequentially to demonstrate the evolution of the MDP.
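In code, the greedy policy of Equation (4) simply scores every feasible action with the current Q-value estimate and picks the best one, and the objective of Equation (5) is the cumulative reward collected over one day of operation. The sketch below assumes a `q_estimate` approximation of $Q(s_r,a_r)$, an action object exposing its acceptance decision, and an environment wrapper with a `step` method; none of these are part of the paper's notation.

def greedy_action(state, feasible_actions, q_estimate):
    # Equation (4): choose the feasible action with the largest estimated Q-value.
    # Rejecting the request (z_r = 0 with unchanged routes) is one of the feasible actions.
    return max(feasible_actions, key=lambda action: q_estimate(state, action))

def episode_reward(policy, initial_state, environment):
    # Cumulative reward of Equation (5) over one day, assuming `environment.step`
    # reveals the next request online and returns None at the end of the horizon.
    total, state = 0.0, initial_state
    while state is not None:
        action = policy(state)
        total += action.z_r * state.request.profit   # per-state reward of Equation (2)
        state = environment.step(state, action)
    return total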

4 Solution methodology

A rolling implementation of Formulation (1) in consecutive intervals $i\in I$ is developed to solve the online DDSP. Section 4.1 first outlines the rolling ILP implementation as a solution framework. This is followed by Section 4.2, which illustrates the motivation to construct a surrogate ILP objective function. The determination of predictive prescriptions via a training scheme is provided in Section 4.3. The resulting data-driven optimization approach and the proposed policy for solving the online DDSP are summarized in Section 4.4.

4.1 Myopic ILP

Five modifications need to be made to Formulation (1) for the dynamic online DDSP. First, since Formulation (1) was developed for the static DDSP, all time-indexed variables and constraints extend over the entire time horizon $T$. However, in the online demand context, the ILP will be solved repeatedly for each interval $i\in I$ in a rolling fashion. This requires that, in a given interval, the time-indexed variables and constraints extend from the beginning to the end of that interval. Hence the limits of $t$ change from $\{0,\ldots,T\}$ to $\{(i-1)D,\ldots,iD\}$ for interval $i$.

Second, since idle requests are carried over from previous intervals, any change in their $z_{r}$ values is not permissible. Requests once accepted must be fulfilled. The same is true for active requests from preceding intervals as well. This implies that in each interval,

$z_{r}=1\qquad\forall r\in\mathcal{R}_{I}\cup\mathcal{R}_{A}$ (6a)
Third, the objective function needs to be implemented for all requests which have arrived in the current interval $i$, and for idle requests from previous intervals. The latter is because idle requests are still a part of the solution process, since their routes are subject to re-optimization. This is captured as:
$\max\ Z=\sum_{r\in\mathcal{R}_{O}^{i}}z_{r}p_{r}\quad\text{where}\ \mathcal{R}_{O}^{i}=\mathcal{R}_{C}^{i}\cup\mathcal{R}_{I}\quad\forall i\in I$ (6b)

Fourth, since idle requests may still be re-routed, their earliest time of departure $e_{r}$ cannot precede the start of the current interval $i$. An idle request cannot be re-routed such that its new route begins at a time-step that has already passed.

$e_{r}=\begin{cases}(i-1)D,&\text{if}\ e_{r}\leq(i-1)D\\ e_{r},&\text{otherwise}\end{cases}\qquad\forall r\in\mathcal{R}_{I}$ (6c)

Finally, the unfinished portion of active request routes which will be undertaken in the current interval must be preserved. Suppose we are in interval $i$ and the last updated route vector for (currently) active request $r$ is $\vec{\theta}_{r}=(r,t_{o_{r}},[o_{r},\ldots,d_{r}])$. Let the number of nodes in its path $\mu_{r}$ be $P$ and let $\omega(m,n)$ denote the directed link from node $m$ to node $n$. Then the definitive path for request $r$ is $\mu_{r}=[o_{r},n_{2},\ldots,n_{k},\ldots,n_{P-1},d_{r}]$ where $n_{1}=o_{r}$ and $n_{P}=d_{r}$. The following impositions must be made for all such active requests $r\in\mathcal{R}_{A}$:

$\begin{rcases}\chi^{r\downarrow}_{\omega(n_{k-1},n_{k})}(t_{n_{k}}^{r})=1,&\text{if}\ n_{k}\neq o_{r}\\ \chi^{r\uparrow}_{\omega(n_{k},n_{k+1})}(t_{n_{k}}^{r})=1,&\text{if}\ n_{k}\neq d_{r}\\ \gamma^{r}_{\omega(n_{k-1},n_{k})\,\omega(n_{k},n_{k+1})}(t_{n_{k}}^{r})=1,&\text{if}\ n_{k}\notin\{o_{r},d_{r}\}\end{rcases}\qquad\forall\ n_{k}\in\mu_{r}\mid(i-1)D\leq t_{n_{k}}^{r}\leq iD$ (6d)

Here $t_{n_{k}}^{r}$ denotes the time at which the drone responsible for request $r$ reaches node $n_{k}$; it can be determined since $t_{o_{r}}$, link lengths and drone velocity are known. Solving Formulation (1) with these five modifications at every interval completes the rolling ILP implementation. The pseudo-code is presented in Algorithm 1. Since this model does not consider the impact of current decisions on the fate of requests submitted in future intervals, and only focuses on the current interval, we hereby refer to it as the Myopic ILP.
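As a small illustration of the computation of $t_{n_{k}}^{r}$ needed by constraints (6d), the following sketch walks an active route and accumulates link travel times; the function and argument names are ours.

def node_arrival_times(t_o, mu_r, rho, v):
    # Arrival time t_{n_k}^r at each node of an active route mu_r = [o_r, ..., d_r],
    # given departure time t_{o_r}, link lengths rho[(m, n)] and drone velocity v.
    times = {mu_r[0]: t_o}
    t = t_o
    for m, n in zip(mu_r, mu_r[1:]):
        t += rho[(m, n)] / v          # constant-velocity traversal of link (m, n)
        times[n] = t
    return times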

for $d\leftarrow 1$ to $N_{days}$ do
    initialize request set $\mathcal{R}_{\checkmark}$ and collective route vector $\vec{\Theta}$
    for $i\leftarrow 1$ to $I$ do
        initialize request sets $\mathcal{R}_{A},\mathcal{R}_{I},\mathcal{R}_{C}^{i}$
        // request segregation
        for $r\in\mathcal{R}_{\checkmark}$ do
            if $t_{o_{r}}\leq(i-1)D$ then
                $\mathcal{R}_{A}\leftarrow\mathcal{R}_{A}\cup\{r\}$
            else
                if $e_{r}\leq(i-1)D$ then
                    $e_{r}\leftarrow(i-1)D$
                $\mathcal{R}_{I}\leftarrow\mathcal{R}_{I}\cup\{r\}$
        populate $\mathcal{R}_{C}^{i}$
        // model formulation
        add constraints (1b)–(1p) with $t\in\{(i-1)D,\ldots,iD\}$ wherever applicable
        for $r\in\mathcal{R}_{I}\cup\mathcal{R}_{A}$ do
            add constraint $z_{r}=1$
        for $r\in\mathcal{R}_{A}$ do
            add constraints (6d) for all $n_{k}\in\mu_{r}$ with $(i-1)D\leq t_{n_{k}}^{r}\leq iD$
        add objective $\max\ Z=\sum_{r\in\mathcal{R}_{O}^{i}}z_{r}p_{r}$
        solve modified Formulation (1)
        // update/add drone routes
        for $r\in\mathcal{R}_{O}^{i}$ do
            if $r\in\mathcal{R}_{I}$ then
                update $\vec{\theta}_{r}\leftarrow(r,t_{o_{r}},\mu_{r})$
            if $r\in\mathcal{R}_{C}^{i}$ then
                if $z_{r}^{sol}=1$ then
                    $\mathcal{R}_{\checkmark}\leftarrow\mathcal{R}_{\checkmark}\cup\{r\}$
                    $\vec{\theta}_{r}\leftarrow(r,t_{o_{r}},\mu_{r})$
                    $\vec{\Theta}\leftarrow\vec{\Theta}\cup\{\vec{\theta}_{r}\}$
        // remove completed requests
        for $r\in\mathcal{R}_{\checkmark}$ do
            if $t_{d_{r}}\leq iD$ then
                $\mathcal{R}_{\checkmark}\leftarrow\mathcal{R}_{\checkmark}\setminus\{r\}$
                $\vec{\Theta}\leftarrow\vec{\Theta}\setminus\{\vec{\theta}_{r}\}$

Algorithm 1: Pseudo-code for the Myopic ILP

4.2 Surrogate objective function

Maximizing profit acquisition at each time interval ignores future demand. Taking future demand into account requires making decisions that are sub-optimal in the current time interval but increase the cumulative reward by contributing to the objective of a later interval. We tolerate a sub-optimal reward per interval $\sum_{r\in\mathcal{R}_{C}^{i}}R(s_{r},a_{r})$ to achieve a greater reward-to-go $\sum_{r\in\mathcal{R}}R(s_{r},a_{r})$. This sub-optimality comes from allocating some resources to the future, thus preventing their use in the present. In other words, a sacrifice is made in the present to reap greater rewards in the future.

We define available link capacity per unit time as the network resource. This is captured as $C_{l}-\sum_{r\in\mathcal{R}_{O}^{i}}\chi^{r\uparrow}_{l}(t)$, where the link capacity occupied by idle and current requests is deducted from the total capacity of link $l$. The strategy then translates into (intentionally) creating reserve capacity in the network at the current interval, which prevents that capacity from being utilized at the current interval. This may lead to the rejection of requests or to longer routes in the current interval, but is expected to yield greater profit accumulation in future intervals. We motivate the choice of link capacity as the resource as follows: first, it is a quantitative definition, and link capacity at a given time can easily be measured during the execution of the optimization program; second, the spatial distribution of origins and destinations of customer requests, combined with conflict resolution of drone routes, leads to a heterogeneous resource space. In other words, the available capacities on different links have different value to the network. This creates a rich decision set, providing a sizeable margin with which to outperform myopic policies.

The priority of available capacity on link $l\in A$ at time interval $i$ is denoted $\beta_{i,l}$. This indicates the significance of the link in the network. If a link has higher priority, then any reserve capacity on it is relatively more important to network operation and route creation, compared to other links. The formulation of a data-driven prescription of $\beta_{i,l}$ is described in Section 4.3.1. Reserve capacity is generated on network links in proportion to their priority, since slack created on higher-priority links will have greater impact on maximizing profit in future intervals.

Indefinite and uncontrolled slack creation will adversely affect network performance and reduce cumulative profit. The reserve capacity created in the present should at some point be capitalized upon. Hence the balance between profit maximization and ($\beta_{i,l}$-proportionate) slack creation must be controlled. We construct a new objective function to incorporate this sacrificial strategy. Since this objective function is intended to replace the myopic objective function of the Myopic ILP, it is called the surrogate objective function and comprises two components:

$\max\ Z=\sum_{r\in\mathcal{R}_{O}^{i}}z_{r}p_{r}\ +\ \alpha_{i}\sum_{l\in A}\beta_{i,l}\sum_{t=(i-1)D}^{iD}\left[C_{l}-\sum_{r\in\mathcal{R}_{O}^{i}}\chi^{r\uparrow}_{l}(t)\right]$ (7)

The DDSP formulation with the surrogate objective function is named the Surrogate ILP. The first component is profit maximization, and the second is slack creation based on link priority. For every link $l\in A$, we first compute the slack capacity on $l$ during the current time interval and then multiply it by the relative priority of that link, $\beta_{i,l}$. We also define a parameter $\alpha_{i}$ which influences the balance between the two components. The value of $\alpha_{i}$ chosen for an interval determines to what extent reserve capacity will be created on network links (relative to profit maximization). Note that, independent of the value of $\alpha_{i}$, the slack generated on a link will be proportional to the relative priority of that link, unless $\alpha_{i}=0$, in which case the surrogate objective reduces to the myopic objective of pure profit maximization and $\beta_{i,l}$ plays no role.

In the computational experiments, both components of the surrogate objective function are standardized so that $\alpha_{i}$ can effectively control the balance. The first component, which is essentially the myopic objective function, is standardized by dividing it by the expected total profit over the entire time horizon, $p_{avg}\mathbf{E}[|\mathcal{R}|]$. This component is hereafter referred to as the Profit Component. The second component is standardized by dividing it by the total available network capacity in that interval, $(\sum_{l\in A}C_{l})D$. This component is hereafter referred to as the Slack Component.
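For illustration, the standardized surrogate objective of Eq. (7) can be assembled as a linear expression over the decision variables. The sketch below uses the docplex Python API for CPLEX (the solver used in Section 5) under the assumption that only the variables appearing in the objective are created here; constraints (1b)–(1p) would be added elsewhere, and all argument names are illustrative.

from docplex.mp.model import Model

def build_surrogate_model(requests, links, times, profit, cap, beta,
                          alpha_i, profit_norm, cap_norm):
    """Sketch of the standardized surrogate objective of Eq. (7); constraints omitted."""
    mdl = Model(name="surrogate_ddsp")
    z = mdl.binary_var_dict(requests, name="z")                 # request acceptance z_r
    chi_up = mdl.binary_var_dict(
        [(r, l, t) for r in requests for l in links for t in times],
        name="chi_up")                                          # upstream-link occupancy
    # Profit Component: myopic objective standardized by p_avg * E[|R|]
    profit_term = mdl.sum(profit[r] * z[r] for r in requests) / profit_norm
    # Slack Component: priority-weighted reserve capacity, standardized by (sum_l C_l) * D
    slack_term = mdl.sum(
        beta[l] * (cap[l] - mdl.sum(chi_up[r, l, t] for r in requests))
        for l in links for t in times) / cap_norm
    mdl.maximize(profit_term + alpha_i * slack_term)
    return mdl, z, chi_up

With alpha_i set to zero, the slack term vanishes and the objective reduces to the myopic one.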

4.3 Training procedure of relative link priority ($\beta_{i,l}$)

We next explain the training procedure used to determine predictive prescriptions of the relative link priority ($\beta_{i,l}$) used in the Surrogate ILP. The training procedure is articulated across two main steps: i) the synthesis of training data via lookahead simulation (Section 4.3.1) and ii) the training of a supervised ML model for predicting relative link priority (Section 4.3.2). To scale up the training process, we use a heuristic algorithm to solve the Myopic ILPs that arise therein (Section 4.3.3).

4.3.1 Training data synthesis via lookahead simulation

We use lookahead simulations to synthesize training data that will be used to train a predictive model which itself will be queried during the execution of the Surrogate ILP policy.

The priority of a link can be correlated with the frequency of link traversal by drones over a suitable span of time. Given a duration of network operation (e.g., $\delta$ intervals), crucial high-priority links will experience large throughput and are likely to create bottlenecks which impede request acceptance. The link priority $\beta_{i,l}$ is therefore formally defined as the number of times link $l$ is traversed by incoming requests between the beginning of interval $i$ and the end of interval $i+\delta$.

$\beta_{i,l}=\sum_{q=i}^{i+\delta}\ \sum_{r\in\mathcal{R}_{C}^{q}}\ \sum_{t=(q-1)D}^{qD}\chi^{r\uparrow,sol}_{l}(t)$ (8)

Here, $\chi^{r\uparrow,sol}_{l}(t)$ represents the routing solution value of the upstream-link variable $\chi^{r\uparrow}_{l}(t)$ for link $l$ and time $t$ at the end of each interval. We also define a vector $\bm{\vec{B}}$ to store the link priority values for all intervals.

$\bm{\vec{B}}=\begin{bmatrix}\bm{\vec{\beta}}_{i}\end{bmatrix}^{i\in\{1,\ldots,I\}}=\begin{bmatrix}\beta_{i,l}\end{bmatrix}_{l\in A}^{i\in\{1,\ldots,I\}}$ (9)

Link priority values as defined above are a function of drone traffic evolution in the network, which in turn depends on customer demand. From a snapshot of the network state at a certain time-step, the short-term future of the network may be estimated, provided historical patterns of customer demand are known. The introduction of the ML framework serves this purpose. We develop a supervised ML model which takes as input a snapshot of the network (number of drones occupying each link) and returns as output an estimate of link priorities for the near future. This input snapshot of link occupation is captured for the beginning of every time interval and is denoted by:

$s_{i,l}=\sum_{r\in\mathcal{R}_{A}}\ \sum_{t=(i-1)D-\Delta}^{(i-1)D+\Delta}\ \chi^{r\uparrow,sol}_{l}(t)\ \mathds{1}[t\leq(i-1)D<t+\Delta]$ (10)

where $\Delta=\frac{\rho_{l}}{v}$ is the time taken by a drone to traverse link $l$. Eq. (10) increments the value of $s_{i,l}$ for every active drone occupying link $l$ at the beginning of interval $i$ (i.e., $t=(i-1)D$). We define a collection vector to store the link occupancy values for all intervals.

$\bm{\vec{S}}=\begin{bmatrix}\bm{\vec{s}}_{i}\end{bmatrix}^{i\in\{1,\ldots,I\}}=\begin{bmatrix}s_{i,l}\end{bmatrix}_{l\in A}^{i\in\{1,\ldots,I\}}$ (11)

$\bm{\vec{S}}$ serves as the input feature dataset and $\bm{\vec{B}}$ as the target output dataset for training the ML model. Once trained, the model is deployed in the Surrogate ILP to estimate the values of $\beta_{i,l}$, which are subsequently used in the surrogate objective function (7). Note that in the training phase, all $\beta_{i,l}$, and hence $\bm{\vec{B}}$, are calculated via Eq. (8) to produce the output targets of the training dataset. In the testing and execution of the Surrogate ILP, however, the values of $\beta_{i,l}$ are predicted by the ML model.

In the training scheme, the number of training intervals is designed to be much larger than that in a test instance ($I_{training}\gg I$). This is equivalent to processing several test instances successively without emptying the network between two consecutive test instances. At the beginning of each training interval $i$, the link occupancy on each link $l$ by currently active requests is calculated and stored in $s_{i,l}$. For this we define the function LinkActivity, which computes $s_{i,l}$ as per Eq. (10). This function takes two inputs – the route $\mu_{r}$ of an active request and the link occupancy vector $\vec{\bm{s}}_{i}$ for the current interval. It iterates through the route to determine which link the UAV occupied at the beginning of the interval ($t=(i-1)D$) and increments the corresponding $s_{i,l}$ in $\vec{\bm{s}}_{i}$. Next, a lookahead is performed by virtually simulating the future of the network for the next $I_{virtual}$ intervals. For these intervals, virtual requests $\mathcal{R}_{V}$ are created, imitating historical data. These virtual requests are then routed through the network, obeying link capacity and conflict resolution constraints in the presence of active requests. The routes of successfully serviced virtual requests then contribute to the values of $\beta_{i,l}$ for interval $i$. For a link $l$, $\beta_{i,l}$ equals the number of times link $l$ was traversed by virtual requests in the lookahead simulation. This captures the perceived priority of link $l$ in the near future, from the perspective of the network at interval $i$, in accordance with Eq. (8) with $\delta=I_{virtual}$. We define the function BetaUpdate, which takes two inputs – the route $\mu_{r}$ of a virtual request and the link priority vector $\vec{\bm{\beta}}_{i}$ for the current interval – and updates $\beta_{i,l}$ for all the links that this request traverses according to $\mu_{r}$.

Once the virtual lookahead is complete, all virtual request paths are erased from the network. Thereafter, the process resembles the Myopic ILP. Idle requests from previous intervals and incoming requests of the current interval $i$ are routed. Successfully accepted requests are stored in $\mathcal{R}_{\checkmark}$. The successful path $\mu_{r}$ of a given request $r$ is also stored in a route vector $\theta_{r}=(r,t_{o_{r}},\mu_{r})$ and added to the collective route vector $\vec{\Theta}$. Then the next interval $i+1$ is initiated.
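The two bookkeeping functions referenced above admit compact implementations. The following plain-Python sketch assumes that a route is a list of node indices, that link travel times are known, and that the occupancy and priority vectors are dictionaries keyed by directed links; it is an illustration rather than the authors' code.

def link_activity(route, t_o, interval_start, travel_time, s_i):
    """LinkActivity (Eq. (10)): increment s_i for the link occupied at t = (i-1)D."""
    t = t_o                                   # time the drone leaves the route origin
    for u, v in zip(route[:-1], route[1:]):
        t_next = t + travel_time[(u, v)]
        if t <= interval_start < t_next:      # drone is on link (u, v) at the interval start
            s_i[(u, v)] += 1
            break
        t = t_next

def beta_update(route, beta_i):
    """BetaUpdate (Eq. (8)): count every link traversed by a virtual request route."""
    for u, v in zip(route[:-1], route[1:]):
        beta_i[(u, v)] += 1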

4.3.2 $k$-Nearest Neighbors multi-output regressor

We use $k$-Nearest Neighbors ($k$NN) regression to train a predictive model for estimating relative link priority based on network snapshots. The $k$NN regressor is a supervised ML algorithm used for prediction tasks. It predicts the output target value of a new data point by considering the average or weighted average of its $k$ nearest neighbors in the input feature space. The $k$NN algorithm uses a defined distance metric, such as the Euclidean distance, to measure the similarity between data points in the input feature space. By identifying the $k$ training feature vectors closest to a new data point, the algorithm establishes the nearest neighbors of that data point. The target value of the new data point is then predicted by averaging or weighting the target values associated with its $k$ nearest neighbors, leveraging the assumption that points that are close in the feature space exhibit similar target values. The $k$NN multi-output regressor extends this concept to handle multiple target values for each data point. Notably, the $k$NN regressor is non-parametric and makes minimal assumptions about the underlying data distribution, making it versatile in handling diverse datasets. It can also capture non-linear relationships between input features and output targets. We use the training data synthesized with the lookahead simulation procedure described in Section 4.3.1 to train the $k$NN regressor.
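A minimal training sketch with Scikit-learn, the package reported in Section 5.3, is shown below. The random arrays stand in for the dataset $(\bm{\vec{S}},\bm{\vec{B}})$ synthesized by Algorithm 3, and the hyperparameter values (60 neighbours, Euclidean distance) are those reported in Section 5.3.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n_intervals, n_links = 2000, 76            # training intervals and Sioux Falls links

# Stand-ins for the synthesized dataset: S holds link occupancy snapshots,
# B the corresponding link priority targets computed via Eq. (8).
S = rng.integers(0, 2, size=(n_intervals, n_links))
B = rng.poisson(3.0, size=(n_intervals, n_links))

# KNeighborsRegressor supports multi-output targets natively: the prediction for
# a new snapshot is the average of the priority vectors of its k nearest
# training snapshots.
knn = KNeighborsRegressor(n_neighbors=60, metric="euclidean")
knn.fit(S, B)

beta_hat = knn.predict(S[:1])              # predicted priority vector, shape (1, n_links)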

4.3.3 Reservation heuristic

In practice, the training phase has an associated time budget, implying that link priority values must be estimated expediently. Solving the Myopic ILP for $I_{training}$ intervals demands excessive time and memory. Levin and Rey (2023) propose a reservation heuristic which routes incoming customer requests quickly, and this heuristic is used within the training phase. We illustrate the reservation heuristic in Algorithm 2 and formalize the training scheme in Algorithm 3.

The original spatial network $G=(N,A)$ can be converted into a spatio-temporal network $G^{T}$ as demonstrated in Figure 3. $G^{T}$ is the time-expanded dual graph of $G$. To construct $G^{T}$, the nodes and links are extended across time through discrete time levels, and connections are made on the basis of link travel times. Suppose a link in $G$ connects node $n_{1}$ to node $n_{2}$ and can be traversed in $\Delta t$ time steps. Then it connects two nodes $\Delta t$ time levels apart in $G^{T}$. The graph $G^{T}$ can be viewed as a network of node-time tuples $(n,t)$, where $n$ denotes the node and $t$ the time level on which it is located. The reservation heuristic works by routing customer requests on $G^{T}$ using a shortest-path algorithm. The route is then “reserved” for that request by securing link capacity in space-time, making that capacity unavailable for other requests. Since turn conflict constraints must also be respected in $G^{T}$, the heuristic removes connectivity between node-time tuples whenever constraints (1e)–(1f) and (1h)–(1l) are violated. Algorithm 2 defines the function NewReservation, which discovers a path $\mu_{r}^{T}$ for request $r$, makes the relevant reservations in $G^{T}$ and deduces the corresponding path $\mu_{r}$ in $G$. Two additional functions are also required in the training phase. A request may already have a path, but reservations for this path may either need to be made or erased. We use the functions Reserve($\mu_{r}^{T},G^{T}$) and EmptyReservation($\mu_{r}^{T},G^{T}$) for these purposes, respectively. They follow a similar procedure to NewReservation and are hence not illustrated.
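A minimal sketch of the reservation idea on the time-expanded graph is given below. It assumes unit link capacity (as in the experiments of Section 5), so that reserving a route simply deletes the space-time arcs it uses; turn-conflict bookkeeping and the time-window checks of the request are omitted, and the graph encoding is an assumption rather than the authors' data structure.

import heapq

def new_reservation(origin, dest, depart_time, GT):
    """Find and reserve an earliest-arrival space-time path in the time-expanded graph GT.

    GT maps a node-time tuple (n, t) to the set of node-time tuples reachable from it.
    Returns the spatial node sequence of the path, or None if no feasible path exists.
    """
    start = (origin, depart_time)
    dist, prev = {start: 0}, {}
    heap = [(0, start)]
    goal = None
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry
        if u[0] == dest:                      # first settled copy of dest = earliest arrival
            goal = u
            break
        for v in GT.get(u, ()):
            nd = d + (v[1] - u[1])            # cost = elapsed time steps
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if goal is None:
        return None
    path = [goal]                             # recover the space-time path
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    for u, v in zip(path[:-1], path[1:]):     # reserve: break connectivity (capacity 1)
        GT[u].discard(v)
    return [n for n, _ in path]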

Figure 3: (a) Original Network (b) Time-expanded Network
Input: $r, G^{T}$
Output: $\mu_{r}$
1   Function NewReservation($r, G^{T}$):
2       find space-time path $\mu_{r}^{T}$ in $G^{T}$ satisfying
3           1: previous reservations for (1e)–(1f) and (1h)–(1l)
4           2: spatial and temporal constraints of request $r$
5       if $\mu_{r}^{T}$ found then
6           make reservations (break connectivity) in $G^{T}$ as per (1e)–(1f) and (1h)–(1l) for $\mu_{r}^{T}$
7           create $\mu_{r}$ in $G$ from $\mu_{r}^{T}$
8           return $\mu_{r}$
9       else
10          return false
    End Function
Algorithm 2: Pseudo-code for the reservation heuristic
1   initialize request set $\mathcal{R}_{\checkmark}$ and collective route vector $\vec{\Theta}$
2   set $s_{i,l}\leftarrow 0$ and $\beta_{i,l}\leftarrow 0\ \forall\ l\in A,\ \forall\ i\in\{1,\ldots,I_{training}\}$
3   create $G^{T}$ from $G$
4   for $i\leftarrow 1$ to $I_{training}$ do
5       initialize request sets $\mathcal{R}_{A},\mathcal{R}_{I},\mathcal{R}_{C}^{i}$
        // request segregation as in Algorithm 1 (lines 6 to 12)
        // reserve active requests and record link traversals
6       for $r\in\mathcal{R}_{A}$ do
7           Reserve($\mu_{r}^{T},G^{T}$)
8           LinkActivity($\mu_{r},\vec{\bm{s}}_{i}$)
        // virtual requests
10      for $i_{v}\leftarrow 1$ to $I_{virtual}$ do
11          create virtual requests for interval $i_{v}$ and store in $\mathcal{R}_{V}$
13      for $r\in\mathcal{R}_{V}$ do
14          if NewReservation($r,G^{T}$) then
15              BetaUpdate($\mu_{r},\vec{\bm{\beta}}_{i}$)
18      for $r\in\mathcal{R}_{V}$ do
19          EmptyReservation($\mu_{r}^{T},G^{T}$)
        // solve real requests of current interval and update/add drone routes
21      populate $\mathcal{R}_{C}^{i}$
22      for $r\in\mathcal{R}_{O}^{i}$ do
23          $\mu_{r}\leftarrow$ NewReservation($r,G^{T}$)
24          if $r\in\mathcal{R}_{I}$ then
25              update $\theta_{r}\leftarrow(r,t_{o_{r}},\mu_{r})$
26          if $r\in\mathcal{R}_{C}^{i}$ then
27              if $\exists\mu_{r}$ then
28                  $\mathcal{R}_{\checkmark}\leftarrow\mathcal{R}_{\checkmark}\cup\{r\}$
29                  $\theta_{r}\leftarrow(r,t_{o_{r}},\mu_{r})$
30                  $\vec{\Theta}\leftarrow\vec{\Theta}\cup\{\theta_{r}\}$
        // empty all reservations to prepare for next interval
34      for $r\in\mathcal{R}_{O}^{i}\cup\mathcal{R}_{A}$ do
35          EmptyReservation($\mu_{r}^{T},G^{T}$)
        // remove completed requests as in Algorithm 1 (lines 33 to 36)
return $\bm{\vec{B}}$ and $\bm{\vec{S}}$
Algorithm 3: Training data synthesis for predicting relative link priority
1   for $d\leftarrow 1$ to $N_{days}$ do
2       initialize request set $\mathcal{R}_{\checkmark}$ and collective route vector $\vec{\Theta}$
3       initialize $\alpha$-profile $\begin{bmatrix}\alpha_{i}\end{bmatrix}^{i\in\{1,\ldots,I\}}$
4       set $s_{i,l}\leftarrow 0$ and $\beta_{i,l}\leftarrow 0\ \forall\ l\in A,\ \forall\ i\in\{1,\ldots,I\}$
5       for $i\leftarrow 1$ to $I$ do
6           initialize request sets $\mathcal{R}_{A},\mathcal{R}_{I},\mathcal{R}_{C}^{i}$
            // request segregation as in Algorithm 1 (lines 6 to 12)
7           for $r\in\mathcal{R}_{A}$ do
8               LinkActivity($\mu_{r},\vec{\bm{s}}_{i}$)
            // query ML component for link priorities & standardize
10          $\vec{\bm{\beta}}_{i}\leftarrow$ $k$NN($\vec{\bm{s}}_{i}$)
11          $\beta^{\max}\leftarrow\max(\vec{\bm{\beta}}_{i})$
12          $\vec{\bm{\beta}}_{i}\leftarrow\begin{bmatrix}\frac{\beta_{i,l}}{\beta^{\max}}\end{bmatrix}_{l\in A}$
13          populate $\mathcal{R}_{C}^{i}$
            // model formulation as in Algorithm 1 (lines 15 to 20)
14          add surrogate objective $\max\ Z=\sum_{r\in\mathcal{R}_{O}^{i}}z_{r}p_{r}+\alpha_{i}\sum_{l\in A}\beta_{i,l}\sum_{t=(i-1)D}^{iD}\left[C_{l}-\sum_{r\in\mathcal{R}_{O}^{i}}\chi^{r\uparrow}_{l}(t)\right]$
15          solve modified Formulation (1) with surrogate objective
            // update/add drone routes as in Algorithm 1 (lines 24 to 31)
            // remove completed requests as in Algorithm 1 (lines 33 to 36)
Algorithm 4: Surrogate ILP policy

4.4 Surrogate ILP policy

The Surrogate ILP policy is executed after completing the training procedure described in Section 4.3. The Surrogate ILP policy iteratively solves the Surrogate ILP, whose surrogate objective function is parameterized by the predictive prescriptions for relative link priority ($\beta_{i,l}$) and the balance parameter ($\alpha_{i}$). Recall that the ratio of profit maximization to link priority-proportional slack creation is controlled via the balance parameter $\alpha_{i}$. The magnitude of this balance parameter may be varied from one time interval to the next. Since $\alpha_{i}$ is coupled with the slack component, a greater magnitude favours slack creation and a smaller one favours profit maximization. Too large a value of $\alpha_{i}$, sustained over many intervals, injects excessive reserve capacity into the network and degrades the cumulative profit acquired. Too small a value makes its contribution insignificant, and a negative value is likely to suffocate the network by overloading link capacities. It follows that there exists a range of desirable values for $\alpha_{i}$, which we explore in our numerical experiments. The variation (or lack thereof) in the values of $\alpha_{i}$, denoted by the sequence $\begin{bmatrix}\alpha_{i}\end{bmatrix}^{i\in\{1,\ldots,I\}}$, is called the $\alpha$-profile. In our numerical experiments, three types of $\alpha$-profiles are investigated: constant, step-wise and continuous.

The pseudo-code of the Surrogate ILP policy is presented in Algorithm 4. At the start of every interval, the current active requests (which were accepted in previous intervals) are segregated. The function LinkActivity populates the link occupancy vector $\vec{\bm{s}}_{i}$ in accordance with the positions of active requests. The trained $k$NN regressor is then queried with $\vec{\bm{s}}_{i}$ as input, after which it supplies a predictive prescription of link priorities in the form of $\vec{\bm{\beta}}_{i}$. Once the link priority vector is standardized, it is implemented in the parameterized surrogate objective function, along with the chosen $\alpha$-profile.
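Lines 10–12 of Algorithm 4 reduce to a short query-and-standardize step. The sketch below assumes a fitted Scikit-learn regressor such as the one trained in Section 4.3.2 and a link occupancy vector for the current interval.

import numpy as np

def predict_link_priorities(knn, s_i):
    """Query the trained kNN regressor with the current snapshot and standardize
    the predicted priorities so that the largest beta_{i,l} equals 1."""
    beta_i = knn.predict(np.asarray(s_i).reshape(1, -1))[0]
    beta_max = beta_i.max()
    return beta_i / beta_max if beta_max > 0 else beta_i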

5 Numerical experiments

We conduct numerical experiments to assess the performance of the proposed data-driven optimization approach, i.e. the Surrogate ILP, for the online DDSP. We use the Myopic ILP as a baseline for benchmarking and we also conduct sensitivity analyses on learning parameters of the ML-enabled surrogate objective function.

We conduct these experiments on the Sioux Falls network, since it is an established data set for traffic routing problems and has been used by Levin and Rey (2023) for the deterministic DDSP. The network has 24 nodes and 76 links. The length of the time horizon is $T=60$ minutes, with 12 intervals of 5 minutes each ($I=12$ and $D=5$ minutes). For routing and capacity tracking, the time horizon is discretized into time steps of one minute. After 60 minutes elapse, the network remains operational until the final accepted request reaches its destination, i.e., until $\max_{r\in\mathcal{R}_{\checkmark}}d_{r}$. We focus on a proof of concept for the proposed novel solution methodology. In line with this goal, we maintain all link capacities at 1 drone per minute, even though this is probably more restrictive than is necessary to avoid collisions. The size of the ILPs is a function of the number of variables, and assigning small capacities permits solving the problem on a large network with capacity constraints. The number of customer requests submitted in each interval follows a Poisson process with rate $\lambda=100$, which amounts to an expected total of 1200 requests over the course of 12 intervals. Within an interval $i$, the request submission times are uniformly distributed between the start and end of the interval $[(i-1)D,iD)$. The spatial distribution of request origins $o_{r}$ and destinations $d_{r}$ is exponential with respect to the y-coordinate, with origins dominating the bottom of the network and destinations occupying the top. For each request, the earliest permissible departure time $e_{r}$ varies uniformly within $[0,10]$ minutes of the request submission time $t_{r}$. The arrival time window at the destination, $[l_{r},u_{r}]$, follows a skewed normal distribution between the expected travel time and the end of the time horizon, skewed towards the expected travel time. The price (profit) of each request is uniformly distributed between $[1,10]$.
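A minimal instance generator matching the distributional assumptions above could be written as follows. The exact parameterization of the exponential spatial weighting and of the skew-normal arrival window is not reported in the paper, so the shape and scale values below are placeholders, as are the variable names.

import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(0)

def generate_interval_requests(i, nodes_y, travel_time, D=5, lam=100, T=60):
    """Sample the requests submitted during interval i (all times in minutes).

    nodes_y     -- y-coordinate of each node (origins favour the bottom, destinations the top)
    travel_time -- matrix of expected node-to-node travel times
    """
    # exponential weighting on the y-coordinate for origins (low y) and destinations (high y)
    w_o = np.exp(-nodes_y / nodes_y.max())
    w_d = np.exp(nodes_y / nodes_y.max())
    requests = []
    for _ in range(rng.poisson(lam)):                      # Poisson number of submissions
        t_r = rng.uniform((i - 1) * D, i * D)              # submission time within the interval
        o = int(rng.choice(len(nodes_y), p=w_o / w_o.sum()))
        d = int(rng.choice(len(nodes_y), p=w_d / w_d.sum()))
        e_r = t_r + rng.uniform(0, 10)                     # earliest permissible departure
        ett = travel_time[o, d]                            # expected travel time
        # arrival-window end between the expected travel time and the horizon end,
        # skewed towards the expected travel time (shape a=4 is a placeholder)
        u_r = e_r + ett + abs(skewnorm.rvs(a=4, scale=max(T - ett, 1.0) / 4, random_state=rng))
        p_r = rng.uniform(1, 10)                           # request price (profit)
        requests.append({"t": t_r, "o": o, "d": d, "e": e_r, "u": u_r, "p": p_r})
    return requests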

We implemented the proposed algorithms in Python on a Windows machine with a 3.19 GHz processor and 64 GB of RAM. We use CPLEX 22.1.0.0 to solve the test instances, with a solve-time limit of 300 seconds for each interval of a given instance.

5.1 $\alpha$-profiles

We consider three types of $\alpha$-profiles: constant, step-wise and continuous. The performance of constant profiles can be used to determine the desirable range of $\alpha_{i}$, after which temporal variations can be introduced to extend the analysis. It should be noted that even though the time intervals are discrete, a continuous $\alpha$-profile is still permissible, because $\alpha_{i}$ is only evaluated at discrete points marking the beginning of each interval. A summary of the $\alpha$-profiles experimented with is provided in Table 1.

An intuitive approach consists of starting with a high value of $\alpha_{i}$ and reducing it over time. This realizes the strategy of preferentially creating link priority-based reserve capacity in the initial intervals and transitioning into profit accumulation towards the end of the time horizon. All non-constant profiles presented are non-increasing for this reason. However, constant $\alpha$-profiles have significant potential too, since the $k$NN model is adaptive and returns a different link priority vector at each interval. This implies that the priority-based slack creation is also adaptive and may relieve some of the adjustment burden from $\alpha_{i}$.

Name $\alpha$-profile composition Type
SP_CTE1 $\alpha=2$ Constant
SP_CTE2 $\alpha=1.5$
SP_CTE3 $\alpha=1.25$
SP_CTE4 $\alpha=1$
SP_CTE5 $\alpha=0.5$
SP_CTE6 $\alpha=-1$
SP_STP1 $\alpha$ = 1.5 → 1.25 | 6:6 Step-wise
SP_STP2 $\alpha$ = 1.5 → 1.25 | 8:4
SP_STP3 $\alpha$ = 1.5 → 1.25 → 1 | 4:4:4
SP_STP4 $\alpha$ = 1.5 → 1.25 → 1 | 7:3:2
SP_STP5 $\alpha$ = 1.25 → 1 | 6:6
SP_STP6 $\alpha$ = 1.25 → 1 | 8:4
SP_PLY1 $\alpha(t)=0.01t^{2}-0.15545t+1.5$ Continuous
SP_PLY2 $\alpha(t)=-0.01t^{2}+0.06455t+1.5$
SP_PLY3 $\alpha(t)=-0.04545t+1.5$
SP_PLY4 $\alpha(t)=e^{-0.063t}+0.5$
Table 1: Summary of $\alpha$-profiles for the Surrogate ILP
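The profiles in Table 1 are simple functions of the interval index and can be generated programmatically. The sketch below reproduces a few of them; whether the first interval corresponds to t = 0 or t = 1 in the continuous profiles is an assumption, as are the helper names.

import math

def constant_profile(alpha, n_intervals=12):
    """Constant profile, e.g. SP_CTE2 with alpha = 1.5."""
    return [alpha] * n_intervals

def step_profile(values, counts):
    """Step-wise profile, e.g. SP_STP3 = step_profile([1.5, 1.25, 1.0], [4, 4, 4])."""
    return [v for v, c in zip(values, counts) for _ in range(c)]

def continuous_profile(func, n_intervals=12):
    """Continuous profile evaluated at the start of each interval."""
    return [func(t) for t in range(n_intervals)]

# Examples corresponding to Table 1
sp_cte2 = constant_profile(1.5)
sp_stp3 = step_profile([1.5, 1.25, 1.0], [4, 4, 4])
sp_ply4 = continuous_profile(lambda t: math.exp(-0.063 * t) + 0.5)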

5.2 Myopic ILP policy

We first examine the performance of the Myopic ILP policy on 5 problem instances based on the Sioux Falls network as described above. Table 2 summarizes the results. The Myopic ILP achieved an average profit of 3178 across all 5 instances, with the expected maximum possible profit being $p_{avg}\mathbf{E}[|\mathcal{R}|]=6600$. The average service rate was 43.6%. These results serve as a baseline for benchmarking the proposed data-driven optimization approach. The average solve-time for an interval across all 5 instances was 123.3 seconds. We observed that if an optimal solution was reached, the solve-time was below 50 seconds, but for some intervals the solver reached the time limit of 300 seconds and returned the best feasible solution found. The latter case occurred in the initial three to four intervals, presumably because the network traffic was still evolving and had not reached saturation. In such a situation, many idle requests have not yet been dispatched and incoming requests continue to arrive at each interval, making it difficult to reach optimality. As idle requests begin to enter the network in greater numbers and network traffic increases, the pending request load subsides and optimality can be reached more easily.

Instance Total Requests Service Rate Profit
0 1203 41.6% 3076
1 1125 45.5% 3249
2 1260 41.7% 3260
3 1184 44.5% 3216
4 1147 44.0% 3089
Table 2: Performance of the Myopic ILP

5.3 Training the $k$NN multi-output regressor

The training instance contains 2000 intervals, instead of the usual 12 for test instances. The lookahead simulation uses 5 virtual intervals. We ran the training scheme described in Algorithm 3 on this instance and obtained the training dataset in which the input feature set is $\bm{\vec{S}}$ and the output target set is $\bm{\vec{B}}$. We employed the reservation heuristic to handle the size of the training instance. The time-expanded dual graph $G^{T}$ is constructed up to time layer $t_{\max}+\Delta$, where $t_{\max}=\max_{r\in\mathcal{R}}d_{r}$ over the training requests. The purpose of the buffer $\Delta$ is to account for the virtual requests created in the final interval, some of whose arrival windows may exceed $t_{\max}$.

The completed training dataset is passed to a $k$NN multi-output regressor. We implemented the $k$NN regressor with the Scikit-learn package in Python. As mentioned previously, the collection vectors $\bm{\vec{S}}$ and $\bm{\vec{B}}$ contain the link occupancy vectors $\bm{\vec{s}}_{i}=\begin{bmatrix}s_{i,l}\end{bmatrix}^{l\in A}$ and link priority vectors $\bm{\vec{\beta}}_{i}=\begin{bmatrix}\beta_{i,l}\end{bmatrix}^{l\in A}$, respectively, for all training intervals. During the execution of the training scheme, we converted the link occupancy vector $\bm{\vec{s}}_{i}$ into a 2-D node adjacency matrix $\bm{N_{s}}$ whose row and column indices are the nodes of the network. Each entry $s_{m,n}$ of this matrix represents a directed link from node $m$ to node $n$ along link $\omega(m,n)$, and the value of the entry equals the link occupancy. This transformation is useful because it encodes the state of the network in a more information-rich format. Apart from storing link occupancy, the node adjacency matrix also conveys the network topology (for example, two separate non-zero entries in the same row $m$ represent two outgoing links with $m$ as the common source). With $\bm{N_{s}}$ as input, the $k$NN regressor is trained to estimate the link priorities as output. The number of neighbours considered for each input is set to 60, and the distance metric is the Euclidean distance.
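The vector-to-matrix encoding described above is a small reshaping step. The sketch below assumes that links are stored as (tail, head) node pairs indexed consistently with the occupancy vector; feeding the matrix to the regressor requires flattening it back into a feature vector.

import numpy as np

def to_node_adjacency_matrix(s_i, links, n_nodes=24):
    """Encode the link occupancy vector s_i as a node adjacency matrix N_s,
    where entry (m, n) holds the occupancy of link omega(m, n)."""
    N_s = np.zeros((n_nodes, n_nodes))
    for k, (m, n) in enumerate(links):
        N_s[m, n] = s_i[k]
    return N_s

# Training features: one flattened adjacency matrix per training interval
# X = np.stack([to_node_adjacency_matrix(s, links).ravel() for s in S])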

Figure 4(a) shows one pair of input and target output vectors in the training scheme (training interval 500). The input is the snapshot of the network at the beginning of training interval 500, depicting the number of drones (UAVs) on each link. After simulating 5 virtual intervals, the link traversals of the accepted virtual requests give rise to the desired target output shown in Figure 4(b). The $k$NN model is trained on such input and target output pairs for the entire duration of the training scheme. We note that the spatial distribution of request origins and destinations influences the link priorities. Since origins dominate the bottom portion of the network and destinations dominate the top, links directed southwards (top to bottom) have the lowest priority. Links within the central portion of the network, which carry traffic in the opposite direction (bottom to top), have high priority. Some links responsible for lateral movement are also of moderate to high importance.

(a) Input: Link Occupancy vector $\bm{\vec{s}}_{i}=\begin{bmatrix}s_{i,l}\end{bmatrix}^{l\in A}$
(b) Target Output: Link Priority vector $\bm{\vec{\beta}}_{i}=\begin{bmatrix}\beta_{i,l}\end{bmatrix}^{l\in A}$
Figure 4: (a) Input and (b) Target output vectors for training interval 500 in the training scheme. Note: drones occupying the same link in the network need not be equidistant but have been shown as such for simplicity.

5.4 Constant $\alpha_{i}$

We conduct a sensitivity analysis to quantify the impact of the $\alpha$-profile on the performance of the Surrogate ILP policy. Since $\alpha_{i}=0\ \forall i\in\{1,\ldots,I\}$ is equivalent to the Myopic ILP, we perturbed $\alpha_{i}$ above and below 0 and observed the effect on the cumulative profit acquired. The cumulative profit of the Surrogate ILP is used to determine the Profit Gap % with respect to the Myopic ILP and serves as the main performance metric. We similarly recorded the cumulative service (total requests accepted) and the Service Gap %. The performance of the Surrogate ILP policy with constant $\alpha$-profiles is displayed in Table 3.

As expected, a fully negative $\alpha$-profile (SP_CTE6) performed worse than the Myopic ILP, because the surrogate objective then seeks to minimize slack and forces traffic through the network in a counter-productive manner. The influence of the link priorities $\beta_{i,l}$ further worsened the solution, because high-priority links are overloaded in proportion to their priority. In contrast, positive $\alpha$-profiles performed better than the benchmark, although the profit gap decreases if $\alpha$ is exceedingly large. This can be explained by observing that too large an $\alpha$ maintained over the entire time horizon drains the network excessively and prevents profitable requests from being routed successfully. For the given network and demand scenario, constant $\alpha$-profiles of 1.5, 1.25 and 1 performed best, with average gaps across the 5 problem instances of 9.43%, 10.62% and 9.73%, respectively. For these three $\alpha$-profiles, the average solve-times per interval across the 5 problem instances are 148.5, 150.0 and 142.4 seconds, respectively. A similar pattern to the Myopic ILP emerged, whereby the initial three to four intervals exhaust the solve-time limit, but thereafter optimality is reached well within it.

We observed that the Surrogate ILP policies served slightly fewer requests than the Myopic ILP policy but attained a larger profit. The Myopic ILP behaved greedily – optimal in the current interval (for the initial intervals) but negligent of the future – and lost out on potentially profitable requests in later intervals. In contrast, the Surrogate ILPs maintained reserve capacity on crucial links, which enabled the acceptance of high-profit requests regardless of the interval in which they arrived. It is possible, however, to create excessive reserves that begin to hinder traffic through the network, as evidenced by the inferior performance (relative to the other Surrogate ILPs) of SP_CTE1. With $\alpha_{i}=2$, SP_CTE1 had the lowest service rate among all Surrogate ILPs and, although it still outperformed the Myopic ILP policy, it was the worst-performing Surrogate ILP.

Instance Total Arrived Myopic ILP $\alpha$-profile Service Rate Profit Profit Gap Profit Gap % Service Gap Service Gap %
Service Rate Profit
0 1203 41.6% 3076 SP_CTE1 37.0% 3255 179 5.8% -56 -4.7%
SP_CTE2 40.6% 3497 421 13.7% -13 -1.0%
SP_CTE3 40.9% 3474 398 12.9% -9 -0.7%
SP_CTE4 41.6% 3413 337 11.0% -1 0.0%
SP_CTE5 42.6% 3308 232 7.5% 11 1.0%
SP_CTE6 41.5% 3062 -14 -0.5% -2 -0.1%
1 1125 45.5% 3249 SP_CTE1 40.1% 3290 41 1.3% -61 -5.4%
SP_CTE2 43.8% 3534 285 8.8% -19 -1.7%
SP_CTE3 45.0% 3590 341 10.5% -6 -0.5%
SP_CTE4 45.6% 3542 293 9.0% 1 0.1%
SP_CTE5 46.4% 3409 160 4.9% 10 0.9%
SP_CTE6 44.6% 3141 -108 -3.3% -10 -0.9%
2 1260 41.7% 3260 SP_CTE1 36.4% 3328 68 2.1% -67 -5.3%
SP_CTE2 39.7% 3563 303 9.3% -26 -2.0%
SP_CTE3 40.6% 3577 317 9.7% -14 -1.1%
SP_CTE4 41.3% 3564 304 9.3% -6 -0.4%
SP_CTE5 41.9% 3401 141 4.3% 2 0.2%
SP_CTE6 41.4% 3222 -38 -1.2% -4 -0.3%
3 1184 44.5% 3216 SP_CTE1 37.8% 3280 64 2.0% -80 -6.8%
SP_CTE2 40.7% 3443 227 7.1% -45 -3.8%
SP_CTE3 42.1% 3502 286 8.9% -29 -2.4%
SP_CTE4 43.3% 3515 299 9.3% -14 -1.2%
SP_CTE5 43.6% 3278 62 1.9% -11 -0.9%
SP_CTE6 43.7% 3195 -21 -0.7% -10 -0.8%
4 1147 44.0% 3089 SP_CTE1 37.1% 3148 59 1.9% -79 -6.9%
SP_CTE2 41.0% 3346 257 8.3% -35 -3.0%
SP_CTE3 43.0% 3430 341 11.0% -12 -1.0%
SP_CTE4 43.0% 3400 311 10.1% -12 -1.0%
SP_CTE5 43.9% 3590 501 16.2% -1 -0.1%
SP_CTE6 44.1% 2980 -109 -3.5% 1 0.1%
Table 3: Surrogate ILP policy with constant $\alpha$-profile vs. Myopic ILP policy

5.5 Step-wise & continuous $\alpha_{i}$

Having demonstrated the persistent superiority of SP_CTE2, SP_CTE3 and SP_CTE4, we explore step-wise changes in the $\alpha$-profile with 1.5, 1.25 and 1 as pivots. We name these Surrogate ILPs SP_STP1 through SP_STP6. For example, SP_STP1 has the $\alpha$-profile $1.5\rightarrow 1.25\ |\ 6{:}6$, which denotes 6 intervals of $\alpha_{i}=1.5$ followed by 6 intervals of $\alpha_{i}=1.25$. Similarly, SP_STP2 has the $\alpha$-profile $1.5\rightarrow 1.25\ |\ 8{:}4$, which denotes 8 intervals of $\alpha_{i}=1.5$ followed by 4 intervals of $\alpha_{i}=1.25$. The subsequent two step-wise profiles move from 1.5 all the way to 1, and the final two go from 1.25 to 1. In each of these three pairs of Surrogate ILP policies, the latter captures a delayed transition to a lower $\alpha_{i}$.

Table 4 records the performance of this class of $\alpha$-profiles. We observe higher-quality solutions from the group of step-wise $\alpha$-profiles than from the constant profiles. However, there is no clear superiority of one step-wise profile over another. Only SP_STP5 emerged as the best in two instances, while the remaining three instances are each dominated by a different step profile. As with SP_CTE2, SP_CTE3 and SP_CTE4, the service rate is lower than that of the Myopic ILP policy but the cumulative profit is greater.

The link priority vector returned by the $k$NN multi-output regressor is adaptive and differs in each interval, since it is extremely unlikely for the input snapshot vector of link occupancy to be identical in two consecutive intervals. For this reason, it is difficult to establish a prevailing improvement over constant $\alpha$-profiles with step-wise or continuous profiles. By a continuous $\alpha$-profile we mean a temporal function which is evaluated at discrete time steps (the start of each interval). We approximated the step-wise policies SP_STP3 and SP_STP4 as continuous functions of time. These policies (SP_PLY1 to SP_PLY4) outperformed the Myopic ILP as well, as shown in Table 5.

In Table 6, we compare the step-wise and continuous $\alpha$-profiles with the three best constant profiles. No sustained performance gap is observed, indicating that it is difficult to generate consistent improvement upon the adaptive $k$NN component, provided the value of $\alpha$ is not arbitrarily chosen. In Table 7 we summarize the profits and profit gaps (relative to the Myopic ILP) of all surrogate objective functions in the form of a heat map.

Instance Total Arrived Myopic ILP $\alpha$-profile Service Rate Profit Profit Gap Profit Gap % Service Gap Service Gap %
Service Rate Profit
0 1203 41.6% 3076 SP_STP1 40.2% 3444 368 12.0% -17 -1.4%
SP_STP2 40.5% 3498 422 13.7% -14 -1.1%
SP_STP3 41.3% 3501 425 13.8% -4 -0.3%
SP_STP4 40.2% 3481 405 13.2% -17 -1.4%
SP_STP5 41.6% 3511 435 14.1% -1 0.0%
SP_STP6 41.1% 3449 373 12.1% -7 -0.5%
1 1125 45.5% 3249 SP_STP1 43.8% 3568 319 9.8% -19 -1.7%
SP_STP2 44.4% 3584 335 10.3% -12 -1.1%
SP_STP3 43.6% 3485 236 7.3% -21 -1.9%
SP_STP4 44.7% 3588 339 10.4% -9 -0.8%
SP_STP5 45.2% 3547 298 9.2% -3 -0.3%
SP_STP6 44.4% 3517 268 8.2% -12 -1.1%
2 1260 41.7% 3260 SP_STP1 40.5% 3591 331 10.2% -16 -1.2%
SP_STP2 40.4% 3614 354 10.9% -17 -1.3%
SP_STP3 40.8% 3570 310 9.5% -12 -0.9%
SP_STP4 40.2% 3568 308 9.4% -19 -1.5%
SP_STP5 41.1% 3582 322 9.9% -8 -0.6%
SP_STP6 41.5% 3625 365 11.2% -3 -0.2%
3 1184 44.5% 3216 SP_STP1 41.9% 3536 320 10.0% -31 -2.6%
SP_STP2 41.6% 3492 276 8.6% -34 -2.9%
SP_STP3 42.1% 3470 254 7.9% -29 -2.4%
SP_STP4 42.4% 3552 336 10.4% -25 -2.1%
SP_STP5 43.6% 3556 340 10.6% -11 -0.9%
SP_STP6 42.5% 3470 254 7.9% -24 -2.0%
4 1147 44.0% 3089 SP_STP1 42.2% 3431 342 11.1% -21 -1.8%
SP_STP2 41.7% 3395 306 9.9% -27 -2.3%
SP_STP3 42.9% 3460 371 12.0% -13 -1.1%
SP_STP4 41.8% 3382 293 9.5% -26 -2.2%
SP_STP5 43.0% 3412 323 10.5% -12 -1.0%
SP_STP6 42.6% 3418 329 10.7% -16 -1.4%
Table 4: Surrogate ILP policy with step-wise $\alpha$-profile vs. Myopic ILP policy
Instance Total Arrived Myopic ILP $\alpha$-profile Service Rate Profit Profit Gap Profit Gap % Service Gap Service Gap %
Service Rate Profit
0 1203 41.6% 3076 SP_PLY1 41.1% 3461 385 12.5% -7 -0.5%
SP_PLY2 40.6% 3502 426 13.8% -13 -1.0%
SP_PLY3 41.7% 3542 466 15.1% 1 0.1%
SP_PLY4 40.9% 3458 382 12.4% -9 -0.7%
1 1125 45.5% 3249 SP_PLY1 44.6% 3514 265 8.2% -10 -0.9%
SP_PLY2 44.3% 3571 322 9.9% -14 -1.2%
SP_PLY3 45.0% 3562 313 9.6% -6 -0.5%
SP_PLY4 44.7% 3530 281 8.6% -9 -0.8%
2 1260 41.7% 3260 SP_PLY1 40.9% 3540 280 8.6% -11 -0.8%
SP_PLY2 40.9% 3648 388 11.9% -11 -0.8%
SP_PLY3 40.6% 3572 312 9.6% -14 -1.1%
SP_PLY4 40.8% 3576 316 9.7% -12 -0.9%
3 1184 44.5% 3216 SP_PLY1 43.2% 3501 285 8.9% -16 -1.3%
SP_PLY2 42.2% 3544 328 10.2% -27 -2.3%
SP_PLY3 42.8% 3546 330 10.3% -20 -1.7%
SP_PLY4 42.8% 3550 334 10.4% -20 -1.7%
4 1147 44.0% 3089 SP_PLY1 43.8% 3436 347 11.2% -3 -0.2%
SP_PLY2 41.8% 3379 290 9.4% -26 -2.2%
SP_PLY3 42.5% 3411 322 10.4% -18 -1.5%
SP_PLY4 42.9% 3453 364 11.8% -13 -1.1%
Table 5: Surrogate ILP policy with continuous $\alpha$-profile vs. Myopic ILP policy
Step-wise and Continuous $\alpha$-profiles
Constant $\alpha$-profile Instance SP_STP1 SP_STP2 SP_STP3 SP_STP4 SP_STP5 SP_STP6 SP_PLY1 SP_PLY2 SP_PLY3 SP_PLY4
0 -1.5% 0.0% 0.1% -0.5% 0.4% -1.4% -1.0% 0.1% 1.3% -1.1%
1 1.0% 1.4% -1.4% 1.5% 0.4% -0.5% -0.6% 1.0% 0.8% -0.1%
2 0.8% 1.4% 0.2% 0.1% 0.5% 1.7% -0.6% 2.4% 0.3% 0.4%
3 2.7% 1.4% 0.8% 3.2% 3.3% 0.8% 1.7% 2.9% 3.0% 3.1%
SP_CTE2 4 2.5% 1.5% 3.4% 1.1% 2.0% 2.2% 2.7% 1.0% 1.9% 3.2%
0 -0.9% 0.7% 0.8% 0.2% 1.1% -0.7% -0.4% 0.8% 2.0% -0.5%
1 -0.6% -0.2% -2.9% -0.1% -1.2% -2.0% -2.1% -0.5% -0.8% -1.7%
2 0.4% 1.0% -0.2% -0.3% 0.1% 1.3% -1.0% 2.0% -0.1% 0.0%
3 1.0% -0.3% -0.9% 1.4% 1.5% -0.9% 0.0% 1.2% 1.3% 1.4%
SP_CTE3 4 0.0% -1.0% 0.9% -1.4% -0.5% -0.3% 0.2% -1.5% -0.6% 0.7%
0 0.9% 2.5% 2.6% 2.0% 2.9% 1.1% 1.4% 2.6% 3.8% 1.3%
1 0.7% 1.2% -1.6% 1.3% 0.1% -0.7% -0.8% 0.8% 0.6% -0.3%
2 0.8% 1.4% 0.2% 0.1% 0.5% 1.7% -0.7% 2.4% 0.2% 0.3%
3 0.6% -0.7% -1.3% 1.1% 1.2% -1.3% -0.4% 0.8% 0.9% 1.0%
SP_CTE4 4 0.9% -0.1% 1.8% -0.5% 0.4% 0.5% 1.1% -0.6% 0.3% 1.6%
Table 6: Step-wise and continuous $\alpha$-profiles compared with constant $\alpha$-profiles. Values represent the profit gap % between the two, with respect to the constant profiles.
Profit
Instance Myopic SP_CTE1 SP_CTE2 SP_CTE3 SP_CTE4 SP_CTE5 SP_CTE6 SP_STP1 SP_STP2 SP_STP3 SP_STP4 SP_STP5 SP_STP6 SP_PLY1 SP_PLY2 SP_PLY3 SP_PLY4
0 3076 3255 3497 3474 3413 3308 3062 3444 3498 3501 3481 3511 3449 3461 3502 3542 3458
1 3249 3290 3534 3590 3542 3409 3141 3568 3584 3485 3588 3547 3517 3514 3571 3562 3530
2 3260 3328 3563 3577 3564 3401 3222 3591 3614 3570 3568 3582 3625 3540 3648 3572 3576
3 3216 3280 3443 3502 3515 3278 3195 3536 3492 3470 3552 3556 3470 3501 3544 3546 3550
4 3089 3148 3346 3430 3400 3290 2980 3431 3395 3460 3382 3412 3418 3436 3379 3411 3453
Profit Gap % with respect to the Myopic ILP
Instance Myopic SP_CTE1 SP_CTE2 SP_CTE3 SP_CTE4 SP_CTE5 SP_CTE6 SP_STP1 SP_STP2 SP_STP3 SP_STP4 SP_STP5 SP_STP6 SP_PLY1 SP_PLY2 SP_PLY3 SP_PLY4
0 0.0% 5.8% 13.7% 12.9% 11.0% 7.5% -0.5% 12.0% 13.7% 13.8% 13.2% 14.1% 12.1% 12.5% 13.8% 15.1% 12.4%
1 0.0% 1.3% 8.8% 10.5% 9.0% 4.9% -3.3% 9.8% 10.3% 7.3% 10.4% 9.2% 8.2% 8.2% 9.9% 9.6% 8.6%
2 0.0% 2.1% 9.3% 9.7% 9.3% 4.3% -1.2% 10.2% 10.9% 9.5% 9.4% 9.9% 11.2% 8.6% 11.9% 9.6% 9.7%
3 0.0% 2.0% 7.1% 8.9% 9.3% 1.9% -0.7% 10.0% 8.6% 7.9% 10.4% 10.6% 7.9% 8.9% 10.2% 10.3% 10.4%
4 0.0% 1.9% 8.3% 11.0% 10.1% 6.5% -3.5% 11.1% 9.9% 12.0% 9.5% 10.5% 10.7% 11.2% 9.4% 10.4% 11.8%
Table 7: Heat map of cumulative profit and profit gap (%) w.r.t. Myopic ILP policy

5.6 Profit acquired per interval

Figure 5 depicts the profit acquired per interval for test instances 0 and 4 by the Myopic ILP, SP_CTE2 and SP_STP3. In both instances, the first interval witnessed a larger profit acquisition by the Myopic ILP. Thereafter the surrogate models closed the gap and went on to maintain a lead. Presumably, in the first interval, link priority-based slack creation limits the number of profitable requests being serviced. But once this reserve capacity is capitalized upon in subsequent intervals, the profit gained per interval is larger for the surrogate models. Occasionally the Myopic ILP policy performs slightly better (interval 7 of instance 0 and interval 3 of instance 4), but by and large the Surrogate ILP policies dominate and acquire a larger cumulative profit.

As shown in Table 1, SP_CTE2 sets $\alpha_{i}=1.5\ \forall i\in I$, whereas SP_STP3 transitions from 1.5 to 1.25 to 1 in steps of 4 intervals each. This means that for the first four intervals, both surrogate policies have identical $\alpha_{i}$ values ($=1.5$). However, the profit acquired per interval differs from the second interval onwards, owing to the dynamic and adaptive nature of the $k$NN component of the Surrogate ILP policy.

Figure 5: Profit acquired per interval for the Myopic ILP, SP_CTE2 and SP_STP3 in (a) Instance 0 and (b) Instance 4

5.7 Significance of $\beta_{i,l}$

We conclude the numerical experiments by demonstrating the role of data-driven predictive prescriptions in determining efficient link priority vectors $\bm{\vec{\beta}}_{i}=\begin{bmatrix}\beta_{i,l}\end{bmatrix}^{l\in A}$. The value of these link priority estimates can be exhibited by comparison with an arbitrary policy which assumes equal importance (or priority) of all links in the network. Such a policy would assign a uniform link priority across the network and would thereby delegate all responsibility for managing the differential slack in the network to the $\alpha$-profile. Yet it would suffer from the disadvantage of being unable to create reserve capacity differentiated across space (links in the network), and would only do so across time (varying $\alpha_{i}$ across intervals). To this end, we conducted experiments using the Surrogate ILP policy with a uniform standardized link priority vector $\bm{\vec{\beta}}_{i}=\begin{bmatrix}\frac{1}{|A|}\end{bmatrix}^{l\in A}$ and report the outcome in Table 8. We observed that, with a uniform link priority vector, the performance dropped sharply and in many cases even fell below the Myopic ILP policy benchmark. This shows that uniform slack creation across all network links fails to improve performance because the reserve capacities are introduced in a manner that cannot be taken advantage of. It is crucial to create reserve capacity differentiated across both space and time to outperform the Myopic ILP benchmark.

Instance Total Arrived Myopic ILP $\alpha$-profile $\beta$ Application Service Rate Profit Profit Gap Profit Gap % Service Gap Service Gap %
Service Rate Profit
0 1203 41.6% 3076 SP_CTE2 kNN 40.6% 3497 421 13.7% -13 -1.0%
uniform 41.5% 3109 33 1.1% -2 -0.1%
SP_STP3 kNN 41.3% 3501 425 13.8% -4 -0.3%
uniform 42.5% 3157 81 2.6% 10 0.9%
SP_PLY2 kNN 40.6% 3502 426 13.8% -13 -1.0%
uniform 41.3% 3091 15 0.5% -4 -0.3%
1 1125 45.5% 3249 SP_CTE2 kNN 43.8% 3534 285 8.8% -19 -1.7%
uniform 46.4% 3244 -5 -0.2% 10 0.9%
SP_STP3 kNN 43.6% 3485 236 7.3% -21 -1.9%
uniform 46.0% 3219 -30 -0.9% 6 0.5%
SP_PLY2 kNN 44.3% 3571 322 9.9% -14 -1.2%
uniform 46.4% 3272 23 0.7% 10 0.9%
2 1260 41.7% 3260 SP_CTE2 kNN 39.7% 3563 303 9.3% -26 -2.0%
uniform 42.3% 3303 43 1.3% 7 0.6%
SP_STP3 kNN 40.8% 3570 310 9.5% -12 -0.9%
uniform 41.7% 3313 53 1.6% -1 0.0%
SP_PLY2 kNN 40.9% 3648 388 11.9% -11 -0.8%
uniform 40.7% 3221 -39 -1.2% -13 -1.0%
3 1184 44.5% 3216 SP_CTE2 kNN 40.7% 3443 227 7.1% -45 -3.8%
uniform 42.6% 3091 -125 -3.9% -23 -1.9%
SP_STP3 kNN 42.1% 3470 254 7.9% -29 -2.4%
uniform 44.0% 3168 -48 -1.5% -6 -0.5%
SP_PLY2 kNN 42.2% 3544 328 10.2% -27 -2.3%
uniform 43.9% 3185 -31 -1.0% -7 -0.6%
4 1147 44.0% 3089 SP_CTE2 kNN 41.0% 3346 257 8.3% -35 -3.0%
uniform 44.4% 3005 -84 -2.7% 4 0.4%
SP_STP3 kNN 42.9% 3460 371 12.0% -13 -1.1%
uniform 44.0% 2975 -114 -3.7% 0 0.0%
SP_PLY2 kNN 41.8% 3379 290 9.4% -26 -2.2%
uniform 43.8% 2976 -113 -3.7% -3 -0.2%
Table 8: Performance comparison with uniform $\beta$

6 Conclusion and perspectives

This study addresses the stochastic online version of the drone delivery service planning problem in urban airspace. The solution methods we explore are executed from the perspective of a network manager. Given an incoming Poisson process of drone delivery trip requests, we aim to find the optimal assignment of drones to space-time trajectories in an urban air traffic network within a pre-defined time horizon. We assume that drone deliveries must respect delivery time windows and that UAVs must fly at constant speed throughout the network to avoid airborne delays and congestion. Our main contribution is to extend the integer linear programming formulation of the deterministic DDSP to the stochastic variant with online demand. We also propose a novel solution methodology which leverages data-driven, machine learning-enabled predictions of network link priorities to construct a modified, surrogate objective function, and we outline a training scheme for the machine learning component. The numerical results indicate that the Surrogate ILP policy outperforms the Myopic ILP policy, which iteratively solves the ILP for the sole objective of profit maximization. We demonstrate that even though the training procedure might be extensive, it is a one-time investment that returns rewards in the form of a sustained profit margin over the Myopic ILP policy across multiple instances.

The escalating adoption of digital technology by the public and private sectors is producing real-time uncertain demand for logistics service providers globally. This is happening concurrently with the accelerating adoption of UAVs in transport fleets to service regions with varying degrees of urbanization. The utility of the solution methodology proposed in this study lies at the intersection of these two trends. By outperforming conventional online optimization methods, our approach unlocks considerable benefits for the service provider and for customers. Moreover, we establish link priority and available link capacity as pivotal factors for enhancing network performance. This implies that the strategy and procedure of non-myopic anticipatory resource allocation presented in this paper is transferable to other classes of network-based logistics problems such as pickup-and-delivery logistics and ride-sharing.

Multiple directions of further research can be identified. First, there is an opportunity to extend the data-driven optimization framework to learn efficient $\alpha$-profiles automatically. This also necessitates the design of a training scheme that explores the $\alpha$-profile space quickly and intelligently under time and computational budgets. Second, modifications can be imposed on the problem definition to make the DDSP more realistic, such as capacity constraints on each UAV, differential velocities and charging limitations. Extensions to the case of multiple deliveries per UAV also warrant further research. Although this stream of problems has been extensively studied through the lens of vehicle routing models, few studies have addressed the online demand case in a time-dependent routing context. The introduction and placement of charging platforms at selected nodes, or of a centralized drone hub somewhere in the network, is also a compelling network-design possibility.

References

  • Asadi et al. (2022) A. Asadi, S. N. Pinkley, and M. Mes. A Markov decision process approach for managing medical drone deliveries. Expert Systems with Applications, 204:117490, 2022.
  • Baloch and Gzara (2020) G. Baloch and F. Gzara. Strategic network design for parcel delivery with drones under competition. Transportation Science, 54(1):204–228, 2020.
  • Basso et al. (2022) R. Basso, B. Kulcsár, I. Sanchez-Diaz, and X. Qu. Dynamic stochastic electric vehicle routing with safe reinforcement learning. Transportation Research Part E: Logistics and Transportation Review, 157:102496, 2022.
  • Baty et al. (2024) L. Baty, K. Jungel, P. S. Klein, A. Parmentier, and M. Schiffer. Combinatorial optimization-enriched machine learning to solve the dynamic vehicle routing problem with time windows. Transportation Science, 2024.
  • Bengio et al. (2021) Y. Bengio, A. Lodi, and A. Prouvost. Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research, 290(2):405–421, 2021.
  • Bertsimas and Kallus (2020) D. Bertsimas and N. Kallus. From predictive to prescriptive analytics. Management Science, 66(3):1025–1044, 2020.
  • Bertsimas et al. (2019) D. Bertsimas, P. Jaillet, and S. Martin. Online vehicle routing: The edge of optimization in large-scale applications. Operations Research, 67(1):143–162, 2019.
  • Bonami et al. (2018) P. Bonami, A. Lodi, and G. Zarpellon. Learning a classification of mixed-integer quadratic programming problems. In International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pages 595–604. Springer, 2018.
  • Chen et al. (2021) H. Chen, Z. Hu, and S. Solak. Improved delivery policies for future drone-based delivery systems. European Journal of Operational Research, 294(3):1181–1201, 2021.
  • Chen et al. (2022) X. Chen, M. W. Ulmer, and B. W. Thomas. Deep Q-learning for same-day delivery with vehicles and drones. European Journal of Operational Research, 298(3):939–952, 2022.
  • Chen et al. (2023) X. Chen, T. Wang, B. W. Thomas, and M. W. Ulmer. Same-day delivery with fair customer service. European Journal of Operational Research, 308(2):738–751, 2023.
  • Chow (2016) J. Y. Chow. Dynamic UAV-based traffic monitoring under uncertainty as a stochastic arc-inventory routing policy. International Journal of Transportation Science and Technology, 5(3):167–185, 2016.
  • Chu et al. (2021) H. Chu, W. Zhang, P. Bai, and Y. Chen. Data-driven optimization for last-mile delivery. Complex & Intelligent Systems, pages 1–14, 2021.
  • Congress et al. (2021) S. S. C. Congress, A. J. Puppala, A. Banerjee, and U. D. Patil. Identifying hazardous obstructions within an intersection using unmanned aerial data analysis. International journal of Transportation Science and Technology, 10(1):34–48, 2021.
  • Dayarian et al. (2020) I. Dayarian, M. Savelsbergh, and J.-P. Clarke. Same-day delivery with drone resupply. Transportation Science, 54(1):229–249, 2020.
  • Edwards et al. (2010) J. Edwards, A. McKinnon, T. Cherrett, F. McLeod, and L. Song. Carbon dioxide benefits of using collection–delivery points for failed home deliveries in the united kingdom. Transportation Research Record, 2191(1):136–143, 2010.
  • Elmachtoub and Grigas (2022) A. N. Elmachtoub and P. Grigas. Smart “Predict, then Optimize”. Management Science, 68(1):9–26, 2022.
  • Feng et al. (2022) S. Feng, P. Duan, J. Ke, and H. Yang. Coordinating ride-sourcing and public transport services with a reinforcement learning approach. Transportation Research Part C: Emerging Technologies, 138:103611, 2022.
  • Fruhling and Digman (2000) A. L. Fruhling and L. A. Digman. The impact of electronic commerce on business-level strategies. Journal of Electronic Commerce Research, 1(1):13, 2000.
  • Ghiasvand et al. (2024) M. R. Ghiasvand, D. Rahmani, and M. Moshref-Javadi. Data-driven robust optimization for a multi-trip truck-drone routing problem. Expert Systems with Applications, 241:122485, 2024.
  • Glock and Meyer (2020) K. Glock and A. Meyer. Mission planning for emergency rapid mapping with drones. Transportation Science, 54(2):534–560, 2020.
  • Gu et al. (2023) R. Gu, Y. Liu, and M. Poon. Dynamic truck–drone routing problem for scheduled deliveries and on-demand pickups with time-related constraints. Transportation Research Part C: Emerging Technologies, 151:104139, 2023.
  • Han et al. (2022) B. R. Han, T. Sun, L. Y. Chu, and L. Wu. Covid-19 and e-commerce operations: Evidence from alibaba. Manufacturing & Service Operations Management, 24(3):1388–1405, 2022.
  • Hecken et al. (2022) T. Hecken, S. Cumnuantip, and T. Klimmek. Structural design of heavy-lift unmanned cargo drones in low altitudes. Automated Low-Altitude Air Delivery: Towards Autonomous Cargo Transportation with Drones, pages 159–183, 2022.
  • Hong et al. (2019) J. Hong, M. Lee, T. Cheong, and H. C. Lee. Routing for an on-demand logistics service. Transportation Research Part C: Emerging Technologies, 103:328–351, 2019.
  • Kim and Park (2005) J. Kim and J. Park. A consumer shopping channel extension model: attitude shift toward the online store. Journal of Fashion Marketing and Management: An International Journal, 9(1):106–121, 2005.
  • Kornatowski et al. (2020) P. M. Kornatowski, M. Feroskhan, W. J. Stewart, and D. Floreano. A morphing cargo drone for safe flight in proximity of humans. IEEE Robotics and Automation Letters, 5(3):4233–4240, 2020.
  • Kruber et al. (2017) M. Kruber, M. E. Lübbecke, and A. Parmentier. Learning when to use a decomposition. In International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pages 202–210. Springer, 2017.
  • Larsen et al. (2022) E. Larsen, S. Lachapelle, Y. Bengio, E. Frejinger, S. Lacoste-Julien, and A. Lodi. Predicting tactical solutions to operational planning problems under imperfect information. INFORMS Journal on Computing, 34(1):227–242, 2022.
  • Levin and Rey (2023) M. W. Levin and D. Rey. Branch-and-price for drone delivery service planning in urban airspace. Transportation Science, 57(4):843–865, 2023.
  • Li et al. (2024) X. Li, P. Yan, K. Yu, P. Li, and Y. Liu. Parcel consolidation approach and routing algorithm for last-mile delivery by unmanned aerial vehicles. Expert Systems with Applications, 238:122149, 2024.
  • Liu (2019) Y. Liu. An optimization-driven dynamic vehicle routing algorithm for on-demand meal delivery using drones. Computers & Operations Research, 111:1–20, 2019.
  • Lodi and Zarpellon (2017) A. Lodi and G. Zarpellon. On learning and branching: a survey. Transactions in Operations Research, 25(2):207–236, 2017.
  • Lozzi et al. (2022) G. Lozzi, G. Iannaccone, I. Maltese, V. Gatta, E. Marcucci, and R. Lozzi. On-demand logistics: Solutions, barriers, and enablers. Sustainability, 14(15):9465, 2022.
  • Masorgo et al. (2023) N. Masorgo, S. Mir, and A. R. Hofer. You’re driving me crazy! how emotions elicited by negative driver behaviors impact customer outcomes in last mile delivery. Journal of Business Logistics, 2023.
  • Menouar et al. (2017) H. Menouar, I. Guvenc, K. Akkaya, A. S. Uluagac, A. Kadri, and A. Tuncer. Uav-enabled intelligent transportation systems for the smart city: Applications and challenges. IEEE Communications Magazine, 55(3):22–28, 2017.
  • Mohsan et al. (2023) S. A. H. Mohsan, N. Q. H. Othman, Y. Li, M. H. Alsharif, and M. A. Khan. Unmanned aerial vehicles (uavs): Practical aspects, applications, open challenges, security issues, and future trends. Intelligent Service Robotics, 16(1):109–137, 2023.
  • Moshref-Javadi and Winkenbach (2021) M. Moshref-Javadi and M. Winkenbach. Applications and research avenues for drone-based models in logistics: A classification and review. Expert Systems with Applications, 177:114854, 2021.
  • Muñoz-Villamizar et al. (2024) A. Muñoz-Villamizar, J. C. Velazquez-Martínez, and S. Caballero-Caballero. A large-scale last-mile consolidation model for e-commerce home delivery. Expert Systems with Applications, 235:121200, 2024.
  • Radoglou-Grammatikis et al. (2020) P. Radoglou-Grammatikis, P. Sarigiannidis, T. Lagkas, and I. Moscholios. A compilation of uav applications for precision agriculture. Computer Networks, 172:107148, 2020.
  • Rave et al. (2023) A. Rave, P. Fontaine, and H. Kuhn. Drone location and vehicle fleet planning with trucks and aerial drones. European Journal of Operational Research, 308(1):113–130, 2023.
  • Rejeb et al. (2023) A. Rejeb, K. Rejeb, S. J. Simske, and H. Treiblmaier. Drones for supply chain management and logistics: a review and research agenda. International Journal of Logistics Research and Applications, 26(6):708–731, 2023.
  • Sadana et al. (2023) U. Sadana, A. Chenreddy, E. Delage, A. Forel, E. Frejinger, and T. Vidal. A survey of contextual optimization methods for decision making under uncertainty. arXiv preprint arXiv:2306.10374, 2023.
  • Song et al. (2009) L. Song, T. Cherrett, F. McLeod, and W. Guan. Addressing the last mile problem: transport impacts of collection and delivery points. Transportation Research Record, 2097(1):9–18, 2009.
  • Tran-Thanh et al. (2012) L. Tran-Thanh, A. Chapman, A. Rogers, and N. Jennings. Knapsack based optimal policies for budget–limited multi–armed bandits. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 26, pages 1134–1140, 2012.
  • Van Duin et al. (2016) J. Van Duin, W. de Goffau, B. Wiegmans, L. Tavasszy, and M. Saes. Improving home delivery efficiency by using principles of address intelligence for b2c deliveries. Transportation Research Procedia, 12:14–25, 2016.
  • Van Hentenryck et al. (2010) P. Van Hentenryck, R. Bent, and E. Upfal. Online stochastic optimization under time constraints. Annals of Operations Research, 177:151–183, 2010.
  • Verhoef et al. (2015) P. C. Verhoef, P. K. Kannan, and J. J. Inman. From multi-channel retailing to omni-channel retailing: introduction to the special issue on multi-channel retailing. Journal of Retailing, 91(2):174–181, 2015.
  • Vlahovic et al. (2017) N. Vlahovic, B. Knezevic, and P. Batalic. Implementing delivery drones in logistics business process: Case of pharmaceutical industry. International Journal of Mechanical and Industrial Engineering, 10(12):4026–4031, 2017.
  • Yan and Wang (2022) R. Yan and S. Wang. Integrating prediction with optimization: Models and applications in transportation management. Multimodal Transportation, 1(3):100018, 2022.
  • Yang et al. (2023) Y. Yang, C. Yan, Y. Cao, and R. Roberti. Planning robust drone-truck delivery routes under road traffic uncertainty. European Journal of Operational Research, 309(3):1145–1160, 2023.
  • Yilmaz and Büyüktahtakın (2024) D. Yilmaz and İ. E. Büyüktahtakın. An expandable machine learning-optimization framework to sequential decision-making. European Journal of Operational Research, 314(1):280–296, 2024.