Differentially Private Stochastic Convex Optimization for Network Routing Applications
Abstract
Network routing problems are common across many engineering applications. Computing optimal routing policies requires knowledge about network demand, i.e., the origin and destination (OD) of all requests in the network. However, privacy considerations make it challenging to share individual OD data that would be required for computing optimal policies. Privacy can be particularly challenging in standard network routing problems because sources and sinks can be easily identified from flow conservation constraints, making feasibility and privacy mutually exclusive.
In this paper, we present a differentially private algorithm for network routing problems. The main ingredient is a reformulation of network routing which moves all user data-dependent parameters out of the constraint set and into the objective function. We then present an algorithm for solving this formulation based on a differentially private variant of stochastic gradient descent. In this algorithm, differential privacy is achieved by injecting noise, and one may wonder if this noise injection compromises solution quality. We prove that our algorithm is both differentially private and asymptotically optimal as the size of the training set goes to infinity. We corroborate the theoretical results with numerical experiments on a road traffic network which show that our algorithm provides differentially private and near-optimal solutions in practice.
1 Introduction
Network routing problems appear in many important topics in engineering, including traffic routing in transportation systems, power routing in electrical grids, and packet routing in distributed computer systems. Network routing problems study settings where resources must be delivered to customers through a network with limited bandwidth. The goal is typically to route resources to their respective customers as efficiently as possible, or equivalently, with as little network congestion as possible.
One key challenge in network routing problems is that the requests (i.e., network demand) are often not known in advance. Indeed, it is difficult to know exactly how much power a neighborhood will need, or exactly how many visits a particular website will receive on any given day. Since information about the demand is often necessary to develop optimal or near-optimal network routing solutions, network routing algorithms often need a way of obtaining or estimating future demand. With the advent of big data and internet-of-things systems, crowd-sourcing has gained popularity as a demand forecasting approach for network routing systems. In crowd-sourcing, customers submit their request history to the network operator. The network operator uses this historical data to train a statistical or machine learning model to predict future demand from historical demand.
While crowd-sourcing provides a bountiful supply of data for training demand forecasting models, it can also introduce potential privacy concerns. Since crowd-sourcing uses individual-level customer data to train demand forecasting models, the model’s outputs may reveal sensitive user information, especially if it overfits to its training data [CTW+21]. Such privacy risks are problematic because they may deter users from sharing their data with network operators, hence reducing the supply of training data for demand forecasting models.
To address such concerns, the demand forecasting pipeline should be augmented with privacy-preserving mechanisms. Differential privacy [DMNS06] is a principled and popular method to occlude the influence of a single user's data on the result of a population study while also maintaining the study's accuracy. This is done by carefully injecting noise into the desired computation so that data sets that differ by at most one data point will produce statistically indistinguishable results.
Providing differential privacy guarantees for the standard formulation of network routing is difficult because the constraints contain user data, meaning that in general feasibility and privacy become mutually exclusive. More specifically, in the standard network routing problem, the demand sources and sinks can be identified by checking for conservation of flow, and as a result, the presence of a user going to or from a very rare location can be detected from any feasible solution. Because differential privacy requires that the presence of any single user’s data be difficult to detect from the algorithm’s output, privacy and feasibility are at odds with one another in the standard formulation.
1.1 Statement of Contributions
In this paper we present a differentially private algorithm for network routing problems. The main ingredient is a reformulation of network routing which moves all user data-dependent parameters out of the constraint set and into the objective function. We then present an algorithm for solving this formulation based on differentially private variants of stochastic gradient descent. In this algorithm, differential privacy is achieved by injecting noise, and one may wonder if this noise injection compromises solution quality. We prove that our algorithm is differentially private and, under several reasonable regularity conditions, is also asymptotically optimal (as the size of the training set goes to infinity). We note that in exchange for becoming compatible with differentially private algorithms, this new formulation is more computationally expensive.
1.2 Related Work
Traffic assignment in transportation systems is one of the most well-known applications of network routing. Herein we focus our literature review on privacy research in transportation networks. Privacy works in transportation mainly focus on location privacy, where the objective is to prevent untrusted and/or external entities from learning geographic locations or location sequences of an individual [BS03]. Privacy-preserving approaches have been proposed for various location-based applications, e.g., trajectory publishing, mobile crowdsensing, traffic control, etc. These techniques are based on spatial cloaking [CML11] and differential privacy [Dwo08]. While not the setting of interest for this paper, there are many works that use Secure Multi-Party Computation (MPC) [GMW87] to achieve privacy in distributed mobility systems.
Spatial cloaking approaches aggregate users' exact locations into coarse information. These approaches are often based on $k$-anonymity [Swe02], where a mobility dataset is divided into equivalence classes based on data attributes (e.g., geological regions, time, etc.) so that each class contains at least $k$ records [GFCA14, HC20]. These $k$-anonymity-based approaches can guarantee that every record in the dataset is indistinguishable from at least $k-1$ other records. However, $k$-anonymity is generally considered to be a weak privacy guarantee, especially when $k$ is small. Furthermore, very coarse data aggregation is required to address outliers or sparse data, and in these cases spatial cloaking-based approaches provide low data accuracy.
Differential privacy provides a principled privacy guarantee by producing randomized responses to queries, whereby two datasets that differ in only one entry produce statistically indistinguishable responses [DMNS06]. In other words, differential privacy ensures that an adversary with arbitrary background information (e.g., query responses, other entries) cannot infer individual entries with high confidence. Within transportation research, [WHL+18, YLL+19] share noisy versions of location data for mobile crowd-sensing applications. [GZFS15, GLTY18, AHFI+18, LYH20] use differential privacy to publish noisy versions of trajectory data. [DKBS15] and [HTP17] apply differential privacy to gradient descent algorithms for federated learning in mobility systems.
Many of the works mentioned in the previous paragraph establish differential privacy of their proposed algorithms by using composition properties of differential privacy. Composition theorems for differential privacy describe how well privacy is preserved when conducting several computations one after another. In [DKBS15] and [HTP17], composition theorems are applied as black boxes without considering the mathematical properties of the gradient descent algorithm. As a result, the privacy guarantees are overly conservative, meaning that large amounts of noise are added to the algorithm, leading to suboptimal behavior both in theory and in practice. Similarly, [GZFS15, GLTY18, AHFI+18, LYH20] use composition rules as a black box, and while privacy is achieved in this way, there are no accuracy guarantees for the algorithms presented in those works.
While blackbox applications of differential privacy can lead to impractical levels of noise injection, specialized applications of differential privacy were discovered that could provide privacy with much less noise. [WLK+17] show how a simple adjustment to stochastic gradient descent can give rise to an algorithm which is both differentially private, and under reasonable regularity assumptions, is also asymptotically optimal. [FMTT18] and [FKT20] refined this idea to develop stochastic gradient descent algorithms that are differentially private, computationally efficient, and have optimal convergence rates. These techniques cannot directly be used to solve the standard formulation of network routing because they study unconstrained optimization problems or optimization problems with public constraint sets (i.e., constraints that do not include any private data).
2 Model
In this section we define notations, network models, and assumptions that will allow us to formulate network routing as a data-driven convex optimization problem.
2.1 Preliminaries
Indicator Representation of Edge Sets: Let $G = (V, E)$ be a graph with vertices $V$ and edges $E$. For any subset of edges $E' \subseteq E$, we define the indicator representation of $E'$ as $\mathbb{1}_{E'}$, a boolean vector of length $|E|$, in the following way: the $e$th entry of $\mathbb{1}_{E'}$ is $1$ if $e \in E'$, and is $0$ otherwise.
Derivative Notation: For a scalar-valued function $h(u, v)$, we use $\nabla_u h(u_0, v_0)$ to denote the gradient of $h$ with respect to $u$ evaluated at the point $(u_0, v_0)$. For a vector-valued function $g(u, v)$ and a variable $u$, we use $\partial_u g(u_0, v_0)$ to denote the derivative matrix of $g$ with respect to $u$ evaluated at the point $(u_0, v_0)$.
Projections: For a convex set $K$, we use $\Pi_K$ to denote the projection operator for $K$. For any $x$, $\Pi_K(x)$ is defined to be the point in $K$ that is closest to $x$.
2.2 Network, Demand, and Network Flow
In this section we will introduce a) a graph model of a network, b) a representation of network demand, c) the standard formulation for network routing and d) privacy requirements. The notation defined in this section is aggregated in table form in Appendix A for the reader's convenience.
Definition 1 (Network Representation).
We use a directed graph $G = (V, E)$ to represent the network, where $V$ and $E$ represent the set of vertices and edges in the network, respectively. We will use $n := |V|$ and $m := |E|$ to denote the number of vertices and edges in the graph, respectively. For vertex pairs $(o, d) \in V \times V$ we will use $\mathcal{P}_{od}$ to denote the set of simple paths from $o$ to $d$ in $G$.
Definition 2 (Operation Period).
We use $T$ to denote the operation period within which a network operator wants to optimize its routing decisions. We will also use $|T|$ to denote the number of minutes in the operation period. For example, $T$ could represent a morning commute period.
Definition 3 (Demand Representation).
We study a stationary demand model where demand within the operation period is specified by a matrix $\Lambda \in \mathbb{R}_{\geq 0}^{n \times n}$. For each ordered pair of vertices $(o, d)$, $\Lambda_{od}$ is the number of requests arriving per minute (i.e., the arrival rate) during $T$ that need routing from vertex $o$ to vertex $d$.
Remark 1 (Estimating from historical data).
The arrival rates from historical demand are computed empirically, i.e., if $\Lambda^{(i)}$ represents the demand for day $i$, then $\Lambda^{(i)}_{od}$ is calculated by counting the number of $(o, d)$ requests appearing on day $i$, and then dividing by $|T|$ to obtain requests per minute.
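To make Remark 1 concrete, the following is a minimal sketch of this empirical estimator; the function and variable names here are ours, chosen purely for illustration:

```python
import numpy as np

def empirical_demand(trips, n_vertices, T_minutes):
    """Estimate one day's arrival-rate matrix (Remark 1).

    trips: list of (origin, destination) pairs observed during the
           operation period, with vertices indexed 0..n_vertices-1.
    Returns an (n x n) matrix whose (o, d) entry is requests per minute.
    """
    Lam = np.zeros((n_vertices, n_vertices))
    for o, d in trips:
        Lam[o, d] += 1.0          # count requests per OD pair
    return Lam / T_minutes        # convert counts to arrival rates

# Example: three trips during a 120-minute morning commute period.
day_trips = [(0, 3), (0, 3), (2, 1)]
Lam = empirical_demand(day_trips, n_vertices=4, T_minutes=120)
print(Lam[0, 3])  # 2/120 ≈ 0.0167 requests per minute
```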
Definition 4 (Link Latency Functions).
To model congestion effects, each link has a latency function which specifies the average travel time through the link as a function of the total flow on the link.
In this paper we study a setting where a network operator wants to route demand while minimizing the total travel time for the requests. With these definitions, the standard formulation of minimum travel time network routing is described in Definition 5.
Definition 5 (Standard Formulation of Network Flow).
For a network $G$ with link latency functions $\{\ell_e\}_{e \in E}$ and demand $\Lambda$, the standard network flow problem is the following minimization program:

$$\begin{aligned}
\min_{x,\, \{x^{od}\}} \quad & \sum_{e \in E} x_e\, \ell_e(x_e) && \text{(1a)}\\
\text{subject to} \quad & x^{od} \in \mathbb{R}_{\geq 0}^{m} \quad \text{for all } (o, d) \text{ with } \Lambda_{od} > 0 && \text{(1b)}\\
& x = \sum_{(o,d):\, \Lambda_{od} > 0} x^{od} && \text{(1c)}\\
& \sum_{e \in \mathrm{out}(v)} x^{od}_e - \sum_{e \in \mathrm{in}(v)} x^{od}_e = \Lambda_{od}\big(\mathbb{1}[v = o] - \mathbb{1}[v = d]\big) \quad \text{for all } v \in V,\ (o, d) \text{ with } \Lambda_{od} > 0, && \text{(1d)}
\end{aligned}$$

where $\mathrm{in}(v)$ and $\mathrm{out}(v)$ denote the sets of edges entering and leaving vertex $v$.
In (1), the decision variable is a collection of flows $\{x^{od}\}$, one for each non-zero entry of $\Lambda$, as represented by constraint (1b). (1d) are flow conservation constraints which ensure that $x^{od}$ sends $\Lambda_{od}$ units of flow from $o$ to $d$. Constraint (1c) ensures that $x$ represents the total amount of flow on each edge. Finally, the objective function (1a) is to minimize the total travel time as a function of the total network flow.
In the next subsection we will describe the rigorous privacy requirements that we will mandate while designing algorithms for network flow. We then describe in Section 2.4 why privacy and feasibility are mutually exclusive in the standard network flow formulation.
2.3 Privacy Requirements
We will use differential privacy to reason about the privacy-preserving properties of our algorithms. At a high level, changing one data point of the input to a differentially private algorithm should lead to a statistically indistinguishable change in the output. To make this concept concrete we will need to define data sets and adjacency.
Definition 6 (Data sets).
Given a space of data points $\mathcal{X}$, a data set is any finite set of elements from $\mathcal{X}$. In practice, each element of a data set is data collected from a single user, or data collected during a specific time period. We will use $\mathcal{D}$ to denote the set of all data sets.
Definition 7 (Data Set Adjacency).
Given a space of data sets $\mathcal{D}$, an adjacency relation Adj is a mapping $\mathrm{Adj} : \mathcal{D} \times \mathcal{D} \to \{\text{true}, \text{false}\}$. Two data sets $D, D' \in \mathcal{D}$ are said to be adjacent if and only if $\mathrm{Adj}(D, D') = \text{true}$.
While the exact definition can vary across applications, adjacent data sets are sets that are very similar to one another. The most common definition of adjacency is the following: $D, D'$ are adjacent if and only if $D'$ can be obtained by adding, deleting, or modifying at most one data point from $D$, and vice versa. Thus comparing a function's output on adjacent data sets measures how sensitive the function is to changes in a single data point.
With these definitions in place, we are now ready to define differential privacy.
Definition 8 (Differential Privacy).
For a given adjacency relation Adj, a randomized function $\mathcal{A} : \mathcal{D} \to \mathcal{Y}$ is $(\epsilon, \delta)$-differentially private if for any $D, D' \in \mathcal{D}$ with $\mathrm{Adj}(D, D') = \text{true}$, the following inequality
$$\mathbb{P}\big[\mathcal{A}(D) \in S\big] \leq e^{\epsilon}\, \mathbb{P}\big[\mathcal{A}(D') \in S\big] + \delta$$
holds for every event $S \subseteq \mathcal{Y}$.
The definition of differential privacy requires that changing one data point in the input to a differentially private algorithm cannot change the distribution of the algorithm’s output by too much. Such a requirement ensures the following strong privacy guarantee: It is statistically impossible to reliably infer the properties of a single data point just by examining the algorithm’s output, even if the observer knows all of the other data points in the input [DMNS06].
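To make the guarantee concrete, consider a numeric instance (the parameter values here are our own, chosen only for illustration). With $\epsilon = 0.1$ and $\delta = 10^{-6}$, Definition 8 requires
$$\mathbb{P}\big[\mathcal{A}(D) \in S\big] \leq e^{0.1}\,\mathbb{P}\big[\mathcal{A}(D') \in S\big] + 10^{-6} \approx 1.105\,\mathbb{P}\big[\mathcal{A}(D') \in S\big] + 10^{-6},$$
so adding or removing one user's data can increase the probability of any observable event by at most roughly 10%, up to the small additive slack $\delta$.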
To proceed, we must specify what data set adjacency means in the context of network routing. For network routing problems, historical data is often a collection of routing requests through the network. To protect the privacy of those who submit requests, we want the routing policy we compute to not depend too much on any single request that appears in the historical data. This motivates the following notion of data set adjacency that we will be using throughout this paper.
Definition 9 (Request Level Adjacency (RLA)).
We define a function $\mathrm{Adj}_{\mathrm{RL}}$ which maps pairs of data sets to booleans. For two historical data sets of network demand $D = \{\Lambda^{(1)}, \ldots, \Lambda^{(N)}\}$ and $D' = \{\Lambda'^{(1)}, \ldots, \Lambda'^{(N)}\}$, we say $D$ and $D'$ are request-level-adjacent (RLA) if exactly one of the following statements is true:

1. $D$ contains all of the requests in $D'$, and $D$ contains one extra request that is not present in $D'$.
2. $D'$ contains all of the requests in $D$, and $D'$ contains exactly one extra request that is not present in $D$.
Mathematically, $D, D'$ are request-level-adjacent, i.e., $\mathrm{Adj}_{\mathrm{RL}}(D, D') = \text{true}$, if and only if they satisfy all of the following relations:

- There exists $i_0$ so that $\Lambda^{(i)} = \Lambda'^{(i)}$ for all $i \neq i_0$.
- There exist two vertices $o_0$ and $d_0$ so that $\Lambda^{(i_0)}_{od} = \Lambda'^{(i_0)}_{od}$ for all $(o, d) \neq (o_0, d_0)$.
- $\big|\Lambda^{(i_0)}_{o_0 d_0} - \Lambda'^{(i_0)}_{o_0 d_0}\big| = \frac{1}{|T|}$.
Indeed, these relations dictate that one of the data sets contains an extra request from $o_0$ to $d_0$ which happened on the $i_0$th day. A difference of one request within a $|T|$-minute operation period leads to a change of $\frac{1}{|T|}$ in the arrival rate. Aside from this difference, the data sets are otherwise identical.
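As a concrete arithmetic instance (the period length is an example value of ours): for a $|T| = 120$ minute operation period, adding a single request changes the corresponding arrival-rate entry by
$$\big|\Lambda^{(i_0)}_{o_0 d_0} - \Lambda'^{(i_0)}_{o_0 d_0}\big| = \frac{1}{120} \approx 0.0083 \text{ requests per minute}.$$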
2.4 Differential Privacy Challenges in Standard Network Flow
In the introduction we mentioned that privacy and feasibility can be mutually exclusive in the standard formulation of network flow described in (1). In this section we formally show that if $\Lambda$ is constructed from a data set of trips as described in Remark 1, then trips to or from uncommon locations can be easily detected from any feasible solution to (1). As a result, announcing or releasing a feasible solution to (1) is not, in general, differentially private. Formally, we will prove the following theorem in this section.
Theorem 1 (Differential Privacy Impossibility for Standard Network Flow).
Let $\mathcal{A}$ be a function that takes as input a matrix $\Lambda$ with non-negative entries and returns a feasible solution to the optimization problem (1) where $\Lambda$ is used as the demand matrix. Then $\mathcal{A}$ cannot be $(\epsilon, \delta)$-differentially private for any $\epsilon \geq 0$ and $\delta < 1$.
We further note that $(\epsilon, \delta)$-differential privacy only provides a meaningful privacy guarantee when $\epsilon$ and $\delta$ are both small [Dwo08].
Proof of Theorem 1.
Let $G = (V, E)$ be a network, and $\Lambda$ be constructed from a historical data set of requests in $G$. Suppose there exist uncommon locations $a, b \in V$ for which the data set contains no trips to or from either $a$ or $b$. Mathematically, this means that
$$\Lambda_{aw} = \Lambda_{wa} = \Lambda_{bw} = \Lambda_{wb} = 0 \quad \text{for all } w \in V.$$
Such situations are not uncommon in transportation networks, if, for example, $a$ and $b$ are the homes of two different people who do not drive (perhaps they walk or bike to and from work).
If we now add a trip from $a$ to $b$ to the data set, and let $\Lambda'$ be the resulting demand matrix, then we have

1. $\Lambda'_{od} = \Lambda_{od}$ for all $(o, d) \neq (a, b)$, and
2. $\Lambda'_{ab} = \frac{1}{|T|}$, while $\Lambda_{ab} = 0$.
Let $P$ and $P'$ be the optimization problem (1) using demand $\Lambda$ and $\Lambda'$ respectively. Because $\Lambda, \Lambda'$ are request-level-adjacent, any differentially private algorithm must behave similarly when acting on $\Lambda$ and $\Lambda'$. However, this is impossible because the feasible sets of $P$ and $P'$ are disjoint. If we look at constraint (1d) with $(o, d) = (a, b)$ at vertex $a$, then any feasible solution to $P$ satisfies
$$\sum_{e \in \mathrm{out}(a)} x_e - \sum_{e \in \mathrm{in}(a)} x_e = 0,$$
since no pair with positive demand in $\Lambda$ has $a$ as an origin or destination. However, any feasible solution to $P'$ satisfies
$$\sum_{e \in \mathrm{out}(a)} x_e - \sum_{e \in \mathrm{in}(a)} x_e = \frac{1}{|T|} > 0.$$
In other words, checking the net flow leaving node $a$ will detect the presence of any trips going to or from $a$. We will now show that any algorithm which returns a feasible solution to (1) cannot be differentially private. To this end, define the event $S$ to be the event that flow is conserved at node $a$. Then for any algorithm $\mathcal{A}$ that takes as input a demand matrix and returns a feasible solution to (1) with the specified demand matrix, we have $\mathbb{P}[\mathcal{A}(\Lambda) \in S] = 1$ and $\mathbb{P}[\mathcal{A}(\Lambda') \in S] = 0$. Recalling Definition 8, $\mathcal{A}$ is $(\epsilon, \delta)$-differentially private only if $1 = \mathbb{P}[\mathcal{A}(\Lambda) \in S] \leq e^{\epsilon}\, \mathbb{P}[\mathcal{A}(\Lambda') \in S] + \delta = \delta$. This inequality can only be satisfied if $\delta \geq 1$. ∎
We have two remarks regarding Theorem 1.
Remark 2.
The same result holds if $\mathcal{A}$ returns the total flow $x$ associated with a feasible solution (see (1c)), rather than returning the feasible solution itself. In other words, even total traffic measurements (without knowing the breakdown by $(o, d)$ pairs) can already expose trips to or from uncommon locations.
Remark 3.
The vulnerability of trips to and from uncommon locations is not a purely theoretical concern. A study on the New York City Taxi and Limousine data set was able to identify trips from residential areas to a gentlemen's club [Pan14]. Because the start locations were in residential areas, it was easy to re-identify the users who booked the taxi trips as the owners of the homes where the trips began. As a result, despite the data set being anonymized, users who had taken taxis to this gentlemen's club were re-identified, and their reputations may have been negatively affected.
3 Routing Policy Formulation of Network Flow
To sidestep the impossibility result described in Theorem 1, we present an alternative formulation for the network flow problem in this section. The alternative formulation avoids the issues mentioned in Theorem 1 by moving all parameters related to user data from the constraints to the objective function, as described in (8). We note that (8) can only be solved if the demand $\Lambda$ is known, which may not always be the case. For this reason, we present two variations of (8): (9) is the stochastic version of (8) where $\Lambda$ is drawn from a distribution $\mathcal{P}$, and (10) is the data-driven approximation to (9) that one would solve if $\mathcal{P}$ is unknown.
Before formally defining the model, we provide a high-level description of how this formulation works. In this formulation, a feasible solution $f$ specifies, for each pair $(o, d) \in V \times V$, a flow $f^{od}$ that routes 1 unit of flow from $o$ to $d$. We note that a flow is specified for every $(o, d)$ even if there is no demand for this origin and destination in $\Lambda$, i.e., even if $\Lambda_{od} = 0$. We refer to $f$ as a routing policy due to its connection to randomized routing, which Remark 5 discusses in further detail. Given a feasible solution $f$, the objective function first calculates the total traffic on each edge by taking a linear combination of the flows $\{f^{od}\}$, where the coefficients of the linear combination are determined by the demand $\Lambda$, with high-demand pairs having larger coefficients. The total travel time can be computed from the total traffic in the same way as (1a). These ideas are formalized by the following definitions.
Definition 10 (Unit flow).
For a given origin-destination pair $(o, d)$, we say that a flow $f^{od} \in \mathbb{R}_{\geq 0}^{m}$ is a unit flow if and only if it routes exactly $1$ unit of flow from $o$ to $d$. Formally, this condition is represented by the following constraints:
$$\sum_{e \in \mathrm{in}(v)} f^{od}_e - \sum_{e \in \mathrm{out}(v)} f^{od}_e = \begin{cases} -1 & \text{if } v = o, \\ 1 & \text{if } v = d, \\ 0 & \text{otherwise,} \end{cases} \quad \text{for all } v \in V. \tag{5}$$
Indeed, (5) requires that the net flow entering $o$ is $-1$, the net flow entering $d$ is $1$, and that flow is conserved at all other vertices in the graph.
Definition 11 (Unit Network Flow).
A unit network flow is a collection of flows $f = \{f^{od}\}_{(o,d) \in V \times V}$ so that $f^{od}$ is a unit flow for each $(o, d)$. We use $\mathcal{F}$ to denote the set of all unit network flows.
Remark 4.
We can represent $f$ as a concatenation of the vectors $\{f^{od}\}_{(o,d) \in V \times V}$. Since each unit flow is an $m$-dimensional vector, we have $f \in \mathbb{R}^{m n^2}$.
We will refer to unit network flows as routing policies, due to their connection with randomized routing, described in the following remark.
Remark 5 (Routing demand using unit flow policies).
A unit network flow represents a randomized routing policy. Due to the flow decomposition theorem [Tre11], every unit flow can be written as a distribution over paths from $o$ to $d$. Formally, for any unit flow $f^{od}$, there exist paths $p_1, \ldots, p_k$ from $o$ to $d$ and weights $w_1, \ldots, w_k \geq 0$ so that the following equations hold:
$$f^{od} = \sum_{i=1}^{k} w_i\, \mathbb{1}_{p_i}, \qquad \sum_{i=1}^{k} w_i = 1. \tag{6}$$
Furthermore, $\{p_i\}_{i=1}^{k}$ and $\{w_i\}_{i=1}^{k}$ are efficiently computable. Defining the probability distribution $\pi^{od}$ over $\{p_1, \ldots, p_k\}$ by $\pi^{od}(p_i) := w_i$, the entry $f^{od}_e$ represents the probability that a random path chosen from $\pi^{od}$ contains $e$. Thus $f^{od}$ describes the expected behavior of the randomized routing policy that determines routes for requests by drawing a path independently at random from $\pi^{od}$. In particular, when using this policy to serve demand $\Lambda$, by linearity of expectation, the expected number of requests from $o$ to $d$ whose assigned path contains $e$ is exactly $\Lambda_{od} f^{od}_e$. Furthermore, the average number of requests on each edge $e$ will be $\sum_{(o,d)} \Lambda_{od} f^{od}_e$.
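The following is a minimal sketch of the decomposition-and-sampling procedure behind this remark. It assumes the input unit flow is acyclic (circulations would first need to be removed); the function names and the numeric tolerance are ours, chosen for illustration:

```python
import random

def decompose_unit_flow(flow, origin, dest):
    """Flow decomposition (Remark 5): write an acyclic unit o-d flow as
    a weighted set of paths.  `flow` maps directed edges (u, v) to
    nonnegative values satisfying the unit-flow constraints (5).
    Returns a list of (path, weight) pairs whose weights sum to 1.
    """
    flow = dict(flow)  # work on a copy
    paths = []
    while True:
        # Trace a path from origin to dest through edges that still
        # carry positive flow.
        path, node = [], origin
        while node != dest:
            nxt = next((v for (u, v), x in flow.items()
                        if u == node and x > 1e-12), None)
            if nxt is None:
                return paths   # no remaining flow out of the origin
            path.append((node, nxt))
            node = nxt
        w = min(flow[e] for e in path)   # bottleneck weight of this path
        for e in path:
            flow[e] -= w                 # peel the path off the flow
        paths.append((path, w))

def sample_route(paths):
    """Draw a route from the path distribution (randomized routing)."""
    routes, weights = zip(*paths)
    return random.choices(routes, weights=weights, k=1)[0]

# Half the flow on 0 -> 1 -> 2, half directly on 0 -> 2.
f_od = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.5}
paths = decompose_unit_flow(f_od, origin=0, dest=2)
print(sample_route(paths))
```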
3.1 Minimum Total Travel Time Network Flow
In the minimum travel time network flow problem, the network operator wants to find a stationary routing policy for each $(o, d)$ pair that will lead to small total travel times for the requests. Due to the equivalence between stationary routing policies and unit flows, the network operator can instead search over the space of unit flows.
The total travel time of a flow $x$ through $G$ is given by $\sum_{e \in E} x_e \ell_e(x_e)$. The total flow resulting from a unit network flow $f$ serving demand $\Lambda$ is the sum of the flow contributions from each of the $(o, d)$ pairs. With Remark 5 in mind, the total amount of flow on an edge $e$ when serving $\Lambda$ according to $f$ is given by $x_e(f, \Lambda) := \sum_{(o,d)} \Lambda_{od} f^{od}_e$. We can thus define $C(f, \Lambda)$, the total travel time experienced by requests $\Lambda$ when being routed by $f$, as follows:
$$C(f, \Lambda) := \sum_{e \in E} x_e(f, \Lambda)\, \ell_e\big(x_e(f, \Lambda)\big). \tag{7}$$
With these definitions in place, the unit network flow that minimizes the total travel time when serving the demand $\Lambda$ can be found by solving the following optimization problem:
$$\begin{aligned} \min_{f} \quad & C(f, \Lambda) \\ \text{subject to} \quad & f \in \mathcal{F}. \end{aligned} \tag{8}$$
3.2 Network Flow with Stochastic Demand
In practice, demand may vary from day to day, and such variations can be modeled by $\Lambda$ being a random variable with distribution $\mathcal{P}$. If $\mathcal{P}$ is known by the network operator, then rather than solving (8), the operator is interested in minimizing the expected total travel time through the following stochastic optimization problem:
$$\begin{aligned} \min_{f} \quad & \mathbb{E}_{\Lambda \sim \mathcal{P}}\big[C(f, \Lambda)\big] \\ \text{subject to} \quad & f \in \mathcal{F}. \end{aligned} \tag{9}$$
In the more realistic case where $\mathcal{P}$ is not known, the optimization problem (9) can be approximated from historical data. We study a situation where the network operator has demand data $D = \{\Lambda^{(1)}, \ldots, \Lambda^{(N)}\}$ collected from the operations of $N$ previous days. Using this data it can solve the following empirical approximation to (9):
$$\begin{aligned} \min_{f} \quad & \frac{1}{N}\sum_{i=1}^{N} C\big(f, \Lambda^{(i)}\big) \\ \text{subject to} \quad & f \in \mathcal{F}. \end{aligned} \tag{10}$$
3.3 Assumptions on Travel Time functions
In this section we will introduce some assumptions that will help us establish our technical results. We will make the following assumptions on the network demand:
Assumption 1 (Bounded Demand).
We assume there exists a non-negative constant $\lambda_{\max}$ so that $\Lambda_{od} \leq \lambda_{\max}$ for every $(o, d) \in V \times V$. In practice, this constant can be estimated from historical data.
The following are assumptions we make on the objective function $C$ (see (7)). These assumptions are related to properties of the link latency functions $\{\ell_e\}_{e \in E}$. We present a typical model for link latency functions in Section 3.4 that satisfies all of the following assumptions.
Assumption 2 (Bounded Variance Gradients).
We assume there exists a non-negative constant $\sigma^2$ so that for every $f \in \mathcal{F}$, the variance of $\nabla_f C(f, \Lambda)$ is upper bounded by $\sigma^2$, i.e., $\mathbb{E}_{\Lambda \sim \mathcal{P}}\big[\|\nabla_f C(f, \Lambda) - \mathbb{E}_{\Lambda \sim \mathcal{P}}[\nabla_f C(f, \Lambda)]\|_2^2\big] \leq \sigma^2$.
Assumption 3 (Twice Differentiability).
We assume that $C(\cdot, \Lambda)$ is twice differentiable for every $\Lambda$, so that the Hessian $\nabla_f^2 C(f, \Lambda)$ is defined on the entire domain of $C(\cdot, \Lambda)$.
Assumption 4 (Strong Convexity).
We assume that there exists $\mu > 0$ for which $C(\cdot, \Lambda)$ is $\mu$-strongly convex for every $\Lambda$. This means that for every $\Lambda$, and any unit network flows $f_1, f_2 \in \mathcal{F}$, we have
$$C(f_2, \Lambda) \geq C(f_1, \Lambda) + \nabla_f C(f_1, \Lambda)^\top (f_2 - f_1) + \frac{\mu}{2}\,\|f_2 - f_1\|_2^2.$$
Assumption 5 (Smoothness).
We assume that there exists $L > 0$ for which $C(\cdot, \Lambda)$ is $L$-smooth for every $\Lambda$. This means that for every $\Lambda$, and any unit network flows $f_1, f_2 \in \mathcal{F}$, we have
$$C(f_2, \Lambda) \leq C(f_1, \Lambda) + \nabla_f C(f_1, \Lambda)^\top (f_2 - f_1) + \frac{L}{2}\,\|f_2 - f_1\|_2^2.$$
Assumption 6 (Bounded second order partial derivative).
We assume that there exists $\beta \geq 0$ so that $\big\|\partial_{\Lambda_{od}} \nabla_f C(f, \Lambda)\big\|_2 \leq \beta$ for all $f \in \mathcal{F}$, all $(o, d)$, and all $\Lambda$.
Remark 6 (Satisfying Assumption 6).
We can satisfy Assumption 6 for any positive $\beta$ by re-scaling. In particular, letting $\beta_0$ be the constant with which $C$ satisfies Assumption 6, then for any $\beta > 0$ we can define a re-scaled objective function
$$\tilde{C}(f, \Lambda) := \frac{\beta}{\beta_0}\, C(f, \Lambda).$$
By construction, $\tilde{C}$ satisfies Assumption 6 with constant value $\beta$. Note, however, that the smoothness and strong convexity parameters for $\tilde{C}$ will be rescaled accordingly to $\frac{\beta}{\beta_0} L$ and $\frac{\beta}{\beta_0} \mu$ respectively.
3.4 Transportation Model satisfying assumptions from Section 3.3
In this section, we present a transportation network model that satisfies Assumptions 3, 4, 5 and 6. We study a network where the link latency functions are all affine: for each $e \in E$, there are non-negative constants $a_e, b_e$ so that $\ell_e(x) = a_e x + b_e$. Let $A \in \mathbb{R}^{m \times m}$ be the diagonal matrix defined so that $A_{ee} = a_e$, and let $b \in \mathbb{R}^{m}$ be the concatenation of all of the zero-order coefficients $b_e$ in the link latency functions.
As mentioned in Remark 4, we will represent $f$ as a concatenation of $\{f^{od}\}_{(o,d)}$. As such, $f \in \mathbb{R}^{m n^2}$. Let $(o_1, d_1), (o_2, d_2), \ldots, (o_{n^2}, d_{n^2})$ be the order in which the unit flows are concatenated to produce $f$, so that
$$f = \big[f^{o_1 d_1};\ f^{o_2 d_2};\ \ldots;\ f^{o_{n^2} d_{n^2}}\big].$$
The total flow on the links in the network when serving demand $\Lambda$ according to $f$ can then be written as
$$x(f, \Lambda) = M_\Lambda f,$$
where $M_\Lambda := \lambda \otimes I_m$. Here $\lambda$ is an $n^2$-dimensional row vector whose $i$th entry is $\Lambda_{o_i d_i}$, $I_m$ is the $m \times m$ identity matrix, and $\otimes$ represents the Kronecker product. Then when $\Lambda$ is being routed according to $f$, the travel times on the links can be computed as
$$A M_\Lambda f + b,$$
which means that the total travel time can be written as
$$C(f, \Lambda) = (M_\Lambda f)^\top \big(A M_\Lambda f + b\big). \tag{11}$$
If we add a bit of $\ell_2$ regularization with weight $\gamma > 0$, we obtain
$$C_\gamma(f, \Lambda) := (M_\Lambda f)^\top \big(A M_\Lambda f + b\big) + \frac{\gamma}{2}\,\|f\|_2^2.$$
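As an illustration of this matrix form, the following sketch (with toy dimensions and coefficients of our own choosing) builds $M_\Lambda$ with a Kronecker product and evaluates (11):

```python
import numpy as np

# Toy instance of the affine-latency model: m edges, n vertices,
# so the policy vector f stacks n**2 unit flows of length m each.
m, n = 4, 2
A = np.diag([1.0, 2.0, 0.5, 1.5])    # per-edge congestion slopes a_e
b = np.array([3.0, 1.0, 2.0, 4.0])   # free-flow travel times b_e

Lam = np.random.rand(n, n)           # demand matrix
lam = Lam.reshape(-1)                # OD rates as a length-n**2 vector
M = np.kron(lam, np.eye(m))          # M_Lambda = lambda ⊗ I_m, (m, m*n**2)

f = np.random.rand(m * n**2)         # a (not necessarily feasible) policy
x = M @ f                            # total flow on each edge
cost = x @ (A @ x + b)               # C(f, Lambda) as in (11)

gamma = 0.1                          # illustrative regularization weight
cost_reg = cost + 0.5 * gamma * f @ f
print(cost, cost_reg)
```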
Recall that we use $\nabla_f^2 C_\gamma$ to denote the Hessian of $C_\gamma$ with respect to its first argument. We now make the following observations:
- The Hessian of $C_\gamma$ with respect to $f$ is defined for all $f$ and is equal to $2 M_\Lambda^\top A M_\Lambda + \gamma I$. Hence Assumption 3 is satisfied.

- Since $A$ is a diagonal matrix with non-negative entries, $M_\Lambda^\top A M_\Lambda \succeq 0$. This implies that $2 M_\Lambda^\top A M_\Lambda + \gamma I \succeq \gamma I$. Hence $\nabla_f^2 C_\gamma \succeq \gamma I$. This implies that $C_\gamma(\cdot, \Lambda)$ is $\gamma$-strongly convex, and hence Assumption 4 is satisfied with $\mu = \gamma$.

- Note that $\|\nabla_f^2 C_\gamma\|_2 \leq 2\,\|A\|_2\,\|M_\Lambda\|_2^2 + \gamma$, and Assumption 1 implies $\|M_\Lambda\|_2 = \|\lambda\|_2 \leq \lambda_{\max} n$. Defining $L := 2\,\|A\|_2\, \lambda_{\max}^2 n^2 + \gamma$, we see that $\nabla_f^2 C_\gamma \preceq L I$ for all $f$, meaning that $C_\gamma(\cdot, \Lambda)$ is $L$-smooth, and thus Assumption 5 is satisfied.

- By the product rule,
$$\partial_{\Lambda_{o_i d_i}} \nabla_f C_\gamma(f, \Lambda) = 2 E_i^\top A M_\Lambda f + 2 M_\Lambda^\top A E_i f + E_i^\top b,$$
where $E_i := e_i^\top \otimes I_m$ and $e_i$ is the $i$th standard basis vector for $\mathbb{R}^{n^2}$. By the triangle inequality we can then conclude that
$$\big\|\partial_{\Lambda_{o_i d_i}} \nabla_f C_\gamma(f, \Lambda)\big\|_2 \leq 2\,\|A\|_2\,\|M_\Lambda f\|_2 + 2\,\|A\|_2\,\|M_\Lambda\|_2\,\|E_i f\|_2 + \|b\|_2 \overset{(a)}{\leq} 2\,\|A\|_2\, \lambda_{\max} n^3 + 2\,\|A\|_2\, \lambda_{\max} n^2 + \|b\|_2.$$
Here $(a)$ uses three facts: $E_i f = f^{o_i d_i}$ is a unit flow, meaning $\|f^{o_i d_i}\|_1 \leq n$ and thus $\|E_i f\|_2 \leq n$; Assumption 1 implies $\|M_\Lambda f\|_2 \leq \lambda_{\max} \sum_{i} \|f^{o_i d_i}\|_2 \leq \lambda_{\max} n^3$; and $\|M_\Lambda\|_2 \leq \lambda_{\max} n$ as noted above.

Hence Assumption 6 is satisfied with $\beta = 2\,\|A\|_2\, \lambda_{\max}\,(n^3 + n^2) + \|b\|_2$.
4 Differentially Private Network Routing Optimization
Given the setup from Section 3, our objective is to design a request-level differentially private algorithm that returns a near-optimal solution to (9). Since the true distribution $\mathcal{P}$ of the demand is unknown, we will design an algorithm for (10) and show that, under the assumptions described in Section 3.3, the algorithm's solution is also near-optimal for (9).
Computing a near-optimal solution to (10) while maintaining differential privacy may seem like a daunting task, but it turns out that a single modification to a well-known optimization algorithm gives an accurate and differentially private solution.
We present a Private Projected Stochastic Gradient Descent algorithm, which is described in Algorithm 1. As the name suggests, Algorithm 1 is a modified version of stochastic gradient descent. The algorithm conducts a single pass over the historical data, using each data point to perform a noisy gradient step (see line 6). The key difference between Algorithm 1 and standard stochastic gradient descent is in line 11, where instead of returning the final gradient descent iterate, Algorithm 1 returns a noisy version of the last iterate. Algorithm 1 has the following privacy and performance guarantees.
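The structure of Algorithm 1 can be sketched as follows; the projection onto $\mathcal{F}$, the step-size schedule, and the output noise scale shown here are illustrative placeholders of ours, not the paper's exact choices:

```python
import numpy as np

def private_projected_sgd(grad, project, f0, demands, step, noise_std, rng):
    """Sketch of Algorithm 1: a single pass of projected SGD over the
    data, followed by Gaussian noise added to the *final* iterate only.

    grad(f, lam) -- stochastic gradient of C(f, lam)      (assumed given)
    project(f)   -- Euclidean projection onto the feasible set F
    step(t)      -- decreasing step-size schedule eta_t
    noise_std    -- std of the output perturbation, calibrated to the
                    sensitivity of the last iterate (see Theorem 2)
    """
    f = f0
    for t, lam in enumerate(demands, start=1):    # single pass: each data
        f = project(f - step(t) * grad(f, lam))   # point is used exactly once
    return f + rng.normal(0.0, noise_std, size=f.shape)  # noisy release

# Toy usage on an unconstrained quadratic (projection = identity).
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=(200, 2))     # stand-in "demands"
f_priv = private_projected_sgd(
    grad=lambda f, lam: f - lam,       # gradient of 0.5 * ||f - lam||^2
    project=lambda f: f,               # stand-in for projection onto F
    f0=np.zeros(2), demands=data,
    step=lambda t: 1.0 / t, noise_std=0.05, rng=rng)
print(f_priv)   # close to the mean (3, 3), plus calibrated noise
```

The two privacy-relevant choices are visible in this sketch: each data point touches the iterate exactly once, and noise is injected only once, on the final iterate.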
Theorem 2 (Privacy Guarantee for Algorithm 1).
Algorithm 1 is $(\epsilon, \delta)$-differentially private with respect to the request-level adjacency relation $\mathrm{Adj}_{\mathrm{RL}}$.
Theorem 3 (Performance Guarantee for Algorithm 1).
Under Assumptions 1-6, the output $f_{\mathrm{priv}}$ of Algorithm 1 satisfies $\mathbb{E}\big[\bar{C}(f_{\mathrm{priv}})\big] \to \min_{f \in \mathcal{F}} \bar{C}(f)$ as $N \to \infty$, where $\bar{C}(f) := \mathbb{E}_{\Lambda \sim \mathcal{P}}[C(f, \Lambda)]$.
In particular, $f_{\mathrm{priv}}$ is $(\epsilon, \delta)$-differentially private and its expected total travel time converges to the optimal value of (9) as $N \to \infty$, meaning that privacy and asymptotic optimality are simultaneously achieved.
4.1 Discussion
Carefully adding noise to specific parts of existing algorithms is a principled approach for developing differentially private algorithms [DMNS06, FMTT18, FKT20]. The main challenge in such an approach is determining a) where and b) how much noise to add. Suppose the goal is to (approximately) compute a query function on a data set in a differentially private way. The latter question can be addressed by measuring the sensitivity of the desired query function.
Definition 12 ($\ell_2$ sensitivity).
Consider a function $q$ which maps data sets to real vectors. For a given adjacency relation Adj, the $\ell_2$ sensitivity of $q$, denoted $\Delta_2(q)$, is the largest achievable difference in function value between adjacent data sets. Namely,
$$\Delta_2(q) := \max_{\substack{D, D' \in \mathcal{D} \\ \mathrm{Adj}(D, D')}} \big\|q(D) - q(D')\big\|_2.$$
Once the sensitivity of the query function is known, the required noise distribution can be determined using the Gaussian mechanism as described in Theorem 4.
Theorem 4 (From [DR14]).
Suppose $q$ maps data sets to real vectors, and let $\Delta_2(q)$ be the $\ell_2$ sensitivity of $q$. Then for $\epsilon \in (0, 1)$, the mechanism $D \mapsto q(D) + Z$, where $Z \sim \mathcal{N}(0, \sigma^2 I)$ with $\sigma = \frac{\Delta_2(q)\sqrt{2\ln(1.25/\delta)}}{\epsilon}$, is $(\epsilon, \delta)$-differentially private with respect to the adjacency relation Adj.
Calculating the sensitivity of simple query functions (e.g., counting, voting, selecting the maximum value) is relatively easy, making the Gaussian mechanism straightforward to apply. However, for more complicated functions, noise calibration becomes more involved.
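As an illustration of Theorem 4 on one such simple query (a toy example of ours, not from the paper): a counting query changes by at most 1 when a single record is added or removed, so its $\ell_2$ sensitivity is 1 and the noise scale follows directly.

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, eps, delta, rng):
    """Release `value` via the Gaussian mechanism (Theorem 4).
    Valid for eps in (0, 1); sigma = sensitivity*sqrt(2 ln(1.25/delta))/eps.
    """
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return value + rng.normal(0.0, sigma, size=np.shape(value))

rng = np.random.default_rng(1)
ages = np.array([34, 29, 41, 52, 38])
# Counting query "how many records?": adding or removing one person
# changes the count by exactly 1, so the l2 sensitivity is 1.
private_count = gaussian_mechanism(len(ages), sensitivity=1.0,
                                   eps=0.5, delta=1e-5, rng=rng)
print(private_count)
```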
Algorithm 1 is an application of the Gaussian mechanism. Moreover, Theorem 3 shows that the suboptimality of Algorithm 1 vanishes as $N \to \infty$. The asymptotic optimality of Algorithm 1 comes from the fact that the sensitivity of the final gradient descent iterate converges to zero as $N$ approaches infinity. This fact enables us to add less noise as $N$ grows. Indeed, the Gaussian noise added to the final gradient descent iterate in Algorithm 1 has a standard deviation which also converges to zero as $N \to \infty$.
In the remainder of this section, we sketch some of the mathematical ideas behind the perhaps non-intuitive result that the sensitivity of gradient descent converges to zero as the number of data points increases. For simplicity of exposition we will a) use scalar notation in place of vector notation for the sake of readability, and b) consider the simpler case of unconstrained gradient descent, which removes the need to perform projections. As a reminder, the full proof can be found in Appendix B.
Given a mobility data set $D = \{\Lambda^{(1)}, \ldots, \Lambda^{(N)}\}$, we use $f_t(D)$ to denote the $t$-th iterate of Algorithm 1 when using data set $D$. It is sufficient to show that $\frac{\partial f_N(D)}{\partial \Lambda^{(i)}}$ converges to zero for every $i$. This is because the sensitivity is obtained by integrating the derivative: for adjacent $D, D'$ differing only in $\Lambda^{(i)}$,
$$\big|f_N(D) - f_N(D')\big| = \left|\int_{\Lambda'^{(i)}}^{\Lambda^{(i)}} \frac{\partial f_N}{\partial \Lambda^{(i)}}\, d\Lambda^{(i)}\right|.$$
Because $D, D'$ are adjacent, the distance between $\Lambda^{(i)}$ and $\Lambda'^{(i)}$ is finite, and thus the above integral will converge to zero if its integrand converges to zero.
With this in mind, by the chain rule, we can write, for $t > i$,
$$\frac{\partial f_{t+1}}{\partial \Lambda^{(i)}} = \frac{\partial}{\partial \Lambda^{(i)}}\Big(f_t - \eta_t \nabla_f C\big(f_t, \Lambda^{(t)}\big)\Big) = \Big(1 - \eta_t\, \nabla_f^2 C\big(f_t, \Lambda^{(t)}\big)\Big)\, \frac{\partial f_t}{\partial \Lambda^{(i)}}.$$
Next, by using properties of smooth and strongly convex functions, we show that $\big|1 - \eta_t\, \nabla_f^2 C(f_t, \Lambda^{(t)})\big| \leq 1 - c\,\eta_t$ for some positive constant $c$. This result implies
$$\left|\frac{\partial f_{t+1}}{\partial \Lambda^{(i)}}\right| \leq (1 - c\,\eta_t)\left|\frac{\partial f_t}{\partial \Lambda^{(i)}}\right| \quad \text{for } t > i.$$
Next, recalling that $f_i$ is computed from the first $i - 1$ gradient steps, which only depend on the first $i - 1$ data points $\Lambda^{(1)}, \ldots, \Lambda^{(i-1)}$, we have $\frac{\partial f_i}{\partial \Lambda^{(i)}} = 0$. From this we see that
$$\frac{\partial f_{i+1}}{\partial \Lambda^{(i)}} = -\eta_i\, \partial_{\Lambda^{(i)}} \nabla_f C\big(f_i, \Lambda^{(i)}\big).$$
By using Assumption 6 we have the following inequality:
$$\left|\frac{\partial f_{i+1}}{\partial \Lambda^{(i)}}\right| \leq \eta_i\, \beta.$$
Putting everything together, we have
$$\left|\frac{\partial f_N}{\partial \Lambda^{(i)}}\right| \leq \eta_i\, \beta \prod_{t=i+1}^{N-1} (1 - c\,\eta_t).$$
The choice of step size $\eta_t$ in Algorithm 1 ensures three things: a) $1 - c\,\eta_t \in [0, 1)$ for every $t$, b) $\eta_t$ is non-increasing, and c) $\eta_t \to 0$ as $t \to \infty$. Given these facts, there are two cases to consider for $i$.
- Case 1: $i \leq N/2$. In this case, we can upper bound $\eta_i \leq \eta_1$ and $1 - c\,\eta_t \leq 1 - c\,\eta_N$ for all $t \leq N$. Hence $\left|\frac{\partial f_N}{\partial \Lambda^{(i)}}\right| \leq \eta_1\, \beta\, (1 - c\,\eta_N)^{N/2}$. Since $N \eta_N \to \infty$, this converges to zero as $N \to \infty$.

- Case 2: $i > N/2$. In this case, we simply upper bound $\eta_i \leq \eta_{\lceil N/2 \rceil}$ and $1 - c\,\eta_t \leq 1$, which gives $\left|\frac{\partial f_N}{\partial \Lambda^{(i)}}\right| \leq \eta_{\lceil N/2 \rceil}\, \beta$, which converges to zero as $N \to \infty$.
Finally, this shows that
$$\left|\frac{\partial f_N}{\partial \Lambda^{(i)}}\right| \leq \beta \max\Big\{\eta_1\,(1 - c\,\eta_N)^{N/2},\ \eta_{\lceil N/2 \rceil}\Big\},$$
and since both terms in the maximum are converging to zero, we have $\lim_{N \to \infty} \max_i \left|\frac{\partial f_N}{\partial \Lambda^{(i)}}\right| = 0$.
5 Experiments
Differentially private mechanisms add noise to provide a principled privacy guarantee for individual-level user data. One immediate question is the degree to which the added noise degrades the quality of the obtained solution. In the previous section, we addressed this question theoretically in Theorem 3 by showing that Algorithm 1 is both differentially private and asymptotically optimal. In this section, we present empirical studies on privacy-performance trade-offs by comparing our algorithm's performance to that of a non-private network routing approach. To this end, we simulate a transportation network to evaluate the performance of our algorithm and the non-private optimal solution to the network flow problem (10).
We describe the data set used for the experiments in Section 5.1. Next, the algorithms used in the experiments are described in Section 5.2. In Section 5.3, we evaluate the practical performance of our algorithm for different values of the privacy parameters and the number of samples $N$. In particular, we study the effect of the number of samples, and quantify the loss in system performance we may experience due to the introduction of our privacy-preserving algorithm. Finally, in Section 5.4, we study the sensitivity of our algorithms to the simulation parameters. In particular, we study the convergence of the routing policy with increasing data for different demands, edge latency functions, and magnitudes of the regularization term introduced to convexify the objective.
5.1 Data Set
We use data for the Sioux Falls network, which is available in the Transportation Network Test Problem (TNTP) dataset [Tra16]. This network has 24 nodes, 76 edges, and 528 OD pairs (see Figure 1 for an overview of the network topology). The distribution of mean hourly demand across different OD pairs is shown in Figure 1. The mean OD demand is 682 vehicles/hour. Furthermore, for more context on the scale of the network, the travel time on edges ranges from 2 to 10 minutes. Trip data is generated for each hour (i.e., $|T| = 60$) from a Poisson distribution with a mean value given by the data. Our objective is to learn a routing policy for these trips that minimizes the total travel cost for all users. To model congestion, we use the link latency model described in Section 3.4. In particular, for each $e \in E$, we estimate the free-flow latency $b_e$ directly from the data set. The sensitivity of the latency function to the traffic volume on the link, denoted by $a_e$, is chosen such that the travel time on the link is doubled when the link flow equals the link capacity. In later experiments, we will change this factor to study the sensitivity of our algorithm.
5.2 Algorithms
Throughout Sections 5.3 and 5.4 we compare the performance of two different algorithms: a Baseline algorithm and Algorithm 1, which we describe below.
Baseline: This algorithm computes the minimum travel time solution to (10) for the Sioux Falls model described in Section 3.4. Recall that in the Section 3.4 model, (11) describes the travel time incurred when serving demand $\Lambda$ with a routing policy $f$. Given a data set $D = \{\Lambda^{(1)}, \ldots, \Lambda^{(N)}\}$, it computes the solution to the following minimization problem: $\min_{f \in \mathcal{F}} \frac{1}{N}\sum_{i=1}^{N} C(f, \Lambda^{(i)})$.
Algorithm 1: The travel time function described in (11) is not strongly convex because $M_\Lambda^\top A M_\Lambda$ is rank deficient. In order to satisfy Assumption 4, we introduce an $\ell_2$ regularization. Namely, given a data set $D$, we apply Algorithm 1 to the following minimization problem: $\min_{f \in \mathcal{F}} \frac{1}{N}\sum_{i=1}^{N} \big(C(f, \Lambda^{(i)}) + \frac{\gamma}{2}\|f\|_2^2\big)$.
For the Sioux Falls network, the parameters for implementing Algorithm 1 are set as follows. The smoothness parameter $L$ is set to be the largest eigenvalue of the Hessian $2 M_\Lambda^\top A M_\Lambda + \gamma I$, and the strong convexity parameter is $\mu = \gamma$. It is easy to check that these values satisfy the assumptions described in Section 3.4. Note that several different values of $\gamma$ could have been used to convexify the latency function. However, our choice is governed by two factors. A small value of $\gamma$ will ensure that the regularized objective is a good estimate of the true objective, which is desirable. However, smaller values of $\gamma$ will lead to a larger condition number (which is $L/\gamma$), resulting in slower convergence and necessitating more data to achieve similar performance. Thus, the particular value of $\gamma$ balances both these factors for our problem instance. In Section 5.4, we present a sensitivity analysis with different values of $\gamma$.
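For concreteness, a sketch of how such parameters can be computed for the affine model (with toy inputs of our own choosing; this does not reproduce the paper's Sioux Falls constants):

```python
import numpy as np

def smoothness_and_condition(A, lam, gamma):
    """Smoothness L = largest eigenvalue of the regularized Hessian
    2 * M^T A M + gamma * I, and the condition number L / gamma, for
    the affine-latency model of Section 3.4."""
    m = A.shape[0]
    M = np.kron(lam.reshape(1, -1), np.eye(m))       # M_Lambda
    H = 2.0 * M.T @ A @ M + gamma * np.eye(M.shape[1])
    L = np.linalg.eigvalsh(H).max()                  # H is symmetric
    return L, L / gamma

A = np.diag([1.0, 2.0, 0.5])                         # toy 3-edge network
lam = np.array([0.2, 0.7])                           # two OD pairs
L, kappa = smoothness_and_condition(A, lam, gamma=0.1)
print(L, kappa)
```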
Figure 1: Overview of the Sioux Falls network: the network topology and the distribution of mean hourly demand across OD pairs.
5.3 Performance of Algorithm 1
In this subsection, we study the convergence of the routing policy with each step of the gradient descent performed by our algorithm. Since the impact of a routing policy is directly reflected in the total travel time, we plot the travel cost induced by the learned routing policy as a function of the iterates. Recall that the number of iterations for a data set with $N$ data points is $N$ according to Algorithm 1. In our experiments we evaluate the cost of a routing policy $f$ as $C(f, \bar{\Lambda})$, where $\bar{\Lambda} := \frac{1}{N}\sum_{i=1}^{N}\Lambda^{(i)}$ is the average demand, instead of using the sample average $\frac{1}{N}\sum_{i=1}^{N} C(f, \Lambda^{(i)})$. This approximation is done solely to improve the run-time of our experiments (by up to 50X) and introduces only a small error in the evaluation of the system costs. Similarly, the optimal solution is also computed by minimizing the objective function $C(f, \bar{\Lambda})$ instead of the term described in Equation 10, to achieve a computational speed-up of 30X. Evaluating the cost of this solution using the average-demand approximation likewise results in only small errors. Thus, in these experiments, we define the optimal costs as the approximate costs obtained through this procedure. Further details justifying these approximations are presented in Appendix D.
For our first set of experiments, we compare the objective values obtained by Algorithm 1 and the Baseline as a function of the sample size $N$. In Figure 2, we plot the ratio of Algorithm 1's cost to Baseline's cost over the course of its iterations for different values of $N$. For this set of experiments, we hold the privacy parameters $\epsilon$ and $\delta$ fixed. Note that Baseline's cost is fixed for a given $D$ and is computed offline to serve as a benchmark. For a given $N$, we only have $N$ iterations, since each data point is only used once in Algorithm 1 to maintain privacy. For all three experiments ($N = 10$, $N = 25$, $N = 50$), the cost decreases monotonically with additional iterations. It is therefore not surprising that the final cost for the $N = 50$ case is the lowest, as we expect the routing policy learned with 50 data points to be better than the routing policy learned from 10 data points. It is however very interesting that even with a random routing policy initialization, our algorithm finds solutions that are just around 2% away from the optimal policy. We suspect that this 2% gap is due to the fact that the two algorithms have slightly different objective functions. Indeed, Algorithm 1 has an $\ell_2$ regularizer in the objective, but Baseline does not. Our results therefore show that although convergence is guaranteed only in the limit $N \to \infty$, we can obtain practically useful solutions with a relatively small number of data points.
Figure 2: Ratio of Algorithm 1's cost to Baseline's cost over the course of its iterations, for different data set sizes $N$.
In the next set of experiments, we study the effect of different privacy parameters on the total travel time. To this end, we compare the costs of the pre-noise and post-noise solutions from Algorithm 1. We conduct this comparison for $\epsilon \in \{0.01, 0.1, 0.5\}$ and $\delta \in \{0.1, 0.5\}$. Table 1 presents the percentage increase in total travel time due to the addition of privacy noise. The results indicate that the price of privacy, i.e., the increase in total travel time due to the introduction of differential privacy noise, is small even in the worst case. In fact, for less stringent (and more commonly used) privacy parameters, the cost of privacy is even smaller. One reason for this low cost of privacy is the high demand in the traffic network. From Figure 1(c), we observe that every OD pair typically has a few hundred trips. With over 500 OD pairs, it is thus clear that there are tens of thousands of vehicles in the network contributing trip information with every data point. With such a large number of vehicles, the noise required to protect the identity of one vehicle is not too high.
$\epsilon$ | $\delta$ | Cost (% increase)
---|---|---
0.01 | 0.1 |
0.01 | 0.5 |
0.1 | 0.1 |
0.1 | 0.5 |
0.5 | 0.1 |
0.5 | 0.5 |
5.4 Sensitivity
Figure 3: Sensitivity of Algorithm 1 to (a) the regularization weight $\gamma$, (b) the slope of the latency functions, and (c) the demand scale factor.
We now study the sensitivity of our algorithm to its input parameters. For these experiments, we fix the number of data points to $N = 50$, since the prior results suggest that most of the cost benefits are obtained with 50 data points. First, we study the effect of the regularizer by varying $\gamma$, and plot the convergence of the normalized costs in Figure 3(a). We see that for larger values of $\gamma$, the costs decrease faster. This makes sense because a larger $\gamma$ results in a lower condition number $L/\gamma$, which leads to faster convergence. We also note that the cost ratio does not go to 1 because Algorithm 1 is minimizing a regularized objective, while Baseline has no regularization.
In Figure 3(b), we compare three scenarios with varying slopes $a_e$ for the latency functions $\ell_e$. Recall that our previous experiments set the slope on each edge such that the travel time on the link doubles when the traffic is equal to the link capacity. This setting corresponds to a sensitivity factor of 2. We consider two more cases, where the travel time at capacity flow is 1.5 times and 5 times the free-flow latency. Note that changing the latency sensitivity factor changes the matrix $A$. Thus, for each of these experiments, we recompute the value of $L$, setting it equal to the largest eigenvalue of the appropriate Hessian $2 M_\Lambda^\top A M_\Lambda + \gamma I$. The optimal cost also varies across the three cases and is recomputed. The value of $\gamma$ is held fixed. We observe that when the latency function is steeper, i.e., the sensitivity factor is higher, the algorithm takes more iterations to reduce the costs, but eventually ends up with the lowest cost ratio. This is because a larger $A$ leads to a larger condition number $L/\gamma$, which makes convergence slow. However, a larger $A$ also means that the regularizer is a smaller proportion of the total cost, meaning that the objectives of Algorithm 1 and Baseline become more similar, which is what we believe causes the cost ratio to improve as the latency function becomes steeper.
Finally, we present the sensitivity of our algorithm to varying traffic demand in Figure 3(c). Elaborating further, in these experiments we compare the nominal setting, where the demand equals the mean demand (a scale factor of 1), to two other cases. In the first case, we use a lower demand, where the mean traffic is 0.5 times the nominal traffic, and in the second case, the mean traffic is 1.5 times the nominal traffic. Again, as the demand changes, the matrix $M_\Lambda$ changes, and we recompute $L$ as before. We observe that for the same value of $\gamma$, higher demand leads to better convergence and lower costs. This is because higher demand increases the travel time, making the regularizer a smaller proportion of Algorithm 1's objective. The objective functions becoming more similar leads to the cost ratio being closer to 1.
6 Conclusions
In this paper, we study the problem of learning network routing policies from sensitive user data. In particular, we consider the setting of a transportation network, where we want to learn and share a routing policy that does not reveal too much information about the individual trips that contributed to learning this policy. Our paper presented a new approach to learning privacy-preserving routing policies by solving a reformulated network flow problem using a differentially private variant of the stochastic gradient descent algorithm. We prove that our algorithm is asymptotically optimal, meaning that the cost of the routing policy produced by our algorithm converges to the optimal non-private cost as the number of data points goes to infinity. Finally, our simulations on the Sioux Falls road network suggest that for realistic travel demands, we can learn differentially private routing policies that result in only a 2% suboptimality in terms of total travel time.
There are several interesting directions for future work. First, because differentially private algorithms are not allowed to be sensitive to single data points, they are naturally robust, and may be useful for tracking non-stationary demand distributions, as opposed to the stationary demand models we studied in this paper. Second, in this paper we studied request-level differential privacy, where the goal is to occlude the influence of a single trip on the algorithm's output. Another practical and important notion is user-level differential privacy, where the goal is to occlude the influence of all trips belonging to the same person on the algorithm's output. User-level privacy is harder to achieve, but is important in practice. Finally, generalizing the results to non-smooth objective functions would expand the domain of models to which this technique can be applied.
References
- [AHFI+18] Khalil Al-Hussaeni, Benjamin CM Fung, Farkhund Iqbal, Gaby G Dagher, and Eun G Park. Safepath: Differentially-private publishing of passenger trajectories in transportation systems. Computer Networks, 143:126–139, 2018.
- [BS03] Alastair R Beresford and Frank Stajano. Location privacy in pervasive computing. IEEE Pervasive computing, 2(1):46–55, 2003.
- [CML11] Chi-Yin Chow, Mohamed F Mokbel, and Xuan Liu. Spatial cloaking for anonymous location-based services in mobile peer-to-peer environments. GeoInformatica, 15(2):351–380, 2011.
- [CTW+21] Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom B. Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, and Colin Raffel. Extracting training data from large language models. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, pages 2633–2650. USENIX Association, 2021.
- [DKBS15] Roy Dong, Walid Krichene, Alexandre M. Bayen, and S. Shankar Sastry. Differential privacy of populations in routing games. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 2798–2803, 2015.
- [DMNS06] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, volume 3876, pages 265–284. Springer, 2006.
- [DR14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211–407, 2014.
- [Dwo08] Cynthia Dwork. Differential privacy: A survey of results. In International conference on theory and applications of models of computation, pages 1–19. Springer, 2008.
- [FKT20] Vitaly Feldman, Tomer Koren, and Kunal Talwar. Private stochastic convex optimization: optimal rates in linear time. In Proccedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, STOC, pages 439–449. ACM, 2020.
- [FMTT18] Vitaly Feldman, Ilya Mironov, Kunal Talwar, and Abhradeep Thakurta. Privacy amplification by iteration. In 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS, pages 521–532. IEEE Computer Society, 2018.
- [GFCA14] Moein Ghasemzadeh, Benjamin CM Fung, Rui Chen, and Anjali Awasthi. Anonymizing trajectory data for passenger flow analysis. Transportation research part C: emerging technologies, 39:63–79, 2014.
- [GLTY18] Mehmet Emre Gursoy, Ling Liu, Stacey Truex, and Lei Yu. Differentially private and utility preserving publication of trajectory data. IEEE Transactions on Mobile Computing, 18(10):2315–2329, 2018.
- [GMW87] Oded Goldreich, Silvio Micali, and Avi Wigderson. How to play any mental game or A completeness theorem for protocols with honest majority. In Alfred V. Aho, editor, STOC 1987, New York, New York, USA, pages 218–229. ACM, 1987.
- [GZFS15] Yanmin Gong, Chi Zhang, Yuguang Fang, and Jinyuan Sun. Protecting location privacy for task allocation in ad hoc mobile cloud computing. IEEE Transactions on Emerging Topics in Computing, 6(1):110–121, 2015.
- [HC20] Brian Yueshuai He and Joseph YJ Chow. Optimal privacy control for transport network data sharing. Transportation Research Part C: Emerging Technologies, 113, 2020.
- [HTP17] Shuo Han, Ufuk Topcu, and George J. Pappas. Differentially private distributed constrained optimization. IEEE Transactions on Automatic Control, 62(1):50–64, 2017.
- [LYH20] Yang Li, Dasen Yang, and Xianbiao Hu. A differential privacy-based privacy-preserving data publishing algorithm for transit smart card data. Transportation Research Part C: Emerging Technologies, 115:102634, 2020.
- [Pan14] Vijay Pandurangan. Riding with the stars: Passenger privacy in the nyc taxicab dataset. Available Online, 2014.
- [Swe02] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557–570, 2002.
- [Tra16] Transportation networks for research. github.com/bstabler/TransportationNetworks, 2016. Accessed January 20, 2021.
- [Tre11] Luca Trevisan. Lecture notes on optimization and algorithmic paradigms (lecture 11). Available online, 2011.
- [WHL+18] Zhibo Wang, Jiahui Hu, Ruizhao Lv, Jian Wei, Qian Wang, Dejun Yang, and Hairong Qi. Personalized privacy-preserving task allocation for mobile crowdsensing. IEEE Transactions on Mobile Computing, 18(6):1330–1341, 2018.
- [WLK+17] Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, and Jeffrey F. Naughton. Bolt-on differential privacy for scalable stochastic gradient descent-based analytics. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, pages 1307–1322. ACM, 2017.
- [YLL+19] Ke Yan, Guoming Lu, Guangchun Luo, Xu Zheng, Ling Tian, and Akshita Maradapu Vera Venkata Sai. Location privacy-aware task bidding and assignment for mobile crowd-sensing. IEEE Access, 7:131929–131943, 2019.
Appendix A Notation
The terms and concepts defined and used throughout the paper are all described here for the sake of convenience.
Notation | Definition
---|---
$G = (V, E)$ | Graph representation of the network. (Definition 1)
$V$ | The vertex set of the network $G$. (Definition 1)
$E$ | The edge set of the network $G$. (Definition 1)
$n$ | The number of vertices in $G$, i.e., $n = |V|$. (Definition 1)
$m$ | The number of edges in $G$, i.e., $m = |E|$. (Definition 1)
$T$ | Operation period, which represents the time of day during which the network operator is trying to optimize its decisions. (Definition 2)
$|T|$ | Number of minutes in the operation period. (Definition 2)
$\Lambda$ | Demand matrix. $\Lambda_{od}$ specifies the rate at which requests from $o$ to $d$ appear in the network. (Definition 3)
$\mathcal{P}$ | The distribution of $\Lambda$. (Defined in Section 3.2)
$D$ | Data set containing $N$ days of historical request data for the operation period $T$. (Defined in Section 3.2)
$f^{od}$ | Unit flow. A flow that routes exactly 1 unit of flow from $o$ to $d$ through the network. (Definition 10)
$f$ | Unit network flow. Specifies a unit flow for all pairs of vertices in the network. (Definition 11)
$\mathcal{F}$ | The set of all unit network flows. It is also the feasible set for optimization problems (8), (9) and (10). (Definition 11)
$f_t(D)$ | The $t$th gradient descent iterate from Algorithm 1; the notation shows the explicit dependence of the iterate on the historical training data $D$. (Algorithm 1)
$\ell_e$ | Link latency function for the edge $e$. If $x_e$ is the total flow on $e$, then $\ell_e(x_e)$ is the average travel time through the edge. (Definition 4)
$C(f, \Lambda)$ | Total travel time function. $C(f, \Lambda)$ is the total travel time that requests in $\Lambda$ experience when they are routed according to $f$. (Defined in (7))
$\epsilon, \delta$ | Differential privacy parameters. As $\epsilon$ and $\delta$ get smaller, the privacy guarantees offered by differential privacy get stronger. (Definition 8)
$\sigma^2$ | Upper bound on the variance of the gradient of $C$. (Assumption 2)
$\nabla_f^2 C$ | Hessian of $C$ with respect to $f$. Concretely, $\nabla_f^2 C(f_0, \Lambda)$ is the Hessian of $C(\cdot, \Lambda)$ with respect to $f$ evaluated at $f_0$.
$\mu$ | Strong convexity parameter. (Assumption 4)
$L$ | Smoothness parameter. (Assumption 5)
$\beta$ | Upper bound on $\|\partial_{\Lambda_{od}} \nabla_f C(f, \Lambda)\|_2$. (Assumption 6)
Appendix B Privacy Analysis (Proof of Theorem 2)
Recall the privacy guarantee for the Gaussian Mechanism from Theorem 4. To show that Algorithm 1 is -differentially private, it suffices to show the following Lemma:
Lemma 1.
For every $i \leq N$, any value of $\Lambda^{(i)}$, and any $(o, d)$ we have
$$\left\|\frac{\partial f_N(D)}{\partial \Lambda^{(i)}_{od}}\right\|_2 \leq \beta \max\Big\{\eta_1\, (1 - \mu\,\eta_N)^{N/2},\ \eta_{\lceil N/2 \rceil}\Big\}.$$
Lemma 1 is sufficient to prove privacy, because it implies that the $\ell_2$ sensitivity of $f_N$ is at most $\frac{\beta}{|T|}\max\big\{\eta_1\, (1 - \mu\,\eta_N)^{N/2},\ \eta_{\lceil N/2 \rceil}\big\}$. To see why this is true, let $D$ and $D'$ be request-level-adjacent data sets that differ only on the $i$th day. Furthermore, let $(o_0, d_0)$ be the request for which $D$ and $D'$ differ. Let $f_N(D)$ and $f_N(D')$ be the final iterates of gradient descent obtained by using the data sets $D$ and $D'$ respectively. By the fundamental theorem of calculus,
$$\big\|f_N(D) - f_N(D')\big\|_2 = \left\|\int_{\Lambda'^{(i)}_{o_0 d_0}}^{\Lambda^{(i)}_{o_0 d_0}} \frac{\partial f_N}{\partial \Lambda^{(i)}_{o_0 d_0}}\, d\Lambda^{(i)}_{o_0 d_0}\right\|_2 \overset{(a)}{\leq} \beta \max\Big\{\eta_1\, (1 - \mu\,\eta_N)^{N/2},\ \eta_{\lceil N/2 \rceil}\Big\}\, \big|\Lambda^{(i)}_{o_0 d_0} - \Lambda'^{(i)}_{o_0 d_0}\big| \overset{(b)}{=} \frac{\beta}{|T|}\max\Big\{\eta_1\, (1 - \mu\,\eta_N)^{N/2},\ \eta_{\lceil N/2 \rceil}\Big\}.$$
Here $(a)$ is due to Lemma 1, and $(b)$ is because $\big|\Lambda^{(i)}_{o_0 d_0} - \Lambda'^{(i)}_{o_0 d_0}\big| = \frac{1}{|T|}$ is a consequence of $D, D'$ being request-level-adjacent.
Proof of Lemma 1.
Define $g_t(f) := C(f, \Lambda^{(t)})$. By Assumptions 3, 4 and 5, the functions $g_t$ are all twice differentiable, $\mu$-strongly convex and $L$-smooth. For each $t$, and any $i < t$, by the chain rule we have
$$\frac{\partial}{\partial \Lambda^{(i)}_{od}} \nabla g_t(f_t) = \nabla^2 g_t(f_t)\, \frac{\partial f_t}{\partial \Lambda^{(i)}_{od}}.$$
Since $f_{t+1} = f_t - \eta_t \nabla g_t(f_t)$, differentiating both sides with respect to $\Lambda^{(i)}_{od}$ gives
$$\frac{\partial f_{t+1}}{\partial \Lambda^{(i)}_{od}} = \big(I - \eta_t H_t\big)\, \frac{\partial f_t}{\partial \Lambda^{(i)}_{od}},$$
where $H_t := \nabla^2 g_t(f_t)$ is the Hessian of $g_t$. Since $g_t$ is $\mu$-strongly convex, we know $H_t \succeq \mu I$. By $L$-smoothness of $g_t$, we also know that $H_t \preceq L I$. Therefore if $\eta_t \leq \frac{1}{L}$, we see that $\|I - \eta_t H_t\|_2 \leq 1 - \mu\,\eta_t$. From this we can conclude that
$$\left\|\frac{\partial f_{t+1}}{\partial \Lambda^{(i)}_{od}}\right\|_2 \leq (1 - \mu\,\eta_t)\left\|\frac{\partial f_t}{\partial \Lambda^{(i)}_{od}}\right\|_2 \quad \text{for all } i < t. \tag{12}$$
Finally, note that $f_i$ only depends on $\Lambda^{(1)}, \ldots, \Lambda^{(i-1)}$ and the initialization, and in particular it does not depend on $\Lambda^{(i)}$. Therefore differentiating both sides of $f_{i+1} = f_i - \eta_i \nabla g_i(f_i)$ with respect to $\Lambda^{(i)}_{od}$ gives
$$\left\|\frac{\partial f_{i+1}}{\partial \Lambda^{(i)}_{od}}\right\|_2 = \eta_i\, \big\|\partial_{\Lambda^{(i)}_{od}} \nabla g_i(f_i)\big\|_2 \overset{(a)}{\leq} \eta_i\, \beta, \tag{13}$$
where $(a)$ is due to Assumption 6. Combining inequalities (12) and (13) gives
$$\left\|\frac{\partial f_N}{\partial \Lambda^{(i)}_{od}}\right\|_2 \leq \eta_i\, \beta \prod_{t=i+1}^{N-1} (1 - \mu\,\eta_t). \tag{14}$$
Letting $i^* := \lceil N/2 \rceil$, note that the step sizes are non-increasing. If $i < i^*$, then $1 - \mu\,\eta_t \leq 1 - \mu\,\eta_N$ for all $t \leq N$, and we see that
$$\eta_i\, \beta \prod_{t=i+1}^{N-1} (1 - \mu\,\eta_t) \leq \eta_1\, \beta\, (1 - \mu\,\eta_N)^{N/2}.$$
On the other hand, if $i \geq i^*$, then $\eta_i \leq \eta_{i^*}$ and we see that
$$\eta_i\, \beta \prod_{t=i+1}^{N-1} (1 - \mu\,\eta_t) \leq \eta_{i^*}\, \beta.$$
Thus in either case we have $\eta_i\, \beta \prod_{t=i+1}^{N-1} (1 - \mu\,\eta_t) \leq \beta \max\big\{\eta_1\, (1 - \mu\,\eta_N)^{N/2},\ \eta_{i^*}\big\}$.
Finally, note that (14) bounds $\big\|\partial_{\Lambda^{(i)}_{od}} f_N\big\|_2$ by exactly this quantity. Combining these two bounds gives the desired result:
$$\left\|\frac{\partial f_N}{\partial \Lambda^{(i)}_{od}}\right\|_2 \leq \beta \max\Big\{\eta_1\, (1 - \mu\,\eta_N)^{N/2},\ \eta_{\lceil N/2 \rceil}\Big\}.$$
∎
Appendix C Performance Analysis (Proof of Theorem 3)
Let $f^\star$ be a solution to (9), and write $g_t(f) := C(f, \Lambda^{(t)})$ as in Appendix B. Using the non-expansiveness of the projection onto the convex set $\mathcal{F}$, note that
$$\|f_{t+1} - f^\star\|_2^2 \leq \big\|f_t - \eta_t \nabla g_t(f_t) - f^\star\big\|_2^2 = \underbrace{\|f_t - f^\star\|_2^2}_{\text{Term 1}} + \underbrace{\eta_t^2\, \|\nabla g_t(f_t)\|_2^2}_{\text{Term 2}} - \underbrace{2\eta_t\, \big\langle \nabla g_t(f_t),\ f_t - f^\star \big\rangle}_{\text{Term 3}}.$$
Taking the expectation of both sides conditioned on $f_t$, we see that
$$\mathbb{E}\big[\|f_{t+1} - f^\star\|_2^2 \,\big|\, f_t\big] \leq \|f_t - f^\star\|_2^2 + \eta_t^2\, \mathbb{E}\big[\|\nabla g_t(f_t)\|_2^2 \,\big|\, f_t\big] - 2\eta_t\, \mathbb{E}\big[\big\langle \nabla g_t(f_t),\ f_t - f^\star\big\rangle \,\big|\, f_t\big].$$
To show that $f_{t+1}$ is (in expectation) closer to $f^\star$ than $f_t$ is, we will provide bounds for both Term 2 and Term 3.
C.1 Bounding Term 2
To upper bound Term 2, define $\bar{C}(f) := \mathbb{E}_{\Lambda \sim \mathcal{P}}[C(f, \Lambda)]$ and write $\nabla g_t(f_t) = \nabla \bar{C}(f_t) + z_t$, where $z_t := \nabla g_t(f_t) - \nabla \bar{C}(f_t)$. Noting that this decomposition holds for all $f_t$, we have
$$\mathbb{E}\big[\|\nabla g_t(f_t)\|_2^2 \,\big|\, f_t\big] = \big\|\nabla \bar{C}(f_t)\big\|_2^2 + 2\,\big\langle \nabla \bar{C}(f_t),\ \mathbb{E}[z_t \mid f_t]\big\rangle + \mathbb{E}\big[\|z_t\|_2^2 \,\big|\, f_t\big]. \tag{15}$$
By Assumption 6 and the dominated convergence theorem, we have $\nabla_f\, \mathbb{E}_{\Lambda \sim \mathcal{P}}[C(f, \Lambda)] = \mathbb{E}_{\Lambda \sim \mathcal{P}}[\nabla_f C(f, \Lambda)]$. Hence we see that $\mathbb{E}[z_t \mid f_t] = 0$, i.e., $z_t$ is a zero mean random vector.
From this observation we have the following three remarks:
1. Since $z_t$ is a zero mean random vector, $\mathbb{E}[\|z_t\|_2^2 \mid f_t]$ is the variance of $\nabla g_t(f_t)$ given $f_t$. By Assumption 2, this variance is at most $\sigma^2$ for any $f_t$, which implies that $\mathbb{E}[\|z_t\|_2^2 \mid f_t] \leq \sigma^2$.

2. By Assumption 5, $C(\cdot, \Lambda)$ is $L$-smooth for every $\Lambda$. $L$-smoothness of $C$ implies that its gradient is $L$-Lipschitz. Thus we have
$$\big\|\nabla \bar{C}(f_t)\big\|_2 \overset{(a)}{\leq} \mathbb{E}\big[\|\nabla_f C(f_t, \Lambda)\|_2\big] \overset{(b)}{\leq} \mathbb{E}\big[\|\nabla_f C(f^\star, \Lambda)\|_2\big] + L\, \|f_t - f^\star\|_2,$$
where $(a)$ is due to Jensen's inequality and $(b)$ is due to $\nabla_f C(\cdot, \Lambda)$ being $L$-Lipschitz.

3. Conditioned on $f_t$, $z_t$ is zero mean and $\nabla \bar{C}(f_t)$ is constant, meaning that the middle term of (15) is the inner product of a constant vector with a zero mean random vector. Therefore $2\langle \nabla \bar{C}(f_t), \mathbb{E}[z_t \mid f_t]\rangle = 0$.

Applying these three remarks to the inequality (15), we see that
$$\mathbb{E}\big[\|\nabla g_t(f_t)\|_2^2 \,\big|\, f_t\big] \leq \Big(\mathbb{E}\big[\|\nabla_f C(f^\star, \Lambda)\|_2\big] + L\,\|f_t - f^\star\|_2\Big)^2 + \sigma^2. \tag{16}$$
C.2 Bounding Term 3
By linearity of expectation, Term 3 is equal to
$$2\eta_t\, \big\langle \mathbb{E}\big[\nabla g_t(f_t) \mid f_t\big],\ f_t - f^\star \big\rangle.$$
Recalling $\bar{C}(f) = \mathbb{E}_{\Lambda \sim \mathcal{P}}[C(f, \Lambda)]$, we can re-write the above expression as
$$2\eta_t\, \big\langle \nabla \bar{C}(f_t),\ f_t - f^\star \big\rangle.$$
Since $C(\cdot, \Lambda)$ is $\mu$-strongly convex for every $\Lambda$, we can conclude that $\bar{C}$ is also $\mu$-strongly convex. To upper bound Term 3, we use $\mu$-strong convexity of $\bar{C}$ to conclude that
$$\bar{C}(f^\star) \geq \bar{C}(f_t) + \big\langle \nabla \bar{C}(f_t),\ f^\star - f_t\big\rangle + \frac{\mu}{2}\,\|f^\star - f_t\|_2^2, \quad \text{and}$$
$$\bar{C}(f_t) \geq \bar{C}(f^\star) + \big\langle \nabla \bar{C}(f^\star),\ f_t - f^\star\big\rangle + \frac{\mu}{2}\,\|f_t - f^\star\|_2^2 \geq \bar{C}(f^\star) + \frac{\mu}{2}\,\|f_t - f^\star\|_2^2,$$
where the last step uses the first-order optimality condition $\langle \nabla \bar{C}(f^\star), f_t - f^\star\rangle \geq 0$ for the constrained minimizer $f^\star$. Adding these inequalities together gives
$$\big\langle \nabla \bar{C}(f_t),\ f_t - f^\star \big\rangle \geq \mu\, \|f_t - f^\star\|_2^2.$$
This inequality implies the following bound on Term 3:
$$-2\eta_t\, \big\langle \nabla \bar{C}(f_t),\ f_t - f^\star\big\rangle \leq -2\eta_t\, \mu\, \|f_t - f^\star\|_2^2. \tag{17}$$
C.3 Putting everything together
Combining Term 2's bound (16) and Term 3's bound (17) with the expansion from the beginning of this appendix gives the recursive relation
$$\mathbb{E}\big[\|f_{t+1} - f^\star\|_2^2 \,\big|\, f_t\big] \leq (1 - 2\mu\,\eta_t)\, \|f_t - f^\star\|_2^2 + \eta_t^2\, \Big(\big(\mathbb{E}\big[\|\nabla_f C(f^\star, \Lambda)\|_2\big] + L\,\|f_t - f^\star\|_2\big)^2 + \sigma^2\Big). \tag{18}$$
By the tower property of expectation, we can write
$$\mathbb{E}\big[\|f_{t+1} - f^\star\|_2^2\big] = \mathbb{E}\Big[\,\mathbb{E}\big[\|f_{t+1} - f^\star\|_2^2 \,\big|\, f_t\big]\Big]. \tag{19}$$
Combining the recursive relation from (18) with (19), we see that the sequence $a_t := \mathbb{E}\big[\|f_t - f^\star\|_2^2\big]$ satisfies
$$a_{t+1} \leq (1 - 2\mu\,\eta_t)\, a_t + O(\eta_t^2).$$
Since we chose the step sizes so that $2\mu\,\eta_t \leq 1$, $\sum_t \eta_t = \infty$, and $\eta_t \to 0$, unrolling this recursion shows that $a_N \to 0$ as $N \to \infty$; here $a_1$ is finite because $\|f_1 - f_2\|_2$ is bounded for any pair of unit network flows $f_1, f_2 \in \mathcal{F}$. The released policy $f_{\mathrm{priv}} = f_N + Z$ differs from $f_N$ by Gaussian noise $Z$ whose standard deviation converges to zero (in Appendix B we showed that the sensitivity of $f_N$, and hence the noise scale, vanishes as $N \to \infty$), so $\mathbb{E}\big[\|f_{\mathrm{priv}} - f^\star\|_2^2\big] \to 0$ as well. Finally, we have
$$\mathbb{E}\big[\bar{C}(f_{\mathrm{priv}})\big] - \bar{C}(f^\star) \leq \mathbb{E}\Big[\big\|\nabla \bar{C}(f^\star)\big\|_2\, \|f_{\mathrm{priv}} - f^\star\|_2\Big] + \frac{L}{2}\,\mathbb{E}\big[\|f_{\mathrm{priv}} - f^\star\|_2^2\big] \overset{(a)}{\leq} \big\|\nabla \bar{C}(f^\star)\big\|_2\, \sqrt{\mathbb{E}\big[\|f_{\mathrm{priv}} - f^\star\|_2^2\big]} + \frac{L}{2}\,\mathbb{E}\big[\|f_{\mathrm{priv}} - f^\star\|_2^2\big] \longrightarrow 0,$$
where $(a)$ is due to Jensen's inequality and the first step is due to $L$-smoothness of $\bar{C}$, with $\|\nabla \bar{C}(f^\star)\|_2$ finite because $\|f_1 - f_2\|_2$ is bounded for any pair of unit network flows $f_1, f_2 \in \mathcal{F}$.
Appendix D Computational approximations
In our experiments, we make several approximations for computational tractability. In this section, we provide empirical evidence that these approximations are reasonable and do not introduce significant errors. For ease of discussion, we define the following notations.
- $f^\star(D)$: the solution obtained by solving the optimization problem (10) with regularizer $\gamma$.
- $f^\star(\bar{\Lambda})$: the solution obtained by solving the optimization problem with the average demand $\bar{\Lambda} := \frac{1}{N}\sum_{i=1}^{N}\Lambda^{(i)}$ and a regularizer $\gamma$.
- $C_\gamma(f, \Lambda^{(i)})$: the cost of the routing policy $f$ evaluated on demand $\Lambda^{(i)}$ with a regularizer $\gamma$.
- $\bar{C}_\gamma(f, D)$: the average cost of the routing policy $f$ evaluated on the set of demands $D$ with a regularizer $\gamma$. More precisely, $\bar{C}_\gamma(f, D) := \frac{1}{N}\sum_{i=1}^{N} C_\gamma(f, \Lambda^{(i)})$.
- $C_\gamma(f, \bar{\Lambda})$: the cost of the routing policy $f$ evaluated on the average demand $\bar{\Lambda}$ with a regularizer $\gamma$.
$N$ | Error #1 | Error #2 | Error #3
---|---|---|---
10 | | |
25 | | |
50 | | |
Table 2 presents errors from three different approximations. Our first approximation is to compute the system costs for a given policy by using the average demand instead of averaging the costs over every observed demand, i.e., using $C_\gamma(f, \bar{\Lambda})$ in place of $\bar{C}_\gamma(f, D)$. The first column (denoted Error #1) lists the fractional error introduced by this approximation for different values of $N$ when using the optimal routing policy. We note that the error is small, and we observe a 30-50X improvement in computational time when evaluating the costs using the average demand. This justifies the use of the average demand for estimating costs. Our second approximation is to solve an easier optimization problem to compute the optimal routing policy. In this case, the exact approach would be to solve the optimization problem described in Equation 10. However, the size of this problem grows rapidly with the number of data points $N$. Our approximation involves solving the easier optimization problem of minimizing $C_\gamma(f, \bar{\Lambda})$ to obtain the routing policy $f^\star(\bar{\Lambda})$ instead of solving the original optimization problem to obtain $f^\star(D)$. The second column of the table (titled Error #2) presents the error introduced by this approximation on the travel costs. The small errors indicate that this approximation is reasonable, and it helps us obtain up to a 30X speedup in solving the optimization problem. Finally, we show through numerical evaluations that the addition of the regularizer does not change the travel costs significantly (third column, denoted Error #3), and results in less than a 0.01% error in the total travel cost.