Private Location Sharing for Decentralized Routing Services
Abstract
Data-driven methodologies offer many exciting upsides, but they also introduce new challenges, particularly in the realm of user privacy. Specifically, the way data is collected can pose privacy risks to end users. In many routing services, a single entity (e.g., the routing service provider) collects and manages user trajectory data. When it comes to user privacy, these systems have a central point of failure since users have to trust that this entity will not sell or use their data to infer sensitive private information. Unfortunately, in practice many advertising companies offer to buy such data for the sake of targeted advertisements.
With this as motivation, we study the problem of using location data for routing services in a privacy-preserving way. Rather than having users report their location to a central operator, we present a protocol in which users participate in a decentralized and privacy-preserving computation to estimate travel times for the roads in the network in a way that no individual's location is ever observed by any other party. The protocol uses the Laplace mechanism in conjunction with secure multi-party computation to ensure that it is cryptographically secure and that its output is differentially private.
A natural question is whether privacy necessitates a degradation in accuracy or system performance. We show that if a road has sufficiently high capacity, then the travel time estimated by our protocol is provably close to the ground truth travel time. We validate the protocol through numerical experiments which show that using the protocol as a routing service provides privacy guarantees with minimal overhead to user travel time.
1 Introduction
Big Data and data-driven methodologies have shown promise in improving the efficiency, safety and adaptability of mobility services. However, certain types of data sharing can also lead to privacy risks for users. In this paper we focus on the merits and risks of sharing location data. We discuss how location data is useful for determining congestion levels in routing services (e.g., Google Maps, Apple Maps, Waze), and we discuss the user privacy risks involved with location sharing. With this as motivation, we show how a protocol for decentralized location sharing can mitigate privacy risks while retaining some of the merits of location information for routing services.
Repeated exposure to conventional location sharing can lead to privacy risks for users. In many current routing services, users provide their location data in exchange for routing recommendations. While users often only provide a small amount of their location data each time that they use a routing service, if a user regularly uses routing services, the data they share over many interactions can be stitched together to form a more complete picture of the user's routines, behaviors, preferences, etc. User privacy in such settings thus requires trust that the routing services will not share user data with other entities. However, in practice, advertising companies offer to buy this user data to build user profiles for the sake of targeted advertising. As a result, even though users only share small amounts of their location data in each interaction with a routing service, a single entity may end up with a large amount (likely more than the user is comfortable with) of their location data (see Fig. 1).
While location sharing presents privacy challenges, it also provides utility for routing services. Location information is helpful because congestion levels of a road can be estimated from the number of vehicles on the road. A key insight toward addressing privacy challenges is that the congestion level only depends on aggregate location information; what matters is the number of vehicles on a road, not which particular users are on the road. This suggests that aggregation procedures can be used to protect individual user location while still providing the location information needed for routing services.

1.1 Statement of Contributions
Motivated by this observation, in this paper we propose a decentralized location sharing protocol where users on the road will periodically compute and announce the traffic counts (i.e., the approximate number of vehicles traveling on each road) of the transportation network in a decentralized and privacy-preserving manner. Since only the total number of vehicles on each road is announced, the location of individual users is not discernible by observers, in contrast to many current location sharing setups where users give their individual location data directly to routing services. With this protocol, user privacy does not rely on a trusted data custodian, and there are no single points of failure.

Furthermore, assuming the roads in the network are sufficiently large, we can prove that the travel time estimates produced by the protocol will be close to the estimates produced by the ground truth with high probability. This result showcases an interesting complementarity between differential privacy and delay functions used in travel time estimation. In low traffic situations, differential privacy constraints lead to poor accuracy for traffic count estimation. However, delay functions are insensitive for small inputs and can thus tolerate the poor accuracy. On the other hand, delay functions are very sensitive in high traffic situations, and differential privacy can provide high accuracy in these settings. Thus when a delay function is composed with a differentially private mechanism, the two compensate for each other's weaknesses to yield accurate and private travel time estimates. We corroborate this insight using numerical experiments which show that the protocol provides a privacy-preserving routing service with minimal overhead to the travel time of users.
1.2 Related Work
Privacy research in transportation mainly focuses on location privacy, whereby the aim is to prevent untrusted entities from learning geographic locations or location sequences of an individual [1]. A number of privacy-preserving approaches have been proposed for various location-based applications, e.g., trajectory publishing, mobile crowdsensing, traffic control, etc. From a methodological perspective, these approaches are often implemented through spatial cloaking [2], differential privacy [3], and Secure Multi-Party Computation (MPC) [4].
Spatial cloaking-based approaches rely on aggregation to convert users' exact locations to coarse information. These approaches are often based on k-anonymity [5], where a mobility dataset is divided into equivalence classes based on data attributes (e.g., geographic regions, time, etc.) so that each class contains at least k records [6, 7]. These k-anonymity-based approaches can guarantee that every record in the dataset is indistinguishable from at least k-1 other records. However, k-anonymity is generally considered to be a weak privacy guarantee. Furthermore, due to coarse data aggregation, spatial cloaking-based approaches can lead to low data accuracy.
Differential privacy-based approaches provide a sound privacy guarantee by producing randomized responses to queries, whereby two datasets that differ in only one entry produce statistically indistinguishable responses [8]. In other words, differential privacy ensures that an adversary with arbitrary background information (e.g., query responses, other entries) cannot infer individual entries with high confidence. Existing research for location data either probabilistically generates obfuscated locations from a user's true location [9, 10] or adds noise to the number of users within each equivalence class [11, 12, 13, 14, 15]. However, differential privacy-based approaches can suffer from two drawbacks. First, due to randomization, there is a trade-off between the accuracy of the response and the level of privacy. Second, most existing research requires a trusted data collector to generate random responses, which does not fit the decentralized setting of this paper.
Secure MPC serves as an excellent technique for decentralized settings, whereby several players jointly compute a function over their data while keeping these data private. Existing secure MPC-based research proposes traffic monitoring and control approaches that keep users' location data confidential, based on secret sharing [16, 17], homomorphic encryption [18, 19], and blockchain [20]. Secure MPC can ensure accuracy since no noise is added to protect location privacy. However, Secure MPC can suffer from high computational overhead due to encryption, and the computation results might leak private information (see Remark 4 for more details).
1.3 Organization
This paper is organized as follows. In Section 2 we present a model for the transportation system, and specify both the system objective and privacy requirements. We present a decentralized and privacy-preserving routing service protocol in Section 3 along with all of the statistical and cryptographic tools used by the protocol. In Section 4 we prove that if the roads in the transportation network are sufficiently large, then the protocol provides a privacy-preserving routing service whose travel time estimates are provably close to the ground truth. We evaluate our protocol in numerical experiments and present the results in Section 5. We summarize our work and identify important areas for future work in Section 6.
2 Model
In this section we describe the transportation network model, the objective for the users’ distributed algorithm to estimate traffic counts, and the privacy requirements for the algorithm.
2.1 Transportation Network
The transportation network is represented as a directed graph $G = (V, E)$ where edges represent roads and vertices represent road intersections. We use $n := |V|$ and $m := |E|$ to denote the number of vertices and edges in the graph, respectively. The concepts of traffic flow, traffic counts, and travel times are essential to this work, so we will describe them here.
Definition 1 (Traffic Flow).
For a given road $e \in E$, its traffic flow $x_e$ measures the number of vehicles that enter the road during a fixed time interval (e.g., every second).
Definition 2 (Travel Time).
Each edge $e \in E$ has an associated delay function $f_e$, where $f_e(x)$ is the estimated travel time on the road if the traffic flow on the edge is $x$.
Definition 3 (Traffic Counts).
For a given road $e \in E$, its traffic count $y_e$ is the number of vehicles currently on the road. At steady state the traffic count is equal to the traffic flow multiplied by the travel time. Specifically, $y_e = x_e f_e(x_e)$. For convenience, we define the flow-counts function as $g_e(x) := x f_e(x)$ so that $y_e = g_e(x_e)$.
Throughout this paper we make the following natural assumption on delay functions.
Assumption 1 (Properties of Delay Functions).
We assume that for each road $e \in E$, $f_e$ is a positive, non-decreasing and differentiable function on $[0, \infty)$.
Remark 1.
The Bureau of Public Roads (BPR) function is a commonly used volume delay function which satisfies Assumption 1. Namely, it is a degree 4 polynomial with positive coefficients (i.e., $f_e(x) = t_e^0 \left( 1 + 0.15 \, (x / c_e)^4 \right)$, where $t_e^0$ is the free-flow travel time and $c_e$ is the capacity of road $e$).
Definition 4 (Travel Time as a Function of Traffic Counts).
For each road $e \in E$ we define $\tau_e$ as the function that estimates travel time based on traffic counts. In other words, for an edge with volume $x$ and counts $y$, we have $\tau_e(y) = f_e(x)$. Since we know from Definition 3 that $y = g_e(x)$, we have $\tau_e = f_e \circ g_e^{-1}$.
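To make Definitions 1–4 concrete, the following Python sketch implements a BPR-style delay function $f_e$, the flow-counts function $g_e$, and $\tau_e = f_e \circ g_e^{-1}$ via numerical inversion; the free-flow time and capacity values are illustrative and not taken from the paper.

```python
from scipy.optimize import brentq

T0, CAP = 1.0, 100.0  # illustrative free-flow travel time and capacity

def bpr_delay(x, t0=T0, c=CAP):
    """Delay function f_e(x): travel time on the road when the flow is x."""
    return t0 * (1.0 + 0.15 * (x / c) ** 4)

def flow_to_counts(x, t0=T0, c=CAP):
    """Flow-counts function g_e(x) = x * f_e(x) from Definition 3."""
    return x * bpr_delay(x, t0, c)

def travel_time_from_counts(y, t0=T0, c=CAP):
    """tau_e(y) = f_e(g_e^{-1}(y)) from Definition 4, inverting g_e by bisection."""
    if y <= 0:
        return t0
    x = brentq(lambda x: flow_to_counts(x, t0, c) - y, 0.0, 10.0 * c + y)
    return bpr_delay(x, t0, c)

print(travel_time_from_counts(flow_to_counts(80.0)))  # recovers f_e(80)
```

Since $g_e$ is continuous, vanishes at $0$, and is increasing under Assumption 1, the bisection is guaranteed to find the preimage $g_e^{-1}(y)$.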
Road capacities are a concept that will be important to our methodology and results, which we define as follows:
Definition 5 ($\alpha$-capacity).
For $\alpha \geq 1$, the $\alpha$-capacity of a road $e$, denoted $c_e(\alpha)$, is the largest value $z \geq 0$ so that for all $x \leq z$ we have $f_e(x) \leq \alpha f_e(0)$.
2.2 Users and Traffic Counts
At any given time $t$, let $n_t$ denote the number of users currently traveling in the transportation network. For $1 \leq i \leq n_t$, the state of user $i$ at time $t$, given by $s_i(t) \in \{0, 1\}^m$, specifies which road the user is on. The $e$-th entry of the vector is given by:
$$[s_i(t)]_e = \begin{cases} 1 & \text{if user } i \text{ is traveling on road } e \text{ at time } t, \\ 0 & \text{otherwise.} \end{cases}$$
Note that exactly one entry of $s_i(t)$ is $1$ and all others are $0$. The traffic counts at time $t$, denoted $y_t \in \mathbb{Z}_{\geq 0}^m$, represent the total number of vehicles on each road and are defined as
$$y_t := \sum_{i=1}^{n_t} s_i(t).$$
The number of users traveling on road $e$ at time $t$, denoted by $y_t[e]$, is defined as $y_t[e] := \sum_{i=1}^{n_t} [s_i(t)]_e$.
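As a toy illustration of this bookkeeping (the numbers are made up), the state vectors and traffic counts can be computed as follows:

```python
import numpy as np

m = 5                                   # number of roads in the network
roads_of_users = [2, 2, 4]              # user i is currently on road roads_of_users[i]
states = np.eye(m, dtype=int)[roads_of_users]  # rows are the one-hot vectors s_i(t)
y_t = states.sum(axis=0)                # traffic counts y_t = sum_i s_i(t)
print(y_t)                              # -> [0 0 2 0 1]
```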
2.3 Communication Model
In this work we assume that users can communicate with one another through private channels. Concretely, this means that for any pair of users $i$ and $j$, user $i$ can send user $j$ a message that can only be deciphered by user $j$. Such a communication channel can be easily established using standard public key cryptography systems.
2.4 System Objective
The goal of the system is to periodically broadcast estimated travel times for all roads in the network for the sake of route recommendation. The accuracy of travel time estimates will be measured by mean absolute percentage error (MAPE) as defined in Definition 6.
Definition 6 (Mean Absolute Percentage Error (MAPE)).
Suppose $\hat{X}$ is a (possibly randomized) estimator for a positive target value $X$. Then the mean absolute percentage error (MAPE) of $\hat{X}$ is given by
$$\mathrm{MAPE}(\hat{X}) := \mathbb{E}\left[ \frac{|\hat{X} - X|}{X} \right],$$
where the expectation is taken over the randomness in $\hat{X}$.
The following remark explains how the traffic counts are valuable to this effort.
Remark 2 (Routing service from traffic counts).
The functions $\tau_e$ from Definition 4 can be used to compute estimated travel times for all roads at time $t$ from the traffic counts $y_t$. A routing service can then recommend routes to users based on shortest paths computed from the estimated travel times.
With Remark 2 in mind, the system's goal is to compute and announce the traffic counts of the system every $T$ minutes. Concretely, for each $k \in \mathbb{N}$, at time $kT$ the traveling users must compute an approximation to $y_{kT}$ in a distributed and privacy-preserving way where privacy is defined according to Definition 7.
Definition 7 (Privacy-Preserving Mechanism).
A mechanism is $\epsilon$-privacy preserving if it is $\epsilon$-differentially private and can be computed in a distributed setting in a way that is cryptographically secure against semi-honest adversaries.
The precise definitions for cryptographic security, semi-honest adversaries and differential privacy are presented and motivated in the next section.
Remark 3 (On the choice of location data for travel time estimation).
In this work we use location data to estimate travel times in a transportation network. This is done by first estimating the traffic flow from traffic counts, and then estimating travel time from traffic flow. One natural alternative is to have users share both their location and speed. In this alternative approach, the location would specify which road the user is on and the average speed reported on a road could be used to estimate its travel time. We opted not to use speed information for two main reasons. The first is due to privacy requirements. As we will discuss and motivate in Section 2.5.2, differential privacy is an important property that we want our method to have. Due to the properties of differential privacy, there are effective ways to compute counts (such as the number of users on a given road) but no clear way to compute an average of user data (such as average speed) in a differentially private way. See Remark 5 for more details. The second is for ease of deployment. Requiring only location data means that our protocol only needs sparse GPS measurements, whereas speed estimation needs continuous GPS measurements, which essentially means that users are being tracked.
2.5 Privacy Requirements
To ensure user privacy, there are two requirements we impose on a desired protocol for the computation of traffic counts: cryptographic security and differential privacy.
2.5.1 Cryptographic Security
Cryptographic security pertains to settings where a group of agents, each with private data, would like to compute a joint function of everyone's data without any agent needing to reveal its private data to other agents. In our setting, the $n_t$ users traveling in the network at time $t$ are the agents, $s_i(t)$ is the private data of the $i$th user, and the desired function is the sum $\sum_{i=1}^{n_t} s_i(t)$ of everyone's private data. We make the following standard assumption on user behavior:
Assumption 2 (Semi-honest users).
We assume that all users are semi-honest, which means they will follow the protocol but may try to do additional computation to learn the secret data of other users. (In the cryptography literature, semi-honest adversaries, honest-but-curious adversaries, and passive adversaries are equivalent terms used interchangeably.)
Cryptographic security is defined by comparing a protocol to an ideal computation model, which is defined below.
Definition 8 (Ideal Computation Model).
In the Ideal Computation Model, there are $n$ agents with private data $x_1, \dots, x_n$ wanting to compute a function $q(x_1, \dots, x_n)$. Each agent sends its private data to a trusted third party, which uses the private data to compute $q(x_1, \dots, x_n)$ and sends this value back to all of the agents.
However, since trusted third parties cannot be assumed to exist, the ideal computation model cannot be implemented in a trustless and decentralized setting. Still, this model serves as a gold standard, and cryptographically secure protocols are required to provide the same level of security as this ideal model.
Definition 9 (Cryptographic Security).
A protocol between $n$ agents with private data $x_1, \dots, x_n$ wanting to compute $q(x_1, \dots, x_n)$ is cryptographically secure if no probabilistic polynomial time agent learns anything more about other agents' data than it would have learned in the Ideal Computation Model.
In other words, a protocol is cryptographically secure if no computationally efficient agent learns more from interacting with the protocol than they would from interacting with the Ideal Computation Model.
Remark 4.
We emphasize that this does not mean that agents learn nothing about other agents' data. This is illustrated by a simple three-agent example with private data $x_1, x_2, x_3$ and query function $q(x_1, x_2, x_3) = x_1 + x_2 + x_3$. In this example, by learning $q(x_1, x_2, x_3)$, agent $1$ learns the sum of the other agents' data: $x_2 + x_3 = q(x_1, x_2, x_3) - x_1$.
In light of Remark 4, it is more accurate to say cryptographically secure protocols reveal nothing about other agents’ data beyond the value of the output.
Cryptographic security is necessary for user privacy, since we certainly do not want users to be able to determine the location of certain individuals through our protocol.
Unfortunately, for the application of user location data, cryptographic security alone is not enough to ensure user privacy, which we illustrate in the following example.
Example 1 (Insufficiency of Cryptographic Privacy for Sparse Data).
If Alice is an early bird and wakes up to run errands in the city before anyone else wakes up, then in the morning Alice is the only person in the network, and therefore we have $y_t = s_{\mathrm{Alice}}(t)$: the traffic counts reveal Alice's location information. While $y_t$ does not explicitly label the single traveler in the system as Alice, this information can be inferred if the traveler begins and ends its route at Alice's house. More generally, cryptographic security does not provide user privacy in sparse data settings. Even when there are multiple users active in the network, side information attacks can be used to associate trajectories in sparse datasets to certain individuals [21].
This motivates the second privacy requirement we enforce in this paper, which is differential privacy.
2.5.2 Differential Privacy
With Example 1 in mind, to protect the privacy of users like Alice, the output of a privacy-preserving protocol should not depend too much on the data of any single user. One way to ensure this is through differential privacy. To quantify the influence of a single user, we first introduce the concept of adjacent datasets.
Definition 10 (Adjacent Datasets).
Two datasets $X, X'$ are adjacent if $X$ contains at most one datapoint that is not in $X'$ and $X'$ contains at most one datapoint that is not in $X$. Concretely, $X, X'$ are adjacent if $|X \setminus X'| \leq 1$ and $|X' \setminus X| \leq 1$.
In our setting, a dataset $X$ would be the locations $\{s_i(t)\}_{i=1}^{n_t}$ of the users who are traveling within the transit network at time $t$. The dataset $X'$ obtained from $X$ by modifying the location of one user, who we will call Alice, would be adjacent to $X$ since $X \setminus X'$ contains only the datapoint corresponding to Alice's original location, and $X' \setminus X$ contains only Alice's newly modified location. One sufficient way to ensure that a mechanism does not depend too much on any single user's data is to demand that the mechanism behaves similarly on adjacent datasets. This is the approach taken by differential privacy, which is defined below.
Definition 11 (Differentially Private Mechanism).
For $\epsilon > 0$, an $\epsilon$-differentially private mechanism $M$ is a randomized function mapping datasets into an output space $\mathcal{Y}$ so that for any event $S \subseteq \mathcal{Y}$ and any adjacent datasets $X, X'$, we have
$$\mathbb{P}\left[ M(X) \in S \right] \leq e^{\epsilon} \, \mathbb{P}\left[ M(X') \in S \right].$$
To understand why differential privacy gives us the privacy we seek, first note that for any two adjacent datasets $X, X'$, the distributions of $M(X)$ and $M(X')$ are very similar. More specifically, the total variation distance between the distributions of $M(X)$ and $M(X')$ is at most $e^{\epsilon} - 1$. Because of this, no hypothesis test can determine from the output of the mechanism whether its input was $X$ or $X'$ with success probability better than $e^{\epsilon}/2$, which is barely better than random guessing for small $\epsilon$. This result holds even if the hypothesis test is given knowledge of all datapoints in $X \cap X'$.
Now suppose $X$ is a dataset that contains Alice's location, and $X'$ is obtained from $X$ by modifying Alice's location arbitrarily. If an observer were able to accurately infer Alice's location based on the output of an $\epsilon$-differentially private mechanism, then it would be able to reliably distinguish between the inputs $X$ and $X'$. However, since differential privacy makes such a task statistically impossible, by contraposition it is statistically impossible for an observer to accurately infer Alice's location based on the mechanism's output. Hence differential privacy ensures privacy of Alice's data.
The following remark describes a general methodology for achieving differential privacy.
Remark 5 (Query sensitivity and the required noise level).
Dwork's pioneering work [8] proposes adding noise to queries in order to achieve differential privacy. Given a data set $X$ and a query $q$, the mechanism $M(X) := q(X) + W$ is differentially private so long as $W$ is a random variable with sufficiently large variance. Specifically, to achieve $\epsilon$-differential privacy, the variance of $W$ should be at least $2 (\Delta q / \epsilon)^2$, where $\Delta q$ is the sensitivity of the function $q$, which is defined as
$$\Delta q := \max_{X, X' \text{ adjacent}} \left| q(X) - q(X') \right|.$$
Revisiting Remark 3, the role of sensitivity in differential privacy is a main reason why we chose to estimate travel time using counts rather than with average speed. The sensitivity of counting functions is $1$, since the modification of a single data point can change the count by at most $1$; however, the sensitivity of an average is unbounded, since a large change to a single data point in a data set can lead to an arbitrarily large change in the average. As such, counting functions are much more compatible with the concept of differential privacy than averages are.
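The following toy comparison (not from the paper) makes the gap concrete: changing one record moves a count by at most 1, but can move an average arbitrarily far.

```python
def count_on_road(locations, road):
    """Counting query with sensitivity 1."""
    return sum(loc == road for loc in locations)

def average(values):
    """Averaging query with unbounded sensitivity."""
    return sum(values) / len(values)

locs, locs_mod = ["e1", "e2", "e2"], ["e1", "e2", "e7"]      # one user moved
spds, spds_mod = [30.0, 32.0, 31.0], [30.0, 32.0, 500.0]     # one speed altered

print(abs(count_on_road(locs, "e2") - count_on_road(locs_mod, "e2")))  # 1
print(abs(average(spds) - average(spds_mod)))                          # ~156.3
```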
3 Methodology
In this section we describe our distributed protocol that enables users to approximately compute $y_{kT}$ in a privacy-preserving way. We will use the Laplace mechanism, described in Section 3.1, to ensure that the protocol is differentially private, and secure multi-party computation, described in Section 3.2, to achieve cryptographic security. Using these tools, we present our privacy-preserving travel time estimation protocol in Section 3.3.
3.1 Differential Privacy via the Laplace Mechanism
As previously mentioned, our goal is to compute a differentially private approximation to $y_{kT}$ for every $k \in \mathbb{N}$. For this we will use the Laplace Mechanism [8], which produces a differentially private estimate $\hat{y}_{kT}$ of $y_{kT}$ based on the following rule:
$$\hat{y}_{kT} := y_{kT} + W,$$
where $W \in \mathbb{R}^m$ has independent and identically distributed entries according to the Laplace distribution with mean $0$ and scale parameter $2/\epsilon$, defined below.
Definition 12 (Laplace Distribution).
The Laplace Distribution with mean $\mu$ and scale parameter $b$ is a probability distribution over $\mathbb{R}$, denoted $\mathrm{Lap}(\mu, b)$, with probability density function given by
$$p(x) = \frac{1}{2b} \exp\left( - \frac{|x - \mu|}{b} \right). \tag{1}$$
We use the Laplace mechanism because it provides a differentially private approximation with the minimum possible mean absolute error [22].
Fact 1.
The mechanism $\hat{y}_{kT} = y_{kT} + W$ is $\epsilon$-differentially private.
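Below is a minimal centralized sketch of this mechanism for intuition. It uses the noise scale $2/\epsilon$ assumed above (moving one user from one road to another changes two entries of the count vector by one each); the decentralized computation of the same quantity is the subject of Section 3.3, and the counts here are illustrative.

```python
import numpy as np

def laplace_mechanism(counts, epsilon, sensitivity=2.0, rng=None):
    """Return counts + Laplace noise with scale sensitivity / epsilon."""
    if rng is None:
        rng = np.random.default_rng()
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=len(counts))
    return counts + noise

rng = np.random.default_rng(0)
y = rng.poisson(50, size=76)            # illustrative counts for 76 roads
y_hat = laplace_mechanism(y, epsilon=0.5, rng=rng)
```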
Since we are interested in a decentralized and trustless computation model, the following remark shows that care must be taken in the computation of $\hat{y}_{kT}$, particularly pertaining to the computation of the noise $W$, in order for differential privacy to be achieved.
Remark 6 ($W$ must remain hidden).
It is essential to the differential privacy of $\hat{y}_{kT}$ that no observer learns the value of $W$. Since $\hat{y}_{kT}$ is announced as the output of the protocol, if in addition $W$ is known by an observer then that observer can reconstruct $y_{kT}$ by computing $\hat{y}_{kT} - W$. In this case, the computation of $\hat{y}_{kT}$ is not cryptographically secure because the observer learns more about the user data than $\hat{y}_{kT}$; in particular it learns the value of $y_{kT}$.
In light of Remark 6, in the next subsection we discuss a cryptographic technique called secret sharing which we will use for a cryptographically secure computation of $\hat{y}_{kT}$.
3.2 Secure MPC via Secret Sharing
In this section we review a cryptographic tool known as secret sharing and discuss how different variants of it can be used to enable cryptographically secure arithmetic operations on private data. We describe how cryptographically secure addition can be performed on private data in Section 3.2.1 using Additive Secret Sharing, and how cryptographically secure multiplication can be performed on private data in Section 3.2.2 using Shamir Secret Sharing.
3.2.1 Secure Multi-Party Addition via Additive Secret Sharing
Suppose there are $n$ agents and someone wants to share a secret value $s$ with the agents so that the agents can reconstruct $s$ if they all work together, but no group of fewer than $n$ agents can reconstruct the secret. This can be done using Additive Secret Sharing.

In Additive Secret Sharing, a large prime integer $p$ is first chosen. The shares $r_1, \dots, r_{n-1}$ are all chosen independently and uniformly at random from the set $\mathbb{Z}_p = \{0, 1, \dots, p-1\}$ and the final share is determined by $r_n := s - \sum_{i=1}^{n-1} r_i \bmod p$. Finally, $r_i$ is given to agent $i$ for each $1 \leq i \leq n$.

First, note that the agents can reconstruct $s$ by simply adding all of their shares together, since by construction we have $\sum_{i=1}^{n} r_i = s \bmod p$.

Next note that any group of strictly fewer than $n$ agents cannot reconstruct the secret. A straightforward calculation shows that for any strict subset $A \subsetneq \{1, \dots, n\}$, the distribution of $\{r_i\}_{i \in A}$ does not depend on $s$, and therefore provides no information about the value of $s$.
Example 2 (Secure Multi-Party Addition).
One valuable application of Additive Secret Sharing is cryptographically secure computation of the sum of agents' private data. Given $n$ agents with private data $x_1, \dots, x_n$, their objective is to compute $\sigma := \sum_{i=1}^{n} x_i$. For each $1 \leq i \leq n$, agent $i$ shares $x_i$ via Additive Secret Sharing by producing shares $r_{i,1}, \dots, r_{i,n}$, where $r_{i,j}$ is given to agent $j$. At the end of this process, agent $j$ has received $r_{1,j}, \dots, r_{n,j}$ and can compute $q_j := \sum_{i=1}^{n} r_{i,j} \bmod p$. The important observation here is that $q_1, \dots, q_n$ are additive secret shares for $\sigma$. Hence the agents can share the values $q_1, \dots, q_n$ with one another and compute $\sigma$ via
$$\sigma = \sum_{j=1}^{n} q_j = \sum_{j=1}^{n} \sum_{i=1}^{n} r_{i,j} = \sum_{i=1}^{n} x_i \bmod p.$$
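The following sketch simulates Additive Secret Sharing and the multi-party addition above in a single process; the prime modulus is an illustrative choice, and in a real deployment each list index below would be a message sent over a private channel.

```python
import secrets

P = 2**61 - 1  # a large prime modulus (illustrative choice)

def additive_shares(value, n):
    """Split `value` into n additive shares modulo P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Secure multi-party addition: agent i deals shares of x_i; agent j then
# locally sums the j-th share of every input. The per-agent sums q_j are
# themselves additive shares of the total.
inputs = [3, 1, 4]
n = len(inputs)
dealt = [additive_shares(x, n) for x in inputs]
q = [sum(dealt[i][j] for i in range(n)) % P for j in range(n)]
assert reconstruct(q) == sum(inputs)
```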
3.2.2 Secure Multi-Party Multiplication via Shamir Secret Sharing
Shamir Secret Sharing [23] offers a more general $t$-of-$n$ method for secret sharing. In a setting with $n$ agents and a secret $s$ to be shared among them, a $t$-of-$n$ secret sharing scheme assigns shares to the agents so that any subset of $t$ agents can recover $s$, but no subset of fewer than $t$ agents can recover $s$. Note that Additive Secret Sharing is an $n$-of-$n$ scheme.

Shamir's Secret Sharing is based on the fact that a degree $t-1$ polynomial is uniquely determined from $t$ evaluations. As in the Additive Secret Sharing setting, a large prime $p$ is chosen. To share a secret value $s$, the sharer generates a random degree $t-1$ polynomial
$$\phi(z) := s + a_1 z + a_2 z^2 + \dots + a_{t-1} z^{t-1} \bmod p,$$
where $a_1, \dots, a_{t-1}$ are independent and uniformly distributed over $\mathbb{Z}_p$. The share given to agent $i$ is $\phi(i)$. By construction, the shares and coefficients satisfy the following linear relationship:
$$\begin{bmatrix} \phi(1) \\ \phi(2) \\ \vdots \\ \phi(n) \end{bmatrix} = V \begin{bmatrix} s \\ a_1 \\ \vdots \\ a_{t-1} \end{bmatrix} \bmod p,$$
where $V \in \mathbb{Z}_p^{n \times t}$ with entries $V_{ij} = i^{\,j-1}$ is the Vandermonde matrix.

Since $\phi$ is a degree $t-1$ polynomial, any group of $t$ agents can solve a linear system to obtain the values of $s, a_1, \dots, a_{t-1}$ and thus recover the secret.

However, for any subset $A$ with $|A| < t$, a careful calculation shows that the distribution of $\{\phi(i)\}_{i \in A}$ does not depend on $s$ and hence provides no information about the value of $s$.
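A compact sketch of Shamir Secret Sharing is below (reusing `secrets` and `P` from the previous sketch); reconstruction is done by Lagrange interpolation at $0$, which is equivalent to solving the Vandermonde system above.

```python
def shamir_shares(secret, n, t):
    """t-of-n Shamir shares: evaluate a random degree-(t-1) polynomial with
    constant term `secret` at the points 1, ..., n (all arithmetic mod P)."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
    poly = lambda z: sum(c * pow(z, k, P) for k, c in enumerate(coeffs)) % P
    return [poly(i) for i in range(1, n + 1)]

def shamir_reconstruct(points):
    """Interpolate the sharing polynomial at 0 from pairs (i, share)."""
    total = 0
    for i, share in points:
        num = den = 1
        for j, _ in points:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        total = (total + share * num * pow(den, P - 2, P)) % P  # P is prime
    return total

shares = shamir_shares(42, n=5, t=3)
assert shamir_reconstruct(list(enumerate(shares, start=1))[:3]) == 42
```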
Example 3 (Secure Multi-Party Multiplication).
Shamir Secret Sharing and Additive Secret Sharing can be used together to perform cryptographically secure multiplication. Given $n$ agents with additive secret shares $a_1, \dots, a_n$ and $b_1, \dots, b_n$ for the values $A$ and $B$ respectively, so that $\sum_{i=1}^{n} a_i = A \bmod p$ and $\sum_{i=1}^{n} b_i = B \bmod p$, the goal is for the agents to compute the product $AB$ in a cryptographically secure way.

The computation of $AB$ will require one round of communication. In this communication round, each agent $i$ performs Shamir secret sharing for its values $a_i$ and $b_i$. Specifically, it generates two random polynomials $\phi_i, \psi_i$ of degree $\lfloor (n-1)/2 \rfloor$ so that $\phi_i(0) = a_i$ and $\psi_i(0) = b_i$, and then sends $(\phi_i(j), \psi_i(j))$ to agent $j$ for each $1 \leq j \leq n$.

After the communication, agent $j$ obtains $\{(\phi_i(j), \psi_i(j))\}_{i=1}^{n}$. From these it computes $u_j := \sum_{i=1}^{n} \phi_i(j)$ and $v_j := \sum_{i=1}^{n} \psi_i(j)$. Note that $u_1, \dots, u_n$ are Shamir shares for the polynomial $\Phi := \sum_{i=1}^{n} \phi_i$ and similarly $v_1, \dots, v_n$ are Shamir shares for the polynomial $\Psi := \sum_{i=1}^{n} \psi_i$. Since the polynomials $\phi_i, \psi_i$ all have degree at most $\lfloor (n-1)/2 \rfloor$, the polynomials $\Phi, \Psi$ also have degree at most $\lfloor (n-1)/2 \rfloor$. Thus if we define the polynomial $\Pi := \Phi \Psi$, then $\Pi$ has degree at most $n-1$.

Now agent $j$ computes $w_j := u_j v_j$. By definition, $w_1, \dots, w_n$ are Shamir shares for the polynomial $\Pi$. Noting that
$$\Pi(0) = \Phi(0)\, \Psi(0) = \left( \sum_{i=1}^{n} a_i \right) \left( \sum_{i=1}^{n} b_i \right) = AB \bmod p,$$
we have a degree $n-1$ polynomial whose constant term is the desired value $AB$. Furthermore, the agents know the value of $\Pi$ at the points $1, 2, \dots, n$ and hence can solve a linear system to obtain the coefficients of $\Pi$ and thus obtain $AB = \Pi(0)$.

The Shamir shares $w_1, \dots, w_n$ can be converted into additive shares for $AB$ by setting $c_j := \lambda_j w_j$, where $\lambda_j$ is the $(1, j)$ entry of $V^{-1}$ for the square ($t = n$) Vandermonde matrix $V$.
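Putting the pieces together, here is a simulated version of this multiplication protocol, reusing `P`, `additive_shares`, `shamir_shares`, and `shamir_reconstruct` from the sketches above; the in-process lists again stand in for messages between agents.

```python
def secure_multiply(a_shares, b_shares):
    """Compute A*B from additive shares of A and B held by n agents."""
    n = len(a_shares)
    t = (n + 1) // 2                    # polynomial degree t-1 <= (n-1)/2

    # Round 1: agent i Shamir-shares its additive shares a_i and b_i.
    a_dealt = [shamir_shares(a, n, t) for a in a_shares]
    b_dealt = [shamir_shares(b, n, t) for b in b_shares]

    # Agent j sums what it received: Shamir shares u_j of A and v_j of B.
    u = [sum(col) % P for col in zip(*a_dealt)]
    v = [sum(col) % P for col in zip(*b_dealt)]

    # The pointwise products w_j are evaluations of a degree <= n-1
    # polynomial whose constant term is A*B; interpolate it at 0.
    w = [(uj * vj) % P for uj, vj in zip(u, v)]
    return shamir_reconstruct(list(enumerate(w, start=1)))

assert secure_multiply(additive_shares(6, 5), additive_shares(7, 5)) == 42
```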
Remark 7 (Honest Majority Regime).
We would like to mention that the Secure Multi-Party Multiplication scheme we describe in this section requires an honest majority assumption on user behavior. This means that the scheme requires at least $\lceil (n+1)/2 \rceil$ users to be fully honest, meaning that they will follow the protocol and will not collude in any way with any other users. The remaining users are assumed to be semi-honest. This is a stronger condition than Assumption 2, which only requires that all users are semi-honest. There exist Secure Multi-Party Multiplication schemes for the semi-honest setting based on Beaver Triples [24], but the scheme we presented in this section is more computationally efficient, and in practice honest majority may not be an unreasonable assumption.
3.3 Cryptographically Secure and Differentially Private Estimation of Travel Times
In this section we show how the differential privacy and secret sharing tools we have discussed can be used to construct a privacy-preserving and decentralized protocol for travel time estimation, which can then be used by a routing service to recommend routes as per Remark 2. The protocol is described in Algorithm 1.
In order to satisfy the privacy requirements as stated in Definition 7, our protocol must be both differentially private and cryptographically secure. Recall from Section 3.1 that we will use the Laplace mechanism to obtain a differentially private estimate of the traffic counts, which is defined below:
$$\hat{y}_{kT} := y_{kT} + W,$$
where $W \in \mathbb{R}^m$ is an i.i.d. vector of $\mathrm{Lap}(0, 2/\epsilon)$ distributed random variables. We thus need to compute $\hat{y}_{kT}$ in a decentralized and cryptographically secure way.
In this section we demonstrate how to compute one entry of the vector $\hat{y}_{kT}$, i.e., $\hat{y}_{kT}[e]$ for an edge $e \in E$. The computation of the entire vector is obtained by parallelizing the computation across all entries.
We begin by choosing a large prime integer $p$. Inverse transform sampling is a method which can transform a uniform random variable into a random variable with a desired distribution. Using this method we can compute $W[e] = h(U)$, where $U$ is uniformly distributed over $\mathbb{Z}_p$ and $h$ applies the inverse cumulative distribution function of $\mathrm{Lap}(0, 2/\epsilon)$ to the rescaled value $U/p$. Concretely, $h$ is given by
$$h(u) := -\frac{2}{\epsilon} \, \mathrm{sgn}\!\left( \frac{u}{p} - \frac{1}{2} \right) \ln\!\left( 1 - 2 \left| \frac{u}{p} - \frac{1}{2} \right| \right).$$
To make the sampling of $W[e]$ more computationally efficient, we will approximate $h$ with a degree $d$ polynomial $\hat{h}$. A larger degree $d$ leads to a more accurate approximation of the Laplace distribution but comes at a computational cost.
Thus approximately computing $\hat{y}_{kT}[e]$ amounts to computing $y_{kT}[e] + \hat{h}(U)$, where $U$ is required to be uniformly distributed over $\mathbb{Z}_p$. First, additive shares for $y_{kT}[e]$ can be computed using Secure Multi-Party Addition as is done in Example 2, with user $i$ contributing $[s_i(kT)]_e$. Next, note that if $u_1, \dots, u_{n_{kT}}$ are independent and uniformly random on $\mathbb{Z}_p$, then $U := \sum_{i=1}^{n_{kT}} u_i \bmod p$ is also uniformly distributed. Therefore each user $i$ will draw a random value $u_i$ so that the value $U$ will be uniformly distributed. Using Secure Multi-Party Addition, the users can obtain additive shares for $U$. The users will need to compute $\hat{h}(U)$. Since $\hat{h}$ is a polynomial of degree $d$, the users will need additive shares for $U^2, U^3, \dots, U^d$ for the computation of $\hat{h}(U)$. The users can obtain such shares through Secure Multi-Party Multiplication by repeatedly multiplying $U$ with itself using Shamir and Additive Secret Sharing as described in Example 3. Using this method the users obtain additive shares $z_i^{(2)}, z_i^{(3)}, \dots, z_i^{(d)}$ for $U^2, U^3, \dots, U^d$ respectively. Letting $c_0, c_1, \dots, c_d$ be the coefficients of $\hat{h}$ so that $\hat{h}(u) = \sum_{\ell=0}^{d} c_\ell u^\ell$, the users can now construct additive shares for $\hat{y}_{kT}[e]$ by taking linear combinations of previously computed shares. Explicitly, the shares
$$z_i := [s_i(kT)]_e + c_0 \, \mathbb{1}\{i = 1\} + c_1 u_i + \sum_{\ell=2}^{d} c_\ell z_i^{(\ell)}$$
are additive shares for $\hat{y}_{kT}[e]$, since by construction we have
$$\sum_{i=1}^{n_{kT}} z_i = y_{kT}[e] + c_0 + c_1 U + \sum_{\ell=2}^{d} c_\ell U^\ell = y_{kT}[e] + \hat{h}(U) \bmod p.$$
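The sampling step can be prototyped centrally as follows. The polynomial degree, quantile grid, and tail truncation are illustrative choices, and the scale $2/\epsilon$ is the one assumed in Section 3.1; in the protocol itself $U$ is jointly sampled through secret sharing and $\hat{h}$ is evaluated on shares, whereas here $U$ is drawn directly for illustration.

```python
import numpy as np

def laplace_inverse_cdf(q, b):
    """F^{-1}(q) for Lap(0, b), valid for q in (0, 1)."""
    return -b * np.sign(q - 0.5) * np.log(1.0 - 2.0 * np.abs(q - 0.5))

eps, d = 0.5, 9
# Fit h_hat on a truncated quantile range: F^{-1} diverges at 0 and 1, so the
# polynomial approximation can only be accurate away from the extreme tails.
q_grid = np.linspace(0.001, 0.999, 2001)
h_hat = np.poly1d(np.polyfit(q_grid, laplace_inverse_cdf(q_grid, 2.0 / eps), d))

u = np.random.default_rng(0).uniform()  # stands in for U / p
noise_sample = h_hat(u)
```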
Algorithm 1 describes the procedure that each user performs to enable the decentralized and privacy-preserving computation of $\hat{y}_{kT}[e]$.
4 Analysis: Accuracy of travel times based on $\hat{y}_{kT}$
In the previous section we presented a decentralized and privacy-preserving protocol for computing $\hat{y}_{kT}$, which is a differentially private estimate of the traffic counts at time $kT$ (i.e., the $k$th timestep). Since the traffic counts are useful for travel time estimation through volume delay functions, one natural question is how the travel time estimates obtained from $\hat{y}_{kT}$ will differ from those obtained from the ground truth traffic counts $y_{kT}$. In this section we show that if the roads in the transportation network are sufficiently large, then the travel time estimates obtained from $\hat{y}_{kT}$ will be close to those obtained had we used the non-privacy-preserving ground truth value $y_{kT}$.
We first discuss the errors in estimating the traffic counts due to the Laplace mechanism in Section 4.1. Next, in Section 4.2 we show how the properties of volume delay functions mitigate the errors induced by the Laplace mechanism, and how composing these two together can help achieve accurate and private travel time estimates.
4.1 Accuracy of the Laplace Mechanism
Recall that $\hat{y}_{kT}[e]$ is a differentially private estimate of the traffic count on road $e$ at time $kT$. The mean absolute percentage error (MAPE) of $\hat{y}_{kT}[e]$ as an estimate for $y_{kT}[e]$ is given by
$$\mathrm{MAPE}(\hat{y}_{kT}[e]) = \mathbb{E}\left[ \frac{|\hat{y}_{kT}[e] - y_{kT}[e]|}{y_{kT}[e]} \right] = \frac{\mathbb{E}\left[ |W[e]| \right]}{y_{kT}[e]} = \frac{2}{\epsilon \, y_{kT}[e]}.$$
From this we can make two conclusions. When there is a lot of traffic on $e$, meaning that $y_{kT}[e]$ is much larger than $2/\epsilon$, then $\epsilon \, y_{kT}[e]$ will be large and hence $\hat{y}_{kT}[e]$ will have a small MAPE. However, if $y_{kT}[e]$ is small, then $\hat{y}_{kT}[e]$ will have a large MAPE.
This shows us that the Laplace Mechanism has poor accuracy when reporting small values. In fact, this is true for all differentially private mechanisms since the Laplace Mechanism has the minimum mean absolute error among all differentially private mechanisms [22]. This observation is consistent with Example 1 in that sparse data and small values pose the most difficulty in privacy-preserving efforts.
Fortunately, this bad news does not end our hopes for achieving both accuracy and privacy in travel time estimation. Even if $\hat{y}_{kT}[e]$ may not always be a good estimate for $y_{kT}[e]$, recall that our ultimate objective is travel time estimation, so we are interested in how well $\tau_e(\hat{y}_{kT}[e])$ approximates $\tau_e(y_{kT}[e])$ (recall Definition 4 for a description of $\tau_e$). Next we will show how properties of delay functions can enable accurate travel time estimates even if traffic count estimation is poor.
4.2 Protocol accuracy for travel time estimation
In this section we show that if a road is sufficiently large, then the travel time estimates computed from $\hat{y}_{kT}$ are close to the travel time estimates computed from the ground truth $y_{kT}$. Mathematically, this means that $\tau_e(\hat{y}_{kT}[e])$ is a good estimate for $\tau_e(y_{kT}[e])$.
The key insight behind our result lies in the complementary qualities of differential privacy and volume delay functions. As we saw in Section 4.1, the Laplace mechanism has good accuracy when reporting large values, but poor accuracy for reporting small values. Volume delay functions, on the other hand, are very sensitive when the input is large, but very insensitive when the input is small. When composing a volume delay function with a Laplace mechanism, the complementary qualities manifest in two ways. When the traffic is larger than a road's capacity, the volume delay function is very sensitive. Fortunately, in this case the high accuracy of the Laplace mechanism ensures that the traffic count is estimated accurately, leading to accurate traffic flow estimation, which leads to accurate travel time estimation. On the other hand, when the traffic is below the road's capacity, the Laplace mechanism has poor accuracy; however, the delay function is very insensitive in this regime and is able to tolerate large estimation error, leading to accurate travel time estimation. Thus the Laplace mechanism and volume delay function cover each other's weaknesses to enable accurate travel time estimation for any level of traffic.
We formalize this insight through Theorem 1, which, given desired privacy and accuracy levels $\epsilon$ and $\alpha$ respectively, provides conditions under which $\tau_e(\hat{y}_{kT}[e])$ will be close to $\tau_e(y_{kT}[e])$ with high probability. The condition is determined by the road's $\alpha$-critical traffic count, which is defined below.
Definition 13 ($\alpha$-critical traffic count).
The $\alpha$-critical traffic count of a road $e$ is the number of vehicles on the road in steady state such that the travel time is exactly $\alpha$ times as large as its free-flow travel time. Mathematically, the $\alpha$-critical traffic count of $e$ is
$$k_e(\alpha) := g_e(c_e(\alpha)),$$
where $c_e(\alpha)$ is the $\alpha$-capacity of the road as defined in Definition 5. With this setup in place, we now present Theorem 1.
Theorem 1 (Accuracy of Travel Time Estimates).
Let $\epsilon > 0$ and $\alpha > 1$ specify the desired privacy and accuracy levels respectively, and let $\delta \in (0, 1)$ represent a failure probability. For a road $e$, if $f_e$ satisfies Assumption 1 and
$$k_e(\alpha) \geq \frac{4}{\epsilon} \ln \frac{2}{\delta},$$
where $c_e(\alpha)$ is the $\alpha$-capacity of $e$ and $k_e(\alpha) = g_e(c_e(\alpha))$, then for any value of $y_{kT}[e]$, the following condition is satisfied with probability at least $1 - \delta$:
$$\left| \tau_e(\hat{y}_{kT}[e]) - \tau_e(y_{kT}[e]) \right| \leq (\alpha - 1) \, \tau_e(y_{kT}[e]).$$
4.3 Discussion
One natural and immediate question is whether the requirement on the $\alpha$-critical traffic count of roads in Theorem 1 is satisfied by real road networks. To answer this question we first discuss parameter choices. To ensure a meaningful privacy guarantee for each timestep, $\epsilon$ should be significantly smaller than $1$. For this reason we focus on applications where $\epsilon \leq 0.5$.
With $\epsilon = 0.1$, $\alpha = 1.1$, and $\delta = 0.1$, the condition in Theorem 1 requires that a road's $\alpha$-critical count be at least 127 cars. For such roads, the Theorem states that the estimated travel time will be within 10 percent of the ground truth with probability at least 90 percent. To see whether such a requirement is reasonable, we studied a real world transportation network. The $\alpha$-critical traffic counts of all roads in the Sioux Falls transportation network are plotted in Figure 3. The figure shows that more than 80 percent of the roads in the Sioux Falls network have $\alpha$-critical counts above 127. Thus the conditions required by Theorem 1 are realistic for most roads.
Care must also be taken in choosing the value of $T$, i.e., the amount of time between updates to the estimated traffic counts. Specifically, $T$ should be chosen so that it is similar to the typical travel time of a road in the network. If $T$ is much smaller than the typical travel time on a road, then sample average approximation can be used to denoise $\hat{y}$ and get a better estimate of $y$. While better estimation is usually good, it is also synonymous with less privacy. To illustrate this concept, suppose that for $K$ timesteps the ground truth traffic counts for a given road $e$ are constant, meaning that there is a value $\bar{y}$ so that $y_{kT}[e] = \bar{y}$ for all $1 \leq k \leq K$. Now for each $k$, $\hat{y}_{kT}[e]$ is an unbiased estimator for $\bar{y}$ with variance $8/\epsilon^2$. This variance is necessary to ensure differential privacy. However, note that $\frac{1}{K} \sum_{k=1}^{K} \hat{y}_{kT}[e]$ is also an unbiased estimator for $\bar{y}$, but now has variance $8/(K \epsilon^2)$. This decrease in variance leads to worse privacy guarantees. For this reason, $T$ should be chosen similar to the travel time of a road so that $K$ can never get too large.
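A quick simulation (not from the paper) illustrates the averaging effect described above, using the noise scale $2/\epsilon$ from Section 3.1:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, K, y_bar = 0.5, 16, 40.0
scale = 2.0 / eps
singles = y_bar + rng.laplace(0.0, scale, size=100_000)
averages = y_bar + rng.laplace(0.0, scale, size=(100_000, K)).mean(axis=1)
print(singles.var())   # ~ 2 * scale**2 = 32  (variance of one report)
print(averages.var())  # ~ 32 / K = 2         (variance after averaging K reports)
```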
5 Numerical Experiments
We evaluate the performance of our protocol in the Sioux Falls road network setting. The purpose of these numerical experiments is to compare the travel time experienced by vehicles when they are routed using (a) travel time estimates derived from our protocol and (b) travel time estimates derived from the ground truth traffic counts. In particular we want to quantify the extent of travel time degradation incurred by the use of our protocol’s privacy-preserving mechanisms. Thus, these experiments help us test the practical implications of our road-level theoretical results when viewed in the context of the entire network. In Section 5.1, we present details about the data sources, road network, traffic demand, and the experimental setup. In Section 5.2, we study the impact of our protocol on route choices and travel time. These simulations suggest that the price for privacy may be negligible or even zero, thereby strengthening the case for conducting real-world field studies.
5.1 Setup
The properties of the Sioux Falls road network as well as the typical user demands are obtained from the Transportation Network Test Problems (TNTP) dataset. The Sioux Falls road network consists of 24 nodes and 76 edges. Each edge is characterized by a maximum speed, free flow throughput (i.e., vehicles per hour), and length of the segment. Note that we use the terms edge and road interchangeably. Travel times are computed using BPR functions whose parameters are obtained from the aforementioned edge characteristics. Additionally, the dataset also reports the steady state traffic demand between 528 origin-destination (OD) pairs.
We conduct two types of experiments: private routing and non-private routing. In both types of experiments we simulate traffic flow on the network at a fixed time resolution. At every time step, we draw new demand from a Poisson distribution, with the mean demand proportional to the steady state demand reported in the dataset. Thus, at each time step, we draw a random number of vehicles, each with a corresponding origin and destination. For each vehicle, we identify a shortest travel time route between its origin and destination nodes. In the private routing experiment, the shortest path is computed using the most recent travel time estimates produced by our protocol. In the non-private routing experiment, the shortest path is computed using the ground truth travel time, which depends on the ground truth traffic counts. In both types of experiments, the simulated movement of the vehicles on the road network is determined by the ground truth traffic counts.
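The sketch below shows the shape of one routing step of such a simulation; the toy graph, demand rates, and travel time estimates stand in for the TNTP Sioux Falls data.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(1)
G = nx.DiGraph([(1, 2), (2, 3), (1, 3), (3, 1)])          # toy network
mean_demand = {(1, 3): 2.5}                               # OD pair -> Poisson rate
t_hat = {e: 1.0 + rng.random() for e in G.edges}          # current travel time estimates

def route_new_vehicles(graph, demand, estimates):
    """Draw Poisson demand and route each new vehicle along a shortest path
    under the current (private or ground-truth) travel time estimates."""
    nx.set_edge_attributes(graph, estimates, name="t_hat")
    routes = []
    for (o, d), lam in demand.items():
        for _ in range(rng.poisson(lam)):
            routes.append(nx.shortest_path(graph, o, d, weight="t_hat"))
    return routes

print(route_new_vehicles(G, mean_demand, t_hat))
```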
The duration of the simulation is 2 hours, with additional buffer time at the end for vehicles already in transit to complete their trips. For the private routing experiments, the vehicles are assumed to use our protocol to update travel time estimates every $T$ minutes, with $T$ chosen to limit the number of reports that a vehicle makes from the same road (see Section 4.3). In our experiments, we consider the values $\epsilon = 0.1$ and $\epsilon = 0.5$. We will discuss how our protocol would perform for other values of $\epsilon$ in Section 5.2. We consider three vehicle demand profiles. In the baseline scenario, about 60,120 vehicles are expected to join the road network every hour. We also consider a low demand scenario, where all the OD Poisson parameters are decreased by 50%, and a high demand scenario, where all the OD Poisson parameters are increased by 50%. To gain some more insight into the degree of strain this demand induces on the road network, we refer the reader to Table 1. Here, we report the rate of vehicles being added to the network for each scenario as well as the minimum, maximum, and average road utilization during the simulation period. The road utilization is defined as the ratio $x / c_e$ of traffic flow to capacity appearing in the BPR function from Remark 1. Utilization greater than 1 indicates congestion on a road. The baseline demand profile results in several congested roads, representing a realistic scenario for testing our protocol. Additionally, as expected, increasing the demand rate results in higher road utilization and more congested roads.
| Scenario | Vehicles added per hour | Min. road utilization | Max. road utilization | Avg. road utilization |
|---|---|---|---|---|
| Baseline | 60,100 | 0.05 | 1.15 | 0.52 |
| Low | 30,050 | 0.00 | 0.90 | 0.28 |
| High | 90,150 | 0.07 | 1.47 | 0.74 |

Table 1: Demand rate and road utilization for the three demand scenarios.
5.2 Results

We evaluate the impact of privacy noise on the routes and travel times of vehicles in the network. Table 2 presents several performance measures for $\epsilon = 0.1$ under the three demand profiles. Note that the choice of $\epsilon = 0.1$ is overly conservative, since a smaller $\epsilon$ represents a stricter privacy requirement and thus more noise injected into the system. Typical values of $\epsilon$ chosen in practice are larger than 0.1, depending on the application. Nevertheless, we consider this relatively stringent privacy requirement to understand the performance limits of our protocol.
From Table 2, we first observe that the increase in average travel time for a vehicle is only 8 seconds in the baseline case. This corresponds to a 1.3% increase in travel time. For the low and high demand scenarios, the increases in average travel time are 3.1 sec (0.6%) and 13.2 sec (1.9%) respectively. Vehicles will experience little additional travel time if the routing decisions produced by our protocol closely resemble the decisions that would have been made without any privacy noise. In other words, if the shortest paths on the graph with noisy travel time estimates are the same as (or very similar to) the shortest paths on the graph with accurate travel time estimates, the vehicles will experience very little additional travel time. Our results confirm that this is indeed the case: the routes chosen by almost 90% of the vehicles are unchanged by our protocol. Finally, as we compare the three demand scenarios, we observe that higher demand results in greater congestion and subsequently higher travel times. Furthermore, when demand is higher, the choice of an appropriate route for each vehicle is even more critical to minimizing travel time. Working with noisy estimates of travel time during periods of high congestion can thus lead to incorrect routing choices. This is consistent with our observation that the routes of 90.9% of the cars are unchanged by our protocol when the demand is low, but only 87.1% of the routes remain unchanged when the demand is high.
| Performance measure | Low demand | Baseline demand | High demand |
|---|---|---|---|
| Travel time (sec) | 546.5 | 591.9 | 678.2 |
| Travel time with our protocol (sec) | 549.6 | 599.9 | 691.4 |
| Increase in travel time (sec) | 3.1 | 8.0 | 13.2 |
| Increase in travel time (%) | 0.6 | 1.3 | 1.9 |
| Cars with no change in route (%) | 90.9 | 88.3 | 87.1 |
| Cars with no increase in travel time (%) | 65.9 | 41.3 | 20.6 |

Table 2: Protocol performance with $\epsilon = 0.1$.
| Performance measure | Low demand | Baseline demand | High demand |
|---|---|---|---|
| Travel time (sec) | 546.5 | 591.9 | 678.2 |
| Travel time with our protocol (sec) | 546.5 | 592.2 | 677.4 |
| Increase in travel time (sec) | 0.0 | 0.2 | -0.7 |
| Increase in travel time (%) | 0.0 | 0.0 | -0.1 |
| Cars with no change in route (%) | 98.4 | 97.5 | 94.4 |
| Cars with no increase in travel time (%) | 90.7 | 67.9 | 38.6 |

Table 3: Protocol performance with $\epsilon = 0.5$.
Next, we study the performance of our protocol with a lower privacy setting of $\epsilon = 0.5$. Note that we expect the performance of our protocol to converge to that of the non-private setting as $\epsilon \to \infty$. Interestingly, our results show that even with $\epsilon = 0.5$, the performance of our protocol becomes indistinguishable from the non-private setting. The performance of our protocol for $\epsilon = 0.5$ is shown in Table 3. We observe that the increase in travel time is nearly 0 for all three demand scenarios. In fact, the randomness in the travel time estimates can even result in marginal improvements in travel time in some settings (reflected as a negative increase in travel time). Not surprisingly, the routes chosen by nearly all the cars are also unaffected by our privacy-preserving protocol. The results from varying $\epsilon$ are very encouraging: for a reasonable privacy requirement, we are able to get privacy for 'free' with no loss in system performance. Although not shown for brevity, our experiments indicated that cars observe no increase in average travel time for all values of $\epsilon \geq 0.5$.
The results from these experiments are significantly better than what is guaranteed by Theorem 1. As we discussed in Section 4.3, Theorem 1 promises that the estimated travel time on each road will be within 10 percent of the ground truth at least 90 percent of the time when the road's $\alpha$-critical traffic count is at least 127. Our numerical results in Table 2 show that even with the stringent privacy setting $\epsilon = 0.1$, our protocol only introduces an overhead of 8 seconds in the baseline case and 13.2 seconds in the high demand case. We believe this is because real world road networks have redundancy. By this we mean that there are many near-optimal paths from an origin to a destination, so it is very likely that at least one of these paths has an accurately estimated travel time. Theorem 1 looks only at the edge level and thus does not exploit favorable network topologies. On a positive note, Theorem 1 is general in the sense that it can be applied to a network with any topology and any demand structure.
6 Conclusion
In this paper we propose a protocol for a decentralized routing service where travel times are computed from user location data in a privacy-preserving way. In most current routing services, users give their individual location data directly to routing services in exchange for route recommendations. Since this data is associated with the users' identity, users' schedules, habits, preferences and other private information can be inferred through repeated interactions with routing services. Contrary to this, the protocol proposed in this paper is both differentially private and cryptographically secure, meaning that only the aggregate effect of traffic on travel time is obtainable from the protocol, and users' individual location data cannot be inferred by other parties. We also show that for large roads, it is possible to estimate travel time both accurately and privately. This is due to complementary qualities of differential privacy and delay functions. We evaluated the performance of the protocol through simulation on the Sioux Falls transportation network and showed that the protocol incurs minimal performance overhead in practice while providing a principled privacy guarantee.
There are many interesting and important directions for future work. The first direction is related to finding a more refined definition for privacy. Travel time estimation in the literature is often based on flow or average speed of vehicles on the road. However, we chose to estimate travel times based on traffic counts due to compatibility with differential privacy. More specifically, without additional domain-specific assumptions, it is impossible to compute flow or average speed in both an accurate and differentially private way (see Remark 5). Thus while differential privacy is a general and powerful concept, it is perhaps too restrictive for some common mobility applications such as flow or speed estimation. Developing a more specialized notion of privacy for mobility applications could enable more algorithmic possibilities while retaining meaningful privacy guarantees. The second direction is related to adoption rate. In this paper we implicitly assume that all vehicles in the network are willing to participate in the protocol, though technically a uniform and known adoption rate would be sufficient. While we believe this is a reasonable assumption in an era of connected vehicles, developing a protocol that is agnostic to participation rate would provide robustness. A third direction is related to other applications. Developing decentralized and privacy-preserving pricing for roads and for mobility services would be an interesting direction.
References
- [1] A. R. Beresford and F. Stajano, “Location privacy in pervasive computing,” IEEE Pervasive computing, vol. 2, no. 1, pp. 46–55, 2003.
- [2] C.-Y. Chow, M. F. Mokbel, and X. Liu, “Spatial cloaking for anonymous location-based services in mobile peer-to-peer environments,” GeoInformatica, vol. 15, no. 2, pp. 351–380, 2011.
- [3] C. Dwork, “Differential privacy: A survey of results,” in International conference on theory and applications of models of computation, pp. 1–19, Springer, 2008.
- [4] O. Goldreich, S. Micali, and A. Wigderson, “How to play any mental game or A completeness theorem for protocols with honest majority,” in STOC 1987, New York, New York, USA (A. V. Aho, ed.), pp. 218–229, ACM, 1987.
- [5] L. Sweeney, “k-anonymity: A model for protecting privacy,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 05, pp. 557–570, 2002.
- [6] M. Ghasemzadeh, B. C. Fung, R. Chen, and A. Awasthi, “Anonymizing trajectory data for passenger flow analysis,” Transportation research part C: emerging technologies, vol. 39, pp. 63–79, 2014.
- [7] B. Y. He and J. Y. Chow, “Optimal privacy control for transport network data sharing,” Transportation Research Part C: Emerging Technologies, vol. 113, 2020.
- [8] C. Dwork, F. McSherry, K. Nissim, and A. D. Smith, “Calibrating noise to sensitivity in private data analysis,” in Theory of Cryptography Conference, vol. 3876, pp. 265–284, Springer, 2006.
- [9] Z. Wang, J. Hu, R. Lv, J. Wei, Q. Wang, D. Yang, and H. Qi, “Personalized privacy-preserving task allocation for mobile crowdsensing,” IEEE Transactions on Mobile Computing, vol. 18, no. 6, pp. 1330–1341, 2018.
- [10] K. Yan, G. Lu, G. Luo, X. Zheng, L. Tian, and A. M. V. V. Sai, “Location privacy-aware task bidding and assignment for mobile crowd-sensing,” IEEE Access, vol. 7, pp. 131929–131943, 2019.
- [11] Y. Gong, C. Zhang, Y. Fang, and J. Sun, “Protecting location privacy for task allocation in ad hoc mobile cloud computing,” IEEE Transactions on Emerging Topics in Computing, vol. 6, no. 1, pp. 110–121, 2015.
- [12] M. E. Gursoy, L. Liu, S. Truex, and L. Yu, “Differentially private and utility preserving publication of trajectory data,” IEEE Transactions on Mobile Computing, vol. 18, no. 10, pp. 2315–2329, 2018.
- [13] K. Al-Hussaeni, B. C. Fung, F. Iqbal, G. G. Dagher, and E. G. Park, “Safepath: Differentially-private publishing of passenger trajectories in transportation systems,” Computer Networks, vol. 143, pp. 126–139, 2018.
- [14] R. Dong, W. Krichene, A. M. Bayen, and S. S. Sastry, “Differential privacy of populations in routing games,” in 2015 54th IEEE Conference on Decision and Control (CDC), pp. 2798–2803, IEEE, 2015.
- [15] Y. Li, D. Yang, and X. Hu, “A differential privacy-based privacy-preserving data publishing algorithm for transit smart card data,” Transportation Research Part C: Emerging Technologies, vol. 115, p. 102634, 2020.
- [16] M. Wernke, F. Dürr, and K. Rothermel, “Pshare: Ensuring location privacy in non-trusted systems through multi-secret sharing,” Pervasive and Mobile Computing, vol. 9, no. 3, pp. 339–352, 2013.
- [17] W. Ren and S. Tang, “Egeoindis: An effective and efficient location privacy protection framework in traffic density detection,” Vehicular Communications, vol. 21, p. 100187, 2020.
- [18] T. Li, L. Lin, and S. Gong, “Autompc: Efficient multi-party computation for secure and privacy-preserving cooperative control of connected autonomous vehicles,” in SafeAI@ AAAI, 2019.
- [19] J. Zhang, F. Yang, Z. Ma, Z. Wang, X. Liu, and J. Ma, “A decentralized location privacy-preserving spatial crowdsourcing for internet of vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 4, pp. 2299–2313, 2020.
- [20] S. Zou, J. Xi, G. Xu, M. Zhang, and Y. Lu, “Crowdhb: A decentralized location privacy-preserving crowdsensing system based on a hybrid blockchain network,” IEEE Internet of Things Journal, 2021.
- [21] V. Pandurangan, “Riding with the stars: Passenger privacy in the nyc taxicab dataset.” Available Online, 2014.
- [22] Q. Geng and P. Viswanath, “The optimal mechanism in differential privacy,” in 2014 IEEE International Symposium on Information Theory, 2014.
- [23] A. Shamir, “How to share a secret,” Commun. ACM, vol. 22, no. 11, pp. 612–613, 1979.
- [24] I. Damgård, M. Keller, E. Larraia, V. Pastro, P. Scholl, and N. P. Smart, “Practical covertly secure MPC for dishonest majority - or: Breaking the SPDZ limits,” in Computer Security - ESORICS 2013 - 18th European Symposium on Research in Computer Security, Egham, UK, September 9-13, 2013. Proceedings (J. Crampton, S. Jajodia, and K. Mayes, eds.), vol. 8134 of Lecture Notes in Computer Science, pp. 1–18, Springer, 2013.
Appendix A Proof of Theorem 1
We prove Theorem 1 by proving the following Lemma:
Lemma 1.
For any $y \geq 0$, the following inequality
$$\left| \tau_e(y + W[e]) - \tau_e(y) \right| \leq (\alpha - 1) \, \tau_e(y)$$
is satisfied with probability at least $1 - \delta$ when $k_e(\alpha) \geq \frac{4}{\epsilon} \ln \frac{2}{\delta}$.
The Theorem follows by applying the result with $y = y_{kT}[e]$. We will prove Lemma 1 by considering two exhaustive cases:
- Case 1: $y \leq k_e(\alpha)/2$,
- Case 2: $y > k_e(\alpha)/2$.
A.1 Proving Lemma 1 in Case 1
In Case 1 we have $y \leq k_e(\alpha)/2$. By the condition in Theorem 1, $k_e(\alpha)/2 \geq \frac{2}{\epsilon} \ln \frac{2}{\delta}$, and thus $y + \frac{2}{\epsilon} \ln \frac{2}{\delta} \leq k_e(\alpha)$.

Next, note that
$$\mathbb{P}\left[ |W[e]| \geq \frac{2}{\epsilon} \ln \frac{2}{\delta} \right] \overset{(a)}{=} \exp\left( - \ln \frac{2}{\delta} \right) = \frac{\delta}{2} \leq \delta,$$
where $(a)$ is due to the formula for the cumulative distribution function of the Laplace distribution. Therefore, with probability at least $1 - \delta$, both $y$ and $y + W[e]$ are less than $k_e(\alpha)$. For the remainder of the Case 1 discussion we will condition on the high probability event that both $y$ and $y + W[e]$ are less than $k_e(\alpha)$. On this event, both $\tau_e(y)$ and $\tau_e(y + W[e])$ lie in the interval $[f_e(0), \alpha f_e(0)]$ by the definition of the $\alpha$-capacity, so
$$\left| \tau_e(y + W[e]) - \tau_e(y) \right| \leq (\alpha - 1) f_e(0) \leq (\alpha - 1) \, \tau_e(y),$$
which proves Lemma 1 in Case 1.
A.2 Proving Lemma 1 in Case 2
For Case 2 we have $y > k_e(\alpha)/2$. Our analysis for this case will involve $(g_e^{-1})'$ and $\tau_e'$, so we will compute them using the chain rule. Since $g_e(x) = x f_e(x)$, we have $g_e'(x) = f_e(x) + x f_e'(x)$, and
$$(g_e^{-1})'(y) = \frac{1}{g_e'(g_e^{-1}(y))}.$$
Using this, we can now compute $\tau_e' = (f_e \circ g_e^{-1})'$ using the chain rule:
$$\tau_e'(y) = f_e'(g_e^{-1}(y)) \cdot (g_e^{-1})'(y) = \frac{f_e'(g_e^{-1}(y))}{g_e'(g_e^{-1}(y))}.$$
Defining $x := g_e^{-1}(y)$, we see that
$$\tau_e'(y) = \frac{f_e'(x)}{f_e(x) + x f_e'(x)}.$$
We make two observations that will be useful later:

Observation 1.
Since $f_e$ is positive under Assumption 1, we have $\tau_e'(y) \leq \frac{f_e'(x)}{x f_e'(x)} = \frac{1}{x} = \frac{1}{g_e^{-1}(y)}$.

Observation 2.
$g_e^{-1}(z) \geq \frac{z}{\alpha f_e(0)}$ for any $0 \leq z \leq k_e(\alpha)$. This is because since $f_e(x) \leq \alpha f_e(0)$ for all $x \leq c_e(\alpha)$, we have $z = g_e^{-1}(z) \cdot f_e(g_e^{-1}(z)) \leq \alpha f_e(0) \, g_e^{-1}(z)$. Therefore $g_e^{-1}(z) \geq z / (\alpha f_e(0))$.
With this setup in hand, we are now ready to prove the Lemma. First note that
$$\mathbb{P}\left[ |W[e]| \geq \frac{2}{\epsilon} \ln \frac{1}{\delta} \right] \overset{(a)}{=} \exp\left( - \frac{\epsilon}{2} \cdot \frac{2}{\epsilon} \ln \frac{1}{\delta} \right) = \delta,$$
where $(a)$ is due to the fact that $|W[e]|$ has the exponential distribution with parameter $\epsilon/2$, and this distribution has the cumulative distribution function $z \mapsto 1 - e^{-\epsilon z / 2}$. Thus with probability at least $1 - \delta$, we have $|W[e]| \leq \frac{2}{\epsilon} \ln \frac{1}{\delta}$. For the remainder of the Case 2 discussion we will condition on the high probability event that $|W[e]| \leq \frac{2}{\epsilon} \ln \frac{1}{\delta}$.
By the fundamental theorem of calculus,
$$\left| \tau_e(y + W[e]) - \tau_e(y) \right| = \left| \int_{y}^{y + W[e]} \tau_e'(u) \, du \right| \overset{(a)}{\leq} \left| \int_{y}^{y + W[e]} \frac{du}{g_e^{-1}(u)} \right| \overset{(b)}{\leq} \frac{|W[e]|}{g_e^{-1}(y - |W[e]|)} \overset{(c)}{\leq} \frac{(2/\epsilon) \ln(1/\delta)}{g_e^{-1}\!\left( k_e(\alpha)/2 - (2/\epsilon) \ln(1/\delta) \right)} \overset{(d)}{\leq} \frac{\alpha f_e(0) \cdot (2/\epsilon) \ln(1/\delta)}{k_e(\alpha)/2 - (2/\epsilon) \ln(1/\delta)},$$
where $(a)$ is due to Observation 1 (recall that $x$ was defined to be $g_e^{-1}(y)$), $(b)$ is due to the fact that $g_e$ is increasing, therefore $g_e^{-1}(u)$ is an increasing function of $u$ and hence $1/g_e^{-1}(u)$ is a decreasing function of $u$, $(c)$ is because $y - |W[e]| \geq k_e(\alpha)/2 - (2/\epsilon)\ln(1/\delta)$ since we are in the event that $|W[e]| \leq (2/\epsilon)\ln(1/\delta)$, and $(d)$ is due to Observation 2, noting that $g_e$ is an increasing function since it is the product of $x$ and $f_e(x)$, which are both non-decreasing. Since the condition of Theorem 1 guarantees that the final quantity above is at most $(\alpha - 1) f_e(0) \leq (\alpha - 1) \tau_e(y)$, we have
$$\left| \tau_e(y + W[e]) - \tau_e(y) \right| \leq (\alpha - 1) \, \tau_e(y),$$
which proves Lemma 1 in Case 2.