Privacy-Preserving Taxi-Demand Prediction Using Federated Learning

Yumeki Goto Osaka University, Japan
[email protected] Tomoya Matsumoto Osaka University, Japan
[email protected] Hamada Rizk {@IEEEauthorhalign} Naoto Yanai Osaka University, Japan
Tanta University, Egypt
[email protected] Osaka University, Japan
[email protected] Hirozumi Yamaguchi Osaka University, Japan
[email protected]

Abstract

Taxi-demand prediction is an important application of machine learning that enables taxi-providing facilities to optimize their operations and city planners to improve transportation infrastructure and services. However, the use of sensitive data in these systems raises concerns about privacy and security. In this paper, we propose the use of federated learning for taxi-demand prediction that allows multiple parties to train a machine learning model on their own data while keeping the data private and secure. This can enable organizations to build models on data they otherwise would not be able to access. Evaluation with real-world data collected from 16 taxi service providers in Japan over a period of six months showed that the proposed system can predict the demand level accurately within 1% error compared to a single model trained with integrated data.

Index Terms:

Taxi demand, federated learning, trajectory generation, transportation system

I Introduction

The utilization of spatio-temporal location data has immense potential to enhance the availability and improvement of various services, especially data-driven approaches, which can train intelligent models in different domains, such as transportation, urban planning, and emergency management. One such service, taxi transportation, is a critical component of modern urban transportation systems, providing convenient and efficient transportation to a wide range of passengers. However, there is often a mismatch between the supply of taxis and passenger demand, leading to decreased profits for taxi providers due to increased cruising times, fuel consumption, and longer wait times for customers.

To address this issue, taxi-demand prediction systems have been proposed that utilize data-driven approaches to predict taxi demand and optimize dispatch processes [1, 2]. Machine or deep learning models are trained with real customer mobility data to forecast future taxi demand in a specific geographic area. This training data includes pickup and drop-off locations, routes taken, and timing information of customers. However, sharing such trajectory data raises significant privacy concerns as it could reveal intimate personal details, such as individuals’ whereabouts, movement patterns, and even their religious, political, or sexual convictions, through the prediction of Points of Interest (POI) using mapping data and coordinates. facilities may have different legal and regulatory requirements that they need to comply with. These requirements can vary between countries and regions and need to be considered when working with data from different facilities.

Various privacy-preserving methods [3, 4, 5, 6, 7, 8, 9] have been proposed to address privacy concerns associated with personal data. These methods aim to protect the privacy of individuals by anonymizing the data before sharing it. Differential privacy is a method that introduces randomness into data, making it difficult for an attacker to determine the identity of individuals [6]. K-anonymity groups individuals into groups with similar characteristics, making it difficult to determine the identity of any individual[3]. L-diversity and t-closeness are other privacy-preserving methods that generalize data to prevent sensitive information disclosure[10, 11, 12]. Secure computation allows for the computation of a function on private data without revealing it [13, 14, 15]. While these methods can protect privacy, they can also result in a loss of data quality and quantity, negatively impacting the performance of the service (e.g., the prediction accuracy of taxi demand). Thus, it is important to weigh the trade-off between privacy and performance when choosing a privacy-preserving method.

In this paper, we propose a novel taxi-demand prediction system that prioritizes customer privacy and builds the model without necessitating sharing data. This can be achieved by employing federated learning that allows multiple parties to train a machine learning model on their own data while keeping the data private and secure. In the context of taxi-demand prediction, this could be useful because it allows multiple facilities (e.g., taxi service providers) to collaborate on building a demand prediction model without sharing their proprietary data with each other. This can lead to more accurate predictions, as the model is able to learn from a larger and more diverse dataset.

However, the application of federated learning in this context faces a generalization problem as the local models are trained with absolute latitude-longitude values associated with each facility’s data. The use of absolute latitude-longitude values may exhibit region-dependence characteristics that affect the generalization ability and convergence of the global prediction model. To address this challenge, the system incorporates a number of techniques to encode the absolute latitude-longitude values into a region-independent space, making the model more versatile and applicable to different geographical areas.

Refer to caption — Figure 1: An example of how the taxi demand is biased toward no or low level in an area of one facility. X-Y are the lat-long values and the boxes represent the number of taxi requests in this spot at a specific time.

The proposed system was subjected to a rigorous evaluation using real-world data gathered from 16 taxi service providers in Japan. The data was collected over a six-month period and employed to evaluate the system’s effectiveness in maintaining prediction performance while preserving passenger privacy. The results obtained from the evaluation confirm that the proposed system, which utilizes federated learning and associated modules, achieves a comparable accuracy level with a negligible reduction of less than 1% in accuracy compared to non-federated learning approaches that require sharing of customer data among facilities.

The rest of the paper is organized as follows: Section II contains related works. Section III explains our federated learning system for taxi-demand prediction in detail. Section IV discusses evaluations of the system. Finally, the conclusions are discussed in Section V.

II Related Work

This section describes taxi-demand prediction and privacy-preserving machine learning, including federated learning in spatiotemporal data and several privacy-preserving notions, as related works.

II-A Taxi-Demand Prediction

The prediction of taxi demand has recently garnered considerable attention, owing to the abundance of large-scale spatiotemporal data that facilitates the training of deep neural networks, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks.

Recent studies have leveraged both spatial and temporal characteristics to predict taxi demand with greater accuracy. For example, [16] employs a CNN to capture spatial features and an LSTM to capture temporal features, resulting in improved accuracy compared to methods that only consider semantic, spatial, or temporal information. [17] recognizes the existence of spatiotemporal correlations between pick-up and drop-off locations and proposes a taxi-demand prediction model using multitask learning, which predicts both pick-up and drop-off locations as interrelated tasks. This approach leads to more accurate prediction results.

Other studies have focused on accounting for the heterogeneity of taxi demand across regions. [18] clusters taxi-demand data and trains region-specific models to predict demand, taking into account the unique distribution and temporal variability of demand in each region. While these machine learning-based methods have shown promising results when applied to spatiotemporal data, they do not consider privacy threats associated with sharing users’s data, even anonymized. The methods proposed in [19, 20] represent groundbreaking approaches to sharing synthetic versions of data by utilizing generative adversarial networks, thereby enabling secure data publication.

In contrast, our proposed system evaluates the accuracy of taxi-demand prediction while preserving privacy. The system uses federated learning to avoid sharing sensitive customer data.

II-B Privacy-Preserving Machine Learning

The main motivation for federated learning in spatiotemporal data is for privacy-preserving on heterogeneous data [21] that may cause a model drift problem for conventional training algorithms. Federated learning in spatiotemporal data is often discussed in actual application environments, i.e., urban [22, 23], renewable energy [24], and robotics [25]. In this paper, we discuss taxi-demand prediction as an application environment different from the above existing works.

The most popular approach for privacy-preserving machine learning is differential privacy [26] which provides theoretical security. Differential privacy is used for gradient computation [27], and it can theoretically prevent data recovery [28]. There are results on differential privacy of federated learning [29] and developments of libraries [30, 31, 32]. However, differential privacy deteriorates accuracy significantly.

Another approach for providing privacy is to achieve definitions such as k-anonymity [33, 34] and l-diversity [35]. The k-anonymity requires each record to share the same values with at least k-1 other records in the dataset while the l-diversity requires each equivalence class to contain at least l sensitivities. Similar to differential privacy, accuracies of machine learning models based on these notions deteriorate [36, 37].

III The System Details

This section describes the proposed system in detail. The virtual gridding module and its resultant taxi-demand prediction model are first described. Then, the federated learning is described.

III-A The Virtual Gridding Module

The Virtual Gridding module is a crucial component that operates during both the online and offline phases of the system. In the offline phase, the module processes historical trajectory data to construct a comprehensive demand profile for the city. This profile is then used to train the machine learning models that power the demand prediction functionality of the system. The module achieves this by transforming the raw trajectory data collected from taxi drivers into a more manageable and interpretable format.

To accomplish this, the module creates a virtual grid, dividing the city map into evenly spaced grid cells that correspond to specific locations. By tracking the number of pick-up and drop-off events within each cell during a specified time-slot, the module accurately calculates the total demand events for each area. This approach enables the system to provide a high-level overview of the taxi demand across various regions of the city. The resulting demand patterns can then be leveraged to train machine learning models for predicting the number of demand events accurately in different cells. Furthermore, the grid-based visualization of the demand patterns can be used to identify areas of high or low demand quickly.

During the online phase, this module converts any latitude and longitude coordinate into the corresponding grid cell in real-time. This cell ID can be fed into the trained demand prediction model to make accurate real-time predictions ensuring that the system has access to the most recent demand information.

III-B Taxi-Demand Prediction Model

This module is responsible for leveraging the input features ( $c$ ) to train a deep localization model and find its optimal parameters. The trained model is used during the online phase by the Demand Predictor module to provide an estimate of the taxi-demand. A deep fully-connected neural network is adopted here due to its representational ability, which allows the learning of complex patterns.

III-B1 The Network Architecture

Fig. 2 shows our deep network structure. We construct a deep fully connected neural network consisting of cascaded hidden layers of nonlinear processing neurons. Specifically, we use the hyperbolic tangent function (tanh) as the activation function for the hidden layers due to its non-linearity, differentiability (i.e., having stronger gradients and avoiding bias in the gradients), and consideration of negative and positive inputs [38]. The input layer of the network is the cell id and the timestamp. The output layer consists of a number of neurons corresponding to the number of taxi-demand levels in the data. This network is trained to operate as a multinomial (multi-class) classifier by leveraging a softmax activation function in the output layer. This leads to a probability distribution over the demand levels given a spatiotemporal input (cell location and time).

More formally, the input feature vector $c_{i}$ = $(c_{i1},c_{i2},..c_{ik})$ of length $k$ , the corresponding discrete outputs (i.e logits) $c_{i}$ is $a_{i}=(a_{i1},a_{i2},..,a_{in})$ capture the score for each demand level from the possible $n$ total taxi-demand levels to be the estimated level. The softmax function converts the logit score $a_{ij}$ (for sample $i$ to be at demand level $j$ ) into a probability as:

p(a_{ij})=\frac{e^{a_{ij}}}{\sum_{j=1}^{j=q}{e^{a_{ij}}}}

(1)

III-B2 Training

During the training phase, the ground-truth probability label vector of demand $P(a_{i})=[p(a_{i1}),p(a_{i2})...p(a_{in})]$ is formalized using one-hot-encoding. This encoding has a probability of one for the correct demand levels and zeros for others.

The model is trained using the Adaptive Moment Estimation (Adam optimizer [39]) to minimize the average cross-entropy between the estimated output probability distribution $P(a_{i})$ and the one-hot-encoded vector $g_{i}$ . The loss function is defined as follows:

\mathop{\mathcal{L}}=\frac{1}{N_{s}}\sum_{i=1}^{n}D(P(a_{i}),g_{i})

(2)

where $P(a_{i})$ is obtained using the softmax function, $g_{i}$ is the one-hot encoded vector of the $i^{th}$ sample, $N_{s}$ is the number of samples available for training, and $D(P(a_{i}),g_{i})$ is the cross-entropy distance function defined as:

D(P(a_{i}),g_{i})=-\sum_{j=1}^{n}g_{ij}log(P(a_{ij}))

(3)

III-C Federated Learning

III-C1 Our Approach

Federated learning is a distributed machine learning technique that enables multiple clients to train a model collaboratively without sharing their private data with a central server. In this study, we use the Federated Averaging (FedAvg) algorithm on our federated learning of taxi-demand prediction. FedAvg, proposed by McMahan et al. [40], is a widely used framework for federated learning due to its simplicity and scalability.

The FedAvg algorithm works as follows: At the beginning of each round, the central server selects a subset of clients to participate in the training process. The server sends the current global model to the selected clients, and each client trains the model using their local data. Specifically, each client updates the model by computing the gradients of their local loss function and performing a gradient descent step. This local update is given by:

w_{t+1}^{k}\leftarrow w_{t}-\eta g_{k}

(4)

where $w_{t}$ and $w_{t+1}^{k}$ are the model parameters at round $t$ and $t+1$ respectively, $k$ is the client ID, $\eta$ is the learning rate, and $g_{k}$ is the gradient of the local loss function with respect to the model parameters.

After the local updates are completed, each client sends their updated model to the central server. The server then averages all the received models to obtain a new global model. The global update is given by:

w_{t+1}\leftarrow\sum_{k=1}^{K}\frac{n_{k}}{n}w_{t+1}^{k}

(5)

where $n_{k}$ is the number of data samples held by client $k$ , $n$ is the total number of data samples in the system, and $K$ is the total number of clients participating in the training process.

The FedAvg algorithm repeats the above process for a specified number of rounds until convergence. The global model of the server is the final output.

Fig.3 shows the overview of our federated learning approach for taxi-demand prediction. Each client represents a specific facility and has access to its own private data. We implemented our approach using PyTorch, a popular machine learning framework, and Flower [41], a federated learning framework for PyTorch.

Our approach involves the following steps:

1.

The central server sends the current global model to a subset of clients.
2.

Each client trains the model using their local data, and updates the model using the FedAvg algorithm.
3.

Each client sends their updated model back to the central server.
4.

The central server averages all the received models to obtain a new global model.
5.

The above steps are repeated until convergence.

TABLE I: Hyperparameters of experiment settings.

Criteria	Value (bold default)
Number of prediction classes	4
Number of global epochs	300
Patience of early stopping	10, 30, $\infty$
Number of facilities	4, 8, 16
Number of local epochs	1

IV Evaluation

This section describes experimental evaluations. Firstly, data collection is described. Then, the evaluations of the taxi-demand prediction model described in Section III and the privacy are described.

IV-A Data Collection and Setting

IV-A1 Data Collection

We gathered real-world data from $16$ service facilities in Japan over a period of six months. The collected data includes (1) vehicle information and their trajectories (including idle time), and (2) spatiotemporal data of each customer’s pickup and drop-off event for each vehicle. The system determined the trajectory of each customer’s trip by merging the two datasets using the vehicle ID and time as the key factors. This resulted in 15,178 trips, with taxi demands ranging from 0 to 20, calculated using a grid size of 1 km and a time slot of 1 hour.

The trajectory data was obtained through GPS for latitude and longitude, with data acquisition intervals of approximately every 5 seconds, with some missing data. To determine the locations of pickup and drop-off events, we used data on vehicle positions during the 45 seconds before and after the event, if available. If the data was not present, the event was omitted from the evaluation data. The number of demands with determined locations and times was 10327.

IV-A2 Experimental Setting

We describe each setting below.

Data Splitting

In the following experiments, we split the entire data into three subsets, i.e., 64% for training data, 16% for validation data, and 20% for test data. The training data is utilized for training the model, the validation data is for early stopping the training, and the test data is for computing the evaluation metrics described later. In the case of federated learning, each facility has the training and validation data, and a central server has the test data. We then utilize the split dataset for two models, i.e., a single model and federated learning. Each model is trained in the same setting as Section IV-B.

Metrics

We focus on two metrics, i.e., accuracy and balanced accuracy [42] for taxi-demand prediction evaluation. Since the gathered data described above are class-imbalanced, the evaluation of the conventional accuracy for prediction results is insufficient. Therefore, we adopt the balanced accuracy, which is the average of the accuracy between all the classes. We utilize the existing implementations of the scikit-learn library for the above metrics.

Hyperparameters

Hyperparameters in the experiments are shown in Table I, where four prediction classes are defined as non, low, med, and high. We also set the ‘margin’ described in Section III-C as 1.

IV-B Evaluation of Taxi-Demand Prediction

Figure 4 and Figure 5 illustrate the comparison between the single model and the proposed federated learning approach, and it shows that the accuracy and balanced accuracy of the federated learning are slightly lower than those of the single model by 0.096, and 0.310, respectively. However, it is essential to highlight that federated learning enables privacy preservation by training the model on decentralized data without compromising the security of the data. This aspect is particularly important for commercial applications, e.g., taxi-demand prediction based on customers’ data. Therefore, the slight tradeoff between accuracy and privacy in federated learning is a reasonable compromise, and it makes this approach a practical and promising solution for privacy-sensitive scenarios. Specifically, federated learning ensures compliance with privacy regulations such as the General Data Protection Regulation (GDPR) by keeping the data local and not transmitting it to a central server.

Figures 6 shows the result of the patience parameter that controls early stopping. This parameter represents the number of epochs required before terminating the training process when no performance improvement is obtained. According to the figure, the system accuracy seems to reach the optimal model with as low as only 10 epochs.

V Conclusion

In this paper, we presented a novel approach to privacy-preserving taxi demand prediction using federated learning. Our proposed system leverages the FedAvg federated learning technique to train a taxi-demand prediction model without compromising the privacy and security of customer data owned by taxi-providing facilities. By enabling facilities to build models on data they would otherwise be unable to access, our approach offers significant benefits in terms of data availability. To evaluate the effectiveness of our proposed system, we conducted experiments using real-world data collected from 16 taxi service providers in Japan over a period of six months. The results demonstrated that the system accurately predicts demand levels with less than a 1% decrease in accuracy compared to classical solutions.

Acknowledgment

This work was supported by JST, CREST Grant JPMJCR21M5, Japan, and JSPS, KAKENHI Grant 22K12011, and NVIDIA award.

References

[1] H. Yu, V. Raychoudhury, and S. Saha, “Dynamic taxi ride-sharing through adaptive request propagation using regional taxi demand and supply,” in Proc. of MobiQuitous 2021. Springer, 2021, pp. 40–56.
[2] D. Zhang, L. Sun, B. Li, C. Chen, G. Pan, S. Li, and Z. Wu, “Understanding taxi service strategies from taxi gps traces,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 123–135, 2015.
[3] O. Abul, F. Bonchi, and M. Nanni, “Never walk alone: Uncertainty for anonymity in moving objects databases,” in 2008 IEEE 24th International Conference on Data Engineering, 2008, pp. 376–385.
[4] T.-H. You, W.-C. Peng, and W.-C. Lee, “Protecting moving trajectories with dummies,” in Proc. of MDM 2007, 2007, pp. 278–282.
[5] A. Suzuki, M. Iwata, Y. Arase, T. Hara, X. Xie, and S. Nishio, “A user location anonymization method for location based services in a real environment,” in Proc. of GIS 2010. ACM, 2010, p. 398–401.
[6] K. Jiang, D. Shao, S. Bressan, T. Kister, and K.-L. Tan, “Publishing trajectories with differential privacy guarantees,” in Proc. of SSDBM 2010, ser. SSDBM. New York, NY, USA: ACM, 2013.
[7] H. Jiang, J. Li, P. Zhao, F. Zeng, Z. Xiao, and A. Iyengar, “Location privacy-preserving mechanisms in location-based services: A comprehensive survey,” ACM Comput. Surv., vol. 54, no. 1, jan 2021.
[8] M. Ohno, R. Ukyo, T. Amano, H. Rizk, and H. Yamaguchi, “Privacy-preserving pedestrian tracking using distributed 3d lidars,” in Proc. of PerCom 2023. IEEE, 2023, pp. 43–52.
[9] M. Mohsen, H. Rizk, and M. Youssef, “Privacy-preserving by design: Indoor positioning system using wi-fi passive tdoa,” in The 24th IEEE International Conference on Mobile Data Management. IEEE, 2023.
[10] B. Liu, W. Zhou, T. Zhu, L. Gao, and Y. Xiang, “Location privacy and its applications: A systematic study,” IEEE Access, vol. 6, pp. 17 606–17 624, 2018.
[11] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, “L-diversity: Privacy beyond k-anonymity,” ACM Trans. Knowl. Discov. Data, vol. 1, no. 1, p. 3–es, mar 2007.
[12] N. Li, T. Li, and S. Venkatasubramanian, “t-closeness: Privacy beyond k-anonymity and l-diversity,” in Proc. of ICDE 2007. IEEE, 2006, pp. 106–115.
[13] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” 2016.
[14] M. Hao, H. Li, X. Luo, G. Xu, H. Yang, and S. Liu, “Efficient and privacy-enhanced federated learning for industrial artificial intelligence,” IEEE Transactions on Industrial Informatics, vol. 16, no. 10, pp. 6532–6542, 2020.
[15] Y. Qi, M. S. Hossain, J. Nie, and X. Li, “Privacy-preserving blockchain-based federated learning for traffic flow prediction,” Future Generation Computer Systems, vol. 117, pp. 328–337, 2021.
[16] H. Yao, F. Wu, J. Ke, X. Tang, Y. Jia, S. Lu, P. Gong, J. Ye, and Z. Li, “Deep multi-view spatial-temporal network for taxi demand prediction,” in Proc. of AAAI 2018, vol. 32, no. 1. AAAI Press, 2018, pp. 2588–2595.
[17] C. Zhang, F. Zhu, X. Wang, L. Sun, H. Tang, and Y. Lv, “Taxi demand prediction using parallel multi-task learning model,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 2, pp. 794–803, 2022.
[18] C. Zhang, F. Zhu, Y. Lv, P. Ye, and F.-Y. Wang, “Mlrnn: Taxi demand prediction based on multi-level deep learning and regional heterogeneity analysis,” IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 7, pp. 8412–8422, 2022.
[19] R. Ozeki, H. Yonekura, H. Rizk, and H. Yamaguchi, “Balancing privacy and utility of spatio-temporal data for taxi-demand prediction,” in The 24th IEEE International Conference on Mobile Data Management. IEEE, 2023.
[20] ——, “Sharing without caring: Privacy protection of users’ spatio-temporal data without compromise on utility,” in Proc. of SIGSPATIAL 2022. ACM, 2022.
[21] J. Wen, Z. Zhang, Y. Lan, Z. Cui, J. Cai, and W. Zhang, “A survey on federated learning: challenges and applications,” International Journal of Machine Learning and Cybernetics, vol. 14, no. 2, pp. 513–535, 2023.
[22] W. Li and S. Wang, “Federated meta-learning for spatial-temporal prediction,” Neural Computing and Applications, vol. 34, no. 13, pp. 10 355–10 374, 2022.
[23] X. Yuan, J. Chen, J. Yang, N. Zhang, T. Yang, T. Han, and A. Taherkordi, “Fedstn: Graph representation driven federated learning for edge computing enabled urban traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–11, 2022.
[24] Y. Li, J. Li, and Y. Wang, “Privacy-preserving spatiotemporal scenario generation of renewable energies: A federated deep generative learning approach,” IEEE Transactions on Industrial Informatics, vol. 18, no. 4, pp. 2310–2320, 2022.
[25] N. Majcherczyk, N. Srishankar, and C. Pinciroli, “Flow-fl: Data-driven federated learning for spatio-temporal predictions in multi-robot systems,” in Proc. of ICRA 2021. IEEE, 2021, pp. 8836–8842.
[26] C. Dwork, “Differential privacy,” in Proc. of ICALP, ser. LNCS, vol. 4052. Springer, 2006, pp. 1–12.
[27] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang, “Deep learning with differential privacy,” in Proc. of CCS 2016. ACM, 2016, pp. 308–318.
[28] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha, “Privacy risk in machine learning: Analyzing the connection to overfitting,” in Proc. of CSF 2018. IEEE, 2018, pp. 268–282.
[29] K. Wei, J. Li, M. Ding, C. Ma, H. H. Yang, F. Farokhi, S. Jin, T. Q. S. Quek, and H. Vincent Poor, “Federated learning with differential privacy: Algorithms and performance analysis,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 3454–3469, 2020.
[30] A. Yousefpour, I. Shilov, A. Sablayrolles, D. Testuggine, K. Prasad, M. Malek, J. Nguyen, S. Gosh, A. Bharadwaj, J. Zhao, G. Cormode, and I. Mironov, “Opacus: User-friendly differential privacy library in pytorch,” CoRR, vol. abs/2109.12298, 2021.
[31] N. Papernot, “Machine learning at scale with differential privacy in TensorFlow,” in Proc. of PEPR 2019. USENIX Association, 2019, invited Talk. [Online]. Available: https://www.usenix.org/node/238163
[32] L. Prediger, N. A. Loppi, S. Kaski, and A. Honkela, “d3p - A python package for differentially-private probabilistic programming,” Proceedings on Privacy Enhancing Technologies, vol. 2022, no. 2, pp. 407–425, 2022.
[33] L. Sweeney, “k-anonymity: A model for protecting privacy,” International journal of uncertainty, fuzziness and knowledge-based systems, vol. 10, no. 05, pp. 557–570, 2002.
[34] K. LeFevre, D. DeWitt, and R. Ramakrishnan, “Mondrian multidimensional k-anonymity,” in Proc. of ICDE 2006. IEEE, 2006, pp. 25–25.
[35] A. Machanavajjhala, D. Kifer, J. Gehrke, and M. Venkitasubramaniam, “l-diversity: Privacy beyond k-anonymity,” ACM Transactions on Knowledge Discovery from Data, vol. 1, no. 1, pp. 3–es, 2007.
[36] S. Khan, V. Saravanan, T. J. Lakshmi, N. Deb, N. A. Othman et al., “Privacy protection of healthcare data over social networks using machine learning algorithms,” Computational Intelligence and Neuroscience, vol. 2022, pp. 1–8, 2022.
[37] D. Slijepčević, M. Henzl, L. D. Klausner, T. Dam, P. Kieseberg, and M. Zeppelzauer, “k-anonymity in practice: How generalisation and suppression affect machine learning classifiers,” Computers & Security, vol. 111, p. 102488, 2021.
[38] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural networks: Tricks of the trade. Springer, 2012, pp. 9–48.
[39] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[40] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial intelligence and statistics. PMLR, 2017, pp. 1273–1282.
[41] D. J. Beutel, T. Topal, A. Mathur, X. Qiu, T. Parcollet, P. P. de Gusmão, and N. D. Lane, “Flower: A friendly federated learning research framework,” arXiv preprint arXiv:2007.14390, 2020.
[42] K. H. Brodersen, C. S. Ong, K. E. Stephan, and J. M. Buhmann, “The balanced accuracy and its posterior distribution,” in Proc. of ICPR 2010. IEEE, 2010, pp. 3121–3124.