
ScaleSFL:
A Sharding Solution for Blockchain-Based Federated Learning

Evan Madill (ORCID 0000-0002-6668-3671), Ben Nguyen, Carson K. Leung (ORCID 0000-0002-7541-9127), and Sara Rouhani
Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
(2022)
Abstract.

Blockchain-based federated learning has gained significant interest over the last few years with the increasing concern for data privacy, advances in machine learning, and blockchain innovation. However, gaps in security and scalability hinder the development of real-world applications. In this study, we propose ScaleSFL, which is a scalable blockchain-based sharding solution for federated learning. ScaleSFL supports interoperability by separating the off-chain federated learning component in order to verify model updates instead of controlling the entire federated learning flow. We implemented ScaleSFL as a proof-of-concept prototype system using Hyperledger Fabric to demonstrate the feasibility of the solution. We present a performance evaluation of results collected through Hyperledger Caliper benchmarking tools conducted on model creation. Our evaluation results show that sharding can improve validation performance linearly while remaining efficient and secure.

Blockchain, Consensus, Privacy, Federated Learning, Edge Computing, Sharding, Hyperledger Fabric
Copyright 2022. DOI: 10.1145/3494106.3528680. Conference: Proceedings of the Fourth ACM International Symposium on Blockchain and Secure Critical Infrastructure (BSCI ’22), May 30, 2022, Nagasaki, Japan. Price: 15.00. ISBN: 978-1-4503-9175-7/22/05.
CCS Concepts: Computing methodologies → Machine learning; Computing methodologies → Distributed computing methodologies; Security and privacy → Distributed systems security; Security and privacy → Privacy-preserving protocols.

1. Introduction

With the increasing concern for data privacy and security of personal data, the demand for creating trusted methods of privacy-preserving communication is rapidly increasing. Recent advances in machine learning have also spurred interest in a variety of privacy-sensitive domains such as healthcare informatics (ref:healthcare-fl, ) and coronavirus disease 2019 (COVID-19) diagnoses (ref:covid-19-diagnosis-fl, ). However, the privacy of personal data used to train these models has been called into question. Traditional methods of training require data to be centrally located. However, this is challenging or impossible with privacy regulations.

Federated learning (FL) (ref:federated-learning, ) is a proposed solution allowing sensitive data to remain distributed among edge devices. This framework will enable devices to train models locally on-device and then upload their local model updates to a trusted central server to be aggregated, resulting in a global model that is then distributed. This process is repeated until the desired objective is reached, such as the convergence of model accuracy.

Additional improvements have been made to the aggregation strategies involved in traditional FL, notably in reducing the communication overhead to the central aggregation server. Solutions such as HierFavg (ref:hierfavg, ) introduce a hierarchical scaling approach, where model updates are instead sent to local edge servers that perform partial model aggregation. This combines the benefits of both centralized cloud and edge computing, resulting in both reduced latency and communication overhead.

While the traditional FL framework has demonstrated its viability in real-world applications, it has several notable issues. First, model updates can indirectly leak the local data used to train them. Specifically, users can be de-anonymized through data reconstruction via statistical analysis and data mining. An example of this was with the Netflix Prize Dataset (ref:deanonymize-netflix, ) where data was correlated to users using an auxiliary IMDb dataset. The second issue is the centralized nature of aggregating the model updates, which creates a single point of failure and relies on trusted authorities.

Differential privacy (ref:differential-privacy, ; ref:differential-privacy-impossibility, ), a mathematical framework for evaluating dataset privacy of a given model, can be employed to solve the first issue, for example, by adding a certain noise to the model update. A solution to the second issue is to utilize blockchain technologies to decentralize the task of aggregating the local model updates. However, using a decentralized system incurs several more issues by removing the presence of a centrally trusted entity which gives rise to, for example, the problem of detecting malicious clients sending false model updates.

Additionally, because local datasets do not necessarily represent the global distribution, as they are non-IID (independent and identically distributed), two honest clients could submit opposing model updates. This problem can be addressed by validating each transaction. One possible solution is to allow each node to execute validation procedures in a smart contract, as described in BAFFLE (ref:baffle, ). However, delegating this task to every node of the network is computationally expensive considering the high overhead, local model training, and the consensus task (ref:committee-consensus, ). Another solution is to select a committee of nodes each round to perform this validation task (ref:committee-consensus, ).

We can define the number of clients training in each round as $\mathcal{C}$, and the number of committee nodes (endorsing peers) as $\mathcal{P}_{E}$. In each round, a total of $\mathcal{C}\times\mathcal{P}_{E}$ computations must be completed to validate all transactions.

Here, we propose a sharding mechanism extending the committee consensus method. As the network expands, clients will be assigned to different shards within the network, and a committee will be elected for each shard. After every phase, each shard’s committee will publish a local shard-aggregated model, which will then be globally aggregated and redistributed to all clients. If we have $\mathcal{S}$ shards, this reduces the computations from $\mathcal{C}\times\mathcal{P}_{E}$ to $\frac{\mathcal{C}\times\mathcal{P}_{E}}{\mathcal{S}^{2}}$ for each shard, or $\frac{\mathcal{C}\times\mathcal{P}_{E}}{\mathcal{S}}$ globally.
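As a sanity check on this reduction, the counts can be worked through with illustrative numbers, assuming clients and endorsing peers are split evenly across shards:

```python
# Per-round validation workload, before and after sharding, assuming clients
# and endorsing peers are split evenly across shards. Numbers are illustrative.
C, P_E, S = 1000, 10, 5             # clients, endorsing peers, shards

total = C * P_E                     # unsharded: every endorser checks every update
per_shard = (C // S) * (P_E // S)   # one shard: (C/S) clients x (P_E/S) endorsers
global_work = per_shard * S         # summed over all shards

print(total, per_shard, global_work)  # 10000 400 2000
```

With five shards, each shard performs $\frac{1}{25}$ of the unsharded work, and the network as a whole performs $\frac{1}{5}$ of it.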

To preserve the security of the system across shards, we propose a flexible protocol allowing for pluggable poisoning mitigation and detection strategies (ref:roni, ; ref:krum, ; ref:fools-gold, ; ref:attack-adaptive-aggregation, ). This allows proposed tasks to adopt suitable strategies, while allowing the system to evolve with recent advances.

The key contributions of this paper can be summarized as follows:

  • a sharding framework called ScaleSFL for blockchain-based federated learning. We formally discuss the design of our algorithm for the sharding mechanism based on two consensus levels: shard-level consensus and mainchain consensus.

  • an implementation of a proof-of-concept prototype system using Hyperledger Fabric (ref:fabric, ) to run local experiments and evaluate the solution’s effectiveness.

  • benchmarks that demonstrate the performance of our implementation using the Hyperledger Caliper benchmarking tools (ref:hyperledger-caliper, ), showing that our framework scales linearly with the number of shards.

The remainder of this paper is organized as follows. The next section provides background and related works. Our ScaleSFL framework, a sharding solution for blockchain-based federated learning, is introduced in Section 3. Evaluation and discussion are presented in Sections 4 and 5, respectively. Finally, we conclude our work and describe future work in Section 6.

2. Background and Related Works

Our current work integrates the ideas of federated learning and differential privacy, as well as blockchain-based federated learning. In this section, we introduce individual topics and corresponding hybrid solutions to provide background.

2.1. Federated Learning and Differential Privacy

Privacy-preserving machine learning combines the use of federated learning with differential privacy (ref:privacy-preserving-deep-learning, ; ref:differential-privacy-sgd, ; ref:cit21, ). Federated learning allows sensitive data from edge devices (e.g., smartphones, tablets, IoT devices) to remain on-device while training is done locally (distributed training on edge devices) (ref:federated-learning, ). In the original FL environment, a centralized server manages the training process. The server broadcasts the current model during each round to a subset of participating nodes to activate local training (requesting a model update). After all model update submission responses are received, the server aggregates the local models into a global model and prepares for the next round (ref:federated-learning, ; ref:committee-consensus, ). While FL offers many privacy benefits, differential privacy used in conjunction provides stronger privacy guarantees (ref:federated-learning, ) by mitigating data reconstruction through random noise added to model updates.

McMahan et al. (ref:federated-learning, ) proposed the Federated Averaging algorithm (FedAvg). Their algorithm is a generalization of federated stochastic gradient descent (FedSGD) (ref:differential-privacy-sgd, ) that allows each node to perform multiple batch updates while exchanging the models’ parameters; we base our FL model training approach on it. Experiments demonstrate that FL can be used to train a variety of high-quality machine learning models at reasonable communication cost, exemplifying high practicality (ref:federated-learning, ). Therefore, we choose to apply differential privacy and use an adapted version of FedAvg for our current work.
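The distinction can be sketched with a toy example (function names and the one-parameter squared-error objective are ours, not the paper’s implementation): each client takes several local gradient steps before the server averages the returned weights by local dataset size.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """FedAvg client step: multiple local gradient updates (vs. FedSGD's
    single step) on a toy squared-error objective l(x, w) = (w - x)^2."""
    w = float(w)
    for _ in range(epochs):
        for x in data:
            w -= lr * 2 * (w - x)     # gradient of (w - x)^2 w.r.t. w
    return w

def fedavg_round(w_global, client_data):
    """Server step: data-size-weighted average of returned client models."""
    updates = [local_update(w_global, d) for d in client_data]
    sizes = [len(d) for d in client_data]
    return sum(u * n for u, n in zip(updates, sizes)) / sum(sizes)

# Two clients with non-IID data; after a few rounds the global model settles
# near the data-size-weighted mean of the client optima.
w = 0.0
for _ in range(10):                   # ten communication rounds
    w = fedavg_round(w, [[1.0, 1.0], [3.0]])
```

Because local training runs for several epochs between communication rounds, far fewer rounds are needed than with single-step FedSGD.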

Liu et al. (ref:hierfavg, ) introduced HierFavg, a hierarchical scaling solution introducing multiple edge servers, each performing partial model aggregation, combining the benefits of both centralized cloud and edge computing. While centralized servers benefit from the vast reach of data collection, the communication overhead and latency of downloading and evaluating such updates limit this approach. Instead, model updates are sent to local edge servers to benefit from fast and efficient model updates, reducing the number of local iterations and total runtime. These local updates are then sent to a cloud server allowing for a sampling of the global data distribution to be considered and outperforming the local edge server performance. Combining these approaches is nontrivial, and the authors performed convergence analysis describing qualitative guidelines in applying their approach. While our approach is primarily motivated by increasing the throughput of model update metadata to blockchain ledgers, the composition of shards can benefit from the strategies found in hierarchical solutions.

2.2. Blockchain-Based Federated Learning

Blockchain technology can be utilized to decentralize the aforementioned centralized server in federated learning. The aggregation and facilitation logic previously done by the central server can be replaced by implementing smart contracts (ref:committee-consensus, ).

In addition to Ramanan and Nakayama’s blockchain-based aggregator free federated learning (BAFFLE) (ref:baffle, ) and Li et al.’s blockchain-based federated learning framework with committee consensus (ref:committee-consensus, ) referenced earlier, numerous strategies for integrating blockchain technology with federated learning to achieve decentralization have been considered (SatybaldyN20, ). However, these previous works do not directly focus on the performance of the underlying consensus mechanism upon which their proposed frameworks are built.

2.2.1. Blockchain Integration

Kim et al.’s blockchain-based federated learning (BlockFL) (ref:bc-on-device-fl, ) was implemented using the proof-of-work consensus mechanism and focused on analyzing the end-to-end latency of their system. Ma et al.’s proposed blockchain-assisted decentralized FL (BLADE-FL) (ref:fl-meets-bc, ) was able to address the single point of failure issue while analyzing the learning performance. Liu et al.’s proposed federated learning with asynchronous convergence (FedAC) method (ref:bfl-async, ) seeks to improve communication performance through integrating asynchronous global aggregation. While these works focus on the performance of various system components, they do not consider the scalability requirements for large numbers of participants and the impact on the consensus mechanism’s throughput.

Additionally, many blockchain-based federated learning applications with domain-specific frameworks have been proposed, for example, in healthcare (ref:bfl-healthcare, ), IoT (ref:bfl-iot, ), and 6G networks (ref:bfl-6g, ). Similarly, these works focus on implementation feasibility rather than scalability or the performance of their consensus mechanisms.

Gai et al. (ref:bfl-iiot, ) proposed a model that utilizes a blockchain-based solution to develop a privacy-preserving environment for task allocations in a cloud/edge system. Employing blockchain enhances trustworthiness in edge nodes and security in communications. Utilizing smart contracts also contributes toward reaching optimal task allocations. The model applies a differential privacy algorithm (Laplace distribution mechanism (ref:differential-privacy, )) to process energy cost data in edge nodes.

Li et al. (ref:committee-consensus, ) proposed a blockchain-based FL framework with committee consensus in response to consensus mechanism efficiency and framework scalability. The committee consensus mechanism works as follows: A subset of nodes constitutes a committee responsible for validating model updates and new block generation. Meanwhile, the other nodes only execute their local model training and send their updates to the committee nodes. This committee is re-elected every round based on scores from the previous round, or alternatively, re-election is randomized for implementation simplicity. Li et al. demonstrated that committee consensus is more efficient and scalable while remaining secure.

2.2.2. Sharding Solutions

Sharding allows for scalability by splitting the network into smaller subgroups. This reduces storage and communication requirements while increasing throughput as the number of shards increases.

Yuan et al. (ref:bfl-shard, ) proposed ChainFL, a two-layered blockchain-driven FL system that splits an IoT network into multiple shards. Also using Hyperledger Fabric (ref:fabric, ), ChainFL adopts a direct acyclic graph (DAG)-based mainchain to achieve parallel (and asynchronous) cross-shard validation. While their approach is DAG-based, we focus on a hierarchical FL solution, and are able to use existing off-chain FL frameworks to increase interoperability.

2.3. Model Update Poisoning

A notable issue when aggregating model updates from untrusted and potentially adversarial clients is model poisoning attacks. In order to disrupt or bias the training process, an adversary may submit biased or corrupted model updates. Because model updates must be verified prior to being accepted to the ledger, we consider each defence mechanism as applied at each endorsing peer rather than at a single centralized server. One possibility to make the training more robust is to apply norm constraints to each model update, such as clipping model updates and applying random noise to the aggregated model to prevent overfitting (ref:fl-survey, ). However, further mechanisms are required to counter adversarial examples and backdoors (ref:fl-backdoor, ). A subset of model poisoning is data poisoning attacks. These occur when a client’s dataset has been altered, whether targeted, such as adversarial manipulation of edge devices, or untargeted, such as a faulty data collection process resulting in corrupted samples (e.g., incorrect labels). Here, we focus on readily applicable works related to the general detection of poisoned model updates.

Barreno et al. (ref:roni, ) proposed a technique to determine which samples from a training set are malicious. To do so, they present the reject on negative influence (RONI) defence. This method measures the effect each training sample has on the accuracy of the model. This defence strategy can be modified to suit the needs of a FL setting: instead, we measure the influence of each model update on the accuracy of the global model. This can be done by utilizing a held-out testing set, potentially unique to each endorsing peer. This defence is not suitable for all FL applications: because client data is non-IID, honest updates may be incorrectly rejected when misrepresented by the test set. Additionally, malicious updates may go unnoticed depending on the acceptance threshold.
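A minimal sketch of this update-level RONI check at an endorsing peer might look as follows; the accuracy function, threshold value, and model identifiers here are illustrative assumptions, not the cited defence’s exact formulation.

```python
def roni_accept(accuracy_fn, current_model, candidate_model, threshold=0.02):
    """Reject-on-negative-influence adapted to FL model updates: accept a
    candidate unless it degrades held-out accuracy by more than `threshold`.

    accuracy_fn: evaluates a model on this endorsing peer's held-out test set.
    """
    return accuracy_fn(current_model) - accuracy_fn(candidate_model) <= threshold

# Toy stand-in: accuracy is looked up by model id instead of real evaluation.
accs = {"global": 0.90, "honest-update": 0.89, "poisoned-update": 0.70}
ok = roni_accept(accs.get, "global", "honest-update")     # small dip: endorsed
bad = roni_accept(accs.get, "global", "poisoned-update")  # large drop: rejected
```

In practice each endorsing peer would hold its own test set, so the same update may be scored differently across the committee.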

Blanchard et al. (ref:krum, ) introduced a method designed to counter adversarial updates in a FL setting. Multi-Krum is a Byzantine-resilient defence strategy that selects the $n$ vectors furthest from the calculated mean and removes them from the aggregated gradient update. These distances are calculated using the Euclidean distance between gradient vectors. This strategy is effective against attacks with up to 33% compromised/adversarial clients. However, the calculated mean can be influenced by Sybil updates and may fail against an attacker with more significant influence over the network.
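Following the simplified description above, a distance-based filter can be sketched as below. This is illustrative only: the full Multi-Krum score ranks updates by distances to their nearest neighbours rather than to the mean, and the function name is ours.

```python
import numpy as np

def filter_outlier_updates(updates, n_remove):
    """Simplified Multi-Krum-style filter: drop the `n_remove` updates
    farthest (Euclidean distance) from the coordinate-wise mean, then
    average the rest."""
    U = np.asarray(updates, dtype=float)
    mean = U.mean(axis=0)
    dists = np.linalg.norm(U - mean, axis=1)        # distance of each update
    keep = np.argsort(dists)[: len(U) - n_remove]   # indices of closest updates
    return U[keep].mean(axis=0)

# Three honest updates near [1, 1] and one adversarial outlier.
agg = filter_outlier_updates([[1, 1], [1.1, 0.9], [0.9, 1.1], [10, -10]],
                             n_remove=1)
```

Note how the mean itself is pulled toward the outlier, which is exactly why a coordinated Sybil majority can defeat this defence.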

Fung et al. (ref:fools-gold, ) proposed a novel defence strategy to discriminate between honest clients and Sybils based on the diversity of their respective gradient updates. The intuition behind this defence is to identify Sybils based on a shared objective, and thus their models’ updates are likely to have low variance. This works by using a cosine similarity, looking for similarity in terms of indicative features (i.e., those that contribute to the correctness of the update). This approach can be further augmented with other defence methods such as Multi-Krum to determine the poisoned updates.
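The core of that intuition can be sketched with pairwise cosine similarities. This is a simplification: the actual FoolsGold scheme also re-weights by indicative features and applies logit scaling, and the function name is ours.

```python
import numpy as np

def foolsgold_weights(updates):
    """Sketch of the FoolsGold intuition: clients whose gradient updates are
    unusually similar to another client's (high cosine similarity) have their
    aggregation weight reduced; dissimilar clients keep weight near 1."""
    U = np.asarray(updates, dtype=float)
    unit = U / np.linalg.norm(U, axis=1, keepdims=True)
    cs = unit @ unit.T                   # pairwise cosine similarity
    np.fill_diagonal(cs, -1.0)           # ignore self-similarity
    max_sim = cs.max(axis=1)             # each client's closest match
    return np.clip(1.0 - max_sim, 0.0, 1.0)

# Two Sybils submit near-identical updates; the honest client differs.
weights = foolsgold_weights([[1.0, 0.0], [1.0, 0.001], [0.0, 1.0]])
```

The two Sybils receive aggregation weights near zero, while the honest client keeps a weight near one.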

Wan and Chen (ref:attack-adaptive-aggregation, ) introduced an attack-adaptive defence utilizing an attention-based neural network to learn plausible attacks. Instead of using similarity-based mechanisms, this introduces tailored defence strategies to prevent backdoor attacks.

Ma et al. (ref:dp-data-poisoning, ) explored differential privacy as a data poisoning defence based on creating a robust learning algorithm. While this defence is ideal for systems already using differential privacy, it enforces the use of noise in the learning algorithm, hurting performance. Additionally, this assumes a small subset of malicious clients, failing to defend against larger coordinated attacks.

Figure 1. Global overview of our FL sharding approach

Related attacks, such as model update plagiarism (ref:fl-meets-bc, ), may additionally occur, leading to a misleading distribution of data amongst clients. A malicious user may attempt to submit the same model update multiple times through several clients, known as lazy clients. They may then attempt to reap the rewards of contribution for each of these submissions. These can be detected by using a PN-sequence (ref:fl-meets-bc, ; ref:blade-fl-analysis, ), whereby the initial model update is submitted with random noise applied to the update and is then further verified by other clients using their own PN-sequence, checking for correlation.
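One way such a correlation check could look is sketched below; the stamping scale, threshold, and function names are our assumptions, and the cited works define the exact protocol.

```python
import numpy as np

def stamp_update(update, seed, scale=0.5):
    """Client-side: overlay a pseudorandom-noise (PN) sequence, derived from
    the client's private seed, onto its model update before submission.
    (Noise scale exaggerated here for illustration.)"""
    rng = np.random.default_rng(seed)
    return np.asarray(update, float) + scale * rng.standard_normal(len(update))

def carries_pn(update, seed, threshold=0.3):
    """Verifier-side: a plagiarized (lazy-client) copy of someone else's
    update still correlates with the original submitter's PN sequence."""
    pn = np.random.default_rng(seed).standard_normal(len(update))
    r = np.corrcoef(np.asarray(update, float), pn)[0, 1]
    return abs(r) > threshold

base = np.linspace(-1.0, 1.0, 2000)     # stand-in for a raw model update
stamped = stamp_update(base, seed=42)   # honest client stamps its update
copied = stamped.copy()                 # lazy client resubmits it verbatim
```

Checking `copied` against seed 42 flags the plagiarized copy, while checking against any other seed shows no correlation.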

3. Our Sharding Solution

This section describes our approach and outlines the architecture for our proposed sharding solution. An outline of the workflow can be seen in Figure 1. We can break this into two primary components, the mainchain, which coordinates aggregated models from shards, and the individual shard chains.

Our framework follows a few core principles. Our solution is modular such that (a) the policies that govern model acceptance are abstracted; for example, the model update poisoning mitigation strategies (Section 2.3) used to determine model acceptance are pluggable. Our framework additionally (b) supports external FL coordination, such that new paradigms such as data-centric FL are supported. This means the framework is agnostic to the source of model updates.

With these principles, our solution acts as an additional component to existing FL workflows: it handles model provenance, the evolution of security and poisoning defences, scalability through sharding, and the possibility of reward distribution in future work. Therefore, we solve the initial problem of FL systems being dependent on a central service to aggregate updates, providing a trusted system to pin verified model updates. We discuss the components of this system below.

3.1. FL Algorithm

The underlying problem of federated learning is to produce a global model from client updates trained on local privacy-sensitive datasets. We can define each client $\mathcal{C}_{k}$ to have its own set of data $D_{k}\subseteq D$, for all $k$, $1\leq k\leq K$. Using this, we can define the global objective to optimize as a modification of the FedAvg algorithm (ref:federated-learning, ) as

(1) $\min_{w\in R^{d}}f(w)\quad\textrm{and}\quad\min_{w\in R^{d}}F_{k}(w)\quad\textrm{for each client}$

where the batch loss for each client is defined as

(2) $f_{b_{k,i}}(w)=\frac{1}{|b_{k,i}|}\sum_{x_{j}\in b_{k,i}}l(x_{j},w_{k})$

with $b_{k,i}$ being a random subset from $D_{k}$. Each client updates their local objective with respect to their dataset $D_{k}$ by

(3) $w_{k}\leftarrow w_{k}-\eta_{k}\nabla f_{b_{k,i}}(w_{k})$

where $\eta_{k}$ is the client’s learning rate. The client’s objective then becomes

(4) $F_{k}(w)=\frac{1}{|D_{k}|}\sum_{b_{k,i}\in D_{k}}f_{b_{k,i}}(w)$

and so the shard’s objective becomes

(5) $G_{s}(w)=\sum_{k}^{K}\frac{|D_{k}|}{|D|}F_{k}(w)$

This can be defined as weight updates by

(6) $w_{s}\leftarrow w_{s}+\sum_{k}^{K}\frac{|D_{k}|\,\Delta w_{k}}{|D|}$

This happens in parallel across all shards, where the global objective becomes the weighted sum of the results produced across all shards

(7) $f(w)=\sum_{s}^{S}\frac{|D_{s}|}{|D|}G_{s}(w)$

We should note that this described approach follows the original FedAvg algorithm and can be modified in terms of fairness (ref:q-fed-avg, ) among other averaging strategies.
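Concretely, Eqs. (6) and (7) amount to two levels of data-size-weighted averaging; the following NumPy sketch (function and variable names are ours, not the paper’s implementation) shows a shard-level pass followed by the mainchain pass:

```python
import numpy as np

def weighted_avg(models, sizes):
    """Data-size-weighted average: the aggregation used at both levels."""
    total = sum(sizes)
    return sum((n / total) * np.asarray(m, float)
               for m, n in zip(models, sizes))

# Shard level (Eq. 6): each shard averages its clients' weights by |D_k|;
# mainchain level (Eq. 7): shard models are averaged by shard data size |D_s|.
shard_a = weighted_avg([[1.0, 2.0], [3.0, 4.0]], [10, 10])
shard_b = weighted_avg([[5.0, 6.0]], [20])
global_model = weighted_avg([shard_a, shard_b], [20, 20])
```

Because both levels weight by data size (with $|D_{s}|=\sum_{k}|D_{k}|$ for each shard), the two-level result equals a single weighted average taken directly over all clients.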

3.2. Shard Level Consensus

Each task deployed on the network can run an independent consensus mechanism. This allows various task-related workloads to be accommodated by adapting the consensus algorithm. For example, when training large models where the performance of a single node may be limited, mechanisms such as PBFT (ref:pbft, ) can be used. However, in shards with fewer clients, we can make use of Raft consensus (ref:raft-consensus, ). This may benefit settings where model complexity is lower, so a peer may be able to handle the throughput bottleneck of Raft consensus, which is determined by node performance.

Within the chosen consensus mechanism, we apply a pluggable policy to determine the acceptance of model writes. This policy can be substituted to handle alternative schemes (Section 2.3). These schemes handle the identification of malicious clients by finding poisoned model updates or detecting Sybil attacks. It should be noted that while these plugins are applied at a task level to maintain the network’s security, they can be upgraded with the smart contract that they are governing.

Another quality to consider is the detection of lazy clients. Detection of these clients may use a PN-sequence, which is applied to the model updates. This technique could then be applied within the shard level consensus to verify the clients’ PN-sequence.

Because each shard contains a subset of the total participants for each task, the algorithmic complexity of consensus is significantly reduced to $\frac{\mathcal{C}\times\mathcal{P}_{E}}{\mathcal{S}^{2}}$ per shard. As a result, the network overhead needed to communicate model updates is also significantly reduced, which is the primary bottleneck in peer-to-peer federated learning.

3.3. Mainchain Consensus

Figure 2. Aggregation of sharded models

The mainchain is responsible for coordinating all verified aggregated model updates from each shard. This chain contains all participants across all shards; however, the activity on this chain is limited to shard level aggregation results and task proposals. The consensus for this follows the same pluggable consensus policy as the shard level consensus (Section 3.2); however, the submitting peers are limited to the subset of endorsing peers on the shard chains. In this way, we limit the endorsing peers to those in possession of an aggregated model who are able to verify its authenticity.

All model updates accepted to the mainchain will have been endorsed by each shard’s endorsing peers and can safely be distributed and aggregated. If multiple models from a single shard have been accepted due to disagreements between a shard’s endorsing peers, the model with more endorsements wins. If the endorsing peers submit the same model (model hashes are identical), each endorsing peer needs to evaluate the model update only once. Figure 2 shows this process. The final step is to post the final global model to the mainchain. This model represents the result of a single round of FL, and each of the shards will start the next round with these parameters.

We note that a new round may begin earlier within a shard by assuming the model aggregated by a shard’s endorsing peers is valid before it is accepted to the mainchain. If this model is not accepted, then the round within that shard must be restarted. This should only happen if the majority of endorsing peers within a shard are compromised, and as such, the disruption will only affect the compromised shard.

Figure 3. Transaction flow for shard level model submission

3.4. Workflow

This section discusses the entire workflow of our proposed framework, outlining the steps required to produce a trained model.

Firstly, we define the participant categories in our system. We can break these into three categories, described as follows: clients, peers, and endorsing peers.

  1. Clients: In a traditional FL setting, all participants are considered to be clients, comprising typical edge devices such as smartphones, laptops, appliances, and vehicles, among others. Clients hold the private datasets that will be used for training models and are responsible for producing model updates. We make the same assumption here, where clients do not validate models but rely on more powerful devices to ensure the system’s security. This is the lightest class of participants in terms of resource consumption.

  2. Peers: Peers are responsible for holding copies of the ledger. From this pool of peers, a committee can be elected to reduce the evaluation overhead. This helps take the load off of endorsing peers and allows peers to offer API services for clients.

  3. Endorsing peers: Members of the committee are responsible for evaluating model updates during consensus. They hold their own private dataset to endorse other model updates and can submit their own model updates, although they are not required to do so. Furthermore, the policy for evaluation of model weights can be modified, and these peers must additionally check for valid authentication of the write-set. Finally, committee members must participate in the mainchain consensus to determine which shard level updates will be accepted.

These participants will then participate in one or more tasks, as managed by the mainchain smart contract. Starting from a fresh network, we will describe the following process.

3.4.1. Task Proposal

The procedure begins with a task proposal on the application layer. It will be proposed on the mainchain in the form of a smart contract, and it outlines the requirements and task description. We leave the details of this specification open, as we may not know the complete set of features or optimization objectives, such as in a vertical FL (VFL) scenario (ref:federated-learning-concepts, ). Once this proposal has garnered enough interest in client registration, a new shard will be provisioned, and shard-related smart contracts will be deployed to the newly created shard(s). This process can be thought of as an open request for participants to collaborate on a specific task.

3.4.2. Client Training

Within a specific shard, an off-chain coordination network will be formed as defined by the proposal. This could utilize existing FL frameworks and best practices, as well as new paradigms. As such, the fraction of clients fit within a specific round will be determined off-chain by this process, excluding, for example, clients currently unavailable due to battery or network coverage issues. These selected clients are then responsible for producing a model update to the global model weights $w$. Each selected client will then compute their local updates as seen in Eq. (4). We can see this as Step 1 in Figure 3.

3.4.3. Off-Chain Model Storage

A client will then optionally upload a model to an off-chain cache such as the InterPlanetary File System (IPFS) (ref:ipfs, ). At this stage, the model can be downloaded publicly; however, it has not yet been verified nor recorded on the ledger. This step is used to record historical models or to take the network load off of serving local model updates to peers. We can see this as Step 2 in Figure 3.

3.4.4. Model Submission

Once the model has been uploaded and is available for download, the client will submit the update metadata by invocation of the shard level smart contract. This update should include information such as the model’s hash value and a link from which the model can be downloaded. This can be seen in Step 3 in Figure 3.
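A sketch of what such an update payload might contain follows; the field names and the storage URI placeholder are our assumptions, not the paper’s actual schema.

```python
import hashlib
import json

def build_model_submission(model_bytes, storage_uri, round_id, client_id):
    """Metadata a client submits to the shard-level smart contract: a content
    hash for later integrity checks plus a link to the off-chain copy."""
    return {
        "round": round_id,
        "client": client_id,
        "model_hash": hashlib.sha256(model_bytes).hexdigest(),
        "model_uri": storage_uri,
    }

update = build_model_submission(b"serialized-model-weights",
                                "ipfs://<content-id>",
                                round_id=7, client_id="client-42")
payload = json.dumps(update, sort_keys=True)  # deterministic chaincode argument
```

Only this small metadata record goes on-chain; the model weights themselves stay in off-chain storage.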

3.4.5. Peer Endorsement

The peers on the shard must endorse this model update according to the shard level consensus mechanism (Section 3.2). This work could be offloaded to the peers’ local workers, where additional computing resources may be present. This can be seen in Steps 4 and 5 in Figure 3.

3.4.6. Model Evaluation

The peers’ local workers request the model weight update from the link specified in the model update. The downloaded model will then be verified against the submitted hash to ensure integrity. Once the model is verified, it can be evaluated by various techniques; for example, the peer may test the model against its local dataset and reject the update on sufficient degradation in performance, such as in the RONI defence. This would be ideal in cases where the data distribution is known to be IID. Alternative solutions such as the FoolsGold scheme may be used for non-IID data, which could be used to identify poisoning attacks from Sybils. The results of this will be returned from the worker back to the peer. Here, we have summarized Steps 6-8 in Figure 3.
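The integrity portion of this step reduces to recomputing the hash of the downloaded bytes; a sketch is below, where the evaluation callback stands in for whichever pluggable defence the task has configured (names are ours).

```python
import hashlib

def verify_and_evaluate(downloaded_bytes, submitted_hash, evaluate_fn):
    """Endorsing-peer worker: reject outright if the off-chain download does
    not match the on-ledger hash; otherwise apply the pluggable evaluation
    policy (e.g., a RONI-style accuracy check) to endorse or reject."""
    if hashlib.sha256(downloaded_bytes).hexdigest() != submitted_hash:
        return False                      # integrity failure: never evaluated
    return evaluate_fn(downloaded_bytes)  # policy decides endorsement

good = b"model-weights"
claim = hashlib.sha256(good).hexdigest()
ok = verify_and_evaluate(good, claim, lambda _m: True)           # endorsed
tampered = verify_and_evaluate(b"model-weightz", claim, lambda _m: True)
```

A tampered download is rejected before the (comparatively expensive) model evaluation ever runs.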

3.4.7. Shard Aggregation

Endorsed model updates are written to the ledger according to the consensus quorum. The off-chain FL process can continue by aggregating the models from the current round once they have been accepted into the ledger. These models will then be downloaded and aggregated by the coordinating servers according to Eq. (7). The aggregated models will then be published to the mainchain. Similarly, these aggregated shard updates will only be accepted if the committee of endorsing peers reaches a consensus on the update, governed by the mainchain’s consensus mechanism (Section 3.3).

3.4.8. Global Aggregation

Finally, all shard level model updates accepted to the mainchain are downloaded and aggregated by the shard level servers. The peers can easily verify these models by checking the hash of the newly distributed global model against the posted hash from the mainchain for the finalized FL round.

4. Experiments

To demonstrate the effectiveness of our proposed sharding solution, we evaluate its performance in a Proof of Concept (PoC) implementation (https://github.com/blockchain-systems/ScaleSFL). The purpose of experimentation is first to measure how the throughput of an FL workload scales with respect to the number of shards. Secondly, we measure the impact of sharding on model performance in both IID and non-IID scenarios. It is important to ensure the quality of models and to maintain the rate of convergence when a subset of peers is potentially malicious.

To run these experiments, we use Hyperledger Fabric (ref:fabric, ), a permissioned distributed ledger platform. This platform offers a modular architecture allowing for plug-and-play consensus and membership services. Hyperledger Fabric offers an execute-order-validate architecture for transactions, allowing each peer to evaluate models in parallel within each shard, as opposed to most public blockchain platforms, which first order transactions and then execute them sequentially. Fabric additionally offers modular endorsement and validation policies, which we use to implement custom consensus logic within an endorsement plugin. The plugin reads a proposal response payload and sends the given transaction read-write set to the peer’s worker for validation; this read-write set contains the necessary information to endorse or reject the proposed model update. This logic is built directly into the Fabric peers.

Our solution workflow may be adapted to alternative permissioned platforms such as Quorum (ref:quorum, ), or public platforms such as Ethereum (ref:ethereum, ), which may allow for additional reward allocation schemes. Implementations on these platforms may consider the use of child chains utilizing existing layer-2 scaling solutions. We choose Hyperledger Fabric for our PoC primarily for the flexibility of consensus and membership services during development.

Table 1. Experimental Configuration
Component | Type | CPU | GPU | RAM | Disk (SSD)
Caliper Benchmark | Caliper 0.4.2 | Ryzen 7 [email protected] GHz | NVIDIA RTX 3080 | 64GB | 1TB
Fabric Peer (Shard) | Fabric 2.3.3 | Ryzen 7 [email protected] GHz | NVIDIA RTX 3080 | 64GB | 1TB
Fabric Peer Worker | Python 3.9, PyTorch 1.10.1 | Ryzen 7 [email protected] GHz | NVIDIA RTX 3080 | 64GB | 1TB

Tests here are run locally on a single machine simulating a Fabric test-network with a single orderer running Raft (ref:raft-consensus, ), and eight peers, each with their own organization, along with a certificate authority. Each peer worker runs on a single thread, allowing each peer to act independently. The experimental configuration can be seen in Table 1.

We implement two smart contracts, realizing the mainchain consensus and shard level consensus mechanisms. Smart contracts, otherwise known as chaincode in Fabric, are deployed to a specific channel within Fabric. We use channels to simulate shards, where each channel operates independently, with the ability to set different membership and endorsement policies per channel. Additionally, we relax the previous definition of endorsing peers such that every peer is an endorsing peer, i.e., $P:\{Peers\} = P_E:\{Endorsing\ peers\}$. We deploy a shard level smart contract, which we term the “models” chaincode, to each channel. Each participant operating within a shard is assigned to that shard by our managing contract. We deploy the mainchain consensus on its own channel, which every peer from every shard joins. We term this the catalyst contract; it is responsible for aggregating a global model from each of the shards. As mentioned previously, model updates are stored off-chain; in this implementation, each peer’s worker hosts the models locally with a gRPC server.
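One simple way the managing contract could assign participants to shards is a deterministic hash-modulo scheme (a hypothetical example; the paper does not prescribe the assignment policy, and alternatives such as region- or organization-based grouping are discussed later):

```python
import hashlib

def assign_shard(participant_id: str, num_shards: int) -> int:
    """Deterministic participant-to-shard assignment (hypothetical).

    Hashing the participant identifier spreads clients roughly evenly
    across shards and makes the assignment reproducible by every peer.
    """
    digest = hashlib.sha256(participant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```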

We coordinate the FL processes off-chain while pinning updates back to each shard’s ledger. This coordination is abstracted from the approach discussed previously; in this case, we use Flower (ref:flower, ), an open-source extensible FL framework. In this implementation, submitted models are verified during aggregation to have been endorsed, which can be checked by querying the peers’ local ledger to reduce a round of communication. This is conducted by creating a custom strategy within the Flower server and modifying the fit aggregation step to filter out, by querying the models smart contract, any updates which are not present on-chain. We additionally implement differential privacy using Opacus (ref:opacus, ), applied during client training. We use an $(\epsilon, \delta)$ target of $(5, 10^{-5})$, a noise multiplier of 0.4, and a max gradient norm of 1.2.
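Stripped of the Flower API, the endorsement filter inside the custom strategy amounts to the following (a sketch under the assumption that updates are keyed by model hash; the real implementation performs this inside the strategy's fit aggregation by querying the models chaincode):

```python
def filter_endorsed(results, onchain_hashes):
    """Keep only client updates whose hash is pinned on-chain (sketch).

    `results` is a list of (model_hash, weights) pairs returned by
    clients; `onchain_hashes` is the set of endorsed hashes read from
    the shard's ledger.
    """
    return [weights for model_hash, weights in results
            if model_hash in onchain_hashes]
```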

4.1. Evaluation

For evaluation, we employ Hyperledger Caliper (ref:hyperledger-caliper, ), an open-source benchmarking tool allowing various workloads to be run against our system under test (SUT). These tests are performed by an independent process that generates transactions to be sent to the SUT according to a specified configuration, monitoring responses to determine metrics such as latency, throughput, and success rate. We conduct tests with varying numbers of workers, transaction counts, and transactions per second (TPS) to evaluate the limits of the system.

We run several workloads to test the effects of sharding on transaction throughput and model performance. The benchmark tasks include:

  • Update creation throughput: We first run an experiment to test the throughput of model creation. To do this, we generate a number of model updates, make the parameters available locally, and have the endorsing peers evaluate them during consensus. In this way, we can measure how sharding affects the performance of the consensus mechanism by parallelizing the workload among shards. We aim to verify our initial claim that the global computation required becomes $\frac{\mathcal{P}\times\mathcal{C}}{\mathcal{S}}$.

  • Model performance: We evaluate the performance of the models across two scales: (1) the global number of epochs computed and (2) the number of gradients, which provides a more accurate measure of performance for decentralized training (ref:bfl-shard, ). While we are primarily concerned with the performance of integrating existing FL solutions into the blockchain, sharding should allow a larger fraction of clients to be fit each round, increasing model performance.
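The throughput claim in the first task reduces to a simple counting argument, sketched below (the reading of the symbols, with $\mathcal{P}$ peers, $\mathcal{C}$ client updates per round, and $\mathcal{S}$ shards, is our interpretation of the paper's notation):

```python
def global_evaluations(num_peers: int, num_updates: int, num_shards: int) -> float:
    """Counting argument behind the (P x C) / S claim (sketch).

    Without sharding, every one of the P peers evaluates all C updates,
    costing P x C evaluations; with S equally sized shards, each update
    is only evaluated within its own shard, reducing the global work to
    (P x C) / S.
    """
    return (num_peers * num_updates) / num_shards
```

For example, with 8 peers, 64 updates, and 8 shards, the global work drops from 512 evaluations to 64.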

4.2. Datasets

For experimentation, we use three datasets. The first two are the MNIST dataset (ref:mnist, ), consisting of 60,000 28×28 images of handwritten digits, and the CIFAR-10 dataset (ref:cifar10, ), consisting of 60,000 32×32 images separated into 10 classes with 6,000 images per class. We use these to represent IID data, where the classes of data are split evenly between clients.

For non-IID scenarios, we use the LEAF dataset (ref:leaf-dataset, ), which partitions the data between peers in a reproducible way. In this setting, the classes of data may not be evenly split; for example, in the Federated Extended MNIST (FEMNIST) dataset, samples are split by the writer of each digit or character. This provides a strong benchmark for modelling real-world distributions of data.

These datasets are used as a common baseline for comparison and evaluation against previous solutions (ref:federated-learning, ; ref:bfl-async, ; ref:bfl-shard, ), showing that the addition of a sharding solution maintains competitive model performance. While a non-IID assignment of data across shards may influence the global model generated during the global aggregation stage, the selection of mainchain defence mechanisms plays the largest role in model convergence. Thus, the results generated on the chosen datasets demonstrate the applicability of sharding solutions to existing blockchain-based methods.

4.3. Results

We created multiple workloads for the Update Creation Throughput benchmark to explore the effectiveness of our sharding solution. These workloads are conducted without malicious updates using the MNIST (ref:mnist, ) dataset to evaluate the performance of model creation transactions. We set the timeout period for each transaction to 30 seconds, allowing us to consider the number of failed transactions as stale (i.e., not malicious). In this experiment, each of the clients evaluated the update against their entire local dataset, in which we have used the entire test split of the MNIST dataset for each client.

Figure 4. #shards vs. system throughput (TPS)

We first measure the throughput with respect to the number of shards, as seen in Figure 4. This workload uses two Caliper workers, evaluated over 200 transactions. Each shard operates independently, which allows the global throughput to scale linearly; this directly reflects that the evaluation of a model is the primary bottleneck. We note here that the sent TPS for each number of shards is set just above its throughput in order to saturate the system for the evaluation. In this workload, we used the same number of sent transactions for consistency; however, higher numbers of shards have shown improvements in throughput for increased transactions sent, while a lower number of shards may begin to fail earlier.

Figure 5. Sent TPS vs. system throughput (TPS) & average response latency

To test the maximum throughput achieved by our system, we measure the sent TPS against the system throughput and average latency; see Figure 5. This workload is run with two Caliper workers over 200 transactions. As we increase the sent TPS, the system reaches a point where it becomes saturated and is unable to handle a higher TPS. We can see this at the point in the figure where the average latency begins to increase, which is where the throughput is maximized. We conduct this workload by measuring sent TPS in increments of 3, starting from 3 TPS. As in Figure 4, each additional shard increases the overall system throughput, here demonstrating the maximum throughput limit.

Figure 6. Transaction count sent vs. average response latency & failure count

To explore the limits of a usage surge, we measure the number of transactions sent against system throughput, average latency, and failure rate. In this benchmark, we are interested in the system’s behaviour under a workload with a higher sent TPS than throughput. This workload is run with two Caliper workers and a sent TPS just above the maximum throughput found in the previous workload; see Figure 6. When the system cannot handle the number of transactions currently queued, the average latency spikes, and the number of failed requests increases as the system begins timing out requests. The system then goes through a “flush” period where many long-running requests begin to fail, so the average latency of successful requests drops. We note here that the average latency peaks at around 16 seconds, the average between the maximum latency (the timeout limit) and the minimum latency, which corresponds to the time it takes to fully evaluate a model.

Figure 7. Transaction count sent vs. system throughput (TPS)

Figure 7 shows the throughput over this workload, where the throughput of the system decreases when it is no longer able to handle the number of incoming requests, as the overhead to process these additional requests limits the number of successful requests processed. The decrease in throughput corresponds to the increase in average latency previously observed.

Figure 8. Caliper workers vs. system throughput (TPS) & average response latency

We finally test how the system handles concurrent requests. To do so, we run multiple workloads, varying the number of Caliper workers, and measure the system throughput and average latency. This workload configuration sends 200 transactions, with the sent TPS equal to the previously mentioned maximum throughput; this can be seen in Figure 8. We observe that the throughput of the system with respect to the number of workers is quite noisy; however, there is a general downward trend in the system throughput. Increasing the number of Caliper workers allows us to scale the workload generation; however, since the endorsement workers evaluating the model operate sequentially in these experiments (limited to a single thread), the throughput is limited. Similarly, we can see an upward trend in average latency caused by each transaction’s time waiting for execution in the queue. The number of shards plays the largest role here, as workloads with more than 2 shards are tightly grouped with respect to average latency, since these workloads can operate in parallel across shards.

Figure 9. Training loss and testing accuracy of CNN model on the MNIST dataset
Table 2. Best accuracy with varying minibatch size (B) and number of local epochs (E)
B | E | FedAvg (Accuracy) | ScaleSFL (Accuracy)
10 | 1 | 0.8784 | 0.9835
10 | 5 | 0.9511 | 0.9897
10 | 15 | 0.9654 | 0.9896
20 | 1 | 0.7249 | 0.9758
20 | 5 | 0.9251 | 0.9881
20 | 15 | 0.9544 | 0.9889

To measure model performance, we measure the effect of various training parameters. We use minibatch sizes of $B \in \{10, 20\}$ and local epoch counts of $E \in \{1, 5, 15\}$. We present results here for the non-IID MNIST dataset; however, similar results hold for both the CIFAR-10 and LEAF benchmarks. The effects of both minibatch size and local epoch count can be seen in Figure 9 and Table 2, where we compare our ScaleSFL solution and the traditional FedAvg algorithm. This comparison is performed under the assumption that all clients are honest, and thus we are simply comparing the effect of sharding. Both training loss and accuracy are considered, and workflows are run for 15 global epochs, where the number of ScaleSFL shards is set to 8, with each shard sampling eight clients each round, for a total of 64 clients. The FedAvg algorithm is comparably run with 64 clients, with both methods using a client learning rate of $\eta_k = 10^{-2}$.

In Figure 9, we observe that the convergence rate of ScaleSFL is faster than that of the FedAvg algorithm, generally reaching an accuracy of 0.98 within the 15 global epochs considered. This is due to the parallelized nature of ScaleSFL, where each shard can simultaneously filter and produce viable updates with reasonable client sampling rates, combining them to produce a stronger global model at each step.

5. Discussion

Hierarchical Sharding

Since each shard operates independently, several additional improvements over a traditional FL workflow can be achieved. For example, a device selection step is typically performed, sampling a subset of the available devices for each round. Since the population of each shard is much smaller than the global population, each shard can update and maintain its set of active devices more efficiently, and during aggregation, a much larger sampling of devices can be drawn from the global population.

Additional optimizations could be achieved by placement of clients based on region (ref:hierfavg, ). Since each client may submit their model to an off-chain cache such as IPFS, shards with a population based within the same region could reduce overhead latency. In this way, the overhead only occurs during the global aggregation phase. The possibility of single-shard takeover attacks may be introduced by not using a random sampling scheme. However, as noted before, this disruption will only affect the compromised shard since all endorsing peers from all shards must agree on which shard level updates are aggregated. This may allow for region based sampling algorithms to be used for the assignment of participants. These strategies should be chosen depending on the underlying task, potentially chosen by the task submitter, or consensus among task participants prior to shard creation.
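A region-based placement could be sketched as follows (illustrative only; the paper leaves the concrete sampling strategy to the task submitter or participant consensus):

```python
from collections import defaultdict

def group_by_region(clients):
    """Region-based shard placement (illustrative sketch).

    `clients` is a list of (client_id, region) pairs; each region maps
    to one shard so that most communication stays local, with overhead
    incurred only during global aggregation.
    """
    shards = defaultdict(list)
    for client_id, region in clients:
        shards[region].append(client_id)
    return dict(shards)
```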

Alternative participant sampling can be considered in cross-silo or consortium based settings such as medical or financial organizations. For example, it may be beneficial to allow clients of a particular organization to be grouped under a single shard (ref:bfl-shard, ). Fine-grained control over sampling and permissions is delegated independently within each shard.

Model Provenance

By pinning model information back to the mainchain, we provide an open model hub for sharing and distributing trained models in a decentralized fashion. Model submission to IPFS additionally allows for transparent model check-pointing and disaster recovery. In the case that a bug has been introduced at some point in the FL process, either intentionally through malicious attack, or by data bias or incompatibility, previous model checkpoints may be restored, and a new task may be initiated using this saved model checkpoint.

Rewards Allocation

While our implementation of ScaleSFL uses Hyperledger Fabric, alternative implementations may consider platforms such as Ethereum (ref:ethereum, ) for the implementation of reward mechanisms. Since training models in a permissionless fashion may result in decreased contributions, as models are freely available through a model hub without expending computing resources, rewards become an integral part of the workflow. While access to the models should remain free, contributing computing resources should be rewarded by distributing rewards to the contributing clients. Additional incentives could be provided by task contributors or interested clients to “sweeten the pot”, encouraging community contributions. Additionally, with the incorporation of a native cryptocurrency, submitting model transactions could incur a small gas fee. This would deter Denial of Service (DOS) attacks against the system, since rewards for model contributions are only realized for non-malicious updates; common DOS attacks, such as generating random model updates or sending updates with dead cache links, would incur fees without any reward.

Alternative Attacks

Up to this point, we have discussed malicious clients as clients that send poisoned model updates. However, malicious clients may attempt various other types of attacks, such as uploading a different model than specified in the task request or making the model update very large, attempting a DOS attack on a shard. These attacks may be prevented by checking the size of the model uploaded to the cache prior to download. A more clever client may attempt to submit a legitimate model many times, attempting either to reap the rewards for training or to perform a DOS attack against the system. These clients are known as lazy nodes (ref:fl-meets-bc, ); they may additionally attempt to copy another client’s model update during a gossip phase. This problem has severe negative effects on global model performance, so updates from these lazy nodes should be discarded. One way to recognize lazy nodes is to have each client publish its model update combined with a pseudo-noise (PN) sequence, revealing the PN sequence only after every client has published its update. Lazy nodes can then be spotted by checking the correlation between the model updates and the published PN sequences (ref:fl-meets-bc, ). Gas fees should deter spotted clients and Sybils from attempting to submit further model updates. Incorporating this defence into our workflow would modify the off-chain model storage step, ensuring all models submitted during model submission have a valid PN signature.
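The PN-sequence check can be sketched as follows (a simplified sketch of the correlation test; model updates are flattened to plain lists, and the detection threshold is an assumed parameter):

```python
def correlation(a, b):
    """Normalized (Pearson) correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sum((x - mean_a) ** 2 for x in a) ** 0.5
    norm_b = sum((y - mean_b) ** 2 for y in b) ** 0.5
    return cov / (norm_a * norm_b) if norm_a and norm_b else 0.0

def looks_lazy(update, pn_sequence, threshold=0.5):
    """Flag an update lacking the expected PN watermark (sketch).

    An honest client embeds the later-revealed PN sequence in its update,
    so the update correlates with that sequence; a lazy copy of another
    client's update (carrying a different PN) will not.
    """
    return correlation(update, pn_sequence) < threshold
```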

6. Conclusions

In this paper, we proposed a sharding solution for blockchain-based federated learning. We described a modular workflow, allowing for pluggable defence mechanisms and poisoning mitigation strategies. This workflow allows off-chain FL frameworks to be used while pinning verified model updates to the blockchain. We additionally implemented a proof-of-concept of ScaleSFL using Hyperledger Fabric. Our experimental results demonstrated that ScaleSFL improves validation performance linearly with the addition of shards. This helps address the prevalent scalability issue of blockchain consensus by demonstrating the impact of sharding.

As ongoing and future work, we will further explore the details of a sharded solution applied to the blockchain. Specifically, we aim to simulate malicious attacks on the system via model poisoning updates, which would show the effectiveness of a blockchain solution in providing security against bad actors. Additionally, we will explore the application of a reward mechanism for participating in the model training process. This could be explored on other chains such as Ethereum (ref:ethereum, ), where the use of native tokens is well supported. Finally, we would like to develop a more full-featured implementation of this PoC, including dynamic shard creation and allowing model proposition through our catalyst contract.

Acknowledgements.
This work is partially supported by NSERC (Canada) and University of Manitoba.

References

  • [1] Jie Xu, Benjamin S Glicksberg, Chang Su, Peter Walker, Jiang Bian, and Fei Wang. Federated learning for healthcare informatics. Journal of Healthcare Informatics Research, 5(1):1–19, 2021.
  • [2] Adnan Qayyum, Kashif Ahmad, Muhammad Ahtazaz Ahsan, Ala Al-Fuqaha, and Junaid Qadir. Collaborative federated learning for healthcare: Multi-modal covid-19 diagnosis at the edge. arXiv preprint arXiv:2101.07511, 2021.
  • [3] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
  • [4] Lumin Liu, Jun Zhang, SH Song, and Khaled B Letaief. Client-edge-cloud hierarchical federated learning. In ICC 2020-2020 IEEE International Conference on Communications (ICC), pages 1–6. IEEE, 2020.
  • [5] Arvind Narayanan and Vitaly Shmatikov. How to break anonymity of the netflix prize dataset. arXiv preprint cs/0610105, 2006.
  • [6] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265–284. Springer, 2006.
  • [7] Cynthia Dwork. Differential privacy. In International Colloquium on Automata, Languages, and Programming, pages 1–12. Springer, 2006.
  • [8] Paritosh Ramanan and Kiyoshi Nakayama. Baffle: Blockchain based aggregator free federated learning. In 2020 IEEE International Conference on Blockchain (Blockchain), pages 72–81. IEEE, 2020.
  • [9] Yuzheng Li, Chuan Chen, Nan Liu, Huawei Huang, Zibin Zheng, and Qiang Yan. A blockchain-based decentralized federated learning framework with committee consensus. IEEE Network, 35(1):234–241, 2020.
  • [10] Marco Barreno, Blaine Nelson, Anthony D Joseph, and J Doug Tygar. The security of machine learning. Machine Learning, 81(2):121–148, 2010.
  • [11] Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 118–128, 2017.
  • [12] Clement Fung, Chris JM Yoon, and Ivan Beschastnikh. Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866, 2018.
  • [13] Ching Pui Wan and Qifeng Chen. Robust federated learning with attack-adaptive aggregation. arXiv preprint arXiv:2102.05257, 2021.
  • [14] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, et al. Hyperledger fabric: a distributed operating system for permissioned blockchains. In Proceedings of the thirteenth EuroSys conference, pages 1–15, 2018.
  • [15] Hyperledger caliper, Apr 2020.
  • [16] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pages 1310–1321, 2015.
  • [17] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security, pages 308–318, 2016.
  • [18] Anifat M. Olawoyin, Carson K. Leung, and Qi Wen. Privacy preservation of COVID-19 contact tracing data. In Proceedings of the IUCC-CIT-DSCI-SmartCNS 2021, pages 288–295, 2021.
  • [19] Abylay Satybaldy and Mariusz Nowostawski. Review of techniques for privacy-preserving blockchain systems. In Proceedings of the 2nd ACM International Symposium on Blockchain and Secure Critical Infrastructure (BSCI ’20), pages 1–9. ACM, 2020.
  • [20] Hyesung Kim, Jihong Park, Mehdi Bennis, and Seong-Lyun Kim. Blockchained on-device federated learning. IEEE Communications Letters, 24(6):1279–1283, 2019.
  • [21] Chuan Ma, Jun Li, Ming Ding, Long Shi, Taotao Wang, Zhu Han, and H Vincent Poor. When federated learning meets blockchain: A new distributed learning paradigm. arXiv preprint arXiv:2009.09338, 2020.
  • [22] Yinghui Liu, Youyang Qu, Chenhao Xu, Zhicheng Hao, and Bruce Gu. Blockchain-enabled asynchronous federated learning in edge computing. Sensors, 21(10):3335, 2021.
  • [23] Jonathan Passerat-Palmbach, Tyler Farnan, Robert Miller, Marielle S Gross, Heather Leigh Flannery, and Bill Gleim. A blockchain-orchestrated federated learning architecture for healthcare consortia. arXiv preprint arXiv:1910.12603, 2019.
  • [24] Yang Zhao, Jun Zhao, Linshan Jiang, Rui Tan, Dusit Niyato, Zengxiang Li, Lingjuan Lyu, and Yingbo Liu. Privacy-preserving blockchain-based federated learning for iot devices. IEEE Internet of Things Journal, 8(3):1817–1829, 2020.
  • [25] Yunlong Lu, Xiaohong Huang, Ke Zhang, Sabita Maharjan, and Yan Zhang. Low-latency federated learning and blockchain for edge association in digital twin empowered 6g networks. IEEE Transactions on Industrial Informatics, 17(7):5098–5107, 2020.
  • [26] Keke Gai, Yulu Wu, Liehuang Zhu, Zijian Zhang, and Meikang Qiu. Differential privacy-based blockchain for industrial internet-of-things. IEEE Transactions on Industrial Informatics, 16(6):4156–4165, 2019.
  • [27] Shuo Yuan, Bin Cao, Yao Sun, and Mugen Peng. Secure and efficient federated learning through layering and sharding blockchain. arXiv preprint arXiv:2104.13130, 2021.
  • [28] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1–2):1–210, 2021.
  • [29] Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, and Dimitris Papailiopoulos. Attack of the tails: Yes, you really can backdoor federated learning. Advances in Neural Information Processing Systems, 33:16070–16084, 2020.
  • [30] Yuzhe Ma, Xiaojin Zhu, and Justin Hsu. Data poisoning against differentially-private learners: Attacks and defenses. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI’19, page 4732–4738. AAAI Press, 2019.
  • [31] Jun Li, Yumeng Shao, Kang Wei, Ming Ding, Chuan Ma, Long Shi, Zhu Han, and Vincent Poor. Blockchain assisted decentralized federated learning (blade-fl): Performance analysis and resource allocation. IEEE Transactions on Parallel and Distributed Systems, 2021.
  • [32] Tian Li, Maziar Sanjabi, Ahmad Beirami, and Virginia Smith. Fair resource allocation in federated learning. In International Conference on Learning Representations, 2020.
  • [33] Miguel Castro, Barbara Liskov, et al. Practical byzantine fault tolerance. In OSDI, volume 99, pages 173–186, 1999.
  • [34] Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm. In 2014 USENIX Annual Technical Conference (USENIX ATC 14), pages 305–319, 2014.
  • [35] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1–19, 2019.
  • [36] Juan Benet. Ipfs-content addressed, versioned, p2p file system. arXiv preprint arXiv:1407.3561, 2014.
  • [37] ConsenSys. Consensys/quorum: A permissioned implementation of ethereum supporting data privacy.
  • [38] Gavin Wood et al. Ethereum: A secure decentralised generalised transaction ledger. Ethereum project yellow paper, 151(2014):1–32, 2014.
  • [39] Daniel J Beutel, Taner Topal, Akhil Mathur, Xinchi Qiu, Titouan Parcollet, and Nicholas D Lane. Flower: A friendly federated learning research framework. arXiv preprint arXiv:2007.14390, 2020.
  • [40] Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, Graham Cormode, and Ilya Mironov. Opacus: User-friendly differential privacy library in PyTorch. arXiv preprint arXiv:2109.12298, 2021.
  • [41] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
  • [42] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • [43] Sebastian Caldas, Sai Meher Karthik Duddu, Peter Wu, Tian Li, Jakub Konečnỳ, H Brendan McMahan, Virginia Smith, and Ameet Talwalkar. Leaf: A benchmark for federated settings. arXiv preprint arXiv:1812.01097, 2018.