
¹ Grove Inc.; grove.city
² Pocket Scan Technologies LLC; poktscan.com

Relay Mining: Incentivizing Full Non-Validating Nodes Servicing All RPC Types

Daniel Olshansky¹    Ramiro Rodríguez Colmeiro²
Abstract

Relay Mining presents a scalable solution employing probabilistic mechanisms, crypto-economic incentives, and new cryptographic primitives to estimate and prove the volume of Remote Procedure Calls (RPCs) made from a client to a server. Distributed ledgers are designed to secure permissionless state transitions (writes), highlighting a gap for incentivizing full non-validating nodes to service non-transactional (read) RPCs. This leads applications to have a dependency on altruistic or centralized off-chain Node RPC Providers. We present a solution that enables multiple RPC providers to service requests from independent applications on a permissionless network. We leverage digital signatures, commit-and-reveal schemes, and Sparse Merkle Sum Tries (SMSTs) to prove the amount of work done. This is enabled through the introduction of a novel ClosestMerkleProof proof-of-inclusion scheme. A native cryptocurrency on a distributed ledger is used to rate limit applications and disincentivize over-usage. Building upon established research in token bucket algorithms and distributed rate-limiting penalty models, our approach harnesses a feedback loop control mechanism to adjust the difficulty of mining relay rewards, dynamically scaling with network usage growth. By leveraging crypto-economic incentives, we reduce coordination overhead costs and introduce a mechanism for providing RPC services that are both geopolitically and geographically distributed. We use common formulations from rate limiting research to demonstrate how this solution in the Web3 ecosystem translates to distributed verifiable multi-tenant rate limiting in Web2.

Keywords:
remote procedure call · crypto-economic · commit-reveal · decentralization · scalability · blockchain · rate limiting · token bucket

1 Introduction

The rise of Software as a Service (SaaS) has popularized subscription-based models, facilitated through Remote Procedure Calls (RPCs) like JSON-RPC, gRPC, REST APIs, and GraphQL to meet diverse application requirements. These models typically bill development teams based on usage or subscription tiers [16].

In the Web3 space, decentralized applications (dApps) often rely on blockchain data, traditionally accessed by operating a full node [2]. However, the high resource demands — such as the need for at least 16GB of RAM, a 2TB SSD, and a 25Mbps download speed for running a geth client [9] — make self-managing full nodes impractical for most developers, particularly on client devices. Light clients, a concept originating from Bitcoin’s Simple Payment Verification, are evolving but still depend on Node RPC providers that supply necessary proofs and headers [23].

The dependence on these providers accelerates dApp development but also pushes towards further centralization and reliance on a few large infrastructure entities [22]. While light clients aim to minimize this reliance, they still require RPC services to function effectively.

For sustainability, RPC service operators implement rate-limiting and Denial-of-Service (DoS) protection strategies. Despite advancements in distributed ledger technology for permissionless transaction verification (writes), a corresponding solution for permissionless data reads remains undeveloped. Relay Mining addresses this by exploring rate-limiting mechanisms within a live network, applicable to any RPC interface.

1.1 Blockchain Writes - Validators, Builders, Proposers et al

Distributed ledgers (blockchains) are often reduced to the problem of Byzantine Fault Tolerant (BFT) State Machine Replication (SMR) [5]. Discussions about scalability typically focus on increasing throughput — the capacity to write more data to the ledger without compromising safety and liveness. This challenge led to the formulation of the Blockchain Scalability Trilemma, which assesses the trade-offs between scalability, decentralization, and security of a blockchain [14].

When evaluating the scalability of a Layer 1 (L1) blockchain, key metrics include Transactions-Per-Second (TPS) and Time-to-Finality (TTF), which provide direct insights into the network’s capacity and usage, as illustrated in Table 1.

The constraints of limited write capacity lead to on-chain auctions, often analyzed through the lens of Maximal Extractable Value (MEV). The need for efficient transaction ordering has introduced additional roles like Proposers and Builders, enhancing the ecosystem’s focus on scaling data throughput rather than data consumption [18] [15].

                                Ethereum   Solana   Dfinity   Aptos   Near
Transactions-Per-Second (TPS)         11      286      5382       5   8.25
Time-To-Finality (TTF)               15m    5-12s      1.4s    ≤ 1s   3.3s
Number of Nodes                     6562     1872       823     107    798
Table 1: L1 write throughput comparison measured in December 2022; Aptos data is more recent [8] [20].

1.2 Blockchain Reads - RPC Node Providers

Despite advancements in addressing the Scalability Trilemma to manage, verify, and incentivize writes, the predominant type of request made by application developers is reads. Table 2 highlights the substantial traffic managed by major node providers in 2023, emphasizing the billions of daily requests, mostly reads, which present challenges not addressed by BFT replicated state machines.

Accessing blockchain data, crucial for operations like those described in the Simple Payment Verification [23], typically requires one of the following infrastructure components:

  1. Full Nodes: Synchronizing a full node to maintain a complete, locally verified copy of the ledger.

  2. Light Clients: Acquiring headers and proofs just-in-time from other full nodes.

  3. RPC Providers: Relying on a third party to retrieve data via an RPC request.

Options (1) and (2) often face practical limitations for widespread application use. Although option (3) has evolved into a robust Web2 SaaS-like ecosystem, it depends on provisioning and maintaining scalable infrastructure, while managing off-chain payments and Service Level Agreements (SLAs) in a more traditional and permissioned environment. Fig. 1 illustrates the operational models commonly adopted by decentralized applications (dApps) today using these methods.

                                   Infura   Alchemy    Ankr   Pocket Network
Requests Per Day                       7B        ??      8B             1.5B
Latency / Round-Trip-Time (sec)      0.77      0.61    0.67             0.76
Availability / Uptime              99.99%        ??  99.99%              99%
Table 2: The latency shown is the p50 RTT for Ethereum Mainnet from Miami as measured using RPCMeter [7]. The other metrics provided are a best-effort approximation based on public resources [10][17].
Refer to caption
Figure 1: Communication between a dApp with permissionless and untrusted full nodes (left) versus trusted and centralized full nodes (right).

2 Problem Statement

There are no financial incentives for full non-validating, non-altruistic nodes in a permissionless network. An effective solution must address:

  1. Incentives: Implement on-chain crypto-economic incentives for non-transactional read RPC requests.

  2. Scalability: Scale to accommodate hundreds of billions of daily read requests due to the nature of read request volume.

  3. Rate Limiting: Facilitate rate limiting within a permissionless, multi-tenant environment.

Current on-chain incentives are tailored for full validating or block building nodes, rewarding them for processing state transitions and collecting transaction fees. However, no analogous incentives exist for read-only operations, making reliance on altruistic node operators unsustainable for dApps.

2.1 RPC Trilemma

This paper introduces the Relay Mining algorithm, addressing the aforementioned needs. With prospects for future expansion into tokenomics and the duality between centralized Gateways and permissionless hardware operators, we also outline the RPC Trilemma. Illustrated in Fig. 2, this trilemma involves selecting RPC providers based on a trade-off among Reliability, Performance, and Cost. While our optimistic crypto-economic model incentivizes rate limiting and read request servicing, addressing just-in-time quality of service remains beyond this paper’s scope.

Refer to caption
Figure 2: The RPC Trilemma

3 Experience & Motivation

The foundation of the problem statement in Section 2 emerged from the challenges faced by a live permissionless network of RPC providers, notably Pocket Network. This network, built on Tendermint BFT [11], began operation on October 22, 2021, and has served as a decentralized RPC provider, reaching a peak of 2 billion relays per day by May 8, 2023. The ability to process such a vast number of relays at scale was the result of the following protocol characteristics:

  1. Utilization of a balanced (complete) Merkle Tree (IAVL) for relay book-keeping commitments [12].

  2. Representation of each relay by a single leaf in the form of a signed (request, response) pair.

  3. Sorting of leaves in the Merkle Tree before insertion to ensure replay protection.

  4. Allocation of a significant portion (70%) of each block to Merkle Proofs.

  5. Handling of hundreds to thousands of requests per second by each provider.

These practices, especially the sorting of leaves for replay protection, introduced significant processing inefficiencies and security vulnerabilities [28].

The Relay Mining algorithm proposes enhancements to address these inefficiencies within a permissionless and decentralized network of RPC service providers. Key improvements include:

  1. Implementation of a Sparse Merkle Sum Tree, which is more efficient in both storage and updates.

  2. Use of a hash collision mechanism to manage relay proofs, thereby decoupling tree size from demand growth.

  3. Employment of a ClosestMerkleProof proof-of-inclusion scheme to validate the work done without needing to sort the Merkle Tree.

4 Related Work

4.1 Rate Limiting Algorithms

Most rate-limiting algorithms can be categorized as either Token Bucket algorithms (e.g., Leaky Bucket) or Window-based algorithms (e.g., Fixed Window, Sliding Window) [24]. Much of the existing work and literature extending these to Distributed Rate Limiting (DRL) was captured in [21] [25], with two primary goals: (1) finding a tradeoff between rate-limiting accuracy and communication overhead between brokers, and (2) reducing the DRL penalty on the end-user in exchange for accuracy. Although these approaches are valuable, they assume that the brokers or service providers, despite being geographically distributed, are not decentralized in terms of ownership, and they assume non-Byzantine guarantees.

4.2 Distributed Rate Limiting

Although the solutions discovered by the team at Washington University in St. Louis [21] do not directly apply to Relay Mining, their formulation, as shown in Section 5, is relevant. They distilled common Token Bucket rate-limiting algorithms into two key parameters when managing communication between Clients and Brokers:

  • r: The rate at which tokens are generated. This bounds the long-term rate of message services.

  • b: The maximum number of tokens that can be accumulated. This bounds the maximum burst of messages that can be sent.
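As a concrete illustration, the r/b token bucket described above can be sketched in a few lines of Python; the class name and the one-token-per-message cost are illustrative choices, not part of [21]:

```python
import time

class TokenBucket:
    """Token bucket limiter: r bounds the long-term message rate,
    b bounds the maximum burst (names follow the formulation above)."""

    def __init__(self, r: float, b: float):
        self.r = r                     # tokens generated per second
        self.b = b                     # maximum tokens accumulated
        self.tokens = b                # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token per message if available, else reject."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity b.
        self.tokens = min(self.b, self.tokens + (now - self.last) * self.r)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With r = 0 the bucket never refills, isolating the burst bound b: exactly b messages are admitted before rejection begins.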

Their work on Scalable Real-Time Messaging (SRTM) encapsulated the following principles in reducing the Distributed Rate Limit (DRL) penalty:

  • Concentration: Identification of the smallest number of brokers for message servicing to meet the network’s Service Level Objectives (SLOs).

  • Max-min: Maximization of the minimum workload assigned to each broker.

  • Correlation-awareness: Assignment of publishers to brokers to minimize inter-publisher correlation and reduce message burstiness.

4.3 Rate Limiting in Web3

To the authors’ knowledge, no other multi-tenant decentralized RPC node provider network currently operates in production. Centralized node operators such as Alchemy employ subscription tiers and compute units to manage rate limiting. This strategy, however, relies on centralization and does not necessitate on-chain proof and verification [1]. Compute units measure the total computational resources that applications utilize on platforms, rather than tallying individual requests. Another protocol in development, Lava, also intends to implement compute units for rate limiting [6], but details of this implementation remain undisclosed. The lack of an existing decentralized RPC provider in production underscores the unique and complex challenges involved in maintaining decentralization while achieving on-chain proof and verification.

5 Rate Limiting Formulation

In order to present our approach to rate limiting in the context of Relay Mining, we will draw analogs from the concepts above to primitives in Pocket Network.

The primary actors of concern are:

  • Application: The client sending RPC requests (real-time messages) to one or more Servicers (brokers).

  • Servicers: Distributed multi-tenant brokers responsible for handling and responding to RPC requests from the Application.

Parameters r and b map directly to on-chain values:

  • AppStake (r): A proxy for the maximum number of requests an Application can make across all current and future sessions across all valid Servicers. This directly relates to the Application’s on-chain stake using the network’s native token and can be increased through up-staking.

  • RelaysPerSession (b): The maximum number of requests an Application can make across all Servicers in a single session.

The core SRTM principles map directly to on-chain parameters:

  • ServicersPerSession (Concentration): The smallest number of Servicers that can handle an Application’s requests in a specific session to meet the network’s Service Level Objectives (SLOs). These can include Quality of Service (QoS), decentralization, fairness, etc.

  • RelaysPerServicerPerSession (Max-min): The maximum number of relays a single Servicer can handle in a single session.

  • SessionData (Correlation-awareness): Random assignment of an Application to a subset of valid Servicers for the duration of the session.

6 Relay Mining

Relay Mining is a multi-step mechanism designed to ensure fair and permissionless distribution of work between a single Application and multiple Servicers. Utilizing data and parameters stored on the distributed ledger (i.e., on-chain data), this process provides incentive-driven rate limiting for the Application without necessitating additional communication between Servicers. A combination of commit-and-reveal mechanisms and probabilistically requested membership proofs is employed to verify the amount of work completed. Furthermore, a dynamic difficulty modulation is used to scale the network’s capacity as it organically expands. Fig. 3 illustrates the four stages of a Session Lifecycle.

6.1 Session Lifecycle

Applications and Servicers must stake a certain balance and declare the service(s) that they plan to access and provide, respectively. Note that the maximum long-term rate that the Application can receive is proportional to r. This is shown in Step 1 of Fig. 3.

Any Session (a period of time determined by a specific number of blocks) determines which Servicers can receive a reward for servicing an Application’s request. This is done under the Random Oracle Model, using on-chain data as entropy [3], as shown in Step 2 of Fig. 3. Note that the maximum number of messages that can be serviced by each individual Servicer per session is b. Per the work described in [21], enforceable sub-buckets are not implemented and $r < \sum_{l=1}^{k} r_{l}$. Due to the lack of guarantee that the sub-token count of each individual Servicer will be exhausted, a margin is provided, and a negative Application balance becomes theoretically possible. In this unlikely scenario, an Application would need to rotate the keys associated with its dApp in order to continue using the services provided by the network; additional mitigations are reserved for future work.

Refer to caption
Figure 3: Interaction between the Application, Servicers and World State before, during and after a Session.

6.2 Payable Relay Accumulation

Proposition 1 in [21] demonstrated that, in the context of highly dynamic workloads, splitting tokens into sub-buckets $(r_{l}, b_{l})$ such that $r = \sum_{l=1}^{k} r_{l}$ and $b = \sum_{l=1}^{k} b_{l}$ can never improve the running sum of message delays. However, building on the work of [25], who investigated probabilistic DRL, we can deduce that there is always an inherent tradeoff between absolute system accuracy and communication cost. Considering the possibility and added complexity of potential Byzantine actors in a multi-tenant, decentralized, and permissionless system, we opt for the tradeoff of accuracy in exchange for reduced coordination costs. In Algorithm 1, RelayAccuracy represents the maximum overhead that can cause an Application’s stake to be negative at the end of a Session. This is shown in the first half of Step 3 in Fig. 3.

Algorithm 1 Payable Relay Accumulation
1: AppStake ← 10^6 ▷ Amount of staked Application tokens
2: ServicersPerSession ← 12 ▷ Number of valid payable Servicers per session
3: TTRM ← 1000 ▷ Token to relay multiplier per session
4: RelayAccuracy ← 0.2 ▷ An accuracy error margin for TTRM
5: b ← (AppStake · TTRM / ServicersPerSession) · (1 + RelayAccuracy) ▷ Num tokens per Servicer per session
6: Given a Session for a specific Application accessing Service Svc
7: for Servicers [S_1, …, S_N] between Blocks [B, B+W) do
8:     set tokenCount = b for each Servicer
9:     for each relay R do
10:        signedRequest = Sign_AppPrivKey(SvcRequest)
11:        if !valid(signedRequest) then
12:            continue
13:        end if
14:        if !payable(signedRequest, App, Servicer, Session) then
15:            continue
16:        end if
17:        if tokenCount ≤ 0 then
18:            continue
19:        end if
20:        SvcResponse = Svc(SvcRequest)
21:        signedResponse = Sign_ServicerPrivKey(SvcResponse)
22:        d = digest(signedRequest, signedResponse)
23:        hasCollision = checkCollision(d, SvcDifficulty)
24:        if hasCollision then
25:            tokenCount -= 1
26:            key = d
27:            value = (signedRequest, signedResponse)
28:            insertLeafInSparseMerkleTree(key, value)
29:        end if
30:    end for
31: end for
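The collision-gated core of Algorithm 1 can be sketched in Python. The leading-zero-bits encoding of SvcDifficulty and the dict standing in for the Sparse Merkle Sum Trie are simplifying assumptions for illustration only:

```python
import hashlib

def check_collision(digest: bytes, difficulty: int) -> bool:
    """True when the digest has at least `difficulty` leading zero bits
    (one possible encoding of SvcDifficulty, assumed for illustration)."""
    return int.from_bytes(digest, "big") < 1 << (len(digest) * 8 - difficulty)

def accumulate_relays(relays, token_count: int, difficulty: int):
    """Insert only colliding relays into the tree (a dict stands in for
    the Sparse Merkle Sum Trie), spending the per-Servicer token bucket."""
    tree = {}
    for signed_request, signed_response in relays:
        if token_count <= 0:           # bucket exhausted: stop accumulating
            break
        d = hashlib.sha256(signed_request + signed_response).digest()
        if check_collision(d, difficulty):
            token_count -= 1
            tree[d] = (signed_request, signed_response)
    return tree, token_count
```

At difficulty 0 every digest collides and the bucket alone limits insertions; at the maximum difficulty no digest collides and the tree stays empty, mirroring how the collision probability decouples tree size from relay volume.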

6.3 Claim & Proof Lifecycle

The second part of Steps 3 & 4 in Fig. 3 provides an overview of the Claim & Proof lifecycle implemented via a commit-and-reveal scheme.

Once a valid payable relay is serviced, assuming the digest of (SignedRequest, SignedResponse) using a collision-resistant hash function matches the difficulty for that chain/service as described in Section 6.4, it is committed for the purposes of volume estimation into a Merkle Tree data structure backed by a Key-Value storage engine. In order to avoid the compute overhead of tree re-balancing on every insertion, while maintaining efficient proof sizes, a Sparse Merkle Trie implementation [13], based on Diem’s Jellyfish Merkle Tree sparse optimizations [26], is used. The value stored on every leaf is the Serialized(SignedRequest, SignedResponse) data structure, pointed to by the digest, whose bits act as the path to the leaf in the Trie. This approach guarantees space and compute efficiency, as well as preventing replay attacks through unique leaf guarantees without the need to maintain sorted leaves [28]. The only other modification made to this Trie is that it is implemented as a Sparse Merkle Sum Trie, so the sum of all the volume-applicable relays is committed to by the root [27], as shown in Fig. 4.
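To illustrate how a sum trie commits to relay volume, the following toy sketch computes a Merkle sum root over weighted leaves. It uses a balanced binary layout rather than the sparse, digest-keyed trie described above, so it only demonstrates the sum-commitment idea:

```python
import hashlib

EMPTY = hashlib.sha256(b"").digest()  # zero-weight padding node

def node(hl: bytes, sl: int, hr: bytes, sr: int):
    """Parent commits to both child hashes *and* both child sums."""
    h = hashlib.sha256(hl + sl.to_bytes(8, "big") + hr + sr.to_bytes(8, "big"))
    return h.digest(), sl + sr

def sum_merkle_root(leaves):
    """leaves: list of (value_bytes, weight). Returns (root_hash, total_sum)."""
    level = [(hashlib.sha256(v).digest(), w) for v, w in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append((EMPTY, 0))   # pad without inflating the sum
        level = [node(*level[i], *level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Because each parent hashes its children's sums, changing any leaf weight changes the root: the total relay count is bound to the same commitment that proofs of inclusion are checked against.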

Several blocks after the Servicer submits a Claim to the tree Commitment via an on-chain transaction, there is a time window during which it must also reveal its proof of a randomly selected branch from the tree. Since the tree is not complete, the leaves are not indexed, and the set of all leaves is unknown to anyone but the Servicer’s local state, a random branch must be requested through other means. Using on-chain data (e.g., Block Hash) determined after the commitment, a random set of bits equal to the key’s length is generated in the Random Oracle Model [3], and the closest membership proof of an existing leaf that can be found must be revealed and submitted. For example, if the random set of bits is 0b0001, the tree is traversed until it retrieves the membership proof for Leaf 3 in Fig. 4 with the key 0b0011.
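The leaf-selection step can be sketched as follows; here "closest" is modeled as absolute numeric distance between keys, a simplification of the actual trie traversal, which follows the random bits down the trie and backtracks at missing branches:

```python
def closest_leaf(leaf_keys, target: int) -> int:
    """Return the existing leaf key 'closest' to the random target path.

    Closeness is modeled here as numeric distance over the key bits; the
    real implementation walks the trie along the target's bits instead.
    """
    return min(leaf_keys, key=lambda k: abs(k - target))
```

Mirroring the example above, a random path of 0b0001 against leaves {0b0011, 0b0110, 0b1010} selects 0b0011, whose membership proof would then be revealed.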

Refer to caption
Figure 4: Four Bit Sparse Merkle Sum Trie

6.4 Probabilistic Volume Estimation

In a decentralized network where the exact count of processed relays is not tracked, we have to estimate the actual number of relays from sampled data. Consider the number of observed claims $C_{bk}$ for a specific blockchain, denoted by the subscript bk, as a random variable following the binomial distribution:

B(Rbk,pbk)=(RbkCbk)pbCbk(1pbk)RbkCbk,B(R_{bk},p_{bk})=\binom{R_{bk}}{C_{bk}}\cdot p_{b}^{C_{bk}}(1-p_{bk})^{R_{bk}-C_{bk}}, (1)

Here, $R_{bk}$ represents the total number of processed relays and $p_{bk}$ denotes the hash collision probability given the current blockchain difficulty. We can then derive the value of $R_{bk}$ as follows (for clarity, the subscript bk is dropped in the remainder of this work; unless otherwise stated, all variables should be interpreted as per-blockchain):

R=Cp.R=\frac{C}{p}. (2)

This estimation is particularly accurate for large numbers of trials and higher probabilities [4], a scenario that is not difficult to achieve in the context of Relay Mining. The difficulty (and hence p) is controlled by the expected number of claims per block, and the number of relays R is typically very high (as per our observations; refer to Section 9.1 for details).
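A minimal sketch of this estimator, together with a binomial simulation to sanity-check it (both function names are illustrative, not part of the protocol):

```python
import random

def estimate_relays(claims: int, p: float) -> float:
    """Invert E[C] = R * p to estimate total relays from observed claims."""
    if p <= 0:
        raise ValueError("collision probability must be positive")
    return claims / p

def simulate_claims(total_relays: int, p: float, seed: int = 0) -> int:
    """Draw C ~ Binomial(R, p): each relay claims independently with prob p."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(total_relays))
```

For R = 100,000 relays at p = 0.1, the simulated claim count recovers R to within a few percent, consistent with the high-volume accuracy argument above.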

6.5 Difficulty Modulation

The proposed method for difficulty modulation follows Bitcoin’s approach [23], which uses a proportional control of an average of the controlled variable to adapt the current hash collision probability [19].

The modulation of difficulty can be described by Algorithm 2. The method involves tracking the number of estimated total relays, denoted as $R_{ema}$, and modulating the hash collision probability p to reach a target number of claims T. The Exponential Moving Average (EMA) parameter α and the number of blocks between updates U control the responsiveness of the algorithm, i.e., the speed at which the difficulty adapts to new relay volumes.

The values assigned to these variables are not fixed and are estimated based on empirical observations. They are subject to change to accommodate the dynamic nature of relay volumes.

Algorithm 2 Relay Mining Difficulty Calculation
1: T ← 10^4 ▷ Target claims per blockchain.
2: α ← 0.1 ▷ Exponential Moving Average parameter.
3: U ← 4 ▷ Number of blocks per difficulty update.
4: R_ema ← 0 ▷ Estimated blockchain relays, averaged by EMA.
5: p ← 1 ▷ Initial blockchain hash collision probability.
6: height ← 0
7: while True do
8:     C ← getAllClaims() ▷ Get all relay claims.
9:     R ← C / p
10:    R_ema ← α·R + (1-α)·R_ema
11:    if height % U == 0 then
12:        p ← T / R_ema
13:        if p > 1 then
14:            p ← 1 ▷ If total relays are lower than target, disable relay mining.
15:        end if
16:    end if
17:    height ← height + 1
18: end while
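Algorithm 2 translates almost directly into Python; the class below is an illustrative sketch (one deliberate simplification: the first difficulty update happens after U blocks of EMA warm-up rather than at height 0):

```python
class DifficultyModulator:
    """Sketch of Algorithm 2: an EMA of estimated relays drives the
    hash collision probability p toward a target number of claims."""

    def __init__(self, target=10_000, alpha=0.1, update_every=4):
        self.target = target              # T: target claims per block
        self.alpha = alpha                # EMA responsiveness parameter
        self.update_every = update_every  # U: blocks between updates
        self.p = 1.0                      # initial collision probability
        self.r_ema = 0.0                  # EMA of estimated relay volume
        self.height = 0

    def on_block(self, claims: float) -> float:
        r = claims / self.p               # estimated relays this block (Eq. 2)
        self.r_ema = self.alpha * r + (1 - self.alpha) * self.r_ema
        self.height += 1
        if self.height % self.update_every == 0:
            # Cap p at 1: below-target volume disables relay mining.
            self.p = min(1.0, self.target / self.r_ema)
        return self.p
```

Feeding a constant true volume of 100,000 relays per block (so expected claims are 100,000 · p) drives p toward T/R = 0.1, the steady state at which the target of 10^4 claims per block is met.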

6.6 dApp Work Estimation

A key component of the protocol is the correct estimation of the work performed by a dApp. This estimation is crucial for accurately processing the service payments from the dApp, which involves token burning, and for rewarding the Servicers, which involves token minting (for simplicity, we assume a one-to-one mapping from an on-chain Application actor to a single dApp making RPC requests, though in theory, multiple dApps could leverage the key of a single Application).

As explained in Section 6.5, the difficulty modulation relies on estimating the total relays per block on a given chain, where each chain can have multiple dApps. Given that the total number of relays R is distributed among A staked dApps, it is important to consider how edge cases could influence the bias and variability of the dApp relay volume estimation.

An experiment was conducted to examine this. Using a target of $T = 10^4$ claims per block, the variability and bias of the estimated number of relays $R_{dApp}$ were calculated. The difficulty d (inverse of the hash collision probability p) ranged from 1.25 relays per claim to 1000 relays per claim, and the dApp participation v ranged from 0.1% to 10% of total blockchain traffic. For each test point, a total of $I = 10^5$ draws from the resulting variable $x \sim B(R, v, p_{b})$ were sampled. The bias and variability of the estimated dApp traffic were then calculated as follows:

R_{dApp} = \sum_{i=1}^{I} \frac{x_{i}}{I},   (3)

then:

\mbox{Bias} = \frac{R_{dApp} - (v\,R)}{(v\,R)} \cdot 100,   (4)

and

\mbox{Variability} = 2\,\sqrt{\frac{\sum_{i=1}^{I}(x_{i} - R_{dApp})^{2}}{I}} \cdot 100.   (5)

The resulting bias and variability are illustrated in heat maps in Fig. 5a and Fig. 5b, respectively. We observe that the bias is close to zero across most of the sample space. Only when $R_{dApp}$ is extremely low does the bias become noticeable, but even then it remains above -5%. Conversely, variability is higher, particularly when the dApp participation drops below 1%, under which conditions it grows rapidly.
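The experiment can be reproduced in miniature with a Monte Carlo sketch. The parameter values below are far smaller than the paper's (e.g., far fewer than I = 10^5 draws) to keep the example fast, and normalizing the spread in Eq. (5) by the true volume v·R is an interpretive assumption:

```python
import random

def dapp_bias_variability(R=100_000, v=0.01, p=0.1, draws=1000, seed=42):
    """Monte Carlo sketch of the dApp work estimation experiment.

    A dApp owning a fraction v of R total relays produces claims
    x ~ Binomial(v*R, p); its estimated relay volume is x / p.
    Returns (bias %, variability %) relative to the true volume v*R.
    """
    rng = random.Random(seed)
    n = int(v * R)                                        # true dApp relays
    estimates = []
    for _ in range(draws):
        claims = sum(rng.random() < p for _ in range(n))  # binomial draw
        estimates.append(claims / p)                      # per-dApp Eq. (2)
    mean = sum(estimates) / draws
    bias_pct = (mean - n) / n * 100                       # cf. Eq. (4)
    spread = (sum((e - mean) ** 2 for e in estimates) / draws) ** 0.5
    variability_pct = 2 * spread / n * 100                # cf. Eq. (5), normalized
    return bias_pct, variability_pct
```

At these illustrative settings the bias stays near zero while the two-sigma variability sits near 20%, qualitatively matching the heat maps: estimates are unbiased but noisy for low-participation dApps.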

Refer to caption
(a)
Refer to caption
(b)
Figure 5: Bias (left) and variability (right) for each sample point in the experiment. The X and Y axes are on a logarithmic scale.

7 Experimentation on Pocket Network V0

The proposed scheme is expected to work under almost any relay volume and variation. Such relay volume variations can be readily obtained from Pocket Network v0, as all relay data is tracked on its blockchain. Using this data, it is possible to gain insight into how Relay Mining behaves on real traffic and to characterize the expected errors and bias of implementing it.

The analysis includes the bias, variation, and transitory effects in the estimation of blockchain traffic for four different served blockchains with varying volume sizes and transitions. The parameters used were the same as presented in Algorithm 2. The main findings are summarized in Table 3; the complete analysis is shown in Appendix II. It can be seen that the transient effects on the target number of claims are significant but short-lived; however, the estimation of the total relay volume (responsible for controlling the difficulty) does not spike, ensuring effective control by the proposed strategy.

Table 3: Pocket Network V0 experiment results. Response of the proposed method to different observed network changes.
Event                Duration   Min/Max        Min/Max Target    Accumulated Target        Min/Max Volume
                     [blocks]   Claims         Error [%]         Error by Block [claims]   Estimation Error [%]
Steady High Volume   3000       6789 / 20710   -32.11 / 107.10     7                       -3.23 / 4.17
Soft surge           175        7804 / 14496   -21.96 / 44.96      620                     -2.91 / 3.04
Step drop            44         1305 / 88963   -86.95 / 789.63     170                     -1.92 / 3.45
Step surge           24         499 / 13897    -95.01 / 38.97      -60                     -5.21 / 2.59

8 Future Work & Implementation

Relay Mining sets the foundation for future advancements in the realm of decentralized rate limiting. Two potential avenues for further exploration include:

Compute Units: By assigning different weights to the leaves in the Merkle Sum Tree, rate limiting can be customized on a per-request basis, enabling more granular control over computational resources.

Websockets: Adapt Relay Mining to accommodate long-lived connections, such as websockets, by applying rate limiting to data transferred over the network at periodic snapshots rather than individual requests. This would enhance the versatility of the rate limiting approach to cater to a wider range of application requirements.

9 Discussion & Conclusions

This paper introduces Relay Mining, a novel algorithm designed to scale efficiently to handle tens of billions of individual RPC requests daily. These requests are proven on a decentralized ledger via a commit-and-reveal scheme. The solution can account for the aggregate volume across all current Web3 Node RPC providers and comes equipped with an inbuilt difficulty modulation to dynamically adapt to the growing traffic in the Web3 industry.

Relay Mining builds upon the insights, performance, and data from the Pocket Network, a decentralized RPC network that has been operating a live mainnet for nearly two years at the time of writing. The network adeptly manages approximately 1.5 billion requests daily from over a dozen independent service providers servicing more than 50 blockchains. For secure and efficient insertions, and to meet random membership proof requirements, a Sparse Merkle Tree (SMT) with probabilistic leaf insertions was adopted.

Relay Mining’s design and mechanism incorporate principles from Scalable Real-Time Messaging (SRTM) [21], aiming to alleviate Distributed Rate Limiting (DRL) penalties through concentration, max-min, and correlation-awareness between Applications and Servicers. To our knowledge, this is the first DRL mechanism that leverages crypto-economic incentives to enable multi-tenant rate limiting capable of accounting for Byzantine actors.

This work demonstrates that the algorithm performs best under high relay volumes, a typical scenario in decentralized RPC networks. Although a slight bias exists for applications with exceptionally low comparative volume (less than 0.5% of total blockchain traffic), this bias is minimal (under 5% in magnitude). Furthermore, the estimation variance in these circumstances does not present a significant issue for the protocol itself, as it self-regulates with other low-traffic applications. Should the issue of extremely low-traffic applications become a concern, specific implementation mechanisms can be introduced to address it.

Finally, it is important to note that several relevant concepts in decentralized RPC networks—such as response integrity, Quality of Service enforcement, and extensions to general-purpose compute units—though outside the scope of this paper, are complementary to the problems that Relay Mining addresses.

References

  • [1] Inc Alchemy Insights. Alchemy docs - compute units. https://docs.alchemy.com/reference/compute-units, 2022.
  • [2] Inc Alchemy Insights. Pros and cons of running your own node. https://www.alchemy.com/overviews/running-your-own-node, 2023.
  • [3] Mihir Bellare and Phillip Rogaway. Random oracles are practical: A paradigm for designing efficient protocols. In Proceedings of the 1st ACM Conference on Computer and Communications Security, CCS ’93, page 62–73, New York, NY, USA, 1993. Association for Computing Machinery.
  • [4] Saul Blumenthal and Ram C Dahiya. Estimating the binomial parameter n. Journal of the American Statistical Association, 76(376):903–909, 1981.
  • [5] Ethan Buchman, Jae Kwon, and Zarko Milosevic. The latest gossip on bft consensus, 2019.
  • [6] Yair Cleper and Gil Binder. Lava: A decentralized rpc network for blockchains, 2022.
  • [7] Fiag DAO. Rpc meter. https://beta.rpcmeter.io/ethereum/miami, 2023.
  • [8] Dfinity. L1 comparison - internet computer wiki. https://wiki.internetcomputer.org/wiki/L1_comparison, 2022.
  • [9] ethereum.org. go-ethereum - hardware requirements. https://geth.ethereum.org/docs/getting-started/hardware-requirements, 2023.
  • [10] Fabio Wehb Ferrari. Ankr becomes one of the first rpc providers to the aptos blockchain, 2022.
  • [11] Interchain Foundation. What is tendermint? https://docs.tendermint.com/v0.34/introduction/what-is-tendermint.html, 2021.
  • [12] Interchain Foundation. Iavl+ tree, 2022.
  • [13] Zhenhuan Gao, Yuxuan Hu, and Qinfan Wu. Jellyfish merkle tree, 2021.
  • [14] Harry Halpin. Deconstructing the decentralization trilemma. In Proceedings of the 17th International Joint Conference on e-Business and Telecommunications. SCITEPRESS - Science and Technology Publications, 2020.
  • [15] Lioba Heimbach, Lucianna Kiffer, Christof Ferreira Torres, and Roger Wattenhofer. Ethereum’s proposer-builder separation: Promises and realities, 2023.
  • [16] SalesForce Inc. What is saas - software as a service. https://www.salesforce.com/in/saas/, 2023.
  • [17] Inc. Infura. The path to scalability, 2021.
  • [18] Kshitij Kulkarni, Theo Diamandis, and Tarun Chitra. Towards a theory of maximal extractable value i: Constant function market makers, 2023.
  • [19] laanwj, fanquake, Pieter Wuille, Hennadii Stepanov, Gavin Andresen, Andrew Chow, John Newbery, Jonas Schnelli, Jon Atack, Cory Fields, Carl Dong, Ryan Ofsky, Matt Corallo, Luke Dashjr, Sebastian Falbesoner, João Barbosa, Suhas Daftuar, Samuel Dobson, Non-Github User Bitcoin Commits, Gregory Maxwell, Gloria Zhao, Vasil Dimov, Anthony Towns, Alex Morcos, Sjors Provoost, James O’Beirne, kallewoof, Gregory Sanders, and Amiti Uttarwar. bitcoin/bitcoin. GitHub, May 2023.
  • [20] Aptos Labs. Aptos explorer. https://explorer.aptoslabs.com/?network=mainnet.
  • [21] Chong Li, Jiangnan Liu, Chenyang Lu, Roch Guerin, and Christopher D. Gill. Impact of distributed rate limiting on load distribution in a latency-sensitive messaging service, 2021.
  • [22] Moxie Marlinspike. My first impressions of web3. 2022.
  • [23] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. Decentralized business review, page 21260, 2008.
  • [24] Jason Ngan. 4 rate limit algorithms every developer should know, 2022.
  • [25] Barath Raghavan, Kashi Vishwanath, Sriram Ramabhadran, Kenneth Yocum, and Alex C. Snoeren. Cloud control with distributed rate limiting, 2007.
  • [26] Celestia Development Team. Sparse merkle trees. https://github.com/celestiaorg/smt, 2021.
  • [27] Plasma Core Team. Merkle sum tree block structure, 2019.
  • [28] Alin Tomescu. Why you should probably never sort your merkle tree’s leaves. GitHub repository, 2023.

Appendix I

9.1 Pocket Network Stats

Since its genesis on October 22, 2021, up to the current date (May 10, 2023), Pocket Network has processed over 465 billion successful relays. During this period, the number of served relays per day increased from approximately 1.5 million to 1.5 billion, with a peak of 2 billion on May 8, 2023. The total relay traffic spans an increasing number of blockchains, currently 50, distributed over 14 gateways and served by more than 12 major (and many minor) infrastructure providers across all continents. The performance on the dominant blockchains from May 3, 2023, to May 10, 2023, is presented in Table 4.

Table 4: Main stats of the largest blockchains in the Pocket Network, from May 3rd, 2023 to May 10th, 2023. The data is averaged across all gateways and distributed globally.

Blockchain       | Share of Total Traffic [%] | Volume [relays] | Avg. TPS | Peak TPS | Avg. Latency [ms] | Success Rate [%]
Ethereum         | 28.1                       | 365×10^6        | 8300     | 16500    | 138               | 99.7039
Polygon Mainnet  | 25.3                       | 326×10^6        | 3200     | 4700     | 130               | 99.7200
DFKchain Subnet  | 12.5                       | 164×10^6        | 1800     | 2400     | 121               | 99.4618
Gnosis - xDai    | 10.2                       | 132×10^6        | 1500     | 2200     | 152               | 99.8337

The network is composed of 20,000 nodes; however, this number does not correlate directly with the number of deployed blockchain nodes. The current version of the protocol cannot verifiably prove the real number of blockchain nodes, since Servicers are permissionless, but a rough estimate can be obtained from the nodes’ service domains. The largest Servicer group has approximately 5,000 nodes (about 25% of the network), followed by two others with around 2,500 nodes each (about 12.5% of the network each); beyond these, the number of nodes per domain tapers off. Around 90% of the network is held by 14 independent domains. Assuming no blockchain node is shared among node runners (a fair assumption given the competition between node runners) and that each node runner operates at least two nodes per staked blockchain (accounting for redundancy or geographical distribution of blockchain nodes), the Pocket Network has more than 30 independent blockchain servers on each of the main served blockchains. This is only an estimate; however, the number is significant in terms of decentralization and reliability.

9.2 Network Actors

Fig. 6 illustrates the interaction and responsibilities between the various protocol actors in Pocket Network. It is important to note that in the current, live iteration of the protocol, the Portals and the Watchers are off-chain and consolidated as a single entity. The roles of all the actors are outlined below. In the context of Relay Mining, the key item to note is that Applications pay for access to RPC services, and Servicers earn in exchange for handling those requests.

  • Validator - Validates BFT state transitions of the distributed ledger

  • Servicer - Provides access to an RPC service and earns rewards in native cryptocurrency

  • Watcher - Makes periodic Quality of Service (QoS) checks of Servicers

  • Application - Pays for access to RPC services in native cryptocurrency

  • Portal - Optionally proxies requests on behalf of the Application and provides secondary off-chain services

Refer to caption
Figure 6: Pocket Network on-chain protocol actors in the next version of the protocol.

Appendix II

The Relay Mining framework was simulated over real-world data obtained from the Pocket Network v0 blockchain. The simulation's objective is to estimate the deviations likely to be observed under different network behaviours. To this end, we selected four blockchains with different behaviours over a period of 12898 blocks (∼134 days): Ethereum, Harmony Shard 0, Polygon Mainnet, and Polygon Archival. The evolution of the traffic for each of them can be seen in Fig. 7.

Refer to caption
Figure 7: Evolution of the number of relays on the four selected networks, over a period of ∼134 days.

These cases were analyzed using the following configuration:

  • Target claims per block: T = 10^4

  • Exponential Moving Average decay: α = 0.1

  • Number of blocks between updates: U = 4

The process replicated Algorithm 2 to update the difficulty and computed the errors in the number of claimed relays C versus the selected target T, as well as the error of the estimated number of relays R relative to the real number of relays R*. The distributions of these errors are shown in Fig. 8 and Fig. 9, respectively.
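The configuration above can be exercised with a deterministic sketch of the difficulty controller: an EMA of observed volume sets the per-relay claim probability p ≈ T/EMA, refreshed every U blocks, and both error metrics fall out directly. The update rule below is a simplified stand-in for Algorithm 2, not its verbatim form, and claims are modeled by their expectation rather than a random draw:

```python
T = 10_000   # target claims per block
ALPHA = 0.1  # EMA decay
U = 4        # blocks between difficulty updates

def simulate(relays_per_block):
    """Deterministic sketch of the difficulty feedback loop."""
    ema = relays_per_block[0]
    p = min(1.0, T / ema)                    # per-relay claim probability
    claim_err, est_err = [], []
    for i, r in enumerate(relays_per_block):
        c = round(r * p)                     # expected claims this block
        est = c / p                          # estimated relay volume R
        claim_err.append(100 * (c - T) / T)        # deviation from target T
        est_err.append(100 * (est - r) / r)        # deviation from real volume R*
        ema = ALPHA * r + (1 - ALPHA) * ema        # update volume EMA
        if (i + 1) % U == 0:                       # refresh difficulty every U blocks
            p = min(1.0, T / ema)
    return claim_err, est_err
```

Under steady traffic both errors stay near zero; a step change in traffic produces a transient claim deviation until the EMA catches up, mirroring the four cases analyzed in this appendix.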

Refer to caption
Figure 8: Distribution of the percentage deviation of the observed claims from the target number of claims (100 (C - T) / T) on each of the observed blockchains, from block 81273 to block 94171. The Polygon Archival mean is lower than zero because the number of relays R* is below the target claims T for a large part of the sample.
Refer to caption
Figure 9: Distribution of the percentage deviation of the estimated relays from the real number of relays (100 (R - R*) / R*) on each of the observed blockchains, from block 81273 to block 94171.

It is also possible to analyze the transient effects of particular traffic behaviours. In the sampled blockchain data, four different cases can be observed:

9.3 Steady High Volume - Polygon Mainnet

This is the expected normal behaviour, where the number of relays per block stays almost constant or varies with low frequency. A blockchain exhibiting this behaviour over the observed period of 3000 blocks, from height 91000 to 94000 (2023-04-08 to 2023-05-07), was the Polygon Mainnet. In Fig. 10 a close-up of the process can be observed.

In this period the average target deviation was 0.7%, with a peak target deviation of |107%|. The accumulated error resulted in a total of 20639 extra claims over 3000 blocks. The average volume estimation error was >0.00%, with a peak of |4.17|%.

Refer to caption
Figure 10: A sample of a steady high volume blockchain (Polygon Mainnet), from block 91000 to block 94000. From top to bottom: the evolution of the real number of relays, the percentage deviation from the target number of claims, and the total relay estimation error. The horizontal dashed lines represent the zero error line (black) and the expected variation limits (grey). The vertical dashed lines delimit the observed period.

9.4 Soft Relay Surge - Ethereum

To illustrate this case we use the Ethereum data over blocks 93775 to 93950 (2023-05-05 to 2023-05-07). During this period, the volume of the blockchain's traffic processed in the Pocket Network rose from 2.9×10^6 to 1.1×10^7. The increase was fast (compared to the normal, non-step increases observed in the network) and steady. In Fig. 11 a close-up of the process can be observed.

In this period the average target deviation was 6.2%, with a peak target deviation of |44.6%|. The accumulated error resulted in a total of 108580 extra claims over 175 blocks. The average volume estimation error was -0.02%, with a peak of |3.04|%.

Refer to caption
Figure 11: A sample of a soft relay volume surge blockchain (Ethereum), from block 93775 to block 93950. From top to bottom: the evolution of the real number of relays, the percentage deviation from the target number of claims, and the total relay estimation error. The horizontal dashed lines represent the zero error line (black) and the expected variation limits (grey). The vertical dashed lines delimit the observed period.

9.5 Step Relay Drop - Harmony Shard 0

A step drop in relays occurs when a major source of relays stops requesting data. This effect was observed at height 90772 (2023-04-05) for the Harmony Shard 0 blockchain, where the traffic dropped from 273×10^3 to 21×10^3 relays per block in less than four blocks. In Fig. 12 a close-up of the process can be observed.

In this period the average target deviation was 60.2%, with a peak target deviation of |95%|. The accumulated error resulted in a total of 264717 fewer claims over 44 blocks. The average volume estimation error was -0.21%, with a peak of |5.21|%.

Refer to caption
Figure 12: A sample of a step relay volume drop blockchain (Harmony Shard 0), from block 90772 to block 90816. From top to bottom: the evolution of the real number of relays, the percentage deviation from the target number of claims, and the total relay estimation error. The horizontal dashed lines represent the zero error line (black) and the expected variation limits (grey). The vertical dashed lines delimit the observed period.

9.6 Step Relay Surge - Polygon Archival

This case is typical when a new chain is introduced to the ecosystem; the evolution of the number of relays resembles a step function. This behaviour was observed recently, at height 91532 (2023-04-13), for the Polygon Archival blockchain, which moved from 1120 to 276×10^3 relays per block. In Fig. 13 a close-up of the process can be observed.

In this period the average target deviation was 170.32%, with a peak target deviation of |789%|. The accumulated error resulted in a total of 408767 extra claims over 24 blocks. The average volume estimation error was 0.07%, with a peak of |3.45|%.

Refer to caption
Figure 13: A sample of a step relay volume surge blockchain (Polygon Archival), from block 91532 to block 91556. From top to bottom: the evolution of the real number of relays, the percentage deviation from the target number of claims, and the total relay estimation error. The horizontal dashed lines represent the zero error line (black) and the expected variation limits (grey). The vertical dashed lines delimit the observed period.