Towards Effective Detection of Ponzi schemes on Ethereum with Contract Runtime Behavior Graph

Ruichao Liang (Wuhan University, Wuhan, China), Jing Chen (Wuhan University, Wuhan, China), Cong Wu (Nanyang Technological University, Singapore), Kun He (Wuhan University, Wuhan, China), Yueming Wu (Nanyang Technological University, Singapore), Weisong Sun (Nanyang Technological University, Singapore), Ruiying Du (Wuhan University, Wuhan, China), Qingchuan Zhao (City University of Hong Kong, China), and Yang Liu (Nanyang Technological University, Singapore)

(2024)

Abstract.

Ponzi schemes, a form of scam, have been discovered in Ethereum smart contracts in recent years, causing massive financial losses. Existing detection methods primarily focus on rule-based approaches and machine learning techniques that utilize static information as features. However, these methods have significant limitations. Rule-based approaches rely on pre-defined rules with limited capabilities and domain knowledge dependency. Using static information like opcodes for machine learning fails to effectively characterize Ponzi contracts, resulting in poor reliability and interpretability. Our research shows no significant difference between Ponzi and non-Ponzi contracts at the opcode level. Moreover, relying on static information like transactions for machine learning requires a certain number of transactions to achieve detection, which limits the scalability of detection and hinders the identification of 0-day Ponzi schemes.

In this paper, we propose PonziGuard, an efficient Ponzi scheme detection approach based on contract runtime behavior. Inspired by the observation that a contract’s runtime behavior is more effective in distinguishing Ponzi contracts from innocent contracts, PonziGuard establishes a comprehensive graph representation called contract runtime behavior graph (CRBG) to accurately depict the behavior of Ponzi contracts. Furthermore, it formulates the detection process as a graph classification task on CRBG, enhancing its overall effectiveness. The experimental results show that PonziGuard surpasses the current state-of-the-art approaches on the ground-truth dataset, achieving a precision of 96.9%, recall of 98.2%, and F1-score of 97.5%. It also exhibits the highest level of interpretability among the current tools. We applied PonziGuard to Ethereum Mainnet and demonstrated its effectiveness in real-world scenarios. Using PonziGuard, we identified 805 Ponzi contracts on Ethereum Mainnet, which have resulted in an estimated economic loss of 281,700 Ether or approximately $500 million USD. We also found 0-day Ponzi schemes among the 10,000 most recently deployed smart contracts.

Keywords: Smart Contract, Ponzi Scheme, Flow Analysis, Graph Neural Networks

CCS Concepts: Security and privacy → Software security engineering; Security and privacy → Malware and its mitigation

1. Introduction

With the popularity of Ethereum and the anonymity it provides, various scams have been implemented through smart contracts (Szabo, 1994). Ponzi schemes are among the most typical scams found in Ethereum smart contracts (Bartoletti et al., 2020; Chainalysis, 2022). Such contracts, known as Ponzi contracts, disguise themselves as investment programs and lure users with the promise of high profits, yet users profit only if subsequent users invest in the scheme. Ponzi schemes have been among the biggest consumers of gas on Ethereum, worsening congestion and driving up transaction fees (Rustgi, 2020).

Several approaches have been proposed (Bartoletti et al., 2020; Chen et al., 2018; Jung et al., 2019; Fan et al., 2020a, b, 2021; Lou et al., 2020; Sun et al., 2020; Chen et al., 2021; Yu et al., 2021) to detect Ponzi contracts on Ethereum. Rule-based approaches (Bartoletti et al., 2020; Chen et al., 2021; Sun et al., 2020) require domain knowledge of Ponzi schemes and, being derived from known Ponzi contracts, can hardly cover all possible scenarios, which limits their ability to detect Ponzi contracts that fall outside the scope of the rules. Other detection approaches feed static information such as opcode frequency and transactions to machine learning models to improve detection capabilities (Chen et al., 2018; Yu et al., 2021; Jung et al., 2019; Lou et al., 2020; Fan et al., 2020a, b, 2021; Lu et al., 2024; Zheng et al., 2023). However, this static information correlates weakly with Ponzi schemes themselves, so these approaches fail to characterize Ponzi contracts effectively, resulting in poor reliability and interpretability. For instance, Figure 1 shows the frequency distributions of the most frequently used operations in several Ponzi and non-Ponzi contracts. These operations are predominantly stack operations and do not capture the characteristics of Ponzi contracts. The Kullback-Leibler divergence (KL divergence) calculated from Figure 1 measures the difference between two frequency distributions. It shows that the opcode frequency distributions of Ponzi and non-Ponzi contracts differ only slightly, and that different Ponzi contracts are no more similar to one another than they are to non-Ponzi contracts. Moreover, approaches that rely on Ethereum transactions cannot detect 0-day Ponzi contracts, i.e., contracts that have no real transactions yet.
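To make the KL divergence comparison concrete, the following minimal sketch computes the divergence between two opcode frequency distributions. The opcode counts are hypothetical placeholders rather than measurements from our dataset, and the use of the natural logarithm is an assumption of this sketch.

```python
import math

def normalize(counts):
    """Turn raw opcode counts into a frequency distribution."""
    total = sum(counts.values())
    return {op: c / total for op, c in counts.items()}

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions keyed by opcode."""
    result = 0.0
    for op in set(p) | set(q):
        pk = p.get(op, 0.0)
        qk = q.get(op, 0.0)
        if pk > 0.0:
            # eps guards against division by zero when q lacks an opcode
            result += pk * math.log(pk / max(qk, eps))
    return result

# Hypothetical counts of frequent stack-related opcodes (illustrative only)
ponzi = normalize({"PUSH1": 420, "DUP1": 180, "SWAP1": 150, "JUMPDEST": 130, "POP": 120})
benign = normalize({"PUSH1": 400, "DUP1": 200, "SWAP1": 140, "JUMPDEST": 140, "POP": 120})

d = kl_divergence(ponzi, benign)
print(f"KL(ponzi || benign) = {d:.4f}")
```

A value close to zero means the two distributions are nearly indistinguishable, which is the situation Figure 1 reports for opcode frequencies.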

To address this gap, we delved deeper into the behaviors of Ponzi contracts at runtime and found that the contract runtime information provides more valuable insights into the unique characteristics of Ponzi contracts. We will discuss this insight in more detail in Section 2. Motivated by this observation, we propose a comprehensive graph representation called contract runtime behavior graph (CRBG) to characterize the runtime behaviors of Ponzi contracts.

In this paper, we propose PonziGuard, an effective Ponzi scheme detection approach based on CRBG. Specifically, we perform static analysis on the smart contract and leverage the acquired insights to generate transaction sequences that mimic typical investment behavior in a Ponzi scheme. We invoke the smart contract with these transaction sequences and conduct dynamic taint analysis during the contract’s execution to gather runtime information. Then, we construct CRBG based on the contract runtime information and employ Graph Neural Networks (GNNs) for CRBG analysis. We formulate the detection of Ponzi contracts as a graph classification task. We have experimentally validated the effectiveness of CRBG and conducted comparative experiments on a ground-truth dataset to evaluate the performance of PonziGuard. We further applied PonziGuard to Ethereum Mainnet to evaluate the effectiveness of our approach in real-world scenarios. The dataset and experimental results are publicly available online at https://github.com/PonziDetection/PonziGuard. In summary, this paper makes the following contributions.

• We propose PonziGuard, an efficient approach for detecting Ponzi schemes on Ethereum. It requires neither domain knowledge nor on-chain transactions, and it can identify 0-day Ponzi schemes before any economic losses occur.

• We introduce CRBG, a comprehensive graph representation for effectively characterizing the behaviors of Ponzi contracts. We model the detection of Ponzi contracts as a graph classification task and show that CRBG is effective in distinguishing Ponzi contracts from innocent contracts.

• We propose a strategy for generating contract invocation sequences based on function properties and data dependencies, enabling us to mimic typical investment behavior in a Ponzi scheme. We also design a dynamic taint engine to collect contract runtime behavior, which is essential for constructing CRBG.

• Experimental results show that PonziGuard outperforms the current state-of-the-art approaches on the ground-truth dataset and is also effective in real-world scenarios. Using PonziGuard, we found 805 Ponzi contracts in 14,000,000 Ethereum Mainnet blocks, which have resulted in an estimated economic loss of 281,700 Ether or approximately $500 million USD. We also found 0-day Ponzi schemes among the 10,000 most recently deployed smart contracts.

Figure 1. Opcode Frequency Distributions. The KL divergence between Ponzi and non-Ponzi contracts ranges from 0.011 to 0.018, while the KL divergence between different Ponzi contracts ranges from 0.012 to 0.016.

This paper is an extended version of our previous work (Liang et al., 2024) published in IEEE/ACM International Conference on Software Engineering (ICSE) 2024. We significantly enhanced the previous conference version in the following aspects: i) we added a static code analysis framework prior to generating transactions for collection of information such as data dependencies and function properties, to guide the generation of transaction sequences (§3.2); ii) we updated our transaction sequence generation algorithm, using information from static analysis and some heuristics to generate function invoke chains that mimic investment behavior of Ponzi schemes (§3.3); iii) we performed additional processing on CRBG including graph pruning and subgraph joining to achieve improved experimental results (§3.5); iv) we conducted the comparison experiment using a larger and more recent dataset, and added two SOTA works for comparison (§5.2); v) we conducted an interpretability experiment to visually explain how our CRBG works in the detection of Ponzi schemes (§5.3); vi) we conducted an ablation experiment to illustrate the impact of the graph pruning and joining processing that we integrated into CRBG (§5.4.1); vii) we explained the reasons behind the short lifetime of the 805 Ponzi contracts we identified, including how they were terminated (§5.5); viii) we conducted an experiment to detect newly deployed 0-day Ponzi schemes on Ethereum, showcasing our tool’s ability to detect 0-day Ponzi schemes (§5.5.2).

2. Background And Insight

In this section, we introduce some necessary backgrounds and discuss our insight into utilizing the contract runtime behavior graph (CRBG) to detect Ponzi schemes.

2.1. Ethereum Smart Contracts

Ethereum smart contracts are programs running on top of Ethereum. They can be written in several programming languages, including Solidity, Vyper, and Serpent. To deploy smart contracts on the blockchain, they need to be compiled into bytecode and then submitted to the blockchain with transactions. Once deployed on-chain, the contracts become immutable, and their logic is executed only in response to message calls from transactions. When invoked by a transaction, a contract is executed in the Ethereum Virtual Machine (EVM), a stack-based architecture (Wood, 2022). There are three areas to store data in EVM:

• Stack: The stack is an object for basic stack operations in EVM. Data is pushed or popped from the top of the stack through instructions.

• Memory: The memory is a simple word-addressed byte array. It is used for temporary data storage, transfer of arguments and return values, and code copying (Wood, 2022). The data in the memory comes from the stack or the external environments.

• Storage: Unlike the memory and stack that are volatile, the storage is non-volatile and maintained as part of the smart contract state. Variables in the storage region are called state variables, and they are persistent variables stored in the form of key-value pairs. Transactions can update the state variables of smart contracts by invoking the execution of contracts in EVM.

2.2. Ponzi Schemes

A Ponzi scheme is an investment fraud that involves the payment of purported returns to existing investors from funds contributed by new investors (SEC, 2019). It is a classic fraud that originated at least 150 years ago and now appears on blockchains (Bartoletti et al., 2020). Leveraging smart contracts, Ponzi schemes become more threatening and stealthy than ever and have grabbed a huge amount of profits on the blockchain (Chainalysis, 2022).

 1  function enter(){
 2    if(msg.value < 1/100 ether){
 3      msg.sender.send(msg.value);
 4      return;}
 5    uint amount = msg.value;
 6    uint idx = persons.length;
 7    persons.length += 1;
 8    persons[idx].etherAddress = msg.sender;
 9    persons[idx].amount = amount;}
10
11  function pay(){
12    while(this.balance > persons[payoutIdx].amount / 100 * 500){
13      uint transactionAmount = persons[payoutIdx].amount / 100 * 500;
14      persons[payoutIdx].etherAddress.send(transactionAmount);
15      payoutIdx += 1;}}

Listing 1: Motivating Example

2.2.1. Code Example

Listing 1 shows a code snippet of a typical Ponzi contract. The snippet comprises two functions, namely enter() and pay(), where enter() is responsible for receiving Ether from investors and pay() handles the redistribution of Ether. This contract promises investors very high return rates (Line 13) in exchange for their initial investment. The promised returns are paid out of new investments to attract additional investors until the scammers close up their scam and abscond with the illicit profits. Without legitimate earnings, a Ponzi scheme needs a steady stream of new investors to keep it running, otherwise, it will inevitably collapse and let the vast majority of participants bear the loss (Artzrouni, 2009).
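The dependence on new investors can be illustrated with a small simulation. The following is a minimal Python sketch that mirrors the logic of enter() and pay() in Listing 1; the class name, the investor labels, the bounds check on the payout index, and the scenario of ten investors contributing 1 Ether each are assumptions made for illustration.

```python
from dataclasses import dataclass, field

ETHER = 10**18  # amounts are tracked in wei, as integers

@dataclass
class PonziModel:
    """Toy model of Listing 1 (hypothetical helper, not part of PonziGuard)."""
    balance: int = 0
    payout_idx: int = 0
    persons: list = field(default_factory=list)  # (address, amount) in join order
    paid: dict = field(default_factory=dict)     # address -> total reward received

    def enter(self, sender, value):
        if value < ETHER // 100:
            return  # too small: the contract sends the value back
        self.balance += value
        self.persons.append((sender, value))

    def pay(self):
        # The bounds check is added for this sketch; Listing 1 has none.
        while self.payout_idx < len(self.persons):
            addr, amount = self.persons[self.payout_idx]
            owed = amount // 100 * 500  # promised reward, as on Line 13
            if self.balance <= owed:
                break
            self.balance -= owed
            self.paid[addr] = self.paid.get(addr, 0) + owed
            self.payout_idx += 1

model = PonziModel()
for i in range(10):
    model.enter(f"inv{i}", 1 * ETHER)
    model.pay()
print(len(model.paid), "of", len(model.persons), "investors paid")
```

In this scenario only the first investor is rewarded after ten investments, and the remaining nine can be paid only if many more investors join.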

2.2.2. Criteria

Based on some previous studies (Bartoletti et al., 2020; SEC, 2019; Chen et al., 2021) and our analysis of known Ponzi contracts, we have developed explicit criteria for objectively identifying Ponzi contracts in our study. Our proposed criteria include:

• A Ponzi contract must incorporate at least two explicit behavioral logics: investment and reward. This criterion excludes contracts that receive cryptocurrency but provide users with assets through external markets such as real-world trades or auctions that utilize cryptocurrency for payments.

• The assets of a Ponzi contract must come from a multitude of investors rather than a specific source. This means that Ponzi contracts have no sources of income other than attracting investments. This criterion excludes contracts specifically designed to fulfill certain functions, such as enterprises distributing incentives to employees.

• In a Ponzi contract, all the investors are promised rewards that are typically expected to exceed their initial investment, although the implementation of these rewards is contingent upon attracting further investments. In other words, as long as there are constant new investments, everyone can theoretically reap the rewards. This criterion excludes the contracts that are likely to be mistaken for Ponzi contracts, such as gambling and puzzle contracts. In such contracts, not all users are promised rewards as they would be in a Ponzi contract. (There are always losers in gambling or puzzle games.)

2.3. Contract Behaviors and Our Insight

Through our proposed criteria, we can observe that the most crucial distinction between Ponzi contracts and benign contracts lies in their behavioral characteristics, such as the investment and reward logics and the flow of Ether, rather than specific transaction or instruction-level statistics. Therefore, in this section, we explore the contract runtime behaviors, trying to find an effective representation of these behaviors.

We invoke the smart contracts, gather their runtime information, and construct graphs based on this information, as depicted in Figure 2. It is important to note that the graphs in Figure 2 have been intentionally simplified to highlight the core logic of the contracts for the sake of clarity. The left graph in Figure 2(a) depicts the investment behavior of the enter() function within the Ponzi contract shown in Listing 1. This contract first utilizes the CALLVALUE and LT operations to compare the Ether amount provided by investors (corresponding to Line 2 in Listing 1), and then utilizes SSTORE to store the investment amount and the address of the investors (Line 8 and Line 9). As it only relies on the comparison of the investment amount as the condition for receiving Ether, the source of Ether for the contract is not restricted to a specific address but encompasses all investors, which aligns with our second criterion for Ponzi contracts. The right graph in Figure 2(a) depicts the reward behavior of the pay() function within the Ponzi contract shown in Listing 1. It first uses SLOAD to load the investment amount of the investor and calculates the promised reward (corresponding to Line 12 in Listing 1). If the contract balance is deemed sufficient to cover the reward, as determined through the comparison using BALANCE and GT, it proceeds to load the investor’s address and completes the transfer (Line 14). Since this reward process iterates in a loop where the only condition for transferring Ether to investors is a sufficient contract balance, it can be inferred that every investor can potentially receive a reward as long as there is a continuous influx of investors, which aligns with our third criterion. The behaviors presented by these two graphs of Figure 2(a) also satisfy our first criterion for Ponzi contracts. For comparison, consider Figure 2(b), which represents the bet and pay behaviors of a gambling contract (address 0x4f9048d95616dbf7acc16fc4179f5ac6ee37bce6). In this case, the contract only rewards the gambler whose pre-selected number precisely matches the current timestamp. While this gambling contract fulfills the first and second criteria, it falls short of meeting our third criterion for Ponzi contracts.

In conclusion, these graphs depict the behaviors of the Ponzi contract as reflected in its source code and fulfill the criteria we have proposed, distinguishing it from benign contracts. This demonstrates that the graphs we constructed have the capability to effectively reveal the distinctive behavioral traits of Ponzi contracts. We refer to these graphs as contract runtime behavior graphs (CRBG). The illustrated graphs in Figure 2 serve as a preliminary illustration for clarity, while a more comprehensive description of CRBG can be found in Section 3.5.

3. PonziGuard

We first give an overview of PonziGuard. Then, we describe each step in detail.

3.1. Overview

As shown in Figure 3, we conduct code review and static analysis on the smart contract to extract function properties and identify data dependencies. Based on this gathered information, we conduct function filtering and selection to generate function invocation chains that mimic the investment behavior seen in a Ponzi scheme. Subsequently, we generate the corresponding transaction sequences to invoke the contract. Leveraging the dynamic taint engine, we collect runtime information during the contract execution within the instrumented EVM, creating raw graphs that encompass runtime control flow and data flow. After applying graph pruning and subgraph joining, we obtain the contract runtime behavior graph (CRBG). The CRBG serves as input for training a Graph Neural Network (GNN) designed to classify the graph. Consequently, we transform the Ponzi scheme detection into a graph classification task.

3.2. Static Code Analysis

To increase the likelihood of triggering the behavior of a Ponzi contract, we conduct static analysis to provide valuable information for generating transaction sequences. Specifically, we first examine the contract’s source code to extract essential function properties, including the function name, visibility, and whether the function is declared payable (i.e., capable of receiving Ether). Then, we leverage Slither (Feist et al., 2019), a static analysis framework designed for smart contracts, to extract the read and write operations of each function on state variables. Subsequently, we organize the data dependencies among the functions within the contract. For instance, if function A writes to state variable $\alpha$ and function B reads from state variable $\beta$, where $\beta$ either depends on $\alpha$ or is exactly $\alpha$, we add function B to the list of functions with a data dependency relationship on function A.
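The dependency organization described above can be sketched as follows. This is a simplified illustration that assumes the per-function read and write sets have already been extracted; the function `build_dependencies`, the input dictionaries, and the exclusion of self-dependencies are hypothetical choices, and only the case where $\beta$ is exactly $\alpha$ is handled.

```python
def build_dependencies(reads, writes):
    """Map each function A to the functions B that read a variable A writes.

    reads / writes: dict of function name -> set of state variable names.
    Only direct matches are considered; variables that merely depend on a
    written variable are not tracked in this sketch.
    """
    deps = {}
    for func_a, written in writes.items():
        deps[func_a] = sorted(
            func_b
            for func_b, read in reads.items()
            if func_b != func_a and written & read  # shared state variable
        )
    return deps

# Hypothetical read/write sets for the two functions in Listing 1
reads = {"enter": {"persons"}, "pay": {"persons", "payoutIdx"}}
writes = {"enter": {"persons"}, "pay": {"payoutIdx"}}

deps = build_dependencies(reads, writes)
print(deps)
```

Here pay() depends on enter() because it reads the persons array that enter() writes, so a sequence that calls enter() before pay() is more likely to reach the reward logic.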

Input: $F_{kws}, F_{payable}, F_{w}, F_{all}, Max$

Output: $TxSequences$

 1  $g \leftarrow 0$
 2  $TxSequences \leftarrow init()$
 3  while $g < Max$ do
 4      $txs \leftarrow init()$
 5      $func \leftarrow randomChoose(F_{kws}, F_{payable}, F_{w})$
 6      if $func$ is empty then
 7          $func \leftarrow randomChoose(F_{all})$
 8      end if
 9      $tx \leftarrow generateTransaction(func)$
10      $txs.add(tx)$
11      while $len(txs) < len(F_{all})$ do
12          $F_{dep} \leftarrow getDependency(txs)$
13          if $F_{dep}$ is not empty then
14              $func \leftarrow randomChoose(F_{dep}, F_{all})$
15          end if
16          else
17              $func \leftarrow randomChoose(F_{all})$
18          end if
19          $tx \leftarrow generateTransaction(func)$
20          $txs.add(tx)$
22      end while
23      $TxSequences.add(txs)$
24      $g \leftarrow g + 1$
25  end while

Algorithm 1 Transaction Sequences Generation.

3.3. Transaction Sequence Generation

We utilize the extracted function properties and data dependencies to generate transactions, thereby simulating the investment behavior of the Ponzi contract and triggering its functionality. Due to the presence of state variables in smart contracts, prior transactions can influence the execution of subsequent ones. Therefore, a successful investment may require multiple transactions. Accordingly, we generate transaction sequences to interact with the contract. The process of generating transaction sequences is outlined in Algorithm 1. Initially, we filter out functions that are irrelevant to the contract’s (Ponzi) behavior, such as view functions and pure functions, and categorize specific functions into distinct groups. For instance, in Algorithm 1, $F_{kws}$ denotes functions whose names contain keywords (such as invest, enter, init, deposit); $F_{payable}$ encompasses payable functions, including the fallback function; $F_{w}$ encompasses functions capable of altering state variables, while $F_{all}$ encompasses all functions in the contract. We use heuristics to select the initial function to invoke (Lines 4-10). Specifically, we give priority to payable functions because the investment function is typically declared as payable. Additionally, we prioritize functions whose names contain specific keywords because they are more likely to encapsulate the logic of Ponzi contract investment. Once the first transaction is generated (Line 10), we proceed to complete the transaction sequences in a Read-After-Write order (Lines 11-22), which is a common practice to create meaningful function invocation chains. The function $randomChoose()$ selects a function randomly from its arguments (Line 5), while $generateTransaction()$ is responsible for creating a valid transaction based on the chosen function (Line 9). To achieve this, $generateTransaction()$ first analyzes the contract’s ABI (Application Binary Interface), then randomly selects values within the valid input range for fixed data types, such as uint256. For non-fixed data types like string, it determines a positive number as the data length and generates an input of that length. Additionally, for payable functions, $generateTransaction()$ attaches a continuously increasing amount of Ether to successive transactions to facilitate the activation of specific behaviors of Ponzi contracts, such as investment and reward. The function $getDependency()$ retrieves functions that have data dependencies on the provided transactions (Line 12).
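The overall flow of Algorithm 1 can be sketched in Python as below. This is an illustrative approximation rather than our implementation: `make_tx` is a hypothetical stand-in for $generateTransaction()$ that skips ABI handling and argument generation, the initial pool is assumed to be the functions that belong to all three priority groups, and the dependency map is assumed to be precomputed by the static analysis.

```python
import random

def make_tx(func, step, payable):
    # Hypothetical stand-in for generateTransaction(): only attaches an
    # increasing Ether value (in wei) to calls of payable functions.
    value = (step + 1) * 10**17 if func in payable else 0
    return {"func": func, "value": value}

def generate_sequences(f_kws, f_payable, f_w, f_all, deps, max_seqs, rng=None):
    """Generate max_seqs transaction sequences; f_all must be non-empty."""
    rng = rng or random.Random()
    all_funcs = sorted(f_all)
    sequences = []
    for _ in range(max_seqs):
        # Assumption: prefer functions that fall into all three groups.
        pool = sorted(set(f_kws) & set(f_payable) & set(f_w))
        func = rng.choice(pool) if pool else rng.choice(all_funcs)
        txs = [make_tx(func, 0, f_payable)]
        # Extend the sequence in Read-After-Write order.
        while len(txs) < len(all_funcs):
            called = {tx["func"] for tx in txs}
            f_dep = sorted(
                {b for a in called for b in deps.get(a, [])} & set(all_funcs)
            )
            func = rng.choice(f_dep) if f_dep else rng.choice(all_funcs)
            txs.append(make_tx(func, len(txs), f_payable))
        sequences.append(txs)
    return sequences

seqs = generate_sequences(
    f_kws={"enter"}, f_payable={"enter"}, f_w={"enter", "pay"},
    f_all={"enter", "pay"}, deps={"enter": ["pay"], "pay": []},
    max_seqs=3, rng=random.Random(0),
)
for seq in seqs:
    print([(tx["func"], tx["value"]) for tx in seq])
```

For the contract in Listing 1, every generated sequence starts with enter() and continues with pay(), since pay() is the only function that depends on state written by enter().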

3.4. Runtime Behavior Collection

We use the generated transaction sequences to trigger contract execution, and collect the contract runtime behavior information. To achieve this, we design a taint engine for EVM to perform dynamic taint analysis and gather runtime details of smart contracts. Dynamic taint analysis, a widely-used program analysis technique (Schwartz et al., 2010), utilizes predefined taints to track program execution and observe runtime data flow and control flow.

3.4.1. Taint Sources and Sinks

As shown in Table 1, we have selected some operations as taint sources to introduce taint data. These operations will push some external data into the stack or memory, such as CALLER, CALLVALUE, CALLDATALOAD, CALLDATACOPY, which are related to the transaction sender and arguments, and TIMESTAMP, BLOCKHASH, which are related to the blockchain environment. In addition, we also consider some operations related to the contract itself, such as BALANCE and ADDRESS. Data derived from these sources is marked as tainted, while other data is marked as untainted. Regarding taint sinks, we select some meaningful operations as the location to check the flow of taint data. These operations either take taint data as their arguments (e.g., GT, CALL and SSTORE), or load taint data and push it into the stack (e.g., MLOAD and SLOAD).

3.4.2. Taint Propagation

To achieve the taint propagation, we implement the taint engine that encompasses the components such as a taint stack, a taint memory, and a taint storage. Each slot of the taint stack contains a taint that marks the corresponding slot in the EVM stack. Since the EVM memory is a byte array, each taint in the taint memory is responsible for a byte in the EVM memory. Both the taint stack and taint memory are volatile regions that are freed and allocated at the start of each new transaction. In contrast, the EVM storage is non-volatile and stores state variables in key-value pairs. In these key-value pairs, a 32-byte address calculated from the state variable is stored as the key, and the state variable is stored as the value. We maintain the taint storage in the same structure, with the address of the state variable as the key and the taint as the value. As storage is non-volatile, the taint storage is kept until all transactions are completed, as part of the Ethereum world state. In general, when one operand of an arithmetic operation is tainted, the result of the operation is also tainted regardless of the other operands. The implementation of the taint engine enables us to capture and trace the data flow throughout the contract execution.
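The propagation rule can be illustrated with a heavily simplified model. The sketch below assumes each taint is a set of source labels, handles only a handful of opcodes, omits the taint memory, and passes the storage slot as an argument instead of popping it from the stack; the class `TaintEngine` and its methods are hypothetical names for this example.

```python
TAINT_SOURCES = {"CALLVALUE", "CALLER", "TIMESTAMP", "BALANCE"}  # subset of Table 1
BINARY_OPS = {"ADD", "SUB", "MUL", "DIV", "LT", "GT", "EQ"}

class TaintEngine:
    def __init__(self):
        self.stack = []    # volatile: one taint (set of labels) per stack slot
        self.storage = {}  # non-volatile: storage slot -> taint

    def begin_transaction(self):
        self.stack = []    # the taint stack is reset for every transaction

    def step(self, op, slot=None):
        if op in TAINT_SOURCES:
            self.stack.append({op})       # source: introduce a fresh taint
        elif op == "PUSH":
            self.stack.append(set())      # constants are untainted
        elif op in BINARY_OPS:
            a, b = self.stack.pop(), self.stack.pop()
            self.stack.append(a | b)      # tainted if either operand is tainted
        elif op == "SSTORE":
            self.storage[slot] = self.stack.pop()
        elif op == "SLOAD":
            self.stack.append(set(self.storage.get(slot, set())))
        else:
            raise ValueError(f"opcode not modelled in this sketch: {op}")

engine = TaintEngine()
engine.begin_transaction()                  # transaction 1: investment
for op in ("CALLVALUE", "PUSH", "MUL"):
    engine.step(op)
engine.step("SSTORE", slot=0)

engine.begin_transaction()                  # transaction 2: reward check
engine.step("SLOAD", slot=0)
engine.step("BALANCE")
engine.step("GT")
print(sorted(engine.stack[-1]))
```

Because the taint storage survives between transactions, the comparison in the second transaction is marked as depending on both the contract balance and the Ether value supplied in the first transaction.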

3.4.3. Raw Graphs

We gather the information obtained in the contract execution and construct a raw graph that integrates the control flow and data flow of the contract runtime. The nodes of the graph are the operations executed during runtime, and we add control flow and data flow edges as the graph edges. The control flow edges are categorized into six types, with the most common type being the adjacent edge. This edge connects two operations whose program counters differ by only 1, indicating that they are executed in a successive manner. The other types of control flow edges include the jump edge, which connects JUMP(I) and the operation executed after the jump, as well as the call, return, and creation edges that similarly connect the corresponding operation (e.g., CALL, RETURN, CREATE) and its successor. Regarding the data flow edges, we follow the principle of adding edges from taint sources to taint sinks, representing the propagation of taint data. There are eight kinds of data flow edges according to the taint sources in Table 1.

Figure 4(a) shows examples of the output graphs obtained from the dynamic taint analysis. The two graphs presented in Figure 4(a) are the result of invoking enter() and pay() within the contract shown in Listing 1. For the sake of clarity, only the nodes representing the taint sources and sinks that capture the main logic of the functions are included in these simplified graphs shown in Figure 4(a).
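As a rough illustration of how a raw graph could be assembled, the sketch below builds node and edge lists from a simplified execution trace. The trace format, the function `build_raw_graph`, and the choice to derive the control flow edge type from the opcode alone (rather than from program counters) are assumptions of this example.

```python
# Control flow edge type inferred from the opcode of the source node.
CONTROL_KINDS = {
    "JUMP": "jump", "JUMPI": "jump",
    "CALL": "call", "RETURN": "return", "CREATE": "creation",
}

def build_raw_graph(trace):
    """trace: list of (opcode, sources) in execution order, where sources is
    the set of indices of earlier taint-source nodes reaching this node."""
    nodes = [op for op, _ in trace]
    edges = []
    # Control flow: link each executed operation to its successor.
    for i in range(len(trace) - 1):
        kind = CONTROL_KINDS.get(nodes[i], "adjacent")
        edges.append((i, i + 1, kind))
    # Data flow: link taint sources to the sinks they reach.
    for i, (_, sources) in enumerate(trace):
        for s in sorted(sources):
            edges.append((s, i, "data:" + nodes[s]))
    return nodes, edges

# Hypothetical, shortened trace of enter() from Listing 1
trace = [
    ("CALLVALUE", set()), ("PUSH", set()), ("LT", {0}),
    ("JUMPI", {0}), ("CALLER", set()), ("SSTORE", {4}),
]
nodes, edges = build_raw_graph(trace)
for edge in edges:
    print(edge)
```

Each edge is a (source index, target index, type) triple, so the same list can carry both control flow and data flow edges.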

Table 1. Taint Sources and Sinks.

| Role | Opcode | Type |
| --- | --- | --- |
| Source | CALLVALUE / CALLDATASIZE / CALLER / ORIGIN / CALLDATALOAD / CALLDATACOPY | Transaction Related |
| Source | TIMESTAMP / BLOCKHASH | Blockchain Environment |
| Source | BALANCE / ADDRESS | Contract Related |
| Sink | EQ / LT / SLT / GT / SGT | Comparison |
| Sink | MSTORE / MSTORE8 / MLOAD | Memory Related |
| Sink | SSTORE / SLOAD | Storage Related |
| Sink | CALL / CALLCODE / DELEGATECALL / STATICCALL | Call |
| Sink | JUMPI | Jump |

3.5. CRBG Construction

In this section, we illustrate why the raw graph obtained from the runtime behavior collection stage is not suitable for training an effective model and how we process these raw graphs to improve their representation of the behavioral characteristics of the Ponzi contract, making them suitable inputs for the graph neural network. The entire process is outlined in Figure 5.

3.5.1. Mitigating Graph Data Redundancy

As outlined in Section 3.4, we generate transactions to invoke the contract and execute its designated behavior patterns. Each transaction triggers the contract to execute once, generating a corresponding graph structure. To maximize the activation of Ponzi scheme behavior patterns, we generate multiple transaction sequences for each contract. Consequently, numerous redundant graphs occur, corresponding to repeated contract executions, failed executions due to contract assertions or invalid inputs, and executions unrelated to Ponzi scheme behavior patterns.

To mitigate graph data redundancy, we adopt two strategies to prune the raw graphs. The first strategy involves pruning based on contract behavior. This entails removing graphs from failed executions and executions that are not related to Ponzi scheme behavior patterns. Specifically, if an execution does not contain the opcodes CALLVALUE and CALLER, it is unlikely to represent an investment operation of a Ponzi scheme. Similarly, if the execution lacks comparisons (such as LT, GT, EQ) on the state variables, it is unlikely to be a reward operation of a Ponzi scheme. Therefore, we remove the graphs of executions that exhibit neither investment nor reward behavior. Additionally, we remove graphs without the SSTORE opcode, as it is one of the most commonly used opcodes to modify the contract state, and executions without SSTORE often indicate that they ended in failure. The second strategy involves pruning similar graphs. We compute the cosine similarity of node and edge features between each pair of graphs to assess their similarity. Pairs of graphs whose similarity exceeds a specified threshold are deemed similar. Among similar graphs, we retain only one and discard the others.
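Both pruning strategies can be sketched compactly. In the example below, the per-graph feature vectors, the threshold value of 0.95, and the function names are hypothetical; the behavior filter also simplifies the rule by checking for comparison opcodes without verifying that they operate on state variables.

```python
import math

def is_relevant(opcodes):
    """Behavior-based filter: keep executions that look like investment or
    reward logic and that modify contract state."""
    ops = set(opcodes)
    invest = {"CALLVALUE", "CALLER"} <= ops
    reward = bool(ops & {"LT", "GT", "EQ"})  # simplification: operands not checked
    return "SSTORE" in ops and (invest or reward)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def prune_similar(features, threshold=0.95):
    """Return indices of graphs to keep; a graph is dropped when it is too
    similar to one that has already been kept."""
    kept = []
    for i, feat in enumerate(features):
        if all(cosine(feat, features[j]) <= threshold for j in kept):
            kept.append(i)
    return kept

# Hypothetical feature vectors (e.g. opcode and edge-type counts per graph)
features = [[1, 0, 2], [2, 0, 4], [0, 3, 0]]
print(prune_similar(features))
print(is_relevant(["CALLVALUE", "CALLER", "SSTORE"]))
```

The second graph has the same direction as the first in feature space and is discarded, while the third is orthogonal to it and is retained.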

3.5.2. Joining Independent Graphs and Enhancing Data Flow Integrity

After the invocation, a contract may correspond to multiple graphs, as each transaction can invoke the contract and generate a graph, as depicted in Figure 5. However, an individual graph may not be sufficient to fully capture the behavior of the contract. For instance, in Figure 4(a), each graph only depicts a single stage of the contract (i.e., the investment stage for enter() and the reward stage for pay()), and neither of these graphs alone can conclusively determine that it is a Ponzi contract. Moreover, the data flow of the contract is isolated among graphs. Since smart contracts have persistent variables, there may also be data flow across transactions, which cannot be captured by individual graphs.

To address these issues, we connect all graphs of the same contract sequentially using a new type of edge called connection edge. This creates a connected graph that can better represent the behavior of the contract across multiple transactions. Furthermore, we complete the across-transaction data flow among previously independent graphs using taint storage. The taint storage records the taint status of variables at the end of each transaction, and this information is used to propagate taints to subsequent transactions. With these enhancements, we are able to capture data flow that spans multiple transactions and more accurately analyze the behavior of smart contracts.
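A minimal sketch of the joining step is given below. It assumes each per-transaction graph is a (nodes, edges) pair with edges stored as (source, target, type) triples, and that a connection edge links the last node of one graph to the first node of the next; the endpoints of the connection edge and the function name `join_graphs` are assumptions of this example.

```python
def join_graphs(graphs):
    """Concatenate per-transaction graphs in execution order and link
    consecutive graphs with a 'connection' edge."""
    nodes, edges = [], []
    prev_last = None  # index of the last node of the previous graph
    for g_nodes, g_edges in graphs:
        if not g_nodes:
            continue
        offset = len(nodes)
        nodes.extend(g_nodes)
        # Re-index the graph's own edges into the joined node list.
        edges.extend((s + offset, d + offset, kind) for s, d, kind in g_edges)
        if prev_last is not None:
            edges.append((prev_last, offset, "connection"))
        prev_last = offset + len(g_nodes) - 1
    return nodes, edges

# Hypothetical two-node graphs for an investment and a reward transaction
invest = (["CALLVALUE", "SSTORE"], [(0, 1, "adjacent")])
reward = (["SLOAD", "CALL"], [(0, 1, "adjacent")])

joined_nodes, joined_edges = join_graphs([invest, reward])
print(joined_nodes)
print(joined_edges)
```

Cross-transaction data flow edges would be added on top of this structure using the taints recorded in the taint storage.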

3.5.3. Enriching Node Features

In the raw graph, nodes are distinguished by the type of operation they include. This results in each node being represented by a 139-dimensional one-hot vector (corresponding to 139 unique operations), with a single non-zero entry corresponding to the type of operation. However, one-hot vectors do not capture any information about the relationships between nodes in the graph, which are crucial for understanding its structure and properties. To improve the classification accuracy, we need better node embeddings that can capture these relationships.

We noticed that the Ethereum Yellow Paper (Wood, 2022) provides a description of each operation, as exemplified in Table 2. In Table 2, $\alpha$ represents the additional items placed on the stack, while $\delta$ represents the items removed from the stack (Wood, 2022). The description section explains in text how the operation works and gives a formula showing how it manipulates data in the EVM. To embed the nodes, we first remove the formula from the description section and keep only the text explanation, preserving the functional information of the operation. As shown in Figure 5, for each node, we use Doc2Vec (Le and Mikolov, 2014), a model for generating embeddings of variable-length pieces of text, to convert the text explanation into a 100-dimensional vector, which we concatenate with $\alpha$ and $\delta$ to form a 102-dimensional node feature. At this point, the construction of the CRBG is complete; it is labeled later for model training. Figure 4(b) shows the CRBG constructed by preprocessing the raw graphs in Figure 4(a). The CRBG in Figure 4(b) has better node embeddings, carries more comprehensive contract runtime information, and can better characterize the contracts.
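The layout of the resulting node feature can be sketched as follows. The text embedding here is a deterministic token-hashing stand-in and is explicitly not Doc2Vec; it only serves to make the example self-contained. The function names are hypothetical, and the $\alpha$ and $\delta$ values used for JUMPI are those listed in the Yellow Paper.

```python
import hashlib

EMBED_DIM = 100  # dimension of the text embedding described above


def toy_text_embedding(text, dim=EMBED_DIM):
    """Stand-in for Doc2Vec (NOT Doc2Vec): hash each token into one of `dim` buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def node_feature(description, alpha, delta):
    """Concatenate the text embedding with alpha and delta (100 + 2 = 102 values)."""
    return toy_text_embedding(description) + [float(alpha), float(delta)]


# JUMPI: removes two stack items (delta = 2) and adds none (alpha = 0).
jumpi_feature = node_feature("Conditionally alter the program counter.", alpha=0, delta=2)
```

With a real Doc2Vec model, only `toy_text_embedding` would change; the concatenation step and the 102-dimensional result stay the same.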

Table 2. Introduction of JUMPI.

Value	Mnemonic	$\delta$	$\alpha$	Description
0x57	JUMPI	2	0	Conditionally alter the program counter. $J_{\mathrm{JUMPI}}(\mu)\equiv\begin{cases}\mu_{s}[0] & \text{if } \mu_{s}[1]\neq 0\\ \mu_{pc}+1 & \text{otherwise}\end{cases}$

3.6. Graph Classification

Deep learning excels at automatic feature extraction from raw data and achieves top performance in many fields (Wu et al., 2020a, 2022b). Unlike traditional deep learning models that primarily handle vector or matrix data (Wu et al., 2022a), graph neural networks (GNNs) excel at modeling and processing graph-structured data (Wu et al., 2020c). In this section, we introduce our GNN solution to the Ponzi contract identification problem. As illustrated in Figure 6, our GNN model consists of three parts: graph input, graph embedding learning, and classification.

3.6.1. Graph input

We use CRBG ( $\mathcal{G}$ ) as the input graph which contains nodes $\mathcal{V}=\{1,...,n\}$ and edges $\mathcal{E}$ . The node features matrix X has a dimension of $(|\mathcal{V}|,102)$ , where each node is represented by a 102-dimensional feature vector. The edge index I has a dimension of $(2,|\mathcal{E}|)$ , where each column corresponds to an edge and contains the indices of the nodes that the edge connects. The edge features matrix E has a dimension of $(|\mathcal{E}|,15)$ , where each edge is represented by a 15-dimensional feature vector. $|\mathcal{V}|$ and $|\mathcal{E}|$ represent the number of nodes and edges in $\mathcal{G}$ .
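As a small sanity check of these shapes, the following sketch validates a toy graph input. The helper name and the use of nested Python lists in place of tensors are assumptions made for illustration.

```python
NODE_DIM = 102  # node feature dimension
EDGE_DIM = 15   # edge feature dimension


def check_graph_input(X, I, E):
    """Validate the shapes of node features X, edge index I and edge features E."""
    n_nodes, n_edges = len(X), len(E)
    if any(len(row) != NODE_DIM for row in X):
        raise ValueError("X must have shape (|V|, 102)")
    if len(I) != 2 or len(I[0]) != n_edges or len(I[1]) != n_edges:
        raise ValueError("I must have shape (2, |E|)")
    if any(len(row) != EDGE_DIM for row in E):
        raise ValueError("E must have shape (|E|, 15)")
    if any(not 0 <= v < n_nodes for row in I for v in row):
        raise ValueError("I must only reference existing nodes")
    return n_nodes, n_edges


# Toy graph with 3 nodes and 2 edges: 0 -> 1 and 1 -> 2.
X = [[0.0] * NODE_DIM for _ in range(3)]
I = [[0, 1], [1, 2]]   # column k holds (source, target) of edge k
E = [[0.0] * EDGE_DIM for _ in range(2)]
graph_shape = check_graph_input(X, I, E)
```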

3.6.2. Graph embedding learning

In graph embedding learning, we choose Graph Attention Networks (GAT) as the component of GNN convolutional layers. GAT performs the aggregation based on the self-attention mechanism, i.e., calculating the weights between nodes and edges through learnable weight matrices W and $\textbf{W}_{e}$ , so that each node can be weighted and aggregated according to the characteristics of its surrounding nodes. Since CRBG has multi-dimensional edge features, the attention coefficients $\alpha_{i,j}$ in the self-attention mechanism are computed as:

$$\alpha_{i,j}=\frac{\exp(\textbf{e}_{i,j})}{\sum_{k\in\mathcal{N}_{i}\cup\{i\}}\exp(\textbf{e}_{i,k})} \tag{1}$$

where $\textbf{e}_{i,j}$ represents the attention score indicating the importance of node $j$ ’s features to node $i$ , and $\mathcal{N}_{i}\cup\{i\}$ represents the set containing node $i$ and its adjacent nodes. $\textbf{e}_{i,j}$ is obtained by concatenating the transformed feature vectors of node $i$ and node $j$ with the transformed features of the edge between them, and then applying a linear transformation followed by an activation:

$$\textbf{e}_{i,j}=\text{LeakyReLU}(\vec{\textbf{a}}^{T}[\textbf{W}\vec{\textbf{h}}_{i}||\textbf{W}\vec{\textbf{h}}_{j}||\textbf{W}_{e}\vec{\textbf{m}}_{i,j}]) \tag{2}$$

where $\mathrm{LeakyReLU}$ represents the activation function, $\vec{\textbf{a}}$ represents the weight vector, $||$ represents the concatenation operation, $\vec{\textbf{h}}_{i}$ represents the feature vector of node $i$ , and $\vec{\textbf{m}}_{i,j}$ represents the multi-dimensional edge features between node $i$ and $j$ .

By calculating the weight between nodes, the weighted sum of the adjacent nodes of node $i$ can be obtained:

$$\vec{\textbf{h}}_{i}^{\prime}=\sigma\left(\sum_{j\in\mathcal{N}_{i}}\alpha_{i,j}\textbf{W}\vec{\textbf{h}}_{j}\right) \tag{3}$$

where $\vec{\textbf{h}}_{i}^{\prime}$ represents the updated feature vector of node $i$ , $\sigma$ represents the activation function, and $\mathcal{N}_{i}$ represents the set of adjacent nodes of node $i$ .
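Equations (1)-(3) can be traced on a tiny graph with the following sketch. It is a plain-Python illustration rather than the model's actual PyTorch code. It assumes a single attention head, a LeakyReLU slope of 0.2, ReLU as $\sigma$, and neighbour lists that already include the node itself, so the same set is used for normalization and aggregation; all numeric values are made up.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(h, neighbours, m, W, W_e, a):
    """One attention layer with edge features.

    h: node feature vectors; neighbours[i]: nodes attended to by node i;
    m[(i, j)]: edge feature vector; W, W_e: weight matrices; a: weight vector.
    """
    Wh = [matvec(W, x) for x in h]
    out, alphas = [], {}
    for i, nbrs in enumerate(neighbours):
        scores = {}
        for j in nbrs:
            z = Wh[i] + Wh[j] + matvec(W_e, m[(i, j)])   # concatenation
            scores[j] = leaky_relu(sum(ak * zk for ak, zk in zip(a, z)))  # Eq. (2)
        denom = sum(math.exp(s) for s in scores.values())
        agg = [0.0] * len(Wh[i])
        for j in nbrs:
            alphas[(i, j)] = math.exp(scores[j]) / denom  # Eq. (1)
            agg = [acc + alphas[(i, j)] * x for acc, x in zip(agg, Wh[j])]
        out.append([max(0.0, x) for x in agg])            # Eq. (3), sigma = ReLU
    return out, alphas


# Two nodes with 2-dimensional features and 1-dimensional edge features.
h = [[1.0, 0.0], [0.0, 1.0]]
neighbours = [[0, 1], [1, 0]]
m = {(0, 0): [0.0], (0, 1): [1.0], (1, 1): [0.0], (1, 0): [2.0]}
W = [[1.0, 0.0], [0.0, 1.0]]
W_e = [[0.5]]
a = [0.1, -0.2, 0.3, 0.4, 0.5]   # length = 2 + 2 + 1
h_new, alphas = gat_layer(h, neighbours, m, W, W_e, a)
```

For each node, the attention coefficients over its neighbourhood sum to one, so the updated feature vector is a convex combination of the transformed neighbour features before the activation is applied.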

We set two GAT layers and use ReLU in the middle for nonlinearly transforming the node features in order to better handle the nonlinear relationship of data and increase the expressiveness of the network. We utilize mean-pooling to aggregate the node features and obtain the global feature representation of the graph.

3.6.3. Classification

The classifier comprises a dropout and a fully connected layer (FC). The dropout randomly sets a fraction of the output of neurons to zero, which helps prevent overfitting and improves the model’s generalization ability. The purpose of a fully connected layer is to learn non-linear combinations of the features in the input data, allowing the model to make more accurate predictions. We input the global feature representation into the classifier and obtain the predicted class label for the graph.
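The readout and classification steps can be sketched in plain Python as follows. The dropout rate, the inverted-dropout scaling, the weights, and the label encoding are assumptions made for illustration and are not taken from the trained model.

```python
import random


def mean_pool(node_feats):
    """Average node features column-wise to obtain one graph-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


def dropout(vec, p, training, rng):
    """Zero each entry with probability p during training (inverted scaling)."""
    if not training or p == 0.0:
        return list(vec)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def fully_connected(vec, weights, bias):
    """Affine layer: one logit per output class."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def classify(node_feats, weights, bias, p=0.5, training=False, rng=None):
    """Mean-pool, apply dropout, apply the FC layer, and return the arg-max class."""
    rng = rng or random.Random(0)
    graph_vec = dropout(mean_pool(node_feats), p, training, rng)
    logits = fully_connected(graph_vec, weights, bias)
    return max(range(len(logits)), key=logits.__getitem__)


# Toy graph with two nodes; class 1 stands for "Ponzi" in this example.
toy_nodes = [[1.0, 3.0], [3.0, 1.0]]
toy_weights = [[1.0, 0.0], [0.0, 2.0]]
toy_bias = [0.0, 0.0]
predicted = classify(toy_nodes, toy_weights, toy_bias)
```

At inference time (`training=False`) dropout is the identity, so the prediction is deterministic.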

4. Implementation

We leverage Slither (Feist et al., 2019), a static analysis framework, to extract data dependencies of smart contracts. We instrumented the official Golang implementation of EVM (version 1.10.6) (ethereum foundation, 2023) to collect contract runtime information. We implemented our dynamic taint engine in Golang (version 1.16.6) to cooperate with the instrumented EVM and construct the CRBG. Our GNN model was implemented using PyTorch (pytorch, 2023), and we employed Graph Attention Networks as the convolutional layers.

5. Experiments

5.1. Research Questions

Our test environment consists of a server with a 16-core Intel(R)-Xeon(R)-Gold-5218 CPU $@$ 2.30 GHz, 340GB of RAM, and the Ubuntu 18.04 LTS operating system. We conduct experiments to answer the following five questions.

• RQ1: How effective is PonziGuard in identifying Ponzi contracts compared to the existing tools?
• RQ2: How does CRBG work in detecting Ponzi contracts? Is the detection process interpretable?
• RQ3: How effective is the CRBG compared to the raw graph obtained directly from runtime?
• RQ4: How does PonziGuard perform in real-world scenarios? Can it detect 0-day Ponzi schemes?
• RQ5: What is the overhead of PonziGuard?

Table 3. Overall Evaluation Results. Values in parentheses represent the standard deviations across the K-fold.

Approach	Precision	Recall	F1-score
OpML(Chen et al., 2018)	89.0% (0.05)	77.8% (0.04)	83.0% (0.03)
TxML(Yu et al., 2021)	69.6% (0.06)	63.4% (0.02)	66.4% (0.07)
SADPonzi(Chen et al., 2021)	88.1%	64.5%	74.5%
MulCas(Zheng et al., 2023)	95.1%	67.4%	78.9%
SourceP(Lu et al., 2024)	91.6% (0.04)	93.3% (0.03)	92.4% (0.02)
PonziGuard	96.9% (0.03)	98.2% (0.03)	97.5% (0.02)

5.2. RQ1: Effectiveness of PonziGuard

5.2.1. Dataset

We utilize the dataset in XBlock (zhijie, 2023) provided by Zheng et al. (Zheng et al., 2023), obtained through crawling Etherscan (Etherscan, 2023) and manual cross-checking. This ground-truth dataset comprises 6,498 smart contracts, with 314 identified as smart Ponzi schemes. These smart contracts range from height 0 to height 7,500,000, and undergo a manual cross-check procedure to ensure the accuracy of their labels.

5.2.2. Evaluation metrics

We use the following evaluation metrics to measure the effectiveness of our approach.

Precision measures the proportion of true positives among all positive predictions made by the approach: Precision = TP / (TP + FP). Recall measures the proportion of true positives among all actual positive instances in the dataset: Recall = TP / (TP + FN). F1-score is the harmonic mean of Precision and Recall, providing a single measure of the approach’s overall performance: F1-score = 2 $\times$ Precision $\times$ Recall / (Precision + Recall). Here, TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
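As a worked example, the metrics can be computed from confusion counts as follows. The counts are made up for illustration and are not results from the paper. The last line applies the F1 formula to the precision and recall reported for PonziGuard in Table 3, with the caveat that the table reports fold-wise means.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute the three metrics from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Made-up counts: 90 true positives, 10 false positives, 30 false negatives.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)

# Consistency check against Table 3 (precision 96.9%, recall 98.2%).
table3_f1 = f1_from(0.969, 0.982)
```

For the made-up counts, precision is 0.9, recall is 0.75, and the F1-score is 9/11 (about 0.818). Applying the formula to 96.9% and 98.2% yields about 97.5%, matching the F1-score in Table 3.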

5.2.3. State-Of-The-Art

We evaluated the effectiveness of PonziGuard and compared it with the studies of Chen et al. (Chen et al., 2018), Yu et al. (Yu et al., 2021), SADPonzi (Chen et al., 2021), MulCas (Zheng et al., 2023) and SourceP (Lu et al., 2024). Chen et al. (Chen et al., 2018) detect Ponzi contracts using XGBoost mainly based on the opcode frequency, and in this paper we refer to their work as OpML. Yu et al. (Yu et al., 2021) utilize the transactions on Ethereum to identify Ponzi contracts, and in this paper we refer to their work as TxML. SADPonzi detects Ponzi contracts based on symbolic execution. MulCas extracts contract features from multiple views and detects Ponzi schemes through multi-view training and ensemble. SourceP trains the classification model by converting contract source code into Abstract Syntax Trees (AST) to extract data flow information. (Chen et al., 2018; Zheng et al., 2023; Lu et al., 2024) represent state-of-the-art (SOTA) operation/source code-based machine learning approaches. (Yu et al., 2021) represents SOTA transaction-based machine learning approaches, and (Chen et al., 2021) represents SOTA rule-based approaches.

5.2.4. Result and Analysis

In our approach, we generated 6,498 graphs from 6,498 contracts in the dataset for model training and testing. While we used the same dataset in the comparative experiment, different approaches processed the data differently. For example, in the approach of Chen et al. (Chen et al., 2018), we compiled the 6,498 contracts into bytecode and counted the opcode frequency as inputs for model training and testing. In the approach of Yu et al. (Yu et al., 2021), we collected the transactions of these contracts on Ethereum and performed a random selection process to obtain a transaction network as input. The contract bytecode could be directly applied by the symbolic execution tool of SADPonzi. For the machine learning-based approaches, we randomly divided the dataset into 5 folds and performed K-fold cross-validation. The mean values of the evaluation metrics across the K models, as well as their corresponding standard deviations, were calculated to measure the average performances. As shown in Table 3, PonziGuard outperformed all the baselines on the test set, achieving 96.9% precision, 98.2% recall, and 97.5% F1-score. We believe that the poor performance of the state-of-the-art approaches can be attributed to the fact that static information cannot characterize the Ponzi contracts (OpML, TxML, MulCas and SourceP), and not all Ponzi contracts conform to the pre-defined behavior patterns (SADPonzi).
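The cross-validation protocol can be sketched as follows. The shuffling seed, the round-robin fold assignment, the use of the population standard deviation, and the per-fold scores are assumptions made for illustration; the paper does not specify them.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def summarize(scores):
    """Mean and (population) standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.pstdev(scores)


folds = k_fold_indices(6498, 5)          # dataset size from Section 5.2.1
splits = list(train_test_splits(folds))

# Hypothetical per-fold F1-scores, only to show the aggregation step.
mean_f1, std_f1 = summarize([0.96, 0.98, 0.97, 0.99, 0.95])
```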

Answer to RQ1: PonziGuard outperforms the state-of-the-art approaches in the comparative experiment, demonstrating the effectiveness of our approach in identifying Ponzi contracts.

5.3. RQ2: Interpretability

We conduct an interpretability experiment to explain what role CRBG plays in detecting Ponzi schemes and to validate our insight into using CRBG as the key detection mechanism. Specifically, we selected a graph from the test set as input, and then calculated the gradient of the final classification decision for each node and edge in the input graph. Using these gradient values, we generated importance heatmaps to highlight the nodes and edges with great impact on the classification decisions, as shown in Figure 7 and Figure 8.
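The idea of gradient-based importance can be sketched as follows. This illustration approximates gradients with central finite differences on a toy scoring function, whereas the actual experiment would obtain them by backpropagation through the trained GNN. The function names and the toy model are hypothetical.

```python
def node_importance(model, X, eps=1e-5):
    """Sum of absolute partial derivatives of the score w.r.t. each node's features."""
    importance = []
    for row in X:
        total = 0.0
        for d in range(len(row)):
            original = row[d]
            row[d] = original + eps
            up = model(X)
            row[d] = original - eps
            down = model(X)
            row[d] = original            # restore the input
            total += abs((up - down) / (2 * eps))
        importance.append(total)
    return importance


def toy_model(X):
    """Toy score: depends strongly on node 0, weakly on node 1, not at all on node 2."""
    return 3.0 * X[0][0] + 1.0 * X[1][1]


features = [[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]
scores = node_importance(toy_model, features)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Sorting nodes by these values gives the ordering that a heatmap visualizes: node 0 first, then node 1, then node 2. The same procedure applies to edge features.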

In Figure 7(a), some of the nodes with the greatest contribution form two clusters, and magnified details are shown in Figure 7(b) and Figure 7(c). In these two clusters, nodes 11 and 84 are the central nodes with the most connected nodes, and correspond to the opcode CALLVALUE and BALANCE, respectively. These two clusters represent the investment and reward behaviors of the Ponzi scheme. Specifically, the connections between node 11 and its surrounding nodes indicate that the contract continuously attracts new investors and records their investment amounts. The connections between nodes around node 84 represent the contract’s cyclic operation of distributing rewards to investors. Several other important nodes such as nodes 6 and 192 shown in Figure 7(d) and Figure 7(e), correspond to comparison operations. They represent the value checks before both the investment and reward behaviors.

Similarly, the most important edges in Figure 8 also form two clusters, the same as those observed in Figure 7, delineating the control flow and data flow of the Ponzi scheme during the investment and reward processes. In particular, some of the most important edges, such as (6, 7) and (194, 195), are presented in Figure 8(d) and Figure 8(e). As previously discussed, nodes 6 and 192, along with their adjacent nodes, represent the value checks before the investment and reward. Accordingly, edges (6, 7) and (194, 195) represent the contract’s behaviors after these checks complete. Specifically, edge (6, 7) represents the storage (SSTORE) of the investor’s funds, while edge (194, 195) represents the external call (CALL) used for transferring rewards.

Answer to RQ2: The experiment showcases that CRBG effectively reflects the characteristic behaviors of Ponzi schemes, and these characteristics greatly contribute to the classification decisions. This underscores CRBG’s effectiveness in Ponzi scheme detection. Additionally, the experiment highlights that CRBG can provide valuable interpretability for our tool.

5.4. RQ3: Effectiveness of CRBG

In this section, we evaluate the effects of the various processes applied to CRBG through a series of ablation studies, in order to determine how effective CRBG is compared to the raw graph obtained directly from runtime.

5.4.1. Effectiveness of graph pruning and joining

To evaluate the impact of graph pruning and graph joining on the performance of CRBG, we designed two ablation settings, each omitting one of these steps. We then compared the size of the CRBG generated in these two ablation settings and the corresponding experimental results on the same dataset, which are shown in Figure 9 and Figure 10.

For the CRBG without the graph pruning operation, as shown in Figure 10, the average number of nodes and edges per graph increased by nearly 30%. This added significant redundant information, leading to lower precision, recall, and F1-score compared to the original CRBG. Additionally, the larger graph sizes increased the model’s training time. For the CRBG without the subgraph joining operation, as shown in Figure 10, the average number of nodes and edges per graph was significantly reduced to nearly one-fifth of the original CRBG. However, as discussed in Section 3.5, these smaller subgraphs failed to fully capture the contract’s behavior, resulting in a decrease in performance metrics: precision, recall, and F1-score dropped by 5.9%, 10.2% and 8.0%, respectively.

5.4.2. Effectiveness of runtime data flow and control flow in CRBG

To gain a better understanding of the effectiveness of control and data flow in CRBG, we performed an ablation study by configuring PonziGuard in four distinct modes: data flow only (DF), control flow only (CF), both (DF+CF), and neither (NE). In DF mode, we removed control flow edges and only kept data flow edges in CRBG. On the contrary, in CF mode, we removed data flow edges and only kept control flow edges. In NE mode, we removed both data flow edges and control flow edges in CRBG. The mode DF+CF is the native PonziGuard which includes both control and data flow.
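The four modes amount to filtering edges by kind, as in the sketch below. The edge-kind labels and the decision to leave other edge kinds (such as connection edges) untouched are assumptions made for illustration.

```python
FLOW_KINDS = {"data", "control"}

# Flow edge kinds kept in each ablation mode.
MODES = {
    "DF+CF": {"data", "control"},   # native PonziGuard
    "DF": {"data"},                 # data flow only
    "CF": {"control"},              # control flow only
    "NE": set(),                    # neither
}


def apply_mode(edges, mode):
    """Drop the flow edges that the given mode excludes; keep all other edges."""
    kept = MODES[mode]
    return [e for e in edges if e[2] not in FLOW_KINDS or e[2] in kept]


toy_edges = [(0, 1, "control"), (0, 2, "data"), (2, 3, "connection")]
df_edges = apply_mode(toy_edges, "DF")
ne_edges = apply_mode(toy_edges, "NE")
```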

Figure 11 shows the performance of these four modes after 5-fold cross-validation on the same dataset. Compared to the native PonziGuard baseline, the CF mode dropped by 9.3%, 15%, and 12.2% in precision, recall, and F1-score, respectively. The DF mode degraded further, with drops of 13.5%, 24.2%, and 19.3%, respectively. As expected, the NE mode exhibited the worst performance, with drops of 26.4%, 42.1%, and 35.3%, respectively. The main reason for the poor performance of the CF mode is that, with control flow only, PonziGuard cannot capture the flow of investors’ investments in contracts. Therefore, some contracts with Ether redistribution logic may be misreported. On the other hand, the lack of control flow in the DF mode results in the loss of contract context information, such as the functions in which variables are used and the order of their use. This ablation study highlights that both control flow and data flow are crucial in capturing the behavioral patterns of Ponzi contracts, and that gathering this runtime information significantly improves the performance of PonziGuard.

Table 4. Comparing model performance between different node embedding settings.

Test	Node Embeddings adopted in Test Set	Model	Precision	Recall	F1-score
1	One-hot vectors	MOE	88.5%	82.1%	85.2%
2	Enhanced	MEE	96.4%		
3	Variant 1	MEE	96.3%	92.9%	94.5%
4	Variant 2	MEE	96.2%	89.3%	92.6%

5.4.3. Effectiveness of node embeddings adopted in CRBG

In Section 3.5, we utilized Doc2Vec to enhance node embeddings in CRBG based on the operation descriptions from the Ethereum Yellow Paper. We believe that the semantic information conveyed in these descriptions is representative and can capture the relationships between the nodes in CRBG. We conducted a comparative experiment to evaluate the efficacy of the node embeddings we enhanced. In this comparative experiment, one model was trained on the dataset described in Section 5.2.1 using our enhanced node embeddings (Model with enhanced node embeddings, abbreviated as MEE). In contrast, another model was trained on the same dataset, but replacing our node embeddings with one-hot vectors (Model with one-hot vectors as node embeddings, abbreviated as MOE). We used 80% of the dataset for training and 20% for testing. As shown in Table 4, compared with the one-hot vectors as node embeddings (Test 1), the node embeddings based on the operation description (Test 2) performed better in the evaluation metrics. It demonstrates that the node embeddings generated from operation descriptions capture the underlying semantics, leading to a better understanding of the graph’s structure and properties, which accounts for this performance improvement.

Table 5. Variants of opcode description.

Value	Mnemonic	Original Description:
0x57	JUMPI	“Conditionally alter the program counter.”
		Synonym substitution:
		“Conditionally change the instruction pointer.”
		Changing sentence structure or grammar:
		“Alter the program counter based on the condition.”

To ascertain that the performance improvement is primarily attributable to the semantics themselves, rather than to the way the semantics are described, we conducted additional analysis. We rewrote the operation descriptions in the Ethereum Yellow Paper following two principles while retaining the original semantics. The first principle is synonym substitution: by substituting words with their synonyms, we retain the original semantics while using a different expression. The second principle is changing sentence structure or grammar, which can be done by using different sentence patterns, altering the word order, or adjusting the placement of clauses. For instance, as shown in Table 5, the description for JUMPI in Table 2 can be rewritten as “Conditionally change the instruction pointer” and “Alter the program counter based on the condition” according to these two principles. Based on these two rewriting principles, we re-extracted the node embeddings and created two variants of the test set (Variant 1 and Variant 2). Then, we evaluated our model (MEE, the model trained with enhanced node embeddings) on these two variant test sets. As shown in Table 4, our model exhibited similar performance on the variant test sets (Test 3 and 4) compared to the native test set (Test 2), suggesting that the improvement in model performance is primarily attributable to the semantic information rather than the specific way in which the semantics are conveyed.

Answer to RQ3: CRBG proves effective compared to the raw graph obtained directly from runtime. The ablation studies further confirm that our processing steps on CRBG, such as graph pruning, subgraph joining, and enhanced node embeddings, significantly enhance detection performance.

5.5. RQ4: Performance in real-world scenarios

To evaluate the effectiveness of PonziGuard in real-world scenarios, we conducted two experiments on the Ethereum Mainnet. The first is a large-scale transaction replay on the Ethereum historical blocks, to assess the number and economic impact of Ponzi contracts over the past few years. The second involved collecting recently deployed contracts to detect 0-day Ponzi schemes.

5.5.1. Historical Transaction Replay

Firstly, we ran the Geth client with the option --syncmode full to synchronize with the Ethereum Mainnet. The number of smart contracts has grown explosively in recent years (about one million per quarter (Alchemy, 2022)), which is a significant workload for an approach based on runtime information. Therefore, as a preliminary experiment to verify the performance of PonziGuard in real-world scenarios, we synchronized only up to January 2022 (approximately 14,000,000 blocks). Then, we replaced the native EVM with our instrumented EVM and integrated the dynamic taint engine. We re-executed every transaction on the synchronized blockchain from the genesis block, which is a time-consuming process. Finally, we fed the generated graphs into our GNN model for prediction. As a result, PonziGuard identified 805 Ponzi contracts on Ethereum Mainnet, of which 497 have accessible source code on Etherscan (Etherscan, 2023). We randomly selected 50 contracts (available at https://github.com/PonziDetection/PonziGuard/tree/main/dataset/Result/verified) from these 497 contracts and manually examined them through Remix (Project, 2023), a Solidity IDE, to check that they meet our predefined criteria for Ponzi contracts; all 50 were confirmed to be true positives. To gain deeper insights into these 805 Ponzi contracts, we conducted further analysis using the data collected from Etherscan in the remainder of this section.

Creation Time of Ponzi Contracts. Figure 13 shows the distribution of these 805 Ponzi contracts. Ponzi schemes started appearing on Ethereum as early as 2015. Subsequently, the rapid development of Ethereum led to a significant growth of Ponzi contracts during the years 2016-2019. Then, we witnessed a brief recession in Ponzi schemes possibly linked to the impact of the COVID-19 pandemic (Lehman, 2021). The global crypto mining boom in 2021 (Drakopoulos, 2021) resulted in another minor peak in Ponzi schemes. With the increasing popularity of various tokens on Ethereum, ERC-20 Tokens for instance, we anticipate another peak in Ponzi schemes on Ethereum in the near future.

Lifetime of Ponzi Contracts. We regard the time from the creation of a Ponzi contract to its last transaction as its lifetime. We investigated the lifetime of these 805 Ponzi contracts, as shown in Figure 13. While some of these contracts remain active in 2023 (for instance, 0xa90be2201bfed97587a2a17949e8624eafe51d13 and 0xf8f04b23dace12841343ecf0e06124354515cc42), the majority of Ponzi contracts have a lifetime of less than three months, and their average lifetime is about seven months. As for the short lifetimes, some schemes ended because the scam failed to attract new investors, resulting in its collapse. Others ended because the scam owner intentionally triggered the self-destruct function of the contract and absconded with the funds. These findings indicate that Ponzi contracts are likely to collapse within a short period of time, and most of their users will be unable to reclaim their promised returns.

Financial Impact. We analyzed the financial impact of the 805 Ponzi contracts identified by PonziGuard on Ethereum Mainnet by aggregating their transactions and the inflow of Ether. Figure 15 shows their monthly distribution, revealing a positive correlation between the inflow of Ether and the number of transactions of the contracts. The peak was in February 2018, when a total of 117,953 Ether flowed into Ponzi contracts, equivalent to $108 million at the exchange rate of that time. From January 2015 to July 2023, 615,483 transactions, totaling 281,700 Ether, flowed into Ponzi contracts. At the current exchange rate, this Ether is worth approximately $500 million. It is also evident that, in recent years, a Ponzi scheme may not involve a substantial amount of Ether, as some schemes began adopting ERC tokens for investments and rewards. However, it is important to note that such Ponzi contracts still meet the criteria outlined in Section 2.2, and our method remains effective in identifying them (as evidenced by the example of 0xb3836d31d43d315ba74c21aad3818f9378256152).

5.5.2. Detecting 0-day Ponzi Schemes

To ascertain whether new Ponzi schemes keep emerging and whether our tool can detect them effectively, we collected the source code of about 10,000 smart contracts deployed in March 2024 from Etherscan (Etherscan, 2023) and employed PonziGuard for detection. After manual confirmation, we identified 15 Ponzi schemes among these 10,000 recently deployed contracts; their addresses are available at our GitHub repository (https://github.com/PonziDetection/PonziGuard/tree/main/dataset/Result). As shown in Figure 15, most of these Ponzi contracts were newly deployed and have only one transaction (the creation transaction), showcasing the ability of PonziGuard to uncover 0-day Ponzi schemes.

Answer to RQ4: PonziGuard successfully identified 805 Ponzi contracts in the Ethereum historical blocks and found 15 recent 0-day Ponzi contracts deployed within a month, demonstrating the effectiveness of PonziGuard in real-world scenarios. These contracts have resulted in significant financial losses, amounting to hundreds of millions of USD, which emphasizes the severity of Ponzi contracts on Ethereum and the urgency of identifying them effectively.

5.6. RQ5: Overhead of PonziGuard

In PonziGuard, we instrument the EVM and build a dynamic taint engine to obtain contract runtime information, which introduces a certain amount of time overhead compared to the native smart contract execution environment. We conducted experiments to evaluate this overhead.

5.6.1. Ground-Truth Dataset

For the experiment described in Section 5.2, the contracts were executed in an independent instrumented EVM with the taint engine. To evaluate the time overhead, we generated 1,000 transactions for a contract and sent them along with the contract to both the native EVM and the instrumented EVM separately. To accurately assess the time overhead, we repeated the process 10 times and recorded the average time it took to process these transactions. The results, shown in Figure 17, indicate that when the processing of 1,000 transactions was completed, the average overhead of the instrumented EVM reached a maximum of approximately 30.2%.
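The overhead figure is a relative increase in average processing time, which can be computed as in the sketch below. The timings are made-up placeholders, not measurements from our experiments, and the function name is hypothetical.

```python
from statistics import mean


def relative_overhead(native_times, instrumented_times):
    """Relative increase of the mean instrumented time over the mean native time."""
    base = mean(native_times)
    return (mean(instrumented_times) - base) / base


# Placeholder timings in seconds over repeated runs (not measured values).
native_runs = [10.0, 10.2, 9.8, 10.0]
instrumented_runs = [12.5, 12.4, 12.6, 12.5]
overhead = relative_overhead(native_runs, instrumented_runs)
```

With these placeholder timings, the overhead is about 25%.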

5.6.2. Real-World Scenarios

In the experiment described in Section 5.5, we conducted re-execution of historical transactions on the synchronized blockchain. To evaluate the time overhead, we re-executed the transactions of the first 500,000 blocks on the synchronized blockchain using both the native Geth and the Geth modified by PonziGuard separately. To accurately assess the time overhead, we repeated the process 10 times and recorded the average time it took to complete the re-execution. As shown in Figure 17, when it comes to 500,000 blocks, the time overhead amounts to 25.5%, which is smaller than the overhead on the ground-truth dataset. This can be attributed to the fact that re-execution involves additional reading and verification operations on the blockchain, in addition to the time consumed by contract execution.

Answer to RQ5: The time overhead introduced by our taint engine and the modification of EVM is an acceptable compromise to obtain contract runtime information.

6. Related Work

In this section, we first describe the previous studies about Ponzi schemes on Ethereum. Then, we describe the studies related to the techniques we use.

6.1. Ponzi Scheme on Ethereum

Bartoletti et al. (Bartoletti et al., 2020), the first to study Ponzi schemes on Ethereum, use the Normalized Levenshtein Distance (NLD) to measure the similarity of contract bytecode. Similarly, the rule-based approaches have been developed by Sun et al. (Sun et al., 2020) who leverage behavior forest similarity to detect Ponzi contracts, and Chen et al. (Chen et al., 2021) who use symbolic execution for detection. These approaches require a comprehensive summary of existing Ponzi schemes and expert experience. However, it is challenging to cover all possible scenarios based on the existing known Ponzi contracts, which limits their capability to detect Ponzi contracts that fall outside the scope of the summarized rules. Additionally, other approaches (Chen et al., 2018; Jung et al., 2019; Fan et al., 2020a; Lou et al., 2020; Yu et al., 2021; Lu et al., 2024; Zheng et al., 2023) use static information like opcode or transactions for machine learning models to improve detection capabilities. However, these approaches suffer from the limitation that static information cannot well distinguish Ponzi contracts from other contracts, and transaction-based machine learning approaches cannot detect 0-day Ponzi schemes.

6.2. Smart Contract Fuzzing

Fuzzing has proven effective at exposing vulnerabilities in smart contracts (Ashraf and Chant, 2022; Jiang et al., 2018; He et al., 2019; Wüstholz and Christakis, 2020; Torres et al., 2021; Nguyen et al., 2020; Zhang et al., 2020). ContractFuzzer (Jiang et al., 2018) is a black-box fuzzer for Ethereum smart contracts that detects security bugs such as gasless send and timestamp dependency. Some grey-box fuzzers (He et al., 2019; Wüstholz and Christakis, 2020; Torres et al., 2021; Nguyen et al., 2020; Zhang et al., 2020; Xue et al., 2024; Shou et al., 2023; Ye et al., 2023) have also been proposed for smart contracts. These methods are designed to exploit vulnerabilities in smart contracts, while PonziGuard uses fuzzing to invoke contracts and obtain their runtime information.

6.3. Taint Analysis

Taint analysis is an effective technique for analyzing the data flow in programs (Shcherbakov et al., 2024; Wang et al., 2023; Sun et al., 2023). Several studies leverage taint analysis to help analyze smart contracts, such as Osiris (Torres et al., 2018), Sereum (Rodler et al., 2019) and EthPloit (Zhang et al., 2020). Osiris (Torres et al., 2018) is an integer bug detection framework that combines taint analysis and symbolic execution. Sereum (Rodler et al., 2019) leverages taint analysis to protect smart contracts with re-entrancy vulnerabilities from being exploited. EthPloit (Zhang et al., 2020) adopts taint analysis to generate exploit-targeted transaction sequences, in order to make the contract fuzzing process more efficient. Those studies are orthogonal to this paper: they aim to uncover security vulnerabilities in smart contracts, while our tool is designed specifically for identifying malicious contracts.

6.4. Graph Neural Network

Graph Neural Networks (GNNs) are a subset of deep learning techniques that have shown remarkable effectiveness across various domains, including user authentication (Wu et al., 2020b; Wu et al., 2024) and mitigating vulnerabilities (Zou et al., 2021). GNNs are designed to process and learn from data that is structured in the form of graphs (Wu et al., 2020c). They have been shown highly effective in various tasks, such as node classification (Ivanov and Prokhorenkova, 2021; Thakoor et al., 2022), link prediction (Zhang et al., 2021; You et al., 2019), and graph classification (Bouritsas et al., 2022; Wei et al., 2023). In this paper, we leverage GNNs for CRBG analysis and formulate the detection of Ponzi contracts as a graph classification task.

1  Origin:
2  uint index = Calculator.length + 1;
3  Calculator[index].ethereumAddress = msg.sender;
4  Calculator[index].name = "masterly calculated";
7  Modified:
8  uint index = Calculator.length;
9  Calculator.length += 1;
10 Calculator[index].ethereumAddress = msg.sender;
11 Calculator[index].name = "masterly calculated";

Listing 2: Snippet of squareRootPonzi

7. Discussion

We note that some static analysis tools (Feist et al., 2019; Schneidewind et al., 2020; ConsenSys, 2023; Chen et al., 2020) can obtain static control and data flow with lower overhead, which also reflects contract behavior to some extent. In this section, we explain our decision to use runtime information rather than static information to construct the CRBG.

Firstly, static analysis is inherently imprecise because it follows the principle of over-approximation. This conservative approach preserves all "could happen" or "could exist" cases, which is useful for capturing program errors and vulnerabilities but inappropriate for characterizing a program's behavior. For instance, squareRootPonzi (0x8ea6c8077d6316b46e449aec8fb60a606cf50eea) is a false positive case in the previous study (Bartoletti et al., 2020), and its code snippet is shown in Listing 2. This contract appears to follow the logic of a typical Ponzi scheme. However, the incorrect assignment to the variable index triggers an out-of-bounds index error at runtime, so the contract always exits with an error. The correct code is shown in Line 8 and Line 9. Static analysis tools cannot recognize that this execution path is invalid, because they lack runtime information and follow the principle of over-approximation. If we used static information to characterize the contract's behavior, the contract would likely be misreported as a Ponzi scheme.
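The difference between the two variants in Listing 2 can be reproduced with a small simulation. The sketch below is not EVM code: `StorageArray` is a hypothetical Python stand-in for a dynamic storage array whose out-of-range accesses abort, and `join_original` and `join_modified` are illustrative names for the two snippets.

```python
class OutOfBounds(Exception):
    """Stands in for the abort raised on an out-of-range array access."""


class StorageArray:
    """Toy model of a dynamic storage array with an explicit length."""

    def __init__(self):
        self.slots = []

    @property
    def length(self):
        return len(self.slots)

    def grow(self, count=1):
        # Mirrors `Calculator.length += count`: append empty entries.
        self.slots.extend({} for _ in range(count))

    def __getitem__(self, index):
        if not 0 <= index < self.length:
            raise OutOfBounds(f"index {index} out of range for length {self.length}")
        return self.slots[index]


def join_original(calculator, sender):
    index = calculator.length + 1              # Listing 2, line 2
    calculator[index]["ethereumAddress"] = sender
    calculator[index]["name"] = "masterly calculated"


def join_modified(calculator, sender):
    index = calculator.length                  # Listing 2, line 8
    calculator.grow(1)                         # Listing 2, line 9
    calculator[index]["ethereumAddress"] = sender
    calculator[index]["name"] = "masterly calculated"


def runs_to_completion(join, senders):
    """Replay a sequence of joins and report whether every call succeeded."""
    calculator = StorageArray()
    try:
        for sender in senders:
            join(calculator, sender)
    except OutOfBounds:
        return False
    return True
```

Under these assumptions the original variant fails on its very first call, since `length + 1` can never be a valid index, while the modified variant accepts every participant. An analysis that only inspects the code structure sees the same assignments in both variants; executing them is what separates the two.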

Secondly, the output of static analysis includes all possible execution paths and data flows of the contract, making it challenging to determine which information should be pruned. Encoding all of this information in a graph can significantly increase the graph's size and introduce redundant data, which makes model training inefficient.
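A toy example gives a sense of the scale involved. The sketch below builds a hypothetical control-flow graph made of k consecutive if/else branches and compares the number of feasible-looking static paths with the length of one executed trace. The graph shape and the helper names `diamond_chain`, `count_paths`, and `executed_trace` are assumptions made for illustration and are not taken from any real contract or tool.

```python
def diamond_chain(k):
    """Build a CFG with k sequential if/else diamonds as an adjacency dict."""
    cfg = {}
    for i in range(k):
        cfg[f"b{i}"] = [f"t{i}", f"f{i}"]   # branch node with two successors
        cfg[f"t{i}"] = [f"b{i + 1}"]        # taken side rejoins the chain
        cfg[f"f{i}"] = [f"b{i + 1}"]        # fall-through side rejoins the chain
    cfg[f"b{k}"] = []                        # exit node
    return cfg


def count_paths(cfg, node, exit_node):
    """Count all entry-to-exit paths, as a path-insensitive analysis must consider."""
    if node == exit_node:
        return 1
    return sum(count_paths(cfg, succ, exit_node) for succ in cfg[node])


def executed_trace(cfg, entry, decisions):
    """Follow a single concrete run; each decision picks one side of a branch."""
    trace, node, choices = [entry], entry, iter(decisions)
    while cfg[node]:
        successors = cfg[node]
        if len(successors) == 1:
            node = successors[0]
        else:
            node = successors[0] if next(choices) else successors[1]
        trace.append(node)
    return trace


cfg = diamond_chain(10)
static_paths = count_paths(cfg, "b0", "b10")
runtime_trace = executed_trace(cfg, "b0", [True] * 10)
```

With ten branches, this toy graph has 1024 distinct static paths, whereas a single run visits only 21 nodes. Real contracts are not this regular, but the gap illustrates why pruning static output is hard and why a runtime trace yields a more compact graph.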

8. Conclusion

In this paper, we propose PonziGuard, an approach for identifying Ponzi schemes on Ethereum based on the contract runtime behavior graph (CRBG). The experimental results demonstrate that PonziGuard is effective on both the ground-truth dataset and in real-world scenarios with acceptable overhead. Moreover, our preliminary experiment on Ethereum Mainnet identified 805 Ponzi contracts, which have caused an estimated economic loss of 281,700 Ether, or approximately $500 million USD. We also found 0-day Ponzi schemes among the 10,000 most recently deployed smart contracts. These findings highlight the severity of Ponzi contracts on Ethereum and the pressing need to identify them effectively.

References

Alchemy (2022) Alchemy. 2022. ethereum-statistics. Retrieved November 17, 2022 from https://www.alchemy.com/overviews/ethereum-statistics
Artzrouni (2009) Marc Artzrouni. 2009. The mathematics of Ponzi schemes. Mathematical Social Sciences 58 (2009).
Ashraf and Chant (2022) Imran Ashraf and W. K. Chant. 2022. An Empirical Study on the Effects of Entry Function Pairs in Fuzzing Smart Contracts. In IEEE Annual Computers, Software, and Applications Conference (COMPSAC).
Bartoletti et al. (2020) Massimo Bartoletti, Salvatore Carta, Tiziana Cimoli, and Roberto Saia. 2020. Dissecting Ponzi schemes on Ethereum: Identification, analysis, and impact. Future Generation Computer Systems 102 (2020).
Bouritsas et al. (2022) Giorgos Bouritsas, Fabrizio Frasca, Stefanos Zafeiriou, and Michael M Bronstein. 2022. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (2022).
Chainalysis (2022) Chainalysis. 2022. The Chainalysis 2022 Crypto Crime Report. Retrieved March 20, 2023 from https://go.chainalysis.com/2022-Crypto-Crime-Report.html
Chen et al. (2020) Ting Chen, Rong Cao, Ting Li, Xiapu Luo, Guofei Gu, Yufei Zhang, Zhou Liao, Hang Zhu, Gang Chen, Zheyuan He, Yuxing Tang, Xiaodong Lin, and Xiaosong Zhang. 2020. SODA: A Generic Online Detection Framework for Smart Contracts. Proceedings of Network and Distributed System Security Symposium (NDSS) (2020).
Chen et al. (2021) Weimin Chen, Xinran Li, Yuting Sui, Ningyu He, Haoyu Wang, Lei Wu, and Xiapu Luo. 2021. SADPonzi: Detecting and Characterizing Ponzi Schemes in Ethereum Smart Contracts. In Proceedings of International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS).
Chen et al. (2018) Weili Chen, Zibin Zheng, Jiahui Cui, Edith C. H. Ngai, Peilin Zheng, and Yuren Zhou. 2018. Detecting Ponzi Schemes on Ethereum: Towards Healthier Blockchain Technology. In Proceedings of International World Wide Web Conferences (WWW).
ConsenSys (2023) ConsenSys. 2023. Mythril. Retrieved April 22, 2023 from https://github.com/ConsenSys/mythril/
Drakopoulos (2021) Dimitris Drakopoulos. 2021. Crypto Boom Poses New Challenges to Financial Stability. Retrieved July 3, 2023 from https://www.imf.org/en/Blogs/Articles/2021/10/01/blog-gfsr-ch2-crypto-boom-poses-new-challenges-to-financial-stability
ethereum foundation (2023) ethereum foundation. 2023. Official Go implementation of the Ethereum protocol. Retrieved October 20, 2022 from https://geth.ethereum.org/
Etherscan (2023) Etherscan. 2023. Etherscan.io. Retrieved March 20, 2023 from https://etherscan.io/
Fan et al. (2021) Shuhui Fan, Shaojing Fu, Haoran Xu, and Xiaochun Cheng. 2021. Al-SPSD: Anti-leakage smart Ponzi schemes detection in blockchain. Information Processing and Management 58 (2021).
Fan et al. (2020a) Shuhui Fan, Shaojing Fu, Haoran Xu, and Chengzhang Zhu. 2020a. Expose Your Mask: Smart Ponzi Schemes Detection on Blockchain. In Proceedings of IEEE International Joint Conference on Neural Networks (IJCNN).
Fan et al. (2020b) Shuhui Fan, Haoran Xu, Shaojing Fu, and Ming Xu. 2020b. Smart Ponzi Scheme Detection using Federated Learning. In Proceedings of IEEE International Conference on High Performance Computing and Communications (HPCC).
Feist et al. (2019) Josselin Feist, Gustavo Grieco, and Alex Groce. 2019. Slither: a static analysis framework for smart contracts. In IEEE/ACM International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB).
He et al. (2019) Jingxuan He, Mislav Balunovic, Nodar Ambroladze, Petar Tsankov, and Martin T. Vechev. 2019. Learning to Fuzz from Symbolic Execution with Application to Smart Contracts. In Proceedings of Conference on Computer and Communications Security (CCS).
Ivanov and Prokhorenkova (2021) Sergei Ivanov and Liudmila Prokhorenkova. 2021. Boost then Convolve: Gradient Boosting Meets Graph Neural Networks. In International Conference on Learning Representations (ICLR).
Jiang et al. (2018) Bo Jiang, Ye Liu, and W. K. Chan. 2018. ContractFuzzer: fuzzing smart contracts for vulnerability detection. In Proceedings of International Conference on Automated Software Engineering (ASE).
Jung et al. (2019) Eunjin Jung, Marion Le Tilly, Ashish Gehani, and Yunjie Ge. 2019. Data Mining-Based Ethereum Fraud Detection. In Proceedings of IEEE International Conference on Blockchain (Blockchain).
Le and Mikolov (2014) Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International conference on machine learning.
Lehman (2021) Richard Lehman. 2021. Ponzi schemes dropped in 2020. Retrieved July 3, 2023 from https://www.ponzitracker.com/home/ponzi-schemes-dropped-in-2020-but-this-may-not-be-a-silver-lining
Liang et al. (2024) Ruichao Liang, Jing Chen, Kun He, Yueming Wu, Gelei Deng, Ruiying Du, and Cong Wu. 2024. PonziGuard: Detecting Ponzi Schemes on Ethereum with Contract Runtime Behavior Graph (CRBG). In Proceedings of IEEE/ACM International Conference on Software Engineering (ICSE).
Lou et al. (2020) Yincheng Lou, Yanmei Zhang, and Shiping Chen. 2020. Ponzi Contracts Detection Based on Improved Convolutional Neural Network. In Proceedings of International Conference on Services Computing (SCC).
Lu et al. (2024) Pengcheng Lu, Liang Cai, and Keting Yin. 2024. SourceP: Detecting Ponzi Schemes on Ethereum with Source Code. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Nguyen et al. (2020) Tai D. Nguyen, Long H. Pham, Jun Sun, Yun Lin, and Quang Tran Minh. 2020. sFuzz: an efficient adaptive fuzzer for solidity smart contracts. In Proceedings of International Conference on Software Engineering (ICSE).
Project (2023) Remix Project. 2023. Remix. Retrieved June 12, 2023 from https://remix-project.org/
pytorch (2023) pytorch. 2023. Pytorch. Retrieved June 12, 2023 from https://pytorch.org/
Rodler et al. (2019) Michael Rodler, Wenting Li, Ghassan O. Karame, and Lucas Davi. 2019. Sereum: Protecting Existing Smart Contracts Against Re-Entrancy Attacks. In Proceedings of Network and Distributed System Security Symposium (NDSS).
Rustgi (2020) Nivesh Rustgi. 2020. Ethereum’s Top Gas Guzzlers are Ponzi Schemes. Retrieved March 26, 2023 from https://cryptobriefing.com/ethereums-top-gas-guzzlers-ponzi-schemes/
Schneidewind et al. (2020) Clara Schneidewind, Ilya Grishchenko, Markus Scherer, and Matteo Maffei. 2020. ethor: Practical and provably sound static analysis of ethereum smart contracts. In Proceedings of ACM SIGSAC Conference on Computer and Communications Security (SIGSAC).
Schwartz et al. (2010) Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. 2010. All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask). In Proceedings of IEEE Symposium on Security and Privacy (S&P).
SEC (2019) U.S. SEC. 2019. U.S. Securities and Exchange Commission (SEC) Website. Retrieved February 7, 2023 from https://www.sec.gov/spotlight/enf-actions-ponzi.shtml
Shcherbakov et al. (2024) Mikhail Shcherbakov, Paul Moosbrugger, and Musard Balliu. 2024. Unveiling the Invisible: Detection and Evaluation of Prototype Pollution Gadgets with Dynamic Taint Analysis. In Proceedings of International World Wide Web Conferences (WWW).
Shou et al. (2023) Chaofan Shou, Shangyin Tan, and Koushik Sen. 2023. ItyFuzz: Snapshot-Based Fuzzer for Smart Contract. In Proceedings of ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA).
Sun et al. (2023) Cong Sun, Yuwan Ma, Dongrui Zeng, Gang Tan, Siqi Ma, and Yafei Wu. 2023. μDep: Mutation-Based Dependency Generation for Precise Taint Analysis on Android Native Code. IEEE Transactions on Dependable and Secure Computing 20 (2023).
Sun et al. (2020) Weisong Sun, Guangyao Xu, Zijiang Yang, and Zhenyu Chen. 2020. Early Detection of Smart Ponzi Scheme Contracts Based on Behavior Forest Similarity. In Proceedings of International Conference on Software Quality, Reliability and Security (QRS).
Szabo (1994) Nick Szabo. 1994. Smart Contracts: Building Blocks for Digital Markets.
Thakoor et al. (2022) Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Velickovic, and Michal Valko. 2022. Large-Scale Representation Learning on Graphs via Bootstrapping. In The Tenth International Conference on Learning Representations (ICLR).
Torres et al. (2021) Christof Ferreira Torres, Antonio Ken Iannillo, Arthur Gervais, and Radu State. 2021. ConFuzzius: A Data Dependency-Aware Hybrid Fuzzer for Smart Contracts. In Proceedings of European Symposium on Security and Privacy (EuroS&P).
Torres et al. (2018) Christof Ferreira Torres, Julian Schütte, and Radu State. 2018. Osiris: Hunting for Integer Bugs in Ethereum Smart Contracts. In Proceedings of Annual Computer Security Applications Conference (ACSAC).
Wang et al. (2023) Chao Wang, Ronny Ko, Yue Zhang, Yuqing Yang, and Zhiqiang Lin. 2023. Taintmini: Detecting Flow of Sensitive Data in Mini-Programs with Static Taint Analysis. In IEEE/ACM International Conference on Software Engineering (ICSE).
Wei et al. (2023) Lanning Wei, Huan Zhao, Zhiqiang He, and Quanming Yao. 2023. Neural Architecture Search for GNN-Based Graph Classification. ACM Trans. Inf. Syst. (2023).
Wood (2022) Gavin Wood. 2022. Ethereum: A secure decentralised generalised transaction ledger Berlin version. Retrieved January 26, 2023 from https://ethereum.github.io/yellowpaper/paper.pdf
Wu et al. (2024) C. Wu, H. Cao, G. Xu, C. Zhou, J. Sun, R. Yan, Y. Liu, and H. Jiang. 2024. It’s All in the Touch: Authenticating Users with HOST Gestures on Multi-Touch Screen Devices. IEEE Transactions on Mobile Computing (2024).
Wu et al. (2022a) Cong Wu, Jing Chen, Kun He, Ziming Zhao, Ruiying Du, and Chen Zhang. 2022a. EchoHand: High Accuracy and Presentation Attack Resistant Hand Authentication on Commodity Mobile Devices. In Proceedings of Conference on Computer and Communications Security (CCS).
Wu et al. (2020a) Cong Wu, Kun He, Jing Chen, Ruiying Du, and Yang Xiang. 2020a. CaIAuth: Context-Aware Implicit Authentication When the Screen Is Awake. IEEE Internet of Things Journal 7 (2020).
Wu et al. (2020b) Cong Wu, Kun He, Jing Chen, Ziming Zhao, and Ruiying Du. 2020b. Liveness is Not Enough: Enhancing Fingerprint Authentication with Behavioral Biometrics to Defeat Puppet Attacks. In Proceedings of USENIX Security Symposium.
Wu et al. (2022b) Cong Wu, Kun He, Jing Chen, Ziming Zhao, and Ruiying Du. 2022b. Toward Robust Detection of Puppet Attacks via Characterizing Fingertip-Touch Behaviors. IEEE Transactions on Dependable and Secure Computing 19 (2022).
Wu et al. (2020c) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2020c. A Comprehensive Survey on Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 32 (2020).
Wüstholz and Christakis (2020) Valentin Wüstholz and Maria Christakis. 2020. Harvey: a greybox fuzzer for smart contracts. In Proceedings of ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE).
Xue et al. (2024) Yinxing Xue, Jiaming Ye, Wei Zhang, Jun Sun, Lei Ma, Haijun Wang, and Jianjun Zhao. 2024. xFuzz: Machine Learning Guided Cross-Contract Fuzzing. IEEE Transactions on Dependable and Secure Computing 21 (2024).
Ye et al. (2023) Mingxi Ye, Yuhong Nan, Zibin Zheng, Dongpeng Wu, and Huizhong Li. 2023. Detecting State Inconsistency Bugs in DApps via On-Chain Transaction Replay and Fuzzing. In Proceedings of ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA).
You et al. (2019) Jiaxuan You, Rex Ying, and Jure Leskovec. 2019. Position-aware graph neural networks. In International conference on machine learning (ICML). 7134–7143.
Yu et al. (2021) Shanqing Yu, Jie Jin, Yunyi Xie, Jie Shen, and Qi Xuan. 2021. Ponzi Scheme Detection in Ethereum Transaction Network. In Blockchain and Trustworthy Systems (BlockSys). https://doi.org/10.1007/978-981-16-7993-3_14
Zhang et al. (2021) Muhan Zhang, Pan Li, Yinglong Xia, Kai Wang, and Long Jin. 2021. Labeling trick: A theory of using graph neural networks for multi-node representation learning. Advances in Neural Information Processing Systems 34 (2021).
Zhang et al. (2020) Qingzhao Zhang, Yizhuo Wang, Juanru Li, and Siqi Ma. 2020. EthPloit: From Fuzzing to Efficient Exploit Generation against Smart Contracts. In Proceedings of IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER).
Zheng et al. (2023) Zibin Zheng, Weili Chen, Zhijie Zhong, Zhiguang Chen, and Yutong Lu. 2023. Securing the Ethereum from Smart Ponzi Schemes: Identification Using Static Features. ACM Trans. Softw. Eng. Methodol. (2023).
zhijie (2023) zhijie. 2023. Ponzi Contract Dataset. Retrieved May 3, 2024 from https://xblock.pro/#/dataset/25
Zou et al. (2021) Deqing Zou, Yawei Zhu, Shouhuai Xu, Zhen Li, Hai Jin, and Hengkai Ye. 2021. Interpreting Deep Learning-based Vulnerability Detector Predictions Based on Heuristic Searching. ACM Trans. Softw. Eng. Methodol. 30 (2021).