
11institutetext: RMIT University, Melbourne VIC 3000, AU 22institutetext: Monash University, Clayton VIC 3800, AU

Improving the Accuracy of Transaction-Based Ponzi Detection on Ethereum

Phuong Duy Huynh 11    Son Hoang Dau 11    Xiaodong Li 11    Phuc Luong 22    Emanuele Viterbo 22
Abstract

The Ponzi scheme, an old-fashioned fraud, is now popular on the Ethereum blockchain, causing considerable financial losses to many crypto investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code. This contract-code-based approach, while achieving very high accuracy, is not robust because a Ponzi developer can fool a detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected. By contrast, a transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to manipulate. However, current transaction-based detection models achieve fairly low accuracy. In this paper, we aim to improve the accuracy of transaction-based models by employing time-series features, which turn out to be crucial in capturing the lifetime behaviour of a Ponzi application but were completely overlooked in previous works. We propose a new set of 85 features (22 known account-based and 63 new time-series features), which allows off-the-shelf machine learning algorithms to achieve up to 30% higher F1-scores compared to existing works.

1 Introduction

Since the birth of Bitcoin in 2008 [39], the blockchain technology has grown exponentially and revolutionised the way currencies and digital assets are transferred. Thanks to its inherent decentralisation, anonymity, and immutability, a blockchain provides better tampering resistance, robustness, and privacy protection, as well as lower turnaround costs, compared to traditional financial systems [19, 58]. Apart from digital finance applications, Turing-complete smart contracts, introduced first by Ethereum [7] and then by other similar blockchain platforms, allow developers to implement sophisticated logic on the chain, further expanding the applicability of the technology to many other sectors including supply chains [21, 34, 8], data sharing [49, 31], and the internet of things [41, 20, 40].

In recent years, crypto-crowdfunding via initial coin offerings (ICOs) has become a major fundraising method used by many businesses [38], providing an attractive alternative to traditional stock exchanges. By the end of December 2023, the global market capitalization of blockchains had reached a staggering amount of over $1.7 trillion with more than 2.2 million different cryptocurrencies [16]. However, this phenomenal success of the blockchain technology in digital finance has also led to a rising number of cybercrimes. Smart-contract-supporting blockchains have now become a paradise for a plethora of devastating financial scams, such as Ponzi, Honeypots, Trapdoor, Phishing, and Rug Pull [9, 28].

A Ponzi scheme is a scam that promises high returns to investors by using the funds from newcomers to pay earlier investors. The scam collapses when few or no new investors join, causing most investors, except for the early ones and the scheme owner, to lose their investment. According to Chainalysis's 2021 Crypto Crime Report [10], from 2017 to 2020, most blockchain frauds were Ponzi schemes, which accounted for nearly $7 billion worth of cryptocurrency in 2019, more than double the total of all other scams in 2020. The development of Ponzi schemes on Ethereum (in this work we focus on Ponzi schemes on Ethereum only; for Bitcoin-based Ponzi schemes, please refer to [50, 4]) has attracted some attention from the research community. The first work was done by Bartoletti et al. [3], who analysed the Solidity source codes of smart contracts and proposed four criteria to identify a Ponzi scheme (their paper first appeared on arXiv in 2017). They classified Ponzi schemes on the Ethereum chain into four different types according to their money distribution logic. They also constructed the very first Ponzi dataset on Ethereum, consisting of 184 Ponzi contracts identified by manually inspecting their source codes. To automatically detect Ponzi schemes, a number of detection models using machine learning methods [12, 13, 32, 24, 51, 57, 30] and symbolic execution techniques [14] have been developed in the literature.

Most of the machine learning approaches employed both transaction-based features (e.g., account features) and contract-code-based features (e.g., opcode features) in their models to improve detection accuracy. However, a contract-code-based approach, while capable of achieving very high accuracy, is not robust. First, the Solidity source codes of the majority of contracts on Ethereum are not available [60]. Second, a Ponzi operator can fool a contract-code-based detection model by obfuscating the opcode (see [14, Section 7.2.1]) or inventing a new profit distribution logic that cannot be detected (see [14, Section 8]). A transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to manipulate (see Section 2.3). However, the current transaction-based detection models [12, 13, 32] achieve fairly poor performance, e.g., F1-scores of only 44% and 68.8% (see the discussion in Experiment 1, Section 4.3).

In this work, we aim to develop more robust and accurate detection models that rely only on transaction data. To this end, we first retrieved from the XBlock-ETH repository [56] all transactions associated with the 1395 labeled applications provided in [14]. We then analysed the data to capture the way Ponzi applications work. We observed that Ponzi and non-Ponzi applications have distinctive behaviours and characteristics and, more importantly, that the time factor, which has been overlooked in most studies, is crucial in identifying a Ponzi application. For example, the balance of a Ponzi account may have several “cliffs” in its lifetime, which occur when funds are gradually accumulated and then paid out to one or several investors. As another example, a Ponzi application often has a shorter lifespan, with a peak transaction volume around its creation time and few to no transactions after that. Based on such intuition, we designed a new feature list that consists of existing features and novel time-series features that capture the behaviours of an application throughout its lifetime.

To evaluate the effectiveness of the proposed list of features, we ran different machine learning algorithms on this list and on the existing lists of features used by other transaction-based models [12, 32], treated as the baselines. Analysing the list of important features from the best-performing model (LGBM), we found that time-series features indeed contributed to the improvement of the model's F1-score. The improvement is up to 5.7% compared to models that only use account features. Furthermore, using LGBM's feature importance, we trimmed our original feature set of size 545 to obtain a much smaller one consisting of the 85 most important features (for LGBM). Remarkably, the trimmed feature set outperforms the original one on all performance metrics (Accuracy, Precision, Recall, F1-score, and running time) for all five off-the-shelf machine learning algorithms. In particular, we observed a sharp increase in F1-score when using the new list of 85 features (22 known features and 63 time-series features) compared to using the existing lists from [12] and [32]. More specifically, our model achieved an 11% higher F1-score compared to [32] (Random Forest) and a 30% higher F1-score compared to [12] (XGBoost). Last but not least, we demonstrated that our approach can also detect, with high accuracy, new types of Ponzi schemes that were not present in the training dataset.

The rest of the paper is organised as follows. In Section 2, we introduce the background knowledge and discuss related work. We describe how to construct time-series features in Section 3 and demonstrate the effectiveness of our new feature set via extensive experiments in Section 4. We conclude the paper in Section 5.

2 Background

2.1 Ethereum in a Nutshell

Ethereum is the second most popular blockchain after Bitcoin in terms of market capitalization [15]. It is also the largest platform that provides a decentralised virtual environment (the Ethereum Virtual Machine, or EVM for short) to execute smart contracts [46]. In 2022, the Ethereum chain reached 15 million blocks with over 1.5 billion transactions [48]. Smart contracts on Ethereum are executable programs that run automatically when their trigger conditions are met. These contracts can be implemented using an object-oriented, high-level language called Solidity [22]. Contract source codes are then compiled into bytecodes, which can be represented as low-level human-readable instructions called opcodes [55]. After that, the bytecodes are deployed onto the EVM. Once a contract is deployed, it cannot be modified by anyone. Moreover, any activity in the life cycle of a contract must be triggered by a transaction. Therefore, any interaction ‘from’ or ‘to’ a contract is recorded as a transaction and stored on the blockchain permanently. In other words, in Ethereum, a transaction is the key unit that captures all activities of a contract.

2.2 Ponzi Schemes on Ethereum

The blockchain technology, although it has the potential to revolutionise the way traditional businesses work [36], also creates a golden opportunity for cybercriminals, resulting in the migration of many financial scams to blockchain platforms [9]. Among blockchain scams, Ponzi schemes [2] were the most popular from 2017 to 2020. In hindsight, this is not a surprise because the blockchain's inherent properties, i.e., automation, transparency, immutability, and anonymity, create an ideal environment for this scam to grow [35].

In layman's terms, Ponzi schemes are scams often camouflaged as high-return investment programs that use the funds from new investors to pay existing ones. With no real project behind it and no intrinsic value, a Ponzi scheme collapses when not enough new investors join and/or the payment commitments can no longer be fulfilled. A more official and authoritative definition of Ponzi schemes is given by the U.S. Securities and Exchange Commission [43]. At the heart of each Ponzi scheme is a money redistribution mechanism. Bartoletti et al. [3] classified Ponzi schemes on Ethereum into four categories based on their redistribution mechanisms: Chain-shaped, Tree-shaped, Handover, and Waterfall schemes (see Appendix 0.A).

2.3 Related Works

Existing Ponzi detection models can be divided into two groups, depending on whether they rely on smart contract codes or on the transactions.

Contract-Code-Based Approaches: Bartoletti et al. [3] first proposed four criteria to detect a Ponzi application by inspecting its contract source code. However, it turns out that the Solidity source codes of 77.3% of contracts on Ethereum are not available [60]. To tackle this drawback and to detect Ponzi schemes automatically, many researchers built Ponzi detection tools based on the frequency distribution of operation codes (opcodes), which are always available on the Ethereum chain. Chen et al. [12, 13] proposed an automatic detection tool using machine learning models on opcode features. Their experimental results showed that the detection models using opcode features achieved better performance than those using account features, which were aggregated from transactions. Fan et al. [23] pointed out that an imbalanced dataset caused over-fitting in previous works [12, 13]. To improve data quality and detection accuracy, they proposed a data enhancement method that expanded the dataset and eliminated the imbalance using data sampling techniques. Wang et al. [51] adopted a deep learning technique to build a more accurate detection tool and also used oversampling techniques (SMOTE and Tomek) to deal with the imbalanced dataset. Jung et al. [32] and Sun et al. [44] focused more on crafting better representative features than on improving data quality.

A common drawback of all previously mentioned studies is the lack of robustness. As pointed out by Chen et al. [14], scammers can use code obfuscation techniques [6] to counter detection models that rely on opcode features (see [14, Section 7.2.1]). For example, a contract code can be manipulated or modified to change the opcode occurrence frequency. Chen et al. [14] also proposed a new detection tool called SADPonzi, which was built upon a semantic-aware approach and achieved 100% Precision and Recall. SADPonzi was proven to be more robust than opcode-based methods when facing code obfuscation techniques. More specifically, it detects a Ponzi contract by comparing the semantic information extracted from its bytecode with the predefined semantics of four known Ponzi schemes. However, the approach of SADPonzi requires a domain expert to analyse a Ponzi application's operational logic to build the corresponding semantic pattern, which can be costly in practice. On top of that, as also mentioned by the authors, SADPonzi can only effectively detect known Ponzi types with predefined semantics and may fail to detect a new Ponzi variant (see [14, Section 8]).

Transaction-Based Approaches: Transactions are records of historical activities between an application and its participants. Transaction data was used in some existing works [3, 12, 13] to capture the differences between Ponzi and non-Ponzi applications. Detection tools based on transaction data are more resilient to scammers' countermeasures because transaction information cannot be modified or deleted from the chain. Although scammers can add transaction records, they cannot manipulate transaction data as freely as they can with smart contract source code and opcodes, for two reasons. First, any participant, not just the creator, can create transactions, so transactions are not under the control of the contract creator. Second, creating a transaction on the chain is expensive (approximately $14.26 on average per transaction [47]). These factors prevent Ponzi scammers from manipulating their transaction data arbitrarily to elude detection, e.g., by flooding the system with fake transactions.

Despite these advantages, existing transaction-based models [12, 13, 32] suffer from low classification accuracy, achieving F1-scores of only around 44%-69%. The key reason for this mediocre performance, based on our analysis, could be that existing works have completely missed the time dimension when building their models. We note that some studies [12, 13, 32, 51], in order to improve the detection accuracy, integrated account features with opcode features. However, such a hybrid approach also inherits the shortcomings of the contract-code-based approach. In this work, we explored the temporal behaviour of Ponzi applications and introduced time-series features alongside existing account features, aiming for both robustness and accuracy in detection.

3 Transaction-Based Features Extraction

3.1 Data Collection

We refined the dataset of labeled Ponzi and non-Ponzi addresses provided in the SADPonzi paper [14] by first downloading and extracting the transaction histories of these contracts from the XBlock-ETH repository [59, 56]. We then filtered out unsuccessful transactions, which failed for various reasons such as insufficient gas (a required fee to successfully conduct a transaction) or errors in the contract codes. We also discarded applications with no transactions or with a lifetime (the period from the creation time to the last transaction) shorter than one day. These are outliers whose behaviours do not resemble those of the whole group; even if such an application was a Ponzi, it was a failed scam. Removing those applications is therefore important for building a clean dataset, especially for a transaction-based approach. Our refined dataset contains 1182 non-Ponzi and 79 Ponzi applications. The Ponzi type statistics are displayed in Table 1.
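The sketch below illustrates this filtering step in Python (the language of our released code [1]). It assumes a pandas DataFrame of raw transactions with hypothetical column names contract_address, timestamp (Unix seconds), and is_error; these names are for illustration only and are not those of the XBlock-ETH dumps.

```python
import pandas as pd

# txs: one row per transaction, with hypothetical columns
#   contract_address, timestamp (Unix seconds), is_error (1 if the tx failed)
def refine_dataset(txs: pd.DataFrame) -> pd.DataFrame:
    # Keep successful transactions only (drop failed ones, e.g. out-of-gas).
    ok = txs[txs["is_error"] == 0].copy()

    # Compute each application's lifetime from its first to its last transaction.
    spans = ok.groupby("contract_address")["timestamp"].agg(["min", "max"])
    lifetime_days = (spans["max"] - spans["min"]) / 86400.0

    # Discard applications with no transactions left or a lifetime under one day.
    keep = lifetime_days[lifetime_days >= 1.0].index
    return ok[ok["contract_address"].isin(keep)]
```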

Table 1: Ponzi types (see Appendix 0.A) statistics for our refined dataset.
Ponzi type | Number of applications | Percentage
Chain-shaped | 68 | 86%
Tree-shaped | 1 | 1.3%
Handover | 1 | 1.3%
Waterfall | 4 | 5%
Other | 5 | 6.4%

3.2 The Importance of Temporal Behaviour in Ponzi Detection

In this section, we investigate how Ponzi and non-Ponzi applications work differently with respect to their temporal behaviours. To this end, we chose to analyse the historical transaction data of DynamicPyramid, a representative Ponzi contract (address 0xa9e4e3b1da2462752aea980698c335e70e9ab26c). This is a chain-shaped scheme, the most popular type, which constitutes 86% of all known Ponzi contracts. In general, different types of applications have different transaction behaviours, and understandably, Ponzi applications have unique behaviours that differ from non-Ponzi ones. In our analysis, we compare representative applications of the two groups to demonstrate their potential differences regarding temporal behaviours. A comprehensive comparison between all types of Ponzi and all types of non-Ponzi applications, while valuable, would be overkill for our purpose and is hence out of scope.

Transaction volumes. We start our analysis by comparing the transaction volumes of a Ponzi application (DynamicPyramid) and a non-Ponzi application (address 0xb2c3531f77ee0a7ec7094a0bc87ef4a269e0bcfc). The transaction volume of an application measures the daily number of associated transactions. As observed in Fig. 1, DynamicPyramid had a shorter lifespan, with a peak transaction volume concentrated in the first month followed by almost no activities. In comparison, the non-Ponzi application had more regular activities throughout its long lifespan. The reason is that a Ponzi application, although it often presents itself as a promising project with high investment returns, has no actual project behind it. Thus, participants in Ponzi applications were very active at the beginning, when early investors got paid regularly. However, as time went by, fewer investors got paid and more started leaving, leading to the unavoidable collapse.

Figure 1: Daily transaction volumes of a Ponzi application (DynamicPyramid) and a non-Ponzi application. The Ponzi application had a shorter lifespan with a peak transaction volume concentrated in the first month followed by almost no activities. By contrast, the non-Ponzi application had more regular activities throughout its long lifespan.
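A minimal sketch of how such a daily transaction volume series can be computed, under the same hypothetical DataFrame layout as above:

```python
import pandas as pd

def daily_volume(txs: pd.DataFrame, address: str) -> pd.Series:
    """Daily number of transactions involving one application (as plotted in Fig. 1)."""
    app = txs[txs["contract_address"] == address]
    days = pd.to_datetime(app["timestamp"], unit="s").dt.floor("D")
    return app.groupby(days).size()
```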

Investment versus payment activities. Pushing our analysis one step further, we break down transactions into two different types, namely, investments and payments. An investment refers to a transaction sending ETH from an investor to an application, whereas a payment refers to a transaction from an application that pays ETH to an investor. As demonstrated in Fig. 2, payments (orange dots) and investments (blue dots) of the Ponzi application were concentrated in the first month only. Moreover, each orange dot was preceded by some blue dots of smaller ETH amounts. This is because the examined Ponzi application, a chain-shaped scheme, must gather sufficient new investment funds before making a payment to a single participant. After this intensely active period, the number of payments decreased and finally disappeared. This happened because the application's balance was no longer enough to make any new payment despite a few new investments coming in. By contrast, both investment and payment activities were spread out over the lifespan of the non-Ponzi application.

Application balance. The balance of an application is the amount of ETH in the application at a time. How the balance varies as time goes by can indicate the type of application. As demonstrated in Fig. 3, the balance of the Ponzi application (Dynamic Pyramid) often rose gradually (investments), and after a while, dropped dramatically, generating a “cliff” (payment). The reason is that the balance was gradually accumulated from the investments until reaching the amount that the application had committed to pay to a particular investor when they joined the application. Once the desired balance was reached, the promised profit was immediately paid to the corresponding investor.

Figure 2: Investment and payment activities of a Ponzi application (DynamicPyramid) and a non-Ponzi application. Several lower investments (blue dots) were followed by a higher payment (orange dot) in the Ponzi application, which demonstrates the accumulation of funds before a payment to an investor can be made.
Figure 3: Application balances (in the first four months after launch) of a Ponzi application (DynamicPyramid) and a non-Ponzi application. As observed, the chart of the Ponzi contract had a number of “cliffs” while that of the non-Ponzi contract had none.
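The balance curves in Fig. 3 can be reconstructed from transactions alone. The sketch below is one possible way to do so, assuming hypothetical columns to and value_eth that record the receiver and the transferred ETH amount:

```python
import pandas as pd

def balance_series(txs: pd.DataFrame, address: str) -> pd.Series:
    """Running ETH balance of an application: investments add, payments subtract."""
    app = txs[txs["contract_address"] == address].sort_values("timestamp")
    # Keep the amount positive for incoming transfers, negative otherwise.
    signed = app["value_eth"].where(app["to"] == address, -app["value_eth"])
    time_index = pd.to_datetime(app["timestamp"], unit="s")
    return signed.set_axis(time_index).cumsum()
```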

3.3 Transaction-Based Features Extraction

We classify transaction-based features into two types: account features and time-series features. Transaction-based detection models proposed so far in the literature only used account features [12], which capture general statistics of the transactions associated with the application, e.g. the total/average investment amount, the final balance of the contract, or the maximum number of payments to an investor. As mentioned in Section 3.2, using only account features led to rather poor prediction performance (low F1-scores). To improve transaction-based Ponzi detection models, it is essential to employ both account features and time-series features. We discuss in detail below how to extract these features, especially the new time-series ones.

Account features: This type of feature captures general information about the contract of interest and has been widely used in previous studies [12, 13, 32, 51]. More specifically, general statistical metrics such as the average, count, sum, standard deviation, and Gini coefficient [54] are computed over the set of all relevant transactions to produce account features. Although account features are insufficient to capture all behaviours of a Ponzi scheme, they are still very useful in revealing a Ponzi's working logic. For example, the Gini coefficient of the number of payments can reveal an inequality in money distribution, and the final balance of the application indicates whether the investment funds have all been distributed to investors. Therefore, we still include in our list 29 account features introduced earlier in the literature [13, 12, 32, 51] (listed in Appendix 0.B).
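As an illustration, the snippet below computes the Gini coefficient and a few representative account features from an application's transactions. It is a simplified sketch (the column names to, from, and value_eth are hypothetical), not the full 29-feature extraction.

```python
import numpy as np
import pandas as pd

def gini(values) -> float:
    """Gini coefficient: 0 means perfect equality, values near 1 mean high inequality."""
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    if n == 0 or v.sum() == 0:
        return 0.0
    cum = np.cumsum(v)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

def account_features(app_txs: pd.DataFrame, address: str) -> dict:
    """A few of the account features listed in Appendix 0.B."""
    inv = app_txs.loc[app_txs["to"] == address, "value_eth"]      # investments
    pay = app_txs.loc[app_txs["from"] == address, "value_eth"]    # payments
    return {
        "num_in_txs": len(inv),
        "num_out_txs": len(pay),
        "balance": inv.sum() - pay.sum(),
        "gini_amt_in": gini(inv),
    }
```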

Time-series features: As discussed earlier, time-series features play an important role in identifying Ponzi applications. Unlike account features, they capture the behaviours and activities throughout the application's lifetime. To aggregate time-series features, we first partitioned the transactions into several time intervals (with an interval length of 12, 24, or 48 hours) and built 43 time series (see Appendix 0.C for the complete list), which measure various aspects of the transactions. These time series have three dimensions, namely, the contract address, the interval, and the data value for that contract in that interval (e.g., account balance). We then used a dimensionality-reduction technique to capture various characteristics (e.g., mean, entropy, spikiness) of the aggregated time series and mapped the 3-D data into a 2-D space to produce the final time-series features. The time-series creation steps are described below:

(1) For a fixed time duration $T$, we split our transaction data into $N$ time intervals of length $T$ each, where $N \triangleq \lceil \texttt{life\_time}/T \rceil$. In our study, we chose three different values of $T$: 12, 24, and 48 hours.

(2) Based on the timestamp field, we assigned each transaction to its corresponding interval.

(3) We created a comprehensive list of 43 time series to represent all activities that occur during the application's lifetime. Thus, for each application, the time series can be represented as a 2-D matrix of size $N \times 43$. Lastly, if the dataset has $M$ applications, the time-series data extracted from the whole dataset can be represented as a 3-D array of size $M \times N \times 43$.

Finally, to generate time-series features, we applied a dimensionality-reduction technique [29] to compress the time-series data. In particular, we employed a finite set of 12 statistical measures (see Appendix 0.D) proposed in [18, 25, 29, 53] to capture the global information of the 43 time series and compressed the 3-D time-series data down to a 2-D $M \times 516$ matrix (note that $516 = 43 \times 12$). The Python codes for time-series feature aggregation are available at [1].
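The following sketch outlines steps (1)-(3) and the final compression for a single application. It is a simplified illustration of the released code [1]: only a subset of the 12 statistical measures (mean, standard deviation, skewness, maximum) is shown, and aggregating every series with a plain sum is an assumption made for brevity.

```python
import numpy as np
import pandas as pd

T_HOURS = 12  # interval length T; we also study 24 and 48 hours

def interval_matrix(app_txs: pd.DataFrame, series_columns) -> pd.DataFrame:
    """Steps (1)-(3): bin transactions into N = ceil(life_time / T) intervals
    and aggregate one value per interval for each time series (an N x 43 matrix)."""
    start = app_txs["timestamp"].min()
    interval = (app_txs["timestamp"] - start) // (T_HOURS * 3600)
    binned = app_txs.groupby(interval)[series_columns].sum()
    # Fill intervals with no transactions with zeros so the matrix has N rows.
    return binned.reindex(range(int(interval.max()) + 1), fill_value=0)

def compress(matrix: pd.DataFrame) -> np.ndarray:
    """Compress each time series into a few global statistics (an illustrative
    subset of the 12 measures in Appendix 0.D), yielding one flat feature vector."""
    measures = [np.mean, np.std, lambda x: pd.Series(x).skew(), np.max]
    return np.array([[m(matrix[c].to_numpy()) for m in measures]
                     for c in matrix.columns]).ravel()
```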

4 Experiments

4.1 Machine Learning Models

To measure the effectiveness of our proposed set of features (account and time-series features), we reused the classification methods employed in previous studies [32, 12], namely, Random Forest (RF) [45] and XGBoost (XGB) [11]. In addition, other well-known classification methods such as K-nearest neighbours (KNN) [17], Support Vector Machine (SVM) [27], and LightGBM (LGBM) [33] were also included in our experiments in order to find the most suitable classification model for the problem. RF uses the bootstrap resampling technique to generate different training decision trees from the original dataset, and a prediction is made by aggregating the predictions from these trees. This algorithm works effectively in several domains [42] including fraud detection [5]. XGB is a gradient-boosting-based algorithm that creates gradient-boosted decision trees sequentially and then groups these trees to form a strong model. Unlike RF, the result of XGB is the prediction of the last model, which addresses the data misclassified by previous models. KNN is a non-parametric classifier that uses proximity to estimate the likelihood of a data point belonging to one group. SVM is a classic algorithm that has been widely applied to binary classification and fraud detection problems; it performs classification by establishing a hyperplane that enlarges the boundary between two categories in a multi-dimensional feature space. LGBM is also a gradient-boosting-based algorithm, similar to XGB. However, LGBM grows a tree vertically (leaf-wise) while XGB grows trees horizontally (level-wise). With its leaf-wise growth, LGBM is often more accurate and faster than other gradient-boosting-based algorithms. The Scikit-learn Python library (https://scikit-learn.org/stable/) was used for RF (default parameters), KNN (default parameters), SVM (default parameters with the “poly” kernel), and XGB (default parameters without using the label encoder). The Microsoft LightGBM Python library (https://github.com/microsoft/LightGBM) was used for LGBM (with default parameters).
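A minimal sketch of how these five classifiers can be instantiated with the settings above (all other parameters left at their defaults); exact library versions may differ:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# The five classifiers used in our experiments, with default parameters except
# for the "poly" kernel (SVM) and the disabled label encoder (XGB).
models = {
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(kernel="poly"),
    "XGB": XGBClassifier(use_label_encoder=False),
    "LGBM": LGBMClassifier(),
}
```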

4.2 Evaluation Metrics, Model Structure and Experiment Setting

To evaluate the performance of our models, we use standard prediction metrics including Accuracy, Precision, Recall, and F1-score, which are calculated from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as follows. $\textbf{Accuracy} \triangleq (\texttt{TP}+\texttt{TN})/(\texttt{TP}+\texttt{FP}+\texttt{TN}+\texttt{FN})$ is the fraction of correct predictions, $\textbf{Precision} \triangleq \texttt{TP}/(\texttt{TP}+\texttt{FP})$ is the fraction of actual Ponzi applications among all applications predicted as Ponzi, $\textbf{Recall} \triangleq \texttt{TP}/(\texttt{TP}+\texttt{FN})$ is the fraction of detected scams among all actual scams, and $\textbf{F1-score} \triangleq (2\cdot\textbf{Precision}\cdot\textbf{Recall})/(\textbf{Precision}+\textbf{Recall})$ is the harmonic mean of Precision and Recall.
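For completeness, these metrics can be computed with scikit-learn as follows (a straightforward helper, assuming binary labels with 1 for Ponzi):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_pred) -> dict:
    """Standard prediction metrics, with the Ponzi class as the positive class."""
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "F1": f1_score(y_true, y_pred),
    }
```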

After account features and time-series features were produced, these two feature groups were used individually and as a combination in different models. Our overall transaction-based detection workflow is designed as follows.

Train-test split: the dataset and its feature groups were split into a training set (80%) and a test set (20%). The former is used for training a detection model, while the latter is used to evaluate the trained model.

Data sampling: Ponzi applications make up only 6% of the applications in our dataset. Therefore, we applied data sampling techniques to balance the dataset. We adopted the well-known oversampling method Borderline-SMOTE [26] to generate new Ponzi instances from existing ones that have more than half of their K nearest neighbours in the non-Ponzi class. This emphasises the Ponzi applications that are most likely to be misclassified, as they are located near the border between the two classes.
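In code, this oversampling step amounts to a single call to imbalanced-learn's BorderlineSMOTE, applied to the training split only (X_train and y_train come from the split above; the random seed is arbitrary):

```python
from imblearn.over_sampling import BorderlineSMOTE

# Generate synthetic Ponzi samples near the class border of the training split.
sampler = BorderlineSMOTE(k_neighbors=5, random_state=42)
X_train_bal, y_train_bal = sampler.fit_resample(X_train, y_train)
```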

Model training: the $K$-fold cross-validation method was used to train a classifier on the training set. In our experiment, we set $K = 5$, which is lower than common practice in the literature because our dataset is small.

Model evaluation: a trained model was used to classify the applications in the unseen test dataset. To guarantee the reliability of our experiment, we repeated the experiment process 50 times, and the final result was obtained by taking the average. It is worth mentioning that the same hyperparameters were used for the same models for a fair comparison.
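Putting the workflow together, the sketch below shows one possible implementation of the repeated split / oversample / train / test cycle for a single classifier (LGBM here). X and y denote the feature matrix and Ponzi labels; the stratified split and the per-repetition seeds are our assumptions rather than details fixed by the setup above.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import f1_score
from imblearn.over_sampling import BorderlineSMOTE
from lightgbm import LGBMClassifier

def repeated_evaluation(X, y, repeats=50):
    """Average test F1-score over repeated runs of the detection workflow."""
    scores = []
    for seed in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, stratify=y, random_state=seed)
        X_bal, y_bal = BorderlineSMOTE(random_state=seed).fit_resample(X_tr, y_tr)
        model = LGBMClassifier()
        # 5-fold cross-validation on the balanced training set (training-time check).
        _cv_f1 = cross_val_score(model, X_bal, y_bal, cv=5, scoring="f1")
        model.fit(X_bal, y_bal)
        scores.append(f1_score(y_te, model.predict(X_te)))
    return float(np.mean(scores))
```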

4.3 Experimental Results

We conducted three experiments to demonstrate the advantages of our proposed time-series features.

Experiment 1 (Feature sets and detection models evaluation). In this experiment, we aimed to evaluate the effectiveness of our proposed feature list when applied across diverse machine learning models. As already mentioned, most previous studies used either opcode features or both opcode and account features to build their detection models. Only a few works [12, 32] attempted a purely transaction-based approach (without using opcode features). To show the advantages over previous studies, we reran their approaches on our dataset as the baselines. However, their codes and feature data have not been released to the community, so we re-implemented those models based on the descriptions in their papers, including the feature lists and machine learning models.

In this experiment, we first evaluated our feature sets corresponding to different time intervals ($T$ = 12, 24, and 48 hours). We also tested three feature sets: ACC consists of account features only, TS consists of time-series features only, and ACC-TS consists of both. A comparison of various metrics between our feature sets and the baselines, including the feature set from Chen et al. [12] (Appendix 0.B.1) and from Jung et al. [32] (Appendix 0.B.2), is provided in Table 2. Finally, we evaluated the detection performance of different machine learning models using our ACC-TS feature sets (see Table 3).

Table 2: Effectiveness of the new feature set. The asterisk ‘*’ indicates that our feature list outperforms both baselines [12, 32].
Feature Set | Number of Features | Model | Accuracy | Precision | Recall | F1
Jung et al. [32] | 18 | RF | 0.966 | 0.837 | 0.604 | 0.694
Chen et al. [12] | 7 | XGB | 0.942 | 0.587 | 0.456 | 0.499
ACC | 29 | RF | 0.961 | 0.670 | 0.823 | 0.733
ACC | 29 | XGB | 0.965 | 0.700 | 0.835 | 0.756
TS | 516 | RF | 0.957 | 0.638 | 0.813 | 0.710
TS | 516 | XGB | 0.962 | 0.681 | 0.830 | 0.743
ACC-TS 12 Hrs | 545 | RF | 0.965 | 0.706 | 0.826 | 0.755
ACC-TS 12 Hrs | 545 | XGB | 0.972 | 0.752 | 0.856 | 0.797
ACC-TS 24 Hrs | 545 | RF | 0.967 | 0.691 | 0.840 | 0.751
ACC-TS 24 Hrs | 545 | XGB | 0.974 | 0.743 | 0.887 | 0.802
ACC-TS 48 Hrs | 545 | RF | 0.969 | 0.704 | 0.816 | 0.748
ACC-TS 48 Hrs | 545 | XGB | 0.973 | 0.733 | 0.854 | 0.782

We note that the F1-score we obtained for XGBoost is close to what was reported in [12] (49.9% versus 44%). However, we were unable to reproduce the very high scores reported in [32] for Random Forest. The authors of [14] also encountered the same issue: they re-implemented the approach in [32] and achieved a similar F1-score (69.4%) to ours (68.8%). This could be due to the fact that both [14] and our paper started from the same dataset of 1395 Ponzi and non-Ponzi schemes, while the authors of [32] used a different dataset that includes 3203 non-Ponzi addresses. Unfortunately, their paper does not provide enough detail on how these addresses were collected, and hence, we were not able to recreate their dataset. We also noticed that although the bytecode size (size_info) was created from the smart contract bytecode, it appears among the top eight important transaction-based features listed in [32, Table 2]. This makes their detection model depend on the contract code as well and therefore susceptible to opcode obfuscation techniques [14, 6]. In our experiment, to reproduce their transaction-based model, we ignored this contract-code-based feature size_info.

As shown in Table 2, the detection models using our ACC-TS 24 Hrs feature set improved the F1-scores of the models by Jung et al. [32] and Chen et al. [12] by 5.7% and 30.3%, respectively. It is also clear that using both account and time-series features leads to better Accuracy, Precision, Recall, and F1-score than using either type individually. Note that when using time-series features (TS) alone, these models already yielded higher F1-scores than the baselines. From Table 3, we observe that tree-based classifiers were more effective in Ponzi detection than the other algorithms. More specifically, RF, XGB, and LGBM achieved better Accuracy, Precision, and F1-score values than the other classifiers across different ACC-TS feature sets. Among the tree-based models, LGBM with the ACC-TS 12 Hrs features achieved the best F1-score, Accuracy, and Precision. Last but not least, our models achieved an 11% higher F1-score for RF compared to [32] and a 30% higher F1-score for XGB compared to [12].

Table 3: Performance comparison among different models and feature sets. The LGBM model with ACC-TS 12 Hrs features achieved the highest F1-score.
Feature Set | Number of Features | Model | Accuracy | Precision | Recall | F1
ACC | 29 | SVM | 0.894 | 0.378 | 0.829 | 0.510
ACC | 29 | KNN | 0.898 | 0.388 | 0.875 | 0.532
ACC | 29 | RF | 0.961 | 0.670 | 0.823 | 0.733
ACC | 29 | XGB | 0.965 | 0.700 | 0.835 | 0.756
ACC | 29 | LGBM | 0.967 | 0.717 | 0.823 | 0.760
ACC-TS 12 Hrs | 545 | SVM | 0.828 | 0.282 | 0.964 | 0.432
ACC-TS 12 Hrs | 545 | KNN | 0.899 | 0.392 | 0.928 | 0.547
ACC-TS 12 Hrs | 545 | RF | 0.965 | 0.706 | 0.826 | 0.755
ACC-TS 12 Hrs | 545 | XGB | 0.972 | 0.752 | 0.856 | 0.797
ACC-TS 12 Hrs | 545 | LGBM | 0.975 | 0.779 | 0.867 | 0.817
ACC-TS 24 Hrs | 545 | SVM | 0.845 | 0.281 | 0.973 | 0.433
ACC-TS 24 Hrs | 545 | KNN | 0.898 | 0.368 | 0.918 | 0.521
ACC-TS 24 Hrs | 545 | RF | 0.967 | 0.691 | 0.840 | 0.751
ACC-TS 24 Hrs | 545 | XGB | 0.974 | 0.743 | 0.887 | 0.802
ACC-TS 24 Hrs | 545 | LGBM | 0.975 | 0.768 | 0.878 | 0.812
ACC-TS 48 Hrs | 545 | SVM | 0.824 | 0.242 | 0.950 | 0.380
ACC-TS 48 Hrs | 545 | KNN | 0.886 | 0.326 | 0.922 | 0.476
ACC-TS 48 Hrs | 545 | RF | 0.969 | 0.704 | 0.816 | 0.748
ACC-TS 48 Hrs | 545 | XGB | 0.973 | 0.733 | 0.854 | 0.782
ACC-TS 48 Hrs | 545 | LGBM | 0.974 | 0.737 | 0.854 | 0.784

Experiment 2 (Contribution of time-series features). Next, we investigate how much the newly proposed time-series features contributed to the performance of LGBM, the best-performing model. To do this, we first retrieved the list of feature importances from the LGBM model in the previous experiment. The importance of a feature in the LGBM model is defined as the number of times this feature is used to split the data across all decision trees. In LGBM, an effective feature selection technique, namely Exclusive Feature Bundling (EFB), is adopted to reduce the number of features without affecting the model's performance. We found that only 205 of the 545 features (516 time-series features and 29 account features) had been used at least once to build a tree in the LGBM detection model. More specifically, these 205 important features consist of 176 time-series features and 29 account features. We sorted these 205 features in descending order of importance. After that, we ran the LGBM detection model using only the $5, 10, 15, \ldots, 205$ most important features among the 205. The experimental results shown in Fig. 4 demonstrate how the F1-score of the prediction increased as more time-series features were added to the model. We can also observe in the bottom sub-figure that from the top 5 onward, time-series features start to appear. For example, the top 30 contains 15 account features and 15 time-series features.
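A sketch of this feature-ranking procedure, assuming X_train and X_test are NumPy arrays holding the 545 features and y_train, y_test the corresponding labels:

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.metrics import f1_score

model = LGBMClassifier().fit(X_train, y_train)
# With the default importance_type="split", feature_importances_ counts how many
# times each feature is used to split the data across all trees.
importance = model.feature_importances_
order = np.argsort(importance)[::-1]      # most important features first
n_used = int((importance > 0).sum())      # features used at least once (205 in our run)

# Re-train and test using only the top-k features, for k = 5, 10, ..., n_used.
for k in range(5, n_used + 1, 5):
    cols = order[:k]
    clf = LGBMClassifier().fit(X_train[:, cols], y_train)
    print(k, f1_score(y_test, clf.predict(X_test[:, cols])))
```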

Figure 4: LGBM’s performance when using the most important features (top sub-figure) and the percentages of time-series features among these top features (bottom sub-figure). The F1-score value increases as more time-series features are used in the feature list, demonstrating the effectiveness of using time-series features.

As can be seen from Fig. 4, the F1-score increases sharply as we increase the number of features, especially at the beginning. The F1-score exceeds 0.8 when at least 20 of the 205 most important features are used, reaching a peak of 0.83 with the top 85 features, which consist of 63 time-series features and 22 account features. The F1-score then fluctuates around 0.81 as the number of features further increases. As shown in the bottom sub-figure of Fig. 4, the percentage of time-series features in the important feature list increases together with the F1-score. This confirms that our proposed time-series features contributed significantly to the best-performing model (LGBM) in Experiment 1.

Based on the above experiment, we refined the ACC-TS 12 Hrs feature set by selecting the top 85 important features of the best model (see Appendix 0.E); we label this refined set RF-ACC-TS 12 Hrs. To improve the detection performance, we ran all detection models with the selected features instead of the full feature set. Table 4 compares the detection performance of models using the ACC-TS 12 Hrs and RF-ACC-TS 12 Hrs feature sets. Remarkably, the models using RF-ACC-TS 12 Hrs features show an average improvement of 2.1%, 5.5%, 1.6%, and 5.3% in Accuracy, Precision, Recall, and F1-score, respectively.

Experiment 3 (Detecting a new type of Ponzi). To verify whether our classification model using the proposed feature list can detect a new Ponzi type, we conducted the third experiment using the LGBM model as follows. The key idea is to train our model on some types of Ponzi schemes and test it on another type of Ponzi schemes to see if it can still accurately detect these schemes.

Table 4: Our new list of 85 features (RF-ACC-TS) completely outperformed the list of originally proposed 545 features (ACC-TS) for all metrics. The ‘time’ column measures the training and prediction time (in seconds) when using different sets of features.
Feature Set | NoF | Model | Accuracy | Precision | Recall | F1 | Time (seconds)
ACC-TS 12 Hrs | 545 | SVM | 0.828 | 0.282 | 0.964 | 0.432 | 0.359
ACC-TS 12 Hrs | 545 | KNN | 0.899 | 0.392 | 0.928 | 0.547 | 0.094
ACC-TS 12 Hrs | 545 | RF | 0.964 | 0.696 | 0.822 | 0.749 | 1.439
ACC-TS 12 Hrs | 545 | XGB | 0.972 | 0.752 | 0.856 | 0.797 | 2.437
ACC-TS 12 Hrs | 545 | LGBM | 0.975 | 0.779 | 0.867 | 0.817 | 6.493
RF-ACC-TS 12 Hrs | 85 | SVM | 0.910 | 0.425 | 0.967 | 0.586 | 0.111
RF-ACC-TS 12 Hrs | 85 | KNN | 0.909 | 0.422 | 0.963 | 0.582 | 0.063
RF-ACC-TS 12 Hrs | 85 | RF | 0.973 | 0.767 | 0.855 | 0.804 | 0.849
RF-ACC-TS 12 Hrs | 85 | XGB | 0.973 | 0.770 | 0.857 | 0.803 | 0.715
RF-ACC-TS 12 Hrs | 85 | LGBM | 0.977 | 0.795 | 0.876 | 0.830 | 3.076

We first removed all applications of one known Ponzi type (see Section 2.2) from our training set. The removed applications were then placed in a test set to assess the trained model's ability to detect new Ponzi types. However, we only removed each of the three minority Ponzi types (waterfall, tree-shaped, and handover schemes), or all three together, and never the chain-shaped schemes, which account for 86% of all Ponzi schemes in the dataset. If we removed all chain-shaped schemes, the number of Ponzi samples would become too small to learn the scam's behaviours. Furthermore, various test sets with different scam rates were used to test our model in different situations, e.g., a full-scam test set (100% scams), a balanced test set (50% scams), and a few-scam test set (6% scams, similar to the scam rate of our entire dataset). Due to the lack of Ponzi (P) applications, we can only decrease the scam rate by increasing the number of non-Ponzi (non-P) applications in the test sets.
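The construction of these train/test splits can be sketched as follows; ponzi_type is a hypothetical array of scheme labels, and excluding the sampled non-Ponzi test applications from training is our own assumption, not a detail fixed above.

```python
import numpy as np

def leave_type_out_split(X, y, ponzi_type, held_out, scam_rate, seed=0):
    """Hold out every Ponzi app of one type for testing, mixed with enough
    non-Ponzi apps to reach the desired scam rate; train on the remainder."""
    rng = np.random.default_rng(seed)
    test_ponzi = np.flatnonzero((y == 1) & (ponzi_type == held_out))
    n_non = 0 if scam_rate >= 1 else round(len(test_ponzi) * (1 - scam_rate) / scam_rate)
    test_non = rng.choice(np.flatnonzero(y == 0), size=n_non, replace=False)
    test_idx = np.concatenate([test_ponzi, test_non])
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```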

Table 5: The outcomes of Experiment 3 (Detecting a new type of Ponzi). All applications of each Ponzi type were removed from our training set. These applications were then used only for testing. We also experimented with test sets of different Ponzi rates (100%, 50%, and 6%). New Ponzi types were successfully detected with high accuracy, demonstrating the capability of our proposed feature list and detection models.
Test scheme | Scam rate | #P | #non-P | Accuracy | Precision | Recall | F1-score
Waterfall | 100% | 4 | 0 | 0.91 | 1.0 | 0.91 | 0.94
Waterfall | 50% | 4 | 4 | 0.94 | 0.98 | 0.89 | 0.93
Waterfall | 6% | 4 | 62 | 0.97 | 0.79 | 0.89 | 0.83
Tree-shaped | 100% | 1 | 0 | 1.0 | 1.0 | 1.0 | 1.0
Tree-shaped | 50% | 1 | 1 | 0.99 | 0.99 | 1.0 | 0.99
Tree-shaped | 6% | 1 | 15 | 0.98 | 0.87 | 1.0 | 0.91
Handover | 100% | 1 | 0 | 0.97 | 0.97 | 0.97 | 0.97
Handover | 50% | 1 | 1 | 0.97 | 0.94 | 0.95 | 0.94
Handover | 6% | 1 | 15 | 0.98 | 0.80 | 0.94 | 0.85
All of the above | 100% | 6 | 0 | 0.92 | 1.0 | 0.92 | 0.95
All of the above | 50% | 6 | 6 | 0.94 | 0.98 | 0.91 | 0.93
All of the above | 6% | 6 | 94 | 0.98 | 0.80 | 0.91 | 0.84

The results, shown in Table 5, indicate that the detection model can detect most new Ponzi applications in a given test set (Recall above 89% in most cases). Moreover, the Precision is approximately 80% even in test sets with very few scams, and the model achieved an F1-score of at least 83% in all cases. Analysing the top 30 important features exported from the trained model, we noticed that more than 80% of the features in the list are aggregated from transaction volume, investment and payment activities, and application balance, with 50% of them being time-series features. This is not surprising since those features help discriminate between Ponzi and non-Ponzi applications, as clearly shown in Section 3.2. This confirms the ability of the transaction-based approach to detect new Ponzi types and the importance of the time factor. Although the dataset we use (from [14]) is not ideal in the sense that there are very few Ponzi applications of types other than the chain-shaped, which may affect the reliability of our third experiment, the outcome still gives strong evidence that a completely new Ponzi type can be detected.

5 Conclusions

In this study, we proposed a robust method for detecting Ponzi schemes on Ethereum using only transaction data, which is harder and more costly for scammers to manipulate. We proposed a list of effective features that reflect the nature of the scam, extracted from a careful analysis of Ponzi and non-Ponzi schemes, in order to improve the detection accuracy of the transaction-based approach. More specifically, our analysis showed that some characteristics of a Ponzi application depend on time and should be captured by time series representing the application's behaviours and activities throughout its lifetime. As such, we introduced a list of novel time-series features which help to significantly improve various performance metrics compared to the existing transaction-based approaches.

There are several open problems left for future research. First, although we have considerably increased the detection accuracy of transaction-based detection tools, there is still room for future improvement. Specifically, one open problem is to find more effective statistical measures to capture the global information of time series. Second, it is desirable to collect more data to extend the ground-truth dataset for Ponzi applications, which will help to train the detection models more effectively. Moreover, we can test our approaches on other popular algorithms that work effectively on big data such as Artificial Neural Networks [52] or Recurrent Neural Networks [37]. Finally, blockchain scams are becoming more sophisticated. Scammers may use multiple smart contracts or smart contracts from a third party as an additional service, making the picture much more complicated. In such cases, the application’s transactions might not be enough to perform fraud detection. Detecting such sophisticated scams remains a big challenge for future research.

References

  • [1] Python codes for this paper (2022), https://github.com/ponzidetector/time-dependent-based-ponzi-detector.git
  • [2] Artzrouni, M.: The mathematics of Ponzi schemes. Mathematical Social Sciences 58(2), 190–201 (2009). https://doi.org/https://doi.org/10.1016/j.mathsocsci.2009.05.003, https://www.sciencedirect.com/science/article/pii/S0165489609000572
  • [3] Bartoletti, M., Carta, S., Cimoli, T., Saia, R.: Dissecting Ponzi schemes on Ethereum: Identification, Analysis, and Impact. Future Generation Computer Systems 102, 259–277 (2020). https://doi.org/https://doi.org/10.1016/j.future.2019.08.014, https://www.sciencedirect.com/science/article/pii/S0167739X18301407
  • [4] Bartoletti, M., Pes, B., Serusi, S.: Data mining for detecting Bitcoin Ponzi schemes. In: 2018 Crypto Valley Conference on Blockchain Technology (CVCBT). pp. 75–84 (2018)
  • [5] Bhattacharyya, S., Jha, S., Tharakunnel, K., Westland, J.C.: Data mining for credit card fraud: A comparative study. Decision support systems 50(3), 602–613 (2011)
  • [6] BiAn: a source code level code obfuscation tool developed for solidity smart contract (2022), https://github.com/xf97/BiAn
  • [7] Buterin, V.: Ethereum: A Next Generation Smart Contract and Decentralized application platform (2014), https://ethereum.org/en/whitepaper/
  • [8] Casado-Vara, R., Prieto, J., De la Prieta, F., Corchado, J.M.: How blockchain improves the supply chain: Case study alimentary supply chain. Procedia computer science 134, 393–398 (2018)
  • [9] Chainalysis: Crypto Crime Series: Decoding Ethereum Scams (2019), https://blog.chainalysis.com/reports/ethereum-scams/
  • [10] Chainalysis: The Chainalysis 2021 Crypto Crime Report (2021), https://go.chainalysis.com/2021-Crypto-Crime-Report.html
  • [11] Chen, T., Guestrin, C.: Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. pp. 785–794 (2016)
  • [12] Chen, W., Zheng, Z., Cui, J., Ngai, E., Zheng, P., Zhou, Y.: Detecting ponzi schemes on ethereum: Towards healthier blockchain technology. In: Proceedings of the 2018 world wide web conference. pp. 1409–1418 (2018)
  • [13] Chen, W., Zheng, Z., Ngai, E.C.H., Zheng, P., Zhou, Y.: Exploiting blockchain data to detect smart ponzi schemes on ethereum. IEEE Access 7, 37575–37586 (2019)
  • [14] Chen, W., Li, X., Sui, Y., He, N., Wang, H., Wu, L., Luo, X.: SADPonzi: Detecting and characterizing Ponzi schemes in Ethereum smart contracts. Proceedings of the ACM on Measurement and Analysis of Computing Systems 5(2), 1–30 (2021)
  • [15] Coinmarketcap: Cryptocurrencies Ranking (2021), https://coinmarketcap.com/
  • [16] Coinmarketcap: Total Cryptocurrency Market Cap (2023), https://coinmarketcap.com/charts/
  • [17] Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE transactions on information theory 13(1), 21–27 (1967)
  • [18] Deng, H., Runger, G., Tuv, E., Vladimir, M.: A time series forest for classification and feature extraction. Information Sciences 239, 142–153 (2013)
  • [19] Dinh, T.T.A., Liu, R., Zhang, M., Chen, G., Ooi, B.C., Wang, J.: Untangling blockchain: A data processing view of blockchain systems. IEEE Transactions on Knowledge and Data Engineering 30(7), 1366–1385 (2018). https://doi.org/10.1109/TKDE.2017.2781227
  • [20] Dorri, A., Kanhere, S.S., Jurdak, R.: Towards an optimized blockchain for IoT. In: 2017 IEEE/ACM Second International Conference on Internet-of-Things Design and Implementation (IoTDI). pp. 173–178. IEEE (2017)
  • [21] Dutta, P., Choi, T.M., Somani, S., Butala, R.: Blockchain technology in supply chain operations: Applications, challenges and research opportunities. Transportation research part e: Logistics and transportation review 142, 102067 (2020)
  • [22] Ethereum: Solidity (2015), https://docs.soliditylang.org/en/v0.8.16/
  • [23] Fan, S., Fu, S., Xu, H., Cheng, X.: Al-SPSD: Anti-leakage smart Ponzi schemes detection in blockchain. Information Processing & Management 58(4), 102587 (2021)
  • [24] Fan, S., Fu, S., Xu, H., Zhu, C.: Expose your mask: Smart ponzi schemes detection on blockchain. In: 2020 International Joint Conference on Neural Networks (IJCNN). pp. 1–7. IEEE (2020)
  • [25] Fulcher, B.D., Jones, N.S.: Highly comparative feature-based time-series classification. IEEE Transactions on Knowledge and Data Engineering 26(12), 3026–3037 (2014)
  • [26] Han, H., Wang, W.Y., Mao, B.H.: Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing. pp. 878–887. Springer (2005)
  • [27] Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intelligent Systems and their applications 13(4), 18–28 (1998)
  • [28] Huynh, P.D., De Silva, T., Dau, S.H., Li, X., Gondal, I., Viterbo, E.: From Programming Bugs to Multimillion-Dollar Scams: An Analysis of Trapdoor Tokens on Decentralized Exchanges. arXiv preprint arXiv:2309.04700 (2023)
  • [29] Hyndman, R.J., Wang, E., Laptev, N.: Large-scale unusual time series detection. In: 2015 IEEE international conference on data mining workshop (ICDMW). pp. 1616–1619. IEEE (2015)
  • [30] Ibba, G., Pierro, G.A., Di Francesco, M.: Evaluating machine-learning techniques for detecting smart ponzi schemes. In: 2021 IEEE/ACM 4th International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB). pp. 34–40. IEEE (2021)
  • [31] Jaiman, V., Urovi, V.: A consent model for blockchain-based health data sharing platforms. IEEE access 8, 143734–143745 (2020)
  • [32] Jung, E., Le Tilly, M., Gehani, A., Ge, Y.: Data mining-based ethereum fraud detection. In: 2019 IEEE International Conference on Blockchain (Blockchain). pp. 266–273. IEEE (2019)
  • [33] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.Y.: Lightgbm: A highly efficient gradient boosting decision tree. Advances in neural information processing systems 30 (2017)
  • [34] Korpela, K., Hallikas, J., Dahlberg, T.: Digital supply chain transformation toward blockchain integration. In: proceedings of the 50th Hawaii international conference on system sciences (2017)
  • [35] Li, X., Jiang, P., Chen, T., Luo, X., Wen, Q.: A survey on the security of blockchain systems. Future Generation Computer Systems 107, 841–853 (2020). https://doi.org/https://doi.org/10.1016/j.future.2017.08.020, https://www.sciencedirect.com/science/article/pii/S0167739X17318332
  • [36] Lin, C., He, D., Huang, X., Khan, M.K., Choo, K.K.R.: DCAP: A Secure and Efficient Decentralized Conditional Anonymous Payment System Based on Blockchain. IEEE Transactions on Information Forensics and Security 15, 2440–2452 (2020). https://doi.org/10.1109/TIFS.2020.2969565
  • [37] Medsker, L.R., Jain, L.: Recurrent neural networks. Design and Applications 5, 64–67 (2001)
  • [38] Morkunas, V.J., Paschen, J., Boon, E.: How blockchain technologies impact your business model. Business Horizons 62(3), 295–306 (2019). https://doi.org/https://doi.org/10.1016/j.bushor.2019.01.009, https://www.sciencedirect.com/science/article/pii/S0007681319300096
  • [39] Nakamoto, S.: Bitcoin: A peer-to-peer electronic cash system. Decentralized Business Review (2008)
  • [40] Novo, O.: Blockchain meets IoT: An architecture for scalable access management in IoT. IEEE internet of things journal 5(2), 1184–1195 (2018)
  • [41] Panarello, A., Tapas, N., Merlino, G., Longo, F., Puliafito, A.: Blockchain and iot integration: A systematic survey. Sensors 18(8),  2575 (2018)
  • [42] Rokach, L.: Decision forest: Twenty years of research. Information Fusion 27, 111–125 (2016)
  • [43] U.S. Securities and Exchange Commission: What Is A Ponzi Scheme? (2009), https://www.sec.gov/spotlight/enf-actions-ponzi.shtml
  • [44] Sun, W., Xu, G., Yang, Z., Chen, Z.: Early detection of smart ponzi scheme contracts based on behavior forest similarity. In: 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). pp. 297–309. IEEE (2020)
  • [45] Svetnik, V., Liaw, A., Tong, C., Culberson, J.C., Sheridan, R.P., Feuston, B.P.: Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of chemical information and computer sciences 43(6), 1947–1958 (2003)
  • [46] Szabo, N.: Smart Contracts: Building Blocks for Digital Markets (1996), https://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/szabo.best.vwh.net/smart_contracts_2.html
  • [47] The Etherscanners team: Average daily transaction fee (2022), https://etherscan.io/chart/avg-txfee-usd
  • [48] The Etherscan team: The Ethereum Blockchain Explorer (2015), https://etherscan.io/
  • [49] Theodouli, A., Arakliotis, S., Moschou, K., Votis, K., Tzovaras, D.: On the design of a blockchain-based system to facilitate healthcare data sharing. In: 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). pp. 1374–1379. IEEE (2018)
  • [50] Vasek, M., Moore, T.: There’s no free lunch, even using Bitcoin: Tracking the popularity and profits of virtual currency scams. In: Böhme, R., Okamoto, T. (eds.) Financial Cryptography and Data Security. pp. 44–61. Springer Berlin Heidelberg, Berlin, Heidelberg (2015)
  • [51] Wang, L., Cheng, H., Zheng, Z., Yang, A., Zhu, X.: Ponzi scheme detection via oversampling-based Long Short-Term Memory for smart contracts. Knowledge-Based Systems 228, 107312 (2021)
  • [52] Wang, S.C.: Artificial neural network. In: Interdisciplinary computing in java programming, pp. 81–100. Springer (2003)
  • [53] Wang, X., Smith, K., Hyndman, R.: Characteristic-based clustering for time series data. Data mining and knowledge Discovery 13(3), 335–364 (2006)
  • [54] Wiki: Gini coefficient (2022), https://en.wikipedia.org/wiki/Gini_coefficient
  • [55] Wood, G.: Ethereum: A Secure Decentralised Generalised Transaction Ledger (2014), https://gavwood.com/paper.pdf
  • [56] XBlock: XBlock: Ethereum On-chain Data (2022), https://xblock.pro/xblock-eth.html
  • [57] Zhang, Y., Yu, W., Li, Z., Raza, S., Cao, H.: Detecting Ethereum Ponzi schemes based on improved LightGBM algorithm. IEEE Transactions on Computational Social Systems 9(2), 624–637 (2021)
  • [58] Zhao, Q., Chen, S., Liu, Z., Baker, T., Zhang, Y.: Blockchain-based privacy-preserving remote data integrity checking scheme for IoT information systems. Information Processing & Management 57(6), 102355 (2020). https://doi.org/https://doi.org/10.1016/j.ipm.2020.102355, https://www.sciencedirect.com/science/article/pii/S0306457320308505
  • [59] Zheng, P., Zheng, Z., Wu, J., Dai, H.N.: XBlock-ETH: Extracting and exploring blockchain data from ethereum. IEEE Open Journal of the Computer Society 1, 95–106 (2020). https://doi.org/10.1109/ojcs.2020.2990458
  • [60] Zhou, Y., Kumar, D., Bakshi, S., Mason, J., Miller, A., Bailey, M.: Erays: Reverse engineering ethereum’s opaque smart contracts. In: 27th USENIX Security Symposium (USENIX Security 18). pp. 1371–1385 (2018)

Appendix 0.A Ponzi Types

Bartoletti et al. [3] defined the following types of Ponzi schemes on blockchain.

Chain-shaped schemes use a linear money distribution mechanism. These schemes often commit to paying investors a multiple, e.g., double, of their original investments. Each new investor joining the scheme is appended to a payment list in order of arrival. Each investor in the list is paid in full with their promised amount whenever the accumulated fund (minus some commission fee) is sufficient. These schemes collapse when the promised payments become too large to fulfill and the waiting time of latecomers grows.

Tree-shaped schemes use a tree structure to manage the money redistribution, in which an inviter is a parent node and the invitees are its children. Once a new investor joins the scheme, his investment is split and distributed among his ancestors: the nearer an ancestor is, the more he receives. In this type of Ponzi scheme, investors cannot know how much they will gain because their profit depends on how many users they and their descendants can invite and also on how much these users pay. Similar to the other schemes, a tree-shaped Ponzi collapses when no or too few new users join.

Handover schemes, like chain-shaped schemes, also use a linear payment list. However, instead of gathering newcomers' investments, these schemes require an entry toll, which increases every time a new user joins the scheme. At any time, only one user can be invited by the last user in the list, and the new entry toll is paid entirely to the inviter, who makes an instant profit. Once the inviter is paid, he hands the privilege over to the user who just joined.

Waterfall schemes are similar to chain-shaped schemes in payment order but differ in their money distribution logic. Every new investment is distributed along the list of existing investors, from the first to the last, until the fund is exhausted. This first-join-first-receive logic implies that the later an investor joins, the less likely they are to reap any profit.
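
Below is a toy Python sketch of this first-join-first-receive distribution (our own illustration; the fixed payout rate per earlier investor and the commission are assumptions).

```python
class WaterfallScheme:
    """Toy model of a waterfall Ponzi: every new deposit flows down the list of earlier investors."""

    def __init__(self, rate=0.1, fee_rate=0.05):
        self.rate = rate             # assumed fixed fraction of each earlier investment paid out
        self.fee_rate = fee_rate     # owner's commission per deposit
        self.investors = []          # (address, invested_amount), in order of arrival

    def invest(self, investor, amount):
        fund = amount * (1 - self.fee_rate)
        # Distribute the fresh deposit from the earliest investor onwards until it runs out,
        # so investors at the tail of the list rarely receive anything.
        for addr, invested in self.investors:
            if fund <= 0:
                break
            payout = min(invested * self.rate, fund)
            fund -= payout
            print(f"{addr} receives {payout:.4f} ETH")
        self.investors.append((investor, amount))
```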

Appendix 0.B Account feature list

The list of 29 different account features, which are used to represent the general characteristics of an application, is given below.

0.B.1 List of features from Chen et al. [12]

  • know_rate: the proportion of participants who invested before receiving any payment.

  • balance: the balance of a smart contract.

  • num_in_txs: total number of transactions sent to a contract (investments).

  • num_out_txs: total number of transactions sent out from a contract (payments).

  • difference_idx: a measure of the difference between the number of payments and the number of investments.

  • paid_rate: the proportion of investors who received at least one payment.

  • max_pay: the maximum number of payments made to a single participant.

0.B.2 List of features from Jung et al. [32].

  • num_in_txs (abv): total number of transactions sent to a contract (investments).

  • num_out_txs (abv): total number of transactions sent out from a contract (payments).

  • total_inv_amt: total ETH amount sent to a contract (total investment amount).

  • total_pay_amt: total ETH amount sent out from a contract (total payment amount).

  • avg_inv_amt: the average of ETH amounts sent to a contract.

  • avg_pay_amt: the average of ETH amounts sent out from a contract.

  • dev_inv_amt: the standard deviation of ETH amounts sent to a contract.

  • dev_pay_amt: the standard deviation of ETH amounts sent out from a contract.

  • avg_time_btw_txs: the average of time distance between two consecutive transactions.

  • life_time: a contract lifetime.

  • gini_amt_in: the Gini coefficient of the total ETH amounts sent to a contract (a computation sketch is given at the end of this subsection).

  • gini_amt_out: the Gini coefficient of total ETH amount received from a contract.

  • overlap_addr: the number of addresses that both invested in and were paid by a contract.

  • gini_time_in: the Gini coefficient of number of transactions sent to a contract.

  • gini_time_out: the Gini coefficient of number of transactions sent out from a contract.

  • num_inv_acc: number of distinct account addresses that send ETH to a contract.

  • num_pay_acc: number of distinct account addresses that received ETH from a contract.

* We ignored the feature size_info, which is contract-code-based and hence irrelevant to our transaction-based approach.
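
All four Gini-based features above (gini_amt_in, gini_amt_out, gini_time_in, gini_time_out) reduce to computing a Gini coefficient over a vector of per-address amounts or counts. The following minimal numpy sketch shows one standard way to compute it (our own illustration, not necessarily the implementation used in [32]).

```python
import numpy as np

def gini(values):
    """Gini coefficient of a non-negative 1-D array (0 = perfect equality, 1 = maximal inequality)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    # Standard formula over the ordered values x_1 <= ... <= x_n:
    #   G = (2 * sum_i i * x_i) / (n * sum_i x_i) - (n + 1) / n
    i = np.arange(1, n + 1)
    return float(2.0 * np.sum(i * x) / (n * x.sum()) - (n + 1.0) / n)

# e.g., gini_amt_in would be gini(total ETH amount each address sent to the contract).
```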

0.B.3 Other Account Features

  • balance_rate: ratio between balance and total investment.

  • payment_time: ratio between balance and total investment.

  • num_all_txs: total number of transactions.

  • num_in_txs: total number of transactions sent to a contract.

  • num_out_txs: total number of transactions sent out from a contract.

  • pay_skewness: the skewness of the payments made by a contract.

Appendix 0.C Time-series list

Below is the list of 43 different time series that we used to represent, from different aspects, how the information associated with an application changes throughout its lifetime. These time series were derived from basic transaction information and are grouped below by the type of information from which they were created. All of them are indexed by time, e.g., measured on a daily basis (a sketch of how such daily series can be derived from raw transactions is given at the end of this appendix).

0.C.1 ETH value:

  • balance: the amount of ETH in a contract.

  • profit_and_loss: the difference between the total investments (profit) and the total payments (loss) of a contract.

  • loss: total ETH amounts that a contract pays to its participants.

  • loss_by_contract: total ETH amount sent from a contract to other contracts.

  • loss_by_person: total ETH amount sent from a contract to other user accounts.

  • loss_from_internal_txs: total ETH amount recorded by internal transactions that a contract pays to its participants.

  • loss_from_normal_txs: total ETH amount recorded by external transactions that a contract pays to its participants.

  • profit: total ETH amount that the contract received from its participants.

  • profit_by_contract: total ETH amount that the contract received from other contracts.

  • profit_by_person: total ETH amount that the contract received from other user accounts.

  • profit_from_internal_txs: total ETH amount, recorded by internal transactions, that a contract received from its participants.

  • profit_from_normal_txs: total ETH amount, recorded by external transactions, that a contract received from its participants.

0.C.2 Transaction:

  • total_txs: total number of transactions.

  • total_internal_txs: total number of internal transactions.

  • total_in_coming_txs: total number of transactions sent to a contract.

  • total_in_coming_internal_txs: total number of internal transactions sent to the contract.

  • total_in_coming_normal_txs: total number of external transactions sent to the contract.

  • total_normal_txs: total number of external transactions.

  • total_out_going_txs: total number of transactions sent from a contract.

  • total_out_going_internal_txs: total number of internal transactions sent from a contract.

  • total_out_going_normal_txs: total number of external transactions sent from a contract.

0.C.3 Participant address:

  • total_unique_addresses: total number of distinct participants (addresses) of a contract.

  • total_unique_in_coming_addresses: total number of distinct participants who sent transactions to a contract.

  • total_unique_in_coming_addresses_from_internal: total number of distinct participants who sent internal transactions to a contract.

  • total_unique_in_coming_addresses_from_normal: total number of distinct participants who sent external transactions to a contract.

  • total_unique_out_going_addresses: total number of distinct participants who received transactions from a contract.

  • total_unique_out_going_addresses_from_internal: total number of distinct participants who received internal transactions from a contract.

  • total_unique_out_going_addresses_from_normal: total number of distinct participants who received external transactions from a contract.

0.C.4 Calling Function (calling functions can be retrieved from a transaction’s input data):

  • total_unique_calling_function: total number of distinct functions that were called by a contract or its participants.

  • total_unique_in_coming_calling_function: total number of distinct functions called by participants.

  • total_unique_in_coming_calling_function_from_internal: total number of distinct functions called via internal transactions by participants.

  • total_unique_in_coming_calling_function_from_normal: total number of distinct functions called via external transactions by participants.

  • total_unique_out_going_calling_function: total number of distinct functions called by the contract.

  • total_unique_out_going_calling_function_from_internal: total number of distinct functions called via internal transactions by the contract.

  • total_unique_out_going_calling_function_from_normal: total number of distinct functions called via external transactions by the contract.

0.C.5 Participant Account Type:

  • num_in_coming_txs_from_contract: number of transactions sent to a contract from other contracts.

  • num_in_coming_txs_from_person: number of transactions sent to a contract from other user accounts.

  • num_out_going_txs_to_contract: number of transactions sent from a contract to other contracts.

  • num_out_going_txs_to_person: number of transactions sent from a contract to other user accounts.

  • num_unique_in_coming_contract_address: number of distinct contracts that sent transactions to a contract.

  • num_unique_in_coming_person_address: number of distinct user accounts that sent transactions to a contract.

  • num_unique_out_going_contract_address: number of distinct contracts that received transactions from a contract.

  • num_unique_out_going_person_address: number of distinct user accounts that received transactions from a contract.
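
As a concrete example of how such daily series can be derived from raw transaction data, the sketch below (our own illustration; the transaction record format with 'timestamp', 'from', 'to' and 'value' keys is an assumption) builds the total_txs and balance series of a single contract.

```python
from collections import defaultdict
from datetime import timedelta

def daily_series(transactions, contract, start, end):
    """Build the daily `total_txs` and end-of-day `balance` series of one contract.

    `transactions` is assumed to be a list of dicts with keys 'timestamp'
    (a datetime), 'from', 'to' and 'value' (ETH); `start` and `end` are
    dates bounding the contract's lifetime.
    """
    txs_per_day = defaultdict(int)
    flow_per_day = defaultdict(float)
    for tx in transactions:
        day = tx["timestamp"].date()
        if tx["to"] == contract:
            txs_per_day[day] += 1
            flow_per_day[day] += tx["value"]    # incoming ETH (profit)
        elif tx["from"] == contract:
            txs_per_day[day] += 1
            flow_per_day[day] -= tx["value"]    # outgoing ETH (loss)

    total_txs, balance, running, day = [], [], 0.0, start
    while day <= end:                            # one data point per day of the lifetime
        running += flow_per_day[day]
        total_txs.append(txs_per_day[day])
        balance.append(running)
        day += timedelta(days=1)
    return total_txs, balance
```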

Appendix 0.D Time-series Statistical Measures

Below are the 12 statistical measures that were used to capture the characteristics of a time series (an illustrative implementation of a few of them is sketched after the list).

  • Mean: Mean value of intervals.

  • Var: Variance value of intervals.

  • ACF1: First-order autocorrelation of the series.

  • Linearity: Strength of linearity, calculated from the coefficients of an orthogonal quadratic regression.

  • Curvature: Strength of curvature, calculated from the coefficients of an orthogonal quadratic regression.

  • Trend: Strength of the trend of a time series, based on an STL decomposition.

  • Season: Strength of the seasonality of a time series, based on an STL decomposition.

  • Entropy: Spectral entropy, which measures the “forecastability” of a time series; low values indicate a high signal-to-noise ratio, while large values occur when a series is difficult to forecast.

  • Lumpiness: Changing variance of the remainder component, computed on non-overlapping windows.

  • Spikiness: Strength of spikiness, i.e., the variance of the leave-one-out variances of the remainder component.

  • Fspots: Flat spots, computed by dividing the sample space of a time series into ten equal-sized intervals and taking the maximum run length within any single interval.

  • Cpoints: The number of times a time series crosses its mean line.
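
To illustrate a few of these measures, the numpy sketch below computes the lag-1 autocorrelation, crossing points, flat spots, and lumpiness of a series. It is our own simplified illustration and may differ in details from the implementations in the time-series feature literature [53] (e.g., lumpiness and spikiness are formally defined on the STL remainder component).

```python
import numpy as np

def acf1(x):
    """Lag-1 autocorrelation of a series (assumes the series is not constant)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def crossing_points(x):
    """Number of times the series crosses its mean line."""
    x = np.asarray(x, dtype=float)
    above = x > x.mean()
    return int(np.sum(above[1:] != above[:-1]))

def flat_spots(x, bins=10):
    """Maximum run length within any of `bins` equal-sized intervals of the sample space."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), bins + 1)[1:-1]   # interior bin boundaries
    levels = np.digitize(x, edges)
    best = run = 1
    for prev, cur in zip(levels[:-1], levels[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def lumpiness(x, window=10):
    """Variance of the variances computed on non-overlapping windows."""
    x = np.asarray(x, dtype=float)
    chunks = [x[i:i + window] for i in range(0, len(x) - window + 1, window)]
    return float(np.var([np.var(c) for c in chunks]))
```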

Appendix 0.E Refined Feature List of 85 Features

The table below ranks the 85 refined features, where ACC denotes an account-based feature and TD denotes a feature derived from a time series.

Rank Feature Type Rank Feature Type
1 avg_inv_amt ACC 44 profit_and_loss_entropy TD
2 num_all_txs ACC 45 profit_and_loss_variance TD
3 balance ACC 46 profit_and_loss_curvature TD
4 avg_pay_amt ACC 47 difference_idx ACC
5 total_txs_lumpiness TD 48 total_in_coming_normal_txs_fspots TD
6 avg_time_btw_txs ACC 49 total_in_coming_internal_txs_acf1 TD
7 total_in_coming_txs_lumpiness TD 50 gini_amt_in ACC
8 balance_rate ACC 51 total_txs_spikiness TD
9 dev_inv_amt ACC 52 loss_lumpiness TD
10 num_in_txs ACC 53 total_txs_acf1 TD
11 paid_rate ACC 54 total_in_coming_normal_txs_entropy TD
12 gini_amt_out ACC 55 gini_time_out ACC
13 profit_and_loss_acf1 TD 56 total_in_coming_normal_txs_mean TD
14 total_internal_txs_lumpiness TD 57 total_internal_txs_mean TD
15 dev_pay_amt ACC 58 profit_from_normal_txs_mean TD
16 total_inv_amt ACC 59 total_in_coming_internal_txs_fspots TD
17 gini_time_in ACC 60 profit_and_loss_trend TD
18 payment_time ACC 61 nbr_tx_in ACC
19 total_in_coming_internal_txs_linearity TD 62 total_in_coming_normal_txs_linearity TD
20 total_in_coming_internal_txs_lumpiness TD 63 total_in_coming_internal_txs_entropy TD
21 total_in_coming_normal_txs_lumpiness TD 64 total_pay_amt ACC
22 profit_mean TD 65 profit_spikiness TD
23 profit_and_loss_mean TD 66 total_in_coming_normal_txs_cpoints TD
24 total_internal_txs_cpoints TD 67 num_inv_acc ACC
25 total_txs_fspots TD 68 total_internal_txs_spikiness TD
26 profit_and_loss_linearity TD 69 total_txs_variance TD
27 total_txs_linearity TD 70 total_in_coming_txs_acf1 TD
28 know_rate ACC 71 num_out_going_txs_to_contract_lumpiness TD
29 total_txs_cpoints TD 72 total_in_coming_txs_entropy TD
30 balance_acf1 TD 73 total_txs_trend TD
31 total_txs_mean TD 74 profit_linearity TD
32 total_in_coming_txs_linearity TD 75 total_in_coming_normal_txs_spikiness TD
33 profit_lumpiness TD 76 total_in_coming_normal_txs_acf1 TD
34 profit_entropy TD 77 loss_entropy TD
35 profit_and_loss_fspots TD 78 num_out_going_txs_to_person_entropy TD
36 total_internal_txs_linearity TD 79 profit_from_normal_txs_entropy TD
37 profit_acf1 TD 80 total_internal_txs_acf1 TD
38 total_txs_entropy TD 81 max_pay ACC
39 balance_mean TD 82 loss_acf1 TD
40 overlap_addr ACC 83 total_in_coming_internal_txs_spikiness TD
41 total_in_coming_internal_txs_mean TD 84 num_out_going_txs_to_contract_spikiness TD
42 total_internal_txs_entropy TD 85 num_out_going_txs_to_contract_mean TD
43 total_out_going_txs_entropy TD