[1]\fnmMing-Fong \surSie [2,4]\fnmYen-Jui \surChang [3,4]\fnmChing-Ray \surChang [1]\fnmShih-Wei \surLiao
[1]\orgdivDepartment of Computer Science and Information Engineering, \orgnameNational Taiwan University, \orgaddress\streetNo. 1, Sec. 4, Roosevelt Rd., \cityTaipei City, \postcode10617, \stateTaiwan 2]\orgdivMaster Program in Intelligent Computing and Big Data, \orgnameChung Yuan Christian University, \orgaddress\streetNo. 499, Xingzhong N. Rd., \cityTaoyuan City, \postcode320317, \stateTaiwan 3]\orgdivDepartment of Physics, \orgnameNational Taiwan University
4]\orgdivQuantum Information Center, \orgnameChung Yuan Christian University, \orgaddress\streetNo. 499, Xingzhong N. Rd., \cityTaoyuan City, \postcode320317, \stateTaiwan
Efficient Bitcoin Address Classification Using Quantum-Inspired Feature Selection
Abstract
Over 900 million Bitcoin transactions have been recorded, posing considerable challenges for machine learning regarding computation time and maintaining prediction accuracy. We propose an innovative approach using quantum-inspired algorithms implemented with Simulated Annealing and Quantum Annealing to address the challenge of local minima in solution spaces. This method efficiently identifies key features linked to mixer addresses, significantly reducing model training time. By categorizing Bitcoin addresses into six classes—exchanges, faucets, gambling, marketplaces, mixers, and mining pools—and applying supervised learning methods, our results demonstrate that feature selection with SA reduced training time by 30.3% compared to using all features in a random forest model while maintaining a 91% F1-score for mixer addresses. This highlights the potential of quantum-inspired algorithms to swiftly and accurately identify high-risk Bitcoin addresses based on transaction features.
keywords:
Bitcoin, Blockchain, Feature Selection, Machine Learning, Quantum-Inspired Acceleration1 Introduction
As traditional computing progresses, it is approaching the boundaries of its capabilities when faced with complex challenges like optimization or quantum phenomena simulation. To address these challenges, quantum-inspired computing has emerged, drawing inspiration from quantum mechanics to explore new dimensions of computational power. Quantum-inspired Ising machines offer a unique approach to solving integer programming problems by transforming them into equivalent Ising model representations. The Ising model [1], originally formulated to describe magnetic interactions, represents atoms (or spins) in two states (+1 or -1), with their interactions determining the system’s overall energy. Many optimization problems can be translated into an energy minimization problem of the Ising model, with quantum or quantum-inspired methods used to find the configuration with the lowest energy.
Quantum-inspired algorithms leverage principles like superposition and tunneling to efficiently navigate the energy landscape of the Ising model. This enables them to find optimal solutions more effectively, especially in vast solution spaces with numerous local optima, bridging the gap between classical and quantum computing [2, 3].
Bitcoin was launched in 2009, and it was a game-changer in the world of cryptocurrency[4]. It introduced a decentralized ledger that serves two main purposes: as a medium of exchange and a store of value[5]. However, certain services known as ”mixers” make Bitcoin transactions difficult to trace. Mixers are used to combine multiple users’ cryptocurrency into target wallets, complicating the tracing process back to the original source[6]. Money launderers can use mixer services to hide illegal activities by converting their illicit proceeds into Bitcoin and using mixer services to obscure the origin[7].
Bitcoin’s blockchain has recorded over 900 million transactions, making data analysis tasks such as predicting and understanding transaction behavior significantly challenging. Applying machine learning algorithms to this dataset presents hurdles like dataset imbalance[8], high dimensionality, and dynamic evolution. Dataset imbalance occurs when certain transaction types significantly outnumber others, leading to difficulty in obtaining accurate predictions. The complex features associated with each transaction lead to high dimensionality, and the continuous evolution of the data structure results in dynamic changes over time. Consequently, machine learning methods often incur high computational costs without guaranteed accuracy improvements. Given these challenges, exploring alternative approaches tailored to blockchain data is essential. Mixer addresses are crucial to regulatory bodies and entities aiming to maintain blockchain integrity[9]. Chang et al. [10] provide a comprehensive overview of quantum computing applications in traditional and blockchain financial systems.
We categorized Bitcoin addresses into various groups, such as exchanges, faucets, gambling, marketplaces, mixers, and mining pools, to create a structured framework for analysis. Our main focus is on mixer addresses, which are especially interesting because of their potential use in illegal activities and the privacy implications. Mixer addresses are used to obscure the trail of Bitcoin transactions, making them crucial to both regulatory bodies and entities that aim to maintain the blockchain’s integrity.[9]
Our research proposes Quantum-Inspired Feature Selection (QIFS) that employs a Quadratic Unconstrained Binary Optimization (QUBO)-like model, aiming to balance feature influence and dependence for optimal classification performance to simplify the feature selection process for Bitcoin transaction records and use supervised learning techniques to construct a predictive model for the identification of mixer addresses, with a particular emphasis on random forests[11] and gradient-boosting decision trees[12]. These methods are complemented by cross-validation to ensure the reliability and generalizability of our model. The efficacy of our approach is evaluated using the receiver operating characteristic (ROC) curve[13], a tool to assess the performance of classification models.
Our study has produced promising findings that demonstrate a significant reduction in the training time of the random forest model—by 30.3%—while maintaining a 91% F1-score in identifying mixer addresses. These results not only validate the effectiveness of our quantum-inspired algorithms with QUBO to select features to accelerate training time but also highlight its potential for practical applications, especially in identifying high-risk Bitcoin addresses.
2 Method
2.1 Quantum-Inspired Feature Selection and Optimization
Large blockchain datasets often contain redundant information, leading to increased computational costs. Feature selection aims to distill essential information for efficient analysis. Our QIFS method identifies the most relevant features for Bitcoin mixer activity from transaction data, optimizing feature influence and independence through a QUBO model[14].
The cleaned blockchain transaction data is represented as a feature matrix , where each of the columns corresponds to a feature, and each of the rows represents transaction data associated with a Bitcoin address:
We also define an outcome vector , which records the classification results for Bitcoin addresses:
Here, is a binary variable that takes on values 0 (rejection in Bitcoin classification) or 1 (acceptance).
We aim to select a subset of features from the original features to classify Bitcoin addresses. The QUBO model, adapted from Milne et al. [14], balances feature influence and dependence, using the Spearman rank correlation coefficient to evaluate the relevance and distinctiveness of each feature.
The objective function is:
where is a binary variable indicating if feature is selected, and (where ) weights the influence term. The negative sign in the first term aims to maximize influence, while the second term seeks to minimize dependence.
This cost function is optimized using a quantum-inspired machine to determine the feature subset that best balances the influence-dependence trade-off:
where is a matrix encoding both influence and independence factors.
Figure 2 illustrates our proposed Quantum-Inspired Feature Selection pipeline.
2.2 Quantum-Inspired Acceleration and Machine Learning
Quantum-inspired algorithms employ heuristic methods that approximate quantum behaviors, such as tunneling, to enhance optimization processes [15]. These methods leverage an expanded search space, allowing for more efficient exploration compared to classical methods, and can potentially escape local minima to improve identification of the global optimum.
Simulated Annealing (SA) [16] is another technique to optimize the QUBO formulation during feature selection. SA mimics slowly cooling a material to reach its lowest energy state. By gradually reducing the temperature, SA probabilistically explores the solution space, reducing the likelihood of becoming trapped in local minima and increasing the probability of finding a global optimum. In our methodology, we employ simulated annealing to enhance feature selection, particularly when dealing with large and complex feature spaces.
Quantum annealing (QA) leverages quantum hardware to solve QUBO formulations directly. By formulating our feature selection problem as a QUBO model and utilizing quantum annealing for optimization, we benefit from faster convergence to optimal solutions compared to classical optimization methods.
Quantum-inspired machines, SA, and QA are all applied to further optimize QUBO to identify the global minimum energy cost during feature selection, thereby accelerating the supervised classification process. In our approach, we implement all three techniques to determine the most effective feature selection strategy for our specific task.
In this study, we evaluate the effectiveness of features selected using QIFS, implemented via SA and quantum methods such as D-Wave’s QA and Binary Quadratic Model (BQM). The feature sets are validated on Bitcoin address classification tasks using 10-fold cross-validation, focusing on identifying mixer classes. Metrics such as precision and AUC are used to demonstrate the robustness and utility of features selected by these methods.
Figure 3 provides an overview of our quantum-inspired approaches to feature selection, highlighting similarities and differences between the optimization techniques.
By leveraging quantum-inspired optimization, we aim to achieve high accuracy while minimizing computational overhead. These approaches significantly enhance processing efficiency, enabling effective classification even with large and complex datasets.
3 Experiments
Our experiment is divided into several distinct parts, as illustrated in Figure 4 and implemented in code available at Siemingfong/Quantom Annealing222https://github.com/Siemingfong/Quantom_Annealing.
3.1 Data Collection
We set up a full Bitcoin node leveraging 64GB of RAM, a 12th Gen Intel® Core™ i9-12900K CPU, GeForce GTX 1060 6 GB GPU, and 4 TB of SSD storage, granting us access to the complete raw transaction history of the network. This history contains detailed information about each transaction, including its hash, inputs, outputs, block height, and timestamp. The retrieval process took four months, covering 544,283 blocks and 345,882,038 transactions. Due to computational limitations, we focused on analyzing the first 1,000 transactions associated with each Bitcoin address.
3.2 Bitcoin Address Labeling
We used WalletExplorer.com333walletexplorer.com to acquire a dataset of 694,676 labeled Bitcoin addresses, interacting with 3,459,773 transactions. These addresses were categorized into six classes: exchange, faucet, gambling, market, mixer, and mining pool. Table 1 summarizes the distribution of addresses and corresponding transactions across these categories.
3.3 Smmarizing Transaction History Features
We processed the transaction history for each address by aggregating all relevant transactions into a single sheet record. Then these records were sorted chronologically prior to feature extraction. Table 2 details the selected features, which are categorized into five groups: basic statistics features, extra statistics features, moment features and transaction patterns features. Calculating each feature requires access to the complete transaction history for an address. This requires retrieving all transactions involving the address before computing any statistical measures. By iterating through the sorted transaction history only once, we can efficiently extract all 69 features.
Category | Addresses | Interactive Addresses | Interactive Transactions |
---|---|---|---|
Exchange | 163,537 | 675,694 | 2,582,337 |
Faucet | 14,720 | 15,359 | 87,245 |
Gambling | 78,698 | 107,661 | 170,815 |
Market | 102,129 | 57,198 | 181,337 |
Mixer | 289,006 | 166,536 | 234,449 |
Pool | 46,586 | 26,127 | 203,590 |
Total | 694,676 | 1,048,575 | 3,459,773 |
Feature | Description |
---|---|
Lifespan of Bitcoin address transaction frequency. | |
Ratio of received transactions to total transactions. | |
Ratio of received coinbase transaction to total transactions. | |
Frequency of digit transactions spent in USD, where ranges from to . | |
Frequency of digit transactions received in USD, where ranges from to . | |
Ratio of Bitcoin addresses in both inputs and outputs. | |
Average number of inputs used in spent transactions. | |
Average number of outputs used in spent transactions. | |
Basic Statistics | |
lifetime | Lifespan of transactions days. |
Total Bitcoin amount spent. | |
Total Bitcoin amount received. | |
Total spent in USD, converted using daily BTC/USD rates from finance.yahoo.com444finance.yahoo.com/. | |
Total received in USD, converted using daily BTC/USD rates from finance.yahoo.com444finance.yahoo.com/. | |
Total number of transactions. | |
Total number of spent transactions. | |
Total number of received transactions. | |
Total number of coinbase transactions. | |
Total number of payback transactions. | |
Average BTC amount held after each transaction. | |
Standard deviation of post-transaction BTC balances. | |
Average USD amount held after each transaction. | |
Standard deviation of post-transaction USD balances. | |
Standard deviation of total number of received transactions inputs. | |
Standard deviation of total number of spent transactions outputs. | |
Extra Statistics | |
Distribution of the total transaction in moments. | |
Distribution moments of spent transactions. | |
Distribution moments of received transactions. | |
Distribution moments of coinbase transactions. | |
Distribution moments of payback transactions. | |
Distribution moments of transaction intervals. | |
Moments | |
Total number of multiple input transactions. | |
Total number of multiple output transactions. | |
Both transactions involve multiple inputs and outputs. | |
Tranascion Patterns |
The four transaction types, coinbase, spent, received, and payback, are mutually exclusive. A transaction is categorized as a coinbase transaction if it has a coinbase input, representing a block reward. If it lacks a coinbase input and the sender’s address appears in any of its inputs, it’s identified as a spent transaction. Conversely, if the address appears only in the outputs, it’s classified as a received transaction. Finally, transactions with the address in both inputs and outputs are categorized as payback transactions. To understand the behavior of the address, we analyze the distribution moments of various types of transactions. These include total transactions, spent transactions, received transactions, coinbase transactions, payback transactions, and multiple input and output transactions. We believe that the frequency of transactions for each category helps identify address types. Beyond frequency, moments of these distributions and transaction intervals reveal further insight. For example, they can indicate an address that was initially active but has become dormant, or even exhibit periodic transaction patterns. In addition, distinguishing between spending, receiving, and payback transactions is crucial. Provides valuable clues on how addresses are being used within the Bitcoin transaction history.
3.4 Quantum-Inspired Implementation
Before the QISF process, each class is binarized as 0 or 1, transforming the data into a QUBO structure suitable for decision-making. Subsequently, the Spearman method is employed to compute correlations between each feature and the mixer class. The results of this correlation analysis are visualized in Figure 5. Table 3 evaluates the time efficiency of various QISF methods for the mixer class. SA, which selected 23 features, exhibited the lowest computational time at 0.0055 seconds. In comparison, QA with 9 features and QA BQM with 7 features required considerably more time, measuring 6.644 seconds and 6.0041 seconds, respectively. These results highlight that SA offers a significant computational speed advantage over other quantum-inspired algorithms in feature selection tasks.
Quantum-Inspired Algorithms | Features | Time |
---|---|---|
Simulated Annealing | 23 | 0.0055 s |
Quantum Annealing | 9 | 6.644 s |
Quantum Annealing BQM | 7 | 6.0041 s |
The QISF methodology is specifically tailored for the mixer class. For implementation, we utilized the Python library dwave-neal [17], which progressively converges toward the equilibrium distribution of the Ising model by iteratively updating spins in a sequence of increasing beta values. For quantum processing, the D-Wave Quantum Machine Advantage system 6.4, equipped with 5,612 working qubits, was employed. The D-Wave Hybrid solver (version 2.2) was used to solve general BQM problems. Tables 4, 5 provide details on the features selected by SA, QA, and QA BQM for the mixer class.
SA top 10 features.
Feature Name
Feature Type
Extra Stats
Extra Stats
Extra Stats
Extra Stats
lifetime
Extra Stats
Basic Stats
Basic Stats
Moments
Moments
Moments
SA top 11 to 23 features.
Feature Name
Feature Type
Extra Stats
Moments
Moments
Moments
Moments
Moments
Moments
Moments
Moments
Extra Stats
Extra Stats
Extra Stats
Extra Stats
QA top 9 features.
Feature Name
Feature Type
Extra Stats
Extra Stats
Basic Stats
Basic Stats
Basic Stats
Moments
Moments
TX Pattern
TX Pattern
QA BQM top 7 features.
Feature Name
Feature Type
lifetime
Extra Stats
Extra Stats
Extra Stats
Basic Stats
Basic Stats
Moments
TX Pattern
3.5 Training Classifiers
We training of seven common classification algorithms include Logistic Regression[18], Adaptive Boosting with Decision Tree (AdaBoost-SAMME)[19, 20], Random Forest[11], Extreme Gradient Boosting (XGBoost)[12], LightGBM[21], SVM[22] and Neural Network. These algorithms serve as a baseline for comparison with the quantum approach. We leverage the Python machine learning library Scikit-learn[23] to employ seven different classifiers. To optimize each classifier and identify a suitable set of parameters, we utilize a 10-fold cross-validation. Notably, decision tree-based methods are unaffected by data normalization. Therefore, for these methods, we retain the original data. However, for classifiers like logistic regression and SVM, which rely on distance metrics, we normalize the features by dividing each dimension by its maximum absolute value.
Among the mixer classes, both Random Forest and SA Random Forest achieve the best F1 score, reaching 92% and 91%. SA Random Forest exhibits a faster 30.3% training time 212 minutes and 23.2 seconds compared to Quantum Random Forest 305 minutes and 5.7 seconds. Table 6 details the results of both all features and SA feature selection of mixer in supervised classifiers. Our experiments revealed that most classifiers performed consistently on the mixer category. However, Logistic Regression showed poor performance on this category. This suggests that tree-based classifiers might be better suited for this task compared to linear models.
Method | Precision | Recall | F1 | Accuracy | AUC | Training Time |
---|---|---|---|---|---|---|
Logistic Regression | 0.6 | 0.57 | 0.58 | 0.84 | 0.92 | 10 mins 56.8 s |
AdaBoost | 0.4 | 0.81 | 0.53 | 0.84 | 0.74 | 1106 mins 52.8 s |
Random Forest | 0.87 | 0.97 | 0.92 | 0.95 | 0.99 | 305 mins 5.7 s |
XGBoost | 0.86 | 0.89 | 0.88 | 0.95 | 0.99 | 70 mins 5 s |
LightGBM | 0.85 | 0.88 | 0.87 | 0.94 | 0.98 | 56 mins 1.8 s |
SVM *10%111Using only 10% of the dataset | 0.73 | 0.37 | 0.49 | 0.85 | 0.96 | 532 mins 58.9 s |
Neural Network *10%222Neural Network trained with 10% of the available data | 0.68 | 0.93 | 0.79 | 0.85 | 0.98 | 87 mins 51.5 s |
SA Logistic Regression | 0.53 | 0.8 | 0.59 | 0.91 | 0.83 | 10 mins 49.6 s |
SA AdaBoost | 0.33 | 0.65 | 0.44 | 0.81 | 0.69 | 589 mins 21.1 s |
SA Random Forest | 0.85 | 0.97 | 0.91 | 0.94 | 0.99 | 212 mins 23.2 s |
SA XGBoost | 0.54 | 0.63 | 0.58 | 0.87 | 0.89 | 67 mins 13.2 s |
SA LightGBM | 0.81 | 0.84 | 0.83 | 0.94 | 0.96 | 37 mins 43.9 s |
SA SVM *10%333SVM trained with 10% of the dataset using SA | 0.22 | 1 | 0.36 | 0.42 | 0.71 | 4465 mins 27.2 s |
SA Neural Network *10%444Neural Network trained with 10% of the data using SA | 0.66 | 0.94 | 0.77 | 0.83 | 0.98 | 83 mins 3.7 s |
QA Logistic Regression | 0.49 | 0.54 | 0.50 | 0.82 | 0.82 | 6 mins 47.8 s |
QA AdaBoost | 0.57 | 0.49 | 0.52 | 0.85 | 0.84 | 513 mins 5.6 s |
QA Random Forest | 0.58 | 0.62 | 0.60 | 0.86 | 0.88 | 125 mins 27.7 s |
QA XGBoost | 0.54 | 0.63 | 0.58 | 0.87 | 0.89 | 63 mins 29.5 s |
QA LightGBM | 0.55 | 0.62 | 0.58 | 0.87 | 0.89 | 37 mins 43.9 s |
QA SVM *10%555SVM trained with 10% of the dataset using QA | 0.26 | 1 | 0.41 | 0.51 | 0.86 | 4492 mins 26.3 s |
QA Neural Network *10%666Neural Network trained with 10% of the data using QA | 0.54 | 0.60 | 0.57 | 0.85 | 0.88 | 87 mins 7.3 s |
QA BQM Logistic Regression | 0.51 | 0.56 | 0.53 | 0.85 | 0.84 | 6 mins 44.1 s |
QA BQM AdaBoost | 0.47 | 0.74 | 0.57 | 0.84 | 0.89 | 563 mins 56.8 s |
QA BQM Random Forest | 0.65 | 0.76 | 0.75 | 0.90 | 0.93 | 183 mins 17.5 s |
QA BQM XGBoost | 0.48 | 0.76 | 0.59 | 0.85 | 0.90 | 63 mins 36.6 s |
QA BQM LightGBM | 0.48 | 0.75 | 0.58 | 0.85 | 0.90 | 38 mins 25 s |
QA BQM SVM *10%777SVM trained with 10% of the dataset using QA | 0.54 | 0.09 | 0.16 | 0.83 | 0.86 | 4391 mins 7.1 s |
QA BQM Neural Network *10%888Neural Network trained with 10% of the data using BQM | 0.54 | 0.58 | 0.56 | 0.85 | 0.88 | 83 mins 49.5 s |
4 Performance Evaluation
We analyze the precision, recall, and F1-score for the following 6 classes: Exchange, Faucet, Gambling, Market, Mixer, and Pool. Specifically for the Mixer class, both the Random Forest and the SA Random Forest models achieved the highest F1-score of 0.92 and 0.91 shown in Figure 7 and Figure 7, indicating a strong and consistent classification performance. The QA Random Forest had a lower F1-score of 0.60, while the QA BQM Random Forest showed some improvement with an F1-score of 0.70.
Table 6 illustrates the performance of various supervised classifiers using full features, SA, QA and QA BQM for feature selection in the Mixer class. The classifiers evaluated include traditional models such as Logistic Regression and SVM, as well as ensemble methods like Random Forest, AdaBoost, and Gradient Boosting algorithms such as XGBoost and LightGBM. In several cases, models were trained on 10% of the available dataset to compare their performance when data is limited. For instance, the Neural Network *10% was trained using 10% of the data, while models like SA SVM *10% and QA SVM *10% were trained with reduced datasets using SA and QA feature selection techniques, respectively. Each model’s precision, recall, F1 score, accuracy, AUC, and training time were recorded, highlighting the trade-offs between computational cost and performance.
Overall, Table 6 demonstrates that traditional machine learning models Random Forest, XGBoost, and LightGBM achieved the highest accuracy around 95% and AUC around 0.99 within the mixer class. Notably, Random Forest exhibited the strongest F1-score 92% and recall 0.97% among all models. In terms of training time, LightGBM emerged as the fastest, taking only about 56 minutes, while AdaBoost was significantly slower, requiring over 1100 minutes. Interestingly, some SA for feature selection in Logistic Regression and Random Forest, achieved recall scores comparable to their classical counterparts. However, these quantum models generally exhibited lower precision and accuracy. Encouragingly, SA Random Forest showed a significant improvement in training time efficiency, offering a roughly 30% reduction compared to the classical Random Forest while maintaining high F1-score and recall.
In different classes of bitcoin addresses, the Random Forest model, as demonstrated in Figure 7, exhibited strong performance in certain categories. Specifically, it achieved excellent results for the Exchange class, with a precision of 0.91%, recall of 0.92%, and an F1-score of 0.92%. Likewise, the model performed well in the Market class, reaching a precision of 0.86%, recall of 0.85%, and an F1-score of 0.85%. For the Mixer class, the model showed particularly high recall 0.97% and also maintained strong precision 0.87% and an F1-score of 0.91%. However, the model faced challenges with the Faucet class, where it struggled to achieve a precision of 0.36%, recall of 0.20%, and an F1-score of 0.26%. In the Gambling class, the Random Forest showed a trade-off between precision 0.57% and recall 0.48%, leading to an F1-score of 0. 52%. For the Pool class, the model’s performance was more balanced, with a precision of 0.81%, recall of 0.69%, and an F1-score of 0.75%, indicating moderate success across these metrics.


The SA Random Forest model, as shown in Figure 7, demonstrated a performance similar to the Random Forest in the exchange, market and mixer classes, maintaining strong precision, recall and F1 scores. However, the SA Random Forest exhibited lower precision 0.30% and recall 0.23% for the Faucet class, further highlighting its challenges in this category compared to the standard model. In the Gambling class, the SA Random Forest also displayed a similar trade-off between precision 0.53% and recall 0.47%, leading to a marginally lower F1-score 0.50%. Additionally, the SA Random Forest demonstrated lower recall 0.63% for the Pool class compared to the Random Forest, although its precision remained relatively close at 0.78%, resulting in an overall F1-score of 0.70%.
The QA Random Forest model, as shown in Figure 9, exhibited more varied results. While it achieved moderate success in the Exchange class with a precision of 0.88%, it struggled with a much lower recall 0.63%, yielding an F1-score of 0.73%. Performance significantly declined in the Faucet class, where both precision 0.08% and recall 0.08% were extremely low. In the Gambling class, the Quantum Annealing Random Forest achieved a recall of 0.55% but at the expense of precision 0.22%, resulting in an F1-score of 0.31%. The model also struggled in the Market class, with a precision of 0.35% and recall of 0.52%, leading to an F1-score of 0.42%. Performance for the Mixer class was moderate, with a precision of 0.58%, recall of 0.62%, and an F1-score of 0.60%, and similar issues were seen in the Pool class, where the model achieved a precision of 0.30% and recall of 0.17%.


The QA BQM Random Forest model, as demonstrated in Figure 9, also had mixed results. It showed better performance than the Quantum Annealing Random Forest in the Exchange class, with a precision of 0.86% and recall of 0.64%, resulting in an F1-score of 0.74%. However, for the Faucet class, the model continued to struggle, with precision 0.10% and recall 0.38% leading to a low F1-score of 0.16%. In the Gambling class, the model’s precision 0.27% and recall 0.44% yielded an F1-score of 0.33%. The Market class saw slightly better results, with precision 0.36% and recall 0.56%, leading to an F1-score of 0.44%. In contrast, the Mixer class had a relatively good performance, with precision of 0.65%, recall of 0.76%, and an F1-score of 0.70%. The Pool class showed an improvement in precision 0.45% but lower recall 0.35%, yielding an F1-score of 0.39%.
In conclusion, both Random Forest and SA Random Forest demonstrated better overall classification performance, particularly in achieving a balance between precision and recall for the Exchange and Mixer classes. In contrast, QA Random Forest and QA BQM Random Forest struggled in certain classes, such as Gambling and Faucet, and may require further optimization to improve their precision and recall in those categories.
4.1 Confusion Matrix
We evaluate recall scores for each class within Bitcoin labeled Bitcoin addresses using several supervised machine learning classifiers, alongside SA, QA, and QA BQM quantum algorithms. These classifiers are assessed based on transaction history summaries of labeled addresses. Recall serves as a metric to quantify each model’s effectiveness in accurately identifying positive cases, where a positive case indicates the correct classification of a transaction history as belonging to the mixer class. Figures 11, Figure 11, Figure 13, Figure 13 illustrate the recall scores for each classifier across distinct classes. Notably, within the mixer class, both the Random Forest and SA Random Forest classifiers achieve the highest recall score of 97%, indicating their superior effectiveness in identifying mixer transactions. In contrast, QA and QA BQM Random Forest classifiers demonstrate lower recall performances, achieving scores of 0.62% and 0.76%, respectively.




4.2 Area under the ROC Curve
AUC reflects the probability of a classifier ranking a positive instance higher than a negative one. In Table 7, a higher AUC for the mixer class indicates a stronger ability to distinguish between mixer and othres Bitcoin labeled addresses. Notably, Random Forest, XGBoost, and SA Random Forest all achieve outstanding AUC scores of 0.99, rivaling those of exchange, market, and mining pool classes. This suggests these Bitcoin class modules exhibit stability and effectiveness.
Strikingly, among all classifiers evaluated, SA Random Forest and Random Forest emerged alongside XGBoost as the top performers, exceeding an AUC score of 0.99 for the mixer class. This exceptional performance suggests their remarkable ability to identify a high stable models of genuine mixer transactions. Particularly noteworthy, Random Forest and SA Random Forest demonstrates exceptional performance within the mixer class, achieving the best results among all models with a F1-score of 91% and an AUC of 0.99.
Method | Exchange | Faucet | Gambling | Market | Mixer | Pool |
---|---|---|---|---|---|---|
Logistic Regression | 0.89 | 0.74 | 0.72 | 0.87 | 0.92 | 0.91 |
AdaBoost | 0.89 | 0.67 | 0.37 | 0.96 | 0.74 | 0.80 |
Random Forest | 0.96 | 0.92 | 0.90 | 0.99 | 0.99 | 0.98 |
XGBoost | 0.96 | 0.93 | 0.89 | 0.99 | 0.99 | 0.98 |
LightGBM | 0.94 | 0.93 | 0.88 | 0.97 | 0.98 | 0.94 |
SVM *10%111Using only 10% of the dataset | 0.89 | 0.75 | 0.78 | 0.96 | 0.90 | 0.86 |
Neural Network *10%222Neural Network trained with 10% of the available data | 0.94 | 0.89 | 0.85 | 0.97 | 0.98 | 0.96 |
SA Logistic Regression | 0.81 | 0.82 | 0.63 | 0.84 | 0.83 | 0.85 |
SA AdaBoost | 0.85 | 0.61 | 0.37 | 0.91 | 0.69 | 0.79 |
SA Random Forest | 0.95 | 0.91 | 0.88 | 0.99 | 0.99 | 0.97 |
SA XGBoost | 0.85 | 0.93 | 0.88 | 0.99 | 0.98 | 0.97 |
SA LightGBM | 0.91 | 0.92 | 0.85 | 0.97 | 0.96 | 0.92 |
SA SVM *10%333SVM trained with 10% of the dataset using SA | 0.72 | 0.71 | 0.77 | 0.79 | 0.71 | 0.68 |
SA Neural Network *10%444Neural Network trained with 10% of the data using SA | 0.79 | 0.83 | 0.79 | 0.88 | 0.88 | 0.89 |
QA Logistic Regression | 0.76 | 0.63 | 0.74 | 0.82 | 0.82 | 0.76 |
QA AdaBoost | 0.76 | 0.52 | 0.52 | 0.79 | 0.84 | 0.72 |
QA Random Forest | 0.79 | 0.80 | 0.78 | 0.88 | 0.93 | 0.85 |
QA XGBoost | 0.79 | 0.84 | 0.79 | 0.91 | 0.90 | 0.90 |
QA LightGBM | 0.79 | 0.88 | 0.81 | 0.93 | 0.89 | 0.81 |
QA SVM *10%555SVM trained with 10% of the dataset using QA | 0.73 | 0.68 | 0.76 | 0.70 | 0.86 | 0.70 |
QA Neural Network *10%666Neural Network trained with 10% of the data using QA | 0.79 | 0.83 | 0.79 | 0.88 | 0.88 | 0.89 |
QA BQM Logistic Regression | 0.65 | 0.72 | 0.67 | 0.76 | 0.84 | 0.85 |
QA BQM AdaBoost | 0.75 | 0.70 | 0.57 | 0.77 | 0.89 | 0.78 |
QA BQM Random Forest | 0.79 | 0.80 | 0.78 | 0.88 | 0.93 | 0.85 |
QA BQM XGBoost | 0.79 | 0.84 | 0.79 | 0.91 | 0.90 | 0.90 |
QA BQM LightGBM | 0.78 | 0.83 | 0.79 | 0.89 | 0.90 | 0.89 |
QA BQM SVM *10%777SVM trained with 10% of the dataset using QA | 0.69 | 0.72 | 0.76 | 0.74 | 0.86 | 0.81 |
QA BQM Neural Network *10%888Neural Network trained with 10% of the data using BQM | 0.78 | 0.85 | 0.78 | 0.89 | 0.88 | 0.89 |
4.3 Important Features
We examine the most important features in different machine learning models, ranked based on their importance of information gain as described by Louppe et al.[24]. The random forest model shown in Table 8, the top 10 most important features of the random forest model are dominated by those related to transaction balances, including USD and BTC, and the transaction volume includes the number spent and received. This suggests that the model heavily relies on financial activity for classification. Features related to transaction patterns and moments like average and standard deviation of balances also appear within the top 20, indicating their influence on the model’s predictions.
SA Random Forest model shown in Table 9, the top 9 most important features show some overlap with the Random Forest model, with features related to transaction balances including USD and BTC, spending include USD and BTC appearing again. However, the SA Random Forest model seems to place less emphasis on transaction volume include number spent and received compared to the classical model. Interestingly, a new feature, lifetime, emerges as important in the SA Random Forest model, suggesting that the model might benefit from understanding the overall duration of user activity.
All Features
Feature Name
Feature Type
Extra Stats
Extra Stats
Extra Stats
Extra Stats
Extra Stats
Basic Stats
Extra Stats
Extra Stats
Extra Stats
TX Pattern
The top 11 to 20 features.
Feature Name
Feature Type
Basic Stats
TX Pattern
Moments
Moments
Moments
Moments
Extra Stats
Extra Stats
Extra Stats
Extra Stats
SA Model Features
Feature Name
Feature Type
Extra Stats
Extra Stats
Extra Stats
Extra Stats
Extra Stats
lifetime
Extra Stats
Moments
Extra Stats
Extra Stats
QA Random Forest model shown in Table 10, the top 6 most important features highlight the significance of received transactions and various statistical moments. In particular, the feature related to the USD received reappears, indicating its importance across models. Basic statistics such as transaction frequency also emerge, alongside higher moments that may provide deeper insights into the distribution of transaction amounts. Additionally, transaction patterns are emphasized through features that capture output behavior and overall transaction trends, suggesting that a holistic view of user activity is essential in this model.
QA Model Features
Feature Name
Feature Type
Extra Stats
Basic Stats
Moments
Moments
TX Pattern
TX Pattern
QA BQM Model Features.
Feature Name
Feature Type
Extra Stats
Basic Stats
lifetime
Extra Stats
Moments
TX Pattern
In the QA BQM Random Forest model, the top 5 important features reveal a continued emphasis on balance statistics and lifetime as significant factors. The average balance in BTC demonstrates its relevance, along with the transaction frequency as a basic statistic. The inclusion of the lifetime feature further aligns with the trend observed in other models, indicating its potential impact on predicting user behavior. Moments related to payback and overall transaction patterns are also included, reinforcing the model’s focus on understanding transactional dynamics over time.
5 Conclusion
In this work, we propose a novel framework for identifying Bitcoin mixer addresses by integrating QIFS with transaction history summaries and supervised machine learning classifiers. Our approach utilizes both existing statistical measures and newly introduced Transaction Pattern features, which significantly contribute to improved classification accuracy, particularly with the Random Forest model.
The transaction history summaries encompass essential information about Bitcoin addresses, including basic statistics, extended statistics, moments, and transaction patterns. The basic statistical features build on previous work by Toyoda et al. [25], while the extended statistics and moments were inspired by Lin et al. [26]. The Transaction Patterns features, informed by the MIOT model, were crucial in capturing mixer address activity.
Our experimental evaluation employed a range of classification algorithms, including Logistic Regression, AdaBoost-SAMME, Random Forest, XGBoost, LightGBM, SVM, and Neural Network, using the Scikit-learn Python library. To optimize the classifiers, we applied 10-fold cross-validation. The experiment results demonstrated that Random Forest and SA Random Forest were the best-performing models for the mixer class, achieving F1 scores of 92
We also evaluated the precision, recall, and F1 scores for six different address categories: Exchange, Faucet, Gambling, Market, Mixer, and Pool. Both Random Forest and SA Random Forest achieved high performance in the Mixer class, while QA and QA BQM-based classifiers showed lower effectiveness, indicating room for improvement.
Overall, the combination of QIFS, particularly with SA and Random Forest, demonstrates substantial potential for enhancing the efficiency and accuracy of Bitcoin mixer identification. This approach not only achieves competitive classification metrics but also reduces computational overhead, showcasing the viability of quantum-inspired feature selection in financial transaction analysis tasks.
5.1 Future Work
The current classification methodology is affected by data imbalance, as highlighted in Table 1. This imbalance often leads to increased false negatives and reduced sensitivity for minority classes, ultimately degrading the model’s overall reliability and robustness. To mitigate this limitation, we plan to integrate the Synthetic Minority Oversampling Technique (SMOTE) [27] prior to feature summarization. This technique is anticipated to enhance the recall for underrepresented classes, resulting in a more balanced and robust classification model. Additionally, incorporating cost-sensitive learning during the training phase of Random Forest and other decision tree-based algorithms could further improve model performance, as indicated by Seliya et al. [28].
We also propose conducting an ablation study to evaluate the contribution of each feature set to the overall model performance. Understanding the impact of individual feature sets will inform future improvements by identifying which features are most crucial for classification accuracy and which are redundant, allowing for a more efficient and streamlined model. This study will involve systematically removing different feature groups—such as Transaction Patterns—and assessing the resultant impact on classification accuracy. By quantifying the influence of individual feature sets and examining the effectiveness of quantum-inspired SA in feature selection, we aim to gain a deeper understanding of the factors driving performance improvements.
Moreover, the QIFS methodology demonstrates substantial potential beyond its current scope, which is currently focused on Bitcoin mixer address classification. By optimizing feature selection, QIFS can significantly reduce computational overhead for classical machine learning models, thereby facilitating efficient processing across a variety of domains where feature selection plays a critical role. This positions QIFS as a viable approach for broader applications in fields such as cybersecurity, healthcare, and predictive maintenance, where effective feature selection is pivotal for improving model efficiency and overall performance. For instance, in cybersecurity, QIFS can be used to identify key features for detecting network intrusions, enhancing the precision and speed of intrusion detection systems. In healthcare, QIFS could optimize the selection of biomarkers from high-dimensional genomic data, thus improving the accuracy of diagnostic models. In predictive maintenance, QIFS can help identify critical indicators of machinery health, allowing for proactive interventions and minimizing downtime.
Acknowledgements We acknowledge support from the National Science and Technology Council, Taiwan, under Grants NSTC 112-2119-M-033-001, by the research project Applications of quantum computing in optimization and finances.
References
- \bibcommenthead
- Brush [1967] Brush, S.G.: History of the lenz-ising model. Reviews of Modern Physics 39, 883–893 (1967)
- Chang et al. [2024] Chang, Y.-J., Nien, C.-F., Huang, K.-P., Zhang, Y.-T., Cho, C.-H., Chang, C.-R.: Quantum computing for optimization with ising machine. IEEE Nanotechnology Magazine 18(3), 15–22 (2024) https://doi.org/10.1109/MNANO.2024.3378485
- Mandal et al. [2021] Mandal, A.K., Panday, M., Biswas, A., Goswami, S., Chakrabarti, A., Chakraborty, B.: An approach of feature subset selection using simulated quantum annealing. In: Data Management, Analytics and Innovation: Proceedings of ICDMAI 2020, Volume 1, pp. 133–146 (2021). Springer
- Nakamoto [2008] Nakamoto, S.: Bitcoin: A peer-to-peer electronic cash system. Decentralized business review (2008)
- Baur and Dimpfl [2021] Baur, D.G., Dimpfl, T.: The volatility of bitcoin and its role as a medium of exchange and a store of value. Empirical Economics 61(5), 2663–2683 (2021)
- Rathore et al. [2022] Rathore, M.M., Chaurasia, S., Shukla, D.: Mixers detection in bitcoin network: a step towards detecting money laundering in crypto-currencies. In: 2022 IEEE International Conference on Big Data (Big Data), pp. 5775–5782 (2022). IEEE
- Liu et al. [2021] Liu, M., Chen, H., Yan, J.: Detecting roles of money laundering in bitcoin mixing transactions: A goal modeling and mining framework. Frontiers in Physics 9, 665399 (2021)
- Ashfaq et al. [2022] Ashfaq, T., Khalid, R., Yahaya, A.S., Aslam, S., Azar, A.T., Alsafari, S., Hameed, I.A.: A machine learning and blockchain based efficient fraud detection mechanism. Sensors 22(19), 7162 (2022)
- Wu et al. [2021] Wu, L., Hu, Y., Zhou, Y., Wang, H., Luo, X., Wang, Z., Zhang, F., Ren, K.: Towards understanding and demystifying bitcoin mixing services. In: Proceedings of the Web Conference 2021, pp. 33–44 (2021)
- Chang et al. [2023] Chang, Y.-J., Sie, M.-F., Liao, S.-W., Chang, C.-R.: The prospects of quantum computing for quantitative finance and beyond. IEEE Nanotechnology Magazine 17(2), 31–37 (2023) https://doi.org/10.1109/MNANO.2023.3249501
- Breiman [2001] Breiman, L.: Random forests. Machine learning 45, 5–32 (2001)
- Chen and Guestrin [2016] Chen, T., Guestrin, C.: Xgboost: A scalable tree boosting system. In: Proceedings of the 22nd Acm Sigkdd International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
- Zweig and Campbell [1993] Zweig, M.H., Campbell, G.: Receiver-operating characteristic (roc) plots: a fundamental evaluation tool in clinical medicine. Clinical chemistry 39(4), 561–577 (1993)
- Milne et al. [2017] Milne, A., Rounds, M., Goddard, P.: Optimal Feature Selection in Credit Scoring and Classification Using a Quantum Annealer. https://1qbit.com. White Paper, 1QB Information Technologies (2017)
- Hauke et al. [2019] Hauke, P., Katzgraber, H., Lechner, W., Nishimori, H., Oliver, W.: Perspectives of quantum annealing: methods and implementations. Reports on Progress in Physics 83 (2019) https://doi.org/10.1088/1361-6633/ab85b8
- Kirkpatrick et al. [1983] Kirkpatrick, S., Gelatt Jr, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
- Inc. [2023] Inc., D.-W.S.: dwave-neal: A Python Simulated Annealing Sampler Library (2023). https://pypi.org/project/dwave-neal/
- LaValley [2008] LaValley, M.P.: Logistic regression. Circulation 117(18), 2395–2399 (2008)
- Freund and Schapire [1997] Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences 55(1), 119–139 (1997)
- Hastie et al. [2009] Hastie, T., Rosset, S., Zhu, J., Zou, H.: Multi-class adaboost. Statistics and its Interface 2(3), 349–360 (2009)
- Ke et al. [2017] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.-Y.: Lightgbm: A highly efficient gradient boosting decision tree. In: Advances in Neural Information Processing Systems, pp. 3146–3154 (2017)
- Cortes and Vapnik [1995] Cortes, C., Vapnik, V.: Support-vector networks. Machine learning 20, 273–297 (1995)
- Pedregosa et al. [2011] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: Machine learning in python. the Journal of machine Learning research 12, 2825–2830 (2011)
- Louppe et al. [2013] Louppe, G., Wehenkel, L., Sutera, A., Geurts, P.: Understanding variable importances in forests of randomized trees. In: Advances in Neural Information Processing Systems, pp. 431–439 (2013)
- Toyoda et al. [2018] Toyoda, K., Ohtsuki, T., Mathiopoulos, P.T.: Multi-class bitcoin-enabled service identification based on transaction history summarization. In: 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pp. 1153–1160 (2018). IEEE
- Lin et al. [2019] Lin, Y.-J., Wu, P.-W., Hsu, C.-H., Tu, I.-P., Liao, S.-w.: An evaluation of bitcoin address classification based on transaction history summarization. In: 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), pp. 302–310 (2019). IEEE
- Chawla et al. [2002] Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research 16, 321–357 (2002)
- Seliya and Khoshgoftaar [2011] Seliya, N., Khoshgoftaar, T.: The use of decision trees for cost‐sensitive classification: an empirical study in software quality prediction. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (2011) https://doi.org/10.1002/widm.38