
CyberLearning: Effectiveness Analysis of Machine Learning Security Modeling to Detect Cyber-Anomalies and Multi-Attacks

Iqbal H. Sarker1,2∗ 1Department of Computer Science and Engineering,
Chittagong University of Engineering & Technology,
Chittagong-4349, Bangladesh.
2Swinburne University of Technology,
Melbourne, VIC-3122, Australia.
*Corresponding email: [email protected]
ORCID iD: https://orcid.org/0000-0003-1740-5517
Abstract

Detecting cyber-anomalies and attacks is becoming a rising concern in the domain of cybersecurity. Artificial intelligence, particularly machine learning techniques, can be used to tackle these issues. However, the effectiveness of a learning-based security model may vary depending on the security features and the data characteristics. In this paper, we present “CyberLearning”, a machine learning-based cybersecurity modeling approach with correlated-feature selection, together with a comprehensive empirical analysis of the effectiveness of various machine learning-based security models. In our CyberLearning modeling, we consider a binary classification model for detecting anomalies and a multi-class classification model for detecting various types of cyber-attacks. To build the security model, we first employ ten popular machine learning classification techniques: naive Bayes, logistic regression, stochastic gradient descent, K-nearest neighbors, support vector machine, decision tree, random forest, adaptive boosting, eXtreme gradient boosting, and linear discriminant analysis. We then present an artificial neural network-based security model with multiple hidden layers. The effectiveness of these learning-based security models is examined through a range of experiments on two of the most popular security datasets, UNSW-NB15 and NSL-KDD. Overall, through our experimental analysis and findings, this paper aims to serve as a reference point for data-driven security modeling in the context of cybersecurity.

keywords:
cybersecurity; machine learning; deep learning; classification; feature selection; anomaly detection; cyber-attacks; security intelligence; cyber data analytics; intelligent systems.
Journal: Internet of Things (Elsevier)

1 Introduction

In recent years, the demand for cybersecurity and for protection against cyber-anomalies and various types of attacks, such as unauthorized access, denial-of-service (DoS), botnets, malware, or worms, has been ever increasing. Such anomalies can lead to irreparable damage and financial losses in large-scale computer networks [1] [2]. For example, a single ransomware outbreak in May 2017 caused tremendous losses to many organizations and sectors, including banking, medical care, electricity, and universities, amounting to a loss of 8 billion dollars [3]. In the domain of cybersecurity, such security breaches or intrusions have become a common issue when securing a cyber-system or an Internet of Things (IoT) system. Although various traditional methods, such as firewalls and encryption, are designed to handle Internet-based cyber-attacks, an intelligent system that effectively detects such anomalies or attacks is key to tackling these issues. Thus, in this paper, we mainly focus on artificial intelligence, particularly the applicability of machine learning security modeling, which could be more effective due to its ability to learn automatically from training security data.

Developing machine learning-based security models that analyze various cyber-attacks or anomalies, and eventually detect or predict the threats, can enable intelligent security services [4]. Typically, a detection model either handles multiple associated cyber-attacks, i.e., a “multi-class” problem, or detects anomalies, i.e., a “binary-class” problem. Several recent studies have been conducted in this area, for example, to detect botnet attacks [5], to analyze attack and anomaly detection for IoT sensors at IoT sites [6], to classify attacks for building an intrusion detection system [7], and to detect anomalous network connections by classifying normal traffic and attacks [8]. Although these studies use several machine learning techniques for different purposes, they are limited in analyzing the variations in the significance of the security features, or their empirical analyses cover only a small range of techniques for security intelligence modeling. These studies are discussed briefly in Section 2 and summarized in Table 1. Moreover, in the case of unknown attacks, abnormal behaviors that differ from normal traffic are considered anomalies, and the relevant model can be used in many security solutions [1] [9]. Thus, classifying attacks into several well-known classes, such as DoS, botnet, malware, and worms, as well as separating anomalies caused by unknown attacks from normal traffic, is essential for intelligent modeling in the area of cybersecurity.

Different machine learning models may perform differently on the issues mentioned above, according to their ability to learn from security data. The reason is that the effectiveness of a learning-based security model may vary depending on the significance of the associated security features and the data characteristics. In real-world scenarios, cybersecurity issues may involve a huge number of security features, several known or unknown attack classes, or anomalies. Thus, an effective feature selection technique and a robust classification model are usually central to the construction of an intelligent intrusion detection system. Various types of machine learning techniques and their applicability in the area of cybersecurity have been discussed briefly in Sarker et al. [9]; however, a detailed empirical analysis that takes the above-mentioned issues into account is needed to make intelligent decisions in this area. Therefore, we aim to present a comprehensive empirical analysis of the effectiveness of various machine learning-based security models, to support intelligent decisions in such diverse real-world scenarios.

To address the issues mentioned above, in this paper, we present “CyberLearning”, a machine learning-based security modeling approach that takes into account the significance of the security features, along with a relevant experimental analysis. In our analysis, we consider a binary classification model for detecting anomalies and a multi-class classification model for detecting various types of cyber-attacks, such as DoS, Backdoor, and Worms. In a binary classification model, the given security dataset is categorized into two classes, ‘normal’ and ‘anomaly’, whereas in a multi-class classification model, the dataset is categorized into the several attack classes mentioned above. For modeling, we first employ ten popular machine learning classification techniques: Naive Bayes (NB), Logistic Regression (LR), Stochastic Gradient Descent (SGD), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), and Linear Discriminant Analysis (LDA), as well as an Artificial Neural Network (ANN)-based model of the kind frequently used in deep learning [10] [11]. For selecting features, we use the feature correlation values, and then build the resultant security model on the selected features, considering both model accuracy and model complexity. The main idea is that a learning-based model examines the behavior of the network from the data, finds the security patterns that profile normal behavior, and thus detects anomalies or associated attacks. The effectiveness of these learning-based security models is examined through a range of experiments on two of the most popular security datasets, UNSW-NB15 [2] and NSL-KDD [12].

The contributions of this work can be summarized as follows.

  • We first highlight the importance of security features in machine learning security modeling for detecting cyber-anomalies and multi-attacks. We therefore adopt a correlated-feature selection approach to remove insignificant or irrelevant security features, which makes the security model lightweight and more applicable.

  • We present a binary classification model for detecting cyber-anomalies or unknown attacks, where the security model classifies the data into two classes, ‘normal’ and ‘anomaly’. We also analyze the effectiveness of various popular machine learning classification models in detecting such anomalies.

  • We present a multi-class classification model for detecting various cyber-attacks, such as DoS, Backdoor, and Worms, where the security model classifies the data into these attack classes. We also analyze the effectiveness of various popular machine learning classification models in detecting such cyber-attacks.

  • Finally, we conduct a range of experiments and present a comprehensive empirical analysis of the effectiveness of various machine learning classification-based security models on unseen test cases.

The rest of the paper is organized as follows. Section 2 provides the background and related work of our study. In Section 3, we present our machine learning-based security modeling, which takes into account the significance of the security features. We evaluate the resultant security model and report the experimental results in Section 4. In Section 5, several key findings of our analysis are summarized. Finally, Section 6 concludes the paper and highlights future work.

2 Background and Related Work

A considerable amount of research has been done in the area of cybersecurity on detecting cyber-anomalies, attacks, or intrusions. In the cyber industry, both signature-based intrusion detection systems (SIDS) and anomaly-based intrusion detection systems (AIDS) are well known for detecting and preventing cyber-attacks [9]. SIDS relies on known signatures of attacks [13]. AIDS, on the other hand, has an advantage over SIDS in identifying previously unseen threats, including unknown or zero-day attacks [14] [15]. Although association analysis is popular in machine learning for building rule-based intelligent systems [16] [17] [18], it might not be effective for detecting anomalies or cyber-attacks, due to its redundant rule generation and its complexity with high-dimensional security features. Thus, to achieve our goal, in this work we primarily focus on machine learning classification models [19] for security modeling, because of their ability to learn automatically from security data.

Several machine learning techniques have been used for various purposes. For instance, Li et al. [20] use an SVM classifier to classify different types of attacks, such as DoS, Probe or Scan, U2R, and R2L, as well as regular traffic, on the widely used KDD Cup’99 dataset. Similarly, Amiri et al. [21], Wagner et al. [22], Kotpalliwar et al. [23], Saxena et al. [24], Pervez et al. [25], Shon et al. [26], Kokila et al. [27], and Raman et al. [7] use SVM classifiers in their studies to detect attacks. In addition to SVM, several other classifiers are used to detect intrusions or attacks. For example, Kruegel et al. [28] use a probability-based Bayesian network to identify events when processing TCP/IP packets. Benferhat et al. [29] build a DoS intrusion detector using a Bayesian network. Similarly, Panda et al. [30] and Koc et al. [31] use the naive Bayes classifier for detecting attacks in their systems.

Several studies [32] [33] have used a logistic regression model to classify malicious traffic and intrusions. KNN, an instance-based learning algorithm, is another common machine learning method, in which a data point is classified according to its k nearest neighbors. Vishwakarma et al. [34], Shapoorifard et al. [35], and Sharifi et al. [36] use the KNN classification technique in their studies to build intrusion detection systems. The authors of [37] use a neural classifier, and those of [38] use the wavelet transform, for anomaly detection, particularly of DoS attacks. A significant body of research in the domain of cybersecurity, such as Relan et al. [39], Rai et al. [40], Ingre et al. [41], Malik et al. [8], Puthran et al. [42], Moon et al. [43], Balogun et al. [44], and Sangkatsanee et al. [45], uses the DT classification approach to build intrusion detection systems. To detect anomalies and address IoT cybersecurity threats in a smart city, Alrashdi et al. [46] use RF learning, consisting of multiple decision trees, in their binary classification model. Mazini et al. [47] use the AdaBoost approach with feature selection to build an anomaly network-based intrusion detection system.

A machine learning security model for detecting anomalies, based on the decision tree classification approach with feature selection, has been presented in [1]; it is effective both in prediction accuracy and in reducing the feature dimensions. Recently, a machine learning-based botnet attack detection framework with a sequential detection architecture has been presented in [5], where ANN, DT, and NB classification techniques are used. Hasan et al. [6] perform an attack detection analysis at IoT sites to develop a smart, secure, and reliable IoT-based infrastructure. Although several machine learning techniques, such as SVM, DT, RF, LR, and ANN, are used, the analysis is limited to a small number of security features for detecting different types of attacks. Moreover, these studies do not address the variations in the significance of the security features, which could be crucial for building an effective security model using machine learning techniques.

Table 1: A summary of machine learning-based security models for detecting cyber-anomalies and attacks
Purpose | Techniques Used | Type | Reference
To detect IoT botnet attacks | ANN, DT, NB, and feature selection | Multiclass | Soe et al. [5] (2020)
To classify attacks for building an efficient intrusion detection system | SVM and feature selection | Multiclass | Raman et al. [7] (2019)
To design a host-based intrusion detection system | LR and feature selection | Multiclass | Besharati et al. [33] (2019)
To detect attacks in the IoT environment | LR, SVM, DT, RF, and ANN | Multiclass | Hasan et al. [6] (2019)
To build an anomaly network-based intrusion detection system | AdaBoost and feature selection | Multiclass | Mazini et al. [47] (2019)
To detect attacks for establishing an efficient intrusion detection system | SVM and feature removal | Multiclass | Li et al. [20] (2012)
To classify network events as normal or attack events | NB and feature selection | Multiclass | Koc et al. [31] (2012)
To detect intrusions into the cloud system | KNN and feature selection | Multiclass | Sharifi et al. [36] (2015)
To detect anomalous network connections and classify normal traffic and attacks | DT and pruning | Binary | Malik et al. [8] (2018)
To detect anomalies and address IoT cybersecurity threats in a smart city | RF learning | Binary | Alrashdi et al. [46] (2019)
To detect anomalies in a network and classify normal traffic and attacks | DT and feature selection | Binary | Sarker et al. [1] (2020)
To introduce cybersecurity data science, highlighting cyber-anomalies and attacks | Overall machine learning perspective | - | Sarker et al. [9] (2020)
To detect cyber-anomalies and multi-attacks | NB, LDA, KNN, XGBoost, DT, RF, SVM, SGD, AdaBoost, LR, ANN, and feature selection | Binary and multiclass | CyberLearning (our analysis)

In real-world scenarios, cybersecurity issues may involve a huge number of security features, and the effectiveness of a learning-based security model may vary depending on the significance of the associated security features and the data characteristics. Various types of machine learning techniques and their applicability in the area of cybersecurity have been discussed in Sarker et al. [9]; however, a detailed empirical analysis is needed to make intelligent decisions in this area. Unlike the above approaches, in this paper, we present “CyberLearning”, a machine learning-based cybersecurity modeling approach with correlated-feature selection according to the features’ significance in modeling, together with a comprehensive empirical analysis of the effectiveness of various machine learning-based security models. While building the security models, we consider both a binary classification model for detecting anomalies and a multi-class classification model for detecting multi-attacks, to give readers a comprehensive view of the area. Table 1 also summarizes the machine learning-based security models most relevant to the scope of our study.

3 Materials and Methods

In this section, we present our machine learning security model for detecting cyber-anomalies and attacks. It involves several processing steps: exploring the security dataset, preparing the raw data, determining the correlation and ranking of features, and constructing a security model. We briefly address these steps in the following subsections.

3.1 Exploring Security Dataset

Security datasets usually consist of a series of information records containing several security features and relevant details that can be used to construct a security model [9] for detecting anomalies. Thus, to detect malicious activity or anomalies, it is important to understand the nature of raw cybersecurity data and the trends of security incidents. In this work, we use the popular UNSW-NB15 [2] and NSL-KDD [12] security datasets to build the data-driven security model and to analyze its effectiveness.

Table 2: UNSW-NB15 dataset features with value types.
Feature Name | Value Type | Feature Name | Value Type
srcip | Nominal | sport | Integer
dstip | Nominal | dsport | Integer
proto | Nominal | state | Nominal
dur | Float | sbytes | Integer
dbytes | Integer | sttl | Integer
dttl | Integer | sloss | Integer
dloss | Integer | service | Nominal
Sload | Float | Dload | Float
Spkts | Integer | Dpkts | Integer
swin | Integer | dwin | Integer
stcpb | Integer | dtcpb | Integer
smeansz | Integer | dmeansz | Integer
trans_depth | Integer | res_bdy_len | Integer
Sjit | Float | Djit | Float
Stime | Timestamp | Ltime | Timestamp
Sintpkt | Float | Dintpkt | Float
tcprtt | Float | synack | Float
ackdat | Float | is_sm_ips_ports | Binary
ct_state_ttl | Integer | ct_flw_http_mthd | Integer
is_ftp_login | Binary | ct_ftp_cmd | Integer
ct_srv_src | Integer | ct_srv_dst | Integer
ct_dst_ltm | Integer | ct_src_ltm | Integer
ct_src_dport_ltm | Integer | ct_dst_sport_ltm | Integer
ct_dst_src_ltm | Integer | |

The UNSW-NB15 [2] dataset includes nine types of attacks: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. Together, its training and testing sets contain 257673 instances with 45 features. The NSL-KDD [12] dataset, on the other hand, contains Denial of Service (DoS), User to Root (U2R), Remote to Local (R2L), and Probing attacks. Its raw data source consists of 494020 instances with 41 security features, which are taken into account in our experimental analysis. The features in a dataset can be of various types. For instance, Table 2 shows the security features of the UNSW-NB15 dataset, which are of different value types. Thus, effectively analyzing these features and building a security model for detecting the anomalies and multi-attacks mentioned above is key to our analysis.

3.2 Security Data Pre-Processing

Data preparation includes labeling anomalies and attacks, encoding features, scaling features, and splitting the data, according to the characteristics of the given dataset.

  • Anomaly and Attacks: As mentioned earlier, the UNSW-NB15 [2] dataset contains nine types of attacks. In the binary classification model, all of these attacks are treated as anomalies, while in the multi-class classification model, each attack type is treated as a separate class. Similarly, the four types of attacks in the NSL-KDD [12] dataset, DoS, U2R, R2L, and Probing, are treated as anomalies and used in the corresponding classification models.

  • Feature encoding: As shown in Table 2, the UNSW-NB15 [2] dataset contains several feature types, such as nominal, integer, float, timestamp, and binary values. Thus, to fit the data to the security model, we first convert all the nominal-valued features into numeric values. Although one-hot encoding is a popular technique, we use label encoding in this work, because one-hot encoding significantly increases the number of feature dimensions [1]. Label encoding, on the other hand, maps each nominal value directly to a numeric code that can be used to fit a machine learning classification model. Similarly, the features in the NSL-KDD [12] dataset are encoded to build the resultant security model.

    Figure 1: Security feature ‘sbytes’
    Figure 2: Security feature ‘synack’
  • Feature scaling: Feature scaling, also known as data normalization, is a data pre-processing task. The security features in a dataset may not be identical in terms of data distribution, which varies from feature to feature. For instance, Figure 1 and Figure 2 show the data distributions of two different features, sbytes and synack respectively, in the UNSW-NB15 [2] dataset. According to Figure 1 and Figure 2, the values are very low for some data points and much higher for others. Thus, we use standard scaling (StandardScaler), which transforms the values of each feature to have a mean of 0 and a standard deviation of 1.

  • Data Splitting: As we aim to build learning-based security models, data splitting is an important part of the process, because a poor split can make a security model appear better than it actually is. Thus, for fair model building and evaluation, we take the data from the data sources as input and split them using the k-fold cross-validation technique [48]. In k-fold cross-validation, we first randomly partition the input data into k mutually exclusive subsets, or “folds”, d_1, d_2, ..., d_k, each containing approximately the same number of data instances. The process requires k iterations. In each iteration i, the data instances of all folds except d_i are used as the training dataset to build the resultant security model, and d_i is used as the testing dataset for evaluation. Finally, the average result over the k iterations is taken as the outcome of the model.
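The three pre-processing steps above (label encoding, standard scaling, and k-fold splitting) can be sketched with NumPy as follows. This is a minimal illustration under our own assumptions, not the exact pipeline used in this work: the helper names `label_encode`, `standard_scale`, and `k_fold_indices` are hypothetical, `X` is assumed to be two-dimensional, and category codes are assigned in sorted order of the nominal values (scikit-learn's `LabelEncoder` also sorts its classes).

```python
import numpy as np


def label_encode(column):
    """Map each distinct nominal value to an integer code, in sorted order."""
    categories = sorted(set(column))
    mapping = {value: code for code, value in enumerate(categories)}
    return np.array([mapping[value] for value in column]), mapping


def standard_scale(X):
    """Rescale each column to mean 0 and standard deviation 1: z = (x - mean) / std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero; constant columns become all zeros
    return (X - mean) / std


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for k mutually exclusive, roughly equal folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)  # random partition d_1..d_k
    for i in range(k):
        test_idx = folds[i]  # fold d_i is held out for testing
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


# Example: encode a nominal 'proto' column, scale two numeric columns, make 5 folds.
codes, mapping = label_encode(["tcp", "udp", "tcp", "arp"])
print(codes.tolist())  # [1, 2, 1, 0]
scaled = standard_scale([[1.0, 10.0], [3.0, 10.0]])
for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print(len(train_idx), len(test_idx))  # 8 2 for each fold
```

In practice, one may prefer to fit the scaling statistics on each training fold only, so that no information from the test fold leaks into training; the text above does not specify this detail.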

3.3 Modeling Techniques

In our CyberLearning modeling, we take into account the impact of security features while building the security model. In the following, we present how we rank the features for selection, the various machine learning algorithms employed to build the model, and the effectiveness analysis within the scope of our study.

3.3.1 Feature Ranking and Selection

Feature selection in the cybersecurity domain can provide a better understanding of the security data, simplify the security model by reducing its computational cost or complexity, and improve the outcomes of a machine learning-based model. A security dataset may be high-dimensional: some features may be highly correlated with anomalies or attacks, while others have little or no correlation at all. Thus, not all the security features in a given dataset necessarily carry significant information for building a machine learning classification-based security model. In addition, due to the over-fitting issue [1] [49], further processing with all the security features could lead to poor results. Security feature selection is therefore required not only to reduce the computational cost but also to create a more efficient security model with a higher accuracy rate. In this sense, security feature selection is a method for filtering out, from the given security dataset, those features that are less significant, redundant, or have no impact on modeling.

To achieve this goal, we first calculate the correlation of the security features using the Pearson correlation coefficient, and rank them accordingly. Correlation-based feature selection rests on the following hypothesis: “Good feature subsets contain features highly correlated with the target class, yet uncorrelated or less correlated to each other”. If X and Y represent two random variables, then the correlation coefficient between X and Y is defined as [48]:

r(X,Y)=\frac{\sum_{i=1}^{n}(X_{i}-\bar{X})(Y_{i}-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}\,\sqrt{\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2}}} \qquad (1)

In statistics, Equation 1 measures the strength of the linear relationship between the two variables X and Y, where n is the number of data instances and \bar{X} and \bar{Y} are the sample means. Its value ranges from -1 to 1: values close to -1 or 1 indicate a strong linear relationship, and values close to 0 indicate little or no linear relationship. In our security modeling, the stronger the correlation between a security feature and the target class, the more significant the feature is for building the resultant learning-based security model. For instance, a value of 1 (the maximum) means that the outcome of the learning-based security model is perfectly linearly associated with that security feature, and a value of 0 means that there is no linear association between the feature and the outcome. Thus, within the scope of our analysis, we calculate the correlation coefficient of each security feature for both our binary classification model for detecting anomalies and our multi-class classification model for detecting various types of attacks.
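Equation 1 and the correlation-based ranking can be sketched as follows. This is a minimal NumPy illustration, not the exact procedure of this work: the function names and the `threshold` cut-off are hypothetical, and we rank by the absolute value |r| on the assumption that a strong negative correlation is as informative as a strong positive one. For the multi-class model, the target must first be label-encoded, so r then also depends on the (arbitrary) order of the class codes.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient r(X, Y) of Equation 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))
    return 0.0 if denom == 0 else float(np.sum(dx * dy) / denom)  # constant input -> 0


def rank_features(X, y, names):
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    X = np.asarray(X, dtype=float)
    scores = [(name, pearson_r(X[:, j], y)) for j, name in enumerate(names)]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


def select_features(ranking, threshold):
    """Keep the features whose absolute correlation reaches the threshold."""
    return [name for name, r in ranking if abs(r) >= threshold]


# Toy data: 'f1' rises with the label, 'f2' falls with it, 'f3' is unrelated.
X = [[1, 4, 1], [2, 3, 0], [3, 2, 0], [4, 1, 1]]
y = [0, 0, 1, 1]  # 0 = normal, 1 = anomaly
ranking = rank_features(X, y, ["f1", "f2", "f3"])
print(select_features(ranking, threshold=0.5))  # ['f1', 'f2']
```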

3.4 Machine Learning Algorithms and Parameters

In this section, we describe how various machine learning classification techniques, as well as ANN-based modeling with multiple hidden layers, are used in our security modeling.

3.4.1 Naive Bayes (NB)

Naive Bayes (NB) [50] is a common classification technique in machine learning and data science. It is based on Bayes’ theorem, which describes the probability of an event given prior knowledge of conditions related to that event. Let X = {x_1, x_2, ..., x_n} be a security feature vector of size n, and let c be a class variable that represents the cyber-attacks or anomalies. The probability P is then calculated using the following equations [48], where Equation 3 follows from Equation 2 under the naive assumption that the features are independent of one another:

P(c|X)=\frac{P(X|c)\,P(c)}{P(X)} \qquad (2)
P(c|x_{1},x_{2},\ldots,x_{n})=\frac{P(x_{1}|c)\,P(x_{2}|c)\cdots P(x_{n}|c)\,P(c)}{P(x_{1})\,P(x_{2})\cdots P(x_{n})} \qquad (3)

To build a security model, we use the Gaussian naive Bayes classifier [51], which assumes that, within each class, the values of each security feature follow a Gaussian (normal) distribution. The prior probabilities of the classes in our security modeling are estimated from the data. For calculation stability, a portion of the largest variance among all security features is added to the variances as smoothing.
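A minimal NumPy sketch of the Gaussian naive Bayes classifier described above follows. It is an illustration under our own assumptions, not the implementation used in this work: the class name `GaussianNBSketch` is hypothetical, the default `var_smoothing=1e-9` mirrors the default of scikit-learn's `GaussianNB` (the text does not state the value used), and the denominator of Equations 2 and 3 is dropped because it is the same for every class and does not change which class has the highest posterior.

```python
import numpy as np


class GaussianNBSketch:
    """Gaussian naive Bayes with priors estimated from the class frequencies."""

    def __init__(self, var_smoothing=1e-9):
        # Portion of the largest feature variance added to every variance.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        epsilon = self.var_smoothing * X.var(axis=0).max()
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + epsilon
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # log P(c) + sum_j log N(x_j | mean_cj, var_cj); shape (n_samples, n_classes)
        diff = X[:, None, :] - self.means_[None, :, :]
        log_lik = -0.5 * (np.log(2.0 * np.pi * self.vars_)[None, :, :]
                          + diff ** 2 / self.vars_[None, :, :]).sum(axis=2)
        return self.classes_[np.argmax(self.log_priors_ + log_lik, axis=1)]


# Toy 1-D example: small values are 'normal', large values are 'anomaly'.
X = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]]
y = ["normal"] * 3 + ["anomaly"] * 3
model = GaussianNBSketch().fit(X, y)
print(model.predict([[1.1], [4.9]]).tolist())  # ['normal', 'anomaly']
```

Working in log space avoids the numerical underflow that multiplying many small per-feature likelihoods in Equation 3 would otherwise cause.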

3.4.2 Linear Discriminant Analysis (LDA)

In machine learning, Linear Discriminant Analysis (LDA) [48] is another probability-based method, which finds a linear combination of security features that separates the anomaly or attack classes. The method is a generalization of Fisher’s linear discriminant; it projects a given security dataset onto a lower-dimensional space, i.e., it performs dimensionality reduction, which lowers the model complexity and the computational cost of the resultant security model. Good class separability in this lower-dimensional space can also help to avoid overfitting. The resulting linear combination can therefore be used as a linear classifier or for reducing the dimensionality of the security features before anomaly or attack classification. The standard LDA model fits a Gaussian density to each class, such as ‘anomaly’ or ‘normal’ or the various types of attacks, assuming that all classes share the same covariance matrix [51]. For modeling, LDA also uses Bayes’ theorem, mentioned above, to estimate the probability that a new input instance belongs to each anomaly or attack class, and predicts the class with the highest probability. In our security modeling, we use singular value decomposition (SVD) as the solver, with no shrinkage. The prior probabilities of the classes are inferred from the given security data.
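The LDA decision rule described above (class-specific means, one shared covariance matrix, and class priors estimated from the data) can be sketched in NumPy as follows. This is an illustration under our own assumptions rather than the solver used in this work: the function names are hypothetical, and instead of a dedicated SVD solver it inverts the pooled covariance with `np.linalg.pinv`, which NumPy computes via SVD. Each class c receives the linear discriminant score x^T S^{-1} μ_c - 0.5 μ_c^T S^{-1} μ_c + log π_c, which equals the log-posterior up to a term that is the same for all classes.

```python
import numpy as np


def fit_lda(X, y):
    """Fit class means, a pooled (shared) covariance matrix, and class priors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance: every class shares the same matrix.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    # Pseudo-inverse (computed via SVD in NumPy) guards against a singular covariance.
    return classes, means, np.linalg.pinv(cov), np.log(priors)


def predict_lda(model, X):
    """Pick the class with the largest linear discriminant score."""
    classes, means, cov_inv, log_priors = model
    X = np.asarray(X, dtype=float)
    # delta_c(x) = x^T S^-1 mu_c - 0.5 mu_c^T S^-1 mu_c + log pi_c
    scores = (X @ cov_inv @ means.T
              - 0.5 * np.sum(means @ cov_inv * means, axis=1)
              + log_priors)
    return classes[np.argmax(scores, axis=1)]


X = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]]
y = ["normal"] * 3 + ["anomaly"] * 3
model = fit_lda(X, y)
print(predict_lda(model, [[0.5, 0.5], [4.5, 4.5]]).tolist())  # ['normal', 'anomaly']
```

Because the quadratic term x^T S^{-1} x cancels when all classes share S, the resulting decision boundaries are linear, which is what makes LDA a linear classifier.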

3.4.3 K-nearest Neighbor (KNN)

K-nearest neighbors (KNN) [52], also known as a lazy learning algorithm, is an instance-based or non-generalizing learning method. Rather than constructing a general model in a specialized training process, it stores the training instances and uses them directly during classification. It classifies new test cases based on a feature-similarity measure, using a distance function such as the Minkowski, Euclidean, or Manhattan distance [48]. For two variables X and Y, the Minkowski distance between them is defined as \left(\sum_{i}|X_{i}-Y_{i}|^{p}\right)^{1/p}, where p \geq 1. Its behavior depends on the value of p; for example, p=1 and p=2 give the Manhattan and Euclidean distances, respectively.

d(X,Y)=\sqrt{\sum_{i=1}^{n}(X_{i}-Y_{i})^{2}} \qquad (4)

In our security modeling, we use the most popular distance, the Euclidean distance with p=2 [51], as defined in Equation 4. The number of neighbors, k, is another key parameter in KNN-based security modeling. We use k=5 neighbors with uniform weights, where all points in each neighborhood are weighted equally.
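The KNN configuration above (Minkowski distance with p=2, i.e., Euclidean, k=5, and uniform weights) can be sketched as follows. This is a brute-force illustration under our own assumptions: the function names are hypothetical, every training point is scanned for each query, and a tie between classes is broken in favour of the class met first among the distance-sorted neighbours.

```python
from collections import Counter

import numpy as np


def minkowski(a, b, p=2):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean (Equation 4)."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


def knn_predict(X_train, y_train, x, k=5, p=2):
    """Label x by an unweighted (uniform) majority vote of its k nearest neighbours."""
    distances = [minkowski(row, x, p) for row in X_train]
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


X_train = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [5, 5], [5, 6], [6, 5]]
y_train = ["normal"] * 5 + ["anomaly"] * 3
print(minkowski([0, 0], [3, 4]), minkowski([0, 0], [3, 4], p=1))  # 5.0 7.0
print(knn_predict(X_train, y_train, [0.2, 0.3]))  # normal
print(knn_predict(X_train, y_train, [5.5, 5.5]))  # anomaly
```

In the last query, two of the five nearest neighbours are 'normal', but the three 'anomaly' points win the uniform vote; distance-weighted voting would be an alternative to the uniform weights used here.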

3.4.4 Decision Tree (DT)

Decision tree (DT) [53] is a well-known machine learning classification framework that is commonly used in various application fields. A decision tree is a non-parametric supervised learning method that breaks down a given security dataset into smaller subsets while incrementally building the corresponding branches of the tree. The most popular splitting criteria are “gini” for the Gini impurity and “entropy” for the information gain, which can be expressed mathematically as follows [51]:

Entropy: H(x)=-\sum_{i=1}^{n}p(x_{i})\log_{2}p(x_{i}) \qquad (5)
Gini(E)=1-\sum_{i=1}^{c}p_{i}^{2} \qquad (6)

where p_i denotes the probability that an element is classified into a distinct anomaly or attack class i, and c is the number of classes. To build a decision tree-based security model, we use the Gini index, which is computed by subtracting the sum of the squared probabilities of each anomaly or attack class from 1, as expressed in Equation 6. While generating the tree for the anomaly or attack classes, nodes are expanded until all leaves are pure or until all leaves contain fewer than two sample instances.
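Equations 5 and 6, and the way a candidate split can be scored, are illustrated below with standard-library Python. This is a sketch under our own assumptions: the helper names are hypothetical, the class probabilities p_i are estimated as relative label frequencies within a node, and a split is scored by the size-weighted Gini impurity of its two child nodes (lower is better), as is common in CART-style trees.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Relative frequency p_i of each class among the labels in a node."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def entropy(labels):
    """Entropy in bits (Equation 5): -sum_i p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini impurity (Equation 6): 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def split_gini(left, right):
    """Gini impurity of a split, weighting each child node by its size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


node = ["normal", "normal", "anomaly", "anomaly"]
print(gini(node), entropy(node))  # 0.5 1.0
print(split_gini(["normal", "normal"], ["anomaly", "anomaly"]))  # 0.0 (pure children)
```

A perfectly mixed two-class node has the maximum Gini impurity of 0.5 and entropy of 1 bit, while a split producing pure children scores 0.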

3.4.5 Random Forest (RF)

In machine learning and data science, random forest (RF) [54] is a well-known ensemble classification technique that is used in many application areas. It consists of multiple decision trees, where the decision tree classifier discussed above serves as a single tree in the forest. It combines bootstrap aggregation (bagging) [55] with random feature selection [56] to create a collection of decision trees with controlled variance. The outcome is determined by majority voting over the decision trees generated in the forest. To build a random forest security model, we generate $N=100$ decision trees, where the quality of a split in each tree is measured by the Gini index defined in Eq. (6).
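The sketch below isolates the three ingredients named above: bootstrap sampling of rows, random selection of candidate features, and majority voting over the trees' outputs. Tree training itself is omitted, and the tree predictions are hypothetical placeholders. Note that scikit-learn's `RandomForestClassifier(n_estimators=100, criterion="gini")` combines trees by averaging their predicted class probabilities rather than by a hard vote, so this is an illustration of the idea rather than of that implementation.

```python
import numpy as np


def bootstrap_indices(n_samples, rng):
    """Bagging: draw n_samples row indices *with* replacement to train one tree."""
    return rng.integers(0, n_samples, size=n_samples)


def random_feature_subset(n_features, n_selected, rng):
    """Random feature selection: candidate features considered for a split."""
    return rng.choice(n_features, size=n_selected, replace=False)


def majority_vote(tree_predictions):
    """Combine per-tree labels (shape: n_trees x n_samples) by majority vote per sample."""
    tree_predictions = np.asarray(tree_predictions)
    voted = []
    for column in tree_predictions.T:               # one column per test instance
        values, counts = np.unique(column, return_counts=True)
        voted.append(values[np.argmax(counts)])     # ties resolve to the smallest label
    return np.array(voted)


rng = np.random.default_rng(0)
rows = bootstrap_indices(10, rng)             # some rows repeat, others are left out
features = random_feature_subset(42, 6, rng)  # e.g. about sqrt(42) of 42 features
# Hypothetical outputs of 3 trees for 4 test instances (0 = normal, 1 = anomaly).
votes = [[0, 1, 1, 0],
         [0, 1, 0, 1],
         [1, 1, 0, 1]]
print(majority_vote(votes))  # [0 1 0 1]
```

Each tree sees a different bootstrap sample and a different random subset of features, so the trees' errors are partly decorrelated and the vote has lower variance than any single tree.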

3.4.6 Support Vector Machine (SVM)

In machine learning, support vector machine (SVM) [48] is another popular classification technique. It constructs a hyperplane in the data space that best divides the security dataset into two classes, such as ‘anomaly’ and ‘normal’. Its behavior depends on the mathematical function known as the kernel, which can be of different types, such as linear, polynomial, radial basis function (RBF), or sigmoid. To build the security model, we use the RBF kernel [57], also known as the Gaussian kernel, since we assume no prior knowledge about the given security data. The RBF kernel is defined as:

$$k(x,y)=\exp\left(-\lambda\|x-y\|^{2}\right) \tag{7}$$

where $\lambda$ is a parameter that sets the “spread” of the kernel. Using the RBF kernel in Eq. (7), the SVM implicitly maps the security data into a higher-dimensional feature space in which the classes can be separated by a hyperplane. Overall, it works in two stages: it first identifies the optimal hyperplane in the data space and then classifies the security data instances according to the decision boundary defined by that hyperplane. Moreover, we use $C=1.0$ as the regularization parameter, which controls the trade-off between achieving a low training error and a low testing error in the SVM-based security model.
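Eq. (7) can be evaluated directly, either for one pair of vectors or for a whole set of instances at once (the Gram matrix that a kernel SVM works with). The following is a minimal sketch with hypothetical helper names; in scikit-learn's `SVC(kernel="rbf", C=1.0)`, the `gamma` parameter plays the role of $\lambda$.

```python
import numpy as np


def rbf_kernel(x, y, lam=1.0):
    """Eq. (7): k(x, y) = exp(-lambda * ||x - y||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-lam * np.dot(diff, diff)))


def rbf_gram_matrix(X, lam=1.0):
    """K[i, j] = k(X[i], X[j]), using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-lam * np.maximum(sq_dists, 0.0))  # clip tiny negatives from rounding


X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
K = rbf_gram_matrix(X, lam=0.5)
print(np.round(K, 4))
```

Identical points give a kernel value of 1, and the value decays toward 0 with squared distance; a larger $\lambda$ makes that decay faster, so each training instance influences a smaller neighborhood.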

3.4.7 Logistic Regression (LR)

Logistic regression (LR) [58] is another common probabilistic statistical model used to solve classification problems in machine learning. LR estimates probabilities using the logistic equation, commonly referred to as the sigmoid function, which is defined mathematically as:

$$g(z)=\frac{1}{1+\exp(-z)} \tag{8}$$

When building the LR-based security model, we use $L_{2}$ regularization, i.e., the ridge penalty, which adds the squared magnitude of the coefficients as a penalty term to the loss function. The parameter $C$ plays the same role as in the SVM model. We also use the scikit-learn solver “lbfgs” [51], which stands for Limited-memory Broyden–Fletcher–Goldfarb–Shanno, to build the security model.
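To make the role of the sigmoid (Eq. 8), the $L_2$ penalty and $C$ concrete, the sketch below writes out an $L_2$-regularized logistic objective and its gradient, assuming the form given in scikit-learn's documentation for `LogisticRegression(penalty="l2", C=1.0, solver="lbfgs")`: $\tfrac{1}{2}\|w\|^2 + C\sum_i \text{logloss}_i$. A quasi-Newton solver such as L-BFGS only needs these two functions. To stay self-contained, the usage below minimizes the objective with plain gradient descent instead, which is a deliberate simplification and not what the paper used. All names and data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Eq. (8): g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def l2_logistic_objective(w, b, X, y, C=1.0):
    """0.5*||w||^2 + C * sum of per-sample log-losses; y holds 0/1 labels, b is unpenalized."""
    p = sigmoid(X @ w + b)
    eps = 1e-12  # guard against log(0)
    log_loss = -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    return 0.5 * float(w @ w) + C * float(np.sum(log_loss))


def l2_logistic_gradient(w, b, X, y, C=1.0):
    """Gradient of the objective above with respect to w and b."""
    residual = sigmoid(X @ w + b) - y
    return w + C * (X.T @ residual), C * float(np.sum(residual))


X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = np.zeros(2), 0.0
for _ in range(200):                       # plain gradient descent, NOT lbfgs
    gw, gb = l2_logistic_gradient(w, b, X, y, C=1.0)
    w, b = w - 0.05 * gw, b - 0.05 * gb
```

A smaller $C$ gives the penalty $\tfrac{1}{2}\|w\|^2$ relatively more weight and therefore shrinks the coefficients more strongly, which is the same inverse relationship as in the SVM.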

3.4.8 Adaptive Boosting (AdaBoost)

Boosting is a machine learning technique for classification that can reduce both bias and variance. It converts weak learners into strong ones. Adaptive Boosting (AdaBoost), formulated by Yoav Freund et al. [59], is one such algorithm. It is called adaptive because each subsequent weak learner concentrates on the instances that its predecessors misclassified; this can significantly improve classification performance, but in some cases it can lead to overfitting. AdaBoost is also sensitive to noisy data and outliers. We use a decision tree classifier with a maximum depth of one (max_depth = 1), i.e., a decision stump, as the base estimator. The maximum number of estimators, at which boosting is terminated, is set to 50.
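The adaptive re-weighting can be seen in a single boosting round. The sketch below follows the classic two-class discrete AdaBoost update with labels in $\{-1,+1\}$: compute the weak learner's weighted error, derive its vote weight $\alpha$, then increase the weights of misclassified samples. It also shows a decision stump (the depth-1 base estimator) and the final weighted vote. Function names are hypothetical; scikit-learn's `AdaBoostClassifier(n_estimators=50)`, whose default base estimator is a depth-1 decision tree, implements SAMME-family variants that differ in detail.

```python
import numpy as np


def adaboost_round(sample_weights, y_true, y_pred):
    """One round of two-class discrete AdaBoost (labels in {-1, +1}).

    Returns the weak learner's vote weight alpha and the re-normalized sample weights.
    """
    w = np.asarray(sample_weights, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    error = np.sum(w[y_true != y_pred]) / np.sum(w)   # weighted error of the weak learner
    error = np.clip(error, 1e-10, 1 - 1e-10)          # avoid division by zero / log(0)
    alpha = 0.5 * np.log((1 - error) / error)          # more accurate learners get a larger say
    w = w * np.exp(-alpha * y_true * y_pred)           # up-weight misses, down-weight hits
    return float(alpha), w / w.sum()


def stump_predict(X, feature, threshold, polarity=1):
    """A depth-1 decision tree (decision stump) on a single feature; returns +1/-1."""
    return np.where(polarity * (X[:, feature] - threshold) > 0, 1, -1)


def ensemble_predict(alphas, weak_predictions):
    """Final output: sign of the alpha-weighted sum of the weak learners' votes."""
    score = np.tensordot(alphas, np.asarray(weak_predictions, dtype=float), axes=1)
    return np.where(score >= 0, 1, -1)


alpha, weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])
# alpha is about 0.5493; the single misclassified sample now carries half the total weight.
```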

3.4.9 Extreme Gradient Boosting (XGBoost)

Gradient boosting is another ensemble learning algorithm that, like the random forest discussed above, creates a final model from a set of individual models. Just as neural networks use gradient descent to optimize their weights, gradient boosting uses the gradient of the loss function to minimize that loss. XGBoost, which stands for eXtreme Gradient Boosting, is a particular gradient boosting method that uses more accurate approximations to find the best model. It uses second-order gradients of the loss function to minimize the loss, together with advanced regularization (L1 and L2), which reduces overfitting and improves model generalization. In our analysis, we build the security model with XGBClassifier, the scikit-learn [51] API-compatible class.
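The "second-order gradients" mentioned above can be made concrete with the standard second-order (Newton) leaf formulas associated with XGBoost. For a group of samples with per-sample gradients $g_i$ and Hessians $h_i$, the optimal leaf weight is $w^\ast = -\sum g_i / (\sum h_i + \lambda)$, and a split is scored by the resulting loss reduction. The minimal sketch below assumes the log-loss for binary labels. The helpers are hypothetical, and the parameter names only mirror XGBoost's `reg_lambda` and `gamma` hyperparameters; this is not the library's implementation.

```python
import numpy as np


def logistic_grad_hess(y, raw_margin):
    """First and second derivatives of the log-loss w.r.t. the raw margin (y in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-raw_margin))
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(g, h, reg_lambda=1.0):
    """Second-order optimum for one leaf: w* = -sum(g) / (sum(h) + lambda)."""
    return -np.sum(g) / (np.sum(h) + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting one leaf into left/right children."""
    def score(g, h):
        return np.sum(g) ** 2 / (np.sum(h) + reg_lambda)
    g_all = np.concatenate([g_left, g_right])
    h_all = np.concatenate([h_left, h_right])
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - score(g_all, h_all)) - gamma


y = np.array([1.0, 1.0, 0.0])
g, h = logistic_grad_hess(y, np.zeros(3))      # start from a raw margin of 0 (p = 0.5)
print(optimal_leaf_weight(g, h))               # about 0.2857: nudges the leaf toward class 1
print(split_gain(g[:2], h[:2], g[2:], h[2:]))  # positive: separating the labels helps
```

The $\lambda$ term in both denominators is the L2 regularization: it shrinks leaf weights toward zero, while `gamma` sets a minimum gain a split must achieve.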

3.4.10 Stochastic Gradient Descent (SGD)

Stochastic gradient descent (SGD) [48] is an iterative method for optimizing an objective function with suitable smoothness properties; the word ‘stochastic’ refers to a system or process that involves random probability, here the random selection of training examples. A gradient is the slope of a function; it measures how much one variable changes in response to changes in another variable. Gradient descent repeatedly moves the parameters in the direction opposite to the gradient, i.e., the vector of partial derivatives of the objective function with respect to those parameters. Let $\alpha$ be the learning rate and $J_{i}$ the cost of the $i^{th}$ training example; then Eq. (9) gives the stochastic gradient descent update of the weight $w_{j}$.

$$w_{j}:=w_{j}-\alpha\frac{\partial J_{i}}{\partial w_{j}} \tag{9}$$

The greater the gradient, the steeper the slope. When building the security model, we use the ‘hinge’ loss function, which gives a linear SVM. Moreover, we use $L_{2}$ regularization, as in logistic regression, and a constant $alpha=0.0001$ that multiplies the regularization term.
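Combining Eq. (9) with the hinge loss and an $L_2$ penalty gives a linear SVM trained one example at a time. The minimal sketch below uses a constant learning rate `lr` (the $\alpha$ of Eq. 9) and a regularization constant `reg` (the quantity scikit-learn calls `alpha`, set to 0.0001 above). It is an illustration under those assumptions, not the authors' code; scikit-learn's `SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001)` additionally uses a decaying learning-rate schedule by default. The names and toy data are hypothetical.

```python
import numpy as np


def sgd_linear_svm(X, y, lr=0.1, reg=1e-4, epochs=50, seed=0):
    """Linear SVM via per-sample SGD on the L2-regularized hinge loss.

    J_i(w, b) = reg/2 * ||w||^2 + max(0, 1 - y_i (w . x_i + b)), labels y_i in {-1, +1}.
    Each step applies Eq. (9): w <- w - lr * dJ_i/dw.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):   # 'stochastic': visit samples in random order
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                  # hinge active: dJ_i/dw = reg*w - y_i*x_i
                w -= lr * (reg * w - y[i] * X[i])
                b += lr * y[i]
            else:                           # only the regularizer contributes
                w -= lr * reg * w
    return w, b


def predict(X, w, b):
    return np.where(np.asarray(X, dtype=float) @ w + b >= 0, 1, -1)


X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w, b = sgd_linear_svm(X, y)
```

Samples already classified with a margin of at least 1 contribute no hinge gradient, so late in training most updates only apply the small weight decay from the regularizer.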

3.4.11 Artificial Neural Network (ANN)

An artificial neural network (ANN), which comprises a network of artificial neurons or nodes, is another machine learning technique and is typically used in deep learning modeling [48]. In this work, we build a feed-forward ANN-based deep learning security model consisting of an input layer with the selected security features, three hidden layers with 128 neurons, and an output layer with one neuron for binary classification, or as many neurons as there are classes for the multi-class classification task. We also apply dropout in each layer to regularize the security model and reduce overfitting, and we compile the neural network model with the Adam optimizer [60].

$$\mathrm{ReLU}: f(x)=\max(0,x) \tag{10}$$
$$\mathrm{Softmax}: f(y_{k})=\frac{\exp(\phi_{k})}{\sum_{j=1}^{c}\exp(\phi_{j})} \tag{11}$$
$$\mathrm{Sigmoid}: f(z)=\frac{1}{1+e^{-z}} \tag{12}$$
$$\mathrm{Loss}=\begin{cases}-\left(y\log(p)+(1-y)\log(1-p)\right) & \text{for binary}\\ -\sum_{c=1}^{M}y_{o,c}\log(p_{o,c}) & \text{for multiclass}\end{cases} \tag{13}$$

We train the security network for 100 epochs with a batch size of 128. We use a small learning rate of 0.001, which helps the network converge stably toward a minimum of the loss. For the activation function of the hidden layers, we use the Rectified Linear Unit (ReLU) defined in Eq. (10), which mitigates the vanishing gradient problem and helps the model learn faster. In the output layer, we use the Softmax activation function defined in Eq. (11) for multi-class attack detection, and the Sigmoid (logistic) activation function defined in Eq. (12) for binary classification, since its output lies between 0 and 1. To adjust the weights of the model, we use the cross-entropy loss function defined in Eq. (13). In the binary case, $y$ is the true label and $p$ is the predicted probability of the positive class; in the multi-class case, $M$ is the number of attack classes, $y_{o,c}$ is a binary indicator of whether class $c$ is the correct label for observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ belongs to class $c$. The popular backpropagation technique [48] is used to adjust the connection weights between the neurons of the security model during learning.
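The activation and loss functions of Eqs. (10) to (13) are short enough to write out directly. The sketch below implements them in NumPy and chains them into an inference pass shaped like the architecture described above (three ReLU hidden layers of 128 units, then a sigmoid or softmax output). The weights are random placeholders, so the pass illustrates shapes and data flow only; training by backpropagation with Adam is not reproduced. In Keras the same structure would typically be built from `Dense` and `Dropout` layers and compiled with an Adam optimizer at `learning_rate=0.001`, but the dropout rate is not stated in the text, and all names here are hypothetical.

```python
import numpy as np


def relu(x):
    """Eq. (10): f(x) = max(0, x), element-wise."""
    return np.maximum(0.0, x)


def sigmoid(z):
    """Eq. (12): f(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(phi):
    """Eq. (11), row-wise; subtracting the row max keeps exp() from overflowing."""
    e = np.exp(phi - np.max(phi, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)


def binary_cross_entropy(y, p, eps=1e-12):
    """Eq. (13), binary case, averaged over samples."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def categorical_cross_entropy(y_onehot, p, eps=1e-12):
    """Eq. (13), multi-class case: -sum_c y_{o,c} log p_{o,c}, averaged over observations o."""
    p = np.clip(p, eps, 1.0)
    return float(np.mean(-np.sum(y_onehot * np.log(p), axis=1)))


def forward(x, n_classes, hidden=(128, 128, 128), seed=0):
    """Inference pass of a 3x128 ReLU network with random (untrained) placeholder weights."""
    rng = np.random.default_rng(seed)
    a = np.asarray(x, dtype=float)
    for units in hidden:
        W = rng.normal(0.0, np.sqrt(2.0 / a.shape[1]), size=(a.shape[1], units))
        a = relu(a @ W)                  # dropout is only active in training, so omitted here
    out_units = 1 if n_classes == 2 else n_classes
    W_out = rng.normal(0.0, np.sqrt(1.0 / a.shape[1]), size=(a.shape[1], out_units))
    logits = a @ W_out
    return sigmoid(logits) if n_classes == 2 else softmax(logits)


probs = forward(np.ones((4, 24)), n_classes=10)  # e.g. 24 selected features, 10 classes
```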

4 Experimental Results and Analysis

In this section, we analyze and report the experimental results of the machine learning-based security models as well as the artificial neural network-based model on the security datasets. We first describe our experimental setup, including the questions used to evaluate our security models, and then discuss the experimental results and findings from several perspectives related to the detection of cyber-anomalies and multi-attacks.

4.1 Experimental Setup

To evaluate our CyberLearning model, we aim to answer the following questions:

  • Question 1: Does the impact of the security features vary from feature to feature while building a machine learning-based security model?

  • Question 2: How effective is the machine-learning-based security model for detecting cyber-anomalies considering binary classification?

  • Question 3: How effective is the machine-learning-based security model for detecting multi-attacks considering multi-class classification?

  • Question 4: How effective is the artificial neural network-based security model for detecting anomalies and multi-class attacks?

To answer these questions related to our CyberLearning analysis, we conducted a range of experiments on security datasets containing the anomalies and multi-attacks discussed in the earlier section. We implemented all these methods in the Python programming language using Scikit-learn [51], TensorFlow, and Keras [60], and executed them on Google Colab [61]. In the following subsection, we first define the evaluation metrics used in our experimental evaluation.

4.2 Evaluation Metric

To measure the effectiveness of our CyberLearning model, we report precision, recall, F-score, and model accuracy (in percent). To compute these, we first count the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), defined as follows [48]:

  • TP (true positive): An outcome where the security model correctly predicts the positive class, i.e., an anomaly or attack.

  • TN (true negative): An outcome where the security model correctly predicts the negative class, e.g., normal traffic.

  • FP (false positive): An outcome where the security model incorrectly predicts the positive class, e.g., flags normal traffic as an anomaly or attack.

  • FN (false negative): An outcome where the security model incorrectly predicts the negative class, e.g., misses an actual anomaly or attack.

Based on TP, TN, FP, and FN, we compute precision, recall, F-score, and accuracy as follows [48]:

$$\mathrm{Precision}=\frac{TP}{TP+FP} \tag{14}$$
$$\mathrm{Recall}=\frac{TP}{TP+FN} \tag{15}$$
$$\mathrm{F1\text{-}score}=2\times\frac{\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} \tag{16}$$
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} \tag{17}$$

These metrics are well known and widely used in machine learning and data science to measure the effectiveness of a model [48] [19]. The higher these values, the more effective the security model. In the following subsections, we discuss the experimental results and analyze model effectiveness using these metrics.
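The following minimal sketch shows how Eqs. (14) to (17) follow from the four counts, treating the anomaly or attack class as the positive label. It uses plain Python with hypothetical function names; scikit-learn offers equivalent functions such as `precision_score`, `recall_score`, `f1_score` and `accuracy_score`. For multi-class results, per-class values must additionally be averaged (for example weighted by class support), and that choice is not specified in this section.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task; `positive` marks the anomaly/attack class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Precision, recall, F1 score and accuracy, Eqs. (14) to (17)."""
    precision = tp / (tp + fp) if tp + fp else 0.0               # Eq. (14)
    recall = tp / (tp + fn) if tp + fn else 0.0                  # Eq. (15)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                        # Eq. (16)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                   # Eq. (17)
    return precision, recall, f1, accuracy


print(confusion_counts([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # (2, 1, 1, 1)
print(metrics(8, 5, 2, 1)[3])                             # 0.8125
```

The guards return 0.0 when a denominator is zero, e.g., when a model never predicts the positive class, which would otherwise raise a division error.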

4.3 Impact of Security Features and Ranking

To answer the first question, we calculate the impact of each feature based on its correlation value. Table 3 shows the correlation scores of all 42 security features in the UNSW-NB15 dataset for detecting anomalies (binary classification), sorted in descending order from the largest to the smallest value. As Table 3 shows, the scores are not identical across features; they vary from feature to feature according to each feature's impact on the target anomaly and attack classes.

Table 3: The ranking of the security features with corresponding correlation scores for detecting anomalies utilizing the dataset UNSW-NB15.
Rank Feature Score Rank Feature Score
01 sttl 0.624082 22 dloss 0.075961
02 ct_state_ttl 0.476559 23 service 0.073552
03 state 0.462972 24 dbytes 0.060403
04 ct_dst_sport_ltm 0.371672 25 djit 0.048819
05 swin 0.364877 26 synack 0.043250
06 dload 0.352169 27 spkts 0.043040
07 dwin 0.339166 28 dinpkt 0.030136
08 rate 0.335883 29 dur 0.029096
09 ct_src_dport_ltm 0.318518 30 smean 0.028372
10 ct_dst_src_ltm 0.299609 31 tcprtt 0.024668
11 dmean 0.295173 32 sbytes 0.019376
12 stcpb 0.266585 33 dttl 0.019369
13 dtcpb 0.263543 34 response_body_len 0.018930
14 ct_src_ltm 0.252498 35 sjit 0.016436
15 ct_srv_dst 0.247812 36 ct_flw_http_mthd 0.012237
16 ct_srv_src 0.246596 37 ct_ftp_cmd 0.009092
17 ct_dst_ltm 0.240776 38 is_ftp_login 0.008762
18 sload 0.165249 39 proto 0.008023
19 is_sm_ips_ports 0.160126 40 trans_depth 0.002246
20 sinpkt 0.155454 41 sloss 0.001828
21 dpkts 0.097394 42 ackdat 0.000817

According to Table 3, the feature sttl has the highest score (0.624082) and is therefore the top-ranked feature, whereas the feature ackdat has the lowest score (0.000817), close to 0 for this dataset, and is therefore ranked last. These correlation scores may differ for other datasets, depending on their features and classes. The higher the correlation value, the more significant the feature is for the security model. Based on these scores, we conclude that not all features in a given security dataset have the same impact on a data-driven security model.
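A ranking such as Table 3, followed by selecting the top-$k$ features, could be produced along the lines of the sketch below. It assumes that the score is the absolute Pearson correlation between each (numerically encoded) feature and the class label; the exact correlation measure is defined in the paper's feature-selection section rather than here, and categorical features such as proto, service or state would need encoding first. The function names and toy columns are hypothetical.

```python
import numpy as np


def rank_features_by_correlation(X, y, feature_names):
    """Rank features by |Pearson correlation| with the label, highest first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = []
    for j in range(X.shape[1]):
        col = X[:, j]
        if np.std(col) == 0:             # constant column: correlation undefined, score as 0
            scores.append(0.0)
        else:
            scores.append(abs(float(np.corrcoef(col, y)[0, 1])))
    order = np.argsort(scores)[::-1]     # descending order, as in Table 3
    return [(feature_names[j], scores[j]) for j in order]


def select_top_k(ranking, k):
    """Keep the names of the k top-ranked features (e.g. k = 24 for UNSW-NB15)."""
    return [name for name, _ in ranking[:k]]


# Hypothetical toy data: f0 tracks the label, f1 is weakly related, f2 is constant.
y = np.array([0, 0, 1, 1, 0, 1])
X = np.column_stack([
    [0.1, 0.0, 2.1, 2.0, 0.0, 2.1],
    [1, 2, 1, 2, 1, 2],
    [5, 5, 5, 5, 5, 5],
])
ranking = rank_features_by_correlation(X, y, ["f0", "f1", "f2"])
print(select_top_k(ranking, 2))  # ['f0', 'f1']
```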

4.4 Effectiveness Analysis for Detecting Cyber-Anomalies

To show the effectiveness of the security models based on machine learning classifiers, Table 4 reports the accuracy (%) of different machine learning classifier-based anomaly detection models for binary classification. The results in Table 4 are shown for different numbers of selected features, namely 42, 31, 24, and 17, on the UNSW-NB15 dataset. These features are selected according to their correlation scores and ranking in Table 3, using particular thresholds. The results in Table 4 show that the number of selected features affects the performance of the various machine learning security models. In general, a model that achieves high accuracy with fewer top-ranked features is preferable, in terms of both detection outcome and model simplicity. For instance, the NB security model achieves its highest accuracy (85%) when the top 24 features are selected. Similarly, the RF and SVM security models achieve their highest accuracies (95% and 92%, respectively) with the top 24 features, matching their results with all 42 features. Some models, such as LDA, AdaBoost, SGD, and LR, achieve their best results with all 42 features, while others, such as KNN and XGBoost, maintain their accuracy even when only the top 17 features are selected. In addition to RF (95% accuracy), DT (94%) and XGBoost (93%) also achieve high accuracy for detecting anomalies.

Table 4: Effectiveness comparison results in terms of accuracy (%) for different machine learning classifier based anomaly detection models utilizing the dataset UNSW-NB15.
Model Features (42) Features (31) Features (24) Features (17)
NB 82 83 85 75
LDA 89 87 87 84
KNN 92 92 92 92
XGBoost 93 93 93 93
DT 94 94 93 92
RF 95 95 95 94
SVM 92 92 92 91
AdaBoost 93 92 92 92
SGD 89 88 88 86
LR 90 88 87 84

In addition to Table 4, Figure 3 compares the various machine learning classifier-based security models for detecting anomalies in terms of precision, recall, and F1 score, for the top 42, 31, 24, and 17 selected features on the UNSW-NB15 dataset. For a fair evaluation, we use the same training and testing data for each security model.

[Figure 3 panels: (a) anomaly detection with all 42 features; (b) with the top 31 features; (c) with the top 24 features; (d) with the top 17 features.]
Figure 3: Effectiveness comparison results in terms of precision, recall, and F1 score for different machine learning classifier based anomaly detection models utilizing the dataset UNSW-NB15.

As Figure 3 shows, the tree-based classification models achieve higher precision, recall, and F1 scores than the other security models on cybersecurity data with various security features. In particular, the RF (Random Forest) security model, which generates multiple decision trees, achieves the highest precision, recall, and F1 score for the different numbers of features shown in Figure 3. Interestingly, the RF model gives similar results with 42, 31, and 24 features, and a comparatively lower result with 17 features. This decrease occurs because the model loses significant information when the feature set is reduced further. Thus, the RF model with the top 24 security features is considered an effective model in terms of both accuracy and model complexity. Overall, based on the selected security features, we conclude that the RF model gives better results in detecting cyber-anomalies. The explanation is that the random forest model produces a collection of logical rules based on the selected security features through the multiple decision trees created in the forest, and produces an outcome based on the majority vote of those trees.

Table 5: Effectiveness comparison results in terms of accuracy (%), precision, recall, and F1 score for different machine learning classifier based anomaly detection models utilizing the dataset NSL-KDD.
Model Accuracy (%) Precision Recall F1 Score
NB 98 0.98 0.98 0.98
LDA 99 0.99 0.99 0.99
KNN 99 0.99 0.99 0.99
XGBoost 99 0.99 0.99 0.99
DT 99 0.99 0.99 0.99
RF 99 0.99 0.99 0.99
SVM 99 0.99 0.99 0.99
AdaBoost 98 0.98 0.98 0.98
SGD 99 0.99 0.99 0.99
LR 98 0.98 0.98 0.98

Table 5 shows the corresponding results on another widely used security dataset, NSL-KDD, in terms of accuracy (%), precision, recall, and F-score, for the different machine learning classifier-based anomaly detection models (binary classification). These results are obtained with the top five features selected according to their correlation scores and ranking. As Table 5 shows, all security models achieve at least 98% accuracy with only the top 5 features, and seven of the ten models reach 99%. Compared with the UNSW-NB15 results in Table 4, where accuracies range from 75% to 95%, we conclude that machine learning-based security models are highly dependent on the quality and characteristics of the data, and may give different results for different datasets.

4.5 Effectiveness Comparison for Detecting Multi-Attacks

Similarly, Table 6 reports the accuracy (%) of different machine learning classifier-based attack detection models for multi-class classification, for 42, 31, 24, and 17 selected features on the UNSW-NB15 dataset. These features are selected in the same way, i.e., according to their correlation scores and ranking using particular thresholds. The results in Table 6 show that the number of selected features also affects the performance of the machine learning security models for detecting multi-attacks. Since high accuracy with fewer features indicates an effective security model, the RF model is effective, with an accuracy of 83% when the top 31 features are selected. Similarly, the XGBoost, DT, and SVM security models achieve accuracies of 81%, 81%, and 79%, respectively, with the top 31 features. Several security models, such as NB, LDA, KNN, and AdaBoost, achieve their best results with the top 24 features, while the LR model achieves its best result with all 42 features. Overall, in addition to RF (83% accuracy), DT (81%) and XGBoost (81%) also achieve high accuracy for detecting multi-attacks.

Table 6: Effectiveness comparison results in terms of accuracy (%) for different machine learning classifier based multi-attacks detection models utilizing the dataset UNSW-NB15.
Model Features (42) Features (31) Features (24) Features (17)
NB 43 43 44 42
LDA 67 67 67 65
KNN 76 77 77 73
XGBoost 81 81 80 76
DT 81 81 80 77
RF 83 83 82 80
SVM 79 79 78 74
AdaBoost 51 31 62 57
SGD 71 72 70 63
LR 76 75 74 71

In addition to Table 6, Figure 4 compares the various machine learning classifier-based security models for detecting multi-attacks in terms of precision, recall, and F1 score on the UNSW-NB15 dataset. For a fair evaluation, we use the same training and testing data for each security model.

[Figure 4 panels: (a) multi-attacks detection with all 42 features; (b) with the top 31 features; (c) with the top 24 features; (d) with the top 17 features.]
Figure 4: Effectiveness comparison results in terms of precision, recall, and F1 score for different machine learning classifier based multi-attacks detection models utilizing the dataset UNSW-NB15.

As Figure 4 shows, the tree-based classification models also achieve higher precision, recall, and F1 scores for multi-attack detection than the other security models. In particular, the RF (Random Forest) security model, which generates multiple decision trees, achieves the highest precision, recall, and F1 score, as shown in Figure 4. Interestingly, as with anomaly detection, the RF model gives similar results with 42, 31, and 24 features, and a comparatively lower result with 17 features for multi-attack detection. This decrease occurs because the model loses significant information when the feature set is reduced further. Thus, the RF model with the top 24 security features is considered an effective model in terms of both accuracy and model complexity. Overall, we conclude that the RF model gives better results in detecting multi-attacks based on the selected security features. The reason is that the random forest model generates a set of logic rules for the attacks based on the selected security features through the several decision trees in the forest, and provides an outcome based on the majority vote of these trees.

Table 7: Effectiveness comparison results in terms of accuracy (%), precision, recall, and F1 score for different machine learning classifier based multi-attacks detection models utilizing the dataset NSL-KDD.
Model Accuracy (%) Precision Recall F1 Score
NB 91 0.97 0.91 0.93
LDA 96 0.98 0.96 0.97
KNN 99 0.99 0.99 0.99
XGBoost 99 0.99 0.99 0.99
DT 99 0.99 0.99 0.99
RF 99 0.99 0.99 0.99
SVM 99 0.98 0.99 0.99
AdaBoost 91 0.91 0.91 0.90
SGD 98 0.98 0.98 0.98
LR 98 0.98 0.98 0.98

Table 7 shows the corresponding multi-class results on the NSL-KDD dataset in terms of accuracy (%), precision, recall, and F-score, for the different machine learning classifier-based multi-attack detection models. These results are obtained with the top five features selected according to their correlation scores and ranking. As Table 7 shows, KNN, XGBoost, DT, RF, and SVM achieve the highest accuracy (99%) with only the top 5 features, while the remaining models achieve between 91% and 98%. Based on these results, we conclude that machine learning-based security models are highly dependent on the quality and characteristics of the data, and may give different results for different datasets.

4.6 Effectiveness Analysis for Neural Network-based Security Model

To show the effectiveness of the artificial neural network-based model, Figure 5 shows its accuracy and loss for detecting anomalies (binary classification) with 42, 31, 24, and 17 selected features on the UNSW-NB15 dataset. The features are selected in the same way, according to their correlation scores and ranking in Table 3 using the thresholds mentioned above. Similarly, Figure 6 shows the accuracy and loss for multi-attack detection (multi-class classification). For a fair evaluation and comparison, we use the same training and testing data for each neural network-based security model.

[Figure 5 panels: (a) to (d) accuracy score with all 42, top 31, top 24, and top 17 features; (e) to (h) loss score with all 42, top 31, top 24, and top 17 features.]
Figure 5: Calculated outcome in terms of accuracy and loss score of the deep neural network based security model for detecting anomalies utilizing the dataset UNSW-NB15.
[Figure 6 panels: (a) to (d) accuracy score with all 42, top 31, top 24, and top 17 features; (e) to (h) loss score with all 42, top 31, top 24, and top 17 features.]
Figure 6: Calculated outcome in terms of accuracy and loss score of the neural network based security model for detecting multi-attacks utilizing the dataset UNSW-NB15.

The results in Figure 5 and Figure 6 show that the neural network-based security model can detect both anomalies and multi-attacks with varying numbers of selected features. As with the classic machine learning classification models discussed above, the neural network-based security model achieves high accuracy in anomaly detection. According to Figure 5, the model with the top 24 features achieves 92% accuracy with a loss of 0.1681, which offers a good balance of accuracy and complexity compared with the models using other numbers of features. Thus, the model with the top 24 features can be selected as an effective security model that achieves high accuracy with a reduced number of features for detecting anomalies. Similarly, the model with the top 24 features can also be selected as an effective model for detecting multi-attacks, as shown in Figure 6.

[Figure 7 panels: (a) to (d) accuracy score with all 42, top 27, top 18, and top 5 features; (e) to (h) loss score with all 42, top 27, top 18, and top 5 features.]
Figure 7: Calculated outcome in terms of accuracy and loss score of the deep neural network based security model for detecting anomalies utilizing the dataset NSL-KDD.
[Figure 8 panels: (a) to (d) accuracy score with all 42, top 27, top 18, and top 5 features; (e) to (h) loss score with all 42, top 27, top 18, and top 5 features.]
Figure 8: Calculated outcome in terms of accuracy and loss score of the deep neural network based security model for detecting multi-attacks utilizing the dataset NSL-KDD.

In addition, Figure 7 shows the accuracy and loss of the model for detecting anomalies (binary classification) on another widely used dataset, NSL-KDD, with 42, 27, 18, and 5 selected features. These features are also selected according to their correlation scores and ranking using particular thresholds. Similarly, Figure 8 shows the accuracy and loss for multi-attack detection (multi-class classification). According to Figure 7, the model with the top 18 features achieves 99% accuracy with a loss of 0.0243, which offers a good balance of accuracy and complexity compared with the models using other numbers of features. Thus, the model with the top 18 features can be selected as an effective security model that achieves high accuracy with a reduced number of features for detecting anomalies. Similarly, the model with the top 27 features can be selected as an effective model for detecting multi-attacks, as shown in Figure 8.

5 Discussion

Overall, our CyberLearning model based on machine learning approaches is fully data-driven: it learns the data patterns related to security incidents, e.g., cyber-anomalies and attacks, according to our goal. The model can effectively detect anomalies and different types of attacks, such as DoS, Backdoor, and Worms, using popular machine learning classification techniques as well as artificial neural network models. The experimental analysis on the UNSW-NB15 [2] and NSL-KDD [12] datasets has shown the effectiveness of the resulting security models according to their learning capabilities in various situations, as discussed in the previous section.

According to the experimental analysis in Section 4, different machine learning-based security models perform differently in detecting cyber-anomalies or multi-attacks from the training security data. The significance of the security features strongly affects both the binary classification model for detecting anomalies, including unknown attacks, and the multi-class classification model for detecting the known attack classes mentioned above. For instance, according to Table 3, the feature sttl has the highest correlation score (0.624082) and is therefore considered highly significant, whereas the feature ackdat has the lowest score (0.000817), close to 0 for the UNSW-NB15 dataset [2], and can therefore be considered less significant for modeling. Selecting a set of highly significant security features and discarding insignificant or irrelevant ones can make the security model lightweight and more applicable. For instance, the NB security model achieves higher accuracy for detecting cyber-anomalies with the top 24 features (85%) than with all 42 features (82%), as shown in Table 4. Overall, security models for detecting anomalies and attacks based on various learning algorithms are affected by variations in the significance of the security features, as discussed in Section 4.

Besides, a robust classification model is essential to the design of an intelligent intrusion detection system, because machine learning classification techniques do not perform identically in real-world scenarios; their performance depends on how well they learn from the security data. As shown in Table 4 and Figure 3, the RF-based security model, which builds multiple decision trees, gives higher prediction results for detecting anomalies than the other security models in terms of accuracy, precision, recall, and F1 score. Several other models, such as DT, XGBoost, SVM, KNN, and AdaBoost, also give significant outcomes based on the selected features. According to the results shown in Table 5, the RF model also gives higher accuracy for detecting anomalies. As shown in Table 6 and Figure 4, the RF-based security model also gives higher prediction results for detecting multi-attacks than the other security models in terms of accuracy, precision, recall, and F1 score. Other models such as DT and XGBoost also give significant outcomes based on the selected features when detecting multi-attacks. According to the results shown in Table 5, the RF model also gives higher accuracy for detecting multi-attacks.
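To make the evaluation protocol concrete, the following sketch fits a random forest on a binary task and on a multi-class task and computes accuracy, precision, recall, and F1 score with scikit-learn [51]. It is illustrative only: the data come from `make_classification` rather than UNSW-NB15 or NSL-KDD, the stratified 80/20 split and `n_estimators=100` are assumptions, and weighted averaging of the multi-class metrics is also an assumption, since this section does not state how per-class scores were aggregated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, X_test, y_train, y_test, average):
    """Fit a classifier and return accuracy, precision, recall, and F1 on the test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average=average, zero_division=0),
        "recall": recall_score(y_test, y_pred, average=average, zero_division=0),
        "f1": f1_score(y_test, y_pred, average=average, zero_division=0),
    }


def split(X, y):
    # Stratified 80/20 split (an assumption) so every class appears in both partitions
    return train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)


# Synthetic stand-ins for the anomaly (2 classes) and multi-attack (5 classes) tasks
X_bin, y_bin = make_classification(
    n_samples=2000, n_features=20, n_informative=10, random_state=0)
X_multi, y_multi = make_classification(
    n_samples=2000, n_features=20, n_informative=10, n_classes=5, random_state=0)

binary_scores = evaluate(RandomForestClassifier(n_estimators=100, random_state=0),
                         *split(X_bin, y_bin), average="binary")
multi_scores = evaluate(RandomForestClassifier(n_estimators=100, random_state=0),
                        *split(X_multi, y_multi), average="weighted")
print("anomaly detection:", binary_scores)
print("multi-attack detection:", multi_scores)
```

The same `evaluate` helper can be reused for the other classifiers (DT, XGBoost, SVM, KNN, AdaBoost, and so on) by passing a different estimator, which keeps the comparison on identical splits.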

The model performance for detecting multi-attacks may differ from that for anomaly detection, even for the same learning technique. For instance, on the UNSW-NB15 dataset [2], the accuracy of the RF security model is 95% for detecting anomalies and 83% for detecting multi-attacks, whereas on the NSL-KDD dataset [12] it achieves 99% in both cases. Thus, the effectiveness of a learning-based security model may vary depending on the security features and the data characteristics. Overall, we conclude that the RF (Random Forest) based security model is the most effective of the evaluated models for detecting anomalies and multi-attacks. The reason is that a random forest learns several decision trees, each of which generates a set of logic rules based on the selected security features, and combines their outputs. Thus, the model gives higher prediction results in terms of accuracy, precision, recall, and F1 score.
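The explanation above, that a random forest combines the rule sets of several decision trees, can be inspected directly in scikit-learn, where `RandomForestClassifier.predict_proba` is documented as the mean of the trees' predicted class probabilities. The sketch below uses synthetic three-class data with hypothetical feature names `f0` to `f5`, checks that averaging, and prints the if-then rules learned by one tree with `export_text`. The shallow `max_depth=3` only keeps the printed rules short; it is not a setting taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Synthetic three-class data standing in for attack categories
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=3, random_state=1)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names

rf = RandomForestClassifier(n_estimators=25, max_depth=3, random_state=1).fit(X, y)

# Per-tree class probabilities, shape (n_trees, n_samples, n_classes)
tree_probas = np.stack([tree.predict_proba(X) for tree in rf.estimators_])
forest_proba = rf.predict_proba(X)
# The forest output is the average of the individual trees' outputs
print(np.allclose(tree_probas.mean(axis=0), forest_proba))

# Each tree is a set of if-then rules over the input features
print(export_text(rf.estimators_[0], feature_names=feature_names))
```

Because each tree sees a bootstrap sample and random feature subsets, the individual rule sets differ, and averaging them is what smooths out the errors of any single tree.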

A real-life cybersecurity application is the actual platform for using the CyberLearning model: such an application typically examines the behavior of the network, finds the security patterns that profile normal behavior, and thereby detects anomalies or associated attacks. Although an ANN model computes its own internal representations in its hidden layers, its performance is still affected by the significance of the input features. For instance, an ANN model with the selected security features gives significant accuracy for detecting anomalies and multi-attacks, as discussed briefly in Section 4. Although we use the UNSW-NB15 [2] and NSL-KDD [12] security datasets to build the security model, our analysis is also applicable to other application domains in cybersecurity, including IoT security. Several deep learning networks, such as convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory (LSTM) networks, deep belief networks (DBN), or autoencoders, could be effective when working with very large datasets, since deep learning algorithms typically perform well when data volumes are large [9] [62]. In addition, noisy instance analysis [63], incorporating contextual information [18] [64], and recency analysis considering recent patterns in data [65] are further potential research directions in this area. Overall, we believe that our CyberLearning model, together with its comprehensive experimental analysis, opens a promising path for future research on machine learning-based security modeling in cybersecurity, aimed at making security models lightweight and more applicable.
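A feed-forward ANN with several hidden layers can be sketched with scikit-learn's `MLPClassifier`. This is a swapped-in implementation choice, not necessarily the framework or architecture used in the paper, and the layer sizes `(128, 64, 32)`, ReLU activation, and early stopping are illustrative assumptions. The data are again synthetic. Standardizing the inputs first is included because MLP training is sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a multi-attack dataset with 4 classes
X, y = make_classification(n_samples=3000, n_features=30, n_informative=12,
                           n_classes=4, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y)

# Three hidden layers; sizes are illustrative, not taken from the paper
ann = make_pipeline(
    StandardScaler(),  # scale features to zero mean and unit variance
    MLPClassifier(hidden_layer_sizes=(128, 64, 32), activation="relu",
                  max_iter=300, early_stopping=True, random_state=7),
)
ann.fit(X_train, y_train)
acc = accuracy_score(y_test, ann.predict(X_test))

mlp = ann[-1]  # the fitted MLPClassifier step of the pipeline
# n_layers_ counts the input layer, the hidden layers, and the output layer
print(f"hidden layers: {mlp.hidden_layer_sizes}, total layers: {mlp.n_layers_}")
print(f"test accuracy: {acc:.3f}")
```

To study how feature significance affects the ANN, the same pipeline could be fitted on only the top-ranked feature columns and its accuracy compared with the all-feature run.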

6 Conclusion and Future Work

In this paper, we have presented CyberLearning, in which we take into account a binary classification model for detecting anomalies and a multi-class classification model for various types of cyber-attacks. In our modeling, we have also taken into account the impact of the security features and ultimately built an effective machine learning-based model with feature selection. To build the security models, we have employed the most popular machine learning classification techniques as well as artificial neural network learning with multiple hidden layers. Finally, we have examined the effectiveness of these learning-based security models by conducting a range of experiments on the two most popular security datasets, UNSW-NB15 and NSL-KDD. We believe that our empirical analysis and findings can serve as a reference guide in both academia and industry for effectively building data-driven security models and systems based on machine learning techniques.

Collecting more recent, higher-dimensional security data in IoT environments and building a data-driven secure system using learning techniques are directions for future work.

References

  • [1] I. H. Sarker, Y. B. Abushark, F. Alsolami, A. I. Khan, IntruDTree: A machine learning based cyber security intrusion detection model, Symmetry 12 (5) (2020) 754.
  • [2] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: 2015 military communications and information systems conference (MilCIS), IEEE, 2015, pp. 1–6.
  • [3] X. Qu, L. Yang, K. Guo, L. Ma, M. Sun, M. Ke, M. Li, A survey on the development of self-organizing maps for unsupervised intrusion detection, Mobile Networks and Applications (2019) 1–22.
  • [4] I. H. Sarker, AI-driven cybersecurity: An overview, security intelligence modeling and research directions, SN Computer Science (2021).
  • [5] Y. N. Soe, Y. Feng, P. I. Santosa, R. Hartanto, K. Sakurai, Machine learning-based IoT-botnet attack detection with sequential architecture, Sensors 20 (16) (2020) 4372.
  • [6] M. Hasan, M. M. Islam, M. I. I. Zarif, M. Hashem, Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches, Internet of Things 7 (2019) 100059.
  • [7] M. G. Raman, N. Somu, S. Jagarapu, T. Manghnani, T. Selvam, K. Krithivasan, V. S. Sriram, An efficient intrusion detection technique based on support vector machine and improved binary gravitational search algorithm, Artificial Intelligence Review (2019) 1–32.
  • [8] A. J. Malik, F. A. Khan, A hybrid technique using binary particle swarm optimization and decision tree pruning for network intrusion detection, Cluster Computing 21 (1) (2018) 667–680.
  • [9] I. H. Sarker, A. Kayes, S. Badsha, H. Alqahtani, P. Watters, A. Ng, Cybersecurity data science: an overview from machine learning perspective, Journal of Big Data 7 (1) (2020) 1–29.
  • [10] I. H. Sarker, Machine learning: Algorithms, real-world applications and research directions, SN Computer Science (2021).
  • [11] I. H. Sarker, Deep cybersecurity: A comprehensive overview from neural network and deep learning perspective, SN Computer Science (2021).
  • [12] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, A detailed analysis of the KDD CUP 99 data set, in: 2009 IEEE symposium on computational intelligence for security and defense applications, IEEE, 2009, pp. 1–6.
  • [13] S. Seufert, D. O’Brien, Machine learning for automatic defence against distributed denial of service attacks, in: 2007 IEEE International Conference on Communications, IEEE, 2007, pp. 1217–1222.
  • [14] A. Alazab, M. Hobbs, J. Abawajy, M. Alazab, Using feature selection for intrusion detection system, in: 2012 International Symposium on Communications and Information Technologies (ISCIT), IEEE, 2012, pp. 296–301.
  • [15] A. L. Buczak, E. Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Communications surveys & tutorials 18 (2) (2015) 1153–1176.
  • [16] R. Agrawal, R. Srikant, et al., Fast algorithms for mining association rules, in: Proc. 20th int. conf. very large data bases, VLDB, Vol. 1215, 1994, pp. 487–499.
  • [17] I. H. Sarker, A. Kayes, ABC-RuleMiner: User behavioral rule-based machine learning method for context-aware intelligent services, Journal of Network and Computer Applications (2020) 102762.
  • [18] I. H. Sarker, Context-aware rule learning from smartphone data: survey, challenges and future directions, Journal of Big Data 6 (1) (2019) 95.
  • [19] I. H. Sarker, A. Kayes, P. Watters, Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage, Journal of Big Data (2019).
  • [20] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, K. Dai, An efficient intrusion detection system based on support vector machines and gradually feature removal method, Expert Systems with Applications 39 (1) (2012) 424–430.
  • [21] F. Amiri, M. R. Yousefi, C. Lucas, A. Shakery, N. Yazdani, Mutual information-based feature selection for intrusion detection systems, Journal of Network and Computer Applications 34 (4) (2011) 1184–1199.
  • [22] C. Wagner, J. François, T. Engel, et al., Machine learning approach for IP-flow record anomaly detection, in: International Conference on Research in Networking, Springer, 2011, pp. 28–39.
  • [23] M. V. Kotpalliwar, R. Wajgi, Classification of attacks using support vector machine (SVM) on KDDCUP’99 IDS database, in: 2015 Fifth International Conference on Communication Systems and Network Technologies, IEEE, 2015, pp. 987–990.
  • [24] H. Saxena, V. Richariya, Intrusion detection in KDD99 dataset using SVM-PSO and feature reduction with information gain, International Journal of Computer Applications 98 (6) (2014).
  • [25] M. S. Pervez, D. M. Farid, Feature selection and intrusion classification in NSL-KDD cup 99 dataset employing SVMs, in: The 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), IEEE, 2014, pp. 1–6.
  • [26] T. Shon, Y. Kim, C. Lee, J. Moon, A machine learning framework for network anomaly detection using SVM and GA, in: Proceedings from the sixth annual IEEE SMC information assurance workshop, IEEE, 2005, pp. 176–183.
  • [27] R. Kokila, S. T. Selvi, K. Govindarajan, DDoS detection and analysis in SDN-based environment using support vector machine classifier, in: 2014 Sixth International Conference on Advanced Computing (ICoAC), IEEE, 2014, pp. 205–210.
  • [28] C. Kruegel, D. Mutz, W. Robertson, F. Valeur, Bayesian event classification for intrusion detection, in: 19th Annual Computer Security Applications Conference, 2003. Proceedings., IEEE, 2003, pp. 14–23.
  • [29] S. Benferhat, T. Kenaza, A. Mokhtari, A naive Bayes approach for detecting coordinated attacks, in: 2008 32nd Annual IEEE International Computer Software and Applications Conference, IEEE, 2008, pp. 704–709.
  • [30] M. Panda, M. R. Patra, Network intrusion detection using naive Bayes, International journal of computer science and network security 7 (12) (2007) 258–263.
  • [31] L. Koc, T. A. Mazzuchi, S. Sarkani, A network intrusion detection system based on a hidden naïve Bayes multiclass classifier, Expert Systems with Applications 39 (18) (2012) 13492–13500.
  • [32] R. Bapat, A. Mandya, X. Liu, B. Abraham, D. E. Brown, H. Kang, M. Veeraraghavan, Identifying malicious botnet traffic using logistic regression, in: 2018 Systems and Information Engineering Design Symposium (SIEDS), IEEE, 2018, pp. 266–271.
  • [33] E. Besharati, M. Naderan, E. Namjoo, LR-HIDS: logistic regression host-based intrusion detection system for cloud environments, Journal of Ambient Intelligence and Humanized Computing 10 (9) (2019) 3669–3692.
  • [34] S. Vishwakarma, V. Sharma, A. Tiwari, An intrusion detection system using KNN-ACO algorithm, Int. J. Comput. Appl. 171 (10) (2017) 18–23.
  • [35] H. Shapoorifard, P. Shamsinejad, Intrusion detection using a novel hybrid method incorporating an improved KNN, Int. J. Comput. Appl. 173 (1) (2017) 5–9.
  • [36] A. M. Sharifi, S. K. Amirgholipour, A. Pourebrahimi, Intrusion detection based on joint of k-means and KNN, Journal of Convergence Information Technology 10 (5) (2015) 42.
  • [37] P. A. R. Kumar, S. Selvakumar, Distributed denial of service attack detection using an ensemble of neural classifier, Computer Communications 34 (11) (2011) 1328–1341.
  • [38] A. Dainotti, A. Pescapé, G. Ventre, A cascade architecture for DoS attacks detection based on the wavelet transform, Journal of Computer Security 17 (6) (2009) 945–968.
  • [39] N. G. Relan, D. R. Patil, Implementation of network intrusion detection system using variant of decision tree algorithm, in: 2015 International Conference on Nascent Technologies in the Engineering Field (ICNTE), IEEE, 2015, pp. 1–5.
  • [40] K. Rai, M. S. Devi, A. Guleria, Decision tree based algorithm for intrusion detection, International Journal of Advanced Networking and Applications 7 (4) (2016) 2828.
  • [41] B. Ingre, A. Yadav, A. K. Soni, Decision tree based intrusion detection system for NSL-KDD dataset, in: International Conference on Information and Communication Technology for Intelligent Systems, Springer, 2017, pp. 207–218.
  • [42] S. Puthran, K. Shah, Intrusion detection using improved decision tree algorithm with binary and quad split, in: International Symposium on Security in Computing and Communication, Springer, 2016, pp. 427–438.
  • [43] D. Moon, H. Im, I. Kim, J. H. Park, DTB-IDS: an intrusion detection system based on decision tree using behavior analysis for preventing APT attacks, The Journal of supercomputing 73 (7) (2017) 2881–2895.
  • [44] A. O. Balogun, R. G. Jimoh, Anomaly intrusion detection using an hybrid of decision tree and k-nearest neighbor (2015).
  • [45] P. Sangkatsanee, N. Wattanapongsakorn, C. Charnsripinyo, Practical real-time intrusion detection using machine learning approaches, Computer Communications 34 (18) (2011) 2227–2235.
  • [46] I. Alrashdi, A. Alqazzaz, E. Aloufi, R. Alharthi, M. Zohdy, H. Ming, AD-IoT: Anomaly detection of IoT cyberattacks in smart city using machine learning, in: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), IEEE, 2019, pp. 0305–0310.
  • [47] M. Mazini, B. Shirazi, I. Mahdavi, Anomaly network-based intrusion detection system using a reliable hybrid artificial bee colony and AdaBoost algorithms, Journal of King Saud University-Computer and Information Sciences 31 (4) (2019) 541–553.
  • [48] J. Han, J. Pei, M. Kamber, Data mining: concepts and techniques (2011).
  • [49] N. Sneha, T. Gangil, Analysis of diabetes mellitus for early prediction using optimal features selection, Journal of Big Data 6 (1) (2019) 13.
  • [50] G. H. John, P. Langley, Estimating continuous distributions in Bayesian classifiers, in: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, Morgan Kaufmann Publishers Inc., 1995, pp. 338–345.
  • [51] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
  • [52] D. W. Aha, D. Kibler, M. K. Albert, Instance-based learning algorithms, Machine learning 6 (1) (1991) 37–66.
  • [53] J. R. Quinlan, C4.5: Programs for machine learning, Machine Learning (1993).
  • [54] L. Breiman, Random forests, Machine learning 45 (1) (2001) 5–32.
  • [55] L. Breiman, Bagging predictors, Machine learning 24 (2) (1996) 123–140.
  • [56] Y. Amit, D. Geman, Shape quantization and recognition with randomized trees, Neural computation 9 (7) (1997) 1545–1588.
  • [57] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, K. R. K. Murthy, Improvements to Platt’s SMO algorithm for SVM classifier design, Neural computation 13 (3) (2001) 637–649.
  • [58] S. Le Cessie, J. C. Van Houwelingen, Ridge estimators in logistic regression, Journal of the Royal Statistical Society: Series C (Applied Statistics) 41 (1) (1992) 191–201.
  • [59] Y. Freund, R. E. Schapire, et al., Experiments with a new boosting algorithm, in: ICML, Vol. 96, Citeseer, 1996, pp. 148–156.
  • [60] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O’Reilly Media, 2019.
  • [61] Colaboratory [online]. Available: https://colab.research.google.com/.
  • [62] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, C. Wang, Machine learning and deep learning methods for cybersecurity, IEEE Access 6 (2018) 35365–35381.
  • [63] I. H. Sarker, A machine learning based robust prediction model for real-life mobile phone data, Internet of Things 5 (2019) 180–193.
  • [64] I. H. Sarker, M. M. Hoque, M. K. Uddin, T. Alsanoosy, Mobile data science and intelligent apps: Concepts, AI-based modeling and research directions, Mobile Networks and Applications (2020) 1–19.
  • [65] I. H. Sarker, A. Colman, J. Han, RecencyMiner: mining recency-based personalized behavior from contextual smartphone data, Journal of Big Data 6 (1) (2019) 49.