CyberLearning: Effectiveness Analysis of Machine Learning Security Modeling to Detect Cyber-Anomalies and Multi-Attacks

Iqbal H. Sarker^1,2∗ ¹Department of Computer Science and Engineering,
Chittagong University of Engineering & Technology,
Chittagong-4349, Bangladesh.
²Swinburne University of Technology,
Melbourne, VIC-3122, Australia.
*Corresponding email: [email protected]
ORCID iD: https://orcid.org/0000-0003-1740-5517

Abstract

Detecting cyber-anomalies and attacks is becoming a rising concern in the domain of cybersecurity. Artificial intelligence, particularly machine learning techniques, can be used to tackle these issues. However, the effectiveness of a learning-based security model may vary depending on the security features and the data characteristics. In this paper, we present “CyberLearning”, a machine learning-based cybersecurity modeling approach with correlated-feature selection, and a comprehensive empirical analysis of the effectiveness of various machine learning-based security models. In our CyberLearning modeling, we take into account a binary classification model for detecting anomalies, and a multi-class classification model for various types of cyber-attacks. To build the security model, we first employ ten popular machine learning classification techniques: naive Bayes, logistic regression, stochastic gradient descent, K-nearest neighbors, support vector machine, decision tree, random forest, adaptive boosting, eXtreme gradient boosting, and linear discriminant analysis. We then present an artificial neural network-based security model considering multiple hidden layers. The effectiveness of these learning-based security models is examined by conducting a range of experiments utilizing the two most popular security datasets, UNSW-NB15 and NSL-KDD. Overall, this paper aims to serve as a reference point for data-driven security modeling through our experimental analysis and findings in the context of cybersecurity.

keywords:

cybersecurity; machine learning; deep learning; classification; feature selection; anomaly detection; cyber-attacks; security intelligence; cyber data analytics; intelligent systems.

Journal: Internet of Things - Elsevier

1 Introduction

In recent years, the demand for cybersecurity and protection against cyber-anomalies and various types of attacks, such as unauthorized access, denial-of-service (DoS), botnet, malware, or worms, has been ever increasing. Such anomalies lead to irreparable damage and financial losses in large-scale computer networks [1] [2]. For example, a single ransomware virus in May 2017 caused tremendous losses to many organizations and sectors, including banking, medical care, electricity, and universities, amounting to a loss of 8 billion dollars [3]. In the domain of cybersecurity, such security breaches or intrusions have become a common issue when securing a cyber-system as well as an Internet of Things (IoT) system. Although various traditional methods, such as firewalls and encryption, are designed to handle Internet-based cyber-attacks, an intelligent system that effectively detects such anomalies or attacks is the key to tackling these issues. Thus, in this paper, we mainly focus on artificial intelligence, particularly the applicability of machine learning security modeling, which could be more effective due to its capability to learn automatically from the training security data.

Machine learning-based security models that analyze various cyber-attacks or anomalies, and eventually detect or predict the threats, can be used for intelligent security services [4]. Typically, a detection model either handles multiple associated cyber-attacks, i.e., a “multi-class” problem, or detects anomalies, i.e., a “binary-class” problem. Several recent studies have been conducted in the area, such as detecting botnet attacks [5], attack and anomaly detection analysis for IoT sensors in IoT sites [6], classifying attacks to build an intrusion detection system [7], and detecting anomalous network connections while classifying normal traffic and attacks [8]. Although several machine learning techniques are used for different purposes, these studies are limited in that they do not analyze the variations in the significance of the security features, or they conduct the empirical analysis over only a small range of techniques used for security intelligence modeling. These are discussed briefly in Section 2, and summarized in Table 1. Moreover, in the case of unknown attacks, abnormal behaviors that differ from the normal traffic are considered as anomalies, and the relevant model can be used in many security solutions [1] [9]. Thus, classifying the associated attacks into several well-known classes such as DoS, botnet, malware, worms, etc., as well as distinguishing the anomalies of unknown attacks from the normal traffic, is essential for intelligent modeling in the area of cybersecurity.

Different machine learning models may perform differently on the above-mentioned issues according to their learning capabilities from security data. The reason is that the effectiveness of a learning-based security model may vary depending on the significance of the associated security features and the data characteristics. In real-world scenarios, cybersecurity issues might involve a huge number of security features and several known or unknown attack classes or anomalies. Thus, an effective feature selection technique and a robust classification model are usually the key components in the construction of an intelligent intrusion detection system. Various types of machine learning techniques and their applicability in the area of cybersecurity have been discussed briefly in Sarker et al. [9]; however, a detailed empirical analysis that takes these issues into account is needed to make an intelligent decision in the area. Therefore, we aim to present a comprehensive empirical analysis of the effectiveness of various machine learning-based security models in such diverse real-world scenarios.

To address the issues mentioned above, in this paper, we present “CyberLearning”, a machine learning-based security modeling approach that takes into account the significance of the security features, together with the relevant experimental analysis. In our analysis, we take into account a binary classification model for detecting anomalies, and a multi-class classification model for detecting various types of cyber-attacks, such as DoS, Backdoor, Worms, etc. In a binary-class classification model, the given security dataset is categorized into two classes, ‘normal’ or ‘anomaly’, whereas in a multi-class classification model, the given dataset is categorized into the several attack classes mentioned above. For modeling, we first employ ten popular machine learning classification techniques, namely Naive Bayes (NB), Logistic regression (LR), Stochastic gradient descent (SGD), K-nearest neighbors (KNN), Support vector machine (SVM), Decision Tree (DT), Random Forest (RF), Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), and Linear discriminant analysis (LDA), as well as an Artificial Neural Network (ANN) based model, which is frequently used in deep learning [10] [11]. For selecting features, we take into account the feature correlation values, and the resultant security model is then built on the selected features, considering both the model accuracy and its simplicity or complexity. The main idea is that the learning-based model examines the behavior of the network utilizing the data, finds the security patterns that profile the normal behavior, and thus detects the anomalies or associated attacks. The effectiveness of these learning-based security models is examined by conducting a range of experiments utilizing the two most popular security datasets, UNSW-NB15 [2] and NSL-KDD [12].

The contributions of this work can be summarized as follows.

• We first highlight the importance of security features in machine learning security modeling to detect cyber-anomalies and multi-attacks. Thus we adopt a correlated-feature selection approach to remove the insignificant or irrelevant security features, which makes the security model lightweight and more applicable.
• We present a binary classification model for detecting cyber-anomalies or unknown attacks, where the security model classifies the data into two classes, ‘normal’ and ‘anomaly’. We also analyze the effectiveness of various popular machine learning classification models while detecting such anomalies.
• We present a multi-class classification model for detecting various cyber-attacks, such as DoS, Backdoor, Worms, etc., where the security model classifies the data into these attack classes. We also analyze the effectiveness of various popular machine learning classification models while detecting such cyber-attacks.
• Finally, we conduct a range of experiments and present a comprehensive empirical analysis of the effectiveness of various machine learning classification-based security models for unknown test cases.

The rest of the paper is organized as follows. Section 2 provides the background and related work of our study. In Section 3, we present our machine learning-based security modeling by taking into account the significance of the security features. We evaluate the resultant security model and report the experimental results in Section 4. In Section 5, several key findings of our analysis in the area are summarized. Finally, Section 6 concludes this paper and highlights the future work.

2 Background and Related Work

A considerable amount of research has been done in the area of cybersecurity on detecting cyber-anomalies and attacks or intrusions. In the cyber industry, both signature-based intrusion detection systems (SIDS) and anomaly-based intrusion detection systems (AIDS) are well-known for detecting and preventing cyber-attacks [9]. SIDS is based on known signatures of the attacks [13]. AIDS, on the other hand, has the benefit over SIDS of identifying unseen threats, including the ability to distinguish unknown or zero-day attacks [14] [15]. Although association analysis is popular in the area of machine learning for building rule-based intelligent systems [16] [17] [18], it might not be effective for detecting anomalies or cyber-attacks due to its redundant rule generation and its complexity with higher dimensions of security features. Thus, to achieve our goal, in this work, we primarily focus on machine learning classification models [19] for security modeling because of their automated learning capabilities from the security data.

Several machine learning techniques have been used for various purposes. For instance, Li et al. [20] classify different types of attacks such as DoS, Probe or Scan, U2R, and R2L, as well as regular traffic, using an SVM classifier on the widely used KDD’99 Cup dataset. Similarly, Amiri et al. [21], Wagner et al. [22], Kotpalliwar et al. [23], Saxena et al. [24], Pervez et al. [25], Shon et al. [26], Kokila et al. [27], and Raman et al. [7] used an SVM classifier in their studies for the purpose of detecting attacks. Several other classifiers are used to detect intrusions or attacks, in addition to the SVM classifier mentioned above. For example, a probability-based Bayesian network is used by Kruegel et al. [28] to identify events while processing TCP/IP packets. Benferhat et al. [29] use the same kind of Bayesian network for DoS intrusion detection in their research. Similarly, Panda et al. [30] and Koc et al. [31] use the naive Bayes classifier for detecting attacks in their systems.

Several studies [32] [33] have been conducted to classify malicious traffic and intrusions using a logistic regression model. KNN, an instance-based learning algorithm, is another common machine learning method, where the classification of a data point is determined by its k nearest neighbors. Vishwakarma et al. [34], Shapoorifard et al. [35], and Sharifi et al. [36] use the KNN classification technique in their studies for the purpose of building intrusion detection systems. The authors in [37] consider a neural classifier, and those in [38] consider a wavelet transform, for anomaly detection, particularly of DoS attacks. A significant number of studies in the domain of cybersecurity, such as Relan et al. [39], Rai et al. [40], Ingre et al. [41], Malik et al. [8], [41], Puthran et al. [42], Moon et al. [43], Balogun et al. [44], and Sangkatsanee et al. [45], use the DT classification approach for the purpose of building intrusion detection systems. To detect anomalies and address IoT cybersecurity threats in smart cities, Alrashdi et al. [46] use RF learning, consisting of multiple decision trees, in their binary classification model. Mazini et al. [47] use the AdaBoost approach with feature selection while building an anomaly network-based intrusion detection system in their work.

A machine learning security model for detecting anomalies has been presented in [1], which is effective in terms of prediction accuracy as well as reducing the feature dimensions based on the decision tree classification approach with feature selection. Recently, a machine learning-based botnet attack detection framework with sequential detection architecture has been presented in [5], where ANN, DT, and NB classification techniques are used. Hasan et al. [6] perform attack detection analysis in IoT sites, to develop a smart, secured, and reliable IoT based infrastructure. Although several machine learning techniques, such as SVM, DT, RF, LR, and ANN are used, the analysis is limited to a small number of security features for detecting different types of attacks. Moreover, the variations in the significance of the security features, which could be a crucial part while building an effective security model using machine learning techniques, are not addressed.

Table 1: A summary of machine learning based security models for detecting cyber-anomalies and attacks

Purposes	Used Techniques	Type	References
To detect IoT-Botnet Attack	ANN, DT, NB and Feature selection	Multiclass	Soe et al. [5] (2020)
Classifying attacks to build an efficient intrusion detection system	SVM and Feature selection	Multiclass	Raman et al. [7] (2019)
To design a host-based intrusion detection system	LR and Feature selection	Multiclass	Besharati et al. [33] (2019)
To detect attacks in the IoT environment	LR, SVM, DT, RF, and ANN	Multiclass	Hasan et al. [6] (2019)
To build anomaly network-based intrusion detection system	AdaBoost and Feature selection	Multiclass	Mazini et al. [47] (2019)
Detecting attacks to establish an efficient intrusion detection system	SVM and Feature removal	Multiclass	Li et al. [20] (2012)
To classify network events as normal or attack events	NB and Feature selection	Multiclass	Koc et al. [31] (2012)
To detect intrusion to the cloud system	KNN and Feature selection	Multiclass	Sharifi et al. [36] (2015)
To detect the anomalous network connections and classifying normal and attack	DT and pruning	Binary	Malik et al. [8] (2018)
To detect anomalies and address IoT cybersecurity threats in Smart City	RF learning	Binary	Alrashdi et al. [46] (2019)
To detect anomalies in a network and classifying normal and attack	DT and Feature selection	Binary	Sarker et al. [1] (2020)
To introduce cybersecurity data science highlighting cyber-anomalies and attacks	Overall machine learning perspective	–	Sarker et al. [9] (2020)
To detect cyber-anomalies and multi-attacks	NB, LDA, KNN, XGBoost, DT, RF, SVM, SGD, AdaBoost, LR, ANN, and Feature selection	Binary and Multiclass	CyberLearning (our analysis)

In real-world scenarios, cybersecurity issues might involve a huge number of security features, and the effectiveness of a learning-based security model may vary depending on the significance of the associated security features and the data characteristics. Various types of machine learning techniques and their applicability in the area of cybersecurity have been discussed in Sarker et al. [9]; however, a detailed empirical analysis is needed to make an intelligent decision in the area. Unlike the above approaches, in this paper, we present “CyberLearning”, a machine learning-based cybersecurity modeling approach that selects correlated features according to their significance in modeling, and a comprehensive empirical analysis of the effectiveness of various machine learning-based security models. While building the security models, we take into account a binary classification model for detecting anomalies, and a multi-class classification model for detecting multi-attacks in the context of cybersecurity, to provide readers with a comprehensive view of the area. In Table 1, we also summarize the most relevant machine learning-based security models within the scope of our study.

3 Materials and Methods

In this section, we present our machine learning security model to detect cyber-anomalies and attacks. This involves several processing steps: exploring the security dataset, preparing raw data, determining the correlation and ranking of features, and constructing a security model. We address these steps briefly in the following subsections.

3.1 Exploring Security Dataset

Usually, security datasets consist of a series of information records containing several security features and relevant details that can be used to construct a security model [9] for detecting anomalies. Thus, to detect malicious activity or anomalies, it is important to understand the nature of raw cybersecurity data and the trends of security incidents. In this work, we use the most popular UNSW-NB15 [2] and NSL-KDD [12] security datasets to build the data-driven security model and conduct the effectiveness analysis.

Table 2: UNSW-NB15 Dataset features with value type.

Feature Name	Value Type	Feature Name	Value Type
$srcip$	Nominal	$sport$	Integer
$dstip$	Nominal	$dsport$	Integer
$proto$	Nominal	$state$	Nominal
$dur$	Float	$sbytes$	Integer
$dbytes$	Integer	$sttl$	Integer
$dttl$	Integer	$sloss$	Integer
$dloss$	Integer	$service$	Nominal
$Sload$	Float	$Dload$	Float
$Spkts$	Integer	$Dpkts$	Integer
$swin$	Integer	$dwin$	Integer
$stcpb$	Integer	$dtcpb$	Integer
$smeansz$	Integer	$dmeansz$	Integer
$trans\_depth$	Integer	$res\_bdy\_len$	Integer
$Sjit$	Float	$Djit$	Float
$Stime$	Timestamp	$Ltime$	Timestamp
$Sintpkt$	Float	$Dintpkt$	Float
$tcprtt$	Float	$synack$	Float
$ackdat$	Float	$is\_sm\_ips\_ports$	Binary
$ct\_state\_ttl$	Integer	$ct\_flw\_http\_mthd$	Integer
$is\_ftp\_login$	Binary	$ct\_ftp\_cmd$	Integer
$ct\_srv\_src$	Integer	$ct\_srv\_dst$	Integer
$ct\_dst\_ltm$	Integer	$ct\_src\_ltm$	Integer
$ct\_src\_dport\_ltm$	Integer	$ct\_dst\_sport\_ltm$	Integer
$ct\_dst\_src\_ltm$	Integer

Nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms, are included in the UNSW-NB15 [2] dataset. It contains 257673 instances across the training and testing sets, with 45 features. On the other hand, the NSL-KDD [12] dataset contains the Denial of Service Attack (DoS), User to Root Attack (U2R), Remote to Local Attack (R2L), and Probing Attack. The raw data source consists of 494020 instances with 41 security features that are taken into account in our experimental analysis. The features in a dataset can be of various types. For instance, in Table 2, we show the security features of the UNSW-NB15 dataset, where the features are not all of the same type. Thus, effectively analyzing these features and building a security model for detecting the anomalies and multi-attacks mentioned above is the key in our analysis.

3.2 Security Data Pre-Processing

Data preparation includes labeling anomalies and attacks, feature encoding, feature scaling, and data splitting, according to the characteristics of the given dataset.

• Anomaly and Attacks: As mentioned earlier, the dataset UNSW-NB15 [2] contains nine types of attacks. Taken together, these are labeled as anomalies and used in the binary classification model, while the separate attack types are used as classes in the multi-class classification model taken into account in our analysis. Similarly, the four types of attacks in the NSL-KDD [12] dataset, DoS, U2R, R2L, and Probing, are labeled as anomalies and are used in the corresponding classification model.
• Feature encoding: As shown in Table 2, the dataset UNSW-NB15 [2] contains several feature types, such as nominal, integer, float, timestamp, and binary values. Thus, to fit the data to the security model, we first convert all the nominal valued features into numeric values. Although “One Hot Encoding” is a popular technique, we use “Label Encoding” in this work. The reason is that one hot encoding significantly increases the number of feature dimensions [1]. The label-encoding technique, on the other hand, transforms the feature values directly into numeric codes that can be used to fit a machine learning classification model. Similarly, the features in the NSL-KDD [12] dataset are encoded to build the resultant security model.

Figure 1: Security feature ‘sbyte’

Figure 2: Security feature ‘synack’
• Feature scaling: Feature scaling is also known as data normalization in the task of data pre-processing. The security features in a dataset may not be identical in terms of data distribution, which varies from feature to feature. For instance, Figure 1 and Figure 2 show the data distribution for two different features, $sbyte$ and $synack$ respectively, in the dataset UNSW-NB15 [2]. According to Figure 1 and Figure 2, the value is very low for some data points, while for others it is much higher. Thus, we use Standard Scaler, a data scaling method that normalizes the range of the feature values to a mean value of 0 and a standard deviation of 1.
• Data Splitting: As we aim to build learning-based security models, data splitting is an important part of the process. The reason is that an apparently good security model may merely result from a poorly chosen data split. Thus, for building a fair model and evaluation, we take the data from the data sources as input and split them using a $k$-fold cross-validation technique [48]. In this technique, we first randomly partition the input data into $k$ mutually exclusive subsets or “folds”, $d_{1},d_{2},...,d_{k}$, each containing an approximately equal number of data instances. The overall process requires $k$ iterations. In each iteration $i$, we use the data instances of all folds except $d_{i}$ as the training dataset to build the resultant security model, and $d_{i}$ is used as the testing dataset for evaluation. Eventually, the average result over the $k$ iterations is taken as the outcome of the model.
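The encoding, scaling, and splitting steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: the helper names (`label_encode`, `standard_scale`, `k_fold_indices`), the toy records, and the choice of $k=3$ are hypothetical and are not taken from the paper's experiments; scaling uses the population standard deviation, which we assume matches the Standard Scaler behavior described above.

```python
import random
from statistics import mean, pstdev

def label_encode(values):
    """Map each distinct nominal value to an integer code (sorted order)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

def standard_scale(column):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:  # constant feature: nothing to scale
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs over k mutually exclusive folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Toy records (hypothetical values, not drawn from UNSW-NB15 or NSL-KDD).
proto = ["tcp", "udp", "tcp", "icmp", "udp", "tcp"]
sbytes = [100, 2500, 300, 40, 90000, 120]

proto_codes, mapping = label_encode(proto)
sbytes_scaled = standard_scale(sbytes)
splits = list(k_fold_indices(len(proto), k=3))
```

Each record appears in exactly one test fold, so averaging the per-fold scores uses every instance once for evaluation.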

3.3 Modeling Techniques

In our CyberLearning modeling, we take into account the impact of security features while building the security model. In the following, we present how we rank the features for selection, and the various machine learning algorithms that are employed to build the model and conduct the effectiveness analysis within the scope of our study.

3.3.1 Feature Ranking and Selection

Feature selection in the cybersecurity domain can provide a better understanding of the security data, simplify the security model by reducing the computational cost or model complexity, and yield significant outcomes in a machine learning-based model. A security dataset may contain high-dimensional data, where some features are highly correlated with anomalies or attacks, while others have little or no correlation at all. Thus, not all the security features in a given dataset carry significant information for creating a machine learning classification-based security model. In addition, due to the over-fitting issue [1] [49], further processing with all the security features could provide poor results. Security feature selection is therefore required not only to reduce the computational cost but also to create a more efficient security model with a higher accuracy rate. In this sense, security feature selection is a method for filtering out those features that are less significant, redundant, or have no impact on modeling from the given security dataset.

To achieve this goal, we first calculate the correlation of the security features using the Pearson correlation coefficient, and rank them accordingly. The correlation-based feature selection is based on the following hypothesis: “Good feature subsets contain features highly correlated with the target class, yet uncorrelated or less correlated to each other”. If $X$ and $Y$ represent two random variables, then the correlation coefficient between $X$ and $Y$ is defined as [48]:

$$r(X,Y)=\frac{\sum_{i=1}^{n}(X_{i}-\bar{X})(Y_{i}-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}}\sqrt{\sum_{i=1}^{n}(Y_{i}-\bar{Y})^{2}}} \tag{1}$$

In the field of statistics, the formula in Equ. 1 is often used to determine how strong the relationship is between the two variables $X$ and $Y$. In our security modeling, the higher the value, the more significant the security feature is for building the resultant learning-based security model. For instance, a value of $1$ (max) means that the outcome of the learning-based security model is directly associated with that security feature, and $0$ (min) means that the output of the model does not depend on that security feature at all. Thus, in the scope of our analysis, we calculate the correlation coefficient value of each security feature in both our binary classification modeling for detecting anomalies and our multi-class classification modeling for detecting various types of attacks.
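As an illustration of how Equ. 1 could drive the ranking, the following sketch scores each feature against the target class and sorts the features by score. It rests on several assumptions that are not stated in the text: the score is taken as the absolute value of $r$ (so that the scale runs from 0 to 1), a constant feature is scored 0, and the feature names, toy values, and the 0.5 selection threshold are purely hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Equ. 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sqrt(sum((a - mean_x) ** 2 for a in x)) * sqrt(
        sum((b - mean_y) ** 2 for b in y)
    )
    return num / den if den else 0.0  # assumption: constant feature scores 0

def rank_features(features, target):
    """Return (name, |r|) pairs sorted from most to least correlated."""
    scores = {name: abs(pearson_r(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy features; target 0 = normal, 1 = anomaly.
features = {
    "trend_like": [1, 2, 3, 4, 5, 6],
    "noise_like": [1, -1, 1, -1, 1, -1],
    "const_like": [7, 7, 7, 7, 7, 7],
}
target = [0, 0, 0, 1, 1, 1]

ranked = rank_features(features, target)
selected = [name for name, score in ranked if score >= 0.5]  # hypothetical cut-off
```

Only the score against the target is used here; checking that the selected features are weakly correlated with each other, as the quoted hypothesis suggests, would need a further pairwise pass.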

3.4 Machine Learning Algorithms and Parameters

In this section, we present how various machine learning classification techniques as well as ANN-based modeling with multiple hidden layers, are used in our security modeling.

3.4.1 Naive Bayes (NB)

Naive Bayes (NB) [50] is one of the common classification techniques in the field of machine learning and data science. It is based on Bayes’ theorem, which describes the probability of an event according to prior knowledge of conditions related to that event. Let $X=\{x_{1},x_{2},...,x_{n}\}$ be a security feature vector of size $n$, and let $c$ be a class variable that represents the cyber-attacks or anomalies. The classifier then calculates the probability $P$ using the following equations [48]:

$$P(c|X)=\frac{P(X|c)P(c)}{P(X)} \tag{2}$$

$$P(c|x_{1},x_{2},\ldots,x_{n})=\frac{P(x_{1}|c)P(x_{2}|c)\cdots P(x_{n}|c)P(c)}{P(x_{1})P(x_{2})\cdots P(x_{n})} \tag{3}$$

To build a security model, we use the Gaussian Naive Bayes classifier [51], assuming that all the security features follow a Gaussian distribution, i.e., a normal distribution. The prior probabilities of the classes in our security modeling are adjusted according to the data. A portion of the largest variance of all security features is added to the variances for calculation stability (smoothing).
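A configuration along these lines can be sketched with scikit-learn's `GaussianNB`. This is a hedged sketch, not the paper's exact setup: `priors=None` lets the class priors be estimated from the data, and `var_smoothing=1e-9` is the library default, assumed here because the text gives no value. The synthetic two-cluster data is hypothetical and only stands in for scaled security features.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical, well-separated synthetic data: class 0 = normal, 1 = anomaly.
rng = np.random.default_rng(0)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(50, 3))
X_anomaly = rng.normal(loc=6.0, scale=1.0, size=(50, 3))
X = np.vstack([X_normal, X_anomaly])
y = np.array([0] * 50 + [1] * 50)

# priors=None: priors are inferred from class frequencies in the data.
# var_smoothing: fraction of the largest feature variance added to all
# variances for numerical stability (assumed default value).
model = GaussianNB(priors=None, var_smoothing=1e-9)
model.fit(X, y)

queries = np.array([[0.0, 0.0, 0.0], [6.0, 6.0, 6.0]])
predicted = model.predict(queries)
posterior = model.predict_proba(queries)  # P(c | X) per class, rows sum to 1
```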

3.4.2 Linear Discriminant Analysis (LDA)

In machine learning, Linear Discriminant Analysis (LDA) [48] is another probability-based method, which finds a linear combination of security features that separates the anomaly or attack classes. This method is also known as a generalization of Fisher’s linear discriminant, and projects a given security dataset onto a lower-dimensional space, i.e., it performs a dimensionality reduction that minimizes the model complexity or reduces the computational costs of the resultant security model. Consequently, it offers good class-separability and helps to avoid the problem of overfitting. The resulting combination can be used as a linear classifier or, more specifically, for dimensionality reduction of security features before performing the tasks of anomaly or attack classification. The standard LDA model typically fits a Gaussian density to each class, such as ‘anomaly’ or ‘normal’ or the various types of attacks, assuming that all classes share the same covariance matrix [51]. For modeling, the LDA approach also uses Bayes’ theorem, mentioned above, to estimate the probability that a new input instance belongs to each anomaly or attack class. The class with the highest probability is returned as the predicted anomaly or attack class. In our security modeling, we use singular value decomposition as the solver method with no shrinkage. The prior probabilities of the classes in our security modeling are inferred from the given security data.
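The settings named above map naturally onto scikit-learn's `LinearDiscriminantAnalysis`, as in the following sketch. The mapping is our assumption: `solver="svd"` for singular value decomposition, `shrinkage=None` for no shrinkage, and `priors=None` so that priors are inferred from the data. The three synthetic classes and their labels are hypothetical placeholders for real traffic classes.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical synthetic data: three classes sharing the same spread,
# which matches LDA's shared-covariance assumption.
rng = np.random.default_rng(0)
centers = {"normal": 0.0, "dos": 5.0, "worms": 10.0}
X = np.vstack([rng.normal(c, 0.5, size=(40, 4)) for c in centers.values()])
y = np.repeat(list(centers.keys()), 40)

lda = LinearDiscriminantAnalysis(solver="svd", shrinkage=None, priors=None)
lda.fit(X, y)

# Used as a classifier: the class with the highest posterior is returned.
predicted = lda.predict([[0, 0, 0, 0], [5, 5, 5, 5], [10, 10, 10, 10]])

# Used for dimensionality reduction: at most (number of classes - 1) axes.
X_reduced = lda.transform(X)
```

With three classes and four features, the projection has two components, which illustrates the dimensionality reduction role described above.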

3.4.3 K-nearest Neighbor (KNN)

K-nearest neighbors (KNN) [52], also known as a lazy learning algorithm, is an instance-based or non-generalizing learning method. This approach does not have a specialized training process for constructing a model; instead, it uses the stored data instances during classification. It classifies new test cases based on a ‘feature similarity’ measure, considering a distance function such as the Minkowski, Euclidean, or Manhattan distance [48]. Given two variables $X$ and $Y$, the Minkowski distance between them is defined as $\left(\sum_{i}|X_{i}-Y_{i}|^{p}\right)^{1/p},\ \text{where}\ p\geq 1$. It behaves differently depending on the value of $p$: $p=1$ and $p=2$ give the Manhattan and Euclidean distance, respectively.

$$d\left(X,Y\right)=\sqrt{\sum_{i=1}^{n}\left(X_{i}-Y_{i}\right)^{2}} \tag{4}$$

In our security modeling, we take into account the most popular Euclidean distance, i.e., $p=2$ [51], as defined in Equ. 4. The number of neighbors, denoted by $k$, is another key parameter in KNN-based security modeling. We use $k=5$ as the number of neighbors, with uniform weights, where all points in each neighborhood are weighted equally.
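To make the distance function and the voting step concrete, here is a minimal plain-Python sketch with $k=5$, $p=2$, and uniform (unweighted) majority voting. The function names and the toy two-dimensional points are hypothetical, and a practical system would rely on an indexed library implementation rather than this brute-force search.

```python
from collections import Counter

def minkowski(x, y, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean (Equ. 4)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def knn_predict(train_X, train_y, query, k=5, p=2):
    """Classify `query` by an unweighted majority vote of its k nearest points."""
    by_distance = sorted(
        zip(train_X, train_y), key=lambda pair: minkowski(pair[0], query, p)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two clusters of already-scaled feature vectors.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
train_y = ["normal"] * 5 + ["anomaly"] * 5

label_near_origin = knn_predict(train_X, train_y, (1, 2))
label_far_away = knn_predict(train_X, train_y, (9, 9))
```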

3.4.4 Decision Tree (DT)

Decision tree (DT) [53] is a well-known machine learning classification technique, which is commonly used in various application areas. A decision tree is a non-parametric supervised learning method that breaks down a given security dataset into smaller subsets and incrementally generates the associated branches of the tree. For splitting, the most popular criteria are “gini” for the Gini impurity and “entropy” for the information gain, which can be expressed mathematically as [51]:

$$\text{Entropy: } H(x)=-\sum_{i=1}^{n}p(x_{i})\log_{2}p(x_{i}) \tag{5}$$

$$Gini(E)=1-\sum_{i=1}^{c}{p_{i}}^{2} \tag{6}$$

where $p_{i}$ denotes the probability of an element being classified into a distinct anomaly or attack class. To build a decision tree-based security model, we use the “Gini Index”, which is determined by subtracting the sum of the squared probabilities of each anomaly or attack class from $1$, as expressed in Equ. 6. While generating the tree for the anomaly or attack classes, nodes are expanded until all leaves are pure or until all leaves contain fewer than two sample instances.
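The two splitting criteria in Equ. 5 and Equ. 6 can be computed directly from the class labels at a node, as in the following sketch. The helper names and the example labels are hypothetical; `split_gini` shows the size-weighted impurity that a tree would compare across candidate splits, which is our illustration rather than a detail given in the text.

```python
from collections import Counter
from math import log2

def class_probabilities(labels):
    """Relative frequency p_i of each class among the labels at a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def entropy(labels):
    """Entropy of a node (Equ. 5): 0 for a pure node."""
    return -sum(p * log2(p) for p in class_probabilities(labels))

def gini(labels):
    """Gini impurity of a node (Equ. 6): 0 for a pure node."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))

def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split into two children."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical node with three 'normal' records and one 'anomaly' record.
node = ["normal", "normal", "normal", "anomaly"]
node_gini = gini(node)
node_entropy = entropy(node)
perfect_split = split_gini(["normal"] * 3, ["anomaly"])
```

A split that produces only pure children has weighted impurity 0, which is why expansion stops once all leaves are pure.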

3.4.5 Random Forest (RF)

In the field of machine learning and data science, the random forest (RF) [54] is well known as an ensemble classification technique that is used in different application areas. It consists of multiple decision trees, where the decision tree classifier discussed above is used as a single tree in the forest model. It combines bootstrap aggregation (bagging) [55] with the random selection of features [56] to create a collection of decision trees with controlled variance. The outcome is determined by the majority vote of the generated decision trees in the forest. To build a random forest security model, we generate $N=100$ decision trees in the forest, where the quality of a split in a tree is measured by ‘Gini’, defined earlier in Equ. 6.
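A forest with these settings can be sketched with scikit-learn's `RandomForestClassifier`, using `n_estimators=100` and `criterion="gini"` as stated. The remaining choices are assumptions for illustration only: `random_state=0` is added for repeatability, all other parameters are left at library defaults, and the synthetic three-class data is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data standing in for scaled security features.
rng = np.random.default_rng(0)
centers = {"normal": 0.0, "dos": 5.0, "worms": 10.0}
X = np.vstack([rng.normal(c, 0.5, size=(40, 4)) for c in centers.values()])
y = np.repeat(list(centers.keys()), 40)

# 100 trees, each split scored by Gini impurity; bootstrap sampling and
# per-split random feature subsets are the library defaults.
forest = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)
forest.fit(X, y)

predicted = forest.predict([[0, 0, 0, 0], [5, 5, 5, 5], [10, 10, 10, 10]])
n_trees = len(forest.estimators_)
```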

3.4.6 Support Vector Machine (SVM)

In machine learning, support vector machine (SVM) [48] is another popular classification technique. It is based on a hyperplane in the data space that best divides the security dataset into two classes, such as ‘anomaly’ or ‘normal’. Its behavior depends on a mathematical function known as the kernel, which can be linear or nonlinear, e.g., polynomial, radial basis function (RBF), sigmoid, etc. To build a security model, we use the RBF kernel [57], also known as the Gaussian kernel, as we assume no prior knowledge about the given security data. The RBF kernel is mathematically defined as:

k(x,y)=\exp(-\lambda||x-y||^{2})

(7)

where $\lambda$ is a parameter that sets the “spread” of the kernel. Based on the RBF kernel function defined in Equ. 7, the technique transforms the given security data accordingly. Overall, it works in two stages: the identification of the optimal hyperplane in the data space, and then the mapping of the security data instances according to the decision boundaries defined by the hyperplane. Moreover, we use $C=1.0$ (regularization parameter), considering the trade-off between achieving a low training error and a low testing error in an SVM-based security model.
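
As a small worked example of Equ. 7, the following sketch evaluates the RBF kernel for two feature vectors. The function name and the value `lam=1.0` are illustrative assumptions, not the setting used in our experiments.

```python
import math

def rbf_kernel(x, y, lam=1.0):
    """Equ. 7: k(x, y) = exp(-lambda * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-lam * squared_distance)

identical = rbf_kernel((1.0, 2.0), (1.0, 2.0))          # distance 0
unit_apart = rbf_kernel((0.0, 0.0), (1.0, 0.0))         # squared distance 1
far_apart = rbf_kernel((0.0, 0.0), (5.0, 5.0))          # squared distance 50
```

The kernel equals 1 for identical points and decays toward 0 as the points move apart; a larger $\lambda$ makes this decay faster, i.e., narrows the “spread”.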

3.4.7 Logistic Regression (LR)

Logistic Regression (LR) [58] is another common probabilistic statistical model used to solve classification problems in machine learning. Typically, logistic regression estimates the probabilities using the logistic function, also referred to as the sigmoid function, which is mathematically defined as:

g(z)=\frac{1}{1+\exp(-z)}

(8)

While building the LR-based security model, we use $L_{2}$ regularization, i.e., a ridge-style penalty that adds the squared magnitude of the coefficients as a penalty term to the loss function. The parameter “C” plays the same role as in the SVM model, i.e., it controls the strength of the regularization. We also use the Scikit-learn solver “lbfgs” [51], which stands for Limited-memory Broyden–Fletcher–Goldfarb–Shanno, to build the security model.
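
The sketch below illustrates Equ. 8 together with an $L_{2}$-penalized objective. It assumes one common formulation, in which the log-loss is scaled by $C$ and the penalty is half the squared norm of the weights; the function names and toy data are hypothetical, and the optimization itself (e.g., by “lbfgs”) is not shown.

```python
import math

def sigmoid(z):
    """Equ. 8, written in a numerically stable form for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Probability of the positive (e.g., anomaly) class for one record."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def penalized_log_loss(weights, bias, X, y, C=1.0, eps=1e-12):
    """Assumed objective: 0.5 * ||w||^2 + C * sum of per-record log-losses."""
    data_term = 0.0
    for x, label in zip(X, y):
        p = min(max(predict_proba(weights, bias, x), eps), 1.0 - eps)
        data_term += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    penalty = 0.5 * sum(w * w for w in weights)
    return penalty + C * data_term

X = [(1.0, 2.0), (3.0, 1.0)]   # made-up feature vectors
y = [0, 1]
loss_at_zero = penalized_log_loss([0.0, 0.0], 0.0, X, y, C=1.0)
```

A smaller `C` gives the penalty relatively more weight, i.e., stronger regularization.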

3.4.8 Adaptive Boosting (AdaBoost)

Boosting is a machine learning approach to classification that can reduce bias and variance. Boosting helps to convert weak learners into strong ones. Adaptive Boosting (AdaBoost) is one such algorithm, formulated by Yoav Freund et al. [59]. AdaBoost is called an adaptive classifier because it can significantly enhance the efficiency of the classifier, but in some instances it can overfit. AdaBoost is also sensitive to noisy data and outliers. We use a decision tree classifier with a maximum depth of one ($max\_depth=1$) as the base estimator. The maximum number of estimators, at which boosting is terminated, is set to $50$.
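
To illustrate how boosting adapts to the mistakes of a weak learner, the following sketch performs one round of the classic discrete AdaBoost update for binary labels in $\{-1,+1\}$. This textbook variant is an assumption made for illustration and is not necessarily the exact variant implemented by the library we use; the function name is hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One discrete AdaBoost round: returns (learner weight, new sample weights).

    Assumes labels in {-1, +1} and a weighted error strictly between 0 and 1.
    """
    # Weighted error of the weak learner (e.g., a depth-1 decision tree).
    err = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples (y * h = -1) are up-weighted, the rest down-weighted.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

weights = [0.25, 0.25, 0.25, 0.25]
y_true = [+1, +1, -1, -1]
y_pred = [+1, +1, -1, +1]          # the last sample is misclassified
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
```

After this round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly. The same mechanism explains the sensitivity to noisy data: mislabeled records keep gaining weight.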

3.4.9 Extreme Gradient Boosting (XGBoost)

Gradient Boosting is another ensemble learning algorithm that, like the Random Forest discussed above, creates a final model from a set of individual models. Similar to how neural networks use gradient descent to optimize weights, gradient boosting uses the gradient to minimize the loss function. XGBoost stands for Extreme Gradient Boosting, a specific gradient boosting method that uses more accurate approximations to find the best model. It computes second-order gradients of the loss function to minimize the loss and applies advanced regularization (L1 & L2), which reduces overfitting and improves model generalization. We use the XGBClassifier class, which is compatible with the scikit-learn [51] API, to build the security model in our analysis.
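
The role of the second-order gradients can be sketched for the binary logistic loss. Assuming only $L_{2}$ regularization with strength `reg_lambda`, the regularized optimal weight of a leaf is $-G/(H+\lambda)$, where $G$ and $H$ are the sums of the first and second derivatives of the loss over the records in that leaf. The function names below are hypothetical and the snippet does not use the XGBoost library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_grad_hess(y_true, raw_score):
    """First and second derivative of the log-loss w.r.t. the raw score."""
    p = sigmoid(raw_score)
    return p - y_true, p * (1.0 - p)

def optimal_leaf_weight(grads, hessians, reg_lambda=1.0):
    """Newton-style leaf value with L2 regularization: -G / (H + lambda)."""
    return -sum(grads) / (sum(hessians) + reg_lambda)

# Three records in one leaf, all starting from a raw score of 0 (p = 0.5).
labels = [1, 1, 0]
pairs = [logistic_grad_hess(y, 0.0) for y in labels]
grads = [g for g, _ in pairs]
hessians = [h for _, h in pairs]
leaf_weight = optimal_leaf_weight(grads, hessians, reg_lambda=1.0)
```

Here $G=-0.5$ and $H=0.75$, so the leaf moves the raw score in the positive direction; a larger `reg_lambda` shrinks this step, which is how the regularization limits overfitting.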

3.4.10 Stochastic Gradient Descent (SGD)

Stochastic gradient descent (SGD) [48] is an iterative method for optimizing an objective function with suitable smoothness properties, where the word ‘stochastic’ refers to a system or a process that is linked with a random probability. A gradient is the slope of a function; it measures the degree of change of a variable in response to changes in another variable. Mathematically, the gradient of a function is the vector of its partial derivatives with respect to its input parameters. Let $\alpha$ be the learning rate and $J_{i}$ be the cost of the $i^{th}$ training example; then Equ. 9 represents the stochastic gradient descent update of the $j^{th}$ weight $w_{j}$.

w_{j}\ :=\ w_{j}-\alpha\ \frac{\partial J_{i}}{\partial w_{j}}

(9)

The greater the gradient, the steeper the slope. While building the security model, we use the ‘hinge’ loss function, which gives a linear SVM. Moreover, we use $L_{2}$ regularization, as in logistic regression, and a constant $alpha=0.0001$ that multiplies the regularization term.
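
A single update of Equ. 9 with the hinge loss and an $L_{2}$ penalty can be sketched as follows. Note the two different constants: `lr` plays the role of the learning rate $\alpha$ in Equ. 9, whereas `alpha` is the constant that multiplies the regularization term. The function name, the value of `lr`, and the toy inputs are assumptions made for illustration, with labels encoded as $\{-1,+1\}$.

```python
def sgd_hinge_step(w, b, x, y, lr=0.01, alpha=0.0001):
    """One SGD step on hinge loss max(0, 1 - y * (w.x + b)) plus an L2 penalty."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    if margin < 1:
        # The example violates the margin: the hinge loss contributes -y * x.
        grad_w = [alpha * wi - y * xi for wi, xi in zip(w, x)]
        grad_b = -y
    else:
        # Correct with enough margin: only the regularizer shrinks the weights.
        grad_w = [alpha * wi for wi in w]
        grad_b = 0.0
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    new_b = b - lr * grad_b
    return new_w, new_b

# Margin violated: the weights move toward the example.
w1, b1 = sgd_hinge_step([0.0, 0.0], 0.0, x=(1.0, 2.0), y=+1)
# Margin satisfied: the weights are only slightly decayed.
w2, b2 = sgd_hinge_step([10.0, 0.0], 0.0, x=(1.0, 0.0), y=+1)
```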

3.4.11 Artificial Neural Network (ANN)

Artificial Neural Network (ANN) is also a machine learning technique, typically used in deep learning modeling, that comprises a network of artificial neurons or nodes [48]. In this work, we build a feed-forward ANN-based deep learning security model consisting of an input layer with the selected security features, three hidden layers with 128 neurons, and an output layer with one neuron for binary classification, or with as many neurons as there are classes for the multi-class classification task. We also use dropout in each layer to simplify the security model and compile the neural network model with the Adam optimizer [60].

ReLU:f(x)=max(0,x)

(10)

Softmax:f(y_{k})=\frac{\exp(\phi_{k})}{\sum^{c}_{j}\exp(\phi_{j})}

(11)

Sigmoid:f(z)=\frac{1}{1+e^{-z}}

(12)

Loss=\begin{cases}-{(y\log(p)+(1-y)\log(1-p))}&\texttt{for }binary\\ -\sum_{c=1}^{M}y_{o,c}\log(p_{o,c})&\texttt{for }multiclass\\ \end{cases}

(13)

We use 100 epochs with a batch size of 128 when training the security network. We use a small value of 0.001 as the learning rate, as it helps the security network model converge toward the minimum of the loss. As the activation function, we use the Rectified Linear Unit (ReLU) described in Equ. 10, which addresses the problem of the vanishing gradient and helps the model to learn faster. In the output layer, however, we use the Softmax activation function defined in Equ. 11 for multi-class attack detection and the Sigmoid or Logistic activation function defined in Equ. 12 for binary classification, as its output lies between 0 and 1. To adjust the weights of the model, we use the Cross-Entropy loss function defined in Equ. 13, where $M$ represents the number of attack classes, $y_{o,c}$ is a binary indicator of whether class $c$ is the correct label for observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ belongs to class $c$. The popular Backpropagation technique [48] is used to adjust the connection weights between neurons of the security model during learning.
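
The activation and loss functions of Equ. 10 to Equ. 13 can be written out in a few lines of plain Python, which may help to check the formulas on small inputs. This sketch is independent of TensorFlow and Keras; the function names, the clipping constant `eps`, and the example values are assumptions.

```python
import math

def relu(x):
    """Equ. 10: f(x) = max(0, x)."""
    return max(0.0, x)

def softmax(scores):
    """Equ. 11, with the maximum subtracted first for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Equ. 12: f(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y, p, eps=1e-12):
    """Equ. 13, binary case; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Equ. 13, multi-class case for one observation."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

class_probs = softmax([1.0, 2.0, 3.0])
binary_loss = binary_cross_entropy(1, sigmoid(0.0))
multi_loss = categorical_cross_entropy([0, 1, 0], [0.2, 0.5, 0.3])
```

In the multi-class case only the term of the true class is non-zero, so the loss reduces to the negative log-probability assigned to the correct attack class.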

4 Experimental Results and Analysis

In this section, we aim to briefly analyze and report the experimental results of machine learning-based security modeling as well as artificial neural network-based model utilizing the security datasets. For this, we first set up our experiments highlighting several questions to evaluate our security model, and then briefly discuss the experimental results and findings in various dimensions related to our analysis of cyber-anomalies and multi-attacks detection.

4.1 Experimental Setup

To evaluate our CyberLearning model, we aim to answer the following questions:

• Question 1: Does the impact of the security features vary from feature to feature while building a machine learning-based security model?
• Question 2: How effective is the machine learning-based security model for detecting cyber-anomalies considering binary classification?
• Question 3: How effective is the machine learning-based security model for detecting multi-attacks considering multi-class classification?
• Question 4: How effective is the artificial neural network-based security model for detecting anomalies and multi-class attacks?

To answer these questions related to our CyberLearning analysis, we have conducted a range of experiments on security datasets consisting of the anomalies and multi-attacks discussed in the earlier section. We have implemented all these methods in the Python programming language using Scikit-learn [51], TensorFlow, and Keras [60], and executed them on Google Colab [61]. In the following subsection, we first define the evaluation metrics that are taken into account in our experimental evaluation.

4.2 Evaluation Metric

To measure the effectiveness of our CyberLearning model, we compute the results in terms of precision, recall, F-score, and model accuracy in percentage. For this, we first calculate the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), which are defined as below [48]:

• TP (true positive): An outcome where the security model correctly detects or classifies the positive class of anomaly or attacks.
• TN (true negative): An outcome where the security model correctly detects or classifies the negative class of anomaly or attacks.
• FP (false positive): An outcome where the security model incorrectly detects or classifies the positive class of anomaly or attacks.
• FN (false negative): An outcome where the security model incorrectly detects or classifies the negative class of anomaly or attacks.

Based on these definitions of TP, TN, FP, and FN, we can compute the precision, recall, F-score, and accuracy as below [48]:

Precision=\frac{TP}{TP+FP}

(14)

Recall=\frac{TP}{TP+FN}

(15)

F1-score=2*\frac{Precision*Recall}{Precision+Recall}

(16)

Accuracy=\frac{TP+TN}{TP+TN+FP+FN}

(17)

In the area of machine learning and data science, these metrics are well-known and widely used to measure the effectiveness of a model [48] [19]. The greater the value, the more effective the security model is. In the following subsections, we discuss the experimental results briefly and analyze the model effectiveness considering these metrics.
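
As a worked example of Equ. 14 to Equ. 17, the following sketch computes the four metrics from a set of confusion-matrix counts. The counts are made up for illustration and are not results from our experiments; the function assumes that none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                                   # Equ. 14
    recall = tp / (tp + fn)                                      # Equ. 15
    f1 = 2 * precision * recall / (precision + recall)           # Equ. 16
    accuracy = (tp + tn) / (tp + tn + fp + fn)                   # Equ. 17
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Hypothetical counts: 200 test records, 100 of them anomalies.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```

For these counts the precision is 8/9, the recall is 0.8, the F1-score is 16/19, and the accuracy is 0.85.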

4.3 Impact of Security Features and Ranking

To answer the first question mentioned above, in this experiment, we calculate and show the impact of each feature based on their correlation values. Table 3 shows the calculated correlation scores of all the 42 security features utilizing the given security dataset UNSW-NB15. The results are shown in a descending order for detecting anomalies considering binary classification, where the values are arranged from the largest to the smallest number. If we observe Table 3, we see that the calculated scores of all features are not identical in a given dataset, and may vary from feature-to-feature according to their impact on the target anomaly and attack classes.

Table 3: The ranking of the security features with corresponding correlation scores for detecting anomalies utilizing the dataset UNSW-NB15.

Rank	Feature	Score	Rank	Feature	Score
01	$sttl$	0.624082	22	$dloss$	0.075961
02	$ct\_state\_ttl$	0.476559	23	$service$	0.073552
03	$state$	0.462972	24	$dbytes$	0.060403
04	$ct\_dst\_sport\_ltm$	0.371672	25	$djit$	0.048819
05	$swin$	0.364877	26	$synack$	0.043250
06	$dload$	0.352169	27	$spkts$	0.043040
07	$dwin$	0.339166	28	$dinpkt$	0.030136
08	$rate$	0.335883	29	$dur$	0.029096
09	$ct\_src\_dport\_ltm$	0.318518	30	$smean$	0.028372
10	$ct\_dst\_src\_ltm$	0.299609	31	$tcprtt$	0.024668
11	$dmean$	0.295173	32	$sbytes$	0.019376
12	$stcpb$	0.266585	33	$dttl$	0.019369
13	$dtcpb$	0.263543	34	$response\_body\_len$	0.018930
14	$ct\_src\_ltm$	0.252498	35	$sjit$	0.016436
15	$ct\_srv\_dst$	0.247812	36	$ct\_flw\_http\_mthd$	0.012237
16	$ct\_srv\_src$	0.246596	37	$ct\_ftp\_cmd$	0.009092
17	$ct\_dst\_ltm$	0.240776	38	$is\_ftp\_login$	0.008762
18	$sload$	0.165249	39	$proto$	0.008023
19	$is\_sm\_ips\_ports$	0.160126	40	$trans\_depth$	0.002246
20	$sinpkt$	0.155454	41	$sloss$	0.001828
21	$dpkts$	0.097394	42	$ackdat$	0.000817

According to Table 3, the feature $sttl$ has the highest score of $0.624082$ and is thus selected as the top-ranked feature, whereas the feature $ackdat$ has the lowest score of $0.000817$, which is close to $0$ for this dataset, and is thus ranked last. These correlation scores may be different for another dataset depending on its features and classes. The higher the correlation value, the more significant the feature in a security model. Thus, based on the scores, we can conclude that the features in a given security dataset do not all have a similar impact on building a data-driven security model.
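
A correlation-based ranking of this kind can be sketched as follows. The sketch assumes the absolute Pearson correlation between each numeric feature and the class label as the score, which is one possible choice of correlation measure; the function names, feature names, and toy values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)

def rank_features(columns, target):
    """Sort features by the absolute correlation with the target, descending."""
    scores = {name: abs(pearson(values, target))
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy data: four records, label 1 = anomaly, 0 = normal.
target = [0, 0, 1, 1]
columns = {
    "feature_a": [1, 2, 3, 4],   # increases with the label
    "feature_b": [1, 0, 1, 0],   # unrelated to the label
    "feature_c": [5, 5, 5, 5],   # constant, carries no information
}
ranking = rank_features(columns, target)
```

In this toy example `feature_a` is ranked first with a score of about 0.89, while the other two features receive a score of 0.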

4.4 Effectiveness Analysis for Detecting Cyber-Anomalies

To show the effectiveness of the security models based on machine learning classifiers, Table 4 compares the accuracy (%) of different machine learning classifier-based anomaly detection models considering binary classification. The results in Table 4 are shown for different numbers of selected features, namely 42, 31, 24, and 17, utilizing the dataset UNSW-NB15. These features are selected according to their correlation scores and ranking, shown in Table 3, considering a particular threshold. If we observe the results in Table 4, we can see that the number of selected features has an impact on the various machine learning security models. In general, higher accuracy with a smaller number of top-ranked features represents the effectiveness of a security model, in terms of both the detection outcome and model complexity or simplicity. For instance, the NB security model gives its highest accuracy (85%) when the top 24 features are selected to build the model. Similarly, the RF and SVM security models retain their highest accuracy, 95% and 92% respectively, when the top 24 features are selected to build the corresponding models. Some models such as LDA, AdaBoost, SGD, and LR give their best results with all 42 features, while some models such as KNN and XGBoost maintain their best results with only the top 17 selected features. In addition to RF (accuracy 95%), DT (accuracy 94%) and XGBoost (accuracy 93%) also give significant results for detecting anomalies.
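
The threshold-based selection can be illustrated directly with the scores of Table 3. The threshold values 0.02, 0.05, and 0.2 used below are assumptions chosen for illustration only; they are hypothetical cut-offs that happen to reproduce subset sizes of 31, 24, and 17 features.

```python
# Correlation scores copied from Table 3 (UNSW-NB15, binary classification).
TABLE3_SCORES = {
    "sttl": 0.624082, "ct_state_ttl": 0.476559, "state": 0.462972,
    "ct_dst_sport_ltm": 0.371672, "swin": 0.364877, "dload": 0.352169,
    "dwin": 0.339166, "rate": 0.335883, "ct_src_dport_ltm": 0.318518,
    "ct_dst_src_ltm": 0.299609, "dmean": 0.295173, "stcpb": 0.266585,
    "dtcpb": 0.263543, "ct_src_ltm": 0.252498, "ct_srv_dst": 0.247812,
    "ct_srv_src": 0.246596, "ct_dst_ltm": 0.240776, "sload": 0.165249,
    "is_sm_ips_ports": 0.160126, "sinpkt": 0.155454, "dpkts": 0.097394,
    "dloss": 0.075961, "service": 0.073552, "dbytes": 0.060403,
    "djit": 0.048819, "synack": 0.043250, "spkts": 0.043040,
    "dinpkt": 0.030136, "dur": 0.029096, "smean": 0.028372,
    "tcprtt": 0.024668, "sbytes": 0.019376, "dttl": 0.019369,
    "response_body_len": 0.018930, "sjit": 0.016436,
    "ct_flw_http_mthd": 0.012237, "ct_ftp_cmd": 0.009092,
    "is_ftp_login": 0.008762, "proto": 0.008023, "trans_depth": 0.002246,
    "sloss": 0.001828, "ackdat": 0.000817,
}

def select_features(scores, threshold):
    """Keep the features whose correlation score reaches the threshold."""
    return [name for name, score in scores.items() if score >= threshold]

# Hypothetical thresholds; 0.0 keeps all features.
counts = {t: len(select_features(TABLE3_SCORES, t))
          for t in (0.0, 0.02, 0.05, 0.2)}
top_17 = select_features(TABLE3_SCORES, 0.2)
```

Because the scores are sorted in descending order, raising the threshold simply truncates the ranking, so each smaller subset is contained in the larger ones.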

Table 4: Effectiveness comparison results in terms of accuracy (%) for different machine learning classifier based anomaly detection models utilizing the dataset UNSW-NB15.

Model	Features (42)	Features (31)	Features (24)	Features (17)
NB	82	83	85	75
LDA	89	87	87	84
KNN	92	92	92	92
XGBoost	93	93	93	93
DT	94	94	93	92
RF	95	95	95	94
SVM	92	92	92	91
AdaBoost	93	92	92	92
SGD	89	88	88	86
LR	90	88	87	84

In addition to Table 4, Figure 3 shows the relative comparison of various security models based on machine learning classifiers for detecting anomalies. The comparative results are shown in terms of precision, recall, and F1 score for different numbers of top-ranked selected features, namely 42, 31, 24, and 17, utilizing the dataset UNSW-NB15. For each security model, we use the same training and testing data to calculate these metrics for a fair evaluation.

Figure 3 (a): Anomaly detection with all 42 features.

If we observe Figure 3, we find that tree-based classification models give higher prediction results than other security models, in terms of precision, recall, and F1 score, when applied to cybersecurity data consisting of various security features. In particular, the RF (Random Forest) based security model, which generates multiple decision trees, gives the prediction results with the highest values of accuracy, recall, and F1 score for different numbers of features, as shown in Figure 3. An interesting finding is that the RF model gives similar results with 42, 31, and 24 features, and a comparatively lower result with 17 features. The reason for the decrease is that significant information is lost when the number of features is reduced further. Thus, the RF model with the top 24 security features is taken as an effective model considering both the accuracy and model complexity. Overall, based on the selected security features, we can conclude that the RF model gives better results in detecting cyber-anomalies. The explanation is that the random forest model produces a collection of logical rules based on the chosen security features, taking into account the multiple decision trees created in the forest, and offers an outcome based on the majority vote of those trees.

Table 5: Effectiveness comparison results in terms of accuracy (%), precision, recall, and F1 score for different machine learning classifier based anomaly detection models utilizing the dataset NSL-KDD.

Model	Accuracy (%)	Precision	Recall	F1 Score
NB	98	0.98	0.98	0.98
LDA	99	0.99	0.99	0.99
KNN	99	0.99	0.99	0.99
XGBoost	99	0.99	0.99	0.99
DT	99	0.99	0.99	0.99
RF	99	0.99	0.99	0.99
SVM	99	0.99	0.99	0.99
AdaBoost	98	0.98	0.98	0.98
SGD	99	0.99	0.99	0.99
LR	98	0.98	0.98	0.98

In Table 5, we also show the effectiveness comparison results utilizing another widely used security dataset NSL-KDD. The results are shown in terms of accuracy (%), precision, recall, and F-score, for different machine learning classifier based anomaly detection models considering binary classification. The results in Table 5 are shown for the top five selected features according to their correlation scores and ranking. If we observe the results in Table 5, we can see that almost all the security models give significant results (accuracy 99%) with the selected top 5 features. Thus, we can conclude that machine learning-based security models are highly dependent on the quality and characteristics of the data, and may give different results for different datasets.

4.5 Effectiveness Comparison for Detecting Multi-Attacks

To show the effectiveness of the security models based on machine learning classifiers, Table 6 compares the accuracy (%) of different machine learning classifier-based attack detection models considering multi-class classification. The results in Table 6 are shown for different numbers of selected features, namely 42, 31, 24, and 17, utilizing the dataset UNSW-NB15. These features are selected similarly, i.e., according to their correlation scores and ranking considering a particular threshold. If we observe the results in Table 6, we can see that the number of selected features also has an impact on the various machine learning security models for detecting multi-attacks. As higher accuracy with a smaller number of features represents the effectiveness of a security model, the RF model is effective with an accuracy of 83% when the top 31 features are selected to build the model. Similarly, the XGBoost, DT, and SVM security models also retain their highest accuracy (81%, 81%, and 79%, respectively) when the top 31 features are selected to build the corresponding models. Several security models such as NB, LDA, KNN, and AdaBoost give their best results with the top 24 features, while the LR model gives its best result with all 42 features. Overall, in addition to RF (accuracy 83%), DT (accuracy 81%) and XGBoost (accuracy 81%) also give significant results for detecting multi-attacks.

Table 6: Effectiveness comparison results in terms of accuracy (%) for different machine learning classifier based multi-attacks detection models utilizing the dataset UNSW-NB15.

Model	Features (42)	Features (31)	Features (24)	Features (17)
NB	43	43	44	42
LDA	67	67	67	65
KNN	76	77	77	73
XGBoost	81	81	80	76
DT	81	81	80	77
RF	83	83	82	80
SVM	79	79	78	74
AdaBoost	51	31	62	57
SGD	71	72	70	63
LR	76	75	74	71

In addition to Table 6, Figure 4 shows the relative comparison of various security models based on machine learning classifiers for detecting multi-attacks in terms of precision, recall, and F1 score utilizing the dataset UNSW-NB15. For each security model, we use the same training and testing data to calculate these metrics for a fair evaluation.

If we observe Figure 4, we find that tree-based classification models also provide higher prediction results in terms of accuracy, recall, and F1 score for multi-attack detection than other security models. In particular, the security model based on RF (Random Forest), which generates multiple decision trees, gives the prediction results with the highest accuracy, recall, and F1 score values, as shown in Figure 4. An interesting finding is that, like the anomaly detection model, the RF model gives similar results with 42, 31, and 24 features, and a comparatively lower result with 17 features for multi-attack detection. The reason for the decrease is that significant information is lost when the number of features is reduced further. Thus, the RF model with the top 24 security features is taken as an effective model considering both the accuracy and model complexity. Overall, we can conclude that the RF model gives better results in detecting multi-attacks based on the selected security features. The reason is that the random forest model generates a set of logic rules for the attacks based on the selected security features, considering the several decision trees generated in the forest, and provides an outcome based on the majority vote of these trees.

Table 7: Effectiveness comparison results in terms of accuracy (%), precision, recall, and F1 score for different machine learning classifier based multi-attacks detection models utilizing the dataset NSL-KDD.

Model	Accuracy (%)	Precision	Recall	F1 Score
NB	91	0.97	0.91	0.93
LDA	96	0.98	0.96	0.97
KNN	99	0.99	0.99	0.99
XGBoost	99	0.99	0.99	0.99
DT	99	0.99	0.99	0.99
RF	99	0.99	0.99	0.99
SVM	99	0.98	0.99	0.99
AdaBoost	91	0.91	0.91	0.90
SGD	98	0.98	0.98	0.98
LR	98	0.98	0.98	0.98

In Table 7, we also show the effectiveness comparison results utilizing another widely used security dataset NSL-KDD. The results are shown in terms of accuracy (%), precision, recall, and F-score, for different machine learning classifier based multi-attacks detection models considering multi-class classification. The results in Table 7 are shown for the top five selected features according to their correlation scores and ranking. If we observe the results in Table 7, we can see that most of the security models such as KNN, XGBoost, DT, RF, and SVM give the highest results (accuracy 99%) with the selected top 5 features. The other models also give significant results. Based on the results discussed above, we can conclude that machine learning-based security models are highly dependent on the quality and characteristics of the data, and may give different results for different datasets.

4.6 Effectiveness Analysis for Neural Network-based Security Model

To show the model effectiveness based on artificial neural network, Figure 5 shows the calculated outcome in terms of model accuracy and loss score for detecting anomalies considering binary classification. The results in Figure 5 are shown by varying the number of selected features such as 42, 31, 24, and 17 utilizing the dataset UNSW-NB15. The features are selected similarly, according to their correlation scores and ranking, shown in Table 3 considering a particular threshold mentioned above. Similarly, for multi-attacks classification, Figure 6 shows the calculated outcome in terms of model accuracy and loss score considering multi-class classification according to our goal. For each neural network-based security model, we use the same train and testing data for fair evaluation and comparison.

If we observe the results in Figure 5 and Figure 6, we can see that a neural network-based security model with a variable number of selected features can detect both anomalies and multi-attacks. Similar to the classic machine learning classification models discussed above, we get higher accuracy in anomaly detection using the neural network-based security model. According to Figure 5, the model with the top 24 features gives 92% accuracy with a loss of 0.1681, which is significant in terms of accuracy and complexity compared with the models with different numbers of features shown in Figure 5. Thus, the model with the top 24 features can be selected as an effective security model that gives significant accuracy with a reduced number of features for detecting anomalies. Similarly, a model with the top 24 features can also be selected as an effective model for detecting multi-attacks, as shown in Figure 6.

Besides, Figure 7 shows the calculated outcome in terms of model accuracy and loss score for detecting anomalies considering binary classification utilizing another widely used dataset, NSL-KDD. The results in Figure 7 are shown for different numbers of selected features, namely 42, 27, 18, and 5. These features are also selected according to their correlation scores and ranking considering a particular threshold. Similarly, for multi-attack classification, Figure 8 shows the calculated outcome in terms of model accuracy and loss score considering multi-class classification. According to Figure 7, the model with the top 18 features gives 99% accuracy with a loss of 0.0243, which is significant in terms of accuracy and complexity compared with the models with different numbers of features shown in Figure 7. Thus, the model with the top 18 features can be selected as an effective security model that gives significant accuracy with a reduced number of features for detecting anomalies. Similarly, a model with the top 27 features can also be selected as an effective model for detecting multi-attacks, as shown in Figure 8.

5 Discussion

Overall, our CyberLearning model based on machine learning approaches is fully security data-oriented and reflects the data patterns related to security incidents, e.g., cyber-anomalies and attacks, according to our goal. The model can effectively detect anomalies and different types of attacks, such as DoS, Backdoor, Worms, etc., where the popular machine learning classification techniques including artificial neural network models are employed. The experimental analysis on the UNSW-NB15 [2] and NSL-KDD [12] datasets has shown the effectiveness of the resultant security models according to their learning capabilities in various situations, as discussed in the earlier section.

According to the experimental analysis discussed in Section 4, we can say that different machine learning-based security models perform differently for detecting cyber-anomalies or multi-attacks utilizing the training security data. The significance of the security features greatly impacts both the binary classification model while detecting anomalies for unknown attacks, and the multi-class classification model while detecting the several known classes mentioned above. For instance, according to the experimental results shown in Table 3, the feature $sttl$ has the highest correlation score of $0.624082$ and is thus selected as the most significant feature, whereas the feature $ackdat$ has the lowest score of $0.000817$, which is close to $0$ for the dataset UNSW-NB15 [2], and can thus be considered the least significant feature for modeling. Using a set of highly significant security features and removing the insignificant or irrelevant ones can help to make the security model lightweight and more applicable. For instance, the NB security model gives higher accuracy (85%) when the top 24 features are selected for detecting cyber-anomalies, rather than considering all 42 features, as shown in Table 4. Overall, the security models for detecting anomalies and attacks based on various learning algorithms are also affected by the variations in the significance of the security features, as discussed briefly in Section 4.

Besides, a robust classification model is essential to the design of an intelligent intrusion detection system. The reason is that the performance of machine learning classification techniques is not identical in real-world scenarios and depends on their learning capabilities from the security data. As shown in Table 4 and Figure 3, the RF-based security model, which generates multiple decision trees, gives higher prediction results for detecting anomalies than other security models, in terms of accuracy, precision, recall, and F1 score. Several other models such as DT, XGBoost, SVM, KNN, and AdaBoost also give significant outcomes based on the selected features. According to the results shown in Table 5, the RF model also gives high accuracy for detecting anomalies. As shown in Table 6 and Figure 4, the RF-based security model also gives higher prediction results for detecting multi-attacks than other security models, in terms of accuracy, precision, recall, and F1 score. Several other models such as DT and XGBoost also give significant outcomes based on the selected features while detecting multi-attacks. According to the results shown in Table 7, the RF model also gives high accuracy for detecting multi-attacks.

The model performance for detecting multi-attacks may differ from that for the anomaly detection mentioned above, even for the same learning technique. For instance, the accuracy of the RF security model is 95% for detecting anomalies and 83% for multi-attack detection on the dataset UNSW-NB15 [2]. In contrast, it achieves 99% in both cases on the dataset NSL-KDD [12]. Thus, we can say that the effectiveness of a learning-based security model may vary depending on the security features and the data characteristics. Overall, we can conclude that the RF (Random Forest) based security model is more effective for detecting anomalies and multi-attacks. The reason is that the random forest model learns from several decision trees that generate a set of logic rules based on the selected security features. Thus, the model gives higher prediction results in terms of accuracy, precision, recall, and F1 score.

A real-life cybersecurity application is the actual platform on which to use the CyberLearning model, which typically examines the behavior of the network, finds the security patterns for profiling the normal behavior, and thus detects the anomalies or associated attacks. Although an ANN model has its hidden layers for computing, it is also affected by the significance of the features. For instance, an ANN model with the selected security features gives significant accuracy for detecting anomalies and multi-attacks, as discussed briefly in Section 4. Although we use the security datasets UNSW-NB15 [2] and NSL-KDD [12] while building the security model, our analysis is also applicable to other application domains in the area of cybersecurity, including IoT security. Several deep learning networks such as the convolutional neural network (CNN), recurrent neural network (RNN), Long Short-Term Memory (LSTM), deep belief network (DBN), or autoencoder could be effective while working on very large datasets. Typically, deep learning algorithms perform well when the data volumes are large [9] [62]. In addition, noisy instance analysis [63], incorporating contextual information [18] [64], or recency analysis considering recent patterns in data [65] could be other potential research directions in the area. Overall, we believe that our CyberLearning model, including a comprehensive experimental analysis, opens a promising path for future research on machine learning-based security modeling in the domain of cybersecurity, to make the security model lightweight and more applicable in the area.

6 Conclusion and Future Work

In this paper, we have presented CyberLearning, where we have taken into account a binary classification model for detecting anomalies and a multi-class classification model for various types of cyber-attacks. In our modeling, we have also taken into account the impact of security features, and eventually built an effective machine learning-based model with feature selection. While building the security models, we have employed the most popular machine learning classification techniques as well as artificial neural network learning considering multiple hidden layers. Finally, we have examined the effectiveness of these learning-based security models by conducting a range of experiments utilizing the two most popular security datasets, UNSW-NB15 and NSL-KDD. We believe that our empirical analysis and findings can be used as a reference guide in both academia and industry in the area of cybersecurity for effectively building data-driven security models and systems based on machine learning techniques.

Collecting more recent, higher-dimensional security data in IoT environments and building a data-driven secure system using learning techniques could be future work.

References

[1] I. H. Sarker, Y. B. Abushark, F. Alsolami, A. I. Khan, IntruDTree: A machine learning based cyber security intrusion detection model, Symmetry 12 (5) (2020) 754.
[2] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in: 2015 Military Communications and Information Systems Conference (MilCIS), IEEE, 2015, pp. 1–6.
[3] X. Qu, L. Yang, K. Guo, L. Ma, M. Sun, M. Ke, M. Li, A survey on the development of self-organizing maps for unsupervised intrusion detection, Mobile Networks and Applications (2019) 1–22.
[4] I. H. Sarker, AI-driven cybersecurity: An overview, security intelligence modeling and research directions, SN Computer Science (2021).
[5] Y. N. Soe, Y. Feng, P. I. Santosa, R. Hartanto, K. Sakurai, Machine learning-based IoT-botnet attack detection with sequential architecture, Sensors 20 (16) (2020) 4372.
[6] M. Hasan, M. M. Islam, M. I. I. Zarif, M. Hashem, Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches, Internet of Things 7 (2019) 100059.
[7] M. G. Raman, N. Somu, S. Jagarapu, T. Manghnani, T. Selvam, K. Krithivasan, V. S. Sriram, An efficient intrusion detection technique based on support vector machine and improved binary gravitational search algorithm, Artificial Intelligence Review (2019) 1–32.
[8] A. J. Malik, F. A. Khan, A hybrid technique using binary particle swarm optimization and decision tree pruning for network intrusion detection, Cluster Computing 21 (1) (2018) 667–680.
[9] I. H. Sarker, A. Kayes, S. Badsha, H. Alqahtani, P. Watters, A. Ng, Cybersecurity data science: an overview from machine learning perspective, Journal of Big Data 7 (1) (2020) 1–29.
[10] I. H. Sarker, Machine learning: Algorithms, real-world applications and research directions, SN Computer Science (2021).
[11] I. H. Sarker, Deep cybersecurity: A comprehensive overview from neural network and deep learning perspective, SN Computer Science (2021).
[12] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, A detailed analysis of the KDD CUP 99 data set, in: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, IEEE, 2009, pp. 1–6.
[13] S. Seufert, D. O’Brien, Machine learning for automatic defence against distributed denial of service attacks, in: 2007 IEEE International Conference on Communications, IEEE, 2007, pp. 1217–1222.
[14] A. Alazab, M. Hobbs, J. Abawajy, M. Alazab, Using feature selection for intrusion detection system, in: 2012 International Symposium on Communications and Information Technologies (ISCIT), IEEE, 2012, pp. 296–301.
[15] A. L. Buczak, E. Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Communications surveys & tutorials 18 (2) (2015) 1153–1176.
[16] R. Agrawal, R. Srikant, et al., Fast algorithms for mining association rules, in: Proc. 20th Int. Conf. Very Large Data Bases, VLDB, Vol. 1215, 1994, pp. 487–499.
[17] I. H. Sarker, A. Kayes, ABC-RuleMiner: User behavioral rule-based machine learning method for context-aware intelligent services, Journal of Network and Computer Applications (2020) 102762.
[18] I. H. Sarker, Context-aware rule learning from smartphone data: survey, challenges and future directions, Journal of Big Data 6 (1) (2019) 95.
[19] I. H. Sarker, A. Kayes, P. Watters, Effectiveness analysis of machine learning classification models for predicting personalized context-aware smartphone usage, Journal of Big Data (2019).
[20] Y. Li, J. Xia, S. Zhang, J. Yan, X. Ai, K. Dai, An efficient intrusion detection system based on support vector machines and gradually feature removal method, Expert Systems with Applications 39 (1) (2012) 424–430.
[21] F. Amiri, M. R. Yousefi, C. Lucas, A. Shakery, N. Yazdani, Mutual information-based feature selection for intrusion detection systems, Journal of Network and Computer Applications 34 (4) (2011) 1184–1199.
[22] C. Wagner, J. François, T. Engel, et al., Machine learning approach for IP-flow record anomaly detection, in: International Conference on Research in Networking, Springer, 2011, pp. 28–39.
[23] M. V. Kotpalliwar, R. Wajgi, Classification of attacks using support vector machine (SVM) on KDDCUP’99 IDS database, in: 2015 Fifth International Conference on Communication Systems and Network Technologies, IEEE, 2015, pp. 987–990.
[24] H. Saxena, V. Richariya, Intrusion detection in KDD99 dataset using SVM-PSO and feature reduction with information gain, International Journal of Computer Applications 98 (6) (2014).
[25] M. S. Pervez, D. M. Farid, Feature selection and intrusion classification in NSL-KDD CUP 99 dataset employing SVMs, in: The 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014), IEEE, 2014, pp. 1–6.
[26] T. Shon, Y. Kim, C. Lee, J. Moon, A machine learning framework for network anomaly detection using SVM and GA, in: Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop, IEEE, 2005, pp. 176–183.
[27] R. Kokila, S. T. Selvi, K. Govindarajan, DDoS detection and analysis in SDN-based environment using support vector machine classifier, in: 2014 Sixth International Conference on Advanced Computing (ICoAC), IEEE, 2014, pp. 205–210.
[28] C. Kruegel, D. Mutz, W. Robertson, F. Valeur, Bayesian event classification for intrusion detection, in: 19th Annual Computer Security Applications Conference, 2003. Proceedings., IEEE, 2003, pp. 14–23.
[29] S. Benferhat, T. Kenaza, A. Mokhtari, A naive Bayes approach for detecting coordinated attacks, in: 2008 32nd Annual IEEE International Computer Software and Applications Conference, IEEE, 2008, pp. 704–709.
[30] M. Panda, M. R. Patra, Network intrusion detection using naive Bayes, International journal of computer science and network security 7 (12) (2007) 258–263.
[31] L. Koc, T. A. Mazzuchi, S. Sarkani, A network intrusion detection system based on a hidden naïve Bayes multiclass classifier, Expert Systems with Applications 39 (18) (2012) 13492–13500.
[32] R. Bapat, A. Mandya, X. Liu, B. Abraham, D. E. Brown, H. Kang, M. Veeraraghavan, Identifying malicious botnet traffic using logistic regression, in: 2018 Systems and Information Engineering Design Symposium (SIEDS), IEEE, 2018, pp. 266–271.
[33] E. Besharati, M. Naderan, E. Namjoo, LR-HIDS: logistic regression host-based intrusion detection system for cloud environments, Journal of Ambient Intelligence and Humanized Computing 10 (9) (2019) 3669–3692.
[34] S. Vishwakarma, V. Sharma, A. Tiwari, An intrusion detection system using KNN-ACO algorithm, Int. J. Comput. Appl. 171 (10) (2017) 18–23.
[35] H. Shapoorifard, P. Shamsinejad, Intrusion detection using a novel hybrid method incorporating an improved KNN, Int. J. Comput. Appl. 173 (1) (2017) 5–9.
[36] A. M. Sharifi, S. K. Amirgholipour, A. Pourebrahimi, Intrusion detection based on joint of k-means and KNN, Journal of Convergence Information Technology 10 (5) (2015) 42.
[37] P. A. R. Kumar, S. Selvakumar, Distributed denial of service attack detection using an ensemble of neural classifier, Computer Communications 34 (11) (2011) 1328–1341.
[38] A. Dainotti, A. Pescapé, G. Ventre, A cascade architecture for DoS attacks detection based on the wavelet transform, Journal of Computer Security 17 (6) (2009) 945–968.
[39] N. G. Relan, D. R. Patil, Implementation of network intrusion detection system using variant of decision tree algorithm, in: 2015 International Conference on Nascent Technologies in the Engineering Field (ICNTE), IEEE, 2015, pp. 1–5.
[40] K. Rai, M. S. Devi, A. Guleria, Decision tree based algorithm for intrusion detection, International Journal of Advanced Networking and Applications 7 (4) (2016) 2828.
[41] B. Ingre, A. Yadav, A. K. Soni, Decision tree based intrusion detection system for NSL-KDD dataset, in: International Conference on Information and Communication Technology for Intelligent Systems, Springer, 2017, pp. 207–218.
[42] S. Puthran, K. Shah, Intrusion detection using improved decision tree algorithm with binary and quad split, in: International Symposium on Security in Computing and Communication, Springer, 2016, pp. 427–438.
[43] D. Moon, H. Im, I. Kim, J. H. Park, DTB-IDS: an intrusion detection system based on decision tree using behavior analysis for preventing APT attacks, The Journal of Supercomputing 73 (7) (2017) 2881–2895.
[44] A. O. Balogun, R. G. Jimoh, Anomaly intrusion detection using an hybrid of decision tree and k-nearest neighbor (2015).
[45] P. Sangkatsanee, N. Wattanapongsakorn, C. Charnsripinyo, Practical real-time intrusion detection using machine learning approaches, Computer Communications 34 (18) (2011) 2227–2235.
[46] I. Alrashdi, A. Alqazzaz, E. Aloufi, R. Alharthi, M. Zohdy, H. Ming, AD-IoT: Anomaly detection of IoT cyberattacks in smart city using machine learning, in: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), IEEE, 2019, pp. 0305–0310.
[47] M. Mazini, B. Shirazi, I. Mahdavi, Anomaly network-based intrusion detection system using a reliable hybrid artificial bee colony and AdaBoost algorithms, Journal of King Saud University-Computer and Information Sciences 31 (4) (2019) 541–553.
[48] J. Han, J. Pei, M. Kamber, Data mining: concepts and techniques (2011).
[49] N. Sneha, T. Gangil, Analysis of diabetes mellitus for early prediction using optimal features selection, Journal of Big Data 6 (1) (2019) 13.
[50] G. H. John, P. Langley, Estimating continuous distributions in Bayesian classifiers, in: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1995, pp. 338–345.
[51] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[52] D. W. Aha, D. Kibler, M. K. Albert, Instance-based learning algorithms, Machine learning 6 (1) (1991) 37–66.
[53] J. R. Quinlan, C4.5: Programs for machine learning, Machine Learning (1993).
[54] L. Breiman, Random forests, Machine learning 45 (1) (2001) 5–32.
[55] L. Breiman, Bagging predictors, Machine learning 24 (2) (1996) 123–140.
[56] Y. Amit, D. Geman, Shape quantization and recognition with randomized trees, Neural computation 9 (7) (1997) 1545–1588.
[57] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, K. R. K. Murthy, Improvements to Platt’s SMO algorithm for SVM classifier design, Neural computation 13 (3) (2001) 637–649.
[58] S. Le Cessie, J. C. Van Houwelingen, Ridge estimators in logistic regression, Journal of the Royal Statistical Society: Series C (Applied Statistics) 41 (1) (1992) 191–201.
[59] Y. Freund, R. E. Schapire, et al., Experiments with a new boosting algorithm, in: ICML, Vol. 96, Citeseer, 1996, pp. 148–156.
[60] A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O’Reilly Media, 2019.
[61] Colaboratory [Online]. Available: https://colab.research.google.com/.
[62] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, C. Wang, Machine learning and deep learning methods for cybersecurity, IEEE Access 6 (2018) 35365–35381.
[63] I. H. Sarker, A machine learning based robust prediction model for real-life mobile phone data, Internet of Things 5 (2019) 180–193.
[64] I. H. Sarker, M. M. Hoque, M. K. Uddin, T. Alsanoosy, Mobile data science and intelligent apps: Concepts, ai-based modeling and research directions, Mobile Networks and Applications (2020) 1–19.
[65] I. H. Sarker, A. Colman, J. Han, RecencyMiner: mining recency-based personalized behavior from contextual smartphone data, Journal of Big Data 6 (1) (2019) 49.