Inter-Domain Fusion for Enhanced Intrusion Detection in Power Systems: An Evidence Theoretic and Meta-Heuristic Approach
Abstract
False alerts due to misconfigured or compromised intrusion detection systems (IDS) in industrial control system (ICS) networks can lead to severe economic and operational damage. To solve this problem, research has focused on leveraging deep learning techniques to reduce false alerts. However, a shortcoming is that these works often require or implicitly assume the physical and cyber sensor data to be trustworthy. Implicit trust of data is a major problem with using artificial intelligence or machine learning (AI/ML) for cyber-physical system (CPS) security, because the times when these solutions are most needed to detect an attack are also the times when they are most at risk of being compromised themselves, in terms of both likelihood and impact. To address this shortcoming, the problem can be reframed as how to make good decisions under uncertainty. Here, the decision is detection, and the uncertainty includes whether or not the data used by the ML-based IDS is compromised. Thus, this article presents an approach for reducing false alerts in cyber-physical power systems that deals with uncertainty without knowledge of the prior distribution of the alerts. Specifically, an evidence-theoretic approach leveraging Dempster-Shafer (DS) combination rules and their variants is proposed for reducing false alerts. A multi-hypothesis mass function model is designed that leverages probability scores obtained from various supervised-learning classifiers. Using this model, a location-cum-domain based fusion framework is proposed to evaluate the intrusion detector’s performance using the disjunctive, conjunctive, and cautious conjunctive rules of combination, which fuse multiple pieces of evidence from inter-domain and intra-domain sensors.
The approach is demonstrated in a cyber-physical power system testbed (RESLab), and the classifiers are trained with datasets from Man-In-The-Middle attack emulation in a large-scale synthetic electric grid. To evaluate performance, we consider plausibility, belief, pignistic, and general Bayesian theorem based metrics as decision functions. To improve performance, a multi-objective genetic algorithm is proposed for feature selection, with the decision metrics as the fitness functions. Finally, we present a software application to evaluate the DS fusion approaches with different parameters and architectures.
Index Terms:
Dempster Shafer Theory, Intrusion Detection System
I Introduction
The increasing use of advanced control and communication technologies within the electric power grid can make the system more vulnerable to cyber intrusions. Several ICS-targeted attacks, such as Stuxnet [1] and the attacks in Ukraine [2] and Mumbai [3], are well known for their severe impacts and advanced concepts of operations. The criticality of power grid infrastructure necessitates the design of resilient detection and defense mechanisms to prevent such attacks.
The behavior of cyber intrusions and their impact on a network is stochastic in nature. This stochasticity is typically modeled using Markov Processes, where the transition probabilities depend on attributes represented graphically such as the degree of the nodes and the prior distribution of the states of the nodes [4]. Similarly, uncertainty is an innate feature of any intrusion analysis. The uncertainty arises due to the defender’s inability to completely view the adversary’s steps, as the monitoring tools can only observe certain symptoms or effects of malicious activities.
Stochasticity and uncertainty complicate cyber intrusion detection and incident forensics. Intrusion Detection Systems (IDS) commonly rely on rule-based policies (signature-based) or deviations from a baseline (behavioral-based) to detect cyber intrusions. These systems produce both false negatives and false positives. Signature-based IDSs result in higher false negatives for stealthy [5] and zero-day attacks. Behavioral-based IDSs, while based on statistics, often result in high false positives. High false positives are detrimental to an organization’s efficiency and effectiveness at threat response because they cost security professionals time and money to investigate, and they erode an organization’s trust in the system’s results. False negatives also pose a significant threat, since an undetected attack may escalate privileges and cause increased damage or loss to the organization’s assets. For Industrial Control Systems (ICS), IDSs may be further customized based on process-data analysis, control-command analysis, and an ICS physical model [6]. While security tools such as IDSs and firewalls provide key functions, they are typically assumed to be trustworthy. Furthermore, obtaining the data needed for theoretical models to improve such security tools is a challenge, as even behavioral-based IDSs do not have enough intrusion information to build the statistical models [7]. The lack of trust in the IDS creates uncertainty in the evidence from sensors.
To address these challenges of stochasticity, uncertainty, and inadequate data availability, we present a cyber-physical power system intrusion detection system based on the theory of uncertainty, which we call the Inter-Domain Evidence theoretic Approach for Inference in cyber-physical power systems (IDEA-I). We address the problem of high false alarm rates in IDSs by fusing evidence by domain and location using Dempster-Shafer (DS) rules of combination. IDEA-I is based on an autonomous data fusion architecture [8], where the extracted features are fed to classifiers or estimators for decision making before they are fused. This is decision-level fusion, where each sensor performs individual processing to produce an estimate, and these estimates are then combined in the fusion process. Numerous methods can achieve the fusion process, such as voting methods, Bayesian inference, DS methods, and generalized evidence processing theory [8]. DS inference [9] and Bayesian inference [10] are both applicable to the autonomous fusion architecture, because these fusion algorithms are fed with the probability distributions computed by the classifiers or estimators.
In IDEA-I, we propose the use of the Dempster-Shafer Theory of Evidence (DSTE) for intrusion detection in power system control networks. This approach is valuable because it handles uncertainty by quantifying unknowns. Specifically, two advantages we gain from DS theory are (1) its ability to deal with the lack of prior probabilities for various events and (2) its ability to combine evidence from multiple sources [11].
The major contributions of this paper are as follows:
1. A cyber-physical power system intrusion detection system, IDEA-I, is proposed that improves intrusion detection by inferring cyber-physical state information to improve situational awareness, based on the fundamentals of DSTE, various rules of fusion, and decision criteria.
2. A method for computing mass functions for stochastic cyber-physical parameters, from the detection probability computed in our prior work on data fusion [12], is proposed and evaluated in IDEA-I. The performance of IDEA-I is evaluated for two different architectures: location based fusion and location-cum-domain based fusion.
3. IDEA-I is extended by formulating feature selection as an unconstrained optimization problem, which is solved using the Non-dominated Sorting Genetic Algorithm (NSGA) [13] to improve IDEA-I accuracy.
4. IDEA-I is developed as a software tool that includes a DSTE library in C#. The application is used to evaluate the performance of the proposed fusion algorithm for varying scenarios and parameters.
The paper is organized as follows. Section II presents background on the DSTE approach. Section III describes IDEA-I including how the method would need to work in cyber-physical power systems with its rules of combinations and their implications. A Genetic Algorithm (GA)-based optimization problem is proposed in Section IV for feature selection to improve IDEA-I performance. Section V presents the experimental setup in our cyber-physical power system testbed RESLab, the use cases that were designed to test IDEA-I, and their implementation. Then, Section V-F introduces the two types of architecture proposed for the fusion. We compare the approach and results with the centralized-based fusion and other decision-level fusions such as Bayesian inference. The overall results are discussed in Section VI, and Section VIII concludes the paper.
II Background
DSTE is applied in many areas of machine learning and deep learning. An unsupervised classification problem in multisource remote sensing is formulated through DS theory, as one can consider unions of classes rather than individual classes [14]. A neural network based classifier is proposed where the DS computation of the mass function, or basic belief assignment (BBA), and the rules of combination are implemented in two hidden layers, respectively [15]. The majority of the research is centered on wireless networks [16], network security [17], or autonomous mobile robots [18].
DSTE is also being applied to network security. In [19], multi-source alarm information is fused through DSTE, associated with node vulnerability information, and integrated with the severity of threats for situational assessment of network security. A network anomaly detector with enhanced reliability and low false alarms is proposed using DSTE [20]. An IDS is proposed in [17] where the mass functions are computed based on the incoming and outgoing traffic ratio, the service rate, and prior knowledge in the domain of DDoS attacks. A distributed and collaborative IDS is proposed using DSTE to fuse data from multiple nodes [21], where the detection is done collaboratively and the decision is distributed among all nodes.
The presented work, IDEA-I, is the first to leverage DS theory for the purpose of classification based on the dataset [12] generated from MiTM attacks in a cyber-physical power system testbed [22].
Data fusion in a cyber-physical system should utilize data from the physical (e.g., power) and cyber sensors as evidences to generate belief functions for the hypothesis. Examples include root vulnerability exploitation or situational awareness for attack prevention. Cyber-physical frameworks for situational awareness [23, 24, 25] have been proposed that identify critical assets and contingencies using power system simulators, graph theories, dynamic programming, etc. For example, one framework [23] builds a partially observable Markov Decision Process (POMDP) model of the grid network that represents all possible attack paths. The robustness of such a framework primarily depends on fusion of information like network access policies, firewall rulesets, physical sensors, etc. The transition probabilities of the security states in the POMDP model depend on the amount of data accrued in real-time. However, uncertainty is present due to unavailability of complete view of the adversary’s steps and monitoring limitations. The presented IDEA-I addresses this gap through its use of DSTE for power system cyber-physical situational awareness, that handles uncertainty due to its ability of quantifying unknowns. Analogous to the space situational awareness (SSA) paradigm [26], we improve the cyber-physical situational awareness (CyPSA) framework by accurately representing the state knowledge of objects in the cyber-physical environment to provide better prediction capabilities for potential threats.
DSTE suffers from two major drawbacks: its computational requirements and the difficulty of eliciting the probability masses from multiple pieces of evidence [27]. To address this, we formulate an optimization problem for feature selection when training the classifier, taking the decision function from DSTE as the objective function. Since there are multiple decision metrics, we pose a multi-objective optimization problem and solve it using a meta-heuristic GA approach. GA has been used extensively in network intrusion detection, for example in flow-based traffic characterization [28], IDS rule generation [29], and feature selection [30]. GA in the DSTE framework was proposed for the turbine maintenance optimization problem [27]. In this work, we present GA with DSTE in cyber-physical security for feature selection.
III Development of IDEA-I from Dempster Shafer Theory & Combination Rules
Uncertainty is classified into two categories based on knowledge and behavior of the system: aleatory and epistemic uncertainty [27]. Aleatory uncertainty is caused by random behavior of the system, while epistemic uncertainty is caused by lack of knowledge of the system. Under normal operation of a SCADA or OT network, the traffic is not random as in an IT network, hence aleatory uncertainty rarely occurs. Under a compromised situation, the system state is unknown rather than stochastic, hence the uncertainty in events is epistemic. For example, a zero-day attack cannot be detected by a knowledge-based or signature-based IDS, due to lack of information about the intrusion. Dempster and Shafer introduced the belief function to model epistemic uncertainty for reasoning under uncertainty. Quantifying uncertainty with a precise measure is difficult, and hence a measure of probability as an interval is considered. Three major frameworks for interval-based representation of uncertainty are the following: a) Imprecise Probability, b) Possibility Theory, and c) Dempster Shafer Theory of Evidence (DSTE). DSTE is preferred because of its high degree of theoretical development, its close relationship with traditional probability theory, its many engineering applications in the past few years, and its versatility in representing and combining different types of evidence.
In evidence theory, logs at each sensor act as evidence that are considered for reasoning of an event. Theoretically, there are four types of evidence: a) Consonant, b) Consistent, c) Arbitrary, and d) Disjoint [31]. The data of a control network with an IDS for cyber intrusion detection and a bad data detector for power system state estimation can be collectively considered as Disjoint evidence due to their different purpose of deployment. Traditional probability theory cannot handle consonant, consistent, or arbitrary evidence without resorting to assumptions in distributions. DSTE can handle all these kinds by combining a notion of probability with the traditional conception of sets.
The IDEA-I framework is illustrated in Fig. 1. The Datasets are sensor data extracted from substation networking devices, the DNP3 Master, and Outstations in the RESLab testbed [22]. DNP3 [32] is a protocol used in SCADA systems for monitoring and controlling field devices. Data Pre-processing is performed before training the ML based IDS Classifier. The classifier outputs from different sensors carry varying timestamps, which are synchronized by the Mean Value based Time Synchronization block. The calculations for the Mass Function Computation, DS Rules of Combination, and Decision Function blocks of DSTE are detailed in this section. The decision function is used in the fitness function of the NSGA-2 (version 2 of NSGA) based Feature Selection block (Section IV) to further filter the features in the Data Pre-processing block. Each block in the flowchart is detailed further.

III-A Dempster Shafer Theory of Evidence (DSTE)
Fundamentally, DSTE differs from basic probability theory in the manner in which one distributes the probability density or mass based on the type of random variables. For example, probability theory assigns $0.5$ to both $H$ (Head) and $T$ (Tail) for the toss of an unbiased coin. However, DS theory assigns a belief of $0$ to $\{H\}$ and $\{T\}$ but assigns a belief of $1$ to the set $\{H, T\}$, i.e., “Either Head or Tail.” DS does not compel picking a probability when there is no evidence. This approach provides three kinds of answers: $H$, $T$, and Don’t Know. Allowing the third option, i.e., ignorance, can make evidential reasoning valuable when there are not enough data to validate a hypothesis.
DSTE is concerned with bounds for probabilities of provability, rather than computing probabilities of truth. The two bounds are called belief and plausibility. Equivalent to the state space in probability, there is a set of mutually exclusive and exhaustive hypotheses denoted by $\Omega$, also called the Frame of Discernment. The set of all possible subsets of $\Omega$, including $\Omega$ itself and the null set $\emptyset$, is called the power set and designated by $2^{\Omega}$. Thus, the power set comprises all possible hypotheses or so-called focal elements.
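As a minimal sketch of these definitions, the Python snippet below enumerates the power set of a two-element frame and computes the belief and plausibility bounds for the coin example. The frame labels `H` and `T`, the dictionary-based mass representation, and the helper names are illustrative assumptions and are not taken from the IDEA-I library.

```python
from itertools import chain, combinations


def power_set(frame):
    """Return every subset of the frame of discernment as a frozenset."""
    items = sorted(frame)
    tuples = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(t) for t in tuples]


def belief(mass, hypothesis):
    """Lower bound: mass of non-empty focal sets contained in the hypothesis."""
    return sum(v for focal, v in mass.items() if focal and focal <= hypothesis)


def plausibility(mass, hypothesis):
    """Upper bound: mass of focal sets that intersect the hypothesis."""
    return sum(v for focal, v in mass.items() if focal & hypothesis)


FRAME = frozenset({"H", "T"})   # hypothetical labels for Head and Tail
subsets = power_set(FRAME)      # empty set, {H}, {T}, {H, T}

# With no evidence about the coin, all mass sits on the whole frame.
vacuous = {FRAME: 1.0}
heads = frozenset({"H"})
interval = (belief(vacuous, heads), plausibility(vacuous, heads))
```

Under total ignorance the interval for `{H}` spans from 0 to 1, which is the "Don't Know" answer; a Bayesian mass of 0.5 on each singleton would instead collapse both bounds to 0.5.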
III-B Basic Belief Assignment in DSTE
The basic belief assignment (BBA) function, or mass distribution function ($m$), distributes the belief over the power set of the frame of discernment. Subsets $A$ of the frame of discernment $\Omega$ such that $m(A) > 0$ are called focal sets of $m$. This mass distribution function can be classified into , , , , , , and . The definitions of each are presented in Appendix A.
A subnormal BBA can be transformed to a normal BBA by the normalization operation defined in Eq. 1:
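$$ m^{*}(A) = \begin{cases} \dfrac{m(A)}{1 - m(\emptyset)} & \text{if } A \neq \emptyset, \\ 0 & \text{if } A = \emptyset, \end{cases} \tag{1} $$
where $m$ is the subnormal BBA and $m^{*}$ is its normalized counterpart.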
However, the utmost care is required before normalization: the authors in [9] explain a controversial issue in the normalization of the upper and lower probabilities and its implication on the rules of combination of the evidences. The different functions considered in the Decision Function block of DSTE (Fig. 1) for validating a hypothesis are presented in Appendix B.
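The normalization step can be sketched in a few lines of Python, assuming the usual definition in which the mass on the empty set is removed and the remaining masses are rescaled by one minus that mass. The hypothesis labels and the function name are illustrative assumptions.

```python
def normalize(mass):
    """Turn a subnormal BBA into a normal one by rescaling away m(empty set)."""
    empty_mass = mass.get(frozenset(), 0.0)
    if empty_mass >= 1.0:
        raise ValueError("all mass is on the empty set; normalization is undefined")
    # Drop the empty set and rescale every remaining focal set.
    return {focal: v / (1.0 - empty_mass) for focal, v in mass.items() if focal}


# Hypothetical subnormal BBA on the frame {attack, normal}.
subnormal = {
    frozenset(): 0.2,
    frozenset({"attack"}): 0.6,
    frozenset({"attack", "normal"}): 0.2,
}
normal = normalize(subnormal)
```

Note that rescaling hides how much conflict was present, which is one reason the mass on the empty set is sometimes kept and inspected before this step is applied.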
III-C Rules of Combination
The purpose of aggregation of information is to summarize a collection of data, whether the data is coming from a single source or from multiple sources.
III-C1 Dempster’s Rules of Combination (DRC)
Dempster’s rule of combination is a procedure for combining independent pieces of evidence. The requirement for establishing the independence of sources is an important philosophical question. From a set theoretic standpoint, combination rules can potentially occupy a continuum between conjunction (AND, based on set intersection) and disjunction (OR, based on set union) [33]. For a situation where all the sources of evidence are reliable, a conjunctive operation is appropriate, while a disjunctive operation is preferred when at least one source is reliable. In the domain of intrusion detection, one cannot rely on all the IDSs and network logs to give reliable information, due to the existence of false positives and negatives in the algorithms. Thus, in these scenarios, one should prefer the disjunctive rule.
However, there are many other combination operations, such as A and B or C, A and C or B, etc. Prade [33] describes these three types of combination as conjunctive pooling ($A \cap B$, if $A \cap B \neq \emptyset$), disjunctive pooling ($A \cup B$), and a tradeoff between the two. The original combination rule for multiple basic probability assignments, known as the Dempster rule, is a generalization of Bayes’ rule. This rule strongly emphasizes the agreement between multiple sources and ignores all the conflicting evidence through a normalization factor. The mathematical operation $m_1 \oplus m_2$ corresponds to the normalized conjunctive fusion rule:
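$$ (m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset, \qquad (m_1 \oplus m_2)(\emptyset) = 0, \tag{2} $$
where $m_1$ and $m_2$ are the BBAs of the two sources and the degree of conflict $K$ is
$$ K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C). \tag{3} $$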
The disjunctive rule of combination is given by:
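$$ m_{1 \cup 2}(A) = \sum_{B \cup C = A} m_1(B)\, m_2(C), \quad \forall A \subseteq \Omega, \tag{4} $$
where $m_1$ and $m_2$ are the BBAs of the two sources.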
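To make the two pooling operations concrete, the following Python sketch combines two BBAs with the normalized conjunctive (Dempster) rule and with the disjunctive rule. It assumes the standard product-of-masses definitions, where the combined mass of a set is the sum of products of masses whose focal sets intersect (or unite) to that set. The evidence values and the names `cyber` and `physical` are hypothetical.

```python
from itertools import product


def pool(m1, m2, op):
    """Sum m1(B) * m2(C) onto op(B, C) for every pair of focal sets."""
    out = {}
    for (b, vb), (c, vc) in product(m1.items(), m2.items()):
        a = op(b, c)
        out[a] = out.get(a, 0.0) + vb * vc
    return out


def dempster(m1, m2):
    """Normalized conjunctive rule: discard the conflict K, rescale by 1 - K."""
    joint = pool(m1, m2, frozenset.intersection)
    conflict = joint.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {a: v / (1.0 - conflict) for a, v in joint.items()}


def disjunctive(m1, m2):
    """Disjunctive rule: mass moves to unions, so no conflict can arise."""
    return pool(m1, m2, frozenset.union)


ATTACK = frozenset({"attack"})
NORMAL = frozenset({"normal"})
FRAME = ATTACK | NORMAL

# Hypothetical evidence from a cyber sensor and a physical sensor.
cyber = {ATTACK: 0.7, FRAME: 0.3}
physical = {ATTACK: 0.4, NORMAL: 0.5, FRAME: 0.1}

fused_and = dempster(cyber, physical)
fused_or = disjunctive(cyber, physical)
```

In this example the conjunctive result concentrates mass on `attack` after removing a conflict of 0.35, while the disjunctive result is more cautious and leaves most of the mass on the whole frame.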
III-C2 Combine Cautious (CC)
The Combine Cautious rule of combination is based on the work in [34]. Conventional DS rules of combination require the evidence from multiple sources to be distinct or independent, which may not be true in a realistic application. Many works have developed mechanisms to overcome the limitations of the distinctness assumption, but they were limited to at most two focal sets. These methods were extended to separable belief functions, but since not all belief functions are separable, the conventional method was not extended further. The operators in these rules of combination need to satisfy mathematical properties such as associativity, commutativity, and idempotence. Many rules of combination that were developed either did not obey those requirements or did not scale to large focal sets. Moreover, the conjunctive rule is based on the assumption that the belief functions to be combined are induced from reliable sources of information. Due to the above challenges in the DRC, the CC method of combination is considered.
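In the DRC, the combination rules belong to the credal level, where the evidence is aggregated, while the decision rules are implemented at the pignistic level [35]. In the CC, the weight function $w$ is computed from the commonality function $q$ defined in Appendix B as follows, for all $A \subset \Omega$:
$$ w(A) = \begin{cases} \dfrac{\prod_{B \supseteq A,\, |B| \notin 2\mathbb{N}} q(B)}{\prod_{B \supseteq A,\, |B| \in 2\mathbb{N}} q(B)} & \text{if } |A| \in 2\mathbb{N}, \\[2ex] \dfrac{\prod_{B \supseteq A,\, |B| \in 2\mathbb{N}} q(B)}{\prod_{B \supseteq A,\, |B| \notin 2\mathbb{N}} q(B)} & \text{otherwise}, \end{cases} \tag{5} $$
where $2\mathbb{N}$ denotes the set of even natural numbers.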
As per the Least Commitment Principle used in combine cautious [34], the rules of combination are as follows: Let and be two consonant BBA. The is said to be consonant, if its focal sets are nested i.e. , where is the maximum operator), and let and be their respective commonality functions. Then, the consonant BBA with commonality function is claimed to be the s-least committed element in the set , where is the specialization matrix. Hence, the cautious combination rule for the two nondogmatic BBAs is given by , where is the minimum operator and each is the respective weight function.
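The cautious combination can be illustrated with a short Python sketch. It assumes the commonly used canonical-decomposition formulation: each nondogmatic BBA is decomposed into weights computed from its commonality function, the weights of the two sources are combined with the minimum operator, and the result is rebuilt by unnormalized conjunctive combination of generalized simple BBAs that place mass `1 - w` on a subset and `w` on the frame. The function names and the example masses are hypothetical, and this is not the C# library described in the paper.

```python
import math
from itertools import chain, combinations


def subsets_of(frame):
    items = sorted(frame)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def commonality(mass, a):
    """q(A): total mass of the focal sets that contain A."""
    return sum(v for b, v in mass.items() if a <= b)


def weights(mass, frame):
    """Conjunctive weights w(A) for every strict subset A of the frame."""
    if mass.get(frame, 0.0) <= 0.0:
        raise ValueError("the BBA must be nondogmatic (positive mass on the frame)")
    w = {}
    for a in subsets_of(frame):
        if a == frame:
            continue
        log_w = 0.0
        for b in subsets_of(frame):
            if a <= b:
                # ln w(A) = -sum over B containing A of (-1)^(|B|-|A|) ln q(B)
                sign = -1.0 if (len(b) - len(a)) % 2 == 0 else 1.0
                log_w += sign * math.log(commonality(mass, b))
        w[a] = math.exp(log_w)
    return w


def conjunctive(m1, m2):
    """Unnormalized conjunctive rule (mass may land on the empty set)."""
    out = {}
    for b, vb in m1.items():
        for c, vc in m2.items():
            out[b & c] = out.get(b & c, 0.0) + vb * vc
    return out


def cautious(m1, m2, frame):
    w1, w2 = weights(m1, frame), weights(m2, frame)
    fused = {frame: 1.0}  # the vacuous BBA is the neutral element
    for a in w1:
        w = min(w1[a], w2[a])
        fused = conjunctive(fused, {a: 1.0 - w, frame: w})
    return fused


FRAME = frozenset({"attack", "normal"})
m1 = {frozenset({"attack"}): 0.6, frozenset({"normal"}): 0.1, FRAME: 0.3}
m2 = {frozenset({"attack"}): 0.2, FRAME: 0.8}
same = cautious(m1, m1, FRAME)   # idempotence: should reproduce m1
mixed = cautious(m1, m2, FRAME)
```

Combining `m1` with itself returns `m1`, which is the idempotence property that makes the rule suitable for evidence that may not be independent; Dempster's rule would instead count the same evidence twice.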
III-D Decision Criteria
III-D1 Belief and Plausibility Scores
III-D2 Pignistic Scores
To take a rational decision, we propose to transform beliefs into pignistic probability functions through the generalized pignistic transformation (GPT) [35]. The pignistic transformation is based on the following equation,
$$ BetP(\omega) = \sum_{A \subseteq \Omega,\ \omega \in A} \frac{m(A)}{|A|\,\bigl(1 - m(\emptyset)\bigr)}, \quad \forall \omega \in \Omega, \tag{6} $$
where $|A|$ is the number of worlds present in the set $A$, and $\omega$ denotes each component of the frame of discernment $\Omega$. Decisions are usually made by computing the expectation over multiple simulations, using the pignistic function $BetP$ as the probability function needed to compute expectations. The maximum of the pignistic probability is typically used as the decision criterion. The maximum of $BetP$ is often considered a prudent betting criterion that lies between the two other alternatives, namely the maximum of the plausibility or of the belief function.
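A minimal Python sketch of the pignistic decision follows, assuming the classical pignistic transformation in which the mass of each focal set is split equally among its elements after discounting any mass on the empty set. The fused masses and labels are hypothetical.

```python
def pignistic(mass, frame):
    """BetP: share each focal set's mass equally among its elements."""
    empty_mass = mass.get(frozenset(), 0.0)
    betp = {world: 0.0 for world in frame}
    for focal, value in mass.items():
        for world in focal:  # the empty set contributes nothing
            betp[world] += value / (len(focal) * (1.0 - empty_mass))
    return betp


FRAME = frozenset({"attack", "normal"})
fused = {
    frozenset({"attack"}): 0.5,
    frozenset({"normal"}): 0.2,
    FRAME: 0.3,  # mass left uncommitted between the two hypotheses
}
betp = pignistic(fused, FRAME)
decision = max(betp, key=betp.get)  # maximum pignistic probability
```

For these numbers the belief in `attack` is 0.5 and its plausibility is 0.8, so the pignistic value of 0.65 sits between the two bounds.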
III-D3 General Bayesian Theorem (GBT)
The GBT is a generalization of Bayes’ theorem, except the conditional probabilities in Eq. 7 are replaced by belief functions, and the a priori belief function on is vacuous. GBT can be used for backward propagation of the belief networks to compute the posterior probabilities induced on for any .
(7) |
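As an illustration only, the sketch below uses one commonly cited form of the GBT with a vacuous prior, in which the plausibility of a subset of hypotheses given an observation is one minus the product, over the hypotheses in that subset, of one minus the conditional plausibility of the observation. Whether this matches the exact form of Eq. 7 is an assumption, and the likelihood values are hypothetical.

```python
import math


def gbt_plausibility(conditional_pl, hypothesis):
    """Posterior plausibility of a subset of hypotheses given one observation.

    conditional_pl maps each single hypothesis to pl(observation | hypothesis).
    """
    return 1.0 - math.prod(1.0 - conditional_pl[h] for h in hypothesis)


# Hypothetical plausibility of an observed alert under each hypothesis.
conditional_pl = {"attack": 0.8, "normal": 0.3}

pl_attack = gbt_plausibility(conditional_pl, {"attack"})
pl_either = gbt_plausibility(conditional_pl, {"attack", "normal"})
```

For a single hypothesis the posterior plausibility reduces to the conditional plausibility itself, and it can only grow as more hypotheses are added to the subset.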
IV Genetic Algorithm for Feature Selection
Feature selection is a challenging task because intrusion detection in a cyber-physical system involves uncertainty in intrusion events. The complexity of the CPS model increases when the cyber and physical network models are defined in detail. Feature reduction techniques such as PCA can help improve detection accuracy, but the feature transformation makes the original, interpretable features unidentifiable. Hence, it is crucial to adopt non-transforming feature reduction based on optimization techniques, while also considering system uncertainty.
The feature selection problem based on the stochastic system, which relies on the epistemically uncertain parameters, can be formulated as a multi-objective optimization problem with uncertain objective functions (i.e., the belief, plausibility, pignistic scores of the hypothesis). In this context, the objective of the present work is to propose a feature selection technique by propagating the uncertainties of the conventional classifiers onto the fitness values and formulating the solution of the GA as a binary encoding. GA was previously used in network intrusion detection such as flow-based traffic characterization [28], IDS rules generation [29], feature selection [30], etc.
In this work, a GA-based meta-heuristic approach is adopted for feature selection, and the initial population consists of chromosomes of randomly selected features. The detection probabilities for these randomly selected features are computed by training the classifiers on them. Then, the fitness functions (belief, plausibility, and pignistic) are obtained for the different pieces of evidence and are used in the selection, mutation, and crossover operations. Since there are multiple hypotheses in this problem, a multi-objective problem is formulated.
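The binary encoding and the genetic operators can be sketched as follows. Single-point crossover and bit-flip mutation are assumed here as representative operators; the paper does not state which variants are used, and the population size, feature count, and mutation rate are hypothetical.

```python
import random


def random_chromosome(n_features, rng):
    """Binary mask: 1 keeps a feature for training, 0 drops it."""
    return [rng.randint(0, 1) for _ in range(n_features)]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: swap the tails of two parents."""
    point = rng.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


rng = random.Random(0)  # seeded for repeatability
population = [random_chromosome(8, rng) for _ in range(6)]
child_a, child_b = crossover(population[0], population[1], rng)
mutant = mutate(child_a, 0.1, rng)
```

Each chromosome would then be scored by training the classifiers on the selected features and evaluating the fused decision metrics.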
IV-A NSGA-2
A single fitness function cannot provide an optimal solution for the multiple decision metrics considered in the DSTE framework. Hence, multi-objective GA algorithms need to be explored. NSGA [13] has been found to solve multi-objective problems efficiently. In this paper, a faster version of NSGA (NSGA-2) has been adopted to solve the feature selection problem.
The algorithm for NSGA-2 is given in Algorithm 1. It involves primarily two steps. a) From the given population $P_t$ at iteration $t$, the offspring solution $Q_t$ is obtained using the selection, mutation, and crossover operations (Lines 12-15). In the first step, using the union of $P_t$ and $Q_t$, non-dominated sorting is performed to obtain solutions at different Pareto-front levels (Lines 2-3). Non-dominated sorting compares two solutions, say $x$ and $y$, where $x$ is considered to dominate $y$ if and only if no objective of $x$ is worse than the corresponding objective of $y$ and at least one objective of $x$ is better than the corresponding objective of $y$. A Pareto front is a set of non-dominated solutions, each being optimal in the sense that no objective can be improved without sacrificing at least one other objective. b) In the second step, the next population $P_{t+1}$ is obtained by sequentially adding the elements of the obtained Pareto fronts, starting with front 1, as long as the condition $|P_{t+1}| + |F_i| \le N$ holds (where $F_i$ is the set of solutions in the $i$-th front, and $N$ is the maximum size of the population). For the selection of the elements in $P_{t+1}$, crowding-distance computation using the fitness function in each front (Line 6) is performed to obtain diverse solutions (Lines 5-9).
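The two building blocks of this procedure, non-dominated sorting and crowding distance, can be sketched in plain Python. This is a simple illustrative version for minimization problems that repeatedly peels off the current non-dominated set; it does not reproduce the bookkeeping of the fast sorting procedure in Algorithm 1, and the objective values are hypothetical.

```python
def dominates(x, y):
    """x dominates y (minimization): no worse everywhere, better somewhere."""
    return (all(a <= b for a, b in zip(x, y))
            and any(a < b for a, b in zip(x, y)))


def non_dominated_sort(objectives):
    """Return the Pareto fronts as lists of indices into `objectives`."""
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objectives[j], objectives[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(objectives, front):
    """Larger distance means a less crowded region of the front."""
    distance = {i: 0.0 for i in front}
    for k in range(len(objectives[front[0]])):
        ordered = sorted(front, key=lambda i: objectives[i][k])
        low, high = objectives[ordered[0]][k], objectives[ordered[-1]][k]
        # Boundary solutions are always kept.
        distance[ordered[0]] = distance[ordered[-1]] = float("inf")
        if high == low:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            distance[cur] += (objectives[nxt][k] - objectives[prev][k]) / (high - low)
    return distance


# Hypothetical error values for two decision metrics (lower is better).
objectives = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (0.95, 0.95)]
fronts = non_dominated_sort(objectives)
distance = crowding_distance(objectives, fronts[0])
```

When a front does not fit entirely into the next population, its members would be ranked by descending crowding distance and only the least crowded ones kept.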
IV-B Problem Formulation
The objective of the problem is to minimize the error with reference to the attack labels , over the sampled time throughout the simulation, so as to identify the least number of features that need to be considered for training the classifiers,
(8) |
where , is the set of all decision metrics such as fused Belief, Plausibility, Pignistic, GBT functions, is their corresponding scores after the fusion operations, as presented in Section III-D and III, respectively, and is the simulation duration. The decision variables are binary encoded indicating whether a feature is selected for training the classifier or not. The at time depends on the feature selected for training. depending on the attack window, i.e., during attack or else .
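One plausible reading of this objective can be sketched in Python: each candidate feature mask yields a time series of fused scores per decision metric, and the fitness of the mask is the mean absolute gap between each series and the attack labels. The absolute-error form, the label convention (1 inside the attack window, 0 otherwise), and all names and values are assumptions made for illustration.

```python
def select_features(rows, mask):
    """Keep only the columns whose bit is set in the binary mask."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


def fitness(fused_scores, labels):
    """One error value per decision metric, averaged over the sampled times."""
    return {
        metric: sum(abs(score - label) for score, label in zip(scores, labels)) / len(labels)
        for metric, scores in fused_scores.items()
    }


# Hypothetical labels: the attack window covers the last two samples.
labels = [0, 0, 1, 1]
# Hypothetical fused decision scores obtained with one feature mask.
fused_scores = {
    "belief": [0.1, 0.2, 0.7, 0.9],
    "plausibility": [0.3, 0.4, 0.9, 1.0],
}
errors = fitness(fused_scores, labels)
reduced = select_features([[5.0, 1.2, 0.0], [4.8, 1.1, 1.0]], [1, 0, 1])
```

Each entry of `errors` would act as one objective of the multi-objective search, so a mask is preferred when it lowers the error of several decision metrics at once.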
V Testbed & Fusion Architecture
Before discussing the location-based fusion, it is essential to understand the architecture of the RESLab testbed that is producing the data during emulation of different Man-in-the-Middle attacks.
V-A Testbed Architecture
The RESLab emulation testbed consists of a network emulator, a power system emulator, an OpenDNP3 master and an RTAC-based master, an intrusion detection system, and data storage, fusion, and visualization software, as shown in Fig. 2. A brief overview of each component is given below. A detailed explanation of RESLab, including its architecture and use cases, is provided in [22, 36]. Common Open Research Emulator (CORE) is used to emulate the communication network. PowerWorld Dynamic Studio (PWDS) is a real-time simulation engine for operating the simulated power system case in real-time as a server [37]. DNP3 Masters are incorporated using OpenDNP3. Snort is used in the testbed as the rule-based, open-source intrusion detection system (IDS). The Elasticsearch, Logstash, and Kibana (ELK) stack is used to probe and store all virtual and physical network interface traffic. A separate VM is deployed to operate the fusion engine, which collects network logs and Snort alerts from the ELK stack using the Elasticsearch client and raw packet captures from CORE using pyshark. The fusion engine constructs cyber and physical features and merges them, using the time stamps from different sources to ensure correct alignment of information. Further, it pre-processes the features using imputation, scaling, and encoding before training for intrusion detection using supervised, unsupervised, and semi-supervised learning techniques. More details can be found in [12].

V-B Modifying Measurements and Commands
The objective of the intruder is to disrupt grid operations. The sequence of actions that create the false command injection (FCI) and false data injection (FDI) attacks, and how they impact the power system side, are detailed in [22, 36]. There are four use cases:
1. Use Case 1 (): Branch Control Modifications. Each binary direct operate command is changed from a CLOSE to a TRIP command, with any other traffic simply forwarded. The change in the binary operate command introduces some processing delay, which may cause the packet to be retransmitted.
2. Use Case 2 (): Generator Set-Point Modification. When the MiTM script is running, the analog point for the generator is set to a lower value, which decreases the generator setpoint from its current value.
3. Use Case 3 (): Measurement and Status Modification. This use case is a combination of false command and data injection attacks. After each polling interval, the DNP3 master sends a read request packet to each outstation, which then sends a read response packet to the master. This read response is filled with all the binary input, analog input, binary output, and analog output DNP3 points. Next, analog input points in the read response packet are changed to a lower value of 20 MW or 0 MW. The operator controlling the DNP3 master is then forced to send an analog direct operate command to bring the generators back to their original loaded set points. However, when the operator sends this original set point value to the generator, the MiTM script is programmed to change the setpoint to 20 MW or 0 MW.
4. Use Case 4 (): Measurement and Status Modification. The adversary first follows the steps of Use Case 3, then modifies the read response packet of the preceding packets, based on the actual set point given by the master. Thus the master is unaware of the contingency created.
V-C Use Cases Implementation
For each use case, the polling intervals and the number of polled DNP3 outstations are altered. The polling intervals tested were 30 and 60 s, while the number of polled DNP3 outstations were five and ten. For instance, the scenario indicates with ten outstations and a polling interval of 30 s is implemented.
In each scenario, the normal operation is conducted first without the MiTM attack. Then the operation is conducted again with the attack to analyze its impact. Finally, the attack is stopped and the network restored. The main reason for choosing poll intervals of 30 and 60 s is that most DNP3 masters have polling rates of 30 s, 1 min, or 5 min, with a maximum of 15 min. A poll interval more than two minutes has little impact on attack strength because the adversary processing time is less than 60 to 70 ms. Similarly, outstation numbers of five and ten are considered, since the objective is to study the communication dynamics of an impacted outstation, and how the number of outstations becomes a limitation on the attack success probability. The numbers of five and ten coincide with our use cases in the Texas 2000-bus synthetic model, where each utility control center is communicating with at least two substations and at most 25 substations.
V-D Mean Value Based Time Synchronization
The classifier probability scores of detection and the time stamp may vary for different locations. The sample times will also vary. Hence, a time resolution window is selected to compute the average of the probability scores from samples existing in that window and store the average probability score. This ensures time synchronization for fusion by location. The lower the window size, the higher the time resolution will be, but more noise will be present in the decision function. The impact of resolution is studied considering accuracy of the fusion technique for .
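The window-based averaging can be sketched as follows, assuming timestamps in seconds and fixed, non-overlapping windows. The sample values, the 5 s window, and the names `router` and `master` are hypothetical.

```python
from collections import defaultdict


def window_mean(samples, window):
    """Average (timestamp, score) samples inside fixed windows of `window` seconds."""
    buckets = defaultdict(list)
    for timestamp, score in samples:
        buckets[int(timestamp // window)].append(score)
    return {index: sum(scores) / len(scores)
            for index, scores in sorted(buckets.items())}


# Hypothetical classifier scores observed at two locations, sampled unevenly.
router = [(0.4, 0.2), (1.7, 0.4), (5.2, 0.9)]
master = [(2.0, 0.1), (6.1, 0.8), (8.9, 0.6)]

router_avg = window_mean(router, window=5.0)
master_avg = window_mean(master, window=5.0)

# Only windows present at every location can be fused.
common = sorted(set(router_avg) & set(master_avg))
aligned = [(router_avg[i], master_avg[i]) for i in common]
```

A smaller window keeps more temporal detail but leaves fewer samples per average, which is the source of the noise trade-off described above.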
V-E Supervised learning based IDS
Different supervised learning based classifiers are used in the Data Fusion Engine [12]. The probability scores based on the classifier’s output for each data point are considered for computing the mass function from each evidence. In the testbed, the IDSs are trained at three different locations in the network: a) the DNP3 outstation, b) the DNP3 master, and c) the substation router. Hence, adopting the autonomous fusion architecture, which is a more decision-centric fusion, is appropriate. Then location based fusion is considered using the DS rules of combination. Initially, seven types of supervised classifiers are trained: a) Support Vector Classifier (SVC), b) K-Nearest Neighbor (kNN), c) Decision Tree (DT), d) Random Forest (RF), e) Gaussian Naive Bayes (GNB), f) Bernoulli Naive Bayes (BNB), and g) Multi-layer Perceptron (MLP), to compute the probability scores for different use cases with varying poll rates and numbers of polled outstations. Then, the value of the decision function is used to evaluate the belief mass to feed into the DS-based fusion engine. Further, the mass distribution is computed based on the probability score. The frame of discernment for the given IDS problem is given by . If the probability of intrusion for a data point is say , then the dogmatic belief mass distribution is set to be the following,
(9) |
where the first mass belief distribution quantifies uncertainty, as per the variance of the probability scores considered in [38]. However, since most if not all states of belief are based on imperfect and not entirely conclusive evidence, non-dogmatic belief functions should be considered where is very small [34], say and not zero. In this scenario, a different belief mass distribution is proposed:
(10) |
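As a hedged illustration of how a non-dogmatic mass function can be derived from a classifier probability score, the Python sketch below assigns a small mass to the whole frame of discernment and scales the remaining mass by the probability score. Everything specific here is an assumption made for the sketch and is not taken from Eq. 10: the two-hypothesis frame with labels `attack` and `normal`, the discounting form, the function name `nondogmatic_mass`, and the default value of `eps`.

```python
ATTACK = frozenset({"attack"})   # assumed hypothesis label
NORMAL = frozenset({"normal"})   # assumed hypothesis label
OMEGA = ATTACK | NORMAL          # the whole frame of discernment


def nondogmatic_mass(p_attack, eps=1e-3):
    """Build a mass function {focal_set: mass} from an intrusion probability.

    A small mass eps is kept on the whole frame so that the result is
    non-dogmatic; the remaining 1 - eps is split according to p_attack.
    This discounting form is an assumption, not the paper's Eq. 10.
    """
    if not 0.0 <= p_attack <= 1.0:
        raise ValueError("p_attack must lie in [0, 1]")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    return {
        ATTACK: (1.0 - eps) * p_attack,
        NORMAL: (1.0 - eps) * (1.0 - p_attack),
        OMEGA: eps,
    }


m = nondogmatic_mass(0.8, eps=0.01)
print(m[ATTACK], m[NORMAL], m[OMEGA])
```

Because the whole frame always carries a strictly positive mass, a mass function built this way satisfies the non-dogmatic condition required by the cautious combination rule.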
V-F Fusion Architectures
Two types of fusion architectures are proposed and evaluated, as shown in Fig. 3.
V-F1 Fusion by Location (FL)
In the first case, we explore the performance through fusion by location based on the probability scores obtained from classifiers trained with both cyber and physical features at the substation router, DNP3 master, and outstation.
V-F2 Fusion by Location and Domain (FLD)
In the second case, we fuse by location as well as by domain, utilizing the probability scores obtained from classifiers trained separately on purely cyber and purely physical features.

VI Results and Analysis
This section describes the experiments performed to evaluate the performance of IDEA-I on the data collected from the sensors in the RESLab testbed [22], where different Man-In-The-Middle attacks were implemented in the emulated synthetic electric grid. Different rules of combination and decision criteria from the DSTE are evaluated. Further, these criteria or scores are used to compare the different classifiers that must be considered prior to incorporating the DS rules of combination. Post combination, the scores are fed to the NSGA-2 algorithm for feature selection to improve the performance. Based on the selected classifier, the different fusion techniques used in DSTE are evaluated. The performance of the two architectures introduced in the previous section, fusion by location and fusion by location and domain, is assessed. Finally, the impact of the time resolution in the fusion operation on its accuracy, along with the time complexity of the fusion algorithm, is analyzed with a varying number of hypotheses in the belief mass distribution.
VI-A Decision Criteria Selection
In DSTE, different criteria or scores serve different purposes. Hence, the four decision criteria presented earlier are evaluated: a) Belief score, b) Plausibility score, c) Pignistic score, and d) the score based on the General Bayesian Theorem (GBT). The basic idea is to evaluate the accuracy under these different criteria, for all the use cases, while considering different classifiers and cyber-physical features, and finally to select the criterion that has the highest accuracy in the most scenarios. For the evaluations, the time resolution is assumed to be 15 s, and the disjunctive rule of combination is considered. The accuracy is calculated as $(TP + TN)/(TP + TN + FP + FN)$, where $TP$ and $TN$ are true positives and negatives, and $FP$ and $FN$ are false positives and negatives. It can be observed from Table I that, across seven classifiers and 14 scenarios (98 combinations), the Pignistic score has the highest accuracy in 76 combinations. The Belief score, Plausibility score, and GBT score have the highest accuracy in 7, 7, and 8 combinations, respectively. Thus, the results show the pignistic score to be a reliable criterion for evaluation. Fig. 4 shows the decision metrics for the disjunctive fusion performed on the evidence collected from the router, master, and outstation IDS with a decision tree classifier. The pignistic score is a better indicator for intrusion detection compared to the other scores. These scores are also utilized to formulate the fitness function for the meta-heuristic based feature selection and to re-evaluate the DS rules of fusion.
Table I: Decision criterion with the highest accuracy for each scenario (rows) and classifier (columns); a: Belief, b: Plausibility, c: Pignistic, d: GBT. UC: use case; OS: number of polled outstations; PI: polling interval in seconds.

UC | OS | PI (s) | SVC | K-NN | DT | RF | GNB | BNB | MLP |
---|---|---|---|---|---|---|---|---|---|
UC1 | 10 | 30 | c | c | c | c | b | c | b |
UC1 | 10 | 60 | c | c | c | c | c | c | c |
UC2 | 5 | 30 | c | c | c | c | c | c | c |
UC2 | 5 | 60 | c | c | d | c | c | c | c |
UC2 | 10 | 30 | c | c | a | c | c | c | c |
UC2 | 10 | 60 | c | c | a | c | b | c | c |
UC3 | 5 | 30 | c | c | c | c | c | c | c |
UC3 | 5 | 60 | c | c | a | c | c | c | c |
UC3 | 10 | 30 | b | c | c | c | c | a | c |
UC3 | 10 | 60 | b | c | c | c | c | b | c |
UC4 | 5 | 30 | c | c | a | c | d | b | d |
UC4 | 5 | 60 | c | c | a | c | c | c | d |
UC4 | 10 | 30 | c | c | d | c | d | d | d |
UC4 | 10 | 60 | c | c | c | c | c | c | a |
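The pignistic score that dominates Table I can be sketched in Python, assuming the standard pignistic transformation of the transferable belief model [35]: the mass of each non-empty focal set is shared equally among its elements and renormalized by the mass not assigned to the empty set. The hypothesis labels, the function name `pignistic`, and the example masses are hypothetical.

```python
def pignistic(m, frame):
    """Pignistic probability of every element of the frame.

    m     : mass function as {frozenset: mass}, possibly with mass on the
            empty set (a subnormal mass function)
    frame : iterable with all elements of the frame of discernment
    """
    conflict = m.get(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("all mass is on the empty set")
    betp = {x: 0.0 for x in frame}
    for focal_set, mass in m.items():
        # The empty set has no elements, so its mass is never distributed.
        for x in focal_set:
            betp[x] += mass / (len(focal_set) * (1.0 - conflict))
    return betp


# Hypothetical fused mass function over an assumed two-hypothesis frame.
attack = frozenset({"attack"})
normal = frozenset({"normal"})
fused = {frozenset(): 0.1, attack: 0.5, normal: 0.1, attack | normal: 0.3}

scores = pignistic(fused, frame={"attack", "normal"})
print(scores)
```

The resulting scores sum to one, so a detection decision can be taken by comparing the score of the intrusion hypothesis with a threshold.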

VI-B Comparison by Classifier
A classifier’s performance may vary based on the features selected for training the learning engine. Certain use cases perform well with linear classifiers such as SGD or logistic regression, while others perform better with non-linear classifiers such as DT, RF, or SVM. Some cases perform better with deep learning based classifiers such as CNN or RNN, depending on the spatial and temporal nature of the features and their relationship with the labels. Here, the seven classifiers are compared within the fusion architecture. Fig. 5 compares the classifiers that supply the evidence, based on the precision, recall, F1-score, and accuracy calculated after disjunctive location-based fusion. For UC-1, UC-2, UC-3, and UC-4, the DT- and RF-based classifiers are found to be the better options with disjunctive fusion. Although the precision scores for UC-1 and UC-2 were better with the SVM and k-NN classifiers, accuracy and F1-score are considered the major criteria here, so DT- and RF-based classifiers are used for further experiments.




VI-C Comparison of Rules of Combination
The disjunctive rule of combination requires at least one reliable piece of evidence. The conjunctive rule performs well if the pieces of evidence are independent and reliable. The cautious combination rule performs better when the cardinality of the frame of discernment is high. Hence, the conjunctive, disjunctive, and cautious conjunctive fusion techniques are compared. Fig. 6 shows the impact on the precision, recall, F1-score, and accuracy scores computed with the pignistic function for the three rules of combination. In the experiment, the intruder compromises the substation network; hence, both the substation router and the DNP3 outstation are compromised. Among the three sensors, with the assistance of the one source considered to be secure, i.e., the DNP3 master, disjunctive fusion with DT and RF classifiers performs better than conjunctive fusion and its cautious counterpart.
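To make the difference between the conjunctive and disjunctive rules concrete, the following Python sketch combines two mass functions with both rules. It assumes the unnormalized forms used in the transferable belief model [35], in which the product of two masses is assigned to the intersection (conjunctive) or the union (disjunctive) of the corresponding focal sets; the cautious rule is not covered. The hypothesis labels, the function name `combine`, and the example masses are hypothetical.

```python
from collections import defaultdict
from itertools import product
import operator

ATTACK = frozenset({"attack"})   # assumed hypothesis label
NORMAL = frozenset({"normal"})   # assumed hypothesis label


def combine(m1, m2, rule):
    """Unnormalized combination of two mass functions {frozenset: mass}.

    rule="conjunctive": each product goes to the intersection of the sets.
    rule="disjunctive": each product goes to the union of the sets.
    """
    set_op = {"conjunctive": operator.and_, "disjunctive": operator.or_}[rule]
    fused = defaultdict(float)
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        fused[set_op(b, c)] += mass_b * mass_c
    return dict(fused)


# Hypothetical evidence: a compromised router IDS that reports "normal"
# and a trusted master IDS that reports "attack".
router = {NORMAL: 0.9, ATTACK: 0.1}
master = {ATTACK: 0.8, NORMAL: 0.2}

conj = combine(router, master, "conjunctive")
disj = combine(router, master, "disjunctive")

print("conflict mass (conjunctive):", conj[frozenset()])
print("mass on the whole frame (disjunctive):", disj[ATTACK | NORMAL])
```

With these conflicting sources, the conjunctive rule moves most of the mass to the empty set, whereas the disjunctive rule moves it to the whole frame, so the hypothesis supported by the one reliable source is not ruled out.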



VI-D Comparison of Two Architectures
Feature-based fusion is performed prior to fusion by location (FL). If the features from diverse domains cannot be fused, due to lack of performance or lack of evidence from any one domain, one needs to adopt the FLD architecture, where fusion by location on the raw domain-specific features is performed prior to fusion by domain. Fig. 5 shows the results for the FL architecture, while Fig. 7 shows the results for the FLD architecture. FLD-based fusion outperforms FL-based fusion in many scenarios, while in others the difference is small. Hence, either may be adopted depending on the scenario.




VI-E Impact of Time Resolution while Merging by Location
Since a small time-resolution window results in noisy intrusion detection, it is advisable to consider smoothing techniques. The time intervals between samples may differ between the physical and cyber sensors; hence, fusion requires bringing the samples to the same time frame. This comparison evaluates detection performance for varying time resolutions used during time synchronization prior to fusion by location. Fig. 8 shows the effect of different resolutions in the Mean Value based Time Synchronization block, implemented for the probability scores obtained from the DT classifier for . Results show that increasing the sample time leads to better decision scores, except for the GBT score. The GBT and the Belief scores are the same for all the time resolutions except for the .




VI-F Comparison with NSGA-2 based Feature Selection
Detection performance with feature selection using the NSGA-2 algorithm varies with the type of classifier selected. For all the classifiers, feature selection with the genetic algorithm (GA) improves the detection performance. The comparison of the results for the DT and RF classifiers with and without GA-based feature selection is shown in Fig. 9.
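NSGA-2 [13] ranks candidate solutions by Pareto dominance before selecting the next generation. The Python sketch below illustrates only that ranking step, using a simple front-peeling loop rather than the faster bookkeeping of the original algorithm, and it is not the implementation used in this work. The objective vectors are hypothetical fitness values (for example, two decision-metric-based scores of candidate feature subsets), both assumed to be maximized.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_fronts(objectives):
    """Rank candidates into successive Pareto fronts.

    objectives : list of objective tuples, one per candidate
    Returns a list of fronts; each front is a sorted list of indices.
    """
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        # A candidate is in the current front if nothing left dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical fitness values of four candidate feature subsets.
candidates = [(0.9, 0.8), (0.7, 0.9), (0.6, 0.6), (0.9, 0.7)]
fronts = non_dominated_fronts(candidates)
print(fronts)
```

Candidates in the first front represent the best trade-offs between the objectives; a complete NSGA-2 run would additionally apply crowding-distance sorting, crossover, and mutation.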

VI-G Computation Analysis with Varying Number of Sensors
The computational cost of the DS rules of combination depends on the size of the frame of discernment as well as on the number of mass functions being combined [39]. The time complexity of the fusion algorithms therefore varies with the number of pieces of evidence to fuse. It has been shown that the cautious combination rule becomes more effective as the size of the frame of discernment increases [40].
VII DSTE Evaluation Framework
A desktop application for IDEA-I is developed for the evaluation of DSTE rules of combination for different use cases and parameters. Fig. 10 shows the application, visualizing the decision metrics for disjunctive fusion with only cyber features and a mean value based time synchronization of 15 s. The check-box labeled “Merge by Location and Domain?” is used to select either the FL- or FLD-based architecture. The code for the DSTE evaluation application is available on GitHub [41]. The datasets for the evaluation of IDEA-I are also publicly available at IEEE Dataport [42].

VIII Conclusion
An evidence theoretic data fusion framework for detecting cyber intrusions in power systems is presented. The framework is evaluated by studying the performance of different classifiers using DS rules of combination. Results show that the evidence from the Decision Tree and Random Forest classifiers performs best among the classifiers considered. Results also show that a larger time-resolution window in mean-value based time synchronization improves the decision metrics. The Pignistic Function decision criterion is observed to perform best in most scenarios across all the use cases. The FLD (autonomous architecture) based fusion outperforms the FL (centralized architecture) based fusion in many scenarios, but in some there is not much difference, so both techniques may be considered depending on the scenario. Among the different rules of combination, the disjunctive rule performed best when used with Decision Tree and Random Forest probability scores. Finally, an application has been developed and made available that performs these analyses, facilitating the DS theoretic framework for the fusion of cyber and physical sensors in power systems and the evaluation of the impact of fusion type, use case, time resolution, and architecture.
Appendix A Types of mass function
1. Normal: If the empty set ∅ is not a focal set.
2. Subnormal: If ∅ is a focal set. In our experiments, the null set is a focal set; hence, subnormal belief functions are considered.
3. Dogmatic: If the frame of discernment is not a focal set. In the experiments, the frame of discernment is a focal set; hence, the belief functions are nondogmatic. The cautious combination rule can only be applied if the mass distributions are nondogmatic.
4. Vacuous: If the frame of discernment is the only focal set. For certain timestamps when there are no alerts, we also observe vacuous mass functions.
5. Simple: If it has at most two focal sets and, if it has two, the frame of discernment is one of them.
6. Categorical: If it has only one focal set.
7. Bayesian: If its focal sets are singletons.
Appendix B DS Functions
B-A Belief Function
The belief function maps each hypothesis B to a value bel(B) between 0 and 1, defined in Eq. 11:
$$\mathrm{bel}(B) = \sum_{A \subseteq B,\; A \neq \emptyset} m(A) \tag{11}$$
In words, the belief in a hypothesis B is the sum of the masses of all non-empty sets A that are subsets of B.
B-B Plausibility Function
The plausibility function maps each hypothesis B to a value pls(B) between 0 and 1, defined in Eq. 12:
$$\mathrm{pls}(B) = \sum_{A \cap B \neq \emptyset} m(A) \tag{12}$$
In words, pls(B) is the sum of the masses of all sets that intersect the set B. Equivalently, it is the weight of evidence that does not refute B; hence, for a normal mass function, belief and plausibility are related by Eq. 13, where $\bar{B}$ denotes the complement of B:
$$\mathrm{pls}(B) = 1 - \mathrm{bel}(\bar{B}) \tag{13}$$
B-C Commonality Function
The commonality function maps each hypothesis B to a value q(B) between 0 and 1, defined in Eq. 14:
$$q(B) = \sum_{A \supseteq B} m(A) \tag{14}$$
B-D Relationship among the Functions
Shafer showed that there is one-to-one correspondence among these m, bel, pls and q functions, where Eqs. 15 and 16 give examples:
(15) |
(16) |
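The belief, plausibility, and commonality functions of this appendix can be sketched in Python under their standard definitions: bel(B) sums the masses of the non-empty focal sets contained in B, pls(B) sums the masses of the focal sets that intersect B, and q(B) sums the masses of the focal sets that contain B. The representation of a mass function as a dictionary of frozensets, the hypothesis labels, and the example masses are assumptions made for the sketch.

```python
def bel(m, b):
    """Belief: total mass of the non-empty focal sets contained in b."""
    return sum(mass for a, mass in m.items() if a and a <= b)


def pls(m, b):
    """Plausibility: total mass of the focal sets that intersect b."""
    return sum(mass for a, mass in m.items() if a & b)


def q(m, b):
    """Commonality: total mass of the focal sets that contain b."""
    return sum(mass for a, mass in m.items() if a >= b)


# Hypothetical normal mass function over an assumed two-hypothesis frame.
attack = frozenset({"attack"})
normal = frozenset({"normal"})
omega = attack | normal
m = {attack: 0.6, normal: 0.1, omega: 0.3}

print(bel(m, attack), pls(m, attack), q(m, attack))

# For a normal mass function, pls(B) equals 1 - bel(complement of B).
assert abs(pls(m, attack) - (1.0 - bel(m, omega - attack))) < 1e-12
```

The interval between bel(B) and pls(B) bounds the support for B; here the mass on the whole frame is what separates the two values.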
Acknowledgment
The work described in this paper was supported by funds from the National Academy of Sciences under award PO# 2000009323 titled Online Resilience Support System for Cyber-Physical Situational Awareness and from US Department of Energy’s (DoE) Cybersecurity for Energy Delivery Systems program under award DE-OE0000895.
References
- [1] D. Kushner, “The real story of stuxnet,” IEEE Spectrum, vol. 50, no. 3, pp. 48–53, March 2013.
- [2] R. M. Lee, M. J. Assante, and T. Conway, “Analysis of the cyber attack on the ukrainian power grid,” E-ISAC, SANS, Washington DC, US, Mar 2016.
- [3] Pierluigi Paganini. (2020, Nov) October Mumbai power outage may have been caused by a cyber attack. [Online]. Available: https://securityaffairs.co/wordpress/111209/hacking/mumbai-power-outage-cyber-attack.html
- [4] S. Xu, W. Lu, and H. Li, “A stochastic model of active cyber defense dynamics,” 2016.
- [5] S. Patton, W. Yurcik, and D. Doss, “An achilles’ heel in signature-based ids: Squealing false positives in snort,” 12 2001.
- [6] Y. Hu, A. Yang, H. Li, Y. Sun, and L. Sun, “A survey of intrusion detection on industrial control systems,” International Journal of Distributed Sensor Networks, vol. 14, p. 155014771879461, 08 2018.
- [7] A. Khraisat, I. Gondal, P. Vamplew, and J. Kamruzzaman, “Survey of intrusion detection systems: techniques, datasets and challenges,” Cybersecurity, vol. 2, 12 2019.
- [8] D. Hall, Mathematical Techniques in Multisensor Data Fusion, 01 1992.
- [9] L. A. Zadeh, “A simple view of the dempster-shafer theory of evidence and its implication for the rule of combination,” AI Mag., vol. 7, no. 2, p. 85–90, Jul. 1986.
- [10] Peng Xie, J. H. Li, Xinming Ou, Peng Liu, and R. Levy, “Using bayesian networks for cyber security analysis,” in 2010 IEEE/IFIP International Conference on Dependable Systems Networks, 2010, pp. 211–220.
- [11] L. Zomlot, S. C. Sundaramurthy, K. Luo, X. Ou, and S. R. Rajagopalan, “Prioritizing intrusion analysis using dempster-shafer theory,” in Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, ser. AISec ’11. New York, NY, USA: Association for Computing Machinery, 2011, p. 59–70. [Online]. Available: https://doi.org/10.1145/2046684.2046694
- [12] A. Sahu, Z. Mao, P. Wlazlo, H. Huang, K. Davis, A. Goulart, and S. Zonouz, “Multi-source multi-domain data fusion for cyberattack detection in power systems,” IEEE Access, vol. 9, pp. 119 118–119 138, 2021.
- [13] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: Nsga-ii,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
- [14] S. Le Hegarat-Mascle, I. Bloch, and D. Vidal-Madjar, “Application of dempster-shafer evidence theory to unsupervised classification in multisource remote sensing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 35, no. 4, pp. 1018–1031, 1997.
- [15] T. Denoeux, “A neural network classifier based on dempster-shafer theory,” IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 30, no. 2, pp. 131–150, 2000.
- [16] S. Bu, F. R. Yu, X. P. Liu, P. Mason, and H. Tang, “Distributed combined authentication and intrusion detection with data fusion in high-security mobile ad hoc networks,” IEEE Transactions on Vehicular Technology, vol. 60, no. 3, pp. 1025–1036, 2011.
- [17] W. Hu, J. Li, and Q. Gao, “Intrusion detection engine based on dempster-shafer’s theory of evidence,” in 2006 International Conference on Communications, Circuits and Systems, vol. 3, 2006, pp. 1627–1631.
- [18] R. R. Murphy, “Dempster-shafer theory for sensor fusion in autonomous mobile robots,” IEEE Transactions on Robotics and Automation, vol. 14, no. 2, pp. 197–206, 1998.
- [19] F. Xuewei, W. Dongxia, M. Guoqing, and L. Jin, “Security situation assessment based on the ds theory,” in 2010 Second International Workshop on Education Technology and Computer Science, vol. 3, 2010, pp. 352–356.
- [20] L. Liu and Y. Liu, “Research on the technology of network intrusion detection based on modified d-s evidence theory,” in 2009 WRI World Congress on Software Engineering, vol. 2, 2009, pp. 447–450.
- [21] A. Farroukh, N. Mukadam, E. Bassil, and I. H. Elhajj, “Distributed and collaborative intrusion detection systems,” in 2008 IEEE Lebanon Communications Workshop, 2008, pp. 41–45.
- [22] A. Sahu, P. Wlazlo, Z. Mao, H. Huang, A. Goulart, K. Davis, and S. Zonouz, “Design and evaluation of a cyber-physical testbed for improving attack resilience of power systems,” IET Cyber-Physical Systems: Theory & Applications. [Online]. Available: https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/cps2.12018
- [23] K. R. Davis, C. M. Davis, S. A. Zonouz, R. B. Bobba, R. Berthier, L. Garcia, and P. W. Sauer, “A cyber-physical modeling and assessment framework for power grid infrastructures,” IEEE Transactions on Smart Grid, vol. 6, no. 5, pp. 2464–2475, 2015.
- [24] A. Sahu, H. Huang, K. Davis, and S. Zonouz, “A framework for cyber-physical model creation and evaluation,” in 2019 20th International Conference on Intelligent System Application to Power Systems (ISAP), 2019, pp. 1–8.
- [25] S. Zonouz, C. M. Davis, K. R. Davis, R. Berthier, R. B. Bobba, and W. H. Sanders, “Socca: A security-oriented cyber-physical contingency analysis in power infrastructures,” IEEE Transactions on Smart Grid, vol. 5, no. 1, pp. 3–13, 2014.
- [26] A. Jaunzemis, M. Holzinger, and K. Luu, “Sensor tasking for spacecraft custody maintenance and anomaly detection using evidential reasoning,” Journal of Aerospace Information Systems, vol. 15, pp. 1–26, 02 2018.
- [27] M. Compare and E. Zio, “Genetic algorithms in the framework of dempster-shafer theory of evidence for maintenance optimization problems,” IEEE Transactions on Reliability, vol. 64, no. 2, pp. 645–660, 2015.
- [28] A. H. Hamamoto, L. F. Carvalho, L. D. H. Sampaio, T. Abrão, and M. L. Proença, “Network anomaly detection system using genetic algorithm and fuzzy logic,” Expert Systems with Applications, vol. 92, pp. 390–402, 2018. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S095741741730619X
- [29] P. Angelo and A. Drummond, “Adaptive anomaly‐based intrusion detection system using genetic algorithm and profiling,” Security and Privacy, vol. 1, 08 2018.
- [30] M. Gauthama Raman, N. Somu, K. Kirthivasan, R. Liscano, and V. Shankar Sriram, “An efficient intrusion detection system based on hypergraph - genetic algorithm for parameter optimization and feature selection in support vector machine,” Knowledge-Based Systems, vol. 134, pp. 1–12, 2017. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0950705117303209
- [31] Y. Tang, N. Oren, S. Parsons, and K. Sycara, “Dempster-shafer argument schemes,” 2013, tenth International Workshop on Argumentation in Multi-Agent Systems (ArgMAS 2013) ; Conference date: 06-05-2013 Through 10-05-2013.
- [32] G. Clarke, D. Reynders, and E. Wright, Practical modern SCADA protocols: DNP3, 60870.5 and related systems. Newnes, 2004.
- [33] D. Dubois and H. Prade, On the Combination of Evidence in Various Mathematical Frameworks. Dordrecht: Springer Netherlands, 1992, pp. 213–241. [Online]. Available: https://doi.org/10.1007/978-94-011-2438-6_13
- [34] T. Denœux, “Conjunctive and disjunctive combination of belief functions induced by nondistinct bodies of evidence,” Artif. Intell., vol. 172, no. 2–3, p. 234–264, Feb. 2008. [Online]. Available: https://doi.org/10.1016/j.artint.2007.05.008
- [35] P. Smets and R. Kennes, “The transferable belief model,” Artificial Intelligence, vol. 66, no. 2, pp. 191 – 234, 1994. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0004370294900264
- [36] P. Wlazlo, A. Sahu, Z. Mao, H. Huang, A. Goulart, K. Davis, and S. Zonouz, “Man-in-the-middle attacks and defence in a power system cyber-physical testbed,” IET Cyber-Physical Systems: Theory & Applications. [Online]. Available: https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/cps2.12014
- [37] Glover, T. Overbye, and Sarma, “Powerworld simulator.” [Online]. Available: https://www.powerworld.com/products/simulator/overview
- [38] M. Raza, I. Gondal, D. Green, and R. L. Coppel, “Classifier fusion using dempster-shafer theory of evidence to predict breast cancer tumors,” in TENCON 2006 - 2006 IEEE Region 10 Conference, 2006, pp. 1–4.
- [39] P. Orponen, “Dempster’s rule of combination is #p-complete,” Artificial Intelligence, vol. 44, no. 1, pp. 245 – 253, 1990. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0004370290901037
- [40] N. Wilson, “Algorithms for dempster-shafer theory,” 01 2000.
- [41] A. Sahu, “Dempster shafer fusion graphical user interface,” https://github.com/Abhijeet1990/DempsterShafer_Fusion_GUI.git, 2021.
- [42] A. Sahu, Z. Mao, P. Wlazlo, H. Huang, K. Davis, A. Goulart, and S. Zonouz, “Cyber-physical dataset for mitm attacks in power systems,” IEEE Dataport, 2021. [Online]. Available: https://dx.doi.org/10.21227/e4dd-2163