
Benchmarking Frameworks and Comparative Studies of Controller Area Network (CAN) Intrusion Detection Systems: A Review

Shaila Sharmin ([email protected], corresponding author), Hafizah Mansor ([email protected]), Andi Fitriah Abdul Kadir ([email protected]), Normaziah A. Aziz ([email protected])
Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, Selangor, Malaysia
Abstract

The development of intrusion detection systems (IDS) for the in-vehicle Controller Area Network (CAN) bus is one of the main efforts being taken to secure the in-vehicle network against various cyberattacks, which have the potential to cause vehicles to malfunction and result in dangerous accidents. These CAN IDS are evaluated in disparate experimental conditions that vary in terms of the workload used, the features used, the metrics reported, etc., which makes direct comparison difficult. Therefore, there have been several benchmarking frameworks and comparative studies designed to evaluate CAN IDS in similar experimental conditions to understand their relative performance and facilitate the selection of the best CAN IDS for implementation in automotive networks. This work provides a comprehensive survey of CAN IDS benchmarking frameworks and comparative studies in the current literature. A CAN IDS evaluation design space is also proposed in this work, which draws from the wider CAN IDS literature. This is not only expected to serve as a guide for designing CAN IDS evaluation experiments but is also used for categorizing current benchmarking efforts. The surveyed works have been discussed on the basis of the five aspects in the design space—namely IDS type, attack model, evaluation type, workload generation, and evaluation metrics—and recommendations for future work have been identified.

Keywords: Controller area network, intrusion detection, benchmarking, evaluation

1 Introduction

The adoption of drive-by-wire technology means that today’s vehicles are equipped with as many as 150 Electronic Control Units (ECUs) [1], which control the different subsystems of a vehicle and enable various functionalities related to performance, safety, and comfort. The operations of a vehicle rely on the communication of these ECUs among themselves, which occurs on the internal vehicular network connecting these ECUs together. One such protocol for internal vehicular networks is the Controller Area Network (CAN), which is used in nearly all modern vehicles due to its simple, inexpensive, and reliable implementation. However, the CAN bus also lacks security features, namely encryption and authentication, which makes it vulnerable to a range of attacks that can be conducted through any of the various communication interfaces a modern vehicle is equipped with, such as Bluetooth, Wi-Fi, and cellular.

Therefore, the need to secure the CAN bus has resulted in the development of CAN intrusion detection systems (IDS), particularly because a CAN IDS can be implemented without affecting the real-time performance of a CAN bus in the resource-constrained environment of an in-vehicle network [2, 3]. As such, there have been numerous intrusion detection methods proposed for the CAN bus over the years [3, 4, 5].

These works provide results of evaluation experiments to demonstrate the efficacy of their proposed methods. However, across the CAN IDS literature, these evaluations vary in terms of the workload used, the features used, the parameters chosen, and the metrics reported. Since the evaluation of these CAN IDS is performed with different experimental setups and in different contexts, it is not known how they fare in comparison to each other. Furthermore, the evaluation experiments reported in these works may not be replicable because the implementations are not readily available or the documentation is not sufficiently comprehensive to aid reproduction [2, 6]. This has resulted in efforts to evaluate CAN intrusion detection methods on an equal footing using benchmarking frameworks and in comparative studies. Evaluating CAN intrusion detection methods in similar settings facilitates the understanding of the relative performance of the CAN IDS under test and, ultimately, the selection of the best CAN IDS for implementation.

Given the various ways in which intrusion detection methods can be evaluated [7], this work proposes an evaluation design space for the CAN IDS context that includes IDS types, attacks tested, evaluation type, workload source, and evaluation metrics. This design space enumerates the various evaluation methods found in the CAN IDS literature and is aimed at serving as a guide for planning future CAN IDS evaluation experiments. A survey of benchmarking and comparison studies of CAN IDS has also been conducted and the works discussed in terms of the proposed design space to understand current efforts at benchmarking and identify avenues for future work. The contributions of this paper can thus be summarized as follows:

  1. Outlining a CAN IDS evaluation design space to aid the design of evaluation and benchmarking experiments.

  2. Providing a comprehensive survey of benchmarking frameworks and comparative studies of CAN IDS in the current literature, as well as categorizing and discussing the surveyed works according to the proposed design space.

  3. Discussing trends in current benchmarking and comparison efforts and providing recommendations for future work.

Contrary to prior survey works on CAN intrusion detection, this paper does not focus on surveying CAN IDS of different categories but rather on enumerating the various methods used for evaluating and benchmarking CAN IDS and organizing the information into an evaluation design space. We also survey benchmarking studies of CAN IDS; such surveys have previously been conducted only for conventional computer network IDS. Table 1 outlines how this work compares with related literature on CAN IDS and conventional IDS.

The rest of this paper has been organized as follows. Section 2 provides background information on the CAN bus and relevant threats. Section 3 provides an overview of related works in the literature, including the state of the art in CAN IDS and the benchmarking and evaluation of IDS for traditional computer networks. The proposed CAN IDS evaluation design space is described in Section 4. Section 5 presents the survey of benchmark frameworks and comparative studies of CAN IDS, while Section 6 discusses the surveyed works in terms of the five parts of the design space. Section 7 discusses opportunities for future work, while Section 8 concludes this paper.

Table 1: Comparison of Present Study with Related Works
Columns: Work; Focuses on CAN; Discusses IDS evaluation methods, datasets and/or evaluation metrics; Proposes an IDS evaluation design space or framework; Surveys or updates previous surveys of IDS; Surveys benchmarking studies of IDS; Performs and reports new IDS benchmarking experiments; Provides recommendations for future work
Aliwa et al. [4]
Wu et al. [3]
Karopoulos et al. [5]
Young et al. [8]
Al-Jarrah et al. [9]
Nappi [10]
Rajapaksha et al. [11]
Panigrahi et al. [12]
Kilincer et al. [13]
Milenkoski et al. [7]
Our work

2 Background

2.1 Controller Area Network

CAN is a serial multi-master communication protocol and one of the most commonly used protocols for internal vehicular networks. It is particularly used for subsystems such as the powertrain and chassis, which are integral to the operation of a vehicle and include functionality such as transmission, braking, and steering [4].

The CAN protocol spans the physical and data link layers of the OSI model. It is a message-based protocol whereby ECUs (i.e., nodes) communicate information related to the current state of the vehicle via message broadcasts that are received by all the other nodes on the network. A CAN message mainly consists of an arbitration identifier (AID) and a message payload of up to 8 bytes, along with other fields like the data length code (DLC) and cyclic redundancy check (CRC). The AID is used in the bit-wise arbitration process in the event of a collision, whereby AIDs with lower values have higher priority. CAN also provides error checking and error confinement mechanisms [14].
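To make the frame structure and arbitration rule concrete, the following minimal Python sketch models only the fields discussed above. The class and field names are our own simplification; real CAN frames carry additional control, acknowledgement, and CRC bits.

```python
from dataclasses import dataclass

@dataclass
class CanFrame:
    """Simplified CAN data frame (illustration only)."""
    aid: int     # 11-bit arbitration identifier (0x000-0x7FF)
    data: bytes  # payload, up to 8 bytes in classical CAN

    def __post_init__(self):
        assert 0 <= self.aid <= 0x7FF, "standard AIDs are 11 bits"
        assert len(self.data) <= 8, "classical CAN payload is at most 8 bytes"

    @property
    def dlc(self) -> int:
        # Data Length Code: number of payload bytes
        return len(self.data)

def arbitration_winner(frames):
    # Bit-wise arbitration: the lowest AID value wins (highest priority)
    return min(frames, key=lambda f: f.aid)

engine = CanFrame(aid=0x0C0, data=bytes([0x12, 0x34]))
brake = CanFrame(aid=0x0A0, data=bytes([0xFF]))
print(hex(arbitration_winner([engine, brake]).aid))  # the lower AID wins
```

This priority rule is also what a DoS attacker exploits, as discussed in Section 2.2.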

2.2 CAN attack model

The CAN protocol was designed at a time when the internal vehicular network operated in isolation from external networks. Therefore, the protocol does not provide for security in its design and lacks encryption and authentication. Aspects of the CAN protocol, such as the arbitration mechanism, can also be exploited. These factors, along with the fact that modern vehicles are equipped with various interfaces for external communication, make the CAN bus vulnerable to a range of cyber attacks that can disrupt the operations of a vehicle and cause dangerous – even fatal – accidents.

Cho and Shin [15] provide an attack model for the CAN bus that is used in several works, such as [16, 17]. This attack model assumes a compromised node on the CAN bus and uses the following terminology: a weakly compromised ECU is one that has been silenced or suspended by an attacker but cannot be used by the attacker to inject messages; on the other hand, a fully compromised ECU is one that the attacker has full control over and can use to inject messages into the CAN bus. This attack model classifies CAN bus attacks into the following categories:

  1.

    Fabrication attacks: These attacks involve the injection of fabricated messages into the CAN bus using a fully compromised ECU with the intention of overriding messages broadcast by particular ECUs or disrupting CAN bus communications. Fabrication attacks constitute the most common category of attacks in the CAN intrusion detection literature and include the following:

    (a)

      Denial-of-service (DoS): A DoS attack can be carried out by injecting fabricated messages with the AID 0x000 or any AID lower in value than other legitimate AIDs, at a high frequency. Such messages would have the highest priority and always win the arbitration process, thereby blocking the broadcast of legitimate messages on the CAN bus.

    (b)

      Fuzzing: Fuzzing involves the insertion of messages with random AIDs and payloads into the CAN bus at a high frequency. In some attacks the AIDs may be random, while in others the AID may be legitimate but the payload is random.

    (c)

      Targeted ID: This type of attack involves the injection of messages of a particular AID with a manipulated payload. Such messages can be injected with a flooding delivery, i.e., at a high frequency, or with a flam delivery, where each forged message is injected immediately after a legitimate message of the same AID.

      Injection of messages that were previously seen and captured from the CAN bus, commonly termed as a replay attack, can also be considered a form of fabrication targeted ID attack. A replay attack can be conducted by injecting a previously captured sequence of messages, thereby targeting multiple AIDs, or by injecting a message time series of a single AID.

  2.

    Suspension attack: A suspension attack involves inhibiting message broadcasts from a weakly compromised ECU. This achieves an effect similar to a DoS and is observable as an absence of messages of a particular AID.

  3.

    Masquerade attack: Mounting a masquerade attack requires two main steps: the first is to suspend the message broadcasts from a weakly compromised ECU, and the second is to use a fully compromised ECU to broadcast messages with the same AID as the former ECU but with a manipulated payload, effectively masquerading as the former ECU.

The attacks thus described range in their sophistication and in the methods that would be effective for detecting them. Attacks such as DoS and fuzzing do not require knowledge of the meaning and semantics of CAN broadcasts, which is often confidential, proprietary information and can vary among vehicle makes and models. They can also be detected by simpler methods, such as those that analyze the timing of AIDs. On the other hand, a masquerade attack is an advanced attack requiring greater skill to mount and would also require detection methods that analyze the payload of the messages [16].
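As an illustration of how a fabrication attack skews the traffic mix, the following sketch generates a synthetic trace of (timestamp, AID) tuples in which a DoS flood of AID 0x000 is interleaved with a periodic legitimate AID. All timings, AIDs, and function names are invented for illustration.

```python
def benign_trace(aid, period, count, start=0.0):
    """Periodic broadcasts of a single AID: list of (timestamp, aid) tuples."""
    return [(start + i * period, aid) for i in range(count)]

def dos_flood(period, count, start=0.0, aid=0x000):
    """High-frequency injection of the highest-priority AID 0x000."""
    return [(start + i * period, aid) for i in range(count)]

# Legitimate AID 0x1A0 every 10 ms; attacker floods AID 0x000 every 0.5 ms
trace = sorted(benign_trace(0x1A0, 0.010, 100) + dos_flood(0.0005, 2000))
injected = sum(1 for _, aid in trace if aid == 0x000)
print(f"{injected / len(trace):.0%} of frames are attack frames")
```

Such a trace exhibits exactly the timing and frequency anomalies that the statistical detection methods of Section 3.1 exploit.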

3 Related works

3.1 CAN intrusion detection

Apart from the development of CAN encryption and authentication methods, the development of intrusion detection systems has been one of the main approaches being taken to secure the CAN bus [4]. This is because it is possible to implement a CAN IDS in the resource-constrained environment of an internal vehicular network without impacting CAN bus traffic or requiring changes to the CAN protocol [2, 3, 5].

A large number of CAN IDS have been proposed in the literature that vary in terms of the techniques used, features used, deployment location, etc. Numerous surveys of CAN intrusion detection methods provide categorisation for these CAN IDS, such as in [8, 3]. In their survey of cyberattacks and countermeasures for in-vehicle networks, Aliwa et al. [4] enumerate and review cryptographic methods and IDS for the CAN bus, categorising CAN IDS according to the approach used for intrusion detection. Karopoulos et al. [5] compile a meta-taxonomy of CAN IDS, which provides a way to categorize CAN IDS based on deployment location, detection technique, network layer, and reaction type.

Depending on the technique used for intrusion detection, we classify CAN IDS into the following categories (shown in Figure 1), which are similar to those proposed in [4] and the classification by type proposed by Karopoulos et al. [5] in their metataxonomy. For each category, we cite representative works and guide the reader to more detailed surveys of CAN IDS such as [8, 3, 4, 9]:

  1.

    Signature-based IDS: These IDS are knowledge-based or rule-based IDS that rely on a database of signatures for intrusion detection. This type of IDS is exemplified by that proposed by Studnia et al. [18], which uses formal language theory to derive attack signatures from specifications of the network and ECU behaviour. Since the architecture of the in-vehicle network remains mostly unchanged throughout a vehicle’s lifespan, the expected behaviour of an in-vehicle network can be used to derive attack signatures. Such an approach is useful for detecting common, known attacks with low false positive and false negative rates. However, to ensure that the IDS is able to detect new attacks, the signature database needs to be regularly updated. Moreover, IDS relying on attack signature databases may not be capable of detecting novel, unknown attacks.

  2.

    Anomaly-based IDS: These are behaviour-based methods that use patterns observed in normal CAN bus traffic and detect deviations from these normal patterns as attacks on the CAN bus. This is currently the most common approach for CAN intrusion detection, with 38 out of 41 CAN IDS works surveyed by Karopoulos et al. [5] from the years 2020–2022 being of this type. The wide variety of anomaly-based CAN IDS can be further categorized into the following:

    (a)

      Statistical methods: These IDS use statistical methods to build a model of normal CAN network traffic and use it to detect anomalies. A number of CAN IDS take advantage of the fact that most CAN messages are broadcast at fixed regular frequencies. Timing-based IDS analyse time intervals between messages and raise an alert when the observed time interval deviates from the normal by a certain threshold, such as in [19, 20]. Young et al. [8] find that message frequencies, as opposed to timing, are better indicators of attacks like fabrication, yielding higher accuracy and fewer false positives. Message frequencies are also used in [21] with an adaptive cumulative sum (CUSUM) algorithm and in [22], which uses wavelet analysis. However, while these IDS are suitable for detecting anomalies in periodic AIDs, they cannot be used for AIDs broadcast aperiodically.

      Apart from timing and frequency, other statistical CAN IDS use features such as AIDs and message payloads for intrusion detection. Marchetti and Stabili [23] identify that normal CAN bus traffic contains recurring sequences of AIDs, and the occurrence of any unusual transition between AIDs is indicative of an attack. This approach is similar to the graph-based approach taken in [17], whereby a graph representing valid AID transitions is built from normal CAN traffic, and the chi-square test is applied to features derived from this graph for anomaly detection. The entropy of AIDs and message payloads has also been used as a means to detect anomalies in CAN traffic [24, 25, 26]. In order to detect attacks involving manipulation of data fields, an IDS based on calculating Hamming distances between consecutive payloads of the same AID has also been proposed [27], which is found to be effective against fuzzing attacks but not attacks involving the injection of a previously recorded message sequence (replay attack).

      Statistical intrusion detection methods are effective only in detecting those attacks that impact the features being analyzed. As such, while timing- or frequency-based CAN IDS can be useful for detecting fabrication and suspension attacks, they may be unsuitable for masquerade attacks, which manifest as manipulated data fields. Nevertheless, such IDS are lightweight approaches compared to machine learning methods and can be easily implemented in vehicle-grade ECUs for real-time intrusion detection.

    (b)

      Machine Learning (ML) methods: These IDS use ML algorithms to build a model of CAN bus traffic and use it to detect attacks on the CAN bus. Traditional learning algorithms that have been applied to CAN intrusion detection include Support Vector Machine (SVM) and k-Nearest Neighbours (kNN) [28, 29], one-class SVM (OCSVM) [30], and Isolation Forest [31]. Apart from these algorithms that learn "shallow" models, many works now apply deep learning techniques that can learn complex patterns in CAN traffic. These include Deep Neural Networks (DNN) [32, 33], Long Short-Term Memory (LSTM) [34, 35], autoencoders [36, 37, 38], Convolutional Neural Networks (CNN) [39, 40], and Generative Adversarial Networks (GAN) [41]. While these works utilize features derived from CAN message frames such as AID and data fields, Xun et al. [42] propose a CAN IDS using voltage signals of ECUs, whereby 14 time-domain features are used to train a deep support vector domain description (SVDD) model.

      CAN intrusion detection has been treated as both a supervised and an unsupervised learning problem. While supervised learning methods require labelled datasets for training the detection models, unsupervised methods require only benign data with no attack traffic. Unsupervised methods can thus also detect novel, unknown attack types, as opposed to supervised learning methods, which can detect only the attack types they have been trained with. A detailed survey of ML-based CAN IDS can be found in [11].

  3.

    Hybrid IDS: A hybrid IDS is one that combines signature- and anomaly-based techniques for intrusion detection. These IDS are designed to combine the benefits of both types of CAN IDS: while a signature-based technique can efficiently detect common and known attacks with low false positives, anomaly-based methods enable the detection of novel attack types. The CAN IDS proposed by Zhang et al. [43] is an example of this type of IDS that includes a whitelist of valid AIDs, a DLC validity check, a time interval-based detector, and a DNN detector implemented in that order. The DNN detector further uses AIDs, data field Hamming distances, entropy of data fields, and certain data field bytes as features, which were obtained from feature selection. Another hybrid IDS is CANova [44], which uses a reverse engineering algorithm to extract signals from CAN messages and categorize messages, which is then used to select and apply appropriate detection modules depending on the category. This IDS includes static rules-based, timing-based, Hamming distance-based, Vector Auto-Regression-based, and Recurrent Neural Network (RNN) autoencoder-based modules. CAN IDS such as these can reduce detection times since known attacks are quickly detected by more lightweight methods, while also ensuring that novel attacks are caught using more computationally intensive methods.
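The timing-based statistical approach discussed above (e.g., [19, 20]) can be sketched as follows: learn the mean and standard deviation of inter-arrival times of a periodic AID from benign traffic, then flag any interval that deviates beyond a threshold. The data, jitter, and three-sigma threshold below are illustrative assumptions, not parameters from any surveyed work.

```python
from statistics import mean, stdev

def learn_profile(timestamps):
    """Learn mean and std of inter-arrival times from benign timestamps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return mean(gaps), stdev(gaps)

def detect(timestamps, mu, sigma, k=3.0):
    """Flag indices whose preceding inter-arrival gap deviates beyond k sigma."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [i + 1 for i, g in enumerate(gaps) if abs(g - mu) > k * sigma]

# Benign traffic: nominal 100 ms period with small alternating jitter
benign = [i * 0.100 for i in range(50)]
benign = [t + (0.001 if i % 2 else -0.001) for i, t in enumerate(benign)]
mu, sigma = learn_profile(benign)

# Attack trace: one frame injected 5 ms after a legitimate one (flam delivery)
attack = benign[:10] + [benign[9] + 0.005] + [t + 0.005 for t in benign[10:]]
alerts = detect(attack, mu, sigma)
print(alerts)  # index of the injected frame
```

As noted above, such a detector reacts only to timing disturbances; it would remain silent on a masquerade attack that preserves the broadcast schedule.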

CAN IDS can also be divided into two categories depending on the layer of the OSI model on which they operate:

  1.

    Data link layer IDS: These IDS apply any of the aforementioned techniques to features derived from CAN message frames, such as AIDs, DLCs, and payload data for intrusion detection. Among payload-based CAN IDS, there are those that analyse raw data bytes (such as [27]) and others that analyse signals derived from the payload bytes (such as [36]). This type of CAN IDS forms the majority of CAN IDS in the literature, as can be seen in the various surveys of CAN IDS [3, 4, 8, 9].

  2.

    Physical layer IDS: These IDS use measurements of physical characteristics such as clock skew, voltage, and signal strength to perform intrusion detection. This type of IDS corresponds to OSI Layer 1. Cho and Shin [15] proposed a Clock-based IDS (CIDS), which estimates the clock skews of ECUs by analysing time intervals of periodic messages and uses them to build a profile of normal CAN traffic. On the other hand, VoltageIDS [45] and Viden [46] are CAN IDS that are based on using voltage measurements to fingerprint existing ECUs in a CAN bus and comparing observed CAN bus traffic with the fingerprints to detect intruding nodes. A survey of physical characteristics-based CAN IDS can be found in [47].
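As a concrete sketch of payload-based detection at the data link layer, the Hamming distance approach of [27] can be approximated as follows. The bit-distance threshold and messages are invented for illustration, and this simplified version should not be read as the authors' exact method.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Bit-level Hamming distance between two equal-length payloads."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def payload_alerts(messages, max_dist):
    """Flag messages whose payload differs from the previous payload of the
    same AID by more than max_dist bits (threshold is illustrative)."""
    last = {}
    alerts = []
    for i, (aid, payload) in enumerate(messages):
        if aid in last and hamming(last[aid], payload) > max_dist:
            alerts.append(i)
        last[aid] = payload
    return alerts

msgs = [
    (0x1A0, bytes([0x10, 0x00])),
    (0x1A0, bytes([0x11, 0x00])),  # 1 bit changed: normal signal drift
    (0x1A0, bytes([0xEE, 0xFF])),  # many bits flipped: fuzzed payload
]
print(payload_alerts(msgs, max_dist=4))
```

Consistent with the finding reported above, a replayed sequence of previously recorded payloads would produce normal-looking distances and evade this check.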

As discussed in [48], the nature of the CAN in-vehicle network—characterized by limited computing resources, real-time response requirements, and a lack of sender and receiver identification—makes the design of CAN IDS different from those meant for computer networks. As such, a CAN IDS should ideally have low resource requirements, be able to detect and report attacks immediately, and be able to process the large number of CAN messages that are generated on the bus. In terms of attack detection accuracy, a CAN IDS should have a low false negative rate as well as a low false positive rate. Since CAN bus communications are used to control safety-critical subsystems in a vehicle, a low false negative rate is required to ensure that as many attacks as possible are detected. A low false positive rate is also required so that the CAN IDS does not raise an impractically large number of false alarms. ML-based IDS methods should also be designed to resist evasive, adversarial attacks.

We find a larger number of works that utilize Layer 2 features like AIDs and data fields for intrusion detection, as opposed to physical characteristics-based CAN IDS, which are still a burgeoning area of study. Al-Jarrah et al. [9] also note the same in their review of in-vehicle intrusion detection systems, where they examine surveyed works in terms of features and feature selection methods, datasets, performance metrics, benchmark models, and targeted attack types. They further find that most works focus on evaluating detection capability by reporting security-related metrics and do not report performance metrics like detection and training time. They also identify a lack of benchmark models and benchmark datasets for CAN IDS evaluation. Nappi [10] updates their survey by reviewing more recent works, finding that more CAN IDS works now contextualize their findings through comparison with some benchmark models. However, there still remains a lack of standard benchmark models, and works that report detection latency are still in the minority.

3.2 IDS benchmarking and comparison

A brief overview of prior work in CAN IDS evaluation frameworks has been provided in [2], whereby prior evaluation frameworks [49, 50] have been reviewed to highlight key features and distinguish the proposed framework. Wu et al. [3] provide a description of datasets and tools used for evaluations in prior work and compare selected CAN IDS in terms of CAN IDS type, false positive rates, contributions, and drawbacks. However, apart from these, there have not been any comprehensive surveys of benchmarking and comparison efforts for CAN IDS, to the best of our knowledge.

Similar reviews of comparative studies have been carried out for conventional computer network IDS, which mainly focus on the comparison of ML methods. Panigrahi et al. [12] note that central to the research into the usage of ML for intrusion detection is the selection of the most appropriate classifier for building IDS. Therefore, they provide a review of comparative studies that examine supervised learning methods for network intrusion detection, summarising the classification models evaluated, datasets used, evaluation metrics reported, and findings of the studies. An analysis of a total of 54 classifiers from six categories of classification models has also been presented, with 13 metrics related to detection capability reported. Similarly, Almomani et al. [51] also enumerate comparative studies of ML classifiers for network intrusion detection and present an evaluation of 10 supervised learning methods, reporting accuracy, precision, and F1-score. Finally, Kilincer et al. [13] conducted a survey of ML approaches to network intrusion detection by focusing on five datasets that are most commonly used for network IDS research. The authors note that network IDS studies are generally limited to a few datasets, examine only one or a few classification methods, and consider only a few attack types. To address these issues, classical ML models like SVM, kNN, and decision trees (DT) have been developed and evaluated using the five datasets as benchmark models. Results of experiments are compared against prior work utilizing the same datasets, thus contextualising past results.

Conventional computer network intrusion detection literature is surveyed by Milenkoski et al. [7] to examine methods employed in the evaluation of IDS, and an evaluation design space is proposed in an effort to categorize these common methods. A design space is described by Baum et al. [52] as a "multidimensional space of design choices," consisting of a set of relevant dimensions that can be used to classify and describe entities in a specific domain. The IDS evaluation design space proposed by Milenkoski et al. [7] consists of three elements: workload, metrics, and measurement methodology. Workloads for evaluating IDS are classified as benign, malicious, and mixed, based on the presence of attacks in the workload. The authors also distinguish between workloads in executable forms for live testing of CAN IDS and trace forms for later replay. Metrics are classified as security-related and performance-related metrics. The measurement methodology part of the design space identifies IDS properties that are of interest and the workload and metrics that are employed to evaluate these properties. This evaluation design space not only serves as a basis for categorising the literature, but is also aimed at facilitating the planning of IDS evaluation exercises.

The present work thus attempts to propose a design space for the context of CAN IDS evaluation, not only to categorize current comparative evaluation methods but also to provide a guideline for designing IDS evaluations for CAN, which differs from conventional computer networks in features and complexity.

4 CAN IDS evaluation design space

Similar to the guidelines for evaluating conventional IDS proposed in [7], we understand that planning a CAN IDS evaluation study should begin with identification of the goals of the study and the associated constraints. The goal of an evaluation study is usually the examination of the detection capability and/or the performance aspects of one or several CAN IDS. On the other hand, the main constraints in the evaluation of CAN IDS are the availability of resources such as vehicles, testbeds, and other tools, as well as the access to requisite data and information such as proprietary CAN database files that contain rules to decode CAN messages. Consideration of these goals and constraints should then inform the design of the evaluation study.

Our proposed CAN IDS evaluation design space is not only aimed at enumerating the various evaluation methods available but is also meant to provide a way to describe any CAN IDS evaluation study completely. To do so, we have divided the design space into five essential components, summarized in Figure 1: the IDS types being evaluated, the attacks considered, the evaluation type, the workload being used, and the evaluation metrics being reported. The choices that should be made for each component thus depend on the goals and constraints of the study, as well as on the choices made for related components. Choosing to report the accuracy, precision, and recall metrics to describe detection capability is an example of a goal influencing the evaluation metric choice; while choosing online tests on a lower-cost testbed instead of a real vehicle represents a choice of evaluation type resulting from a constraint. The decision to include only fabrication and suspension attacks for testing timing-based CAN IDS is a further example of the choice for one component (CAN IDS type) influencing that for another component (attack types).

The remainder of this section describes in further detail each component of the CAN IDS evaluation design space and how consideration of goals, constraints, and related components influences the choices for each component.

Figure 1: CAN IDS Evaluation Design Space

4.1 IDS type

As described in Section 3.1, a CAN IDS can be either signature-based, anomaly-based, or hybrid, depending on the detection technique used. Anomaly-based methods are further divided into statistical and ML-based methods. Depending on the features used and the OSI layer they operate on, CAN IDS can either be a data link layer or a physical layer IDS.

The classification of a particular CAN IDS informs the type of data or workload required for its evaluation—while a physical layer CAN IDS would require Layer 1 data, a data link layer CAN IDS would require logs containing CAN message frames. As a further example, a CAN IDS based on analysis of AID sequences would require only the AIDs of CAN broadcasts, while a CAN IDS that incorporates timing-based features would require high-precision message timestamps.
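For example, a data link layer IDS that needs only timestamps, AIDs, and payloads could extract its features from a textual CAN log. The sketch below assumes the candump-style line format used by the Linux can-utils tools; if the capture tool differs, the pattern would need to be adapted.

```python
import re

# Matches lines like: (1617000000.123456) can0 1A0#DEADBEEF
LINE = re.compile(
    r"\((?P<ts>\d+\.\d+)\)\s+\S+\s+(?P<aid>[0-9A-Fa-f]+)#(?P<data>[0-9A-Fa-f]*)"
)

def parse_line(line):
    """Return (timestamp, AID, payload bytes) from one candump-style log line."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return float(m["ts"]), int(m["aid"], 16), bytes.fromhex(m["data"])

ts, aid, payload = parse_line("(1617000000.123456) can0 1A0#DEADBEEF")
print(ts, hex(aid), payload.hex())
```

An AID-sequence IDS would keep only the second element of each tuple, whereas a timing-based IDS would additionally rely on the precision of the first.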

The detection technique employed by a given CAN IDS also informs the types of attacks that can be detected by the IDS and are relevant for inclusion in its assessment. This is because different attacks manifest as changes in different features of CAN bus traffic, and CAN IDS differ in the features used. Timing and frequency-based CAN IDS [8, 19, 22, 21, 20] can be good at detecting fabrication and suspension attacks, which alter the timing and frequencies of AIDs, but ineffective for masquerade attacks, which are observed as manipulated payloads. Masquerade attacks can instead be detected by CAN IDS that analyse the data fields of CAN messages, such as the Hamming distance CAN IDS [27] or ML-based CAN IDS [36]. Physical characteristics-based CAN IDS, which fingerprint ECUs using physical features that are difficult to spoof, are capable of identifying malicious nodes and can thus not only detect fabrication and suspension attacks but also masquerade attacks [15, 53].

4.2 Evaluation type

This part of the CAN IDS evaluation design space identifies two types of evaluations: offline and online evaluations. In an offline evaluation, a CAN IDS is used to analyse a CAN bus log or dataset that has already been collected from a real or simulated CAN bus. This is opposed to an online assessment, where a CAN IDS performs real-time analysis of CAN bus data from a real vehicular CAN bus, testbed, simulation, or data log replay. The physical characteristic-based CAN IDS proposed in [46, 15] are evaluated in CAN bus prototypes with nodes consisting of Arduino boards and CAN shields, and with a real vehicle whereby the IDS is implemented on a node connected to the CAN bus via the OBD-II port. Ujiie et al. [54] implement their rule-based CAN IDS on an ATMega162 microcontroller and test it against both a simulated CAN bus in Vector CANoe as well as a real vehicle. Unlike these works, Desta et al. [55] evaluate their CAN IDS, which uses an LSTM model for AID sequence prediction, by replaying a CAN bus data log and using the SocketCAN API to perform detection.

Using real vehicles is advantageous in that they most closely resemble the real-world environment in which an in-vehicle CAN IDS would operate. However, real vehicles are relatively difficult to use for CAN IDS assessment. Not only is it expensive to acquire and use a real vehicle for security testing, but mounting attacks like targeted ID, suspension, and masquerade on a real vehicular CAN bus can be difficult and time-consuming, and may pose a risk to passengers and bystanders [16, 56]. To address these problems, a testbed for online CAN IDS evaluation has been developed in [57], which uses the CARLA driving simulator in combination with the Vector CANoe CAN bus simulator to generate realistic driving scenarios. The assessment of a clustering-based ML CAN IDS demonstrated inferior detection performance in the online assessment using the testbed compared to an offline experiment with a dataset collected from the same testbed, which highlights the importance of online assessment despite its drawbacks.

On the other hand, offline assessments with collected CAN bus logs (as well as online tests by replaying logs) can be performed repeatedly with relative ease to obtain statistically significant evaluation results. A large number of works, such as [22, 36, 20, 27, 25, 44, 43], use collected CAN bus logs to assess their proposed CAN IDS. These datasets are commonly collected either via the OBD-II port available in all vehicles or by tapping into the in-vehicle CAN bus. Offline evaluation is a good starting point to understand the detection capability in terms of accuracy, false positive rates, etc. of a CAN IDS before using online evaluations to understand performance aspects of the CAN IDS such as detection times. Using publicly available datasets further enhances the reproducibility of CAN IDS works and allows direct comparison with results from other CAN IDS assessments performed with the same datasets.

4.3 Workload

Workload can be described as the work that must be performed by a system and can be viewed as input to the system [58]. In the context of a CAN IDS, the workload comes from CAN bus traffic or measurements of physical characteristics. While CAN IDS datasets are the most common source of evaluation data, in this paper we borrow the term "workload" from the wider conventional IDS literature to describe any input to a CAN IDS under test, regardless of its source: not only a dataset but also real-time CAN bus traffic or measurements of physical quantities from a vehicle, testbed, or simulation in an online test.

The workload used to evaluate an IDS may be generated in various ways [7]. For CAN IDS evaluation, we distinguish between benign, attack-free workloads and malicious workloads containing attacks. Benign CAN IDS workloads may come from a real vehicle (real) or be artificially generated (synthetic). Attack workloads can be obtained by conducting attacks on a real CAN bus (real attacks) or via simulation, either by manipulating a collected benign CAN trace to include attacks or by artificially generating malicious CAN workloads (simulated attacks). These workload types are outlined in Figure 2.

A number of publicly available CAN datasets have been published in recent years, which has made them a popular choice for the design and evaluation of CAN IDS [10]. The Hacking and Countermeasures Research Lab (HCRL) has published three CAN intrusion datasets [59, 60, 41], which contain benign CAN bus logs as well as logs of fabrication attacks such as DoS, fuzzing, and targeted ID conducted on a real in-vehicle CAN bus. These datasets have been used in [61, 17, 40, 31, 62, 63] among others. Verma et al. [16], who also provide a comprehensive survey of CAN intrusion datasets, present the Real ORNL Dynamometer (ROAD) dataset, consisting of real benign samples and samples of fuzzing, targeted ID (flam delivery), and masquerade attacks. While the fabrication attacks were conducted on a real vehicle CAN bus, the masquerade attack samples were created by manipulating the targeted ID logs. This dataset has been used in [64].

The dataset provided in [65] consists of logs collected from both a CAN bus prototype and real vehicles. Unlike the datasets mentioned thus far, the attack samples are simulated: the benign data from vehicles has been augmented to create attack datasets (with the exception of the attack samples collected from the prototype). This dataset has been used in works such as [66, 22]. In a similar vein, the CrySyS Lab has published real benign CAN bus logs along with a log infector tool that can be used to manipulate benign logs into masquerade attack samples. This tool was used to create the attack samples for evaluating the statistical CAN IDS in [67]. A completely synthetic dataset, SynCAN, with simulated targeted ID, suspension, and masquerade attacks, is published by the authors of [34], who use it to evaluate their LSTM autoencoder-based CAN IDS. It has also been used by the autoencoder-based CAN IDS in [37, 38]. Apart from ROAD, SynCAN is the only dataset providing translated signal values instead of raw CAN data fields.
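In the spirit of such log-manipulation tools, a toy sketch of simulating a masquerade attack from a benign log might look like the following; the AIDs, payloads, and attack interval are invented for illustration. Because only payloads are overwritten, message timing and AIDs remain untouched, which is what makes such samples invisible to timing-based detectors.

```python
def inject_masquerade(log, target_aid, attack_payload, start, end):
    """Overwrite payloads of an existing AID within [start, end) seconds.

    log -- list of (timestamp, aid, payload_bytes); timestamps and AIDs are
    left untouched, so frequency-based detectors see nothing unusual.
    Returns the manipulated log plus ground-truth labels for evaluation.
    """
    out, labels = [], []
    for ts, aid, payload in log:
        if aid == target_aid and start <= ts < end:
            out.append((ts, aid, attack_payload))
            labels.append(1)   # malicious: payload replaced
        else:
            out.append((ts, aid, payload))
            labels.append(0)   # benign
    return out, labels

# Hypothetical benign log: two AIDs alternating every 10 ms for 10 s.
benign_log = [(t * 0.01, 0x1A0 if t % 2 else 0x0C0, bytes([t % 256] * 8))
              for t in range(1000)]
attacked, y = inject_masquerade(benign_log, 0x1A0, b"\xff" * 8, 2.0, 5.0)
```

The returned label vector gives the ground truth needed to compute detection metrics against the simulated attack sample.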

Physical fingerprinting-based CAN IDS, unlike data link layer CAN IDS, cannot be evaluated using the aforementioned datasets. Towards this end, Foruhandeh et al. [68] have published a dataset consisting of voltage measurements from a real vehicle along with their Single-frame based Physical-Layer (SIMPLE) identification solution that can detect intrusions and identify the sending ECU for each message. This dataset is also used in [69] for the detection of a hill-climbing style masquerade attack whereby an attacker attempts to evade detection and alter ECU fingerprints to do so. Popa et al. [70] have also made available clock skew and voltage data from a total of 54 ECUs across 10 vehicles, which can be used for the development of physical layer CAN IDS.

Figure 2: CAN IDS Workload Types

4.4 Attack model

The attack model considered in this design space is the same as the one described in Section 2.2.

As discussed in Section 4.1, the attack types that can be detected by a CAN IDS depend on the features and techniques it uses. However, this is not the only consideration when selecting attack types for CAN IDS assessment. Fabrication attacks are less complex than suspension and masquerade attacks and form the most common class of attacks in the CAN IDS literature [16], which is reflected in the fact that real attack datasets are available only for these attack types [16, 60, 41, 59]. In contrast, while CAN intrusion datasets with suspension and masquerade attacks are available, these contain simulated attack samples created either from a purely synthetic benign dataset [34] or from a real benign dataset [65]. The advantage of real attacks over simulated ones is that in the former case the attacks are known to have affected the operation of the vehicle, i.e., their effects are physically verified. For example, while creating the ROAD dataset, the authors of [16] observed abnormal behaviour such as accelerator pedals becoming ineffective, false speedometer readings, and incorrect reverse light status. With simulated attacks, not only is it impossible to verify their physical effects, but the simulation method (such as manipulating benign CAN bus logs) may produce unrealistic attack samples.

Another important consideration in selecting attack datasets for evaluation is the "difficulty" of the detection problem captured in the dataset [71]. Verma et al. [16] find that commonly used datasets such as [59] and [41] consist of unstealthy attacks with high-frequency injection of malicious messages; such attacks can be detected by trivial timing-based detectors, making these datasets less suitable for assessing more sophisticated CAN IDS that should also be able to detect low-rate fabrication attacks [72, 16].

Since the usage of real test vehicles is outside the reach of many researchers, there is a need for comprehensive CAN intrusion datasets consisting of real attack traces of all known attack types, ranging from simple, easily detected attacks to complex, stealthy attacks, for robust CAN IDS evaluation and benchmarking. This has become necessary for physical layer CAN IDS as well, most of which continue to be evaluated with real vehicles and testbeds.

4.5 Evaluation metrics

As per [7], the metrics selected and reported while evaluating an IDS should depend on the properties of the IDS being assessed. Metrics can be security-related metrics, which quantify attack detection capability, or performance-based metrics, which quantify non-functional aspects of an IDS such as resource consumption. The authors of [7] also distinguish between basic security metrics, which quantify individual attack detection properties, and composite security metrics, which combine basic metrics.

Security-related metrics allow assessment of attack detection properties such as detection accuracy and resistance to evasive attacks. These include classification accuracy, precision, recall, F1-score, and false positive rate, which have been reported in [28, 20, 8, 55, 38, 43], among others. Basic security metrics such as true positive rate, false positive rate, false negative rate, true negative rate, precision, and recall should be reported and analysed together to understand the performance of a given CAN IDS [7]. Some works report the receiver operating characteristic (ROC) curve and area under the curve (AUC) [21, 36], which are considered composite security metrics, to indicate the detection performance of an IDS at multiple operating points.

An important problem to consider when using CAN datasets for assessment is their class imbalance: malicious messages usually make up only a very small percentage of the total captured CAN bus traffic. This is further reason not to rely on a single metric like accuracy, which would yield high values for a detector that only predicts the normal class on a highly imbalanced dataset, and instead to use a suite of security metrics to understand the ability of the IDS to distinguish between normal and attack traffic. While some may counter the imbalanced dataset problem by reporting balanced accuracy, Chicco et al. [73] recommend the Matthews Correlation Coefficient (MCC), a single metric that summarises the performance of a binary detector. The MCC is especially suited to CAN intrusion detection since it is equally important for a CAN IDS to correctly classify both normal and attack traffic, i.e., to keep both false positive and false negative rates low. The MCC is reported alongside other security metrics in [44].
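A small worked example, assuming a hypothetical capture in which 1% of frames are malicious, illustrates why accuracy alone is misleading while the MCC is not:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Hypothetical imbalanced CAN capture: 99,000 benign vs 1,000 attack frames.
# A detector that labels everything "normal" still scores 99% accuracy...
tp, tn, fp, fn = 0, 99_000, 0, 1_000
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(f"accuracy={accuracy:.3f}  MCC={mcc(tp, tn, fp, fn):.3f}")
# ...while the MCC collapses to 0, exposing that no attacks were detected.
```

A perfect detector (all attacks caught, no false alarms) yields an MCC of 1.0, so the metric spans the full range of detector quality even on heavily skewed data.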

Considering the safety-critical nature and real-time requirements of the in-vehicle network, it may be argued that attack detection latency is an important metric that should be considered in assessing the attack detection capabilities of a CAN IDS [74, 10, 9]. Detection latency, or Time To Detection (TTD), has been defined as the time taken to classify a CAN message from the time it was received [10] and has been reported in [75]. Nichelini et al. [44] report Testing Time per Packet (TTP) as the ratio of the total detection time and the number of messages in their test dataset, which gives the average time taken by the CAN IDS to evaluate a single CAN message. Unlike these works, Sunny et al. [76] report best case and worst case computation time to examine if their proposed CAN IDS can be used for real-time evaluation of CAN messages, which can be published every 2 ms.
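A rough sketch of how TTP might be measured in an offline experiment is given below; the detector is a trivial stand-in for illustration, not any of the cited IDS:

```python
import time

def measure_ttp(detector, messages):
    """Testing Time per Packet: total detection time / number of messages."""
    t0 = time.perf_counter()
    for msg in messages:
        detector(msg)
    elapsed = time.perf_counter() - t0
    return elapsed / len(messages)

# Toy stand-in detector: flags any frame whose payload is all zeros.
detector = lambda payload: payload == b"\x00" * 8
ttp = measure_ttp(detector, [bytes([i % 256] * 8) for i in range(100_000)])
# On a busy bus a frame may arrive roughly every 2 ms, so the measured
# TTP must stay well below that for real-time operation.
print(f"TTP = {ttp * 1e6:.2f} us/message")
```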

Apart from detection latency, non-functional properties of a CAN IDS like resource consumption, performance overhead, and workload processing capacity are also of interest, particularly since CAN IDS are expected to be deployed in the resource-constrained environment of the in-vehicle network. A CAN IDS that is highly accurate in detecting attacks may still be impractical to implement in the in-vehicle network if it cannot process rapidly generated CAN bus traffic in time or requires significant computing resources to maintain quick response times. An analysis of computational complexity and memory requirements has been provided for the Hamming distance-based CAN IDS in [27]. Unlike works limited to offline assessment with datasets, the memory footprint in kilobytes and inference time (similar to TTD) of the autoencoder-based detector in [38] were measured by implementing the IDS on an automotive-grade microcontroller.

5 Survey of benchmark frameworks and comparative studies of CAN IDS

This section provides an overview of the evaluation frameworks and comparative studies of CAN IDS detailed in the literature. The papers included in this study were published between 2017 and September 2022 and were selected by searching Google Scholar, IEEE Xplore, and the ACM Digital Library using the keywords "controller area network intrusion detection system benchmark", "controller area network intrusion detection system evaluation", "controller area network intrusion detection system testbed", and "controller area network intrusion detection system comparative". Papers were included in or excluded from this study on the basis of their abstracts.

The surveyed works differ in their scope in terms of the types of IDS evaluated, the attack types tested, and the metrics reported. Since the works mostly restrict themselves to particular types of CAN IDS, they have been categorized as those that evaluate statistical CAN IDS (listed in Table 2) and those that evaluate ML-based CAN IDS (listed in Table 3). The attack types that have been used for evaluation by the surveyed works are summarized in Table LABEL:table-attacks, while the reported metrics are provided in Table LABEL:table-metrics.

Table 2: Statistical CAN IDS Evaluated in Surveyed CAN IDS Benchmark Frameworks and Comparative Studies
Work IDS Evaluated Key Features Used
Timestamp AID Payload
Ji et al. (2018) [6] Entropy-based [24]
Clock skew-based [15]
ID sequence algorithm [23]
Frequency-based [77]
Dupont et al. (2019) [49] Diagnostic messages detection [78]
Pattern matchinga [54, 79]
Time interval-based [80]
Frequency-based [77]
Time interval-based [19]
Time interval-based [20]
Entropy-based (for CAN message windows) [25, 24]
Entropy-based (for flows of individual AIDs) [25, 24]
Stabili et al. (2021) [81] ID sequences algorithm [23]
Entropy-based algorithm [25]
Hamming distance [27]
Missing message algorithm [82]
Agbaje et al. (2022) [2] Time interval-based [19]
Frequency-based [8]
CUSUM [21]
Entropy-based [24]
Graph-based [17]
ID sequences algorithm [23]
Hamming distance [27]
Neural networkb [61]
Blevins et al. (2021) [83] Mean Inter-message time
Binning
Fitting a Gaussian curve
Kernel Density Estimation
Stachowski et al. (2019) [84]c Anomaly-based IDS
aSignature-based CAN IDS
bML-based CAN IDS
cDetails of CAN IDS evaluated are not disclosed
Table 3: ML-Based CAN IDS Evaluated in Surveyed CAN IDS Benchmark Frameworks and Comparative Studies
Work IDS Evaluated Key Features Used
Timestamp AID Payload
Taylor et al. (2018) [85] Long Short-Term Memory network
Gated Recurrent Unit network
Markov chains
Berger et al. (2019) [86] One-Class Support Vector Machine
Support Vector Machine
Neural network
Long Short-Term Memory network
Moulahi et al. (2021) [87]a Support Vector Machine
Decision Trees
Random Forest
Multi-layer perceptron
Costa Cañones (2021) [50] Isolation Forest
One-Class Support Vector Machine
Autoencoder with neural network
Autoencoder with LSTM
Autoencoder with GRU
CANnolo [26]
Swessi and Idoudi (2021) [88] Decision Trees
Random Forest
Bagging Tree
Extra Trees
Gradient Boosting
Adaptive Boosting
Voting
Stacking
eXtreme Gradient Boosting
Light Gradient Boosting
Category Gradient Boosting
Anyanwu et al. (2021) [89] Tree
Support Vector Machine
Ensemble Learning
Discriminant models
Nearest Neighbour
Logistic Regression
Okokpujie et al. (2021) [90]a Feedforward neural network
Support Vector Machine
aThese studies model intrusion detection as a multi-class classification problem where the attack type is predicted

5.1 Statistical intrusion detection methods

Ji et al. [6] present one of the first comparative studies of lightweight, statistical CAN intrusion detection methods from the literature. Four statistical intrusion detection methods are evaluated, based on analysing information entropy, clock skew, ID sequences, and CAN bus throughput, respectively. These methods use only the AID and arrival timestamp as features. Four simulated attack datasets have been used for evaluation, which, like many other attack datasets, are highly imbalanced. The clock skew approach was found to be the best overall across the tested attack scenarios, with near-perfect true positive rates (TPR) and near-zero false positive rates (FPR). Among the other methods, the throughput approach was best at detecting flooding, while the ID sequences method was good at detecting the injection of forged messages. The entropy approach, however, did not yield good results for replay attacks, which do not significantly change the entropy of the CAN bus stream.
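The entropy method's blind spot for replay attacks can be illustrated with a small sketch (the AIDs and window contents are hypothetical): flooding with a single AID collapses the Shannon entropy of the AID distribution, while replaying legitimate frames barely moves it.

```python
import math
from collections import Counter

def window_entropy(aids):
    """Shannon entropy (bits) of the AID distribution in one traffic window."""
    counts = Counter(aids)
    n = len(aids)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

normal = [0x100, 0x200, 0x300, 0x400] * 25   # balanced benign mix: entropy 2.0
flood = normal + [0x000] * 400               # DoS: one AID dominates the window
replay = normal + [0x100] * 4                # a few replayed legitimate frames

print(f"benign={window_entropy(normal):.2f}  "
      f"flood={window_entropy(flood):.2f}  "
      f"replay={window_entropy(replay):.2f}")
# Flooding drops the entropy sharply; the replay window stays ~2.0 bits,
# indistinguishable from benign traffic by this detector.
```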

Dupont et al. [49] present a unified framework for CAN IDS evaluation as well as a publicly available dataset created for the same. This includes benign data collected from two live vehicles and a CAN bus prototype. Various fabrication attacks as well as a suspension attack are simulated on the benign dataset to create the attack dataset (except for one real targeted ID attack on the prototype). Two other publicly available datasets consisting of real attacks have also been used. It was found that most of the methods are only able to detect attacks that cause drastic changes, as in the case with flooding. The methods are described as relying on narrow indicators of compromise and producing too many false positives. The authors suggest that content-aware methods that take into account not just the bit representation but also the semantics of CAN messages would yield better results.

The benchmark framework by Stabili et al. [81] allows the evaluation of four IDS algorithms from the literature against a threat model consisting of three attack types: replay, fuzzing, and disruption. The attack datasets used for testing, representing seven attack scenarios, have been created by simulating attacks on a real benign dataset collected from a vehicle. The four algorithms chosen use different features of CAN bus traffic for anomaly detection: entropy, AID sequences, payloads, and timing. It was found that while the message sequence algorithm showed efficacy in all attack scenarios, the other IDS were effective only for certain attacks.

Agbaje et al. [2] identify inconsistencies in three aspects that hinder comparative evaluation of CAN IDS: disparate training datasets, disparate evaluation datasets, and disparate evaluation metrics. To address these inconsistencies, a flexible evaluation framework is provided to enable consistent and repeatable evaluation of CAN IDS. This work differs from previous art in that it allows the addition of new datasets and algorithms alongside the ones that have been compared in this work. A variety of intrusion detection methods have been evaluated that differ in the features and techniques used. It is observed that different methods have their own strengths and weaknesses and are not equally capable of detecting all attacks. The authors conclude that more generalised methods that take into account the interrelationships between messages, such as graph- and ML-based methods, could be better suited to detect a large variety of attacks.

The study by Blevins et al. [83] uniquely focuses on benchmarking only timing-based intrusion detection methods against the ROAD dataset [16], which consists of real attacks with verified physical effects. Alongside fuzzing attacks, the dataset also includes targeted ID attacks using flam delivery, which makes them stealthy attacks with a minimal number of injected messages. Four statistical methods utilising the timing of CAN messages have been evaluated against fuzzing attack and targeted ID attack logs. The binning and mean inter-message time methods performed well, both with and without outliers, as opposed to the methods relying on fitting a distribution curve to the data. Binning was the best detector in terms of both the area under the PR-curve (AUC) and the F1-score, and was implemented in an OBD-II plug-in prototype. While the experiments reported here are offline experiments, a detection latency analysis is provided for the binning algorithm.
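As a sketch of the mean inter-message time idea (the threshold, periods, and AID behaviour below are hypothetical, not the parameters used in [83]): message injection interleaves extra frames with the legitimate periodic ones, shrinking the average gap for the targeted AID.

```python
def inter_message_alert(timestamps, expected_gap, k=0.25):
    """Flag one AID's traffic window if its mean inter-message time shrinks.

    timestamps   -- sorted arrival times of a single AID within the window
    expected_gap -- mean inter-message time learned from benign traffic
    k            -- relative shrinkage tolerated before raising an alert
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_gap = sum(gaps) / len(gaps)
    # Injected frames interleave with legitimate ones, shrinking the mean gap.
    return mean_gap < (1 - k) * expected_gap

benign_ts = [t * 0.10 for t in range(20)]                   # 100 ms period
injected = sorted(benign_ts + [t * 0.10 + 0.05 for t in range(20)])
print(inter_message_alert(benign_ts, 0.10))   # benign window: no alert
print(inter_message_alert(injected, 0.10))    # injection roughly halves the gap
```

A flam-style targeted ID attack injects only one frame per legitimate frame, so window length and threshold choice determine whether such a detector catches it.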

Stachowski et al. [84] provide an assessment methodology for evaluating and comparing intrusion detection products designed for the automotive CAN bus. This evaluation methodology differs from the rest of the surveyed works in that it involves online evaluation of CAN IDS—all the CAN IDS being evaluated are integrated into test vehicles for real-time intrusion detection. Three anomaly-based CAN IDS products have been evaluated to demonstrate this methodology, but further details on the vendors or intrusion detection methods have not been provided. While the methodology includes qualitative and quantitative metrics, the assessment carried out used only quantitative metrics related to attack detection accuracy. A large number of targeted ID attacks were carried out on the CAN bus of test vehicles while the vehicles were stationary and in motion. The IDS were also tested in driving scenarios with no attacks being carried out. Furthermore, evaluations were carried out in two phases, with IDS vendors given the opportunity to fine-tune their IDS for the second phase. As a result, the evaluated CAN IDS were generally found to perform better in the second phase, with higher true positive rates (TPR) and lower false positive rates (FPR). However, none of the IDS evaluated were found to be effective in detecting all attacks.

5.2 ML-based intrusion detection methods

Taylor et al. [85] develop IDS based on two types of recurrent neural networks (RNN)—long short-term memory (LSTM) and gated recurrent units (GRU)—and compare them against Markov models. To facilitate IDS evaluation, a comprehensive attack framework is proposed with parametrised attack descriptions that can be used to generate realistic, representative attack simulations. This attack framework has been used to create attack datasets for 537 test cases with different types of fuzzing and targeted ID attacks. It is found that the RNN models generally perform very well with high AUC measures, with the large LSTM model deemed the best. On the other hand, Markov models were not much better than chance at anomaly detection. The effect of changing attack parameters has also been reported, whereby the performance of the LSTM detection model is found to vary depending on the variability of the CAN signals, AIDs, and attack types.

Berger et al. [86] examine the performance of four supervised and unsupervised learning methods against the same datasets containing DoS, fuzzing, and targeted ID attacks. Experiments were performed by varying the number of training samples (for OCSVM and SVM) and the number of neurons (for the neural network and LSTM). The OCSVM demonstrated a strong bias towards predicting the normal class. The SVM and neural network, on the other hand, achieved high accuracies but require attack samples for training, which is a disadvantage. Although the LSTM demonstrated diminished performance and is the most computationally intensive method tested, the authors conclude that it can be improved and is a viable method of intrusion detection.

The comparative study of Moulahi et al. [87] differs from the works discussed thus far in that the IDS models implemented are multi-class classification models with three attack classes. Four traditional classification algorithms have been evaluated with a feature set that includes not only the AID and payload data of each CAN message but also the AIDs of the previous three messages. While the remote impersonation attack is detected with near-perfect metrics, the performance of the evaluated models suffers for the fuzzing and DoS attacks, owing to the smaller number of examples of these attack types in the dataset. The findings of this work have also been put into context by comparison with previous work using the same dataset. Overall, the random forest (RF) model is found to be the best in terms of both detection accuracy and training and testing times.

The benchmark framework provided by Costa Cañones [50] evaluates Isolation Forest (IF), OCSVM, and autoencoder intrusion detection models against various simulated attack datasets. Apart from three basic attack scenarios, this work uniquely presents two sophisticated adversarial attack scenarios to examine IDS evasion. The IF and OCSVM were shown to have inferior performance compared to the autoencoder detectors, particularly in terms of accuracy and recall. Furthermore, the autoencoder detectors that also took time sequences into consideration were shown to be effective for detecting attacks with valid payloads, and were more resilient against adversarial attacks designed to evade them.

Like [83], Swessi and Idoudi [88] study a particular class of intrusion detection models, ensemble learning, for the detection of fuzzing attacks. Fuzzing attacks can be difficult to detect, especially when the injected messages use legitimate AIDs. This extensive study compares 11 ensemble algorithms against three real fuzzing attack datasets. Ensemble learning methods were generally found to be very effective, with very high accuracies, but bagging, eXtreme Gradient Boosting (XGB), light gradient boosting (LGB), and category gradient boosting (CGB) were the best in terms of both detection rates and training and testing times.

Anyanwu et al. [89] conducted a comparative study of 22 different ML models across six types to determine the best algorithm for CAN intrusion detection. Although the attack model and attacks tested are not described in this work, the documentation of the datasets used for evaluation describes them as containing fuzzing and DoS attacks. Decision tree, KNN, SVM, and ensemble models achieved accuracies of 100%. Training time and the number of misclassifications were used to differentiate among the models with perfect accuracies; the 'Fine' decision tree classifier was deemed the best considering all these factors.

Okokpujie et al. [90] also conducted a comparative study between feedforward neural network (FNN) and SVM models against four different real attack datasets. The SVM models evaluated include those with linear, polynomial, radial basis, and sigmoid kernels. As in [87], intrusion detection has been modelled as a multi-class classification problem that includes attack type classification. The authors note that accuracy by itself is not a sufficient measure of detection performance, particularly with unbalanced attack datasets, and therefore also report precision, recall, and F1-score. Considering all reported metrics, the SVM with the radial basis kernel was found to be the best, while the FNN model was unable to detect some of the attack types at all.

6 Discussion

In this section, the reviewed benchmarking frameworks and comparative studies are further categorised and discussed in terms of the proposed design space to understand current efforts as well as opportunities for future work.

6.1 IDS type

Regarding the type of CAN IDS, anomaly-based intrusion detection methods are clearly seen as the way forward for CAN intrusion detection: anomaly-based methods are the only type of IDS found to have been benchmarked and compared in the surveyed works. In the wider CAN IDS literature, anomaly-based IDS are indeed the most common type proposed [4, 5]. Signature-based methods are uncommon since complete CAN attack signature databases do not exist and are difficult to create, as the implementation of CAN messaging differs among vehicles of different makes and models [4].

Among the anomaly-based methods, each of the surveyed studies restricts itself to one of two types of IDS: statistical methods or ML-based methods. The exception is the evaluation framework in [2], where a neural network IDS is included alongside the other statistical methods. Papers benchmarking statistical methods [2, 81, 49, 6] include a variety of IDS that use different features and techniques for attack detection, including the timing and frequencies of AIDs, AID sequences, Hamming distances between payloads, and the entropy of CAN messages. Meanwhile, studies of ML-based methods have applied algorithms such as SVM, decision trees, Isolation Forest, and ensemble algorithms, as well as various types of neural networks.

Since intrusion detection methods are not equally effective at detecting all attack types, some studies focus on particular intrusion detection techniques or attack types. Blevins et al. [83] benchmark only timing-based intrusion detection methods, which are computationally inexpensive but are effective only against attacks that alter the timing of CAN messages on the CAN bus. This is why only fuzzing and targeted ID attacks are used for evaluation and not masquerade attacks, which do not impact message timing. Another work [88] examines the detection of fuzzing attacks in particular, which can be difficult to detect, using ensemble learning algorithms that use timestamps, AIDs, and payload data as features.

It is also observed that while most of the surveyed works treat anomaly detection as a binary classification problem, two of these—[87] and [90]—model intrusion detection as a multi-class classification problem aiming to classify attack data into attack types. Attack classification can indeed be a useful addition to an IDS since the type of attack can inform attack mitigation responses.

Finally, we find no benchmarking or comparative evaluation study that includes physical characteristics-based CAN IDS. Review of physical layer CAN IDS reveals that they are commonly evaluated in online tests with testbeds and real vehicles. These evaluation methods are not common in the surveyed works, which take advantage of publicly available datasets. This indicates a need for comparative evaluation frameworks for online testing as well as datasets (such as [68, 70]), which would encourage similar benchmarking studies of physical layer CAN IDS.

6.2 Evaluation type

If we distinguish between offline and online evaluations, almost all of the works surveyed have performed benchmarking with offline evaluations, whereby the assessed CAN IDS analyse collected CAN bus datasets. This can be explained by the proliferation of publicly available datasets covering common attack types in the literature, such as fabrication attacks. With comprehensive documentation, offline experiments using datasets can be replicated easily. Offline assessments can also be conveniently used to evaluate multiple IDS on an equal footing against the same dataset under equivalent test conditions.

On the other hand, the assessment methodology in [84] is the only one among the surveyed works to include online evaluations of CAN IDS, whereby IDS products are integrated into test vehicles and evaluated in different scenarios. As noted in Section 4.2, online tests with a real vehicle are relatively difficult to perform due to costs and safety risks. Notably, we did not find any studies that utilise testbeds (such as that proposed in [57]) or simulations (such as Vector CANoe, as used in [54]) for the purpose of benchmarking, both of which provide a more realistic environment for assessing detection capability and performance without the disadvantages of using real vehicles. This indicates a need for mature frameworks for real-time assessment and benchmarking of CAN IDS that can be implemented without the costs and risks associated with test vehicles while still allowing repeatable assessments under identical experimental conditions.

6.3 Workload

As mentioned in the previous subsection, the most common method used for comparative evaluation of CAN IDS is the use of datasets for offline experiments. The popularity of real CAN bus datasets in the broader CAN literature is reflected among the surveyed benchmarking frameworks and comparative studies, wherein all the datasets used are derived from the CAN bus of real vehicles. However, they differ in whether the attacks themselves are real or simulated. Half of the works surveyed use publicly available datasets from HCRL [59, 41, 60, 94], which consist of real attack traces. Another dataset including real attacks with physically verified effects is the ROAD dataset [16], which was more recently published and has been used in only two of the surveyed works [83, 88]. Other than these, the attack datasets that are used have been created by modifying logs of real CAN bus traffic to simulate attacks on the CAN bus. One of the earliest works [6] performed its comparative evaluation with an unpublished dataset in which benign data collected from a real car is replayed in CAN bus simulation software and attack scenarios are conducted to generate attack samples. The study by Taylor et al. [85] is unique in proposing a method for generating attack samples using a parametrized attack framework. While it may be argued that approaches such as these [6, 85] may not produce realistic attack samples, they provide a customisable way of generating attack samples, enabling assessment against a variety of attack scenarios: from high-frequency message injection attacks to low-rate injection attacks and masquerade attacks.
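The log-modification approach can be sketched in a few lines. The following is an illustrative example only, not the actual tooling of [6] or [85]; the frame representation and parameter names are our own assumptions:

```python
# Illustrative sketch of simulating a message-injection (fabrication)
# attack by modifying a benign CAN log. The tuple layout
# (timestamp, arbitration ID, payload, label) is an assumption, not
# the format used in any of the surveyed works.

def inject_attack(benign_log, target_id, payload, start, end, interval):
    """Insert attack frames with a chosen ID at a fixed interval.

    A small interval models a high-frequency flooding attack; a large
    interval models a low-rate targeted injection.
    """
    n = int(round((end - start) / interval))
    attack_frames = [(start + k * interval, target_id, payload, 1)  # 1 = attack
                     for k in range(n)]
    # Merge injected frames with benign traffic in timestamp order,
    # as they would appear interleaved on the shared bus.
    return sorted(benign_log + attack_frames, key=lambda frame: frame[0])

# Benign traffic: ID 0x130 every 20 ms, labelled 0 (normal).
benign = [(i * 0.02, 0x130, b"\x00" * 8, 0) for i in range(50)]
# Flooding: inject ID 0x000 frames every 1 ms between t=0.2 s and t=0.3 s.
trace = inject_attack(benign, 0x000, b"\xff" * 8, 0.2, 0.3, 0.001)
```

Varying `interval`, `target_id`, and the attack window is what makes such parametrised generation attractive: the same benign log yields many attack scenarios of differing difficulty.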

Unlike the remaining studies, [84] performs online evaluations on real test vehicles, whose CAN buses are subjected to different types of targeted ID attacks generated using attack scripts.

6.4 Attack model

Among all the works surveyed, fabrication attacks are the most commonly tested attack type, because fabrication attacks comprise the majority of attacks found in the CAN IDS literature and several CAN datasets provide real CAN bus logs with real attack samples [16]. On the other hand, few works evaluate intrusion detection methods against suspension and masquerade attacks. Only one study [49] performed IDS evaluations against all attack types in our attack model, but with simulated attack datasets. This highlights the need for realistic samples of these attack types.
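In the absence of real traces, suspension and masquerade attacks are typically simulated by editing a logged trace: removing a target ID's frames (suspension), or overwriting their payloads in place so that timing remains legitimate (masquerade). A minimal sketch, with a frame layout of our own choosing:

```python
# Illustrative simulation of suspension and masquerade attacks on a
# logged CAN trace, represented as (timestamp, can_id, payload) tuples.
# This mirrors the common log-modification approach; names are ours.

def suspend(log, target_id, start, end):
    """Suspension: the targeted ECU stops transmitting, so its frames
    simply disappear from the trace during the attack window."""
    return [f for f in log
            if not (f[1] == target_id and start <= f[0] < end)]

def masquerade(log, target_id, forged_payload, start, end):
    """Masquerade: legitimate frames are replaced in place, so message
    timing stays normal and only the payload changes."""
    return [(t, cid, forged_payload) if cid == target_id and start <= t < end
            else (t, cid, p)
            for (t, cid, p) in log]

# A single-ID trace: ID 0x1A0 every 10 ms with a counting payload.
log = [(i * 0.01, 0x1A0, bytes([i % 256])) for i in range(100)]
suspended = suspend(log, 0x1A0, 0.3, 0.5)                 # frames missing
masqueraded = masquerade(log, 0x1A0, b"\xff", 0.3, 0.5)   # payloads forged
```

The sketch also illustrates why these attacks are harder to detect than fabrication: the masqueraded trace has exactly the same IDs and inter-arrival times as the benign one, so frequency-based detectors see nothing amiss.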

The benefit of using real attacks over simulated attacks is realised in the online assessments in [84], where the physical effects of attacks like doors locking could be observed and verified and the evaluated CAN IDS could be confirmed to be effective in realistic attack scenarios. This is also the case for studies that use datasets like [16, 41] that were collected from a real vehicle undergoing cyberattacks. However, real attack datasets that have been used in the surveyed works have their limitations. For instance, the dataset [41] used in [2, 49] does not provide a difficult detection challenge since it consists of fabrication attacks that can be easily detected by trivial IDS methods and may not sufficiently test IDS methods that are capable of detecting more subtle attacks [16].

Apart from the attack classes identified in Section 2.2, ML-based CAN IDS have also been demonstrated to be vulnerable against adversarial attacks [95] that are designed to evade detection. The benchmark framework in [50] also evaluates the IDS under test against adversarial attack samples generated using Generative Adversarial Networks (GAN) and heuristics.

6.5 Evaluation metrics

From the evaluation metrics described in Section 4.5, we find that the surveyed studies all focus on examining attack detection accuracy by presenting security-related metrics almost exclusively. These metrics include classification accuracy, precision, recall, false positive rate (FPR), and the receiver operating characteristic (ROC) curve. To avoid the pitfalls associated with relying on only a single metric like classification accuracy [96], the majority of the works report multiple metrics to provide a complete picture of attack detection capability. The study in [88] also reports balanced accuracy in addition to other metrics to account for class imbalance in attack datasets. Furthermore, as noted in Section 6.4, resistance to evasive attacks is evaluated in the benchmark framework provided in [50].
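For reference, these security-related metrics all derive from the binary confusion matrix, treating attack frames as the positive class; a minimal sketch:

```python
# Security-related metrics computed from a binary confusion matrix
# (attack = positive class). The example counts are hypothetical.

def metrics(tp, fp, tn, fn):
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)          # true positive rate / detection rate
    fpr       = fp / (fp + tn)          # false positive rate
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, fpr, f1

# E.g. 90 of 100 attack frames detected, 50 false alarms on 900 benign frames:
acc, prec, rec, fpr, f1 = metrics(tp=90, fp=50, tn=850, fn=10)
```

Reporting several of these together, as most surveyed works do, guards against a single metric masking poor behaviour on one class.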

In contrast, performance metrics are not widely reported in the surveyed studies. Detection latency goes largely unmeasured: only one work [83] provides an analysis of it, in terms of the computational time of the detection algorithms. Detection speed and latency to filter benign messages are included in the suite of metrics in the online assessment by Stachowski et al. [84], but they are ultimately not recorded and reported. Another study [88] reports testing execution time in addition to training time as an indicator of performance. This trend can be explained by the fact that, unlike the security-related metrics that can be obtained in offline experiments with datasets, obtaining precise measurements of detection latency requires some form of online testing, such as simulations or testbeds. This further highlights the necessity of online testing for benchmarking CAN IDS. We also find limited examination of non-functional properties such as resource consumption, performance overhead, and workload processing capacity.
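Offline experiments can at best approximate detection latency by timing the detection routine on each frame; the end-to-end latency from a malicious frame appearing on the bus to an alert being raised requires online testing. An illustrative sketch, using a hypothetical stand-in detector:

```python
# Approximating detection latency offline by timing a detection
# function per frame. The whitelist detector below is a stand-in of
# our own, not a method from any surveyed work.
import time

def timed_detection(detector, frames):
    """Return per-frame processing times (seconds) for a detector."""
    latencies = []
    for frame in frames:
        t0 = time.perf_counter()
        detector(frame)
        latencies.append(time.perf_counter() - t0)
    return latencies

# Stand-in detector: flag any CAN ID outside a known-ID whitelist.
known_ids = {0x130, 0x1A0, 0x2F0}
detector = lambda frame: frame[0] not in known_ids

frames = [(0x130, b"\x00" * 8)] * 1000 + [(0x7FF, b"\xff" * 8)]
lat = timed_detection(detector, frames)
avg_us = sum(lat) / len(lat) * 1e6  # mean processing time in microseconds
```

Such measurements capture only algorithmic cost on the test machine; queuing, bus load, and the target ECU's hardware, which dominate latency in deployment, are precisely what testbeds and simulations would add.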

7 Recommendations for future work

In compiling our evaluation design space and conducting the survey of benchmarking and comparative evaluation studies, we found several concerns that lead to opportunities for future work. First, we observed a lack of studies that incorporate CAN IDS from different categories. Only [2] and [65] include an ML-based and a signature-based CAN IDS, respectively, in their studies of otherwise statistical CAN IDS. We were also unable to find studies that benchmark physical layer CAN IDS along with data link layer CAN IDS, or indeed, a study with only physical layer CAN IDS. Secondly, offline evaluations with CAN datasets were found to be the most prevalent methodology employed for benchmarking, with only one study [84] performing CAN IDS evaluations on real vehicular networks. However, this study focuses on presenting an assessment framework and does not disclose the mechanics of the CAN IDS being evaluated, thus providing no insight into detection techniques and their effectiveness against the selected attack types. We also observe that more complex attack types like suspension and masquerade are rarely considered in the surveyed works, while in the broader CAN IDS literature, there are numerous CAN IDS that consider the detection of masquerade attacks [64, 44, 43]. Finally, we find that all the reviewed studies are restricted to the assessment of security-related properties, and most do not provide an assessment of performance factors such as detection latency or resource overhead. In light of these issues, we provide the following recommendations for future work:

Comprehensive benchmarking datasets

Benchmarking datasets collected from real vehicles and consisting of real attack traces are crucial for the development and evaluation of CAN IDS. As opposed to synthetic datasets and simulated attacks, real traces with real attacks capture the dynamics of CAN bus traffic and are best suited for evaluating CAN IDS. While a number of CAN datasets have been published for IDS evaluation [5, 72, 16], none has emerged as a benchmark dataset in the way the KDD Cup and DARPA datasets are used for computer network IDS evaluation [4]. There is also a need for comprehensive datasets of real attack traces of all known types. While several traces containing fabrication attacks are available, currently available datasets do not contain real traces of suspension or masquerade attacks, and more attacks are being discovered. The creation of such a dataset would allow robust benchmarking and evaluation of CAN IDS, ensuring that newly developed CAN IDS can effectively detect at least all known attacks.

Methods of synthesizing datasets and simulating attacks are also not without merit: creating real attack datasets requires considerable resources and skill, and simulation tools can allow the creation of customisable attacks, as the parametrised attack framework in [85] illustrates. Such datasets can therefore fill in gaps where real datasets are not available.

Online evaluation methods

Current benchmarking and evaluation studies mostly use offline methods with CAN bus datasets. This restricts the kinds of CAN IDS properties that can be assessed: the surveyed works almost exclusively focus on evaluating attack detection accuracy. Apart from attack detection accuracy, attack detection latency, resource consumption, and workload processing capability are important considerations for an in-vehicle IDS but have not been examined sufficiently in current CAN IDS benchmarking studies. Therefore, there is a need for online evaluation methods for CAN IDS, such as in [84], that allow repeatable comparative evaluation studies. Benchmarking with online testing methods, such as using trace replay tools, simulations, and testbeds, can facilitate not just the assessment of non-functional properties but also a more accurate evaluation of attack detection capability in conditions close to real operating environments. Documentation is an important aspect of reporting online evaluations conducted using real vehicles, testbeds, and simulations; documenting and reporting the hardware devices and components, network topologies, and source code used would further enhance the reproducibility of results.

Benchmarking Layer 1 CAN IDS

While most CAN intrusion detection methods in the literature use features derived from OSI Layer 2 data, i.e., CAN frames, physical characteristics-based methods use Layer 1 information for attack detection. Consequently, current CAN attack datasets cannot be used to evaluate these physical layer IDS, and as a result they have not been benchmarked alongside the CAN IDS that use CAN messages. Hence, inclusive benchmarking methods are needed in which Layer 1, Layer 2, and even hybrid CAN IDS can be evaluated and compared with each other under similar test conditions. Apart from using real vehicles and testbeds, this can be realised by creating datasets of physical measurements (such as [68, 70]) and datasets with both Layer 1 and Layer 2 data for offline testing, as well as simulations of physical characteristics for online testing. Inclusive benchmarking of Layer 1 and Layer 2 CAN IDS will allow direct comparison among different types of CAN IDS and also facilitate the development of comprehensive intrusion detection methods that incorporate more diverse features and detect a wider range of attacks.

Comprehensive evaluation metrics

Among the surveyed studies, it is observed that the set of metrics reported differs from study to study, which hinders direct comparison across studies. Furthermore, the focus is only on security-related metrics, with performance-related metrics largely not being measured or reported. However, to select an IDS for implementation in automotive networks, it is necessary to assess not just attack detection capability but also detection latency and other non-functional properties. This means that a comprehensive suite of evaluation metrics needs to be developed that includes both security- and performance-related metrics, covering the assessment of CAN IDS in all practical aspects. The selection of security-related metrics should consider factors such as class imbalance in attack datasets and the prior likelihood of attack, which make using only classification accuracy or ROC insufficient [96] and necessitate metrics like balanced accuracy and the Matthews correlation coefficient (MCC), which give equal importance to both normal and attack classes.
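To illustrate why: a detector that never raises an alert achieves high plain accuracy on the benign-dominated traffic typical of CAN datasets, while balanced accuracy and MCC expose the failure. A sketch of both metrics, with hypothetical counts:

```python
# Balanced accuracy and the Matthews correlation coefficient (MCC)
# weight both classes, unlike plain accuracy, which can be inflated
# by class imbalance in CAN attack datasets.
import math

def balanced_accuracy(tp, fp, tn, fn):
    tpr = tp / (tp + fn)  # sensitivity on the attack class
    tnr = tn / (tn + fp)  # specificity on the normal class
    return (tpr + tnr) / 2

def mcc(tp, fp, tn, fn):
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0  # define MCC = 0 in degenerate cases

# A "detector" that flags nothing, on a trace with 1% attack frames:
tp, fp, tn, fn = 0, 0, 9900, 100
plain_accuracy = (tp + tn) / (tp + fp + tn + fn)  # misleadingly high
bal_acc = balanced_accuracy(tp, fp, tn, fn)       # chance level
matthews = mcc(tp, fp, tn, fn)                    # no correlation
```

Here plain accuracy is 0.99 despite zero detections, whereas balanced accuracy is 0.5 and MCC is 0, correctly reflecting a useless detector.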

8 Conclusion

Many intrusion detection methods are being developed for the CAN bus in an endeavour to secure it against various types of cyberattacks that have the potential to cause vehicles to malfunction and result in dangerous accidents. The evaluation of these CAN IDS varies in terms of the CAN IDS type assessed, the attack types considered, the evaluation type, the workload used for evaluation, and the evaluation metrics reported. We thus propose a CAN IDS evaluation design space in the manner of [7] encapsulating these five aspects of CAN IDS assessment, with the aim of categorizing current CAN IDS works and serving as a guide for planning evaluation studies by enumerating existing approaches to CAN IDS evaluation.

CAN IDS are usually evaluated under disparate experimental conditions, which hinders direct comparison. Therefore, there have been a number of benchmark frameworks proposed and comparative studies conducted that evaluate CAN IDS in similar experimental conditions to reveal how they perform in relation to each other. Such benchmarking efforts ultimately facilitate the selection of the most appropriate CAN intrusion detection methods for implementation in in-vehicle networks. This work surveys current efforts at benchmarking and comparing CAN IDS and discusses them in terms of the proposed CAN IDS evaluation design space in order to understand current trends as well as directions for future work.

From the surveyed works, it is apparent that anomaly-based CAN IDS are the most popular type of CAN IDS selected for benchmarking since they have the capability to detect novel, unknown attacks and do not require attack signatures. Among anomaly-based CAN IDS, only statistical and ML-based methods are typically included in benchmarking studies. Because of the difficulties associated with conducting online tests, almost all comparative evaluations are offline evaluations using CAN bus datasets. There are a number of publicly available traces of CAN bus traffic collected from real vehicles, both under normal operation and under attack, which are commonly used for offline evaluations. However, such datasets are limited in the types of attacks they contain; while several datasets with common fabrication attacks are available, there is a lack of datasets containing other classes of attacks like suspension and masquerade attacks. Offline experiments also allow measurement of only security-related metrics related to attack detection accuracy. As such, attack detection latency and other non-functional properties are understudied in current benchmarking and comparative studies.

Examining surveyed works in terms of this design space reveals avenues for future work: benchmarking datasets, repeatable online evaluations, methods for comparing Layer 1 CAN IDS with Layer 2 CAN IDS, and comprehensive evaluation metrics.

References

  • [1] R.N. Charette, How Software Is Eating the Car, IEEE Spectrum, 2021. https://spectrum.ieee.org/software-eating-car.
  • [2] P. Agbaje, A. Anjum, A. Mitra, G. Bloom and H. Olufowobi, A Framework for Consistent and Repeatable Controller Area Network IDS Evaluation, in: NDSS Automotive and Autonomous Vehicle Security (AutoSec) Workshop 2022, 2022.
  • [3] W. Wu, R. Li, G. Xie, J. An, Y. Bai, J. Zhou and K. Li, A Survey of Intrusion Detection for In-Vehicle Networks, IEEE Transactions on Intelligent Transportation Systems 21(3) (2020), 919–933. doi:10.1109/TITS.2019.2908074.
  • [4] E. Aliwa, O. Rana, C. Perera and P. Burnap, Cyberattacks and Countermeasures for In-Vehicle Networks, ACM Computing Surveys 54(1) (2021). doi:10.1145/3431233.
  • [5] G. Karopoulos, G. Kambourakis, E. Chatzoglou, J.L. Hernández-Ramos and V. Kouliaridis, Demystifying In-Vehicle Intrusion Detection Systems: A Survey of Surveys and a Meta-Taxonomy, Electronics 11(7) (2022), 1072. doi:10.3390/electronics11071072.
  • [6] H. Ji, Y. Wang, H. Qin, Y. Wang and H. Li, Comparative Performance Evaluation of Intrusion Detection Methods for In-Vehicle Networks, IEEE Access 6 (2018), 37523–37532. doi:10.1109/ACCESS.2018.2848106.
  • [7] A. Milenkoski, M. Vieira, S. Kounev, A. Avritzer and B.D. Payne, Evaluating Computer Intrusion Detection Systems: A Survey of Common Practices, ACM Computing Surveys 48(1) (2015). doi:10.1145/2808691.
  • [8] C. Young, H. Olufowobi, G. Bloom and J. Zambreno, Automotive Intrusion Detection Based on Constant CAN Message Frequencies Across Vehicle Driving Modes, in: Proceedings of the ACM Workshop on Automotive Cybersecurity, AutoSec ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 9–14. ISBN 978-1-4503-6180-4. doi:10.1145/3309171.3309179.
  • [9] O.Y. Al-Jarrah, C. Maple, M. Dianati, D. Oxtoby and A. Mouzakitis, Intrusion Detection Systems for Intra-Vehicle Networks: A Review, IEEE Access 7 (2019), 21266–21289. doi:10.1109/ACCESS.2019.2894183.
  • [10] F. Nappi, A survey of Intrusion Detection Systems for Controller Area Networks and FPGA evaluation, Master’s thesis, Politecnico di Milano, 2022.
  • [11] S. Rajapaksha, H. Kalutarage, M.O. Al-Kadri, A. Petrovski, G. Madzudzo and M. Cheah, AI-based Intrusion Detection Systems for In-Vehicle Networks: A Survey, ACM Computing Surveys (2022), 3570954. doi:10.1145/3570954.
  • [12] R. Panigrahi, S. Borah, A.K. Bhoi, M.F. Ijaz, M. Pramanik, R.H. Jhaveri and C.L. Chowdhary, Performance Assessment of Supervised Classifiers for Designing Intrusion Detection Systems: A Comprehensive Review and Recommendations for Future Research, Mathematics 9(6) (2021), 690. doi:10.3390/math9060690.
  • [13] I.F. Kilincer, F. Ertam and A. Sengur, Machine learning methods for cyber security intrusion detection: Datasets and comparative study, Computer Networks 188 (2021), 107840. doi:10.1016/j.comnet.2021.107840. https://www.sciencedirect.com/science/article/pii/S1389128621000141.
  • [14] S. Corrigan, Introduction to the Controller Area Network (CAN), Technical Report, Texas Instruments, 2016. www.ti.com.
  • [15] K.-T. Cho and K.G. Shin, Fingerprinting Electronic Control Units for Vehicle Intrusion Detection, in: Proceedings of the 25th USENIX Conference on Security Symposium, SEC’16, USENIX Association, USA, 2016, pp. 911–927. ISBN 978-1-931971-32-4.
  • [16] M.E. Verma, M.D. Iannacone, R.A. Bridges, S.C. Hollifield, P. Moriano, B. Kay and F.L. Combs, Addressing the Lack of Comparability & Testing in CAN Intrusion Detection Research: A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset, arXiv, 2020. doi:10.48550/ARXIV.2012.14600.
  • [17] R. Islam, R.U.D. Refat, S.M. Yerram and H. Malik, Graph-Based Intrusion Detection System for Controller Area Networks, IEEE Transactions on Intelligent Transportation Systems 23(3) (2022), 1727–1736. doi:10.1109/TITS.2020.3025685.
  • [18] I. Studnia, E. Alata, V. Nicomette, M. Kaâniche and Y. Laarouchi, A language-based intrusion detection approach for automotive embedded networks, International Journal of Embedded Systems 10(1) (2018). doi:10.1504/IJES.2018.10010488.
  • [19] M.R. Moore, R.A. Bridges, F.L. Combs, M.S. Starr and S.J. Prowell, Modeling Inter-Signal Arrival Times for Accurate Detection of CAN Bus Signal Injection Attacks: A Data-Driven Approach to in-Vehicle Intrusion Detection, in: Proceedings of the 12th Annual Conference on Cyber and Information Security Research, CISRC ’17, Association for Computing Machinery, New York, NY, USA, 2017. ISBN 978-1-4503-4855-3. doi:10.1145/3064814.3064816.
  • [20] H. Song, H. Kim and H. Kim, Intrusion detection system based on the analysis of time intervals of CAN messages for in-vehicle network, in: 2016 International Conference on Information Networking (ICOIN), IEEE Computer Society, Los Alamitos, CA, USA, 2016, pp. 63–68. doi:10.1109/ICOIN.2016.7427089.
  • [21] H. Olufowobi, U. Ezeobi, E. Muhati, G. Robinson, C. Young, J. Zambreno and G. Bloom, Anomaly Detection Approach Using Adaptive Cumulative Sum Algorithm for Controller Area Network, in: Proceedings of the ACM Workshop on Automotive Cybersecurity, AutoSec ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 25–30. ISBN 978-1-4503-6180-4. doi:10.1145/3309171.3309178.
  • [22] M. Bozdal, M. Samie and I.K. Jennions, WINDS: A Wavelet-Based Intrusion Detection System for Controller Area Network (CAN), IEEE Access 9 (2021), 58621–58633. doi:10.1109/ACCESS.2021.3073057. https://ieeexplore.ieee.org/document/9402263/.
  • [23] M. Marchetti and D. Stabili, Anomaly detection of CAN bus messages through analysis of ID sequences, in: 2017 IEEE Intelligent Vehicles Symposium (IV), 2017, pp. 1577–1583. doi:10.1109/IVS.2017.7995934.
  • [24] M. Müter and N. Asaj, Entropy-based anomaly detection for in-vehicle networks, in: 2011 IEEE Intelligent Vehicles Symposium (IV), 2011, pp. 1110–1115. doi:10.1109/IVS.2011.5940552.
  • [25] M. Marchetti, D. Stabili, A. Guido and M. Colajanni, Evaluation of anomaly detection for in-vehicle networks through information-theoretic algorithms, in: 2016 IEEE 2nd International Forum on Research and Technologies for Society and Industry Leveraging a better tomorrow (RTSI), 2016, pp. 1–6. doi:10.1109/RTSI.2016.7740627.
  • [26] G. Baldini, On the Application of Entropy Measures with Sliding Window for Intrusion Detection in Automotive In-Vehicle Networks, Entropy 22(9) (2020), 1044. doi:10.3390/e22091044. https://www.mdpi.com/1099-4300/22/9/1044.
  • [27] D. Stabili, M. Marchetti and M. Colajanni, Detecting attacks to internal vehicle networks through Hamming distance, in: 2017 AEIT International Annual Conference, 2017, pp. 1–6. doi:10.23919/AEIT.2017.8240550.
  • [28] A. Alshammari, M.A. Zohdy, D. Debnath and G. Corser, Classification Approach for Intrusion Detection in Vehicle Systems, Wireless Engineering and Technology 9(4) (2018), 79–94. doi:10.4236/wet.2018.94007.
  • [29] R.U.D. Refat, A.A. Elkhail, A. Hafeez and H. Malik, Detecting CAN Bus Intrusion by Applying Machine Learning Method to Graph Based Features, in: Intelligent Systems and Applications, Vol. 296, K. Arai, ed., Springer International Publishing, Cham, 2022, pp. 730–748, Series Title: Lecture Notes in Networks and Systems. ISBN 978-3-030-82198-2 978-3-030-82199-9. doi:10.1007/978-3-030-82199-9_49. https://link.springer.com/10.1007/978-3-030-82199-9_49.
  • [30] O. Avatefipour, A. Saad Al-Sumaiti, A.M. El-Sherbeeny, E. Mahrous Awwad, M.A. Elmeligy, M.A. Mohamed and H. Malik, An Intelligent Secured Framework for Cyberattack Detection in Electric Vehicles’ CAN Bus Using Machine Learning, IEEE Access 7 (2019), 127580–127592, Publisher: IEEE. doi:10.1109/ACCESS.2019.2937576.
  • [31] S. Sharmin and H. Mansor, Intrusion Detection on the In-Vehicle Network Using Machine Learning, in: 3rd International Cyber Resilience Conference (CRC), IEEE, Virtual, 2021, pp. 26–31. ISBN 978-1-66541-844-7. doi:10.1109/CRC50527.2021.9392627.
  • [32] J. Zhang, F. Li, H. Zhang, R. Li and Y. Li, Intrusion detection system using deep learning for in-vehicle security, Ad Hoc Networks 95 (2019), 101974. doi:10.1016/j.adhoc.2019.101974. https://www.sciencedirect.com/science/article/pii/S1570870519304354.
  • [33] F. Fenzl, R. Rieke, Y. Chevalier, A. Dominik and I. Kotenko, Continuous fields: Enhanced in-vehicle anomaly detection using machine learning models, Simulation Modelling Practice and Theory 105 (2020), 102143. doi:10.1016/j.simpat.2020.102143. https://www.sciencedirect.com/science/article/pii/S1569190X20300824.
  • [34] M. Hanselmann, T. Strauss, K. Dormann and H. Ulmer, CANet: An Unsupervised Intrusion Detection System for High Dimensional CAN Bus Data, IEEE Access 8 (2020), 58194–58205. doi:10.1109/ACCESS.2020.2982544. https://ieeexplore.ieee.org/document/9044377/.
  • [35] M.D. Hossain, H. Inoue, H. Ochiai, D. Fall and Y. Kadobayashi, LSTM-Based Intrusion Detection System for In-Vehicle Can Bus Communications, IEEE Access 8 (2020), 185489–185502. doi:10.1109/ACCESS.2020.3029307.
  • [36] S. Longari, D.H. Nova Valcarcel, M. Zago, M. Carminati and S. Zanero, CANnolo: An Anomaly Detection System Based on LSTM Autoencoders for Controller Area Network, IEEE Transactions on Network and Service Management 18(2) (2021), 1913–1924. doi:10.1109/TNSM.2020.3038991. https://ieeexplore.ieee.org/document/9262960/.
  • [37] E. Novikova, V. Le, M. Yutin, M. Weber and C. Anderson, Autoencoder Anomaly Detection on Large CAN Bus Data, in: Proceedings of DLP-KDD 2020, ACM, San Diego, California, 2022, p. 9. ISBN 978-1-4503-9999-9. doi:10.1145/1122445.1122456.
  • [38] V.K. Kukkala, S.V. Thiruloga and S. Pasricha, INDRA: Intrusion Detection Using Recurrent Autoencoders in Automotive Embedded Systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39(11) (2020), 3698–3710. doi:10.1109/TCAD.2020.3012749. https://ieeexplore.ieee.org/document/9211565/.
  • [39] S.V. Thiruloga, V.K. Kukkala and S. Pasricha, TENET: Temporal CNN with Attention for Anomaly Detection in Automotive Cyber-Physical Systems, in: 2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 2022, pp. 326–331. doi:10.1109/ASP-DAC52403.2022.9712524.
  • [40] A.R. Javed, S.u. Rehman, M.U. Khan, M. Alazab and T.R. G, CANintelliIDS: Detecting In-Vehicle Intrusion Attacks on a Controller Area Network Using CNN and Attention-Based GRU, IEEE Transactions on Network Science and Engineering 8(2) (2021), 1456–1466. doi:10.1109/TNSE.2021.3059881.
  • [41] E. Seo, H.M. Song and H.K. Kim, GIDS: GAN based Intrusion Detection System for In-Vehicle Network, in: 2018 16th Annual Conference on Privacy, Security and Trust (PST), 2018, pp. 1–6. doi:10.1109/PST.2018.8514157.
  • [42] Y. Xun, Y. Zhao and J. Liu, VehicleEIDS: A Novel External Intrusion Detection System Based on Vehicle Voltage Signals, IEEE Internet of Things Journal 9(3) (2022), 2124–2133. doi:10.1109/JIOT.2021.3090397.
  • [43] L. Zhang and D. Ma, A Hybrid Approach Toward Efficient and Accurate Intrusion Detection for In-Vehicle Networks, IEEE Access 10 (2022), 10852–10866. doi:10.1109/ACCESS.2022.3145007. https://ieeexplore.ieee.org/document/9687591/.
  • [44] A. Nichelini, C.A. Pozzoli, S. Longari, M. Carminati and S. Zanero, CANova: A hybrid intrusion detection framework based on automatic signal classification for CAN, Computers & Security 128 (2023), 103166. doi:10.1016/j.cose.2023.103166. https://linkinghub.elsevier.com/retrieve/pii/S0167404823000767.
  • [45] W. Choi, K. Joo, H.J. Jo, M.C. Park and D.H. Lee, VoltageIDS: Low-Level Communication Characteristics for Automotive Intrusion Detection System, IEEE Transactions on Information Forensics and Security 13(8) (2018), 2114–2129. doi:10.1109/TIFS.2018.2812149. https://ieeexplore.ieee.org/document/8306904/.
  • [46] K.-T. Cho and K.G. Shin, Viden: Attacker Identification on In-Vehicle Networks, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, ACM, Dallas, Texas, USA, 2017, pp. 1109–1123. ISBN 978-1-4503-4946-8. doi:10.1145/3133956.3134001.
  • [47] A. Hafeez, K. Rehman and H. Malik, State of the Art Survey on Comparison of Physical Fingerprinting-Based Intrusion Detection Techniques for In-Vehicle Security, SAE Technical Paper 2020-01-0721, 2020. doi:10.4271/2020-01-0721. https://www.sae.org/content/2020-01-0721/.
  • [48] A. Tomlinson, J. Bryans and S.A. Shaikh, Towards Viable Intrusion Detection Methods For The Automotive Controller Area Network, Proceedings of the 2nd ACM Computer Science in Cars Symposium (2018). ISBN 978-1-4503-6616-8. doi:10.1145/3273946.3273950.
  • [49] G. Dupont, J. Den Hartog, S. Etalle and A. Lekidis, Evaluation Framework for Network Intrusion Detection Systems for In-Vehicle CAN, in: 2019 IEEE International Conference on Connected Vehicles and Expo (ICCVE), Graz, Austria, 2019, pp. 1–6. ISBN 978-1-7281-0142-2. doi:10.1109/ICCVE45908.2019.8965028.
  • [50] T. Costa Cañones, Benchmarking framework for Intrusion Detection Systems in Controller Area Networks, Master’s thesis, Politecnico di Milano and Universitat Politecnica de Catalunya, 2021. https://www.politesi.polimi.it/handle/10589/176269.
  • [51] O. Almomani, M.A. Almaiah, A. Alsaaidah, S. Smadi, A.H. Mohammad and A. Althunibat, Machine Learning Classifiers for Network Intrusion Detection System: Comparative Study, in: 2021 International Conference on Information Technology (ICIT), 2021, pp. 440–445. doi:10.1109/ICIT52682.2021.9491770.
  • [52] L. Baum, M. Becker, L. Geyer and G. Molter, Mapping requirements to reusable components using Design Spaces, in: Proceedings Fourth International Conference on Requirements Engineering. ICRE 2000. (Cat. No.98TB100219), 2000, pp. 159–167. doi:10.1109/ICRE.2000.855606.
  • [53] O. Schell and M. Kneib, VALID: Voltage-Based Lightweight Intrusion Detection for the Controller Area Network, in: 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), IEEE, Guangzhou, China, 2020, pp. 225–232. ISBN 978-1-66540-392-4. doi:10.1109/TrustCom50675.2020.00041. https://ieeexplore.ieee.org/document/9343029/.
  • [54] Y. Ujiie, T. Kishikawa, T. Haga, H. Matsushima, T. Wakabayashi, M. Tanabe, Y. Kitamura and J. Anzai, A method for disabling malicious CAN messages by using a CMI-ECU, in: SAE 2016 World Congress and Exhibition, 2016. ISSN 0148-7191. doi:10.4271/2016-01-0068.
  • [55] A.K. Desta, S. Ohira, I. Arai and K. Fujikawa, ID Sequence Analysis for Intrusion Detection in the CAN bus using Long Short Term Memory Networks, in: 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), IEEE, Austin, TX, USA, 2020, pp. 1–6. ISBN 978-1-72814-716-1. doi:10.1109/PerComWorkshops48775.2020.9156250. https://ieeexplore.ieee.org/document/9156250/.
  • [56] R.S. Rathore, C. Hewage, O. Kaiwartya and J. Lloret, In-Vehicle Communication Cyber Security: Challenges and Solutions, Sensors 22(17) (2022), 6679. doi:10.3390/s22176679.
  • [57] H. Jadidbonab, A. Tomlinson, H.N. Nguyen, T. Doan and S.A. Shaikh, A Real-Time In-Vehicle Network Testbed for Machine Learning-Based IDS Training and Validation, in: Workshop on Artificial Intelligence and Cyber Security (AI-CyberSec 2021), CEUR Workshop Proceedings, 2021.
  • [58] M.S. Gadelrab and A. Ghorbani, A New Framework for Publishing and Sharing Network and Security Datasets, in: 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, 2012, pp. 539–546. doi:10.1109/SC.Companion.2012.77.
  • [59] H. Lee, S.H. Jeong and H.K. Kim, OTIDS: A Novel Intrusion Detection System for In-vehicle Network by Using Remote Frame, in: 2017 15th Annual Conference on Privacy, Security and Trust (PST), 2017, pp. 57–5709. doi:10.1109/PST.2017.00017.
  • [60] M.L. Han, B.I. Kwak and H.K. Kim, Anomaly intrusion detection method for vehicular networks based on survival analysis, Vehicular Communications 14 (2018), 52–63. doi:10.1016/j.vehcom.2018.09.004.
  • [61] A. Paul and M.R. Islam, An Artificial Neural Network Based Anomaly Detection Method in CAN Bus Messages in Vehicles, in: 2021 International Conference on Automation, Control and Mechatronics for Industry 4.0 (ACMI), 2021, pp. 1–5. doi:10.1109/ACMI53878.2021.9528201.
  • [62] S. Khandelwal and S. Shreejith, A Lightweight Multi-Attack CAN Intrusion Detection System on Hybrid FPGAs, in: 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL), IEEE, Belfast, United Kingdom, 2022, pp. 425–429. ISBN 978-1-66547-390-3. doi:10.1109/FPL57034.2022.00070. https://ieeexplore.ieee.org/document/10035170/.
  • [63] V.S. Barletta, D. Caivano, A. Nannavecchia and M. Scalera, Intrusion Detection for in-Vehicle Communication Networks: An Unsupervised Kohonen SOM Approach, Future Internet 12(7) (2020), 119. doi:10.3390/fi12070119. https://www.mdpi.com/1999-5903/12/7/119.
  • [64] P. Moriano, R.A. Bridges and M.D. Iannacone, Detecting CAN Masquerade Attacks with Signal Clustering Similarity, in: Proceedings Fourth International Workshop on Automotive and Autonomous Vehicle Security, 2022. doi:10.14722/autosec.2022.23028. http://arxiv.org/abs/2201.02665.
  • [65] G. Dupont, A. Lekidis, J. den Hartog and S. Etalle, Automotive Controller Area Network (CAN) Bus Intrusion Dataset v2 (2019). doi:10.4121/uuid:b74b4928-c377-4585-9432-2004dfa20a5d. https://data.4tu.nl/articles/dataset/Automotive_Controller_Area_Network_CAN_Bus_Intrusion_Dataset/12696950.
  • [66] S. Sharmin, H. Mansor, A.F. Abdul Kadir and N.A. Aziz, Using Streaming Data Algorithm for Intrusion Detection on the Vehicular Controller Area Network, in: Ubiquitous Security, Vol. 1557, G. Wang, K.-K.R. Choo, R.K.L. Ko, Y. Xu and B. Crispo, eds, Springer Singapore, Singapore, 2022, pp. 131–144, Series Title: Communications in Computer and Information Science. ISBN 978-981-19046-7-7, 978-981-19046-8-4. doi:10.1007/978-981-19-0468-4_10. https://link.springer.com/10.1007/978-981-19-0468-4_10.
  • [67] A. Gazdag, G. Lupták and L. Buttyán, Correlation-Based Anomaly Detection for the CAN Bus, in: Security in Computer and Information Sciences, Vol. 1596, E. Gelenbe, M. Jankovic, D. Kehagias, A. Marton and A. Vilmos, eds, Springer International Publishing, Cham, 2022, pp. 38–50, Series Title: Communications in Computer and Information Science. ISBN 978-3-031-09356-2, 978-3-031-09357-9. doi:10.1007/978-3-031-09357-9_4. https://link.springer.com/10.1007/978-3-031-09357-9_4.
  • [68] M. Foruhandeh, Y. Man, R. Gerdes, M. Li and T. Chantem, SIMPLE: single-frame based physical layer identification for intrusion detection and prevention on in-vehicle networks, in: Proceedings of the 35th Annual Computer Security Applications Conference, ACM, San Juan, Puerto Rico, USA, 2019, pp. 229–244. ISBN 978-1-4503-7628-0. doi:10.1145/3359789.3359834.
  • [69] W. Lalouani, Y. Dang and M. Younis, Mitigating voltage fingerprint spoofing attacks on the controller area network bus, Cluster Computing 26(2) (2022), 1447–1460. doi:10.1007/s10586-022-03821-x.
  • [70] L. Popa, B. Groza, C. Jichici and P.-S. Murvay, ECUPrint—Physical Fingerprinting Electronic Control Units on CAN Buses Inside Cars and SAE J1939 Compliant Vehicles, IEEE Transactions on Information Forensics and Security 17 (2022), 1185–1200. doi:10.1109/TIFS.2022.3158055. https://ieeexplore.ieee.org/document/9730883/.
  • [71] A. Vahidi, T. Rosenstatter and N.I. Mowla, Systematic Evaluation of Automotive Intrusion Detection Datasets, in: Computer Science in Cars Symposium, ACM, Ingolstadt, Germany, 2022, pp. 1–12. ISBN 978-1-4503-9786-5. doi:10.1145/3568160.3570226.
  • [72] D. Swessi and H. Idoudi, A Comparative Review of Security Threats Datasets for Vehicular Networks, in: 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), 2021, pp. 746–751. doi:10.1109/3ICT53449.2021.9581683.
  • [73] D. Chicco, N. Tötsch and G. Jurman, The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation, BioData Mining 14(1) (2021), 13. doi:10.1186/s13040-021-00244-z.
  • [74] C. Corbett, T. Basic, T. Lukaseder and F. Kargl, A Testing Framework Architecture for Automotive Intrusion Detection Systems, in: Automotive - Safety & Security 2017 - Sicherheit und Zuverlässigkeit für automobile Informationstechnik, P. Dencker, H. Klenk, H.B. Keller and E. Plödererder, eds, Gesellschaft für Informatik, Bonn, 2017, pp. 89–102.
  • [75] H. Olufowobi, C. Young, J. Zambreno and G. Bloom, SAIDuCANT: Specification-Based Automotive Intrusion Detection Using Controller Area Network (CAN) Timing, IEEE Transactions on Vehicular Technology 69(2) (2020), 1484–1494. doi:10.1109/TVT.2019.2961344.
  • [76] J. Sunny, S. Sankaran and V. Saraswat, A Hybrid Approach for Fast Anomaly Detection in Controller Area Networks, in: 2020 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS), 2020, pp. 1–6. doi:10.1109/ANTS50601.2020.9342791.
  • [77] A. Taylor, N. Japkowicz and S. Leblanc, Frequency-based anomaly detection for the automotive CAN bus, in: 2015 World Congress on Industrial Control Systems Security (WCICSS), 2015, pp. 45–49. doi:10.1109/WCICSS.2015.7420322.
  • [78] C. Miller and C. Valasek, A Survey of Remote Automotive Attack Surfaces, in: Black Hat USA, 2014.
  • [79] S. Abbott-McCune and L.A. Shay, Intrusion prevention system of automotive network CAN bus, in: 2016 IEEE International Carnahan Conference on Security Technology (ICCST), 2016, pp. 1–8. doi:10.1109/CCST.2016.7815711.
  • [80] M. Gmiden, M.H. Gmiden and H. Trabelsi, An intrusion detection method for securing in-vehicle CAN bus, in: 2016 17th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA), 2016, pp. 176–180. doi:10.1109/STA.2016.7952095.
  • [81] D. Stabili, F. Pollicino and A. Rota, A benchmark framework for CAN IDS, in: Proceedings of the Italian Conference on Cybersecurity (ITASEC 2021), 2021.
  • [82] D. Stabili and M. Marchetti, Detection of Missing CAN Messages through Inter-Arrival Time Analysis, in: 2019 IEEE 90th Vehicular Technology Conference (VTC2019-Fall), 2019, pp. 1–7. doi:10.1109/VTCFall.2019.8891068.
  • [83] D.H. Blevins, P. Moriano, R.A. Bridges, M.E. Verma, M.D. Iannacone and S.C. Hollifield, Time-Based CAN Intrusion Detection Benchmark, in: Workshop on Automotive and Autonomous Vehicle Security (AutoSec) 2021, Internet Society, Virtual, 2021. ISBN 978-1-891562-68-1. doi:10.14722/autosec.2021.23013.
  • [84] S. Stachowski, R. Gaynier and D.J. LeBlanc, An Assessment Method for Automotive Intrusion Detection System Performance, Technical Report, DOT HS 812 708, University of Michigan Transportation Research Institute, Ann Arbor, 2019.
  • [85] A. Taylor, S. Leblanc and N. Japkowicz, Probing the Limits of Anomaly Detectors for Automobiles with a Cyberattack Framework, IEEE Intelligent Systems 33(2) (2018), 54–62. doi:10.1109/MIS.2018.111145054.
  • [86] I. Berger, R. Rieke, M. Kolomeets, A. Chechulin and I. Kotenko, Comparative Study of Machine Learning Methods for In-Vehicle Intrusion Detection, in: Computer Security, S.K. Katsikas, F. Cuppens, N. Cuppens, C. Lambrinoudakis, A. Antón, S. Gritzalis, J. Mylopoulos and C. Kalloniatis, eds, Springer International Publishing, Cham, 2019, pp. 85–101. ISBN 978-3-030-12786-2.
  • [87] T. Moulahi, S. Zidi, A. Alabdulatif and M. Atiquzzaman, Comparative Performance Evaluation of Intrusion Detection Based on Machine Learning in In-Vehicle Controller Area Network Bus, IEEE Access 9 (2021), 99595–99605. doi:10.1109/ACCESS.2021.3095962.
  • [88] D. Swessi and H. Idoudi, Comparative Study of Ensemble Learning Techniques for Fuzzy Attack Detection in In-Vehicle Networks, in: Advanced Information Networking and Applications, L. Barolli, F. Hussain and T. Enokido, eds, Springer International Publishing, Cham, 2022, pp. 598–610. ISBN 978-3-030-99587-4.
  • [89] G.O. Anyanwu, C.I. Nwakanma, J.M. Lee and D.-S. Kim, Countering Attacks in In-Vehicle Network: An Evaluation of Machine Learning Algorithms, in: 2021 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 2021, pp. 657–660. ISBN 978-1-6654-2383-0. doi:10.1109/ICTC52510.2021.9621200.
  • [90] K. Okokpujie, G.C. Kennedy, V.P. Nzanzu, M.J. Molo, E. Adetiba and J. Badejo, Anomaly-Based Intrusion Detection for a Vehicle CAN Bus: A Case for Hyundai Avante CN7, Journal of Southwest Jiaotong University 56(5) (2021), 144–156. doi:10.35741/issn.0258-2724.56.5.14.
  • [91] R. Rieke, M. Seidemann, E.K. Talla, D. Zelle and B. Seeger, Behavior Analysis for Safety and Security in Automotive Systems, in: 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 2017, pp. 381–385. doi:10.1109/PDP.2017.67.
  • [92] M. Zago, S. Longari, A. Tricarico, M. Carminati, M. Gil Pérez, G. Martínez Pérez and S. Zanero, ReCAN – Dataset for reverse engineering of Controller Area Networks, Data in Brief 29 (2020), 105149. doi:10.1016/j.dib.2020.105149.
  • [93] M. Sami, Intrusion Detection in CAN bus, IEEE Dataport, 2019. doi:10.21227/24m9-a446.
  • [94] H. Kang, B.I. Kwak, Y.H. Lee, H. Lee, H. Lee and H.K. Kim, Car Hacking: Attack & Defense Challenge 2020 Dataset, IEEE Dataport, 2021. doi:10.21227/qvr7-n418.
  • [95] J. Choi and H. Kim, On the Robustness of Intrusion Detection Systems for Vehicles Against Adversarial Attacks, in: Information Security Applications, H. Kim, ed., Springer International Publishing, Cham, 2021, pp. 39–50. ISBN 978-3-030-89432-0.
  • [96] N. Stakhanova and A.A. Cardenas, Analysis of Metrics for Classification Accuracy in Intrusion Detection, in: Empirical Research for Software Security, CRC Press, 2017, pp. 173–199.