EM-X-DL: Efficient Cross-Device Deep Learning Side-Channel Attack with Noisy EM Signatures

Josef Danial, Purdue University, West Lafayette, Indiana, USA
Debayan Das, Purdue University, West Lafayette, Indiana, USA
Anupam Golder, Georgia Institute of Technology, Atlanta, Georgia, USA
Santosh Ghosh, Intel Corporation, Hillsboro, Oregon, USA
Arijit Raychowdhury, Georgia Institute of Technology, Atlanta, Georgia, USA
Shreyas Sen, Purdue University, West Lafayette, Indiana, USA

Abstract.

This work presents a Cross-device Deep-Learning based Electromagnetic (EM-X-DL) side-channel analysis (SCA), achieving >90% single-trace attack accuracy on AES-128, even at a significantly lower signal-to-noise ratio (SNR) than in previous works. With an intelligent selection of multiple training devices and a proper choice of hyperparameters, the proposed 256-class deep neural network (DNN) can be trained efficiently utilizing pre-processing techniques like PCA, LDA, and FFT on the target encryption engine running on an 8-bit Atmel microcontroller. Finally, an efficient end-to-end SCA leakage detection and attack framework using EM-X-DL demonstrates high confidence of an attacker with <20 averaged EM traces.

Electromagnetic Side-Channel Attacks, Cross-Device Attack, Deep Learning, Profiling Attacks, End-to-end SCA.
Conference: arXiv.org; November 11, 2020; Online
CCS concepts: Security and privacy: Embedded systems security; Security and privacy: Side-channel analysis and countermeasures

1. INTRODUCTION

With the ever-increasing prevalence of embedded devices and the growth of the Internet of Things (IoT), the security of these devices has become a major concern. Some of the most serious threats to the security of these devices are side-channel analysis (SCA) attacks. By analyzing physical leakage information regarding the power (Kocher et al., 1999), timing (Kocher, 1996), or electromagnetic (EM) signatures (Agrawal et al., 2003), cryptographic secrets can be extracted. Among the most powerful of these side-channel attacks are profiled attacks (Chari et al., 2003), and recently machine learning (ML) models have been shown to be very effective in this profiled SCA attack scenario using both power and EM measurements (Bartkewitz et al., 2013; Prouff et al., 2018).

Figure 1. (a) DNN training with high SNR raw power and low SNR raw EM traces. The model learns quickly from the power traces, but is unable to learn from the raw EM traces. (b) The variance of a point of interest (time sample 103) for both power and EM traces, demonstrating significantly lower SNR of the EM traces.

1.1. Motivation

The main limitation of ML models for profiling SCA attacks is their portability to other target devices. Specifically, these models have been shown to work when the same device is used for both profiling and testing; in a real attack, however, the attacker would profile on one device and then attack a separate, identical device. This issue of portability has recently been addressed for power-based ML SCA models on AES-128 in (Das et al., 2019), (Bhasin et al., 2019), and also with a 3-class DNN attacking RSA implementations for EM SCA (Carbone et al., 2019). However, these works only consider high-SNR scenarios. With the introduction of SNR-reducing countermeasures (Das et al., 2019) or low-cost, low-sensitivity EM probes, practical attacks must address the reality of low-SNR trace measurements. In this work, we show a deep-learning based cross-device SCA attack with low-SNR EM signatures.

A 256-class DNN model that can be trained successfully (>99% validation accuracy) (Das et al., 2019) using raw time-domain AES-128 power traces for a particular microcontroller is rendered futile for low-SNR EM SCA training, even with traces collected from the same device (Figure 1(a)). Figure 1(b) shows the variance of a point of interest (POI, determined using the difference of means approach (Chari et al., 2003; Choudary and Kuhn, 2018)) across 10K EM and power traces. It clearly shows that the variation in the EM traces is much higher than in the power traces, implying significantly lower SNR for the EM signatures. Indeed, when considering the side-channel SNR as defined in (Mangard, [n. d.]) as $\mathrm{SNR} = \frac{\mathrm{VAR}[Q]}{\mathrm{VAR}[N]}$, with $Q$ being the side-channel leakage and $N$ being the noise, there is a large difference between power and EM measurements. The side-channel SNR across a random selection of 7 devices for power traces is 19.6 dB, while the SNR of equivalent EM traces is 3.1 dB, as seen in Figure 2. Note that the SNR of a single device is comparable for both power and EM, but adding additional devices lowers the SNR drastically, due to the device-to-device variations. In fact, a majority of the lower SNR in the EM realm is due to inter-device variations being more prominent in EM compared to power, as Figure 2 also shows. So, to solve the problem of portability, we need to take into account the inter-device variations (Renauld et al., 2011). To resolve all these issues, we utilize averaging to enhance the SNR, analyze different pre-processing techniques to reduce the dimensionality of the data, and develop an intelligent algorithm to choose the set of training devices for efficient profiling, given that the larger effect of inter-device variations calls for more training devices.
Finally, we also propose an end-to-end EM-X-DL attack framework to perform EM scanning and find the best point of leakage on an unseen target device. A combination of these techniques allows us to achieve >90% cross-device accuracy.
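As a minimal sketch of the SNR definition used in this section, the snippet below estimates VAR[Q]/VAR[N] at a single time sample from labeled traces. The function names `side_channel_snr` and `to_db`, the Hamming-weight leakage model, and the unit noise level are illustrative assumptions and do not come from the measurement setup described here.

```python
import numpy as np

def side_channel_snr(samples, labels):
    """Estimate SNR = Var[Q] / Var[N] at one time sample.

    Q is approximated by the per-class mean (data-dependent leakage),
    N by the residual spread around that mean.
    """
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    class_means = np.array([samples[labels == c].mean() for c in classes])
    class_noise = np.array([samples[labels == c].var() for c in classes])
    signal_var = class_means.var()   # Var[Q]
    noise_var = class_noise.mean()   # Var[N]
    return signal_var / noise_var

def to_db(ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * np.log10(ratio)

# Synthetic example (assumption): Hamming-weight leakage plus unit Gaussian noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 256, size=50_000)
hamming_weight = np.array([bin(v).count("1") for v in range(256)])
traces = hamming_weight[labels] + rng.normal(0.0, 1.0, size=labels.size)
snr = side_channel_snr(traces, labels)
snr_db = to_db(snr)
```

Pooling traces from several devices into one call would fold the device-to-device offsets into the noise term, which is one way to read the drop shown in Figure 2.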

Figure 2. Change in side-channel SNR as devices are added to the dataset for both power and EM. While the power SNR remains fairly high as additional devices are added, the EM SNR drops sharply, indicating a much larger effect of cross-device variations on side-channel leakage in the EM domain compared to power.

1.2. Contribution

The specific contributions of this work are:

  • This work presents a cross-device deep-learning based EM SCA (EM-X-DL) on an AES-128 encryption engine using a 256-class DNN in a low SNR scenario, with ten devices for training and tested on a different set of ten test devices (Sec. 3).

  • The effect of different pre-processing techniques, including principal component analysis (PCA), linear discriminant analysis (LDA), fast Fourier transform (FFT), and spectrogram, on handling the portability issue is analyzed and compared, showing that LDA is the most efficient approach, achieving a maximum average cross-device key prediction accuracy of ∼91.5% with minimum training time (Sec. 3.3).

  • An algorithm for the optimal selection of the training devices is proposed, so that the number of training devices and thus the overall training time is minimized (Sec. 4, Algo. 1).

Table 1. Literature Review for Profiled-Attack Scenario

Profiled Attack   Measurement   Profiling        Corresponding
Scenario          Type          Method           Article
Same-Device       Power         TA               (Chari et al., 2003)
Same-Device       Power         SVM, RF          (Bartkewitz et al., 2013), (Lerman et al., 2014)
Same-Device       Power         DNN              (Maghrebi et al., 2016)
Same-Device       EM            TA               (Chari et al., 2003)
Same-Device       EM            DNN              (Prouff et al., 2018)
Cross-Device      Power         TA               (Choudary and Kuhn, 2018)
Cross-Device      Power         DNN              (Das et al., 2019), (Bhasin et al., 2019)
Cross-Device      EM            TA               (Montminy et al., 2013)
Cross-Device      EM            3-Class DNN      (Carbone et al., 2019) (RSA)
Cross-Device      EM            256-Class DNN    This Work* (AES-128)

*First EM Cross-Device Deep-Learning Attack on a Symmetric Key Algorithm

2. BACKGROUND & RELATED WORK

2.1. EM Side Channel Attacks

Since the inception of power SCA (Kocher et al., 1999), a wide variety of attacks have been demonstrated, which can be broadly classified into non-profiled attacks like differential/correlational power/EM analysis (DPA, CPA, DEMA, CEMA) (Brier et al., 2004), (Kocher et al., 1999), and profiled attacks, such as the statistical template attacks (Chari et al., 2003) and ML SCA attacks. While non-profiled attacks perform an attack in a single phase on a target device, profiled attacks consist of two phases: a profiling phase, to learn a leakage pattern, and an attack phase, to attack with only a few traces. In practice, the two phases operate on different devices. During the profiling stage, the attacker collects traces from a "profiling" device identical to the victim device to build a model. During the attack, this model is then used to recover cryptographic secrets from the victim device.

2.2. ML-SCA Attacks

Template attacks have been shown (Chari et al., 2003) to be capable of recovering secret keys with a small number of traces, making them among the most powerful side-channel attacks. More recently, supervised ML techniques have been used for profiling SCA (Bartkewitz et al., 2013). Among these techniques, DNNs have been one of the most successful, defeating many common countermeasures, such as masking (Gilmore et al., 2015) and clock jitter (Cagli et al., 2017). Table 1 provides a summary of related works on profiling attacks. To date, only one prior work (Carbone et al., 2019) has focused on a cross-device EM ML SCA attack, using only one test device running RSA. Note that this attack required a 3-class DNN (Carbone et al., 2019), whereas the proposed single-trace (averaged) EM-X-DL attack on AES-128 requires a 256-class DNN, and thus the effects of portability across devices are significantly more prominent. Additionally, AES measurements have significantly lower side-channel SNR compared to a public key algorithm such as RSA.

Figure 3. Architecture of the proposed DNN. The network contains 3 dense layers; each dense layer is followed by a ReLU activation function, batch normalization, and finally a dropout layer. The final output layer provides the output class predictions (the key byte), and thus has size 256 and uses a softmax activation function.
Figure 4. Effect of averaging on the test accuracy of the 256-class DNN when using raw traces and PCA-transformed traces. Increasing averaging hardly allows the DNN to learn from the time-domain EM traces. With PCA used as a pre-processing step, averaging up to 20× smoothly increases the test accuracy to >99% for the same device.

3. EM-X-DL SCA ATTACK

This section evaluates the single-trace (averaged) EM-X-DL attack on AES-128 using a 256-class DNN. For profiling the DNN, EM traces are collected from a set of ten training devices (8-bit Atmega microcontrollers) using the ChipWhisperer (O’Flynn and Chen, 2014) platform, specifically the CW-Lite capture board, along with an off-the-shelf H-field sensor (10 mm loop diameter) and a 40 dB wideband amplifier. The efficient selection of the training devices is discussed in the subsequent section. For evaluating the attack, ten different devices are reserved separately and the cross-device (EM-X-DL) accuracy is reported as an average over these ten test devices.

Figure 5. Effect of hyperparameters on both same- and cross-device test accuracy for the PCA-DNN model. (a) Dropout between the first and second hidden layers helps prevent overfitting, maximizing cross-device accuracy at a dropout rate of 0.45. (b) Layer size also demonstrates a similar trend, and reaches maximum cross-device accuracy at ∼1000 for the second hidden layer.

3.1. Effect of EM Probe Choice

The EM probe used to collect both training and testing traces affects the side-channel EM signals recorded. Two probes were considered: a Langer probe with very high spatial sensitivity (100 μm diameter), and a Tekbox probe with low spatial sensitivity (10 mm diameter). To estimate the leakage captured by each probe, test vector leakage assessment (TVLA) (Becker et al., 2013) was used to measure side-channel leakage while scanning over the surface of one of the 8-bit X-MEGA devices under attack. The results of these scans can be seen in Figure 6. As expected, the Langer probe finds high leakage in a very small area, while the larger probe detects leakage over a much larger area of the chip. Additionally, the larger probe detects a much higher level of leakage overall. Since the X-MEGA device is running a software implementation of AES-128, side-channel leakage is not highly localized, as it would be in a hardware implementation, and the high spatial sensitivity of the Langer probe does not provide a large benefit in rejecting algorithmic noise, as there is no single register to target during an attack. For the rest of this work, results are shown from the larger Tekbox probe, as the EM leakage levels are already lower than those of power, and a variety of effects can be investigated more easily at its relatively higher leakage levels: a t-value of 22 with the Tekbox probe vs. a t-value of 8 with the Langer probe, as seen in Figure 6.
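The fixed vs. random TVLA measurement mentioned above is commonly computed as a per-sample Welch t-statistic between two sets of traces. The sketch below illustrates that computation on synthetic data; the function name `welch_t`, the trace counts, the injected offset, and the 4.5 threshold (a widely used TVLA convention, not a value stated in this paper) are assumptions.

```python
import numpy as np

def welch_t(fixed, random_):
    """Per-sample Welch t-statistic between two trace sets (rows = traces)."""
    fixed = np.asarray(fixed, dtype=float)
    random_ = np.asarray(random_, dtype=float)
    mean_diff = fixed.mean(axis=0) - random_.mean(axis=0)
    var_f = fixed.var(axis=0, ddof=1) / fixed.shape[0]
    var_r = random_.var(axis=0, ddof=1) / random_.shape[0]
    return mean_diff / np.sqrt(var_f + var_r)

# Synthetic traces (assumption): pure noise, with one sample made data-dependent.
rng = np.random.default_rng(1)
n_traces, n_samples = 2000, 50
fixed = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
random_ = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
fixed[:, 20] += 0.5  # hypothetical leaky time sample

t_values = welch_t(fixed, random_)
leaky_samples = np.flatnonzero(np.abs(t_values) > 4.5)
peak_sample = int(np.argmax(np.abs(t_values)))
```

For a surface scan, the same statistic would be computed at every grid location, and the maximum |t| over time samples would give one heatmap cell.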

Figure 6. Heatmaps created by performing a fixed vs. random TVLA on a 10×10 grid spanning the surface of the chip. (a) shows results obtained from a Langer ICR HH100-27 probe, while (b) shows results from a Tekbox probe. The leakage patterns of the two probes are quite different, which is not unusual as the Langer probe has a much higher spatial resolution. The Langer probe also measured lower side-channel leakage overall, even at the maximum location.

3.2. DNN Architecture & Training

Figure 3 shows the architecture of the proposed 256-class fully-connected (FC) DNN for the EM-X-DL attack. It should be noted that the EM traces captured using ChipWhisperer are time-synchronized, and hence the use of a convolutional layer is not necessary (Golder et al., 2019). For each trace, 3000 time samples were collected from the 8-bit microcontrollers running AES-128 clocked at 7.37 MHz.

Figure 7. PCA and LDA reach their respective peaks (250 and 10) with relatively few features compared to the size of the original traces (3000). As LDA features are chosen to maximize the class separation, while PCA maximizes variance, LDA is a more efficient technique for this higher dimensional data as it can train the DNN significantly faster.
Figure 8. Effect of the different pre-processing techniques on (a) the DNN training accuracy, (b) the cross-device attack (EM-X-DL) accuracy. While all the pre-processing techniques result in high validation (same-device) accuracy, PCA, LDA, and FFT result in >90% cross-device accuracy, while spectrogram yields 74.6% cross-device accuracy.

The DNN, implemented using TensorFlow (Abadi et al., 2016), has a 3000-neuron input layer, followed by three hidden layers with 100, 1024, and 512 neurons respectively, and finally the 256-neuron output layer. Rectified Linear Unit (ReLU) activation functions are used, along with batch normalization and dropout to improve generalization. The Adam optimizer is used for training, with an initial learning rate of 0.005 that is halved whenever five consecutive training epochs pass without any improvement in validation accuracy. The effect of different hyperparameters is shown in Figure 5. A dropout rate of 0.45 is optimal for the first hidden layer (Figure 5(a)), while 1024 hidden neurons for the second hidden layer (Figure 5(b)) provide the maximum cross-device accuracy without overfitting to the training devices. For all the results that follow, unless otherwise mentioned, the DNN is trained with ten devices for 100 epochs with a batch size of 64.
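The learning-rate rule described above can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the training code used for the reported results: the function name `halve_on_plateau` is hypothetical, and restarting the stall counter after each halving is an assumed detail that the text does not specify.

```python
def halve_on_plateau(val_acc_per_epoch, initial_lr=0.005, patience=5):
    """Return the learning rate in effect at each epoch.

    The rate is halved after `patience` consecutive epochs without a new
    best validation accuracy; the counter then restarts (an assumption).
    """
    lr = initial_lr
    best = float("-inf")
    stale = 0
    schedule = []
    for acc in val_acc_per_epoch:
        schedule.append(lr)
        if acc > best:
            best = acc
            stale = 0
        else:
            stale += 1
            if stale == patience:
                lr *= 0.5
                stale = 0
    return schedule

# Hypothetical validation-accuracy history with a five-epoch plateau.
history = [0.10, 0.20, 0.20, 0.20, 0.20, 0.20, 0.20, 0.25]
rates = halve_on_plateau(history)
```

In Keras, the `ReduceLROnPlateau` callback with `factor=0.5` and `patience=5` offers similar behavior; whether it was used here is not stated.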

When the raw EM traces collected from the ten training devices (100K traces each) are fed to the DNN classifier, the validation accuracy remains low (<1%) even after 100 epochs, although the training accuracy increases. Figure 4 (blue curve) shows the effect of averaging on the same-device (test) accuracy. Even with 20× averaging, the time-domain traces show a test accuracy of <1%, while dimensionality reduction using PCA achieves >99% test accuracy for the same device. For cross-device attacks, the accuracy is lower, only 90% with 20× averaging and PCA. Figure 9 shows this result in terms of SNR, and shows the DNN’s accuracy for lower levels of SNR as well (achieved by lowering the amount of averaging). As expected, the accuracy drops with the SNR, following a similar pattern to Figure 4. Next, we will look into the effect of augmenting traces from ten training devices along with 20× averaging and different pre-processing strategies on the cross-device accuracy. Note that, unless otherwise specified, cross-device accuracy refers to the average key prediction accuracy of the EM-X-DL attack across all the ten test devices.
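To illustrate why averaging raises the SNR, the sketch below averages consecutive groups of 20 synthetic traces and measures the resulting reduction in noise variance. It assumes that all traces in a group carry the same underlying signal (same key and plaintext) and that the noise is independent across traces; the function name `average_groups` and the sinusoidal signal are hypothetical.

```python
import numpy as np

def average_groups(traces, n_avg):
    """Average consecutive groups of n_avg traces (rows) into one trace each."""
    traces = np.asarray(traces, dtype=float)
    n_groups = traces.shape[0] // n_avg
    trimmed = traces[: n_groups * n_avg]  # drop any incomplete final group
    return trimmed.reshape(n_groups, n_avg, -1).mean(axis=1)

# Synthetic data (assumption): one fixed signal shape plus unit Gaussian noise.
rng = np.random.default_rng(2)
signal = np.sin(np.linspace(0.0, 6.28, 100))
noisy = signal + rng.normal(0.0, 1.0, size=(20_000, 100))

averaged = average_groups(noisy, 20)
noise_var_before = (noisy - signal).var()
noise_var_after = (averaged - signal).var()
gain_db = 10.0 * np.log10(noise_var_before / noise_var_after)
```

Under these assumptions the noise variance falls by roughly the averaging factor, i.e. about 10·log10(20) ≈ 13 dB for 20× averaging. Variation between devices is not independent noise of this kind, so averaging does not remove it.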

Figure 9. Effect of side-channel SNR on the test accuracy of the 256-class DNN when using PCA-transformed traces. As expected, at higher SNR levels, the DNN achieves higher levels of cross-device accuracy. Note that the limited SNR range is due to the use of averaging to change SNR, while a majority of the SNR reduction is a result of cross-device variations, as seen in Figure 2.

3.3. Single-Trace Attack with Pre-Processing

In the previous sub-section, it was shown that the averaged time-domain EM traces (100K × 10 devices) do not train the DNN efficiently, while dimensionality reduction techniques like PCA have a significant impact on training the DNN. Here, we study the effects of PCA (Golder et al., 2019) and LDA (Renauld et al., 2011) on the time-domain EM traces, as well as the effects of frequency-domain processing (FFT, spectrogram (Rechberger and Oswald, 2005; Yang et al., 2019)) on the cross-device accuracy.

3.3.1. Dimensionality Reduction using PCA & LDA

Figure 10. Distribution of cross-device accuracy of the 256-class PCA-DNN trained on random subsets of 6 devices. The mean accuracy is 60%; however, depending on the subset, it can vary significantly, between 30% and 75%, highlighting the need for an intelligent selection of the training devices.

PCA transforms the input EM trace samples to their principal sub-space, where individual features maximize the variance, while LDA instead projects the samples onto directions that maximize the inter-class separation. As seen in Figure 7(a, b), the optimal number of features to use in these techniques is much lower than the dimensionality of the raw trace: around 250 in the case of PCA, and a mere 10 in the case of LDA. As shown in Figure 8(a, b), both of these techniques lead to roughly similar cross-device accuracy, about 91%. However, LDA is more efficient, as it requires significantly lower training time (<10×) than PCA to achieve the same level of accuracy.
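A minimal NumPy sketch of the PCA step is shown below: the projection is fitted on training traces only and then applied unchanged to traces from an unseen device. The function names `fit_pca` and `apply_pca` are hypothetical, the random arrays are stand-ins for real traces, and the dimensions are scaled down from the 3000 samples and 250 components discussed above to keep the example fast. LDA is not shown.

```python
import numpy as np

def fit_pca(train, n_components):
    """Return (mean, components) from an SVD of the centered training traces."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(traces, mean, components):
    """Project traces onto the previously fitted principal directions."""
    return (np.asarray(traces, dtype=float) - mean) @ components.T

# Stand-in data (assumption): rows are traces, columns are time samples.
rng = np.random.default_rng(3)
train = rng.normal(size=(500, 300))
test = rng.normal(size=(50, 300))

mean, comps = fit_pca(train, n_components=25)
train_feat = apply_pca(train, mean, comps)
test_feat = apply_pca(test, mean, comps)
```

Reusing the training mean and components for the test device keeps the DNN input space identical in the profiling and attack phases.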

3.3.2. Frequency Domain Analysis using FFT & Spectrogram

Using FFT on the time-domain averaged (20×) EM traces produces an EM-X-DL attack accuracy of ∼91% (Figure 8(b)), which is similar to PCA/LDA. However, it requires higher training time than both PCA and LDA, and hence is not the most efficient approach. Spectrogram combines both time- and frequency-domain information and is naturally two-dimensional. Hence a 2-D CNN (Simonyan and Zisserman, 2015) is used for the spectrogram, which achieves a cross-device accuracy of 74.6% (Figure 8(b)).
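As a rough illustration of FFT-based pre-processing, the sketch below converts real-valued traces to magnitude spectra with NumPy. Using the magnitude of a one-sided FFT as the feature vector is an assumption; the text does not state which FFT output (magnitude, phase, or number of bins) was fed to the DNN. The function name `fft_features` and the test tone are hypothetical.

```python
import numpy as np

def fft_features(traces):
    """Magnitude spectrum of each real-valued trace (rows = traces)."""
    return np.abs(np.fft.rfft(np.asarray(traces, dtype=float), axis=1))

# Synthetic trace (assumption): a tone with exactly 50 cycles in 3000 samples.
n_samples = 3000
t = np.arange(n_samples)
trace = np.sin(2 * np.pi * 50 * t / n_samples)

feats = fft_features(trace[np.newaxis, :])
peak_bin = int(np.argmax(feats[0]))
```

A real-input FFT of 3000 samples yields 1501 frequency bins, so this representation roughly halves the input dimension compared to the raw trace.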

4. EM-X-DL SCA: EFFICIENT SELECTION OF TRAINING DEVICES

Input: Trace samples from all devices: TraceData; number of devices to select: nDev
Output: Subset of size nDev
  for dev = 1 : length(TraceData) do
     μ1 = mean(TraceData[dev][:, POI[1]])
     μ2 = mean(TraceData[dev][:, POI[2]])
     meanMap[dev] = (μ1, μ2)
  end for
  subset = [1]
  candidates = {2, ..., length(TraceData)}
  for i = 1 : nDev − 1 do
     μtrain = mean(meanMap[subset])
     nextDev = argmax over j in candidates of ||μtrain − meanMap[j]||
     subset.add(nextDev)
     candidates.remove(nextDev)
  end for
  return subset
Algorithm 1 Algorithm for Device Selection

As shown in previous works (Das et al., 2019), (Bhasin et al., 2019), the challenge of an ML SCA model accurately classifying traces collected from devices it has not been trained with can be addressed by training with a variety of devices, so that the model does not overfit to the particular leakage pattern of one device. This remains true when using EM traces; however, many more devices are required to reach a high level of cross-device accuracy because, as seen in Figure 2, EM measurements are more sensitive to cross-device variations. Moreover, averaging clearly plays a key role, further increasing the number of traces required. Thus, it is of interest to be able to train using the smallest possible set of devices, reducing both the number of traces needed and the training time for the DNN. For this, two things must hold true: first, the choice of devices must affect the cross-device accuracy for a given number of devices, and second, there must be a way of determining whether or not to include a device for training from a small sample of traces.

4.1. Cross-Device Accuracy Variance

To address the first point, the effect of the subset, the EM-X-DL model is trained with a random subset of six devices, then tested against all the remaining fourteen devices. As shown in Figure 10, the average cross-device accuracy can vary greatly even for a set of only six devices, with accuracy ranging from 10% to 75% for different six-device combinations. This shows that there are subsets of training devices that can improve accuracy rather than simply adding more devices. However, as there are a large number of possible subsets for a given size, an algorithm is necessary to choose one such subset which results in high cross-device accuracy. Such an algorithm would then enable an attacker to gather quick measurements from a large set of devices, and determine a small subset of devices to collect a large number of traces from, for training the DNN model.
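To put the size of the search space in numbers: six training devices plus fourteen remaining test devices imply a pool of 20 devices, so the number of distinct six-device subsets can be counted directly. The variable names below are illustrative.

```python
from math import comb

# 6 training devices + 14 remaining test devices = 20 devices in total.
n_devices, subset_size = 20, 6
n_subsets = comb(n_devices, subset_size)
```

This gives 38,760 candidate subsets, each of which would require a full DNN training run to evaluate exhaustively.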

Figure 11. Bivariate analysis of the first 3 devices chosen by Algorithm 1. The top three chosen devices already span a large portion of the distribution containing all devices.

4.2. Bivariate POI Based Device Selection

The proposed algorithm begins by identifying two points of interest (POIs) in the traces. This can be done through any POI identification technique; here, POIs are chosen as the time samples which have the highest difference of means (DOM). Once the top two POIs are found, the mean $\boldsymbol{\mu_{i}} = (\mu_{POI1}, \mu_{POI2})$ of this POI pair is calculated across all traces for each device $i$. Then, to construct the subset of devices for training, one device is initially chosen arbitrarily, and additional devices are added as follows. The mean POI pair of all devices currently included in the training subset, $\boldsymbol{\mu_{train}}$, is calculated. Then, the next device is chosen such that $||\boldsymbol{\mu_{i}} - \boldsymbol{\mu_{train}}||_{2}$ is maximized, where $i$ varies over all devices not already included in the training subset. In this way, at each step, the device whose top two average POIs are furthest from the average POIs of the currently selected devices is added to the training set. This method is detailed in Algorithm 1.
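The selection procedure described above can be sketched in Python as follows. This is an illustrative reading of Algorithm 1, not the authors' code: the function name `select_devices`, the zero-based device indices, and the noise-free toy devices are assumptions, and the two POI indices are taken as given rather than derived by DOM.

```python
import numpy as np

def select_devices(trace_data, pois, n_dev, first=0):
    """Greedy farthest-from-mean selection over per-device POI means.

    trace_data: list of (n_traces, n_samples) arrays, one per device.
    pois: the two time-sample indices forming the POI pair.
    first: index of the arbitrarily chosen starting device.
    """
    # Per-device mean of the POI pair, shape (n_devices, 2).
    mean_map = np.array([np.asarray(d, dtype=float)[:, pois].mean(axis=0)
                         for d in trace_data])
    subset = [first]
    remaining = [j for j in range(len(trace_data)) if j != first]
    while len(subset) < n_dev and remaining:
        mu_train = mean_map[subset].mean(axis=0)
        dists = np.linalg.norm(mean_map[remaining] - mu_train, axis=1)
        next_dev = remaining[int(np.argmax(dists))]
        subset.append(next_dev)
        remaining.remove(next_dev)
    return subset

# Toy devices (assumption): constant POI values at samples 2 and 5, no noise.
offsets = [(0, 0), (1, 0), (5, 5), (-4, -4), (0.5, 0.5)]
devices = []
for ox, oy in offsets:
    d = np.zeros((10, 8))
    d[:, 2] = ox
    d[:, 5] = oy
    devices.append(d)

chosen = select_devices(devices, pois=[2, 5], n_dev=3)
```

With these toy offsets the routine starts from device 0, then adds device 2 and device 3, the two whose POI means lie farthest from the running subset mean. Because only two samples per trace are averaged, the ranking can be computed from a small number of traces per device.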

Figure 12. Depending on the choice of devices used for training, cross-device accuracy varies significantly. Choosing “dissimilar” devices by algorithm 1 gives high accuracy, while choosing “similar” training devices yields a low cross-device accuracy. Randomly selecting devices shows slightly higher test accuracies than choosing “similar” devices.
Figure 13. (a) 10×10 virtual grid overlay of the chip. (b, c) Comparison of EM-X-DL model accuracy to CEMA-MTD. The ML model is able to predict with high accuracy in the region of the chip with low MTD values; however, when the MTD rises above 250, the model is unable to correctly predict the key values. (d) EM-X-DL model predictions on 20 samples from a high leakage location (1,1), and a low leakage location (9, 4) on a test device. At a location with high leakage, the frequency of the highest predicted key byte value is distinguishable from the next, demonstrating the high confidence of the attacker.

Figure 11 shows the 2-D bivariate normal distribution of the first three devices chosen using this algorithm, along with the total distribution of all devices. With these three devices, a large portion of the distribution spanned by all the devices is covered, revealing the successful operation of the algorithm. Importantly, this algorithm also provides the desired results during training, shown in Figure 12, as using this algorithm to choose the training devices gives higher cross-device (EM-X-DL) accuracy for any number of devices. Additionally, training with the devices closest to the current training set, as opposed to the furthest away, results in cross-device accuracy significantly lower than with the maximally different devices, and generally lower than with randomly selected devices as well. These results were obtained with the proposed 256-class DNN, using 20× averaging and PCA-based pre-processing. From Figure 12, we also see that to attain a certain cross-device accuracy, this algorithm requires 20% to 40% fewer training devices compared to random device selection.

5. EM LEAKAGE ASSESSMENT & ATTACK

Once the DNN model for the EM-X-DL SCA is trained, the main goal of an attacker is to recover the secret key with the minimum number of traces from an identical but unseen target device. This section demonstrates an end-to-end attack strategy using the EM-X-DL model on a new device. By scanning the surface of the victim microcontroller and collecting traces at each point (seen in Figure 13(a)), the heatmap in Figure 13(b) was created by classifying the traces and determining the test accuracy for each point. As all training traces were collected from the same location (the one with maximum leakage on the chip, evaluated using test vector leakage assessment (TVLA)), the accuracy is, as expected, highest in this region, and drops off sharply further from the measurement point. Figure 13(c) shows the minimum traces to disclosure (MTD) from a CEMA attack over the same chip. Comparing this to the accuracy heatmap shows that the ML model can correctly classify traces that are collected from a location which has an MTD less than 250.

Now, in this virtual grid, to converge to the best location for the EM-X-DL attack on the new device, the attacker can query the EM-X-DL model with multiple averaged traces collected from the test device and observe whether the frequency of the highest predicted key byte is distinguishable from the next. Should leakage be present, the correct key byte would be predicted more often than others. If leakage is not present, predictions would be split between several key values. Thus, the ratio between the first and second most commonly predicted values provides a measure of the attacker’s confidence in the prediction. This effect is shown in Figure 13(d), which shows the five most common predictions for both a location of high leakage, (1,2) (left), and a location of low leakage, (2,9) (right). Note that, with this prior knowledge of the heatmap, the attacker can also divide the chip into 4 quadrants (for this particular chip) and get the correct key from the leftmost quadrant with very high confidence.
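The confidence measure described above, the ratio between the counts of the most and second most frequently predicted key bytes, can be sketched as follows. The function name `prediction_confidence` and the two lists of 20 predictions are hypothetical; in practice the predictions would come from querying the trained model with averaged traces from one probe location.

```python
from collections import Counter

def prediction_confidence(predicted_bytes):
    """Return (top key byte, ratio of top count to runner-up count)."""
    ranked = Counter(predicted_bytes).most_common(2)
    top_byte, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    ratio = float("inf") if runner_up == 0 else top_count / runner_up
    return top_byte, ratio

# Hypothetical model outputs for 20 averaged traces at two probe locations.
high_leak = [0x2B] * 17 + [0x7E, 0x15, 0x16]
low_leak = [0x2B, 0x7E, 0x15, 0x16, 0x28] * 4

high_byte, high_ratio = prediction_confidence(high_leak)
low_byte, low_ratio = prediction_confidence(low_leak)
```

A ratio well above 1 suggests a location with usable leakage, while a ratio near 1 indicates that the predictions are spread across several key values.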

Preprocessing    Cross-Device Accuracy (%)
Technique        Minimum   Average   Maximum
Time Domain         0.28      0.37      0.45
PCA                81.27     90.72     96.77
LDA                81.21     91.52     96.42
FFT                82.40     91.07     95.50
Spectrogram        30.53     74.58     94.02
Table 2. Cross-Device Attack Performance of Deep Learning-based Methods for Different Pre-Processing Techniques

6. REMARKS & CONCLUSION

This work showed a Cross-device Deep Learning based EM (EM-X-DL) SCA attack on a symmetric key encryption engine (AES-128) in a low-SNR setting. Utilizing a 256-class DNN and averaged EM traces from 10 training devices, along with dimensionality-reduction based pre-processing (like LDA), the model achieves ∼91.5% EM-X-DL single-trace (averaged) attack accuracy against another set of ten test devices. Table 2 summarizes the EM-X-DL attack accuracy for each of the different techniques studied in this article. An algorithm for efficient selection of training devices is proposed to speed up the profiling phase. Finally, an end-to-end attack using EM scanning is demonstrated, showing that the attacker can use the proposed EM-X-DL model to detect the position of highest leakage on the chip and recover the secret key with high confidence.

As future work, the end-to-end EM-X-DL attack can be further generalized by capturing traces from multiple locations across the chip, rather than a single location, for training the DNN. This would make the EM-X-DL attack much more efficient and faster, as the attacker would be able to extract the key without having to detect one of the highest leakage locations on the chip.

Additionally, it was shown that the SNR of the traces used for training and testing has, as expected, a strong impact on the accuracy of the resulting DNN. This encourages the development of countermeasures focused on reducing the SNR of side-channel signals, such as (Das et al., 2019). While such countermeasures can fundamentally always be defeated by collecting additional traces, reducing the SNR significantly makes collecting a sufficient number of traces from a large enough variety of devices infeasible.

References

  • Abadi et al. (2016) Martín Abadi et al. 2016. TensorFlow: A system for large-scale machine learning. (2016), 21.
  • Agrawal et al. (2003) Dakshi Agrawal et al. 2003. The EM Side-Channel(s). In CHES 2002. 29–45. https://doi.org/10.1007/3-540-36400-5_4
  • Becker et al. (2013) George Becker, J Cooper, Elke DeMulder, Gilbert Goodwill, Joshua Jaffe, G Kenworthy, T Kouzminov, A Leiserson, M Marson, Pankaj Rohatgi, et al. 2013. Test vector leakage assessment (TVLA) methodology in practice. In International Cryptographic Module Conference, Vol. 1001. 13.
  • Bhasin et al. (2019) Shivam Bhasin et al. 2019. Mind the Portability: A Warriors Guide through Realistic Profiled Side-channel Analysis. http://eprint.iacr.org/2019/661
  • Brier et al. (2004) Eric Brier, Christophe Clavier, and Francis Olivier. 2004. Correlation Power Analysis with a Leakage Model. In CHES 2004. 16–29.
  • Cagli et al. (2017) Eleonora Cagli et al. 2017. Convolutional Neural Networks with Data Augmentation Against Jitter-Based Countermeasures. In CHES 2017.
  • Carbone et al. (2019) Mathieu Carbone et al. 2019. Deep Learning to Evaluate Secure RSA Implementations. (2019), 132–161. https://doi.org/10.13154/tches.v2019.i2.132-161
  • Chari et al. (2003) Suresh Chari, Josyula R. Rao, and Pankaj Rohatgi. 2003. Template Attacks. In CHES 2002. 13–28. https://doi.org/10.1007/3-540-36400-5_3
  • Choudary and Kuhn (2018) Marios O. Choudary and Markus G. Kuhn. 2018. Efficient, Portable Template Attacks. 13, 2 (2018), 490–501. https://doi.org/10.1109/TIFS.2017.2757440
  • Das et al. (2019) D. Das, M. Nath, B. Chatterjee, S. Ghosh, and S. Sen. 2019. STELLAR: A Generic EM Side-Channel Attack Protection through Ground-Up Root-cause Analysis. In 2019 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). IEEE Computer Society, Los Alamitos, CA, USA, 11–20. https://doi.org/10.1109/HST.2019.8740839
  • Das et al. (2019) Debayan Das et al. 2019. X-DeepSCA: Cross-Device Deep Learning Side Channel Attack. In DAC 2019. ACM, 134:1–134:6. https://doi.org/10.1145/3316781.3317934
  • Bartkewitz et al. (2013) T. Bartkewitz et al. 2013. Efficient Template Attacks Based on Probabilistic Multi-class Support Vector Machines. In Smart Card Research & Advanced Applications.
  • Gilmore et al. (2015) R. Gilmore et al. 2015. Neural network based attack on a masked implementation of AES. In HOST 2015. 106–111. https://doi.org/10.1109/HST.2015.7140247
  • Golder et al. (2019) Anupam Golder et al. 2019. Practical Approaches Towards Deep-Learning Based Cross-Device Power Side Channel Attack.
  • Kocher et al. (1999) Paul Kocher, Joshua Jaffe, and Benjamin Jun. 1999. Differential Power Analysis. In CRYPTO’ 99. 388–397.
  • Kocher (1996) Paul C. Kocher. 1996. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In CRYPTO ’96. https://doi.org/10.1007/3-540-68697-5_9
  • Lerman et al. (2014) Liran Lerman et al. 2014. Power analysis attack: An approach based on machine learning. 3 (2014). https://doi.org/10.1504/IJACT.2014.062722
  • Maghrebi et al. (2016) Houssem Maghrebi et al. 2016. Breaking Cryptographic Implementations Using Deep Learning Techniques. http://eprint.iacr.org/2016/921
  • Mangard ([n. d.]) Stefan Mangard. [n. d.]. Hardware Countermeasures against DPA – A Statistical Analysis of Their Effectiveness. In Topics in Cryptology – CT-RSA 2004 (2004) (Lecture Notes in Computer Science), Tatsuaki Okamoto (Ed.). Springer Berlin Heidelberg, 222–235.
  • Montminy et al. (2013) David P. Montminy et al. 2013. Improving cross-device attacks using zero-mean unit-variance normalization. (2013). https://doi.org/10.1007/s13389-012-0038-y
  • O’Flynn and Chen (2014) Colin O’Flynn and Zhizhang Chen. 2014. ChipWhisperer: An Open-Source Platform for Hardware Embedded Security Research. In COSADE 2014. 243–260.
  • Prouff et al. (2018) Emmanuel Prouff et al. 2018. Study of Deep Learning Techniques for Side-Channel Analysis and Introduction to ASCAD Database. http://eprint.iacr.org/2018/053
  • Rechberger and Oswald (2005) Christian Rechberger and Elisabeth Oswald. 2005. Practical Template Attacks. In Information Security Applications. https://doi.org/10.1007/978-3-540-31815-6_35
  • Renauld et al. (2011) Mathieu Renauld et al. 2011. A Formal Study of Power Variability Issues and Side-Channel Attacks for Nanoscale Devices. In EUROCRYPT 2011 (Lecture Notes in Computer Science). Springer, 109–128. https://doi.org/10.1007/978-3-642-20465-4_8
  • Simonyan and Zisserman (2015) Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. (2015). http://arxiv.org/abs/1409.1556
  • Yang et al. (2019) Guang Yang et al. 2019. Convolutional Neural Network Based Side-Channel Attacks in Time-Frequency Representations. In Smart Card Research and Advanced Applications. 1–17. https://doi.org/10.1007/978-3-030-15462-2_1