Improved (Related-key) Differential-based Neural Distinguishers for SIMON and SIMECK Block Ciphers
Abstract
In CRYPTO 2019, Gohr made a pioneering attempt and successfully applied deep learning to the differential cryptanalysis of the NSA block cipher Speck32/64, achieving higher accuracy than pure differential distinguishers. By its very nature, mining effective features from data plays a crucial role in data-driven deep learning. In this paper, in addition to preserving the complete information of the ciphertext pairs in the training data, domain knowledge about the structure of differential cryptanalysis is also incorporated into the training process of deep learning to improve the performance. Meanwhile, taking the performance of the differential-neural distinguisher of Simon32/64 as an entry point, we investigate the impact of the input difference on the performance of the hybrid distinguishers in order to choose a proper input difference. Eventually, we improve the accuracy of the neural distinguishers of Simon32/64, Simon64/128, Simeck32/64, and Simeck64/128. We also obtain related-key differential-based neural distinguishers on round-reduced versions of Simon32/64, Simon64/128, Simeck32/64, and Simeck64/128 for the first time.
keywords:
Deep Learning; (Related-key) Differential Distinguisher; Simon; Simeck; Input Difference

1 Introduction
The security analysis of many cryptographic primitives (such as pseudo-random number generators, hash functions, etc.) is usually attributed to attacks on the underlying block ciphers. Various cryptanalytic methods have been proposed over the past few decades, including differential cryptanalysis [1], linear cryptanalysis [2], integral cryptanalysis [3], zero-correlation linear cryptanalysis [4], etc. A block cipher must be able to resist all known cryptanalysis to obtain a strong security statement. In recent years, solver-based automatic tools and dedicated heuristic search algorithms have been extensively adopted to improve the accuracy and efficiency in cryptanalysis of block ciphers, where the cryptanalytic models are often transformed into MILP problems [5, 6], SAT/SMT problems [7, 8] or CP problems [9, 10]. Automatic search technology has improved the analysis ability of block ciphers. The improvement and development of these automatic search technologies provide an inexhaustible source of thought for the design and analysis of block ciphers. However, these search technologies do not extract any new features that are not available manually. Therefore, once optimal distinguishers are obtained, these automatic tools would exert less influence in improving attacks.
Recently, driven jointly by big data and the availability of computing hardware, deep learning [11, 12] has made remarkable progress and spread over almost every field of science and technology. Some researchers explored the feasibility of applying machine learning to the field of cryptography. In ASIACRYPT 1991, Rivest [13] made preliminary explorations of the possible connection between cryptography and machine learning, and some researchers successfully applied machine learning in side-channel analysis, such as [14, 15]. However, few researchers focused on the application of machine learning to black-box cryptanalysis, until the process of applying deep learning to black-box cryptanalysis was accelerated by the remarkable work of Gohr [16].
Deep learning algorithms can analyze data and learn effective patterns for predicting new samples. Based on this, Gohr trained a deep neural network using labeled (labels 0 and 1) ciphertext pairs as training data, where the data with label 1 comes from encrypted plaintext pairs with a fixed input difference, and the data with label 0 comes from random values. The trained neural network is then used to distinguish real ciphertext pairs from random pairs. When his network is applied to Speck32/64, it achieves higher accuracy than the classical differential (CD). Although the number of rounds covered by his network has not yet surpassed that reached by the most advanced classical techniques, for the same number of rounds the neural distinguisher (ND) exploits information that the CD has not tapped.
More importantly, a potent key recovery attack is created by combining NDs with CDs and highly selective key search strategies. In essence, the NDs are too short to be used in key recovery and must be prepended with CDs to get the hybrid distinguishers (HDs). Making the resulting HDs usable in a key recovery attack requires better NDs or prepended CDs. Researchers have provided solutions from various angles. Benamira et al. [17] analyzed and explained the inner workings of Gohr’s neural network and enhanced the accuracy of the NDs by creating batches of ciphertext inputs instead of pairs. Bao et al. [18] enhanced the CD’s neutral bits and trained better NDs by investigating different neural networks, enabling key recovery attacks for the 13-round Speck32/64 and 16-round Simon32/64.
Our contribution:
• In this paper, we present (related-key) differential-based neural distinguishers on the Simon and Simeck block ciphers. To better match our neural network and increase the accuracy of the neural distinguisher, we adopt multiple ciphertext pairs (8 ciphertext pairs per sample) in a new data format to train the neural network. Fig. 1 shows a schematic representation of these notations. Also, we employ the SE-ResNet network (Fig. 2) due to the success of ResNet on Speck [16] and SENet on Simon [18], as well as their superior performance on classification tasks.

• We notice that the choice of the ND's input difference, i.e., the connecting difference, is critical for obtaining the best hybrid distinguishers. Therefore, taking the performance of the differential-neural distinguisher of Simon32/64 as an entry point, we investigate the impact of the input difference on the performance of the hybrid distinguishers in order to choose a proper input difference. As a result, an input difference of the form (0, e_i), i.e., a zero left branch and a single active bit in the right branch, is a good choice for building hybrid distinguishers for Simon-like ciphers.

• Eventually, we build neural distinguishers for Simon32/64, Simon64/128, Simeck32/64 and Simeck64/128. The results are shown in Table 1, which shows that we improve the accuracy of the distinguishers. Meanwhile, we successfully construct related-key neural distinguishers against Simon32/64, Simon64/128, Simeck32/64 and Simeck64/128 for the first time.
In this paper, the experiments are conducted with Python 3.6.10 on Ubuntu 18.04. The models are implemented in TensorFlow 2.5.0. The experiments use a server with four Intel(R) Xeon(R) Gold 6248 CPUs at 2.50 GHz, 512 GB RAM, and an NVIDIA Tesla T4 16 GB. The source code is available on GitHub: https://github.com/JIN-smile/Improved-Related-key-Differential-based-Neural-Distinguishers.
Ciphers | Attack Model | Round | Input difference | Accuracy | TPR | TNR | Source
---|---|---|---|---|---|---|---
Simon 32/64 | ND | 9† | (0x0,0x40) | 0.8940 | 0.8728 | 0.9152 | [18]
 | | 9 | (0x0,0x40) | 0.9176 | 0.9052 | 0.9299 | Sect. 5
 | | 10*† | (0x0,0x40) | 0.6865 | 0.6817 | 0.6912 | [18]
 | | 10 | (0x0,0x40) | 0.6975 | 0.6662 | 0.7287 | Sect. 5
 | | 11*† | (0x0,0x40) | 0.5568 | 0.5419 | 0.5717 | [18]
 | | 11 | (0x0,0x40) | 0.5609 | 0.5366 | 0.5852 | Sect. 5
 | | 12 | (0x1,0x4) | 0.5152 | 0.4799 | 0.5505 | Sect. 5
 | | 12* | (0x0,0x40) | 0.5142 | 0.5029 | 0.5254 | Sect. 5
 | RKND | 10 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 1 | 1 | 1 | Sect. 5
 | | 11 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.9604 | 0.9639 | 0.9569 | Sect. 5
 | | 12 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.6477 | 0.6518 | 0.6435 | Sect. 5
 | | 13 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.5262 | 0.5437 | 0.5081 | Sect. 5
Simeck 32/64 | ND | 9 | (0x0,0x40) | 0.9952 | 0.9989 | 0.9914 | Sect. 6
 | | 10 | (0x0,0x40) | 0.7354 | 0.7207 | 0.7501 | Sect. 6
 | | 11 | (0x0,0x40) | 0.5646 | 0.5356 | 0.5936 | Sect. 6
 | | 12* | (0x0,0x40) | 0.5146 | 0.4770 | 0.5522 | Sect. 6
 | RKND | 13 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.9950 | 0.9990 | 0.9910 | Sect. 6
 | | 14 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.6679 | 0.6425 | 0.6933 | Sect. 6
 | | 15 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.5467 | 0.5173 | 0.5762 | Sect. 6
Simon 64/128 | ND | 11 | (0x0,0x40) | 0.9181 | 0.9045 | 0.9318 | Sect. 5
 | | 12 | (0x0,0x40) | 0.7117 | 0.6705 | 0.7530 | Sect. 5
 | | 13 | (0x0,0x40) | 0.5722 | 0.5230 | 0.6215 | Sect. 5
 | | 14 | (0x0,0x40) | 0.5148 | 0.4697 | 0.5600 | Sect. 5
 | | 14* | (0x0,0x40) | 0.5185 | 0.4663 | 0.5707 | Sect. 5
 | RKND | 12 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.9880 | 0.9894 | 0.9865 | Sect. 5
 | | 13 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.8398 | 0.8389 | 0.8408 | Sect. 5
 | | 14 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.5788 | 0.5894 | 0.5682 | Sect. 5
Simeck 64/128 | ND | 14 | (0x0,0x40) | 0.9142 | 0.8914 | 0.9371 | Sect. 6
 | | 15 | (0x0,0x40) | 0.7663 | 0.6981 | 0.8345 | Sect. 6
 | | 16 | (0x0,0x40) | 0.6356 | 0.5245 | 0.7467 | Sect. 6
 | | 17 | (0x0,0x40) | 0.5577 | 0.4301 | 0.6853 | Sect. 6
 | | 18 | (0x0,0x40) | 0.5202 | 0.3917 | 0.6486 | Sect. 6
 | | 18* | (0x0,0x40) | 0.5218 | 0.3927 | 0.6510 | Sect. 6
 | RKND | 18 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.9066 | 0.8837 | 0.9295 | Sect. 6
 | | 19 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.7558 | 0.6845 | 0.8270 | Sect. 6
 | | 20 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.6229 | 0.5104 | 0.7354 | Sect. 6
 | | 21 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.5519 | 0.4248 | 0.6790 | Sect. 6
 | | 22 | (0x0,0x40), (0x0,0x0,0x0,0x40) | 0.5180 | 0.3906 | 0.6455 | Sect. 6
Organization.
Section 2 recalls Simon-like ciphers, (related-key) differential cryptanalysis and CNN networks. Section 3 introduces the improved (related-key) differential-based neural distinguishers, including the use of multiple ciphertext pairs with a new data format and the network architecture. Section 4 compares the performance of the hybrid distinguishers with different input differences. Section 5 gives the (related-key) differential-neural distinguishers for round-reduced Simon32/64 and Simon64/128. Section 6 provides the (related-key) differential-neural distinguishers for round-reduced Simeck32/64 and Simeck64/128. Section 7 concludes this paper.
2 Related works
2.1 Notations
Table 2 presents the notations used in this paper.
Notation | Description
---|---
x | Binary vector of n bits; x[i] is the bit in position i, with x[0] the least significant one.
x ∧ y | Bitwise AND between x and y.
x ⊕ y | Bitwise XOR between x and y.
x ∥ y | Concatenation of x and y.
x ⋘ r | Circular left shift of x by r bits.
x ⋙ r | Circular right shift of x by r bits.
{(Pl, Pr), (Pl′, Pr′)} | A set of plaintext pairs with left and right branches, where P = Pl ∥ Pr and P′ = Pl′ ∥ Pr′.
{(Cl, Cr), (Cl′, Cr′)} | A set of ciphertext pairs with left and right branches, where C = Cl ∥ Cr and C′ = Cl′ ∥ Cr′.
2.2 A Brief Description of Simon and Simeck Ciphers
Simon. The lightweight family of AND-RX block ciphers Simon was proposed by the National Security Agency (NSA) in 2013. It adopts the Feistel structure, and the round function is composed of bitwise AND (∧), bitwise XOR (⊕) and circular left shift (⋘) operations. The designers provide ten versions, denoted Simon2n/mn, where 2n represents the block size, mn represents the key length, n ∈ {16, 24, 32, 48, 64} and m ∈ {2, 3, 4}. The round function of the Simon algorithm is defined as:

R_k(x, y) = (y ⊕ f(x) ⊕ k, x), where f(x) = ((x ⋘ 8) ∧ (x ⋘ 1)) ⊕ (x ⋘ 2),

with (x, y) the state entering the round and k the round key.
The round keys are generated from the master key through a linear key schedule. A more complete description can be found in paper [19].
Simeck. The Simeck family of lightweight block ciphers was designed by Yang et al. [20], aiming at improving the hardware implementation cost of Simon. Simeck2n/4n denotes an instance with a 2n-bit block and a 4n-bit key, for n ∈ {16, 24, 32}. The round function of the Simeck algorithm is defined as:

R_k(x, y) = (y ⊕ f(x) ⊕ k, x), where f(x) = (x ∧ (x ⋘ 5)) ⊕ (x ⋘ 1).
In contrast, Simeck uses a non-linear key schedule that reuses the cipher's round function to generate the round keys. A more complete description can be found in [20].
Simon-like ciphers. Iterated ciphers that use Simon's round function generalized to arbitrary rotational parameters are known as Simon-like ciphers. The Simon-like round function is f_{(a,b,c)}(x) = ((x ⋘ a) ∧ (x ⋘ b)) ⊕ (x ⋘ c), where the rotational parameters (a, b, c) are (8, 1, 2) for all Simon versions and (5, 0, 1) for all Simeck versions.
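To make the round structure concrete, the following is a minimal Python sketch of the Simon-like round function and a multi-round encryption loop under given round keys; the key schedules are omitted, and the function names are ours.

```python
# Minimal sketch of the Simon-like round function (key schedules omitted).
# Rotation parameters (8, 1, 2) give Simon; (5, 0, 1) give Simeck.
def rol(x, r, n=16):
    """Circular left shift of an n-bit word x by r bits."""
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1) if r else x

def simon_like_round(l, r, k, params=(8, 1, 2), n=16):
    """One Feistel round: (l, r) -> (r ^ f(l) ^ k, l)."""
    a, b, c = params
    f = (rol(l, a, n) & rol(l, b, n)) ^ rol(l, c, n)
    return r ^ f ^ k, l

def encrypt(l, r, round_keys, params=(8, 1, 2), n=16):
    """Encrypt one block under a given sequence of round keys."""
    for k in round_keys:
        l, r = simon_like_round(l, r, k, params, n)
    return l, r
```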
2.3 (Related-key) Differential Cryptanalysis
Differential cryptanalysis is a chosen-plaintext attack introduced by Biham and Shamir in [1]. It analyzes the effect of the difference of a plaintext pair on the difference of succeeding round outputs in an iterated cipher. Differential cryptanalysis is a widely used tool for the cryptanalysis of encryption algorithms and the development of new attacks due to its generality. Resistance to differential cryptanalysis became one of the basic criteria in the evaluation of the security of block ciphers.
Definition 2.1 (Difference).
[1] Let x and x′ be two bit strings of length n; then the difference between x and x′ is defined as Δx = x ⊕ x′.
Definition 2.2 (Differential Pair).
[1] Let α and β be n-bit vectors and let F be a round function. If the difference of an input pair of the block cipher is α and, after r rounds of encryption, the difference of the output pair is β, then (α, β) is called an r-round differential (pair) of the block cipher, where α is the input difference and β is the output difference. In particular, when r = 1, (α, β) characterizes the differential propagation behaviour of the round function F.
For a specific cipher, the differential must be carefully selected to make the differential attack successful. This requires researchers to study the internal structure of the algorithm. The basic method is to track the path followed by a high-probability difference through the different stages of encryption. This is called a differential characteristic in cryptography and is defined as follows.
Definition 2.3 (Differential Characteristics).
[1] Let β_0, β_1, …, β_r be n-bit constants. If the difference of the input pair satisfies β_0 and the difference of the intermediate state after the i-th round of encryption satisfies β_i for i = 1, …, r, then (β_0, β_1, …, β_r) is called an r-round differential characteristic of the iterative block cipher.
For a given differential characteristic, the following definition is used to calculate its probability.
Definition 2.4.
[1] The probability of an r-round differential characteristic (β_0, β_1, …, β_r) of an iterative block cipher refers, under the assumption that the input and the round keys are independent and uniformly distributed, to the probability that when the difference of the input pair is β_0, the difference of the intermediate state after round i satisfies β_i for every i = 1, …, r. Under the above assumption, the probability of the differential characteristic is equal to the product of the differential propagation probabilities of the individual rounds, i.e.:

Pr(β_0 → β_1 → ⋯ → β_r) = ∏_{i=1}^{r} Pr(β_{i-1} → β_i).
When the input difference undergoes a linear operation, it propagates through the operation with probability 1 and the output difference is deterministic; this is the case for XOR (⊕) and cyclic shifts (⋘, ⋙) among the ARX operations. When the input difference passes through a non-linear operation, the difference propagation is usually probabilistic.
Related-key differential cryptanalysis was introduced by Biham in [21]. Unlike single-key differentials, which have differences only in the plaintexts, related-key differential distinguishers have differences in the master keys as well. It exploits the output differences obtained from a pair of plaintexts P and P′ encrypted under a pair of related keys K and K′, respectively. Related-key differential cryptanalysis is also one of the basic criteria in the evaluation of the security of block ciphers, and it has successfully attacked many block ciphers, such as [22, 23, 24].
2.4 Convolutional Neural Network
Convolutional neural network (CNN) is an important paradigm in deep learning. CNN is usually composed of the convolutional layer, non-linear layer, pooling layer and fully connected layer. According to the convolution dimension of the feature map, it can be divided into one-, two-, and three-dimensional convolutional neural network (i.e., 1D-CNN, 2D-CNN and 3D-CNN), where the 1D-CNN applies a convolution over a fixed (multi-)temporal input signal.
Convolution Layer (CONV). Convolution is the basic operation of CNN, and its main purpose is to extract features. The core task of CNN is to learn parameters to extract effective patterns. In the forward propagation, the training data will go through the convolution kernel with initial parameters to obtain the initial output. In the back propagation, a loss function will be applied to adjust the parameters to minimize the gap between the initial output and the target label. After several iterations, when the loss stabilizes, the training process will be finished. Note that in this paper we apply 1D-CNN, then the convolution layer can be denoted by Conv1D.
Non-linear layer. The main purpose of the non-linear layer is to introduce non-linear characteristics into the system. The most common non-linear layer in a CNN is the rectified linear unit (ReLU) function, defined as ReLU(x) = max(0, x). Effectively, it removes negative values from an activation map by setting them to zero. It increases the nonlinear properties of the decision function and of the overall network without affecting the receptive fields of the convolution layer. Other functions are also used to increase nonlinearity, such as the sigmoid function. ReLU is often preferred because it trains the neural network several times faster without a significant penalty to generalization accuracy.
Fully connected layer (FC). The fully connected layer is generally located in the back layers of the network for performing the classification task. Usually, the input of the fully connected layer is the flatten feature map generated by convolution layer.
In addition, some functional layers may be used in CNN. For example, Batch Normalization (BN) can be applied after the convolution layer to reduce the internal covariate shift, which can effectively prevent the gradient disappearance problem and speed up network training.
Residual Network (ResNet) is one of the most representative CNNs, proposed by He et al. [25] in 2015. ResNet can train a deeper CNN model to achieve higher accuracy. The core idea is to establish “shortcut (skip) connections” between the front layers and the back layers. It is composed of a series of residual blocks. A residual block can be expressed as

x_{l+1} = x_l + F(x_l, W_l).

It is divided into two parts: the direct (identity) mapping part x_l and the residual part F(x_l, W_l), which is generally composed of two or three convolution operations. The ReLU activations and BN layers can be rearranged to create a variety of residual block variants.
Squeeze-and-Excitation Network (SENet) is a new network structure proposed by Hu et al. that won the first place in ILSVRC 2017 classification competition [26]. The “Squeeze-and-Excitation” (SE) block adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. It can be integrated into standard architectures by insertion after the non-linearity following each convolution. In this paper, SE block is used directly with the residual network, i.e., the SE-ResNet network.
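The following is a minimal Keras sketch of one SE-ResNet module of the kind described above, i.e., a 1D residual block whose output is recalibrated by an SE block; the filter count, kernel size and reduction ratio are illustrative values, not necessarily those used in our experiments.

```python
# Sketch of an SE block inside a 1D residual block (illustrative sizes).
from tensorflow.keras import layers

def se_block(x, ratio=16):
    filters = x.shape[-1]
    s = layers.GlobalAveragePooling1D()(x)                   # squeeze
    s = layers.Dense(filters // ratio, activation='relu')(s)
    s = layers.Dense(filters, activation='sigmoid')(s)       # excitation
    s = layers.Reshape((1, filters))(s)
    return layers.Multiply()([x, s])                         # channel recalibration

def se_resnet_block(x, filters=32, kernel_size=3):
    shortcut = x                                             # assumes x already has `filters` channels
    y = layers.Conv1D(filters, kernel_size, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv1D(filters, kernel_size, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = se_block(y)
    return layers.Add()([shortcut, y])
```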
3 Improved (Related-key) Differential-based Neural Distinguishers
3.1 Dataset: Multiple Ciphertext Pairs with a New Data Format
Data plays a very important role in deep learning, and data preparation is a fundamental step of deep-learning model development. Some researchers explored the use of multiple ciphertext pairs to improve the performance of differential-based neural distinguishers [17, 27, 28]. Some researchers also performed additional transformations on each pair of ciphertexts before feeding it into the network. Concretely, in Gohr's work, the r-round NDs are fed with the raw ciphertext pairs. Subsequently, Benamira et al. [17] conjectured that the first convolution layer of Gohr's neural network transforms this input into ciphertext differences and a linear combination of those terms. In [28], Hou et al. designed ND models that take multiple output differences as one sample, i.e., the r-round NDs are fed with the differences of multiple ciphertext pairs. In [18], Bao et al. fed the r-round NDs with a transformed representation of each ciphertext pair tailored to Simon ciphers.
In this paper, we employ multiple ciphertext pairs in a new data format to improve the performance of the neural distinguishers (the reason for choosing this data format is given in Section 5.3). The process of constructing a dataset is described below.
For the differential-neural distinguisher, we first encrypt the plaintext pairs under a random key to obtain the ciphertext pairs. Then, each group of ciphertext pairs is arranged into the data format, one row per ciphertext pair. Finally, the rows are spliced and converted into one binary string that forms a sample, and each sample is attached a label Y:

Y = 1 if the plaintext pairs satisfy the fixed input difference Δ, and Y = 0 if the plaintext pairs are random,

where Δ is a constant input difference. How to select Δ is examined in Section 4.
Unlike the differential-neural distinguisher, which uses a single random key to encrypt both members of a plaintext pair, the related-key differential-neural distinguisher encrypts the two members of each plaintext pair under a pair of keys with a fixed difference ΔK.
We construct the dataset following the above steps, using 8 ciphertext pairs per sample. In the basic training process, an independent random key is used for each sample; therefore, the training set and the test set each use as many random keys as they contain samples.
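As an illustration, the following is a simplified sketch of the dataset generation under the assumptions above; it reuses the `encrypt` sketch from Section 2.2, uses random round keys as a toy stand-in for the key schedule, and simply concatenates the raw ciphertext bits rather than reproducing the exact per-pair data format (the helper name `make_dataset` is ours).

```python
# Simplified dataset-generation sketch: each sample packs 8 ciphertext pairs;
# label 1 = plaintext pairs with the fixed input difference, label 0 = random pairs.
import numpy as np

def make_dataset(n_samples, n_rounds, diff=(0x0000, 0x0040), pairs=8, n=16):
    rng = np.random.default_rng()
    labels = rng.integers(0, 2, n_samples)
    X = np.zeros((n_samples, pairs * 4 * n), dtype=np.uint8)
    for i in range(n_samples):
        # one independent key per sample; random round keys stand in for the
        # real key schedule. In the related-key setting, the second member of
        # each pair would instead be encrypted under K xor Delta_K.
        ks = rng.integers(0, 1 << n, n_rounds)
        bits = []
        for _ in range(pairs):
            pl, pr = rng.integers(0, 1 << n, 2)
            if labels[i]:
                pl2, pr2 = pl ^ diff[0], pr ^ diff[1]
            else:
                pl2, pr2 = rng.integers(0, 1 << n, 2)
            cl, cr = encrypt(pl, pr, ks)
            cl2, cr2 = encrypt(pl2, pr2, ks)
            for w in (cl, cr, cl2, cr2):
                bits.extend((int(w) >> j) & 1 for j in reversed(range(n)))
        X[i] = bits
    return X, labels
```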
Figure 1: Schematic representation of the notations used in the data format.
3.2 Network Architecture
A deep learning architecture is a multilayer stack of simple modules, most of which are subject to learning, and many of which compute non-linear input-output mappings. Each module in the stack transforms its input to increase both the selectivity and the invariance of the representation. With multiple non-linear layers, say a depth of 5 to 20, a system can implement extremely intricate functions of its inputs that are simultaneously sensitive to minute details and insensitive to large irrelevant variations.
Given the success of ResNet on Speck [16] and SENet on Simon [18], as well as their superior performance on classification tasks, we use the SE-ResNet network. As shown in Fig. 2, the network consists of three main components: the input layer, the iteration layer and the predict layer. The input layer uses one Conv1D layer and two Dense layers to receive fixed-length training data. The iteration layer uses 5 SE-ResNet modules, each of which contains two Conv1D layers and one SE block. To make the network learning more stable and alleviate the problem of gradient disappearance, a BN layer is applied after each Conv1D layer, followed by an activation layer with the ReLU function. Finally, in the predict layer, to make the data transition smoothly from the convolutional layers to the fully connected layers, we introduce a flatten layer to perform one-dimensional flattening of the data output from the convolutional layers. The fully connected part consists of two Dense layers with 64 neurons each and an output unit with a single neuron.
We set the batch size to 30000 and use a cyclic learning rate schedule, denoted cyclic lr. Adam [29] is used as the optimizer with the mean squared error (MSE) loss function and L2 regularization. Each dataset is trained for 120 epochs with the basic training method. The reported accuracy, TPR, and TNR of each ND are the average results over 5 repetitions.
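The following Keras sketch shows how such a training setup can be wired together; `model`, `X_train`, `y_train`, `X_test` and `y_test` are assumed to be defined, the cyclic schedule mirrors the commonly used linear decay within each cycle, and the learning-rate bounds and cycle length are illustrative rather than the exact values used in our experiments (the L2 regularization itself would be attached to the Conv1D/Dense layers via `kernel_regularizer`).

```python
# Training-setup sketch: Adam + MSE with a cyclic learning rate (illustrative bounds).
from tensorflow.keras.callbacks import LearningRateScheduler

def cyclic_lr(num_epochs, high_lr, low_lr):
    # linearly decay from high_lr to low_lr within each cycle of num_epochs epochs
    return lambda i: low_lr + ((num_epochs - 1) - i % num_epochs) / (num_epochs - 1) * (high_lr - low_lr)

model.compile(optimizer='adam', loss='mse', metrics=['acc'])
model.fit(X_train, y_train, epochs=120, batch_size=30000,
          validation_data=(X_test, y_test),
          callbacks=[LearningRateScheduler(cyclic_lr(10, 0.001, 0.0001))])
```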
Figure 2: The SE-ResNet network architecture (input layer, iteration layer and predict layer).
4 Comparing the Performance of the Hybrid Distinguisher with Different Input Difference
In this section, we investigate the effect of the input difference on the performance of the hybrid distinguishers. Essentially, the NDs cover too few rounds to be used directly in key recovery, so they have to be prepended with classical differentials. Whether the resulting HDs can be used in a key-recovery attack depends on whether the input difference of the ND leads to good accuracy and, at the same time, to prepended CDs with high differential probability.

Therefore, taking the performance of the hybrid distinguisher of Simon32/64 as an entry point, we investigate the issue in two stages. In the first stage, we study the performance of all input differences with Hamming weights of 1, 2, and 3 on the 11-round ND and filter out the input differences that obtain a non-marginal advantage (accuracy above 0.50); we then study the performance of these filtered input differences on the 12-round ND. In the second stage, we study the probability of the prepended CDs with these filtered input differences.
The First Stage
Let HW denote the Hamming weight of the input difference; then there are C(32,1) + C(32,2) + C(32,3) = 5488 input differences with HW ≤ 3. Based on Section 3, traversing these input differences with batch size 30000 and cyclic lr, we construct an 11-round ND of Simon32/64 for each of them. 128 input differences are filtered out, of which 48 have an accuracy between 0.51 and 0.52 and 80 have an accuracy between 0.54 and 0.56. Therefore, we mainly focus on the performance of these 80 input differences. The results for these 80 input differences are shown in Fig. 3.
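A small Python sketch of this enumeration step is shown below; it generates all 32-bit input differences of Hamming weight 1 to 3 and splits each into left and right 16-bit halves before the corresponding NDs are trained.

```python
# Enumerate all 32-bit input differences with Hamming weight 1, 2 or 3
# (32 + 496 + 4960 = 5488 in total), split into (delta_left, delta_right).
from itertools import combinations

def low_weight_differences(block_size=32, max_hw=3):
    for hw in range(1, max_hw + 1):
        for bits in combinations(range(block_size), hw):
            d = 0
            for b in bits:
                d |= 1 << b
            yield d >> 16, d & 0xFFFF

diffs = list(low_weight_differences())
print(len(diffs))  # 5488
```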
Figure 3: Accuracy of the 11-round NDs of Simon32/64 for the 80 filtered input differences.
It is discovered that the 11-round NDs obtained from an input difference and from its rotations have similar accuracy, i.e., (ΔL ⋘ i, ΔR ⋘ i) behaves like (ΔL, ΔR) for 0 ≤ i < 16. Thus, we only list one representative of each such class of 16 input differences in Table 4. Specifically, for HW = 1, using the input differences (the 0x prefix is omitted):
(0000,0001), (0000,0002), (0000,0004), (0000,0008),
(0000,0010), (0000,0020), (0000,0040), (0000,0080),
(0000,0100), (0000,0200), (0000,0400), (0000,0800),
(0000,1000), (0000,2000), (0000,4000), (0000,8000),
can construct 11-round ND of Simon32/64 with an accuracy of about 0.561.
For HW = 2, using the input differences:
(0001,0004), (0002,0008), (0004,0010), (0008,0020),
(0010,0040), (0020,0080), (0040,0100), (0080,0200),
(0100,0400), (0200,0800), (0400,1000), (0800,2000),
(1000,4000), (2000,8000), (4000,0001), (8000,0002),
can build 11-round ND of Simon32/64 with an accuracy of about 0.560.
For HW = 3, there are three rotation-equivalent sets of input differences. Using the input differences:
(0001,0104), (0002,0208), (0004,0410), (0008,0820),
(0010,1040), (0020,2080), (0040,4100), (0080,8200),
(0100,0401), (0200,0802), (0400,1004), (0800,2008),
(1000,4010), (2000,8020), (4000,0041), (8000,0082),
can construct 11-round ND of Simon32/64 with an accuracy of about 0.560.
Using the input difference:
(0001,0006), (0002,000c), (0004,0018), (0008,0030),
(0010,0060), (0020,00c0), (0040,0180), (0080,0300),
(0100,0600), (0200,0c00), (0400,1800), (0800,3000),
(1000,6000), (2000,c000), (4000,8001), (8000,0003),
can obtain 11-round ND of Simon32/64 with an accuracy of about 0.560.
Using the input difference:
(0001,4004), (0002,8008), (0004,0011), (0008,0022),
(0010,0044), (0020,0088), (0040,0110), (0080,0220),
(0100,0440), (0200,0880), (0400,1100), (0800,2200),
(1000,4400), (2000,8800), (4000,1001), (8000,2002),
can get 11-round ND of Simon32/64 with an accuracy of about 0.549.
It can be found that the 16 input differences of the form (0x0001 ⋘ i, 0x4004 ⋘ i) are slightly inferior to the other 64 input differences for the 11-round ND.
Then, with the input differences (0x0,0x1), (0x1,0x4), (0x1,0x104), (0x1,0x6) and (0x1,0x4004) separately, we construct 12-round NDs of Simon32/64 using the basic training method. The results are shown in Table 3. The accuracy exceeds 0.50 for all of them except the input difference (0x0,0x1) (which reaches an accuracy of 0.5142 with the staged training method). Therefore, a total of 64 input differences allow the 12-round ND to obtain a non-marginal advantage using the basic training method. Meanwhile, the input differences (0x1,0x4) and (0x1,0x6) perform best, with an accuracy of 0.5152.
Cipher | Input Difference | Acc | TPR | TNR
---|---|---|---|---
Simon 32/64 | (0x0000,0x0001) | 0.5004 | 0.1149 | 0.8857
 | (0x0001,0x0004) | 0.5152 | 0.4799 | 0.5505
 | (0x0001,0x0104) | 0.5151 | 0.4901 | 0.5401
 | (0x0001,0x0006) | 0.5152 | 0.4852 | 0.5453
 | (0x0001,0x4004) | 0.5135 | 0.4331 | 0.5940
The Second Stage
The NDs are prepended with 3 rounds of CDs in [18], so we use 3-round prepended CDs as a benchmark to test the performance of the input differences filtered in the first stage. An SMT solver is used to determine the probability of the prepended CDs: we first decide whether a differential characteristic with a given probability exists, and then enumerate all differential characteristics with that probability. The results are presented in Table 4. It can be seen that the probabilities of the 3-round prepended CDs are highest for the input differences (0x0000, 0x0001 ⋘ i), i.e., of the form (0, e_i), followed by the 2-bit input differences (0x0001 ⋘ i, 0x0004 ⋘ i), and the worst are (0x0001 ⋘ i, 0x4004 ⋘ i).
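Besides the solver-based search, the probability of such short prepended differentials can be cross-checked empirically. The sketch below (not the SMT model used above) estimates the probability that a given input difference propagates to a target output difference over 3 rounds of Simon32/64 by random sampling, reusing the `encrypt` sketch from Section 2.2 with random round keys.

```python
# Empirical (Monte Carlo) estimate of a 3-round differential probability;
# only meaningful for differentials whose probability is not too small.
import numpy as np

def estimate_prob(in_diff, out_diff, rounds=3, trials=1 << 18, n=16):
    rng = np.random.default_rng()
    hits = 0
    for _ in range(trials):
        ks = rng.integers(0, 1 << n, rounds)          # random round keys
        pl, pr = rng.integers(0, 1 << n, 2)
        cl, cr = encrypt(pl, pr, ks)
        cl2, cr2 = encrypt(pl ^ in_diff[0], pr ^ in_diff[1], ks)
        hits += (cl ^ cl2 == out_diff[0]) and (cr ^ cr2 == out_diff[1])
    return hits / trials
```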
As a result, after these two steps of filtering, an input difference of the form (0, e_i) is possibly the best option for hybrid distinguishers. Meanwhile, the input differences (0x0001 ⋘ i, 0x0004 ⋘ i) are also a good choice. However, we cannot yet give a more precise recommendation on which bit position i to choose.
HW | Input difference | ND’s Acc | ND’s TPR | ND’s TNR | Prepended CDs (3-round)
---|---|---|---|---|---
1-bit | (0000,0001) | 0.5607 | 0.5407 | 0.5807 | 
2-bit | (0001,0004) | 0.5602 | 0.5059 | 0.6145 | 
3-bit | (0001,0104) | 0.5601 | 0.5024 | 0.6179 | 
 | (0001,0006) | 0.5597 | 0.4972 | 0.6221 | 
 | (0001,4004) | 0.5495 | 0.4433 | 0.6557 | 
5 (Related-key) Differential-Neural Distinguishers for Round-Reduced Simon32/64 and Simon64/128
In this section, the NDs are trained using the basic training method and the staged training method. The training model is based on Section 3.
5.1 Differential-Neural Distinguishers
Simon32/64
Training using the basic scheme. Using the input difference (0x0000,0x0040), we build NDs against Simon32/64 cover to 9-, 10-, and 11-round with 0.9176, 0.6975, and 0.5609 accuracy, respectively. Using the input difference (0x0001,0x0004), we build 12-round ND with 0.5152 accuracy. Table 1 presents the results.
Note that for NDs fed with a single ciphertext pair, given multiple ciphertext pairs with the same label one can directly obtain a combined-response distinguisher (CRD) using formula (3) in [16]. Similar to NDs fed with multiple ciphertext pairs, the CRDs' accuracy improves quickly as the number of ciphertext pairs increases. Therefore, we compare the accuracy of our NDs with that of CRDs using the same number of ciphertext pairs per decision. Compared with [18], the accuracy of our NDs is improved.
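For reference, the sketch below shows one way to combine per-pair ND scores into a single decision, in the spirit of the combined-response rule of [16]: sum the per-pair log-likelihood ratios and accept when the sum is positive. This is an illustration, not necessarily the exact formula used there.

```python
# Combine per-pair ND outputs (probabilities in (0,1)) for pairs sharing a label.
import numpy as np

def combined_response(scores, eps=1e-12):
    z = np.clip(np.asarray(scores, dtype=float), eps, 1 - eps)
    llr = np.sum(np.log2(z / (1 - z)))   # sum of per-pair log-likelihood ratios
    return 1 if llr > 0 else 0           # 1 = 'real pairs', 0 = 'random'
```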
Training using the Staged Training Method. We also use several stages of pre-training to train a 12-round differential-neural distinguisher for Simon32/64. In the first stage, the best 10-round distinguisher is retained to recognize 9-round Simon32/64 with the input difference (0x0440,0x0100). The number of samples for training and for testing are and , respectively. The number of epochs is 30 and the learning rate is .
In the second stage, the best network of the first stage is retained to recognize 12-round Simon32/64 with the input difference (0x0000,0x0040). For this stage, and examples are freshly generated for training and testing, respectively. The learning rate is for 30 epochs.
Cyclical learning rates are also used for these training stages; the first and second stages both use a minimum learning rate of 0.0001 and a maximum of 0.001, and all cycle lengths are set to 30 epochs. Eventually, the resulting ND achieves an accuracy of 0.5142.
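The staged procedure can be summarized by the following Keras sketch; the model file name is assumed, `make_dataset` and `cyclic_lr` refer to the earlier sketches, and `n_samples` stands for the stage sizes.

```python
# Staged-training sketch for the 12-round Simon32/64 ND (illustrative helpers).
from tensorflow.keras.models import load_model
from tensorflow.keras.callbacks import LearningRateScheduler

model = load_model('best_10_round_simon32.h5')   # assumed file name for the retained ND

# Stage 1: 9-round data with the intermediate difference (0x0440, 0x0100)
X1, y1 = make_dataset(n_samples, 9, diff=(0x0440, 0x0100))
model.fit(X1, y1, epochs=30, batch_size=30000,
          callbacks=[LearningRateScheduler(cyclic_lr(30, 0.001, 0.0001))])

# Stage 2: 12-round data with the original input difference (0x0000, 0x0040)
X2, y2 = make_dataset(n_samples, 12, diff=(0x0000, 0x0040))
model.fit(X2, y2, epochs=30, batch_size=30000,
          callbacks=[LearningRateScheduler(cyclic_lr(30, 0.001, 0.0001))])
```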
Simon64/128
Training using the basic scheme. Based on the input difference (0x00000000,0x00000040), the NDs reach 0.9181, 0.7117, 0.5722, and 0.5148 accuracy for 11, 12, 13, and 14 rounds, respectively. The results are summarized in Table 1.
Training using the Staged Training Method. The best 14-round distinguisher for Simon64/128 is trained using the staged training method.
In the first stage, the retained best 12-round distinguisher is trained and tested on samples of 11-round Simon64/128 with the input difference (0x00000440,0x00000100). The number of epochs is 30, and the learning rate scheduler used in this stage is cyclic lr.
Then the best network from the first stage is trained in the second stage on freshly generated training and test samples of 14-round Simon64/128 with the input difference (0x00000000,0x00000040). This stage is done in 30 epochs, and the learning rate scheduler used is cyclic lr. Finally, the accuracy of the resulting ND is 0.5185.
5.2 Related-key Differential-Neural Distinguishers
We use the basic training method to train the related-key differential-neural distinguishers. Based on the plaintext difference (0x0000,0x0040) and the key difference (0x0000,0x0000,0x0000,0x0040), we obtain 1, 0.9604, 0.6477, and 0.5262 accuracy for the 10-, 11-, 12-, and 13-round RKNDs against Simon32/64, respectively.
Based on the plaintext difference (0x00000000,0x00000040) and the key difference (0x00000000,0x00000000,0x00000000,0x00000040), we build RKNDs covering 12, 13, and 14 rounds with 0.9880, 0.8398, and 0.5788 accuracy for Simon64/128, respectively. To the best of our knowledge, this is the first successful application of RKNDs to Simon-like ciphers.
5.3 Experiment with Different Data Format
In order to improve the accuracy of the ND, we introduce a new data format suited to the network architecture used in this paper. Here, we explain the reason for choosing this data format. We mainly compare the effect of different data formats on the performance of the network based on experiments with the 9-, 10-, and 11-round NDs for Simon32/64.
We use the basic method to train the 9-, 10-, and 11-round NDs based on the input difference (0x0000,0x0040), batch size 30000, and cyclic lr. The results are presented in Table 5.
It shows that only some of the data formats allow the NDs to reach 11 rounds, and that the accuracy obtained with our data format is higher than with the others. This is the primary reason for using this data format in the paper.
Meanwhile, it is noted that the accuracy drops when the additional component is deleted from the data format, i.e., the neural network benefits from being provided with it. In fact, this component denotes a partial decryption that can be determined from a given ciphertext pair without knowing the round key.
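As an illustration only (not necessarily the exact component used in our data format), one quantity of this kind exists for any Simon-like cipher: the difference of the penultimate round's right branches can be computed from a ciphertext pair alone, because the last round key cancels in the XOR. The helper below is a hypothetical sketch of that computation, reusing `rol` from Section 2.2.

```python
# Hypothetical illustration of a key-independent partial decryption for
# Simon-like ciphers: with (c_l, c_r) = (x_r, y_r) and y_{r-1} = c_l ^ f(c_r) ^ k_{r-1},
# the difference y_{r-1} ^ y'_{r-1} = c_l ^ f(c_r) ^ c_l' ^ f(c_r') needs no key.
def penultimate_right_diff(cl, cr, cl2, cr2, params=(8, 1, 2), n=16):
    a, b, c = params
    f1 = (rol(cr,  a, n) & rol(cr,  b, n)) ^ rol(cr,  c, n)
    f2 = (rol(cr2, a, n) & rol(cr2, b, n)) ^ rol(cr2, c, n)
    return cl ^ f1 ^ cl2 ^ f2
```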
It is important to note that this comparison is only to show that the data format used in this paper better matches the current network for better performance. Different results may occur when the network is changed.
Cipher | Round | Data Format | Acc | TPR | TNR | Source
---|---|---|---|---|---|---
Simon 32/64 | 9 | | 0.7524 | 0.7304 | 0.7743 | [16]
 | | | 0.6895 | 0.6613 | 0.7176 | [28]
 | | | 0.8908 | 0.8786 | 0.9031 | [18]
 | | | 0.8945 | 0.8834 | 0.9057 | This paper
 | | | 0.9176 | 0.9052 | 0.9299 | This paper
 | 10 | | 0.5007 | 0.7015 | 0.2989 | [16]
 | | | 0.5605 | 0.5402 | 0.5809 | [28]
 | | | 0.6856 | 0.6610 | 0.7102 | [18]
 | | | 0.6889 | 0.6639 | 0.7139 | This paper
 | | | 0.6975 | 0.6662 | 0.7287 | This paper
 | 11 | | 0.5006 | 0.4148 | 0.5863 | [16]
 | | | 0.5007 | 0.8110 | 0.1898 | [28]
 | | | 0.5555 | 0.5437 | 0.5673 | [18]
 | | | 0.5578 | 0.5455 | 0.5700 | This paper
 | | | 0.5609 | 0.5366 | 0.5852 | This paper
6 (Related-key) Differential-Neural Distinguishers for Round-Reduced Simeck32/64 and Simeck64/128
Simeck is a lightweight block cipher family that combines the good design components of Simon and Speck to make it even more compact and efficient. In this section, we build NDs and RKNDs for round-reduced Simeck32/64 and Simeck64/128.
6.1 Differential-Neural Distinguishers
Simeck32/64
Training using the basic scheme. Using the input difference (0x0000,0x0040), we build NDs against Simeck32/64 cover to 9-, 10-, and 11-round with 0.9952, 0.7354, and 0.5646 accuracy, respectively. The results are presented in Table 1.
Training using the Staged Training Method. A 12-round differential-neural distinguisher for Simeck32/64 is also obtained by utilizing several stages of pre-training.
The first stage selects the best 10-round distinguisher to recognize 9-round Simeck32/64 with the input difference (0x0140,0x0080). Note that (0x0140,0x0080) is the most likely difference to appear three rounds after the input difference (0x0000,0x0040).

Fresh samples are generated to train and test the distinguisher. This stage has 30 epochs, and the learning rate scheduler used is cyclic lr.
The best network obtained from the first stage is retained to recognize 12-round Simeck32/64 with the input difference (0x0000,0x0040). The number of examples for training and for testing are and , respectively. The number of epochs is 30 and the learning rate is . The learning rate scheduler used in this stage is cyclic lr. Lastly, the ND produced has an accuracy of 0.5146.
Simeck64/128
Training using the basic scheme. Similarly, based on the input difference (0x00000000,0x00000040), the NDs reach accuracies of 0.9142, 0.7663, 0.6356, 0.5577, and 0.5202 for 14-, 15-, 16-, 17-, and 18-round, respectively. The results are shown in Table 1.
Training using the Staged Training Method. We use the staged training method to obtain the best 18-round distinguisher for Simeck64/128.
In the first stage, the retained best 16-round distinguisher is trained and tested on samples of 15-round Simeck64/128 with the input difference (0x00000140,0x00000080). The number of epochs is 30.
Then the best network from the first stage is trained in the second stage on freshly generated training and test samples of 18-round Simeck64/128 with the input difference (0x00000000,0x00000040). This stage is done in 30 epochs.
Cyclical learning rates are used for these training stages; the first and second stages both use a minimum learning rate of 0.0001 and a maximum of 0.001, and all cycle lengths are set to 30 epochs. As a final result, the ND produced has an accuracy of 0.5218.
6.2 Related-key Differential-Neural Distinguishers
For the related-key differential-neural distinguishers of Simeck32/64, based on the input difference (0x0000,0x0040) and the key difference (0x0000,0x0000,0x0000,0x0040), we obtain RKNDs covering 13, 14, and 15 rounds with 0.9950, 0.6679, and 0.5467 accuracy, respectively.

For Simeck64/128, based on the input difference (0x00000000,0x00000040) and the key difference (0x00000000,0x00000000,0x00000000,0x00000040), we obtain RKNDs covering 18, 19, 20, 21, and 22 rounds with 0.9066, 0.7558, 0.6229, 0.5519, and 0.5180 accuracy, respectively. It can be seen that the gap between the RKNDs for Simon and Simeck is obvious, and that Simon's key-expansion algorithm offers better resistance. This is consistent with the conclusion that Lu et al. obtained using rotational-XOR cryptanalysis in [30].
7 Conclusion
In this paper, we provide an in-depth analysis of the (related-key) differential-neural distinguishers for the Simon and Simeck ciphers. We feed the neural network with multiple ciphertext pairs in a new data format to improve the accuracy of the neural distinguishers. Meanwhile, we investigate the impact of the input difference on the performance of the hybrid distinguishers to select an appropriate input difference. For Simon32/64, Simon64/128, Simeck32/64 and Simeck64/128, we construct (related-key) differential-neural distinguishers with higher accuracy.
It is undeniable that there are many factors that can affect the performance of neural distinguishers. This paper explores its impact on the performance of neural distinguishers from the perspective of data format and input difference. In the future, we plan to further explore ways that can improve the performance of neural networks from multiple dimensions, such as using methods of feature engineering to extract more essential features of the training data and so on.
This work was supported in part by the National Key Research and Development Program of China [No.2021YFB3100800]; and the State Key Laboratory of Information Security [2020-MS-02]; and the National Natural Science Foundation of China [grant numbers 61872379, 61702537]; and the Academy of Finland [grant number 331883].
Data availability
The data underlying this article are available in the article and in its online supplementary material.
References
- [1] Biham, E. and Shamir, A. Differential cryptanalysis of des-like cryptosystems. Journal of CRYPTOLOGY, 4, 3–72.
- [2] Matsui, M. Linear cryptanalysis method for des cipher. Workshop on the Theory and Application of Cryptographic Techniques, pp. 386–397. Springer.
- [3] Knudsen, L. and Wagner, D. Integral cryptanalysis. International Workshop on Fast Software Encryption, pp. 112–127. Springer.
- [4] Bogdanov, A. and Rijmen, V. Linear hulls with correlation zero and linear cryptanalysis of block ciphers. Designs, codes and cryptography, 70, 369–383.
- [5] Mouha, N., Wang, Q., Gu, D., and Preneel, B. Differential and linear cryptanalysis using mixed-integer linear programming. International Conference on Information Security and Cryptology, pp. 57–76. Springer.
- [6] Sun, S., Hu, L., Wang, P., Qiao, K., Ma, X., and Song, L. Automatic security evaluation and (related-key) differential characteristic search: application to simon, present, lblock, des (l) and other bit-oriented block ciphers. International Conference on the Theory and Application of Cryptology and Information Security, pp. 158–178. Springer.
- [7] Mouha, N. and Preneel, B. A proof that the arx cipher salsa20 is secure against differential cryptanalysis. IACR Cryptol. ePrint Arch., 2013, 328.
- [8] Kölbl, S., Leander, G., and Tiessen, T. Observations on the simon block cipher family. Annual Cryptology Conference, pp. 161–185. Springer.
- [9] Minier, M., Solnon, C., and Reboul, J. Solving a symmetric key cryptographic problem with constraint programming. ModRef 2014, Workshop of the CP 2014 Conference 13.
- [10] Gerault, D., Minier, M., and Solnon, C. Constraint programming models for chosen key differential cryptanalysis. International Conference on Principles and Practice of Constraint Programming, pp. 584–601. Springer.
- [11] LeCun, Y., Bengio, Y., and Hinton, G. Deep learning. nature, 521, 436–444.
- [12] Bengio, Y., Lecun, Y., and Hinton, G. Deep learning for ai. Communications of the ACM, 64, 58–65.
- [13] Rivest, R. L. Cryptography and machine learning. International Conference on the Theory and Application of Cryptology, pp. 427–439. Springer.
- [14] Maghrebi, H., Portigliatti, T., and Prouff, E. Breaking cryptographic implementations using deep learning techniques. International Conference on Security, Privacy, and Applied Cryptography Engineering, pp. 3–26. Springer.
- [15] Hospodar, G., Gierlichs, B., De Mulder, E., Verbauwhede, I., and Vandewalle, J. Machine learning in side-channel analysis: a first study. Journal of Cryptographic Engineering, 1, 293.
- [16] Gohr, A. Improving attacks on round-reduced speck32/64 using deep learning. Annual International Cryptology Conference, pp. 150–179. Springer.
- [17] Benamira, A., Gerault, D., Peyrin, T., and Tan, Q. Q. A deeper look at machine learning-based cryptanalysis. Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 805–835. Springer.
- [18] Bao, Z., Guo, J., Liu, M., Ma, L., and Tu, Y. Enhancing differential-neural cryptanalysis. International Conference on the Theory and Application of Cryptology and Information Security. Springer.
- [19] Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., and Wingers, L. The simon and speck lightweight block ciphers. Proceedings of the 52nd Annual Design Automation Conference, pp. 1–6.
- [20] Yang, G., Zhu, B., Suder, V., Aagaard, M. D., and Gong, G. The simeck family of lightweight block ciphers. International Workshop on Cryptographic Hardware and Embedded Systems, pp. 307–329. Springer.
- [21] Biham, E. New types of cryptanalytic attacks using related keys. Journal of Cryptology, 7, 229–246.
- [22] Jakimoski, G. and Desmedt, Y. Related-key differential cryptanalysis of 192-bit key aes variants. International Workshop on Selected Areas in Cryptography, pp. 208–221. Springer.
- [23] Ko, Y., Hong, S., Lee, W., Lee, S., and Kang, J.-S. Related key differential attacks on 27 rounds of xtea and full-round gost. International Workshop on Fast Software Encryption, pp. 299–316. Springer.
- [24] Biryukov, A. and Nikolić, I. Automatic search for related-key differential characteristics in byte-oriented block ciphers: Application to aes, camellia, khazad and others. Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 322–344. Springer.
- [25] He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
- [26] Hu, J., Shen, L., and Sun, G. Squeeze-and-excitation networks. Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141.
- [27] Chen, Y., Shen, Y., Yu, H., and Yuan, S. A new neural distinguisher considering features derived from multiple ciphertext pairs. bxac019.
- [28] Hou, Z., Ren, J., and Chen, S. Improve neural distinguishers of simon and speck. Security and Communication Networks, 2021.
- [29] Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- [30] Lu, J., Liu, Y., Ashur, T., and Li, C. On the effect of the key-expansion algorithm in simon-like ciphers. The Computer Journal, 65, 2454–2469.