On the Security of Bitstream-level JPEG Encryption with Restart Markers

\authorblockN Mare Hirose\authorrefmark1, Shoko Imaizumi\authorrefmark1 and Hitoshi Kiya\authorrefmark2 \authorblockA \authorrefmark1 Chiba University, Chiba, Japan
E-mail: [email protected], [email protected] \authorblockA \authorrefmark2 Tokyo Metropolitan University, Tokyo, Japan
E-mail: [email protected]

Abstract

This paper aims to evaluate the security of a bitstream-level JPEG encryption method using restart (RST) markers, where encrypted image can keep the JPEG file format with the same file size as non-encrypted image. Data encrypted using this method can be decoded without altering header information by employing a standard JPEG decoder. Moreover, the use of RST markers enables the definition of extended blocks divided by the markers, so spatially partial encryption and block-permutation-based encryption can be carried out. However, the security of the method was evaluated only with respect to the key space analysis for brute-force attacks and other limited attacks. Accordingly, in this paper, we evaluated the security of the method with respect to robustness against ciphertext-only attacks including state-of-the-art attacks. In experiments, the method is compared with conventional encryption methods, and it is confirmed to be robust against ciphertext-only attacks if parameters used for image encryption are carefully chosen.

1 Introduction

The use of JPEG images has grown due to the expansion of social networking services. These platforms generally restrict the use of various file formats. In many cases, images have to adhere to the JPEG standard. Furthermore, these platforms are not reliable owing to incidents, including information leakage. Accordingly, many encryption methods that keep the JPEG format have been proposed to date. Recently, a bitstream-level encryption method for JPEG compression was proposed, and it was confirmed to outperform conventional encryption methods in some important respects [1]. However, it has not been verified enough yet with respect to the robustness of the encryption against various attacks. Accordingly, we aim to evaluate the security of the method against ciphertext-only attacks (COAs) in this paper.

Bitstream-level encryption is one of the types of encryption combined with image compression. Compared with other types, this type of encryption has some advantages. For example, it permits us not just to keep the JPEG file format but also to use standard JPEG encoders. In addition, some methods of this type ensure that the file size remains the same before and after encryption, so encryption can be carried out by hooking images within a transmission channel. One state-of-the-art method [1] can also generate partially encrypted images, but all other bitstream-level methods cannot.

In [1], the restart (RST) marker, placed at fixed intervals between minimum coded units (MCUs), is used for encryption. This marker contributes to the mixing of encrypted and unencrypted regions. Furthermore, MCUs divided by RST markers can be defined as extended blocks, and the visibility of the encrypted image can be reduced by permuting the position of these blocks. This block permutation is also expected to enhance security strength.

Accordingly, we focus on the above state-of-the-art method to evaluate the attack resistance of the method against ciphertext-only attacks (COAs) in this paper. Key space analysis, key sensitivity analysis, the non-zero counting attack (NZCA), and histogram analysis are performed, and the relationship between the use of RST markers and the attack resistance is discussed. In experiments, it is demonstrated that the RST marker, used by the method used in image encryption for the first time, can enhance the robustness against a variety of attacks in general, and the attack resistance of encrypted images depends on the selection of restart interval values.

2 Related Work

The encryption method discussed in this paper is a method coupled with image compression that is able to produce JPEG files containing obscured visual information [1]. Accordingly, conventional encryption methods coupled with image compression are reviewed here. Furthermore, the structure of JPEG bitstreams is briefly explained.

2.1 Combined Use of Encryption and Compression

Encryption methods coupled with image compression are categorized into four types as follows. Type 1 is compression-then-encryption (CtE) using standard cryptography, in which images are compressed and then encrypted using a standard cryptography including AES (advanced encryption standard). Type 2 is encryption-then-compression (EtC), in which the visual information of images is secured through a perceptual encryption method, referred to as compressible encryption, and afterward the encrypted images undergo compression[2, 3]. Type 3 combines compression and encryption. For type 3, encryption and compression are processed at the same time (see Fig. 1) [4, 5, 6, 7, 8]. Therefore, when JPEG images are provided before encryption, the images need to be decompressed prior to performing encryption. Furthermore, standard encoders cannot be used in this framework. Type 4 is CtE using a bitstream-level encryption method (see Fig. 1), in which images undergo compression and then encryption using a bitstream-level encryption method [9, 10, 11, 12, 1]. Type 4 permits us not just to keep the JPEG format but also to use standard JPEG decoders.

We focus on a state-of-the-art method [1], which is a type 4 method for JPEG images. The method enables us to produce encrypted images with different levels of visibility, such as partially encrypted images.

2.2 JPEG Bitstream and Marker Codes

Here, we describe the structure of the JPEG bitstream. Fig. 2 illustrates an example of the structure of JPEG bitstreams [14]. Codes called SOI and EOI, which respectively signify the beginning and end of a bitstream, are allocated. These are two-byte codes called marker codes. Segments include information necessary for decoding, such as Huffman tables and quantization tables. Marker codes are also allocated at the beginning of each segment. The first byte of the marker code is fixed at FF₍₁₆₎, and the second byte indicates the type of each marker. The second byte of the marker code distinguishes each segment. Note that FF00₍₁₆₎ is not defined as a marker code.

Fig. 3 depicts the structure of the image data of a JPEG bitstream for the case of 4:2:0 color subsampling. Image data is divided and processed in units called minimum coded units (MCUs). Each MCU is composed of four luminance components (Y) and two chroma components (Cb, Cr). Each component contains a DC coefficient and AC coefficients, and the DC coefficient is recorded as the difference value from the previous DC coefficient. Finally, these bit sequences are divided and stored byte by byte. Due to this process, FF₍₁₆₎ may occur in the image data. Therefore, the JPEG encoder inserts 00₍₁₆₎ immediately after FF₍₁₆₎ occurs in the data to tell it apart from the first byte of a marker code. Additionally, when the JPEG decoder detects FF00₍₁₆₎, it reads only FF₍₁₆₎ and skips 00₍₁₆₎. This process is called “byte stuffing.”

One of the JPEG marker codes is called the RST marker, which can be placed at fixed intervals among MCUs. The restart interval ( $RI$ ), which is the interval at which the RST marker is placed, is set during JPEG encoding. When a RST marker is not used, an error that arises in the DC coefficient in a bitstream propagates to the end of the bitstream. In contrast, the value of a DC coefficient following a RST marker is stored as its original value. Therefore, using RST markers prevents errors from propagating among DC coefficients in bitstreams.

3 Security Evaluation

3.1 Bitstream-level JPEG Encryption with RST Markers

The bitstream-level JPEG encryption method with RST markers [1] is summarized here. Fig. 4 shows the encryption process of the method [1]. The method utilizes RST markers for encryption, where a sequence of MCUs divided by RST markers is referred to as an extended block (see Fig. 5). The encryption procedure for this method is shown below.

Step 1:

Select an $RI$ and prepare a bitstream including RST markers to be encrypted.
Step 2:

Select encryption regions.
Step 3:

Extract bytes that satisfy encryption conditions from the extended blocks to maintain the same file size as that before encryption.
Step 4:

Prepare a binary sequence of pseudorandom numbers (PRNs) by using secret K1. Perform an exclusive-or (XOR) operation between additional bits of the extracted bytes and the sequence of PRNs.
Step 5:

Replace the additional bits with the XOR results.
Step 6:

Randomly permute the positions of the extended blocks defined in Step 1 with secret K2.

The encryption conditions in Step 3 were discussed in [12, 13], in which the bytes of a bitstream are categorized into five cases, as shown in Fig. 6, where the white, black, and red areas denote Huffman codes (HCs), additional bits, and bytes placed immediately after FF₍₁₆₎, respectively. Each pattern is defined below.

Pattern 1:

Consists solely of HCs.
Pattern 2:

Consists solely of additional bits.
Pattern 3:

Consists of HCs and additional bits, and every bit in the HC is 1.
Pattern 4:

Consists of HCs and additional bits, and the HC contains 0.
Pattern 5:

Consists solely of 0, and the byte is placed immediately after FF₍₁₆₎.

Among these patterns, only additional bits in Pattern 4 can be encrypted.

In [1], each extended block is encrypted separately. Thus, it is possible to generate a partially encrypted image that contains both unencrypted and encrypted regions. Fig. 7 illustrates an example of images encrypted with this method.

3.2 Threat Model

A threat model includes a set of assumptions, such as an adversary’s goals, knowledge, and capabilities. The aim of an attacker is to restore visual information from encrypted data. We assume that the attacker is able to use encrypted data and the encryption algorithm but does not have the secret key. Accordingly, the attacker can only perform ciphertext-only attacks (COAs) using encrypted images.

3.3 Security Analysis

Several COAs have been studied to restore visual information from encrypted images [4, 10, 11, 13]. In this paper, we use key space analysis, key sensitivity analysis, the non-zero counting attack (NZCA), and histogram analysis for security evaluation. The relationship between the use of RST markers and the attack resistance is discussed.

3.3.1 Key space analysis

Key space is the total number of patterns that can be produced by a encryption algorithm. Here, we analyze the encryption method in the case where all DCT coefficients in the whole image are encrypted. The algorithm encrypts additional bits within bytes that satisfy specific conditions and permutes the positions of the extended blocks, as shown in 3.1. We assume that JPEG encoding is applied to an image with a size of $M\times N$ under $RI=$ r and 4:2:0 color subsampling. First, considering that the number of bytes to be encrypted is $T$ , the minimum and maximum sizes of the key space derived from the encryption of the additional bits $S_{enc\_min}$ and $S_{enc\_max}$ are given by

	$\displaystyle S_{enc\_min}=2^{T},$		(1)
	$\displaystyle S_{enc\_max}=2^{7T}.$		(2)

Note that the minimum number of bits to be encrypted is one, and the maximum is seven within a byte that satisfies the encryption conditions. The key space obtained from the permutation of the extended blocks, $S_{bp}$ , is expressed as

\displaystyle S_{bp}=\left\lfloor\left(\left\lceil\frac{M}{16}\right\rceil\times\left\lceil\frac{N}{16}\right\rceil\times\frac{1}{r}\right)\right\rfloor!.

(3)

Accordingly, the overall minimum key space $S_{min}$ and maximum key space $S_{max}$ are given by

$\displaystyle S_{min}$	$\displaystyle=S_{enc\_min}\times S_{bp}$
	$\displaystyle=2^{T}\times\left\lfloor\left(\left\lceil\frac{M}{16}\right\rceil\times\left\lceil\frac{N}{16}\right\rceil\times\frac{1}{r}\right)\right\rfloor!,$	(4)
$\displaystyle S_{max}$	$\displaystyle=S_{enc\_max}\times S_{bp}$
	$\displaystyle=2^{7T}\times\left\lfloor\left(\left\lceil\frac{M}{16}\right\rceil\times\left\lceil\frac{N}{16}\right\rceil\times\frac{1}{r}\right)\right\rfloor!.$	(5)

Equation (3) shows that the number of extended blocks increases as $RI$ becomes shorter. Therefore, an encrypted image with a shorter $RI$ has a larger key space. The algorithm for the method with RST markers can use a shorter $RI$ and thus can provide a larger key space. For example, if the image in Fig 7 is encoded with $r$ = 4 as an $RI$ , $T$ = 37,031 is given. Thus, the minimum key space $S_{min}$ can be expressed by

\displaystyle S_{min}

\displaystyle=2^{37,031}\times\left\lfloor\left(\left\lceil\frac{384}{16}\right\rceil\times\left\lceil\frac{512}{16}\right\rceil\times\frac{1}{4}\right)\right\rfloor!>2^{256}.

(6)

This is sufficiently larger than $2^{256}$ , which is the space of a key with 256 bits. If we use $r$ = 2, $S_{min}$ is larger than in the case of $r$ = 4. Accordingly, the bitstream-level encryption method with RST markers has enough key space.

3.3.2 Key sensitivity analysis

Attack-resistant encryption methods should be sensitive even to a slight change in the key. We analyze key sensitivity by considering two cases:

Case 1:

Encryption using an encryption key that differs from the original key by only one bit.
Case 2:

Decryption using an incorrect key that differs from the correct key by only one bit.

In Case 1, the encrypted image must be wholly different from the image encrypted with the original key. In Case 2, the image decrypted with an incorrect key must be wholly different from the original image.

3.3.3 NZCA

The sketch attack tries to obtain contour information of the original one from the encrypted image. We use the non-zero-counting attack (NZCA), which is a sketch attack for encrypted JPEG images [4]. This attack attempts to restore the original image’s contours on the basis of the number of non-zero coefficients in each MCU of the encrypted image. The use of RST markers will be evaluated to generate encrypted images robust against NZCA.

3.3.4 Histogram analysis

The histogram is very useful for an adversary to analyze a data distribution graphically in the encrypted domain. An adversary tries to analyze the distribution of pixel frequencies in encrypted images to apply attacks. Accordingly, attack-resistant encryption methods should provide encrypted images with histograms similar to those of other encrypted images.

4 Experimental Results

4.1 Setup

In experiments, we used 1,338 test images with 384 × 512 pixels or 512 × 384 pixels from Uncompressed Colour Image Database [15]. JPEG compression was performed using libjpeg [16], with the color subsampling set to 4:2:0 and the quality factor Q set to 80. Binary sequences of PRNs were prepared using HMAC_DRBG [17] with a 384-bit key. For all images, we encrypted the whole image and permuted the positions of all extended blocks.

4.2 Protection of Visual Information

The encryption method with RST markers can permute the positions of blocks at random, thereby strongly protecting the visual information of images, compared with previous methods without block permutation [12, 13, 5]. For objectively evaluating the protection strength, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) values of encrypted images are shown as boxplots in Fig. 8. From the figure, it is confirmed that the method generates encrypted images without identifiable visual information.

4.3 Key Sensitivity

Fig. 9 illustrates the results of the key sensitivity analysis, where the SSIM values indicate the difference between two images. In Fig. 9, two images were encrypted with keys that differed by a single bit. In contrast, in Fig. 9, the two images were decoded from an encrypted one by using the correct key and an incorrect key that differed from the correct key by a single bit. From the figure, the method was verified to have sufficiently high key sensitivity.

4.4 NZCA

NZCA was carried out on the luminance component of an encrypted image. As illustrated in Fig. 10, in the previous method without RST markers [12, 13], the original image was revealed in its contour. In contrast, the method with RST markers concealed the original image. This was attained by replacing extended blocks. However, when a large $RI$ was adopted, the outline of the original image was revealed in some areas. Accordingly, the use of RST markers is important for enhancing robustness against attacks, but the value of $RI$ should be carefully selected.

4.5 Histogram Analysis

Fig. 11 illustrates the R, G, and B histograms of the original ucid00459 image, which is shown in Fig. 7, and its encrypted images. In the figure, there are histograms for four encrypted images. Three of them were derived by using our method with RST markers, and the other was derived by using the previous method. In our method with RST markers, $RI$ was defined as 2, 4, and 8. The images encrypted using this method were compared with the original image and those encrypted with the previous method. Attack-resistant encryption methods should provide an encrypted image with a totally different histogram from that of the original image and analogous histograms among the three color channels. From the figure, we can see that the R, G, and B channels in the encrypted image have analogous histograms. In addition, the use of a small $RI$ can strongly reduce the identifying characteristics of the image. Other images were verified to have similar trends.

5 Conclusion

In this paper, we conducted security analyses on a state-of-the art encryption method with RST markers against ciphertext-only attacks. It was confirmed that the RST markers used in the method contribute to enhancing security in terms of key space, key sensitivity, NZCA, and histogram analysis. As a result, the method was demonstrated not only to outperform conventional JPEG encryption methods in some important respects but also to maintain a high security level. Furthermore, it was shown that the attack resistance varies with the length of the restart interval.

Acknowledgment

This work was partially supported by JSPS KAKENHI Grant Number JP21H01327.

References

[1] M. Hirose, S. Imaizumi, and H. Kiya, “Encryption Method for JPEG Bitstreams for Partially Disclosing Visual Information,” Electronics, vol. 13, no. 11, 2024.
[2] W. Sirichotedumrong, T. Chuman, S. Imaizumi, and H. Kiya, “Grayscale-Based Block Scrambling Image Encryption for Social Networking Services,” in Proc. IEEE Int. Conf. Multimed. Expo, pp. 1–6, 2018.
[3] H. Kiya, A. P. M. Maung, Y. Kinoshita, S. Imaizumi, and S. Shiota, “An overview of compressible and learnable image transformation with secret key and its applications,” APSIPA Trans. Signal Inf. Process., vol. 11, no. 1, 2022.
[4] Y. Peng, C. Fu, G. Cao, W. Song, J. Chen, and C. W. Sham, “JPEG-compatible Joint Image Compression and Encryption Algorithm with File Size Preservation,” ACM Trans. Multimed. Comput. Commun. Appl., vol. 20, no. 4, pp. 1–20, 2024.
[5] K. Shimizu and T. Suzuki, “Finely Tunable Bitcuboid-based Encryption with Exception-free Signed Binarization for Jpeg Standard,” IEEE Trans. Inf. Forensics Secur., vol. 16, pp. 4895–4908, 2021.
[6] W. Cao, X. Leng, T. Yu, X. Gu, and Q. Liu, “A Joint Encryption and Compression Algorithm for Multiband Remote Sensing Image Transmission,” Sensors, vol. 23, no. 17, 2023.
[7] O. Watanabe, A. Nakazaki, and H. Kiya, “A fast image-scramble method using public-key encryption allowing backward compatibility with JPEG2000” in Proc. Int. Conf. Image Process., pp. 3435–3438, 2004.
[8] P. Li, Z. Sun, Z. Situ, M. He, and T. Song, “Joint JPEG compression and encryption scheme based on order-8-16 block transform,” IEEE Trans. Intell. Transp. Syst., vol. 24, no. 7, pp. 7687-7696, 2023.
[9] Y. Yuan, H. He, Y. Yang, N. Mao, F. Chen, and M. Ali, “JPEG Image Encryption with Grouping Coefficients Based on Entropy Coding,” J. Vis. Commun. Image Represent., vol. 97, 2023.
[10] J. He, S. Huang, S. Tang, and J. Huang, “JPEG Image Encryption With Improved Format Compatibility and File Size Preservation,” IEEE Trans. Multimedia, vol. 20, no. 10, pp. 2645–2658, 2018.
[11] C. Qin, J. Hu, F. Li, Z. Qian, and X. Zhang, “JPEG Image Encryption with Adaptive DC Coefficient Prediction and RS Pair Permutation,” IEEE Trans. Multimedia, vol. 25, pp. 2528–2542, 2022.
[12] H. Kobayashi and H. Kiya, “Bitstream-based Jpeg Image encryption with File-size Preserving,” in Proc. IEEE Glob. Conf. Consum. Electron., pp. 384–387, 2018.
[13] H. Kobayashi and H. Kiya, “File-Size Preserving Encryption of JPEG Images in the Bitstream Domain,” IEICE Trans. Inf.& Syst. Jpn. Ed., vol. J102-D, no. 12, pp. 787–795, 2019.
[14] Information technology—Digital Compression and Coding of Continuous-Tone still Images:Requirements and Guidelines. International Organization for Standardization ISO/IEC IS-10918-1, 1994.
[15] G. Schaefer and M. Stich, “UCID: An Uncompressed Color Image Database,” in Proc. SPIE, vol. 5307, pp. 472–480, 2003.
[16] [Online] Available:URL:https://www.ijg.org
[17] Barker, E. B. and Kelsey, J. M, SP 800-90A. Recommendation for Random Number Generation Using Deterministic Random Bit Generators. NIST: Gaithersburg, MD, USA, 2012.