Why Shape Coding? Asymptotic Analysis of the Entropy Rate for Digital Images
Abstract
This paper focuses on the limit theory of image compression. It proves that for an image source, there exists a coding method with shapes that can achieve the entropy rate under a certain condition, namely that the shape-pixel ratio in the encoder/decoder is $\mathcal{O}(1/\log n)$. Based on this new finding, an image coding framework with shapes is proposed and proved to be asymptotically optimal for stationary and ergodic processes. Moreover, the condition on the shape-pixel ratio in the encoder/decoder has been confirmed on the image database MNIST, which illustrates that soft compression with shape coding is a near-optimal scheme for the lossless compression of images.
Index Terms:
Image compression; Information theory; Entropy rate; Limit theorem; Asymptotic bounds

I Introduction and Overview
One of Shannon’s outstanding achievements in source coding is pointing out the limit of data compression. This result has been widely and successfully applied to stream data compression, but image compression remains a challenging problem. This paper is an attempt to analyze the limit theory of image compression.
I-A Preliminaries
Data compression is one of the bases of digital communications, being used to provide efficient and low-cost communication services. Images are among the most important and popular media in the current information age. Hence, image compression is naturally an indispensable part of data compression [1]. Moreover, its coding efficiency directly affects the objective quality of the communication network and the subjective experience of users.
As a compression method with strict requirements, lossless image coding focuses on reducing the number of bits required to represent an image without losing quality. It guarantees to cut down the occupation of communication and storage resources as much as possible under certain system or scenario constraints. In the era of big data, lossless image coding may play an even more significant role in applications where errors cannot be tolerated, such as intelligent medical treatment, digital libraries, semantic communications [2, 3], and the metaverse in the future.
The entropy rate is one of the important metrics in information theory, extending the notion of entropy from a random variable to a stochastic process. It also characterizes the generalized asymptotic equipartition property of a stochastic process. In this paper, we shall employ the entropy rate to explain the best achievable data compression. It is well known that the entropy rate of a stochastic process $\{X_i\}$ is defined as

$$H(\mathcal{X}) = \lim_{n \to \infty} \frac{1}{n} H\left(X_1, X_2, \ldots, X_n\right). \qquad (1)$$

If the limit exists, then $H(\mathcal{X})$ is the per-symbol entropy of the $n$ random variables, reflecting how the entropy of the sequence increases with $n$. Moreover, the entropy rate can also be defined as

$$H'(\mathcal{X}) = \lim_{n \to \infty} H\left(X_n \mid X_{n-1}, X_{n-2}, \ldots, X_1\right). \qquad (2)$$

$H'(\mathcal{X})$ is the conditional entropy of the last random variable given all the past random variables. For a stationary stochastic process, the limits in Eq. (1) and (2) exist and are equal [4]. That is, $H(\mathcal{X}) = H'(\mathcal{X})$. In addition, for a stationary Markov chain with stationary distribution $\mu$ and transition matrix $P$, the entropy rate is

$$H(\mathcal{X}) = H'(\mathcal{X}) = \lim_{n \to \infty} H\left(X_n \mid X_{n-1}\right) = H\left(X_2 \mid X_1\right) \qquad (3)$$
$$= -\sum_{i,j} \mu_i P_{ij} \log P_{ij}. \qquad (4)$$
The entropy rate is a long-term sequence metric. Even if the initial distribution of the Markov chain is not the stationary distribution, the per-symbol entropy will still converge as in Eq. (3) and (4). Moreover, for a general ergodic source, the Shannon-McMillan-Breiman theorem points out its asymptotic equipartition property: if $\{X_i\}$ is a finite-valued stationary ergodic process, then

$$-\frac{1}{n} \log p\left(X_1, X_2, \ldots, X_n\right) \to H(\mathcal{X}) \quad \text{with probability } 1. \qquad (5)$$
This indicates the convergence relationship between the joint probability and the entropy rate for a general ergodic process. Following a similar idea to the analysis of the entropy rate, we investigate the asymptotic property of shape-based coding for stationary ergodic image processes.
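As a concrete illustration of Eqs. (3)-(5), the following minimal sketch (added here for illustration; the transition matrix is an arbitrary assumption, not a quantity from this paper) computes the closed-form entropy rate of a two-state Markov chain and compares it with the empirical AEP estimate $-\frac{1}{n}\log p(X_1,\ldots,X_n)$ on a sampled path:

```python
import numpy as np

# Closed-form entropy rate of a two-state stationary Markov chain (Eq. (4))
# versus the empirical AEP estimate -(1/n) log2 p(X_1,...,X_n) (Eq. (5)).
# The transition matrix P is an arbitrary illustrative choice.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution mu: left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
mu /= mu.sum()

# Eq. (4): H = -sum_{i,j} mu_i P_ij log2 P_ij.
H = -np.sum(mu[:, None] * P * np.log2(P))

# Eq. (5): sample a long path and evaluate -(1/n) log2 p(x_1^n).
rng = np.random.default_rng(0)
n = 100_000
x = np.empty(n, dtype=int)
x[0] = rng.choice(2, p=mu)                 # start in the stationary law
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])
log_p = np.log2(mu[x[0]]) + np.sum(np.log2(P[x[:-1], x[1:]]))

print(f"closed-form entropy rate: {H:.4f} bits/symbol")
print(f"-(1/n) log2 p(x_1^n)    : {-log_p / n:.4f} bits/symbol")
```

For a long enough path, the two numbers agree to a few decimal places, which is exactly the convergence stated in Eq. (5).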
I-B Shape Coding
A digital image is composed of a large number of pixels arranged in order. This form is fixed: once the size of an image is determined, the number and arrangement of its pixels are also determined. On the other hand, shape coding extends the basic components of images from pixels to shapes, which is a more flexible coding method and may efficiently utilize the structures embedded in images. Furthermore, it no longer limits the number and positions of the shapes. Shape coding has three main characteristics: (1) the image is formed by filling shapes; (2) the position arrangement of shapes changes from a fixed mode to a random variable; (3) the shape database and codebook are generated in a data-driven way, which captures more of the inherent features of image databases.
Consider a binary digital image $X$, whose length and width are $n_1$ and $n_2$, respectively; then the total number of pixels is $n = n_1 n_2$. Suppose it is divided into $m$ shapes $s_1, s_2, \ldots, s_m$, where $s_i$ is the $i$-th shape. We use $\mathcal{S}$ to denote the shape database and $F(s_i, p_i)$ to represent filling an image with shape $s_i$ at position $p_i$ in the $i$-th operation. The image with shape coding can be described as [5]

$$L(X) = \sum_{i=1}^{m} \left( l_{s_i} + l_{p_i} \right) \qquad (6)$$
$$\text{s.t.} \quad X = \bigcup_{i=1}^{m} F\left(s_i, p_i\right), \qquad (7)$$

where $l_{s_i}$ and $l_{p_i}$ represent the bit lengths of the shape $s_i$ and its corresponding location $p_i$, respectively. The constraint condition indicates that the binary image can be reconstructed through the filling operations, exactly the same as the original image. On this premise, shape coding tries to reduce the cost of representing an image as much as possible.
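The decomposition in Eqs. (6)-(7) can be made concrete with a deliberately crude sketch: treat maximal horizontal runs of foreground pixels as the "shapes" and code each filling operation as a (shape index, position) pair. Everything below (the run-based shape database, the fixed-length index and position codes) is our simplifying assumption for illustration, not the actual soft compression algorithm of [5]:

```python
import numpy as np
from math import ceil, log2

# A minimal sketch of the representation in Eqs. (6)-(7): the "shapes" are
# maximal horizontal runs of 1-pixels, and each operation F(s_i, p_i) is
# coded as a fixed-length shape index plus a fixed-length position, so
# L = sum_i (l_{s_i} + l_{p_i}). The runs plus the implicit zero background
# reconstruct the image exactly, satisfying the constraint in Eq. (7).
def shape_code_length(img: np.ndarray):
    flat = img.flatten()
    n = flat.size
    shapes, positions = [], []
    i = 0
    while i < n:
        if flat[i] == 1:
            j = i
            while j < n and flat[j] == 1:
                j += 1
            shapes.append(j - i)      # the "shape" is the run length
            positions.append(i)       # p_i: linear index of the run start
            i = j
        else:
            i += 1
    database = sorted(set(shapes))    # data-driven shape database S
    l_s = ceil(log2(max(len(database), 2)))   # bits per shape index
    l_p = ceil(log2(n))                       # bits per position (uniform)
    m = len(shapes)
    return m, m * (l_s + l_p)

rng = np.random.default_rng(1)
img = (rng.random((28, 28)) < 0.2).astype(int)   # toy binary image
m, L = shape_code_length(img)
print(f"m = {m} shapes -> L = {L} bits ({L / img.size:.3f} bpp)")
```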
The codebook plays an important role in shape coding. It reflects the statistical characteristics and correlations of the data source. Fig. 1 illustrates the structure of shape coding, which consists of two parts: the generation and the use of the codebook. On the one hand, one searches for and matches shapes in the images of the dataset through a data-driven method; at the same time, frequency statistics are collected to generate a shape database. On the other hand, the codebook can be used repeatedly in communication and storage tasks to reduce the occupation of resources. The transmitter/compressor encodes the original image with the codebook. After transmission or storage through the channel/storage medium, the receiver/decompressor can decode the compressed file with the same codebook. In this way, one can perfectly reconstruct the original image.
I-C Relations to Previous Work
The objective of this work is to present performance limits from the viewpoint of information theory, which is related to our previous works [5, 6, 7]. Encoding images through shapes in a data-driven way can improve lossless image compression. On some known databases, soft compression outperforms the most popular methods such as PNG, JPEG2000, and JPEG-LS. However, there was no theoretical support for whether shape-based soft compression methods can reach the performance limit. That is, the gap between soft compression and its compression limit, namely the entropy rate, was not theoretically known. On the other hand, the entropy rate, together with an asymptotic equipartition property analysis of images, can help us design efficient encoding and decoding algorithms from the viewpoint of Shannon’s information theory.
The earliest multi-pixel joint coding method can be traced back to symbol-based coding [8], which transmits or stores only the first instance of each pattern class and thereafter substitutes this exemplar for every subsequent occurrence of the symbol. It achieved a degree of bandwidth reduction on scan-digitized printed text. Fractal theory [9, 10] is also related to block-based coding. Fractal block coding approximates an original image by relying on the assumption that image redundancy can be efficiently exploited through self-transformability on a blockwise basis. On the other hand, soft compression generates the shape database in a data-driven manner, so as to build the codebook used by the encoder and decoder. Data-driven image processing methods such as [11, 12, 13] can explore the essential features of images and even eliminate semantic redundancy. The use of side information to assist data compression has also been studied by Kieffer [14] and Kontoyiannis [15]. Sason and Verdú [16] provided upper and lower bounds on the optimal guessing moments of a random variable taking values on a finite set when side information may be available. Rychtáriková et al. [17] generalized the point information gain and derived the point information gain entropy, which may help analyze the entropy rate of an image.
Another connection to this paper is the Lempel-Ziv coding scheme [18], which proposed the concept of compressibility. For every individual infinite sequence $x$, a quantity $\rho(x)$ is defined and shown to be the asymptotically attainable lower bound on the compression ratio that can be achieved for $x$ by any finite-state encoder. Wyner [19] derived theorems concerning the entropy of a stationary ergodic information source and used the results to obtain insight into the workings of the Lempel-Ziv data compression algorithm.
The main contribution of this paper is that we present a sufficient condition under which the performance limit of shape-based image coding can be asymptotically achieved in terms of the entropy rate.
I-D Paper Outline
The rest of this paper is organized as follows. Section II contains our main results, giving asymptotic properties on shape-based image coding in terms of entropy rate. Moreover, it indicates the relationship between the number of shapes and coding performance. In Section III, we present sample numerical results with concrete examples. In Section IV, we give some complementary remarks and conclude this paper.
II The Asymptotic Properties of Image Sources Composed of Shapes
The encoding method with shapes can take advantage of the characteristics of the data and eliminate redundancy in the spatial and coding domains simultaneously. This section theoretically analyzes the performance of image coding with shapes. It shows that when the numbers of shapes and pixels have a reciprocal-logarithm relationship, the average code length asymptotically approaches the entropy rate. To the best of our knowledge, this is the first such result for image compression in information theory. The framework of the proof is similar to [4, 19], but there are some important differences.
The average number of bits needed to represent the image with shapes is $\frac{1}{n}\mathbb{E}[L]$, where $L$ is given in Eq. (6). Specifically,

$$\frac{1}{n}\,\mathbb{E}[L] = \frac{1}{n}\,\mathbb{E}\left[\sum_{i=1}^{m}\left(l_{s_i} + l_{p_i}\right)\right] \qquad (8)$$
$$\stackrel{(a)}{\le} \frac{\mathbb{E}\left[m \log m\right]}{n} + \frac{1}{n}\,\mathbb{E}\left[\sum_{i=1}^{m} l_{p_i}\right] \qquad (9)$$
$$\stackrel{(b)}{\le} \frac{\mathbb{E}\left[m \log m\right]}{n} + \mathbb{E}\left[\frac{m}{n}\log\frac{n}{m}\right], \qquad (10)$$

where (a) and (b) follow from the fact that the uniform distribution has maximum entropy. That is, $l_{s_i} \le \log m$ for a fixed-length index into the at most $m$ shapes that are used, and $\sum_{i=1}^{m} l_{p_i} \le m \log \frac{n}{m}$ for the position offsets, whose total is at most $n$. $\frac{1}{n}\mathbb{E}[L]$ is the average cost of encoding $X$, which reflects the coding requirement in bits. In the sequel, we use Eq. (10) instead of Eq. (8) to bound $\frac{1}{n}\mathbb{E}[L]$.
Let $\{X_i\}$ be a strictly stationary ergodic process with finitely many states, and write $x_i^j = (x_i, \ldots, x_j)$. Due to time invariance (stationarity), the $k$-th order Markov approximation of $\{X_i\}$ is also a stationary ergodic process, and it is used to approximate the original source. We will then have

$$Q_k\left(x_1, x_2, \ldots, x_n \mid s_0\right) \triangleq \prod_{j=1}^{n} P\left(x_j \mid x_{j-k}^{j-1}\right), \qquad (11)$$

where $s_0 = x_{-k+1}^{0}$ is the initial state. In this way, one can use the $k$-th order Markov entropy rate to estimate the entropy rate of $\{X_i\}$. That is,

$$-\frac{1}{n}\log Q_k\left(X_1, X_2, \ldots, X_n \mid s_0\right) = -\frac{1}{n}\sum_{j=1}^{n}\log P\left(X_j \mid X_{j-k}^{j-1}\right) \qquad (12)$$
$$\to \mathbb{E}\left[-\log P\left(X_0 \mid X_{-k}^{-1}\right)\right] \quad \text{with probability } 1 \qquad (13)$$
$$= H\left(X_0 \mid X_{-k}^{-1}\right) \qquad (14)$$
$$\to H(\mathcal{X}) \quad \text{as } k \to \infty. \qquad (15)$$

When $k \to \infty$, the entropy rate of the $k$-th order Markov approximation converges to the entropy rate of the original random process.
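The convergence in Eqs. (12)-(15) is easy to observe numerically. The sketch below (our illustration; the chain and its parameters are arbitrary assumptions) estimates $H(X_0 \mid X_{-k}^{-1})$ from empirical context counts on a sampled binary process; since the sample path comes from a first-order chain, the estimates flatten already at $k = 1$:

```python
import numpy as np
from collections import Counter

# Estimate the k-th order conditional entropy H(X_0 | X_{-k}^{-1}) from
# empirical context counts on a sampled binary Markov chain.
rng = np.random.default_rng(5)
P = np.array([[0.9, 0.1], [0.3, 0.7]])
n = 100_000
x = np.empty(n, dtype=np.uint8)
x[0] = 0
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])

for k in range(4):
    joint, marg = Counter(), Counter()
    for t in range(k, n):
        ctx = tuple(x[t - k:t])
        joint[(ctx, int(x[t]))] += 1
        marg[ctx] += 1
    total = n - k
    pj = np.array(list(joint.values())) / total
    pm = np.array(list(marg.values())) / total
    # H(X_0 | context) = H(context, X_0) - H(context)
    Hk = -np.sum(pj * np.log2(pj)) + np.sum(pm * np.log2(pm))
    print(f"k = {k}: H(X_0 | X_-k^-1) ~= {Hk:.4f} bits")
```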
Suppose that $x_1^n$ is decomposed into $m$ shapes $s_1, s_2, \ldots, s_m$. We define $v_i$ as the $k$ bits before $s_i$, where $1 \le i \le m$. Let $c_{l,v}$ denote the number of shapes whose size is $l$ and whose previous state is $v$, so that $\sum_{l,v} c_{l,v} = m$.
Lemma 1.
For $x_1^n$ decomposed as above, the joint transition probabilities and shape sizes satisfy the following inequality:

$$\log Q_k\left(x_1^n \mid s_0\right) \le \sum_{l,v} c_{l,v} \log \frac{\alpha}{c_{l,v}}, \qquad (16)$$

where $\alpha$ is a constant.
Proof.
Suppose that for fixed $l$ and $v$, the sum of the transition probabilities is less than a constant $\alpha$, i.e.,

$$\sum_{i:\, |s_i| = l,\ v_i = v} P\left(s_i \mid v_i\right) \le \alpha. \qquad (17)$$

Then,

$$\log Q_k\left(x_1^n \mid s_0\right) \stackrel{(a)}{=} \log \prod_{i=1}^{m} P\left(s_i \mid v_i\right) \qquad (18)$$
$$= \sum_{i=1}^{m} \log P\left(s_i \mid v_i\right) \qquad (19)$$
$$= \sum_{l,v} \sum_{i:\, |s_i| = l,\ v_i = v} \log P\left(s_i \mid v_i\right) \qquad (20)$$
$$= \sum_{l,v} c_{l,v} \sum_{i:\, |s_i| = l,\ v_i = v} \frac{1}{c_{l,v}} \log P\left(s_i \mid v_i\right) \qquad (21)$$
$$\stackrel{(b)}{\le} \sum_{l,v} c_{l,v} \log \left( \frac{1}{c_{l,v}} \sum_{i:\, |s_i| = l,\ v_i = v} P\left(s_i \mid v_i\right) \right) \qquad (22)$$
$$\le \sum_{l,v} c_{l,v} \log \frac{\alpha}{c_{l,v}}, \qquad (23)$$

where (a) follows from Eq. (11) and (b) follows from Jensen’s inequality, thanks to the concavity of $\log t$ for $t > 0$. ∎
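The inequality of Lemma 1 can be checked numerically in a toy setting. In the sketch below (our illustration under stated assumptions), a binary first-order Markov chain ($k = 1$) is decomposed into maximal runs as the "shapes", and $\alpha$ is taken as the smallest constant satisfying Eq. (17) for this particular decomposition:

```python
import numpy as np
from collections import defaultdict

# Numerical check of Lemma 1: compare log2 Q_k against
# sum_{l,v} c_{l,v} log2(alpha / c_{l,v}) for a run decomposition.
rng = np.random.default_rng(2)
P = np.array([[0.85, 0.15], [0.30, 0.70]])
n, k = 5000, 1
x = np.empty(n + k, dtype=int)
x[0] = 0
for t in range(1, n + k):
    x[t] = rng.choice(2, p=P[x[t - 1]])

def seg_log2prob(i, j):
    # log2 P(s_i | v_i): product of one-step transitions inside x[i:j]
    return float(np.sum(np.log2(P[x[i - 1:j - 1], x[i:j]])))

# Decompose x[k:] into maximal runs of identical symbols.
segs, i = [], k
while i < n + k:
    j = i
    while j < n + k and x[j] == x[i]:
        j += 1
    segs.append((i, j))
    i = j

log_Qk = 0.0
counts = defaultdict(int)       # c_{l,v}
prob_sum = defaultdict(float)   # Eq. (17): per-class sums of P(s_i | v_i)
for (i, j) in segs:
    key = (j - i, tuple(x[i - k:i]))   # (shape size l, previous state v)
    lp = seg_log2prob(i, j)
    log_Qk += lp
    counts[key] += 1
    prob_sum[key] += 2.0 ** lp

alpha = max(prob_sum.values())  # smallest constant satisfying Eq. (17)
rhs = sum(c * np.log2(alpha / c) for c in counts.values())
print(f"log2 Q_k = {log_Qk:.1f} <= {rhs:.1f} = RHS of Eq. (16), alpha = {alpha:.2f}")
```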
Lemma 1 links the conditional probability $Q_k$ to the shape counts $c_{l,v}$, connecting the quantities before and after decomposing $x_1^n$. We will continue to explore the quantitative relationship between shapes and pixels.
Lemma 2.
For $x_1^n$ decomposed into $m$ shapes, the number and sizes of its shapes satisfy the following relationship:

$$\sum_{l,v} c_{l,v} \log c_{l,v} \ge m \log m - m \left( \log\left(\frac{n}{m} + 1\right) + \log e + k \right). \qquad (24)$$
Proof.
For simplicity, we use $\pi_{l,v}$ to represent $c_{l,v}/m$. Let $\mu = n/m$; then $\sum_{l,v} \pi_{l,v}\, l = \mu$. We define two random variables $U$ and $V$ such that

$$\Pr\left(U = l,\ V = v\right) = \pi_{l,v} = \frac{c_{l,v}}{m}. \qquad (25)$$

The mean of $U$ is the average length of each shape, i.e., $\mathbb{E}[U] = \sum_{l,v} \pi_{l,v}\, l = \frac{n}{m}$. A random variable with a geometric distribution has maximum entropy when the mean of a discrete random variable is fixed. Thus, we have

$$H(U) \stackrel{(a)}{\le} \left(\frac{n}{m} + 1\right) \log\left(\frac{n}{m} + 1\right) - \frac{n}{m} \log \frac{n}{m} \qquad (26)$$
$$= \log\left(\frac{n}{m} + 1\right) + \frac{n}{m} \log\left(1 + \frac{m}{n}\right) \qquad (27)$$
$$\stackrel{(b)}{\le} \log\left(\frac{n}{m} + 1\right) + \log e, \qquad (28)$$

where (a) is the entropy of a random variable with a geometric distribution and (b) follows since the function $t \log\left(1 + \frac{1}{t}\right)$ is monotonically increasing and bounded by $\log e$ when $t > 0$. On the other hand, $V$ takes at most $2^k$ values, so $H(V) \le k$. Thus,

$$\sum_{l,v} c_{l,v} \log \frac{m}{c_{l,v}} = m \sum_{l,v} \pi_{l,v} \log \frac{1}{\pi_{l,v}} \qquad (29)$$
$$= m\, H(U, V) \qquad (30)$$
$$\le m \left( H(U) + H(V) \right) \qquad (31)$$
$$\le m \left( \log\left(\frac{n}{m} + 1\right) + \log e + k \right), \qquad (32)$$

i.e.,

$$\sum_{l,v} c_{l,v} \log c_{l,v} \ge m \log m - m \left( \log\left(\frac{n}{m} + 1\right) + \log e + k \right), \qquad (33)$$

which completes the proof. ∎
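Lemma 2 holds for any decomposition with $\sum_{l,v} c_{l,v} = m$ and $\sum_{l,v} c_{l,v}\, l = n$, so it can also be verified numerically. The sketch below (our illustration; the sequence and the run decomposition are arbitrary assumptions) evaluates both sides of Eq. (24):

```python
import numpy as np
from collections import Counter

# Numerical check of Lemma 2 on the run decomposition of a random binary
# sequence: sum_{l,v} c_{l,v} log2 c_{l,v} >= m log2 m - m(log2(n/m+1)+log2 e+k).
rng = np.random.default_rng(3)
x = (rng.random(4096) < 0.3).astype(int)
k = 1

segs, i = [], k
while i < len(x):
    j = i
    while j < len(x) and x[j] == x[i]:
        j += 1
    segs.append((j - i, tuple(x[i - k:i])))   # (shape size l, state v)
    i = j

c = Counter(segs)
m = sum(c.values())
n = sum(l * cnt for (l, _), cnt in c.items())   # pixels covered by shapes

lhs = sum(cnt * np.log2(cnt) for cnt in c.values())
rhs = m * np.log2(m) - m * (np.log2(n / m + 1) + np.log2(np.e) + k)
print(f"sum c log c = {lhs:.1f} >= {rhs:.1f} = bound of Eq. (24)")
```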
Based on these two lemmas, we will further analyze the condition under which the entropy rate can be reached asymptotically.
Theorem 1.
When the numbers of shapes and pixels meet the reciprocal relation of the logarithm, the average encoding length will asymptotically approach the entropy rate. That is, if

$$\frac{m}{n} = \mathcal{O}\!\left(\frac{1}{\log n}\right) \quad \text{as } n \to \infty, \qquad (34)$$

then

$$\lim_{n \to \infty} \frac{1}{n}\,\mathbb{E}[L] = H(\mathcal{X}). \qquad (35)$$
Proof.
From Lemma 1, one can write

$$\log Q_k\left(x_1^n \mid s_0\right) \le \sum_{l,v} c_{l,v} \log \frac{\alpha}{c_{l,v}} \qquad (36)$$
$$= \sum_{l,v} c_{l,v} \log \alpha - \sum_{l,v} c_{l,v} \log c_{l,v} \qquad (37)$$
$$= m \log \alpha - \sum_{l,v} c_{l,v} \log c_{l,v}. \qquad (38)$$

For simplicity, we use $\Phi$ to represent $\sum_{l,v} c_{l,v} \log c_{l,v}$. Thus,

$$\Phi \le -\log Q_k\left(x_1^n \mid s_0\right) + m \log \alpha. \qquad (39)$$

From Lemma 2, it follows that

$$m \log m \le \Phi + m\left( \log\left(\frac{n}{m} + 1\right) + \log e + k \right) \qquad (40)$$
$$\le -\log Q_k\left(x_1^n \mid s_0\right) + m \log \alpha + m\left( \log\left(\frac{n}{m} + 1\right) + \log e + k \right), \qquad (41)$$

i.e.,

$$\frac{m \log m}{n} \le -\frac{1}{n} \log Q_k\left(x_1^n \mid s_0\right) + \frac{m \log \alpha}{n} + \frac{m}{n} \log\left(\frac{n}{m} + 1\right) + \frac{m \left(\log e + k\right)}{n}. \qquad (42)$$

When $n \to \infty$ and Eq. (34) holds, the last three terms on the right-hand side of Eq. (42) all tend to 0. Combining Eq. (39) and (42), we obtain

$$\limsup_{n \to \infty} \frac{m \log m}{n} \le \limsup_{n \to \infty} \left( -\frac{1}{n} \log Q_k\left(X_1^n \mid s_0\right) \right). \qquad (43)$$

Then, by Eqs. (12)-(15),

$$\limsup_{n \to \infty} \frac{m \log m}{n} \le H\left(X_0 \mid X_{-k}^{-1}\right) \quad \text{for every } k, \qquad (44)$$
$$\limsup_{n \to \infty} \frac{m \log m}{n} \le H(\mathcal{X}) \quad \text{as } k \to \infty. \qquad (45)$$

The asymptotic property of the second term on the right-hand side of Eq. (10) is

$$\frac{m}{n} \log \frac{n}{m} = \mathcal{O}\!\left( \frac{\log \log n}{\log n} \right) \to 0 \quad \text{under Eq. (34)}. \qquad (46)$$

Thus,

$$\limsup_{n \to \infty} \frac{1}{n}\,\mathbb{E}[L] \le \limsup_{n \to \infty} \left( \frac{\mathbb{E}\left[m \log m\right]}{n} + \mathbb{E}\left[\frac{m}{n} \log \frac{n}{m}\right] \right) \qquad (47)$$
$$\le H(\mathcal{X}), \qquad (48)$$

while the converse of lossless source coding gives $\frac{1}{n}\mathbb{E}[L] \ge \frac{1}{n} H\left(X_1^n\right) \to H(\mathcal{X})$, so that

$$\lim_{n \to \infty} \frac{1}{n}\,\mathbb{E}[L] = H(\mathcal{X}). \qquad (49)$$

This shows that when $m$ and $n$ meet the condition in Eq. (34), the average coding length of $X$ will asymptotically approach the entropy rate $H(\mathcal{X})$. ∎
Theorem 1 sets up a bridge between shapes and the entropy rate for image sources with ergodic properties. It theoretically indicates the order of magnitude that the numbers of shapes and pixels should satisfy. When one encodes images with shapes, the average cost asymptotically tends to the entropy rate if the numbers of shapes and pixels satisfy the reciprocal relation of the logarithm. Moreover, it provides new theoretical insights for designing image compression algorithms.
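The mechanics of the condition can be seen in a few lines of arithmetic. In the sketch below (our illustration of the reconstruction above, with the constant $c = 0.5$ an arbitrary choice), setting $m = c\, n / \log_2 n$ makes the position overhead $\frac{m}{n}\log\frac{n}{m}$ vanish while the shape term $\frac{m\log m}{n}$ stays bounded, so the shape term is the one left to carry the entropy rate:

```python
import numpy as np

# With m = c * n / log2(n), the position overhead (m/n) log2(n/m) vanishes
# while the shape term (m log2 m)/n stays bounded as n grows.
c = 0.5
for n in [10**4, 10**6, 10**8, 10**10]:
    m = c * n / np.log2(n)
    shape_term = m * np.log2(m) / n
    pos_term = (m / n) * np.log2(n / m)
    print(f"n = {n:.0e}: (m log m)/n = {shape_term:.3f}, "
          f"(m/n) log(n/m) = {pos_term:.4f}")
```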
III Numerical Analysis
TABLE I: Numerical results on MNIST for the ten digit classes.

| Class | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Encoding result | 2.84 | 6.02 | 3.17 | 3.20 | 3.77 | 3.40 | 3.20 | 4.05 | 2.81 | 3.52 |
| $m/n$ | 0.200 | 0.080 | 0.178 | 0.175 | 0.149 | 0.163 | 0.175 | 0.136 | 0.202 | 0.157 |
Section II points out the asymptotic property of encoding methods based on shapes. When $\frac{m}{n} = \mathcal{O}(1/\log n)$, the average encoding length asymptotically approaches the entropy rate; this establishes the relationship between the shape-pixel number ratio and coding performance. In this section, we present some numerical results to illustrate that for each ergodic image source process, if $\frac{m}{n} = \mathcal{O}(1/\log n)$ as $n \to \infty$, one can obtain the result of Eq. (35).

Table I reveals the numerical results on the MNIST dataset. It includes the encoding results and the shape-pixel ratios $m/n$ of the ten categories under the soft compression algorithm [5]. What can be clearly seen in this table is that $m/n$ lies between 0.08 and 0.20 for all classes. It is on the order of $1/\log n$, which is consistent with the assumption in Theorem 1.
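As a back-of-the-envelope check of this order-of-magnitude claim (our own computation, assuming the standard $28 \times 28 = 784$-pixel MNIST geometry):

```python
import math

# For a 28x28 MNIST image, n = 784 pixels; Theorem 1 asks the shape-pixel
# ratio m/n to be on the order of 1/log n.
n = 28 * 28
print(f"1/log2(n) = {1 / math.log2(n):.3f}")   # ~0.104
print(f"1/ln(n)   = {1 / math.log(n):.3f}")    # ~0.150
# Both fall in the same range as the measured ratios 0.08-0.20 in Table I.
```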
Now we focus on simulated images as an alternative analysis. We use a two-state birth-death process to simulate a stationary ergodic process. For each case, 5000 images are generated. We encode them with fixed-size shapes and observe the effect of the shape-pixel number ratio $m/n$ on coding performance.
Fig. 2 illustrates the working mechanism of shape coding on this image source. It shows the performance of the encoding method with shapes, in bits per pixel (bpp). Cases 1-5 represent different parameters of the infinitesimal generator matrix of the birth-death process, illustrating the relationship between coding performance and $m/n$. In all cases, the curves follow the same trend: the bpp decreases as the shape size increases (i.e., as the shape-pixel number ratio decreases), which reflects the gain brought by shapes. Moreover, as the shape-pixel number ratio continues to decrease, the bpp enters a smoothing region. This also shows that reducing the number ratio does not always improve the encoding performance; this is due to the fineness of the model itself, which cannot take advantage of the additional statistical information in larger shapes. Note that the numerical difference between the curves is essentially the difference in the entropy rates.
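The qualitative trend of Fig. 2 can be reproduced with the following sketch (our illustration under stated assumptions: a discrete-time two-state Markov chain stands in for one "case" of the birth-death process, fixed-size shapes tile the sequence in raster order so positions are implicit, and shapes are coded at their empirical entropy):

```python
import numpy as np
from collections import Counter

# Tile a sampled binary sequence by fixed-size shapes of b pixels and code
# each shape at the empirical shape entropy. The bpp drops with shape size
# and then flattens near the entropy rate, matching the trend of Fig. 2.
rng = np.random.default_rng(4)
P = np.array([[0.95, 0.05], [0.20, 0.80]])
n = 200_000
x = np.empty(n, dtype=np.uint8)
x[0] = 0
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])

# Closed-form entropy rate of the chain: the floor the curve approaches.
mu = np.array([P[1, 0], P[0, 1]]) / (P[0, 1] + P[1, 0])
H = -np.sum(mu[:, None] * P * np.log2(P))

for b in [1, 2, 4, 8, 16]:
    m = n // b
    freq = Counter(tuple(x[i * b:(i + 1) * b]) for i in range(m))
    p = np.array(list(freq.values())) / m
    bpp = -np.sum(p * np.log2(p)) / b     # bits per pixel at shape size b
    print(f"shape size {b:2d}: m/n = {m / n:.4f}, bpp = {bpp:.3f} (H = {H:.3f})")
```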
IV Concluding Remarks
In this paper, we investigated the performance limit of shape-based image compression. Our work answers the open problem of the relationship between image decomposition and lossless compression, which reflects the performance variation in general. Specifically, when the numbers of shapes and pixels meet the reciprocal relation of the logarithm, the average code length asymptotically approaches the entropy rate.
For image coding algorithms, one should give full play to the superiority of shapes in image processing. Likewise, it is necessary to take advantage of the characteristics of the image dataset. Through shapes and data-driven methods, one can use the high-dimensional information of images to aid coding. Moreover, the asymptotic analysis of the entropy rate can also be extended to grayscale images and multi-component images with some adjustments.
Finally, it should be noted that this paper focuses on the source part, without considering the natural robustness of images in the communication process. In future work, we will explore the theory of joint source-channel image coding in the finite-blocklength regime. It is noted that lossless image compression, especially soft compression, may become an important building block for semantic communications, and may even play a role in the development of future metaverse-type services.
References
- [1] Y. Hu, W. Yang, Z. Ma, and J. Liu, “Learning end-to-end lossy image compression: A benchmark,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 4194–4211, 2022.
- [2] J. Choi and J. Park, “Semantic communication as a signaling game with correlated knowledge bases,” arXiv preprint arXiv:2209.00280, 2022.
- [3] G. Xin and P. Fan, “Exk-sc: A semantic communication model based on information framework expansion and knowledge collision,” Entropy, vol. 24, no. 12, p. 1842, 2022.
- [4] T. M. Cover, Elements of information theory. John Wiley & Sons, 1999.
- [5] G. Xin, Z. Li, Z. Zhu, S. Wan, P. Fan, and K. B. Letaief, “Soft compression: An approach to shape coding for images,” IEEE Communications Letters, vol. 25, no. 3, pp. 798–801, 2020.
- [6] G. Xin and P. Fan, “Soft compression for lossless image coding based on shape recognition,” Entropy, vol. 23, no. 12, p. 1680, 2021.
- [7] ——, “A lossless compression method for multi-component medical images based on big data mining,” Scientific Reports, vol. 11, no. 1, pp. 1–11, 2021.
- [8] R. Ascher and G. Nagy, “A means for achieving a high degree of compaction on scan-digitized printed text,” IEEE Transactions on Computers, vol. 23, no. 11, pp. 1174–1179, 1974.
- [9] A. Jacquin, “Image coding based on a fractal theory of iterated contractive image transformations,” IEEE Transactions on Image Processing, vol. 1, no. 1, pp. 18–30, 1992.
- [10] F. Guerrini, A. Gnutti, and R. Leonardi, “Iterative mirror decomposition for signal representation,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5541–5545.
- [11] J. Bégaint, D. Thoreau, P. Guillotel, and C. Guillemot, “Region-based prediction for image compression in the cloud,” IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 1835–1846, 2017.
- [12] X. Zhang, C. Yang, X. Li, S. Liu, H. Yang, I. Katsavounidis, S.-M. Lei, and C.-C. J. Kuo, “Image coding with data-driven transforms: Methodology, performance and potential,” IEEE Transactions on Image Processing, vol. 29, pp. 9292–9304, 2020.
- [13] Z. Chen, L.-Y. Duan, S. Wang, Y. Lou, T. Huang, D. O. Wu, and W. Gao, “Toward knowledge as a service over networks: A deep learning model communication paradigm,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1349–1363, 2019.
- [14] E.-h. Yang, A. Kaltchenko, and J. C. Kieffer, “Universal lossless data compression with side information by using a conditional mpm grammar transform,” IEEE Transactions on Information Theory, vol. 47, no. 6, pp. 2130–2150, 2001.
- [15] L. Gavalakis and I. Kontoyiannis, “Fundamental limits of lossless data compression with side information,” IEEE Transactions on Information Theory, vol. 67, no. 5, pp. 2680–2692, 2021.
- [16] I. Sason and S. Verdú, “Improved bounds on lossless source coding and guessing moments via rényi measures,” IEEE Transactions on Information Theory, vol. 64, no. 6, pp. 4323–4346, 2018.
- [17] R. Rychtáriková, J. Korbel, P. Macháček, P. Císař, J. Urban, and D. Štys, “Point information gain and multidimensional data analysis,” Entropy, vol. 18, no. 10, p. 372, 2016.
- [18] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Transactions on Information Theory, vol. 24, no. 5, pp. 530–536, 1978.
- [19] A. D. Wyner and J. Ziv, “Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression,” IEEE Transactions on Information Theory, vol. 35, no. 6, pp. 1250–1258, 1989.