
Why Shape Coding? Asymptotic Analysis of the Entropy Rate for Digital Images

Gangtao Xin, Pingyi Fan, and Khaled B. Letaief
This work was supported by the National Key Research and Development Program of China (Grant No. 2021YFA1000500(4)). Gangtao Xin and Pingyi Fan are with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China, and also with the Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China (e-mail: [email protected]). Khaled Ben Letaief is with the School of Engineering, The Hong Kong University of Science and Technology, Hong Kong.
Abstract

This paper focuses on the limit theory of image compression. It proves that, for an image source, there exists a coding method based on shapes that achieves the entropy rate under the condition that the shape-pixel ratio in the encoder/decoder is $\mathcal{O}(1/\log t)$. Based on this finding, an image coding framework with shapes is proposed and proved to be asymptotically optimal for stationary and ergodic processes. Moreover, the $\mathcal{O}(1/\log t)$ condition on the shape-pixel ratio is confirmed on the MNIST image database, which illustrates that soft compression with shape coding is a near-optimal scheme for lossless image compression.

Index Terms:
Image compression; Information theory; Entropy rate; Limit theorem; Asymptotic bounds

I Introduction and overview

One of Shannon's outstanding achievements in source coding was to establish the limit of data compression. This result has been widely and successfully applied to the compression of stream data, but image compression remains a challenging issue. This paper is an attempt to analyze the limit theory of image compression.

I-A Preliminaries

Data compression is one of the foundations of digital communications, providing efficient and low-cost communication services. Images are among the most important and popular media in the current information age. Hence, image compression is naturally an indispensable part of data compression [1]. Moreover, its coding efficiency directly affects the objective quality of the communication network and the subjective experience of users.

As a compression method with strict requirements, lossless image coding focuses on reducing the number of bits required to represent an image without losing quality. It guarantees that the occupation of communication and storage resources is cut down as much as possible under certain system or scenario constraints. In the era of big data, lossless image coding may play an even more significant role in applications where errors cannot be tolerated, such as intelligent medical treatment, digital libraries, semantic communications [2, 3], and the metaverse in the future.

The entropy rate is one of the important metrics in information theory, extending the meaning of entropy from a random variable to a random process. It also characterizes the generalized asymptotic equipartition property of a stochastic process. In this paper, we shall employ the entropy rate to characterize the best achievable data compression. It is well known that the entropy rate of a stochastic process $\{Z_i\}$ is defined as

$$H(\mathcal{Z}) = \limsup_{t\to\infty} \frac{1}{t} H(Z_1, Z_2, \ldots, Z_t). \quad (1)$$

If the limit exists, then $H(\mathcal{Z})$ is the per-symbol entropy of the $t$ random variables, reflecting how the entropy of the sequence increases with $t$. Moreover, the entropy rate can also be defined as

$$H^{\prime}(\mathcal{Z}) = \lim_{t\to\infty} H(Z_t \mid Z_{t-1}, Z_{t-2}, \ldots, Z_1). \quad (2)$$

$H^{\prime}(\mathcal{Z})$ is the conditional entropy of the last random variable given all the past random variables. For a stationary stochastic process, the limits in Eq. (1) and (2) exist and are equal [4]. That is, $H(\mathcal{Z}) = H^{\prime}(\mathcal{Z})$. In addition, for a stationary Markov chain, the entropy rate is

$$H(\mathcal{Z}) = H^{\prime}(\mathcal{Z}) = \lim_{t\to\infty} H(Z_t \mid Z_{t-1}, \ldots, Z_1) \quad (3)$$
$$= \lim_{t\to\infty} H(Z_t \mid Z_{t-1}). \quad (4)$$

The entropy rate is a long-term sequence metric. Even if the initial distribution of the Markov chain is not the stationary distribution, the chain still converges as in Eq. (3) and (4). Moreover, for a general ergodic source, the Shannon-McMillan-Breiman theorem establishes the asymptotic equipartition property: if $\{Z_i\}$ is a finite-valued stationary ergodic process, then

$$-\frac{1}{t}\log p(Z_0, \ldots, Z_{t-1}) \to H(\mathcal{Z}) \quad \text{with probability } 1. \quad (5)$$

This indicates the convergence relationship between the joint probability density and the entropy rate for a general ergodic process. Following an idea similar to that used in the analysis of the entropy rate, we investigate the asymptotic property of shape-based coding for stationary ergodic image processes.
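As a concrete illustration (not part of the paper's experiments), the following Python sketch computes the entropy rate of a hypothetical two-state stationary Markov source via Eq. (3)-(4) and checks the convergence in Eq. (5) on a simulated sample path; the transition matrix is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state Markov source; P[i, j] = Pr(Z_t = j | Z_{t-1} = i).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# Entropy rate H(Z) = H(Z_t | Z_{t-1}) = -sum_i pi_i sum_j P_ij log2 P_ij, Eq. (3)-(4).
H_rate = -np.sum(pi[:, None] * P * np.log2(P))

# Empirical check of the Shannon-McMillan-Breiman theorem, Eq. (5):
# -(1/t) log2 p(z_1, ..., z_t) should approach H(Z) for a long sample path.
t = 100_000
z = np.empty(t, dtype=int)
z[0] = rng.choice(2, p=pi)
for j in range(1, t):
    z[j] = rng.choice(2, p=P[z[j - 1]])
log_p = np.log2(pi[z[0]]) + np.sum(np.log2(P[z[:-1], z[1:]]))

print(f"entropy rate       : {H_rate:.4f} bits/symbol")
print(f"-(1/t) log2 p(z^t) : {-log_p / t:.4f} bits/symbol")
```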

I-B Shape Coding

A digital image is composed of many pixels arranged in a fixed order: once the size of an image is determined, the number and arrangement of its pixels are also determined. Shape coding, by contrast, extends the basic components of images from pixels to shapes, which is a more flexible coding method and may efficiently exploit the structures embedded in images. Furthermore, it no longer constrains the number and positions of shapes. Shape coding has three main characteristics: (1) the image is formed by filling shapes; (2) the positions of shapes change from a fixed pattern to random variables; (3) the shape database and codebook are generated in a data-driven way, which captures more of the inherent features of image databases.

Consider a binary digital image $Z$ whose length and width are $M$ and $N$, respectively, so that the total number of pixels is $t = M \times N$. Suppose it is divided into $c(t)$ shapes $\{s_1, s_2, \ldots, s_{c(t)}\}$, where $s_i$ is the $i$-th shape. We use $\mathcal{D}$ to denote the shape database and $F_i(s_i)$, $i = 1, \ldots, c(t)$, to represent filling the image with shape $s_i$ at position $(x_i, y_i)$ in the $i$-th operation. The image with shape coding can be described as [5]

$$\min \quad \sum_{i=1}^{c(t)} \bigl[\, l(s_i) + l_p(x_i, y_i) \,\bigr] \quad (6)$$
$$\text{s.t.} \quad Z = \sum_{i=1}^{c(t)} F_i(s_i), \quad (7)$$

where $l(s_i)$ and $l_p(x_i, y_i)$ denote the bit lengths of the shape $s_i$ and of its corresponding location $(x_i, y_i)$, respectively. The constraint indicates that the binary image $Z$ can be reconstructed exactly through $c(t)$ filling operations. On this premise, shape coding tries to reduce the cost of representing an image as much as possible.
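To make the objective concrete, the following minimal Python sketch (our illustration, not the algorithm of [5]) evaluates the cost in Eq. (6) for a given decomposition and checks the reconstruction constraint in Eq. (7). The fixed-length position code of $\lceil\log_2(MN)\rceil$ bits and the shape-signature dictionary keys are illustrative assumptions.

```python
import numpy as np

def shape_coding_cost(Z, shapes, positions, codebook_bits):
    """Evaluate Eq. (6) for one decomposition and check the constraint Eq. (7).

    Z             : (M, N) binary image.
    shapes        : list of small binary arrays s_i.
    positions     : list of (x_i, y_i) top-left corners.
    codebook_bits : dict mapping (s.shape, s.tobytes()) to the code length l(s_i).
    """
    M, N = Z.shape
    canvas = np.zeros_like(Z)
    total_bits = 0
    for s, (x, y) in zip(shapes, positions):
        canvas[x:x + s.shape[0], y:y + s.shape[1]] += s          # filling operation F_i(s_i)
        total_bits += codebook_bits[(s.shape, s.tobytes())]       # l(s_i)
        total_bits += int(np.ceil(np.log2(M * N)))                # l_p(x_i, y_i), fixed-length here
    assert np.array_equal(canvas, Z), "constraint (7) violated: reconstruction differs from Z"
    return total_bits
```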

The codebook plays an important role in shape coding. It reflects the statistical characteristics and correlations of the data source. Fig. 1 illustrates the structure of shape coding, which consists of two parts: the generation and the use of the codebook. On the one hand, one searches for and matches shapes in the images of the dataset through a data-driven method, and frequency statistics are collected to generate a shape database. On the other hand, the codebook can be used repeatedly in communication and storage tasks to reduce the occupation of resources. The transmitter/compressor encodes the original image with the codebook. After transmission or storage through the channel/storage medium, the receiver/decompressor decodes the compressed file with the same codebook. In this way, one can exactly reconstruct the original image.


Figure 1: The structure of shape coding. It consists of two parts: the generation and use of the codebook. The former makes use of the characteristics of the data source, while the latter improves the compression efficiency.
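The codebook-generation step can be sketched as follows (a simplified illustration, not the exact procedure of [5]): the shape-extraction routine `extract_shapes` is a hypothetical placeholder, and code lengths are idealized as $\lceil -\log_2 p \rceil$.

```python
import math
from collections import Counter

def build_codebook(train_images, extract_shapes):
    """Data-driven codebook generation sketch.

    train_images   : iterable of binary images.
    extract_shapes : hypothetical function returning the list of shapes found in one image.
    Returns a dict: (shape dims, shape bytes) -> idealized code length ceil(-log2 p).
    """
    counts = Counter()
    for img in train_images:
        for s in extract_shapes(img):
            counts[(s.shape, s.tobytes())] += 1
    total = sum(counts.values())
    return {sig: math.ceil(-math.log2(n / total)) for sig, n in counts.items()}
```

The returned dictionary can then be passed as `codebook_bits` to the cost sketch above, mimicking the "generate once, reuse repeatedly" workflow described for the codebook.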

I-C Relations to Previous Work

The objective of this work is to present the performance limits from the viewpoint of information theory, which is related to our previous works in [5, 6, 7]. Encoding images through shapes and data-driven methods can improve lossless image compression; on some known databases, soft compression outperforms popular methods such as PNG, JPEG2000, and JPEG-LS. However, there was no theoretical support for whether shape-based soft compression methods can reach the performance limit. That is, the gap between soft compression and its compression limit, namely the entropy rate, was not theoretically known. On the other hand, analyzing the entropy rate together with the asymptotic equipartition property of images can help us design efficient encoding and decoding algorithms from the viewpoint of Shannon's information theory.

The earliest multi-pixel joint coding method can be traced back to symbol-based coding [8], which transmits or stores only the first instance of each pattern class and thereafter substitutes this exemplar for every subsequent occurrence of the symbol. It achieved a degree of bandwidth reduction on scan-digitized printed text. Fractal theory [9, 10] is also related to block-based coding: fractal block coding approximates an original image by relying on the assumption that image redundancy can be efficiently exploited through self-transformability on a blockwise basis. In contrast, soft compression generates the shape database in a data-driven manner, from which the codebook used in the encoder and decoder is built. Data-driven image processing methods such as [11, 12, 13] can explore the essential features of images and even eliminate semantic redundancy. Using side information to assist data compression has also been studied by Kieffer [14] and Kontoyiannis [15]. Sason and Verdú [16] provided upper and lower bounds on the optimal guessing moments of a random variable taking values in a finite set when side information may be available. Rychtáriková et al. [17] generalized the point information gain and derived the point information gain entropy, which may help analyze the entropy rate of an image.

Another connection to this paper is the Lempel-Ziv coding scheme [18], which introduced the concept of compressibility: for every individual infinite sequence $x$, a quantity $\rho(x)$ is defined and shown to be the asymptotically attainable lower bound on the compression ratio that can be achieved for $x$ by any finite-state encoder. Wyner and Ziv [19] derived theorems concerning the entropy of a stationary ergodic information source and used the results to obtain insight into the workings of the Lempel-Ziv data compression algorithm.

The main contribution of this paper is a sufficient condition under which the performance limit of shape-based image coding, in terms of the entropy rate, is asymptotically achievable.

I-D Paper Outline

The rest of this paper is organized as follows. Section II contains our main results, giving asymptotic properties on shape-based image coding in terms of entropy rate. Moreover, it indicates the relationship between the number of shapes and coding performance. In Section III, we present sample numerical results with concrete examples. In Section IV, we give some complementary remarks and conclude this paper.

II The Asymptotic Properties of Image Sources Composed of Shapes

The encoding method with shapes can exploit the characteristics of the data and eliminate redundancy in the spatial and coding domains simultaneously. This section theoretically analyzes the performance of image coding with shapes. It shows that when the numbers of shapes and pixels satisfy a reciprocal-logarithm relationship, the average code length asymptotically approaches the entropy rate. To the best of our knowledge, this is the first such result on image compression in information theory. The framework of the proof is similar to [4], [19], but there are some important differences.

Let $B_Z$ denote the average number of bits needed to represent the image $Z$ with shapes. Specifically,

$$B_Z = \frac{1}{t}\sum_{i=1}^{c(t)} \bigl[\, l(s_i) + l_p(x_i, y_i) \,\bigr] \quad (8)$$
$$\overset{(a)}{\leq} \frac{c(t)\log c(t) + \sum_{i=1}^{c(t)} l(s_i)}{t} \quad (9)$$
$$\overset{(b)}{\leq} \frac{c(t)\log c(t) + c(t)\log|\mathcal{D}|}{t}, \quad (10)$$

where (a) and (b) follow from the fact that the uniform distribution has maximum entropy; that is, $\sum_{i=1}^{c(t)} l(s_i) \leq c(t)\log|\mathcal{D}|$ and $\sum_{i=1}^{c(t)} l_p(x_i, y_i) \leq c(t)\log c(t)$. $B_Z$ is the average cost of encoding $Z$, which reflects the required number of coding bits. In the sequel, we use Eq. (10) instead of Eq. (8) to bound $B_Z$.
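For intuition, the following snippet evaluates the right-hand side of Eq. (10) for illustrative values of $t$, $c(t)$, and $|\mathcal{D}|$; the numbers are assumptions, not measurements, with the shape-pixel ratio decaying like $1/\log_2 t$.

```python
import numpy as np

def bz_upper_bound(c, t, D_size):
    """Right-hand side of Eq. (10): (c log2 c + c log2 |D|) / t, in bits per pixel."""
    return (c * np.log2(c) + c * np.log2(D_size)) / t

# Illustrative numbers only: t pixels, c(t) = t / log2(t) shapes, |D| = 256 shapes in the database.
for t in [10**3, 10**4, 10**5, 10**6]:
    c = t / np.log2(t)
    print(f"t = {t:>7}, c/t = {c / t:.3f}, bound = {bz_upper_bound(c, t, 256):.3f} bpp")
```

With this choice the second term vanishes as $t$ grows, while the first term stays bounded, which is the behavior exploited in the proof below.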

Let $\{Z_i\}_{i=-\infty}^{\infty}$ be a strictly stationary ergodic process with finite states, and let $z_i^j \triangleq (z_i, z_{i+1}, \ldots, z_j)$. By time invariance, $P(Z_t \mid Z_{t-k}^{t-1})$ defines an ergodic process, and the $k$-th order Markov approximation is used. We then have

$$Q_k(z_{-(k-1)}, \ldots, z_0, \ldots, z_t) \triangleq P(z_{-(k-1)}^0) \prod_{j=1}^{t} P(z_j \mid z_{j-k}^{j-1}), \quad (11)$$

where $z_{-(k-1)}^0$ is the initial state. In this way, one can use the $k$-th order Markov entropy rate to estimate the entropy rate of $\{Z_i\}$. That is,

$$-\frac{1}{t}\log Q_k(Z_1, Z_2, \ldots, Z_t \mid Z_{-(k-1)}^0) = -\frac{1}{t}\log \prod_{j=1}^{t} P(Z_j \mid Z_{j-k}^{j-1}) \quad (12)$$
$$= -\frac{1}{t}\sum_{j=1}^{t} \log P(Z_j \mid Z_{j-k}^{j-1}) \quad (13)$$
$$\to -E\log P(Z_j \mid Z_{j-k}^{j-1}) \quad (14)$$
$$= H(Z_j \mid Z_{j-k}^{j-1}). \quad (15)$$

When $k \to \infty$, the entropy rate of the $k$-th order Markov approximation converges to the entropy rate of the original random process.

Suppose that $z_1^t$ is decomposed into $c(t)$ shapes $s_1, s_2, \ldots, s_{c(t)}$. We define $w_i$ as the $k$ bits preceding $s_i$, with $w_1 = z_{-(k-1)}^0$. Let $c_{lw}$ denote the number of shapes whose size is $l$ and whose preceding state is $w_i = w$, $w \in \mathcal{Z}^k$.
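As a small illustration of this bookkeeping (our sketch; the decomposition into shapes and the $k$-bit contexts are assumed to be given), the counts $c_{lw}$ can be tabulated as follows.

```python
from collections import Counter

def count_clw(shapes, contexts):
    """Tabulate c_{lw}: the number of shapes of size l whose preceding k-bit state is w.

    shapes   : list of shapes s_i (here, tuples of bits), i = 1, ..., c(t).
    contexts : list of k-bit contexts w_i preceding each shape, same length as `shapes`.
    """
    c_lw = Counter()
    for s, w in zip(shapes, contexts):
        c_lw[(len(s), tuple(w))] += 1
    return c_lw

# Toy example with k = 2: three shapes of sizes 2, 2, 3 and their preceding contexts.
shapes = [(0, 1), (1, 1), (0, 0, 1)]
contexts = [(0, 0), (0, 1), (1, 1)]
print(count_clw(shapes, contexts))   # {(2, (0, 0)): 1, (2, (0, 1)): 1, (3, (1, 1)): 1}
```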

Lemma 1.

For $\{Z_i\}$, the joint transition probability and the shape sizes satisfy the following inequality:

$$\log Q_k(z_1, z_2, \ldots, z_t \mid w_1) \leq \sum_{l,w} c_{lw} \log\frac{\alpha}{c_{lw}}, \quad (16)$$

where $\alpha$ is a constant.

Proof.

Suppose that for fixed $l$ and $w$, the sum of the transition probabilities, normalized by $c_{lw}$, is bounded above by a constant $\alpha$, i.e.,

$$\sum_{i:\,|s_i| = l,\, w_i = w} \frac{1}{c_{lw}} P(s_i \mid w_i) \leq \alpha. \quad (17)$$

Then,

$$\log Q_k(z_1, z_2, \ldots, z_t \mid w_1) = \log Q_k(s_1, s_2, \ldots, s_c \mid w_1) \quad (18)$$
$$\overset{(a)}{=} \sum_{i=1}^{c} \log P(s_i \mid w_i) \quad (19)$$
$$= \sum_{l,w} \sum_{i:\,|s_i| = l,\, w_i = w} \log P(s_i \mid w_i) \quad (20)$$
$$= \sum_{l,w} c_{lw} \sum_{i:\,|s_i| = l,\, w_i = w} \frac{1}{c_{lw}} \log P(s_i \mid w_i) \quad (21)$$
$$\overset{(b)}{\leq} \sum_{l,w} c_{lw} \log \sum_{i:\,|s_i| = l,\, w_i = w} \frac{1}{c_{lw}} P(s_i \mid w_i) \quad (22)$$
$$\leq \sum_{l,w} c_{lw} \log\frac{\alpha}{c_{lw}}, \quad (23)$$

where (a) follows from Eq. (11) and (b) follows from Jensen's inequality, thanks to the concavity of $\log x$ for $x > 0$. ∎

Lemma 1 links the conditional probability $Q_k(z_1, z_2, \ldots, z_t \mid w_1)$ to $c_{lw}$, connecting the quantities before and after decomposing $\{Z_i\}$. We next explore the quantitative relationship between shapes and pixels.

Lemma 2.

For $\{Z_i\}$, the number and sizes of its shapes satisfy the following relationship:

$$\sum_{l,w} c_{lw} \log\frac{c}{c_{lw}} \leq kc + (t + c)\log\Bigl(1 + \frac{c}{t}\Bigr) + c\log\frac{t}{c}. \quad (24)$$
Proof.

For simplicity, we write $c$ for $c(t)$. Let $p_{lw} = \frac{c_{lw}}{c}$, so that $\sum_{l,w} p_{lw} = 1$. We define two random variables $U$ and $V$ such that

$$\Pr(U = l, V = w) = p_{lw}. \quad (25)$$

The mean of $U$ is the average length of a shape, i.e., $E(U) = \frac{t}{c}$. Among discrete random variables with a fixed mean, the geometric distribution has maximum entropy. Thus, we have

$$H(U) \overset{(a)}{\leq} \frac{t}{c}\log\frac{t}{c} - \Bigl(\frac{t}{c} - 1\Bigr)\log\Bigl(\frac{t}{c} - 1\Bigr) \quad (26)$$
$$\overset{(b)}{\leq} \Bigl(\frac{t}{c} + 1\Bigr)\log\Bigl(\frac{t}{c} + 1\Bigr) - \frac{t}{c}\log\frac{t}{c} \quad (27)$$
$$= \log\frac{t}{c} + \Bigl(\frac{t}{c} + 1\Bigr)\log\Bigl(\frac{c}{t} + 1\Bigr), \quad (28)$$

where (a) is the entropy of a random variable with a geometric distribution of mean $t/c$, and (b) follows because the function $f(x) = x\log x - (x-1)\log(x-1)$ is monotonically increasing for $x \geq 1$. On the other hand, $H(V) \leq \log|\mathcal{Z}|^k = k$. Thus,

$$\sum_{l,w} c_{lw}\log\frac{c}{c_{lw}} = c\sum_{l,w} p_{lw}\log\frac{1}{p_{lw}} \quad (29)$$
$$= cH(U, V) \quad (30)$$
$$\leq cH(U) + cH(V) \quad (31)$$
$$\leq c\Bigl[\log\frac{t}{c} + \Bigl(\frac{t}{c} + 1\Bigr)\log\Bigl(\frac{c}{t} + 1\Bigr)\Bigr] + kc \quad (32)$$
$$= kc + (t + c)\log\Bigl(1 + \frac{c}{t}\Bigr) + c\log\frac{t}{c}, \quad (33)$$

which completes the proof. ∎
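As a quick numeric sanity check of step (a) above (not part of the proof): for a fixed mean $m = t/c$, the geometric distribution on the positive integers attains the bound $m\log m - (m-1)\log(m-1)$, while other distributions with the same mean have smaller entropy. A small Python check, with an arbitrary example mean:

```python
import numpy as np

def entropy_bits(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

m = 4.0                                              # example mean t/c
bound = m * np.log2(m) - (m - 1) * np.log2(m - 1)    # right-hand side of step (a)

# Geometric distribution on {1, 2, ...} with mean m (truncated for the numeric check).
k = np.arange(1, 2000)
geom = (1 / m) * (1 - 1 / m) ** (k - 1)

# Another distribution with the same mean but more concentrated: uniform on {1, ..., 7}.
unif = np.full(7, 1 / 7)

print(f"bound (a)             : {bound:.4f} bits")
print(f"geometric entropy     : {entropy_bits(geom):.4f} bits (attains the bound)")
print(f"uniform{{1..7}} entropy : {entropy_bits(unif):.4f} bits (mean 4, strictly smaller)")
```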

Based on these two lemmas, we will further analyze the condition under which the entropy rate can be reached asymptotically.

Theorem 1.

When the numbers of shapes and pixels satisfy the reciprocal-logarithm relation, the average encoding length asymptotically approaches the entropy rate. That is, if

$$\frac{c(t)}{t} = \mathcal{O}\Bigl(\frac{1}{\log t}\Bigr), \quad (34)$$

then

$$\lim_{t\to\infty}\frac{l(Z_1, Z_2, \ldots, Z_t)}{t} = H(\mathcal{Z}). \quad (35)$$
Proof.

From Lemma 1, one can write

$$\log Q_k(z_1, z_2, \ldots, z_t \mid w_1) \leq \sum_{l,w} c_{lw}\log\frac{\alpha}{c_{lw}} \quad (36)$$
$$= -\sum_{l,w} c_{lw}\log\frac{c \cdot c_{lw}}{c \cdot \alpha} \quad (37)$$
$$= -c\log c - \sum_{l,w} c_{lw}\log\frac{c_{lw}}{c\alpha}. \quad (38)$$

For simplicity, we use $Q$ to represent $Q_k(z_1, z_2, \ldots, z_t \mid w_1)$. Thus,

$$\frac{c}{t}\log c \leq -\frac{1}{t}\log Q - \frac{1}{t}\sum_{l,w} c_{lw}\log\frac{c_{lw}}{c\alpha}. \quad (39)$$

From Lemma 2, it follows that

$$-\frac{1}{t}\sum_{l,w} c_{lw}\log\frac{c_{lw}}{c\alpha} = \frac{1}{t}\sum_{l,w} c_{lw}\log\frac{c}{c_{lw}} + \frac{c}{t}\log\alpha \quad (40)$$
$$\leq \frac{1}{t}\Bigl[kc + (t + c)\log\Bigl(1 + \frac{c}{t}\Bigr) + c\log\frac{t}{c}\Bigr] + \frac{c}{t}\log\alpha \quad (41)$$
$$= \frac{c}{t}(k + \log\alpha) + \frac{c}{t}\log\frac{t}{c} + \Bigl(1 + \frac{c}{t}\Bigr)\log\Bigl(1 + \frac{c}{t}\Bigr). \quad (42)$$

When $\frac{c}{t} = \mathcal{O}\bigl(\frac{1}{\log t}\bigr)$ and $t \to \infty$, the three terms on the right-hand side of Eq. (42) all tend to 0. Combining Eq. (39) and (42), we obtain

$$-\frac{1}{t}\sum_{l,w} c_{lw}\log\frac{c_{lw}}{c\alpha} \to 0 \quad \text{as } t \to \infty. \quad (43)$$

Then,

$$\limsup_{t\to\infty}\frac{c(t)\log c}{t} \leq \lim_{t\to\infty}\Bigl(-\frac{1}{t}\log Q_k(Z_1, Z_2, \ldots, Z_t \mid Z_{-(k-1)}^0)\Bigr) \quad (44)$$
$$\to H(\mathcal{Z}). \quad (45)$$

For the second term on the right-hand side of Eq. (10), the asymptotic property gives

$$\lim_{t\to\infty}\frac{c(t)\log|\mathcal{D}|}{t} = 0. \quad (46)$$

Thus,

$$\lim_{t\to\infty}\frac{l(Z_1, Z_2, \ldots, Z_t)}{t} = \lim_{t\to\infty}\Bigl(\frac{c(t)\log c}{t} + \frac{c(t)\log|\mathcal{D}|}{t}\Bigr) \quad (47)$$
$$= \lim_{t\to\infty}\frac{c(t)\log c}{t} \quad (48)$$
$$= H(\mathcal{Z}). \quad (49)$$

This shows that when $c(t)$ and $t$ satisfy the condition in Eq. (34), the average coding length of $\{Z_i\}$ asymptotically approaches the entropy rate $H(\mathcal{Z})$. ∎

Theorem 1 sets up a bridge between shapes and the entropy rate for ergodic image sources. It theoretically indicates the relative order of magnitude that the numbers of shapes and pixels should have. When one encodes images with shapes, the average cost asymptotically approaches the entropy rate if the numbers of shapes and pixels satisfy the reciprocal-logarithm relation. Moreover, it gives new theoretical insights into the design of image compression algorithms.
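To see how slowly the right-hand side of Eq. (42) vanishes under the condition of Theorem 1, the following snippet evaluates its three terms for the illustrative choice $c(t) = t/\log_2 t$ and hypothetical values of $k$ and $\alpha$; the decay is logarithmic, hence slow.

```python
import numpy as np

def eq42_terms(t, k=8, alpha=1.0):
    """The three terms on the right-hand side of Eq. (42) for c(t) = t / log2(t)."""
    c = t / np.log2(t)
    r = c / t
    return (r * (k + np.log2(alpha)),      # (c/t)(k + log alpha)
            r * np.log2(t / c),            # (c/t) log(t/c)
            (1 + r) * np.log2(1 + r))      # (1 + c/t) log(1 + c/t)

for t in [10**3, 10**5, 10**7, 10**9]:
    print(t, [f"{x:.4f}" for x in eq42_terms(t)])
```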

III Numerical Analysis

TABLE I: Numerical analysis of shapes and pixels on the MNIST dataset ($R_{avg}$ is the average compression ratio)
Class                      0      1      2      3      4      5      6      7      8      9
$R_{avg}$                  2.84   6.02   3.17   3.20   3.77   3.40   3.20   4.05   2.81   3.52
$\frac{1}{t}c(t)\log t$    0.200  0.080  0.178  0.175  0.149  0.163  0.175  0.136  0.202  0.157

Section II establishes the asymptotic property of encoding methods based on shapes: when $\frac{c(t)}{t} = \mathcal{O}\bigl(\frac{1}{\log t}\bigr)$, the average encoding length asymptotically approaches the entropy rate. This indicates the relationship between the shape-pixel number ratio and coding performance. In this section, we present some numerical results to illustrate that, for each ergodic process of an image source, if $\frac{c(t)}{t} = \mathcal{O}\bigl(\frac{1}{\log t}\bigr)$ as $t \to \infty$, one can obtain the result of Eq. (35).

Table I reports the numerical results on the MNIST dataset, including the encoding results $R_{avg}$ and $\frac{1}{t}c(t)\log t$ for the ten classes under the soft compression algorithm [5]. What can be clearly seen in this table is that $\frac{1}{t}c(t)\log t < 1$ for all classes; it is of order $\mathcal{O}(1)$, which is consistent with the assumption in Theorem 1.
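For a rough sense of scale (an illustrative back-of-the-envelope calculation, assuming $28\times 28$ MNIST images and base-2 logarithms): $t = 784$ and $\log_2 t \approx 9.6$, so the tabulated value $0.200$ for class 0 corresponds to roughly $c(t) \approx 0.200 \cdot 784 / 9.6 \approx 16$ shapes per image.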

We now turn to simulated images as an alternative analysis. We use two-state birth-death processes to simulate stationary ergodic processes. For each case, 5000 realizations of $\{Z_i\}$ with $M = 100$, $N = 100$ are generated. We encode $\{Z_i\}$ with fixed-size shapes and observe the effect of $\frac{c(t)}{t}$ on coding performance.
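A minimal sketch of how such test images could be generated is given below; the transition probabilities and the raster-scan ordering of pixels are illustrative assumptions rather than the exact experimental setup.

```python
import numpy as np

def birth_death_image(M=100, N=100, p_birth=0.1, p_death=0.3, rng=None):
    """Generate one binary image by raster-scanning a two-state birth-death chain.

    p_birth = Pr(next pixel is 1 | current pixel is 0); p_death = Pr(0 | 1).
    The chain is started from its stationary distribution so the process is stationary.
    """
    rng = rng or np.random.default_rng()
    pi1 = p_birth / (p_birth + p_death)       # stationary probability of state 1
    z = np.empty(M * N, dtype=np.uint8)
    z[0] = rng.random() < pi1
    for j in range(1, M * N):
        p_one = p_birth if z[j - 1] == 0 else 1 - p_death
        z[j] = rng.random() < p_one
    return z.reshape(M, N)

# Example: three sample images with the same (hypothetical) parameters.
images = [birth_death_image(rng=np.random.default_rng(i)) for i in range(3)]
```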

Fig. 2 illustrates the working mechanism of shape coding for this image source. It shows the performance of the shape-based encoding method in bits per pixel (bpp). Cases 1-5 correspond to different parameters of the infinitesimal generator matrix of the birth-death process, illustrating the relationship between coding performance and $\frac{c(t)}{t}$. In all cases, the curves follow the same trend: the bpp decreases as the shape size increases (i.e., as the shape-pixel number ratio decreases), which reflects the gain brought by shapes. Moreover, as the shape-pixel number ratio continues to decrease, the bpp enters a smoothing region, showing that reducing the number ratio does not always improve the encoding performance. This is due to the fineness of the model itself, which does not take advantage of the additional statistical information in larger shapes. Note that the numerical differences between the curves are essentially differences in the entropy rate.


Figure 2: The performance of the encoding method with shapes, in bits per pixel (bpp).

IV Concluding Remarks

In this paper, we investigated the performance limit of shape-based image compression. Our work answers the open problem of the relationship between image decomposition and lossless compression, which reflects the performance variation in general. Specifically, when the numbers of shapes and pixels satisfy the reciprocal-logarithm relation, the average code length asymptotically approaches the entropy rate.

For image coding algorithms, one should take full advantage of the superiority of shapes in image processing. Likewise, it is necessary to exploit the characteristics of the image dataset. Through shapes and data-driven methods, one can use the high-dimensional information of images to aid coding. Moreover, the asymptotic analysis of the entropy rate can also be extended to gray-scale images and multi-component images with some adjustments.

Finally, it should be noted that this paper focuses on the source part, without considering the natural robustness of images in the communication process. In future work, we will explore the theory of joint source-channel image coding in the finite block length regime. Lossless image compression, especially soft compression, may become an important building block for semantic communications and may even play a role in the development of future metaverse-type services.

References

  • [1] Y. Hu, W. Yang, Z. Ma, and J. Liu, “Learning end-to-end lossy image compression: A benchmark,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 8, pp. 4194–4211, 2022.
  • [2] J. Choi and J. Park, “Semantic communication as a signaling game with correlated knowledge bases,” arXiv preprint arXiv:2209.00280, 2022.
  • [3] G. Xin and P. Fan, “Exk-sc: A semantic communication model based on information framework expansion and knowledge collision,” Entropy, vol. 24, no. 12, p. 1842, 2022.
  • [4] T. M. Cover, Elements of information theory.   John Wiley & Sons, 1999.
  • [5] G. Xin, Z. Li, Z. Zhu, S. Wan, P. Fan, and K. B. Letaief, “Soft compression: An approach to shape coding for images,” IEEE Communications Letters, vol. 25, no. 3, pp. 798–801, 2020.
  • [6] G. Xin and P. Fan, “Soft compression for lossless image coding based on shape recognition,” Entropy, vol. 23, no. 12, p. 1680, 2021.
  • [7] ——, “A lossless compression method for multi-component medical images based on big data mining,” Scientific Reports, vol. 11, no. 1, pp. 1–11, 2021.
  • [8] R. Ascher and G. Nagy, “A means for achieving a high degree of compaction on scan-digitized printed text,” IEEE Transactions on Computers, vol. 23, no. 11, pp. 1174–1179, 1974.
  • [9] A. Jacquin, “Image coding based on a fractal theory of iterated contractive image transformations,” IEEE Transactions on Image Processing, vol. 1, no. 1, pp. 18–30, 1992.
  • [10] F. Guerrini, A. Gnutti, and R. Leonardi, “Iterative mirror decomposition for signal representation,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 5541–5545.
  • [11] J. Bégaint, D. Thoreau, P. Guillotel, and C. Guillemot, “Region-based prediction for image compression in the cloud,” IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 1835–1846, 2017.
  • [12] X. Zhang, C. Yang, X. Li, S. Liu, H. Yang, I. Katsavounidis, S.-M. Lei, and C.-C. J. Kuo, “Image coding with data-driven transforms: Methodology, performance and potential,” IEEE Transactions on Image Processing, vol. 29, pp. 9292–9304, 2020.
  • [13] Z. Chen, L.-Y. Duan, S. Wang, Y. Lou, T. Huang, D. O. Wu, and W. Gao, “Toward knowledge as a service over networks: A deep learning model communication paradigm,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1349–1363, 2019.
  • [14] E.-h. Yang, A. Kaltchenko, and J. C. Kieffer, “Universal lossless data compression with side information by using a conditional mpm grammar transform,” IEEE Transactions on Information Theory, vol. 47, no. 6, pp. 2130–2150, 2001.
  • [15] L. Gavalakis and I. Kontoyiannis, “Fundamental limits of lossless data compression with side information,” IEEE Transactions on Information Theory, vol. 67, no. 5, pp. 2680–2692, 2021.
  • [16] I. Sason and S. Verdú, “Improved bounds on lossless source coding and guessing moments via rényi measures,” IEEE Transactions on Information Theory, vol. 64, no. 6, pp. 4323–4346, 2018.
  • [17] R. Rychtáriková, J. Korbel, P. Macháček, P. Císař, J. Urban, and D. Štys, “Point information gain and multidimensional data analysis,” Entropy, vol. 18, no. 10, p. 372, 2016.
  • [18] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Transactions on Information Theory, vol. 24, no. 5, pp. 530–536, 1978.
  • [19] A. D. Wyner and J. Ziv, “Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression,” IEEE Transactions on Information Theory, vol. 35, no. 6, pp. 1250–1258, 1989.