
Distributed Arithmetic Coding for Sources with Hidden Markov Correlation

Yong Fang and Jechang Jeong This research was supported by Seoul Future Contents Convergence (SFCC) Cluster established by Seoul R&BD Program.The authors are with the Lab. of ICSP, Dep. of Electronics and Communications Engineering, Hanyang University, Haengdang-dong, Seongdong-gu, Seoul 133-791, Korea (e-mail: [email protected]).
Abstract

Distributed arithmetic coding (DAC) has been shown to be effective for Slepian-Wolf coding, especially for short data blocks. In this letter, we propose to use the DAC to compress memory-correlated sources. More specifically, the correlation between the sources is modeled as a hidden Markov process. Experimental results show that the performance is close to the theoretical Slepian-Wolf limit.

Index Terms:
Distributed Source Coding (DSC), Slepian-Wolf Coding (SWC), Distributed Arithmetic Coding (DAC), Hidden Markov Correlation, Forward Algorithm.

I Introduction

We consider the problem of Slepian-Wolf Coding (SWC) with decoder Side Information (SI). The encoder compresses a discrete source $X=\{x_t\}_{t=1}^{N}$ in the absence of the correlated discrete SI $Y=\{y_t\}_{t=1}^{N}$. The Slepian-Wolf theorem states that lossless compression is achievable at rates $R\geq H(X|Y)$, the conditional entropy of $X$ given $Y$, where both $X$ and $Y$ are discrete random processes [1]. Conventionally, channel codes, such as turbo codes [2] or Low-Density Parity-Check (LDPC) codes [3], are used to deal with the SWC problem.

Recently, several SWC techniques based on entropy coding have been proposed, such as Distributed Arithmetic Coding (DAC) [4, 5] and Overlapped Quasi-Arithmetic Coding (OQAC) [6]. These schemes can be seen as extensions of classic Arithmetic Coding (AC): source $X$ is encoded at rates $H(X|Y)\leq R<H(X)$ by allowing intervals to overlap. The overlapping leads to a larger final interval and hence a shorter codeword; however, ambiguous codewords are then unavoidable, so a soft joint decoder exploits the SI $Y$ to decode $X$. Later, the time-shared DAC (TS-DAC) [7] was proposed to deal with the symmetric SWC problem, and the rate-compatible DAC was proposed in [8] to realize rate-incremental SWC.

In this letter, we investigate how to use the DAC to compress sources with hidden Markov correlation.

II Binary Distributed Arithmetic Coding

Let $p$ be the bias probability of binary source $X$, i.e., $p=P(x_t=1)$. In classic AC, source symbol $x_t$ is iteratively mapped onto sub-intervals of $[0,1)$, whose lengths are proportional to $(1-p)$ and $p$. The resulting rate is $R\geq H(X)$. In the DAC [4], interval lengths are instead proportional to the modified probabilities $(1-p)^\gamma$ and $p^\gamma$, where

$\frac{H(X|Y)}{H(X)}\leq\gamma\leq 1.$ (1)

The resulting rate is $R\geq\gamma H(X)\geq H(X|Y)$. To fit the $[0,1)$ interval, the sub-intervals have to be partially overlapped. More specifically, symbols $x_t=0$ and $x_t=1$ correspond to the intervals $[0,(1-p)^\gamma)$ and $[1-p^\gamma,1)$, respectively. It is precisely this overlapping that leads to a larger final interval, and hence a shorter codeword. However, as a cost, the decoder cannot decode $X$ unambiguously without $Y$.
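To make the interval mapping concrete, the following is a minimal sketch of the per-symbol encoder update, assuming idealized arbitrary-precision arithmetic (a practical codec, such as the 16-bit one used in Section V, renormalizes finite-precision integer registers instead); the function name and structure are ours, not taken from [4].

```python
# Minimal sketch of the DAC per-symbol interval update (idealized,
# infinite-precision; a practical codec renormalizes integer registers).
def dac_update_interval(low, high, bit, p, gamma):
    """Map the current interval [low, high) onto the sub-interval of
    symbol `bit`; gamma < 1 makes the two sub-intervals overlap."""
    width = high - low
    q0 = (1.0 - p) ** gamma   # enlarged length for bit 0
    q1 = p ** gamma           # enlarged length for bit 1
    if bit == 0:
        # bit 0 -> [0, (1-p)^gamma), rescaled into [low, high)
        return low, low + width * q0
    else:
        # bit 1 -> [1 - p^gamma, 1); since q0 + q1 >= 1 the two
        # sub-intervals overlap, which shortens the final codeword
        return low + width * (1.0 - q1), high
```

Encoding folds this update over all $N$ symbols; any number in the final interval identifies the sequence, up to the ambiguities that must be resolved at the decoder.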

To describe the decoding process, we define a ternary symbol set $\{0,\chi,1\}$, where $\chi$ represents a decoding ambiguity. Let $C_X$ be the codeword and $\tilde{x}_t$ be the $t$-th decoded symbol; then

$\tilde{x}_t=\begin{cases}0,&0\leq C_X<1-p^\gamma\\ \chi,&1-p^\gamma\leq C_X<(1-p)^\gamma\\ 1,&(1-p)^\gamma\leq C_X<1\end{cases}$ (2)

When the $t$-th symbol is decoded, if $\tilde{x}_t=\chi$, the decoder performs a branching: two candidate branches are generated, corresponding to the two alternative symbols $x_t=0$ and $x_t=1$. For each new branch, its metric is updated and the corresponding interval is selected for the next iteration. To reduce complexity, after decoding each symbol, the decoder uses the $M$-algorithm to keep at most $M$ branches with the best partial metrics and prunes the others [4].
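As an illustration, here is a sketch of the ternary decision of Eq. (2) together with $M$-algorithm pruning; `branches` is a hypothetical list of partial decoding paths, each carrying its own interval and metric.

```python
CHI = 'chi'  # the ambiguity symbol chi of Eq. (2)

def dac_decision(c, p, gamma):
    """Ternary decision of Eq. (2); c is the codeword value rescaled
    into the current interval [0, 1)."""
    q0 = (1.0 - p) ** gamma
    q1 = p ** gamma
    if c < 1.0 - q1:
        return 0      # only the bit-0 sub-interval contains c
    elif c < q0:
        return CHI    # overlapped region: branch on both hypotheses
    else:
        return 1      # only the bit-1 sub-interval contains c

def prune(branches, M):
    """M-algorithm: keep the M branches with the best partial metric."""
    return sorted(branches, key=lambda b: b['metric'], reverse=True)[:M]
```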

Note that the metric is not reliable for the very last symbols of a finite-length sequence $X$ [5]. This problem is solved by encoding the last $T$ symbols without interval overlapping [5]. That is, for $1\leq t\leq(N-T)$, $x_t$ is mapped onto $[0,(1-p)^\gamma)$ or $[1-p^\gamma,1)$, while for $(N-T+1)\leq t\leq N$, $x_t$ is mapped onto $[0,1-p)$ or $[1-p,1)$.

Therefore, a binary DAC system can be described by four parameters: $\{p,\gamma,M,T\}$.

III Hidden Markov Model and Forward Algorithm

Let $S=\{s_t\}_{t=1}^{N}$ be a sequence of states and $Z=\{z_t\}_{t=1}^{N}$ be a sequence of observations. A hidden Markov process is defined by $\lambda=(A,B,\pi)$:

$A=\{a_{ji}\}$: the state transition probability matrix, where $a_{ji}=P(s_t=i|s_{t-1}=j)$;

$B=\{b_i(k)\}$: the observation probability distribution, where $b_i(k)=P(z_t=k|s_t=i)$;

$\pi=\{\pi_i\}$: the initial state distribution, where $\pi_i=P(s_1=i)$.

The aim of the forward algorithm is to compute $P(z_1,\dots,z_t|\lambda)$, given the observations $\{z_1,\dots,z_t\}$ and the model $\lambda$. Let $\alpha_t(i)$ be the probability of observing the partial sequence $\{z_1,\dots,z_t\}$ such that state $s_t$ is $i$, i.e.,

$\alpha_t(i)=P(z_1,\dots,z_t,s_t=i|\lambda).$ (3)

Initially, we have

$\alpha_1(i)=\pi_i b_i(z_1).$ (4)

For $t>1$, $\alpha_t(i)$ can be computed through the iteration

$\alpha_t(i)=\Big\{\sum_j[\alpha_{t-1}(j)a_{ji}]\Big\}b_i(z_t).$ (5)

Therefore,

$P(z_1,\dots,z_t|\lambda)=\sum_i\alpha_t(i).$ (6)

In practice, $\alpha_t(i)$ is usually normalized by

$\alpha_t(i)=\frac{\alpha_t(i)}{\delta_t},$ (7)

where

$\delta_t=\sum_i\alpha_t(i).$ (8)

In this case, we have

$P(z_1,\dots,z_t|\lambda)=\prod_{t'=1}^{t}\delta_{t'}.$ (9)
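For reference, the following is a compact sketch of the scaled forward recursion of Eqs. (4)-(9); working in the log domain to avoid underflow is our choice, not mandated by the text.

```python
import numpy as np

def forward_scaled(A, B, pi, z):
    """Scaled forward algorithm, Eqs. (4)-(9).
    A[j, i] = P(s_t = i | s_{t-1} = j), B[i, k] = P(z_t = k | s_t = i),
    pi[i] = P(s_1 = i).  Returns log P(z_1, ..., z_T | lambda)."""
    alpha = pi * B[:, z[0]]                  # Eq. (4)
    delta = alpha.sum()                      # Eq. (8)
    log_prob = np.log(delta)
    alpha = alpha / delta                    # Eq. (7)
    for t in range(1, len(z)):
        alpha = (alpha @ A) * B[:, z[t]]     # Eq. (5)
        delta = alpha.sum()                  # Eq. (8)
        log_prob += np.log(delta)            # Eq. (9) in the log domain
        alpha = alpha / delta                # Eq. (7)
    return log_prob
```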

IV DAC for Hidden Markov Correlation

Assume that binary source $X$ and SI $Y$ are correlated by $Y=X\oplus Z$, where $Z$ is generated by a hidden Markov model with parameter $\lambda$. $X$ is encoded using a $\{p,\gamma,M,T\}$ DAC encoder. The decoding process is very similar to that described in [4]. The only difference is that the forward algorithm is embedded into the DAC decoder and the metric of each branch is replaced by $P(z_1,\dots,z_t|\lambda)$, where $z_t=x_t\oplus y_t$.
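A sketch of how this embedding might look per decoded symbol: each branch keeps its own normalized forward vector, and its log-metric accumulates the scaling factors of Eq. (9). The helper name and branch layout are our assumptions, not the authors' implementation.

```python
import numpy as np

def update_branch(alpha, log_metric, x_t, y_t, A, B):
    """One incremental forward step for a decoder branch that
    hypothesizes x_t; with the SI y_t this fixes z_t = x_t XOR y_t.
    (For t = 1, alpha is instead initialized as pi * B[:, z_1], Eq. (4).)"""
    z_t = x_t ^ y_t
    alpha = (alpha @ A) * B[:, z_t]          # Eq. (5)
    delta = alpha.sum()                      # Eq. (8)
    return alpha / delta, log_metric + np.log(delta)   # Eqs. (7), (9)
```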

V Experimental Results

We have implemented a 16-bit DAC codec system. The bias probability of $X$ is $p=0.5$. Following the recommendation of [5], we set $M=2048$ and $T=15$. The same 2-state (0 and 1), 2-output (0 and 1) sources as in [9] are used in the simulations (see Table I). The length of the data block used in each test is $N=1024$.

To achieve lossless compression, each test starts from $\gamma=H(X|Y)$ (see Table II). If the decoding fails, we increase $\gamma$ by 0.01; this process is iterated until the decoding succeeds. For each model, results are averaged over 100 trials. Experimental results are listed in Table II.
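This rate search can be sketched as the loop below; `dac_encode` and `dac_decode` stand for hypothetical wrappers around the codec of Sections II and IV.

```python
def find_lossless_gamma(x, y, h_cond, step=0.01):
    """Raise gamma from H(X|Y) in steps of 0.01 until the DAC decoder
    reproduces x exactly (dac_encode/dac_decode are assumed helpers)."""
    gamma = h_cond                 # since p = 0.5, H(X) = 1 and R = gamma
    while gamma <= 1.0:
        codeword = dac_encode(x, gamma)
        if dac_decode(codeword, y, gamma) == x:
            return gamma           # achieved lossless rate
        gamma += step
    return 1.0                     # gamma = 1 reduces to classic AC
```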

For comparison, Table II also includes experimental results for the same settings from [9]. In each test of [9], $N=16384$ source symbols are encoded using an LDPC code. In addition, to synchronize the hidden Markov model, $N\alpha$ original source symbols are sent to the decoder directly without compression.

The results show that the DAC performs similarly to the LDPC-based approach [9], and slightly better for models 1 and 2. Moreover, for hidden Markov correlation, the DAC outperforms the LDPC-based approach in two aspects:

1). The LDPC-based approach requires longer codes to achieve better performance, while the DAC is insensitive to code length [4].

2). For the LDPC-based approach, to synchronize the hidden Markov model, a certain proportion of original source symbols must be sent to the decoder as “seeds”. However, it is hard to determine $\alpha$, the optimal proportion of “seeds”. The results reported in [9] were obtained through an exhaustive search, which limits the practical applicability of this approach.

TABLE I: Models for Simulation
model   $\{a_{00}, a_{11}, b_0(0), b_1(1)\}$
1       {0.01, 0.03, 0.99, 0.98}
2       {0.01, 0.065, 0.95, 0.925}
3       {0.97, 0.967, 0.93, 0.973}
4       {0.99, 0.989, 0.945, 0.9895}
TABLE II: Experimental Results (rates in bits per source symbol)
model   $H(X|Y)$   rate of [9]   rate of DAC
1       0.24       0.36          0.345908
2       0.52       0.67          0.648594
3       0.45       0.58          0.585645
4       0.28       0.42          0.427236

VI Conclusion

This letter studies the compression of sources with hidden Markov correlation using the DAC. The forward algorithm is incorporated into the DAC decoder. The results are similar to those of the LDPC-based approach. Compared with the LDPC-based approach, the DAC is more suitable for practical applications.

References

  • [1] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Info. Theory, vol. 19, no. 4, pp. 471-480, Jul. 1973.
  • [2] J. Garcia-Frias, “Compression of correlated binary sources using turbo codes,” IEEE Commun. Lett., vol. 5, no. 10, pp. 417-419, Oct. 2001.
  • [3] A. Liveris, Z. Xiong, and C. Georghiades, “Compression of binary sources with side information at the decoder using LDPC codes,” IEEE Commun. Lett., vol. 6, no. 10, pp. 440-442, Oct. 2002.
  • [4] M. Grangetto, E. Magli, and G. Olmo, “Distributed arithmetic coding,” IEEE Commun. Lett., vol. 11, no. 11, pp. 883-885, Nov. 2007.
  • [5] M. Grangetto, E. Magli, and G. Olmo, “Distributed arithmetic coding for the asymmetric Slepian-Wolf problem,” IEEE Trans. Signal Process. (submitted), preprint available at www1.tlc.polito.it/sas-ipl/Magli/index.php.
  • [6] X. Artigas, S. Malinowski, C. Guillemot, and L. Torres, “Overlapped Quasi-Arithmetic codes for Distributed Video Coding,” IEEE International Conference on Image Processing (ICIP), San Antonio, Texas, USA, Sep. 2007.
  • [7] M. Grangetto, E. Magli, and G. Olmo, “Symmetric Distributed Arithmetic Coding of Correlated Sources,” IEEE International Workshop on Multimedia Signal Processing (MMSP), Crete, Greece, Oct. 2007.
  • [8] M. Grangetto, E. Magli, R. Tron, and G. Olmo, “Rate-compatible distributed arithmetic coding,” IEEE Commun. Lett., vol. 12, no. 8, pp. 575-577, Aug. 2008.
  • [9] J. Garcia-Frias and W. Zhong, “LDPC Codes for Compression of Multi-Terminal Sources With Hidden Markov Correlation,” IEEE Commun. Lett., vol. 7, no. 3, pp. 115-117, Mar. 2003.