A Further Note on an Innovations Approach to Viterbi Decoding of Convolutional Codes
Abstract
In this paper, we show that the soft-decision input to the main decoder in an SST Viterbi decoder can be regarded as the innovation also from the viewpoint of mutual information and mean-square error. It is assumed that a code sequence is transmitted symbol by symbol over an AWGN channel using BPSK modulation. We can then consider a signal model in which the signal is composed of the signal-to-noise ratio (SNR) and the equiprobable binary input. By assuming that the soft-decision input to the main decoder is the innovation, we show that the minimum mean-square error (MMSE) in estimating the binary input is expressed in terms of the distribution of the encoded block for the main decoder. It is shown that the obtained MMSE indirectly satisfies the known relation between the mutual information and the MMSE in Gaussian channels. Thus the derived MMSE is justified, which in turn implies that the soft-decision input to the main decoder can be regarded as the innovation. Moreover, we see that the input-output mutual information is connected with the distribution of the encoded block for the main decoder.
Index Terms:
Convolutional codes, Scarce-State-Transition (SST) Viterbi decoder, innovations, filtering, smoothing, mean-square error, mutual information.
I Introduction
Consider an SST Viterbi decoder [17, 24], which consists of a pre-decoder and a main decoder. In [25], by comparing with the results in linear filtering theory [1, 9, 11, 12, 15], we showed that the hard-decision input to the main decoder can be seen as the innovation [11, 12, 16, 18]. In coding theory, the framework is determined by hard-decision data. In the previous paper [25, Definition 1], we gave a definition of innovations for hard-decision data. The definition applies also to soft-decision data: if the associated hard-decision data is the innovation, then the original soft-decision data can be regarded as the innovation. Hence, we can consider the soft-decision input to the main decoder to be the innovation as well. In this paper, we show that this is true from a different viewpoint. We use the innovations approach to least-squares estimation by Kailath [11, 12] and the relationship between the mutual information and the mean-square error (e.g., [6]).
Consider the situation where we have observations of a signal in additive white Gaussian noise. (In this paper, it is assumed that the signal is composed of the signal-to-noise ratio (SNR) and the equiprobable binary input, where the latter does not depend on the SNR. In the following, we call the equiprobable binary input simply the input.) Kailath [11, 12] applied the innovations method to linear filtering/smoothing problems. In the discrete-time case [11, 15], he showed that the covariance matrix of the innovation is expressed as the sum of two covariance matrices, one corresponding to the estimation error of the signal and the other to the observation noise. The mean-square error in estimating the signal is then obtained from the former covariance matrix by taking its trace. Based on this result of Kailath, we reasoned as follows: if the soft-decision input to the main decoder corresponds to the innovation, then the associated covariance matrix should have the above property. That is, the covariance matrix of the soft-decision input to the main decoder can be decomposed as the sum of two covariance matrices, and hence the mean-square error in estimating the signal is obtained as described above. We remark that in this context, the obtained mean-square error has to be justified by some method. For this purpose, we note the relation between the mutual information and the mean-square error.
In [25], we derived the distribution of the input to the main decoder corresponding to a single code symbol. However, we could not obtain the joint distribution of the input to the main decoder corresponding to a branch. After our paper [25] was published, we noticed that the distribution of the input to the main decoder corresponding to a single code symbol has a reasonable interpretation (see [26]). Using this fact, the joint distribution of the input to the main decoder corresponding to a branch has been derived, and the associated covariance matrix can be calculated. Our argument for obtaining the mean-square error is then as follows:
-
1)
Derive the joint distribution of the soft-decision input to the main decoder.
-
2)
Calculate the associated covariance matrix using the derived joint distribution.
-
3)
The covariance matrix of the estimation error of the signal is obtained by subtracting that of the observation noise from the matrix in 2). Moreover, by removing the SNR from the obtained covariance matrix, the covariance matrix of the estimation error of the input is derived.
-
4)
The mean-square error in estimating the input is given by the trace (i.e., the sum of diagonal elements) of the last covariance matrix.
In this way, we show that the mean-square error in estimating the input is expressed in terms of the distribution of the encoded block for the main decoder. Since in an SST Viterbi decoder, the estimation error is decoded at the main decoder, the result is reasonable.
On the other hand, we have to show the validity of the derived mean-square error. For this purpose, we note the relation between the mutual information and the mean-square error. The mutual information, the likelihood ratio (LR), and the mean-square error are central notions in information theory, detection theory, and estimation theory. It is well known that the best estimate measured in the mean-square sense (i.e., the least-squares estimate) is given by the conditional expectation [16, 18, 21, 27]. In this paper, the corresponding mean-square error is referred to as the minimum mean-square error (MMSE) [6]. In this case, depending on the amount of observations used for the estimation, the causal (filtering) MMSE and the noncausal (smoothing) MMSE are considered. When only the observations up to the current time are used for the estimation of the input, it corresponds to filtering, whereas when the whole observation record is used, it corresponds to smoothing.
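As a simple numerical illustration of this distinction (a generic estimation fact, not the decoder setting itself), the following sketch estimates a Gaussian signal from repeated noisy observations, once causally and once noncausally; the smoothing error is never larger than the filtering error. The Gaussian signal, unit-variance noise, and time indices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
trials, T, t_mid = 100_000, 9, 5           # estimate the signal at time t_mid

s = rng.standard_normal(trials)            # Gaussian signal, s ~ N(0, 1)
y = s[:, None] + rng.standard_normal((trials, T))   # y_t = s + n_t, n_t ~ N(0, 1)

# Causal (filtering) estimate: conditional mean of s given y_1, ..., y_{t_mid}.
filt = y[:, :t_mid].sum(axis=1) / (t_mid + 1)
# Noncausal (smoothing) estimate: conditional mean of s given the whole record.
smth = y.sum(axis=1) / (T + 1)

print("filtering MSE :", np.mean((filt - s) ** 2))   # about 1/(t_mid + 1)
print("smoothing MSE :", np.mean((smth - s) ** 2))   # about 1/(T + 1), smaller
```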
From the late 1960s to the early 1970s, the relation between the LR and the mean-square error was actively discussed. Kailath [14] showed that the LR is expressed in terms of the causal least-squares estimate. Moreover, he showed [13] that the causal least-squares estimate can be obtained from the LR. Esposito [4] also derived a relation between the LR and the noncausal estimator, which is closely related to [13]. Subsequently, Duncan [3] and Kadota et al. [10] derived the relation between the mutual information and the causal mean-square error. Their works are regarded as extensions of the work of Gelfand and Yaglom (see [1, 3]), who discussed for the first time the relation between the mutual information and the filtering error. Later, Guo et al. [6] derived new formulas regarding the relation between the mutual information and the noncausal mean-square error. Furthermore, Zakai [29] showed that the relation between the mutual information and the estimation error holds also in the abstract Wiener space [28, 29]. He used not the Ito calculus [8, 21] but the Malliavin calculus [19, 29]. We remark that the additive Gaussian noise channel is assumed in all the works above.
Among these works, we have noted the work of Guo et al. [6]. It deals with the relation between the mutual information and the MMSE. Also, their signal model is the same as ours and hence is convenient to work with. We reasoned that if the MMSE obtained using the above method satisfies the relations in [6, Section IV-A], then the derived MMSE is justified, which in turn implies that the soft-decision input to the main decoder can be regarded as the innovation. The main text of the paper consists of the arguments stated above. In addition, it is shown that the MMSE is expressed in terms of the distribution of the encoded block for the main decoder. We then see that the input-output mutual information is connected with the distribution of the encoded block for the main decoder. We think this is an extension of the relation between the mutual information and the MMSE to coding theory. The remainder of this paper is organized as follows.
In Section II, based on the signal model in this paper, we derive the joint distribution of the input to the main decoder.
In Section III, the associated covariance matrix is calculated using the derived joint distribution. Subsequently, by subtracting the covariance matrix of the observation noise from it and by removing the SNR from the resulting matrix, the covariance matrix of the estimation error of the input is obtained. The MMSE in estimating the input is given by the trace of the last matrix. In this case, since the diagonal elements of the target matrix are correlated with each other, we modify the obtained MMSE. Note that this modification is essentially important. (We remark that some results in Sections II and III have been given in [26] with or without proofs. However, in order for the paper to be self-contained, the necessary materials are provided again with proofs in those sections.)
In Section IV, the validity of the derived MMSE is discussed. The argument is based on the relation between the mutual information and the MMSE. More precisely, the discussion uses the results of Guo et al. [6, Section IV-A]. We remark that the input in our signal model is not Gaussian. Hence the input-output mutual information cannot be obtained in a concrete form. We have only inequalities and approximate expressions. For that reason, we carry out numerical calculations using concrete convolutional codes. In this case, in order to clarify the difference between causal estimation (filtering) and noncausal estimation (smoothing), we take QLI codes [20] with different constraint lengths. Then the MMSEs are calculated by regarding these QLI codes as general codes on one side and by regarding them as inherent QLI codes on the other. The obtained results are compared and carefully examined. Moreover, through the argument, we see that the input-output mutual information is connected with the distribution of the encoded block for the main decoder.
In Section V, we give several important comments regarding the discussions in Section IV.
Finally, in Section VI, we conclude with the main points of the paper and with problems to be further discussed.
Let us close this section by introducing the basic notions needed for this paper. Notations in this paper are in principle the same as those in [25]. We always assume that the underlying field is GF(2). Let be a generator matrix for an convolutional code, where is assumed to be minimal [5]. A corresponding check matrix is also assumed to be minimal. Hence they have the same constraint length, denoted . Denote by and an information sequence and the corresponding code sequence, respectively, where is the information block at and is the encoded block at . In this paper, it is assumed that a code sequence is transmitted symbol by symbol over a memoryless AWGN channel using BPSK modulation. Let be a received sequence, where is the received block at . Each component of is modeled as
(1) | |||||
(2) |
where . Here, takes depending on whether the code symbol is or . That is, is the equiprobable binary input. (We call it simply the input.) and denote the energy per channel symbol and the single-sided noise spectral density, respectively. (Let be the energy per information bit. Then the relationship between and is defined by , where is the code rate.) Also, is a zero-mean unit variance Gaussian random variable with probability density function
(3) |
Each is independent of all others. The hard-decision (denoted “h”) data of is defined by
(4) |
In this case, the channel error probability (denoted ) is given by
(5) |
Note that the above signal model can be seen as the block at . In that case, we can rewrite as
(6) |
where , , and .
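As a quick sanity check of this signal model, the following minimal sketch simulates the channel and compares the empirical channel error probability of the hard decision with the Gaussian tail value. The concrete form y = sqrt(2Es/N0) x + w with x in {+1, -1} equiprobable and w standard Gaussian is the standard BPSK/AWGN reading of (1)-(5); the exact symbols used in the paper may differ.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

def q_func(a):
    """Gaussian tail probability Q(a)."""
    return 0.5 * erfc(a / sqrt(2.0))

def empirical_error_prob(es_n0_db, n=200_000):
    """Simulate y = sqrt(2Es/N0)*x + w and apply the hard decision."""
    amp = sqrt(2.0 * 10.0 ** (es_n0_db / 10.0))   # sqrt(2 Es/N0)
    x = rng.choice([-1.0, 1.0], size=n)           # equiprobable binary input
    w = rng.standard_normal(n)                    # zero-mean unit-variance noise
    y = amp * x + w                               # soft-decision observation
    x_hat = np.where(y >= 0.0, 1.0, -1.0)         # hard decision
    return np.mean(x_hat != x)

for es_n0_db in [0.0, 2.0, 4.0]:
    amp = sqrt(2.0 * 10.0 ** (es_n0_db / 10.0))
    print(f"Es/N0 = {es_n0_db} dB: empirical {empirical_error_prob(es_n0_db):.4f}, "
          f"Q(sqrt(2Es/N0)) = {q_func(amp):.4f}")
```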
II Joint Distribution of the Input to the Main Decoder
The argument in this paper is based on the result of Kailath [11, Section III-B]. That is, by calculating the covariance matrix of the input to the main decoder, we derive the mean-square error in estimating the input. First we obtain the joint distribution of the input to the main decoder. In the following, and denote the probability and expectation, respectively.
Let be the input to the main decoder in an SST Viterbi decoder [25]. In connection with the distribution of [25, Proposition 12], is defined by
(8) |
We have noticed that this value has another interpretation. In fact, we have the following.
Lemma 1 ([26])
(9) |
holds, where is the encoded block for the main decoder.
Proof:
The hard-decision input to the main decoder is expressed as
where . Let . We have
(10) |
Hence it follows that
(11) | |||||
∎
Thus the distribution of is given by
(12) |
This equation means that if the code symbol is , then the associated distribution obeys , whereas if the code symbol is , then the associated distribution obeys . Hence the result is quite reasonable. On the other hand, since the distribution of is given by the above equation, are not mutually independent, because are not mutually independent.
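To make this two-component form concrete, the following minimal sketch builds such a mixture, taking the component means as plus/minus sqrt(2Es/N0) and the mixture weight as an assumed probability p0 that the code symbol for the main decoder is 0 (both the SNR value and p0 are illustrative assumptions, not quantities derived in the paper), and checks a sampled histogram against the analytic density.

```python
import numpy as np

rng = np.random.default_rng(2)

c = np.sqrt(2.0 * 10.0 ** (2.0 / 10.0))   # assumed amplitude sqrt(2Es/N0) at Es/N0 = 2 dB
p0 = 0.9                                   # assumed P(code symbol for the main decoder = 0)

def mixture_pdf(z):
    """Two-component Gaussian mixture: weight p0 at mean +c, weight 1-p0 at mean -c."""
    g = lambda m: np.exp(-0.5 * (z - m) ** 2) / np.sqrt(2.0 * np.pi)
    return p0 * g(c) + (1.0 - p0) * g(-c)

# Sample from the mixture and compare a normalized histogram with the analytic density.
symbol_is_zero = rng.random(200_000) < p0
samples = np.where(symbol_is_zero, c, -c) + rng.standard_normal(200_000)
hist, edges = np.histogram(samples, bins=80, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print("max |histogram - pdf| =", np.max(np.abs(hist - mixture_pdf(mid))))
```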
Next, consider a QLI code whose generator matrix is given by
(13) |
Let be the input to the main decoder in an SST Viterbi decoder [25]. We see that almost the same argument applies to in [25, Proposition 14]. We have the following.
Lemma 2 ([26])
(14) |
holds, where is the encoded block for the main decoder.
Proof:
In the case of QLI codes [25, Section II-B], the hard-decision input to the main decoder is expressed as
where () and . Let be the syndrome. Since , the above is equivalent to
(15) |
Hence we have
(16) | |||||
for . ∎
In the rest of the paper, is assumed, because we are concerned with QLI codes in principle. Let us examine the relationship between and . Let and be the filtered estimate and the smoothed estimate, respectively [25, Section II-B]. We have
(17) | |||||
(18) | |||||
where . Then it follows that
(19) | |||||
(20) | |||||
From the meaning of filtering and smoothing, it is natural to think that is smaller than . Hence it is expected that
(21) |
i.e., holds. (In the derivation, , which is equivalent to a kind of stationarity, has been used.) However, this is not always true (see Section IV).
Since the meaning of has been clarified, we can derive the joint distribution (denoted ) of and . In fact, we have the following.
Proposition 1 ([26])
is given by
(22) | |||||
where .
Proof:
is obvious. Let us show that . Noting, for example,
we have
(23) | |||||
(Remark: is a multiple integral on the infinite interval, i.e., an improper integral. Consider a finite interval . Since is continuous on , a repeated integral is possible on and we have
Taking the limit as and as , the above equality is obtained.)
Next, let us calculate the marginal distribution of . We have
(24) | |||||
Note that this is just the distribution of . Similarly, we have
(25) |
where the right-hand side is the distribution of . All these facts show that is the joint distribution of and . ∎
III Mean-Square Error in Estimating the Input
Consider the signal model:
(26) |
where represents a signal of interest and represents random noise. We assume that and are mutually independent. Since we cannot know the value of directly, we have to estimate it based on the observation . The error of an estimate, denoted , of the input based on the observation can be measured in the mean-square sense
(27) |
where “T” means transpose. It is well known [16, 18, 21, 27] that the minimum of the above value is achieved by the conditional expectation
(28) |
is the least-squares estimate and the corresponding estimation error is referred to as the minimum mean-square error (MMSE) [6]. In the following, the value of the MMSE is denoted by “mmse”.
Remark: Let be the -field generated by . Also, denote by the set of elements in which are measurable, where is the set of square integrable random variables. Then we have , where is the orthogonal projection of onto the space [18, 21, 27]. If and are jointly Gaussian, then we have [18, 21], where is the Gaussian space [18, 21] generated by . Note that is a subspace of .
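As a numerical illustration that the conditional expectation attains the MMSE, the sketch below uses the scalar model y = sqrt(snr) x + n with equiprobable x = +/-1 (an assumed concrete instance of the signal model in Section I). For this model the conditional mean is E[x | y] = tanh(sqrt(snr) y); its empirical mean-square error is compared with a hard decision and with the best linear estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
snr, n = 1.0, 500_000

x = rng.choice([-1.0, 1.0], size=n)            # equiprobable binary input
y = np.sqrt(snr) * x + rng.standard_normal(n)  # observation

cond_mean = np.tanh(np.sqrt(snr) * y)          # E[x | y] for this model
hard      = np.sign(y)                         # hard decision
linear    = np.sqrt(snr) / (1.0 + snr) * y     # best linear (LMMSE) estimate

for name, est in [("conditional mean", cond_mean),
                  ("hard decision", hard),
                  ("linear LMMSE", linear)]:
    print(f"{name:16s}: MSE = {np.mean((est - x) ** 2):.4f}")
# The conditional mean attains the smallest mean-square error (the MMSE).
```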
Kailath [11, 12] applied the innovations method to linear filtering/smoothing problems. Suppose that the observations are given by
(29) |
where is a zero-mean finite-variance signal process and is a zero-mean white Gaussian noise. It is assumed that has a covariance matrix , where is Kronecker’s delta. The innovation process is defined by
(30) |
where is the linear least-squares estimate of given . Kailath [11, Section III-B] showed the following.
Proposition 2 (Kailath [11])
The covariance matrix of is given by
(31) |
where is the covariance matrix of the error in the estimate .
We remark that this result plays an essential role in this paper.
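Proposition 2 can be checked numerically on a simple scalar example. The sketch below runs a Kalman filter on an AR(1) signal observed in white Gaussian noise (all parameters are illustrative assumptions) and verifies that the empirical innovation variance equals the one-step prediction error variance of the signal plus the noise variance.

```python
import numpy as np

rng = np.random.default_rng(4)

# AR(1) signal observed in white Gaussian noise (illustrative parameters).
a, q_var, r_var, T = 0.9, 0.5, 1.0, 200_000
s = np.zeros(T)
for t in range(1, T):
    s[t] = a * s[t - 1] + rng.normal(scale=np.sqrt(q_var))
z = s + rng.normal(scale=np.sqrt(r_var), size=T)

# Scalar Kalman filter; collect innovations and the one-step prediction error variance.
s_hat, p = 0.0, 1.0
innov = np.zeros(T)
for t in range(T):
    s_pred = a * s_hat                  # one-step prediction of the signal
    p_pred = a * a * p + q_var          # prediction error variance of the signal
    innov[t] = z[t] - s_pred            # innovation
    k = p_pred / (p_pred + r_var)       # Kalman gain
    s_hat = s_pred + k * innov[t]
    p = (1.0 - k) * p_pred

print("empirical innovation variance :", innov[T // 2:].var())
print("P_pred + R  (Proposition 2)   :", p_pred + r_var)   # last p_pred is in steady state
```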
III-A Covariance Matrix Associated with the Input to the Main Decoder
In this section, by assuming that is the innovation, we calculate the covariance matrix of . Then the MMSE in estimating the input is obtained from the associated covariance matrix. As preparation, we present a lemma.
Lemma 3 ([26])
Suppose that . The following quantities have the same value:
(32) | |||||
(33) | |||||
(34) | |||||
(35) |
The common value is denoted by .
Remark: implies that and are mutually independent.
Proof:
Suppose that and are given. From the definition of , we obtain a system of linear equations:
(36) |
This can be solved as
(37) |
where is an arbitrary constant. We remark that the probabilities are determined by , which in turn restricts the value of . Since , must satisfy the following:
(38) |
Note that for [25, Lemma 13]. Hence we have . Accordingly, the value of is restricted to
(39) |
It is shown that is also satisfied for .
Now we have
(40) | |||||
We see that the remaining three quantities are equal also to . ∎
Let us show that . We need the following.
Lemma 4
(41) |
holds, where errors are statistically independent of each other.
Proof:
See Appendix A. ∎
Now we have the following.
Lemma 5
Suppose that . Then holds.
Proof:
See Appendix B. ∎
Proposition 3 ([26])
The covariance matrix associated with is given by
(44) | |||||
(47) |
Proof:
Note the relation
(48) |
where
(49) |
The first term is calculated as
(50) | |||||
Also, we have
(51) | |||||
Then it follows that
(52) | |||||
Similarly, we have
(53) | |||||
Let us calculate . Note this time the relation
(54) |
Applying Lemma 3, we have
(55) | |||||
Since
(56) |
it follows that
(57) | |||||
∎
III-B MMSE in Estimating the Input
Note Proposition 2. If corresponds to the innovation, then the associated covariance matrix (denoted ) is expressed as the sum of two matrices and , i.e., . Here, is the covariance matrix of the estimation error of the signal, whereas is the covariance matrix of the observation noise. From the definition of (see Section I), it follows that
(58) |
and hence we have
(59) |
Moreover, since the signal is expressed as , the covariance matrix (denoted ) of the estimation error of becomes
(60) |
Hence the corresponding MMSE is given by
(61) |
where is the trace of matrix .
Remark: The estimate in [11] is the “linear” least-squares estimate. On the other hand, the best estimate is given by the conditional expectation , which is in general a highly “nonlinear” functional of (cf. is not Gaussian). Note that if are jointly Gaussian, then is a linear functional of [16, Section II-D]. Hence to be exact, we have
(62) |
where the left-hand side corresponds to the conditional expectation, whereas the right-hand side corresponds to the linear least-squares estimate. As stated above, the least-squares estimate is in general a nonlinear functional of the past observations. Hence we regard the right-hand side as an approximation to the MMSE.
III-C Modification of the Derived MMSE
Recall that and . That is, the MMSE in the estimation of is expressed in terms of the distribution of for the main decoder in an SST Viterbi decoder. We remark that and are not mutually independent. As a result, and are correlated. Hence the simple sum of and is not appropriate for the MMSE, and some modification that accounts for the degree of correlation is necessary. The variable (see Lemma 3) is closely connected with the dependence between and . When they are weakly correlated, it is small, whereas when they are strongly correlated, it is large. This observation suggests that a modification
(63) |
is more appropriate for the MMSE, where is some constant. In the following, we discuss how to determine the value of . We have the following.
Proposition 4
Suppose that . We have
(64) |
where .
Proof:
Note the relation: . is the covariance matrix associated with and is positive semi-definite. (i.e., the identity matrix of size ) is clearly positive semi-definite. Hence is positive semi-definite and hence [23], where “” denotes the determinant. Since , we have . ∎
From Proposition 4, we have
So far we have carried out numerical calculations for four QLI codes (regarded as general codes) and one general code (these codes are defined in Sections IV and V). Then we have found that the difference between the values of and is small. Note that this fact is derived from the structure of . Let
(65) |
Also, denote by the number of terms (i.e., ) contained in . We see that is determined by . Hence if the difference between and is small, then holds. For example, it is shown (see Section IV-B) that
-
1)
and for ,
-
2)
and for .
Thus we have . As a consequence, we have approximately
(66) |
That is,
holds with high probability.
Now we can show that this inequality always holds. That is, we have the following.
Proposition 5
Suppose that . We have
(67) |
where .
Proof:
See Appendix C. ∎
Remark: Let
(70) | |||||
(73) |
Then corresponds to . Denote by the correlation coefficient between and , i.e.,
(74) |
Note that . We have
(75) |
Now we restrict to . As the special cases, the following hold:
-
1)
: .
-
2)
: .
Hence represents the correction term depending on the degree of correlation.
Based on Proposition 5, we finally set
mmse | (76) | ||||
(77) |
as the MMSE in the estimation of .
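As an aside on the correction term, the following minimal sketch illustrates how a joint probability of simultaneous errors induces correlation between two error indicators. Here q is read, for illustration only, as the probability that both indicators equal 1 (the common value in Lemma 3 may be defined differently), and p1, p2, q are assumed values; the off-diagonal covariance entry q - p1*p2 vanishes exactly in the independent case.

```python
import numpy as np

# Two Bernoulli error indicators with marginal probabilities p1, p2 and an
# assumed joint probability q = P(e1 = 1, e2 = 1); all three values are illustrative.
p1, p2, q = 0.10, 0.12, 0.06

cov12 = q - p1 * p2                        # off-diagonal covariance entry
var1, var2 = p1 * (1 - p1), p2 * (1 - p2)  # diagonal entries
cov = np.array([[var1, cov12], [cov12, var2]])
rho = cov12 / np.sqrt(var1 * var2)         # correlation coefficient

print("covariance matrix:\n", cov)
print("correlation coefficient:", rho)
print("independence corresponds to q =", p1 * p2, "(then rho = 0)")
```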
III-D In the Case of QLI Codes
Consider a QLI code whose generator matrix is given by
Let be the input to the main decoder in an SST Viterbi decoder. We see that the results in the previous sections hold for as well. Therefore, we state only the results.
Proposition 6 ([26])
The joint distribution of and (denoted ) is given by
(78) | |||||
where .
Lemma 6 ([26])
Assume that . The following quantities have the same value:
(79) | |||||
(80) | |||||
(81) | |||||
(82) |
The common value is denoted by .
Proposition 7 ([26])
The covariance matrix associated with is given by
(85) | |||||
(88) |
Proposition 8
Suppose that . We have
(89) |
where .
Proposition 9
Suppose that . We have
(90) |
where .
Finally, for QLI codes we set
mmse | (91) | ||||
(92) |
as the MMSE in the estimation of .
IV Mutual Information and MMSEs
We have shown that the MMSE in the estimation of is expressed in terms of the distribution of for the main decoder in an SST Viterbi decoder. More precisely, we have derived the following:
-
1)
As a general code: .
-
2)
As a QLI code: .
In this section, we discuss the validity of the derived MMSE from the viewpoint of mutual information and mean-square error.
In this paper, we have used the signal model (see Section I):
Since , it follows that
(93) |
Then letting
(94) |
the above equation is rewritten as
(95) |
Note that this is just the signal model used in [6] (i.e., ). Guo et al. [6] discussed the relation between the input-output mutual information and the MMSE in estimating the input in Gaussian channels. Their relation holds for discrete-time and continuous-time noncausal (smoothing) MMSE estimation regardless of the input statistics (i.e., not necessarily Gaussian). Here recall from [25] that the innovations approach to Viterbi decoding of QLI codes has a close connection with smoothing in the linear estimation theory. We therefore thought that the results in [6] could be used to discuss the validity of the MMSE obtained in the previous section. Hence the argument in this section is based on that of Guo et al. [6].
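Before specializing to our codes, the central relation of Guo et al. [6] can be checked numerically in the scalar case. The sketch below is a minimal check assuming the scalar model y = sqrt(snr) x + n with equiprobable x = +/-1 and unit-variance noise; it evaluates the input-output mutual information by numerical integration and compares its derivative with half the noncausal MMSE (the I-MMSE relation, in nats).

```python
import numpy as np
from scipy.integrate import quad

def out_pdf(y, snr):
    """Output density of y = sqrt(snr)*x + n with x = +/-1 equiprobable, n ~ N(0,1)."""
    a = np.sqrt(snr)
    g = lambda m: np.exp(-0.5 * (y - m) ** 2) / np.sqrt(2.0 * np.pi)
    return 0.5 * (g(a) + g(-a))

def mutual_info(snr):
    """I(X;Y) in nats: h(Y) - h(Y|X), with h(Y|X) = 0.5*log(2*pi*e)."""
    h_y = quad(lambda y: -out_pdf(y, snr) * np.log(out_pdf(y, snr)), -30.0, 30.0)[0]
    return h_y - 0.5 * np.log(2.0 * np.pi * np.e)

def noncausal_mmse(snr):
    """E[(X - E[X|Y])^2] = 1 - E[tanh(sqrt(snr)*Y)^2]."""
    a = np.sqrt(snr)
    return 1.0 - quad(lambda y: out_pdf(y, snr) * np.tanh(a * y) ** 2, -30.0, 30.0)[0]

snr, d = 1.0, 1e-4
deriv = (mutual_info(snr + d) - mutual_info(snr - d)) / (2.0 * d)
print("dI/dsnr     :", deriv)
print("mmse(snr)/2 :", noncausal_mmse(snr) / 2.0)   # the two values agree (I-MMSE relation)
```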
IV-A General Codes
Let . Also, let be the conditional expectation of given . According to Guo et al. [6, Section IV-A], let us define as follows:
(96) |
where represents the one-step prediction MMSE. Note that this is a function of . This is because depends on (cf. is a function of ).
Remark 1: The above definition is slightly different from that in [6], where is defined as
(97) |
where . In order to distinguish it from ours, theirs is denoted by in this section. Set . We have
(98) | |||||
Here note the following:
(99) | |||||
(100) | |||||
Hence
(101) | |||||
holds. Thus we have
(102) | |||||
It is confirmed from Remark 1 that the relation in [6, Theorem 9] still holds. In fact, we have
(103) | |||||
Then it follows that
(104) |
Our argument is based on the innovations approach proposed by Kailath [11] (cf. [25]). Also, our model is a discrete-time one. Hence it is reasonable to think that the MMSE associated with the estimation of corresponds to (see Proposition 2). That is, it is appropriate to set
(105) |
Moreover, we set
(106) |
Note that the left-hand side is the average prediction MMSE (see [6, Section IV-A]). We finally have
(107) |
On the other hand, the mutual information between and [2, Section 9.4] is evaluated as
(108) |
Then we have
(109) |
Remark 2: The essence of our argument lies in the equality
Meanwhile, using the signal model, is calculated as follows. Since and are mutually independent, we have
(110) | |||||
where “var” denotes the variance. Hence
(111) |
On the other hand, using the inequality: , we have
(112) |
Hence
actually holds.
From the above argument, it is expected that and are closely related. Then in the next section, we will discuss the relation between them.
IV-B Numerical Results as General Codes
In order to examine the relation between and , numerical calculations have been carried out. We have taken two QLI codes:
-
1)
: The generator matrix is defined by .
-
2)
: The generator matrix is defined by .
By regarding these QLI codes as “general” codes, we have compared the values of and . The results are shown in Tables I and II, where is simply denoted . The results for and are shown also in Fig.1 and Fig.2, respectively.
[Fig. 1: conl-2-G.ps]
[Fig. 2: conl-6-G.ps]
With respect to the behaviors of and , we have the following:
(i) :
-
1)
: Since , .
-
2)
: Since , .
Note that approaches slowly as the SNR increases.
(ii) :
-
1)
: Since and since ,
(113) -
2)
: Since and since ,
(114)
From Figs. 1 and 2, we observe that the behaviors of for and are considerably different. Let us examine the cause of the difference.
Let be the encoded block for the main decoder. We have already shown that the degree of correlation between and is expressed in terms of (see Lemma 3). We also see that the degree of correlation between and has a close connection with the number of error terms (denoted ) by which and differ. In fact, when is small, and have a strong correlation, whereas when is large, and have a weak correlation. These observations show that is closely related to . Then it is expected that the behavior of depends heavily on the value of (i.e., on a code). In order to confirm this fact, let us evaluate the values of for and . Note that since the encoded block is expressed as , is obtained from .
: The inverse encoder is given by
(115) |
Hence from
(116) |
it follows that (i.e., bold-faced terms).
: The inverse encoder is given by
(117) |
Then it is shown that
(118) |
We see that .
Now compare the values of in Tables I and II. We observe that they are considerably different. For example, at the SNR of , we have
(119) |
We think the difference is due to the fact
(120) |
means that there is a certain correlation between and and then we have (i.e., ). This value is fairly large. On the other hand, implies that and have a weak correlation, which results in a small value of .
We have already seen that the behavior of is dependent on . Also, it has been shown that the values of for and are considerably different. We think these explain the behaviors of curves in Figs. 1 and 2.
As observed above, since the behavior of varies with the codes, it is appropriate to take the average over codes. Then, in addition to and , we have taken two more QLI codes and whose generator matrices are defined by
(121) |
and
(122) |
respectively. It is shown that and have and , respectively. Then we have averaged the corresponding ’s. The result is shown in Fig.3, where denotes the average value of over four codes. We think the relation between and is shown more properly in Fig.3.
[Fig. 3: conl-2346-G.ps]
In Figs. 1, 2, and 3, we observe that holds at low-to-medium SNRs. However, the sign of inequality is reversed at high SNRs. We think this comes from the fact that approaches rapidly as the SNR increases, whereas approaches slowly as the SNR increases.
IV-C QLI Codes
In [25], we have shown that the innovations approach to Viterbi decoding of QLI codes is related to “smoothing” in the linear estimation theory. As a result, the relationship between the mutual information and the MMSE can be discussed using the result in [6, Corollary 3]. For this purpose, we provide a proposition.
Proposition 10
Let , where is the equiprobable binary input with unit variance. Each is assumed to be independent of all others. Also, let be a standard Gaussian vector. Moreover, let be a standard Gaussian vector which is independent of and . Then we have
(123) |
Proof:
We follow Guo et al. [6]. Let , where . We have
(124) | |||||
where . Then it follows from [6, Theorems 1 and 2] that
(125) | |||||
(Remark 1: This relation holds for any random vector satisfying (cf. [6]).)
Remark 2: The definition of is slightly different from that in [6]. However, since the whole observations are used in each conditional expectation, the above equality holds also under our definition.
First note the left-hand side, i.e., . Let be Gaussian. It follows from Proposition 10 that
(130) |
Since is Gaussian, we have
(131) |
and hence
(132) |
holds. Therefore,
(133) |
is obtained.
Next, note the right-hand side, i.e., . For , we have
(134) | |||||
Our concern is the evaluation of
(135) | |||||
For this purpose, we assume the approximation:
(136) |
Hence in the relation
(137) | |||||
we replace by . Then we have
(138) |
Note that the above approximation is equivalent to setting
(139) |
where the left-hand side represents the average noncausal MMSE (see [6, Section IV-A]).
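For orientation, the scalar noncausal MMSE of the equiprobable binary input can be compared with the Gaussian-input value 1/(1+snr). The sketch below is a generic Monte Carlo illustration under the scalar model y = sqrt(snr) x + n, not the code-dependent average noncausal MMSE discussed above; it shows that the binary input is easier to estimate at every snr.

```python
import numpy as np

rng = np.random.default_rng(5)

def binary_mmse_mc(snr, n=400_000):
    """Monte Carlo estimate of E[(x - E[x|y])^2] for x = +/-1 and y = sqrt(snr)*x + n."""
    x = rng.choice([-1.0, 1.0], size=n)
    y = np.sqrt(snr) * x + rng.standard_normal(n)
    return np.mean((x - np.tanh(np.sqrt(snr) * y)) ** 2)

for snr in [0.25, 1.0, 4.0]:
    print(f"snr = {snr}: binary-input mmse = {binary_mmse_mc(snr):.4f}, "
          f"Gaussian-input mmse = {1.0 / (1.0 + snr):.4f}")
```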
IV-D Numerical Results as QLI Codes
In order to confirm the validity of the derived equation, numerical calculations have been carried out. We have used the same QLI codes and as in the previous section. In this case, we regard these codes as inherent QLI codes and compare the values of and . The results are shown in Tables III and IV, where is simply denoted . The results for and are shown also in Fig.4 and Fig.5, respectively.
[Fig. 4: conl-2-Q.ps]
[Fig. 5: conl-6-Q.ps]
First note the following.
Lemma 7
Let . Then
(140) |
holds.
Proof:
It is shown [7, Theorem 63] that
(141) |
Letting , we have
Furthermore, letting , we have
This is equivalent to
∎
The behavior of is similar to that of . We have the following:
(i) :
-
1)
: Since , .
-
2)
: Since , .
Note that approaches more rapidly as the SNR increases compared with .
(ii) :
-
1)
: Since and since ,
(142) -
2)
: Since and since ,
(143)
From Tables III and IV (or Figs. 4 and 5), we observe that the behaviors of for and are almost the same. This is because both and are regarded as QLI codes and hence have the same , which results in almost equal values of . Also, observe that provides a good approximation to (Figs. 4 and 5).
V Discussion
In the previous section, we have discussed the validity of the derived MMSE using the results of Guo et al. [6]. In fact, we have examined
-
1)
the relation between and ,
-
2)
the relation between and .
Note that the relation between the mutual information and the MMSE has been reduced to the relation between a function of and the MMSE. That is, their relation has been examined not directly but indirectly. We now see that the following approximation and evaluations are used in the argument in Section IV:
-
i)
().
-
ii)
.
-
iii)
.
Since () does not depend on , averaging in i) seems to be reasonable. (But it has to be justified.) On the other hand, inequalities ii) and iii) come from the fact that the input in our signal model is not Gaussian and hence the input-output mutual information cannot be obtained in a concrete form. (In the scalar case, the mutual information of the Gaussian channel with equiprobable binary input has been given (see [6, p.1263]). In our case, however, since the input is taken as a vector, we have used the conventional inequality on mutual information.) Consider the case that a QLI code is regarded as a general code. It follows from i) and ii) that
Similarly, in the case that a QLI code is regarded as the inherent QLI code, it follows from i) and iii) that
Hence evaluations of and are important in our discussions. It is expected that inequalities ii) and iii) are tight at low SNRs, whereas they are loose at high SNRs (cf. [6, p.1263]). As a result, if approximation i) is appropriate, then the crossing of the two curves in the figures is understandable. Hence the validity of approximation i) has to be further examined. Nevertheless, in the case that a given QLI code is regarded as the inherent QLI code, it has been shown that provides a good approximation to . This is quite remarkable considering the fact that is a function of , whereas depends on concrete convolutional coding. We think this fact implies the validity of our inference and the related results.
Here note the following: () is a function only of , whereas () is dependent on the encoded block for the main decoder. Hence at first glance, the comparison seems to be inappropriate. Although actually depends on coding, it is a function of the channel error probability . Since (cf. ), is also a function of . Hence the above comparison is justified.
By the way, the argument in Section IV is based on the signal model given in Section I. Note that is determined by the encoded symbol and hence this signal model clearly depends on convolutional coding. However, since each is independent of all others, can be seen as a random variable taking values with equal probability. When is seen in this way, convolutional coding does not appear explicitly in the expression . In other words, we cannot see from the signal model how is generated. On the other hand, consider the MMSE in estimating the input based on the observations . If the signal model is interpreted as above, then the MMSE seems to be independent of concrete convolutional coding. But this is not true. As we have already seen, the MMSE is finally replaced by (), which is an essential step. By way of this replacement, the MMSE is connected with convolutional coding. That is, convolutional coding is actually reflected in the MMSE.
From the results in Section IV, it seems that our argument is more convincing for QLI codes. We know that an SST Viterbi decoder functions well for QLI codes compared with general codes. In fact, QLI codes are preferable from a likelihood concentration viewpoint [22, 25]. Then a question arises: How is a likelihood concentration in the main decoder related to the MMSE? Suppose that the information () for the main decoder consists of error terms. We know that a likelihood concentration in the main decoder depends on [25], whereas is affected by and hence the MMSE is dependent on .
In the case of QLI codes, we have the following.
Proposition 11
Consider a QLI code whose generator matrix is given by
We have
(144) |
Proof:
Let , where errors are mutually independent. We have
(145) | |||||
Then it follows that and differ by . Since , and differ by error terms. ∎
We observe that the relation actually holds for (see Section IV-B). The above result shows that in the case of QLI codes, a likelihood concentration in the main decoder and the degree of correlation between and (i.e., the value of ) have a close connection. We remark that the former is in principle independent of the latter; the above result, however, shows that the two notions are closely related to each other.
Note that the equality does not hold for general codes in general. Then in order to examine the relation between a likelihood concentration in the main decoder and the MMSE, we can consider a code with small and large . As an example, take the code [22] whose generator matrix is defined by
(146) |
The inverse encoder is given by
(147) |
and we have . On the other hand, it is shown that
(148) |
Then we have .
Since is small, a likelihood concentration occurs in the main decoder [22, 25]. On the other hand, is considerably large. Here recall that when is regarded as a general code, it has . This value is the same as that for . However, since has as a general code, a likelihood concentration in the main decoder is not expected at low-to-medium SNRs. Then in order to see the effect of on the values of , let us compare the values of for and . For this purpose, we have evaluated for . The result is shown in Table V.
In Table V, look at the values of . We observe that they are almost the same as those for (see Table II). That is, it seems that has little effect on the values of . This observation is explained as follows. means that a likelihood concentration in the main decoder is notable. However, this does not necessarily affect the MMSE. A likelihood concentration reduces the decoding complexity (i.e., the complexity in estimating the input). On the other hand, the MMSE is the estimation error that is finally attained after the estimation process. In other words, affects the complexity of estimation, whereas is related to the final estimation error. Put differently, the MMSE is determined by the structure of . Hence even if the 's are considerably different between two codes and , when the 's have almost equal values of , the MMSEs are close to each other. In fact, the inverse encoder for is given by
whereas the inverse encoder for is given by
The two inverse encoders are quite different, but the values of calculated from are equal, which results in almost the same MMSE.
VI Conclusion
In this paper, we have shown that the soft-decision input to the main decoder in an SST Viterbi decoder can be regarded as the innovation. Although this fact can be obtained from the definition of innovations for hard-decision data, this time we have discussed the subject from the viewpoint of mutual information and mean-square error. By combining the present result with that in [25], it has been confirmed that the input to the main decoder in an SST Viterbi decoder can be regarded as the innovation. Moreover, we have obtained an important result. Note that the MMSE has been expressed in terms of the distribution of the encoded block for the main decoder in an SST Viterbi decoder. Then through the argument, the input-output mutual information has been connected with the distribution of the encoded block for the main decoder. We think this is an extension of the relation between the mutual information and the MMSE to coding theory.
On the other hand, we have problems to be further discussed. Note that since the input is not Gaussian, the discussions are based on inequalities and approximate expressions. In particular, when a given QLI code is regarded as a general code, the difference between and is not so small. We remark that this comparison has been done based on two inequalities regarding evaluation of the input-output mutual information. Moreover, the discussions are based partly on numerical calculations. Hence our argument seems to be slightly less rigorous in some places. Nevertheless, we think the closeness of two curves in Figs. 4 and 5 implies that the inference and the related results in this paper are reasonable.
Appendix A Proof of Lemma 4
We use mathematical induction on .
1) is obvious.
2) Suppose that . Let . Then we have
(A.149) | |||||
Since by the assumption, we have
(A.150) | |||||
Appendix B Proof of Lemma 5
Suppose that , , and have been determined given . Here note . Let be the (error) terms common to and . Then and are expressed as
(B.151) | |||||
(B.152) |
where , , and are mutually independent. Set
(B.153) |
We have
(B.154) | |||||
(B.155) | |||||
(B.156) |
By direct calculation, it is derived that
(B.157) | |||||
Since and since (see Lemma 4), it follows that .
Appendix C Proof of Proposition 5
Suppose that , , and have been determined given . It suffices to show that
We apply the same argument as that in the proof of Lemma 5. Let , , and be the same as those for Lemma 5. We have
(C.158) | |||||
(C.159) | |||||
(C.160) |
By direct calculation, it follows that
(C.161) | |||||
Appendix D Explanation for
The mutual information of Gaussian channel with equiprobable binary input has been given (see [6, p.1263]). Let be the equiprobable binary input with unit variance. Using the relation [6, Theorem 1]
(D.162) |
we have
(D.163) | |||||
On the other hand, for the standard Gaussian input , we have
(D.164) | |||||
Accordingly, it suffices to show that
(D.165) |
Note that this inequality is equivalent to
(D.166) |
Furthermore, the above is rewritten as
(D.167) | |||||
where represents the Gaussian distribution with mean and variance .
Here note the integrands on the left-hand side. The variable appears only in Gaussian distributions. Hence using a linear approximation of , numerical integration is possible. is approximated as
(D.168) |
(cf. )
If the above inequality holds for each , then will be shown. For example, let . The target inequality becomes
(D.169) | |||||
By carrying out numerical integration, we have
(D.170) | |||||
Hence we actually have at .
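The inequality examined in this appendix can also be checked by direct numerical integration over a range of snr values. The sketch below, assuming the scalar model y = sqrt(snr) x + n, computes the mutual information definitionally as h(Y) - h(Y|X) for the equiprobable binary input and compares it with (1/2) log(1+snr) for the standard Gaussian input.

```python
import numpy as np
from scipy.integrate import quad

def binary_mi(snr):
    """I(X;Y) in nats for equiprobable binary X and Y = sqrt(snr)*X + N, N ~ N(0,1)."""
    a = np.sqrt(snr)
    pdf = lambda y: 0.5 * (np.exp(-0.5 * (y - a) ** 2)
                           + np.exp(-0.5 * (y + a) ** 2)) / np.sqrt(2.0 * np.pi)
    h_y = quad(lambda y: -pdf(y) * np.log(pdf(y)), -30.0, 30.0)[0]
    return h_y - 0.5 * np.log(2.0 * np.pi * np.e)

def gaussian_mi(snr):
    """I(X;Y) in nats for standard Gaussian X: (1/2)*log(1 + snr)."""
    return 0.5 * np.log1p(snr)

for snr in [0.5, 1.0, 2.0, 4.0]:
    print(f"snr = {snr}: binary {binary_mi(snr):.4f}  <=  Gaussian {gaussian_mi(snr):.4f}")
```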
References
- [1] S. Arimoto, Kalman Filter, (in Japanese). Tokyo, Japan: Sangyo Tosho Publishing, 1977.
- [2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: John Wiley & Sons, 2006.
- [3] T. E. Duncan, “On the calculation of mutual information,” SIAM J. Appl. Math., vol. 19, pp. 215–220, Jul. 1970.
- [4] R. Esposito, “On a relation between detection and estimation in decision theory,” Inform. Contr., vol. 12, pp. 116–120, 1968.
- [5] G. D. Forney, Jr., “Convolutional codes I: Algebraic structure,” IEEE Trans. Inf. Theory, vol. IT-16, no. 6, pp. 720–738, Nov. 1970.
- [6] D. Guo, S. Shamai (Shitz), and S. Verdú, “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005.
- [7] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, 2nd ed. Cambridge University Press, 1952. (H. Hosokawa, Inequalities, (Japanese transl.). Tokyo, Japan: Springer-Verlag Tokyo, 2003.)
- [8] K. Ito and S. Watanabe, “Introduction to stochastic differential equations,” in Proc. Intern. Symp. SDE, pp. i–xxx, Jul. 1976. (Stochastic Differential Equations, K. Ito, Ed. Tokyo: Kinokuniya Book-Store, 1978.)
- [9] A. H. Jazwinski, Stochastic Processes and Filtering Theory. New York: Academic Press, 1970.
- [10] T. T. Kadota, M. Zakai, and J. Ziv, “Mutual information of the white Gaussian channel with and without feedback,” IEEE Trans. Inf. Theory, vol. IT-17, no. 4, pp. 368–371, Jul. 1971.
- [11] T. Kailath, “An innovations approach to least-squares estimation–Part I: Linear filtering in additive white noise,” IEEE Trans. Automatic Control, vol. AC-13, no. 6, pp. 646–655, Dec. 1968.
- [12] T. Kailath and P. Frost, “An innovations approach to least-squares estimation–Part II: Linear smoothing in additive white noise,” IEEE Trans. Automatic Control, vol. AC-13, no. 6, pp. 655–660, Dec. 1968.
- [13] T. Kailath, “A note on least-squares estimates from likelihood ratios,” Inform. Contr., vol. 13, pp. 534–540, 1968.
- [14] T. Kailath, “A general likelihood-ratio formula for random signals in Gaussian noise,” IEEE Trans. Inf. Theory, vol. IT-15, no. 3, pp. 350–361, May 1969.
- [15] T. Kailath, “A view of three decades of linear filtering theory (Invited paper),” IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 146–181, Mar. 1974.
- [16] T. Kailath and H. V. Poor, “Detection of stochastic processes (Invited paper),” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2230–2259, Oct. 1998.
- [17] S. Kubota, S. Kato, and T. Ishitani, “Novel Viterbi decoder VLSI implementation and its performance,” IEEE Trans. Commun., vol. 41, no. 8, pp. 1170–1178, Aug. 1993.
- [18] H. Kunita, Estimation of Stochastic Processes (in Japanese). Tokyo, Japan: Sangyo Tosho Publishing, 1976.
- [19] P. Malliavin, “Stochastic calculus of variation and hypoelliptic operators,” in Proc. Intern. Symp. SDE, pp. 195–263, Jul. 1976. (Stochastic Differential Equations, K. Ito, Ed. Tokyo: Kinokuniya Book-Store, 1978.)
- [20] J. L. Massey and D. J. Costello, Jr., “Nonsystematic convolutional codes for sequential decoding in space applications,” IEEE Trans. Commun. Technol., vol. COM-19, no. 5, pp. 806–813, Oct. 1971.
- [21] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, 5th ed. Berlin, Germany: Springer-Verlag, 1998.
- [22] S. Ping, Y. Yan, and C. Feng, “An effective simplifying scheme for Viterbi decoder,” IEEE Trans. Commun., vol. 39, no. 1, pp. 1–3, Jan. 1991.
- [23] G. Strang, Linear Algebra and Its Applications. New York: Academic Press, 1976. (M. Yamaguchi and A. Inoue, Linear Algebra and Its Applications, (Japanese transl.). Tokyo, Japan: Sangyo Tosho Publishing, 1978.)
- [24] M. Tajima, K. Shibata, and Z. Kawasaki, “On the equivalence between Scarce-State-Transition Viterbi decoding and syndrome decoding of convolutional codes,” IEICE Trans. Fundamentals, vol. E86-A, no. 8, pp. 2107–2116, Aug. 2003.
- [25] M. Tajima, “An innovations approach to Viterbi decoding of convolutional codes,” IEEE Trans. Inf. Theory, vol. 65, no. 5, pp. 2704–2722, May 2019.
- [26] M. Tajima, “Corrections to “An innovations approach to Viterbi decoding of convolutional codes”,” IEEE Trans. Inf. Theory, to be published.
- [27] D. Williams, Probability with Martingales. Cambridge University Press, 1991. (J. Akahori et al., Probability with Martingales, (Japanese transl.). Tokyo, Japan: Baifukan, 2004.)
- [28] E. Wong, Stochastic Processes in Information and Dynamical Systems. New York: McGraw-Hill, 1971.
- [29] M. Zakai, “On mutual information, likelihood ratios, and estimation error for the additive Gaussian channel,” IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3017–3024, Sep. 2005.