
A Further Note on an Innovations Approach to Viterbi Decoding of Convolutional Codes

Masato Tajima
M. Tajima is with the University of Toyama, 3190 Gofuku, Toyama 930-8555, Japan (e-mail: [email protected]).
Abstract

In this paper, we show that the soft-decision input to the main decoder in an SST Viterbi decoder can also be regarded as the innovation from the viewpoint of mutual information and mean-square error. It is assumed that a code sequence is transmitted symbol by symbol over an AWGN channel using BPSK modulation. We can then consider the signal model in which the signal is composed of the signal-to-noise ratio (SNR) and the equiprobable binary input. By assuming that the soft-decision input to the main decoder is the innovation, we show that the minimum mean-square error (MMSE) in estimating the binary input is expressed in terms of the distribution of the encoded block for the main decoder. It is shown that the obtained MMSE indirectly satisfies the known relation between the mutual information and the MMSE in Gaussian channels. Thus the derived MMSE is justified, which in turn implies that the soft-decision input to the main decoder can be regarded as the innovation. Moreover, we see that the input-output mutual information is connected with the distribution of the encoded block for the main decoder.

Index Terms:
Convolutional codes, Scarce-State-Transition (SST) Viterbi decoder, innovations, filtering, smoothing, mean-square error, mutual information.

I Introduction

Consider an SST Viterbi decoder [17, 24], which consists of a pre-decoder and a main decoder. In [25], by comparing with the results in linear filtering theory [1, 9, 11, 12, 15], we showed that the hard-decision input to the main decoder can be seen as the innovation [11, 12, 16, 18]. In coding theory, the framework is determined by hard-decision data, and in the previous paper [25, Definition 1] we gave a definition of innovations for hard-decision data. We see that the definition applies also to soft-decision data: if the associated hard-decision data is the innovation, then the original soft-decision data can be regarded as the innovation. Hence, we can consider the soft-decision input to the main decoder to be the innovation as well. In this paper, we show that this is true from a different viewpoint. We use the innovations approach to least-squares estimation by Kailath [11, 12] and the relationship between the mutual information and the mean-square error (e.g., [6]).

Consider the situation where we have observations of a signal in additive white Gaussian noise. (In this paper, it is assumed that the signal is composed of the signal-to-noise ratio (SNR) and the equiprobable binary input, where the latter does not depend on the SNR. In the following, we call the equiprobable binary input simply the input.) Kailath [11, 12] applied the innovations method to linear filtering/smoothing problems. In the discrete-time case [11, 15], he showed that the covariance matrix of the innovation is expressed as the sum of two covariance matrices, one corresponding to the estimation error of the signal and the other corresponding to the observation noise. Then the mean-square error in estimating the signal is obtained from the former covariance matrix by taking its trace. Based on Kailath's result, we reasoned as follows: if the soft-decision input to the main decoder corresponds to the innovation, then the associated covariance matrix should have the above property. That is, the covariance matrix of the soft-decision input to the main decoder can be decomposed as the sum of two covariance matrices, and hence, following the above, the mean-square error in estimating the signal can be obtained. We remark that in this context the obtained mean-square error has to be justified by some means. For this purpose, we note the relation between the mutual information and the mean-square error.

In [25] we derived the distribution of the input to the main decoder corresponding to a single code symbol. However, we could not obtain the joint distribution of the input to the main decoder corresponding to a branch. After our paper [25] was published, we noticed that the distribution of the input to the main decoder corresponding to a single code symbol has a reasonable interpretation (see [26]). Using this fact, the joint distribution of the input to the main decoder corresponding to a branch has been derived, and the associated covariance matrix can be calculated. Our argument for obtaining the mean-square error is then as follows:

  • 1)

    Derive the joint distribution of the soft-decision input to the main decoder.

  • 2)

    Calculate the associated covariance matrix using the derived joint distribution.

  • 3)

    The covariance matrix of the estimation error of the signal is obtained by subtracting that of the observation noise from the matrix in 2). Moreover, by removing the SNR from the obtained covariance matrix, the covariance matrix of the estimation error of the input is derived.

  • 4)

    The mean-square error in estimating the input is given by the trace (i.e., the sum of diagonal elements) of the last covariance matrix.

In this way, we show that the mean-square error in estimating the input is expressed in terms of the distribution of the encoded block for the main decoder. Since in an SST Viterbi decoder, the estimation error is decoded at the main decoder, the result is reasonable.

On the other hand, we have to show the validity of the derived mean-square error. For this purpose, we note the relation between the mutual information and the mean-square error. The mutual information, the likelihood ratio (LR), and the mean-square error are central concerns in information theory, detection theory, and estimation theory. It is well known that the best estimate measured in the mean-square sense (i.e., the least-squares estimate) is given by the conditional expectation [16, 18, 21, 27]. In this paper, the corresponding mean-square error is referred to as the minimum mean-square error (MMSE) [6]. In this case, depending on the amount of observations $\{z(t)\}$ used for the estimation, the causal (filtering) MMSE and the noncausal (smoothing) MMSE are considered. When $\{z(\tau),~\tau\leq t\}$ is used for the estimation of the input $x(t)$, it corresponds to filtering, whereas when $\{z(\tau),~\tau\leq T\}$ is used for the estimation of $x(t)~(t<T)$, it corresponds to smoothing.

From the late 1960's to the early 1970's, the relation between the LR and the mean-square error was actively discussed. Kailath [14] showed that the LR is expressed in terms of the causal least-squares estimate. Moreover, he showed [13] that the causal least-squares estimate can be obtained from the LR. Esposito [4] also derived the relation between the LR and the noncausal estimator, which is closely related to [13]. Subsequently, Duncan [3] and Kadota et al. [10] derived the relation between the mutual information and the causal mean-square error. Their works are regarded as extensions of the work of Gelfand and Yaglom (see [1, 3]), who first discussed the relation between the mutual information and the filtering error. Later, Guo et al. [6] derived new formulas for the relation between the mutual information and the noncausal mean-square error. Furthermore, Zakai [29] showed that the relation between the mutual information and the estimation error holds also in the abstract Wiener space [28, 29]; he used the Malliavin calculus [19, 29] rather than the Ito calculus [8, 21]. We remark that the additive Gaussian noise channel is assumed in all of the above works.

Among those works, we have focused on the work of Guo et al. [6]. It deals with the relation between the mutual information and the MMSE. Also, their signal model coincides with ours and hence is convenient to work with. We reasoned that if the MMSE obtained using the above method satisfies the relations in [6, Section IV-A], then the derived MMSE is justified, which in turn implies that the soft-decision input to the main decoder can be regarded as the innovation. The main text of the paper consists of the arguments stated above. In addition, since the MMSE is expressed in terms of the distribution of the encoded block for the main decoder, we see that the input-output mutual information is connected with that distribution. We think this is an extension of the relation between the mutual information and the MMSE to coding theory. The remainder of this paper is organized as follows.

In Section II, based on the signal model in this paper, we derive the joint distribution of the input to the main decoder.

In Section III, the associated covariance matrix is calculated using the derived joint distribution. Subsequently, by subtracting the covariance matrix of the observation noise from it and by removing the SNR from the resulting matrix, the covariance matrix of the estimation error of the input is obtained. The MMSE in estimating the input is given by the trace of the last matrix. In this case, since the diagonal elements of that matrix are correlated, we modify the obtained MMSE; this modification is essential. (We remark that some results in Sections II and III have been given in [26] with or without proofs. However, in order for the paper to be self-contained, the necessary materials are provided again with proofs in those sections.)

In Section IV, the validity of the derived MMSE is discussed. The argument is based on the relation between the mutual information and the MMSE; more precisely, we use the results of Guo et al. [6, Section IV-A]. We remark that the input in our signal model is not Gaussian. Hence the input-output mutual information cannot be obtained in closed form, and we have only inequalities and approximate expressions. For that reason, we carry out numerical calculations using concrete convolutional codes. In order to clarify the difference between causal estimation (filtering) and noncausal estimation (smoothing), we take QLI codes [20] with different constraint lengths. The MMSEs are then calculated by regarding these QLI codes as general codes on the one hand and as QLI codes proper on the other. The obtained results are compared and carefully examined. Moreover, through the argument, we see that the input-output mutual information is connected with the distribution of the encoded block for the main decoder.

In Section V, we give several important comments regarding the discussions in Section IV.

Finally, in Section VI, we conclude with the main points of the paper and with problems to be further discussed.

Let us close this section by introducing the basic notions needed for this paper. Notations in this paper are, in principle, the same as those in [25]. We always assume that the underlying field is $\mbox{GF}(2)$. Let $G(D)$ be a generator matrix for an $(n_0,k_0)$ convolutional code, where $G(D)$ is assumed to be minimal [5]. A corresponding check matrix $H(D)$ is also assumed to be minimal. Hence they have the same constraint length, denoted $\nu$. Denote by $\mbox{\boldmath$i$}=\{\mbox{\boldmath$i$}_k\}$ and $\mbox{\boldmath$y$}=\{\mbox{\boldmath$y$}_k\}$ an information sequence and the corresponding code sequence, respectively, where $\mbox{\boldmath$i$}_k=(i_k^{(1)},\cdots,i_k^{(k_0)})$ is the information block at $t=k$ and $\mbox{\boldmath$y$}_k=(y_k^{(1)},\cdots,y_k^{(n_0)})$ is the encoded block at $t=k$. In this paper, it is assumed that a code sequence $\mbox{\boldmath$y$}$ is transmitted symbol by symbol over a memoryless AWGN channel using BPSK modulation. Let $\mbox{\boldmath$z$}=\{\mbox{\boldmath$z$}_k\}$ be a received sequence, where $\mbox{\boldmath$z$}_k=(z_k^{(1)},\cdots,z_k^{(n_0)})$ is the received block at $t=k$. Each component $z_j$ of $\mbox{\boldmath$z$}$ is modeled as

z_j = x_j\sqrt{2E_s/N_0}+w_j   (1)
    = cx_j+w_j,   (2)

where $c\stackrel{\triangle}{=}\sqrt{2E_s/N_0}$. Here, $x_j$ takes $\pm 1$ depending on whether the code symbol $y_j$ is $0$ or $1$. That is, $x_j$ is the equiprobable binary input. (We call it simply the input.) $E_s$ and $N_0$ denote the energy per channel symbol and the single-sided noise spectral density, respectively. (Let $E_b$ be the energy per information bit. Then the relationship between $E_b$ and $E_s$ is defined by $E_s=RE_b$, where $R$ is the code rate.) Also, $w_j$ is a zero-mean unit-variance Gaussian random variable with probability density function

q(y)=\frac{1}{\sqrt{2\pi}}e^{-\frac{y^2}{2}}.   (3)

Each $w_j$ is independent of all others. The hard-decision (denoted “h”) data of $z_j$ is defined by

z_j^h\stackrel{\triangle}{=}\left\{\begin{array}{rl}0,&\quad z_j\geq 0\\ 1,&\quad z_j<0.\end{array}\right.   (4)

In this case, the channel error probability (denoted $\epsilon$) is given by

\epsilon=\frac{1}{\sqrt{2\pi}}\int_{\sqrt{2E_s/N_0}}^{\infty}e^{-\frac{y^2}{2}}dy\stackrel{\triangle}{=}Q\bigl(\sqrt{2E_s/N_0}\bigr).   (5)

Note that the above signal model can be seen as the block at $t=k$. In that case, we can rewrite it as

\mbox{\boldmath$z$}_k=c\mbox{\boldmath$x$}_k+\mbox{\boldmath$w$}_k,   (6)

where $\mbox{\boldmath$x$}_k=(x_k^{(1)},\cdots,x_k^{(n_0)})$, $\mbox{\boldmath$w$}_k=(w_k^{(1)},\cdots,w_k^{(n_0)})$, and $\mbox{\boldmath$z$}_k=(z_k^{(1)},\cdots,z_k^{(n_0)})$.
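As an illustrative aside (an addition, not part of the original text), the per-symbol model (1)–(5) can be simulated directly; the sketch below, assuming NumPy/SciPy and an example value of $E_s/N_0$, checks the channel error probability (5) empirically.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

Es_N0_dB = 2.0                      # assumed example value of Es/N0 in dB
c = np.sqrt(2 * 10**(Es_N0_dB/10))  # c = sqrt(2*Es/N0), cf. Eq. (2)

n = 1_000_000
y = rng.integers(0, 2, size=n)      # code symbols 0/1
x = 1 - 2*y                         # BPSK input: x = +1 for y = 0, -1 for y = 1
w = rng.standard_normal(n)          # unit-variance Gaussian noise, cf. Eq. (3)
z = c*x + w                         # received values, Eq. (2)

z_hard = (z < 0).astype(int)        # hard decision, Eq. (4)
eps_emp = np.mean(z_hard != y)      # empirical channel error probability
eps_th = norm.sf(c)                 # Q(sqrt(2*Es/N0)), Eq. (5)

print(f"empirical epsilon = {eps_emp:.5f}, Q(c) = {eps_th:.5f}")
```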

In this paper, we consider an SST Viterbi decoder [17, 24, 25] which consists of a pre-decoder and a main decoder. In the case of a general convolutional code, the inverse encoder $G^{-1}(D)$ is used as a pre-decoder. Let $\mbox{\boldmath$r$}_k=(r_k^{(1)},\cdots,r_k^{(n_0)})$ be the soft-decision input to the main decoder. $r_k^{(l)}~(1\leq l\leq n_0)$ is given by

r_k^{(l)}=\left\{\begin{array}{rl}|z_k^{(l)}|,&\quad r_k^{(l)h}=0\\ -|z_k^{(l)}|,&\quad r_k^{(l)h}=1.\end{array}\right.   (7)
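As a minimal sketch of how (7) might be realized in software (an addition; the hard-decision input `r_hard_block`, i.e., $\mbox{\boldmath$r$}_k^h$ produced by the pre-decoder path, is assumed given and follows [25]):

```python
import numpy as np

def soft_input_to_main_decoder(z_block, r_hard_block):
    """Eq. (7): attach a sign to |z_k^(l)| according to the
    hard-decision input r_k^(l)h to the main decoder."""
    z_block = np.asarray(z_block, dtype=float)
    r_hard_block = np.asarray(r_hard_block, dtype=int)
    return np.where(r_hard_block == 0, np.abs(z_block), -np.abs(z_block))

# toy usage with an assumed received block and pre-decoder output
print(soft_input_to_main_decoder([0.7, -1.3], [0, 1]))   # -> [ 0.7 -1.3]
```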

II Joint Distribution of the Input to the Main Decoder

The argument in this paper is based on the result of Kailath [11, Section III-B]. That is, by calculating the covariance matrix of the input to the main decoder, we derive the mean-square error in estimating the input. First we obtain the joint distribution of the input to the main decoder. In the following, $P(\,\cdot\,)$ and $E[\,\cdot\,]$ denote the probability and expectation, respectively.

Let $\mbox{\boldmath$r$}_k=(r_k^{(1)},\cdots,r_k^{(n_0)})$ be the input to the main decoder in an SST Viterbi decoder [25]. In connection with the distribution of $r_k^{(l)}~(1\leq l\leq n_0)$ [25, Proposition 12], $\alpha~(=\alpha_l)$ is defined by

\alpha\stackrel{\triangle}{=}P(e_k^{(l)}=0,r_k^{(l)h}=1)+P(e_k^{(l)}=1,r_k^{(l)h}=0).   (8)

We have noticed that this value has another interpretation. In fact, we have the following.

Lemma 1 ([26])
\alpha_l=P(v_k^{(l)}=1)~(1\leq l\leq n_0)   (9)

holds, where $\mbox{\boldmath$v$}_k=(v_k^{(1)},\cdots,v_k^{(n_0)})$ is the encoded block for the main decoder.

Proof:

The hard-decision input to the main decoder is expressed as

\mbox{\boldmath$r$}_k^h=\mbox{\boldmath$u$}_kG+\mbox{\boldmath$e$}_k,

where $\mbox{\boldmath$u$}_k=\mbox{\boldmath$e$}_kG^{-1}$. Let $\mbox{\boldmath$v$}_k=\mbox{\boldmath$u$}_kG$. We have

r_k^{(l)h}=v_k^{(l)}+e_k^{(l)}~(1\leq l\leq n_0).   (10)

Hence it follows that

\alpha_l = P(e_k^{(l)}=0,r_k^{(l)h}=1)+P(e_k^{(l)}=1,r_k^{(l)h}=0)   (11)
         = P(e_k^{(l)}=0,v_k^{(l)}+e_k^{(l)}=1)+P(e_k^{(l)}=1,v_k^{(l)}+e_k^{(l)}=0)
         = P(e_k^{(l)}=0,v_k^{(l)}=1)+P(e_k^{(l)}=1,v_k^{(l)}=1)
         = P(v_k^{(l)}=1).

Thus the distribution of $r_k^{(l)}~(1\leq l\leq n_0)$ is given by

p_r(y)=P(v_k^{(l)}=0)q(y-c)+P(v_k^{(l)}=1)q(y+c).   (12)

This equation means that if the code symbol is $0$, then the associated distribution obeys $q(y-c)$, whereas if the code symbol is $1$, then the associated distribution obeys $q(y+c)$. Hence the result is quite reasonable. On the other hand, since the distribution of $r_k^{(l)}$ is given by the above equation, $r_k^{(l)}~(1\leq l\leq n_0)$ are not mutually independent, because $v_k^{(l)}~(1\leq l\leq n_0)$ are not mutually independent.

Next, consider a QLI code whose generator matrix is given by

G(D)=(g_1(D),g_1(D)+D^L)~(1\leq L\leq\nu-1).   (13)

Let $\mbox{\boldmath$\eta$}_{k-L}=(\eta_{k-L}^{(1)},\eta_{k-L}^{(2)})$ be the input to the main decoder in an SST Viterbi decoder [25]. We see that almost the same argument applies to $\beta~(=\beta_l)$ in [25, Proposition 14]. We have the following.

Lemma 2 ([26])
\beta_l=P(v_k^{(l)}=1)~(l=1,2)   (14)

holds, where $\mbox{\boldmath$v$}_k=(v_k^{(1)},v_k^{(2)})$ is the encoded block for the main decoder.

Proof:

In the case of QLI codes [25, Section II-B], the hard-decision input to the main decoder is expressed as

\mbox{\boldmath$\eta$}_{k-L}^h = u_kG+\mbox{\boldmath$e$}_{k-L}
                               = \mbox{\boldmath$v$}_k+\mbox{\boldmath$e$}_{k-L},

where $u_k=\mbox{\boldmath$e$}_kF$ ($F\stackrel{\triangle}{=}\left(\begin{array}{c}1\\ 1\end{array}\right)$) and $\mbox{\boldmath$v$}_k=u_kG$. Let $\zeta_k=\mbox{\boldmath$z$}_kH(D)$ be the syndrome. Since $\mbox{\boldmath$\eta$}_{k-L}^h=(\zeta_k,\zeta_k)$, the above is equivalent to

(\zeta_k,\zeta_k)=(v_k^{(1)},v_k^{(2)})+(e_{k-L}^{(1)},e_{k-L}^{(2)}).   (15)

Hence we have

\beta_l = P(e_{k-L}^{(l)}=0,\zeta_k=1)+P(e_{k-L}^{(l)}=1,\zeta_k=0)   (16)
        = P(e_{k-L}^{(l)}=0,v_k^{(l)}+e_{k-L}^{(l)}=1)+P(e_{k-L}^{(l)}=1,v_k^{(l)}+e_{k-L}^{(l)}=0)
        = P(e_{k-L}^{(l)}=0,v_k^{(l)}=1)+P(e_{k-L}^{(l)}=1,v_k^{(l)}=1)
        = P(v_k^{(l)}=1)

for $l=1,2$. ∎

In the rest of the paper, $n_0=2$ is assumed, because we are concerned with QLI codes in principle. Let us examine the relationship between $\alpha_l$ and $\beta_l$. Let $\hat{i}(k-L|k-L)$ and $\hat{i}(k-L|k)$ be the filtered estimate and the smoothed estimate, respectively [25, Section II-B]. We have

\mbox{\boldmath$r$}_{k-L}^h = \mbox{\boldmath$z$}_{k-L}^h-\hat{i}(k-L|k-L)G(D)   (17)
                            = (i_{k-L}-\hat{i}(k-L|k-L))G(D)+\mbox{\boldmath$e$}_{k-L}
                            = u_{k-L}G(D)+\mbox{\boldmath$e$}_{k-L}
\mbox{\boldmath$\eta$}_{k-L}^h = \mbox{\boldmath$z$}_{k-L}^h-\hat{i}(k-L|k)G(D)   (18)
                               = (i_{k-L}-\hat{i}(k-L|k))G(D)+\mbox{\boldmath$e$}_{k-L}
                               = \tilde{u}_{k-L}G(D)+\mbox{\boldmath$e$}_{k-L},

where $\tilde{u}_{k-L}\stackrel{\triangle}{=}\mbox{\boldmath$e$}_kF$. Then it follows that

\mbox{\boldmath$v$}_{k-L} = u_{k-L}G(D)   (19)
                          = (i_{k-L}-\hat{i}(k-L|k-L))G(D)
\mbox{\boldmath$\tilde{v}$}_{k-L} = \tilde{u}_{k-L}G(D)   (20)
                                  = (i_{k-L}-\hat{i}(k-L|k))G(D).

From the meaning of filtering and smoothing, it is natural to think that $P(i_{k-L}-\hat{i}(k-L|k-L)=0)$ is smaller than $P(i_{k-L}-\hat{i}(k-L|k)=0)$. Hence it is expected that

P(v_{k-L}^{(l)}=1)>P(\tilde{v}_{k-L}^{(l)}=1),   (21)

i.e., $\alpha_l>\beta_l$ holds. (In the derivation, $P(v_{k-L}^{(l)}=1)=P(v_k^{(l)}=1)$, which is equivalent to a kind of stationarity, has been used.) However, this is not always true (see Section IV).

Since the meaning of $\alpha~(=\alpha_l)$ has been clarified, we can derive the joint distribution (denoted $p_r(x,y)$) of $r_k^{(1)}$ and $r_k^{(2)}$. In fact, we have the following.

Proposition 1 ([26])

$p_r(x,y)$ is given by

p_r(x,y) = \alpha_{00}q(x-c)q(y-c)+\alpha_{01}q(x-c)q(y+c)   (22)
          +\alpha_{10}q(x+c)q(y-c)+\alpha_{11}q(x+c)q(y+c),

where $\alpha_{ij}=P(v_k^{(1)}=i,v_k^{(2)}=j)$.

Proof:

$p_r(x,y)\geq 0$ is obvious. Let us show that $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}p_r(x,y)dxdy=1$. Noting, for example,

\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}q(x-c)q(y-c)dxdy
=\int_{-\infty}^{\infty}q(x-c)dx\int_{-\infty}^{\infty}q(y-c)dy
=1\times 1=1,

we have

\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}p_r(x,y)dxdy
=\alpha_{00}+\alpha_{01}+\alpha_{10}+\alpha_{11}=1.   (23)

(Remark: $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}q(x-c)q(y-c)dxdy$ is a multiple integral on the infinite interval, i.e., an improper integral. Consider a finite interval $I=\{a\leq x\leq b,a^{\prime}\leq y\leq b^{\prime}\}$. Since $q(x-c)q(y-c)$ is continuous on $I$, a repeated integral is possible on $I$ and we have

\int\int_I q(x-c)q(y-c)dxdy
=\int_a^b q(x-c)dx\int_{a^{\prime}}^{b^{\prime}}q(y-c)dy.

Taking the limit as $a,a^{\prime}\rightarrow-\infty$ and as $b,b^{\prime}\rightarrow\infty$, the above equality is obtained.)

Next, let us calculate the marginal distributions of $p_r(x,y)$. We have

\int_{-\infty}^{\infty}p_r(x,y)dy
=(\alpha_{00}+\alpha_{01})q(x-c)+(\alpha_{10}+\alpha_{11})q(x+c)   (24)
=(1-\alpha_1)q(x-c)+\alpha_1 q(x+c).

Note that this is just the distribution of $r_k^{(1)}$. Similarly, we have

\int_{-\infty}^{\infty}p_r(x,y)dx=(1-\alpha_2)q(y-c)+\alpha_2 q(y+c),   (25)

where the right-hand side is the distribution of $r_k^{(2)}$. All these facts show that $p_r(x,y)$ is the joint distribution of $r_k^{(1)}$ and $r_k^{(2)}$. ∎

III Mean-Square Error in Estimating the Input

Consider the signal model:

Z=X+W,   (26)

where $X$ represents a signal of interest and $W$ represents random noise. We assume that $X$ and $W$ are mutually independent. Since we cannot observe $X$ directly, we have to estimate it based on the observation $Z$. The error of an estimate $\mbox{\boldmath$f$}(Z)$ of the input $X$ based on the observation $Z$ can be measured in the mean-square sense:

E[(X-\mbox{\boldmath$f$}(Z))(X-\mbox{\boldmath$f$}(Z))^T],   (27)

where “T” means transpose. It is well known [16, 18, 21, 27] that the minimum of the above value is achieved by the conditional expectation

\hat{X}=\widehat{\mbox{\boldmath$f$}(Z)}=E[X|Z].   (28)

$\hat{X}$ is the least-squares estimate and the corresponding estimation error is referred to as the minimum mean-square error (MMSE) [6]. In the following, the value of the MMSE is denoted by “mmse”.

Remark: Let $\mathcal{Z}$ be the $\sigma$-field generated by $Z$. Also, denote by $L^2(\mathcal{Z})$ the set of elements in $L^2$ which are $\mathcal{Z}$-measurable, where $L^2$ is the set of square-integrable random variables. Then we have $E[X|Z]=P_{L^2(\mathcal{Z})}X$, where $P_{L^2(\mathcal{Z})}X$ is the orthogonal projection of $X$ onto the space $L^2(\mathcal{Z})$ [18, 21, 27]. If $X$ and $Z$ are jointly Gaussian, then we have $E[X|Z]=P_{\mathcal{H}(Z)}X$ [18, 21], where $\mathcal{H}(Z)$ is the Gaussian space [18, 21] generated by $Z$. Note that $\mathcal{H}(Z)$ is a subspace of $L^2(\mathcal{Z})$.
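To make the distinction between the conditional-expectation estimator and a linear estimator concrete, consider the scalar version $z=cx+w$ of the present model with an equiprobable $x=\pm 1$ and $w\sim N(0,1)$; for this model the conditional mean is the nonlinear function $E[x\,|\,z]=\tanh(cz)$ (a standard fact for BPSK in Gaussian noise, not derived in this paper). The sketch below (an addition), assuming NumPy, compares its mean-square error with that of the best linear estimator by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
c = 1.0                                  # assumed example value of sqrt(SNR)
n = 1_000_000

x = rng.choice([-1.0, 1.0], size=n)      # equiprobable binary input
z = c*x + rng.standard_normal(n)         # scalar observation z = c x + w

x_cm  = np.tanh(c*z)                     # conditional mean E[x|z] (nonlinear)
x_lin = (c/(1.0 + c**2))*z               # best linear (Wiener) estimate of x

mse_cm  = np.mean((x - x_cm)**2)
mse_lin = np.mean((x - x_lin)**2)        # theoretical value: 1/(1+c^2)

print(f"MSE of E[x|z]      : {mse_cm:.4f}")
print(f"MSE of linear est. : {mse_lin:.4f}  (theory {1/(1+c**2):.4f})")
```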

Kailath [11, 12] applied the innovations method to linear filtering/smoothing problems. Suppose that the observations are given by

\mbox{\boldmath$z$}_k=\mbox{\boldmath$s$}_k+\mbox{\boldmath$w$}_k,~k=1,2,\cdots,   (29)

where $\{\mbox{\boldmath$s$}_k\}$ is a zero-mean finite-variance signal process and $\{\mbox{\boldmath$w$}_k\}$ is a zero-mean white Gaussian noise. It is assumed that $\mbox{\boldmath$w$}_k$ has a covariance matrix $E[\mbox{\boldmath$w$}_k^T\mbox{\boldmath$w$}_l]=R_k\delta_{kl}$, where $\delta_{kl}$ is Kronecker's delta. The innovation process is defined by

\mbox{\boldmath$\nu$}_k\stackrel{\triangle}{=}\mbox{\boldmath$z$}_k-\mbox{\boldmath$\hat{s}$}(k|k-1),   (30)

where $\mbox{\boldmath$\hat{s}$}(k|k-1)$ is the linear least-squares estimate of $\mbox{\boldmath$s$}_k$ given $\{\mbox{\boldmath$z$}_l,~1\leq l\leq k-1\}$. Kailath [11, Section III-B] showed the following.

Proposition 2 (Kailath [11])

The covariance matrix of $\mbox{\boldmath$\nu$}_k$ is given by

E[\mbox{\boldmath$\nu$}_k^T\mbox{\boldmath$\nu$}_l]=(P_k+R_k)\delta_{kl},   (31)

where $P_k$ is the covariance matrix of the error in the estimate $\mbox{\boldmath$\hat{s}$}(k|k-1)$.

We remark that this result plays an essential role in this paper.

III-A Covariance Matrix Associated with the Input to the Main Decoder

In this section, by assuming that $\mbox{\boldmath$r$}_k$ is the innovation, we calculate the covariance matrix of $\mbox{\boldmath$r$}_k$. Then the MMSE in estimating the input $\mbox{\boldmath$x$}_k$ is obtained from the associated covariance matrix. As preparation, we present a lemma.

Lemma 3 ([26])

Suppose that $0\leq\epsilon\leq 1/2$. The following quantities have the same value:

P(v_k^{(1)}=0,v_k^{(2)}=0)-P(v_k^{(1)}=0)P(v_k^{(2)}=0) = \alpha_{00}-(1-\alpha_1)(1-\alpha_2)   (32)
P(v_k^{(1)}=0)P(v_k^{(2)}=1)-P(v_k^{(1)}=0,v_k^{(2)}=1) = (1-\alpha_1)\alpha_2-\alpha_{01}   (33)
P(v_k^{(1)}=1)P(v_k^{(2)}=0)-P(v_k^{(1)}=1,v_k^{(2)}=0) = \alpha_1(1-\alpha_2)-\alpha_{10}   (34)
P(v_k^{(1)}=1,v_k^{(2)}=1)-P(v_k^{(1)}=1)P(v_k^{(2)}=1) = \alpha_{11}-\alpha_1\alpha_2.   (35)

The common value is denoted by $\delta$.

Remark: $\delta=0$ implies that $v_k^{(1)}$ and $v_k^{(2)}$ are mutually independent.

Proof:

Suppose that $\alpha_1$ and $\alpha_2$ are given. From the definition of $\alpha_{ij}$, we obtain a system of linear equations:

\left\{\begin{array}{l}\alpha_{00}+\alpha_{01}=1-\alpha_1\\ \alpha_{10}+\alpha_{11}=\alpha_1\\ \alpha_{00}+\alpha_{10}=1-\alpha_2\\ \alpha_{01}+\alpha_{11}=\alpha_2.\end{array}\right.   (36)

This can be solved as

\left\{\begin{array}{l}\alpha_{00}=1-\alpha_1-\alpha_2+u\\ \alpha_{01}=\alpha_2-u\\ \alpha_{10}=\alpha_1-u\\ \alpha_{11}=u,\end{array}\right.   (37)

where $u$ is an arbitrary constant. We remark that the probabilities $(0\leq)~\alpha_{00}\sim\alpha_{11}~(\leq 1)$ are determined by $u$, which in turn restricts the value of $u$. Since $0\leq\alpha_{ij}$, $u$ must satisfy the following:

\left\{\begin{array}{l}u\geq\alpha_1+\alpha_2-1\\ u\leq\alpha_2\\ u\leq\alpha_1\\ u\geq 0.\end{array}\right.   (38)

Note that $0\leq\alpha_i\leq 1/2~(i=1,2)$ for $0\leq\epsilon\leq 1/2$ [25, Lemma 13]. Hence we have $\alpha_1+\alpha_2-1\leq 0$. Accordingly, the value of $u$ is restricted to

0\leq u\leq\min(\alpha_1,\alpha_2).   (39)

It is shown that $\alpha_{ij}\leq 1$ is also satisfied for $0\leq u\leq\min(\alpha_1,\alpha_2)$.

Now we have

\alpha_{00}-(1-\alpha_1)(1-\alpha_2)
=(1-\alpha_1-\alpha_2+u)-(1-\alpha_1-\alpha_2+\alpha_1\alpha_2)   (40)
=u-\alpha_1\alpha_2
=\alpha_{11}-\alpha_1\alpha_2.

We see that the remaining three quantities are also equal to $u-\alpha_1\alpha_2=\alpha_{11}-\alpha_1\alpha_2$. ∎
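As a small sanity check (an addition, not part of the original proof), the parametrization (37) and the common value of (32)–(35) can be verified numerically; the values of $\alpha_1$, $\alpha_2$, $u$ below are assumed example values satisfying (39).

```python
# Numerical check of Lemma 3: all four quantities (32)-(35) equal u - a1*a2.
a1, a2, u = 0.30, 0.40, 0.25          # assumed values with 0 <= u <= min(a1, a2)

# parametrization (37) of the joint distribution of (v1, v2)
a00, a01, a10, a11 = 1 - a1 - a2 + u, a2 - u, a1 - u, u

q = [
    a00 - (1 - a1)*(1 - a2),          # (32)
    (1 - a1)*a2 - a01,                # (33)
    a1*(1 - a2) - a10,                # (34)
    a11 - a1*a2,                      # (35)
]
print(q, "common value delta =", u - a1*a2)
```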

Let us show that $\delta\geq 0$. We need the following.

Lemma 4
P(e_1+e_2+\cdots+e_m=1)\leq\frac{1}{2}   (41)

holds, where errors $e_j~(1\leq j\leq m)$ are statistically independent of each other.

Proof:

See Appendix A. ∎

Now we have the following.

Lemma 5

Suppose that $0\leq\epsilon\leq 1/2$. Then $\delta\geq 0$ holds.

Proof:

See Appendix B. ∎

Proposition 3 ([26])

The covariance matrix associated with $p_r(x,y)$ is given by

\Sigma_r \stackrel{\triangle}{=} \left(\begin{array}{cc}\sigma_{r_1}^2&\sigma_{r_1r_2}\\ \sigma_{r_1r_2}&\sigma_{r_2}^2\end{array}\right)   (44)
         = \left(\begin{array}{cc}1+4c^2\alpha_1(1-\alpha_1)&4c^2\delta\\ 4c^2\delta&1+4c^2\alpha_2(1-\alpha_2)\end{array}\right).   (47)
Proof:

Note the relation

\sigma_{r_1}^2=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x^2\,p_r(x,y)dxdy-m_{r_1}^2,   (48)

where

m_{r_1}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x\,p_r(x,y)dxdy.   (49)

The first term is calculated as

\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x^2\,p_r(x,y)dxdy
=(\alpha_{00}+\alpha_{01}+\alpha_{10}+\alpha_{11})(1+c^2)   (50)
=1+c^2.

Also, we have

m_{r_1} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x\,p_r(x,y)dxdy   (51)
        = (\alpha_{00}+\alpha_{01})c+(\alpha_{10}+\alpha_{11})(-c)
        = (1-\alpha_1)c-\alpha_1 c
        = (1-2\alpha_1)c.

Then it follows that

\sigma_{r_1}^2 = (1+c^2)-(1-2\alpha_1)^2c^2   (52)
              = 1+4c^2\alpha_1(1-\alpha_1).

Similarly, we have

\sigma_{r_2}^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}y^2\,p_r(x,y)dxdy-m_{r_2}^2   (53)
              = 1+4c^2\alpha_2(1-\alpha_2).

Let us calculate $\sigma_{r_1r_2}$. Note this time the relation

\sigma_{r_1r_2}=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}xy\,p_r(x,y)dxdy-m_{r_1}m_{r_2}.   (54)

Applying Lemma 3, we have

\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}xy\,p_r(x,y)dxdy
=(\alpha_{00}+\alpha_{11})c^2-(\alpha_{01}+\alpha_{10})c^2   (55)
=(1-\alpha_1-\alpha_2+2u)c^2-(\alpha_1+\alpha_2-2u)c^2
=(1-2\alpha_1-2\alpha_2+4u)c^2.

Since

\left\{\begin{array}{l}m_{r_1}=(1-2\alpha_1)c\\ m_{r_2}=(1-2\alpha_2)c,\end{array}\right.   (56)

it follows that

\sigma_{r_1r_2} = (1-2\alpha_1-2\alpha_2+4u)c^2-(1-2\alpha_1)(1-2\alpha_2)c^2   (57)
                = 4(u-\alpha_1\alpha_2)c^2
                = 4\delta c^2.
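As an illustrative check (an addition, not part of the paper), Proposition 3 can be verified by simulation: draw $(v_k^{(1)},v_k^{(2)})$ from an assumed joint distribution $\{\alpha_{ij}\}$ via the parametrization (37), generate $\mbox{\boldmath$r$}_k$ according to the mixture (22), and compare the sample covariance with (47). The sketch below assumes NumPy and example values of $c$, $\alpha_1$, $\alpha_2$, $u$.

```python
import numpy as np

rng = np.random.default_rng(2)
c = 1.2                                   # assumed value of sqrt(2Es/N0)
a1, a2, u = 0.30, 0.35, 0.20              # assumed alpha_1, alpha_2, u (Lemma 3)
delta = u - a1*a2

# joint distribution of (v1, v2) via the parametrization (37)
probs = np.array([1 - a1 - a2 + u,        # (v1, v2) = (0, 0)
                  a2 - u,                 # (0, 1)
                  a1 - u,                 # (1, 0)
                  u])                     # (1, 1)
n = 500_000
idx = rng.choice(4, size=n, p=probs)
v = np.column_stack(((idx >> 1) & 1, idx & 1))   # map index -> (v1, v2)

# r distributed as the mixture (22): mean c(1-2v), unit-variance Gaussian noise
r = c*(1 - 2*v) + rng.standard_normal((n, 2))

Sigma_emp = np.cov(r, rowvar=False)
Sigma_th = np.array([[1 + 4*c**2*a1*(1 - a1), 4*c**2*delta],
                     [4*c**2*delta, 1 + 4*c**2*a2*(1 - a2)]])
print("empirical covariance:\n", Sigma_emp.round(3))
print("Proposition 3:\n", Sigma_th.round(3))
```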

III-B MMSE in Estimating the Input

Note Proposition 2. If $\mbox{\boldmath$r$}_k=(r_k^{(1)},r_k^{(2)})$ corresponds to the innovation, then the associated covariance matrix (denoted $\Sigma_r$) is expressed as the sum of two matrices $\Sigma_x$ and $\Sigma_w$, i.e., $\Sigma_r=\Sigma_x+\Sigma_w$. Here, $\Sigma_x$ is the covariance matrix of the estimation error of the signal, whereas $\Sigma_w$ is the covariance matrix of the observation noise. From the definition of $w_j$ (see Section I), it follows that

\Sigma_w=\left(\begin{array}{cc}1&0\\ 0&1\end{array}\right)   (58)

and hence we have

\Sigma_x=\left(\begin{array}{cc}4c^2\alpha_1(1-\alpha_1)&4c^2\delta\\ 4c^2\delta&4c^2\alpha_2(1-\alpha_2)\end{array}\right).   (59)

Moreover, since the signal is expressed as $c\mbox{\boldmath$x$}_k$, the covariance matrix (denoted $\hat{\Sigma}_x$) of the estimation error of $\mbox{\boldmath$x$}_k$ becomes

\hat{\Sigma}_x=\left(\begin{array}{cc}4\alpha_1(1-\alpha_1)&4\delta\\ 4\delta&4\alpha_2(1-\alpha_2)\end{array}\right).   (60)

Hence the corresponding MMSE is given by

\mbox{mmse}=\mbox{tr}\,\hat{\Sigma}_x=4\alpha_1(1-\alpha_1)+4\alpha_2(1-\alpha_2),   (61)

where $\mbox{tr}\,\hat{\Sigma}_x$ is the trace of the matrix $\hat{\Sigma}_x$.

Remark: The estimate in [11] is the “linear” least-squares estimate. On the other hand, the best estimate is given by the conditional expectation $E[\mbox{\boldmath$x$}_k|\mbox{\boldmath$z$}_l,~1\leq l\leq k-1]$, which is in general a highly “nonlinear” functional of $\{\mbox{\boldmath$z$}_l,~1\leq l\leq k-1\}$ (cf. $\mbox{\boldmath$x$}_k$ is not Gaussian). Note that if $\{\mbox{\boldmath$x$}_k,\mbox{\boldmath$z$}_k\}$ are jointly Gaussian, then $E[\mbox{\boldmath$x$}_k|\mbox{\boldmath$z$}_l,~1\leq l\leq k-1]$ is a linear functional of $\{\mbox{\boldmath$z$}_l,~1\leq l\leq k-1\}$ [16, Section II-D]. Hence, to be exact, we have

\mbox{mmse}\leq 4\alpha_1(1-\alpha_1)+4\alpha_2(1-\alpha_2),   (62)

where the left-hand side corresponds to the conditional expectation, whereas the right-hand side corresponds to the linear least-squares estimate. As stated above, the least-squares estimate is a nonlinear functional of the past observations. Hence we regard the right-hand side as an approximation to the MMSE.

III-C Modification of the Derived MMSE

Recall that $\alpha_1=P(v_k^{(1)}=1)$ and $\alpha_2=P(v_k^{(2)}=1)$. That is, the MMSE in the estimation of $\mbox{\boldmath$x$}_k$ is expressed in terms of the distribution of $\mbox{\boldmath$v$}_k=(v_k^{(1)},v_k^{(2)})$ for the main decoder in an SST Viterbi decoder. We remark that $\alpha_1$ and $\alpha_2$ are not mutually independent. As a result, $4\alpha_1(1-\alpha_1)$ and $4\alpha_2(1-\alpha_2)$ are correlated. Hence the simple sum of $4\alpha_1(1-\alpha_1)$ and $4\alpha_2(1-\alpha_2)$ is not appropriate for the MMSE, and some modification taking the degree of correlation into account is necessary. The quantity $\delta$ (see Lemma 3) has a close connection with the independence of $v_k^{(1)}$ and $v_k^{(2)}$: when they are weakly correlated, $\delta$ is small, whereas when they are strongly correlated, $\delta$ is large. This observation suggests that a modification

4\alpha_1(1-\alpha_1)+4\alpha_2(1-\alpha_2)-\lambda\delta   (63)

is more appropriate for the MMSE, where $\lambda$ is some constant. In the following, we discuss how to determine the value of $\lambda$. We have the following.

Proposition 4

Suppose that $0\leq\epsilon\leq 1/2$. We have

(4\alpha_1(1-\alpha_1))(4\alpha_2(1-\alpha_2))\geq(4\delta)^2,   (64)

where $\delta=\alpha_{11}-\alpha_1\alpha_2$.

Proof:

Note the relation $\Sigma_x=\Sigma_r-\Sigma_w$. By Proposition 3 and Lemma 3, $\Sigma_x=4c^2\Sigma_v$, where $\Sigma_v$ denotes the covariance matrix of $(v_k^{(1)},v_k^{(2)})$ (note that $\mbox{var}(v_k^{(l)})=\alpha_l(1-\alpha_l)$ and $\mbox{cov}(v_k^{(1)},v_k^{(2)})=\alpha_{11}-\alpha_1\alpha_2=\delta$). Hence $\Sigma_x$ is positive semi-definite and $\mbox{det}(\Sigma_x)\geq 0$ [23], where “$\mbox{det}(\cdot)$” denotes the determinant. Since $\mbox{det}(\Sigma_x)=c^4\mbox{det}(\hat{\Sigma}_x)$, we have $\mbox{det}(\hat{\Sigma}_x)\geq 0$. ∎

From Proposition 4, we have

(4\alpha_1(1-\alpha_1))(4\alpha_2(1-\alpha_2))\geq(4\delta)^2.

So far we have carried out numerical calculations for four QLI codes $C_1\sim C_4$ (regarded as general codes) and one general code $C_5$ (these codes are defined in Sections IV and V). Then we have found that the difference between the values of $4\alpha_1(1-\alpha_1)$ and $4\alpha_2(1-\alpha_2)$ is small. Note that this fact is derived from the structure of $G^{-1}G$. Let

G^{-1}G=\left(\begin{array}{cc}b_{11}&b_{12}\\ b_{21}&b_{22}\end{array}\right).   (65)

Also, denote by $m_{col}^{(i)}~(i=1,2)$ the number of terms (i.e., $D^j$) contained in $\left(\begin{array}{c}b_{1i}\\ b_{2i}\end{array}\right)$. We see that $\alpha_i~(i=1,2)$ is determined by $m_{col}^{(i)}$. Hence if the difference between $m_{col}^{(1)}$ and $m_{col}^{(2)}$ is small, then $\alpha_1\approx\alpha_2$ holds. For example, it is shown (see Section IV-B) that

  • 1)

    $m_{col}^{(1)}=5$ and $m_{col}^{(2)}=6$ for $C_1$,

  • 2)

    $m_{col}^{(1)}=10$ and $m_{col}^{(2)}=10$ for $C_2$.

Thus we have $4\alpha_1(1-\alpha_1)\approx 4\alpha_2(1-\alpha_2)$. As a consequence, we have approximately

\left\{\begin{array}{l}4\alpha_1(1-\alpha_1)\geq 4\delta\\ 4\alpha_2(1-\alpha_2)\geq 4\delta.\end{array}\right.   (66)

That is,

4\alpha_1(1-\alpha_1)+4\alpha_2(1-\alpha_2)-8\delta\geq 0

holds with high probability.

Now we can show that this inequality always holds. That is, we have the following.

Proposition 5

Suppose that $0\leq\epsilon\leq 1/2$. We have

4\alpha_1(1-\alpha_1)+4\alpha_2(1-\alpha_2)-8\delta\geq 0,   (67)

where $\delta=\alpha_{11}-\alpha_1\alpha_2$.

Proof:

See Appendix C. ∎

Remark: Let

\hat{\Sigma}_x = \left(\begin{array}{cc}4\alpha_1(1-\alpha_1)&4\delta\\ 4\delta&4\alpha_2(1-\alpha_2)\end{array}\right)   (70)
               \stackrel{\triangle}{=} \left(\begin{array}{cc}\sigma_1^2&\sigma_{12}\\ \sigma_{12}&\sigma_2^2\end{array}\right).   (73)

Then $4\alpha_1(1-\alpha_1)+4\alpha_2(1-\alpha_2)-8\delta$ corresponds to $\sigma_1^2+\sigma_2^2-2\sigma_{12}$. Denote by $\mu$ the correlation coefficient between $x_k^{(1)}$ and $x_k^{(2)}$, i.e.,

\mu\stackrel{\triangle}{=}\frac{\sigma_{12}}{\sigma_1\sigma_2}.   (74)

Note that $-1\leq\mu\leq 1$. We have

\sigma_1^2+\sigma_2^2-2\sigma_{12}=\sigma_1^2+\sigma_2^2-2\mu\sigma_1\sigma_2.   (75)

Now we restrict $\mu$ to $0\leq\mu\leq 1$. As special cases, the following hold:

  • 1)

    $\mu=0$: $\sigma_1^2+\sigma_2^2-2\mu\sigma_1\sigma_2=\sigma_1^2+\sigma_2^2$.

  • 2)

    $\mu=1$: $\sigma_1^2+\sigma_2^2-2\mu\sigma_1\sigma_2=(\sigma_1-\sigma_2)^2$.

Hence $-8\delta=-2\mu\sigma_1\sigma_2$ represents the correction term depending on the degree of correlation.

Based on Proposition 5, we finally set

\mbox{mmse} = 4\alpha_1(1-\alpha_1)+4\alpha_2(1-\alpha_2)-8\delta   (76)
            \stackrel{\triangle}{=} \xi(\alpha_1,\alpha_2,\delta)   (77)

as the MMSE in the estimation of $\mbox{\boldmath$x$}_k$.
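For reference, a minimal sketch (an addition) of the modified MMSE (76)–(77) as a function; the example values of $\alpha_1$, $\alpha_2$, $\delta$ are taken from the $E_b/N_0=0$ dB row of Table I in Section IV-B.

```python
def xi(alpha1, alpha2, delta):
    """Modified MMSE of Eqs. (76)-(77): xi(alpha1, alpha2, delta)."""
    return 4*alpha1*(1 - alpha1) + 4*alpha2*(1 - alpha2) - 8*delta

# values from the Eb/N0 = 0 dB row of Table I (code C1 regarded as a general code)
half_mmse = 0.5*xi(0.4259, 0.4494, 0.0758)
print(f"(1/2) mmse = {half_mmse:.4f}")   # about 0.6807, as listed in Table I
```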

III-D In the Case of QLI Codes

Consider a QLI code whose generator matrix is given by

G(D)=(g_1(D),g_1(D)+D^L)~(1\leq L\leq\nu-1).

Let $\mbox{\boldmath$\eta$}_{k-L}=(\eta_{k-L}^{(1)},\eta_{k-L}^{(2)})$ be the input to the main decoder in an SST Viterbi decoder. We see that the results in the previous sections hold for $\mbox{\boldmath$\eta$}_{k-L}$ as well. Therefore, we state only the results.

Proposition 6 ([26])

The joint distribution of $\eta_{k-L}^{(1)}$ and $\eta_{k-L}^{(2)}$ (denoted $p_\eta(x,y)$) is given by

p_\eta(x,y) = \beta_{00}q(x-c)q(y-c)+\beta_{01}q(x-c)q(y+c)   (78)
            +\beta_{10}q(x+c)q(y-c)+\beta_{11}q(x+c)q(y+c),

where $\beta_{ij}=P(v_k^{(1)}=i,v_k^{(2)}=j)$.

Lemma 6 ([26])

Assume that $0\leq\epsilon\leq 1/2$. The following quantities have the same value:

P(v_k^{(1)}=0,v_k^{(2)}=0)-P(v_k^{(1)}=0)P(v_k^{(2)}=0) = \beta_{00}-(1-\beta_1)(1-\beta_2)   (79)
P(v_k^{(1)}=0)P(v_k^{(2)}=1)-P(v_k^{(1)}=0,v_k^{(2)}=1) = (1-\beta_1)\beta_2-\beta_{01}   (80)
P(v_k^{(1)}=1)P(v_k^{(2)}=0)-P(v_k^{(1)}=1,v_k^{(2)}=0) = \beta_1(1-\beta_2)-\beta_{10}   (81)
P(v_k^{(1)}=1,v_k^{(2)}=1)-P(v_k^{(1)}=1)P(v_k^{(2)}=1) = \beta_{11}-\beta_1\beta_2.   (82)

The common value is denoted by $\delta^{\prime}$.

Proposition 7 ([26])

The covariance matrix associated with $p_\eta(x,y)$ is given by

\Sigma_\eta \stackrel{\triangle}{=} \left(\begin{array}{cc}\sigma_{\eta_1}^2&\sigma_{\eta_1\eta_2}\\ \sigma_{\eta_1\eta_2}&\sigma_{\eta_2}^2\end{array}\right)   (85)
            = \left(\begin{array}{cc}1+4c^2\beta_1(1-\beta_1)&4c^2\delta^{\prime}\\ 4c^2\delta^{\prime}&1+4c^2\beta_2(1-\beta_2)\end{array}\right).   (88)
Proposition 8

Suppose that $0\leq\epsilon\leq 1/2$. We have

(4\beta_1(1-\beta_1))(4\beta_2(1-\beta_2))\geq(4\delta^{\prime})^2,   (89)

where $\delta^{\prime}=\beta_{11}-\beta_1\beta_2$.

Proposition 9

Suppose that $0\leq\epsilon\leq 1/2$. We have

4\beta_1(1-\beta_1)+4\beta_2(1-\beta_2)-8\delta^{\prime}\geq 0,   (90)

where $\delta^{\prime}=\beta_{11}-\beta_1\beta_2$.

Finally, for QLI codes we set

\mbox{mmse} = 4\beta_1(1-\beta_1)+4\beta_2(1-\beta_2)-8\delta^{\prime}   (91)
            \stackrel{\triangle}{=} \xi(\beta_1,\beta_2,\delta^{\prime})   (92)

as the MMSE in the estimation of $\mbox{\boldmath$x$}_{k-L}$.

IV Mutual Information and MMSEs

We have shown that the MMSE in the estimation of $\mbox{\boldmath$x$}_k$ is expressed in terms of the distribution of $\mbox{\boldmath$v$}_k=(v_k^{(1)},v_k^{(2)})$ for the main decoder in an SST Viterbi decoder. More precisely, we have derived the following:

  • 1)

    As a general code: $\mbox{mmse}=\xi(\alpha_1,\alpha_2,\delta)$.

  • 2)

    As a QLI code: $\mbox{mmse}=\xi(\beta_1,\beta_2,\delta^{\prime})$.

In this section, we discuss the validity of the derived MMSE from the viewpoint of mutual information and mean-square error.

In this paper, we have used the signal model (see Section I):

\mbox{\boldmath$z$}_k=c\mbox{\boldmath$x$}_k+\mbox{\boldmath$w$}_k.

Since $n_0=2$, it follows that

c=\sqrt{2E_s/N_0}=\sqrt{E_b/N_0}.   (93)

Then letting

\rho=c^2=E_b/N_0,   (94)

the above equation is rewritten as

\mbox{\boldmath$z$}_k=\sqrt{\rho}\,\mbox{\boldmath$x$}_k+\mbox{\boldmath$w$}_k.   (95)

Note that this is just the signal model used in [6] (i.e., $\rho=\mbox{snr}$). Guo et al. [6] discussed the relation between the input-output mutual information and the MMSE in estimating the input in Gaussian channels. Their relation holds for discrete-time and continuous-time noncausal (smoothing) MMSE estimation regardless of the input statistics (i.e., the input need not be Gaussian). Recall [25] that the innovations approach to Viterbi decoding of QLI codes has a close connection with smoothing in linear estimation theory. Hence we thought that the results in [6] could be used to discuss the validity of the MMSE obtained in the previous section, and the argument in this section is based on that of Guo et al. [6].

IV-A General Codes

Let $\mbox{\boldmath$z$}^n\stackrel{\triangle}{=}\{\mbox{\boldmath$z$}_1,\mbox{\boldmath$z$}_2,\cdots,\mbox{\boldmath$z$}_n\}$. Also, let $E[\mbox{\boldmath$x$}|\mbox{\boldmath$z$}^n]$ be the conditional expectation of $\mbox{\boldmath$x$}$ given $\mbox{\boldmath$z$}^n$. According to Guo et al. [6, Section IV-A], let us define as follows:

\mbox{pmmse}(i,\rho)\stackrel{\triangle}{=}E[(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}])(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}])^T],   (96)

where $\mbox{pmmse}(i,\rho)$ represents the one-step prediction MMSE. Note that this is a function of $\rho~(=c^2)$. This is because $E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}]$ depends on $\mbox{\boldmath$z$}^{i-1}$ (cf. $\mbox{\boldmath$z$}^{i-1}$ is a function of $\rho$).

Remark 1: The above definition is slightly different from that in [6], where $\mbox{pmmse}(i,\rho)$ is defined as

\mbox{pmmse}(i,\rho)\stackrel{\triangle}{=}E[|x_i-E[x_i|z^{i-1}]|^2],   (97)

where $z^{i-1}\stackrel{\triangle}{=}\{z_1,\cdots,z_{i-1}\}$. In order to distinguish it from ours, theirs is denoted by $\mbox{pmmse}_i(\rho)$ in this section. Set $n=2k$. We have

\sum_{i=1}^n\mbox{pmmse}_i(\rho)
=\sum_{i=1}^k(\mbox{pmmse}_{2i-1}(\rho)+\mbox{pmmse}_{2i}(\rho)).   (98)

Here note the following:

\mbox{pmmse}_{2i-1}(\rho) = E[|x_{2i-1}-E[x_{2i-1}|z^{2i-2}]|^2]   (99)
\mbox{pmmse}_{2i}(\rho) = E[|x_{2i}-E[x_{2i}|z^{2i-1}]|^2]   (100)
                        \leq E[|x_{2i}-E[x_{2i}|z^{2i-2}]|^2].

Hence

\mbox{pmmse}_{2i-1}(\rho)+\mbox{pmmse}_{2i}(\rho) \leq E[|x_{2i-1}-E[x_{2i-1}|z^{2i-2}]|^2]+E[|x_{2i}-E[x_{2i}|z^{2i-2}]|^2]   (101)
                                                  = E[(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}])(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}])^T]

holds. Thus we have

\sum_{i=1}^n\mbox{pmmse}_i(\rho)
=\sum_{i=1}^k(\mbox{pmmse}_{2i-1}(\rho)+\mbox{pmmse}_{2i}(\rho))   (102)
\leq\sum_{i=1}^k E[(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}])(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}])^T]
=\sum_{i=1}^k\mbox{pmmse}(i,\rho).

It is confirmed from Remark 1 that the relation in [6, Theorem 9] still holds. In fact, we have

I[\mbox{\boldmath$x$}^k;\mbox{\boldmath$z$}^k] \leq \frac{\rho}{2}\sum_{i=1}^{2k}\mbox{pmmse}_i(\rho)   (103)
                                               \leq \frac{\rho}{2}\sum_{i=1}^k\mbox{pmmse}(i,\rho).

Then it follows that

\frac{1}{\rho}\left(\frac{I[\mbox{\boldmath$x$}^k;\mbox{\boldmath$z$}^k]}{k}\right)\leq\frac{1}{2}\left(\frac{1}{k}\sum_{i=1}^k\mbox{pmmse}(i,\rho)\right).   (104)

Our argument is based on the innovations approach proposed by Kailath [11] (cf. [25]). Also, our model is a discrete-time one. Hence it is reasonable to think that the MMSE associated with the estimation of $\mbox{\boldmath$x$}_i$ corresponds to $\mbox{pmmse}(i,\rho)$ (see Proposition 2). That is, it is appropriate to set

\mbox{pmmse}(i,\rho)=\xi(\alpha_1,\alpha_2,\delta).   (105)

Moreover, we set

\frac{1}{k}\sum_{i=1}^k\mbox{pmmse}(i,\rho)\approx\xi(\alpha_1,\alpha_2,\delta).   (106)

Note that the left-hand side is the average prediction MMSE (see [6, Section IV-A]). We finally have

\frac{1}{\rho}\left(\frac{I[\mbox{\boldmath$x$}^k;\mbox{\boldmath$z$}^k]}{k}\right)\lesssim\frac{1}{2}\xi(\alpha_1,\alpha_2,\delta).   (107)

On the other hand, the mutual information between $\mbox{\boldmath$x$}^k$ and $\mbox{\boldmath$z$}^k$ [2, Section 9.4] is evaluated as

I[\mbox{\boldmath$x$}^k;\mbox{\boldmath$z$}^k]\leq\frac{1}{2}\log(1+\rho)^{2k}=k\log(1+\rho).   (108)

Then we have

\frac{1}{\rho}\left(\frac{I[\mbox{\boldmath$x$}^k;\mbox{\boldmath$z$}^k]}{k}\right)\leq\frac{\log(1+\rho)}{\rho}.   (109)

Remark 2: The essence of our argument lies in the equality

\mbox{pmmse}(i,\rho)=\xi(\alpha_1,\alpha_2,\delta).

Meanwhile, using the signal model, $\mbox{pmmse}(i,\rho)$ is calculated as follows. Since $\mbox{\boldmath$x$}_i$ and $\mbox{\boldmath$z$}^{i-1}$ are mutually independent, we have

\mbox{pmmse}(i,\rho) = E[(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}])(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i|\mbox{\boldmath$z$}^{i-1}])^T]   (110)
                     = E[(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i])(\mbox{\boldmath$x$}_i-E[\mbox{\boldmath$x$}_i])^T]
                     = \mbox{var}(x_i^{(1)})+\mbox{var}(x_i^{(2)})
                     = 1+1=2,

where “var” denotes the variance. Hence

\frac{\rho}{2}\sum_{i=1}^k\mbox{pmmse}(i,\rho)=k\rho.   (111)

On the other hand, using the inequality $\log x\leq x-1~(x>0)$, we have

\frac{1}{\rho}\left(\frac{I[\mbox{\boldmath$x$}^k;\mbox{\boldmath$z$}^k]}{k}\right)\leq\frac{\log(1+\rho)}{\rho}\leq\frac{\rho}{\rho}=1.   (112)

Hence

I[\mbox{\boldmath$x$}^k;\mbox{\boldmath$z$}^k]\leq k\rho=\frac{\rho}{2}\sum_{i=1}^k\mbox{pmmse}(i,\rho)

actually holds.

From the above argument, it is expected that $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\xi(\alpha_1,\alpha_2,\delta)$ are closely related. In the next subsection, we discuss the relation between them.

IV-B Numerical Results as General Codes

In order to examine the relation between $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\xi(\alpha_1,\alpha_2,\delta)$, numerical calculations have been carried out. We have taken two QLI codes:

  • 1)

    $C_1~(\nu=2,L=1)$: The generator matrix is defined by $G_1(D)=(1+D+D^2,1+D^2)$.

  • 2)

    $C_2~(\nu=6,L=2)$: The generator matrix is defined by $G_2(D)=(1+D+D^3+D^4+D^6,1+D+D^2+D^3+D^4+D^6)$.

By regarding these QLI codes as “general” codes, we have compared the values of $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\xi(\alpha_1,\alpha_2,\delta)$. The results are shown in Tables I and II, where $\frac{1}{2}\xi(\alpha_1,\alpha_2,\delta)$ is simply denoted $\frac{1}{2}\mbox{mmse}$. The results for $C_1$ and $C_2$ are also shown in Fig. 1 and Fig. 2, respectively.
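As a side note (an addition to the paper), the $\alpha_1$, $\alpha_2$ columns below can be approximated directly from the code structure: by the proof of Lemma 1, $v_k^{(l)}$ is the modulo-2 sum of the channel-error bits selected by the $l$-th column of $G^{-1}G$; assuming those $m_{col}^{(l)}$ error bits are independent with crossover probability $\epsilon=Q(\sqrt{E_b/N_0})$ (which holds for the memoryless channel when the exponents within each column entry are distinct), the standard parity formula gives $\alpha_l=\bigl(1-(1-2\epsilon)^{m_{col}^{(l)}}\bigr)/2$. The sketch below, assuming NumPy/SciPy, evaluates this together with $\frac{\log(1+\rho)}{\rho}$ (natural logarithm) for $C_1$ ($m_{col}^{(1)}=5$, $m_{col}^{(2)}=6$); the output can be compared against the corresponding columns of Table I.

```python
import numpy as np
from scipy.stats import norm

def alpha_from_mcol(m_col, eps):
    """P(parity of m_col independent Bernoulli(eps) error bits = 1)."""
    return 0.5*(1.0 - (1.0 - 2.0*eps)**m_col)

EbN0_dB = np.arange(-10, 11)          # same range as Tables I and II
rho = 10.0**(EbN0_dB/10.0)            # rho = Eb/N0 = c^2, Eq. (94)
eps = norm.sf(np.sqrt(rho))           # channel error probability, Eqs. (5), (93)

m1, m2 = 5, 6                         # m_col^(1), m_col^(2) for C1
a1, a2 = alpha_from_mcol(m1, eps), alpha_from_mcol(m2, eps)

for d, r, x1, x2 in zip(EbN0_dB, np.log(1.0 + rho)/rho, a1, a2):
    print(f"{d:4d} dB  log(1+rho)/rho={r:.4f}  alpha1={x1:.4f}  alpha2={x2:.4f}")
```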

TABLE I: $\frac{\log(1+\rho)}{\rho}$ and minimum mean-square error ($C_1$ as a general code)
Eb/N0 (dB)  log(1+ρ)/ρ  α1      4α1(1−α1)  α2      4α2(1−α2)  δ       (1/2)mmse
-10         0.9531      0.4995  1.0000     0.4999  1.0000     0.0038  0.9848
-9          0.9419      0.4992  1.0000     0.4998  1.0000     0.0053  0.9788
-8          0.9282      0.4986  1.0000     0.4996  1.0000     0.0074  0.9704
-7          0.9118      0.4976  1.0000     0.4992  1.0000     0.0102  0.9592
-6          0.8921      0.4958  0.9999     0.4984  1.0000     0.0142  0.9432
-5          0.8689      0.4930  0.9998     0.4970  1.0000     0.0193  0.9227
-4          0.8418      0.4883  0.9995     0.4945  0.9999     0.0262  0.8949
-3          0.8106      0.4808  0.9985     0.4900  0.9996     0.0352  0.8583
-2          0.7753      0.4691  0.9962     0.4823  0.9987     0.0465  0.8115
-1          0.7360      0.4515  0.9906     0.4696  0.9963     0.0602  0.7527
0           0.6931      0.4259  0.9780     0.4494  0.9898     0.0758  0.6807
1           0.6473      0.3904  0.9520     0.4191  0.9738     0.0917  0.5961
2           0.5992      0.3442  0.9029     0.3766  0.9391     0.1050  0.5010
3           0.5498      0.2879  0.8201     0.3213  0.8723     0.1116  0.3998
4           0.5001      0.2255  0.6986     0.2565  0.7628     0.1076  0.3003
5           0.4510      0.1621  0.5433     0.1876  0.6096     0.0921  0.2081
6           0.4033      0.1049  0.3756     0.1231  0.4318     0.0681  0.1313
7           0.3579      0.0599  0.2252     0.0710  0.2638     0.0427  0.0737
8           0.3153      0.0239  0.1138     0.0349  0.1347     0.0222  0.0355
9           0.2758      0.0120  0.0474     0.0143  0.0564     0.0094  0.0143
10          0.2398      0.0039  0.0155     0.0047  0.0187     0.0031  0.0047
Figure 1: $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\alpha_1,\alpha_2,\delta)$ ($C_1$).
TABLE II: $\frac{\log(1+\rho)}{\rho}$ and minimum mean-square error ($C_{2}$ as a general code)
$E_{b}/N_{0}$ (dB)  $\frac{\log(1+\rho)}{\rho}$  $\alpha_{1}$  $4\alpha_{1}(1-\alpha_{1})$  $\alpha_{2}$  $4\alpha_{2}(1-\alpha_{2})$  $\delta$  $\frac{1}{2}\mbox{mmse}$
-10 0.9531 0.5000 1.0000 0.5000 1.0000 0.0000 1.0000
-9 0.9419 0.5000 1.0000 0.5000 1.0000 0.0000 1.0000
-8 0.9282 0.5000 1.0000 0.5000 1.0000 0.0000 1.0000
-7 0.9118 0.5000 1.0000 0.5000 1.0000 0.0000 1.0000
-6 0.8921 0.5000 1.0000 0.5000 1.0000 0.0001 0.9996
-5 0.8689 0.4999 1.0000 0.4999 1.0000 0.0003 0.9988
-4 0.8418 0.4997 1.0000 0.4997 1.0000 0.0006 0.9976
-3 0.8106 0.4993 1.0000 0.4993 1.0000 0.0013 0.9948
-2 0.7753 0.4981 1.0000 0.4981 1.0000 0.0029 0.9884
-1 0.7360 0.4953 0.9999 0.4953 0.9999 0.0060 0.9759
0 0.6931 0.4890 0.9995 0.4890 0.9995 0.0117 0.9527
1 0.6473 0.4760 0.9977 0.4760 0.9977 0.0215 0.9118
2 0.5992 0.4515 0.9906 0.4515 0.9906 0.0363 0.8452
3 0.5498 0.4100 0.9676 0.4100 0.9676 0.0553 0.7464
4 0.5001 0.3493 0.9091 0.3493 0.9091 0.0731 0.6168
5 0.4510 0.2717 0.7915 0.2717 0.7915 0.0814 0.4659
6 0.4033 0.1878 0.6101 0.1878 0.6101 0.0740 0.3139
7 0.3579 0.1126 0.3998 0.1126 0.3998 0.0538 0.1847
8 0.3153 0.0569 0.2145 0.0569 0.2145 0.0306 0.0920
9 0.2758 0.0237 0.0925 0.0237 0.0925 0.0136 0.0381
10 0.2398 0.0078 0.0308 0.0078 0.0308 0.0046 0.0125
Figure 2: $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ ($C_{2}$).

With respect to the behaviors of $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$, we have the following (a small numerical sketch is given after this list):

(i) $\frac{\log(1+\rho)}{\rho}$:

  • 1) $\epsilon\rightarrow 1/2$: Since $\rho\rightarrow 0$, $\frac{\log(1+\rho)}{\rho}\rightarrow 1$.

  • 2) $\epsilon\rightarrow 0$: Since $\rho\rightarrow\infty$, $\frac{\log(1+\rho)}{\rho}\rightarrow 0$.

Note that $\frac{\log(1+\rho)}{\rho}$ approaches 0 slowly as the SNR increases.

(ii) $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$:

  • 1) $\epsilon\rightarrow 1/2$: Since $\alpha_{i}\rightarrow 1/2~(i=1,2)$ and since $\delta\rightarrow 0$,

    \frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)=\frac{1}{2}(4\alpha_{1}(1-\alpha_{1})+4\alpha_{2}(1-\alpha_{2})-8\delta)\rightarrow 1. (113)

  • 2) $\epsilon\rightarrow 0$: Since $\alpha_{i}\rightarrow 0~(i=1,2)$ and since $\delta\rightarrow 0$,

    \frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)=\frac{1}{2}(4\alpha_{1}(1-\alpha_{1})+4\alpha_{2}(1-\alpha_{2})-8\delta)\rightarrow 0. (114)
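The SNR-dependent reference quantities appearing above are straightforward to tabulate. The following minimal Python sketch (an illustrative convenience script, not part of the original experiments) computes $\rho=E_{b}/N_{0}$, the channel error probability $\epsilon=Q(\sqrt{\rho})$ (cf. Section V), and the reference curves $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{1+\rho}$ used in this section and in Section IV-D; it reproduces the first columns of Tables I and III.

    import numpy as np
    from scipy.stats import norm

    # rho = E_b/N_0 (linear); epsilon = Q(sqrt(rho)) is the crossover probability
    # of the hard-decision channel (rate-1/2 code, n_0 = 2).
    for snr_db in range(-10, 11):
        rho = 10.0 ** (snr_db / 10.0)
        eps = norm.sf(np.sqrt(rho))          # Q-function
        f_general = np.log(1.0 + rho) / rho  # reference curve in Section IV-B
        f_qli = 1.0 / (1.0 + rho)            # reference curve in Section IV-D
        print(f"{snr_db:3d} dB  eps={eps:.4f}  log(1+rho)/rho={f_general:.4f}"
              f"  1/(1+rho)={f_qli:.4f}")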

From Figs. 1 and 2, we observe that the behaviors of $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ for $C_{1}$ and $C_{2}$ are considerably different. Let us examine the cause of the difference.

Let $\mbox{\boldmath$v$}_{k}=(v_{k}^{(1)},v_{k}^{(2)})$ be the encoded block for the main decoder. We have already shown that the degree of correlation between $v_{k}^{(1)}$ and $v_{k}^{(2)}$ is expressed in terms of $\delta$ (see Lemma 3). We also see that the degree of correlation between $v_{k}^{(1)}$ and $v_{k}^{(2)}$ has a close connection with the number of error terms (denoted $m_{v}$) by which $v_{k}^{(1)}$ and $v_{k}^{(2)}$ differ. In fact, when $m_{v}$ is small, $v_{k}^{(1)}$ and $v_{k}^{(2)}$ are strongly correlated, whereas when $m_{v}$ is large, they are weakly correlated. These observations show that $\delta$ is closely related to $m_{v}$. Hence it is expected that the behavior of $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ depends heavily on the value of $m_{v}$ (i.e., on the code). In order to confirm this, let us evaluate the values of $m_{v}$ for $C_{1}$ and $C_{2}$. Note that since the encoded block is expressed as $\mbox{\boldmath$v$}_{k}=(\mbox{\boldmath$e$}_{k}G^{-1})G$, $m_{v}$ is obtained from $G^{-1}G$.

$C_{1}$: The inverse encoder is given by

G_{1}^{-1}=\left(\begin{array}{c}D\\ 1+D\end{array}\right). (115)

Hence from

G_{1}^{-1}G_{1}=\left(\begin{array}{cc}D+\mbox{\boldmath$D$}^{2}+D^{3}&D+D^{3}\\ 1+D^{3}&1+\mbox{\boldmath$D$}+\mbox{\boldmath$D$}^{2}+D^{3}\end{array}\right), (116)

it follows that $m_{v}=3$ (i.e., the bold-faced terms).

$C_{2}$: The inverse encoder is given by

G_{2}^{-1}=\left(\begin{array}{c}D^{3}+D^{4}+D^{5}\\ 1+D+D^{3}+D^{4}+D^{5}\end{array}\right). (117)

Then it is shown that

G_{2}^{-1}G_{2}=\left(\begin{array}{cc}D^{3}+D^{10}+D^{11}&D^{3}+\mbox{\boldmath$D$}^{5}+\mbox{\boldmath$D$}^{6}+\mbox{\boldmath$D$}^{7}+D^{10}+D^{11}\\ 1+\mbox{\boldmath$D$}^{2}+\mbox{\boldmath$D$}^{5}+\mbox{\boldmath$D$}^{6}+\mbox{\boldmath$D$}^{7}+D^{10}+D^{11}&1+\mbox{\boldmath$D$}^{3}+D^{10}+D^{11}\end{array}\right). (118)

We see that $m_{v}=8$.
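The count $m_{v}$ can be checked mechanically. The Python sketch below (an illustrative script, not part of the paper's computations) represents the generator polynomials over GF(2) as integer bit masks and counts the terms in which the two columns of each row of $G^{-1}G$ differ; row $i$ of $G^{-1}G$ has entries $h_{i}g_{1}$ and $h_{i}g_{2}$, so the differing terms are those of $h_{i}(g_{1}+g_{2})$. It reproduces $m_{v}=3$ for $C_{1}$ and $m_{v}=8$ for $C_{2}$.

    # Polynomials over GF(2): bit i of an integer is the coefficient of D^i.
    def pmul(a, b):
        """Carry-less (GF(2)[D]) product of two polynomials given as bit masks."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            b >>= 1
        return r

    def m_v(g, g_inv):
        """Number of error terms by which v^(1) and v^(2) differ."""
        g1, g2 = g
        return sum(bin(pmul(h, g1) ^ pmul(h, g2)).count("1") for h in g_inv)

    # C1: G1 = (1+D+D^2, 1+D^2),  G1^{-1} = (D, 1+D)^T
    print(m_v((0b111, 0b101), (0b10, 0b11)))                  # -> 3
    # C2: G2 = (1+D+D^3+D^4+D^6, 1+D+D^2+D^3+D^4+D^6),
    #     G2^{-1} = (D^3+D^4+D^5, 1+D+D^3+D^4+D^5)^T
    print(m_v((0b1011011, 0b1011111), (0b111000, 0b111011)))  # -> 8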

Now compare the values of $\delta$ in Tables I and II. We observe that they are considerably different. For example, at the SNR of $E_{b}/N_{0}=0\,\mbox{dB}$, we have

\delta~(C_{1})=0.0758>\delta~(C_{2})=0.0117. (119)

We think the difference is due to the fact that

m_{v}(C_{1})=3<m_{v}(C_{2})=8. (120)

$m_{v}(C_{1})=3$ means that there is a certain correlation between $v_{k}^{(1)}$ and $v_{k}^{(2)}$, and accordingly we have $\delta=0.0758$ (i.e., $8\delta=0.6064$). This value is fairly large. On the other hand, $m_{v}(C_{2})=8$ implies that $v_{k}^{(1)}$ and $v_{k}^{(2)}$ are weakly correlated, which results in the small value $\delta=0.0117$.

We have already seen that the behavior of $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ is dependent on $m_{v}$. Also, it has been shown that the values of $m_{v}$ for $C_{1}$ and $C_{2}$ are considerably different. We think these facts explain the behaviors of the curves in Figs. 1 and 2.

As observed above, since the behavior of $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ varies with the code, it is appropriate to take the average over codes. Then, in addition to $C_{1}$ and $C_{2}$, we have taken two more QLI codes $C_{3}$ and $C_{4}$ whose generator matrices are defined by

G_{3}(D)=(1+D+D^{2}+D^{3},1+D+D^{3}) (121)

and

G_{4}(D)=(1+D^{3}+D^{4},1+D+D^{3}+D^{4}), (122)

respectively. It is shown that $C_{3}$ and $C_{4}$ have $m_{v}=4$ and $m_{v}=5$, respectively. Then we have averaged the corresponding $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$'s. The result is shown in Fig. 3, where $\frac{1}{2}\mbox{mmse(av)}$ denotes the average value of $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ over the four codes. We think the relation between $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ is shown more properly in Fig. 3.

Figure 3: $\frac{\log(1+\rho)}{\rho}$ and the average of the $\frac{1}{2}\mbox{mmse}$'s.

In Figs. 1, 2, and 3, we observe that $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)>\frac{\log(1+\rho)}{\rho}$ holds at low-to-medium SNRs. However, the inequality is reversed at high SNRs. We think this comes from the fact that $\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ approaches 0 rapidly as the SNR increases, whereas $\frac{\log(1+\rho)}{\rho}$ approaches 0 only slowly.

IV-C QLI Codes

In [25], we have shown that the innovations approach to Viterbi decoding of QLI codes is related to “smoothing” in the linear estimation theory. As a result, the relationship between the mutual information and the MMSE can be discussed using the result in [6, Corollary 3]. For this purpose, we first provide a proposition.

Proposition 10

Let $\mbox{\boldmath$x$}=(x_{1},\cdots,x_{n})$, where $x_{i}$ is the equiprobable binary input with unit variance. Each $x_{i}$ is assumed to be independent of all the others. Also, let $\mbox{\boldmath$\tilde{x}$}=(\tilde{x}_{1},\cdots,\tilde{x}_{n})$ be a standard Gaussian vector. Moreover, let $\mbox{\boldmath$w$}$ be a standard Gaussian vector which is independent of $\mbox{\boldmath$x$}$ and $\mbox{\boldmath$\tilde{x}$}$. Then we have

\frac{d}{d\rho}I[\mbox{\boldmath$x$};\sqrt{\rho}\,\mbox{\boldmath$x$}+\mbox{\boldmath$w$}]\leq\frac{d}{d\rho}I[\mbox{\boldmath$\tilde{x}$};\sqrt{\rho}\,\mbox{\boldmath$\tilde{x}$}+\mbox{\boldmath$w$}]. (123)

Proof:

We follow Guo et al. [6]. Let $\mbox{\boldmath$z$}=\sqrt{\rho}\,\mbox{\boldmath$x$}+\mbox{\boldmath$w$}$, where $\mbox{\boldmath$z$}=(z_{1},\cdots,z_{n})$. We have

\mbox{mmse}(\rho)=E[(\mbox{\boldmath$x$}-E[\mbox{\boldmath$x$}|\mbox{\boldmath$z$}])(\mbox{\boldmath$x$}-E[\mbox{\boldmath$x$}|\mbox{\boldmath$z$}])^{T}]
=E[|x_{1}-E[x_{1}|\mbox{\boldmath$z$}]|^{2}]+\cdots+E[|x_{n}-E[x_{n}|\mbox{\boldmath$z$}]|^{2}]
\leq E[|x_{1}-E[x_{1}|z_{1}]|^{2}]+\cdots+E[|x_{n}-E[x_{n}|z_{n}]|^{2}]
=\sum_{i=1}^{n}\mbox{mmse}_{i}(\rho), (124)

where $\mbox{mmse}_{i}(\rho)\stackrel{\triangle}{=}E[|x_{i}-E[x_{i}|z_{i}]|^{2}]$. Then it follows from [6, Theorems 1 and 2] that

\frac{d}{d\rho}I[\mbox{\boldmath$x$};\sqrt{\rho}\,\mbox{\boldmath$x$}+\mbox{\boldmath$w$}]=\frac{1}{2}\mbox{mmse}(\rho)
\leq\sum_{i=1}^{n}\frac{1}{2}\mbox{mmse}_{i}(\rho)
=\sum_{i=1}^{n}\frac{d}{d\rho}I[x_{i};\sqrt{\rho}\,x_{i}+w_{i}]. (125)

(Remark 1: This relation holds for any random vector $\mbox{\boldmath$x$}$ satisfying $E[\mbox{\boldmath$x$}\mbox{\boldmath$x$}^{T}]<\infty$ (cf. [6]).)

On the other hand, since $\tilde{x}_{i}$ is Gaussian (see [2, Section 9.4]), we have

\frac{d}{d\rho}I[\mbox{\boldmath$\tilde{x}$};\sqrt{\rho}\,\mbox{\boldmath$\tilde{x}$}+\mbox{\boldmath$w$}]=\sum_{i=1}^{n}\frac{d}{d\rho}I[\tilde{x}_{i};\sqrt{\rho}\,\tilde{x}_{i}+w_{i}]. (126)

Note that in the scalar case,

\frac{d}{d\rho}I[x_{i};\sqrt{\rho}\,x_{i}+w_{i}]\leq\frac{d}{d\rho}I[\tilde{x}_{i};\sqrt{\rho}\,\tilde{x}_{i}+w_{i}] (127)

holds (see Appendix D and [6, Fig. 1]). Hence we have

\frac{d}{d\rho}I[\mbox{\boldmath$x$};\sqrt{\rho}\,\mbox{\boldmath$x$}+\mbox{\boldmath$w$}]\leq\sum_{i=1}^{n}\frac{d}{d\rho}I[x_{i};\sqrt{\rho}\,x_{i}+w_{i}]
\leq\sum_{i=1}^{n}\frac{d}{d\rho}I[\tilde{x}_{i};\sqrt{\rho}\,\tilde{x}_{i}+w_{i}]
=\frac{d}{d\rho}I[\mbox{\boldmath$\tilde{x}$};\sqrt{\rho}\,\mbox{\boldmath$\tilde{x}$}+\mbox{\boldmath$w$}]. ∎

Now the following [6, Corollary 3] has been shown:

\frac{d}{d\rho}I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]=\frac{1}{2}\sum_{i=1}^{k}\mbox{mmse}(i,\rho), (128)

where

\mbox{mmse}(i,\rho)\stackrel{\triangle}{=}E[(\mbox{\boldmath$x$}_{i}-E[\mbox{\boldmath$x$}_{i}|\mbox{\boldmath$z$}^{k}])(\mbox{\boldmath$x$}_{i}-E[\mbox{\boldmath$x$}_{i}|\mbox{\boldmath$z$}^{k}])^{T}]~(1\leq i\leq k). (129)

Remark 2: The definition of $\mbox{mmse}(i,\rho)$ is slightly different from that in [6]. However, since the whole observation $\mbox{\boldmath$z$}^{k}$ is used in each conditional expectation, the above equality also holds under our definition.

First consider the left-hand side, i.e., $\frac{d}{d\rho}I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]$. Let $\mbox{\boldmath$\tilde{x}$}^{k}$ be Gaussian. It follows from Proposition 10 that

\frac{d}{d\rho}I[\mbox{\boldmath$\tilde{x}$}^{k};\mbox{\boldmath$z$}^{k}]\geq\frac{d}{d\rho}I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]. (130)

Since $\mbox{\boldmath$\tilde{x}$}^{k}$ is Gaussian, we have

I[\mbox{\boldmath$\tilde{x}$}^{k};\mbox{\boldmath$z$}^{k}]=\frac{1}{2}\log(1+\rho)^{2k}=k\log(1+\rho) (131)

and hence

\frac{d}{d\rho}I[\mbox{\boldmath$\tilde{x}$}^{k};\mbox{\boldmath$z$}^{k}]=\frac{k}{1+\rho} (132)

holds. Therefore,

\frac{1}{1+\rho}\geq\frac{d}{d\rho}\left(\frac{I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]}{k}\right)=\frac{1}{2}\left(\frac{1}{k}\sum_{i=1}^{k}\mbox{mmse}(i,\rho)\right) (133)

is obtained.

Next, consider the right-hand side, i.e., $\frac{1}{2}\sum_{i=1}^{k}\mbox{mmse}(i,\rho)$. For $1+L\leq k$, we have

\sum_{i=1}^{k}\mbox{mmse}(i,\rho)=E[(\mbox{\boldmath$x$}_{1}-E[\mbox{\boldmath$x$}_{1}|\mbox{\boldmath$z$}^{k}])(\mbox{\boldmath$x$}_{1}-E[\mbox{\boldmath$x$}_{1}|\mbox{\boldmath$z$}^{k}])^{T}]+\cdots
+E[(\mbox{\boldmath$x$}_{k-L}-E[\mbox{\boldmath$x$}_{k-L}|\mbox{\boldmath$z$}^{k}])(\mbox{\boldmath$x$}_{k-L}-E[\mbox{\boldmath$x$}_{k-L}|\mbox{\boldmath$z$}^{k}])^{T}]+\cdots
+E[(\mbox{\boldmath$x$}_{k}-E[\mbox{\boldmath$x$}_{k}|\mbox{\boldmath$z$}^{k}])(\mbox{\boldmath$x$}_{k}-E[\mbox{\boldmath$x$}_{k}|\mbox{\boldmath$z$}^{k}])^{T}]. (134)

Our concern is the evaluation of

\mbox{mmse}(k-L,\rho)=E[(\mbox{\boldmath$x$}_{k-L}-E[\mbox{\boldmath$x$}_{k-L}|\mbox{\boldmath$z$}^{k}])(\mbox{\boldmath$x$}_{k-L}-E[\mbox{\boldmath$x$}_{k-L}|\mbox{\boldmath$z$}^{k}])^{T}]. (135)

For this purpose, we assume the approximation

\mbox{mmse}(k-L,\rho)\approx\frac{1}{k}\sum_{i=1}^{k}\mbox{mmse}(i,\rho). (136)

Hence, in the relation

\frac{1}{1+\rho}\geq\frac{d}{d\rho}\left(\frac{I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]}{k}\right)=\frac{1}{2}\left(\frac{1}{k}\sum_{i=1}^{k}\mbox{mmse}(i,\rho)\right)\approx\frac{1}{2}\mbox{mmse}(k-L,\rho), (137)

we replace $\mbox{mmse}(k-L,\rho)$ by $\xi(\beta_{1},\beta_{2},\delta^{\prime})$. Then we have

\frac{1}{1+\rho}\geq\frac{d}{d\rho}\left(\frac{I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]}{k}\right)\approx\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime}). (138)

Note that the above approximation is equivalent to setting

\frac{1}{k}\sum_{i=1}^{k}\mbox{mmse}(i,\rho)\approx\xi(\beta_{1},\beta_{2},\delta^{\prime}), (139)

where the left-hand side represents the average noncausal MMSE (see [6, Section IV-A]).

IV-D Numerical Results as QLI Codes

In order to confirm the validity of the derived relation (138), numerical calculations have been carried out. We have used the same QLI codes $C_{1}$ and $C_{2}$ as in the previous section. In this case, we regard these codes as inherent QLI codes and compare the values of $\frac{1}{1+\rho}$ and $\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$. The results are shown in Tables III and IV, where $\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$ is simply denoted by $\frac{1}{2}\mbox{mmse}$. The results for $C_{1}$ and $C_{2}$ are also shown in Fig. 4 and Fig. 5, respectively.

TABLE III: $\frac{1}{1+\rho}$ and minimum mean-square error ($C_{1}$ as a QLI code)
$E_{b}/N_{0}$ (dB)  $\frac{1}{1+\rho}$  $\beta_{1}$  $4\beta_{1}(1-\beta_{1})$  $\beta_{2}$  $4\beta_{2}(1-\beta_{2})$  $\delta^{\prime}$  $\frac{1}{2}\mbox{mmse}$
-10 0.9091 0.4999 1.0000 0.4981 1.0000 0.0154 0.9384
-9 0.8882 0.4998 1.0000 0.4970 1.0000 0.0192 0.9232
-8 0.8632 0.4996 1.0000 0.4954 0.9999 0.0239 0.9044
-7 0.8337 0.4992 1.0000 0.4929 0.9998 0.0297 0.8811
-6 0.7992 0.4984 1.0000 0.4892 0.9995 0.0368 0.8526
-5 0.7597 0.4970 1.0000 0.4835 0.9989 0.0453 0.8183
-4 0.7153 0.4945 0.9999 0.4752 0.9975 0.0555 0.7767
-3 0.6661 0.4900 0.9996 0.4632 0.9946 0.0674 0.7275
-2 0.6131 0.4823 0.9987 0.4461 0.9884 0.0811 0.6692
-1 0.5573 0.4696 0.9963 0.4226 0.9760 0.0959 0.6026
0 0.5000 0.4494 0.9898 0.3914 0.9528 0.1110 0.5273
1 0.4427 0.4191 0.9738 0.3515 0.9118 0.1242 0.4460
2 0.3869 0.3766 0.9391 0.3033 0.8452 0.1326 0.3618
3 0.3339 0.3213 0.8723 0.2482 0.7464 0.1325 0.2794
4 0.2847 0.2565 0.7628 0.1905 0.6168 0.1213 0.2046
5 0.2403 0.1876 0.6096 0.1346 0.4659 0.0995 0.1398
6 0.2008 0.1231 0.4318 0.0858 0.3138 0.0714 0.0872
7 0.1663 0.0710 0.2638 0.0485 0.1846 0.0439 0.0486
8 0.1368 0.0349 0.1347 0.0236 0.0922 0.0225 0.0235
9 0.1118 0.0143 0.0564 0.0096 0.0380 0.0095 0.0092
10 0.0909 0.0047 0.0187 0.0031 0.0124 0.0031 0.0032
Figure 4: $\frac{1}{1+\rho}$ and $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$ ($C_{1}$).
TABLE IV: $\frac{1}{1+\rho}$ and minimum mean-square error ($C_{2}$ as a QLI code)
$E_{b}/N_{0}$ (dB)  $\frac{1}{1+\rho}$  $\beta_{1}$  $4\beta_{1}(1-\beta_{1})$  $\beta_{2}$  $4\beta_{2}(1-\beta_{2})$  $\delta^{\prime}$  $\frac{1}{2}\mbox{mmse}$
-10 0.9091 0.5000 1.0000 0.5000 1.0000 0.0154 0.9384
-9 0.8882 0.5000 1.0000 0.5000 1.0000 0.0192 0.9232
-8 0.8632 0.5000 1.0000 0.5000 1.0000 0.0239 0.9044
-7 0.8337 0.5000 1.0000 0.5000 1.0000 0.0297 0.8812
-6 0.7992 0.5000 1.0000 0.5000 1.0000 0.0368 0.8528
-5 0.7597 0.4999 1.0000 0.5000 1.0000 0.0454 0.8184
-4 0.7153 0.4997 1.0000 0.4999 1.0000 0.0557 0.7772
-3 0.6661 0.4993 1.0000 0.4998 1.0000 0.0678 0.7288
-2 0.6131 0.4981 1.0000 0.4994 1.0000 0.0820 0.6720
-1 0.5573 0.4953 0.9999 0.4981 1.0000 0.0984 0.6064
0 0.5000 0.4890 0.9995 0.4949 0.9999 0.1164 0.5341
1 0.4427 0.4760 0.9977 0.4869 0.9993 0.1359 0.4548
2 0.3869 0.4515 0.9906 0.4695 0.9963 0.1553 0.3721
3 0.3339 0.4100 0.9676 0.4362 0.9837 0.1717 0.2890
4 0.2847 0.3493 0.9091 0.3814 0.9437 0.1788 0.2112
5 0.2403 0.2717 0.7915 0.3048 0.8476 0.1692 0.1429
6 0.2008 0.1878 0.6101 0.2159 0.6770 0.1388 0.0883
7 0.1663 0.1126 0.3998 0.1319 0.4580 0.0950 0.0490
8 0.1368 0.0569 0.2145 0.0674 0.2515 0.0524 0.0236
9 0.1118 0.0237 0.0925 0.0283 0.1099 0.0229 0.0096
10 0.0909 0.0078 0.0308 0.0093 0.0368 0.0077 0.0032
Figure 5: $\frac{1}{1+\rho}$ and $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$ ($C_{2}$).

First note the following.

Lemma 7

Let $\rho>0$. Then

\frac{1}{1+\rho}\leq\frac{\log(1+\rho)}{\rho} (140)

holds.

Proof:

It is shown in [7, Theorem 63] that

uv\leq u\log u+e^{v-1}~(u>0). (141)

Letting $v=1$, we have

u\leq u\log u+1~(u>0).

Furthermore, letting $u=\rho+1$, we have

1+\rho\leq(1+\rho)\log(1+\rho)+1~(\rho>0).

This is equivalent to

\frac{1}{1+\rho}\leq\frac{\log(1+\rho)}{\rho}. ∎

The behavior of $\frac{1}{1+\rho}$ is similar to that of $\frac{\log(1+\rho)}{\rho}$. We have the following:

(i) $\frac{1}{1+\rho}$:

  • 1) $\epsilon\rightarrow 1/2$: Since $\rho\rightarrow 0$, $\frac{1}{1+\rho}\rightarrow 1$.

  • 2) $\epsilon\rightarrow 0$: Since $\rho\rightarrow\infty$, $\frac{1}{1+\rho}\rightarrow 0$.

Note that $\frac{1}{1+\rho}$ approaches 0 more rapidly than $\frac{\log(1+\rho)}{\rho}$ as the SNR increases.

(ii) $\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$:

  • 1) $\epsilon\rightarrow 1/2$: Since $\beta_{i}\rightarrow 1/2~(i=1,2)$ and since $\delta^{\prime}\rightarrow 0$,

    \frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})=\frac{1}{2}(4\beta_{1}(1-\beta_{1})+4\beta_{2}(1-\beta_{2})-8\delta^{\prime})\rightarrow 1. (142)

  • 2) $\epsilon\rightarrow 0$: Since $\beta_{i}\rightarrow 0~(i=1,2)$ and since $\delta^{\prime}\rightarrow 0$,

    \frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})=\frac{1}{2}(4\beta_{1}(1-\beta_{1})+4\beta_{2}(1-\beta_{2})-8\delta^{\prime})\rightarrow 0. (143)

From Tables III and IV (or Figs. 4 and 5), we observe that the behaviors of $\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$ for $C_{1}$ and $C_{2}$ are almost the same. This is because both $C_{1}$ and $C_{2}$ are regarded as QLI codes and hence have the same $m_{v}~(=2)$, which results in almost equal values of $\delta^{\prime}$. Also, observe that $\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$ provides a good approximation to $\frac{1}{1+\rho}$ (Figs. 4 and 5).

V Discussion

In the previous section, we have discussed the validity of the derived MMSE using the results of Guo et al. [6]. Specifically, we have examined

  • 1) the relation between $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$,

  • 2) the relation between $\frac{1}{1+\rho}$ and $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$.

Note that the relation between the mutual information and the MMSE has been reduced to the relation between a function of $\rho$ and the MMSE. That is, their relation has been examined not directly but indirectly. The following approximation and evaluations are used in the argument in Section IV:

  • i) $\frac{1}{k}\sum_{i=1}^{k}\mbox{pmmse}(i,\rho)\approx\xi(\alpha_{1},\alpha_{2},\delta)$ (respectively, $\frac{1}{k}\sum_{i=1}^{k}\mbox{mmse}(i,\rho)\approx\xi(\beta_{1},\beta_{2},\delta^{\prime})$).

  • ii) $I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]\leq k\log(1+\rho)$.

  • iii) $\frac{d}{d\rho}I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]\leq\frac{k}{1+\rho}$.

Since $\xi(\alpha_{1},\alpha_{2},\delta)$ ($\xi(\beta_{1},\beta_{2},\delta^{\prime})$) does not depend on $k$, the averaging in i) seems to be reasonable. (It still has to be justified, however.) On the other hand, inequalities ii) and iii) come from the fact that the input in our signal model is not Gaussian, and hence the input-output mutual information cannot be obtained in closed form. (In the scalar case, the mutual information of the Gaussian channel with equiprobable binary input is known (see [6, p. 1263]). In our case, however, since the input is a vector, we have used the conventional inequality on mutual information.) Consider the case that a QLI code is regarded as a general code. It follows from i) and ii) that

\frac{1}{\rho}\left(\frac{I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]}{k}\right)\lesssim\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta),\qquad\frac{1}{\rho}\left(\frac{I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]}{k}\right)\leq\frac{\log(1+\rho)}{\rho}.

Similarly, in the case that a QLI code is regarded as the inherent QLI code, it follows from i) and iii) that

\frac{d}{d\rho}\left(\frac{I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]}{k}\right)\approx\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime}),\qquad\frac{d}{d\rho}\left(\frac{I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]}{k}\right)\leq\frac{1}{1+\rho}.

Hence evaluations of $I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]$ and $\frac{d}{d\rho}I[\mbox{\boldmath$x$}^{k};\mbox{\boldmath$z$}^{k}]$ are important in our discussions. It is expected that inequalities ii) and iii) are tight at low SNRs, whereas they are loose at high SNRs (cf. [6, p. 1263]). As a result, if approximation i) is appropriate, then the crossing of the two curves in Figs. 1$\sim$5 is understandable. Hence the validity of approximation i) has to be further examined. Nevertheless, in the case that a given QLI code is regarded as the inherent QLI code, it has been shown that $\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$ provides a good approximation to $\frac{1}{1+\rho}$. This is quite remarkable considering that $\frac{1}{1+\rho}$ is a function of $\rho$ alone, whereas $\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$ depends on the concrete convolutional coding. We think this fact supports the validity of our inference and the related results.

Here note the following: $\frac{\log(1+\rho)}{\rho}$ ($\frac{1}{1+\rho}$) is a function only of $\rho=c^{2}=E_{b}/N_{0}$, whereas $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ ($\frac{1}{2}\xi(\beta_{1},\beta_{2},\delta^{\prime})$) depends on the encoded block for the main decoder. Hence, at first glance, the comparison seems to be inappropriate. Although $\frac{1}{2}\mbox{mmse}$ actually depends on the coding, it is a function of the channel error probability $\epsilon$. Since $\epsilon=Q\bigl(\sqrt{2E_{s}/N_{0}}\bigr)=Q\bigl(\sqrt{E_{b}/N_{0}}\bigr)=Q\bigl(\sqrt{\rho}\bigr)$ (cf. $n_{0}=2$), $\frac{1}{2}\mbox{mmse}$ is also a function of $\rho$. Hence the above comparison is justified.

By the way, the argument in Section IV is based on the signal model $z_{j}=cx_{j}+w_{j}$ given in Section I. Note that $x_{j}$ is determined by the encoded symbol $y_{j}$, and hence this signal model clearly depends on the convolutional coding. However, since each $y_{j}$ is independent of all the others, $x_{j}$ can be seen as a random variable taking values $\pm 1$ with equal probability. When $x_{j}$ is seen in this way, the convolutional coding does not appear explicitly in the expression $z_{j}=cx_{j}+w_{j}$. In other words, we cannot see from the signal model how $\{y_{j}\}$ is generated. On the other hand, consider the MMSE in estimating the input based on the observations $\{z_{j}\}$. If the signal model is interpreted as above, then the MMSE seems to be independent of the concrete convolutional coding. But this is not true. As we have already seen, the MMSE is finally replaced by $\xi(\alpha_{1},\alpha_{2},\delta)$ ($\xi(\beta_{1},\beta_{2},\delta^{\prime})$), which is an essential step. By way of this replacement, the MMSE is connected with the convolutional coding. That is, the convolutional coding is actually reflected in the MMSE.

From the results in Section IV, it seems that our argument is more convincing for QLI codes. We know that an SST Viterbi decoder functions better for QLI codes than for general codes. In fact, QLI codes are preferable from a likelihood concentration viewpoint [22, 25]. Then a question arises: how is a likelihood concentration in the main decoder related to the MMSE? Suppose that the information $u_{k}=\mbox{\boldmath$e$}_{k}G^{-1}(D)$ ($u_{k}=\mbox{\boldmath$e$}_{k}F$) for the main decoder consists of $m_{u}$ error terms. We know that a likelihood concentration in the main decoder depends on $m_{u}$ [25], whereas $\delta~(\delta^{\prime})$ is affected by $m_{v}$ and hence the MMSE depends on $m_{v}$.

In the case of QLI codes, we have the following.

Proposition 11

Consider a QLI code whose generator matrix is given by

G(D)=(g_{1}(D),g_{1}(D)+D^{L})~(1\leq L\leq\nu-1).

We have

m_{u}=m_{v}. (144)

Proof:

Let $u_{k}=e_{1}+\cdots+e_{m}$, where the errors $e_{j}~(1\leq j\leq m)$ are mutually independent. We have

\mbox{\boldmath$v$}_{k}=u_{k}G(D)=(u_{k}g_{1}(D),u_{k}g_{1}(D)+u_{k}D^{L})=(v_{k}^{(1)},v_{k}^{(2)}). (145)

It follows that $v_{k}^{(1)}$ and $v_{k}^{(2)}$ differ by $u_{k}D^{L}$. Since $u_{k}=e_{1}+\cdots+e_{m}$, $v_{k}^{(1)}$ and $v_{k}^{(2)}$ differ by $m$ error terms. ∎

We observe that the relation $m_{u}=m_{v}$ actually holds for $C_{1}\sim C_{4}$ (see Section IV-B). The above result shows that, in the case of QLI codes, a likelihood concentration in the main decoder and the degree of correlation between $v_{k}^{(1)}$ and $v_{k}^{(2)}$ (i.e., the value of $\delta~(\delta^{\prime})$) have a close connection. We remark that the former is in principle independent of the latter. The above result, however, shows that the two notions are closely related to each other.

Note that the equality $m_{u}=m_{v}$ does not hold for general codes in general. Then, in order to examine the relation between a likelihood concentration in the main decoder and the MMSE, we can consider a code with small $m_{u}$ and large $m_{v}$. As an example, take the code $C_{5}$ [22] whose generator matrix is defined by

G_{5}(D)=(1+D+D^{4}+D^{5}+D^{6},1+D^{2}+D^{3}+D^{4}+D^{6}). (146)

The inverse encoder is given by

G_{5}^{-1}=\left(\begin{array}{c}D\\ 1+D\end{array}\right) (147)

and we have $m_{u}=3$. On the other hand, it is shown that

G_{5}^{-1}G_{5}=\left(\begin{array}{cc}D+\mbox{\boldmath$D$}^{2}+D^{5}+\mbox{\boldmath$D$}^{6}+D^{7}&D+\mbox{\boldmath$D$}^{3}+\mbox{\boldmath$D$}^{4}+D^{5}+D^{7}\\ 1+D^{2}+\mbox{\boldmath$D$}^{4}+D^{7}&1+\mbox{\boldmath$D$}+D^{2}+\mbox{\boldmath$D$}^{5}+\mbox{\boldmath$D$}^{6}+D^{7}\end{array}\right). (148)

Then we have $m_{v}=8$.
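For illustration, the same bit-mask arithmetic as in the sketch of Section IV-B confirms both counts for $C_{5}$ (again, this is only an illustrative check, not part of the original computations): $m_{u}$ is the total number of terms of $G_{5}^{-1}$, and $m_{v}$ is the number of terms by which the two columns of $G_{5}^{-1}G_{5}$ differ.

    # Weight (number of terms) of a GF(2)[D] polynomial given as a bit mask.
    wt = lambda p: bin(p).count("1")

    def pmul(a, b):
        """Carry-less (GF(2)[D]) product of two polynomials (bit masks)."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            b >>= 1
        return r

    # C5: G5 = (1+D+D^4+D^5+D^6, 1+D^2+D^3+D^4+D^6),  G5^{-1} = (D, 1+D)^T
    g1, g2 = 0b1110011, 0b1011101
    h1, h2 = 0b10, 0b11

    m_u = wt(h1) + wt(h2)                                # error terms in u_k = e_k G^{-1}
    m_v = wt(pmul(h1, g1 ^ g2)) + wt(pmul(h2, g1 ^ g2))  # terms where v^(1) and v^(2) differ
    print(m_u, m_v)                                      # -> 3 8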

Since $m_{u}$ is small, a likelihood concentration occurs in the main decoder [22, 25]. On the other hand, $m_{v}=8$ is considerably large. Here recall that $C_{2}$, regarded as a general code, also has $m_{v}=8$, the same value as that for $C_{5}$. However, since $C_{2}$ has $m_{u}=m_{v}=8$ as a general code, a likelihood concentration in the main decoder is not expected at low-to-medium SNRs. Then, in order to see the effect of $m_{u}$ on the values of $\frac{1}{2}\mbox{mmse}$, let us compare the values of $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ for $C_{2}$ and $C_{5}$. For this purpose, we have evaluated $\frac{1}{2}\mbox{mmse}$ for $C_{5}$. The result is shown in Table V.

TABLE V: $\frac{\log(1+\rho)}{\rho}$ and minimum mean-square error ($C_{5}$)
$E_{b}/N_{0}$ (dB)  $\frac{\log(1+\rho)}{\rho}$  $\alpha_{1}$  $4\alpha_{1}(1-\alpha_{1})$  $\alpha_{2}$  $4\alpha_{2}(1-\alpha_{2})$  $\delta$  $\frac{1}{2}\mbox{mmse}$
-10 0.9531 0.5000 1.0000 0.5000 1.0000 0.0000 1.0000
-9 0.9419 0.5000 1.0000 0.5000 1.0000 0.0000 1.0000
-8 0.9282 0.5000 1.0000 0.5000 1.0000 0.0000 1.0000
-7 0.9118 0.5000 1.0000 0.5000 1.0000 0.0000 1.0000
-6 0.8921 0.4999 1.0000 0.5000 1.0000 0.0002 0.9992
-5 0.8689 0.4998 1.0000 0.5000 1.0000 0.0002 0.9992
-4 0.8418 0.4994 1.0000 0.4999 1.0000 0.0006 0.9976
-3 0.8106 0.4986 1.0000 0.4996 1.0000 0.0014 0.9944
-2 0.7753 0.4967 1.0000 0.4989 1.0000 0.0029 0.9884
-1 0.7360 0.4925 0.9998 0.4970 1.0000 0.0060 0.9759
0 0.6931 0.4839 0.9990 0.4925 0.9998 0.0117 0.9526
1 0.6473 0.4675 0.9958 0.4823 0.9987 0.0214 0.9117
2 0.5992 0.4387 0.9850 0.4615 0.9941 0.0363 0.8444
3 0.5498 0.3932 0.9544 0.4242 0.9770 0.0553 0.7445
4 0.5001 0.3301 0.8845 0.3663 0.9285 0.0731 0.6141
5 0.4510 0.2531 0.7562 0.2889 0.8217 0.0814 0.4634
6 0.4033 0.1727 0.5715 0.2021 0.6450 0.0741 0.3119
7 0.3579 0.1026 0.3683 0.1224 0.4297 0.0537 0.1842
8 0.3153 0.0515 0.1954 0.0622 0.2333 0.0306 0.0920
9 0.2758 0.0214 0.0838 0.0260 0.1013 0.0136 0.0382
10 0.2398 0.0070 0.0278 0.0085 0.0337 0.0045 0.0128

Look at the values of $\frac{1}{2}\mbox{mmse}$ in Table V. We observe that they are almost the same as those for $C_{2}$ (see Table II). That is, $m_{u}$ seems to have little effect on the values of $\frac{1}{2}\mbox{mmse}$. This observation is explained as follows. $m_{u}=3$ means that a likelihood concentration in the main decoder is notable. However, this does not necessarily affect the MMSE. A likelihood concentration reduces the decoding complexity (i.e., the complexity of estimating the input). On the other hand, the MMSE is the estimation error which is finally attained after the estimation process. In other words, $m_{u}$ affects the complexity of estimation, whereas $m_{v}$ is related to the final estimation error. Moreover, we can state this as follows: the MMSE is determined by the structure of $G^{-1}G$. Hence, even if the $G^{-1}$'s of two codes $C$ and $\tilde{C}$ are considerably different, when their $G^{-1}G$'s have almost equal values of $m_{v}$, the MMSEs are close to each other. In fact, the inverse encoder for $C_{2}$ is given by

G_{2}^{-1}=\left(\begin{array}{c}D^{3}+D^{4}+D^{5}\\ 1+D+D^{3}+D^{4}+D^{5}\end{array}\right),

whereas the inverse encoder for $C_{5}$ is given by

G_{5}^{-1}=\left(\begin{array}{c}D\\ 1+D\end{array}\right).

The two inverse encoders are quite different, but the values of $m_{v}$ calculated from $G^{-1}G$ are equal, which results in almost the same MMSE.

VI Conclusion

In this paper, we have shown that the soft-decision input to the main decoder in an SST Viterbi decoder is regarded as the innovation as well. Although this fact can be obtained from the definition of innovations for hard-decision data, this time we have discussed the subject from the viewpoint of mutual information and mean-square error. Then by combining the present result with that in [25], it has been confirmed that the input to the main decoder in an SST Viterbi decoder is regarded as the innovation. Moreover, we have obtained an important result. Note that the MMSE has been expressed in terms of the distribution of the encoded block for the main decoder in an SST Viterbi decoder. Then through the argument, the input-output mutual information has been connected with the distribution of the encoded block for the main decoder. We think this is an extension of the relation between the mutual information and the MMSE to coding theory.

On the other hand, there remain problems to be further discussed. Note that since the input is not Gaussian, the discussions are based on inequalities and approximate expressions. In particular, when a given QLI code is regarded as a general code, the difference between $\frac{\log(1+\rho)}{\rho}$ and $\frac{1}{2}\mbox{mmse}=\frac{1}{2}\xi(\alpha_{1},\alpha_{2},\delta)$ is not so small. We remark that this comparison has been made based on two inequalities regarding the evaluation of the input-output mutual information. Moreover, the discussions rely partly on numerical calculations. Hence our argument is slightly less rigorous in some places. Nevertheless, we think the closeness of the two curves in Figs. 4 and 5 implies that the inference and the related results in this paper are reasonable.

Appendix A Proof of Lemma 4

We use mathematical induction on mm.

1) $P(e_{1}=1)=\epsilon\leq\frac{1}{2}$ is obvious.

2) Suppose that $P(e_{1}+\cdots+e_{m}=1)\leq\frac{1}{2}$, and let $q=P(e_{1}+\cdots+e_{m}=1)$. Then we have

P(e_{1}+\cdots+e_{m}+e_{m+1}=1)=P(e_{1}+\cdots+e_{m}=0)P(e_{m+1}=1)+P(e_{1}+\cdots+e_{m}=1)P(e_{m+1}=0)
=(1-q)\epsilon+q(1-\epsilon)
=\epsilon+q(1-2\epsilon). (A.149)

Since $q\leq\frac{1}{2}$ by the induction hypothesis, we have

P(e_{1}+\cdots+e_{m}+e_{m+1}=1)\leq\epsilon+\frac{1}{2}(1-2\epsilon)=\frac{1}{2}. (A.150)
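The induction step can also be checked numerically by iterating the recursion $q_{m+1}=\epsilon+q_{m}(1-2\epsilon)$ obtained in (A.149). The short Python sketch below is an illustrative check only:

    # q_m = P(e_1 + ... + e_m = 1) for i.i.d. errors with P(e_j = 1) = eps.
    for eps in (0.01, 0.1, 0.3, 0.5):
        q = eps                                # m = 1
        for m in range(1, 50):
            assert q <= 0.5 + 1e-12
            q = eps + q * (1.0 - 2.0 * eps)    # recursion (A.149)
        print(f"eps={eps}: q_50={q:.4f} <= 1/2")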

Appendix B Proof of Lemma 5

Suppose that $\alpha_{1}(\epsilon)$, $\alpha_{2}(\epsilon)$, and $\alpha_{11}(\epsilon)$ have been determined for a given $\epsilon$. Note that $\mbox{\boldmath$v$}_{k}=(v_{k}^{(1)},v_{k}^{(2)})=(\mbox{\boldmath$e$}_{k}G^{-1})G$. Let $e$ denote the (error) terms common to $v_{k}^{(1)}$ and $v_{k}^{(2)}$. Then $v_{k}^{(1)}$ and $v_{k}^{(2)}$ are expressed as

v_{k}^{(1)}=e+\tilde{e}_{1} (B.151)
v_{k}^{(2)}=e+\tilde{e}_{2}, (B.152)

where $e$, $\tilde{e}_{1}$, and $\tilde{e}_{2}$ are mutually independent. Set

\left\{\begin{array}{l}p=P(e=1)\\ s=P(\tilde{e}_{1}=1)\\ t=P(\tilde{e}_{2}=1).\end{array}\right. (B.153)

We have

\alpha_{11}=(1-p)st+p(1-s)(1-t) (B.154)
\alpha_{1}=(1-p)s+p(1-s) (B.155)
\alpha_{2}=(1-p)t+p(1-t). (B.156)

By direct calculation, it is derived that

\delta=\alpha_{11}-\alpha_{1}\alpha_{2}=p(1-p)(1-2s)(1-2t). (B.157)

Since $0\leq s\leq\frac{1}{2}$ and $0\leq t\leq\frac{1}{2}$ (see Lemma 4), it follows that $\delta\geq 0$.

Appendix C Proof of Proposition 5

Suppose that $\alpha_{1}(\epsilon)$, $\alpha_{2}(\epsilon)$, and $\alpha_{11}(\epsilon)$ have been determined for a given $\epsilon$. It suffices to show that

\alpha_{1}(1-\alpha_{1})+\alpha_{2}(1-\alpha_{2})-2\delta\geq 0.

We apply the same argument as in the proof of Lemma 5. Let $p$, $s$, and $t$ be as in Lemma 5. We have

\alpha_{1}(1-\alpha_{1})=p(1-p)+s(1-s)-4ps(1-p)(1-s) (C.158)
\alpha_{2}(1-\alpha_{2})=p(1-p)+t(1-t)-4pt(1-p)(1-t) (C.159)
\delta=p(1-p)(1-2s)(1-2t). (C.160)

By direct calculation, it follows that

\alpha_{1}(1-\alpha_{1})+\alpha_{2}(1-\alpha_{2})-2\delta=s(1-s)+t(1-t)+4p(1-p)(s-t)^{2}\geq 0. (C.161)
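The two algebraic identities used above, (B.157) and (C.161), can also be verified symbolically. The following short sketch (assuming the sympy package is available) is merely a convenience check of the algebra:

    import sympy as sp

    p, s, t = sp.symbols("p s t", real=True)

    a1  = (1 - p) * s + p * (1 - s)                # (B.155)
    a2  = (1 - p) * t + p * (1 - t)                # (B.156)
    a11 = (1 - p) * s * t + p * (1 - s) * (1 - t)  # (B.154)
    delta = a11 - a1 * a2

    # (B.157): delta = p(1-p)(1-2s)(1-2t)
    print(sp.simplify(delta - p * (1 - p) * (1 - 2 * s) * (1 - 2 * t)))  # -> 0

    # (C.161): a1(1-a1) + a2(1-a2) - 2*delta = s(1-s) + t(1-t) + 4p(1-p)(s-t)^2
    lhs = a1 * (1 - a1) + a2 * (1 - a2) - 2 * delta
    rhs = s * (1 - s) + t * (1 - t) + 4 * p * (1 - p) * (s - t) ** 2
    print(sp.simplify(lhs - rhs))                                        # -> 0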

Appendix D Explanation for $\frac{d}{d\rho}I[x;\sqrt{\rho}\,x+w]\leq\frac{d}{d\rho}I[\tilde{x};\sqrt{\rho}\,\tilde{x}+w]$

The mutual information of the Gaussian channel with equiprobable binary input is known (see [6, p. 1263]). Let $x$ be the equiprobable binary input with unit variance. Using the relation [6, Theorem 1]

\frac{d}{d\rho}I[x;\sqrt{\rho}\,x+w]=\frac{1}{2}\mbox{mmse}(\rho), (D.162)

we have

\frac{d}{d\rho}I[x;\sqrt{\rho}\,x+w]=\frac{1}{2}\mbox{mmse}(\rho)=\frac{1}{2}\left(1-\int_{-\infty}^{\infty}\frac{e^{-\frac{y^{2}}{2}}}{\sqrt{2\pi}}\tanh(\rho-\sqrt{\rho}\,y)\,dy\right). (D.163)

On the other hand, for the standard Gaussian input $\tilde{x}$, we have

\frac{d}{d\rho}I[\tilde{x};\sqrt{\rho}\,\tilde{x}+w]=\frac{d}{d\rho}\left(\frac{1}{2}\log(1+\rho)\right)=\frac{1}{2}\,\frac{1}{1+\rho}. (D.164)

Accordingly, it suffices to show that

\frac{1}{1+\rho}\geq 1-\int_{-\infty}^{\infty}\frac{e^{-\frac{y^{2}}{2}}}{\sqrt{2\pi}}\tanh(\rho-\sqrt{\rho}\,y)\,dy. (D.165)

Note that this inequality is equivalent to

\int_{-\infty}^{\infty}\frac{e^{-\frac{y^{2}}{2}}}{\sqrt{2\pi}}\tanh(\rho-\sqrt{\rho}\,y)\,dy\geq\frac{\rho}{1+\rho}. (D.166)

Furthermore, the above is rewritten as

\int_{0}^{\infty}\frac{e^{-\frac{(y-\rho)^{2}}{2\rho}}}{\sqrt{2\pi\rho}}\tanh(y)\,dy-\int_{0}^{\infty}\frac{e^{-\frac{(y+\rho)^{2}}{2\rho}}}{\sqrt{2\pi\rho}}\tanh(y)\,dy\geq\frac{\rho}{1+\rho}, (D.167)

where $\frac{1}{\sqrt{2\pi\rho}}e^{-\frac{(y-\rho)^{2}}{2\rho}}$ represents the Gaussian density with mean $\rho$ and variance $\rho$.

Now note the integrands on the left-hand side. The variable $\rho$ appears only in the Gaussian densities. Hence, using a piecewise-linear approximation of $\tanh(y)$, numerical integration is possible. $\tanh(y)~(0\leq y<\infty)$ is approximated as

\tanh(y)\approx\left\{\begin{array}{rl}y,&0\leq y\leq\frac{1}{2}\\ \frac{1}{2}y+\frac{1}{4},&\frac{1}{2}\leq y\leq 1\\ \frac{3}{10}y+\frac{9}{20},&1\leq y\leq\frac{3}{2}\\ \frac{1}{15}y+\frac{4}{5},&\frac{3}{2}\leq y\leq 3\\ 1,&3\leq y<\infty\end{array}\right. (D.168)

(cf. $\tanh(0)=0$, $\tanh(0.5)\approx 0.4621$, $\tanh(1.0)\approx 0.7616$, $\tanh(1.5)\approx 0.9051$, $\tanh(3.0)\approx 0.9951$).

If the above inequality holds for each $\rho$, then $\frac{d}{d\rho}I[x;\sqrt{\rho}\,x+w]\leq\frac{d}{d\rho}I[\tilde{x};\sqrt{\rho}\,\tilde{x}+w]$ follows. For example, let $\rho=1$. The target inequality becomes

\int_{0}^{\infty}\frac{e^{-\frac{(y-1)^{2}}{2}}}{\sqrt{2\pi}}\tanh(y)\,dy-\int_{0}^{\infty}\frac{e^{-\frac{(y+1)^{2}}{2}}}{\sqrt{2\pi}}\tanh(y)\,dy\geq\frac{1}{2}. (D.169)

By carrying out the numerical integration, we have

\int_{0}^{\infty}\frac{e^{-\frac{(y-1)^{2}}{2}}}{\sqrt{2\pi}}\tanh(y)\,dy-\int_{0}^{\infty}\frac{e^{-\frac{(y+1)^{2}}{2}}}{\sqrt{2\pi}}\tanh(y)\,dy\approx 0.6079-0.0665=0.5414>\frac{1}{2}. (D.170)

Hence we actually have $\frac{d}{d\rho}I[x;\sqrt{\rho}\,x+w]\leq\frac{d}{d\rho}I[\tilde{x};\sqrt{\rho}\,\tilde{x}+w]$ at $\rho=1$.
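The same check can be carried out directly, without the piecewise-linear approximation of $\tanh$, by numerical quadrature. The sketch below (assuming the scipy package) evaluates the left-hand side of (D.167) for several values of $\rho$ and compares it with $\rho/(1+\rho)$; it is an illustrative check only. At $\rho=1$ it reproduces the value $\approx 0.5414$ of (D.170).

    import numpy as np
    from scipy.integrate import quad

    def lhs_D167(rho):
        """Integral on the left-hand side of (D.167):
        int_0^inf [N(y; rho, rho) - N(y; -rho, rho)] * tanh(y) dy."""
        def f(y):
            gauss_pos = np.exp(-(y - rho) ** 2 / (2 * rho)) / np.sqrt(2 * np.pi * rho)
            gauss_neg = np.exp(-(y + rho) ** 2 / (2 * rho)) / np.sqrt(2 * np.pi * rho)
            return (gauss_pos - gauss_neg) * np.tanh(y)
        val, _ = quad(f, 0.0, np.inf)
        return val

    for rho in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"rho={rho:5.1f}  lhs={lhs_D167(rho):.4f}  rho/(1+rho)={rho/(1+rho):.4f}")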

References

  • [1] S. Arimoto, Kalman Filter, (in Japanese). Tokyo, Japan: Sangyo Tosho Publishing, 1977.
  • [2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: John Wiley & Sons, 2006.
  • [3] T. E. Duncan, “On the calculation of mutual information,” SIAM J. Appl. Math., vol. 19, pp. 215–220, Jul. 1970.
  • [4] R. Esposito, “On a relation between detection and estimation in decision theory,” Inform. Contr., vol. 12, pp. 116-120, 1968.
  • [5] G. D. Forney, Jr., “Convolutional codes I: Algebraic structure,” IEEE Trans. Inf. Theory, vol. IT-16, no. 6, pp. 720–738, Nov. 1970.
  • [6] D. Guo, S. Shamai (Shitz), and S. Verdú, “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005.
  • [7] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, 2nd ed. Cambridge University Press, 1952. (H. Hosokawa, Inequalities, (Japanese transl.). Tokyo, Japan: Springer-Verlag Tokyo, 2003.)
  • [8] K. Ito and S. Watanabe, “Introduction to stochastic differential equations,” in Proc. Intern. Symp. SDE, pp. i–xxx, Jul. 1976. (Stochastic Differential Equations, K. Ito, Ed. Tokyo: Kinokuniya Book-Store, 1978.)
  • [9] A. H. Jazwinski, Stochastic Processes and Filtering Theory. New York: Academic Press, 1970.
  • [10] T. T. Kadota, M. Zakai, and J. Ziv, “Mutual information of the white Gaussian channel with and without feedback,” IEEE Trans. Inf. Theory, vol. IT-17, no. 4, pp. 368–371, Jul. 1971.
  • [11] T. Kailath, “An innovations approach to least-squares estimation–Part I: Linear filtering in additive white noise,” IEEE Trans. Automatic Control, vol. AC-13, no. 6, pp. 646–655, Dec. 1968.
  • [12] T. Kailath and P. Frost, “An innovations approach to least-squares estimation–Part II: Linear smoothing in additive white noise,” IEEE Trans. Automatic Control, vol. AC-13, no. 6, pp. 655–660, Dec. 1968.
  • [13] T. Kailath, “A note on least-squares estimates from likelihood ratios,” Inform. Contr., vol. 13, pp. 534–540, 1968.
  • [14] T. Kailath, “A general likelihood-ratio formula for random signals in Gaussian noise,” IEEE Trans. Inf. Theory, vol. IT-15, no. 3, pp. 350–361, May 1969.
  • [15] T. Kailath, “A view of three decades of linear filtering theory (Invited paper),” IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 146–181, Mar. 1974.
  • [16] T. Kailath and H. V. Poor, “Detection of stochastic processes (Invited paper),” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2230–2259, Oct. 1998.
  • [17] S. Kubota, S. Kato, and T. Ishitani, “Novel Viterbi decoder VLSI implementation and its performance,” IEEE Trans. Commun., vol. 41, no. 8, pp. 1170–1178, Aug. 1993.
  • [18] H. Kunita, Estimation of Stochastic Processes (in Japanese). Tokyo, Japan: Sangyo Tosho Publishing, 1976.
  • [19] P. Malliavin, “Stochastic calculus of variation and hypoelliptic operators,” in Proc. Intern. Symp. SDE, pp. 195–263, Jul. 1976. (Stochastic Differential Equations, K. Ito, Ed. Tokyo: Kinokuniya Book-Store, 1978.)
  • [20] J. L. Massey and D. J. Costello, Jr., “Nonsystematic convolutional codes for sequential decoding in space applications,” IEEE Trans. Commun. Technol., vol. COM-19, no. 5, pp. 806–813, Oct. 1971.
  • [21] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, 5th ed. Berlin, Germany: Springer-Verlag, 1998.
  • [22] S. Ping, Y. Yan, and C. Feng, “An effective simplifying scheme for Viterbi decoder,” IEEE Trans. Commun., vol. 39, no. 1, pp. 1–3, Jan. 1991.
  • [23] G. Strang, Linear Algebra and Its Applications. New York: Academic Press, 1976. (M. Yamaguchi and A. Inoue, Linear Algebra and Its Applications, (Japanese transl.). Tokyo, Japan: Sangyo Tosho Publishing, 1978.)
  • [24] M. Tajima, K. Shibata, and Z. Kawasaki, “On the equivalence between Scarce-State-Transition Viterbi decoding and syndrome decoding of convolutional codes,” IEICE Trans. Fundamentals, vol. E86-A, no. 8, pp. 2107–2116, Aug. 2003.
  • [25] M. Tajima, “An innovations approach to Viterbi decoding of convolutional codes,” IEEE Trans. Inf. Theory, vol. 65, no. 5, pp. 2704–2722, May 2019.
  • [26] M. Tajima, “Corrections to “An innovations approach to Viterbi decoding of convolutional codes”,” IEEE Trans. Inf. Theory, to be published.
  • [27] D. Williams, Probability with Martingales. Cambridge University Press, 1991. (J. Akahori et al., Probability with Martingales, (Japanese transl.). Tokyo, Japan: Baifukan, 2004.)
  • [28] E. Wong, Stochastic Processes in Information and Dynamical Systems. New York: McGraw-Hill, 1971.
  • [29] M. Zakai, “On mutual information, likelihood ratios, and estimation error for the additive Gaussian channel,” IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3017–3024, Sep. 2005.