A Further Note on an Innovations Approach to Viterbi Decoding of Convolutional Codes
Abstract
In this paper, we show that the soft-decision input to the main decoder in an SST Viterbi decoder can be regarded as the innovation also from the viewpoint of mutual information and mean-square error. It is assumed that a code sequence is transmitted symbol by symbol over an AWGN channel using BPSK modulation. We can then consider a signal model in which the signal is composed of the signal-to-noise ratio (SNR) and the equiprobable binary input. By assuming that the soft-decision input to the main decoder is the innovation, we show that the minimum mean-square error (MMSE) in estimating the binary input is expressed in terms of the distribution of the encoded block for the main decoder. It is shown that the obtained MMSE indirectly satisfies the known relation between the mutual information and the MMSE in Gaussian channels. Thus the derived MMSE is justified, which in turn implies that the soft-decision input to the main decoder can be regarded as the innovation. Moreover, we see that the input-output mutual information is connected with the distribution of the encoded block for the main decoder.
Index Terms:
Convolutional codes, Scarce-State-Transition (SST) Viterbi decoder, innovations, filtering, smoothing, mean-square error, mutual information.
I Introduction
Consider an SST Viterbi decoder [17, 24], which consists of a pre-decoder and a main decoder. In [25], by comparing with the results in linear filtering theory [1, 9, 11, 12, 15], we showed that the hard-decision input to the main decoder can be seen as the innovation [11, 12, 16, 18]. In coding theory, the framework is determined by hard-decision data. In the previous paper [25, Definition 1], we gave a definition of innovations for hard-decision data. The definition applies also to soft-decision data: if the associated hard-decision data is the innovation, then the original soft-decision data can be regarded as the innovation. Hence, we can consider the soft-decision input to the main decoder to be the innovation as well. In this paper, we show that this is true from a different viewpoint. We use the innovations approach to least-squares estimation by Kailath [11, 12] and the relationship between the mutual information and the mean-square error (e.g., [6]).
Consider the situation where we have observations of a signal in additive white Gaussian noise. (In this paper, it is assumed that the signal is composed of the signal-to-noise ratio (SNR) and the equiprobable binary input, where the latter does not depend on the SNR. In the following, we call the equiprobable binary input simply the input.) Kailath [11, 12] applied the innovations method to linear filtering/smoothing problems. In the discrete-time case [11, 15], he showed that the covariance matrix of the innovation is expressed as the sum of two covariance matrices, one corresponding to the estimation error of the signal and the other to the observation noise. The mean-square error in estimating the signal is then obtained from the former covariance matrix by taking its trace. Based on this result of Kailath, we reasoned as follows: if the soft-decision input to the main decoder corresponds to the innovation, then the associated covariance matrix should have the above property. That is, the covariance matrix of the soft-decision input to the main decoder can be decomposed as the sum of two covariance matrices, and hence the mean-square error in estimating the signal is obtained as described above. We remark that in this context, the obtained mean-square error has to be justified by some method. For this purpose, we note the relation between the mutual information and the mean-square error.
In [25], we derived the distribution of the input to the main decoder corresponding to a single code symbol. However, we could not obtain the joint distribution of the input to the main decoder corresponding to a branch. After our paper [25] was published, we noticed that the distribution of the input to the main decoder corresponding to a single code symbol has a reasonable interpretation (see [26]). Using this fact, the joint distribution of the input to the main decoder corresponding to a branch has been derived, and the associated covariance matrix can be calculated. Our argument for obtaining the mean-square error is then as follows:
-
1)
Derive the joint distribution of the soft-decision input to the main decoder.
-
2)
Calculate the associated covariance matrix using the derived joint distribution.
-
3)
The covariance matrix of the estimation error of the signal is obtained by subtracting that of the observation noise from the matrix in 2). Moreover, by removing the SNR from the obtained covariance matrix, the covariance matrix of the estimation error of the input is derived.
-
4)
The mean-square error in estimating the input is given by the trace (i.e., the sum of diagonal elements) of the last covariance matrix.
In this way, we show that the mean-square error in estimating the input is expressed in terms of the distribution of the encoded block for the main decoder. Since in an SST Viterbi decoder, the estimation error is decoded at the main decoder, the result is reasonable.
On the other hand, we have to show the validity of the derived mean-square error. For this purpose, we note the relation between the mutual information and the mean-square error. The mutual information, the likelihood ratio (LR), and the mean-square error are central notions in information theory, detection theory, and estimation theory. It is well known that the best estimate measured in the mean-square sense (i.e., the least-squares estimate) is given by the conditional expectation [16, 18, 21, 27]. In this paper, the corresponding mean-square error is referred to as the minimum mean-square error (MMSE) [6]. In this case, depending on the amount of observations used for the estimation, the causal (filtering) MMSE and the noncausal (smoothing) MMSE are considered. When only the observations up to the current time are used for the estimation of the input, it corresponds to filtering, whereas when the whole observation record is used, it corresponds to smoothing.
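As a simple numerical illustration of this distinction (a generic estimation fact, not the decoder setting itself), the following sketch estimates a Gaussian signal from repeated noisy observations, once causally and once noncausally; the smoothing error is never larger than the filtering error. The Gaussian signal, unit-variance noise, and time indices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
trials, T, t_mid = 100_000, 9, 5           # estimate the signal at time t_mid

s = rng.standard_normal(trials)            # Gaussian signal, s ~ N(0, 1)
y = s[:, None] + rng.standard_normal((trials, T))   # y_t = s + n_t, n_t ~ N(0, 1)

# Causal (filtering) estimate: conditional mean of s given y_1, ..., y_{t_mid}.
filt = y[:, :t_mid].sum(axis=1) / (t_mid + 1)
# Noncausal (smoothing) estimate: conditional mean of s given the whole record.
smth = y.sum(axis=1) / (T + 1)

print("filtering MSE :", np.mean((filt - s) ** 2))   # about 1/(t_mid + 1)
print("smoothing MSE :", np.mean((smth - s) ** 2))   # about 1/(T + 1), smaller
```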
From the late 1960s to the early 1970s, the relation between the LR and the mean-square error was actively discussed. Kailath [14] showed that the LR is expressed in terms of the causal least-squares estimate. Moreover, he showed [13] that the causal least-squares estimate can be obtained from the LR. Esposito [4] also derived a relation between the LR and the noncausal estimator, which is closely related to [13]. Subsequently, Duncan [3] and Kadota et al. [10] derived the relation between the mutual information and the causal mean-square error. Their works are regarded as extensions of the work of Gelfand and Yaglom (see [1, 3]), who discussed for the first time the relation between the mutual information and the filtering error. Later, Guo et al. [6] derived new formulas regarding the relation between the mutual information and the noncausal mean-square error. Furthermore, Zakai [29] showed that the relation between the mutual information and the estimation error holds also in the abstract Wiener space [28, 29]. He used not the Ito calculus [8, 21] but the Malliavin calculus [19, 29]. We remark that the additive Gaussian noise channel is assumed in all the works above.
Among these works, we have noted the work of Guo et al. [6]. It deals with the relation between the mutual information and the MMSE. Also, their signal model is the same as ours and hence is convenient to work with. We reasoned that if the MMSE obtained using the above method satisfies the relations in [6, Section IV-A], then the derived MMSE is justified, which in turn implies that the soft-decision input to the main decoder can be regarded as the innovation. The main text of the paper consists of the arguments stated above. In addition, it is shown that the MMSE is expressed in terms of the distribution of the encoded block for the main decoder. We then see that the input-output mutual information is connected with the distribution of the encoded block for the main decoder. We think this is an extension of the relation between the mutual information and the MMSE to coding theory. The remainder of this paper is organized as follows.
In Section II, based on the signal model in this paper, we derive the joint distribution of the input to the main decoder.
In Section III, the associated covariance matrix is calculated using the derived joint distribution. Subsequently, by subtracting the covariance matrix of the observation noise from it and by removing the SNR from the resulting matrix, the covariance matrix of the estimation error of the input is obtained. The MMSE in estimating the input is given by the trace of the last matrix. In this case, since the diagonal elements of the target matrix are correlated with each other, we modify the obtained MMSE. Note that this modification is essentially important. (We remark that some results in Sections II and III have been given in [26] with or without proofs. However, in order for the paper to be self-contained, the necessary materials are provided again with proofs in those sections.)
In Section IV, the validity of the derived MMSE is discussed. The argument is based on the relation between the mutual information and the MMSE. More precisely, the discussion uses the results of Guo et al. [6, Section IV-A]. We remark that the input in our signal model is not Gaussian. Hence the input-output mutual information cannot be obtained in a concrete form. We have only inequalities and approximate expressions. For that reason, we carry out numerical calculations using concrete convolutional codes. In this case, in order to clarify the difference between causal estimation (filtering) and noncausal estimation (smoothing), we take QLI codes [20] with different constraint lengths. Then the MMSEs are calculated by regarding these QLI codes as general codes on one side and by regarding them as inherent QLI codes on the other. The obtained results are compared and carefully examined. Moreover, through the argument, we see that the input-output mutual information is connected with the distribution of the encoded block for the main decoder.
In Section V, we give several important comments regarding the discussions in Section IV.
Finally, in Section VI, we conclude with the main points of the paper and with problems to be further discussed.
Let us close this section by introducing the basic notions needed for this paper. Notations in this paper are in principle the same as those in [25]. We always assume that the underlying field is GF(2). Let be a generator matrix for an convolutional code, where is assumed to be minimal [5]. A corresponding check matrix is also assumed to be minimal. Hence they have the same constraint length, denoted . Denote by and an information sequence and the corresponding code sequence, respectively, where is the information block at and is the encoded block at . In this paper, it is assumed that a code sequence is transmitted symbol by symbol over a memoryless AWGN channel using BPSK modulation. Let be a received sequence, where is the received block at . Each component of is modeled as
(1) | |||||
(2) |
where . Here, takes depending on whether the code symbol is or . That is, is the equiprobable binary input. (We call it simply the input.) and denote the energy per channel symbol and the single-sided noise spectral density, respectively. (Let be the energy per information bit. Then the relationship between and is defined by , where is the code rate.) Also, is a zero-mean unit variance Gaussian random variable with probability density function
(3) |
Each is independent of all others. The hard-decision (denoted “h”) data of is defined by
(4) |
In this case, the channel error probability (denoted ) is given by
(5) |
Note that the above signal model can be seen as the block at . In that case, we can rewrite as
(6) |
where , , and .
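As a quick sanity check of this signal model, the following minimal sketch simulates the channel and compares the empirical channel error probability of the hard decision with the Gaussian tail value. The concrete form y = sqrt(2Es/N0) x + w with x in {+1, -1} equiprobable and w standard Gaussian is the standard BPSK/AWGN reading of (1)-(5); the exact symbols used in the paper may differ.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

def q_func(a):
    """Gaussian tail probability Q(a)."""
    return 0.5 * erfc(a / sqrt(2.0))

def empirical_error_prob(es_n0_db, n=200_000):
    """Simulate y = sqrt(2Es/N0)*x + w and apply the hard decision."""
    amp = sqrt(2.0 * 10.0 ** (es_n0_db / 10.0))   # sqrt(2 Es/N0)
    x = rng.choice([-1.0, 1.0], size=n)           # equiprobable binary input
    w = rng.standard_normal(n)                    # zero-mean unit-variance noise
    y = amp * x + w                               # soft-decision observation
    x_hat = np.where(y >= 0.0, 1.0, -1.0)         # hard decision
    return np.mean(x_hat != x)

for es_n0_db in [0.0, 2.0, 4.0]:
    amp = sqrt(2.0 * 10.0 ** (es_n0_db / 10.0))
    print(f"Es/N0 = {es_n0_db} dB: empirical {empirical_error_prob(es_n0_db):.4f}, "
          f"Q(sqrt(2Es/N0)) = {q_func(amp):.4f}")
```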
II Joint Distribution of the Input to the Main Decoder
The argument in this paper is based on the result of Kailath [11, Section III-B]. That is, by calculating the covariance matrix of the input to the main decoder, we derive the mean-square error in estimating the input. First we obtain the joint distribution of the input to the main decoder. In the following, and denote the probability and expectation, respectively.
Let be the input to the main decoder in an SST Viterbi decoder [25]. In connection with the distribution of [25, Proposition 12], is defined by
(8) |
We have noticed that this value has another interpretation. In fact, we have the following.
Lemma 1 ([26])
(9) |
holds, where is the encoded block for the main decoder.
Proof:
The hard-decision input to the main decoder is expressed as
where . Let . We have
(10) |
Hence it follows that
(11) | |||||
∎
Thus the distribution of is given by
(12) |
This equation means that if the code symbol is , then the associated distribution obeys , whereas if the code symbol is , then the associated distribution obeys . Hence the result is quite reasonable. On the other hand, since the distribution of is given by the above equation, are not mutually independent, because are not mutually independent.
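To make this two-component form concrete, the following minimal sketch builds such a mixture, taking the component means as plus/minus sqrt(2Es/N0) and the mixture weight as an assumed probability p0 that the code symbol for the main decoder is 0 (both the SNR value and p0 are illustrative assumptions, not quantities derived in the paper), and checks a sampled histogram against the analytic density.

```python
import numpy as np

rng = np.random.default_rng(2)

c = np.sqrt(2.0 * 10.0 ** (2.0 / 10.0))   # assumed amplitude sqrt(2Es/N0) at Es/N0 = 2 dB
p0 = 0.9                                   # assumed P(code symbol for the main decoder = 0)

def mixture_pdf(z):
    """Two-component Gaussian mixture: weight p0 at mean +c, weight 1-p0 at mean -c."""
    g = lambda m: np.exp(-0.5 * (z - m) ** 2) / np.sqrt(2.0 * np.pi)
    return p0 * g(c) + (1.0 - p0) * g(-c)

# Sample from the mixture and compare a normalized histogram with the analytic density.
symbol_is_zero = rng.random(200_000) < p0
samples = np.where(symbol_is_zero, c, -c) + rng.standard_normal(200_000)
hist, edges = np.histogram(samples, bins=80, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print("max |histogram - pdf| =", np.max(np.abs(hist - mixture_pdf(mid))))
```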
Next, consider a QLI code whose generator matrix is given by
(13) |
Let be the input to the main decoder in an SST Viterbi decoder [25]. We see that almost the same argument applies to in [25, Proposition 14]. We have the following.
Lemma 2 ([26])
(14) |
holds, where is the encoded block for the main decoder.
Proof:
In the case of QLI codes [25, Section II-B], the hard-decision input to the main decoder is expressed as
where () and . Let be the syndrome. Since , the above is equivalent to
(15) |
Hence we have
(16) | |||||
for . ∎
In the rest of the paper, is assumed, because we are concerned with QLI codes in principle. Let us examine the relationship between and . Let and be the filtered estimate and the smoothed estimate, respectively [25, Section II-B]. We have
(17) | |||||
(18) | |||||
where . Then it follows that
(19) | |||||
(20) | |||||
From the meaning of filtering and smoothing, it is natural to think that is smaller than . Hence it is expected that
(21) |
i.e., holds. (In the derivation, , which is equivalent to a kind of stationarity, has been used.) However, this is not always true (see Section IV).
Since the meaning of has been clarified, we can derive the joint distribution (denoted ) of and . In fact, we have the following.
Proposition 1 ([26])
is given by
(22) | |||||
where .
Proof:
is obvious. Let us show that . Noting, for example,
we have
(23) | |||||
(Remark: is a multiple integral on the infinite interval, i.e., an improper integral. Consider a finite interval . Since is continuous on , a repeated integral is possible on and we have
Taking the limit as and as , the above equality is obtained.)
Next, let us calculate the marginal distribution of . We have
(24) | |||||
Note that this is just the distribution of . Similarly, we have
(25) |
where the right-hand side is the distribution of . All these facts show that is the joint distribution of and . ∎
III Mean-Square Error in Estimating the Input
Consider the signal model:
(26) |
where represents a signal of interest and represents random noise. We assume that and are mutually independent. Since we cannot know the value of directly, we have to estimate it based on the observation . The error of an estimate, denoted , of the input based on the observation can be measured in the mean-square sense
(27) |
where “T” means transpose. It is well known [16, 18, 21, 27] that the minimum of the above value is achieved by the conditional expectation
(28) |
is the least-squares estimate and the corresponding estimation error is referred to as the minimum mean-square error (MMSE) [6]. In the following, the value of the MMSE is denoted by “mmse”.
Remark: Let be the -field generated by . Also, denote by the set of elements in which are measurable, where is the set of square integrable random variables. Then we have , where is the orthogonal projection of onto the space [18, 21, 27]. If and are jointly Gaussian, then we have [18, 21], where is the Gaussian space [18, 21] generated by . Note that is a subspace of .
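As a numerical illustration that the conditional expectation attains the MMSE, the sketch below uses the scalar model y = sqrt(snr) x + n with equiprobable x = +/-1 (an assumed concrete instance of the signal model in Section I). For this model the conditional mean is E[x | y] = tanh(sqrt(snr) y); its empirical mean-square error is compared with a hard decision and with the best linear estimate.

```python
import numpy as np

rng = np.random.default_rng(3)
snr, n = 1.0, 500_000

x = rng.choice([-1.0, 1.0], size=n)            # equiprobable binary input
y = np.sqrt(snr) * x + rng.standard_normal(n)  # observation

cond_mean = np.tanh(np.sqrt(snr) * y)          # E[x | y] for this model
hard      = np.sign(y)                         # hard decision
linear    = np.sqrt(snr) / (1.0 + snr) * y     # best linear (LMMSE) estimate

for name, est in [("conditional mean", cond_mean),
                  ("hard decision", hard),
                  ("linear LMMSE", linear)]:
    print(f"{name:16s}: MSE = {np.mean((est - x) ** 2):.4f}")
# The conditional mean attains the smallest mean-square error (the MMSE).
```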
Kailath [11, 12] applied the innovations method to linear filtering/smoothing problems. Suppose that the observations are given by
(29) |
where is a zero-mean finite-variance signal process and is a zero-mean white Gaussian noise. It is assumed that has a covariance matrix , where is Kronecker’s delta. The innovation process is defined by
(30) |
where is the linear least-squares estimate of given . Kailath [11, Section III-B] showed the following.
Proposition 2 (Kailath [11])
The covariance matrix of is given by
(31) |
where is the covariance matrix of the error in the estimate .
We remark that this result plays an essential role in this paper.
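Proposition 2 can be checked numerically on a simple scalar example. The sketch below runs a Kalman filter on an AR(1) signal observed in white Gaussian noise (all parameters are illustrative assumptions) and verifies that the empirical innovation variance equals the one-step prediction error variance of the signal plus the noise variance.

```python
import numpy as np

rng = np.random.default_rng(4)

# AR(1) signal observed in white Gaussian noise (illustrative parameters).
a, q_var, r_var, T = 0.9, 0.5, 1.0, 200_000
s = np.zeros(T)
for t in range(1, T):
    s[t] = a * s[t - 1] + rng.normal(scale=np.sqrt(q_var))
z = s + rng.normal(scale=np.sqrt(r_var), size=T)

# Scalar Kalman filter; collect innovations and the one-step prediction error variance.
s_hat, p = 0.0, 1.0
innov = np.zeros(T)
for t in range(T):
    s_pred = a * s_hat                  # one-step prediction of the signal
    p_pred = a * a * p + q_var          # prediction error variance of the signal
    innov[t] = z[t] - s_pred            # innovation
    k = p_pred / (p_pred + r_var)       # Kalman gain
    s_hat = s_pred + k * innov[t]
    p = (1.0 - k) * p_pred

print("empirical innovation variance :", innov[T // 2:].var())
print("P_pred + R  (Proposition 2)   :", p_pred + r_var)   # last p_pred is in steady state
```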
III-A Covariance Matrix Associated with the Input to the Main Decoder
In this section, by assuming that is the innovation, we calculate the covariance matrix of . Then the MMSE in estimating the input is obtained from the associated covariance matrix. As preparation, we present a lemma.
Lemma 3 ([26])
Suppose that . The following quantities have the same value:
(32) | |||||
(33) | |||||
(34) | |||||
(35) |
The common value is denoted by .
Remark: implies that and are mutually independent.
Proof:
Suppose that and are given. From the definition of , we obtain a system of linear equations:
(36) |
This can be solved as
(37) |
where is an arbitrary constant. We remark that the probabilities are determined by , which in turn restricts the value of . Since , must satisfy the following:
(38) |
Note that for [25, Lemma 13]. Hence we have . Accordingly, the value of is restricted to
(39) |
It is shown that is also satisfied for .
Now we have
(40) | |||||
We see that the remaining three quantities are equal also to . ∎
Let us show that . We need the following.
Lemma 4
(41) |
holds, where errors are statistically independent of each other.
Proof:
See Appendix A. ∎
Now we have the following.
Lemma 5
Suppose that . Then holds.
Proof:
See Appendix B. ∎
Proposition 3 ([26])
The covariance matrix associated with is given by
(44) | |||||
(47) |
Proof:
Note the relation
(48) |
where
(49) |
The first term is calculated as
(50) | |||||
Also, we have
(51) | |||||
Then it follows that
(52) | |||||
Similarly, we have
(53) | |||||
Let us calculate . Note this time the relation
(54) |
Applying Lemma 3, we have
(55) | |||||
Since
(56) |
it follows that
(57) | |||||
∎
III-B MMSE in Estimating the Input
Note Proposition 2. If corresponds to the innovation, then the associated covariance matrix (denoted ) is expressed as the sum of two matrices and , i.e., . Here, is the covariance matrix of the estimation error of the signal, whereas is the covariance matrix of the observation noise. From the definition of (see Section I), it follows that
(58) |
and hence we have
(59) |
Moreover, since the signal is expressed as , the covariance matrix (denoted ) of the estimation error of becomes
(60) |
Hence the corresponding MMSE is given by
(61) |
where is the trace of matrix .
Remark: The estimate in [11] is the “linear” least-squares estimate. On the other hand, the best estimate is given by the conditional expectation , which is in general a highly “nonlinear” functional of (cf. is not Gaussian). Note that if are jointly Gaussian, then is a linear functional of [16, Section II-D]. Hence to be exact, we have
(62) |
where the left-hand side corresponds to the conditional expectation, whereas the right-hand side corresponds to the linear least-squares estimate. As stated above, the least-squares estimate is in general a nonlinear functional of the past observations. Hence we regard the right-hand side as an approximation to the MMSE.
III-C Modification of the Derived MMSE
Recall that and . That is, the MMSE in the estimation of is expressed in terms of the distribution of for the main decoder in an SST Viterbi decoder. We remark that and are not mutually independent. As a result, and are correlated. Hence the simple sum of and is not appropriate for the MMSE, and some modification that accounts for the degree of correlation is necessary. The variable (see Lemma 3) is closely connected with the dependence between and . When they are weakly correlated, it is small, whereas when they are strongly correlated, it is large. This observation suggests that a modification
(63) |
is more appropriate for the MMSE, where is some constant. In the following, we discuss how to determine the value of . We have the following.
Proposition 4
Suppose that . We have
(64) |
where .
Proof:
Note the relation: . is the covariance matrix associated with and is positive semi-definite. (i.e., the identity matrix of size ) is clearly positive semi-definite. Hence is positive semi-definite and hence [23], where “” denotes the determinant. Since , we have . ∎
From Proposition 4, we have
So far we have carried out numerical calculations for four QLI codes (regarded as general codes) and one general code (these codes are defined in Sections IV and V). Then we have found that the difference between the values of and is small. Note that this fact is derived from the structure of . Let
(65) |
Also, denote by the number of terms (i.e., ) contained in . We see that is determined by . Hence if the difference between and is small, then holds. For example, it is shown (see Section IV-B) that
-
1)
and for ,
-
2)
and for .
Thus we have . As a consequence, we have approximately
(66) |
That is,
holds with high probability.
Now we can show that this inequality always holds. That is, we have the following.
Proposition 5
Suppose that . We have
(67) |
where .
Proof:
See Appendix C. ∎
Remark: Let
(70) | |||||
(73) |
Then corresponds to . Denote by the correlation coefficient between and , i.e.,
(74) |
Note that . We have
(75) |
Now we restrict to . As the special cases, the following hold:
-
1)
: .
-
2)
: .
Hence represents the correction term depending on the degree of correlation.
Based on Proposition 5, we finally set
mmse | (76) | ||||
(77) |
as the MMSE in the estimation of .
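As an aside on the correction term, the following minimal sketch illustrates how a joint probability of simultaneous errors induces correlation between two error indicators. Here q is read, for illustration only, as the probability that both indicators equal 1 (the common value in Lemma 3 may be defined differently), and p1, p2, q are assumed values; the off-diagonal covariance entry q - p1*p2 vanishes exactly in the independent case.

```python
import numpy as np

# Two Bernoulli error indicators with marginal probabilities p1, p2 and an
# assumed joint probability q = P(e1 = 1, e2 = 1); all three values are illustrative.
p1, p2, q = 0.10, 0.12, 0.06

cov12 = q - p1 * p2                        # off-diagonal covariance entry
var1, var2 = p1 * (1 - p1), p2 * (1 - p2)  # diagonal entries
cov = np.array([[var1, cov12], [cov12, var2]])
rho = cov12 / np.sqrt(var1 * var2)         # correlation coefficient

print("covariance matrix:\n", cov)
print("correlation coefficient:", rho)
print("independence corresponds to q =", p1 * p2, "(then rho = 0)")
```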
III-D In the Case of QLI Codes
Consider a QLI code whose generator matrix is given by
Let be the input to the main decoder in an SST Viterbi decoder. We see that the results in the previous sections hold for as well. Therefore, we state only the results.
Proposition 6 ([26])
The joint distribution of and (denoted ) is given by
(78) | |||||
where .
Lemma 6 ([26])
Assume that . The following quantities have the same value:
(79) | |||||
(80) | |||||
(81) | |||||
(82) |
The common value is denoted by .
Proposition 7 ([26])
The covariance matrix associated with is given by
(85) | |||||
(88) |
Proposition 8
Suppose that . We have
(89) |
where .
Proposition 9
Suppose that . We have
(90) |
where .
Finally, for QLI codes we set
mmse | (91) | ||||
(92) |
as the MMSE in the estimation of .
IV Mutual Information and MMSEs
We have shown that the MMSE in the estimation of is expressed in terms of the distribution of for the main decoder in an SST Viterbi decoder. More precisely, we have derived the following:
-
1)
As a general code: .
-
2)
As a QLI code: .
In this section, we discuss the validity of the derived MMSE from the viewpoint of mutual information and mean-square error.
In this paper, we have used the signal model (see Section I):
Since , it follows that
(93) |
Then letting
(94) |
the above equation is rewritten as
(95) |
Note that this is just the signal model used in [6] (i.e., ). Guo et al. [6] discussed the relation between the input-output mutual information and the MMSE in estimating the input in Gaussian channels. Their relation holds for discrete-time and continuous-time noncausal (smoothing) MMSE estimation regardless of the input statistics (i.e., not necessarily Gaussian). Here recall from [25] that the innovations approach to Viterbi decoding of QLI codes has a close connection with smoothing in the linear estimation theory. We therefore thought that the results in [6] could be used to discuss the validity of the MMSE obtained in the previous section. Hence the argument in this section is based on that of Guo et al. [6].
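Before specializing to our codes, the central relation of Guo et al. [6] can be checked numerically in the scalar case. The sketch below is a minimal check assuming the scalar model y = sqrt(snr) x + n with equiprobable x = +/-1 and unit-variance noise; it evaluates the input-output mutual information by numerical integration and compares its derivative with half the noncausal MMSE (the I-MMSE relation, in nats).

```python
import numpy as np
from scipy.integrate import quad

def out_pdf(y, snr):
    """Output density of y = sqrt(snr)*x + n with x = +/-1 equiprobable, n ~ N(0,1)."""
    a = np.sqrt(snr)
    g = lambda m: np.exp(-0.5 * (y - m) ** 2) / np.sqrt(2.0 * np.pi)
    return 0.5 * (g(a) + g(-a))

def mutual_info(snr):
    """I(X;Y) in nats: h(Y) - h(Y|X), with h(Y|X) = 0.5*log(2*pi*e)."""
    h_y = quad(lambda y: -out_pdf(y, snr) * np.log(out_pdf(y, snr)), -30.0, 30.0)[0]
    return h_y - 0.5 * np.log(2.0 * np.pi * np.e)

def noncausal_mmse(snr):
    """E[(X - E[X|Y])^2] = 1 - E[tanh(sqrt(snr)*Y)^2]."""
    a = np.sqrt(snr)
    return 1.0 - quad(lambda y: out_pdf(y, snr) * np.tanh(a * y) ** 2, -30.0, 30.0)[0]

snr, d = 1.0, 1e-4
deriv = (mutual_info(snr + d) - mutual_info(snr - d)) / (2.0 * d)
print("dI/dsnr     :", deriv)
print("mmse(snr)/2 :", noncausal_mmse(snr) / 2.0)   # the two values agree (I-MMSE relation)
```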
IV-A General Codes
Let . Also, let be the conditional expectation of given . According to Guo et al. [6, Section IV-A], let us define as follows:
(96) |
where represents the one-step prediction MMSE. Note that this is a function of . This is because depends on (cf. is a function of ).
Remark 1: The above definition is slightly different from that in [6], where is defined as
(97) |
where . In order to distinguish it from ours, theirs is denoted by in this section. Set . We have
(98) | |||||
Here note the following:
(99) | |||||
(100) | |||||
Hence
(101) | |||||
holds. Thus we have
(102) | |||||
It is confirmed from Remark 1 that the relation in [6, Theorem 9] still holds. In fact, we have
(103) | |||||
Then it follows that
(104) |
Our argument is based on the innovations approach proposed by Kailath [11] (cf. [25]). Also, our model is a discrete-time one. Hence it is reasonable to think that the MMSE associated with the estimation of corresponds to (see Proposition 2). That is, it is appropriate to set
(105) |
Moreover, we set
(106) |
Note that the left-hand side is the average prediction MMSE (see [6, Section IV-A]). We finally have
(107) |
On the other hand, the mutual information between and [2, Section 9.4] is evaluated as
(108) |
Then we have
(109) |
Remark 2: The essence of our argument lies in the equality
Meanwhile, using the signal model, is calculated as follows. Since and are mutually independent, we have
(110) | |||||
where “var” denotes the variance. Hence
(111) |
On the other hand, using the inequality: , we have
(112) |
Hence
actually holds.
From the above argument, it is expected that and are closely related. Then in the next section, we will discuss the relation between them.
IV-B Numerical Results as General Codes
In order to examine the relation between and , numerical calculations have been carried out. We have taken two QLI codes:
-
1)
: The generator matrix is defined by .
-
2)
: The generator matrix is defined by .
By regarding these QLI codes as “general” codes, we have compared the values of and . The results are shown in Tables I and II, where is simply denoted . The results for and are shown also in Fig.1 and Fig.2, respectively.
[Fig. 1: conl-2-G.ps]
[Fig. 2: conl-6-G.ps]
With respect to the behaviors of and , we have the following:
(i) :
-
1)
: Since , .
-
2)
: Since , .
Note that approaches slowly as the SNR increases.
(ii) :
-
1)
: Since and since ,
(113) -
2)
: Since and since ,
(114)
From Figs. 1 and 2, we observe that the behaviors of for and are considerably different. Let us examine the cause of the difference.
Let be the encoded block for the main decoder. We have already shown that the degree of correlation between and is expressed in terms of (see Lemma 3). We also see that the degree of correlation between and has a close connection with the number of error terms (denoted ) by which and differ. In fact, when is small, and have a strong correlation, whereas when is large, and have a weak correlation. These observations show that is closely related to . Then it is expected that the behavior of depends heavily on the value of (i.e., on a code). In order to confirm this fact, let us evaluate the values of for and . Note that since the encoded block is expressed as , is obtained from .
: The inverse encoder is given by
(115) |
Hence from
(116) |
it follows that (i.e., bold-faced terms).
: The inverse encoder is given by
(117) |
Then it is shown that
(118) |
We see that .
Now compare the values of in Tables I and II. We observe that they are considerably different. For example, at the SNR of , we have
(119) |
We think the difference is due to the fact
(120) |
means that there is a certain correlation between and and then we have (i.e., ). This value is fairly large. On the other hand, implies that and have a weak correlation, which results in a small value of .
We have already seen that the behavior of is dependent on . Also, it has been shown that the values of for and are considerably different. We think these explain the behaviors of curves in Figs. 1 and 2.
As observed above, since the behavior of varies with the codes, it is appropriate to take the average over codes. Then, in addition to and , we have taken two more QLI codes and whose generator matrices are defined by
(121) |
and
(122) |
respectively. It is shown that and have and , respectively. Then we have averaged the corresponding ’s. The result is shown in Fig.3, where denotes the average value of over four codes. We think the relation between and is shown more properly in Fig.3.
[Fig. 3: conl-2346-G.ps]
In Figs. 1, 2, and 3, we observe that holds at low-to-medium SNRs. However, the sign of inequality is reversed at high SNRs. We think this comes from the fact that approaches rapidly as the SNR increases, whereas approaches slowly as the SNR increases.
IV-C QLI Codes
In [25], we have shown that the innovations approach to Viterbi decoding of QLI codes is related to “smoothing” in the linear estimation theory. As a result, the relationship between the mutual information and the MMSE can be discussed using the result in [6, Corollary 3]. For this purpose, we provide a proposition.
Proposition 10
Let , where is the equiprobable binary input with unit variance. Each is assumed to be independent of all others. Also, let be a standard Gaussian vector. Moreover, let be a standard Gaussian vector which is independent of and . Then we have
(123) |
Proof:
We follow Guo et al. [6]. Let , where . We have
(124) | |||||
where . Then it follows from [6, Theorems 1 and 2] that
(125) | |||||
(Remark 1: This relation holds for any random vector satisfying (cf. [6]).)
Remark 2: The definition of is slightly different from that in [6]. However, since the whole observations are used in each conditional expectation, the above equality holds also under our definition.
First note the left-hand side, i.e., . Let be Gaussian. It follows from Proposition 10 that
(130) |
Since is Gaussian, we have
(131) |
and hence
(132) |
holds. Therefore,
(133) |
is obtained.
Next, note the right-hand side, i.e., . For , we have
(134) | |||||
Our concern is the evaluation of
(135) | |||||
For this purpose, we assume the approximation:
(136) |
Hence in the relation
(137) | |||||
we replace by . Then we have
(138) |
Note that the above approximation is equivalent to setting
(139) |
where the left-hand side represents the average noncausal MMSE (see [6, Section IV-A]).
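For orientation, the scalar noncausal MMSE of the equiprobable binary input can be compared with the Gaussian-input value 1/(1+snr). The sketch below is a generic Monte Carlo illustration under the scalar model y = sqrt(snr) x + n, not the code-dependent average noncausal MMSE discussed above; it shows that the binary input is easier to estimate at every snr.

```python
import numpy as np

rng = np.random.default_rng(5)

def binary_mmse_mc(snr, n=400_000):
    """Monte Carlo estimate of E[(x - E[x|y])^2] for x = +/-1 and y = sqrt(snr)*x + n."""
    x = rng.choice([-1.0, 1.0], size=n)
    y = np.sqrt(snr) * x + rng.standard_normal(n)
    return np.mean((x - np.tanh(np.sqrt(snr) * y)) ** 2)

for snr in [0.25, 1.0, 4.0]:
    print(f"snr = {snr}: binary-input mmse = {binary_mmse_mc(snr):.4f}, "
          f"Gaussian-input mmse = {1.0 / (1.0 + snr):.4f}")
```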
IV-D Numerical Results as QLI Codes
In order to confirm the validity of the derived equation, numerical calculations have been carried out. We have used the same QLI codes and as in the previous section. In this case, we regard these codes as inherent QLI codes and compare the values of and . The results are shown in Tables III and IV, where is simply denoted . The results for and are shown also in Fig.4 and Fig.5, respectively.
[Fig. 4: conl-2-Q.ps]
[Fig. 5: conl-6-Q.ps]
First note the following.
Lemma 7
Let . Then
(140) |
holds.
Proof:
It is shown [7, Theorem 63] that
(141) |
Letting , we have
Furthermore, letting , we have
This is equivalent to
∎
The behavior of is similar to that of . We have the following:
(i) :
-
1)
: Since , .
-
2)
: Since , .
Note that approaches more rapidly as the SNR increases compared with .
(ii) :
-
1)
: Since and since ,
(142) -
2)
: Since and since ,
(143)
From Tables III and IV (or Figs. 4 and 5), we observe that the behaviors of for and are almost the same. This is because both and are regarded as QLI codes and hence have the same , which results in almost equal values of . Also, observe that provides a good approximation to (Figs. 4 and 5).
V Discussion
In the previous section, we have discussed the validity of the derived MMSE using the results of Guo et al. [6]. In fact, we have examined
-
1)
the relation between and ,
-
2)
the relation between and .
Note that the relation between the mutual information and the MMSE has been reduced to the relation between a function of and the MMSE. That is, their relation has been examined not directly but indirectly. We now see that the following approximation and evaluations are used in the argument in Section IV:
-
i)
().
-
ii)
.
-
iii)
.
Since () does not depend on , averaging in i) seems to be reasonable. (But it has to be justified.) On the other hand, inequalities ii) and iii) come from the fact that the input in our signal model is not Gaussian and hence the input-output mutual information cannot be obtained in a concrete form. (In the scalar case, the mutual information of the Gaussian channel with equiprobable binary input has been given (see [6, p.1263]). In our case, however, since the input is taken as a vector, we have used the conventional inequality on mutual information.) Consider the case that a QLI code is regarded as a general code. It follows from i) and ii) that
Similarly, in the case that a QLI code is regarded as the inherent QLI code, it follows from i) and iii) that
Hence evaluations of and are important in our discussions. It is expected that inequalities ii) and iii) are tight at low SNRs, whereas they are loose at high SNRs (cf. [6, p.1263]). As a result, if approximation i) is appropriate, then the crossing of the two curves in the figures is understandable. Hence the validity of approximation i) has to be further examined. Nevertheless, in the case that a given QLI code is regarded as the inherent QLI code, it has been shown that provides a good approximation to . This is quite remarkable considering the fact that is a function of , whereas depends on concrete convolutional coding. We think this fact implies the validity of our inference and the related results.
Here note the following: () is a function only of , whereas () is dependent on the encoded block for the main decoder. Hence at first glance, the comparison seems to be inappropriate. Although actually depends on coding, it is a function of the channel error probability . Since (cf. ), is also a function of . Hence the above comparison is justified.
By the way, the argument in Section IV is based on the signal model given in Section I. Note that is determined by the encoded symbol and hence this signal model clearly depends on convolutional coding. However, since each is independent of all others, can be seen as a random variable taking values with equal probability. When is seen in this way, convolutional coding does not appear explicitly in the expression . In other words, we cannot see from the signal model how is generated. On the other hand, consider the MMSE in estimating the input based on the observations . If the signal model is interpreted as above, then the MMSE seems to be independent of concrete convolutional coding. But this is not true. As we have already seen, the MMSE is finally replaced by (), which is an essential step. By way of this replacement, the MMSE is connected with convolutional coding. That is, convolutional coding is actually reflected in the MMSE.
From the results in Section IV, it seems that our argument is more convincing for QLI codes. We know that an SST Viterbi decoder functions well for QLI codes compared with general codes. In fact, QLI codes are preferable from a likelihood concentration viewpoint [22, 25]. Then a question arises: How is a likelihood concentration in the main decoder related to the MMSE? Suppose that the information () for the main decoder consists of error terms. We know that a likelihood concentration in the main decoder depends on [25], whereas is affected by and hence the MMSE is dependent on .
In the case of QLI codes, we have the following.
Proposition 11
Consider a QLI code whose generator matrix is given by
We have
(144) |
Proof:
Let , where errors are mutually independent. We have
(145) | |||||
Then it follows that and differ by . Since , and differ by error terms. ∎
We observe that the relation actually holds for (see Section IV-B). The above result shows that in the case of QLI codes, a likelihood concentration in the main decoder and the degree of correlation between and (i.e., the value of ) have a close connection. We remark that the former is in principle independent of the latter; the above result, however, shows that the two notions are closely related to each other.
Note that the equality does not hold for general codes in general. Then in order to examine the relation between a likelihood concentration in the main decoder and the MMSE, we can consider a code with small and large . As an example, take the code [22] whose generator matrix is defined by
(146) |
The inverse encoder is given by
(147) |
and we have . On the other hand, it is shown that
(148) |
Then we have .
Since is small, a likelihood concentration occurs in the main decoder [22, 25]. On the other hand, is considerably large. Here recall that when is regarded as a general code, it has . This value is the same as that for . However, since has as a general code, a likelihood concentration in the main decoder is not expected at low-to-medium SNRs. Then in order to see the effect of on the values of , let us compare the values of for and . For this purpose, we have evaluated for . The result is shown in Table V.
In Table V, look at the values of . We observe that they are almost the same as those for (see Table II). That is, it seems that has little effect on the values of . This observation is explained as follows. means that a likelihood concentration in the main decoder is notable. However, this does not necessarily affect the MMSE. A likelihood concentration reduces the decoding complexity (i.e., the complexity in estimating the input). On the other hand, the MMSE is the estimation error that is finally attained after the estimation process. In other words, affects the complexity of estimation, whereas is related to the final estimation error. Put differently, the MMSE is determined by the structure of . Hence even if the 's are considerably different between two codes and , when the 's have almost equal values of , the MMSEs are close to each other. In fact, the inverse encoder for is given by
whereas the inverse encoder for is given by
The two inverse encoders are quite different, but the values of calculated from are equal, which results in almost the same MMSE.
VI Conclusion
In this paper, we have shown that the soft-decision input to the main decoder in an SST Viterbi decoder can be regarded as the innovation. Although this fact can be obtained from the definition of innovations for hard-decision data, this time we have discussed the subject from the viewpoint of mutual information and mean-square error. By combining the present result with that in [25], it has been confirmed that the input to the main decoder in an SST Viterbi decoder can be regarded as the innovation. Moreover, we have obtained an important result. Note that the MMSE has been expressed in terms of the distribution of the encoded block for the main decoder in an SST Viterbi decoder. Then through the argument, the input-output mutual information has been connected with the distribution of the encoded block for the main decoder. We think this is an extension of the relation between the mutual information and the MMSE to coding theory.
On the other hand, we have problems to be further discussed. Note that since the input is not Gaussian, the discussions are based on inequalities and approximate expressions. In particular, when a given QLI code is regarded as a general code, the difference between and is not so small. We remark that this comparison has been done based on two inequalities regarding evaluation of the input-output mutual information. Moreover, the discussions are based partly on numerical calculations. Hence our argument seems to be slightly less rigorous in some places. Nevertheless, we think the closeness of two curves in Figs. 4 and 5 implies that the inference and the related results in this paper are reasonable.
Appendix A Proof of Lemma 4
We use mathematical induction on .
1) is obvious.
2) Suppose that . Let . Then we have
(A.149) | |||||
Since by the assumption, we have
(A.150) | |||||
Appendix B Proof of Lemma 5
Suppose that , , and have been determined given . Here note . Let be the (error) terms common to and . Then and are expressed as
(B.151) | |||||
(B.152) |
where , , and are mutually independent. Set
(B.153) |
We have
(B.154) | |||||
(B.155) | |||||
(B.156) |
By direct calculation, it is derived that
(B.157) | |||||
Since and since (see Lemma 4), it follows that .
Appendix C Proof of Proposition 5
Suppose that , , and have been determined given . It suffices to show that
We apply the same argument as that in the proof of Lemma 5. Let , , and be the same as those for Lemma 5. We have
(C.158) | |||||
(C.159) | |||||
(C.160) |
By direct calculation, it follows that
(C.161) | |||||
Appendix D Explanation for
The mutual information of Gaussian channel with equiprobable binary input has been given (see [6, p.1263]). Let be the equiprobable binary input with unit variance. Using the relation [6, Theorem 1]
(D.162) |
we have
(D.163) | |||||
On the other hand, for the standard Gaussian input , we have
(D.164) | |||||
Accordingly, it suffices to show that
(D.165) |
Note that this inequality is equivalent to
(D.166) |
Furthermore, the above is rewritten as
(D.167) | |||||
where represents the Gaussian distribution with mean and variance .
Here note the integrands on the left-hand side. The variable appears only in Gaussian distributions. Hence using a linear approximation of , numerical integration is possible. is approximated as
(D.168) |
(cf. )
If the above inequality holds for each , then will be shown. For example, let . The target inequality becomes
(D.169) | |||||
By carrying out numerical integration, we have
(D.170) | |||||
Hence we actually have at .
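The inequality examined in this appendix can also be checked by direct numerical integration over a range of snr values. The sketch below, assuming the scalar model y = sqrt(snr) x + n, computes the mutual information definitionally as h(Y) - h(Y|X) for the equiprobable binary input and compares it with (1/2) log(1+snr) for the standard Gaussian input.

```python
import numpy as np
from scipy.integrate import quad

def binary_mi(snr):
    """I(X;Y) in nats for equiprobable binary X and Y = sqrt(snr)*X + N, N ~ N(0,1)."""
    a = np.sqrt(snr)
    pdf = lambda y: 0.5 * (np.exp(-0.5 * (y - a) ** 2)
                           + np.exp(-0.5 * (y + a) ** 2)) / np.sqrt(2.0 * np.pi)
    h_y = quad(lambda y: -pdf(y) * np.log(pdf(y)), -30.0, 30.0)[0]
    return h_y - 0.5 * np.log(2.0 * np.pi * np.e)

def gaussian_mi(snr):
    """I(X;Y) in nats for standard Gaussian X: (1/2)*log(1 + snr)."""
    return 0.5 * np.log1p(snr)

for snr in [0.5, 1.0, 2.0, 4.0]:
    print(f"snr = {snr}: binary {binary_mi(snr):.4f}  <=  Gaussian {gaussian_mi(snr):.4f}")
```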
References
- [1] S. Arimoto, Kalman Filter, (in Japanese). Tokyo, Japan: Sangyo Tosho Publishing, 1977.
- [2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ, USA: John Wiley & Sons, 2006.
- [3] T. E. Duncan, “On the calculation of mutual information,” SIAM J. Appl. Math., vol. 19, pp. 215–220, Jul. 1970.
- [4] R. Esposito, “On a relation between detection and estimation in decision theory,” Inform. Contr., vol. 12, pp. 116–120, 1968.
- [5] G. D. Forney, Jr., “Convolutional codes I: Algebraic structure,” IEEE Trans. Inf. Theory, vol. IT-16, no. 6, pp. 720–738, Nov. 1970.
- [6] D. Guo, S. Shamai (Shitz), and S. Verdú, “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005.
- [7] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, 2nd ed. Cambridge University Press, 1952. (H. Hosokawa, Inequalities, (Japanese transl.). Tokyo, Japan: Springer-Verlag Tokyo, 2003.)
- [8] K. Ito and S. Watanabe, “Introduction to stochastic differential equations,” in Proc. Intern. Symp. SDE, pp. i–xxx, Jul. 1976. (Stochastic Differential Equations, K. Ito, Ed. Tokyo: Kinokuniya Book-Store, 1978.)
- [9] A. H. Jazwinski, Stochastic Processes and Filtering Theory. New York: Academic Press, 1970.
- [10] T. T. Kadota, M. Zakai, and J. Ziv, “Mutual information of the white Gaussian channel with and without feedback,” IEEE Trans. Inf. Theory, vol. IT-17, no. 4, pp. 368–371, Jul. 1971.
- [11] T. Kailath, “An innovations approach to least-squares estimation–Part I: Linear filtering in additive white noise,” IEEE Trans. Automatic Control, vol. AC-13, no. 6, pp. 646–655, Dec. 1968.
- [12] T. Kailath and P. Frost, “An innovations approach to least-squares estimation–Part II: Linear smoothing in additive white noise,” IEEE Trans. Automatic Control, vol. AC-13, no. 6, pp. 655–660, Dec. 1968.
- [13] T. Kailath, “A note on least-squares estimates from likelihood ratios,” Inform. Contr., vol. 13, pp. 534–540, 1968.
- [14] T. Kailath, “A general likelihood-ratio formula for random signals in Gaussian noise,” IEEE Trans. Inf. Theory, vol. IT-15, no. 3, pp. 350–361, May 1969.
- [15] T. Kailath, “A view of three decades of linear filtering theory (Invited paper),” IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 146–181, Mar. 1974.
- [16] T. Kailath and H. V. Poor, “Detection of stochastic processes (Invited paper),” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2230–2259, Oct. 1998.
- [17] S. Kubota, S. Kato, and T. Ishitani, “Novel Viterbi decoder VLSI implementation and its performance,” IEEE Trans. Commun., vol. 41, no. 8, pp. 1170–1178, Aug. 1993.
- [18] H. Kunita, Estimation of Stochastic Processes (in Japanese). Tokyo, Japan: Sangyo Tosho Publishing, 1976.
- [19] P. Malliavin, “Stochastic calculus of variation and hypoelliptic operators,” in Proc. Intern. Symp. SDE, pp. 195–263, Jul. 1976. (Stochastic Differential Equations, K. Ito, Ed. Tokyo: Kinokuniya Book-Store, 1978.)
- [20] J. L. Massey and D. J. Costello, Jr., “Nonsystematic convolutional codes for sequential decoding in space applications,” IEEE Trans. Commun. Technol., vol. COM-19, no. 5, pp. 806–813, Oct. 1971.
- [21] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, 5th ed. Berlin, Germany: Springer-Verlag, 1998.
- [22] S. Ping, Y. Yan, and C. Feng, “An effective simplifying scheme for Viterbi decoder,” IEEE Trans. Commun., vol. 39, no. 1, pp. 1–3, Jan. 1991.
- [23] G. Strang, Linear Algebra and Its Applications. New York: Academic Press, 1976. (M. Yamaguchi and A. Inoue, Linear Algebra and Its Applications, (Japanese transl.). Tokyo, Japan: Sangyo Tosho Publishing, 1978.)
- [24] M. Tajima, K. Shibata, and Z. Kawasaki, “On the equivalence between Scarce-State-Transition Viterbi decoding and syndrome decoding of convolutional codes,” IEICE Trans. Fundamentals, vol. E86-A, no. 8, pp. 2107–2116, Aug. 2003.
- [25] M. Tajima, “An innovations approach to Viterbi decoding of convolutional codes,” IEEE Trans. Inf. Theory, vol. 65, no. 5, pp. 2704–2722, May 2019.
- [26] M. Tajima, “Corrections to “An innovations approach to Viterbi decoding of convolutional codes”,” IEEE Trans. Inf. Theory, to be published.
- [27] D. Williams, Probability with Martingales. Cambridge University Press, 1991. (J. Akahori et al., Probability with Martingales, (Japanese transl.). Tokyo, Japan: Baifukan, 2004.)
- [28] E. Wong, Stochastic Processes in Information and Dynamical Systems. New York: McGraw-Hill, 1971.
- [29] M. Zakai, “On mutual information, likelihood ratios, and estimation error for the additive Gaussian channel,” IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3017–3024, Sep. 2005.