
On Finite-Time Mutual Information

Jieao Zhu, Zijian Zhang, Zhongzhichao Wan, and Linglong Dai
Beijing National Research Center for Information Science and Technology (BNRist)
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
E-mails: zja21, zhangzj20, [email protected]; [email protected]
Abstract

The Shannon-Hartley theorem can accurately calculate the channel capacity when the signal observation time is infinite. However, calculating the finite-time mutual information, which remains an open problem, is essential for guiding the design of practical communication systems. In this paper, we investigate the mutual information between two correlated Gaussian processes within a finite-time observation window. We first derive the finite-time mutual information by providing a limit expression. Then we numerically compute the mutual information within a single finite-time window. We reveal that the number of bits transmitted per second within the finite-time window can exceed the mutual information averaged over the entire time axis, a phenomenon we call the exceed-average phenomenon. Furthermore, we derive a finite-time mutual information formula under a typical signal autocorrelation case by utilizing the Mercer expansion of trace-class operators, and reveal the connection between the finite-time mutual information problem and operator theory. Finally, we analytically prove the existence of the exceed-average phenomenon in this typical case, and demonstrate its compatibility with the Shannon capacity.

Index Terms:
Finite-time mutual information, exceed-average, Mercer expansion, trace class, operator theory.

I Introduction

The Shannon-Hartley theorem [1] has accurately revealed the fundamental theoretical limit of the information transmission rate $C$, also called the Shannon capacity, over a Gaussian waveform channel of limited bandwidth $W$. The expression for the Shannon capacity is $C=W\log\left(1+S/N\right)$, where $S$ and $N$ denote the signal power and the noise power, respectively. The derivation of the Shannon-Hartley theorem heavily depends on the Nyquist sampling principle [2]. The Nyquist sampling principle, also named the $2WT$ theorem [3], claims that one can only obtain $2WT+o(2WT)$ independent samples within an observation time window $T$ in a channel band-limited to $W$ [4].

Based on the Nyquist sampling principle, the Shannon capacity is derived by first multiplying the capacity $\frac{1}{2}\log(1+P/N)$ of a Gaussian symbol channel [5, p.249] by $2WT+o(2WT)$, then dividing the result by $T$, and finally letting $T\rightarrow\infty$. Note that this approximation only holds when $T\to\infty$. Therefore, the Shannon capacity holds only asymptotically as $T$ becomes sufficiently large; when $T$ is finite, the approximation fails, and the Shannon-Hartley theorem cannot be directly applied to calculate the capacity in a finite-time window. To the best of our knowledge, the evaluation of the finite-time mutual information has not yet been investigated in the literature. It is worth noting that real-world communication systems transmit signals in a finite-time window, so evaluating the finite-time mutual information is of practical significance.

In this paper, to fill this gap, we analyze the finite-time mutual information instead of the traditional infinite-time counterpart, and prove the existence of the exceed-average phenomenon within a finite-time observation window (simulation codes will be provided to reproduce the results presented in this paper: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html). Specifically, our contributions are summarized as follows:

  • We derive the mutual information expressions within a finite-time observation window by using dense sampling and limiting methods. In this way, we can overcome the difficulties that arise when analyzing the information contained in a continuous time interval. These finite-time mutual information expressions make the analysis of finite-time problems possible.

  • We conduct numerical experiments based on the discretized formulas. In the numerical results under a special setting, we reveal the exceed-average phenomenon, i.e., the mutual information within a finite-time observation window exceeds the mutual information averaged over the whole time axis.

  • In order to analytically prove the exceed-average phenomenon, we first derive an analytical finite-time mutual information formula based on the Mercer expansion [6], which reveals the connection between the mutual information problem and operator theory [7]. To make the problem tractable, we construct a typical case in which the transmitted signal has certain statistical properties. Utilizing this construction, we obtain a closed-form mutual information solution in this typical case, which leads to a rigorous proof of the exceed-average phenomenon.

Organization: In the rest of this paper, the finite-time mutual information is formulated and evaluated numerically in Section II, where the exceed-average phenomenon is first discovered. Then, in Section III, we derive a closed-form finite-time mutual information formula under a typical case. Based on this formula, in Section IV, the exceed-average phenomenon is rigorously proved. Finally, conclusions are drawn in Section V.

Notations: $X(t)$ denotes a Gaussian process; $R_{X}(t_{1},t_{2})$ denotes the autocorrelation function; $S_{X}(f)$ and $S_{X}(\omega)$ denote the power spectral density (PSD) of the corresponding process $X(t)$; boldface italic symbols such as $\bm{X}(t_{1}^{n})$ denote the column vector generated by taking samples of $X(t)$ at instants $t_{i},\ 1\leq i\leq n$; upper-case boldface letters such as ${\bf{\Phi}}$ denote matrices; $\mathbb{E}\left[\cdot\right]$ denotes the expectation; $\mathbbm{1}_{A}(\cdot)$ denotes the indicator function of the set $A$; $\mathcal{L}^{2}([0,T])$ denotes the collection of all square-integrable functions on the window $[0,T]$; ${\rm i}$ denotes the imaginary unit.

II Numerical Analysis of the Finite-Time Mutual Information

In this section, we numerically evaluate the finite-time mutual information. In Subsection II-A, we model the transmission problem by Gaussian processes, and derive the mutual information expressions within a finite-time observation window; In Subsection II-B, we approximate the finite-time mutual information by discretized matrix-based formulas; In Subsection II-C, we reveal the exceed-average phenomenon by numerically evaluating the finite-time mutual information.

II-A The Expressions for Finite-Time Mutual Information

Inspired by [8], we model the transmitted signal by a zero-mean stationary Gaussian stochastic process, denoted as $X(t)$, and the received signal by $Y(t):=X(t)+N(t)$. The noise process $N(t)$ is also a stationary Gaussian process, independent of $X(t)$. The receiver can only observe the signal within a finite-time window $[0,T]$, where $T>0$ is the observation window span. We aim to find the maximum mutual information that can be acquired within this time window.

To analytically express the amount of acquired information, we first introduce $n$ sampling points inside the time window, denoted by $(t_{1},t_{2},\cdots,t_{n}):=t_{1}^{n}$, and then let $n\rightarrow\infty$ to approximate the finite-time mutual information. By defining ${\bm{X}}(t_{1}^{n})\equiv(X(t_{1}),X(t_{2}),\cdots,X(t_{n}))$ and ${\bm{Y}}(t_{1}^{n})\equiv(Y(t_{1}),Y(t_{2}),\cdots,Y(t_{n}))$, the mutual information on these $n$ samples can be expressed as

I(t_{1}^{n})=I({\bm{X}}(t_{1}^{n});{\bm{Y}}(t_{1}^{n})), (1)

and the finite-time mutual information is defined as

I(T)=\lim_{n\rightarrow\infty}\sup_{\{t_{1}^{n}\}\subset[0,T]}{I(t_{1}^{n})}. (2)

Then, the transmission rate $C(T)$ can be defined as $C(T)=I(T)/T$. After these definitions, we can define the limiting transmission rate $C(\infty)=\lim_{T\rightarrow\infty}{C(T)}$ by letting $T\rightarrow\infty$.

II-B Discretization

Without loss of generality, we fix the sampling instants uniformly at fractions of $T$: $t_{i}=(i-1)T/n,\ 1\leq i\leq n$. Since the random vectors ${\bm{X}}(t_{1}^{n})$ and ${\bm{Y}}(t_{1}^{n})$ are samples of Gaussian processes, they are both Gaussian random vectors with mean zero and covariance matrices ${\bm{K}}_{X}$ and ${\bm{K}}_{Y}$, where ${\bm{K}}_{X},{\bm{K}}_{Y}\in\mathbb{R}^{n\times n}$ are symmetric positive-definite matrices defined as

({\bm{K}}_{X})_{i,j} = R_{X}(t_{i},t_{j}) := \mathbb{E}\left[X(t_{i})X(t_{j})\right], (3)
({\bm{K}}_{Y})_{i,j} = R_{Y}(t_{i},t_{j}) := \mathbb{E}\left[Y(t_{i})Y(t_{j})\right].

Note that $Y(t)$ is the independent sum of $X(t)$ and $N(t)$; thus the autocorrelation functions satisfy $R_{Y}(t_{1},t_{2})=R_{X}(t_{1},t_{2})+R_{N}(t_{1},t_{2})$, and similarly the covariance matrices satisfy ${\bm{K}}_{Y}={\bm{K}}_{X}+{\bm{K}}_{N}$.

The mutual information $I(t_{1}^{n})$ is defined as $I(t_{1}^{n})=h({\bm{Y}}(t_{1}^{n}))-h({\bm{Y}}(t_{1}^{n})|{\bm{X}}(t_{1}^{n}))=h({\bm{Y}}(t_{1}^{n}))-h({\bm{N}}(t_{1}^{n}))$, where $h(\cdot)$ denotes the differential entropy. Utilizing the entropy formula for $n$-dimensional Gaussian vectors [5], we obtain

I(t_{1}^{n})=\frac{1}{2}\log\left(\frac{\det({\bm{K}}_{X}+{\bm{K}}_{N})}{\det({\bm{K}}_{N})}\right). (4)
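The discretized formula (4) lends itself to direct numerical evaluation. The following minimal Python sketch (our own illustration, not the authors' released code; the helper name finite_time_mi and the use of NumPy are our choices) approximates $I(t_{1}^{n})$ from arbitrary autocorrelation functions.

import numpy as np

def finite_time_mi(R_X, R_N, T, n):
    """Approximate I(t_1^n) in (4) on n uniform samples of [0, T].

    R_X, R_N: vectorized autocorrelation functions taking (t1, t2) arrays.
    Returns 0.5 * log det((K_X + K_N) K_N^{-1}) in nats.
    """
    t = np.arange(n) * T / n                       # sampling instants t_i = (i-1)T/n
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    K_X = R_X(T1, T2)                              # (K_X)_{ij} = R_X(t_i, t_j), cf. (3)
    K_N = R_N(T1, T2)
    _, logdet_Y = np.linalg.slogdet(K_X + K_N)     # log det(K_X + K_N)
    _, logdet_N = np.linalg.slogdet(K_N)           # log det(K_N)
    return 0.5 * (logdet_Y - logdet_N)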

II-C Numerical Analysis

In order to study the properties of the mutual information $I(t_{1}^{n})$ as a function of $n$, we set the autocorrelation functions of the signal process $X(t)$ and the noise process $N(t)$ to the following special case:

R_{X}(t_{1},t_{2}) = \mathrm{sinc}(10(t_{1}-t_{2})), (5)
R_{N}(t_{1},t_{2}) = \exp(-\lvert t_{1}-t_{2}\rvert),

where $\mathrm{sinc}(x):=\sin(\pi x)/(\pi x)$, and the corresponding PSDs are $S_{X}(f)=0.1\times\mathbbm{1}_{\{-5\leq f\leq 5\}}$ and $S_{N}(f)=\frac{2}{1+(2\pi f)^{2}}$. In order to compare the finite-time mutual information with the averaged mutual information, the Shannon limit with colored noise spectrum $S_{N}(f)$ is utilized, which is a generalized version of the well-known formula $C=W\log\left(1+S/N\right)$. The averaged mutual information under a colored noise PSD [5] is expressed as

C_{\rm av}:=\frac{1}{2}\int_{-\infty}^{+\infty}{\log\left(1+\frac{S_{X}(f)}{S_{N}(f)}\right)\mathrm{d}f}. (6)

Then, plugging (5) into (6) yields the numerical result for $C_{\rm av}$.

We calculate the finite-time transmission rate $C(T)$ and the average mutual information $C_{\rm av}$ against the number of samples $n$ within the observation window $[0,T]$. The numerical results are collected in Fig. 1. It is shown that $I(t_{1}^{n})$ is an increasing function of $n$, and for fixed values of $T$, the approximated finite-time mutual information $I(t_{1}^{n})$ tends to a finite limit under the correlation assumptions given by (5). The most striking observation is that it is possible to obtain more information within the finite-time window $[0,T]$ than the prediction $TC_{\rm av}$ given by averaging the mutual information along the entire time axis (6). We call this phenomenon the exceed-average phenomenon.

Figure 1: A first glance at the exceed-average phenomenon. The red dashed horizontal line is the Shannon limit, and the $T=1,2,8$ curves illustrate the dependence of $C(t_{1}^{n})$ on $n$.
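For concreteness, the comparison shown in Fig. 1 can be reproduced along the following lines, reusing the finite_time_mi sketch above; the specific sample counts and the use of scipy.integrate.quad for (6) are our own illustrative choices, and the rates are reported in nats per second.

import numpy as np
from scipy.integrate import quad

# Autocorrelations of (5); np.sinc(x) = sin(pi*x)/(pi*x) matches the paper's sinc.
R_X = lambda t1, t2: np.sinc(10.0 * (t1 - t2))
R_N = lambda t1, t2: np.exp(-np.abs(t1 - t2))

# Averaged mutual information (6): S_X(f) = 0.1 on [-5, 5], S_N(f) = 2/(1+(2*pi*f)^2).
C_av, _ = quad(lambda f: 0.5 * np.log(1.0 + 0.05 * (1.0 + (2 * np.pi * f) ** 2)), -5.0, 5.0)

for T in (1.0, 2.0, 8.0):                  # window lengths used in Fig. 1
    for n in (64, 256, 1024):              # illustrative sample counts
        C_T = finite_time_mi(R_X, R_N, T, n) / T
        print(f"T={T}, n={n}: C(T) ~ {C_T:.3f} nats/s vs C_av ~ {C_av:.3f} nats/s")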

III A Closed-Form Finite-Time Mutual Information Formula

In this section, we first introduce the Mercer expansion in Subsection III-A as a basic tool for our analysis. Then we derive the series representation of the finite-time mutual information, and the corresponding power constraint in Subsection III-B, under the assumption of AWGN noise.

III-A The Mercer Expansion

By analyzing the structure of the autocorrelation functions, it is possible to obtain $I(t_{1}^{n})$ and $I(T)$ directly. In fact, if we know the Mercer expansion [6] of the autocorrelation function $R_{X}(t_{1},t_{2})$ on the interval $[0,T]$, then we can calculate $h({\bm{X}}(t_{1}^{n}))$ more easily [9]. In the following discussion, we assume the Mercer expansion of the source autocorrelation function $R_{X}(t_{1},t_{2})$ to be of the following form

\lambda_{k}\phi_{k}(t_{1})=\int_{0}^{T}{R_{X}(t_{1},t_{2})\phi_{k}(t_{2})\mathrm{d}t_{2}};\quad k>0,\ k\in\mathbb{N}, (7)

where the eigenvalues are positive, $\lambda_{k}>0$, and the eigenfunctions form an orthonormal set:

\int_{0}^{T}{\phi_{i}(t)\phi_{j}(t)\mathrm{d}t}=\delta_{ij}. (8)

Mercer's theorem [6] ensures the existence and uniqueness of the eigenpairs $(\lambda_{k},\phi_{k}(t))_{k=1}^{\infty}$, and furthermore, the kernel itself can be expanded in terms of the eigenfunctions:

R_{X}(t_{1},t_{2})=\sum_{k=1}^{+\infty}{\lambda_{k}\phi_{k}(t_{1})\phi_{k}(t_{2})}. (9)
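Although (7) is an infinite-dimensional eigenproblem, the Mercer spectrum can be approximated numerically by a Nystrom-type discretization of the kernel. The sketch below is our own construction (not part of the paper) under the assumption of a square-integrable kernel on $[0,T]^{2}$.

import numpy as np

def mercer_eigs(R_X, T, n=2000):
    """Nystrom approximation of the Mercer eigenvalues of R_X on [0, T], cf. (7).

    Discretizing the integral with step h = T/n reduces the eigenproblem to an
    n x n symmetric matrix problem; the eigenvalues of h * K approximate lambda_k.
    """
    h = T / n
    t = (np.arange(n) + 0.5) * h                  # midpoint-rule quadrature nodes
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    K = R_X(T1, T2)                               # kernel matrix R_X(t_i, t_j)
    lam = np.linalg.eigvalsh(h * K)               # symmetric kernel -> eigvalsh
    return np.sort(lam)[::-1]                     # Mercer spectrum, descending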

III-B Finite-Time Mutual Information Formula

Based on the Mercer expansion, we obtain a closed-form formula in the following Theorem 1.

Theorem 1 (Source expansion, AWGN noise)

Suppose the information source, modeled by the transmitted process $X(t)$, has autocorrelation function $R_{X}(t_{1},t_{2})$. AWGN noise of PSD $n_{0}/2$ is imposed on $X(t)$, resulting in the received process $Y(t)$. The Mercer expansion of $R_{X}(t_{1},t_{2})$ on $[0,T]$ is given by (7), satisfying (8). Then the finite-time mutual information $I(T)$ within the observation window $[0,T]$ between the processes $X(t)$ and $Y(t)$ can be expressed as

I(T)=\frac{1}{2}\sum_{k=1}^{+\infty}{\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)}. (10)
Proof:

The Mercer expansion decomposes the AWGN channel into an infinite number of independent parallel subchannels, each with signal power $\lambda_{k}$ and noise variance $n_{0}/2$. Thus, accumulating the mutual information of all these subchannels yields the finite-time mutual information $I(T)$. ∎
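As a hedged illustration, formula (10) is straightforward to evaluate once a (possibly truncated) Mercer spectrum is available, e.g., from the mercer_eigs sketch above; the function name below is our own.

import numpy as np

def finite_time_mi_awgn(lam, n0):
    """Evaluate (10): I(T) = 0.5 * sum_k log(1 + lambda_k / (n0/2)), in nats."""
    lam = np.asarray(lam, dtype=float)
    return 0.5 * np.sum(np.log1p(lam / (n0 / 2.0)))

# Example usage with the Nystrom spectrum sketched above:
#   I_T = finite_time_mi_awgn(mercer_eigs(R_X, T), n0=1.0)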

From Theorem 1 we can conclude that the finite-time mutual information of the AWGN channel is uniquely determined by the Mercer spectrum $\{\lambda_{k}\}$ of $R_{X}(t_{1},t_{2})$ within $[0,T]$. However, it remains unknown whether the series representation (10) converges. In fact, the convergence is closely related to the signal power, which is calculated in the following Lemma 1.

Lemma 1 (Operator Trace Coincides with the Power Constraint)

Given a stationary Gaussian process $X(t)$ with mean zero and autocorrelation $R_{X}(t_{1},t_{2})$, suppose the Mercer expansion of $R_{X}(t_{1},t_{2})$ on $[0,T]$ is given by (7), satisfying (8). The Mercer operator $M(\cdot):\mathcal{L}^{2}([0,T])\rightarrow\mathcal{L}^{2}([0,T])$ is defined by the integral $(M\phi)(s)=\int_{0}^{T}{R_{X}(s,\tau)\phi(\tau)\mathrm{d}\tau}$. Then the sum of all the eigenvalues $\lambda_{k}$ of the operator $M$ is equal to the signal energy $PT$ within $[0,T]$:

\mathrm{tr}(M):=\sum_{k=1}^{+\infty}{\lambda_{k}}=PT, (11)

where $P=R_{X}(0,0)$.

Proof:

Integrating both sides of (9) along the diagonal $t_{1}=t_{2}=t$ over the interval $[0,T]$ gives the conclusion immediately. ∎
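A quick numerical sanity check of Lemma 1 (under an exponential autocorrelation of the kind used later in (12), and reusing the mercer_eigs sketch above) might look as follows; the parameter values are illustrative only.

import numpy as np

# Illustrative parameters (our own choice); R_X(t1, t2) = P * exp(-alpha*|t1 - t2|).
P, alpha, T = 1.0, 1.0, 2.0
lam = mercer_eigs(lambda t1, t2: P * np.exp(-alpha * np.abs(t1 - t2)), T)
print(lam.sum(), P * T)   # the eigenvalue sum should approach the signal energy P*T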

Remark 1

From the above Lemma 1, we can conclude that the sum of the $\lambda_{k}$ is finite when $T$ is finite. It follows immediately that $I(T)<\infty$, since $\log(1+x)\leq x$.

Remark 2

The finite-time mutual information formula (10) is closely related to operator theory [7] in functional analysis. The sum of all the eigenvalues $\lambda_{k}$ is called the operator trace in linear operator theory. As mentioned in Lemma 1, the autocorrelation function $R_{X}(t_{1},t_{2})$ can be treated as a linear operator $M$ on $\mathcal{L}^{2}([0,T])$.

IV Proof of the Existence of Exceed-Average Phenomenon

In this section, we first give a proof of the existence of the exceed-average phenomenon in a typical case, and then discuss its compatibility with the Shannon-Hartley theorem.

IV-A Closed-Form Mutual Information in a Typical Case

To prove the exceed-average phenomenon, we only need to show that the finite-time mutual information is greater than the averaged mutual information in a typical case. Let us consider a finite-time communication scheme with a finitely-powered stationary transmitted-signal autocorrelation (such an autocorrelation $R_{X}(\tau)$ is often observed in many scenarios, e.g., when passing a signal with a white spectrum through an RC lowpass filter), which is specified as

R_{X}(t_{1},t_{2})=R_{X}(\tau)=P\exp(-\alpha\lvert\tau\rvert), (12)

where $\tau=t_{1}-t_{2}$, under an AWGN channel with noise PSD $n_{0}/2$. The power of the signal $X(t)$ is $P=R_{X}(0)$. According to Lemma 1, the trace of the corresponding Mercer operator $M(\cdot)$ is finite. Then the finite-time mutual information given by Theorem 1 is also finite, as shown in Remark 1. Finding the Mercer expansion is equivalent to finding the eigenpairs $(\lambda_{k},\phi_{k}(t))_{k=1}^{\infty}$. The eigenpairs are determined by the following characteristic integral equation [10]:

\lambda_{k}\phi_{k}(s)=\int_{0}^{s}{Pe^{-\alpha(s-t)}\phi_{k}(t)\mathrm{d}t}+\int_{s}^{T}{Pe^{-\alpha(t-s)}\phi_{k}(t)\mathrm{d}t}. (13)

Differentiating both sides of (13) twice with respect to $s$ yields the boundary conditions and the differential equation that $\lambda_{k},\phi_{k}$ must satisfy:

\lambda_{k}\phi_{k}^{\prime\prime}(s) = (\alpha^{2}\lambda_{k}-2\alpha P)\phi_{k}(s),\quad 0<s<T, (14)
\phi_{k}^{\prime}(0) = \alpha\phi_{k}(0),
\phi_{k}^{\prime}(T) = -\alpha\phi_{k}(T).

Let $\omega_{k}>0$ denote the resonant frequency of the above harmonic oscillator equation, and let $\phi_{k}(t)=A_{k}\cos(\omega_{k}t)+B_{k}\sin(\omega_{k}t)$ be the sinusoidal form of the eigenfunction. Using the boundary conditions, we obtain

B_{k}\omega_{k} = \alpha A_{k}, (15)
B_{k}\omega_{k}\cos(\omega_{k}T)-A_{k}\omega_{k}\sin(\omega_{k}T) = -\alpha\left(A_{k}\cos(\omega_{k}T)+B_{k}\sin(\omega_{k}T)\right).
Figure 2: Plots of $2\arctan(\omega/\alpha)$ and $k\pi-\omega T$ with $T=2$, $\alpha=1$ and $k=1,2,3,4$. The desired resonant frequencies $\omega_{k}$ can be read from the horizontal coordinates of the intersection points of the curve $2\arctan(\omega/\alpha)$ and the parallel lines.

To ensure the existence of a solution to the homogeneous linear equations (15) with unknowns $A_{k},B_{k}$, the determinant must be zero. Exploiting this condition, we find that $\omega_{k}$ naturally satisfies the transcendental equation $\tan(\omega_{k}T)=(2\omega_{k}\alpha)/(\omega_{k}^{2}-\alpha^{2})$. By introducing an auxiliary variable $\theta_{k}=\arctan(\omega_{k}/\alpha)\in[0,\pi/2]$, this transcendental equation can be simplified as $\tan(\omega_{k}T)=-\tan(2\theta_{k})$, i.e., there exists a positive integer $m$ such that $2\arctan(\omega_{k}/\alpha)=m\pi-\omega_{k}T$. The integer $m$ can be chosen to be equal to $k$. From the plots of $2\arctan(\omega/\alpha)$ and $k\pi-\omega T$ (Fig. 2), we can determine $\omega_{k}$, and then $\lambda_{k}$. To sum up, the solutions to the characteristic equation (13) are collected into (16) as follows.

2\arctan(\omega_{k}/\alpha) = k\pi-\omega_{k}T, (16)
\lambda_{k} = \frac{2\alpha P}{\alpha^{2}+\omega_{k}^{2}},
\phi_{k}(t) = \frac{1}{Z_{k}}\left(\omega_{k}\cos(\omega_{k}t)+\alpha\sin(\omega_{k}t)\right),

where $Z_{k}$ denotes the normalization constant of $\phi_{k}(t)$ on $[0,T]$ that ensures orthonormality.
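The system (16) is easy to solve numerically: for each $k$, since $0<2\arctan(\omega/\alpha)<\pi$, the root $\omega_{k}$ lies in the interval $((k-1)\pi/T,\,k\pi/T)$, where the left-hand side of (16) is increasing in $\omega$ and the right-hand side is decreasing, so the root is unique. A possible root-finding sketch (our own, using scipy.optimize.brentq; the function name is illustrative) is given below.

import numpy as np
from scipy.optimize import brentq

def exp_kernel_eigs(P, alpha, T, K=200):
    """First K eigenvalues of R_X(tau) = P*exp(-alpha*|tau|) on [0, T], via (16)."""
    lam = []
    for k in range(1, K + 1):
        # Root of 2*arctan(w/alpha) - (k*pi - w*T) = 0, bracketed in ((k-1)*pi/T, k*pi/T).
        f = lambda w: 2.0 * np.arctan(w / alpha) + w * T - k * np.pi
        w_k = brentq(f, (k - 1) * np.pi / T, k * np.pi / T)
        lam.append(2.0 * alpha * P / (alpha ** 2 + w_k ** 2))
    return np.array(lam)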

Equation (16) gives all the eigenpairs $(\lambda_{k},\phi_{k})_{k=1}^{\infty}$, from which we can calculate $I(T)$ by applying Theorem 1. As for the Shannon limit $C_{\rm sh}$, applying (6) and evaluating the integral with the help of [11] yields

C_{\rm av}=C_{\rm sh}=\frac{1}{2}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right). (17)
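Combining (10), (16), and (17), the two rates can be compared numerically. The following sketch uses illustrative parameters of our own choosing (cf. Fig. 3, which uses $n_{0}=\alpha=1$) and reuses exp_kernel_eigs from above; both rates are printed in nats per second.

import numpy as np

P, alpha, n0, T = 1.0, 1.0, 1.0, 2.0
lam = exp_kernel_eigs(P, alpha, T, K=2000)
C_T = 0.5 * np.sum(np.log1p(lam / (n0 / 2.0))) / T                   # (10) divided by T
C_av = 0.5 * (np.sqrt(alpha ** 2 + 4.0 * P * alpha / n0) - alpha)    # (17)
print(C_T, C_av)   # exceed-average means the first value is larger than the second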

After all the preparation work above, we can rigorously prove that $C(T)>C_{\rm av}$ under the typical case of (12), as long as the transmission power $P$ is smaller than a constant $\delta$. The following Theorem 2 states this result.

Theorem 2 (Existence of exceed-average phenomenon in a typical case)

Suppose $X(t)$ and $Y(t)$ are specified according to (12). The eigenpairs are determined by (16). Then, for any fixed positive values of $T$, $n_{0}$, and $\alpha$, there exists $\delta>0$ such that the exceed-average inequality

C(T):=\frac{1}{2T}\sum_{k=1}^{+\infty}\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)>C_{\rm av} (18)

holds strictly for arbitrary $0<P<\delta$.

Proof:

See Appendix A. ∎

Figure 3: Theoretical verification of the exceed-average effect. The blue lines represent the finite-time mutual information $I(T)$. The red lines are the average mutual information $TC_{\rm av}$, calculated from (6). All the curves are evaluated under hypothesis (12), where $P=1,2,4$, $n_{0}=1$ and $\alpha=1$.

To support the above theoretical analysis, numerical experiments on $I(T)$ are conducted based on evaluations of (16) and (17). As shown in Fig. 3, it appears that we can always harness more mutual information in a finite-time observation window than the averaged mutual information. This fact is somewhat unsurprising, because the observations ${\bm{Y}}(t_{1}^{N})$ inside the window $[0,T]$ can always eliminate some extra uncertainty outside the window due to the autocorrelation of $X(t)$.

IV-B Compatibility with the Shannon-Hartley theorem

Though the exceed-average effect does imply mathematically that the mutual information within a finite-time window is higher than the average mutual information $C_{\rm av}$ (which coincides with the Shannon limit (6) in this case), it is still impossible to construct a long-time stable data transmission scheme above the Shannon capacity by leveraging this effect. Hence, the exceed-average phenomenon does not contradict the Shannon-Hartley theorem. Placing additional observation windows cannot increase the average information rate, because the posterior process $Y(t)|{\bm{Y}}(t_{1}^{N})$ does not carry as much information as the original one, causing a rate reduction in the later windows. It is expected that the finite-time mutual information would ultimately decrease to the average mutual information as the total observation time tends to infinity (i.e., $C(\infty)=C_{\rm av}$); the analytical proof, as well as the achievability of this finite-time mutual information, is still worth investigating in future works.

V Conclusions

In this paper, we provided rigorous proofs of the existence of the exceed-average phenomenon under typical autocorrelation settings of the transmitted signal process and the noise process. Our discovery of the exceed-average phenomenon provides a generalization of Shannon's renowned formula $C=W\log(1+S/N)$ to practical finite-time communications. Since the finite-time mutual information is a more precise estimate of the capacity limit, the optimization target may shift from the average mutual information to the finite-time mutual information in the design of practical communication systems. Thus, it may have guiding significance for the performance improvement of modern communication systems. In future works, general proofs of $C(T)>C_{\rm av}$, independent of the concrete autocorrelation settings, still require further investigation. Moreover, we need to answer the questions of how to exploit this exceed-average phenomenon to improve the communication rate, and whether this finite-time mutual information is achievable by a certain coding scheme. In addition, although we have discovered numerically that the finite-time mutual information agrees with the Shannon capacity when $T\rightarrow\infty$, an analytical proof of this result is still required.

Appendix A
Proof Of Theorem 2

Plug (17) into the right-hand side of (18), and differentiate both sides with respect to $P$. Notice that if $P=0$, then both sides of (18) are equal to $0$. Thus, we only need to prove that the derivative of the left-hand side is strictly larger than that of the right-hand side within a small interval $P\in(0,\delta)$:

\frac{1}{2}\sum_{k=1}^{+\infty}{\left(\frac{1}{1+\frac{2\lambda_{k}}{n_{0}}}\frac{2\lambda_{k}}{n_{0}P}\right)}>\frac{T}{n_{0}}\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. (19)

Multiplying both sides of (19) by $n_{0}/T$ and defining $\mu_{k}:=\lambda_{k}/(PT)$, we obtain from Lemma 1 that $\sum_{k}{\mu_{k}}=1$. In this way, (19) is equivalent to

\sum_{k=1}^{+\infty}{\frac{\mu_{k}}{1+\frac{2\lambda_{k}}{n_{0}}}}>\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. (20)

Since $\varphi(x):=1/(1+2x/n_{0})$ is convex on $(0,+\infty)$, by applying Jensen's inequality to the left-hand side of (20), we only need to prove that

\frac{1}{1+\frac{2}{n_{0}}\sum_{k}{\lambda_{k}\mu_{k}}}>\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. (21)

From the definition of $\mu_{k}$ we can derive that $\lambda_{k}\mu_{k}=\lambda_{k}^{2}/(PT)$. So we go on to calculate $\sum_{k}{\lambda_{k}^{2}}$, which is equivalent to calculating $\mathrm{tr}(M^{2})$, where $M^{2}$ corresponds to the integral kernel

K_{M^{2}}(t_{1},t_{2}):=\int_{0}^{T}{P^{2}\exp(-\alpha\lvert t_{1}-s\rvert)\exp(-\alpha\lvert t_{2}-s\rvert)\mathrm{d}s}. (22)

Evaluating the kernel $K_{M^{2}}$ on the diagonal $t=t_{1}=t_{2}$, and integrating it along the diagonal of $[0,T]^{2}$, gives $\sum_{k}{\lambda_{k}^{2}}$, i.e., $\mathrm{tr}(M^{2})$:

\sum_{k}{\lambda_{k}^{2}} = \frac{P^{2}}{2\alpha}\int_{0}^{T}{\left(2-e^{-2\alpha t}-e^{-2\alpha(T-t)}\right)\mathrm{d}t} (23)
 = \frac{P^{2}}{\alpha}\left(T-\frac{1}{2\alpha}\left(1-e^{-2\alpha T}\right)\right).

By substituting (23) into (21), we just need to prove that

\sqrt{1+4P/(n_{0}\alpha)}>1+\frac{2P}{n_{0}\alpha}\left(1-\frac{1-e^{-2\alpha T}}{2\alpha T}\right). (24)

Define the dimensionless number $x=2P/(n_{0}\alpha)$ and the function $\psi(y):=(1-\exp(-y))/y$, which is strictly positive and less than $1$ for $y>0$. In terms of $x$, inequality (24) reads $\sqrt{1+2x}>1+x\left(1-\psi(2\alpha T)\right)$, so there exists a small positive $\delta>0$ such that (24) holds for $0<P<\delta$. The number $\delta$ can be chosen as

\delta=\frac{n_{0}\alpha\,\psi(2\alpha T)}{\left(1-\psi(2\alpha T)\right)^{2}}>0, (25)

which implies that (19) holds for any $0<P<\delta$. Thus, integrating both sides of (19) from $p=0$ to $p=P$, $P<\delta$, gives rise to the conclusion (18), which completes the proof of Theorem 2.
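As a numerical cross-check of (24) and (25) (our own sketch, not part of the proof), one can verify the inequality in terms of the dimensionless $x=2P/(n_{0}\alpha)$ on a grid of powers below the threshold $\delta$; the parameter values are illustrative only.

import numpy as np

n0, alpha, T = 1.0, 1.0, 2.0                      # illustrative parameters
psi = lambda y: (1.0 - np.exp(-y)) / y            # psi as defined in the proof
delta = n0 * alpha * psi(2 * alpha * T) / (1.0 - psi(2 * alpha * T)) ** 2   # (25)
for P in np.linspace(1e-3, 0.99 * delta, 5):
    x = 2.0 * P / (n0 * alpha)
    lhs, rhs = np.sqrt(1.0 + 2.0 * x), 1.0 + x * (1.0 - psi(2 * alpha * T))
    print(f"P={P:.3f}: (24) holds -> {lhs > rhs}")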

Acknowledgment

This work was supported in part by the National Key Research and Development Program of China (Grant No.2020YFB1807201), in part by the National Natural Science Foundation of China (Grant No. 62031019), and in part by the European Commission through the H2020-MSCA-ITN META WIRELESS Research Project under Grant 956256.

References

  • [1] C. E. Shannon, “A mathematical theory of communication,” The Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, Jul. 1948.
  • [2] H. Landau, “Sampling, data transmission, and the nyquist rate,” Proc. IEEE, vol. 55, no. 10, pp. 1701–1706, Oct. 1967.
  • [3] H. Nyquist, “Certain topics in telegraph transmission theory,” Trans. American Institute of Electrical Engineers, vol. 47, no. 2, pp. 617–644, Apr. 1928.
  • [4] D. Slepian, “On bandwidth,” Proc. IEEE, vol. 64, no. 3, pp. 292–300, Mar. 1976.
  • [5] T. M. Cover, Elements of information theory.   John Wiley & Sons, 1999.
  • [6] J. Mercer, “Functions of positive and negative type and their connection with the theory of integral equations,” Philos. Trans. Royal Soc. A, vol. 209, pp. 415–446, 1909.
  • [7] K. Zhu, Operator theory in function spaces.   American Mathematical Soc., 2007, no. 138.
  • [8] A. Balakrishnan, “A note on the sampling principle for continuous signals,” IRE Trans. Inf. Theory, vol. 3, no. 2, pp. 143–146, Jun. 1957.
  • [9] J. Barrett and D. Lampard, “An expansion for some second-order probability distributions and its application to noise problems,” IRE Trans. Inf. Theory, vol. 1, no. 1, pp. 10–15, Mar. 1955.
  • [10] D. Cai and P. S. Vassilevski, “Eigenvalue problems for exponential-type kernels,” Comput. Methods in Applied Math., vol. 20, no. 1, pp. 61–78, Jan. 2020.
  • [11] I. S. Gradshteyn and I. M. Ryzhik, Table of integrals, series, and products.   Academic press, 2014.