
On Finite-Time Mutual Information

Jieao Zhu, Zijian Zhang, Zhongzhichao Wan, and Linglong Dai
Beijing National Research Center for Information Science and Technology (BNRist)
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
E-mails: zja21, zhangzj20, [email protected]; [email protected]
Abstract

The Shannon-Hartley theorem can accurately calculate the channel capacity when the signal observation time is infinite. However, calculating the finite-time mutual information, which remains an open problem, is essential for guiding the design of practical communication systems. In this paper, we investigate the mutual information between two correlated Gaussian processes within a finite-time observation window. We first derive the finite-time mutual information by providing a limit expression. Then we numerically compute the mutual information within a single finite-time window. We reveal that the number of bits transmitted per second within the finite-time window can exceed the mutual information averaged over the entire time axis, a phenomenon we call the exceed-average phenomenon. Furthermore, we derive a finite-time mutual information formula under a typical signal autocorrelation case by utilizing the Mercer expansion of trace-class operators, and reveal the connection between the finite-time mutual information problem and operator theory. Finally, we analytically prove the existence of the exceed-average phenomenon in this typical case, and demonstrate its compatibility with the Shannon capacity.

Index Terms:
Finite-time mutual information, exceed-average, Mercer expansion, trace class, operator theory.

I Introduction

The Shannon-Hartley theorem [1] has accurately revealed the fundamental theoretical limit of the information transmission rate $C$, also called the Shannon capacity, over a Gaussian waveform channel of limited bandwidth $W$. The expression for the Shannon capacity is $C=W\log\left(1+S/N\right)$, where $S$ and $N$ denote the signal power and the noise power, respectively. The derivation of the Shannon-Hartley theorem heavily depends on the Nyquist sampling principle [2]. The Nyquist sampling principle, also named the $2WT$ theorem [3], claims that one can only obtain $2WT+o(2WT)$ independent samples within an observation time window $T$ in a channel band-limited to $W$ [4].

Based on the Nyquist sampling principle, the Shannon capacity is derived by first multiplying the capacity $\frac{1}{2}\log(1+P/N)$ of a Gaussian symbol channel [5, p.249] by $2WT+o(2WT)$, then dividing the result by $T$, and finally letting $T\rightarrow\infty$. Note that this approximation only holds when $T\to\infty$. Therefore, the Shannon capacity holds only asymptotically as $T$ becomes sufficiently large; when $T$ is finite, the approximation fails, and the Shannon-Hartley theorem cannot be directly applied to calculate the capacity in a finite-time window. To the best of our knowledge, the evaluation of the finite-time mutual information has not yet been investigated in the literature. It is worth noting that real-world communication systems transmit signals in a finite-time window, so evaluating the finite-time mutual information is of practical significance.

In this paper, to fill this gap, we analyze the finite-time mutual information instead of the traditional infinite-time counterpart, and prove the existence of the exceed-average phenomenon within a finite-time observation window (simulation codes will be provided to reproduce the results presented in this paper: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html). Specifically, our contributions are summarized as follows:

  • We derive the mutual information expressions within a finite-time observation window by using dense sampling and limiting methods. In this way, we can overcome the difficulties that arise when analyzing the information contained in a continuous time interval. These finite-time mutual information expressions make the analysis of finite-time problems possible.

  • We conduct numerical experiments based on the discretized formulas. In the numerical results under a special setting, we reveal the exceed-average phenomenon, i.e., the mutual information within a finite-time observation window exceeds the mutual information averaged over the whole time axis.

  • In order to analytically prove the exceed-average phenomenon, we first derive an analytical finite-time mutual information formula based on the Mercer expansion [6], which reveals the connection between the mutual information problem and operator theory [7]. To make the problem tractable, we construct a typical case in which the transmitted signal has certain statistical properties. Utilizing this construction, we obtain a closed-form mutual information solution in this typical case, which leads to a rigorous proof of the exceed-average phenomenon.

Organization: In the rest of this paper, the finite-time mutual information is formulated and evaluated numerically in Section II, where the exceed-average phenomenon is first discovered. Then, in Section III, we derive a closed-form finite-time mutual information formula under a typical case. Based on this formula, in Section IV, the exceed-average phenomenon is rigorously proved. Finally, conclusions are drawn in Section V.

Notations: $X(t)$ denotes a Gaussian process; $R_{X}(t_{1},t_{2})$ denotes the autocorrelation function; $S_{X}(f)$ and $S_{X}(\omega)$ denote the power spectral density (PSD) of the corresponding process $X(t)$; boldface italic symbols such as $\bm{X}(t_{1}^{n})$ denote the column vector generated by taking samples of $X(t)$ at instants $t_{i},\ 1\leq i\leq n$; upper-case boldface letters such as ${\bf{\Phi}}$ denote matrices; $\mathbb{E}\left[\cdot\right]$ denotes the expectation; $\mathbbm{1}_{A}(\cdot)$ denotes the indicator function of the set $A$; $\mathcal{L}^{2}([0,T])$ denotes the collection of all square-integrable functions on the window $[0,T]$; ${\rm i}$ denotes the imaginary unit.

II Numerical Analysis of the Finite-Time Mutual Information

In this section, we numerically evaluate the finite-time mutual information. In Subsection II-A, we model the transmission problem by Gaussian processes, and derive the mutual information expressions within a finite-time observation window; In Subsection II-B, we approximate the finite-time mutual information by discretized matrix-based formulas; In Subsection II-C, we reveal the exceed-average phenomenon by numerically evaluating the finite-time mutual information.

II-A The Expressions for Finite-Time Mutual Information

Inspired by [8], we model the transmitted signal by a zero-mean stationary Gaussian stochastic process, denoted as $X(t)$, and the received signal by $Y(t):=X(t)+N(t)$. The noise process $N(t)$ is also a stationary Gaussian process, independent of $X(t)$. The receiver can only observe the signal within a finite-time window $[0,T]$, where $T>0$ is the observation window span. We aim to find the maximum mutual information that can be acquired within this time window.

To analytically express the amount of acquired information, we first introduce $n$ sampling points inside the time window, denoted by $(t_{1},t_{2},\cdots,t_{n}):=t_{1}^{n}$, and then let $n\rightarrow\infty$ to approximate the finite-time mutual information. By defining ${\bm{X}}(t_{1}^{n})\equiv(X(t_{1}),X(t_{2}),\cdots,X(t_{n}))$ and ${\bm{Y}}(t_{1}^{n})\equiv(Y(t_{1}),Y(t_{2}),\cdots,Y(t_{n}))$, the mutual information on these $n$ samples can be expressed as

I(t_{1}^{n})=I({\bm{X}}(t_{1}^{n});{\bm{Y}}(t_{1}^{n})), (1)

and the finite-time mutual information is defined as

I(T)=\lim_{n\rightarrow\infty}\sup_{\{t_{1}^{n}\}\subset[0,T]}{I(t_{1}^{n})}. (2)

Then, the transmission rate $C(T)$ can be defined as $C(T)=I(T)/T$. After these definitions, we can define the limiting transmission rate $C(\infty)=\lim_{T\rightarrow\infty}{C(T)}$ by letting $T\rightarrow\infty$.

II-B Discretization

Without loss of generality, we fix the sampling instants uniformly at fractions of $T$: $t_{i}=(i-1)T/n,\ 1\leq i\leq n$. Since the random vectors ${\bm{X}}(t_{1}^{n})$ and ${\bm{Y}}(t_{1}^{n})$ are samples of Gaussian processes, they are both Gaussian random vectors with mean zero and covariance matrices ${\bm{K}}_{X}$ and ${\bm{K}}_{Y}$, where ${\bm{K}}_{X},{\bm{K}}_{Y}\in\mathbb{R}^{n\times n}$ are symmetric positive-definite matrices defined as

({\bm{K}}_{X})_{i,j} = R_{X}(t_{i},t_{j}) := \mathbb{E}\left[X(t_{i})X(t_{j})\right], (3)
({\bm{K}}_{Y})_{i,j} = R_{Y}(t_{i},t_{j}) := \mathbb{E}\left[Y(t_{i})Y(t_{j})\right].

Note that $Y(t)$ is the independent sum of $X(t)$ and $N(t)$; thus the autocorrelation functions satisfy $R_{Y}(t_{1},t_{2})=R_{X}(t_{1},t_{2})+R_{N}(t_{1},t_{2})$, and similarly the covariance matrices satisfy ${\bm{K}}_{Y}={\bm{K}}_{X}+{\bm{K}}_{N}$.

The mutual information $I(t_{1}^{n})$ is defined as $I(t_{1}^{n})=h({\bm{Y}}(t_{1}^{n}))-h({\bm{Y}}(t_{1}^{n})|{\bm{X}}(t_{1}^{n}))=h({\bm{Y}}(t_{1}^{n}))-h({\bm{N}}(t_{1}^{n}))$, where $h(\cdot)$ denotes the differential entropy. Utilizing the entropy formula for $n$-dimensional Gaussian vectors [5], we obtain

I(t_{1}^{n})=\frac{1}{2}\log\left(\frac{\det({\bm{K}}_{X}+{\bm{K}}_{N})}{\det({\bm{K}}_{N})}\right). (4)
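The discretized formula (4) lends itself to direct numerical evaluation. The following minimal Python sketch (our own illustration, not the authors' released code; the helper name finite_time_mi and the use of NumPy are our choices) approximates $I(t_{1}^{n})$ from arbitrary autocorrelation functions.

import numpy as np

def finite_time_mi(R_X, R_N, T, n):
    """Approximate I(t_1^n) in (4) on n uniform samples of [0, T].

    R_X, R_N: vectorized autocorrelation functions taking (t1, t2) arrays.
    Returns 0.5 * log det((K_X + K_N) K_N^{-1}) in nats.
    """
    t = np.arange(n) * T / n                       # sampling instants t_i = (i-1)T/n
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    K_X = R_X(T1, T2)                              # (K_X)_{ij} = R_X(t_i, t_j), cf. (3)
    K_N = R_N(T1, T2)
    _, logdet_Y = np.linalg.slogdet(K_X + K_N)     # log det(K_X + K_N)
    _, logdet_N = np.linalg.slogdet(K_N)           # log det(K_N)
    return 0.5 * (logdet_Y - logdet_N)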

II-C Numerical Analysis

In order to study the properties of the mutual information $I(t_{1}^{n})$ as a function of $n$, we set the autocorrelation functions of the signal process $X(t)$ and the noise process $N(t)$ to the following special case:

R_{X}(t_{1},t_{2}) = \mathrm{sinc}(10(t_{1}-t_{2})), (5)
R_{N}(t_{1},t_{2}) = \exp(-\lvert t_{1}-t_{2}\rvert),

where $\mathrm{sinc}(x):=\sin(\pi x)/(\pi x)$, and the corresponding PSDs are $S_{X}(f)=0.1\times\mathbbm{1}_{\{-5\leq f\leq 5\}}$ and $S_{N}(f)=\frac{2}{1+(2\pi f)^{2}}$. In order to compare the finite-time mutual information with the averaged mutual information, the Shannon limit with colored noise spectrum $S_{N}(f)$ is utilized, which is a generalized version of the well-known formula $C=W\log\left(1+S/N\right)$. The averaged mutual information under a colored noise PSD [5] is expressed as

C_{\rm av}:=\frac{1}{2}\int_{-\infty}^{+\infty}{\log\left(1+\frac{S_{X}(f)}{S_{N}(f)}\right)\mathrm{d}f}. (6)

Then, plugging (5) into (6) yields the numerical result for $C_{\rm av}$.

We calculate the finite-time transmission rate $C(T)$ and the average mutual information $C_{\rm av}$ against the number of samples $n$ within the observation window $[0,T]$. The numerical results are collected in Fig. 1. It is shown that $I(t_{1}^{n})$ is an increasing function of $n$, and for fixed values of $T$, the approximated finite-time mutual information $I(t_{1}^{n})$ tends to a finite limit under the correlation assumptions given by (5). The most striking observation is that it is possible to obtain more information within the finite-time window $[0,T]$ than the prediction $TC_{\rm av}$ given by averaging the mutual information along the entire time axis (6). We call this phenomenon the exceed-average phenomenon.

Figure 1: A first glance at the exceed-average phenomenon. The red dashed horizontal line is the Shannon limit, and the $T=1,2,8$ curves illustrate the dependence of $C(t_{1}^{n})$ on $n$.
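For concreteness, the comparison shown in Fig. 1 can be reproduced along the following lines, reusing the finite_time_mi sketch above; the specific sample counts and the use of scipy.integrate.quad for (6) are our own illustrative choices, and the rates are reported in nats per second.

import numpy as np
from scipy.integrate import quad

# Autocorrelations of (5); np.sinc(x) = sin(pi*x)/(pi*x) matches the paper's sinc.
R_X = lambda t1, t2: np.sinc(10.0 * (t1 - t2))
R_N = lambda t1, t2: np.exp(-np.abs(t1 - t2))

# Averaged mutual information (6): S_X(f) = 0.1 on [-5, 5], S_N(f) = 2/(1+(2*pi*f)^2).
C_av, _ = quad(lambda f: 0.5 * np.log(1.0 + 0.05 * (1.0 + (2 * np.pi * f) ** 2)), -5.0, 5.0)

for T in (1.0, 2.0, 8.0):                  # window lengths used in Fig. 1
    for n in (64, 256, 1024):              # illustrative sample counts
        C_T = finite_time_mi(R_X, R_N, T, n) / T
        print(f"T={T}, n={n}: C(T) ~ {C_T:.3f} nats/s vs C_av ~ {C_av:.3f} nats/s")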

III A Closed-Form Finite-Time Mutual Information Formula

In this section, we first introduce the Mercer expansion in Subsection III-A as a basic tool for our analysis. Then we derive the series representation of the finite-time mutual information, and the corresponding power constraint in Subsection III-B, under the assumption of AWGN noise.

III-A The Mercer Expansion

By analyzing the structure of the autocorrelation functions, it is possible to obtain $I(t_{1}^{n})$ and $I(T)$ directly. In fact, if we know the Mercer expansion [6] of the autocorrelation function $R_{X}(t_{1},t_{2})$ on the interval $[0,T]$, then we can calculate $h({\bm{X}}(t_{1}^{n}))$ more easily [9]. In the following discussion, we assume the Mercer expansion of the source autocorrelation function $R_{X}(t_{1},t_{2})$ to be of the following form

\lambda_{k}\phi_{k}(t_{1})=\int_{0}^{T}{R_{X}(t_{1},t_{2})\phi_{k}(t_{2})\mathrm{d}t_{2}};\quad k>0,\ k\in\mathbb{N}, (7)

where the eigenvalues are positive, $\lambda_{k}>0$, and the eigenfunctions form an orthonormal set:

\int_{0}^{T}{\phi_{i}(t)\phi_{j}(t)\mathrm{d}t}=\delta_{ij}. (8)

Mercer's theorem [6] ensures the existence and uniqueness of the eigenpairs $(\lambda_{k},\phi_{k}(t))_{k=1}^{\infty}$, and furthermore, the kernel itself can be expanded in terms of the eigenfunctions:

R_{X}(t_{1},t_{2})=\sum_{k=1}^{+\infty}{\lambda_{k}\phi_{k}(t_{1})\phi_{k}(t_{2})}. (9)
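Although (7) is an infinite-dimensional eigenproblem, the Mercer spectrum can be approximated numerically by a Nystrom-type discretization of the kernel. The sketch below is our own construction (not part of the paper) under the assumption of a square-integrable kernel on $[0,T]^{2}$.

import numpy as np

def mercer_eigs(R_X, T, n=2000):
    """Nystrom approximation of the Mercer eigenvalues of R_X on [0, T], cf. (7).

    Discretizing the integral with step h = T/n reduces the eigenproblem to an
    n x n symmetric matrix problem; the eigenvalues of h * K approximate lambda_k.
    """
    h = T / n
    t = (np.arange(n) + 0.5) * h                  # midpoint-rule quadrature nodes
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    K = R_X(T1, T2)                               # kernel matrix R_X(t_i, t_j)
    lam = np.linalg.eigvalsh(h * K)               # symmetric kernel -> eigvalsh
    return np.sort(lam)[::-1]                     # Mercer spectrum, descending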

III-B Finite-Time Mutual Information Formula

Based on the Mercer expansion, we obtain a closed-form formula in the following Theorem 1.

Theorem 1 (Source expansion, AWGN noise)

Suppose the information source, modeled by the transmitted process $X(t)$, has autocorrelation function $R_{X}(t_{1},t_{2})$. AWGN noise of PSD $n_{0}/2$ is imposed on $X(t)$, resulting in the received process $Y(t)$. The Mercer expansion of $R_{X}(t_{1},t_{2})$ on $[0,T]$ is given by (7), satisfying (8). Then the finite-time mutual information $I(T)$ within the observation window $[0,T]$ between the processes $X(t)$ and $Y(t)$ can be expressed as

I(T)=\frac{1}{2}\sum_{k=1}^{+\infty}{\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)}. (10)
Proof:

The Mercer expansion decomposes the AWGN channel into an infinite number of independent parallel subchannels, each with signal power $\lambda_{k}$ and noise variance $n_{0}/2$. Thus, accumulating the mutual information of all these subchannels yields the finite-time mutual information $I(T)$. ∎
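As a hedged illustration, formula (10) is straightforward to evaluate once a (possibly truncated) Mercer spectrum is available, e.g., from the mercer_eigs sketch above; the function name below is our own.

import numpy as np

def finite_time_mi_awgn(lam, n0):
    """Evaluate (10): I(T) = 0.5 * sum_k log(1 + lambda_k / (n0/2)), in nats."""
    lam = np.asarray(lam, dtype=float)
    return 0.5 * np.sum(np.log1p(lam / (n0 / 2.0)))

# Example usage with the Nystrom spectrum sketched above:
#   I_T = finite_time_mi_awgn(mercer_eigs(R_X, T), n0=1.0)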

From Theorem 1 we can conclude that the finite-time mutual information of the AWGN channel is uniquely determined by the Mercer spectrum $\{\lambda_{k}\}$ of $R_{X}(t_{1},t_{2})$ within $[0,T]$. However, it remains unknown whether the series representation (10) converges. In fact, the convergence is closely related to the signal power, which is calculated in the following Lemma 1.

Lemma 1 (Operator Trace Coincides with the Power Constraint)

Given a stationary Gaussian process $X(t)$ with mean zero and autocorrelation $R_{X}(t_{1},t_{2})$, suppose the Mercer expansion of $R_{X}(t_{1},t_{2})$ on $[0,T]$ is given by (7), satisfying (8). The Mercer operator $M(\cdot):\mathcal{L}^{2}([0,T])\rightarrow\mathcal{L}^{2}([0,T])$ is defined by the integral $(M\phi)(s)=\int_{0}^{T}{R_{X}(s,\tau)\phi(\tau)\mathrm{d}\tau}$. Then the sum of all the eigenvalues $\lambda_{k}$ of the operator $M$ is equal to the signal energy $PT$ within $[0,T]$:

\mathrm{tr}(M):=\sum_{k=1}^{+\infty}{\lambda_{k}}=PT, (11)

where $P=R_{X}(0,0)$.

Proof:

Integrating both sides of (9) along the diagonal $t_{1}=t_{2}=t$ over the interval $[0,T]$ gives the conclusion immediately. ∎
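A quick numerical sanity check of Lemma 1 (under an exponential autocorrelation of the kind used later in (12), and reusing the mercer_eigs sketch above) might look as follows; the parameter values are illustrative only.

import numpy as np

# Illustrative parameters (our own choice); R_X(t1, t2) = P * exp(-alpha*|t1 - t2|).
P, alpha, T = 1.0, 1.0, 2.0
lam = mercer_eigs(lambda t1, t2: P * np.exp(-alpha * np.abs(t1 - t2)), T)
print(lam.sum(), P * T)   # the eigenvalue sum should approach the signal energy P*T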

Remark 1

From the above Lemma 1, we can conclude that the sum of the $\lambda_{k}$ is finite when $T$ is finite. It follows immediately that $I(T)<\infty$, since $\log(1+x)\leq x$.

Remark 2

The finite-time mutual information formula (10) is closely related to operator theory [7] in functional analysis. The sum of all the eigenvalues $\lambda_{k}$ is called the operator trace in linear operator theory. As mentioned in Lemma 1, the autocorrelation function $R_{X}(t_{1},t_{2})$ can be treated as a linear operator $M$ on $\mathcal{L}^{2}([0,T])$.

IV Proof of the Existence of Exceed-Average Phenomenon

In this section, we first give a proof of the existence of the exceed-average phenomenon in a typical case, and then discuss its compatibility with the Shannon-Hartley theorem.

IV-A Closed-Form Mutual Information in a Typical Case

To prove the exceed-average phenomenon, we only need to show that the finite-time mutual information is greater than the averaged mutual information in a typical case. Let us consider a finite-time communication scheme with a finitely-powered stationary transmitted-signal autocorrelation (such an autocorrelation $R_{X}(\tau)$ is often observed in many scenarios, e.g., when passing a signal with a white spectrum through an RC lowpass filter), which is specified as

R_{X}(t_{1},t_{2})=R_{X}(\tau)=P\exp(-\alpha\lvert\tau\rvert), (12)

where $\tau=t_{1}-t_{2}$, under an AWGN channel with noise PSD $n_{0}/2$. The power of the signal $X(t)$ is $P=R_{X}(0)$. According to Lemma 1, the trace of the corresponding Mercer operator $M(\cdot)$ is finite. Then the finite-time mutual information given by Theorem 1 is also finite, as shown in Remark 1. Finding the Mercer expansion is equivalent to finding the eigenpairs $(\lambda_{k},\phi_{k}(t))_{k=1}^{\infty}$. The eigenpairs are determined by the following characteristic integral equation [10]:

\lambda_{k}\phi_{k}(s)=\int_{0}^{s}{Pe^{-\alpha(s-t)}\phi_{k}(t)\mathrm{d}t}+\int_{s}^{T}{Pe^{-\alpha(t-s)}\phi_{k}(t)\mathrm{d}t}. (13)

Differentiating both sides of (13) twice with respect to $s$ yields the boundary conditions and the differential equation that $\lambda_{k},\phi_{k}$ must satisfy:

\lambda_{k}\phi_{k}^{\prime\prime}(s) = (\alpha^{2}\lambda_{k}-2\alpha P)\phi_{k}(s),\quad 0<s<T, (14)
\phi_{k}^{\prime}(0) = \alpha\phi_{k}(0),
\phi_{k}^{\prime}(T) = -\alpha\phi_{k}(T).

Let $\omega_{k}>0$ denote the resonant frequency of the above harmonic oscillator equation, and let $\phi_{k}(t)=A_{k}\cos(\omega_{k}t)+B_{k}\sin(\omega_{k}t)$ be the sinusoidal form of the eigenfunction. Using the boundary conditions, we obtain

B_{k}\omega_{k} = \alpha A_{k}, (15)
B_{k}\omega_{k}\cos(\omega_{k}T)-A_{k}\omega_{k}\sin(\omega_{k}T) = -\alpha\left(A_{k}\cos(\omega_{k}T)+B_{k}\sin(\omega_{k}T)\right).
Figure 2: Plots of $2\arctan(\omega/\alpha)$ and $k\pi-\omega T$ with $T=2$, $\alpha=1$ and $k=1,2,3,4$. The desired resonant frequencies $\omega_{k}$ can be read from the horizontal coordinates of the intersection points of the curve $2\arctan(\omega/\alpha)$ and the parallel lines.

To ensure the existence of a solution to the homogeneous linear equations (15) with unknowns $A_{k},B_{k}$, the determinant must be zero. Exploiting this condition, we find that $\omega_{k}$ naturally satisfies the transcendental equation $\tan(\omega_{k}T)=(2\omega_{k}\alpha)/(\omega_{k}^{2}-\alpha^{2})$. By introducing an auxiliary variable $\theta_{k}=\arctan(\omega_{k}/\alpha)\in[0,\pi/2]$, this transcendental equation can be simplified as $\tan(\omega_{k}T)=-\tan(2\theta_{k})$, i.e., there exists a positive integer $m$ such that $2\arctan(\omega_{k}/\alpha)=m\pi-\omega_{k}T$. The integer $m$ can be chosen to be equal to $k$. From the plots of $2\arctan(\omega/\alpha)$ and $k\pi-\omega T$ (Fig. 2), we can determine $\omega_{k}$, and then $\lambda_{k}$. To sum up, the solutions to the characteristic equation (13) are collected into (16) as follows.

2\arctan(\omega_{k}/\alpha) = k\pi-\omega_{k}T, (16)
\lambda_{k} = \frac{2\alpha P}{\alpha^{2}+\omega_{k}^{2}},
\phi_{k}(t) = \frac{1}{Z_{k}}\left(\omega_{k}\cos(\omega_{k}t)+\alpha\sin(\omega_{k}t)\right),

where $Z_{k}$ denotes the normalization constant of $\phi_{k}(t)$ on $[0,T]$ that ensures orthonormality.
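The system (16) is easy to solve numerically: for each $k$, since $0<2\arctan(\omega/\alpha)<\pi$, the root $\omega_{k}$ lies in the interval $((k-1)\pi/T,\,k\pi/T)$, where the left-hand side of (16) is increasing in $\omega$ and the right-hand side is decreasing, so the root is unique. A possible root-finding sketch (our own, using scipy.optimize.brentq; the function name is illustrative) is given below.

import numpy as np
from scipy.optimize import brentq

def exp_kernel_eigs(P, alpha, T, K=200):
    """First K eigenvalues of R_X(tau) = P*exp(-alpha*|tau|) on [0, T], via (16)."""
    lam = []
    for k in range(1, K + 1):
        # Root of 2*arctan(w/alpha) - (k*pi - w*T) = 0, bracketed in ((k-1)*pi/T, k*pi/T).
        f = lambda w: 2.0 * np.arctan(w / alpha) + w * T - k * np.pi
        w_k = brentq(f, (k - 1) * np.pi / T, k * np.pi / T)
        lam.append(2.0 * alpha * P / (alpha ** 2 + w_k ** 2))
    return np.array(lam)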

Equation (16) gives all the eigenpairs $(\lambda_{k},\phi_{k})_{k=1}^{\infty}$, from which we can calculate $I(T)$ by applying Theorem 1. As for the Shannon limit $C_{\rm sh}$, applying (6) and evaluating the integral with the help of [11] yields

C_{\rm av}=C_{\rm sh}=\frac{1}{2}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right). (17)
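Combining (10), (16), and (17), the two rates can be compared numerically. The following sketch uses illustrative parameters of our own choosing (cf. Fig. 3, which uses $n_{0}=\alpha=1$) and reuses exp_kernel_eigs from above; both rates are printed in nats per second.

import numpy as np

P, alpha, n0, T = 1.0, 1.0, 1.0, 2.0
lam = exp_kernel_eigs(P, alpha, T, K=2000)
C_T = 0.5 * np.sum(np.log1p(lam / (n0 / 2.0))) / T                   # (10) divided by T
C_av = 0.5 * (np.sqrt(alpha ** 2 + 4.0 * P * alpha / n0) - alpha)    # (17)
print(C_T, C_av)   # exceed-average means the first value is larger than the second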

After all the preparation work above, we can rigorously prove that $C(T)>C_{\rm av}$ under the typical case of (12), as long as the transmission power $P$ is smaller than a constant $\delta$. The following Theorem 2 states this result.

Theorem 2 (Existence of exceed-average phenomenon in a typical case)

Suppose $X(t)$ and $Y(t)$ are specified according to (12). The eigenpairs are determined by (16). Then, for any fixed positive values of $T$, $n_{0}$, and $\alpha$, there exists $\delta>0$ such that the exceed-average inequality

C(T):=\frac{1}{2T}\sum_{k=1}^{+\infty}\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)>C_{\rm av} (18)

holds strictly for arbitrary $0<P<\delta$.

Proof:

See Appendix A. ∎

Figure 3: Theoretical verification of the exceed-average effect. The blue lines represent the finite-time mutual information $I(T)$. The red lines are the average mutual information $TC_{\rm av}$, calculated from (6). All the curves are evaluated under hypothesis (12), where $P=1,2,4$, $n_{0}=1$ and $\alpha=1$.

To support the above theoretical analysis, numerical experiments on $I(T)$ are conducted based on evaluations of (16) and (17). As shown in Fig. 3, it appears that we can always harness more mutual information in a finite-time observation window than the averaged mutual information. This fact is somewhat unsurprising, because the observations ${\bm{Y}}(t_{1}^{N})$ inside the window $[0,T]$ can always eliminate some extra uncertainty outside the window due to the autocorrelation of $X(t)$.

IV-B Compatibility with the Shannon-Hartley theorem

Though the exceed-average effect does imply mathematically that the mutual information within a finite-time window is higher than the average mutual information $C_{\rm av}$ (which coincides with the Shannon limit (6) in this case), it is still impossible to construct a long-time stable data transmission scheme above the Shannon capacity by leveraging this effect. Hence, the exceed-average phenomenon does not contradict the Shannon-Hartley theorem. Placing additional observation windows cannot increase the average information rate, because the posterior process $Y(t)|{\bm{Y}}(t_{1}^{N})$ does not carry as much information as the original one, causing a rate reduction in the later windows. It is expected that the finite-time mutual information would ultimately decrease to the average mutual information as the total observation time tends to infinity (i.e., $C(\infty)=C_{\rm av}$); the analytical proof, as well as the achievability of this finite-time mutual information, is still worth investigating in future works.

V Conclusions

In this paper, we provided rigorous proofs of the existence of the exceed-average phenomenon under typical autocorrelation settings of the transmitted signal process and the noise process. Our discovery of the exceed-average phenomenon provides a generalization of Shannon's renowned formula $C=W\log(1+S/N)$ to practical finite-time communications. Since the finite-time mutual information is a more precise estimate of the capacity limit, the optimization target may shift from the average mutual information to the finite-time mutual information in the design of practical communication systems. Thus, it may have guiding significance for the performance improvement of modern communication systems. In future works, general proofs of $C(T)>C_{\rm av}$, independent of the concrete autocorrelation settings, still require further investigation. Moreover, we need to answer the questions of how to exploit this exceed-average phenomenon to improve the communication rate, and whether this finite-time mutual information is achievable by a certain coding scheme. In addition, although we have discovered numerically that the finite-time mutual information agrees with the Shannon capacity when $T\rightarrow\infty$, an analytical proof of this result is still required.

Appendix A
Proof Of Theorem 2

Plug (17) into the right-hand side of (18), and differentiate both sides with respect to $P$. Notice that if $P=0$, then both sides of (18) are equal to $0$. Thus, we only need to prove that the derivative of the left-hand side is strictly larger than that of the right-hand side within a small interval $P\in(0,\delta)$:

\frac{1}{2}\sum_{k=1}^{+\infty}{\left(\frac{1}{1+\frac{2\lambda_{k}}{n_{0}}}\frac{2\lambda_{k}}{n_{0}P}\right)}>\frac{T}{n_{0}}\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. (19)

Multiplying both sides of (19) by $n_{0}/T$ and defining $\mu_{k}:=\lambda_{k}/(PT)$, we obtain from Lemma 1 that $\sum_{k}{\mu_{k}}=1$. In this way, (19) is equivalent to

\sum_{k=1}^{+\infty}{\frac{\mu_{k}}{1+\frac{2\lambda_{k}}{n_{0}}}}>\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. (20)

Since $\varphi(x):=1/(1+2x/n_{0})$ is convex on $(0,+\infty)$, by applying Jensen's inequality to the left-hand side of (20), we only need to prove that

\frac{1}{1+\frac{2}{n_{0}}\sum_{k}{\lambda_{k}\mu_{k}}}>\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. (21)

From the definition of $\mu_{k}$ we can derive that $\lambda_{k}\mu_{k}=\lambda_{k}^{2}/(PT)$. So we go on to calculate $\sum_{k}{\lambda_{k}^{2}}$, which is equivalent to calculating $\mathrm{tr}(M^{2})$, where $M^{2}$ corresponds to the integral kernel

K_{M^{2}}(t_{1},t_{2}):=\int_{0}^{T}{P^{2}\exp(-\alpha\lvert t_{1}-s\rvert)\exp(-\alpha\lvert t_{2}-s\rvert)\mathrm{d}s}. (22)

Evaluating the kernel $K_{M^{2}}$ on the diagonal $t=t_{1}=t_{2}$, and integrating it along the diagonal of $[0,T]^{2}$, gives $\sum_{k}{\lambda_{k}^{2}}$, i.e., $\mathrm{tr}(M^{2})$:

\sum_{k}{\lambda_{k}^{2}} = \frac{P^{2}}{2\alpha}\int_{0}^{T}{\left(2-e^{-2\alpha t}-e^{-2\alpha(T-t)}\right)\mathrm{d}t} (23)
 = \frac{P^{2}}{\alpha}\left(T-\frac{1}{2\alpha}\left(1-e^{-2\alpha T}\right)\right).

By substituting (23) into (21), we just need to prove that

\sqrt{1+4P/(n_{0}\alpha)}>1+\frac{2P}{n_{0}\alpha}\left(1-\frac{1-e^{-2\alpha T}}{2\alpha T}\right). (24)

Define the dimensionless number $x=2P/(n_{0}\alpha)$ and the function $\psi(y):=(1-\exp(-y))/y$, which is strictly positive and less than $1$ for $y>0$. In terms of $x$, inequality (24) reads $\sqrt{1+2x}>1+x\left(1-\psi(2\alpha T)\right)$, so there exists a small positive $\delta>0$ such that (24) holds for $0<P<\delta$. The number $\delta$ can be chosen as

\delta=\frac{n_{0}\alpha\,\psi(2\alpha T)}{\left(1-\psi(2\alpha T)\right)^{2}}>0, (25)

which implies that (19) holds for any $0<P<\delta$. Thus, integrating both sides of (19) from $p=0$ to $p=P$, $P<\delta$, gives rise to the conclusion (18), which completes the proof of Theorem 2.
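As a numerical cross-check of (24) and (25) (our own sketch, not part of the proof), one can verify the inequality in terms of the dimensionless $x=2P/(n_{0}\alpha)$ on a grid of powers below the threshold $\delta$; the parameter values are illustrative only.

import numpy as np

n0, alpha, T = 1.0, 1.0, 2.0                      # illustrative parameters
psi = lambda y: (1.0 - np.exp(-y)) / y            # psi as defined in the proof
delta = n0 * alpha * psi(2 * alpha * T) / (1.0 - psi(2 * alpha * T)) ** 2   # (25)
for P in np.linspace(1e-3, 0.99 * delta, 5):
    x = 2.0 * P / (n0 * alpha)
    lhs, rhs = np.sqrt(1.0 + 2.0 * x), 1.0 + x * (1.0 - psi(2 * alpha * T))
    print(f"P={P:.3f}: (24) holds -> {lhs > rhs}")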

Acknowledgment

This work was supported in part by the National Key Research and Development Program of China (Grant No.2020YFB1807201), in part by the National Natural Science Foundation of China (Grant No. 62031019), and in part by the European Commission through the H2020-MSCA-ITN META WIRELESS Research Project under Grant 956256.

References

  • [1] C. E. Shannon, “A mathematical theory of communication,” The Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, Jul. 1948.
  • [2] H. Landau, “Sampling, data transmission, and the nyquist rate,” Proc. IEEE, vol. 55, no. 10, pp. 1701–1706, Oct. 1967.
  • [3] H. Nyquist, “Certain topics in telegraph transmission theory,” Trans. American Institute of Electrical Engineers, vol. 47, no. 2, pp. 617–644, Apr. 1928.
  • [4] D. Slepian, “On bandwidth,” Proc. IEEE, vol. 64, no. 3, pp. 292–300, Mar. 1976.
  • [5] T. M. Cover, Elements of information theory.   John Wiley & Sons, 1999.
  • [6] J. Mercer, “Functions of positive and negative type and their connection with the theory of integral equations,” Philos. Trans. Royal Soc. A, vol. 209, pp. 415–446, 1909.
  • [7] K. Zhu, Operator theory in function spaces.   American Mathematical Soc., 2007, no. 138.
  • [8] A. Balakrishnan, “A note on the sampling principle for continuous signals,” IRE Trans. Inf. Theory, vol. 3, no. 2, pp. 143–146, Jun. 1957.
  • [9] J. Barrett and D. Lampard, “An expansion for some second-order probability distributions and its application to noise problems,” IRE Trans. Inf. Theory, vol. 1, no. 1, pp. 10–15, Mar. 1955.
  • [10] D. Cai and P. S. Vassilevski, “Eigenvalue problems for exponential-type kernels,” Comput. Methods in Applied Math., vol. 20, no. 1, pp. 61–78, Jan. 2020.
  • [11] I. S. Gradshteyn and I. M. Ryzhik, Table of integrals, series, and products.   Academic press, 2014.