
Finite-Time Capacity: Making
Exceed-Shannon Possible?

Jieao Zhu, Zijian Zhang, Zhongzhichao Wan, and Linglong Dai All authors are with the Beijing National Research Center for Information Science and Technology (BNRist) as well as the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China (E-mails: zja21, zhangzj20, [email protected]; [email protected]). This work was supported in part by the National Key Research and Development Program of China (Grant No.2020YFB1807201), in part by the National Natural Science Foundation of China (Grant No. 62031019), and in part by the European Commission through the H2020-MSCA-ITN META WIRELESS Research Project under Grant 956256.
Abstract

The Shannon-Hartley theorem accurately characterizes the channel capacity when the signal observation time is infinite. However, the finite-time capacity, which remains unknown, is essential for guiding the design of practical communication systems. In this paper, we investigate the capacity between two correlated Gaussian processes within a finite-time observation window. We first derive the finite-time capacity by providing a limit expression. Then we numerically compute the maximum transmission rate within a single finite-time window. We reveal that the number of bits transmitted per second within the finite-time window can exceed the classical Shannon capacity, which we call the Exceed-Shannon phenomenon. Furthermore, we derive a finite-time capacity formula under a typical signal autocorrelation case by utilizing the Mercer expansion of trace-class operators, and reveal the connection between the finite-time capacity problem and operator theory. Finally, we analytically prove the existence of the Exceed-Shannon phenomenon in this typical case, and demonstrate the achievability of the finite-time capacity and its compatibility with the classical Shannon capacity.

Index Terms:
Finite-time capacity, Exceed-Shannon, Mercer expansion, signal autocorrelation, operator theory.

I Introduction

The Shannon-Hartley theorem [1] accurately reveals the fundamental theoretical limit of the information transmission rate $C$, also known as the Shannon capacity, over a Gaussian waveform channel of limited bandwidth $W$. The expression for the Shannon capacity is $C=W\log\left(1+S/N\right)$, where $S$ and $N$ denote the signal power and the noise power, respectively. The derivation of the Shannon-Hartley theorem heavily depends on the Nyquist sampling principle [2]. The Nyquist sampling principle, also named the $2WT$ theorem [3], claims that one can only obtain $2WT+o(2WT)$ independent samples within an observation time window $T$ in a channel band-limited to $W$ [4], where $o(\cdot)$ denotes a higher-order infinitesimal, i.e., $o(2WT)/T\to 0$ as $T\to\infty$.

Based on the Nyquist sampling principle, the Shannon capacity is derived by first multiplying the capacity $\frac{1}{2}\log(1+P/N)$ of a Gaussian symbol channel [5, p.249] with $2WT+o(2WT)$, then dividing the result by $T$, and finally letting $T\rightarrow\infty$. In the above derivation, $(2WT+o(2WT))/T$ is approximated by $2W$ in the final step to obtain the Shannon capacity. Note that this approximation only holds when $T\to\infty$. Therefore, the Shannon capacity only holds asymptotically as $T$ becomes sufficiently large. When $T$ is finite, the approximation fails. Thus, when the observation time $T$ is finite, i.e., the received signal can only be observed within a finite-time window $[0,T]$, the Shannon-Hartley theorem cannot be directly applied to calculate the capacity in a finite-time window. To the best of our knowledge, the evaluation of the finite-time capacity has not yet been investigated in the literature. One possible reason is that most researchers have mainly focused on how to approach the Shannon capacity with advanced coding and modulation schemes. It is worth noting that any real-world communication system transmits signals in a finite-time window, so evaluating the finite-time capacity is of practical significance.

In this paper, to fill this gap, we analyze the finite-time capacity instead of the traditional infinite-time counterpart, and reveal and prove the existence of the “Exceed-Shannon” phenomenon within a finite-time observation window. (Simulation codes will be provided to reproduce the results presented in this paper: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html.) Specifically, our contributions are summarized as follows:

  • We derive the capacity expressions within a finite-time observation window by using dense sampling and limiting methods. In this way, we can overcome the difficulties that arise when analyzing the information contained in a continuous time interval. These finite-time capacity expressions make the analysis of finite-time capacity problems possible.

  • We approximate the original continuous finite-time capacity expressions by discrete matrices, and conduct numerical experiments based on the discretized formulas. In the numerical results under a special setting, we reveal the “Exceed-Shannon” phenomenon, i.e., the mutual information within a finite-time observation window exceeds the Shannon capacity. (In fact, the finite-time “Exceed-Shannon” phenomenon revealed in this paper does not contradict the classical infinite-time Shannon-Hartley theorem, since new assumptions are considered. Specifically, in the Shannon-Hartley theorem, the sampling time is assumed to be infinitely long, while in this paper, the sampling takes place in a finite-time observation window. Similarly, although compressed sensing [6] can achieve a much lower sampling rate than the Nyquist rate while still performing accurate sparse signal reconstruction, it does not contradict the Nyquist sampling principle due to the new assumption of signal sparsity.)

  • In order to analytically prove the revealed “Exceed-Shannon” phenomenon, we first derive an analytical finite-time capacity formula based on the Mercer expansion [7], where we find the connection between the capacity problem and operator theory [8]. To make the problem tractable, we construct a typical case in which the transmitted signal has certain statistical properties. Utilizing this construction, we obtain a closed-form capacity solution in this typical case, which leads to a rigorous proof of the “Exceed-Shannon” phenomenon. Inspired by the techniques in the proof, we find that the finite-time capacity is, in fact, a generalization of the Shannon limit, and thus the “Exceed-Shannon” phenomenon of the finite-time capacity is compatible with the classical Shannon theory.

Organization: In the rest of this paper, the finite-time capacity is formulated and evaluated numerically in Section II, where the “Exceed-Shannon” phenomenon is first discovered. Then, in Section III, we derive a closed-form finite-time capacity formula under a typical case. Based on this formula, in Section IV, the “Exceed-Shannon” phenomenon is rigorously proved. Finally, conclusions are drawn in Section V.

Notations: $X(t)$ denotes a Gaussian process; $R_{X}(t_{1},t_{2})$ denotes the autocorrelation function; $S_{X}(f)$ and $S_{X}(\omega)$ are the power spectral density (PSD) of the corresponding process $X(t)$, where $\omega=2\pi f$; the boldface italic symbol $\bm{X}(t_{1}^{n})$ denotes the column vector generated by taking samples of $X(t)$ at instants $t_{i},1\leq i\leq n$; upper-case boldface letters such as ${\bf{\Phi}}$ denote matrices; $\mathbb{E}\left[\cdot\right]$ denotes the expectation; $\mathbbm{1}_{A}(\cdot)$ denotes the indicator function of the set $A$; $\mathcal{L}^{2}([0,T])$ denotes the collection of all square-integrable functions on the window $[0,T]$; ${\rm i}$ denotes the imaginary unit.

II Numerical Analysis of the Finite-Time Capacity

In this section, we focus on the numerical evaluation of the finite-time capacity. In Subsection II-A, we model the transmission problem by Gaussian processes, and derive the capacity expressions within a finite-time observation window by using dense sampling and limiting methods; In Subsection II-B, we approximate the finite-time capacity by discretized matrix-based formulas; In Subsection II-C, we reveal the “Exceed-Shannon” phenomenon by numerically evaluating the finite-time capacity in a special setting of the signal autocorrelations.

II-A The Expressions for Finite-Time Capacity

The finite-time capacity is, heuristically, defined as the maximum number of bits that can be successfully transmitted within a finite-time window. Since the Shannon capacity is defined on pairs of random variables, it is crucial to introduce randomness into the transmission model. Inspired by [9], we model the transmitted signal by a zero-mean stationary Gaussian stochastic process, denoted as $X(t)$, and the received signal by $Y(t):=X(t)+N(t)$. The process $N(t)$, which denotes the noise, is also a stationary Gaussian process independent of $X(t)$. The receiver is only allowed to observe the signal within the finite-time window $[0,T]$, where $T>0$ is the observation window span. Our goal is to find the maximum number of bits that can be acquired within this time window.

To analytically express the amount of acquired information, we first introduce $n$ sampling instants inside the time window, denoted by $(t_{1},t_{2},\cdots,t_{n}):=t_{1}^{n}$, and then let $n\rightarrow\infty$ to approximate the finite-time capacity. (In this paper, we do not explicitly distinguish between the terms “finite-time mutual information” and “finite-time capacity”, since we consider communication schemes where the source autocorrelation is fixed.) This approximation of the capacity becomes more precise as the sampling instants $t_{1}^{n}$ become denser. Then, by defining ${\bm{X}}(t_{1}^{n})\equiv(X(t_{1}),X(t_{2}),\cdots,X(t_{n}))$ and ${\bm{Y}}(t_{1}^{n})\equiv(Y(t_{1}),Y(t_{2}),\cdots,Y(t_{n}))$, the capacity on these $n$ samples can be expressed as

I(t_{1}^{n})=I({\bm{X}}(t_{1}^{n});{\bm{Y}}(t_{1}^{n})), \qquad (1)

and the finite-time capacity is defined as

I(T)=\lim_{n\rightarrow\infty}\sup_{\{t_{1}^{n}\}\subset[0,T]}{I(t_{1}^{n})}. \qquad (2)

Then, the transmission rate $C(T)$ can be defined by dividing the amount of information acquired within $[0,T]$ by the time span $T$:

C(t_{1}^{n}) = I(t_{1}^{n})/T, \qquad (3)
C(T) = I(T)/T.

From these definitions, we can define the limit capacity as $C(\infty)=\lim_{T\rightarrow\infty}{C(T)}$ by letting $T\rightarrow\infty$. The quantity $C(\infty)$ characterizes the maximum average number of bits per second one can acquire from a received noisy stochastic process.

II-B Discretization

Without loss of generality, we fix the sampling instants uniformly onto fractions of $T$: $t_{i}=(i-1)T/n,1\leq i\leq n$. Since the random vectors ${\bm{X}}(t_{1}^{n})$ and ${\bm{Y}}(t_{1}^{n})$ are samples of a Gaussian process, they are both Gaussian random vectors with mean zero and covariance matrices ${\bm{K}}_{X}$ and ${\bm{K}}_{Y}$, where ${\bm{K}}_{X},{\bm{K}}_{Y}\in\mathbb{R}^{n\times n}$ are symmetric positive-definite matrices. The entries of ${\bm{K}}_{X}$ and ${\bm{K}}_{Y}$ are determined by the autocorrelation functions of the Gaussian processes $X(t)$ and $Y(t)$, denoted by $R_{X}(t_{1},t_{2})$ and $R_{Y}(t_{1},t_{2})$:

({\bm{K}}_{X})_{i,j} = R_{X}(t_{i},t_{j}) := \mathbb{E}\left[X(t_{i})X(t_{j})\right], \qquad (4)
({\bm{K}}_{Y})_{i,j} = R_{Y}(t_{i},t_{j}) := \mathbb{E}\left[Y(t_{i})Y(t_{j})\right].

Note that $Y(t)$ is the independent sum of $X(t)$ and $N(t)$, thus the autocorrelation functions satisfy $R_{Y}(t_{1},t_{2})=R_{X}(t_{1},t_{2})+R_{N}(t_{1},t_{2})$, and similarly the covariance matrices satisfy ${\bm{K}}_{Y}={\bm{K}}_{X}+{\bm{K}}_{N}$.

The mutual information $I(t_{1}^{n})$ is defined as $I(t_{1}^{n})=h({\bm{Y}}(t_{1}^{n}))-h({\bm{Y}}(t_{1}^{n})|{\bm{X}}(t_{1}^{n}))=h({\bm{Y}}(t_{1}^{n}))-h({\bm{N}}(t_{1}^{n}))$, where $h(\cdot)$ denotes the differential entropy. For an $n$-dimensional Gaussian vector ${\bm{U}}$ with mean zero and covariance matrix ${\bm{K}}$, the differential entropy is given by

h({\bm{U}})=\frac{1}{2}\log\left((2\pi e)^{n}\det({\bm{K}})\right). \qquad (5)

Plugging (5) into the definition of $I(t_{1}^{n})$, we obtain

I(t_{1}^{n})=\frac{1}{2}\log\left(\frac{\det({\bm{K}}_{X}+{\bm{K}}_{N})}{\det({\bm{K}}_{N})}\right). \qquad (6)
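
As a minimal numerical sketch (our own illustration, not part of the paper; the function name is hypothetical), (6) can be evaluated stably with log-determinants:

```python
import numpy as np

def mutual_information_nats(K_X, K_N):
    """Evaluate (6): 0.5 * [log det(K_X + K_N) - log det(K_N)], in nats."""
    sign_y, logdet_y = np.linalg.slogdet(K_X + K_N)
    sign_n, logdet_n = np.linalg.slogdet(K_N)
    assert sign_y > 0 and sign_n > 0, "covariance matrices must be positive definite"
    return 0.5 * (logdet_y - logdet_n)
```

Using `slogdet` avoids the overflow and underflow that forming the two determinants directly would cause for large $n$.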

In (6), by letting $n\rightarrow\infty$, we find that $I(t_{1}^{n})$ increases monotonically when $n$ doubles, because of the data processing inequality. Though we do not give a rigorous proof, we can assume with confidence that $I(t_{1}^{n})$ is an increasing function of $n$. However, it remains unknown whether $I(t_{1}^{n})$ tends to a finite limit. In fact, if the noise is band-limited and the signal has power outside the noise band, $I(t_{1}^{n})$ can be arbitrarily large, since the signal outside the noise band is strictly unpolluted by the noise, which results in infinite SNR. In that case, the capacity diverges to infinity. Therefore, in order to avoid capacity divergence, at least one of the following conditions should be satisfied:

  • The noise process $N(t)$ is not band-limited.

  • The power spectral density of $X(t)$ is strictly contained inside the band of $N(t)$.

Thus, in the following numerical analysis, we choose $N(t)$ to be band-unlimited. This leads to the choice of reasonable autocorrelation functions of $X(t)$ and $N(t)$ in the following subsection.

II-C Numerical Analysis

In order to study the properties of the mutual information $I(t_{1}^{n})$ as a function of $n$, we perform numerical analyses under different values of $n$ and $T$. The autocorrelation functions and PSDs of the signal process $X(t)$ and the noise process $N(t)$ are set to the special case

R_{X}(t_{1},t_{2}) = \mathrm{sinc}(10(t_{1}-t_{2})), \qquad (7)
R_{N}(t_{1},t_{2}) = \exp(-\lvert t_{1}-t_{2}\rvert),
S_{X}(f) = 0.1\times\mathbbm{1}_{\{-5\leq f\leq 5\}},
S_{N}(f) = \frac{2}{1+(2\pi f)^{2}},

where $\mathrm{sinc}(x):=\sin(\pi x)/(\pi x)$. Note that the PSD of the transmitted process, $S_{X}(f)$, is strictly band-limited, while the PSD of the noise process is not. In fact, the noise PSD is carefully selected to ensure that the received noise has finite power at each instant $t_{i}$, allowing the execution of numerical computations. A PSD with finite power must be colored, in contrast to additive white Gaussian noise (AWGN), which has a flat PSD and infinite power. That is why we choose $S_{N}(f)$ to be of the form in (7).

In order to compare the finite-time capacity with the classical Shannon capacity, we have to calculate the Shannon capacity under the colored noise spectrum $S_{N}(f)$, which requires a generalized version of the well-known formula $C=W\log\left(1+S/N\right)$. The Shannon capacity $C_{\rm sh}$ under a colored noise PSD [5], measured in ${\rm nat/s}$, is expressed as

C_{\rm sh}:=\frac{1}{2}\int_{-\infty}^{+\infty}{\log\left(1+\frac{S_{X}(f)}{S_{N}(f)}\right)\mathrm{d}f}\quad[{\rm nat/s}]. \qquad (8)

Then, plugging (7) into (8) yields the numerical value of $C_{\rm sh}$.
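
The following sketch (ours, with hypothetical parameter choices; it reuses the `mutual_information_nats` helper from the earlier sketch) builds the covariance matrices (4) under the setting (7) and evaluates both the finite-time rate $C(t_{1}^{n})$ and the Shannon capacity (8) by numerical quadrature:

```python
import numpy as np
from scipy.integrate import quad

def build_covariances(T, n):
    """Covariance matrices (4) on the uniform grid t_i = (i-1)T/n under the setting (7)."""
    t = np.arange(n) * T / n
    tau = t[:, None] - t[None, :]
    K_X = np.sinc(10 * tau)              # numpy's sinc(x) = sin(pi x)/(pi x), as in (7)
    K_N = np.exp(-np.abs(tau))
    return K_X, K_N

def shannon_capacity_nats():
    """Classical capacity (8) with S_X(f) = 0.1 on [-5, 5] and S_N(f) = 2/(1+(2*pi*f)^2)."""
    integrand = lambda f: 0.5 * np.log(1 + 0.1 / (2 / (1 + (2 * np.pi * f) ** 2)))
    value, _ = quad(integrand, -5, 5)    # S_X vanishes outside [-5, 5]
    return value

T, n = 2.0, 200
K_X, K_N = build_covariances(T, n)
print(mutual_information_nats(K_X, K_N) / T)   # finite-time rate C(t_1^n) in nat/s
print(shannon_capacity_nats())                 # Shannon capacity C_sh in nat/s
```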

In the numerical analysis, we calculate the finite-time transmission rate $C(T)$ and the Shannon capacity against the number of samples $n$ within the observation window $[0,T]$. The numerical results are collected in Fig. 1. It is shown that $I(t_{1}^{n})$ is an increasing function of $n$, and for fixed values of $T$, the approximated finite-time capacity $I(t_{1}^{n})$ tends to a finite limit under the correlation assumptions given by (7). The most striking observation is that we can obtain more information within the finite-time window $[0,T]$ than the prediction $TC_{\rm sh}$ given by the Shannon capacity (8). We call this the “Exceed-Shannon” phenomenon.

Figure 1: A first glance at the “Exceed-Shannon” phenomenon, where the red dashed horizontal line is the Shannon capacity, and the $T=1,2,8$ curves illustrate the dependence of $C(t_{1}^{n})$ on $n$.

To analytically verify the existence of the Exceed-Shannon phenomenon, we are going to introduce some mathematical tools in the following Section III, and finally give an analytical proof in Section IV.

III A Closed-Form Finite-Time Capacity Formula

In this section, we first introduce the Mercer expansion in Subsection III-A as a basic tool for our analysis. Then we derive the series representation of the finite-time capacity, and the corresponding power constraint, in Subsection III-B under the AWGN assumption. The power constraint shows that the finite-time capacity is upper-bounded, and thus the series expansion of the finite-time capacity converges absolutely.

III-A The Mercer Expansion

Motivated by the discovery of the Exceed-Shannon phenomenon, we go further into the underlying mechanism behind this fact. Since the calculation of (6) depends on the evaluation of determinants that are determined by the autocorrelation functions of Gaussian processes, it is possible to obtain $I(t_{1}^{n})$ and $I(T)$ directly from the autocorrelation functions. In fact, if we know the Mercer expansion [7] of the autocorrelation function $R_{X}(t_{1},t_{2})$ on the window $[0,T]$, then we can calculate $h({\bm{X}}(t_{1}^{n}))$ more easily [10]. In the following discussion, we assume the Mercer expansion of the source autocorrelation function $R_{X}(t_{1},t_{2})$ to be of the following form

\lambda_{k}\phi_{k}(t_{1})=\int_{0}^{T}{R_{X}(t_{1},t_{2})\phi_{k}(t_{2})\mathrm{d}t_{2}};\quad k>0,\ k\in\mathbb{N}. \qquad (9)

Due to the positive-definite property of the integral kernel $R_{X}(t_{1},t_{2})$, the eigenvalues are strictly positive, $\lambda_{k}>0$, and the eigenfunctions form an orthonormal set:

\int_{0}^{T}{\phi_{i}(t)\phi_{j}(t)\mathrm{d}t}=\delta_{ij}. \qquad (10)

Mercer's theorem [7] ensures the existence and uniqueness of the eigenpairs $(\lambda_{k},\phi_{k}(t))_{k=1}^{\infty}$, and furthermore, the kernel itself can be expanded in terms of the eigenfunctions, with absolute and uniform convergence:

R_{X}(t_{1},t_{2})=\sum_{k=1}^{+\infty}{\lambda_{k}\phi_{k}(t_{1})\phi_{k}(t_{2})}. \qquad (11)

The Mercer expansion enables us to analytically express an autocorrelation function on a finite-time interval $[0,T]$, since the autocorrelation function can be naturally treated as a positive-definite integral kernel.
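
As a numerical illustration (our own sketch, not part of the paper), the Mercer eigenvalues in (9) can be approximated by discretizing the kernel on a uniform grid and rescaling the matrix eigenvalues by the quadrature weight $T/n$ (a Nyström-type approximation):

```python
import numpy as np

def mercer_eigs(R, T, n=400):
    """Approximate the Mercer eigenvalues of the kernel R(t1, t2) on [0, T].

    The integral in (9) is replaced by a Riemann sum with weight T/n, so the
    eigenvalues of (T/n) * K approximate the lambda_k of the continuous kernel.
    """
    t = (np.arange(n) + 0.5) * T / n            # midpoint grid on [0, T]
    K = R(t[:, None], t[None, :])
    lam = np.linalg.eigvalsh((T / n) * K)       # symmetric kernel -> real eigenvalues
    return np.sort(lam)[::-1]                   # descending order
```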

III-B Finite-Time Capacity Formula

Based on the Mercer expansion, we can obtain a closed-form formula, given in the following Theorem 1.

Theorem 1 (Source expansion, AWGN noise)

Suppose the information source, modeled by the transmitted process $X(t)$, has autocorrelation function $R_{X}(t_{1},t_{2})$. AWGN of PSD $n_{0}/2$ is imposed onto $X(t)$, resulting in the received process $Y(t)$. The Mercer expansion of $R_{X}(t_{1},t_{2})$ on $[0,T]$ is given by (9), satisfying (10). Then the finite-time mutual information $I(T)$ within the observation window $[0,T]$ between the processes $X(t)$ and $Y(t)$ can be expressed as

I(T)=\frac{1}{2}\sum_{k=1}^{+\infty}{\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)}. \qquad (12)
Proof:

See Appendix A. ∎
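
With the numerically approximated eigenvalues from the sketch in Subsection III-A, (12) can be evaluated by truncating the series; this is our own illustration and assumes the hypothetical `mercer_eigs` helper is available:

```python
import numpy as np

def finite_time_mi(R_X, T, n0, n=400):
    """Evaluate the truncated series (12) with Nystrom-approximated Mercer eigenvalues."""
    lam = mercer_eigs(R_X, T, n=n)
    lam = lam[lam > 0]                  # drop tiny negative eigenvalues caused by round-off
    return 0.5 * np.sum(np.log1p(lam / (n0 / 2.0)))
```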

From Theorem 1, we can conclude that the finite-time capacity of the AWGN channel is uniquely determined by the Mercer spectrum $\lambda_{k}$ of $R_{X}(t_{1},t_{2})$ within $[0,T]$. However, it remains unknown whether the series representation (12) converges. In fact, the convergence is closely related to the signal power. For the Fourier transform, the power in the time domain is equal to the power in the frequency domain, which is known as Parseval's theorem [11]. Like the Fourier transform, the transform defined by the orthonormal basis $\{\phi_{k}(t)\}_{k=1}^{\infty}$ also satisfies Parseval's theorem. This observation leads to a theoretical verification of power conservation in terms of the $\lambda_{k}$, which is stated in the following Lemma 1.

Lemma 1 (Operator Trace Coincide with Power Constraint)

Consider a stationary Gaussian process $X(t)$ with mean zero and autocorrelation $R_{X}(t_{1},t_{2})$. The Mercer expansion of $R_{X}(t_{1},t_{2})$ on $[0,T]$ is given by (9), satisfying (10). The Mercer operator $M(\cdot):\mathcal{L}^{2}([0,T])\rightarrow\mathcal{L}^{2}([0,T])$ is defined by the integral $(M\phi)(s)=\int_{0}^{T}{R_{X}(s,\tau)\phi(\tau)\mathrm{d}\tau}$. Then the sum of all the eigenvalues $\lambda_{k}$ of the operator $M$ is equal to the signal energy $PT$ within $[0,T]$:

\mathrm{tr}(M):=\sum_{k=1}^{+\infty}{\lambda_{k}}=PT, \qquad (13)

where $P=R_{X}(0,0)$.

Proof:

See Appendix B. ∎
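
A quick numerical check of Lemma 1 (our sketch, with hypothetical numbers and the exponential kernel used later in (14), reusing the `mercer_eigs` helper): the approximated eigenvalues sum to $PT$.

```python
import numpy as np

P, alpha, T = 1.0, 1.0, 2.0
R_X = lambda t1, t2: P * np.exp(-alpha * np.abs(t1 - t2))   # the kernel of (14)
lam = mercer_eigs(R_X, T)
print(lam.sum(), P * T)   # both approximately 2.0, as (13) predicts
```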

Remark 1

The convergence of the finite-time capacity series (12) is ensured by Lemma 1. In fact, from Lemma 1, we can conclude that the sum of the $\lambda_{k}$ is finite when $T$ is finite. It then follows immediately that $I(T)<\infty$, since $\log(1+x)\leq x$. Furthermore, note that the sum of the $\lambda_{k}$ is finite even for non-stationary processes (i.e., when the power at time $t$, $P(t):=\mathbb{E}\left[X^{2}(t)\right]$, is not a constant $P:=R_{X}(0,0)$), as long as $P(t)<\infty$ holds for any $0<t<T$. Thus the conclusion $I(T)<\infty$ holds even for non-stationary processes.

Remark 2

The finite-time capacity formula (12) is closely related to operator theory [8] in functional analysis. The sum of all the eigenvalues $\lambda_{k}$ is called the operator trace in linear operator theory. As mentioned in Lemma 1, the autocorrelation function $R_{X}(t_{1},t_{2})$ can be treated as a linear operator $M$ on $\mathcal{L}^{2}([0,T])$. Furthermore, this operator belongs to the trace class [12] if and only if $\int_{0}^{T}{R_{X}(t,t)\mathrm{d}t}<\infty$. Note that this condition is automatically satisfied if $X(t)$ is a Gaussian process, since Gaussian random variables always have finite variances.

The Mercer spectrum enables us to explicitly calculate the finite-time capacity and, furthermore, to prove the “Exceed-Shannon” phenomenon. This will be demonstrated in the next section.

IV Proof of the Existence of Exceed-Shannon Phenomenon

In this section, we first give two different proofs of the existence of the Exceed-Shannon phenomenon, both in a typical case. Then we discuss the achievability of the finite-time capacity and its compatibility with the Shannon-Hartley theorem.

IV-A Closed-Form Capacity in A Typical Case

In order to show the existence of the Exceed-Shannon phenomenon, we only need to show that the finite-time capacity is greater than the Shannon capacity in a typical case. Let us consider a finite-time communication scheme with a finitely-powered stationary transmitted signal autocorrelation (an autocorrelation of the form $R_{X}(\tau)$ below is often observed in many scenarios, such as passing a signal with a white spectrum through an RC lowpass filter), which is specified as

R_{X}(t_{1},t_{2})=R_{X}(\tau)=P\exp(-\alpha\lvert\tau\rvert), \qquad (14)

where $\tau=t_{1}-t_{2}$, in an AWGN channel with noise PSD $n_{0}/2$. (In this theoretical proof of the “Exceed-Shannon” phenomenon, we assume the noise to be AWGN to simplify the analytical computations. Gaussian processes with a white spectrum are pathological: they can neither be power-limited nor be directly sampled and numerically represented in computers.) The power of the signal $X(t)$ is $P=R_{X}(0)$. According to Lemma 1, the trace of the corresponding Mercer operator $M(\cdot)$ is finite. Then the finite-time capacity given by Theorem 1 is also finite, as shown in Remark 1. Finding the Mercer expansion is equivalent to finding the eigenpairs $(\lambda_{k},\phi_{k}(t))_{k=1}^{\infty}$. The eigenpairs are determined by the following characteristic integral equation [13]:

\lambda_{k}\phi_{k}(s)=\int_{0}^{s}{Pe^{-\alpha(s-t)}\phi_{k}(t)\mathrm{d}t}+\int_{s}^{T}{Pe^{-\alpha(t-s)}\phi_{k}(t)\mathrm{d}t}. \qquad (15)

Differentiating both sides of (15) twice with respect to $s$ yields the boundary conditions and the differential equation that $\lambda_{k},\phi_{k}$ must satisfy:

\lambda_{k}\phi_{k}^{\prime\prime}(s) = (\alpha^{2}\lambda_{k}-2\alpha P)\phi_{k}(s),\quad 0<s<T, \qquad (16)
\phi_{k}^{\prime}(0) = \alpha\phi_{k}(0),
\phi_{k}^{\prime}(T) = -\alpha\phi_{k}(T).

Let $\omega_{k}>0$ denote the resonant frequency of the above harmonic-oscillator differential equation; then $\lambda_{k}=2\alpha P/(\alpha^{2}+\omega_{k}^{2})$, and $\omega_{k}$ must satisfy the above two boundary constraints. Let $\phi_{k}(t)=A_{k}\cos(\omega_{k}t)+B_{k}\sin(\omega_{k}t)$ be the sinusoidal form of the eigenfunction. Using the boundary conditions, we obtain

B_{k}\omega_{k}=\alpha A_{k}, \qquad (17)
B_{k}\omega_{k}\cos(\omega_{k}T)-A_{k}\omega_{k}\sin(\omega_{k}T) = -\alpha\left(A_{k}\cos(\omega_{k}T)+B_{k}\sin(\omega_{k}T)\right).

To ensure the existence of a nontrivial solution to the homogeneous linear equations (17) with unknowns $A_{k},B_{k}$, the determinant must be zero. Exploiting this condition, we find the equation that $\omega_{k}$ must satisfy:

\tan(\omega_{k}T)=\frac{2\omega_{k}\alpha}{\omega_{k}^{2}-\alpha^{2}}. \qquad (18)
Figure 2: Plots of $2\arctan(\omega/\alpha)$ and $k\pi-\omega T$ with $T=2,\alpha=1$ and $k=1,2,3,4$. The desired resonant frequencies $\omega_{k}$ can be read from the horizontal coordinates of the intersection points of the curve $2\arctan(\omega/\alpha)$ and the parallel lines.

By introducing an auxiliary variable $\theta_{k}=\arctan(\omega_{k}/\alpha)\in[0,\pi/2]$, equation (18) can be simplified to $\tan(\omega_{k}T)=-\tan(2\theta_{k})$, i.e., there exists a positive integer $m$ such that $2\arctan(\omega_{k}/\alpha)=m\pi-\omega_{k}T$. The integer $m$ can be chosen to be equal to $k$. From the plots of $2\arctan(\omega/\alpha)$ and $m\pi-\omega T$ (Fig. 2), we can determine $\omega_{k}$, and then $\lambda_{k}$. To sum up, the solutions to the characteristic equation (15) are collected into (19) as follows:

2\arctan(\omega_{k}/\alpha) = k\pi-\omega_{k}T, \qquad (19)
\lambda_{k} = \frac{2\alpha P}{\alpha^{2}+\omega_{k}^{2}},
\phi_{k}(t) = \frac{1}{Z_{k}}\left(\omega_{k}\cos(\omega_{k}t)+\alpha\sin(\omega_{k}t)\right),

where $Z_{k}$ denotes the normalization constant of $\phi_{k}(t)$ on $[0,T]$ that ensures orthonormality.
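
Numerically, the first line of (19) can be solved by bracketed root finding: for each $k$, the function $2\arctan(\omega/\alpha)+\omega T-k\pi$ is strictly increasing in $\omega$ and changes sign on $((k-1)\pi/T,\,k\pi/T)$. The sketch below (ours, not part of the paper) uses Brent's method:

```python
import numpy as np
from scipy.optimize import brentq

def eigenvalues_exponential_kernel(P, alpha, T, K=200):
    """Solve (19) for omega_k and return lambda_k = 2*alpha*P / (alpha^2 + omega_k^2)."""
    lam = []
    for k in range(1, K + 1):
        g = lambda w: 2 * np.arctan(w / alpha) + w * T - k * np.pi
        w_k = brentq(g, (k - 1) * np.pi / T, k * np.pi / T)   # unique root in this bracket
        lam.append(2 * alpha * P / (alpha ** 2 + w_k ** 2))
    return np.array(lam)
```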

Equation (19) gives all the eigenpairs $(\lambda_{k},\phi_{k})_{k=1}^{\infty}$, from which we can calculate $I(T)$ by applying Theorem 1. As for the Shannon capacity $C_{\rm sh}$, by applying (8) and evaluating the integral with the help of [14], we obtain

C_{\rm sh} = \frac{1}{4\pi}\int_{-\infty}^{+\infty}{\log\left(1+\frac{S_{X}(\omega)}{n_{0}/2}\right)\mathrm{d}\omega} \qquad (20)
 = \frac{1}{4\pi}\int_{-\infty}^{+\infty}{\log\left(1+\frac{\frac{2P\alpha}{\alpha^{2}+\omega^{2}}}{n_{0}/2}\right)\mathrm{d}\omega}
 \overset{(a)}{=} \frac{1}{2}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right),

where the evaluation of the improper integral in step $(a)$ is given in Appendix E.
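
The closed form in (20) can be cross-checked by direct numerical quadrature of the first line (our sketch, with hypothetical parameter values):

```python
import numpy as np
from scipy.integrate import quad

P, alpha, n0 = 1.0, 1.0, 1.0
integrand = lambda w: np.log(1 + (2 * P * alpha / (alpha ** 2 + w ** 2)) / (n0 / 2))
numeric, _ = quad(integrand, -np.inf, np.inf)
numeric /= 4 * np.pi
closed_form = 0.5 * (np.sqrt(alpha ** 2 + 4 * P * alpha / n0) - alpha)
print(numeric, closed_form)   # both approximately 0.618 nat/s
```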

After all the above preparations, we can rigorously prove that $C(T)>C_{\rm sh}$ in the typical case of (14), as long as the transmission power $P$ is smaller than a constant $\delta$. The following Theorem 2 proves this result.

Theorem 2 (Existence of Exceed-Shannon phenomenon in a typical case)

Suppose $X(t)$ and $Y(t)$ are specified according to (14). The eigenpairs are determined by (19). Then, for any fixed positive values of $T,n_{0}$ and $\alpha$, there exists a positive number $\delta$ such that the Exceed-Shannon inequality

C(T):=\frac{1}{2T}\sum_{k=1}^{+\infty}\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)>C_{\rm sh} \qquad (21)

holds strictly for arbitrary $0<P<\delta$.

Proof:

See Appendix C. ∎

Figure 3: Theoretical verification of the Exceed-Shannon effect. The blue lines represent the finite-time mutual information $I(T)$. The red lines are the Shannon capacities $TC_{\rm sh}$ calculated from (8). All curves are evaluated under hypothesis (14), where $P=1,2,4$, $n_{0}=1$ and $\alpha=1$.

To verify the above theoretical analysis, numerical experiments on $I(T)$ are conducted based on evaluations of (19) and (20). As shown in Fig. 3, it seems that we can always harness more mutual information in a finite-time observation window than the Shannon capacity predicts. Though it seems impossible, this fact is somewhat unsurprising, because the observations ${\bm{Y}}(t_{1}^{N})$ inside the finite-time window $[0,T]$ can always eliminate some extra uncertainty outside the window due to the autocorrelation of $X(t)$. Different from the finite-time capacity, the Shannon capacity describes the circumstance of $T\rightarrow\infty$, where the fringe effect near $t=0$ and $t=T$ becomes negligible compared with the prolonged window period. Thus, the Shannon capacity does not take into consideration the small amount of extra information at the fringe, causing an underestimation of the capacity. Fig. 3 also shows that the extra capacity $\Delta I:=I(T)-TC_{\rm sh}$ between the finite-time result and the Shannon capacity tends to a constant as $T\rightarrow\infty$. As discussed above, the difference may come from the additional elimination of uncertainty at the fringe of the window. This asymptotically constant difference results in the asymptotic linearity of the finite-time mutual information $I(T)$ as a function of $T$.
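
The behavior in Fig. 3 can be reproduced with the helpers above (our sketch; `eigenvalues_exponential_kernel` is the assumed helper from the earlier sketch, and the series (12) is truncated at a large index): the gap $\Delta I=I(T)-TC_{\rm sh}$ settles toward a constant as $T$ grows.

```python
import numpy as np

P, alpha, n0 = 1.0, 1.0, 1.0
C_sh = 0.5 * (np.sqrt(alpha ** 2 + 4 * P * alpha / n0) - alpha)       # closed form (20)
for T in (1.0, 2.0, 4.0, 8.0):
    lam = eigenvalues_exponential_kernel(P, alpha, T, K=5000)
    I_T = 0.5 * np.sum(np.log1p(lam / (n0 / 2)))                      # truncated series (12)
    print(T, I_T - T * C_sh)                                          # the gap approaches a constant
```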

Apart from the above discussion, there is another interesting observation in Fig. 3, which leads to another rigorous proof of the Exceed-Shannon phenomenon. If we investigate the slope of the curves $I(T)$ at the origin, i.e., the “instant transmission rate” at the origin, we find that $C(0^{+})>C_{\rm sh}$. This observation is confirmed by the following theorem:

Theorem 3 (Instant Finite-Time Rate Exceeds Shannon)

Suppose $X(t)$ and $Y(t)$ are specified according to (14). The eigenpairs are given by (19). Then the instantaneous finite-time information transmission rate $C(0^{+})$ is given by:

C(0^{+})=\frac{\partial I(T)}{\partial T}\Big|_{T=0^{+}}=\frac{P}{n_{0}}. \qquad (22)
Proof:

See Appendix D. ∎

From the conclusion of Theorem 3 and (20), we can reduce the Exceed-Shannon inequality $C(0^{+})>C_{\rm sh}$ at $T=0^{+}$ to the following inequality:

\frac{P}{n_{0}}>\frac{1}{2}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right), \qquad (23)

which can be directly verified by moving terms and squaring both sides. This inequality implies that the average transmission rate in the finite-time regime is strictly larger than the Shannon capacity around the origin $T=0$.
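
A one-line symbolic check of this step (our sketch): multiplying (23) by two, moving $\alpha$ to the left-hand side, and squaring both (positive) sides leaves the manifestly positive difference $4P^{2}/n_{0}^{2}$.

```python
import sympy as sp

P, n0, alpha = sp.symbols('P n_0 alpha', positive=True)
# (2P/n0 + alpha)^2 - (alpha^2 + 4*P*alpha/n0) should be strictly positive for P > 0
diff = (2 * P / n0 + alpha) ** 2 - (alpha ** 2 + 4 * P * alpha / n0)
print(sp.simplify(diff))   # 4*P**2/n_0**2 > 0, so (23) holds for every P > 0
```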

Remark 3

Both Theorem 2 and Theorem 3 rigorously prove the existence of the Exceed-Shannon phenomenon. However, they differ as follows:

  • Theorem 2 proves that the Exceed-Shannon inequality holds in a small power range $0<P<\delta$, while Theorem 3 is true for arbitrary power $P$.

  • Since Theorem 3 only proves $C(0^{+})>C_{\rm sh}$, it does not ensure the Exceed-Shannon inequality when $T$ becomes larger. By contrast, Theorem 2 states that the Exceed-Shannon phenomenon exists for arbitrary $T>0$.

Remark 4

In fact, Theorem 2 and Theorem 3 characterize the Exceed-Shannon phenomenon of the finite-time capacity from two aspects. One aspect is the observation time $T$, and the other is the transmit power $P$. Combining the proofs of these two theorems may result in a universal proof that is independent of the choice of the parameters $T$ and $P$, which requires further study.

The conclusion of Theorem 3 is verified numerically in Fig. 4 and Fig. 5. The blue solid lines, representing the finite-time capacity $C(T)$, are above the red dashed lines representing the Shannon capacity, which demonstrates the Exceed-Shannon phenomenon. The $C(T)$ curves all start at $P/n_{0}$ when $T=0^{+}$, which coincides with the conclusion of Theorem 3. It can also be observed from the two figures that, for fixed values of $P,n_{0}$, as $\alpha$ increases, the transmitted signal $X(t)$ tends to be less correlated, and thus more informative. The transmission rate is then improved. This insight also comes from the change of the PSD $S_{X}(f)$: as $\alpha$ increases, the PSD becomes flatter, i.e., a wider range of bandwidth is occupied, and thus the rate increases accordingly.

Figure 4: Comparison between the finite-time capacity $C(T)$ and the Shannon capacity, when $\alpha=1$.
Figure 5: Comparison between the finite-time capacity $C(T)$ and the Shannon capacity, when $\alpha=2$.

IV-B Further Discussions on the Exceed-Shannon Phenomenon

Achievability of the finite-time capacity. It is known that for any band-limited stationary Gaussian process $X(t)$ with PSD $S_{X}(f)$, one can generate signals whose PSD is exactly $S_{X}(f)$ by first generating a sequence $X(nT_{s})$ at a sufficiently high sampling rate, and then passing the generated sequence through a shaping filter. Since the transmitted signal $X(t)$ and its generating sequence $X(nT_{s})$ determine each other uniquely if $X(t)$ is strictly band-limited, $X(t)$ and $X(nT_{s})$ can be regarded as containing the same amount of information. Then we can conclude that, after observing the noisy received process $Y(t),t\in[0,T]$, by the definition of the finite-time capacity $I(T)$, the uncertainty of the underlying transmitted sequence $X(nT_{s})$ can be reduced by exactly $I(T)$ nats. That is to say, we link the finite-time capacity to a sequence-to-sequence capacity. Thus, the finite-time mutual information $I(T)$ is achievable by standard capacity-achieving techniques such as random coding [1], as long as the sampling instants are dense enough.

Compatibility with the Shannon-Hartley theorem. Though the Exceed-Shannon effect does imply an average data transmission rate within a finite-time window higher than that predicted by Shannon, it is in fact still impossible to construct a long-time stable data transmission scheme above the Shannon capacity by leveraging this effect. So the Exceed-Shannon phenomenon does not contradict the Shannon-Hartley theorem. Placing additional observation windows cannot increase the average information rate, because the received process $Y(t)$ observed by the subsequent additional windows has already been implicitly altered by the previous observation. The posterior process $Y(t)|{\bm{Y}}(t_{1}^{N})$ does not carry as much information as the original one, thus causing a rate reduction in the later windows. It is expected that the average transmission rate would ultimately decrease to the Shannon capacity as the total observation time tends to infinity (i.e., $C(\infty)=C_{\rm sh}$); the analytical proof is still worth investigating in future works.

V Conclusions

In this paper, we provided rigorous proofs of the existence of the “Exceed-Shannon” phenomenon under typical autocorrelation settings of the transmitted signal process and the noise process. Our discovery of the “Exceed-Shannon” phenomenon reveals a possible new direction of research in information theory, as it provides a generalization of Shannon's renowned formula $C=W\log(1+S/N)$ to practical finite-time communications. It shows the possibility of communicating at a higher-than-Shannon rate over a short time. Since the finite-time capacity is a more precise estimation of the ultimate capacity limit, the optimization target may shift from the Shannon capacity to the finite-time capacity in the design of practical communication systems. Thus, it has guiding significance for the performance improvement of modern communication systems. In future works, general proofs of $C(T)>C_{\rm sh}$, independent of the concrete autocorrelation settings, still require further investigation. Moreover, we need to answer the question of how to exploit this Exceed-Shannon phenomenon to improve the communication rate. In addition, although we have observed numerically that the finite-time capacity agrees with the Shannon capacity when $T\rightarrow\infty$, an analytical proof of this result is required in the future.

Appendix A
Proof Of Theorem 1

Define the $n$-by-$n$ matrix ${\bm{\Phi}}_{n}$ as

{\bm{\Phi}}_{n}=[\phi_{1}(t_{1}^{n}),\phi_{2}(t_{1}^{n}),\cdots,\phi_{n}(t_{1}^{n})],

where $t_{i}=(i-1)T/n,1\leq i\leq n$. According to the definition of this matrix, the following relation holds:

\left(\frac{T}{n}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{\Phi}}_{n}\right)_{ij} = \frac{T}{n}\sum_{k=1}^{n}{\phi_{i}(t_{k})\phi_{j}(t_{k})} \rightarrow \int_{0}^{T}{\phi_{i}(t)\phi_{j}(t)\mathrm{d}t}=\delta_{ij}. \qquad (24)

This implies that the matrix ${\bm{\Phi}}_{n}$ satisfies the property of asymptotic orthogonality:

\left\|\frac{T}{n}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{\Phi}}_{n}-{\bm{I}}_{n}\right\|_{2}\rightarrow 0, \qquad (25)

and the matrix ${\bm{\Phi}}_{n}$ can asymptotically diagonalize ${\bm{K}}_{X}=\left(R_{X}(t_{i},t_{j})\right)_{i,j=1}^{n}$ because of the eigenvalue property (9):

\mathbb{E}\left[\frac{T^{2}}{n^{2}}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{X}}(t_{1}^{n}){\bm{X}}^{\mathrm{T}}(t_{1}^{n}){\bm{\Phi}}_{n}\right] = \frac{T^{2}}{n^{2}}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{K}}_{X}{\bm{\Phi}}_{n} \rightarrow \mathrm{diag}(\lambda_{1},\lambda_{2},\cdots,\lambda_{n}). \qquad (26)

Next, we investigate the noise realizations at the sampling instants $t_{i}$. For AWGN, the instantaneous power is infinite, i.e., $\mathbb{E}[N(t_{j})^{2}]=\infty$, so it is necessary to assume that the noise is sampled after passing through a filter $\xi(t)$ with a rectangular impulse response of pulse width $T/n$ and gain $n/T$. This assumption is reasonable, since the filter $\xi(t)$ tends to an ideal sampler $\delta(t)$ as $n\to\infty$. Under this hypothesis, the noise variance of each sample can be calculated as

\mathbb{E}\left[\left(\int_{0}^{T/n}{N(t)\xi\left(\frac{T}{n}-t\right)\mathrm{d}t}\right)^{2}\right] = \iint_{0}^{T/n}{\mathrm{d}t_{1}\mathrm{d}t_{2}\,\mathbb{E}\left[N(t_{1})N(t_{2})\right]\xi\left(t_{1}\right)\xi\left(t_{2}\right)} \qquad (27)
 \overset{(a)}{=} \iint_{0}^{T/n}{\mathrm{d}t_{1}\mathrm{d}t_{2}\,\frac{n_{0}}{2}\delta(t_{1}-t_{2})\xi\left(t_{1}\right)\xi\left(t_{2}\right)}
 = \frac{n_{0}}{2}\int_{0}^{T/n}{\xi^{2}(t)\mathrm{d}t}
 = \frac{n_{0}}{2}\frac{n}{T}.

Note that the equality $(a)$ holds since the noise autocorrelation is $\frac{n_{0}}{2}\delta(t_{1}-t_{2})$. In this way, the mutual information within the window $[0,T]$ can be calculated as

I(T) = \frac{1}{2}\lim_{n\rightarrow\infty}\log\left(\frac{\det({\bm{K}}_{X}+\frac{nn_{0}}{2T}{\bm{I}}_{n})}{\det\left(\frac{nn_{0}}{2T}{\bm{I}}_{n}\right)}\right) \qquad (28)
 = \frac{1}{2}\lim_{n\rightarrow\infty}\log\det\left({\bm{I}}_{n}+\frac{2T}{nn_{0}}{\bm{K}}_{X}\right)
 \overset{(b)}{=} \frac{1}{2}\lim_{n\rightarrow\infty}\log\det\left(\frac{T}{n}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{\Phi}}_{n}+\frac{1}{n_{0}/2}\frac{T^{2}}{n^{2}}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{K}}_{X}{\bm{\Phi}}_{n}\right)
 \overset{(c)}{=} \frac{1}{2}\lim_{n\rightarrow\infty}\log\det\left({\bm{I}}_{n}+\frac{1}{n_{0}/2}\mathrm{diag}(\lambda_{1},\cdots,\lambda_{n})\right)
 = \frac{1}{2}\sum_{k=1}^{+\infty}{\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)},

where $(b)$ comes from sandwiching the determinant in the bracket with the asymptotically orthogonal matrix $\sqrt{T/n}\,{\bm{\Phi}}_{n}$ on the left and its transpose on the right, and $(c)$ comes from plugging (24) and (26) into the previous step.

Appendix B
Proof Of Lemma 1

Taking the trace of both sides of (26) and letting $n\rightarrow\infty$, we obtain

\mathrm{tr}(M) = \lim_{n\rightarrow\infty}\mathrm{tr}\left(\frac{T^{2}}{n^{2}}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{K}}_{X}{\bm{\Phi}}_{n}\right) \qquad (29)
 = \lim_{n\rightarrow\infty}\left(\frac{T}{n}\times\mathrm{tr}\left(\frac{T}{n}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{K}}_{X}{\bm{\Phi}}_{n}\right)\right)
 = \lim_{n\rightarrow\infty}\left(\frac{T}{n}\,\mathrm{tr}\left({\bm{K}}_{X}\right)\right)
 = \lim_{n\rightarrow\infty}\left(\frac{T}{n}\times nP\right)
 = PT,

which completes the proof of Lemma 1.

Appendix C
Proof Of Theorem 2

Plugging (20) into the right-hand side of (21) and differentiating both sides with respect to $P$, notice that if $P=0$, then both sides of (21) are equal to 0. Thus, we only need to prove that the derivative of the left-hand side is strictly larger than that of the right-hand side within a small interval $P\in(0,\delta)$:

\frac{1}{2}\sum_{k=1}^{+\infty}{\left(\frac{1}{1+\frac{2\lambda_{k}}{n_{0}}}\frac{2\lambda_{k}}{n_{0}P}\right)}>\frac{T}{n_{0}}\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. \qquad (30)

Multiplying both sides of (30) by $n_{0}/T$ and defining $\mu_{k}:=\lambda_{k}/(PT)$, from Lemma 1 we obtain $\sum_{k}{\mu_{k}}=1$. In this way, (30) is equivalent to

\sum_{k=1}^{+\infty}{\frac{\mu_{k}}{1+\frac{2\lambda_{k}}{n_{0}}}}>\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. \qquad (31)

Since $\varphi(x):=1/(1+2x/n_{0})$ is convex on $(0,+\infty)$, by applying Jensen's inequality to the left-hand side of (31), we only need to prove that

\frac{1}{1+\frac{2}{n_{0}}\sum_{k}{\lambda_{k}\mu_{k}}}>\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. \qquad (32)

From the definition of $\mu_{k}$ we can derive that $\lambda_{k}\mu_{k}=\lambda_{k}^{2}/(PT)$. So we go on to calculate $\sum_{k}{\lambda_{k}^{2}}$, which is equivalent to calculating $\mathrm{tr}(M^{2})$, where $M^{2}$ corresponds to the integral kernel:

K_{M^{2}}(t_{1},t_{2}):=\int_{0}^{T}{P^{2}\exp(-\alpha\lvert t_{1}-s\rvert)\exp(-\alpha\lvert t_{2}-s\rvert)\mathrm{d}s}. \qquad (33)

Evaluating the kernel $K_{M^{2}}$ on the diagonal $t=t_{1}=t_{2}$ yields

K_{M^{2}}(t,t) = P^{2}\int_{0}^{T}{\exp(-2\alpha\lvert t-s\rvert)\mathrm{d}s} = \frac{P^{2}}{2\alpha}\left(2-\exp(-2\alpha t)-\exp(-2\alpha(T-t))\right). \qquad (34)

Integrating this kernel along the diagonal of $[0,T]^{2}$ gives $\sum_{k}{\lambda_{k}^{2}}=\mathrm{tr}(M^{2})$:

\sum_{k}{\lambda_{k}^{2}} = \frac{P^{2}}{2\alpha}\int_{0}^{T}{\left(2-e^{-2\alpha t}-e^{-2\alpha(T-t)}\right)\mathrm{d}t} = \frac{P^{2}}{\alpha}\left(T-\frac{1}{2\alpha}(1-e^{-2\alpha T})\right). \qquad (35)

By substituting (35) into (32), we just need to prove that

\sqrt{1+4P/(n_{0}\alpha)}>1+\frac{2P}{n_{0}\alpha}\left(1-\frac{1-e^{-2\alpha T}}{2\alpha T}\right). \qquad (36)

Define the dimensionless number $x=2P/(n_{0}\alpha)$. Since the function $\psi(x):=(1-\exp(-x))/x$ is strictly positive and less than 1 for $x>0$, we can conclude that there exists a small positive $\delta>0$ such that (36) holds for $0<P<\delta$. The number $\delta$ can be chosen as

\delta=\frac{n_{0}\alpha\,\psi(2\alpha T)}{(1-\psi(2\alpha T))^{2}}>0, \qquad (37)

which implies that (30) holds for any $0<P<\delta$. Thus, integrating (30) on both sides from $p=0$ to $p=P,\ P<\delta$ gives rise to the conclusion (21), which completes the proof of Theorem 2.
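
The threshold (37) can be sanity-checked numerically (our sketch, with hypothetical numbers): for any $P$ below $\delta$, inequality (36) indeed holds.

```python
import numpy as np

n0, alpha, T = 1.0, 1.0, 2.0
psi = lambda x: (1 - np.exp(-x)) / x
delta = n0 * alpha * psi(2 * alpha * T) / (1 - psi(2 * alpha * T)) ** 2    # the bound (37)
for P in (0.1 * delta, 0.5 * delta, 0.99 * delta):
    lhs = np.sqrt(1 + 4 * P / (n0 * alpha))
    rhs = 1 + (2 * P / (n0 * alpha)) * (1 - psi(2 * alpha * T))
    print(P, lhs > rhs)   # True for every P < delta, matching (36)
```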

Appendix D
Proof Of Theorem 3

Differentiating the finite-time capacity expression for $I(T)$, i.e., (12), with respect to $T$, we obtain

\frac{\partial I(T)}{\partial T}\Big|_{T=0^{+}}=\lim_{T\rightarrow 0^{+}}\frac{1}{2}\sum_{k=1}^{+\infty}{\frac{1}{(n_{0}/2)(1+2\lambda_{k}/n_{0})}\frac{\partial\lambda_{k}}{\partial T}}, \qquad (38)

where $\lim_{T\rightarrow 0^{+}}\sum_{k=1}^{+\infty}\frac{\partial\lambda_{k}}{\partial T}$, by applying Lemma 1, can be expressed as

\lim_{T\rightarrow 0^{+}}\sum_{k=1}^{+\infty}\frac{\partial\lambda_{k}}{\partial T} = \lim_{T\rightarrow 0^{+}}\frac{\partial}{\partial T}\sum_{k=1}^{+\infty}{\lambda_{k}} = \frac{\partial}{\partial T}(PT)\Big|_{T=0^{+}} = P. \qquad (39)

Since $\omega_{k}\uparrow+\infty$ as $T\downarrow 0^{+}$, we can safely conclude that $\lambda_{k}\downarrow 0^{+}$. From Dirichlet's test, the series in (38) converges uniformly. Thus, by interchanging the infinite sum and the limit operation, we obtain

\frac{\partial I(T)}{\partial T}\Big|_{T=0^{+}} = \frac{1}{2}\sum_{k=1}^{+\infty}{\lim_{T\rightarrow 0^{+}}\left(\frac{1}{(n_{0}/2)(1+2\lambda_{k}/n_{0})}\frac{\partial\lambda_{k}}{\partial T}\right)} \qquad (40)
 = \frac{1}{n_{0}}\sum_{k=1}^{+\infty}\frac{\partial\lambda_{k}}{\partial T}\Big|_{T=0^{+}}
 = \frac{1}{n_{0}}\lim_{T\rightarrow 0^{+}}\sum_{k=1}^{+\infty}\frac{\partial\lambda_{k}}{\partial T}
 = \frac{P}{n_{0}},

which completes the proof of Theorem 3.

Appendix E
The Evaluation of The Improper Integral (20)

Define the improper integral with parameters $P>0$ and $\alpha>0$:

J(P,\alpha):=\int_{-\infty}^{+\infty}{\log\left(1+\frac{P}{\alpha^{2}+\omega^{2}}\right){\rm d}\omega}. \qquad (41)

Taking the partial derivative of $J(P,\alpha)$ with respect to $P$, we obtain

\frac{\partial J(P,\alpha)}{\partial P}=\int_{-\infty}^{+\infty}{\frac{1}{\omega^{2}+(\alpha^{2}+P)}{\rm d}\omega}. \qquad (42)

Note that the analytic function defined as

f(z):=\frac{1}{z^{2}+(\alpha^{2}+P)}, \qquad (43)

has residue

{\rm Res}\left[f(z),z=z_{p}\right]=\frac{1}{2{\rm i}\sqrt{\alpha^{2}+P}}, \qquad (44)

at the pole $z_{p}={\rm i}\sqrt{\alpha^{2}+P}$ in the upper half-plane; thus the integral in (42) can be evaluated by the residue theorem:

\frac{\partial J(P,\alpha)}{\partial P} = 2\pi{\rm i}\,{\rm Res}\left[f(z),z={\rm i}\sqrt{\alpha^{2}+P}\right] = \frac{\pi}{\sqrt{\alpha^{2}+P}}. \qquad (45)

Since $J(0,\alpha)\equiv 0$, integrating (42) with respect to $P$ from 0 to $P$ yields

J(P,\alpha) = \int_{0}^{P}\frac{\pi}{\sqrt{\alpha^{2}+p}}{\rm d}p = 2\pi\sqrt{\alpha^{2}+p}\,\Big|_{p=0}^{P} = 2\pi\left(\sqrt{\alpha^{2}+P}-\alpha\right). \qquad (46)

Then the integral in (20) can be calculated by setting $P\leftarrow 4P\alpha/n_{0}$ and $\alpha\leftarrow\alpha$ in (46):

\frac{1}{4\pi}\int_{-\infty}^{+\infty}{\log\left(1+\frac{\frac{2P\alpha}{\alpha^{2}+\omega^{2}}}{n_{0}/2}\right)\mathrm{d}\omega} = \frac{2\pi}{4\pi}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right) = \frac{1}{2}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right), \qquad (47)

which completes the proof.
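
As a quick cross-check of (46) (our sketch, with hypothetical $P$ and $\alpha$), numerical quadrature agrees with the closed form:

```python
import numpy as np
from scipy.integrate import quad

P, alpha = 3.0, 1.5
numeric, _ = quad(lambda w: np.log(1 + P / (alpha ** 2 + w ** 2)), -np.inf, np.inf)
closed = 2 * np.pi * (np.sqrt(alpha ** 2 + P) - alpha)
print(numeric, closed)   # both approximately 4.97
```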

References

  • [1] C. E. Shannon, “A mathematical theory of communication,” The Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, Jul. 1948.
  • [2] H. Landau, “Sampling, data transmission, and the Nyquist rate,” Proc. IEEE, vol. 55, no. 10, pp. 1701–1706, Oct. 1967.
  • [3] H. Nyquist, “Certain topics in telegraph transmission theory,” Trans. American Institute of Electrical Engineers, vol. 47, no. 2, pp. 617–644, Apr. 1928.
  • [4] D. Slepian, “On bandwidth,” Proc. IEEE, vol. 64, no. 3, pp. 292–300, Mar. 1976.
  • [5] T. M. Cover, Elements of information theory.   John Wiley & Sons, 1999.
  • [6] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
  • [7] J. Mercer, “Functions of positive and negative type and their connection with the theory of integral equations,” Philos. Trans. Royal Soc. A, vol. 209, pp. 415–446, 1909.
  • [8] K. Zhu, Operator theory in function spaces.   American Mathematical Soc., 2007, no. 138.
  • [9] A. Balakrishnan, “A note on the sampling principle for continuous signals,” IRE Trans. Inf. Theory, vol. 3, no. 2, pp. 143–146, Jun. 1957.
  • [10] J. Barrett and D. Lampard, “An expansion for some second-order probability distributions and its application to noise problems,” IRE Trans. Inf. Theory, vol. 1, no. 1, pp. 10–15, Mar. 1955.
  • [11] S. S. Kelkar, L. L. Grigsby, and J. Langsner, “An extension of Parseval's theorem and its use in calculating transient energy in the frequency domain,” IEEE Trans. Ind. Electron., vol. IE-30, no. 1, pp. 42–45, Feb. 1983.
  • [12] C. Brislawn, “Kernels of trace class operators,” Proc. Amer. Math. Soc., vol. 104, no. 4, 1988.
  • [13] D. Cai and P. S. Vassilevski, “Eigenvalue problems for exponential-type kernels,” Comput. Methods in Applied Math., vol. 20, no. 1, pp. 61–78, Jan. 2020.
  • [14] I. S. Gradshteyn and I. M. Ryzhik, Table of integrals, series, and products.   Academic press, 2014.