
Finite-Time Capacity: Making
Exceed-Shannon Possible?

Jieao Zhu, Zijian Zhang, Zhongzhichao Wan, and Linglong Dai All authors are with the Beijing National Research Center for Information Science and Technology (BNRist) as well as the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China (E-mails: zja21, zhangzj20, [email protected]; [email protected]). This work was supported in part by the National Key Research and Development Program of China (Grant No.2020YFB1807201), in part by the National Natural Science Foundation of China (Grant No. 62031019), and in part by the European Commission through the H2020-MSCA-ITN META WIRELESS Research Project under Grant 956256.
Abstract

The Shannon-Hartley theorem accurately characterizes the channel capacity when the signal observation time is infinite. However, the finite-time capacity, which remains unknown, is essential for guiding the design of practical communication systems. In this paper, we investigate the capacity between two correlated Gaussian processes within a finite-time observation window. We first derive the finite-time capacity by providing a limit expression. Then we numerically compute the maximum transmission rate within a single finite-time window. We reveal that the number of bits transmitted per second within the finite-time window can exceed the classical Shannon capacity, which we call the Exceed-Shannon phenomenon. Furthermore, we derive a finite-time capacity formula under a typical signal autocorrelation case by utilizing the Mercer expansion of trace-class operators, and reveal the connection between the finite-time capacity problem and operator theory. Finally, we analytically prove the existence of the Exceed-Shannon phenomenon in this typical case, and demonstrate the achievability of the finite-time capacity and its compatibility with the classical Shannon capacity.

Index Terms:
Finite-time capacity, Exceed-Shannon, Mercer expansion, signal autocorrelation, operator theory.

I Introduction

The Shannon-Hartley theorem [1] accurately reveals the fundamental theoretical limit of the information transmission rate $C$, also known as the Shannon capacity, over a Gaussian waveform channel of limited bandwidth $W$. The expression for the Shannon capacity is $C=W\log\left(1+S/N\right)$, where $S$ and $N$ denote the signal power and the noise power, respectively. The derivation of the Shannon-Hartley theorem heavily depends on the Nyquist sampling principle [2]. The Nyquist sampling principle, also named the $2WT$ theorem [3], claims that one can only obtain $2WT+o(2WT)$ independent samples within an observation time window $T$ in a channel band-limited to $W$ [4], where $o(\cdot)$ denotes a higher-order infinitesimal, i.e., $o(2WT)/T\to 0$ as $T\to\infty$.

Based on the Nyquist sampling principle, the Shannon capacity is derived by first multiplying the capacity $\frac{1}{2}\log(1+P/N)$ of a Gaussian symbol channel [5, p.249] with $2WT+o(2WT)$, then dividing the result by $T$, and finally letting $T\rightarrow\infty$. In the above derivation, $(2WT+o(2WT))/T$ is approximated by $2W$ in the final step to obtain the Shannon capacity. Note that this approximation only holds when $T\to\infty$. Therefore, the Shannon capacity only holds asymptotically as $T$ becomes sufficiently large. When $T$ is finite, the approximation fails. Thus, when the observation time $T$ is finite, i.e., the received signal can only be observed within a finite-time window $[0,T]$, the Shannon-Hartley theorem cannot be directly applied to calculate the capacity in a finite-time window. To the best of our knowledge, the evaluation of the finite-time capacity has not yet been investigated in the literature. One possible reason is that most researchers have mainly focused on how to approach the Shannon capacity with advanced coding and modulation schemes. It is worth noting that any real-world communication system transmits signals in a finite-time window, so evaluating the finite-time capacity is of practical significance.

In this paper, to fill this gap, we analyze the finite-time capacity instead of the traditional infinite-time counterpart, and reveal and prove the existence of the “Exceed-Shannon” phenomenon within a finite-time observation window. (Simulation codes will be provided to reproduce the results presented in this paper: http://oa.ee.tsinghua.edu.cn/dailinglong/publications/publications.html.) Specifically, our contributions are summarized as follows:

  • We derive the capacity expressions within a finite-time observation window by using dense sampling and limiting methods. In this way, we can overcome the difficulties that arise when analyzing the information contained in a continuous time interval. These finite-time capacity expressions make the analysis of finite-time capacity problems possible.

  • We approximate the original continuous finite-time capacity expressions by discrete matrices, and conduct numerical experiments based on the discretized formulas. In the numerical results under a special setting, we reveal the “Exceed-Shannon” phenomenon, i.e., the mutual information within a finite-time observation window exceeds the Shannon capacity. (In fact, the finite-time “Exceed-Shannon” phenomenon revealed in this paper does not contradict the classical infinite-time Shannon-Hartley theorem, since new assumptions are considered. Specifically, in the Shannon-Hartley theorem, the sampling time is assumed to be infinitely long, while in this paper, the sampling takes place in a finite-time observation window. Similarly, although compressed sensing [6] can achieve a much lower sampling rate than the Nyquist rate while still performing accurate sparse signal reconstruction, it does not contradict the Nyquist sampling principle due to the new assumption of signal sparsity.)

  • In order to analytically prove the revealed “Exceed-Shannon” phenomenon, we first derive an analytical finite-time capacity formula based on the Mercer expansion [7], where we find the connection between the capacity problem and operator theory [8]. To make the problem tractable, we construct a typical case in which the transmitted signal has certain statistical properties. Utilizing this construction, we obtain a closed-form capacity solution in this typical case, which leads to a rigorous proof of the “Exceed-Shannon” phenomenon. Inspired by the techniques in the proof, we find that the finite-time capacity is, in fact, a generalization of the Shannon limit, and thus the “Exceed-Shannon” phenomenon of the finite-time capacity is compatible with the classical Shannon theory.

Organization: In the rest of this paper, the finite-time capacity is formulated and evaluated numerically in Section II, where the “Exceed-Shannon” phenomenon is first discovered. Then, in Section III, we derive a closed-form finite-time capacity formula under a typical case. Based on this formula, in Section IV, the “Exceed-Shannon” phenomenon is rigorously proved. Finally, conclusions are drawn in Section V.

Notations: $X(t)$ denotes a Gaussian process; $R_{X}(t_{1},t_{2})$ denotes the autocorrelation function; $S_{X}(f)$ and $S_{X}(\omega)$ are the power spectral density (PSD) of the corresponding process $X(t)$, where $\omega=2\pi f$; the boldface italic symbol $\bm{X}(t_{1}^{n})$ denotes the column vector generated by taking samples of $X(t)$ at instants $t_{i},1\leq i\leq n$; upper-case boldface letters such as ${\bf{\Phi}}$ denote matrices; $\mathbb{E}\left[\cdot\right]$ denotes the expectation; $\mathbbm{1}_{A}(\cdot)$ denotes the indicator function of the set $A$; $\mathcal{L}^{2}([0,T])$ denotes the collection of all square-integrable functions on the window $[0,T]$; ${\rm i}$ denotes the imaginary unit.

II Numerical Analysis of the Finite-Time Capacity

In this section, we focus on the numerical evaluation of the finite-time capacity. In Subsection II-A, we model the transmission problem by Gaussian processes, and derive the capacity expressions within a finite-time observation window by using dense sampling and limiting methods; In Subsection II-B, we approximate the finite-time capacity by discretized matrix-based formulas; In Subsection II-C, we reveal the “Exceed-Shannon” phenomenon by numerically evaluating the finite-time capacity in a special setting of the signal autocorrelations.

II-A The Expressions for Finite-Time Capacity

The finite-time capacity is, heuristically, defined as the maximum number of bits that can be successfully transmitted within a finite-time window. Since the Shannon capacity is defined on pairs of random variables, it is crucial to introduce randomness into the transmission model. Inspired by [9], we model the transmitted signal by a zero-mean stationary Gaussian stochastic process, denoted as $X(t)$, and the received signal by $Y(t):=X(t)+N(t)$. The process $N(t)$, which denotes the noise, is also a stationary Gaussian process independent of $X(t)$. The receiver is only allowed to observe the signal within the finite-time window $[0,T]$, where $T>0$ is the observation window span. Our goal is to find the maximum number of bits that can be acquired within this time window.

To analytically express the amount of acquired information, we first introduce $n$ sampling instants inside the time window, denoted by $(t_{1},t_{2},\cdots,t_{n}):=t_{1}^{n}$, and then let $n\rightarrow\infty$ to approximate the finite-time capacity. (In this paper, we do not explicitly distinguish between the terms “finite-time mutual information” and “finite-time capacity”, since we consider communication schemes where the source autocorrelation is fixed.) This approximation of the capacity becomes more precise as the sampling instants $t_{1}^{n}$ become denser. Then, by defining ${\bm{X}}(t_{1}^{n})\equiv(X(t_{1}),X(t_{2}),\cdots,X(t_{n}))$ and ${\bm{Y}}(t_{1}^{n})\equiv(Y(t_{1}),Y(t_{2}),\cdots,Y(t_{n}))$, the capacity on these $n$ samples can be expressed as

I(t_{1}^{n})=I({\bm{X}}(t_{1}^{n});{\bm{Y}}(t_{1}^{n})), \qquad (1)

and the finite-time capacity is defined as

I(T)=\lim_{n\rightarrow\infty}\sup_{\{t_{1}^{n}\}\subset[0,T]}{I(t_{1}^{n})}. \qquad (2)

Then, the transmission rate $C(T)$ can be defined by dividing the amount of information acquired within $[0,T]$ by the time span $T$:

C(t_{1}^{n}) = I(t_{1}^{n})/T, \qquad (3)
C(T) = I(T)/T.

From these definitions, we can define the limit capacity as $C(\infty)=\lim_{T\rightarrow\infty}{C(T)}$ by letting $T\rightarrow\infty$. The quantity $C(\infty)$ characterizes the maximum average number of bits per second one can acquire from a received noisy stochastic process.

II-B Discretization

Without loss of generality, we fix the sampling instants uniformly onto fractions of $T$: $t_{i}=(i-1)T/n,1\leq i\leq n$. Since the random vectors ${\bm{X}}(t_{1}^{n})$ and ${\bm{Y}}(t_{1}^{n})$ are samples of a Gaussian process, they are both Gaussian random vectors with mean zero and covariance matrices ${\bm{K}}_{X}$ and ${\bm{K}}_{Y}$, where ${\bm{K}}_{X},{\bm{K}}_{Y}\in\mathbb{R}^{n\times n}$ are symmetric positive-definite matrices. The entries of ${\bm{K}}_{X}$ and ${\bm{K}}_{Y}$ are determined by the autocorrelation functions of the Gaussian processes $X(t)$ and $Y(t)$, denoted by $R_{X}(t_{1},t_{2})$ and $R_{Y}(t_{1},t_{2})$:

({\bm{K}}_{X})_{i,j} = R_{X}(t_{i},t_{j}) := \mathbb{E}\left[X(t_{i})X(t_{j})\right], \qquad (4)
({\bm{K}}_{Y})_{i,j} = R_{Y}(t_{i},t_{j}) := \mathbb{E}\left[Y(t_{i})Y(t_{j})\right].

Note that $Y(t)$ is the independent sum of $X(t)$ and $N(t)$, thus the autocorrelation functions satisfy $R_{Y}(t_{1},t_{2})=R_{X}(t_{1},t_{2})+R_{N}(t_{1},t_{2})$, and similarly the covariance matrices satisfy ${\bm{K}}_{Y}={\bm{K}}_{X}+{\bm{K}}_{N}$.

The mutual information $I(t_{1}^{n})$ is defined as $I(t_{1}^{n})=h({\bm{Y}}(t_{1}^{n}))-h({\bm{Y}}(t_{1}^{n})|{\bm{X}}(t_{1}^{n}))=h({\bm{Y}}(t_{1}^{n}))-h({\bm{N}}(t_{1}^{n}))$, where $h(\cdot)$ denotes the differential entropy. For an $n$-dimensional Gaussian vector ${\bm{U}}$ with mean zero and covariance matrix ${\bm{K}}$, the differential entropy is given by

h({\bm{U}})=\frac{1}{2}\log\left((2\pi e)^{n}\det({\bm{K}})\right). \qquad (5)

Plugging (5) into the definition of $I(t_{1}^{n})$, we obtain

I(t_{1}^{n})=\frac{1}{2}\log\left(\frac{\det({\bm{K}}_{X}+{\bm{K}}_{N})}{\det({\bm{K}}_{N})}\right). \qquad (6)
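
As a minimal numerical sketch (our own illustration, not part of the paper; the function name is hypothetical), (6) can be evaluated stably with log-determinants:

```python
import numpy as np

def mutual_information_nats(K_X, K_N):
    """Evaluate (6): 0.5 * [log det(K_X + K_N) - log det(K_N)], in nats."""
    sign_y, logdet_y = np.linalg.slogdet(K_X + K_N)
    sign_n, logdet_n = np.linalg.slogdet(K_N)
    assert sign_y > 0 and sign_n > 0, "covariance matrices must be positive definite"
    return 0.5 * (logdet_y - logdet_n)
```

Using `slogdet` avoids the overflow and underflow that forming the two determinants directly would cause for large $n$.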

In (6), by letting $n\rightarrow\infty$, we find that $I(t_{1}^{n})$ increases monotonically when $n$ doubles, because of the data processing inequality. Though we do not give a rigorous proof, we can assume with confidence that $I(t_{1}^{n})$ is an increasing function of $n$. However, it remains unknown whether $I(t_{1}^{n})$ tends to a finite limit. In fact, if the noise is band-limited and the signal has power outside the noise band, $I(t_{1}^{n})$ can be arbitrarily large, since the signal outside the noise band is strictly unpolluted by the noise, which results in infinite SNR. In that case, the capacity diverges to infinity. Therefore, in order to avoid capacity divergence, at least one of the following conditions should be satisfied:

  • The noise process $N(t)$ is not band-limited.

  • The power spectral density of $X(t)$ is strictly contained inside the band of $N(t)$.

Thus, in the following numerical analysis, we choose $N(t)$ to be band-unlimited. This leads to the choice of reasonable autocorrelation functions of $X(t)$ and $N(t)$ in the following subsection.

II-C Numerical Analysis

In order to study the properties of the mutual information $I(t_{1}^{n})$ as a function of $n$, we perform numerical analyses under different values of $n$ and $T$. The autocorrelation functions and PSDs of the signal process $X(t)$ and the noise process $N(t)$ are set to the special case

R_{X}(t_{1},t_{2}) = \mathrm{sinc}(10(t_{1}-t_{2})), \qquad (7)
R_{N}(t_{1},t_{2}) = \exp(-\lvert t_{1}-t_{2}\rvert),
S_{X}(f) = 0.1\times\mathbbm{1}_{\{-5\leq f\leq 5\}},
S_{N}(f) = \frac{2}{1+(2\pi f)^{2}},

where $\mathrm{sinc}(x):=\sin(\pi x)/(\pi x)$. Note that the PSD of the transmitted process, $S_{X}(f)$, is strictly band-limited, while the PSD of the noise process is not. In fact, the noise PSD is carefully selected to ensure that the received noise has finite power at each instant $t_{i}$, allowing the execution of numerical computations. A PSD with finite power must be colored, in contrast to additive white Gaussian noise (AWGN), which has a flat PSD and infinite power. That is why we choose $S_{N}(f)$ to be of the form in (7).

In order to compare the finite-time capacity with the classical Shannon capacity, we have to calculate the Shannon capacity under the colored noise spectrum $S_{N}(f)$, which requires a generalized version of the well-known formula $C=W\log\left(1+S/N\right)$. The Shannon capacity $C_{\rm sh}$ under a colored noise PSD [5], measured in ${\rm nat/s}$, is expressed as

C_{\rm sh}:=\frac{1}{2}\int_{-\infty}^{+\infty}{\log\left(1+\frac{S_{X}(f)}{S_{N}(f)}\right)\mathrm{d}f}\quad[{\rm nat/s}]. \qquad (8)

Then, plugging (7) into (8) yields the numerical value of $C_{\rm sh}$.
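
The following sketch (ours, with hypothetical parameter choices; it reuses the `mutual_information_nats` helper from the earlier sketch) builds the covariance matrices (4) under the setting (7) and evaluates both the finite-time rate $C(t_{1}^{n})$ and the Shannon capacity (8) by numerical quadrature:

```python
import numpy as np
from scipy.integrate import quad

def build_covariances(T, n):
    """Covariance matrices (4) on the uniform grid t_i = (i-1)T/n under the setting (7)."""
    t = np.arange(n) * T / n
    tau = t[:, None] - t[None, :]
    K_X = np.sinc(10 * tau)              # numpy's sinc(x) = sin(pi x)/(pi x), as in (7)
    K_N = np.exp(-np.abs(tau))
    return K_X, K_N

def shannon_capacity_nats():
    """Classical capacity (8) with S_X(f) = 0.1 on [-5, 5] and S_N(f) = 2/(1+(2*pi*f)^2)."""
    integrand = lambda f: 0.5 * np.log(1 + 0.1 / (2 / (1 + (2 * np.pi * f) ** 2)))
    value, _ = quad(integrand, -5, 5)    # S_X vanishes outside [-5, 5]
    return value

T, n = 2.0, 200
K_X, K_N = build_covariances(T, n)
print(mutual_information_nats(K_X, K_N) / T)   # finite-time rate C(t_1^n) in nat/s
print(shannon_capacity_nats())                 # Shannon capacity C_sh in nat/s
```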

In the numerical analysis, we calculate the finite-time transmission rate $C(T)$ and the Shannon capacity against the number of samples $n$ within the observation window $[0,T]$. The numerical results are collected in Fig. 1. It is shown that $I(t_{1}^{n})$ is an increasing function of $n$, and for fixed values of $T$, the approximated finite-time capacity $I(t_{1}^{n})$ tends to a finite limit under the correlation assumptions given by (7). The most striking observation is that we can obtain more information within the finite-time window $[0,T]$ than the prediction $TC_{\rm sh}$ given by the Shannon capacity (8). We call this the “Exceed-Shannon” phenomenon.

Figure 1: A first glance at the “Exceed-Shannon” phenomenon, where the red dashed horizontal line is the Shannon capacity, and the $T=1,2,8$ curves illustrate the dependence of $C(t_{1}^{n})$ on $n$.

To analytically verify the existence of the Exceed-Shannon phenomenon, we are going to introduce some mathematical tools in the following Section III, and finally give an analytical proof in Section IV.

III A Closed-Form Finite-Time Capacity Formula

In this section, we first introduce the Mercer expansion in Subsection III-A as a basic tool for our analysis. Then we derive the series representation of the finite-time capacity, and the corresponding power constraint, in Subsection III-B under the AWGN assumption. The power constraint shows that the finite-time capacity is upper-bounded, and thus the series expansion of the finite-time capacity converges absolutely.

III-A The Mercer Expansion

Motivated by the discovery of the Exceed-Shannon phenomenon, we go further into the underlying mechanism behind this fact. Since the calculation of (6) depends on the evaluation of determinants that are determined by the autocorrelation functions of Gaussian processes, it is possible to obtain $I(t_{1}^{n})$ and $I(T)$ directly from the autocorrelation functions. In fact, if we know the Mercer expansion [7] of the autocorrelation function $R_{X}(t_{1},t_{2})$ on the window $[0,T]$, then we can calculate $h({\bm{X}}(t_{1}^{n}))$ more easily [10]. In the following discussion, we assume the Mercer expansion of the source autocorrelation function $R_{X}(t_{1},t_{2})$ to be of the following form

\lambda_{k}\phi_{k}(t_{1})=\int_{0}^{T}{R_{X}(t_{1},t_{2})\phi_{k}(t_{2})\mathrm{d}t_{2}};\quad k>0,\ k\in\mathbb{N}. \qquad (9)

Due to the positive-definite property of the integral kernel $R_{X}(t_{1},t_{2})$, the eigenvalues are strictly positive, $\lambda_{k}>0$, and the eigenfunctions form an orthonormal set:

\int_{0}^{T}{\phi_{i}(t)\phi_{j}(t)\mathrm{d}t}=\delta_{ij}. \qquad (10)

Mercer's theorem [7] ensures the existence and uniqueness of the eigenpairs $(\lambda_{k},\phi_{k}(t))_{k=1}^{\infty}$, and furthermore, the kernel itself can be expanded in terms of the eigenfunctions, with absolute and uniform convergence:

R_{X}(t_{1},t_{2})=\sum_{k=1}^{+\infty}{\lambda_{k}\phi_{k}(t_{1})\phi_{k}(t_{2})}. \qquad (11)

The Mercer expansion enables us to analytically express an autocorrelation function on a finite-time interval $[0,T]$, since the autocorrelation function can be naturally treated as a positive-definite integral kernel.
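
As a numerical illustration (our own sketch, not part of the paper), the Mercer eigenvalues in (9) can be approximated by discretizing the kernel on a uniform grid and rescaling the matrix eigenvalues by the quadrature weight $T/n$ (a Nyström-type approximation):

```python
import numpy as np

def mercer_eigs(R, T, n=400):
    """Approximate the Mercer eigenvalues of the kernel R(t1, t2) on [0, T].

    The integral in (9) is replaced by a Riemann sum with weight T/n, so the
    eigenvalues of (T/n) * K approximate the lambda_k of the continuous kernel.
    """
    t = (np.arange(n) + 0.5) * T / n            # midpoint grid on [0, T]
    K = R(t[:, None], t[None, :])
    lam = np.linalg.eigvalsh((T / n) * K)       # symmetric kernel -> real eigenvalues
    return np.sort(lam)[::-1]                   # descending order
```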

III-B Finite-Time Capacity Formula

Based on the Mercer expansion, we can obtain a closed-form formula, given in the following Theorem 1.

Theorem 1 (Source expansion, AWGN noise)

Suppose the information source, modeled by the transmitted process $X(t)$, has autocorrelation function $R_{X}(t_{1},t_{2})$. AWGN of PSD $n_{0}/2$ is imposed onto $X(t)$, resulting in the received process $Y(t)$. The Mercer expansion of $R_{X}(t_{1},t_{2})$ on $[0,T]$ is given by (9), satisfying (10). Then the finite-time mutual information $I(T)$ within the observation window $[0,T]$ between the processes $X(t)$ and $Y(t)$ can be expressed as

I(T)=\frac{1}{2}\sum_{k=1}^{+\infty}{\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)}. \qquad (12)
Proof:

See Appendix A. ∎
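
With the numerically approximated eigenvalues from the sketch in Subsection III-A, (12) can be evaluated by truncating the series; this is our own illustration and assumes the hypothetical `mercer_eigs` helper is available:

```python
import numpy as np

def finite_time_mi(R_X, T, n0, n=400):
    """Evaluate the truncated series (12) with Nystrom-approximated Mercer eigenvalues."""
    lam = mercer_eigs(R_X, T, n=n)
    lam = lam[lam > 0]                  # drop tiny negative eigenvalues caused by round-off
    return 0.5 * np.sum(np.log1p(lam / (n0 / 2.0)))
```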

From Theorem 1, we can conclude that the finite-time capacity of the AWGN channel is uniquely determined by the Mercer spectrum $\lambda_{k}$ of $R_{X}(t_{1},t_{2})$ within $[0,T]$. However, it remains unknown whether the series representation (12) converges. In fact, the convergence is closely related to the signal power. For the Fourier transform, the power in the time domain is equal to the power in the frequency domain, which is known as Parseval's theorem [11]. Like the Fourier transform, the transform defined by the orthonormal basis $\{\phi_{k}(t)\}_{k=1}^{\infty}$ also satisfies Parseval's theorem. This observation leads to a theoretical verification of power conservation in terms of the $\lambda_{k}$, which is stated in the following Lemma 1.

Lemma 1 (Operator Trace Coincide with Power Constraint)

Consider a stationary Gaussian process $X(t)$ with mean zero and autocorrelation $R_{X}(t_{1},t_{2})$. The Mercer expansion of $R_{X}(t_{1},t_{2})$ on $[0,T]$ is given by (9), satisfying (10). The Mercer operator $M(\cdot):\mathcal{L}^{2}([0,T])\rightarrow\mathcal{L}^{2}([0,T])$ is defined by the integral $(M\phi)(s)=\int_{0}^{T}{R_{X}(s,\tau)\phi(\tau)\mathrm{d}\tau}$. Then the sum of all the eigenvalues $\lambda_{k}$ of the operator $M$ is equal to the signal energy $PT$ within $[0,T]$:

\mathrm{tr}(M):=\sum_{k=1}^{+\infty}{\lambda_{k}}=PT, \qquad (13)

where $P=R_{X}(0,0)$.

Proof:

See Appendix B. ∎
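
A quick numerical check of Lemma 1 (our sketch, with hypothetical numbers and the exponential kernel used later in (14), reusing the `mercer_eigs` helper): the approximated eigenvalues sum to $PT$.

```python
import numpy as np

P, alpha, T = 1.0, 1.0, 2.0
R_X = lambda t1, t2: P * np.exp(-alpha * np.abs(t1 - t2))   # the kernel of (14)
lam = mercer_eigs(R_X, T)
print(lam.sum(), P * T)   # both approximately 2.0, as (13) predicts
```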

Remark 1

The convergence of the finite-time capacity series (12) is ensured by Lemma 1. In fact, from Lemma 1, we can conclude that the sum of the $\lambda_{k}$ is finite when $T$ is finite. It then follows immediately that $I(T)<\infty$, since $\log(1+x)\leq x$. Furthermore, note that the sum of the $\lambda_{k}$ is finite even for non-stationary processes (i.e., when the power at time $t$, $P(t):=\mathbb{E}\left[X^{2}(t)\right]$, is not a constant $P:=R_{X}(0,0)$), as long as $P(t)<\infty$ holds for any $0<t<T$. Thus the conclusion $I(T)<\infty$ holds even for non-stationary processes.

Remark 2

The finite-time capacity formula (12) is closely related to operator theory [8] in functional analysis. The sum of all the eigenvalues $\lambda_{k}$ is called the operator trace in linear operator theory. As mentioned in Lemma 1, the autocorrelation function $R_{X}(t_{1},t_{2})$ can be treated as a linear operator $M$ on $\mathcal{L}^{2}([0,T])$. Furthermore, this operator belongs to the trace class [12] if and only if $\int_{0}^{T}{R_{X}(t,t)\mathrm{d}t}<\infty$. Note that this condition is automatically satisfied if $X(t)$ is a Gaussian process, since Gaussian random variables always have finite variances.

The Mercer spectrum enables us to explicitly calculate the finite-time capacity and, furthermore, to prove the “Exceed-Shannon” phenomenon. This will be demonstrated in the next section.

IV Proof of the Existence of Exceed-Shannon Phenomenon

In this section, we first give two different proofs of the existence of the Exceed-Shannon phenomenon, both in a typical case. Then we discuss the achievability of the finite-time capacity and its compatibility with the Shannon-Hartley theorem.

IV-A Closed-Form Capacity in A Typical Case

In order to show the existence of the Exceed-Shannon phenomenon, we only need to show that the finite-time capacity is greater than the Shannon capacity in a typical case. Let us consider a finite-time communication scheme with a finitely-powered stationary transmitted signal autocorrelation (an autocorrelation of the form $R_{X}(\tau)$ below is often observed in many scenarios, such as passing a signal with a white spectrum through an RC lowpass filter), which is specified as

R_{X}(t_{1},t_{2})=R_{X}(\tau)=P\exp(-\alpha\lvert\tau\rvert), \qquad (14)

where $\tau=t_{1}-t_{2}$, in an AWGN channel with noise PSD $n_{0}/2$. (In this theoretical proof of the “Exceed-Shannon” phenomenon, we assume the noise to be AWGN to simplify the analytical computations. Gaussian processes with a white spectrum are pathological: they can neither be power-limited nor be directly sampled and numerically represented in computers.) The power of the signal $X(t)$ is $P=R_{X}(0)$. According to Lemma 1, the trace of the corresponding Mercer operator $M(\cdot)$ is finite. Then the finite-time capacity given by Theorem 1 is also finite, as shown in Remark 1. Finding the Mercer expansion is equivalent to finding the eigenpairs $(\lambda_{k},\phi_{k}(t))_{k=1}^{\infty}$. The eigenpairs are determined by the following characteristic integral equation [13]:

\lambda_{k}\phi_{k}(s)=\int_{0}^{s}{Pe^{-\alpha(s-t)}\phi_{k}(t)\mathrm{d}t}+\int_{s}^{T}{Pe^{-\alpha(t-s)}\phi_{k}(t)\mathrm{d}t}. \qquad (15)

Differentiating both sides of (15) twice with respect to $s$ yields the boundary conditions and the differential equation that $\lambda_{k},\phi_{k}$ must satisfy:

\lambda_{k}\phi_{k}^{\prime\prime}(s) = (\alpha^{2}\lambda_{k}-2\alpha P)\phi_{k}(s),\quad 0<s<T, \qquad (16)
\phi_{k}^{\prime}(0) = \alpha\phi_{k}(0),
\phi_{k}^{\prime}(T) = -\alpha\phi_{k}(T).

Let $\omega_{k}>0$ denote the resonant frequency of the above harmonic-oscillator differential equation; then $\lambda_{k}=2\alpha P/(\alpha^{2}+\omega_{k}^{2})$, and $\omega_{k}$ must satisfy the above two boundary constraints. Let $\phi_{k}(t)=A_{k}\cos(\omega_{k}t)+B_{k}\sin(\omega_{k}t)$ be the sinusoidal form of the eigenfunction. Using the boundary conditions, we obtain

B_{k}\omega_{k}=\alpha A_{k}, \qquad (17)
B_{k}\omega_{k}\cos(\omega_{k}T)-A_{k}\omega_{k}\sin(\omega_{k}T) = -\alpha\left(A_{k}\cos(\omega_{k}T)+B_{k}\sin(\omega_{k}T)\right).

To ensure the existence of a nontrivial solution to the homogeneous linear equations (17) with unknowns $A_{k},B_{k}$, the determinant must be zero. Exploiting this condition, we find the equation that $\omega_{k}$ must satisfy:

\tan(\omega_{k}T)=\frac{2\omega_{k}\alpha}{\omega_{k}^{2}-\alpha^{2}}. \qquad (18)
Figure 2: Plots of $2\arctan(\omega/\alpha)$ and $k\pi-\omega T$ with $T=2,\alpha=1$ and $k=1,2,3,4$. The desired resonant frequencies $\omega_{k}$ can be read from the horizontal coordinates of the intersection points of the curve $2\arctan(\omega/\alpha)$ and the parallel lines.

By introducing an auxiliary variable $\theta_{k}=\arctan(\omega_{k}/\alpha)\in[0,\pi/2]$, equation (18) can be simplified to $\tan(\omega_{k}T)=-\tan(2\theta_{k})$, i.e., there exists a positive integer $m$ such that $2\arctan(\omega_{k}/\alpha)=m\pi-\omega_{k}T$. The integer $m$ can be chosen to be equal to $k$. From the plots of $2\arctan(\omega/\alpha)$ and $m\pi-\omega T$ (Fig. 2), we can determine $\omega_{k}$, and then $\lambda_{k}$. To sum up, the solutions to the characteristic equation (15) are collected into (19) as follows:

2\arctan(\omega_{k}/\alpha) = k\pi-\omega_{k}T, \qquad (19)
\lambda_{k} = \frac{2\alpha P}{\alpha^{2}+\omega_{k}^{2}},
\phi_{k}(t) = \frac{1}{Z_{k}}\left(\omega_{k}\cos(\omega_{k}t)+\alpha\sin(\omega_{k}t)\right),

where $Z_{k}$ denotes the normalization constant of $\phi_{k}(t)$ on $[0,T]$ that ensures orthonormality.
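
Numerically, the first line of (19) can be solved by bracketed root finding: for each $k$, the function $2\arctan(\omega/\alpha)+\omega T-k\pi$ is strictly increasing in $\omega$ and changes sign on $((k-1)\pi/T,\,k\pi/T)$. The sketch below (ours, not part of the paper) uses Brent's method:

```python
import numpy as np
from scipy.optimize import brentq

def eigenvalues_exponential_kernel(P, alpha, T, K=200):
    """Solve (19) for omega_k and return lambda_k = 2*alpha*P / (alpha^2 + omega_k^2)."""
    lam = []
    for k in range(1, K + 1):
        g = lambda w: 2 * np.arctan(w / alpha) + w * T - k * np.pi
        w_k = brentq(g, (k - 1) * np.pi / T, k * np.pi / T)   # unique root in this bracket
        lam.append(2 * alpha * P / (alpha ** 2 + w_k ** 2))
    return np.array(lam)
```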

Equation (19) gives all the eigenpairs $(\lambda_{k},\phi_{k})_{k=1}^{\infty}$, from which we can calculate $I(T)$ by applying Theorem 1. As for the Shannon capacity $C_{\rm sh}$, by applying (8) and evaluating the integral with the help of [14], we obtain

C_{\rm sh} = \frac{1}{4\pi}\int_{-\infty}^{+\infty}{\log\left(1+\frac{S_{X}(\omega)}{n_{0}/2}\right)\mathrm{d}\omega} \qquad (20)
 = \frac{1}{4\pi}\int_{-\infty}^{+\infty}{\log\left(1+\frac{\frac{2P\alpha}{\alpha^{2}+\omega^{2}}}{n_{0}/2}\right)\mathrm{d}\omega}
 \overset{(a)}{=} \frac{1}{2}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right),

where the evaluation of the improper integral in step $(a)$ is given in Appendix E.
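
The closed form in (20) can be cross-checked by direct numerical quadrature of the first line (our sketch, with hypothetical parameter values):

```python
import numpy as np
from scipy.integrate import quad

P, alpha, n0 = 1.0, 1.0, 1.0
integrand = lambda w: np.log(1 + (2 * P * alpha / (alpha ** 2 + w ** 2)) / (n0 / 2))
numeric, _ = quad(integrand, -np.inf, np.inf)
numeric /= 4 * np.pi
closed_form = 0.5 * (np.sqrt(alpha ** 2 + 4 * P * alpha / n0) - alpha)
print(numeric, closed_form)   # both approximately 0.618 nat/s
```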

After all the above preparations, we can rigorously prove that $C(T)>C_{\rm sh}$ in the typical case of (14), as long as the transmission power $P$ is smaller than a constant $\delta$. The following Theorem 2 proves this result.

Theorem 2 (Existence of Exceed-Shannon phenomenon in a typical case)

Suppose $X(t)$ and $Y(t)$ are specified according to (14). The eigenpairs are determined by (19). Then, for any fixed positive values of $T,n_{0}$ and $\alpha$, there exists a positive number $\delta$ such that the Exceed-Shannon inequality

C(T):=\frac{1}{2T}\sum_{k=1}^{+\infty}\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)>C_{\rm sh} \qquad (21)

holds strictly for arbitrary $0<P<\delta$.

Proof:

See Appendix C. ∎

Figure 3: Theoretical verification of the Exceed-Shannon effect. The blue lines represent the finite-time mutual information $I(T)$. The red lines are the Shannon capacities $TC_{\rm sh}$ calculated from (8). All curves are evaluated under hypothesis (14), where $P=1,2,4$, $n_{0}=1$ and $\alpha=1$.

To verify the above theoretical analysis, numerical experiments on $I(T)$ are conducted based on evaluations of (19) and (20). As shown in Fig. 3, it seems that we can always harness more mutual information in a finite-time observation window than the Shannon capacity predicts. Though it seems impossible, this fact is somewhat unsurprising, because the observations ${\bm{Y}}(t_{1}^{N})$ inside the finite-time window $[0,T]$ can always eliminate some extra uncertainty outside the window due to the autocorrelation of $X(t)$. Different from the finite-time capacity, the Shannon capacity describes the circumstance of $T\rightarrow\infty$, where the fringe effect near $t=0$ and $t=T$ becomes negligible compared with the prolonged window period. Thus, the Shannon capacity does not take into consideration the small amount of extra information at the fringe, causing an underestimation of the capacity. Fig. 3 also shows that the extra capacity $\Delta I:=I(T)-TC_{\rm sh}$ between the finite-time result and the Shannon capacity tends to a constant as $T\rightarrow\infty$. As discussed above, the difference may come from the additional elimination of uncertainty at the fringe of the window. This asymptotically constant difference results in the asymptotic linearity of the finite-time mutual information $I(T)$ as a function of $T$.
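
The behavior in Fig. 3 can be reproduced with the helpers above (our sketch; `eigenvalues_exponential_kernel` is the assumed helper from the earlier sketch, and the series (12) is truncated at a large index): the gap $\Delta I=I(T)-TC_{\rm sh}$ settles toward a constant as $T$ grows.

```python
import numpy as np

P, alpha, n0 = 1.0, 1.0, 1.0
C_sh = 0.5 * (np.sqrt(alpha ** 2 + 4 * P * alpha / n0) - alpha)       # closed form (20)
for T in (1.0, 2.0, 4.0, 8.0):
    lam = eigenvalues_exponential_kernel(P, alpha, T, K=5000)
    I_T = 0.5 * np.sum(np.log1p(lam / (n0 / 2)))                      # truncated series (12)
    print(T, I_T - T * C_sh)                                          # the gap approaches a constant
```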

Apart from the above discussion, there is another interesting observation in Fig. 3, which leads to another rigorous proof of the Exceed-Shannon phenomenon. If we investigate the slope of the curves $I(T)$ at the origin, i.e., the “instant transmission rate” at the origin, we find that $C(0^{+})>C_{\rm sh}$. This observation is confirmed by the following theorem:

Theorem 3 (Instant Finite-Time Rate Exceeds Shannon)

Suppose $X(t)$ and $Y(t)$ are specified according to (14). The eigenpairs are given by (19). Then the instantaneous finite-time information transmission rate $C(0^{+})$ is given by:

C(0^{+})=\frac{\partial I(T)}{\partial T}\Big|_{T=0^{+}}=\frac{P}{n_{0}}. \qquad (22)
Proof:

See Appendix D. ∎

From the conclusion of Theorem 3 and (20), we can reduce the Exceed-Shannon inequality $C(0^{+})>C_{\rm sh}$ at $T=0^{+}$ to the following inequality:

\frac{P}{n_{0}}>\frac{1}{2}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right), \qquad (23)

which can be directly verified by moving terms and squaring both sides. This inequality implies that the average transmission rate in the finite-time regime is strictly larger than the Shannon capacity around the origin $T=0$.
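
A one-line symbolic check of this step (our sketch): multiplying (23) by two, moving $\alpha$ to the left-hand side, and squaring both (positive) sides leaves the manifestly positive difference $4P^{2}/n_{0}^{2}$.

```python
import sympy as sp

P, n0, alpha = sp.symbols('P n_0 alpha', positive=True)
# (2P/n0 + alpha)^2 - (alpha^2 + 4*P*alpha/n0) should be strictly positive for P > 0
diff = (2 * P / n0 + alpha) ** 2 - (alpha ** 2 + 4 * P * alpha / n0)
print(sp.simplify(diff))   # 4*P**2/n_0**2 > 0, so (23) holds for every P > 0
```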

Remark 3

Both Theorem 2 and Theorem 3 rigorously prove the existence of the Exceed-Shannon phenomenon. However, they differ as follows:

  • Theorem 2 proves that the Exceed-Shannon inequality holds in a small power range $0<P<\delta$, while Theorem 3 is true for arbitrary power $P$.

  • Since Theorem 3 only proves $C(0^{+})>C_{\rm sh}$, it does not ensure the Exceed-Shannon inequality when $T$ becomes larger. By contrast, Theorem 2 states that the Exceed-Shannon phenomenon exists for arbitrary $T>0$.

Remark 4

In fact, Theorem 2 and Theorem 3 characterize the Exceed-Shannon phenomenon of the finite-time capacity from two aspects. One aspect is the observation time $T$, and the other is the transmit power $P$. Combining the proofs of these two theorems may result in a universal proof that is independent of the choice of the parameters $T$ and $P$, which requires further study.

The conclusion of Theorem 3 is verified numerically in Fig. 4 and Fig. 5. The blue solid lines, representing the finite-time capacity $C(T)$, are above the red dashed lines representing the Shannon capacity, which demonstrates the Exceed-Shannon phenomenon. The $C(T)$ curves all start at $P/n_{0}$ when $T=0^{+}$, which coincides with the conclusion of Theorem 3. It can also be observed from the two figures that, for fixed values of $P,n_{0}$, as $\alpha$ increases, the transmitted signal $X(t)$ tends to be less correlated, and thus more informative. The transmission rate is then improved. This insight also comes from the change of the PSD $S_{X}(f)$: as $\alpha$ increases, the PSD becomes flatter, i.e., a wider range of bandwidth is occupied, and thus the rate increases accordingly.

Figure 4: Comparison between the finite-time capacity $C(T)$ and the Shannon capacity, when $\alpha=1$.
Figure 5: Comparison between the finite-time capacity $C(T)$ and the Shannon capacity, when $\alpha=2$.

IV-B Further Discussions on the Exceed-Shannon Phenomenon

Achievability of the finite-time capacity. It is known that for any band-limited stationary Gaussian process $X(t)$ with PSD $S_{X}(f)$, one can generate signals whose PSD is exactly $S_{X}(f)$ by first generating a sequence $X(nT_{s})$ at a sufficiently high sampling rate, and then passing the generated sequence through a shaping filter. Since the transmitted signal $X(t)$ and its generating sequence $X(nT_{s})$ determine each other uniquely if $X(t)$ is strictly band-limited, $X(t)$ and $X(nT_{s})$ can be regarded as containing the same amount of information. Then we can conclude that, after observing the noisy received process $Y(t),t\in[0,T]$, by the definition of the finite-time capacity $I(T)$, the uncertainty of the underlying transmitted sequence $X(nT_{s})$ can be reduced by exactly $I(T)$ nats. That is to say, we link the finite-time capacity to a sequence-to-sequence capacity. Thus, the finite-time mutual information $I(T)$ is achievable by standard capacity-achieving techniques such as random coding [1], as long as the sampling instants are dense enough.

Compatibility with the Shannon-Hartley theorem. Though the Exceed-Shannon effect does imply an average data transmission rate within a finite-time window higher than that predicted by Shannon, it is in fact still impossible to construct a long-time stable data transmission scheme above the Shannon capacity by leveraging this effect. So the Exceed-Shannon phenomenon does not contradict the Shannon-Hartley theorem. Placing additional observation windows cannot increase the average information rate, because the received process $Y(t)$ observed by the subsequent additional windows has already been implicitly altered by the previous observation. The posterior process $Y(t)|{\bm{Y}}(t_{1}^{N})$ does not carry as much information as the original one, thus causing a rate reduction in the later windows. It is expected that the average transmission rate would ultimately decrease to the Shannon capacity as the total observation time tends to infinity (i.e., $C(\infty)=C_{\rm sh}$); the analytical proof is still worth investigating in future works.

V Conclusions

In this paper, we provided rigorous proofs of the existence of the “Exceed-Shannon” phenomenon under typical autocorrelation settings of the transmitted signal process and the noise process. Our discovery of the “Exceed-Shannon” phenomenon reveals a possible new direction of research in information theory, as it provides a generalization of Shannon's renowned formula $C=W\log(1+S/N)$ to practical finite-time communications. It shows the possibility of communicating at a higher-than-Shannon rate over a short time. Since the finite-time capacity is a more precise estimation of the ultimate capacity limit, the optimization target may shift from the Shannon capacity to the finite-time capacity in the design of practical communication systems. Thus, it has guiding significance for the performance improvement of modern communication systems. In future works, general proofs of $C(T)>C_{\rm sh}$, independent of the concrete autocorrelation settings, still require further investigation. Moreover, we need to answer the question of how to exploit this Exceed-Shannon phenomenon to improve the communication rate. In addition, although we have observed numerically that the finite-time capacity agrees with the Shannon capacity when $T\rightarrow\infty$, an analytical proof of this result is required in the future.

Appendix A
Proof Of Theorem 1

Define the $n$-by-$n$ matrix ${\bm{\Phi}}_{n}$ as

{\bm{\Phi}}_{n}=[\phi_{1}(t_{1}^{n}),\phi_{2}(t_{1}^{n}),\cdots,\phi_{n}(t_{1}^{n})],

where $t_{i}=(i-1)T/n,1\leq i\leq n$. According to the definition of this matrix, the following relation holds:

\left(\frac{T}{n}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{\Phi}}_{n}\right)_{ij} = \frac{T}{n}\sum_{k=1}^{n}{\phi_{i}(t_{k})\phi_{j}(t_{k})} \rightarrow \int_{0}^{T}{\phi_{i}(t)\phi_{j}(t)\mathrm{d}t}=\delta_{ij}. \qquad (24)

This implies that the matrix ${\bm{\Phi}}_{n}$ satisfies the property of asymptotic orthogonality:

\left\|\frac{T}{n}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{\Phi}}_{n}-{\bm{I}}_{n}\right\|_{2}\rightarrow 0, \qquad (25)

and the matrix ${\bm{\Phi}}_{n}$ can asymptotically diagonalize ${\bm{K}}_{X}=\left(R_{X}(t_{i},t_{j})\right)_{i,j=1}^{n}$ because of the eigenvalue property (9):

\mathbb{E}\left[\frac{T^{2}}{n^{2}}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{X}}(t_{1}^{n}){\bm{X}}^{\mathrm{T}}(t_{1}^{n}){\bm{\Phi}}_{n}\right] = \frac{T^{2}}{n^{2}}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{K}}_{X}{\bm{\Phi}}_{n} \rightarrow \mathrm{diag}(\lambda_{1},\lambda_{2},\cdots,\lambda_{n}). \qquad (26)

Next, we investigate the noise realizations at the sampling instants $t_{i}$. For AWGN, the instantaneous power is infinite, i.e., $\mathbb{E}[N(t_{j})^{2}]=\infty$, so it is necessary to assume that the noise is sampled after passing through a filter $\xi(t)$ with a rectangular impulse response of pulse width $T/n$ and gain $n/T$. This assumption is reasonable, since the filter $\xi(t)$ tends to an ideal sampler $\delta(t)$ as $n\to\infty$. Under this hypothesis, the noise variance of each sample can be calculated as

\mathbb{E}\left[\left(\int_{0}^{T/n}{N(t)\xi\left(\frac{T}{n}-t\right)\mathrm{d}t}\right)^{2}\right] = \iint_{0}^{T/n}{\mathrm{d}t_{1}\mathrm{d}t_{2}\,\mathbb{E}\left[N(t_{1})N(t_{2})\right]\xi\left(t_{1}\right)\xi\left(t_{2}\right)} \qquad (27)
 \overset{(a)}{=} \iint_{0}^{T/n}{\mathrm{d}t_{1}\mathrm{d}t_{2}\,\frac{n_{0}}{2}\delta(t_{1}-t_{2})\xi\left(t_{1}\right)\xi\left(t_{2}\right)}
 = \frac{n_{0}}{2}\int_{0}^{T/n}{\xi^{2}(t)\mathrm{d}t}
 = \frac{n_{0}}{2}\frac{n}{T}.

Note that the equality $(a)$ holds since the noise autocorrelation is $\frac{n_{0}}{2}\delta(t_{1}-t_{2})$. In this way, the mutual information within the window $[0,T]$ can be calculated as

I(T) = \frac{1}{2}\lim_{n\rightarrow\infty}\log\left(\frac{\det({\bm{K}}_{X}+\frac{nn_{0}}{2T}{\bm{I}}_{n})}{\det\left(\frac{nn_{0}}{2T}{\bm{I}}_{n}\right)}\right) \qquad (28)
 = \frac{1}{2}\lim_{n\rightarrow\infty}\log\det\left({\bm{I}}_{n}+\frac{2T}{nn_{0}}{\bm{K}}_{X}\right)
 \overset{(b)}{=} \frac{1}{2}\lim_{n\rightarrow\infty}\log\det\left(\frac{T}{n}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{\Phi}}_{n}+\frac{1}{n_{0}/2}\frac{T^{2}}{n^{2}}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{K}}_{X}{\bm{\Phi}}_{n}\right)
 \overset{(c)}{=} \frac{1}{2}\lim_{n\rightarrow\infty}\log\det\left({\bm{I}}_{n}+\frac{1}{n_{0}/2}\mathrm{diag}(\lambda_{1},\cdots,\lambda_{n})\right)
 = \frac{1}{2}\sum_{k=1}^{+\infty}{\log\left(1+\frac{\lambda_{k}}{n_{0}/2}\right)},

where $(b)$ comes from sandwiching the determinant in the bracket with the asymptotically orthogonal matrix $\sqrt{T/n}\,{\bm{\Phi}}_{n}$ on the left and its transpose on the right, and $(c)$ comes from plugging (24) and (26) into the previous step.

Appendix B
Proof Of Lemma 1

Taking the trace of both sides of (26) and letting $n\rightarrow\infty$, we obtain

\mathrm{tr}(M) = \lim_{n\rightarrow\infty}\mathrm{tr}\left(\frac{T^{2}}{n^{2}}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{K}}_{X}{\bm{\Phi}}_{n}\right) \qquad (29)
 = \lim_{n\rightarrow\infty}\left(\frac{T}{n}\times\mathrm{tr}\left(\frac{T}{n}{\bm{\Phi}}_{n}^{\mathrm{T}}{\bm{K}}_{X}{\bm{\Phi}}_{n}\right)\right)
 = \lim_{n\rightarrow\infty}\left(\frac{T}{n}\,\mathrm{tr}\left({\bm{K}}_{X}\right)\right)
 = \lim_{n\rightarrow\infty}\left(\frac{T}{n}\times nP\right)
 = PT,

which completes the proof of Lemma 1.

Appendix C
Proof Of Theorem 2

Plugging (20) into the right-hand side of (21) and differentiating both sides with respect to $P$, notice that if $P=0$, then both sides of (21) are equal to 0. Thus, we only need to prove that the derivative of the left-hand side is strictly larger than that of the right-hand side within a small interval $P\in(0,\delta)$:

\frac{1}{2}\sum_{k=1}^{+\infty}{\left(\frac{1}{1+\frac{2\lambda_{k}}{n_{0}}}\frac{2\lambda_{k}}{n_{0}P}\right)}>\frac{T}{n_{0}}\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. \qquad (30)

Multiplying both sides of (30) by $n_{0}/T$ and defining $\mu_{k}:=\lambda_{k}/(PT)$, from Lemma 1 we obtain $\sum_{k}{\mu_{k}}=1$. In this way, (30) is equivalent to

\sum_{k=1}^{+\infty}{\frac{\mu_{k}}{1+\frac{2\lambda_{k}}{n_{0}}}}>\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. \qquad (31)

Since $\varphi(x):=1/(1+2x/n_{0})$ is convex on $(0,+\infty)$, by applying Jensen's inequality to the left-hand side of (31), we only need to prove that

\frac{1}{1+\frac{2}{n_{0}}\sum_{k}{\lambda_{k}\mu_{k}}}>\frac{1}{\sqrt{1+4P/(n_{0}\alpha)}}. \qquad (32)

From the definition of $\mu_{k}$ we can derive that $\lambda_{k}\mu_{k}=\lambda_{k}^{2}/(PT)$. So we go on to calculate $\sum_{k}{\lambda_{k}^{2}}$, which is equivalent to calculating $\mathrm{tr}(M^{2})$, where $M^{2}$ corresponds to the integral kernel:

K_{M^{2}}(t_{1},t_{2}):=\int_{0}^{T}{P^{2}\exp(-\alpha\lvert t_{1}-s\rvert)\exp(-\alpha\lvert t_{2}-s\rvert)\mathrm{d}s}. \qquad (33)

Evaluating the kernel $K_{M^{2}}$ on the diagonal $t=t_{1}=t_{2}$ yields

K_{M^{2}}(t,t) = P^{2}\int_{0}^{T}{\exp(-2\alpha\lvert t-s\rvert)\mathrm{d}s} = \frac{P^{2}}{2\alpha}\left(2-\exp(-2\alpha t)-\exp(-2\alpha(T-t))\right). \qquad (34)

Integrating this kernel along the diagonal of $[0,T]^{2}$ gives $\sum_{k}{\lambda_{k}^{2}}=\mathrm{tr}(M^{2})$:

\sum_{k}{\lambda_{k}^{2}} = \frac{P^{2}}{2\alpha}\int_{0}^{T}{\left(2-e^{-2\alpha t}-e^{-2\alpha(T-t)}\right)\mathrm{d}t} = \frac{P^{2}}{\alpha}\left(T-\frac{1}{2\alpha}(1-e^{-2\alpha T})\right). \qquad (35)

By substituting (35) into (32), we just need to prove that

\sqrt{1+4P/(n_{0}\alpha)}>1+\frac{2P}{n_{0}\alpha}\left(1-\frac{1-e^{-2\alpha T}}{2\alpha T}\right). \qquad (36)

Define the dimensionless number $x=2P/(n_{0}\alpha)$. Since the function $\psi(x):=(1-\exp(-x))/x$ is strictly positive and less than 1 for $x>0$, we can conclude that there exists a small positive $\delta>0$ such that (36) holds for $0<P<\delta$. The number $\delta$ can be chosen as

\delta=\frac{n_{0}\alpha\,\psi(2\alpha T)}{(1-\psi(2\alpha T))^{2}}>0, \qquad (37)

which implies that (30) holds for any $0<P<\delta$. Thus, integrating (30) on both sides from $p=0$ to $p=P,\ P<\delta$ gives rise to the conclusion (21), which completes the proof of Theorem 2.
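
The threshold (37) can be sanity-checked numerically (our sketch, with hypothetical numbers): for any $P$ below $\delta$, inequality (36) indeed holds.

```python
import numpy as np

n0, alpha, T = 1.0, 1.0, 2.0
psi = lambda x: (1 - np.exp(-x)) / x
delta = n0 * alpha * psi(2 * alpha * T) / (1 - psi(2 * alpha * T)) ** 2    # the bound (37)
for P in (0.1 * delta, 0.5 * delta, 0.99 * delta):
    lhs = np.sqrt(1 + 4 * P / (n0 * alpha))
    rhs = 1 + (2 * P / (n0 * alpha)) * (1 - psi(2 * alpha * T))
    print(P, lhs > rhs)   # True for every P < delta, matching (36)
```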

Appendix D
Proof Of Theorem 3

Differentiating the finite-time capacity expression for $I(T)$, i.e., (12), with respect to $T$, we obtain

\frac{\partial I(T)}{\partial T}\Big|_{T=0^{+}}=\lim_{T\rightarrow 0^{+}}\frac{1}{2}\sum_{k=1}^{+\infty}{\frac{1}{(n_{0}/2)(1+2\lambda_{k}/n_{0})}\frac{\partial\lambda_{k}}{\partial T}}, \qquad (38)

where $\lim_{T\rightarrow 0^{+}}\sum_{k=1}^{+\infty}\frac{\partial\lambda_{k}}{\partial T}$, by applying Lemma 1, can be expressed as

\lim_{T\rightarrow 0^{+}}\sum_{k=1}^{+\infty}\frac{\partial\lambda_{k}}{\partial T} = \lim_{T\rightarrow 0^{+}}\frac{\partial}{\partial T}\sum_{k=1}^{+\infty}{\lambda_{k}} = \frac{\partial}{\partial T}(PT)\Big|_{T=0^{+}} = P. \qquad (39)

Since $\omega_{k}\uparrow+\infty$ as $T\downarrow 0^{+}$, we can safely conclude that $\lambda_{k}\downarrow 0^{+}$. From Dirichlet's test, the series in (38) converges uniformly. Thus, by interchanging the infinite sum and the limit operation, we obtain

\frac{\partial I(T)}{\partial T}\Big|_{T=0^{+}} = \frac{1}{2}\sum_{k=1}^{+\infty}{\lim_{T\rightarrow 0^{+}}\left(\frac{1}{(n_{0}/2)(1+2\lambda_{k}/n_{0})}\frac{\partial\lambda_{k}}{\partial T}\right)} \qquad (40)
 = \frac{1}{n_{0}}\sum_{k=1}^{+\infty}\frac{\partial\lambda_{k}}{\partial T}\Big|_{T=0^{+}}
 = \frac{1}{n_{0}}\lim_{T\rightarrow 0^{+}}\sum_{k=1}^{+\infty}\frac{\partial\lambda_{k}}{\partial T}
 = \frac{P}{n_{0}},

which completes the proof of Theorem 3.

Appendix E
The Evaluation of The Improper Integral (20)

Define the improper integral with parameters $P>0$ and $\alpha>0$:

J(P,\alpha):=\int_{-\infty}^{+\infty}{\log\left(1+\frac{P}{\alpha^{2}+\omega^{2}}\right){\rm d}\omega}. \qquad (41)

Taking the partial derivative of $J(P,\alpha)$ with respect to $P$, we obtain

\frac{\partial J(P,\alpha)}{\partial P}=\int_{-\infty}^{+\infty}{\frac{1}{\omega^{2}+(\alpha^{2}+P)}{\rm d}\omega}. \qquad (42)

Note that the analytic function defined as

f(z):=\frac{1}{z^{2}+(\alpha^{2}+P)}, \qquad (43)

has residue

{\rm Res}\left[f(z),z=z_{p}\right]=\frac{1}{2{\rm i}\sqrt{\alpha^{2}+P}}, \qquad (44)

at the pole $z_{p}={\rm i}\sqrt{\alpha^{2}+P}$ in the upper half-plane; thus the integral in (42) can be evaluated by the residue theorem:

\frac{\partial J(P,\alpha)}{\partial P} = 2\pi{\rm i}\,{\rm Res}\left[f(z),z={\rm i}\sqrt{\alpha^{2}+P}\right] = \frac{\pi}{\sqrt{\alpha^{2}+P}}. \qquad (45)

Since $J(0,\alpha)\equiv 0$, integrating (42) with respect to $P$ from 0 to $P$ yields

J(P,\alpha) = \int_{0}^{P}\frac{\pi}{\sqrt{\alpha^{2}+p}}{\rm d}p = 2\pi\sqrt{\alpha^{2}+p}\,\Big|_{p=0}^{P} = 2\pi\left(\sqrt{\alpha^{2}+P}-\alpha\right). \qquad (46)

Then the integral in (20) can be calculated by setting $P\leftarrow 4P\alpha/n_{0}$ and $\alpha\leftarrow\alpha$ in (46):

\frac{1}{4\pi}\int_{-\infty}^{+\infty}{\log\left(1+\frac{\frac{2P\alpha}{\alpha^{2}+\omega^{2}}}{n_{0}/2}\right)\mathrm{d}\omega} = \frac{2\pi}{4\pi}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right) = \frac{1}{2}\left(\sqrt{\alpha^{2}+\frac{4P\alpha}{n_{0}}}-\alpha\right), \qquad (47)

which completes the proof.
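
As a quick cross-check of (46) (our sketch, with hypothetical $P$ and $\alpha$), numerical quadrature agrees with the closed form:

```python
import numpy as np
from scipy.integrate import quad

P, alpha = 3.0, 1.5
numeric, _ = quad(lambda w: np.log(1 + P / (alpha ** 2 + w ** 2)), -np.inf, np.inf)
closed = 2 * np.pi * (np.sqrt(alpha ** 2 + P) - alpha)
print(numeric, closed)   # both approximately 4.97
```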

References

  • [1] C. E. Shannon, “A mathematical theory of communication,” The Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, Jul. 1948.
  • [2] H. Landau, “Sampling, data transmission, and the Nyquist rate,” Proc. IEEE, vol. 55, no. 10, pp. 1701–1706, Oct. 1967.
  • [3] H. Nyquist, “Certain topics in telegraph transmission theory,” Trans. American Institute of Electrical Engineers, vol. 47, no. 2, pp. 617–644, Apr. 1928.
  • [4] D. Slepian, “On bandwidth,” Proc. IEEE, vol. 64, no. 3, pp. 292–300, Mar. 1976.
  • [5] T. M. Cover, Elements of information theory.   John Wiley & Sons, 1999.
  • [6] D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
  • [7] J. Mercer, “Functions of positive and negative type and their connection with the theory of integral equations,” Philos. Trans. Royal Soc. A, vol. 209, pp. 415–446, 1909.
  • [8] K. Zhu, Operator theory in function spaces.   American Mathematical Soc., 2007, no. 138.
  • [9] A. Balakrishnan, “A note on the sampling principle for continuous signals,” IRE Trans. Inf. Theory, vol. 3, no. 2, pp. 143–146, Jun. 1957.
  • [10] J. Barrett and D. Lampard, “An expansion for some second-order probability distributions and its application to noise problems,” IRE Trans. Inf. Theory, vol. 1, no. 1, pp. 10–15, Mar. 1955.
  • [11] S. S. Kelkar, L. L. Grigsby, and J. Langsner, “An extension of Parseval's theorem and its use in calculating transient energy in the frequency domain,” IEEE Trans. Ind. Electron., vol. IE-30, no. 1, pp. 42–45, Feb. 1983.
  • [12] C. Brislawn, “Kernels of trace class operators,” Proc. Amer. Math. Soc., vol. 104, no. 4, 1988.
  • [13] D. Cai and P. S. Vassilevski, “Eigenvalue problems for exponential-type kernels,” Comput. Methods in Applied Math., vol. 20, no. 1, pp. 61–78, Jan. 2020.
  • [14] I. S. Gradshteyn and I. M. Ryzhik, Table of integrals, series, and products.   Academic press, 2014.