This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Iterative Detection for Orthogonal Time Frequency Space Modulation with Unitary Approximate Message Passing

Zhengdao Yuan, Fei Liu, Weijie Yuan, Qinghua Guo, , Zhongyong Wang, and Jinhong Yuan,   The work of Z. Yuan, F. Liu and Z. Wang was supported by the Postdoctoral science foundation of China (2019M652576), Science and technology research project of Henan (202102210313), Henan research project of high education (20B510005) and National Natural Science Foundation of China (NSFC61901417).Z. Yuan is with the Artificial Intelligence Technology Engineering Research Center, Henan Radio & TV University, and School of Information Engineering, and also with Zhengzhou University, Zhengzhou 450002, China (e-mail: [email protected]).F. Liu and Z. Wang is with the School of Information Engineering, Zhengzhou University, Zhengzhou 450002, China, (e-mail: [email protected], [email protected])W. Yuan and J. Yuan are with the School of Electrical Engineering and Telecommunications, University of New South Wales, NSW 2052, Australia (e-mail: [email protected], [email protected]).Q. Guo is with the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia (e-mail: [email protected]).
Abstract

The orthogonal-time-frequency-space (OTFS) modulation has emerged as a promising modulation scheme for high mobility wireless communications. To harvest the time and frequency diversity promised by OTFS, some promising detectors, especially message passing based ones, have been developed by taking advantage of the sparsity of the channel in the delay-Doppler domain. However, when the number of channel paths is relatively large or fractional Doppler shifts have to be considered, the complexity of existing detectors is a concern, and the message passing based detectors may suffer from performance loss due to the short loops involved in message passing. In this work, we investigate the design of OTFS detectors based on the approximate message passing (AMP). In particular, leveraging the unitary AMP (UAMP), we design new detectors that enjoy the structure of the channel matrix and allow efficient implementation. In addition, the estimation of noise variance is incorporated into the UAMP-based detectors. Thanks to the robustness of UAMP relative to AMP, the UAMP-based detectors deliver superior performance, and outperform state-of-the-art detectors significantly. We also investigate iterative joint detection and decoding in a coded OTFS system, where the OTFS detectors are integrated into a powerful turbo receiver, leading to considerable performance gains.

Index Terms:
Orthogonal time frequency space modulation (OTFS), approximate message passing (AMP), unitary transformation, iterative detection.

I Introduction

Recently the orthogonal time frequency space (OTFS) modulation has attracted much attention due to its capability of achieving reliable communications for high mobility applications [1, 2, 3, 4, 5]. OTFS offers both time and frequency diversity as each symbol is spread over the time and frequency domains through the two dimensional (2D) inverse symplectic finite Fourier transform (ISFFT) [1, 2]. Compared with orthogonal frequency division multiplexing (OFDM), OTFS can achieve significant performance gains in high mobility scenarios [6]. In addition, when the number of channel paths is small, the effective channel in the delay-Doppler (DD) domain is sparse, which allows efficient channel estimation and data detection using message passing techniques [2].

A practical and powerful detector is essential to harvest the full time and frequency diversity promised by OTFS. The optimal maximum a posteriori (MAP) detector is impractical due to its complexity growing exponentially with the length of the OTFS block. In [7], an effective channel matrix in the DD domain was derived, based on which a low-complexity two-stage detector was proposed. The authors in [8] proposed a detection scheme, which uses minimum mean squared error (MMSE) equalization in the first iteration, followed by parallel interference cancellation with a soft-output sphere decoder in subsequent iterations. A low complexity iterative rake decision feedback detector was proposed in [9], [10], which extracts and coherently combines the multiple copies of the symbols (due to multipath propagation) in the DD grid using maximal ratio combining (MRC). Multiple modified MRC detectors with enhanced performance were also developed in [10]. They performs similarly or even better than the message passing based detector [2] while with a linear complexity. The design of low-complexity detectors were also investigated based on the message passing techniques [2], [11], [12]. A variational Bayes (VB) based detector was proposed in [13] to achieve better convergence compared with the existing message passing based detectors. The detectors in [2], [13], and [14] take advantage of the sparsity of the channel matrix in the DD domain, and their complexity depends on the number of nonzero elements in each row of the channel matrix, which is denoted by SS. Without considering fractional Doppler shifts, SS is equal to the number of channel paths. In general, a wideband system is able to provide sufficient delay resolution, however, the Doppler resolution depends on the time duration of the OTFS block. To fulfill the low latency requirement in future wireless communications, the time duration should be relatively small, where it is necessary to consider fractional Doppler shifts [2, 11]. In this case, the value of SS is proportional to the number of paths and can be significantly larger than the number of channel paths. This leads to two issues: the complexity of the existing detectors is a concern especially in the case of rich scattering environments, and the message passing based detectors may suffer from performance loss due to the presence of short loops in the corresponding system graph model.

This work aims to address the above issues by designing efficient detectors based on the approximate message passing (AMP) algorithm [15], [16]. The use of AMP in this work is due to two facts: AMP was developed based on loopy belief propagation in dense factor graphs, i.e., short loops can be handled by AMP, and AMP has a low complexity. It is known that AMP works well for i.i.d. (sub-)Gaussian system transfer matrix, but it may suffer from performance loss or even diverge for a general system transfer matrix [17]. Inspired by [18], the work in [19] showed that AMP can still work well for a general system transfer matrix when a unitary transform of the original model is used, where the unitary matrix for transformation can be the conjugate transpose of the left singular matrix of the general system transfer matrix [19, 20] obtained through singular value decomposition (SVD). This variant to AMP is called unitary AMP (UAMP), which is formerly known as UTAMP. UAMP has been used for low complexity robust sparse Bayesian learning [21], bilinear recovery [20], inverse synthetic aperture radar (ISAR) imaging with high Doppler resolution [22], direction of arrival (DOA) estimation [23], etc. It will be shown in this paper that UAMP is well suitable for OTFS because the channel matrix in the DD domain is a block circulant matrix with circulant block (BCCB), and the 2D discrete Fourier transform can be used for the unitary transformation, thereby allowing an efficient implementation using the 2D fast Fourier transform (FFT) algorithm. This leads to an attractive UAMP-based detector in two aspects. First, the complexity of the UAMP-based detector is in the order of the logarithm of the OTFS block length per symbol per iteration, which is independent of SS. Second, thanks to the robustness of UAMP, the UAMP-based detector delivers significantly better performance. In addition, as the noise variance is normally unknown in practice, the noise variance estimation is incorporated in the design of the UAMP-based detector in this work, so that an extra noise variance estimator is not required (existing OTFS detectors except the one in [9] often assume a perfect knowledge of the noise variance). It is shown that the UAMP-based detector can significantly outperform the AMP-based detector and other state-of-the-art detectors [2], [9, 10], [13]. In this paper, we also extend our investigations to coded OTFS systems, and integrate (U)AMP-based detectors into a powerful turbo receiver to achieve joint iterative detection and decoding.

The remainder of the paper is organized as follows. In Section II, the OTFS system model is introduced. We design (U)AMP-based detectors with bi-orthogonal waveform in Section III, and rectangular waveform in Section IV. The discussion is extended to iterative joint detection and decoding in a coded OFTS system and the performance prediction of the OTFS system with the proposed detectors is discussed in Section V. Simulations results are provided in Section VI, followed by conclusions in Section VII.

Notations- Boldface lower-case and upper-case letters denote vectors and matrices, respectively. The superscripts ()H(\cdot)^{H} and ()T(\cdot)^{T} represent conjugate transpose and transpose operations, respectively. The superscript * is used to denote the conjugate operation. We define []M[\cdot]_{M} as the mod MM operation. The probability density function of a complex Gaussian variable with mean x^\hat{x} and variance νx\nu_{x} is represented by 𝒩(x|x^,νx)\mathcal{N}(x|{\hat{x}},\nu_{x}). The notation f(𝐱)q(𝐱)\left\langle f(\mathbf{x})\right\rangle_{q(\mathbf{x})} denotes the expectation of the function f(𝐱)f(\mathbf{x}) with respect to probability density q(𝐱)q(\mathbf{x}). The relation f(x)=cg(x)f(x)=cg(x) for some positive constant cc is written as f(x)g(x)f(x)\propto g(x). Notation \otimes represents the Kronecker product, and 𝒂𝒃\bm{a}\cdot\bm{b} and 𝒂/𝒃\bm{a}\cdot/\bm{b} represent the component-wise product and division between vectors 𝒂\bm{a} and 𝒃\bm{b}, respectively. The notation 𝑿=reshapeM(𝒙)\bm{X}=\text{reshape}_{M}\left(\bm{x}\right) represents that the vector 𝒙\bm{x} is reshaped as an M×NM\times N matrix 𝑿\bm{X} column by column, where the length of vector 𝒙\bm{x} is MNMN. We use 𝒙=vec(𝑿)\bm{x}=\text{vec}\left(\bm{X}\right) to represent vectorization of matrix 𝑿\bm{X} column by column. The notation diag(𝒂)\text{diag}(\bm{a}) represents a diagonal matrix with the elements of 𝒂\bm{a} as its diagonal. We use |𝑨|2|\bm{A}|^{2} to denote the element-wise magnitude squared operation for matrix 𝑨\bm{A}. We use 1 and 0 to denote an all-ones vector and an all-zeros vector with a proper length, respectively. We use qjq_{j} to denoted the jjth entry of 𝒒\bm{q}. The superscript tt of st\textbf{s}^{t} denotes the iteration index in an iterative algorithm.

Refer to caption
Figure 1: OTFS modulation and demodulation [2].

II System Model

The OTFS modulation and demodulation are shown in Fig. 1, which are implemented with 2D inverse SFFT (ISFFT) and SFFT at the transmitter and receiver, respectively [1], [24]. A (coded) bit sequence is mapped to symbols {x[k,l],k=0,,N1,l=0,M1}\{x[k,l],k=0,\cdots,N-1,l=0,\cdots M-1\} in the DD domain, where x[k,l]𝒜={α1,.α|𝒜|}x[k,l]\in\mathcal{A}=\{\alpha_{1},....\alpha_{|\mathcal{A}|}\} with |𝒜||\mathcal{A}| being the cardinality of 𝒜\mathcal{A}, ll and kk denote the indices of delay and Doppler shifts, respectively, and NN and MM are the number of grids of the DD plane. As shown in Fig. 1, ISFFT is performed to convert the symbols to signals in the time-frequency (TF) domain, i.e.,

Xtf[n,m]=1MNk=0N1l=0M1x[k,l]ej2π(nkNmlM).\displaystyle X_{tf}[n,m]=\frac{1}{\sqrt{MN}}\sum_{k=0}^{N-1}\sum_{l=0}^{M-1}x[k,l]e^{j2\pi(\frac{nk}{N}-\frac{ml}{M})}. (1)

Then the signals {Xtf[m,n]}\{X_{tf}[m,n]\} in the TF domain are converted to a continuous-time waveform s(t)s(t) using the Heisenberg transform with a transmit waveform gtx(t)g_{tx}(t), i.e.,

s(t)=n=0N1m=0M1Xtf[n,m]gtx(tnT)ej2πmΔf(tnT),\displaystyle s(t)=\sum_{n=0}^{N-1}\sum_{m=0}^{M-1}X_{tf}[n,m]g_{tx}(t-nT)e^{j2\pi m\Delta f(t-nT)}, (2)

where Δf\Delta f is subcarrier spacing and T=1/ΔfT=1/\Delta f.

The signal s(t)s(t) is then transmitted through a time-varying channel and the received signal in the time domain can be expressed as

u(t)=h(τ,ν)s(tτ)ej2πν(tτ)𝑑τ𝑑ν,\displaystyle u(t)=\int\int h(\tau,\nu)s(t-\tau)e^{j2\pi\nu(t-\tau)}d\tau d\nu, (3)

with h(τ,ν)h(\tau,\nu) being the channel impulse response in the (continuous) DD domain. The channel impulse response can be expressed as

h(τ,ν)=i=1Phiδ(ττi)δ(ννi),\displaystyle h(\tau,\nu)=\sum_{i=1}^{P}h_{i}\delta(\tau-\tau_{i})\delta(\nu-\nu_{i}), (4)

where δ()\delta(\cdot) is the Dirac delta function, PP is the number of resolvable propagation paths, and hih_{i}, τi\tau_{i} and νi\nu_{i} represent the gain, delay and Doppler shift associated with the iith path, respectively. The delay and Doppler-shift taps for the iith path are given by

τi=liMΔf,\displaystyle\tau_{i}=\frac{l_{i}}{M\Delta f}, (5)
νi=ki+κiNT,\displaystyle\nu_{i}=\frac{k_{i}+\kappa_{i}}{NT}, (6)

where lil_{i} and kik_{i} are the delay index and Doppler index of the iith path, and κi\kappa_{{i}} [12,12]\in[-\frac{1}{2},\frac{1}{2}] is a fractional Doppler associated with the iith path. Here, MΔfM\Delta f is the total bandwidth of the system and NTNT is the duration of an OTFS block.

At the receiver side, a receive waveform grx(t)g_{rx}(t) is used to transform the received signal r(t)r(t) to the TF domain i.e.,

Y(t,f)=grx(tt)u(t)ej2πf(tt)𝑑t,\displaystyle Y(t,f)=\int g_{rx}^{*}(t^{\prime}-t)u(t^{\prime})e^{-j2\pi f(t^{\prime}-t)}dt^{\prime}, (7)

which is then sampled at t=nTt=nT and f=mΔff=m\Delta f, yielding Y[n,m]Y[n,m]. Finally, the SFFT is applied to {Y[n,m]}\{Y[n,m]\} to obtain the signal y[k,l]y[k,l] in the DD domain, i.e.,

y[k,l]=1MNn=0N1m=0M1Y[n,m]ej2π(nkNmlM).\displaystyle y[k,l]=\frac{1}{\sqrt{MN}}\sum_{n=0}^{N-1}\sum_{m=0}^{M-1}Y[n,m]e^{-j2\pi(\frac{nk}{N}-\frac{ml}{M})}. (8)

III (U)AMP-Based Detection for OTFS with Bi-orthogonal Waveform

III-A System Model in the Delay Doppler Domain

Assume that the transmit waveform and receive waveform satisfy the bi-orthogonal property [1], then the channel input-output relationship in the DD domain can be written as [2, 25]

y[k,l]=i=0P1c=NiNihix([kki+c]N,[lli]M)\displaystyle y[k,l]=\sum_{i=0}^{P-1}\sum_{c=-N_{i}}^{N_{i}}h_{i}x([k-k_{i}+c]_{N},[l-l_{i}]_{M})
×1N1ej2π(cκi)1ej2πcκiNej2πli(ki+κi)MN+ω[k,l],\displaystyle\ \ \ \ \ \ \times\frac{1}{N}\frac{1-e^{-j2\pi(-c-\kappa_{i})}}{1-e^{-j2\pi\frac{-c-\kappa_{i}}{N}}}e^{-j2\pi\frac{l_{i}(k_{i}+\kappa_{i})}{MN}}+\omega[k,l],
(9)

where Ni<NN_{i}<N is an integer, and ω[k,l]\omega[k,l] is the noise in the DD domain. We can see that for each path, the transmitted signal is circularly shifted, and scaled by a corresponding channel gain. We reshape x[k,l]x[k,l] into a vector 𝒙MN×1\bm{x}\in\mathbb{C}^{MN\times 1}, where the jjth element xjx_{j} is x[k,l]x[k,l] with j=kM+lj=kM+l. Similarly, a vector 𝒚MN×1\bm{y}\in\mathbb{C}^{MN\times 1} can also be constructed based on {y[k,l]}\{{y}[k,l]\}. Then the channel input-output relationship in (9) can be rewritten in a vector form as

𝒚=𝑯𝒙+𝝎,\displaystyle\bm{y}=\bm{H}\bm{x}+\bm{\omega}, (10)

where 𝑯MN×MN\bm{H}\in\mathbb{C}^{MN\times MN} is the effective channel in the DD domain, and 𝝎\bm{\omega} denotes a white Gaussian noise with mean 0 and variance ϵ1\epsilon^{-1} (or precision ϵ\epsilon).

III-B AMP-Based Detector

Based on (10), we aim to recover the discrete-valued symbols in the vector 𝒙\bm{x}. Motivated by the capability of dealing with short loops and the low complexity of AMP, we estimate 𝒙\bm{x} using the AMP algorithm [15], [26]. The AMP-based detection algorithm is derived in the following and summarized in Algorithm 1.

The AMP algorithm decouples the estimation of vector 𝒙\bm{x}. Lines 1-6 of Algorithm 1 follow the AMP algorithm directly, and 𝒒\bm{q} computed in Line 6 can be regarded as pseudo observations with the following decoupled pseudo observation model

qj=xj+ϖj,j=1,,MN,q_{j}=x_{j}+\varpi_{j},~{}j=1,...,MN, (11)

where ϖj\varpi_{j} denotes a Gaussian noise with mean zero and variance νqj\nu_{q_{j}}, which is computed in Line 5 of the algorithm.

Lines 7-10 are used to compute the a posteriori mean and variance of each xjx_{j} based on the above pseudo observation model and the prior of xjx_{j}, which is a uniform discrete distribution, i.e.,

P(xj=αa)=1/|𝒜|.P(x_{j}=\alpha_{a})={1}/{|\mathcal{A}|}. (12)

It is not hard to show that the a posteriori mean x^j\hat{x}_{j} and variance νxj\nu_{x_{j}} of xjx_{j} are given by

Algorithm 1 AMP-based OTFS Detector

Initialize 𝒔(1)=𝟎\bm{s}^{(-1)}=\bm{0}, 𝒙^(0)=𝟎\hat{\bm{x}}^{(0)}=\bm{0}, 𝝂x(0)=𝟏\bm{\nu}_{x}^{(0)}=\bm{1}, and t=0t=0.
Repeat

1:  𝝂𝒑=|𝑯|2𝝂𝒙t\bm{\nu}_{\bm{p}}=\left|\bm{H}\right|^{2}\bm{\nu}_{\bm{x}}^{t}
2:  𝒑=𝑯𝒙^t𝝂𝒑𝒔t1\bm{p}=\bm{H}\hat{\bm{x}}^{t}-\bm{\nu}_{\bm{p}}\cdot\bm{s}^{t-1}
3:  𝝂𝒔=1./(𝝂𝒑+ϵ11)\bm{\nu}_{\bm{s}}=\textbf{1}./\left(\bm{\nu}_{\bm{p}}+\epsilon^{-1}\textbf{1}\right)
4:  𝒔t=𝝂𝒔(𝒓𝒑)\bm{s}^{t}=\bm{\nu}_{\bm{s}}\cdot\left(\bm{r}-\bm{p}\right)
5:  𝝂𝒒=1./(|𝑯H|2𝝂s)\bm{\nu}_{\bm{q}}=\textbf{1}./\left(\left|\bm{H}^{H}\right|^{2}\bm{\nu}_{s}\right)
6:  𝒒=𝒙^t+𝝂𝒒𝑯H𝒔t\bm{q}=\hat{\bm{x}}^{t}+\bm{\nu}_{\bm{q}}\cdot\bm{H}^{H}\bm{s}^{t}
7:  j:ξj,a=exp(νqj1|αaqj|2)\forall j:\xi_{j,a}=\exp\left({-\nu_{q_{j}}^{-1}|\alpha_{a}-q_{j}|^{2}}\right)
8:  j:βj,a=ξj,a/a=1|𝒜|ξj,a\forall j:\beta_{j,a}={\xi_{j,a}}/{\sum\nolimits_{a=1}^{|\mathcal{A}|}\xi_{j,a}}
9:  j:x^jt+1=a=1|𝒜|αaβj,a\forall j:\hat{x}^{t+1}_{j}=\sum_{a=1}^{|\mathcal{A}|}\alpha_{a}\beta_{j,a}
10:  j:νxjt+1=a=1|𝒜|βj,a|αax^j|2\forall j:\nu^{t+1}_{x_{j}}=\sum_{a=1}^{|\mathcal{A}|}\beta_{j,a}|\alpha_{a}-\hat{x}_{j}|^{2}
11:  t=t+1t=t+1

Until terminated

x^j\displaystyle\hat{x}_{j} =\displaystyle= a=1|𝒜|αaβj,a\displaystyle\sum_{a=1}^{|\mathcal{A}|}\alpha_{a}\beta_{j,a} (13)
νxj\displaystyle\nu_{x_{j}} =\displaystyle= a=1|𝒜|βj,a|αax^j|2,\displaystyle\sum_{a=1}^{|\mathcal{A}|}\beta_{j,a}|\alpha_{a}-\hat{x}_{j}|^{2}, (14)

where

βj,a=ξj,a/a=1|𝒜|ξj,a\beta_{j,a}={\xi_{j,a}}/{\sum\nolimits_{a=1}^{|\mathcal{A}|}\xi_{j,a}} (15)

with

ξj,a=exp(νqj1|αaqj|2).\xi_{j,a}=\exp\left({-\nu_{q_{j}}^{-1}|\alpha_{a}-q_{j}|^{2}}\right). (16)

In Algorithm 1, the results x^j\hat{x}_{j} and νxj\nu_{x_{j}} in the last iteration can be used for de-mapping, which can be performed based on the pseudo model

x^j=xj+wj,j=1,,MN,\hat{x}_{j}=x_{j}+w_{j},~{}j=1,...,MN, (17)

where wjw_{j} is a Gaussian noise with mean zero and variance νxj\nu_{x_{j}}.

There are two issues we need to point out. First, in the above algorithm the noise variance ϵ1\epsilon^{-1} is assumed to be known, which, however, is often unknown in practice. Second, it is noted that the deviation of the channel matrix 𝑯\bm{H} from an i.i.d. (sub-)Gaussian matrix may lead to performance loss of the AMP algorithm, motivating us to employ a robust variant to AMP, e.g., UAMP. In the next sub-section, we address both issues and develop a UAMP-based detection algorithm, where the estimation of the noise variance is incorporated. In particular, we will show how the UAMP algorithm can take advantage of the structure of the channel matrix and implemented with the 2D FFT algorithm, which is much more efficient than the AMP-based algorithm.

III-C UAMP-Based Detector with 2D FFT

We first examine the property of the channel matrix 𝑯\bm{H} in (10), which can be represented as

𝑯=i=0P1c=NiNi𝑰N([cki]N)[𝑰M(li)hi\displaystyle\bm{H}=\sum_{i=0}^{P-1}\sum_{c=-N_{i}}^{N_{i}}\bm{I}_{N}(-[c-k_{i}]_{N})\otimes\Big{[}\bm{I}_{M}(l_{i})h_{i}
×(1ej2π(cκi)NNej2πcκiN)ej2πli(ki+κi)MN]\displaystyle\times\Big{(}\frac{1-e^{-j2\pi(-c-\kappa_{i})}}{N-Ne^{-j2\pi\frac{-c-\kappa_{i}}{N}}}\Big{)}e^{-j2\pi\frac{l_{i}(k_{i}+\kappa_{i})}{MN}}\Big{]} (18)

where 𝑰N([qki]N)\bm{I}_{N}(-[q-k_{i}]_{N}) denotes an N×NN\times N matrix obtained by circularly shifting the rows of the identity matrix by [qki]N-[q-k_{i}]_{N}, and 𝑰M(li)\bm{I}_{M}(l_{i}) is obtained similarly. Without fractional Doppler, i.e., κi=0\kappa_{i}=0, the channel matrix 𝑯\bm{H} is reduced to

𝑯=i=0P1𝑰N(ki)[𝑰M(li)hiej2πlikiMN].\displaystyle\bm{H}=\sum_{i=0}^{P-1}\bm{I}_{N}(k_{i})\otimes\left[\bm{I}_{M}(l_{i})h_{i}e^{-j2\pi\frac{l_{i}k_{i}}{MN}}\right]. (19)

It is noted that 𝑯\bm{H} in both (18) and (19) is a block circulant matrix with circulant blocks (BCCB). As a toy example, suppose an OTFS system with M=4M=4, N=3N=3, Ni=1N_{i}=1, and P=3P=3, where the channel gain vector is [h0,h1,h2][h_{0},h_{1},h_{2}], fractional Doppler shifts are [0.1,0.1,0.2][-0.1,0.1,0.2], and the indices of delay and Doppler taps are [0,1,2][0,1,2] and [1,3,4][1,3,4], respectively. Then the channel matrix 𝑯\bm{H} can be expressed as

𝑯=(𝑯0𝑯2𝑯1𝑯1𝑯0𝑯2𝑯2𝑯1𝑯0)MN×MN\displaystyle\bm{H}=\left(\begin{matrix}\bm{H}_{0}&\bm{H}_{2}&\bm{H}_{1}\\ \bm{H}_{1}&\bm{H}_{0}&\bm{H}_{2}\\ \bm{H}_{2}&\bm{H}_{1}&\bm{H}_{0}\end{matrix}\right)_{MN\times MN} (20)

with

𝑯0=h0g(1,0.1)𝑰M(0)+h1g(0,0.1)𝑰M(1)\displaystyle\bm{H}_{0}=h_{0}g(1,-0.1)\bm{I}_{M}(0)+h_{1}g(0,0.1)\bm{I}_{M}(1)
+h2g(1,0.2)𝑰M(2)\displaystyle\ \ \ \ \ \ \ \ +h_{2}g(1,0.2)\bm{I}_{M}(2)
𝑯1=h0g(0,0.1)𝑰M(0)+h1g(1,0.1)𝑰M(1)\displaystyle\bm{H}_{1}=h_{0}g(0,-0.1)\bm{I}_{M}(0)+h_{1}g(-1,0.1)\bm{I}_{M}(1)
+h2g(0,0.2)𝑰M(2)\displaystyle\ \ \ \ \ \ \ \ +h_{2}g(0,0.2)\bm{I}_{M}(2)
𝑯2=h0g(1,0.1)𝑰M(0)+h1g(1,0.1)𝑰M(1)\displaystyle\bm{H}_{2}=h_{0}g(-1,-0.1)\bm{I}_{M}(0)+h_{1}g(1,0.1)\bm{I}_{M}(1)
+h2g(1,0.2)𝑰M(2),\displaystyle\ \ \ \ \ \ \ \ +h_{2}g(-1,0.2)\bm{I}_{M}(2),

where the function g(c,κi)g(c,\kappa_{i}) is defined as

g(c,κi)=1N1ej2π(cκi)1ej2πcκiNej2πli(ki+κi)MN.\displaystyle g(c,\kappa_{i})=\frac{1}{N}\frac{1-e^{-j2\pi(-c-\kappa_{i})}}{1-e^{-j2\pi\frac{-c-\kappa_{i}}{N}}}e^{-j2\pi\frac{l_{i}(k_{i}+\kappa_{i})}{MN}}. (21)

A useful property of the BCCB matrix 𝑯\bm{H} is that, it can be diagonalized using a 2D DFT matrix, i.e.,

𝑯=𝑭H𝚲𝑭\displaystyle\bm{H}=\bm{F}^{H}\bm{\Lambda}\bm{F} (22)

where 𝑭=𝑭N𝑭M\bm{F}=\bm{F}_{N}\otimes\bm{F}_{M} is a unitary matrix with 𝑭N\bm{F}_{N} and 𝑭M\bm{F}_{M} denoting respectively the normalized NN-point and MM-point DFT matrices, and matrix 𝚲\bm{\Lambda} in (22) is a diagonal matrix, i.e.,

𝚲=diag(𝒅)\displaystyle\bm{\Lambda}=\text{diag}(\bm{d}) (23)

with 𝒅\bm{d} being a length-MNMN vector. The vector 𝒅\bm{d} can be computed as

𝒅=vec(FFT2(𝑪)),\displaystyle\bm{d}=\text{vec}(\text{FFT2}\left(\bm{C}\right)), (24)

where FFT2 (\cdot) represents the 2D FFT operation, 𝑪=reshapeM(𝑯(:,1))\bm{C}=\text{reshape}_{M}\left(\bm{H}(:,1)\right) is an M×NM\times N matrix, and 𝑯(:,1)\bm{H}(:,1) with length-MNMN is the first column of matrix 𝑯\bm{H}.

The above property can be exploited by UAMP, leading to a more efficient detection algorithm while with significantly enhanced performance, compared to the AMP-based detector. Instead of using model (10) directly, the UAMP algorithm [19] works with the unitary transform of the model. Because the channel matrix 𝑯\bm{H} admits (22), we have the following unitary transform of the OTFS system model (10)

𝒓=𝚲𝑭𝒙+𝝎,\displaystyle\bm{r}=\bm{\Lambda}\bm{F}\bm{x}+\bm{\omega}^{\prime}, (25)

where 𝒓𝑭𝒚\bm{r}\triangleq\bm{F}\bm{y}, 𝝎𝑭𝝎\bm{\omega}^{\prime}\triangleq\bm{F}\bm{\omega}, and the noise 𝝎\bm{\omega}^{\prime} has the same distribution as 𝝎\bm{\omega} since 𝑭\bm{F} is an unitary matrix. The precision of the noise is still denoted by ϵ\epsilon, which needs to be estimated.

Refer to caption
Figure 2: Factor graph representation of (26).

By defining 𝚽𝚲𝑭\bm{\Phi}\triangleq\bm{\Lambda}\bm{F} and an auxiliary vector 𝒛𝚽𝒙\bm{z}\triangleq\bm{\Phi}\bm{x}, we can factorize the joint probability density function of the unknown variables 𝒙,𝒛,\bm{x},\bm{z}, and ϵ\epsilon given 𝒓\bm{r} as

p(𝒙,𝒛,ϵ|𝒓)\displaystyle\!\!\!\!\!\!p(\bm{x},\bm{z},\epsilon|\bm{r})\!\!\!\!\!\!\!\!\!\!\!\!\!\ p(ϵ)p(𝒓|𝒛,ϵ)p(𝒛|𝒙)p(𝒙)\displaystyle\propto p(\epsilon)p(\bm{r}|\bm{z},\epsilon)p(\bm{z}|\bm{x})p(\bm{x}) (26)
=p(ϵ)jp(rj|zj,ϵ)p(zj|𝒙)ip(xi)\displaystyle=p(\epsilon)\prod\nolimits_{j}p(r_{j}|z_{j},\epsilon)p(z_{j}|\bm{x})\prod\nolimits_{i}p(x_{i})
=fϵjfrj(zj,ϵ)fδj(zj,𝒙)ifxi(xi),\displaystyle=f_{\epsilon}\prod\nolimits_{j}f_{r_{j}}(z_{j},\epsilon)f_{\delta_{j}}(z_{j},\bm{x})\prod\nolimits_{i}f_{x_{i}}(x_{i}),

where indexes i,j[1:MN]i,j\in[1:MN]. To facilitate the factor graph representation of the factorization in (26), we introduce the notations in Table I, showing the correspondence between the factor labels and the underlying distributions they represent, and the specific functional form assumed by each factor. In Table I, 𝚽j\bm{\Phi}_{j} denotes the jjth row of matrix 𝚽\bm{\Phi}. The factor graph representation for the factorization in (26) is depicted in Fig. 2, where squares and circles represent function nodes and variable nodes, respectively.

TABLE I: Factors, Underlying Distributions and Functional Forms Associated with (26)
Factor Distribution Functional Form
frjf_{r_{j}} p(rj|zj,ϵ)p\left(r_{j}|z_{j},\epsilon\right) 𝒩(zj;rj,ϵ1)\mathcal{N}\left(z_{j};r_{j},\epsilon^{-1}\right)
fδjf_{\delta_{j}} p(zj|𝒙)p\left(z_{j}|\bm{x}\right) δ(zj𝚽j𝒙)\delta\left(z_{j}-\bm{\Phi}_{j}\bm{x}\right)
fxif_{x_{i}} p(xi)p\left(x_{i}\right) (1/|𝒜|)a=1|𝒜|δ(xiαa)(1/|\mathcal{A}|)\sum_{a=1}^{|\mathcal{A}|}\delta\left(x_{i}-\alpha_{a}\right)
fϵf_{\epsilon} p(ϵ)p(\epsilon) ϵ1\propto\epsilon^{-1} [27]
Algorithm 2 UAMP Based OTFS Detector (with noise precision estimation)

Unitary transform: 𝒓=𝑭𝒚=𝚲𝑭𝒙+𝝎\bm{r}=\bm{F}\bm{y}=\bm{\Lambda}\bm{F}\bm{x}+\bm{\omega}, with 𝑭=𝑭N𝑭M\bm{F}=\bm{F}_{N}\otimes\bm{F}_{M}. Calculated 𝒅\bm{d} with (24), and define vector 𝝀=𝒅𝒅\bm{\lambda}=\bm{d}\cdot\bm{d}^{*}. Initialize 𝒔(1)=𝟎\bm{s}^{(-1)}=\bm{0}, 𝒙^(0)=𝟎\hat{\bm{x}}^{(0)}=\bm{0}, ϵ^(0)=1\hat{\epsilon}^{(0)}=1, νx(0)=1\nu_{x}^{(0)}=1, and t=0t=0.
Repeat

1:  𝝂𝒑=νxt𝝀\bm{\nu}_{\bm{p}}=\nu_{x}^{t}\bm{\lambda}
2:  𝒑=𝒅vec(FFT2(reshapeM(𝒙^t)))𝝂𝒑𝒔t1\bm{p}=\bm{d}\cdot\text{vec}\left(\text{FFT2}\left(\text{reshape}_{M}(\hat{\bm{x}}^{t})\right)\right)-\bm{\nu}_{\bm{p}}\cdot\bm{s}^{t-1}
3:  𝝂𝒛=1./(1./𝝂𝒑+ϵ^t𝟏)\bm{\nu}_{\bm{z}}=1./\left(1./\bm{\nu}_{\bm{p}}+\hat{\epsilon}^{t}\bm{1}\right)
4:  𝒛^=𝝂𝒛(𝒑./𝝂𝒑+ϵ^t𝒓)\hat{\bm{z}}=\bm{\nu}_{\bm{z}}\cdot\left(\bm{p}./\bm{\nu}_{\bm{p}}+\hat{\epsilon}^{t}\bm{r}\right)
5:  ϵ^t+1=MN/(𝒓𝒛^22+1T𝝂𝒛)\hat{\epsilon}^{t+1}={MN}/\left({\left\|\bm{r}-\hat{\bm{z}}\right\|^{2}_{2}+\textbf{1}^{T}\bm{\nu}_{\bm{z}}}\right)
6:  𝝂𝒔=1./(𝝂𝒑+1/ϵt+1𝟏)\bm{\nu}_{\bm{s}}=1./\left(\bm{\nu}_{\bm{p}}+1/\epsilon^{t+1}\bm{1}\right)
7:  𝒔t=ν𝒔(𝒓𝒑^)\bm{s}^{t}=\nu_{\bm{s}}\cdot\left(\bm{r}-\hat{\bm{p}}\right)
8:  νq=𝝀T𝝂s/(MN)\nu_{q}=\bm{\lambda}^{T}\bm{\nu}_{s}/(MN)
9:  𝒒=𝒙^(t)+νqvec(IFFT2(reshapeM(𝒅𝒔t)))\bm{q}=\hat{\bm{x}}^{(t)}+\nu_{q}\text{vec}\left(\text{IFFT2}\left(\text{reshape}_{M}({\bm{d}}\cdot\bm{s}^{t})\right)\right)
10:  j:ξj,a=exp(νq1|αaqj|2)\forall j:\xi_{j,a}=\exp\left({-\nu_{q}^{-1}|\alpha_{a}-q_{j}|^{2}}\right)
11:  j:βj,a=ξj,a/a=1|𝒜|ξj,a\forall j:\beta_{j,a}={\xi_{j,a}}/{\sum\nolimits_{a=1}^{|\mathcal{A}|}\xi_{j,a}}
12:  j:x^jt+1=a=1|𝒜|αaβj,a\forall j:\hat{x}^{t+1}_{j}=\sum\nolimits_{a=1}^{|\mathcal{A}|}\alpha_{a}\beta_{j,a}
13:  j:νxjt+1=a=1|𝒜|βj,a|αax^jt+1|2\forall j:\nu_{x_{j}}^{t+1}=\sum\nolimits_{a=1}^{|\mathcal{A}|}\beta_{j,a}|\alpha_{a}-\hat{x}^{t+1}_{j}|^{2}
14:  νxt+1=1MNj=1MNνxjt+1\nu_{x}^{t+1}=\frac{1}{MN}\sum_{j=1}^{MN}\nu_{x_{j}}^{t+1}
15:  t=t+1t=t+1

Until terminated

Following the UAMP algorithm, we can derive a UAMP-based iterative detector, which is summarized in Algorithm 2. As the noise precision ϵ\epsilon is unknown, its estimation needs to be included in the UAMP-based detector. According to the derivation of (U)AMP using the loopy belief propagation, UAMP provides the message from variable node zjz_{j} to function node frjf_{r_{j}}, which is Gaussian and denoted by mzjfrj(zj)=𝒩(zj|pj,νpj)m_{{z_{j}}\rightarrow f_{r_{j}}}(z_{j})=\mathcal{N}(z_{j}|p_{j},\nu_{p_{j}}). Here, the mean pjp_{j} and the variance νpj\nu_{p_{j}} are given in Lines 1 and 2 of Algorithm 2 in a vector form. Based on the mean field rule [28] at the function node frjf_{r_{j}}, we can compute the message passed from function node frjf_{r_{j}} to variable node ϵ\epsilon, i.e.,

mfrjϵ(ϵ)\displaystyle m_{f_{r_{j}}\rightarrow\epsilon}(\epsilon) exp{logfrj(rj|zj,ϵ)b(zj)}\displaystyle\propto\exp\left\{\left\langle{\log f_{r_{j}}({{r_{j}|z_{j},\epsilon}})}\right\rangle_{b\left({z_{j}}\right)}\right\} (27)
ϵexp{ϵ(|rjz^j|2+vzj)},\displaystyle\propto\epsilon\exp\left\{{-\epsilon}{(|r_{j}-\hat{z}_{j}|^{2}+v_{z_{j}})}\right\},

where b(zj)b(z_{j}) is the belief of zjz_{j}. It turns out that b(zj)b(z_{j}) is also Gaussian with variance and mean given by

νzj\displaystyle\nu_{z_{j}} =\displaystyle= 1/(1/νpj+ϵ^)\displaystyle 1/\left(1/\nu_{p_{j}}+\hat{\epsilon}\right) (28)

and

z^\displaystyle\hat{z} =\displaystyle= νzj(pj/νpj+ϵ^rj)\displaystyle\nu_{z_{j}}\left(p_{j}/\nu_{p_{j}}+\hat{\epsilon}r_{j}\right) (29)

respectively, where ϵ^\hat{\epsilon} in the estimate of ϵ\epsilon in last iteration. They can be expressed in a vector form shown in Line 3 and Line 4 of Algorithm 2. The estimation of ϵ\epsilon can be performed based on the belief b(ϵ)b(\epsilon) at the variable node ϵ\epsilon shown in Fig. 2, i.e.,

b(ϵ)\displaystyle b(\epsilon) fϵ(ϵ)j=1MNmfrjϵ(ϵ).\displaystyle\propto f_{\epsilon}(\epsilon)\prod_{j=1}^{MN}m_{f_{r_{j}}\rightarrow\epsilon}(\epsilon). (30)

Then the estimate of ϵ\epsilon can be expressed as

ϵ^=0ϵb(ϵ)𝑑ϵ=MN/j=1MN(|rjz^j|2+νzj),\hat{\epsilon}=\int_{0}^{\infty}\epsilon b(\epsilon)d\epsilon={MN}/{\sum_{j=1}^{MN}{\left(|r_{j}-\hat{z}_{j}|^{2}+\nu_{z_{j}}\right)}}, (31)

which can be rewritten in a vector form shown in Line 5 of Algorithm 2. By using the mean field rule at the function node frjf_{r_{j}} again, we have the message passed from the function node frjf_{r_{j}} to the variable node zjz_{j}, i.e.,

mfrjzj(zj)\displaystyle m_{f_{r_{j}}\rightarrow z_{j}}(z_{j}) exp{logfrj(rj|zj,ϵ^)b(ϵ)}\displaystyle\propto\exp\left\{\left\langle{\log f_{r_{j}}({{r_{j}|z_{j},\hat{\epsilon}}})}\right\rangle_{b\left({\epsilon}\right)}\right\} (32)
𝒩(hj|rj,ϵ^1).\displaystyle\propto\mathcal{N}(h_{j}|r_{j},\hat{\epsilon}^{-1}).

Then the UAMP algorithm with known noise can be used as if the true noise precision is ϵ^\hat{\epsilon}, leading to Lines 6-15 and Lines 1 and 2 of Algorithm 2. The derivations of Lines 10-13 are the same as those for the AMP-based detector. There is an extra operation in Line 14, which averages the variances of {xj}\{x_{j}\}. It is note that the special form of the unitary matrix 𝑭\bm{F} leads to the implementation with the 2D FFT algorithm in Line 2 and Line 9 of Algorithm 2.

III-D Computational Complexity Analysis

We first look at the complexity of the AMP based-detector. From Algorithm 1, we can find that the complexity of AMP-based detector is dominated by the matrix-vector products in Lines 1, 2, 5 and 6, e.g., 𝑯𝒙^t\bm{H}\hat{\bm{x}}^{t} and |𝑯|2𝝂𝒙t\left|\bm{H}\right|^{2}\bm{\nu}_{\bm{x}}^{t}. Note that, each row of the matrix 𝑯\bm{H} has S=i=1PNiS=\sum_{i=1}^{P}N_{i} non-zero elements [2], so the complexity is 𝒪(MNS)+𝒪(MN|𝒜|)\mathcal{O}\left(MNS\right)+\mathcal{O}\left(MN|\mathcal{A}|\right) per OTFS block per iteration.

Comparing Algorithm 2 for the UAMP-based detector with Algorithm 1, we can find that the UAMP-based detector does not require any matrix-vector products, and all lines of the algorithm involve only element-wise vector operations or scalar operations, except Line 2 and Line 9. It is noted that Line 2 and Line 9 can be implemented with the FFT algorithm. So the the complexity of Algorithm 2 is 𝒪(MNlog(MN))+𝒪(MN|𝒜|)\mathcal{O}\left(MN\log(MN)\right)+\mathcal{O}\left(MN|\mathcal{A}|\right) per OTFS block per iteration, which is independent of SS.

In comparison with the (U)AMP-based detectors, the detectors proposed in [2] and [13] have a complexity of 𝒪(MNS|𝒜|)\mathcal{O}(MNS|\mathcal{A}|) per OTFS block per iteration, which can be much higher than that of the (U)AMP-based detectors in the case of rich scattering environments and when fractional Doppler shifts have to be considered (leading to a large SS). The MRC detector in [9, 10] has a complexity of 𝒪(MN(L+log(N)))\mathcal{O}(MN(L+log(N))) per block per iteration (LL is the number of unique delay taps), which can be smaller than that of the UAMP-based detector when LL is small. We will show in Section VI that the (U)AMP-based detectors can deliver much better performance than other detectors especially when PP is relatively large.

IV OTFS Detection with Rectangular Waveform

The rectangular waveform has been used in the literature as a more practical waveform. When a length-TT rectangular waveform is used as the transmitter and receiver filters, as in [29], [30, 31], we append a cyclic prefix (CP) to each length-MM sub-block of the signal 𝒔\bm{s} in the time domain before transmitting over the time-varying channel.

After removing CPs at the receiver side, the received signal in the time domain can be expressed as [29, 30, 31]

𝒖=𝑯T𝒔+𝝎,\displaystyle\bm{u}=\bm{H}_{T}\bm{s}+\bm{\omega}, (33)

where 𝝎\bm{\omega} is an additive Gaussian white noise vector, and 𝑯T\bm{H}_{T} is an NM×NMNM\times NM block diagonal matrix

𝑯T=(𝑯1000𝑯2000𝑯N)NM×NM\displaystyle\bm{H}_{T}=\left(\begin{matrix}\bm{H}_{1}&0&\ldots&0\\ 0&\bm{H}_{2}&\ldots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\ldots&\bm{H}_{N}\end{matrix}\right)_{NM\times NM}

where 𝑯nM×M\bm{H}_{n}\in\mathbb{C}^{M\times M} is the channel matrix corresponding to the nn-th sub-block of the transmitted signal. Then the received signal in the DD domain is obtained by performing block-wise DFTs, followed by the SFFT operation, i.e.,

𝒚=(𝑭N𝑭MH)(𝑰𝑭M)𝒖.\displaystyle\bm{y}=(\bm{F}_{N}\otimes\bm{F}_{M}^{H})(\bm{I}\otimes\bm{F}_{M})\bm{u}. (34)

Finally, the channel input-output relationship in the DD domain can be expressed as

𝒚\displaystyle\bm{y} =\displaystyle= (𝑭N𝑰M)𝑯T(𝑭NH𝑰M)𝒙+𝝎\displaystyle(\bm{F}_{N}\otimes\bm{I}_{M})\bm{H}_{T}(\bm{F}_{N}^{H}\otimes\bm{I}_{M})\bm{x}+\bm{\omega} (35)
=\displaystyle= 𝑯𝒙+𝝎,\displaystyle\bm{H}\bm{x}+\bm{\omega},

where 𝑯\bm{H} represents the effective channel matrix in the DD domain.

Based on model (35), we can directly apply Algorithm 1 to detect 𝒙\bm{x}, leading to an AMP-based OTFS detector. However, Algorithm 2 cannot be directly applied. Next, we design a detector based on the UAMP algorithm.

Since the channel matrix 𝑯T\bm{H}_{T} is a block diagonal matrix, we can perform SVD block-by-block, i.e.,

𝑯\displaystyle\bm{H} =\displaystyle= (𝑼1𝚲1𝑽1000𝑼2𝚲2𝑽2000𝑼N𝚲N𝑽N)\displaystyle\left(\begin{matrix}\bm{U}_{1}\bm{\Lambda}_{1}\bm{V}_{1}&0&\ldots&0\\ 0&\bm{U}_{2}\bm{\Lambda}_{2}\bm{V}_{2}&\ldots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\ldots&\bm{U}_{N}\bm{\Lambda}_{N}\bm{V}_{N}\end{matrix}\right) (36)
=\displaystyle= 𝑼𝚲𝑽\displaystyle\bm{U}\bm{\Lambda}\bm{V}

where

𝑼\displaystyle\bm{U} =\displaystyle= (𝑼100𝑼N)\displaystyle\left(\begin{matrix}\bm{U}_{1}&\ldots&0\\ \vdots&\ddots&\vdots\\ 0&\ldots&\bm{U}_{N}\end{matrix}\right) (37)
𝚲\displaystyle\bm{\Lambda} =\displaystyle= (𝚲100𝚲N)\displaystyle\left(\begin{matrix}\bm{\Lambda}_{1}&\ldots&0\\ \vdots&\ddots&\vdots\\ 0&\ldots&\bm{\Lambda}_{N}\end{matrix}\right) (38)
𝑽\displaystyle\bm{V} =\displaystyle= (𝑽100𝑽N),\displaystyle\left(\begin{matrix}\bm{V}_{1}&\ldots&0\\ \vdots&\ddots&\vdots\\ 0&\ldots&\bm{V}_{N}\end{matrix}\right), (39)

and each 𝚲n\bm{\Lambda}_{n} is an M×MM\times M diagonal matrix (so does 𝚲\bm{\Lambda} itself). By applying the unitary transformation with the unitary matrix 𝑼H(𝑭NH𝑰M)\bm{U}^{H}(\bm{F}_{N}^{H}\otimes\bm{I}_{M}) to model (35), we have

𝒓=𝚲𝚽𝒙+𝝎¯,\displaystyle\bm{r}=\bm{\Lambda}\bm{\Phi}\bm{x}+\bar{\bm{\omega}}, (40)

where 𝒓=𝑼H(𝑭NH𝑰M)𝒚MN×1\bm{r}=\bm{U}^{H}(\bm{F}_{N}^{H}\otimes\bm{I}_{M})\bm{y}\in\mathbb{C}^{MN\times 1}, 𝚽=𝑽(𝑭NH𝑰M)\bm{\Phi}=\bm{V}(\bm{F}_{N}^{H}\otimes\bm{I}_{M}) and 𝝎¯=𝑼H(𝑭NH𝑰M)𝝎\bar{\bm{\omega}}=\bm{U}^{H}(\bm{F}_{N}^{H}\otimes\bm{I}_{M})\bm{\omega} is still a zero-mean Gaussian noise vector with the same covariance matrix as 𝝎\bm{\omega} because 𝑼H(𝑭NH𝑰M)\bm{U}^{H}(\bm{F}_{N}^{H}\otimes\bm{I}_{M}) is a unitary matrix. So, UAMP can be applied to model (40), leading to an algorithm similar to Algorithm 2 with a few equations modified, as detailed in the following:

  • a.

    Vector 𝒅\bm{d} consists of the diagonal elements of matrix 𝚲\bm{\Lambda}.

  • b.

    The computation of 𝒑\bm{p} in Line 2 of Algorithm 2 is replaced by

    𝒑\displaystyle\bm{p} =\displaystyle= 𝚲𝑽(𝑭NH𝑰M)𝒙^𝝂p𝒔t1\displaystyle\bm{\Lambda}\bm{V}(\bm{F}_{N}^{H}\otimes\bm{I}_{M})\hat{\bm{x}}-\bm{\nu}_{p}\cdot\bm{s}^{t-1} (41)
    =\displaystyle= 𝒅𝑽vec(𝑿^𝑭NH)𝝂p𝒔t1,\displaystyle\bm{d}\cdot\bm{V}\text{vec}(\hat{\bm{X}}\bm{F}_{N}^{H})-\bm{\nu}_{p}\cdot\bm{s}^{t-1},

    where 𝑿^reshapeM(𝒙^)\hat{\bm{X}}\triangleq\text{reshape}_{M}(\hat{\bm{x}}), and 𝑿^𝑭NH\hat{\bm{X}}\bm{F}_{N}^{H} can be computed using the FFT algorithm.

  • c.

    The computation of 𝒒\bm{q} in Line 9 of Algorithm 2 is replaced by

    𝒒\displaystyle\bm{q} =\displaystyle= 𝒙^t+νq(𝚲𝚽)H𝒔\displaystyle\hat{\bm{x}}^{t}+\nu_{q}(\bm{\Lambda}\bm{\Phi})^{H}\bm{s} (42)
    =\displaystyle= 𝒙^t+νq(𝑭N𝑰M)𝑽H𝚲H𝒔\displaystyle\hat{\bm{x}}^{t}+\nu_{q}(\bm{F}_{N}\otimes\bm{I}_{M})\bm{V}^{H}\bm{\Lambda}^{H}\bm{s}
    =\displaystyle= 𝒙^t+νqvec(𝑺𝑭N),\displaystyle\hat{\bm{x}}^{t}+\nu_{q}\text{vec}(\bm{S}\bm{F}_{N}),

    where 𝑺reshapeM(𝑽H(𝒅𝒔))\bm{S}\triangleq\text{reshape}_{M}(\bm{V}^{H}(\bm{d}^{*}\cdot\bm{s})), and 𝑺𝑭N\bm{S}\bm{F}_{N} can be computed using the FFT algorithm.

The UAMP-based detector with rectangular waveform only involves element-wise product (division) except (41) and (42). Note that 𝑽\bm{V} is a block diagonal matrix, so the computational complexity of the detector is 𝒪(M2N)+𝒪(MN|𝒜|)\mathcal{O}\left(M^{2}N\right)+\mathcal{O}(MN|\mathcal{A}|) per OTFS block. which is independent of SS. Compared to the MRC detector in [9, 10], the UAMP-based detector has a higher complexity, but as shown in Section VI, it can achieve significantly better performance, especially in the case of high order modulations.

Refer to caption
Figure 3: Iterative joint detection and decoding in a coded OTFS system, where II and II-1 represent interleaver and deinterleaver, respectively.

V Extension to Coded System with Turbo Receiver and BER Performance Prediction with State Evolution

In this section, we consider a coded OFTS system and investigate the design of a turbo receiver for joint detection and decoding.

The turbo system is shown in Fig. 3, where the information bits are firstly encoded and then interleaved before mapping. Each symbol xj𝒜={α1,,α|𝒜|}x_{j}\in\mathcal{A}=\{\alpha_{1},...,\alpha_{|\mathcal{A}|}\} in the DD domain is mapped from a sub-sequence of the coded bit sequence, which is denoted by 𝒄j=[cj1,,cjlog|𝒜|]\bm{c}_{j}=[c_{j}^{1},...,c_{j}^{log|\mathcal{A}|}]. Each αa\alpha_{a} corresponds to a length-log|𝒜|\mathrm{log}|\mathcal{A}| binary sequence denoted by {αa1,,αalog|𝒜|}\{\alpha_{a}^{1},...,\alpha_{a}^{log|\mathcal{A}|}\}. The turbo receiver consists of a soft-in-soft-out (SISO) detector and a SISO decoder, which exchange information in an iterative manner. Based on the log-likelihood ratios (LLRs) provided the SISO decoder and the output of the OTFS demodulator as shown in Fig. 3, the task of the SISO detector is to compute the extrinsic LLR for each coded bit, i.e.,

Le(cjq)=lnP(cjq=0|𝒓)P(cjq=1|𝒓)La(cjq),L^{e}(c_{j}^{q})=\ln\frac{P(c_{j}^{q}=0|\bm{r})}{P(c_{j}^{q}=1|\bm{r})}-L^{a}(c_{j}^{q}), (43)

where La(cjq)L^{a}(c_{j}^{q}) is the ouput extrinsic LLR of the decoder in last iteration. The extrinsic LLR Le(cjq)L^{e}(c_{j}^{q}) is passed to the decoder. The derivation for Le(cjq)L^{e}(c_{j}^{q}) in terms of extrinsic mean and variance can be find in [32], and Le(cjq)L^{e}(c_{j}^{q}) can be expressed as

Le(cjq)=lnαa𝒜q0exp(|αamje|2vje)qqP(cjq=αaq)αa𝒜q1exp(|αamje|2vje)qqP(cjq=αaq),L^{e}(c_{j}^{q})=\ln\frac{\sum_{\alpha_{a}\in\mathcal{A}_{q}^{0}}\exp\big{(}-\frac{|\alpha_{a}-m_{j}^{e}|^{2}}{v_{j}^{e}}\big{)}\prod_{q^{\prime}\neq q}P(c_{j}^{q^{\prime}}=\alpha_{a}^{q^{\prime}})}{\sum_{\alpha_{a}\in\mathcal{A}_{q}^{1}}\exp\big{(}-\frac{|\alpha_{a}-m_{j}^{e}|^{2}}{v_{j}^{e}}\big{)}\prod_{q^{\prime}\neq q}P(c_{j}^{q^{\prime}}=\alpha_{a}^{q^{\prime}})}, (44)

where mjem_{j}^{e} and vjev_{j}^{e} are the extrinsic mean and variance of xjx_{j}, and 𝒜q0\mathcal{A}_{q}^{0} and 𝒜q0\mathcal{A}_{q}^{0} represent the subsets of all αa\alpha_{a} corresponding to cjq=0c_{j}^{q}=0 and cjq=1c_{j}^{q}=1, respectively. The extrinsic variance and mean are defined as [32]

vje\displaystyle v_{j}^{e} =\displaystyle= (1/vjp1/vj)1\displaystyle(1/v_{j}^{p}-1/v_{j})^{-1} (45)
mje\displaystyle m_{j}^{e} =\displaystyle= vje(mjp/vjpmj/vj)\displaystyle v_{j}^{e}(m_{j}^{p}/v_{j}^{p}-m_{j}/v_{j}) (46)

where mjm_{j} and vjv_{j} are the a priori mean and variance of xjx_{j} calculated based on the output LLRs of the SISO decoder [33], [34], [35] and mjpm_{j}^{p} and vjpv_{j}^{p} are the a posteriori mean and variance of xjx_{j}. By examining the derivation of the (U)AMP algorithm, we can find that 𝒒\bm{q} and 𝝂q\bm{\nu}_{q} consists of the extrinsic means and extrinsic variances of the symbols in vector 𝒙\bm{x} as they are the messages passed from the observation side and do not contain the immediate a priori information about 𝒙\bm{x}. Therefore, we have

mje=qj,vje=νqj\displaystyle m_{j}^{e}=q_{j},~{}v_{j}^{e}=\nu_{q_{j}} (47)

in Algorithm 1 and

mje=qj,vje=νq\displaystyle m_{j}^{e}=q_{j},~{}v_{j}^{e}=\nu_{q} (48)

in Algorithm 2. Then (44) can be readily used to compute the extrinsic LLRs of the coded bits.

It is noted that, with the LLRs provided by the SISO decoder, we can compute the probability p(xj=αa)p(x_{j}=\alpha_{a}) for each xjx_{j}, which is no longer 1/|𝒜|1/|\mathcal{A}| in Algorithm 1 and Algorithm 2. Therefore, ξj,a\xi_{j,a} in Line 7 of Algorithm 1 is amended to

ξj,a=p(xj=αa)exp(νqj1|αaqj|2),\displaystyle\xi_{j,a}=p(x_{j}=\alpha_{a})\exp\Big{(}{-\nu_{q_{j}}^{-1}|\alpha_{a}-q_{j}|^{2}}\Big{)}, (49)

and Line 10 of Algorithm 2 is changed to

ξj,a=p(xj=αa)exp(νq1|αaqj|2).\displaystyle\xi_{j,a}=p(x_{j}=\alpha_{a})\exp\Big{(}{-\nu_{q}^{-1}|\alpha_{a}-q_{j}|^{2}}\Big{)}. (50)

In addition, we note that the iteration of the (U)AMP-based detector can be incorporated into the iteration between the SISO decoder and detector, leading to a single loop iteration (i.e., inner iterations are not required).

Next, we investigate how to predict the performance of the (U)AMP-based OTFS detector based on the state evolution (SE). As (U)AMP decouples the estimation of vector 𝒙\bm{x}, so that, in the ttth iteration, we have the following pseudo observation model

qjt=xj+ϖjt,q^{t}_{j}=x_{j}+\varpi^{t}_{j}, (51)

where j=1,,Jj=1,...,J with J=MNJ=MN, and {ϖjt}\{\varpi^{t}_{j}\} denote an additive white Gaussian noise (AWGN) with mean 0 and variance τt\tau^{t}. In UAMP, the variance of the AWGN τt\tau^{t} is given as

τt\displaystyle\tau^{t} =\displaystyle= J𝟏T(𝝀./(vxt𝝀+ϵ1𝟏))\displaystyle\frac{J}{\bm{1}^{T}\big{(}\bm{\lambda}./(v_{x}^{t}\bm{\lambda}+\epsilon^{-1}\bm{1})\big{)}} (52)

where vxtv_{x}^{t} is the (average) variance of {xj}\{x_{j}\} in the ttth iteration.

Based on the above, the coded OTFS system can be regarded as a pseudo coded system with a simple AWGN channel give in (51), where {xj}\{x_{j}\} are mapped from a coded bit sequence. In the pseudo coded system, {qjt}\{q^{t}_{j}\} are the received signal, and the noise variance is τt\tau^{t}, i.e., the SNR is 1/τt1/\tau^{t}, where we assume that the power of the signal is 1. Then the signal is demapped with (44) and the LLRs are input to the SISO decoder. Based on the output of the decoder, the variance vxt+1v_{x}^{t+1} can be obtained as shown in Line 10 of Algorithm 1 and Line 14 of Algorithm 2. Clearly, vxt+1v_{x}^{t+1} depends on the SNR or τt\tau^{t} of the pseudo AWGN channel, i.e.,

vxt+1=g(τt),v_{x}^{t+1}=g(\tau^{t}), (53)

where g()g(\cdot) is a function. It is noted that the function normally does not have an analytical form. However, it can be established (in the form of a table) through simulation as in [36], i.e., simulate the code (with the mapper used) in AWGN channels with different SNRs. At the same time the bit error rate (BER) can also be obtained. Hence, we can establish the ’function’

(BER,vx)=g(τ).(\mathrm{BER},v_{x})=g(\tau). (54)

It is noted that the ’function’ does not depend on the OTFS channel, and it is only related to the code, the symbol mapper used and the SNR of the AWGN channel.

Therefore the BER performance of the coded OTFS system with iterations can be predicated with the following simple iterative process with initialization t=0t=0 and vxt=1v_{x}^{t}=1.

Repeatτt=J𝟏T(𝝀./(vxt𝝀+ϵ1𝟏))(BERt+1,vxt+1)=g(τt)t=t+1!Untilterminated\displaystyle\begin{aligned} &\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\mathrm{Repeat}\\ &\tau^{t}=\frac{J}{\bm{1}^{T}\big{(}\bm{\lambda}./(v_{x}^{t}\bm{\lambda}+\epsilon^{-1}\bm{1})\big{)}}\\ &(\mathrm{BER}^{t+1},v^{t+1}_{x})=g(\tau^{t})\\ &t=t+1\\ &\!\!\!\!\!\!\!\!\!\!\!\!!\!\!\mathrm{Until~{}terminated}\end{aligned}

As we will see in Section VI that the BER performance of the coded OTFS system with the UAMP-based detector can be predicted fairly well. Here we note that this is an empirical finding as rigorous SE of AMP algorithms for a general system transform matrix is still an open problem. In contrast, for the system with AMP-based detector, there is a huge difference between the simulated performance and predicted performance. This is because the channel matrix deviates from the i.i.d. Gaussian matrix, and as the AMP SE is no longer valid.

VI Simulation Results

In this section, we evaluate the performance of the UAMP-based detectors and compare them with the AMP-based detectors and the state-of-the-art low complexity detectors in [2], [10] and [13], which are named MP, MRC and VB, respectively. We set M=256M=256 and N=32N=32, i.e., there are 3232 time slots and 256256 subcarriers in the TF domain. Both quadrature phase shift keying (QPSK) modulation and 16-quadrature amplitude modulation (QAM) are considered. The carrier frequency is 3 GHz, and the subcarrier spacing is 2 kHz. The speed of the mobile user is set to be v=135km/hv=135km/h, leading to a maximum Doppler frequency shift index kmax=6k_{max}=6. We assume that the maximum delay index is lmaxl_{max} = 14. The Doppler index of the iith path is uniformly drawn from the set [kmax,kmax][-k_{max},k_{max}] and the delay index belongs to [1,lmax][1,l_{max}] excluding the first path (ł1=0\l_{1}=0). We assume that the fractional Doppler κi\kappa_{i} is uniformly distributed within [1/2,1/2][-1/2,1/2], and the channel coefficients {hi}\{h_{i}\} are independently drawn from a complex Gaussian distribution with mean 0 and variance ηli\eta^{l_{i}}, where the normalized power delay profile ηi=exp(αli)/iexp(αli)\eta^{i}={\text{exp}(-\alpha l_{i})}/{\sum_{i}\text{exp}(-\alpha l_{i})} with α\alpha being 0 or 0.10.1 [13], [37]. The maximum number of iterations is set to be 15 for the iterative algorithms. We note that, all the detectors except the MRC detector require the precision (or variance) of the noise. The UAMP-based detectors perform noise precision estimation, while he other detectors (except the MRC detector) including the AMP-based detectors assume that the noise precision is perfectly known. We examine the performance of the detectors in a variety of scenarios including the bi-orthogonal waveform with integer or the fractional Doppler shifts, rectangular waveform with fractional Doppler shifts, and QPSK or 16-QAM employed for modulation. In addition, both uncoded and coded systems are considered.

Refer to caption
Figure 4: BER performance of various detectors with bi-orthogonal waveform and integer Doppler shifts (a)P=6P=6, (b)P=10P=10, (c) P=12P=12, and (d) P=14P=14.
Refer to caption
Figure 5: BER performance versus iteration number (bi-orthogonal waveform and integer Doppler shifts), where P=10P=10 and SNR = 15dB.

We start with the performance comparisons of various detectors in the case of the bi-orthogonal waveform. First we assume that no fractional Doppler shifts exist, i.e., κi=0\kappa_{i}=0 and S=PS=P, and the results are shown in Fig. 4. The BER performance of various detectors versus different values of PP is shown in Fig. 4, where α=0\alpha=0, and QPSK is used. By comparing the results in this figure we can see that, the MP-based detector performs well when P=6P=6, but with the increase of PP, its performance degrades. Note that the MP algorithm in [2] is an approximation to loopy belief propagation due to a Gaussian approximation used to reduce complexity. The increase of PP significantly affects the performance of the MP-based detector, as a denser channel matrix leads to the presence of short loops with a higher probability. The VB-based detector has the similar trend. The MRC detector performs similarly as the MP-based and VB-based detectors when P=10P=10 and delivers better performance than the MP-based and VB-based detectors when P=12P=12 and P=14P=14. The AMP and UAMP-based detectors perform well, and they enjoy the diversity gain and achieve better performance with the increase of PP. In all cases, the UAMP-based detector delivers the best performance. To illustrate the convergence behaviors, we plot the BER performance of various detectors versus the number of iterations in Fig. 5 with P=10P=10 and SNR =15=15dB. It can be seen that the UAMP-based detector and the MRC detector converge faster than other detectors.

Refer to caption
Figure 6: BER performance of detectors with bi-orthogonal waveform and fractional Doppler shifts (a) P=10P=10, and (b) P=14P=14.
Refer to caption
Figure 7: BER and FER performance of detectors with the rectangular waveform and fractional Doppler shifts (a) QPSK (BER), (b) 16-QAM (BER), (c) QPSK (FER), and (d) 16-QAM (FER).
Refer to caption
Figure 8: BER performance of the coded and uncoded systems with the (U)AMP-based detectors with rectangular waveform and fractional Doppler shifts (a) P=10P=10 (coded), (b) P=10P=10 (uncoded), (c) P=14P=14 (coded), (d) P=14P=14 (uncoded).

With the assumption of factional Doppler shifts, we compare the BER performance of the AMP-based detector, the UAMP-based detector and the MRC detector versus SNR in Fig. 6, where α=0\alpha=0 and QPSK is used. As the channel matrix is not i.i.d. (sub-) Gaussian, UAMP is more robust than AMP, which leads to a significantly better performance for the UAMP-based detector, as demonstrated in Fig. 6. When P=14P=14, the performance of the AMP-based detector gets worse, and there is a huge performance gap between the AMP and the UAMP-based detectors. The MRC detector delivers much better performance than the AMP-based detector, and the UAMP-based detector outperforms the MRC detector significantly at relatively high SNRs.

Next, we consider the rectangular waveform and compare the performance of the (U)AMP based detectors and the MRC detector versus SNR. We assume fractional Doppler shifts and the number of paths P=9P=9, where α=0.1\alpha=0.1 is used for the power delay profile [13, 37]. The BER and frame error rate (FER) performance are shown in Fig. 7. It can be seen that, in terms of BER, the MRC detector performs better than the AMP-based detector and the UAMP-based detector still delivers significantly better performance than other detectors, especially in the case of 16-QAM. In terms of FER, the UAMP-based detector and the MRC detector have similar performance in the case of QPSK, and the UAMP-based detector performs considerably better than the MRC detector in the case of 16-QAM. As the MRC detector has lower complexity than the UAMP-based detector, the UAMP-based detector and the MRC detector are two useful options to achieve a good trade-off between performance and complexity.

Refer to caption
Figure 9: SE and simulated performance of the turbo receiver with the (U)AMP-based detectors, where P=10P=10.

We then evaluate the performance of the detectors in a coded OTFS system, where the turbo receiver shown in Fig. 3 is employed. We use a rate-1/2 convolutional code with generator [5,7]8[5,7]_{8}, followed by a random interleaver and QPSK modulation. The length of the codeword is MNMN. The BCJR algorithm is used for the SISO decoder. Fig. 8 shows the BER performance of both coded and uncoded systems with the (U)AMP-based detectors for P=10P=10 and P=14P=14. We can find that the performance gaps between the AMP-based detector and the UAMP-based detector become larger in the coded system. It can also be seen that the turbo receiver can achieve much better performance (about 3.543.5-4dB at the BER of 10410^{-4}) thanks to the iteration between the SISO detector and the SISO decoder.

Refer to caption
Figure 10: SE and simulated performance of the turbo receiver with the (U)AMP-based detectors (a) SNR = 8dB, and (b) SNR = 9dB.

Last, we examine the effectiveness of the SE-based BER performance prediction in Section V. The coded OTFS system has the same settings as that in Fig. 8 (a). The SE performance and the simulated performance versus SNR is shown in Fig. 9, where we can see that the predicted performance matches well with the simulated performance for the UAMP-based detector. However, for the AMP-based detector, the predicated performance is no longer accurate, and the SE overestimates the performance severely. This is because the AMP SE is invalid when the channel matrix deviates from the i.i.d. Gaussian one. Fig. 10 shows the predicated performance and simulated performance of the (U)AMP-based receiver versus iteration number at SNR = 8dB and 9dB, where it can be seen that the predicted performance matches well with the simulated performance for the UAMP-based detector.

VII Conclusions

In this paper, to address the issues of OFTS detection when the number of channel paths is relatively large and the fractional Doppler shifts have to be considered, we have investigated the design of OTFS detectors based on (U)AMP. In particular, the UAMP-based detectors enjoy the structure of the channel matrix, allowing more efficient implementation while with enhanced performance due to the robustness of UAMP. The investigations have also been extended to joint detection and decoding in a coded OTFS system with a turbo receiver. It has been demonstrated that the UAMP-based detector can outperform the state-of-the-art detectors significantly, and the performance of the system with the UAMP-based detector can be predicted well with the state evolution.

References

  • [1] R. Hadani, S. Rakib, M. Tsatsanis, A. Monk, and R. Calderbank, “Orthogonal time frequency space modulation,” in 2017 IEEE Wireless Communications and Networking Conference (WCNC), 2017.
  • [2] P. Raviteja, P. K. T., H. Yi, and V. Emanuele, “Interference cancellation and iterative detection for orthogonal time frequency space modulation,” IEEE Transactions on Wireless Communications, vol. 17, no. 10, pp. 6501–6515, 2018.
  • [3] G. D. Surabhi, R. M. Augustine, and A. Chockalingam, “On the diversity of uncoded OTFS modulation in doubly-dispersive channels,” IEEE Transactions on Wireless Communications, vol. 18, no. 6, pp. 3049–3063, June 2019.
  • [4] R. Hadani and A. Monk, “OTFS: A new generation of modulation addressing the challenges of 5g,” ArXiv, vol. abs/1802.02623, 2018.
  • [5] S. Li, J. Yuan, W. Yuan, Z. Wei, and D. W. K. Ng, “Performance analysis of coded OTFS systems over high-mobility channels,” IEEE Trans. Wireless Commun., submitted.
  • [6] A. Farhang, A. RezazadehReyhani, L. E. Doyle, and B. Farhang-Boroujeny, “Low complexity modem structure for OFDM-based orthogonal time frequency space modulation,” IEEE Wireless Communications Letters, vol. 7, no. 3, pp. 344–347, June 2018.
  • [7] L. Li, H. Wei, Y. Huang, Y. Yao, W. Ling, G. Chen, P. Li, and Y. Cai, “A simple two-stage equalizer with simplified orthogonal time frequency space modulation over rapidly time-varying channels,” CoRR, vol. abs/1709.02505, 2017. [Online]. Available: http://arxiv.org/abs/1709.02505
  • [8] T. Zemen, M. Hofer, and D. Loeschenbrand, “Low-complexity equalization for orthogonal time and frequency signaling (OTFS),” 2017. [Online]. Available: http://arxiv.org/abs/1710.09916
  • [9] T. Thaj and E. Viterbo, “Low complexity iterative rake detector for orthogonal time frequency space modulation,” in 2020 IEEE Wireless Communications and Networking Conference (WCNC), 2020, pp. 1–6.
  • [10] T. Thaj and E. Viterbo, “Low complexity iterative rake decision feedback equalizer for zero-padded otfs systems,” IEEE Transactions on Vehicular Technology, vol. 69, no. 12, p. 15606–15622, Dec 2020. [Online]. Available: http://dx.doi.org/10.1109/TVT.2020.3044276
  • [11] P. Raviteja, E. Viterbo, and Y. Hong, “OTFS performance on static multipath channels,” IEEE Wireless Communications Letters, vol. 8, no. 3, pp. 745–748, June 2019.
  • [12] P. Raviteja, K. T. Phan, Q. Jin, Y. Hong, and E. Viterbo, “Low-complexity iterative detection for orthogonal time frequency space modulation,” ArXiv, vol. arxiv.org/abs/1709.09402, 2017.
  • [13] W. Yuan, Z. Wei, J. Yuan, and D. W. K. Ng, “A simple variational bayes detector for orthogonal time frequency space (OTFS) modulation,” IEEE Transactions on Vehicular Technology, vol. 69, no. 7, pp. 7976–7980, 2020.
  • [14] S. Tiwari, S. S. Das, and V. Rangamgari, “Low complexity LMMSE receiver for OTFS,” IEEE Communications Letters, vol. 23, no. 12, pp. 2205–2209, 2019.
  • [15] D. L. Donoho, A. Maleki, and A. Montanari, “Message passing algorithms for compressed sensing: I. motivation and construction,” in 2010 IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo), Jan 2010, pp. 1–5.
  • [16] ——, “Message passing algorithms for compressed sensing: II. analysis and validation,” in 2010 IEEE Information Theory Workshop on Information Theory (ITW 2010, Cairo), Jan 2010, pp. 1–5.
  • [17] F. Caltagirone, L. Zdeborova, and F. Krzakala, “On convergence of approximate message passing,” in 2014 IEEE International Symposium on Information Theory, June 2014, pp. 1812–1816.
  • [18] Q. Guo, D. Huang, S. Nordholm, J. Xi, and Y. Yu, “Iterative frequency domain equalization with generalized approximate message passing,” IEEE Signal Processing Letters, vol. 20, no. 6, pp. 559–562, June 2013.
  • [19] Q. Guo and J. Xi, “Approximate message passing with unitary transformation,” CoRR, vol. abs/1504.04799, 2015. [Online]. Available: http://arxiv.org/abs/1504.04799
  • [20] Z. Yuan, Q. Guo, and M. Luo, “Approximate message passing with unitary transformation for robust bilinear recovery,” IEEE Transactions on Signal Processing, vol. 69, pp. 617–630, 2021.
  • [21] M. Luo, Q. Guo, D. Huang, and J. Xi, “Sparse bayesian learning based on approximate message passing with unitary transformation,” in 2019 IEEE VTS Asia Pacific Wireless Communications Symposium (APWCS), 2019, pp. 1–5.
  • [22] H. Kang, J. Li, Q. Guo, and M. Martorella, “Pattern coupled sparse bayesian learning based on UTAMP for robust high resolution ISAR imaging,” IEEE Sensors Journal, pp. 1–1, 2020.
  • [23] Y. Mao, M. Luo, D. Gao, and Q. Guo, “Low complexity DOA estimation using AMP with unitary transformation and iterative refinement,” Digital Signal Processing, p. 102800, 2020. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1051200420301457
  • [24] A. Monk, R. Hadani, M. Tsatsanis, and S. Rakib, “OTFS - orthogonal time frequency space,” CoRR, vol. abs/1608.02993, 2016. [Online]. Available: http://arxiv.org/abs/1608.02993
  • [25] P. Raviteja, Y. Hong, E. Viterbo, and E. Biglieri, “Practical pulse-shaping waveforms for reduced-cyclic-prefix OTFS,” IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 957–961, 2019.
  • [26] S. Rangan, “Generalized approximate message passing for estimation with random linear mixing,” in Proc. IEEE Int. Symp. on Inform. Theory (ISIT 2011), Aug. 2011, pp. 2168–2172.
  • [27] M. E. Tipping, “Sparse bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, no. 3, pp. 211–244, 2001.
  • [28] E. Riegler, G. E. Kirkelund, C. N. Manchon, M. Badiu, and B. H. Fleury, “Merging belief propagation and the mean field approximation: A free energy approach,” IEEE Transactions on Information Theory, vol. 59, no. 1, pp. 588–602, 2013.
  • [29] A. RezazadehReyhani, A. Farhang, M. Ji, R. Chen, and B. Farhang-Boroujeny, “Analysis of discrete-time MIMO OFDM-based orthogonal time frequency space modulation,” CoRR, vol. abs/1710.07900, 2017. [Online]. Available: https://arxiv.org/abs/1710.07900
  • [30] A. Farhang, A. RezazadehReyhani, L. E. Doyle, and B. Farhang-Boroujeny, “Low complexity modem structure for OFDM-based orthogonal time frequency space modulation,” IEEE Wireless Communications Letters, vol. 7, no. 3, pp. 344–347, 2018.
  • [31] M. Guillaud and D. T. M. Slock, “Channel modeling and associated inter-carrier interference equalization for OFDM systems with high doppler spread,” in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP ’03)., vol. 4, 2003, pp. IV–237.
  • [32] Q. Guo and D. D. Huang, “A concise representation for the soft-in soft-out LMMSE detector,” IEEE Communications Letters, vol. 15, no. 5, pp. 566–568, 2011.
  • [33] M. Tuchler, A. C. Singer, and R. Koetter, “Minimum mean squared error equalization using a priori information,” IEEE Transactions on Signal Processing, vol. 50, no. 3, pp. 673–683, 2002.
  • [34] Q. Guo and L. Ping, “LMMSE turbo equalization based on factor graphs,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 2, pp. 311–319, 2008.
  • [35] B. Vucetic and J. Yuan, Turbo codes: principles and applications.   Springer Science & Business Media, 2012, vol. 559.
  • [36] X. Yuan, Q. Guo, X. Wang, and L. Ping, “Evolution analysis of low-cost iterative equalization in coded linear systems with cyclic prefixes,” IEEE Journal on Selected Areas in Communications, vol. 26, no. 2, pp. 301–310, 2008.
  • [37] Q. Guo, L. Ping, and D. Huang, “A low-complexity iterative channel estimation and detection technique for doubly selective channels,” IEEE Transactions on Wireless Communications, vol. 8, no. 8, pp. 4340–4349, 2009.