
Universal Modem Generation with Inherent Adaptability to Variant Underwater Acoustic Channels: a Data-Driven Perspective

Xiaoquan You, Hengyu Zhang, Xuehan Wang, Jintao Wang
Beijing National Research Center for Information Science and Technology (BNRist),
Dept. of Electronic Engineering, Tsinghua University, Beijing, China
{youxq21, zhanghen23, wang-xh21}@mails.tsinghua.edu.cn, wangjintao@tsinghua.edu.cn
Abstract

In underwater acoustic (UWA) communication, orthogonal frequency division multiplexing (OFDM) is commonly employed to mitigate the inter-symbol interference (ISI) caused by delay spread. However, path-specific Doppler effects in UWA channels could result in significant inter-carrier interference (ICI) in the OFDM system. To address this problem, we introduce a multi-resolution convolutional neural network (CNN) named UWAModNet in this paper, designed to optimize the modem structure, specifically modulation and demodulation matrices. Based on a trade-off between the minimum and the average equivalent sub-channel rate, we propose an optimization criterion suitable to evaluate the performance of our learned modem. Additionally, a two-stage training strategy is developed to achieve quasi-optimal results. Simulations indicate that the learned modem outperforms zero-padded OFDM (ZP-OFDM) in terms of equivalent sub-channel rate and bit error rate, even under more severe Doppler effects during testing compared to training.

Index Terms:
deep learning, modem generation, underwater acoustic channels, ZP-OFDM.

I Introduction

In recent years, the concept of the “Smart Ocean” has attracted considerable interest, highlighting a promising future for the Internet of Underwater Things (IoUT). IoUT can be applied in various marine applications such as marine resource exploration, defense against enemy attacks, and monitoring underwater pollution, where underwater communication plays a crucial role [1, 2].

In underwater communication, acoustic waves, due to their lower attenuation in water compared to electromagnetic waves, are capable of achieving long-distance communication and thus have broad development prospects. However, the reflection and refraction of acoustic waves between the surface and seafloor can cause a significant multipath delay spread. Furthermore, the path-specific Doppler effects, due to internal waves, platform, and sea-surface motion, also have a non-negligible impact on underwater acoustic (UWA) communication [3, 4].

As a result, orthogonal frequency division multiplexing (OFDM) is widely recognized as an effective solution to mitigate the inter-symbol interference (ISI) caused by multipath propagation [5]. However, when utilized in underwater acoustic channels, the orthogonality of OFDM subcarriers is compromised by the path-specific Doppler scale, which results in severe inter-carrier interference (ICI) among the subchannels. Additionally, to conserve transmission power allocated to the guard interval, zero-padded OFDM (ZP-OFDM) is preferred in underwater acoustic communication [6, 7].

To reduce the impact of ICI, previous research has been devoted to finding mathematical solutions to this problem. Resampling, a traditional method, relies on the assumption that all paths share the same Doppler scaling factor [6]. Other works focus on identifying new waveforms suitable for communication over UWA channels, such as the VBMC waveform [8], but lack mathematical derivation. Furthermore, deep learning-based methods have been proposed to optimize modem structures, specifically modulation and demodulation matrices, in high-mobility scenarios, among which ModNet significantly outperforms traditional OFDM techniques [9]. However, the signal model built in ModNet does not account for fractional delays, which limits its applicability.

To solve these problems, we aim to optimize the modem structure and build a transmission model for UWA communication. Based on a trade-off between the minimum and the average equivalent sub-channel rate, we propose an optimization criterion suitable for deep learning. Following [9, 10], we introduce a multi-resolution convolutional neural network (CNN) to enhance the performance of modem generation across UWA channels. Additionally, a two-stage training strategy is proposed to achieve quasi-optimal results. Finally, simulations on equivalent sub-channel rate and bit error rate show that the learned modem outperforms ZP-OFDM, even under more severe Doppler effects during testing compared to training.

Notations: Matrices are denoted by bold uppercase letters, vectors by bold lowercase letters, and scalars by normal font. The notation $(\cdot)^{H}$ indicates the Hermitian transpose, while $\mathbb{E}[\cdot]$ denotes the mathematical expectation.

II System Model

In this section, we present our signal model of the general modem for UWA channels, followed by the modem structure of ZP-OFDM. After that, we introduce the proposed evaluation criterion and the resulting optimization problem.

II-A Signal model and UWA channel

We denote the data vector as $\mathbf{s}\in\mathbb{C}^{N\times 1}$ with zero mean and an autocorrelation matrix $\mathbb{E}[\mathbf{s}\mathbf{s}^{H}]=\sigma_{s}^{2}\mathbf{I}_{N}$, where $N$ is the number of subcarriers. The modulated signal $\mathbf{x}\in\mathbb{C}^{M\times 1}$ can be written as

$$\mathbf{x}=\mathbf{\Phi}\mathbf{s} \tag{1}$$

where $\mathbf{\Phi}\in\mathbb{C}^{M\times N}$ ($N\leq M$) is the modulation matrix.

We construct the baseband signal $x(t)$, which is band-limited to $-B/2\leq f\leq B/2$, by employing a Nyquist interpolation filter:

$$x(t)=\sum_{m=-\infty}^{\infty}x[m]\,\mathrm{sinc}(B(t-m/F_{s})),\quad t\in\mathbb{R} \tag{2}$$

where $x(m/F_{s})=x[m]$, the sampling rate is assumed to satisfy $F_{s}\geq B$, and

$$\operatorname{sinc}(x)\stackrel{\text{def}}{=}\begin{cases}\dfrac{\sin(\pi x)}{\pi x}, & \text{if } x\neq 0,\\ 1, & \text{if } x=0.\end{cases} \tag{3}$$

In practice, we typically use a waveform with a finite duration $T$, which can only be approximately band-limited. Define $M=\lfloor F_{s}T\rfloor$ as the number of samples taken from $x(t)$ within the time interval $0\leq t\leq T$. We can reformulate $x(t)$ as follows:

$$\tilde{x}(t)=\sum_{m=0}^{M-1}x[m]\,\mathrm{sinc}(Bt-mB/F_{s}),\quad 0\leq t\leq T. \tag{4}$$

Following [8], we assume that $\varepsilon=\|x(t)-\tilde{x}(t)\|^{2}\rightarrow 0$ and thus approximate $x(t)$ by $\tilde{x}(t)$.

The transmitted passband signal can be given by

$$\begin{aligned}\hat{x}(t)&=x(t)\mathrm{e}^{\mathrm{j}2\pi f_{c}t}\\ &=\sum_{m=0}^{M-1}x[m]\,\mathrm{sinc}(Bt-mB/F_{s})\,\mathrm{e}^{\mathrm{j}2\pi f_{c}t}.\end{aligned} \tag{5}$$

Following [6], the impulse response of the UWA channel can be determined by

$$c(\tau,t)=\sum_{p=1}^{P}A_{p}(t)\delta(\tau-\tau_{p}(t)) \tag{6}$$

where $P$ denotes the number of propagation paths, $A_{p}(t)$ is the path amplitude, and $\tau_{p}(t)$ is the time-varying path delay. Since the duration of a transmitted signal is less than the channel stationary time, $A_{p}(t)$ can be considered as a constant value $A_{p}$, and $\tau_{p}(t)$ can be approximated by

$$\tau_{p}(t)\approx\tau_{p}-a_{p}t \tag{7}$$

where $a_{p}$ denotes the path-specific Doppler scaling factor and $\tau_{p}$ denotes the constant path delay. As a result, $c(\tau,t)$ can be derived as

$$c(\tau,t)=\sum_{p=1}^{P}A_{p}\delta(\tau+a_{p}t-\tau_{p}). \tag{8}$$

The received passband signal is

$$\begin{aligned}\hat{r}(t)&=\hat{x}(t)*c(\tau,t)\\ &=\sum_{p=1}^{P}A_{p}x((a_{p}+1)t-\tau_{p})\,\mathrm{e}^{\mathrm{j}2\pi f_{c}((a_{p}+1)t-\tau_{p})}\end{aligned} \tag{9}$$

where the impact of noise is neglected for simplicity in illustration.

The baseband signal $r(t)$ can then be derived as

$$\begin{aligned}r(t)&=\hat{r}(t)\mathrm{e}^{-\mathrm{j}2\pi f_{c}t}\\ &=\sum_{p=1}^{P}A_{p}x((a_{p}+1)t-\tau_{p})\,\mathrm{e}^{\mathrm{j}2\pi f_{c}(a_{p}t-\tau_{p})}\\ &=\sum_{p=1}^{P}\Biggl\{A_{p}\left[\sum_{m=0}^{M-1}x[m]\,\mathrm{sinc}(B((a_{p}+1)t-\tau_{p})-mB/F_{s})\right]\times\mathrm{e}^{\mathrm{j}2\pi f_{c}(a_{p}t-\tau_{p})}\Biggr\}\end{aligned} \tag{10}$$

with a duration of $0\leq t\leq T+T_{g}$ due to the delay spread in the channel. The guard interval $T_{g}$ is given by $T_{g}=[\tau_{p}/(a_{p}+1)]_{\max}$ to prevent interference among different symbols.

Time domain samples $r[m^{\prime}]$ can then be obtained by sampling $r(t)$ at a rate $F_{s}$, where we have

$$\begin{aligned}r[m^{\prime}]=\sum_{m=0}^{M-1}x[m]\Biggl\{\sum_{p=1}^{P}\Biggl[&A_{p}\mathrm{e}^{-\mathrm{j}2\pi f_{c}\tau_{p}}\times\mathrm{e}^{\mathrm{j}2\pi f_{c}a_{p}m^{\prime}/F_{s}}\\ &\times\mathrm{sinc}(B((a_{p}+1)m^{\prime}/F_{s}-\tau_{p})-mB/F_{s})\Biggr]\Biggr\}\end{aligned} \tag{11}$$

for $m^{\prime}=0,1,\cdots,M^{\prime}-1$, where $M^{\prime}$ denotes the number of samples of $r(t)$, which is set as $\lfloor F_{s}T+F_{s}T_{g}\rfloor$ to capture all useful signal components and maximize information retention.

The received vector $\mathbf{r}\in\mathbb{C}^{M^{\prime}\times 1}$ can be written in matrix-vector notation as follows, including the additive white Gaussian noise $\mathbf{w}\sim\mathcal{CN}(0,\sigma_{n}^{2}\mathbf{I}_{M^{\prime}})$:

$$\mathbf{r}=\mathbf{H}\mathbf{x}+\mathbf{w}. \tag{12}$$

$\mathbf{H}\in\mathbb{C}^{M^{\prime}\times M}$ denotes the channel matrix, whose $(m^{\prime},m)$th entry is

$$\begin{aligned}{}[\mathbf{H}]_{m^{\prime},m}=\sum_{p=1}^{P}\Biggl[&A_{p}\mathrm{e}^{-\mathrm{j}2\pi f_{c}\tau_{p}}\times\mathrm{e}^{\mathrm{j}2\pi f_{c}a_{p}m^{\prime}/F_{s}}\\ &\times\mathrm{sinc}(B((a_{p}+1)m^{\prime}/F_{s}-\tau_{p})-mB/F_{s})\Biggr].\end{aligned} \tag{13}$$

To simplify the expression, the channel matrix $\mathbf{H}$ can be expressed in another form as

$$\mathbf{H}=\sum_{p=1}^{P}\xi_{p}\mathbf{\Lambda}_{p}\mathbf{\Gamma}_{p}. \tag{14}$$

The complex path gain for the $p$th path, $\xi_{p}\in\mathbb{C}$, is

$$\xi_{p}=A_{p}\mathrm{e}^{-\mathrm{j}2\pi f_{c}\tau_{p}}. \tag{15}$$

$\mathbf{\Lambda}_{p}\in\mathbb{C}^{M^{\prime}\times M^{\prime}}$ is a diagonal matrix whose $(m^{\prime},m^{\prime})$th entry is

$$[\mathbf{\Lambda}_{p}]_{m^{\prime},m^{\prime}}=\mathrm{e}^{\mathrm{j}2\pi f_{c}a_{p}m^{\prime}/F_{s}}. \tag{16}$$

The matrix $\mathbf{\Gamma}_{p}\in\mathbb{C}^{M^{\prime}\times M}$ has the $(m^{\prime},m)$th entry

$$[\mathbf{\Gamma}_{p}]_{m^{\prime},m}=\mathrm{sinc}(B\gamma_{m^{\prime}}^{(p)}-mB/F_{s}). \tag{17}$$

Here we adopt the notation $\gamma_{m^{\prime}}^{(p)}=(a_{p}+1)m^{\prime}/F_{s}-\tau_{p}$, noting that when $\gamma_{m^{\prime}}^{(p)}<0$ or $\gamma_{m^{\prime}}^{(p)}>T$, we have

$$[\mathbf{\Gamma}_{p}]_{m^{\prime},m}=0. \tag{18}$$
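
As an illustration of equations (14) to (18), the following NumPy sketch assembles $\mathbf{H}$ path by path. It is not the authors' implementation: the function name `uwa_channel_matrix` and its argument layout are assumptions made for this example, and `M`, `M_prime`, and `T` are passed in explicitly rather than derived inside the function.

```python
import numpy as np

def uwa_channel_matrix(A, tau, a, fc, B, Fs, T, M, M_prime):
    """Assemble H = sum_p xi_p * Lambda_p @ Gamma_p, following eqs. (14)-(18).

    A, tau, a: length-P sequences of path amplitudes, delays (s) and
    Doppler scaling factors. Hypothetical helper for illustration only.
    """
    m_rx = np.arange(M_prime)[:, None]   # receive sample index m' (column)
    m_tx = np.arange(M)[None, :]         # transmit sample index m (row)
    H = np.zeros((M_prime, M), dtype=complex)
    for A_p, tau_p, a_p in zip(A, tau, a):
        xi = A_p * np.exp(-2j * np.pi * fc * tau_p)          # eq. (15)
        lam = np.exp(2j * np.pi * fc * a_p * m_rx / Fs)      # diagonal of Lambda_p, eq. (16)
        gamma = (1.0 + a_p) * m_rx / Fs - tau_p              # gamma_{m'}^{(p)}
        # np.sinc is the normalized sinc sin(pi x)/(pi x), matching eq. (3).
        Gamma = np.sinc(B * gamma - m_tx * B / Fs)           # eq. (17)
        Gamma = np.where((gamma < 0) | (gamma > T), 0.0, Gamma)  # eq. (18)
        # Broadcasting lam over columns is equivalent to Lambda_p @ Gamma_p.
        H += xi * lam * Gamma
    return H
```

With a single path, unit amplitude, zero delay, zero Doppler scale, and $B=F_{s}$, the top $M\times M$ block of the result reduces to the identity, which is a convenient sanity check.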

The signal after demodulation $\mathbf{y}\in\mathbb{C}^{N\times 1}$ at the receiver can be expressed as

$$\mathbf{y}=\mathbf{\Psi}^{H}\mathbf{r}=\mathbf{\Psi}^{H}\mathbf{H}\mathbf{\Phi}\mathbf{s}+\mathbf{\Psi}^{H}\mathbf{w}=\mathbf{H}_{e}\mathbf{s}+\mathbf{\Psi}^{H}\mathbf{w} \tag{19}$$

where $\mathbf{H}_{e}=\mathbf{\Psi}^{H}\mathbf{H}\mathbf{\Phi}$ denotes the equivalent channel and $\mathbf{\Psi}^{H}\in\mathbb{C}^{N\times M^{\prime}}$ represents the demodulation matrix, which maps the $M^{\prime}$ received samples back to the original dimension $N$.

II-B ZP-OFDM modem structure

Following [6, 11], in the ZP-OFDM system, due to the presence of null subcarriers, the modulation matrix $\mathbf{\Phi}_{\mathrm{OFDM}}\in\mathbb{C}^{M\times N}$ typically has fewer columns than rows and can be split into two parts:

$$\mathbf{\Phi}_{\mathrm{OFDM}}=\mathbf{F}_{M}^{H}\mathbf{X}. \tag{20}$$

$\mathbf{F}_{M}$ denotes the discrete Fourier transform (DFT) matrix with the $(i,j)$th entry

$$[\mathbf{F}_{M}]_{i,j}=\frac{1}{\sqrt{M}}\mathrm{e}^{-\mathrm{j}2\pi(i-1)(j-1)/M}. \tag{21}$$

$\mathbf{X}\in\mathbb{C}^{M\times N}$ is the selection matrix that extracts $N$ columns from $\mathbf{F}_{M}^{H}$, ensuring the selected subcarriers are as evenly distributed as possible within the passband.

The demodulation matrix $\mathbf{\Psi}_{\mathrm{OFDM}}^{H}\in\mathbb{C}^{N\times M^{\prime}}$ can be written as

$$\mathbf{\Psi}_{\mathrm{OFDM}}^{H}=\mathbf{X}^{H}\mathbf{F}_{M}\mathbf{R} \tag{22}$$

where $\mathbf{R}\in\mathbb{C}^{M\times M^{\prime}}$ prepends the last $L=M^{\prime}-M$ columns of $\mathbf{X}^{H}\mathbf{F}_{M}$ to its front:

$$\mathbf{R}=\begin{bmatrix}\mathbf{0}_{(M-L)\times L}&\mathbf{I}_{M-L}&\mathbf{0}_{(M-L)\times L}\\ \mathbf{I}_{L}&\mathbf{0}_{L\times(M-L)}&\mathbf{I}_{L}\end{bmatrix}_{M\times M^{\prime}}. \tag{23}$$
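
The construction in equations (20) to (23) can be sketched in NumPy as follows. This is an illustrative sketch only: the function name `zp_ofdm_modem` is hypothetical, and the rule used to pick the $N$ active subcarriers (rounded, evenly spaced DFT indices) is an assumption, since the text only requires them to be as evenly distributed as possible.

```python
import numpy as np

def zp_ofdm_modem(M, M_prime, N):
    """Return (Phi_OFDM, Psi_OFDM^H) following eqs. (20)-(23). Illustrative helper."""
    L = M_prime - M
    if not 0 <= L <= M:
        raise ValueError("the block layout of eq. (23) needs 0 <= M' - M <= M")
    idx = np.arange(M)
    # Unitary DFT matrix, eq. (21) (zero-based indices here).
    F = np.exp(-2j * np.pi * np.outer(idx, idx) / M) / np.sqrt(M)
    # ASSUMPTION: active subcarriers are evenly spaced DFT bins.
    active = np.round(np.linspace(0, M - 1, N)).astype(int)
    X = np.zeros((M, N))
    X[active, np.arange(N)] = 1.0
    Phi = F.conj().T @ X                                   # eq. (20)
    # R from eq. (23), filled block by block.
    R = np.zeros((M, M_prime))
    R[np.arange(M - L), L + np.arange(M - L)] = 1.0        # upper middle I_{M-L}
    R[M - L + np.arange(L), np.arange(L)] = 1.0            # lower left I_L
    R[M - L + np.arange(L), M + np.arange(L)] = 1.0        # lower right I_L
    Psi_H = X.T @ F @ R                                    # eq. (22); X is real, so X^H = X^T
    return Phi, Psi_H
```

Under these assumptions the squared Frobenius norms of the two matrices come out as $N$ and $NM^{\prime}/M$, and an ideal channel (identity on the first $M$ received samples) yields a diagonal equivalent channel with unit-modulus entries.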

II-C Proposed criterion and optimization problem

We aim to identify the modem structure (i.e., the modulation matrix $\mathbf{\Phi}$ and the demodulation matrix $\mathbf{\Psi}^{H}$) that optimizes system performance across various channel matrices $\mathbf{H}$. Therefore, we need to establish an appropriate evaluation criterion to model the optimization problem, where the equivalent sub-channel rate proves to be useful, following [9].

The signal after demodulation $\mathbf{y}$ can be segmented into $N$ sub-channels, each subject to interference and noise. The output of the $n$th sub-channel, denoted by $\mathbf{y}[n]$, can be expressed as

$$\begin{aligned}\mathbf{y}[n]&=[\mathbf{H}_{e}]_{n,n}\mathbf{s}[n]+\sum_{k=0,k\neq n}^{N-1}[\mathbf{H}_{e}]_{n,k}\mathbf{s}[k]\\ &\quad+\sum_{m^{\prime}=0}^{M^{\prime}-1}[\mathbf{\Psi}^{H}]_{n,m^{\prime}}\mathbf{w}[m^{\prime}]\end{aligned} \tag{24}$$

where the last two terms of the sum represent interference and noise, respectively.

Since we find the worst sub-channel has a more significant impact on the transmission, the criterion function is defined as

$$f(\mathbf{H}_{e})=\sum_{n=0}^{N-1}r_{n}(\mathbf{H}_{e})+KN\cdot\min_{n}r_{n}(\mathbf{H}_{e}) \tag{25}$$

where $K\geq 1$ denotes the amplification factor of the worst sub-channel rate, and

$$r_{n}(\mathbf{H}_{e})=\log_{2}\left(1+\frac{\left|[\mathbf{H}_{e}]_{n,n}\right|^{2}}{\sum_{k=0,k\neq n}^{N-1}\left|[\mathbf{H}_{e}]_{n,k}\right|^{2}+\frac{\sigma_{n}^{2}}{\sigma_{s}^{2}}\sum_{m^{\prime}=0}^{M^{\prime}-1}\left|[\mathbf{\Psi}^{H}]_{n,m^{\prime}}\right|^{2}}\right) \tag{26}$$

denotes the $n$th equivalent sub-channel rate, which depends on the signal-to-noise ratio (SNR) $\sigma_{s}^{2}/\sigma_{n}^{2}$. Hence, the modem design can be formulated as the optimization problem

$$\max_{\mathbf{\Psi},\mathbf{\Phi}}\ \mathbb{E}_{\mathbf{H}}\left[f\left(\mathbf{H}_{e}\right)\right],\quad\mathbf{\Phi}\in\mathbb{C}^{M\times N},\ \mathbf{\Psi}^{H}\in\mathbb{C}^{N\times M^{\prime}} \tag{27}$$

where $\mathbb{E}_{\mathbf{H}}[\cdot]$ denotes the expectation over the distribution of $\mathbf{H}$. We introduce deep learning-based methods to approximate the quasi-optimal solution.
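
To make the criterion concrete, the sketch below evaluates equations (25) and (26) for a given equivalent channel. The function names `subchannel_rates` and `criterion` are chosen for this example and are not taken from the paper; the SNR is passed in dB and interpreted as $\sigma_{s}^{2}/\sigma_{n}^{2}$.

```python
import numpy as np

def subchannel_rates(He, Psi_H, snr_db):
    """Equivalent sub-channel rates r_n of eq. (26), one per row of He."""
    snr = 10.0 ** (snr_db / 10.0)                  # sigma_s^2 / sigma_n^2
    power = np.abs(He) ** 2
    signal = np.diag(power)                        # |[He]_{n,n}|^2
    interference = power.sum(axis=1) - signal      # sum over k != n
    noise = (np.abs(Psi_H) ** 2).sum(axis=1) / snr # noise enhancement by Psi^H
    return np.log2(1.0 + signal / (interference + noise))

def criterion(He, Psi_H, snr_db, K=10.0):
    """Criterion f(He) of eq. (25): sum rate plus K*N times the worst rate."""
    r = subchannel_rates(He, Psi_H, snr_db)
    return r.sum() + K * r.size * r.min()
```

For example, with $\mathbf{H}_{e}=\mathbf{I}_{4}$, unit-norm rows in $\mathbf{\Psi}^{H}$, and an SNR of 0 dB, every $r_{n}$ equals 1 bit, so with $K=10$ the criterion evaluates to $4+10\cdot 4\cdot 1=44$.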

III Design of UWAModNet and Training Strategy

In this section, a network named UWAModNet is introduced to solve the optimization problem. Following [9], we employ a two-stage training strategy to standardize the modem structure, ensuring compatibility across various channels.

III-A UWAModNet structure

Figure 1: The structure of the proposed UWAModNet. For simplicity, the activation functions and reshape blocks are omitted from the diagram.

As demonstrated in Fig. 1, we propose a multi-resolution network since the energy distribution of $\mathbf{H}$ exhibits significant localization characteristics. In regions where the energy distribution is more concentrated, smaller convolutional kernels are better able to extract finer features, while in regions where the energy distribution is not concentrated, larger convolutional kernels are adequate. The UWA channel matrix $\mathbf{H}$ is processed as an input image with dimensions $2\times M^{\prime}\times M$. This image comprises two channels, representing the real and imaginary parts of $\mathbf{H}$. The input image is then processed through two parallel pathways. The first pathway consists of three sequential $7\times 7$ convolutional layers that generate a high-resolution view. In contrast, the second pathway includes three consecutive $3\times 3$ convolutional layers, resulting in a lower resolution. Each convolution is followed by batch normalization, and dense connections are employed among the convolutional layers, alleviating the over-fitting problem. The outputs from both pathways are then concatenated and integrated using a $1\times 1$ convolutional layer. Subsequently, three fully connected (FC) layers are incorporated to adjust the output size, which is finally divided into $\mathbf{\Phi}$ and $\mathbf{\Psi}^{H}$. The energies of $\mathbf{\Phi}$ and $\mathbf{\Psi}^{H}$ are normalized to $N$ and $NM^{\prime}/M$, respectively, aligning with the ZP-OFDM standards.

It is worth noting that each layer, except for the final fully connected (FC) layer, is followed by a leaky ReLU activation layer to introduce non-linearity. Leaky ReLU is defined as

$$\mathrm{LeakyReLU}(x)=\begin{cases}x, & x\geq 0\\ \beta x, & x<0\end{cases} \tag{28}$$

where the negative slope, denoted as $\beta$, is set to a default value of 0.3.

III-B Two-stage training strategy

Within the specified range of channel parameters, our goal is to obtain a definite modulation matrix $\mathbf{\Phi}$ and demodulation matrix $\mathbf{\Psi}^{H}$. However, when we simultaneously aim to optimize system performance and ensure the convergence of system outputs, the training outcomes are not entirely satisfactory. Thus, we adopt a two-stage training strategy: in Stage I, we focus on enhancing system performance, and in Stage II, we address both system performance and the convergence of system outputs, as detailed in Algorithm 1, where $E_{1}$ and $E_{2}$ denote the numbers of training epochs of Stage I and Stage II, respectively.

Algorithm 1: Training Strategy of the Proposed UWAModNet
Input: the training dataset $\{\mathbf{H},\mathbf{H}_{e,\mathrm{OFDM}}\}$ and the validation dataset $\{\mathbf{H},\mathbf{H}_{e,\mathrm{OFDM}}\}$
Output: the learned modulation and demodulation matrices $\{\mathbf{\Phi},\mathbf{\Psi}^{H}\}$
% Stage I
initialize model parameters $\boldsymbol{\theta}$ randomly
for $\mathrm{epoch}=1,2,\cdots,E_{1}$ do
    sample a batch of $\{\mathbf{H},\mathbf{H}_{e,\mathrm{OFDM}}\}$ from the training dataset
    compute the output $\{\mathbf{\Phi},\mathbf{\Psi}^{H}\}$ of UWAModNet
    compute $\mathrm{loss}_{1}$ according to equation (25), (26) and (29)
    validate the performance of the learned modem and update $\boldsymbol{\theta}$
end for
% Stage II
for $\mathrm{epoch}=1,2,\cdots,E_{2}$ do
    sample a batch of $\{\mathbf{H}_{1},\mathbf{H}_{1e,\mathrm{OFDM}}\}$ and another batch of $\{\mathbf{H}_{2},\mathbf{H}_{2e,\mathrm{OFDM}}\}$ from the training dataset
    compute the corresponding output $\{\mathbf{\Phi}_{1},\mathbf{\Psi}_{1}^{H}\}$ and $\{\mathbf{\Phi}_{2},\mathbf{\Psi}_{2}^{H}\}$ of UWAModNet respectively
    compute $\mathrm{loss}_{2}$ according to equation (25), (26) and (30)
    validate the performance of the learned modem and update $\boldsymbol{\theta}$
end for
% Generate the final modem
compute the outputs of UWAModNet with optimized model parameters on the validation dataset
aggregate the multiple outputs using the average method to obtain the final modem $\{\mathbf{\Phi},\mathbf{\Psi}^{H}\}$

III-B1 Stage I: Optimization stage

Channel matrices are selected from the training dataset to update the weights of UWAModNet. Based on equation (25), we define the loss function for this stage as

$$\mathrm{loss}_{1}=f(\mathbf{H}_{e,\mathrm{OFDM}})-f(\mathbf{H}_{e}) \tag{29}$$

where $\mathbf{H}_{e,\mathrm{OFDM}}=\mathbf{\Psi}^{H}_{\mathrm{OFDM}}\mathbf{H}\mathbf{\Phi}_{\mathrm{OFDM}}$ denotes the equivalent channel matrix with ZP-OFDM modulation and demodulation matrices. A value of $\mathrm{loss}_{1}$ below zero indicates that the learned modem outperforms ZP-OFDM under the criterion in (25).

III-B2 Stage II: Convergence stage

The weights derived from Stage I initialize UWAModNet for Stage II. Two different channel matrices $\mathbf{H}_{1},\mathbf{H}_{2}$ selected from the training dataset are fed into our network. The weights are adjusted according to the output matrices $\{\mathbf{\Phi}_{1},\mathbf{\Psi}_{1}^{H}\}$ and $\{\mathbf{\Phi}_{2},\mathbf{\Psi}_{2}^{H}\}$ to ensure convergence. The loss function for this stage is

$$\begin{aligned}\mathrm{loss}_{2}={}&\alpha\cdot[f(\mathbf{H}_{1e,\mathrm{OFDM}})-f(\mathbf{H}_{1e})+f(\mathbf{H}_{2e,\mathrm{OFDM}})-f(\mathbf{H}_{2e})]\\ &+(1-\alpha)\cdot[g(\mathbf{\Phi}_{1},\mathbf{\Phi}_{2})+g(\mathbf{\Psi}^{H}_{1},\mathbf{\Psi}^{H}_{2})]\end{aligned} \tag{30}$$

where $g(\mathbf{X}_{1},\mathbf{X}_{2})=\|\mathbf{X}_{1}-\mathbf{X}_{2}\|_{F}$ is the Frobenius norm of the difference and the balance parameter $\alpha$ is set to 0.01 by default. In equation (30), the first term assesses system performance in the same way as $\mathrm{loss}_{1}$, and the second term quantifies the variation among the modulation and demodulation matrices generated from different channel matrices.
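
The two losses in equations (29) and (30) can be sketched numerically as below. This is a plain NumPy illustration with hypothetical function names; the criterion values $f(\cdot)$ are taken as precomputed scalars, and an actual training loop would express the same arithmetic in an automatic-differentiation framework, which the paper does not name.

```python
import numpy as np

def stage_one_loss(f_ofdm, f_learned):
    """loss_1 of eq. (29): negative when the learned modem beats ZP-OFDM."""
    return f_ofdm - f_learned

def frobenius_gap(X1, X2):
    """g(X1, X2) = ||X1 - X2||_F used in eq. (30)."""
    return np.linalg.norm(np.asarray(X1) - np.asarray(X2), ord="fro")

def stage_two_loss(f_ofdm_1, f_learned_1, f_ofdm_2, f_learned_2,
                   Phi_1, Phi_2, Psi_H_1, Psi_H_2, alpha=0.01):
    """loss_2 of eq. (30): weighted performance term plus consistency term."""
    performance = (stage_one_loss(f_ofdm_1, f_learned_1)
                   + stage_one_loss(f_ofdm_2, f_learned_2))
    consistency = frobenius_gap(Phi_1, Phi_2) + frobenius_gap(Psi_H_1, Psi_H_2)
    return alpha * performance + (1.0 - alpha) * consistency
```

With $\alpha=0.01$ the consistency term dominates, which matches the stated purpose of Stage II: pulling the outputs for different channels toward a single modem while keeping a small incentive to retain the performance gained in Stage I.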

After Stage II, the outputs for various channel matrices largely converge within the specified range of channel parameters. The channel matrices from the validation set are then fed into the well-trained UWAModNet, and the resulting matrices are averaged and normalized to produce the final modulation and demodulation matrices, namely the learned modem.
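
The final averaging and normalization step might look like the following sketch. The function name `finalize_modem` is hypothetical, and "energy" is assumed here to mean the squared Frobenius norm, with targets $N$ for $\mathbf{\Phi}$ and $NM^{\prime}/M$ for $\mathbf{\Psi}^{H}$ as stated in Section III-A.

```python
import numpy as np

def finalize_modem(Phi_batch, Psi_H_batch):
    """Average per-channel network outputs and rescale their energies.

    Phi_batch: array of shape (num_channels, M, N).
    Psi_H_batch: array of shape (num_channels, N, M').
    """
    Phi = np.mean(Phi_batch, axis=0)
    Psi_H = np.mean(Psi_H_batch, axis=0)
    M, N = Phi.shape
    M_prime = Psi_H.shape[1]
    # ASSUMPTION: energy is the squared Frobenius norm of each matrix.
    Phi = Phi * np.sqrt(N / np.sum(np.abs(Phi) ** 2))
    Psi_H = Psi_H * np.sqrt(N * M_prime / M / np.sum(np.abs(Psi_H) ** 2))
    return Phi, Psi_H
```

Averaging is only meaningful once Stage II has driven the per-channel outputs close to each other; otherwise the mean of dissimilar matrices need not be a good modem for any channel.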

IV Simulation Results and Analysis

IV-A Experiment settings

In this section, we describe the UWA channel parameters employed to create the training and validation datasets, some of which are displayed in Table I, partially following [8].

TABLE I: Channel Parameters

Parameter                                            Value
Carrier frequency ($f_{c}$)                          15 kHz
Bandwidth ($B$)                                      10 kHz
Sampling rate ($F_{s}$)                              10 kHz
Number of subcarriers ($N$)                          70
Symbol duration ($T$)                                12.8 ms
Guard interval ($T_{g}$)                             10.0 ms
Maximum path delay ($\tau_{\max}$)                   10.0 ms
Maximum Doppler scaling factor ($a_{\max}$)          0.001
Number of paths ($P$)                                20
Constellation mapping                                QPSK

Algorithm 2: $\{\mathbf{H},\mathbf{H}_{e,\mathrm{OFDM}}\}$ Pair Generation Process
Input: $f_{c}$, $B$, $F_{s}$, $N$, $T$, $T_{g}$, $\tau_{\max}$, $a_{\max}$ and $P$
Output: a pair of $\{\mathbf{H},\mathbf{H}_{e,\mathrm{OFDM}}\}$
initialize $\{A_{p}\}$, $\{\tau_{p}\}$, $\{a_{p}\}$, for $p=1,2,\cdots,P$ randomly according to their respective distributions
$M=\lfloor F_{s}T\rfloor$, $M^{\prime}=\lfloor F_{s}T+F_{s}T_{g}\rfloor$
initialize $\mathbf{H}=\mathbf{0}_{M^{\prime}\times M}$
for $p=1,2,\cdots,P$ do
    compute $\xi_{p}$ according to equation (15)
    compute $\mathbf{\Lambda}_{p}$ according to equation (16)
    compute $\mathbf{\Gamma}_{p}$ according to equation (17) and (18)
    $\mathbf{H}\leftarrow\mathbf{H}+\xi_{p}\mathbf{\Lambda}_{p}\mathbf{\Gamma}_{p}$
end for
compute $\mathbf{F}_{M}$ according to equation (21)
compute the extraction matrix $\mathbf{X}$, ensuring the selected $N$ subcarriers are as evenly distributed as possible within the passband
compute $\mathbf{\Phi}_{\mathrm{OFDM}}$ according to equation (20)
compute $\mathbf{\Psi}_{\mathrm{OFDM}}^{H}$ according to equation (22)
compute the equivalent channel matrix for ZP-OFDM system $\mathbf{H}_{e,\mathrm{OFDM}}=\mathbf{\Psi}^{H}_{\mathrm{OFDM}}\mathbf{H}\mathbf{\Phi}_{\mathrm{OFDM}}$
separate the real and imaginary parts from $\mathbf{H}$ and recombine them into a $2\times M^{\prime}\times M$ tensor
separate the real and imaginary parts from $\mathbf{H}_{e,\mathrm{OFDM}}$ and recombine them into a $2\times N\times N$ tensor

Based on the parameters in Table I, we deduce that $M=\lfloor F_{s}T\rfloor=128$, $M^{\prime}=\lfloor F_{s}T+F_{s}T_{g}\rfloor=228$, and thus the number of null subcarriers is $M-N=58$. In addition, following [8], the path amplitude $A_{p}$ is distributed as $A_{p}\stackrel{i.i.d.}{\sim}\mathcal{CN}(0,1)$, the path delay $\tau_{p}$ follows a uniform distribution $\tau_{p}\stackrel{i.i.d.}{\sim}\mathcal{U}(0,\tau_{\max})$, and the path Doppler scaling factor $a_{p}$ conforms to a uniform distribution $a_{p}\stackrel{i.i.d.}{\sim}\mathcal{U}(1/(1+a_{\max})-1,a_{\max})$ for $p=1,2,\cdots,P$.
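
The dimension arithmetic and the path-parameter distributions above can be reproduced with the standard library alone, as in the sketch below. The helper names `block_lengths` and `sample_paths` are hypothetical. Exact rationals are used for the floor operations as a precaution, because a product such as 10000 × 0.0128 evaluated in binary floating point is not guaranteed to land exactly on 128.

```python
import math
import random
from fractions import Fraction

def block_lengths(Fs_hz, T_s, Tg_s):
    """Return (M, M') = (floor(Fs*T), floor(Fs*T + Fs*Tg)).

    Arguments are decimal strings so that Fraction parses them exactly.
    """
    Fs, T, Tg = Fraction(Fs_hz), Fraction(T_s), Fraction(Tg_s)
    return math.floor(Fs * T), math.floor(Fs * T + Fs * Tg)

def sample_paths(P, tau_max, a_max, rng):
    """Draw (A_p, tau_p, a_p) for P paths from the stated distributions."""
    std = math.sqrt(0.5)  # CN(0, 1): real and imaginary parts each have variance 1/2
    A = [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(P)]
    tau = [rng.uniform(0.0, tau_max) for _ in range(P)]
    a_low = 1.0 / (1.0 + a_max) - 1.0
    a = [rng.uniform(a_low, a_max) for _ in range(P)]
    return A, tau, a

# Table I values: Fs = 10 kHz, T = 12.8 ms, Tg = 10.0 ms, N = 70.
M, M_prime = block_lengths("10000", "0.0128", "0.010")
null_subcarriers = M - 70
```

Note that the lower bound $1/(1+a_{\max})-1\approx-0.000999$ is slightly smaller in magnitude than $a_{\max}$, so the Doppler scaling interval is not exactly symmetric around zero.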

The training dataset consists of 15,000 UWA channel matrices $\mathbf{H}$ and their corresponding equivalent channel matrices for the ZP-OFDM system $\mathbf{H}_{e,\mathrm{OFDM}}$. The validation dataset contains 5,000 pairs of $\{\mathbf{H},\mathbf{H}_{e,\mathrm{OFDM}}\}$. The training set is used to adjust UWAModNet weights, while the validation set helps to evaluate the performance during training and finalize the learned modem. Additionally, a testing dataset with 10,000 pairs of $\{\mathbf{H},\mathbf{H}_{e,\mathrm{OFDM}}\}$ is used to generate equivalent sub-channel rate curves and bit error rate curves for system performance evaluation. The process of generating a pair of $\{\mathbf{H},\mathbf{H}_{e,\mathrm{OFDM}}\}$ is shown in Algorithm 2.

For other parameters, the amplification factor $K$ in equation (25) is set to 10 by default. The Adam optimizer, configured with $\beta_{1}=0.9$, $\beta_{2}=0.999$, learning rate $lr=1\times 10^{-3}$, and $\epsilon=1\times 10^{-8}$, is used to train UWAModNet. Furthermore, both Stage I and Stage II of our training are configured with 400 epochs and a batch size of 100, with our UWAModNet operating at a signal-to-noise ratio (SNR) of 20 dB.

IV-B Performance of the learned modem

In this section, two figures are presented to demonstrate the performance of the learned modem, one showing the equivalent sub-channel rate and the other depicting the bit error rate.

Fig. 2 shows how the average and minimum equivalent sub-channel rates vary with SNR ranging from -5 to 20 dB with $a_{\max}=0.001$. At an SNR of 20 dB, the learned modem enhances the average equivalent sub-channel rate by 38.5% and boosts the minimum equivalent sub-channel rate by 190.8% compared to ZP-OFDM. The figure demonstrates that the learned modem achieves higher average and minimum rates, indicating significant improvements in both overall and worst-case performance.

Figure 2: Average and minimum equivalent sub-channel rate with $a_{\max}=0.001$.
Figure 3: Comparison of bit error rate for UWAModNet and ZP-OFDM under testing constraints $a_{\max}=0.001$ and $a_{\max}=0.002$, originally trained at $a_{\max}=0.001$.

As for the bit error rate, at the transmitter, the bit sequence $\mathbf{b}$ is mapped to the symbol sequence $\mathbf{s}$ using QPSK constellation mapping with Gray coding. At the receiver, we perform linear zero-forcing (LZF) equalization on the demodulated signal $\mathbf{y}$ as follows:

$$\mathbf{z}=\left(\hat{\mathbf{H}}^{H}\hat{\mathbf{H}}\right)^{-1}\hat{\mathbf{H}}^{H}\mathbf{y} \tag{31}$$

where $\mathbf{z}$ denotes the output of the equalizer. We show the performance of both ICI-aware LZF equalization ($\hat{\mathbf{H}}=\mathbf{H}_{e}$) and ICI-ignorant one-tap LZF equalization ($\hat{\mathbf{H}}=\mathrm{diag}(\mathbf{H}_{e})$) under constraints $a_{\max}=0.001$ and $a_{\max}=0.002$.
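
The two equalizer variants of equation (31) and a Gray-coded QPSK mapper can be sketched as follows. The function names and the specific bit-to-symbol convention (first bit sets the sign of the real part, second bit the sign of the imaginary part) are assumptions for this example; the paper specifies only that Gray coding is used.

```python
import numpy as np

def qpsk_map(bits):
    """Map bit pairs to unit-energy QPSK symbols (assumed Gray labeling)."""
    b = np.asarray(bits).reshape(-1, 2)
    return ((1 - 2 * b[:, 0]) + 1j * (1 - 2 * b[:, 1])) / np.sqrt(2.0)

def qpsk_demap(z):
    """Hard-decision demapping, the inverse of qpsk_map."""
    z = np.asarray(z)
    return np.stack([z.real < 0, z.imag < 0], axis=1).astype(int).reshape(-1)

def lzf_equalize(y, He, ici_aware=True):
    """Linear zero-forcing equalization of eq. (31).

    ici_aware=True uses the full equivalent channel He;
    ici_aware=False uses diag(He), which reduces to a one-tap division.
    """
    if ici_aware:
        gram = He.conj().T @ He
        return np.linalg.solve(gram, He.conj().T @ y)
    return y / np.diag(He)
```

The one-tap variant costs a single division per sub-channel, whereas the ICI-aware variant requires solving an $N\times N$ linear system for every channel realization.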

In Fig. 3, the learned modem achieves comparable BER performance to ICI-aware ZP-OFDM but at a lower computational cost, and it significantly outperforms ICI-ignorant ZP-OFDM. At an SNR of 20 dB, with $a_{\max}=0.001$, the learned modem reduces the BER by 84% compared to ICI-ignorant ZP-OFDM.

Under poorer channel conditions, with $a_{\max}=0.002$, the learned modem, originally trained at $a_{\max}=0.001$, suffers some performance loss but still outperforms ICI-ignorant ZP-OFDM, demonstrating good robustness.

V Conclusion

In this paper, to effectively reduce the severe ICI caused by the Doppler scale in UWA channels, a data-driven modem generation method has been introduced. To derive a learned modem within a specified range of UWA channel parameters, we have proposed an optimization criterion suitable for deep learning, developed a multi-resolution CNN named UWAModNet, and devised a corresponding two-stage training strategy. Simulations have demonstrated that the learned modem outperforms ZP-OFDM in terms of equivalent sub-channel rate and bit error rate. To make the scheme more practical, future work may focus on the corresponding channel estimation methods or adjustments to the experiment settings of the dataset generation.

References

  • [1] J. Xu, M. A. Kishk, and M.-S. Alouini, “Coverage enhancement of underwater internet of things using multilevel acoustic communication networks,” IEEE Internet of Things Journal, vol. 9, no. 24, pp. 25373–25385, 2022.
  • [2] X. Wang, X. Shi, J. Wang, and Z. Sun, “Iterative LMMSE-SIC detector for DSE-aware underwater acoustic OTFS systems,” IEEE Transactions on Vehicular Technology, vol. 73, no. 7, pp. 9895–9910, 2024.
  • [3] C. Zhang, Y. Xie, D. Mishra, T. Pacino, J. Shao, B. Li, J. Yuan, P. Chen, and Y. Rong, “A low complexity channel emulator for underwater acoustic communications,” in OCEANS 2023 - MTS/IEEE U.S. Gulf Coast, 2023, pp. 1–8.
  • [4] H. L. Nguyen Thi, Q. Khuong Nguyen, and V. D. Nguyen, “A combination of time and frequency synchronization with Doppler compensation for coded OFDM-based UWA systems,” in 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2023, pp. 1304–1309.
  • [5] M. Wen, X. Cheng, L. Yang, Y. Li, X. Cheng, and F. Ji, “Index modulated OFDM for underwater acoustic communications,” IEEE Communications Magazine, vol. 54, no. 5, pp. 132–137, 2016.
  • [6] B. Li, S. Zhou, M. Stojanovic, L. Freitag, and P. Willett, “Multicarrier communication over underwater acoustic channels with nonuniform Doppler shifts,” IEEE Journal of Oceanic Engineering, vol. 33, no. 2, pp. 198–209, 2008.
  • [7] X. Wang and J. Wang, “Pilot allocation for MIMO-ZP-OFDM systems in underwater acoustic channel based on structured compressive sensing,” in 2016 IEEE/OES China Ocean Acoustics (COA), 2016, pp. 1–5.
  • [8] A. K. P., C. R. Murthy, and P. Muralikrishna, “Variable bandwidth multicarrier communications: A new waveform for the delay-scale channel,” in 2022 IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (SPAWC), 2022, pp. 1–5.
  • [9] H. Zhang, X. Wang, J. Tan, and J. Wang, “Modem optimization of high-mobility scenarios: A deep-learning-inspired approach,” in 2024 IEEE International Conference on Communications Workshops (ICC Workshops), 2024, pp. 1–6.
  • [10] Z. Lu, J. Wang, and J. Song, “Multi-resolution CSI feedback with deep learning in massive MIMO system,” in ICC 2020 - 2020 IEEE International Conference on Communications (ICC), 2020, pp. 1–6.
  • [11] C. R. Berger, S. Zhou, J. C. Preisig, and P. Willett, “Sparse channel estimation for multicarrier underwater acoustic communication: From subspace methods to compressed sensing,” IEEE Transactions on Signal Processing, vol. 58, no. 3, pp. 1708–1721, 2010.