

Ordinary Differential Equation-based CNN for Channel Extrapolation over RIS-assisted Communication

Meng Xu, Shun Zhang, Senior Member, IEEE, Caijun Zhong, Senior Member, IEEE, Jianpeng Ma, Member, IEEE, Octavia A. Dobre, Fellow, IEEE

M. Xu, S. Zhang, and J. Ma are with the State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, P. R. China (e-mail: mxu_20@stu.xidian.edu.cn, zhangshunsdu@xidian.edu.cn, jpmaxdu@gmail.com). C. Zhong is with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, P. R. China (e-mail: caijunzhong@zju.edu.cn). O. A. Dobre is with the Faculty of Engineering and Applied Science, Memorial University, St. John's, NL A1C 5S7, Canada (e-mail: odobre@mun.ca).
Abstract

The reconfigurable intelligent surface (RIS) is considered a promising technology for reconfiguring wireless communication environments. To acquire the channel information accurately and efficiently, we turn on only a fraction of the RIS elements, formulate a sub-sampled RIS channel, and design a deep learning based scheme to extrapolate the full channel information from the partial one. Specifically, inspired by ordinary differential equations (ODEs), we set up connections between different data layers in a convolutional neural network (CNN) and improve its structure. Simulation results demonstrate that our proposed ODE-based CNN structure achieves faster convergence and better solutions than the cascaded CNN.

Index Terms:
Convolutional neural network, RIS, channel extrapolation, ordinary differential equation, sub-sample.

I Introduction

With the increasing demand for communication services, the number of connected devices continues to grow exponentially, and blossoming new service requirements pose more constraints on the network. At the same time, increased power consumption and hardware cost remain key issues [1]. A recent technological breakthrough that holds the potential to overcome these bottlenecks is the reconfigurable intelligent surface (RIS). It can be coated on any environmental object in a cost-effective manner, thereby facilitating large-scale deployment. Due to its physical characteristics, the RIS can reflect incident electromagnetic waves and adjust their amplitude and phase in a controlled manner. In other words, the RIS can manipulate the communication environment in an intelligent way. Furthermore, the reflection elements of the RIS usually work in a passive state, which gives the RIS low power consumption [2]. Hence, the RIS is considered a promising technology and has attracted increasing attention.

Similar to the case of other communication systems [3], the acquisition of channel state information (CSI) is an important problem and has become a hot research topic in RIS-assisted communication systems. In [4], Ardah et al. designed a two-stage channel estimation framework with high resolution. In [5], atomic norm minimization was employed to perform channel estimation for a RIS-aided MIMO system in the millimeter-wave band.

All of the above works rely on hypothetical statistical models. However, in actual communication scenarios, the radio scattering conditions change rapidly with time and are very complicated, which limits the traditional methods [3]. With the development of artificial intelligence, the application of deep learning (DL) in RIS-aided systems has attracted extensive attention. In [6], the authors adopted fully connected neural networks to estimate the RIS channel and detect the symbols. In [7], Elbir et al. designed a twin convolutional neural network (CNN) to estimate the direct and the cascaded channels in a RIS-aided communication system. However, due to the passive nature of the RIS, the channel from the source to the RIS and that from the RIS to the destination are coupled, and the size of the equivalent channel scales with the number of RIS elements, which is usually large in order to accurately manipulate an incoming electromagnetic (EM) field. It would therefore cost considerable pilot resources to directly estimate such a large equivalent channel at the destination. Thus, reducing the channel estimation overhead is an interesting topic in RIS-aided networks. Recently, Taha et al. used a small subset of RIS elements to sub-sample the channels and optimized the beamforming vector of the RIS with the channel estimated at the selected elements [8].

In this paper, we further examine channel compression over the physical space for RIS-aided communication. After selecting a fraction of the RIS elements, we estimate the equivalent channel formed by the source, the destination, and the chosen RIS elements. Then, we extrapolate the channels at all elements from those estimated at the chosen elements, where DL is adopted. Furthermore, inspired by ordinary differential equations (ODEs), we modify the structure of the cascaded CNN by adding cross-layer connections, namely, introducing coefficients and linear calculations between the network layers. The proposed ODE-based CNN obtains more accurate solutions, and its performance is verified to be better than that of the cascaded CNN.

II System And Channel Model

As shown in Fig. 1, let us consider an indoor scenario, where a multi-antenna base station (BS) communicates with a single-antenna user via RIS reflection. The BS is equipped with a uniform linear array (ULA) of $M$ antennas. The RIS is a uniform planar array (UPA) with $L=L_{h}L_{v}$ elements, where $L_{h}$ and $L_{v}$ denote the sizes along the horizontal and vertical dimensions, respectively. Moreover, the orthogonal frequency division multiplexing (OFDM) scheme is adopted, and the number of subcarriers is $K$. Since the indoor environment is easily blocked by objects or people, the direct channel between the BS and the user may be destroyed. Thus, we only consider the channel reflected by the RIS, i.e., the cascaded channel, instead of the direct channel. Obviously, the cascaded channel consists of two parts: the link from the BS to the RIS, i.e., $\mathbf{H}\in\mathbb{C}^{L\times M}$, and that from the RIS to the user, i.e., $\mathbf{g}^{\mathrm{H}}$ with $\mathbf{g}\in\mathbb{C}^{L\times 1}$. The received signal at the $k$-th subcarrier of the user can be given as

$y_{k}=\mathbf{g}_{k}^{\mathrm{H}}\mathbf{\Psi}\mathbf{H}_{k}\mathbf{s}_{k}+n_{k},$  (1)

where $\mathbf{\Psi}\in\mathbb{C}^{L\times L}$ is a diagonal matrix, i.e., $\mathbf{\Psi}=\mathrm{diag}\{\beta_{1}\exp(j\phi_{1}),\dots,\beta_{L}\exp(j\phi_{L})\}$, $\mathbf{s}_{k}$ is the downlink $M\times 1$ transmitted signal at the $k$-th subcarrier, and $n_{k}\sim\mathcal{CN}(0,\sigma_{n}^{2})$ is the additive white Gaussian noise. Due to the lack of signal processing capability at the RIS, $\mathbf{\Psi}$ is the same at different subcarriers. Notice that $\phi_{i}$ in $\mathbf{\Psi}$ represents the phase shift introduced by each RIS element, while $\beta_{i}$ controls this element's on-off state, as described in the following. Moreover, the channel between the BS and the RIS at the $k$-th subcarrier is given by

$\mathbf{H}_{k}=\frac{1}{\sqrt{K}}\sum_{i=1}^{P_{h}}h_{i,f_{c}}e^{-j2\pi\frac{k\tau_{h,i}}{KT_{s}}}\mathbf{a}_{r}(\phi_{h,i},\theta_{h,i})\mathbf{a}_{t}^{\mathrm{H}}(\psi_{h,i}),$  (2)

where $h_{i,f_{c}}$ is the complex channel gain along the $i$-th scattering path at the carrier frequency $f_{c}$, $\tau_{h,i}$ is the time delay, and $\mathbf{a}_{r}(\phi_{h,i},\theta_{h,i})$, $\mathbf{a}_{t}(\psi_{h,i})$ are the spatial steering vectors, with $\phi_{h,i}$ and $\theta_{h,i}$ as the elevation and azimuth angles at the receiver, respectively, and $\psi_{h,i}$ as the angle of departure (AoD). Correspondingly, $\mathbf{a}_{r}(\phi_{h,i},\theta_{h,i})$ can be written as

$\mathbf{a}_{r}(\phi_{h,i},\theta_{h,i})=\mathbf{a}_{el}(\phi_{h,i})\otimes\mathbf{a}_{az}(\phi_{h,i},\theta_{h,i})\in\mathbb{C}^{L\times 1},$  (3)

where the $L_{v}\times 1$ vector $\mathbf{a}_{el}(\phi_{h,i})=[1,e^{-j2\pi\frac{d}{\lambda_{c}}\cos\phi_{h,i}},\dots,e^{-j2\pi\frac{d}{\lambda_{c}}(L_{v}-1)\cos\phi_{h,i}}]^{\mathrm{T}}$ and the $L_{h}\times 1$ vector $\mathbf{a}_{az}(\phi_{h,i},\theta_{h,i})=[1,e^{-j2\pi\frac{d}{\lambda_{c}}\sin\phi_{h,i}\cos\theta_{h,i}},\dots,e^{-j2\pi\frac{d}{\lambda_{c}}(L_{h}-1)\sin\phi_{h,i}\cos\theta_{h,i}}]^{\mathrm{T}}$. Here, $\lambda_{c}$ is the carrier wavelength and $d$ denotes the antenna spacing. Furthermore, $\otimes$ represents the Kronecker product operator and $[\cdot]^{\mathrm{T}}$ represents the transpose. Moreover, $\mathbf{a}_{t}(\psi_{h,i})$ can be given by

$\mathbf{a}_{t}(\psi_{h,i})=\frac{1}{\sqrt{M}}[1,e^{j\frac{2\pi}{\lambda_{c}}d\sin\psi_{h,i}},\dots,e^{j\frac{2\pi}{\lambda_{c}}d(M-1)\sin\psi_{h,i}}]^{\mathrm{T}}.$  (4)

Correspondingly, the channel between the RIS and the user at the $k$-th subcarrier is

$\mathbf{g}_{k}^{\mathrm{H}}=\frac{1}{\sqrt{K}}\sum_{i=1}^{P_{g}}g_{i,f_{c}}e^{-j2\pi\frac{k\tau_{g,i}}{KT_{s}}}\mathbf{a}_{t}^{\mathrm{H}}(\phi_{g,i},\theta_{g,i}),$  (5)

where the structure of $\mathbf{a}_{t}(\phi_{g,i},\theta_{g,i})$ is similar to that of $\mathbf{a}_{r}(\phi_{h,i},\theta_{h,i})$.

The cascaded channel matrix between the BS and the user at the $k$-th subcarrier can be defined as $\mathbf{C}_{k}=\mathbf{G}_{k}\mathbf{H}_{k}$, where $\mathbf{G}_{k}=\mathrm{diag}\{\mathbf{g}_{k}\}$, and $\mathbf{C}_{k}$ has a size of $L\times M$. Then, let us define the $LM\times 1$ vector $\mathbf{c}_{k}=[\mathbf{c}_{k,1}^{\mathrm{T}},\dots,\mathbf{c}_{k,L}^{\mathrm{T}}]^{\mathrm{T}}$, where $\mathbf{c}_{k,i}^{\mathrm{T}}$ represents the $i$-th row of $\mathbf{C}_{k}$. Within the RIS communication system, it has been shown that the optimal $\mathbf{\Psi}$ is closely related to all the cascaded channels at the $K$ subcarriers, i.e., $\mathbf{C}=[\mathbf{c}_{1},\mathbf{c}_{2},\dots,\mathbf{c}_{K}]$ [9]. Hence, our aim is to estimate $\mathbf{C}$.
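For concreteness, the following numpy sketch assembles $\mathbf{C}$ from the per-subcarrier channels; the randomly drawn placeholder channels and variable names are ours for illustration, while in practice $\mathbf{H}_k$ and $\mathbf{g}_k$ follow (2) and (5).

```python
import numpy as np

M, L, K = 4, 64, 64  # BS antennas, RIS elements, subcarriers (values from Sec. IV)
rng = np.random.default_rng(0)

# Placeholder frequency-domain channels; in practice these come from (2) and (5).
H = rng.standard_normal((K, L, M)) + 1j * rng.standard_normal((K, L, M))  # BS -> RIS
g = rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))        # RIS -> user

# Cascaded channel C_k = diag(g_k) H_k, then stack its rows into the LM x 1 vector c_k.
C = np.empty((L * M, K), dtype=complex)
for k in range(K):
    C_k = np.diag(g[k]) @ H[k]          # L x M
    C[:, k] = C_k.reshape(-1)           # c_k = [c_{k,1}^T, ..., c_{k,L}^T]^T (row-wise)
```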

Figure 1: RIS-aided communication system in an indoor scene.

III Proposed Channel Extrapolation Method

III-A Framework Design

Theoretically, we can send a pilot matrix $\mathbf{X}_{k}$ of size $ML\times S$ to directly recover $\mathbf{c}_{k}$ with a linear estimator, where $S$ represents the time duration of $\mathbf{X}_{k}$. From Bayesian estimation theory, we can effectively recover $\mathbf{c}_{k}$ when $S\geq ML$. However, in massive MIMO systems, both $M$ and $L$ are relatively large. A significant number of pilot symbols would then need to be employed, which drastically decreases the spectral efficiency of the transmission. To overcome this bottleneck, we can utilize only a fraction of the RIS elements and sub-sample $\mathbf{C}_{k}$. Without loss of generality, we set the number of selected RIS elements as $N$, which can be implemented by setting the parameters $\beta_{i}$, $\phi_{i}$ in $\mathbf{\Psi}$, $i=1,2,\ldots,L$. Specifically, for the $N$ chosen elements, we set $\phi_{i}$ and $\beta_{i}$ as 0 and 1, respectively. For the others, the corresponding amplitude parameter $\beta_{i}$ is 0. After this operation, the size of the sub-sampled cascaded channel at the $k$-th subcarrier is reduced to $N\times M$. Correspondingly, the sub-sampled cascaded channel at the $K$ subcarriers can be written as $\widetilde{\mathbf{C}}\in\mathbb{C}^{MN\times K}$. Obviously, compared with $\mathbf{C}$, a pilot sequence of shorter time duration is required to estimate $\widetilde{\mathbf{C}}$.

If the power of the pilot sequence is large enough, we can estimate $\widetilde{\mathbf{C}}$ with high accuracy. However, we must then utilize $\widetilde{\mathbf{C}}$ to infer the unknown cascaded channel at the $L-N$ non-chosen RIS elements. Thus, in the following, we construct a DL-based framework to extrapolate $\mathbf{C}$ from $\widetilde{\mathbf{C}}$. It is worth noting that the selection pattern for the RIS elements can impact the performance of the channel extrapolation and should be optimized; this topic, however, is beyond the scope of this paper. Here, we adopt a uniform sampling scheme, sketched in code below.
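As a minimal sketch of the uniform scheme, the snippet below picks $N$ evenly spaced element indices and configures $\mathbf{\Psi}$ accordingly; the specific index arithmetic is our own illustrative choice, not an optimized pattern.

```python
import numpy as np

L, r = 64, 1 / 4                 # RIS elements and sampling rate r = N / L
N = int(L * r)

# One illustrative uniform pattern: N evenly spaced indices over the L elements.
chosen = np.linspace(0, L - 1, N).round().astype(int)

# Configure Psi: chosen elements reflect with beta = 1 and phi = 0; the rest are off.
beta = np.zeros(L)
beta[chosen] = 1.0
phi = np.zeros(L)
Psi = np.diag(beta * np.exp(1j * phi))   # L x L diagonal reflection matrix
```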

III-B ODE-based Channel Extrapolation

As mentioned above, the input of the network is $\widetilde{\mathbf{C}}$, and its output is $\mathbf{C}$. The task of our network is to learn the mapping from $\widetilde{\mathbf{C}}$ to $\mathbf{C}$. In other words, we want to estimate the complete channel from its sub-sampled version, which is made possible by the correlation between different RIS elements.

Channel extrapolation is similar to super-resolution in the field of image processing. For this kind of problem, CNNs have great advantages and are well suited to exploiting the correlation between data elements for information completion. To obtain better network performance, we can increase the number of data layers or modify the network structure. However, more layers result in a higher computational cost. Moreover, once the number of layers reaches a certain point, the improvement becomes smaller and smaller; sometimes, excessive deepening of the network causes gradient explosion or vanishing. Thus, optimizing the network structure is more widely used than simply deepening the neural network. Theoretically, if we add some proper connections between layers, the performance of the network may improve, as in the residual neural network (ResNet) [10].

Recently, ODEs have been introduced to neural networks and utilized to describe the latent relation between different data layers [11]. With such a powerful characterization, we can speed up the convergence and improve the learning performance of the CNN. Moreover, with the development of mathematical science, it is possible to use the numerical solutions of differential equations to modify the network structure and obtain possible gains. Here, we incorporate two numerical approximation methods, i.e., the LeapFrog and Runge-Kutta methods, into the CNN. The main difference between them lies in the approximation accuracy.

LeapFrog method: The LeapFrog method is a second-order approximation scheme and can be written as

$y_{n+1}=y_{n-1}+2hf(x_{n},y_{n}),$  (6)

where $f(x_{n},y_{n})$ denotes the derivative at $(x_{n},y_{n})$ and $2h$ can be seen as the width $x_{n+1}-x_{n-1}$ of the interval. Applying (6) to the CNN, we can connect the $(n+1)$-th layer with the $(n-1)$-th one. The corresponding relationship can be formulated as:

$\mathbf{D}_{n}=\mathbf{D}_{n-2}+G(\mathbf{D}_{n-1}),\ n=3,4,5,\dots,$  (7)

where $\mathbf{D}_{i}$ represents the output data of the $i$-th layer, and $G(\cdot)$ is an operation containing a ReLU activation function, a convolutional layer, and a multiplier.

Remark 1

The LeapFrog method is an improved version of the forward Euler method, which can be written as $y_{n+1}=y_{n}+hf(x_{n},y_{n})$. The forward Euler method is the simplest first-order approximation of an ODE and has a structure similar to ResNet.
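To make this concrete, a minimal PyTorch sketch of an LF-Block is given below; the channel width, the learnable step multipliers, and the forward-Euler bootstrap for the first update are our own assumptions, since (7) needs two past layers to start.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LFBlock(nn.Module):
    """LF-Block implementing D_n = D_{n-2} + G(D_{n-1}) from (7).

    G is taken as ReLU -> 3x3 Conv scaled by a learnable multiplier
    (playing the role of the step 2h in (6)).
    """
    def __init__(self, channels: int = 128):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(3))
        self.steps = nn.Parameter(torch.ones(3))   # learnable step multipliers

    def G(self, i: int, x: torch.Tensor) -> torch.Tensor:
        return self.steps[i] * self.convs[i](F.relu(x))

    def forward(self, d0: torch.Tensor) -> torch.Tensor:
        d1 = d0 + self.G(0, d0)   # forward-Euler bootstrap (Remark 1)
        d2 = d0 + self.G(1, d1)   # LeapFrog update, as in (7)
        d3 = d1 + self.G(2, d2)   # LeapFrog update, as in (7)
        return d3
```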

Runge-Kutta methods: The theory of numerical ODEs suggests that a higher-order approximation results in less truncation error and higher accuracy. Hence, we turn to the Runge-Kutta methods, which are common numerical methods for ODEs and can be expressed as [12]

$y_{n+1}=y_{n}+\sum_{i=1}^{I}\gamma_{i}G_{i},$  (8)

where $G_{1}=hf(x_{n},y_{n})$, while $G_{i}$ has the form of

$G_{i}=hf(x_{n}+\alpha_{i}h,\ y_{n}+\sum_{j=1}^{i-1}\beta_{ij}G_{j}),\ i=2,3,\dots,I,$  (9)

where $I$ represents the number of stages; $\alpha_{i}$ and $\beta_{ij}$, like $\gamma_{i}$ in (8), are the coefficients of the $i$-th stage.

If we set $\gamma_{1}=\frac{1}{6}$, $\gamma_{2}=\frac{2}{3}$, $\gamma_{3}=\frac{1}{6}$, $\alpha_{2}=\frac{1}{2}$, $\beta_{21}=\frac{1}{2}$, $\alpha_{3}=1$, $\beta_{31}=-1$, $\beta_{32}=2$, we obtain the 3-stage Runge-Kutta equation as the basic structure in our work, which can be expressed as

$y_{n+1}=y_{n}+\frac{1}{6}(G_{1}+4G_{2}+G_{3}),$  (10)

where $G_{1}$, $G_{2}$, $G_{3}$ can be separately written as

$G_{1}=hf(x_{n},y_{n}),$  (11)
$G_{2}=hf(x_{n}+\frac{h}{2},\ y_{n}+\frac{1}{2}G_{1}),$  (12)
$G_{3}=hf(x_{n}+h,\ y_{n}-G_{1}+2G_{2}).$  (13)
Figure 2: (a) Our proposed network architecture; (b) The structure of the RK3-Block; (c) The architecture of the operation $G^{\prime}$, which contains two ReLU activation functions and two convolutional layers.

With (10)-(13), we can construct an improved CNN structure, referred to as the RK3-Block and depicted in Fig. 2 (b). Correspondingly, the constraints among the different layers in this block can be written as

$\mathbf{D}_{1}=\mathbf{D}_{0}+\frac{1}{2}G^{\prime}(\mathbf{D}_{0}),\quad \mathbf{D}_{2}=\mathbf{D}_{0}-G^{\prime}(\mathbf{D}_{0})+2G^{\prime}(\mathbf{D}_{1}),$  (14)
$\mathbf{D}_{3}=\mathbf{D}_{0}+\frac{1}{6}(G^{\prime}(\mathbf{D}_{0})+4G^{\prime}(\mathbf{D}_{1})+G^{\prime}(\mathbf{D}_{2})),$  (15)

where the positions of $\mathbf{D}_{i}$, $i=0,1,2,3$, are presented in Fig. 2 (b), and the operation $G^{\prime}(\cdot)$ contains two ReLU activation functions and two convolutional layers. Similar to the RK3-Block, with (7), we can obtain a modified CNN structure from the LeapFrog approximation, referred to as the LF-Block, which includes three data layers. As shown in Fig. 2 (a), several RK3-Blocks or LF-Blocks can be cascaded to deepen the network for better results.
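As an illustration, a possible PyTorch realization of the RK3-Block in (14)-(15) is sketched below; the channel width and the use of separate (non-shared) weights for the three $G^{\prime}$ operations are our assumptions, guided by Fig. 2 (b)(c).

```python
import torch
import torch.nn as nn

class RK3Block(nn.Module):
    """RK3-Block implementing (14)-(15), following Fig. 2 (b)(c).

    G' is ReLU -> Conv -> ReLU -> Conv; the three G' operations use
    separate weights here, which is our own choice.
    """
    def __init__(self, channels: int = 128):
        super().__init__()
        def g_prime():
            return nn.Sequential(
                nn.ReLU(), nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(), nn.Conv2d(channels, channels, 3, padding=1))
        self.G0, self.G1, self.G2 = g_prime(), g_prime(), g_prime()

    def forward(self, d0: torch.Tensor) -> torch.Tensor:
        g0 = self.G0(d0)                        # G'(D_0), reused as in (14)-(15)
        d1 = d0 + 0.5 * g0                      # D_1, first relation in (14)
        g1 = self.G1(d1)                        # G'(D_1)
        d2 = d0 - g0 + 2.0 * g1                 # D_2, second relation in (14)
        g2 = self.G2(d2)                        # G'(D_2)
        return d0 + (g0 + 4.0 * g1 + g2) / 6.0  # D_3, per (15)
```

Note that each block then contains $3\times 2=6$ convolutional layers, consistent with the block size reported in Section IV.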

III-C Learning Scheme

The valid input of our network is the sub-sampled channel $\widetilde{\mathbf{C}}\in\mathbb{C}^{MN\times K}$, and the label is the entire cascaded channel $\mathbf{C}\in\mathbb{C}^{ML\times K}$. To facilitate the training and the generation of data, we set the entries of $\mathbf{C}$ related to the $L-N$ non-chosen RIS elements to 0 and obtain the resultant matrix $\mathbf{C}^{o}$, whose non-zero entries coincide with those of $\widetilde{\mathbf{C}}$. Correspondingly, we treat $\mathbf{C}^{o}$ as the raw input of the ODE-based CNN.

Then, we reshape the raw input data and the label of the network as $\mathbf{Z}_{\mathrm{IN}}=[\Re(\mathbf{C}^{o});\Im(\mathbf{C}^{o})]$ and $\mathbf{Z}_{\mathrm{TA}}=[\Re(\mathbf{C});\Im(\mathbf{C})]$, respectively. Both $\mathbf{Z}_{\mathrm{IN}}$ and $\mathbf{Z}_{\mathrm{TA}}$ are real-valued tensors of size $ML\times K\times 2$. Correspondingly, the output of the network can be written as $\mathbf{Z}_{\mathrm{OUT}}=[\Re(\widehat{\mathbf{C}});\Im(\widehat{\mathbf{C}})]\in\mathbb{R}^{ML\times K\times 2}$, where $\widehat{\mathbf{C}}$ is the estimate of $\mathbf{C}$. In our proposed network, there are $N_{c}$ convolutional layers. In the $n$-th layer, the input is processed by $N_{k}$ convolutional kernels of size $H\times W$, where $H$ and $W$ represent the height and the width of the kernels. Normally, the size of the output data of each convolutional layer depends on $H$ and $W$, and without padding it is slightly smaller than that of the input.
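The zero-filling and real/imaginary stacking can be sketched as follows; the helper name and the uniform index pattern are our own illustrative choices.

```python
import numpy as np

M, L, K, N = 4, 64, 64, 16
chosen = np.linspace(0, L - 1, N).round().astype(int)   # uniform pattern from Sec. III-A

def make_input_and_label(C):
    """C: ML x K complex cascaded channel with rows grouped per RIS element.
    Returns (Z_IN, Z_TA) as ML x K x 2 real arrays."""
    mask = np.zeros(L, dtype=bool)
    mask[chosen] = True
    mask = np.repeat(mask, M)[:, None]        # each element owns M consecutive rows of c_k
    C_o = np.where(mask, C, 0.0)              # zero-fill rows of non-chosen elements
    Z_in = np.stack([C_o.real, C_o.imag], axis=-1).astype(np.float32)
    Z_ta = np.stack([C.real, C.imag], axis=-1).astype(np.float32)
    return Z_in, Z_ta
```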

During the learning stage, the parameter vector $\boldsymbol{\omega}=[\boldsymbol{\omega}_{1}^{\mathrm{T}},\boldsymbol{\omega}_{2}^{\mathrm{T}},\dots,\boldsymbol{\omega}_{N_{c}}^{\mathrm{T}}]^{\mathrm{T}}$ is optimized by minimizing the mean squared error (MSE) between the output $\mathbf{Z}_{\mathrm{OUT}}$ and the target $\mathbf{Z}_{\mathrm{TA}}$, where the vector $\boldsymbol{\omega}_{n}$ contains all the model parameters of the $n$-th layer, $n=1,2,\ldots,N_{c}$. Hence, the loss function can be written as

$\mathcal{L}=\frac{1}{M_{b}MLK}\sum_{i=1}^{M_{b}}\left\|[\mathbf{Z}_{\mathrm{TA}}]_{i}-[\mathbf{Z}_{\mathrm{OUT}}]_{i}\right\|_{F}^{2},$  (16)

where $\|\mathbf{A}\|_{F}$ is the Frobenius norm of the matrix $\mathbf{A}$ and $M_{b}$ denotes the batch size for training. Here, the adaptive moment estimation (Adam) [13] algorithm is adopted to find the best $\boldsymbol{\omega}$, controlled by the learning rate $\eta$.
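A minimal training-step sketch is given below; `model` (the ODE-based CNN) and `loader` (yielding `(Z_in, Z_ta)` batches in the channels-first layout $(M_b, 2, ML, K)$ that PyTorch convolutions expect) are assumed names.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 200, eta: float = 5e-4):
    opt = torch.optim.Adam(model.parameters(), lr=eta)
    # nn.MSELoss averages over all entries, matching (16) up to a constant factor.
    mse = nn.MSELoss()
    for epoch in range(epochs):
        for z_in, z_ta in loader:
            opt.zero_grad()
            loss = mse(model(z_in), z_ta)
            loss.backward()
            opt.step()
```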

TABLE I: Layer Parameters for the CNN with ODE-RK3 Structure.
Layer | Output size | Activation | Kernel size | Strides
1× Conv2D | 256×64×128 | None | 5×5 | 1×1
4× RK3-Block | 256×64×128 | ReLU | 3×3 | 1×1
1× Conv2D | 256×64×2 | None | 3×3 | 1×1
TABLE II: Performance Comparison of Different Structures under Different Sampling Rates.
Sampling Rate | ODE-RK3 (Loss / NMSE) | ODE-LF (Loss / NMSE) | CNN (Loss / NMSE)
1/2 | 0.00001 / -39.59 dB | 0.00002 / -35.53 dB | 0.00003 / -32.37 dB
1/4 | 0.00002 / -33.77 dB | 0.00009 / -28.18 dB | 0.00012 / -26.78 dB
1/8 | 0.00086 / -18.12 dB | 0.00133 / -16.25 dB | 0.00151 / -15.69 dB
1/16 | 0.0155 / -5.7 dB | 0.01834 / -4.95 dB | 0.01929 / -4.69 dB

IV Simulation Results

Figure 3: The NMSE of channel extrapolation versus epochs.

In this section, we evaluate the channel extrapolation performance of the ODE-based CNN through numerical simulation. We first describe the communication scenario and the dataset source, then give the training parameters, and finally present the simulation results and analyze the performance of our proposed network.

The scenario we consider is an indoor scene with a user, a BS, and a RIS. To generate this scenario, we resort to the indoor distributed massive MIMO scenario I1 of the DeepMIMO dataset, which is generated based on Wireless InSite [14].

The ULA at the BS has 4 antennas, i.e., $M=4$, while the size of the RIS's UPA is $8\times 8$, i.e., $L=64$. The carrier frequency for channel estimation is 2.4 GHz. The OFDM signal bandwidth is set as 20 MHz, while the number of subcarriers is $K=64$. The antenna spacing is $\frac{\lambda}{2}$, and the number of paths is 5. Furthermore, the activated users are located from the 1st row to the 100th row. Each row contains 201 users, and the total number of users is 20100. The users are split into training and test groups according to the ratio 4:1. The sampling rate $r=\frac{N}{L}$ is separately set as $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$, and $\frac{1}{16}$.

In the simulations, we adopt three network structures for comparison: the ODE-RK3 structure formed by RK3-Blocks, the ODE-LF structure containing LF-Blocks, and the cascaded CNN. For fairness, all networks have 26 layers and the same number of parameters. The ODE-based network contains 4 RK3-Blocks or LF-Blocks (each block consists of 6 convolutional layers), a head convolutional layer, and a tail convolutional layer. Taking the ODE-RK3 structure as an example, we list the layer parameters in TABLE I. In the hidden layers, the number of feature channels is 128, and $\mathrm{ReLU}(x)=\max(x,0)$ is adopted as the activation function. The kernel size of the first convolutional layer is $5\times 5$, and that of the remaining convolutional layers is $3\times 3$. The learning rate $\eta$ is initialized as 0.0005 and decreases as training proceeds. Specifically, after 40 epochs, the learning rate is reduced by 20% every 10 epochs, as sketched below.
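Assuming the `RK3Block` module sketched in Section III-B, the ODE-RK3 network of TABLE I and the learning-rate schedule just described might be assembled as follows; padding is chosen here to keep the $256\times 64$ feature size of TABLE I.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR

# Head conv, 4 RK3-Blocks (4 x 6 = 24 conv layers), tail conv: 26 layers in total.
model = nn.Sequential(
    nn.Conv2d(2, 128, kernel_size=5, padding=2),   # head: real/imag planes -> 128 channels
    *[RK3Block(128) for _ in range(4)],
    nn.Conv2d(128, 2, kernel_size=3, padding=1),   # tail: back to real/imag planes
)

opt = torch.optim.Adam(model.parameters(), lr=5e-4)
# Constant rate for the first 40 epochs, then a 20% cut every 10 epochs.
scheduler = LambdaLR(opt, lr_lambda=lambda e: 1.0 if e < 40 else 0.8 ** ((e - 40) // 10 + 1))
```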

TABLE II shows the performance of the different network structures. The proposed ODE-based CNN is always superior to the cascaded CNN, and this gain grows with the sampling rate $r$. The third-order RK3 structure performs better than the second-order LF structure. When the sampling rates are $\frac{1}{2}$ and $\frac{1}{4}$, the ODE-RK3 structure achieves satisfactory results. Furthermore, in terms of the channel extrapolation normalized MSE (NMSE), the ODE-RK3 network at $r=\frac{1}{4}$ performs better than the CNN at $r=\frac{1}{2}$, which means that the length of the pilot sequence for estimating the sub-sampled channel $\widetilde{\mathbf{C}}$ can be reduced by introducing the ODE structure. If the compression ratio is relatively low, such as $r=\frac{1}{16}$, the performance of the ODE-based CNN is not significantly better than that of the cascaded CNN, due to the reduced raw input information for the channel extrapolation.

Figure 4: The loss of channel extrapolation versus epochs.

Fig. 3 depicts the NMSE curves with respect to the number of epochs. Two structures (ODE-RK3 and the cascaded CNN) and two sampling rates ($\frac{1}{4}$ and $\frac{1}{8}$) are considered here. It can be seen that, as training proceeds, all the NMSE curves present a downward trend and reach stable levels after 115 epochs. Furthermore, for a given sampling rate, the NMSEs of the ODE-based CNN are always lower than those of the cascaded CNN. Fig. 4 depicts the training loss of the different structures versus epochs. With a fixed rate $r$, the training loss of the ODE-RK3 network decreases faster than that of the cascaded CNN, which means that the ODE-based network can be trained more quickly.

Figure 5: The NMSEs of the channel extrapolation versus the frequency gaps.

So far, we have considered channel extrapolation within the same frequency band. However, our proposed scheme can still be used when there is a frequency difference. In actual systems, such as frequency division duplexing systems, the uplink and downlink channels operate in different frequency bands. Fig. 5 shows the performance of the ODE-RK3 and the cascaded CNN structures under different frequency gaps. As can be seen from Fig. 5, both structures are affected by the frequency gap: as the frequency difference increases, the NMSE of the channel extrapolation increases slightly. It is worth noting that the ODE-RK3 structure always performs better than the cascaded CNN, which demonstrates the stability and effectiveness of the proposed ODE-based CNN.

V Conclusion

In this paper, we have examined a RIS-assisted MIMO communication system, and designed an ODE-based CNN to extrapolate the cascaded channel. In our scheme, only part of the full CSI is needed. Hence, some of the RIS elements could be turned off through spatial sampling, which greatly reduces the length of the pilot sequence in the channel estimation phase and improves the resource utilization. Simulation results have demonstrated that the proposed extrapolation scheme can effectively compress the large-scale RIS channel over the physical space. Moreover, the ODE-based structure can speed up the convergence and improve the performance of the cascaded CNN.

References

  • [1] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Process. Mag., vol. 30, no. 1, pp. 40–60, Jan. 2013.
  • [2] W. Yan, X. Yuan, Z. He, and X. Kuai, “Passive beamforming and information transfer design for reconfigurable intelligent surfaces aided multiuser MIMO systems,” IEEE J. Sel. Areas Commun., pp. 1–1, 2020.
  • [3] J. Ma, S. Zhang, H. Li, F. Gao, and S. Jin, “Sparse Bayesian learning for the time-varying massive MIMO channels: Acquisition and tracking,” IEEE Trans. Commun., vol. 67, no. 3, pp. 1925–1938, Mar. 2019.
  • [4] K. Ardah, S. Gherekhloo, A. L. F. de Almeida, and M. Haardt, “TRICE: An efficient channel estimation framework for RIS-aided MIMO communications,” arXiv:2008.09499, 2020. [Online]. Available: https://arxiv.org/abs/2008.09499.
  • [5] J. He, H. Wymeersch, and M. Juntti, “Channel estimation for RIS-aided mmWave MIMO systems via atomic norm minimization,” arXiv:2007.08158, 2020. [Online]. Available: https://arxiv.org/abs/2007.08158.
  • [6] S. Khan, K. S. Khan, N. Haider, and S. Y. Shin, “Deep-learning-aided detection for reconfigurable intelligent surfaces,” arXiv:1910.09136, 2020. [Online]. Available: https://arxiv.org/abs/1910.09136.
  • [7] A. M. Elbir, A. Papazafeiropoulos, P. Kourtessis and S. Chatzinotas, “Deep channel learning for large intelligent surfaces aided mm-wave massive MIMO systems,” IEEE Wireless Commun. Lett., vol. 9, no. 9, pp. 1447-1451, Sept. 2020.
  • [8] A. Taha, M. Alrabeiah, and A. Alkhateeb, “Enabling large intelligent surfaces with compressive sensing and deep learning,” arXiv:1904.10136v2, 2019. [Online]. Available: https://arxiv.org/abs/1904.10136v2.
  • [9] S. Lin, B. Zheng, G. C. Alexandropoulos, M. Wen, M. Di Renzo, and F. Chen, “Reconfigurable intelligent surfaces with reflection pattern modulation: Beamforming design and performance analysis,” IEEE Wireless Commun. Lett., pp. 1-1, Oct. 2020.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, pp. 770-778, Dec. 2016.
  • [11] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, “Neural ordinary differential equations,” arXiv:1806.07366v5, 2019. [Online]. Available: https://arxiv.org/abs/1806.07366.
  • [12] X. He, Z. Mo, P. Wang, Y. Liu, M. Yang, and J. Cheng, “ODE-inspired network design for single image super-resolution,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, pp. 1732-1741, Jun. 2019.
  • [13] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980, 2014. [Online]. Available: https://arxiv.org/abs/1412.6980.
  • [14] A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications,” in Proc. Information Theory and Applications Workshop (ITA), San Diego, CA, pp. 1-8, Feb. 2019.