
Attention Mechanism Based Intelligent Channel Feedback for mmWave Massive MIMO Systems

Yibin Zhang, Jinlong Sun, Member, IEEE, Guan Gui, Senior Member, IEEE, Yun Lin, Member, IEEE, Haris Gacanin, Fellow, IEEE, Hikmet Sari, Life Fellow, IEEE, and Fumiyuki Adachi, Life Fellow, IEEE

Yibin Zhang, Jinlong Sun, Guan Gui, and Hikmet Sari are with the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China (e-mail: [email protected], [email protected], [email protected], [email protected]). Yun Lin is with the College of Information and Communication Engineering, Harbin Engineering University, Harbin 150009, China (e-mail: [email protected]). Haris Gacanin is with the Institute for Communication Technologies and Embedded Systems, RWTH Aachen University, Aachen 52062, Germany (e-mail: [email protected]). Fumiyuki Adachi is with the International Research Institute of Disaster Science (IRIDeS), Tohoku University, Sendai 980-8572, Japan (e-mail: [email protected]).
Abstract

The potential advantages of intelligent wireless communications with millimeter wave (mmWave) and massive multiple-input multiple-output (MIMO) rely on the availability of instantaneous channel state information (CSI) at the base station (BS). However, the absence of channel reciprocity makes it difficult for the BS to acquire accurate CSI in frequency division duplex (FDD) systems. Many researchers have explored effective architectures based on deep learning (DL) to solve this problem and have demonstrated the success of DL-based solutions. However, existing schemes focus on the acquisition of the complete CSI while ignoring the subsequent beamforming and precoding operations. In this paper, we propose an intelligent channel feedback architecture using an eigenmatrix and eigenvector feedback neural network (EMEVNet). With the help of the attention mechanism, the proposed EMEVNet can be considered a dual-channel auto-encoder, which jointly encodes the eigenmatrix and eigenvector into codewords. Simulation results show that the proposed EMEVNet achieves considerable performance improvement and robustness with extremely low overhead compared with traditional DL-based CSI feedback methods.

Index Terms:
Attention mechanism, massive MIMO, mmWave, deep learning, channel feedback, beamforming, eigen features.

I Introduction

Intelligent wireless communications with the millimeter wave (mmWave) band and massive multiple-input multiple-output (MIMO) are considered key technologies of future communication [1, 2, 3]. In the sixth generation (6G) of mobile communications, mmWave will play an indispensable role [4, 5, 6, 7]. In addition, massive MIMO combined with the ultra-large bandwidth of mmWave will become a key technique for the Internet of Everything (IoE) [8, 9, 10]. However, all these potential advantages, such as beamforming, power allocation and antenna selection, can be achieved only when the base station (BS) obtains accurate channel state information (CSI). In time-division duplexing (TDD) systems, the BS can infer the downlink CSI from the uplink CSI with the help of channel reciprocity. Unfortunately, channel reciprocity does not exist in frequency-division duplex (FDD) systems. Therefore, many scholars have explored in recent years how the BS can obtain accurate downlink CSI in mmWave FDD systems. Compared with the traditional codebook feedback scheme, existing works utilize either machine learning (ML) or deep learning (DL) algorithms to obtain CSI at the BS.

Different from the existing codebook feedback scheme, many studies [11, 12, 13, 14, 15, 16, 17, 18, 19] have been devoted to exploring more accurate CSI feedback schemes. C. Wen et al. [11] innovatively proposed a DL-based CSI compression and feedback method, i.e., CsiNet. CsiNet compresses the downlink CSI estimated by the user equipment (UE) and transmits the compressed codewords to the BS, which recovers the accurate downlink CSI after decoding the codewords. J. Guo et al. [13] further explored a multiple-compression-ratio solution on the basis of CsiNet, which can adapt to different channel environments. T. Wang et al. [14] exploited the temporal correlation of the CSI matrix and proposed a long short-term memory (LSTM) network to improve the performance of CsiNet. J. Guo et al. [12] surveyed recent DL-based CSI feedback methods and noted that the overhead of DL is too large for deployment on conventional BSs. Therefore, Y. Sun et al. [15, 16] explored efficient lightweight designs aiming to reduce the overhead of CsiNet. M. Chen et al. [19] proposed a DL-based implicit feedback architecture that inherits the low-overhead characteristic for wideband systems. Different from CsiNet, J. Zeng et al. [17] explored a transfer learning-based fully convolutional network designed for different channel environments.

Although DL-based CSI feedback schemes can help the BS obtain more accurate downlink CSI, they still face the challenges of transmission overhead and spectrum utilization. Furthermore, Z. Zhong et al. [20] pointed out that although full channel reciprocity does not hold in FDD systems, partial reciprocity still exists. Hence, many studies [21, 22, 23, 24, 25] have explored predicting the downlink CSI from the uplink CSI. Y. Yang et al. [21] proposed a sparse complex-valued neural network (SCNet) to approximate the mapping function between uplink and downlink CSI and thereby reduce the transmission overhead. They also proposed intelligent algorithms based on meta-learning and transfer learning for multiple different wireless communication environments [22], in order to address the problem of limited datasets in new scenarios. Considering that the CSI matrix can be viewed as an in-phase/quadrature (I/Q) signal, Y. Zhang et al. [24] introduced a complex-valued network to exploit the implicit information between the two channels of the I/Q signal and improve the overall performance. Y. Yang et al. [25] developed a systematic framework based on deep multimodal learning to predict CSI from multi-source sensing information.

Neither the feedback nor the prediction solutions described above are perfect: the CSI feedback scheme causes extra spectrum overhead, while the accuracy of the CSI prediction scheme is limited. Hence, J. Wang et al. [26] proposed a compromise solution called SampleDL, which requires the user equipment (UE) to transmit sampled downlink CSI to assist the BS in improving the prediction accuracy. SampleDL aims to combine the advantages of feedback and prediction, which may reduce the feedback overhead and improve the system performance. Recently, some studies have turned to the application of the downlink CSI, conducting more specific research for the subsequent beamforming module at the BS instead of improving the accuracy of CSI acquisition [27, 28, 29, 30, 31]. W. Liu et al. [27] focused on the application of eigenvectors and proposed EVCsiNet to compress and feed back eigenvectors. J. Guo et al. [29] explored feedback schemes designed for beamforming (CsiFBnet) in both single-cell and multi-cell scenarios. Z. Liu et al. [30] proposed a novel deep unsupervised learning-based approach to optimize antenna selection and hybrid beamforming.

In this paper, we pay particular attention to the eigenvector and eigenmatrix obtained by the singular value decomposition (SVD), and propose an attention mechanism based intelligent channel feedback method designed for beamforming at the BS. Considering the applications of the downlink CSI at the BS, each UE is required to transmit useful and effective information to the BS rather than the full downlink CSI. The main contributions of this paper are summarized as follows:

  • We propose a CSI feedback architecture designed for beamforming, where the SVD transformation is utilized as a pre-processing module for the CSI matrix.

  • We propose a two-channel compressed feedback network using residual attention mechanism, which is suitable for the joint coding of multi-channel heterogeneous data.

  • We improve the reconstruction performance of codewords at the BS by switching between different auto-encoders for different channel types.

  • Compared with classical methods, the proposed method obtains better reconstruction performance with extremely low feedback overhead, which verifies the robustness of our proposed architecture.

II System Model And Problem Formulation

This section introduces the system model studied in this paper. First, the link-level channel model is introduced, which is based on a 3rd Generation Partnership Project (3GPP) technical report. Then, we introduce the SVD transformation and its application to beamforming and precoding matrix acquisition. Finally, the scientific issues to be addressed in this paper are described in detail.

II-A Link-level Channel Model

Considering a typical mmWave FDD MIMO communication system, we assume that the BS is equipped with $N_{t}$ antennas in the form of a uniform linear array (ULA), and the UE is equipped with $N_{r}$ antennas ($N_{t}\gg N_{r}$). We adopt the ULA model here for simpler illustration; nevertheless, the proposed approach is not restricted to a specific array shape. Meanwhile, the orthogonal frequency division multiplexing (OFDM) technique is applied to the link-level channel model. Then, the received signal at the UE can be expressed as,

$\bm{y}=\mathbf{H}\bm{x}+\bm{n}$ (1)

where $\mathbf{H}\in\mathbb{C}^{N_{RB}\times N_{r}\times N_{t}}$ is the downlink CSI between the BS and the UE, and $\bm{n}$ denotes the noise vector. For an OFDM system, multiple subcarriers and OFDM symbols must be considered; in this paper, resource blocks (RBs) are used as the channel matrix resolution. Thus, $N_{RB}$ represents the number of RBs used in the link-level channel model. Considering a single RB and a pair of transmit and receive antennas, a common multi-path fading channel model [32] is used and can be expressed as,

$\mathbf{H}=\sum_{n=1}^{N}\sum_{m=1}^{M}\sqrt{P_{n,m}}\left[c_{n,m}\, e^{j2\pi v_{n,m}t}\,\bm{\alpha}(\theta_{n,m})\right]$ (2)

where $N$ and $M$ denote the number of scattering clusters and ray paths, respectively, $P_{n,m}$ represents the power of the $m$-th ray in the $n$-th scattering cluster, $c_{n,m}$ is the coefficient calculated from the field patterns and initial random phases, $\theta_{n,m}$ is the corresponding azimuth angle-of-departure (AoD) of the ray path, and $v_{n,m}$ is the Doppler shift parameter determined by the UE speed. Then, the steering vector $\bm{\alpha}(\theta_{n,m})\in\mathbb{C}^{N_{t}\times 1}$ can be formulated as,

$\bm{\alpha}(\theta)=\left[1,e^{-j2\pi\frac{d}{\lambda}\sin(\theta)},\dots,e^{-j2\pi\frac{(N_{t}-1)d}{\lambda}\sin(\theta)}\right]$ (3)

where $d$ and $\lambda$ are the antenna element spacing and the carrier wavelength, respectively.
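As an illustration, the steering vector (3) and the clustered channel sum (2) can be sketched in NumPy. The function names and all dimension/parameter values below are illustrative assumptions, not the paper's simulation settings.

```python
import numpy as np

def steering_vector(theta, n_t, d_over_lambda=0.5):
    """ULA steering vector alpha(theta) of Eq. (3); theta in radians."""
    n = np.arange(n_t)
    return np.exp(-2j * np.pi * n * d_over_lambda * np.sin(theta))

def clustered_channel(powers, coeffs, dopplers, aods, t, n_t, d_over_lambda=0.5):
    """Single-RB channel of Eq. (2): a double sum over N clusters and M rays.
    powers, coeffs, dopplers, aods are (N, M) arrays of per-ray parameters."""
    h = np.zeros(n_t, dtype=complex)
    n_clusters, n_rays = powers.shape
    for n in range(n_clusters):
        for m in range(n_rays):
            h += (np.sqrt(powers[n, m]) * coeffs[n, m]
                  * np.exp(2j * np.pi * dopplers[n, m] * t)
                  * steering_vector(aods[n, m], n_t, d_over_lambda))
    return h
```

At boresight ($\theta=0$) the steering vector reduces to an all-ones vector, which is a quick sanity check for the phase convention.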

Based on this channel model, we further discuss the probability distribution of the LOS channel. Considering an urban macro (UMa) scenario defined by 3GPP TR 38.901 [33], we assume that the horizontal straight-line distance from the UE to the BS is $d_{2D}$ and the LOS probability is $\mathrm{Pr}_{LOS}$. If $d_{2D}\leq 18$ m, then $\mathrm{Pr}_{LOS}=1$; otherwise $\mathrm{Pr}_{LOS}$ can be calculated via

$\mathrm{Pr}_{LOS}=\left[\frac{18}{d_{2D}}+\exp\left(-\frac{d_{2D}}{63}\right)\left(1-\frac{18}{d_{2D}}\right)\right]\cdot\left[1+0.8\, C(h_{UT})\left(\frac{d_{2D}}{100}\right)^{3}\exp\left(-\frac{d_{2D}}{150}\right)\right]$ (4)

where $C(h_{UT})$ is given in (5), and $h_{UT}$ denotes the antenna height of the UE.

$C(h_{UT})=\begin{cases}0, & h_{UT}\leq 13~\mathrm{m}\\ \left(\frac{h_{UT}-13}{10}\right)^{1.5}, & 13~\mathrm{m}< h_{UT}\leq 28~\mathrm{m}\end{cases}$ (5)

Since $\mathrm{Pr}_{LOS}$ decays rapidly with $d_{2D}$, the NLOS channel is the more common scenario as mmWave systems become widespread.
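The LOS probability model of (4)-(5) translates directly into code. The sketch below implements the paper's expressions as written; distances and heights are in metres.

```python
import math

def c_hut(h_ut):
    """C(h_UT) of Eq. (5); valid for h_UT up to 28 m in this model."""
    if h_ut <= 13.0:
        return 0.0
    return ((h_ut - 13.0) / 10.0) ** 1.5

def prob_los(d_2d, h_ut):
    """UMa LOS probability of Eq. (4); Pr_LOS = 1 for d_2D <= 18 m."""
    if d_2d <= 18.0:
        return 1.0
    base = 18.0 / d_2d + math.exp(-d_2d / 63.0) * (1.0 - 18.0 / d_2d)
    height_term = 1.0 + 0.8 * c_hut(h_ut) * (d_2d / 100.0) ** 3 * math.exp(-d_2d / 150.0)
    return base * height_term
```

For a typical outdoor UE height of 1.5 m the height term vanishes, and the probability decays from 1 toward 0 with distance, matching the observation above.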

II-B Applications of SVD Transformation

This subsection shows the advantages of the SVD transformation and its application in wireless communication. To reduce the interference between multiple rays and increase the channel capacity of a massive MIMO system, the transmitter needs to use beamforming to precode the data flow according to the channel quality. A conventional precoding matrix is based on the SVD transformation of the CSI matrix.

Consider the channel model mentioned above with CSI matrix $\mathbf{H}\in\mathbb{C}^{N_{RB}\times N_{r}\times N_{t}}$. For simplicity of description, we discuss only one RB here (this assumption is only for a brief illustration of the SVD; the solution is also applicable to OFDM systems), i.e., $N_{RB}=1$ and $\mathbf{H}\in\mathbb{C}^{N_{r}\times N_{t}}$. First, the CSI matrix $\mathbf{H}$ undergoes the SVD transformation as

$\mathbf{H}=\mathbf{U}\cdot\mathbf{\Sigma}\cdot\mathbf{V}^{*}$ (6)

where $\mathbf{U}\in\mathbb{C}^{N_{r}\times N_{r}}$ and $\mathbf{V}\in\mathbb{C}^{N_{t}\times N_{t}}$ are the left-singular and right-singular matrices, respectively (both are referred to as eigenmatrices in the following), and $\mathbf{X}^{*}$ denotes the conjugate transpose of $\mathbf{X}$. Moreover, $\mathbf{U}\mathbf{U}^{*}=\mathbf{I}_{N_{r}}$ and $\mathbf{V}\mathbf{V}^{*}=\mathbf{I}_{N_{t}}$. Note that $\mathbf{\Sigma}=(\Lambda,0)$, where $\Lambda$ can be expressed as follows:

$\Lambda=\begin{pmatrix}\sqrt{\lambda_{1}}&\cdots&0\\ \vdots&\ddots&\vdots\\ 0&\cdots&\sqrt{\lambda_{N_{r}}}\end{pmatrix}_{N_{r}\times N_{r}}$ (7)

which is the singular value matrix, and we define the eigenvalues of $\mathbf{H}\mathbf{H}^{*}$ as $\mathbf{s}=[\lambda_{1},\lambda_{2},\cdots,\lambda_{N_{r}}]$. Next, the application of the SVD transformation is introduced in detail. The unitary matrices $\mathbf{V}$ and $\mathbf{U}$ are used as the precoding and combining matrices for the transmitter and receiver, respectively. When the BS needs to send the parallel data flow $\bm{x}=[x_{1},x_{2},\cdots,x_{N_{t}}]^{T}$ to the users, the right-singular matrix $\mathbf{V}$ is used for precoding: $\bm{x}_{t}=\mathbf{V}\cdot\bm{x}$. Then, we consider a typical signal transmission model as

$\bm{y}=\mathbf{H}\bm{x}_{t}+\bm{n}$ (8)

where $\bm{y}$ is the received data flow and $\bm{n}$ denotes the noise vector. Substituting (6) for the channel matrix $\mathbf{H}$, we obtain

$\bm{y}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{*}\mathbf{V}\cdot\bm{x}+\bm{n}=\mathbf{U}\mathbf{\Sigma}\cdot\bm{x}+\bm{n}$ (9)

Finally, the receiver uses $\mathbf{U}^{*}$ for receive combining, which can be expressed as

$\mathbf{U}^{*}\bm{y}=\mathbf{U}^{*}(\mathbf{U}\mathbf{\Sigma}\cdot\bm{x}+\bm{n})=\mathbf{\Sigma}\bm{x}+\mathbf{U}^{*}\bm{n}$ (10)

The noise component in (10) can be suppressed by the receiver. Therefore, the receiver can recover the data flow $\bm{x}$ from $\mathbf{\Sigma}\bm{x}$ using the known $\mathbf{\Sigma}$.

In summary, as a transmitter, the BS should pay particular attention to the eigenmatrix $\mathbf{V}$, which supports the subsequent precoding module; the singular values in $\mathbf{\Sigma}$ (i.e., the eigenvector $\mathbf{s}$) are applied to combining the data flow when the BS acts as a receiver.
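The diagonalization argument of (6)-(10) can be verified numerically: precoding with $\mathbf{V}$ and combining with $\mathbf{U}^{*}$ reduces a random channel to its singular values. The toy antenna sizes below are illustrative, and `numpy.linalg.svd` returns $\mathbf{V}^{*}$ directly as its third output.

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_t = 2, 8                       # UE / BS antennas (toy sizes, N_t >> N_r)
H = (rng.standard_normal((n_r, n_t))
     + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)

# Eq. (6): H = U Sigma V*, with U, V unitary
U, s, Vh = np.linalg.svd(H)
V = Vh.conj().T

x = np.zeros(n_t, dtype=complex)
x[:n_r] = [1.0, -1.0]                 # at most N_r streams carry data
x_t = V @ x                           # precoding at the transmitter: x_t = V x
y = H @ x_t                           # noiseless version of Eq. (8)
r = U.conj().T @ y                    # receive combining, Eq. (10): r = Sigma x

print(np.allclose(r, s * x[:n_r]))    # True: the effective channel is diagonal
```

Each received stream is simply the transmitted symbol scaled by the corresponding singular value, which is exactly why feeding back $\mathbf{V}$ and $\mathbf{s}$ suffices for the BS.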

II-C Problem Formulation

As analyzed above, the CSI matrix $\mathbf{H}$ of a massive MIMO system is too complex to be compressed and reconstructed accurately at the BS. Meanwhile, considering the specific application of the downlink CSI at the BS, we find that the BS actually needs a perfect eigenmatrix $\mathbf{V}$ (the right-singular matrix) and eigenvector $\mathbf{s}$. Moreover, the eigenmatrix $\mathbf{V}$ is a unitary matrix, which is structured and easily compressible, and the eigenvector $\mathbf{s}$ is a simple real-valued vector. Therefore, this paper proposes to jointly compress the unitary matrix $\mathbf{V}$ and the corresponding eigenvector $\mathbf{s}$, and to feed the codewords back to the BS.

Although downlink channel estimation is challenging, this topic is beyond the scope of this paper. We assume that perfect CSI has been acquired and focus on the feedback scheme. Considering the classical mmWave massive MIMO FDD system described above, we focus on the eigenmatrix $\mathbf{V}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}}$ and the eigenvector $\mathbf{S}\in\mathbb{R}^{N_{RB}\times N_{r}}$. The UE needs to deploy an encoder to jointly encode $\mathbf{V}$ and $\mathbf{S}$, which can be formulated as,

$\varepsilon=f_{en}\left(\mathbf{V},\mathbf{S},\Theta_{en}\right)$ (11)

where $\varepsilon$ represents the codewords encoded by the UE, $\Theta_{en}$ is the weight parameter of the encoder, and $f_{en}(\cdot)$ stands for the encoder framework. The role of the encoder is to extract high-dimensional features from $\mathbf{V}$ and $\mathbf{S}$ respectively, and to learn a suitable mapping function converting them into codewords. Upon receiving the codewords $\varepsilon$, the BS switches to the corresponding decoder to interpret them and obtain the required $\mathbf{V}$ and $\mathbf{S}$. The decoder can be expressed as,

$\widehat{\mathbf{V}},\widehat{\mathbf{S}}=f_{de}\left(f_{en}\left(\mathbf{V},\mathbf{S},\Theta_{en}\right),\Theta_{de}\right)$ (12)

where $\widehat{\mathbf{V}},\widehat{\mathbf{S}}$ are the reconstructed eigenmatrix and eigenvector at the BS, $f_{de}(\cdot)$ denotes the decoder framework, and $\Theta_{de}$ is the corresponding weight value.

Figure 1: Overview of the proposed EMEV feedback architecture. The CSI matrix $\mathbf{H}$ is estimated from pilots and then divided into the eigenmatrices $\mathbf{U},\mathbf{V}$ and the eigenvector $\mathbf{S}$ by the SVD transformation. $\mathbf{U}$ and $\mathbf{S}$ are used for channel identification, and the encoder jointly encodes $\mathbf{V}$ and $\mathbf{S}$ into codewords $\varepsilon$. The BS deploys the decoder to reconstruct $\widehat{\mathbf{V}},\widehat{\mathbf{S}}$.

This paper utilizes a neural network (NN) based encoder and decoder to complete the compression and feedback of $\mathbf{V}$ and $\mathbf{S}$. When training the NN-based auto-encoder, the loss function used by the optimizer is the mean squared error (MSE), which can be expressed as,

$MSE=\mathbb{E}\left[\Gamma\left(\|\mathbf{V}-\widehat{\mathbf{V}}\|^{2}_{2},\|\mathbf{S}-\widehat{\mathbf{S}}\|^{2}_{2}\right)\right]$ (13)

where $\mathbb{E}(\cdot)$ stands for the mathematical expectation, $\|\cdot\|^{2}_{2}$ denotes the squared Euclidean norm, and $\Gamma(\cdot)$ is a joint loss estimation function, in general a weighted average. The main problem explored is to solve for the optimal weights of the NN-based encoder and decoder, which can be formulated by,

$\left(\Theta_{en}^{*},\Theta_{de}^{*}\right)=\mathop{\arg\min}\limits_{\Theta_{en},\Theta_{de}}\mathbb{E}\left[\Gamma\left(\|\mathbf{V}-\widehat{\mathbf{V}}\|^{2}_{2},\|\mathbf{S}-\widehat{\mathbf{S}}\|^{2}_{2}\right)\right]$ (14)

where $\Theta_{en}^{*}$ and $\Theta_{de}^{*}$ are the optimal weights of the encoder and decoder, respectively.
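A minimal sketch of the joint loss in (13), assuming $\Gamma(\cdot)$ is a weighted average of the two squared reconstruction errors (the weights `w_v` and `w_s` are illustrative assumptions, not values from the paper):

```python
import numpy as np

def joint_mse(V, V_hat, S, S_hat, w_v=0.5, w_s=0.5):
    """Joint loss of Eq. (13), with Gamma taken as a weighted average.
    V, V_hat are complex eigenmatrices; S, S_hat are real eigenvectors."""
    err_v = np.mean(np.abs(V - V_hat) ** 2)   # mean squared error on V
    err_s = np.mean(np.abs(S - S_hat) ** 2)   # mean squared error on S
    return w_v * err_v + w_s * err_s
```

The loss is zero exactly when both reconstructions are perfect, and the weights let the optimizer trade eigenmatrix accuracy against eigenvector accuracy.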

III DL-based EMEV Feedback Architecture

This section describes the proposed DL-based eigenmatrix and eigenvector (EMEV) feedback architecture in detail. Based on the SVD transformation and its application to beamforming, we pay particular attention to the eigenmatrix $\mathbf{V}$ and the eigenvector $\mathbf{S}$. First, an overview of the proposed EMEV feedback architecture is given. Then, the NN designed for the EMEV auto-encoder is presented and analyzed module by module.

III-A Overview of The Proposed Architecture

This part gives an overview of the DL-based EMEV feedback architecture, whose aim is an efficient feedback scheme for beamforming. As shown in Fig. 1, the whole process starts when the UE estimates the real-time downlink CSI from pilots. The CSI matrix $\mathbf{H}$ is then divided into the eigenmatrices $\mathbf{U},\mathbf{V}$ and the eigenvector $\mathbf{S}$ by the SVD transformation. From the figure we can see that $\mathbf{H}$ is complex and irregular, but $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices and $\mathbf{S}$ exhibits a scatter distribution. Furthermore, the power distributions of $\mathbf{U}$ and $\mathbf{V}$ are symmetric. Next, the UE feeds $\mathbf{U}$ and $\mathbf{S}$ into the NN-based channel identification to obtain the exact channel type. The detailed NN-based channel identification, called EMEV-IdNet, is described in our conference paper [34]. Considering the clustered delay line (CDL) channel model compliant with the 5G new radio (NR) standard [33], we explore five common channel types, composed of three non-line-of-sight (NLOS) channels and two line-of-sight (LOS) channels. Since the eigenmatrix distributions of the five channel types are quite different [35], we cascade the channel identification before the EMEV encoder to improve the system performance. After channel identification, the UE jointly encodes $\mathbf{V}$ and $\mathbf{S}$ into codewords and feeds them back to the BS. As analyzed above, $\mathbf{V}$ and $\mathbf{S}$ meet the requirements of beamforming and communication at the BS. Finally, the BS receives and decodes the codewords $\varepsilon$ and reconstructs the eigenmatrix $\widehat{\mathbf{V}}$ and the eigenvector $\widehat{\mathbf{S}}$. The algorithm flow of the proposed EMEV feedback architecture is described in Algorithm 1.

Figure 2: Overall framework of the DL-based EMEVNet. Feature extraction module: inputs are $\mathbf{V}$ and $\mathbf{S}$; outputs $\xi_{V}$ and $\xi_{S}$ are the high-dimensional features of $\mathbf{V}$ and $\mathbf{S}$, respectively. Transcoding module: inputs are $\xi_{V}$ and $\xi_{S}$; output is the codeword $\varepsilon$. Decoder module: input is the codeword $\varepsilon$ received at the BS; outputs are the reconstructed $\widehat{\mathbf{V}}$ and $\widehat{\mathbf{S}}$.
Input: $\mathbf{H}\in\mathbb{C}^{N_{RB}\times N_{r}\times N_{t}}$ ← CSI matrix;
Output: $\widehat{\mathbf{S}}\in\mathbb{R}^{N_{RB}\times N_{r}}$ ← reconstructed eigenvector; $\widehat{\mathbf{V}}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}}$ ← reconstructed eigenmatrix; $id$ ← channel type;
1 Stage I: UE operations:
2 SVD transformation: initialize $\mathbf{U}\in\mathbb{C}^{N_{RB}\times N_{r}\times N_{r}}$, $\mathbf{S}\in\mathbb{R}^{N_{RB}\times N_{r}}$, $\mathbf{V}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}}$;
3 for $i=1,\cdots,N_{RB}$ do
4   $\mathbf{U}_{t},\bm{S}_{t},\mathbf{V}_{t}=f_{svd}(\mathbf{H}(i,:,:))$
5   if $\mathbf{U}_{t}\cdot\bm{S}_{t}\cdot\mathbf{V}_{t}^{*}==\mathbf{H}(i,:,:)$ then
6     $\mathbf{U}(i,:,:)=\mathbf{U}_{t}$; $\mathbf{S}(i,:)=\bm{S}_{t}$; $\mathbf{V}(i,:,:)=\mathbf{V}_{t}$
7   end if
8 end for
9 Save $\mathbf{U},\mathbf{S},\mathbf{V}$.
10 Channel identification: load the trained EMEV-IdNet and identify the channel type $id\leftarrow f_{id}(\mathbf{U},\mathbf{S})$;
11 Encoder: switch to the appropriate encoder according to $id$ and generate the feedback codewords $\varepsilon=f_{en}(\mathbf{V},\mathbf{S})$;
12 Stage II: BS operations:
13 Decoder: switch to the appropriate decoder according to $id$ and reconstruct the precoding matrix $\left(\widehat{\mathbf{V}},\widehat{\mathbf{S}}\right)=f_{de}(\varepsilon)$
Algorithm 1: The proposed channel feedback algorithm based on EMEV features.
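Stage I of Algorithm 1 amounts to a per-RB SVD with a reconstruction check. A NumPy sketch could look as follows; the function name `stage_one_svd` is an illustrative label, and `numpy.linalg.svd` returns $\mathbf{V}^{*}$ directly, so it is transposed back to obtain $\mathbf{V}$.

```python
import numpy as np

def stage_one_svd(H):
    """Per-RB SVD of H in C^{N_RB x N_r x N_t}, collecting U, S, V
    (the reconstruction check mirrors line 5 of Algorithm 1)."""
    n_rb, n_r, n_t = H.shape
    U = np.zeros((n_rb, n_r, n_r), dtype=complex)
    S = np.zeros((n_rb, n_r))
    V = np.zeros((n_rb, n_t, n_t), dtype=complex)
    for i in range(n_rb):
        u, s, vh = np.linalg.svd(H[i])
        # check that U Sigma V* rebuilds the i-th RB before storing
        sigma = np.zeros((n_r, n_t))
        np.fill_diagonal(sigma, s)
        assert np.allclose(u @ sigma @ vh, H[i])
        U[i], S[i], V[i] = u, s, vh.conj().T
    return U, S, V
```

The outputs have exactly the shapes assumed by the encoder: $\mathbf{U}\in\mathbb{C}^{N_{RB}\times N_{r}\times N_{r}}$, $\mathbf{S}\in\mathbb{R}^{N_{RB}\times N_{r}}$ and $\mathbf{V}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}}$.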

III-B The Proposed DL-based EMEV Feedback Network

This subsection shows the overall framework of the proposed DL-based EMEV feedback neural network, called EMEVNet, which is illustrated in Fig. 2.

As described in Fig. 2, EMEVNet is an auto-encoder composed of an encoder at the UE and a decoder at the BS. The encoder can be further divided into a feature extraction module and a transcoding module. We design the feature extraction module with a dual-channel input layer, where different convolution layers are used for the different inputs: a three-dimensional convolution layer (Conv3D) for $\mathbf{V}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}}$ and a two-dimensional convolution layer (Conv2D) for $\mathbf{S}\in\mathbb{R}^{N_{RB}\times N_{r}}$. The high-dimensional feature maps after the convolution layers are compressed into the one-dimensional features $\xi_{V}$ and $\xi_{S}$ by fully-connected layers. Thus, the feature extraction module can be described as,

$\left(\xi_{V},\xi_{S}\right)=\mathcal{L}_{fc}\left[\mathcal{L}_{conv}\left(\mathbf{V},\mathbf{S},\Omega_{conv}\right),\Omega_{fc}\right]$ (15)

where $\mathcal{L}_{fc}(\cdot)$ and $\mathcal{L}_{conv}(\cdot)$ represent the fully-connected and convolution layers respectively, and $\Omega_{fc},\Omega_{conv}$ are their corresponding weight values. Then, $\xi_{V}$ and $\xi_{S}$ pass through the transcoding module, which outputs the codewords $\varepsilon$ at the specified system compression ratio. Attention-based residual blocks (the attention mechanism is described separately when analyzing the transcoding module) and a fully-connected layer are combined into the transcoding module, which is formulated as,

$\varepsilon=\mathcal{L}_{fc}\left[\mathcal{L}_{att}^{(5)}\left(\xi_{V},\xi_{S},\Omega_{att}\right),\beta_{CR},\Omega_{fc}\right]$ (16)

where $\mathcal{L}_{att}^{(5)}$ indicates 5 loops of the attention residual block, $\beta_{CR}$ is the system compression ratio, and $\Omega_{att}$ is the corresponding weight value. Therefore, the length of $\varepsilon$ is determined by the system $\beta_{CR}$, which can be defined as,

$L_{\varepsilon}=\frac{L[\Re({\mathbf{V}})]+L[\Im({\mathbf{V}})]+L[\mathbf{S}]}{\beta_{CR}}+id$ (17)

where $L[\cdot]$ denotes the length of a variable, $\Re(\cdot)$ and $\Im(\cdot)$ are the real and imaginary parts of complex numbers, $L_{\varepsilon}$ is the length of the codewords $\varepsilon$, and $id$ represents the control symbol carrying the channel identification result. Finally, the BS can reconstruct the eigenmatrix $\widehat{\mathbf{V}}$ and the eigenvector $\widehat{\mathbf{S}}$ from the received $\varepsilon$ by utilizing the decoder. The decoder is composed of fully-connected layers, convolutional residual blocks and convolution layers, which can be written as,

$\left(\widehat{\mathbf{V}},\widehat{\mathbf{S}}\right)=\mathcal{L}_{conv}\left\{\mathcal{L}_{res}^{(3)}\left[\mathcal{L}_{fc}\left(\varepsilon,\Omega_{fc}\right),\Omega_{res}\right],\Omega_{conv}\right\}$ (18)

where $\mathcal{L}_{res}^{(3)}$ stands for 3 loops of the convolutional residual block and $\Omega_{res}$ is its weight values. In summary, the concrete steps of the EMEVNet algorithm are given in Algorithm 2.
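The codeword length of (17) follows directly from the tensor shapes. In this sketch the helper name and the argument values in the usage note are illustrative assumptions; the real and imaginary parts of $\mathbf{V}$ each contribute $N_{RB}N_{t}^{2}$ entries, $\mathbf{S}$ contributes $N_{RB}N_{r}$, and the channel id adds one control symbol as in (17).

```python
def codeword_length(n_rb, n_t, n_r, beta_cr, ch_id=1):
    """Codeword length of Eq. (17): (L[Re(V)] + L[Im(V)] + L[S]) / beta_CR + id."""
    l_v = n_rb * n_t * n_t      # entries of V (counted once per real/imag part)
    l_s = n_rb * n_r            # entries of S
    return (2 * l_v + l_s) / beta_cr + ch_id
```

For example, with $N_{RB}=52$, $N_{t}=32$, $N_{r}=4$ and $\beta_{CR}=16$ (illustrative values), the codeword carries 6670 symbols, far fewer than the raw CSI matrix.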

Input: $\mathbf{V}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}}$ ← eigenmatrix; $\mathbf{S}\in\mathbb{R}^{N_{RB}\times N_{r}}$ ← eigenvector; $\eta$ ← initial learning rate; $\tau$ ← maximum epoch number; $\beta_{CR}$ ← system compression ratio;
Output: $\widehat{\mathbf{V}}$ → reconstructed eigenmatrix; $\widehat{\mathbf{S}}$ → reconstructed eigenvector; $\left(\Theta_{en}^{*},\Theta_{de}^{*}\right)$ → trained auto-encoder parameters;
1 Training stage:
2 Load $\mathbf{V}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}}$, $\mathbf{S}\in\mathbb{R}^{N_{RB}\times N_{r}}$;
3 Randomly initialize the NN weight parameters $\Theta_{en},\Theta_{de}$;
4 for $t=1,\cdots,\tau$ do
5   $\left(\xi_{V},\xi_{S}\right)=\mathcal{L}_{fc}\left[\mathcal{L}_{conv}\left(\mathbf{V},\mathbf{S},\Omega_{conv}\right),\Omega_{fc}\right]$
6   $\varepsilon=\mathcal{L}_{fc}\left[\mathcal{L}_{att}^{(5)}\left(\xi_{V},\xi_{S},\Omega_{att}\right),\beta_{CR},\Omega_{fc}\right]$
7   $\widehat{\mathbf{V}},\widehat{\mathbf{S}}=\mathcal{L}_{conv}\left\{\mathcal{L}_{res}^{(3)}\left[\mathcal{L}_{fc}\left(\varepsilon,\Omega_{fc}\right),\Omega_{res}\right],\Omega_{conv}\right\}$
8   $loss_{t}=\mathbb{E}\left[\Gamma\left(\|\mathbf{V}-\widehat{\mathbf{V}}\|^{2}_{2},\|\mathbf{S}-\widehat{\mathbf{S}}\|^{2}_{2}\right)\right]$
9   if $loss_{t}$ converges to $loss^{*}$ then break;
10  if $loss_{t}$ has not improved after 20 epochs then $\eta=\eta\times 0.7$;
11  $\Omega\leftarrow\mathrm{Adam}(\Omega,\eta,\nabla loss_{t})$
12 end for
13 $\left[\Theta_{en}^{*},\Theta_{de}^{*}\right]\leftarrow\left[\Omega_{fc},\Omega_{att},\Omega_{conv},\Omega_{res}\right]$;
14 Save $f_{en}(\beta_{CR},\Theta_{en}^{*})$, $f_{de}(\beta_{CR},\Theta_{de}^{*})$.
Algorithm 2: The proposed EMEVNet training algorithm for the eigenmatrix and eigenvector feedback architecture.
TABLE I: Hyper-parameter settings and the analysis of parameter sizes and FLOPs for the feature extraction module.
Layer name | Hyper-parameters | Activation | Output shape | Parameter size | FLOPs
Input($\mathbf{V}$) | — | — | $N_{RB}\times N_{t}\times N_{t}\times 2$ | — | —
Input($\mathbf{S}$) | — | — | $N_{RB}\times N_{r}\times 1$ | — | —
Conv3D_1 | Filter = 2, Kernel = 3 | Leaky ReLU | $N_{RB}\times N_{t}\times N_{t}\times 2$ | $2\times 2\times 3^{2}$ | $(N_{RB}\times N_{t}\times N_{t}\times 2)\times(2\times 3^{2})$
Conv2D_1 | Filter = 2, Kernel = 3 | Leaky ReLU | $N_{RB}\times N_{r}\times 2$ | $2\times 2\times 3^{2}$ | $(N_{RB}\times N_{r}\times 2)\times(2\times 3^{2})$
Conv3D_2 | Filter = 8, Kernel = 3 | Leaky ReLU | $N_{RB}\times N_{t}\times N_{t}\times 8$ | $8\times 2\times 3^{2}$ | $(N_{RB}\times N_{t}\times N_{t}\times 8)\times(2\times 3^{2})$
Conv2D_2 | Filter = 8, Kernel = 3 | Leaky ReLU | $N_{RB}\times N_{r}\times 8$ | $8\times 2\times 3^{2}$ | $(N_{RB}\times N_{r}\times 8)\times(2\times 3^{2})$
FCLayer_1($\mathbf{V}$) | Units = $L_{\xi_{V}}$ | ReLU | $L_{\xi_{V}}\times 1$ | $[N_{RB}\times N_{t}\times N_{t}\times 8]\times L_{\xi_{V}}$ | $2\times[N_{RB}\times N_{t}\times N_{t}\times 8]\times L_{\xi_{V}}$
FCLayer_1($\mathbf{S}$) | Units = $L_{\xi_{S}}$ | ReLU | $L_{\xi_{S}}\times 1$ | $[N_{RB}\times N_{r}\times 8]\times L_{\xi_{S}}$ | $2\times[N_{RB}\times N_{r}\times 8]\times L_{\xi_{S}}$

III-C Analysis and Discussion of EMEVNet

In this subsection, the feature extraction, transcoding and decoding modules are discussed one by one. We analyze the proposed NN from two aspects: time complexity and space complexity [36]. Time complexity refers to the number of operations of the model, measured in FLOPs, i.e., the number of floating-point operations. Space complexity includes two parts: the parameter quantity and the output feature maps. Meanwhile, the hyper-parameters and activation functions set for each layer are given in detail.

III-C1 Feature extraction module

As shown in Fig. 2, the feature extraction module includes two 2D convolution layers, two 3D convolution layers and two fully-connected layers. The time complexities of the 2D and 3D convolution layers can be respectively expressed as,

Time\left\{Conv2D\right\}\sim O\left(S_{M}\cdot K^{2}\cdot C_{in}\cdot C_{out}\right) (19)
Time\left\{Conv3D\right\}\sim O\left(V_{M}\cdot K^{3}\cdot C_{in}\cdot C_{out}\right) (20)

where S_{M} and V_{M} are the area and volume of the output feature map of the convolution layer, respectively, K is the size of the convolution kernel, C_{in} denotes the number of input channels from the upper layer, and C_{out} represents the number of output channels. Considering the fully-connected layer, its time complexity can be formulated as,

Time\left\{fc\right\}\sim O\left(2\times L_{in}\times L_{out}\right) (21)

where L_{in} and L_{out} are the input and output tensor lengths of the fully-connected layer. Having introduced the time complexity, we now turn to the space complexity. Eq. (22) to Eq. (24) give the space complexity of the 2D convolution layer, the 3D convolution layer, and the fully-connected layer, respectively.

Space\left\{Conv2D\right\}\sim O\left(K^{2}\times C_{in}\times C_{out}\right) (22)
Space\left\{Conv3D\right\}\sim O\left(K^{3}\times C_{in}\times C_{out}\right) (23)
Space\left\{fc\right\}\sim O\left(L_{in}\times L_{out}\right) (24)

From the above analysis, we find that the convolution layers require more FLOPs, but their space complexity is relatively low. On the contrary, the space complexity of the fully-connected layer is larger, which implies a high memory overhead. The hyper-parameters of the feature extraction module are listed in Tab. I, together with the complexity analysis.
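The per-layer cost formulas in Eqs. (19) to (24) are easy to evaluate numerically. The following minimal Python sketch (the helper names are ours, not from the paper's code; the dimensions N_RB = 13 and N_t = 64 follow Tab. IV) computes FLOPs and parameter counts:

```python
# Sketch of the complexity formulas in Eqs. (19)-(24); helper names
# are illustrative, dimensions follow Tab. IV.

def conv2d_cost(s_m, k, c_in, c_out):
    """Eqs. (19)/(22): Time ~ S_M*K^2*C_in*C_out, Space ~ K^2*C_in*C_out."""
    return s_m * k ** 2 * c_in * c_out, k ** 2 * c_in * c_out

def conv3d_cost(v_m, k, c_in, c_out):
    """Eqs. (20)/(23): Time ~ V_M*K^3*C_in*C_out, Space ~ K^3*C_in*C_out."""
    return v_m * k ** 3 * c_in * c_out, k ** 3 * c_in * c_out

def fc_cost(l_in, l_out):
    """Eqs. (21)/(24): Time ~ 2*L_in*L_out, Space ~ L_in*L_out."""
    return 2 * l_in * l_out, l_in * l_out

# Example: a Conv3D layer on the eigenmatrix branch (Kernel = 3,
# 2 input and 2 output channels, as in Conv3D_1 of Tab. I).
n_rb, n_t = 13, 64
v_m = n_rb * n_t * n_t                      # output volume per channel
flops, params = conv3d_cost(v_m, k=3, c_in=2, c_out=2)
```

Evaluating these expressions confirms the trade-off discussed above: the convolution layers dominate the FLOPs through the S_M and V_M factors, while the fully-connected layers dominate the parameter count.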

TABLE II: The hyper-parameters setting and analysis of parameters and FLOPs for transcoding module.
Layer name Hyper-parameters Activation Output shape Parameter size FLOPs
Input(\xi_{V})   L_{\xi_{V}}\times 1
Input(\xi_{S})   L_{\xi_{S}}\times 1
Attention_res(\mathbf{V},\mathbf{S})  Head_num = 2, Key_dim = 3  L_{\xi_{V}}\times 1  2\times(L^{2}_{\xi_{V}}+L^{2}_{\xi_{S}})  8\times(L^{2}_{\xi_{V}}+L^{2}_{\xi_{S}})
Attention_res(\mathbf{V},\mathbf{V})  L_{\xi_{V}}\times 1  2\times(L^{2}_{\xi_{V}}+L^{2}_{\xi_{V}})  8\times(L^{2}_{\xi_{V}}+L^{2}_{\xi_{V}})
FCLayer_codewords  Units = L_{\varepsilon}  Linear  L_{\varepsilon}\times 1  (L_{\varepsilon}\times 1)\times(L_{\varepsilon}\times 1)  2\times(L_{\varepsilon}\times 1)\times(L_{\varepsilon}\times 1)
TABLE III: The hyper-parameters setting and analysis of parameters and FLOPs for decoder module.
Layer name Hyper-parameters Activation Output shape Parameter size FLOPs
Input(\varepsilon)   L_{\varepsilon}\times 1
FCLayer_2(\mathbf{V})  Units = [N_{RB}\cdot N_{t}\cdot N_{t}\cdot 2]  Linear  [N_{RB}\cdot N_{t}\cdot N_{t}\cdot 2]  L_{\varepsilon}\times[N_{RB}\cdot N_{t}\cdot N_{t}\cdot 2]  2\times L_{\varepsilon}\times[N_{RB}\cdot N_{t}\cdot N_{t}\cdot 2]
FCLayer_2(\mathbf{S})  Units = [N_{RB}\cdot N_{r}\cdot 1]  Linear  [N_{RB}\cdot N_{r}]  L_{\varepsilon}\times[N_{RB}\cdot N_{r}]  2\times L_{\varepsilon}\times[N_{RB}\cdot N_{r}]
Conv3D_res  Filter = [2,8,2], Kernel = 3  N_{RB}\times N_{t}\times N_{t}\times 2  (2+8+2)\times 2\times 3^{2}  (N_{RB}\times N_{t}\times N_{t}\times 12)\times(12\times 3^{2})
Conv2D_res  N_{RB}\times N_{r}\times 2  (1+8+2)\times 2\times 3^{2}  (N_{RB}\times N_{r}\times 11)\times(11\times 3^{2})
Conv3D_3  Filter = 2, Kernel = 3  Tanh  N_{RB}\times N_{t}\times N_{t}\times 2  2\times 2\times 3^{2}  (N_{RB}\times N_{t}\times N_{t}\times 2)\times(2\times 3^{2})
Conv2D_3  Filter = 2, Kernel = 3  Linear  N_{RB}\times N_{r}  1\times 2\times 3^{2}  (N_{RB}\times N_{r}\times 1)\times(2\times 3^{2})

III-C2 Transcoding module

Before describing the transcoding module, we briefly introduce the attention mechanism, which plays an important role in the transcoding task. The attention mechanism [37] was proposed by Bengio's team and has been widely used in various fields of deep learning in recent years, e.g., capturing the receptive field on images in computer vision, or locating key tokens and features in natural language processing (NLP). Fig. 3 shows the detailed tensor flow of the attention mechanism.

Figure 3: Illustration of the attention mechanism. The query (\bm{q}), key (\bm{k}), and value (\bm{v}) are input tensors, and the attention value (\bm{z}) is the output tensor.

First, the attention distribution \bm{s} between the query vector \bm{q} and the key vector \bm{k} needs to be obtained, which can be calculated via,

\bm{s}_{i}=f(\bm{q},\bm{k}_{i})=\begin{cases}\bm{q}^{T}\bm{k}_{i}\\ \bm{q}^{T}\bm{W}\bm{k}_{i}\\ \left[\bm{q}^{T}\bm{k}_{i}\right]/\sqrt{d}\\ \bm{v}\cdot\tanh(\bm{W}\bm{q}+\bm{U}\bm{k}_{i})\end{cases} (25)

where \bm{W}, \bm{U}, and \bm{v} are trainable weight coefficients in the neural network, and d denotes the input dimension. Then, the attention distribution \bm{s} is normalized into the attention score \bm{a}, which can be written as,

\bm{a}_{i}={\rm softmax}\left[f(\bm{q},\bm{k}_{i})\right]=\frac{e^{f(\bm{q},\bm{k}_{i})}}{\sum_{j}e^{f(\bm{q},\bm{k}_{j})}} (26)

Finally, the output \bm{z} of the attention mechanism is the weighted average of the value vectors \bm{v}, which is shown as,

\bm{z}={\rm Attention}(\bm{q},\bm{k},\bm{v})=\sum_{i}\bm{a}_{i}\cdot\bm{v}_{i} (27)

In short, the attention mechanism is designed to give larger weights to the parts that deserve attention, highlighting important information and ignoring the rest. L. Chen et al. [38] explored the combination of CNN and the attention mechanism, which achieved excellent performance. Y. Cui et al. [39] applied the attention mechanism to a CSI feedback solution and demonstrated its performance improvement.
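As a concrete illustration, the scaled dot-product branch of Eq. (25), the softmax of Eq. (26), and the weighted average of Eq. (27) can be sketched in a few lines of NumPy (toy vectors, not the paper's data):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product variant of Eqs. (25)-(27):
    s_i = q^T k_i / sqrt(d), a = softmax(s), z = sum_i a_i v_i."""
    d = q.shape[-1]
    s = k @ q / np.sqrt(d)        # attention distribution, Eq. (25)
    a = np.exp(s - s.max())
    a = a / a.sum()               # normalized attention score, Eq. (26)
    z = a @ v                     # weighted average of values, Eq. (27)
    return z, a

q = np.array([1.0, 0.0])                              # one query
k = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])    # three keys
v = np.array([[1.0], [2.0], [3.0]])                   # matching values
z, a = attention(q, k, v)
```

Keys aligned with the query receive larger scores, so z is pulled toward their values, which is exactly the "highlight important information" behavior described above.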

As shown in Fig. 4, the transcoding module consists of two attention residual blocks and a fully-connected layer. The attention residual block receives two parallel input tensors \mathbf{X} and \mathbf{X}_{key}. If \mathbf{X}=\mathbf{X}_{key}, we call it a self-attention residual block, which focuses on the hidden information of the tensor itself. If \mathbf{X}\neq\mathbf{X}_{key}, the attention residual block embeds the feature information of \mathbf{X}_{key} into \mathbf{X}, and is called a cross-attention residual block. First, we design a cross-attention residual block to embed the information of \xi_{S} into \xi_{V}. Then, a self-attention residual block follows to explore the hidden features of \xi_{V} itself.
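A minimal single-head sketch of such a block is given below. The projection matrices w_q, w_k, w_v are our own illustrative stand-ins for the trainable weights; the paper's blocks use multi-head attention with Head_num = 2 (Tab. II), so this is only a structural sketch:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_res_block(x, x_key, w_q, w_k, w_v):
    """Single-head sketch of the attention residual block:
    x == x_key gives self-attention, otherwise cross-attention."""
    q, k, v = x @ w_q, x_key @ w_k, x_key @ w_v
    a = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return x + a @ v              # residual connection around attention

rng = np.random.default_rng(0)
L, d = 8, 4
x     = rng.standard_normal((L, d))   # stand-in for xi_V features
x_key = rng.standard_normal((L, d))   # stand-in for xi_S features
w = [rng.standard_normal((d, d)) for _ in range(3)]

cross = attention_res_block(x, x_key, *w)  # embeds S features into V
self_ = attention_res_block(x, x, *w)      # explores V's own features
```

The only difference between the two calls is the key/value source, which mirrors the cross-attention-then-self-attention ordering described above.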

Figure 4: Illustration of the transcoding module and attention residual block.

The time complexity and space complexity of the multi-head attention layer can be respectively formulated as,

Time\left\{att\right\}\sim O\left(L_{in}^{2}\times d_{in}\right) (28)
Space\left\{att\right\}\sim O\left(4\times L_{in}^{2}\times d_{in}\right) (29)

where L_{in} is the length of the input tensor, and d_{in} denotes the value of the hyper-parameter head\_num. The hyper-parameters of the transcoding module are listed in Tab. II, together with the complexity analysis.

III-C3 Decoder module

In this part we discuss the decoder module of EMEVNet. The decoder is deployed at the BS to reconstruct the eigenmatrix \widehat{\mathbf{V}} and the eigenvector \widehat{\mathbf{S}}. Fig. 5 shows the decoder module and the convolutional residual block applied in it. We design two different branches to reconstruct \widehat{\mathbf{V}} and \widehat{\mathbf{S}}, respectively. Two different fully-connected layers are utilized to extract the high-dimensional features of \mathbf{V} and \mathbf{S} from the codewords \varepsilon, and convolutional residual blocks are then designed to reconstruct \widehat{\mathbf{V}} and \widehat{\mathbf{S}}.

Figure 5: Illustration of the decoder module and convolutional residual block.

The decoder module relies on convolution and fully-connected layers, whose complexity analysis has been given above. The hyper-parameters of the decoder module are listed in Tab. III, together with the complexity analysis.

IV Simulation Results and Discussions

This section presents the simulation experiments of the proposed EMEV feedback NN (EMEVNet) in detail, including the simulation platform, dataset generation, parameter settings, and performance evaluation. We then give some analysis and discussion of the results, covering feasibility, superiority, and robustness. Meanwhile, this section exhibits all numerical results corresponding to the simulation experiments. The datasets used in this paper and the simulation codes can be found on GitHub (https://github.com/CodeDwan/EMEV-feedback).

IV-A Parameters Setting

IV-A1 Simulation platform

All simulations and experiments are carried out on a workstation running CentOS 7.0, equipped with two Intel(R) Xeon(R) Silver 4210R CPUs, four Nvidia RTX 2080Ti GPUs, and 256 GB of random access memory (RAM).

IV-A2 Datasets generation

With the help of the MATLAB 5G Toolbox and Communication Toolbox, we define a standard CDL channel object and carry out link-level simulation. The datasets used in our experiments are extracted from the link-level simulator. Tab. IV shows the alternative parameters and default values of the data generator. Both the UE and BS antennas follow a uniform panel array (UPA) distribution.

TABLE IV: Simulation experiment parameter setting
Channel environment NLOS LOS
CDL-A CDL-B CDL-C CDL-D CDL-E
N_{RB} 13
Center frequency 28 GHz
Subcarrier spacing 60 kHz
UE speeds {4.8, 24, 40, 60} km/h
Delay spreads 129 ns 634 ns 634 ns 65 ns 65 ns
BS antenna UPA [8,8] = 64
UE antenna UPA [2,2] = 4

In total, we generate 60,000 data samples for each CDL channel using MATLAB. The 60,000 samples of each channel type are divided into 50,000 and 10,000. We utilize the 50,000 samples \mathbb{D}_{sp} to train a specific network \mathbb{N}_{sp} for each channel. For comparison, we mix the 10,000 samples of the five CDL channels to obtain \mathbb{D}_{mix} and train a general network \mathbb{N}_{mix}. The operations on the datasets and neural networks can be respectively expressed as,

\mathbb{N}_{sp}^{*}\leftarrow\mathbb{D}_{sp}^{*}=\left\{\mathbf{H}_{*}^{(50k)}\right\} (30)
\mathbb{N}_{mix}\leftarrow\mathbb{D}_{mix}=\left\{\mathbf{H}_{A}^{(10k)},\mathbf{H}_{B}^{(10k)},\cdots,\mathbf{H}_{E}^{(10k)}\right\} (31)

where \mathbf{H}_{*}^{(50k)} denotes the 50,000 samples of the channel environment indicated by the subscript, and \mathbf{H}_{A}^{(10k)} denotes the 10,000 samples of the CDL-A channel.
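A hypothetical sketch of this split, with integer sample indices standing in for the channel matrices \mathbf{H} (the variable names are ours):

```python
import numpy as np

# Eqs. (30)-(31): per-channel datasets of 60k samples, 50k for the
# specific networks, and the remaining 10k per channel pooled into
# the mixed dataset for the general network.
channels = ["A", "B", "C", "D", "E"]
n_total, n_sp = 60_000, 50_000

datasets = {c: np.arange(n_total) for c in channels}  # indices stand in for H
d_sp  = {c: idx[:n_sp] for c, idx in datasets.items()}            # D_sp^*
d_mix = np.concatenate([idx[n_sp:] for idx in datasets.values()])  # D_mix
```

This makes the sizes explicit: each specific network sees 50,000 single-channel samples, while the general network sees 5 × 10,000 = 50,000 mixed samples, so the two training set sizes match.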

IV-A3 Setting of compression ratios

This paper explores the EMEV feedback solution: the UE first compresses \mathbf{V} and \mathbf{S}, and then feeds the compressed codewords \varepsilon back to the BS. The length of the codewords L_{\varepsilon} determines the feedback overhead, which affects the spectral efficiency of the communication system, and L_{\varepsilon} is decided by the compression ratio \beta_{CR}. Since the sizes of the channel matrix \mathbf{H}, the eigenmatrix \mathbf{V}, and the eigenvector \mathbf{S} are different, we define the setting of the compression ratio \beta_{CR} in detail in this part. For unity and comparability, almost all studies use the compression ratio of \mathbf{H} as the measurement standard. Hence, the length of codewords L_{\varepsilon} can be expressed as,

L_{\varepsilon}=\frac{\Re(\mathbf{H})+\Im(\mathbf{H})}{\beta_{h}}+1=\frac{N_{RB}\times N_{r}\times N_{t}\times 2}{\beta_{h}}+1 (32)

where \beta_{h} is the compression ratio of \mathbf{H}. To guarantee the effectiveness of the comparative simulation experiments, fixed values of L_{\varepsilon} are used in the following discussion. To ensure the same L_{\varepsilon}, the compression ratio of EMEVNet can be defined as,

\beta_{emev}\approx\frac{\Re(\mathbf{V})+\Im(\mathbf{V})+\Re(\mathbf{S})}{\Re(\mathbf{H})+\Im(\mathbf{H})}\beta_{h}=\frac{N_{RB}\times(N_{t}\times N_{t}\times 2+N_{r})}{N_{RB}\times N_{r}\times N_{t}\times 2}\beta_{h} (33)

where \beta_{emev} represents the compression ratio of \mathbf{V} and \mathbf{S}. Referring to the dataset parameters in Tab. IV, we can assume \beta_{emev}=16\beta_{h} in this paper.
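This scaling is easy to verify numerically with the Tab. IV dimensions (the variable names below are ours):

```python
# Checking Eqs. (32)-(33) with N_RB = 13, N_t = 64, N_r = 4 (Tab. IV).
n_rb, n_t, n_r = 13, 64, 4

def codeword_length(beta_h):
    """Eq. (32): L_eps = 2 * N_RB * N_r * N_t / beta_h + 1."""
    return n_rb * n_r * n_t * 2 // beta_h + 1

# Eq. (33): the ratio beta_emev / beta_h, independent of N_RB.
scale = (n_t * n_t * 2 + n_r) / (n_r * n_t * 2)   # approx. 16.008
```

With these dimensions the ratio evaluates to about 16.008, which justifies the approximation \beta_{emev}=16\beta_{h}. Note that the "+1" term in Eq. (32) makes codeword_length(16) equal 417, one more than the L_{\varepsilon}=416 listed in Section IV-A4, so the listed values appear to correspond to the fraction alone.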

Figure 6: Illustration of partial visualization feasibility experimental results. Fig. 6a and Fig. 6i show the perfect eigenmatrix \mathbf{V} and eigenvector \mathbf{S} at the UE. Fig. 6b to Fig. 6h depict the reconstructed \widehat{\mathbf{V}} from L_{\varepsilon}=416 to L_{\varepsilon}=6 (416, 208, 104, 52, 26, 13, 6). Fig. 6j to Fig. 6p show the reconstructed eigenvector \widehat{\mathbf{S}} with the same L_{\varepsilon} values.

IV-A4 Setting of NN training

To ensure that the NN converges to its best performance, we tuned the training parameters over many trials and finally set them as follows: maximum epoch number \tau=500; initial learning rate \eta=1\times 10^{-3}; early-stopping patience of 50 epochs. For fair evaluation, the above training parameters are kept consistent across the different experiments. The lengths of codewords and the corresponding compression ratios are set as follows:

  • L_{\varepsilon}=[416,208,104,52,26,13,6] denotes the length of codewords.

  • \beta_{h}=[16,32,64,128,256,512,1024] represents the compression ratio of the CSI matrix \mathbf{H}.

  • \beta_{emev}=[256,512,1024,2048,4096,8192,16384] denotes the system compression ratio of EMEVNet.

IV-A5 Performance evaluation

The ultimate purpose of our architecture is to help the BS reconstruct the eigenmatrix \mathbf{V} and the eigenvector \mathbf{S}. Therefore, we utilize the normalized mean square error (NMSE) and the cosine similarity (\rho) to measure the reconstruction accuracy, which can be respectively defined as,

NMSE\left(\mathbf{V},\widehat{\mathbf{V}}\right)=\mathbb{E}\left\{\lVert\mathbf{V}-\widehat{\mathbf{V}}\rVert_{2}^{2}/\lVert\mathbf{V}\rVert_{2}^{2}\right\} (34)
=10\log\left(\mathbb{E}\left\{\lVert\mathbf{V}-\widehat{\mathbf{V}}\rVert_{2}^{2}/\lVert\mathbf{V}\rVert_{2}^{2}\right\}\right)\ \text{(in dB)}
NMSE\left(\mathbf{S},\widehat{\mathbf{S}}\right)=\mathbb{E}\left\{\lVert\mathbf{S}-\widehat{\mathbf{S}}\rVert_{2}^{2}/\lVert\mathbf{S}\rVert_{2}^{2}\right\} (35)
=10\log\left(\mathbb{E}\left\{\lVert\mathbf{S}-\widehat{\mathbf{S}}\rVert_{2}^{2}/\lVert\mathbf{S}\rVert_{2}^{2}\right\}\right)\ \text{(in dB)}
\rho\left(\mathbf{V},\widehat{\mathbf{V}}\right)=\mathbb{E}\left\{\frac{\langle\mathbf{V}^{*},\widehat{\mathbf{V}}\rangle}{\lVert\mathbf{V}\rVert_{2}\lVert\widehat{\mathbf{V}}\rVert_{2}}\right\} (36)
\rho\left(\mathbf{S},\widehat{\mathbf{S}}\right)=\mathbb{E}\left\{\frac{\langle\mathbf{S}^{*},\widehat{\mathbf{S}}\rangle}{\lVert\mathbf{S}\rVert_{2}\lVert\widehat{\mathbf{S}}\rVert_{2}}\right\} (37)

where \langle a,b\rangle is the scalar product in Euclidean space. Generally, NMSE is converted to the logarithmic domain, where a smaller value represents better performance, e.g., NMSE=-20 dB is better than -10 dB. The range of the cosine similarity is \rho\in[-1,1], which measures the similarity between the reconstructed and the original matrices; \rho close to 1 indicates better reconstruction performance.
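The two metrics can be sketched directly in NumPy (toy vectors; the 0.9 scaling below is our own example, not from the paper):

```python
import numpy as np

def nmse_db(x, x_hat):
    """Eqs. (34)-(35): NMSE in the dB domain, lower is better."""
    err = np.linalg.norm(x - x_hat) ** 2 / np.linalg.norm(x) ** 2
    return 10 * np.log10(err)

def cosine_similarity(x, x_hat):
    """Eqs. (36)-(37): rho in [-1, 1], closer to 1 is better."""
    num = np.real(np.vdot(x, x_hat))   # vdot conjugates the first argument
    return num / (np.linalg.norm(x) * np.linalg.norm(x_hat))

x = np.array([1.0, 2.0, 3.0])
err_db = nmse_db(x, 0.9 * x)           # approx. -20 dB: error power is 1%
rho = cosine_similarity(x, 0.9 * x)    # approx. 1: scaling keeps direction
```

The example also illustrates why both metrics are reported: a pure scaling error leaves \rho at 1 while still producing a finite NMSE.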

IV-B Feasibility Analysis

This subsection presents some simulation results and discusses the feasibility of our proposed architecture. To verify the feasibility of the proposed EMEVNet, this part utilizes the 50,000 samples of the CDL-A channel, i.e., \mathbb{D}_{sp}^{A}=\{\mathbf{H}_{A}^{(50k)}\}, as the simulation dataset. The training, validation, and testing datasets obey the ratio 70:15:15. Before training EMEVNet, the dataset \mathbb{D}_{sp}^{A} undergoes the SVD transformation according to Algorithm 1, and the generated \mathbf{V},\mathbf{S} are the inputs of EMEVNet. After the NN training stage following Algorithm 2, we obtain the trained \mathbb{N}_{sp}^{A}, i.e., the specific EMEVNet for the CDL-A channel. In this part, we have trained and tested the seven different compression ratios set in Section IV-A4.
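The per-RB SVD preprocessing can be sketched as follows. This is our reading of Algorithm 1 (which is not reproduced in this section), with random data standing in for the simulator output; the shapes match \mathbf{V}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}} and \mathbf{S}\in\mathbb{R}^{N_{RB}\times N_{r}}:

```python
import numpy as np

# For each resource block, decompose H(i) = U diag(s) V^H and keep the
# right singular matrix V (N_t x N_t) and singular values s (N_r).
n_rb, n_r, n_t = 13, 4, 64
rng = np.random.default_rng(0)
H = (rng.standard_normal((n_rb, n_r, n_t))
     + 1j * rng.standard_normal((n_rb, n_r, n_t)))  # stand-in for CSI

V = np.empty((n_rb, n_t, n_t), dtype=complex)
S = np.empty((n_rb, n_r))
for i in range(n_rb):
    _, s, vh = np.linalg.svd(H[i], full_matrices=True)
    V[i] = vh.conj().T     # eigenmatrix fed to the V branch of EMEVNet
    S[i] = s               # singular-value vector fed to the S branch
```

The V branch thus carries the spatial (precoding) structure, while the S branch carries the per-RB power information, which is why the two are encoded jointly.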

The experimental results can be found in Fig. 6. Since \mathbf{V}\in\mathbb{C}^{N_{RB}\times N_{t}\times N_{t}} is a 3D complex-valued matrix, we split out a single RB for a more intuitive exhibition. The program randomly selects the third resource block, i.e., \mathbf{V}(3)\in\mathbb{C}^{N_{t}\times N_{t}}. Then, the Euclidean norm of \mathbf{V}(3) is taken for convenient plotting, which can be interpreted as the power distribution. Fig. 6a shows the initial \mathbf{V} sample at the UE, and Fig. 6b to Fig. 6h show the reconstructed \widehat{\mathbf{V}} at the BS with different compression ratios from L_{\varepsilon}=416 to L_{\varepsilon}=6. As for the eigenvector \mathbf{S}\in\mathbb{R}^{N_{RB}\times N_{r}}, it is a 2D real-valued matrix, so we plot it directly without any processing. Fig. 6i shows the initial \mathbf{S} at the UE, and Fig. 6j to Fig. 6p show the reconstructed \widehat{\mathbf{S}} at the BS with the same settings. Across many experiments, the shape of \mathbf{V} is almost the same for different RBs: they all present a diagonal power distribution with serrated edges, and the power distribution of \mathbf{V} is symmetric.

As shown in Fig. 6, with the reduction of the feedback codewords L_{\varepsilon}, the \widehat{\mathbf{V}} reconstructed by the BS loses more information. This loss is mainly reflected in the sawtooth energy at the edges of the eigenmatrix, while the loss of the power principal component (the diagonal distribution) is limited. When the length of codewords drops to L_{\varepsilon}=26, i.e., \beta_{h}=256 and \beta_{emev}=4096, the reconstructed \widehat{\mathbf{V}} loses almost all edge information; it can be seen from the figure that when L_{\varepsilon}\leq 26, the edge power of \widehat{\mathbf{V}} cannot be reconstructed. However, the distribution of \widehat{\mathbf{V}} remains unchanged and the diagonal power is hardly affected in any case. As for \widehat{\mathbf{S}} at the BS, it can be well reconstructed under almost any L_{\varepsilon}. The detailed numerical results of \mathbb{N}_{sp}^{A}, i.e., NMSE and \rho, will be shown and discussed in Section IV-E. The exhibition and analysis in this subsection vividly demonstrate the feasibility of EMEVNet by visualizing some experimental results.

Figure 7: Illustration of superiority experimental results for \widehat{\mathbf{V}} and \widehat{\mathbf{S}} under the CDL-A to CDL-E channels. The x-axis represents different lengths of codewords. The blue and red y-axes denote NMSE (dB) and cosine similarity (\rho).

IV-C Superiority Analysis

In this subsection, we demonstrate the superior performance of the proposed architecture. As mentioned in the previous section, we place a channel identification module before encoding \mathbf{V} and \mathbf{S}. Therefore, this part compares the performance of the specific networks \mathbb{N}_{sp}^{*} trained with the large datasets \mathbb{D}_{sp}^{*} and the general network \mathbb{N}_{mix} trained with the mixed dataset \mathbb{D}_{mix}. To completely verify the superiority of the proposed architecture, we check the five CDL channels defined by 3GPP. The testing objects are \left\{\mathbb{N}_{sp}^{A},\mathbb{N}_{sp}^{B},\mathbb{N}_{sp}^{C},\mathbb{N}_{sp}^{D},\mathbb{N}_{sp}^{E}\right\}, and the baseline is \left\{\mathbb{N}_{mix}\right\}. For each experiment, we give the four evaluation indexes defined in Section IV-A5: NMSE(\mathbf{V},\widehat{\mathbf{V}}), NMSE(\mathbf{S},\widehat{\mathbf{S}}), \rho(\mathbf{V},\widehat{\mathbf{V}}), and \rho(\mathbf{S},\widehat{\mathbf{S}}).

The experimental results can be seen in Fig. 7. Each channel type is tested and verified under the 7 compression ratios of Section IV-A4. The horizontally adjacent figures show the reconstruction performance curves of \widehat{\mathbf{V}} and \widehat{\mathbf{S}} under the same channel, e.g., Fig. 7a and Fig. 7b show the performance under the CDL-A channel. The vertically adjacent figures show the performance of the same feedback information under different channel types, e.g., Fig. 7a and Fig. 7c. Each figure has two y-axes of different scales, corresponding to NMSE (dB) and \rho. In addition, there are four performance curves in each figure: the solid blue lines show the NMSE performance, the dotted red lines represent the \rho performance, the circle-marked lines represent the specific network \mathbb{N}_{sp}^{*}, and the triangle-marked lines represent the general network \mathbb{N}_{mix}.

As can be seen from Fig. 7, the two blue solid lines of each figure show an upward trend, while the two red dotted lines show a downward trend, consistent with the analysis in Section IV-A5: the reconstruction performance deteriorates as L_{\varepsilon} decreases, i.e., NMSE (dB) tends to be larger and \rho deviates from 1. Moreover, all the circle-marked blue lines lie below the triangle-marked ones, and all the circle-marked red lines lie above the triangle-marked ones, meaning that the specific network \mathbb{N}_{sp}^{*} outperforms the general network \mathbb{N}_{mix}. It can also be found that two lines of the same color are almost parallel, which shows that the performance improvement of \mathbb{N}_{sp}^{*} over \mathbb{N}_{mix} is relatively stable and little affected by L_{\varepsilon}. In summary, the channel identification module is necessary, and the feedback and reconstruction performance can be improved by selecting a specific network \mathbb{N}_{sp}^{*}.

IV-D Robustness Analysis

Figure 8: Illustration of robustness experimental results for \widehat{\mathbf{V}} and \widehat{\mathbf{S}} under the CDL-A to CDL-E channels. The x-axis represents different lengths of codewords. The blue and red y-axes are NMSE (dB) and cosine similarity (\rho).
TABLE V: The numerical results of the simulation experiments carried out and discussed in Section IV.
CDL-A CDL-B CDL-C CDL-D CDL-E
NMSE (dB) \rho NMSE (dB) \rho NMSE (dB) \rho NMSE (dB) \rho NMSE (dB) \rho
L_{\varepsilon}=416 \beta_{h}=16 \beta_{emev}=256 \mathbb{N}_{sp}^{*} (proposed) \widehat{\mathbf{V}} –14.018 0.9807 –9.432 0.9481 –11.721 0.9686 –13.830 0.9902 –12.784 0.9746
\widehat{\mathbf{S}} –41.318 0.9999 –24.964 0.9979 –31.566 0.9996 –39.307 0.9999 –33.332 0.9998
\mathbb{N}_{csi}^{*} \widehat{\mathbf{V}} –12.148 0.9695 –9.362 0.9421 –15.264 0.9851 –10.064 0.9579 –10.902 0.9593
\widehat{\mathbf{S}} –42.999 0.9999 –31.292 0.9990 –38.057 0.9999 –43.368 0.9999 –43.211 0.9999
\mathbb{N}_{mix} \widehat{\mathbf{V}} –11.895 0.9694 –9.196 0.9448 –10.206 0.9558 –12.367 0.9726 –11.733 0.9685
\widehat{\mathbf{S}} –38.764 0.9999 –22.736 0.9970 –30.065 0.9993 –36.161 0.9998 –29.025 0.9993
L_{\varepsilon}=208 \beta_{h}=32 \beta_{emev}=512 \mathbb{N}_{sp}^{*} (proposed) \widehat{\mathbf{V}} –11.657 0.9683 –8.722 0.9395 –10.592 0.9601 –12.398 0.9728 –11.828 0.9689
\widehat{\mathbf{S}} –39.512 0.9999 –24.193 0.9978 –31.456 0.9996 –37.178 0.9999 –32.143 0.9997
\mathbb{N}_{csi}^{*} \widehat{\mathbf{V}} –11.545 0.9649 –7.970 0.9202 –14.113 0.9806 –8.905 0.9290 –9.048 0.9377
\widehat{\mathbf{S}} –40.598 0.9999 –23.947 0.9977 –35.133 0.9999 –42.110 0.9999 –40.704 0.9999
\mathbb{N}_{mix} \widehat{\mathbf{V}} –10.909 0.9623 –8.580 0.9374 –9.536 0.9491 –11.607 0.9677 –10.870 0.9619
\widehat{\mathbf{S}} –38.111 0.9999 –22.082 0.9965 –29.778 0.9992 –35.734 0.9998 –28.401 0.9992
L_{\varepsilon}=104 \beta_{h}=64 \beta_{emev}=1024 \mathbb{N}_{sp}^{*} (proposed) \widehat{\mathbf{V}} –11.202 0.9641 –8.321 0.9344 –9.660 0.9514 –11.217 0.9650 –10.552 0.9593
\widehat{\mathbf{S}} –37.645 0.9999 –23.765 0.9976 –30.574 0.9994 –36.067 0.9998 –31.993 0.9997
\mathbb{N}_{csi}^{*} \widehat{\mathbf{V}} –10.58 0.9562 –6.943 0.8989 –10.688 0.9373 –7.5421 0.9119 –7.816 0.9173
\widehat{\mathbf{S}} –37.476 0.9998 –18.961 0.9939 –28.065 0.9996 –40.592 0.9999 –39.626 0.9998
\mathbb{N}_{mix} \widehat{\mathbf{V}} –9.998 0.9540 –8.101 0.9309 –8.874 0.9416 –10.661 0.9603 –9.983 0.9539
\widehat{\mathbf{S}} –37.643 0.9998 –21.568 0.9963 –29.206 0.9992 –35.680 0.9998 –28.225 0.9992
L_{\varepsilon}=52 \beta_{h}=128 \beta_{emev}=2048 \mathbb{N}_{sp}^{*} (proposed) \widehat{\mathbf{V}} –9.581 0.9504 –8.011 0.9297 –9.070 0.9444 –10.465 0.9584 –9.7297 0.9513
\widehat{\mathbf{S}} –36.016 0.9998 –23.657 0.9976 –29.855 0.9993 –35.663 0.9998 –31.729 0.9997
\mathbb{N}_{csi}^{*} \widehat{\mathbf{V}} –8.152 0.9235 –6.386 0.8851 –8.646 0.9317 –7.422 0.9094 –7.038 0.9011
\widehat{\mathbf{S}} –33.241 0.9994 –13.486 0.9899 –22.293 0.9985 –39.054 0.9999 –36.104 0.9996
\mathbb{N}_{mix} \widehat{\mathbf{V}} –9.394 0.9475 –7.877 0.9275 –8.499 0.9364 –10.021 0.9542 –9.507 0.9487
\widehat{\mathbf{S}} –36.001 0.9998 –21.547 0.9962 –29.249 0.9992 –35.369 0.9997 –28.046 0.9992
L_{\varepsilon}=26 \beta_{h}=256 \beta_{emev}=4096 \mathbb{N}_{sp}^{*} (proposed) \widehat{\mathbf{V}} –9.187 0.9448 –7.721 0.9249 –8.717 0.9394 –9.917 0.9532 –9.397 0.9472
\widehat{\mathbf{S}} –35.434 0.9997 –23.559 0.9976 –29.982 0.9993 –35.167 0.9997 –31.112 0.9996
\mathbb{N}_{csi}^{*} \widehat{\mathbf{V}} –7.020 0.9007 –5.893 0.8713 –6.931 0.8986 –7.143 0.9034 –6.539 0.8891
\widehat{\mathbf{S}} –26.173 0.9977 –7.831 0.9813 –17.173 0.9926 –33.582 0.9998 –30.038 0.9994
\mathbb{N}_{mix} \widehat{\mathbf{V}} –8.963 0.9425 –7.683 0.9249 –8.274 0.9334 –9.530 0.9492 –9.139 0.9445
\widehat{\mathbf{S}} –35.393 0.9998 –20.877 0.9956 –28.506 0.9991 –34.578 0.9997 –27.672 0.9991
L_{\varepsilon}=13 \beta_{h}=512 \beta_{emev}=8192 \mathbb{N}_{sp}^{*} (proposed) \widehat{\mathbf{V}} –8.812 0.9405 –7.659 0.9233 –8.303 0.9335 –9.232 0.9459 –9.051 0.9430
\widehat{\mathbf{S}} –33.754 0.9997 –22.682 0.9969 –29.037 0.9992 –33.561 0.9996 –30.105 0.9995
\mathbb{N}_{csi}^{*} \widehat{\mathbf{V}} –6.786 0.8952 –5.711 0.8658 –6.492 0.8879 –6.524 0.8887 –6.315 0.8832
\widehat{\mathbf{S}} –17.651 0.9943 –3.233 0.9691 –13.018 0.9866 –27.956 0.9992 –26.774 0.9989
\mathbb{N}_{mix} \widehat{\mathbf{V}} –8.356 0.9342 –7.628 0.9231 –7.994 0.9288 –9.055 0.9431 –8.866 0.9408
\widehat{\mathbf{S}} –33.717 0.9996 –20.771 0.9954 –28.358 0.9990 –32.944 0.9996 –27.311 0.9990
L_{\varepsilon}=6 \beta_{h}=1024 \beta_{emev}=16384 \mathbb{N}_{sp}^{*} (proposed) \widehat{\mathbf{V}} –8.298 0.9335 –7.599 0.9229 –7.879 0.9272 –8.969 0.9421 –8.948 0.9417
\widehat{\mathbf{S}} –32.833 0.9996 –20.660 0.9952 –27.729 0.9989 –29.132 0.9993 –26.799 0.9989
\mathbb{N}_{csi}^{*} \widehat{\mathbf{V}} –6.033 0.8754 –5.465 0.8579 –5.971 0.8735 –5.283 0.8723 –6.238 0.8811
\widehat{\mathbf{S}} –10.133 0.9885 2.702 0.9611 –6.120 0.9802 –24.523 0.9983 –22.261 0.9973
\mathbb{N}_{mix} \widehat{\mathbf{V}} –7.633 0.9233 –7.355 0.9181 –7.538 0.9215 –8.165 0.9352 –8.125 0.9319
\widehat{\mathbf{S}} –32.357 0.9997 –19.231 0.9936 –26.690 0.9986 –28.593 0.9992 –24.871 0.9983

This subsection proves the robustness of the proposed architecture by comparing EMEVNet with the classical CSI feedback scheme CsiNet [11]. For convenience of narration, \mathbb{N}_{csi}^{*} is defined as the classical CsiNet framework, and we distinguish different channels by superscripts, e.g., \mathbb{N}_{csi}^{A} represents CsiNet trained on the CDL-A channel. For fairness, the specific datasets with 50,000 samples are used in this subsection, and \mathbb{N}_{sp}^{*} and \mathbb{N}_{csi}^{*} are trained respectively. Similarly, the experiments in this part also cover the 5 channel types and 7 compression ratios set in Section IV-A4. It should be noted that the comparison is based on the same length of codewords, i.e., the performance is compared under the same feedback overhead. Considering that CsiNet encodes and feeds back the channel matrix \mathbf{H}, we keep the same feedback overhead and apply the SVD transform to the decoded \widehat{\mathbf{H}} at the BS. The testing objects are \left\{\mathbb{N}_{sp}^{A},\mathbb{N}_{sp}^{B},\mathbb{N}_{sp}^{C},\mathbb{N}_{sp}^{D},\mathbb{N}_{sp}^{E}\right\}, and the baselines are \left\{\mathbb{N}_{csi}^{A},\mathbb{N}_{csi}^{B},\mathbb{N}_{csi}^{C},\mathbb{N}_{csi}^{D},\mathbb{N}_{csi}^{E}\right\}.

The experimental results are shown in Fig. 8, with the subfigures arranged as in Fig. 7. Each subfigure has two y-axes of different scales, corresponding to NMSE (dB) and $\rho$, respectively, and contains four performance curves: the solid blue lines show NMSE, the dotted red lines show $\rho$, the circle-marked lines represent EMEVNet $\mathbb{N}_{sp}^{*}$, and the pentagram-marked lines represent the baseline CsiNet $\mathbb{N}_{csi}^{*}$. The visualization shows that the pentagram-marked lines have steeper gradients, most obviously in the curves of the reconstructed $\widehat{\mathbf{S}}$ at the BS, indicating that the baseline $\mathbb{N}_{csi}^{*}$ is strongly affected by the codeword length $L_{\varepsilon}$. For the eigenmatrix $\widehat{\mathbf{V}}$, EMEVNet outperforms the baseline for all $L_{\varepsilon}$, except in Fig. 8e, where the baseline is better for long $L_{\varepsilon}$. For the eigenvector $\widehat{\mathbf{S}}$, the baseline performs better with long $L_{\varepsilon}$ in all channels. For the NLOS channels (CDL-A, CDL-B, and CDL-C), EMEVNet becomes better than the baseline once the codeword length decreases to $L_{\varepsilon}=104$; for the LOS channels (CDL-D and CDL-E), this threshold relaxes to $L_{\varepsilon}=26$. We attribute this to the line-of-sight path with strong energy in LOS channels, which benefits the baseline. Meanwhile, the performance of the baseline degrades severely for small $L_{\varepsilon}$, whereas EMEVNet remains relatively stable; in the extreme case of very short $L_{\varepsilon}$, EMEVNet performs much better than the baseline.
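The two metrics plotted on the y-axes are not restated in this excerpt; a minimal sketch under the definitions commonly used in the CSI-feedback literature (NMSE in dB, and $\rho$ as the column-wise cosine similarity averaged over columns) could look as follows.

```python
import numpy as np

def nmse_db(x, x_hat):
    """Normalized mean-squared error in dB (left y-axes in Fig. 8)."""
    return 10 * np.log10(np.sum(np.abs(x - x_hat) ** 2) / np.sum(np.abs(x) ** 2))

def rho(x, x_hat):
    """Column-wise cosine similarity averaged over columns (right y-axes)."""
    num = np.abs(np.sum(np.conj(x) * x_hat, axis=0))
    den = np.linalg.norm(x, axis=0) * np.linalg.norm(x_hat, axis=0)
    return float(np.mean(num / den))

# A uniform 10% amplitude error: direction is preserved, so rho stays at 1
# while the NMSE sits at 10*log10(0.1^2) = -20 dB.
x = np.array([[1.0 + 1j], [2.0 - 1j]])
print(nmse_db(x, 0.9 * x))  # ≈ -20.0
print(rho(x, 0.9 * x))      # 1.0
```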
In summary, the proposed EMEVNet shows good robustness and adaptability. It can operate at higher system compression ratios, i.e., with shorter feedback codewords, which effectively reduces the feedback overhead and improves the spectrum utilization.
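The overhead claim can be made concrete: for a codeword of length $L_{\varepsilon}$ quantized with $B$ bits per element, the feedback cost per report is $L_{\varepsilon} \cdot B$ bits. The quantizer resolution $B$ is a hypothetical parameter here, not specified in this excerpt.

```python
def feedback_bits(l_eps: int, bits_per_element: int) -> int:
    """Per-report feedback cost in bits for a codeword of length l_eps."""
    return l_eps * bits_per_element

# With a hypothetical 8-bit uniform quantizer, shrinking the codeword from
# L_eps = 104 down to L_eps = 6 cuts the cost from 832 bits to 48 bits.
print(feedback_bits(104, 8), feedback_bits(6, 8))  # 832 48
```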

IV-E Numerical Results

This subsection summarizes the numerical results of the simulation experiments in Sections IV-B, IV-C, and IV-D, which are listed in Tab. V. In the table, $\mathbb{N}_{sp}^{*}$ and $\mathbb{N}_{csi}^{*}$ denote the specific EMEVNet and CsiNet trained on the large per-channel datasets, respectively, and $\mathbb{N}_{mix}$ denotes the general EMEVNet trained on the mixed dataset. The horizontal comparison evaluates the performance across channel environments, while the vertical comparison evaluates the impact of the codeword length. The numerical results confirm that the reconstruction of $\widehat{\mathbf{V}}$ is worse than that of $\widehat{\mathbf{S}}$, consistent with Section IV-B: $\widehat{\mathbf{S}}$ is reconstructed almost perfectly, whereas the performance on $\widehat{\mathbf{V}}$ degrades as $L_{\varepsilon}$ decreases.

V Conclusion

In this paper, a novel channel feedback architecture for mmWave FDD systems was proposed. The key idea is to feed back the useful channel information to the BS instead of the complete CSI matrix. Building on beamforming based on the SVD transformation, the core of the architecture is to feed back the eigenmatrix and eigenvector. The major techniques used in this paper are: applying the SVD to extract the effective channel information; designing a dual-channel auto-encoder with the attention mechanism; and deploying a channel identification NN at the UE to switch to the appropriate specific EMEVNet. Five common CDL channel environments and seven incremental system compression ratios were considered and verified, and all simulations were carried out in an mmWave system. First, we demonstrated the feasibility of the proposed architecture by visualizing representative experimental results. Then, two comparison experiments verified its superiority and robustness, respectively. Finally, the numerical results of all simulation experiments further supported our analysis. This paper provides a new solution to the problem that the BS can hardly acquire the downlink CSI in FDD wireless communication systems. By extracting and feeding back only the information useful to the BS, an intelligent communication system can further improve performance and reduce overhead.
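The SVD-based extraction step at the heart of the design can be sketched in a few lines of numpy, using a randomly generated toy channel (the antenna dimensions are illustrative, not the paper's configuration). Only the right singular matrix $\mathbf{V}$ and the singular values $\mathbf{S}$ are kept for feedback, since they determine the downlink precoder.

```python
import numpy as np

# Toy channel matrix (Nr x Nt); in practice H comes from channel estimation.
rng = np.random.default_rng(0)
Nr, Nt = 4, 32
H = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

# SVD: H = U @ diag(S) @ Vh. The right singular matrix V (eigenmatrix of
# H^H H) and the singular values S (square roots of its eigenvalues) carry
# the information needed for precoding, so only these are fed back.
U, S, Vh = np.linalg.svd(H, full_matrices=False)
V = Vh.conj().T  # columns are the right singular vectors

# Precoding along the strongest eigen-direction: the effective channel gain
# equals the largest singular value.
w = V[:, 0]                   # rank-1 precoder
gain = np.linalg.norm(H @ w)  # equals S[0] up to floating-point error
assert np.isclose(gain, S[0])
```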

References

  • [1] C. Qi, P. Dong, W. Ma, H. Zhang, Z. Zhang, and G. Y. Li, “Acquisition of channel state information for mmWave massive MIMO: Traditional and machine learning-based approaches,” Science China Information Sciences, vol. 64, no. 8, pp. 1–16, June 2021.
  • [2] W. Tong and G. Y. Li, “Nine challenges in artificial intelligence and wireless communications for 6G,” Sept. 2021. [Online]. Available: https://arxiv.org/abs/2109.11320
  • [3] G. Gui, M. Liu, F. Tang, N. Kato, and F. Adachi, “6G: Opening new horizons for integration of comfort, security, and intelligence,” IEEE Wireless Communications, vol. 27, no. 5, pp. 126–132, Mar. 2020.
  • [4] B. Mao, F. Tang, Y. Kawamoto, and N. Kato, “AI models for green communications towards 6G,” IEEE Communications Surveys & Tutorials, vol. 24, no. 1, pp. 210–247, Mar. 2022.
  • [5] H. Guo, J. Li, J. Liu, N. Tian, and N. Kato, “A survey on space-air-ground-sea integrated network security in 6G,” IEEE Communications Surveys & Tutorials, vol. 24, no. 1, pp. 53–87, Mar. 2022.
  • [6] K. B. Letaief, W. Chen, Y. Shi, J. Zhang, and Y.-J. A. Zhang, “The roadmap to 6G: AI empowered wireless networks,” IEEE Communications Magazine, vol. 57, no. 8, pp. 84–90, Aug. 2019.
  • [7] C. Huang, S. Hu, G. C. Alexandropoulos, A. Zappone, C. Yuen, R. Zhang, M. D. Renzo, and M. Debbah, “Holographic MIMO surfaces for 6G wireless networks: Opportunities, challenges, and trends,” IEEE Wireless Communications, vol. 27, no. 5, pp. 118–125, Oct. 2020.
  • [8] A. Froytlog, G. Y. Li, et al., “Ultra-low power wake-up radio for 5G IoT,” IEEE Communications Magazine, vol. 57, no. 3, pp. 111–117, Mar. 2019.
  • [9] Z. Qin, H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep learning in physical layer communications,” IEEE Wireless Communications, vol. 26, no. 2, pp. 93–99, Apr. 2019.
  • [10] Z. Qin, G. Y. Li, and H. Ye, “Federated learning and wireless communications,” IEEE Wireless Communications, vol. 28, no. 5, pp. 134–140, Oct. 2021.
  • [11] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 748–751, Oct. 2018.
  • [12] J. Guo, J. Wang, C.-K. Wen, S. Jin, and G. Y. Li, “Compression and acceleration of neural networks for communications,” IEEE Wireless Communications, vol. 27, no. 4, pp. 110–117, Aug. 2020.
  • [13] J. Guo, C.-K. Wen, S. Jin, and G. Y. Li, “Convolutional neural network-based multiple-rate compressive sensing for massive MIMO CSI feedback: Design, simulation, and analysis,” IEEE Transactions on Wireless Communications, vol. 19, no. 4, pp. 2827–2840, Apr. 2020.
  • [14] T. Wang, C.-K. Wen, S. Jin, and G. Y. Li, “Deep learning-based CSI feedback approach for time-varying massive MIMO channels,” IEEE Wireless Communications Letters, vol. 8, no. 2, pp. 416–419, Apr. 2019.
  • [15] Y. Sun, W. Xu, L. Fan, G. Y. Li, and G. K. Karagiannidis, “AnciNet: An efficient deep learning approach for feedback compression of estimated CSI in massive MIMO systems,” IEEE Wireless Communications Letters, vol. 9, no. 12, pp. 2192–2196, Dec. 2020.
  • [16] Y. Sun, W. Xu, L. Liang, N. Wang, G. Y. Li, and X. You, “A lightweight deep network for efficient CSI feedback in massive MIMO systems,” IEEE Wireless Communications Letters, vol. 10, no. 8, pp. 1840–1844, Aug. 2021.
  • [17] J. Zeng, J. Sun, G. Gui, B. Adebisi, T. Ohtsuki, H. Gacanin, and H. Sari, “Downlink CSI feedback algorithm with deep transfer learning for FDD massive MIMO systems,” IEEE Transactions on Cognitive Communications and Networking, vol. 7, no. 4, pp. 1253–1265, Dec. 2021.
  • [18] X. Ma, Z. Gao, F. Gao, and M. Di Renzo, “Model-driven deep learning based channel estimation and feedback for millimeter-wave massive hybrid MIMO systems,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 8, pp. 2388–2406, Aug. 2021.
  • [19] M. Chen, J. Guo, C.-K. Wen, S. Jin, G. Y. Li, and A. Yang, “Deep learning-based implicit CSI feedback in massive MIMO,” IEEE Transactions on Communications, Feb. 2021.
  • [20] Z. Zhong, L. Fan, and S. Ge, “FDD massive MIMO uplink and downlink channel reciprocity properties: Full or partial reciprocity?” in GLOBECOM 2020 - 2020 IEEE Global Communications Conference, Dec. 2020, pp. 1–5.
  • [21] Y. Yang, F. Gao, G. Y. Li, and M. Jian, “Deep learning-based downlink channel prediction for FDD massive MIMO system,” IEEE Communications Letters, vol. 23, no. 11, pp. 1994–1998, Nov. 2019.
  • [22] Y. Yang, F. Gao, Z. Zhong, B. Ai, and A. Alkhateeb, “Deep transfer learning-based downlink channel prediction for FDD massive MIMO systems,” IEEE Transactions on Communications, vol. 68, no. 12, pp. 7485–7497, Dec. 2020.
  • [23] M. S. Safari, V. Pourahmadi, and S. Sodagari, “Deep UL2DL: Data-driven channel knowledge transfer from uplink to downlink,” IEEE Open Journal of Vehicular Technology, vol. 1, pp. 29–44, Dec. 2019.
  • [24] Y. Zhang, B. Adebisi, H. Gacanin, and F. Adachi, “CV-3DCNN: Complex-valued deep learning for CSI prediction in FDD massive MIMO systems,” IEEE Wireless Communications Letters, vol. 10, no. 2, pp. 266–270, Feb. 2021.
  • [25] Y. Yang, F. Gao, C. Xing, J. An, and A. Alkhateeb, “Deep multimodal learning: Merging sensory data for massive MIMO channel prediction,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 7, pp. 1885–1898, July 2021.
  • [26] J. Wang, T. Ohtsuki, B. Adebisi, H. Gacanin, and H. Sari, “Compressive sampled CSI feedback method based on deep learning for FDD massive MIMO systems,” IEEE Transactions on Communications, vol. 69, no. 9, pp. 5873–5885, Sept. 2021.
  • [27] W. Liu, W. Tian, H. Xiao, S. Jin, X. Liu, and J. Shen, “EVCsiNet: Eigenvector-based CSI feedback under 3GPP link-level channels,” IEEE Wireless Communications Letters, vol. 10, no. 12, pp. 2688–2692, Dec. 2021.
  • [28] F. Gao, B. Lin, C. Bian, T. Zhou, J. Qian, and H. Wang, “FusionNet: Enhanced beam prediction for mmWave communications using sub-6 GHz channel and a few pilots,” IEEE Transactions on Communications, vol. 69, no. 12, pp. 8488–8500, Dec. 2021.
  • [29] J. Guo, C.-K. Wen, and S. Jin, “Deep learning-based CSI feedback for beamforming in single- and multi-cell massive MIMO systems,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 7, pp. 1872–1884, July 2021.
  • [30] Z. Liu, Y. Yang, F. Gao, T. Zhou, and H. Ma, “Deep unsupervised learning for joint antenna selection and hybrid beamforming,” IEEE Transactions on Communications, vol. 70, no. 3, pp. 1697–1710, Mar. 2022.
  • [31] Z. Gao, M. Wu, C. Hu, F. Gao, G. Wen, D. Zheng, and J. Zhang, “Data-driven deep learning based hybrid beamforming for aerial massive MIMO-OFDM systems with implicit CSI,” Feb. 2022. [Online]. Available: https://arxiv.org/abs/2201.06778
  • [32] 3GPP, “Study on 3D channel model for LTE (Release 12),” 3rd Generation Partnership Project (3GPP), Technical Specification (TS) 36.873, Jan. 2018, version 14.1.0.
  • [33] ——, “Study on channel model for frequencies from 0.5 to 100 GHz (Release 16),” 3rd Generation Partnership Project (3GPP), Technical Report (TR) 38.901, Dec. 2019, version 16.1.0.
  • [34] Y. Zhang, J. Sun, H. Gacanin, and F. Adachi, “A novel channel identification architecture for mmwave systems based on eigen features,” Apr. 2022. [Online]. Available: https://arxiv.org/abs/2204.05052
  • [35] “NLOS and LOS of the 28 GHz bands millimeter-wave in 5G cellular networks.”
  • [36] K. He and J. Sun, “Convolutional neural networks at constrained time cost,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2015, pp. 5353–5360.
  • [37] A. Vaswani et al., “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, Dec. 2017.
  • [38] L. Chen, H. Zhang, J. Xiao, L. Nie, J. Shao, W. Liu, and T.-S. Chua, “SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 6298–6306.
  • [39] Y. Cui, A. Guo, and C. Song, “TransNet: Full attention network for CSI feedback in FDD massive MIMO system,” IEEE Wireless Communications Letters, pp. 1–1, Feb. 2022.