
FusionNet: Enhanced Beam Prediction for mmWave Communications Using Sub-6GHz Channel and A Few Pilots

Chenghong Bian, Yuwen Yang, Feifei Gao, and Geoffrey Ye Li

C. Bian, Y. Yang, and F. Gao are with the Institute for Artificial Intelligence, Tsinghua University (THUAI), Beijing National Research Center for Information Science and Technology (BNRist), Department of Automation, Tsinghua University, Beijing 100084, P.R. China (email: {bianch16,yyw18}@mails.tsinghua.edu.cn, [email protected]). G. Y. Li is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA (email: [email protected]).

Abstract

In this paper, we propose a new downlink beamforming strategy for mmWave communications using uplink sub-6GHz channel information and a few mmWave pilots. Specifically, we design a novel dual-input neural network, called FusionNet, to extract and exploit the features from the sub-6GHz channel and a few mmWave pilots to accurately predict the mmWave beam. To further improve the beamforming performance and avoid over-fitting, we develop two data pre-processing approaches utilizing channel sparsity and data augmentation. The simulation results demonstrate the superior performance and robustness of the proposed strategy compared to the existing one that relies purely on the sub-6GHz information, especially in the low signal-to-noise ratio (SNR) regions.

Index Terms:
mmWave, sub-6GHz, beamforming, deep learning, data augmentation

I Introduction

Millimeter-wave (mmWave) communications over the 30–300GHz band offer a large transmission bandwidth and have been deemed a key supplement to the current sub-6GHz wireless spectrum [1, 2]. To combat the severe path loss in the mmWave band, the transmitter at the base station (BS) generally deploys a large number of antennas and forms a highly directional beam towards users [3, 4], which requires accurate downlink channel state information (CSI) [5]. However, increasing the number of antennas incurs significant training overhead to obtain accurate downlink CSI.

Many techniques have been proposed to utilize the channel sparsity to reduce the training overhead of the mmWave downlink transmission [6, 7, 8, 9, 10]. For instance, the compressive sensing (CS) channel estimation for multi-user massive MIMO systems in [6] has investigated the number of measurements required for reasonable performance. The angle sparsity has been observed in [9] and used to develop a structured CS-based channel estimation scheme. Moreover, the CS-based algorithm in [10] exploits the block-sparsity nature of mmWave channels in the frequency domain.

Since future wireless communication systems are expected to employ different frequency bands, it is possible to utilize the CSI features of another frequency band to assist the transmission of the current band. Out-of-band (OOB) information has been used to reduce the training overhead in many recent works [12, 13, 14, 15, 16]. (Footnote 1: OOB information is not limited to the channel estimated at a different frequency band, but may cover a broad range of categories, including the channels estimated at different positions [11], the coordinates of the user's position, the radar echo received at the BS, or even the images captured by the BS camera. Details can be found in [12] and are not expanded further here.) It has been shown in [13] that 90% of the paths have common angles at frequencies far apart, ranging from 900MHz to 90GHz. In [14], overhead-free multi-Gbps mmWave communication has been established with the out-of-band direction inference obtained from the sub-6GHz band. In [15], the downlink covariance is inferred from the observed uplink covariance. In [16], spatial information has been extracted from sub-6GHz channels for beam selection in the mmWave band. The work in [16] has opened a new door for beam prediction. However, more work needs to be done to address more complicated channel environments in practical scenarios.

Recently, deep learning (DL) [17] has been applied to a large variety of problems in wireless communications, including modulation recognition [18], channel estimation [19, 20], signal detection [21, 22], channel equalization [23], CSI feedback [24, 25], and end-to-end transceiver design [26, 27, 28]. A deep neural network (DNN) can approximate an unknown or nonlinear relationship by learning from data, which makes it possible to perform beam selection for mmWave transmission from OOB information. In [29, 30], DL-based mmWave beam selection in a V2I scenario has been investigated, where the size and position of the car serve as OOB information. The framework in [31] selects the mmWave beam with the help of 3D scene data. Moreover, the DNN designed in [32, 33] directly obtains the optimal beam given the CSI of the sub-6GHz channel. Similarly, a convolutional neural network (CNN) with fewer parameters in [34] leverages the sub-6GHz CSI to find the optimal mmWave beam. With the assistance of various OOB information, all the above approaches greatly reduce the training overhead for the mmWave downlink transmission. However, these beam prediction approaches treat the neural network as a black box [32, 33], which limits further performance improvement, especially when the signal-to-noise ratio (SNR) of the sub-6GHz channel is low.

In this paper, we develop a novel dual-input neural network, called FusionNet, to predict the optimal beam using both the sub-6GHz channels and a few pilots in the mmWave band. Even with only a few pilots, this strategy can effectively steer the deviated beams to the correct direction and significantly improve the performance compared to its black-box counterparts. The FusionNet also makes it possible to exploit the channel sparsity in the delay domain for further performance improvement. Moreover, a novel data augmentation approach is developed to alleviate the over-fitting issue of FusionNet. Numerical results show that the proposed FusionNet outperforms the existing strategies in terms of both the prediction accuracy and the achievable rate, especially in low SNR regions.

The rest of the paper is organized as follows. In Section II, we present the system model and channel estimation at the two frequency bands, and briefly introduce the existing beam prediction that uses the sub-6GHz channel only. Section III provides the architecture of the proposed neural network and analyzes its complexity. The dataset generation, data transformation, and data augmentation are described in Section IV. Numerical results are provided in Section V, followed by the conclusion.

Throughout the paper, scalar variables are represented by normal-face letters $x$, while vectors and matrices are represented by bold lower-case and upper-case letters, $\mathbf{x}$ and $\mathbf{X}$, respectively. The transpose and Hermitian operators are denoted by $(\cdot)^{T}$ and $(\cdot)^{H}$, respectively. The $l_{2}$ norm is denoted by $\|\cdot\|_{2}$, and $|{\cal C}|$ is the cardinality of set ${\cal C}$; $j=\sqrt{-1}$ is the imaginary unit, and $(\mathbf{a})_{c}$ denotes the $c$th entry of vector $\mathbf{a}$.

(a) The BS and the mobile user communicate over both sub-6GHz and mmWave bands with co-located sub-6GHz and mmWave antennas.
(b) The mmWave band transmission uses a single RF chain that connects to the mmWave antennas via phase shifters and switches.
Figure 1: The communication scenario and the mmWave architecture.

II System Model

Consider a communication system operating over both the sub-6GHz and mmWave bands, with one BS and one user, as illustrated in Fig. 1. The BS has two antenna arrays, one with $N_{s}$ antennas for the sub-6GHz band and the other with $N_{m}$ antennas for the mmWave band. The sub-6GHz antenna array is fully digital, i.e., each antenna connects to an independent RF chain. For simplicity, we assume the mmWave antenna array is analog, i.e., all antennas connect to a single RF chain via $N_{m}$ phase shifters and $N_{m}$ switches [5], as shown in Fig. 1. The user has two antennas, one working at the sub-6GHz band and the other at the mmWave band. Both the sub-6GHz and mmWave communication links use orthogonal frequency division multiplexing (OFDM) with $K_{s}$ and $K_{m}$ subcarriers, respectively. The codebook of the downlink beamforming at the mmWave band is denoted as ${\cal C}=\{\mathbf{f}_{1},\mathbf{f}_{2},\ldots,\mathbf{f}_{|{\cal C}|}\}$. The goal is to find the optimal downlink beamforming index at the mmWave band from the estimated uplink channel at sub-6GHz and a few mmWave pilots.

II-A Uplink Channel Estimation at Sub-6GHz

Let $\mathbf{h}_{s}[k]\in\mathbb{C}^{N_{s}\times 1}$ denote the uplink channel vector of the $k$th subcarrier at the sub-6GHz band. The received uplink signal at the BS can be expressed as

$$\mathbf{y}_{s}[k]=\mathbf{h}_{s}[k]s_{s}[k]+\mathbf{n}_{s}[k] \tag{1}$$

for $k=1,2,\cdots,K_{s}$, where $s_{s}[k]$ is the pilot on the $k$th subcarrier and $\mathbf{n}_{s}[k]\sim\mathcal{N}(0,\sigma^{2})$ is the corresponding noise. Either least-square (LS) or linear minimum mean-squared error (LMMSE) channel estimation, in the frequency domain or the time domain, can be used; the detailed process is omitted here.
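As a minimal illustration of the LS option for (1), the sketch below divides each received vector by its known pilot. The pilot values, the noise level, and the function name `ls_estimate` are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def ls_estimate(y, s):
    """Per-subcarrier LS estimate for y[k] = h[k] * s[k] + n[k].

    y: (K, N) received vectors, s: (K,) known non-zero pilots.
    """
    return y / s[:, None]

rng = np.random.default_rng(0)
K_s, N_s = 32, 4  # sub-6GHz subcarriers and BS antennas (values from Table I)

def crandn(*shape):
    # Unit-variance circularly symmetric complex Gaussian samples
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

h = crandn(K_s, N_s)                           # synthetic channel, for illustration only
s = np.exp(2j * np.pi * rng.random(K_s))       # hypothetical unit-modulus pilots
y = h * s[:, None] + 0.1 * crandn(K_s, N_s)    # received signal as in (1)
h_ls = ls_estimate(y, s)
```

With unit-modulus pilots, the LS estimate equals the true channel plus noise of the same variance as the received noise; an LMMSE estimator would additionally need the channel statistics.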

II-B Partial Channel Estimation at MmWave Band

Since there is only one RF chain, the regular uplink or downlink channel estimation at the mmWave band needs to be repeated $N_{m}$ times, either by switching the RF chain onto different antennas or by varying the weights of the phase shifters while the single RF chain connects to all $N_{m}$ antennas simultaneously. Such a process consumes a large amount of time, especially when $N_{m}$ is large.

Here, we estimate the channels only on $\tilde{N}_{m}\ll N_{m}$ antennas to assist the beam prediction from the sub-6GHz band to the mmWave band. The mmWave antennas that participate in the mmWave uplink channel estimation are called active antennas. Note that the estimated channels corresponding to these $\tilde{N}_{m}$ antennas may not even be sufficient to recover all $N_{m}$ channel elements via compressive sensing or deep learning techniques. To make the overall illustration clear, we first present how to estimate the $\tilde{N}_{m}$ uplink channels by changing the weights of the phase shifters. (Footnote 2: The alternative of switching the RF chain to the $\tilde{N}_{m}$ antennas sequentially is not stable in practical applications and is not adopted here.)

The switches of the $(N_{m}-\tilde{N}_{m})$ inactive antennas are off during the training stage, and the mmWave RF chain connects to the $\tilde{N}_{m}$ active antennas via $\tilde{N}_{m}$ phase shifters simultaneously. Denote the channel on the $k$th subcarrier of these $\tilde{N}_{m}$ antennas as $\tilde{\mathbf{h}}_{m}[k]$. To estimate these $\tilde{N}_{m}$ channels, the user should send the training OFDM block $\tilde{N}_{m}$ times since there is only one RF chain. The pilot signal on the $k$th subcarrier in the $i$th training block is denoted as $s_{m,i}[k]$, $i=1,\ldots,\tilde{N}_{m}$. For simplicity, we assume that the pilots on all $K_{m}$ subcarriers and in all $\tilde{N}_{m}$ training blocks are the same, i.e., $s_{m,i}[k]=1,\ \forall(i,k)$. (Footnote 3: It is also possible to consider comb-type training, where only part of the subcarriers are used for pilots while the rest carry unknown data.)

Moreover, let $\tilde{f}_{i,j}$ denote the value of the phase shifter for the $i$th training block and the $j$th antenna, $i,j=1,\ldots,\tilde{N}_{m}$, which is common to all subcarriers. Denote $\tilde{\mathbf{f}}_{i}=[\tilde{f}_{i,1},\tilde{f}_{i,2},\ldots,\tilde{f}_{i,\tilde{N}_{m}}]^{T}$ with $|\tilde{f}_{i,j}|=1$. As in Fig. 1(b), the received signal on the $k$th subcarrier of the $i$th training block is

$$y_{i}[k]=\tilde{\mathbf{f}}_{i}^{T}\tilde{\mathbf{h}}_{m}[k]+n_{i}[k],\quad k=1,\ldots,K_{m}. \tag{2}$$

Stacking $y_{i}[k]$ from the $\tilde{N}_{m}$ training blocks into one vector, we have

$$\mathbf{y}[k]=[y_{1}[k],y_{2}[k],\ldots,y_{\tilde{N}_{m}}[k]]^{T}=\tilde{\mathbf{F}}\tilde{\mathbf{h}}_{m}[k]+\mathbf{n}[k], \tag{3}$$

where $\tilde{\mathbf{F}}=[\tilde{\mathbf{f}}_{1},\tilde{\mathbf{f}}_{2},\ldots,\tilde{\mathbf{f}}_{\tilde{N}_{m}}]^{T}$ and $\mathbf{n}[k]=[n_{1}[k],n_{2}[k],\ldots,n_{\tilde{N}_{m}}[k]]^{T}$. Normally, $\tilde{\mathbf{F}}$ is selected as the $\tilde{N}_{m}\times\tilde{N}_{m}$ normalized DFT matrix during the training process.
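The stacked model (3) can be inverted cheaply when the training matrix is a normalized DFT matrix, because such a matrix is unitary. The sketch below illustrates this under stated assumptions: the number of active antennas (8), the noise level, and the helper names are hypothetical, and the $1/\sqrt{\tilde{N}_{m}}$ scaling of the DFT entries is treated as a normalization convention.

```python
import numpy as np

def dft_matrix(n):
    """Normalized (unitary) n x n DFT matrix."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def estimate_active_channels(Y, F):
    """Invert Y = F @ H for a unitary F; Y and H have shape (N_active, K)."""
    return F.conj().T @ Y

rng = np.random.default_rng(1)
n_active, K_m = 8, 512  # n_active is a hypothetical choice; K_m follows Table I

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

F = dft_matrix(n_active)              # one row of phase-shifter weights per training block
H = crandn(n_active, K_m)             # synthetic channels of the active antennas
Y = F @ H + 0.05 * crandn(n_active, K_m)   # column k is y[k] in (3)
H_hat = estimate_active_channels(Y, F)
```

Because the training matrix is unitary, the noise in the estimate keeps the same variance as the received noise, and no matrix inversion is needed.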

II-C Downlink Data Transmission at mmWave Band

During the downlink data transmission, the BS uses all $N_{m}$ antennas to form one single narrow beam, $\mathbf{f}\in{\cal C}$. Since we assume only one RF chain, the downlink beamformer can be represented as $\mathbf{f}=[f_{1},f_{2},\ldots,f_{N_{m}}]^{T}$ with $|f_{i}|=1$. The received signal on the $k$th subcarrier in one OFDM block can be written as

$$y_{m}[k]=\mathbf{h}^{T}_{m}[k]\mathbf{f}s_{d}[k]+n_{m}[k], \tag{4}$$

where $n_{m}[k]$ and $s_{d}[k]$ represent the corresponding noise and signal with powers $\sigma_{n}^{2}$ and $P_{s}$, respectively.

The achievable rate of the downlink transmission can be expressed as

$$R(\mathbf{h}_{m},\mathbf{f})=\sum_{k=1}^{K_{m}}\log\left(1+\frac{|\mathbf{h}^{T}_{m}[k]\mathbf{f}|^{2}P_{s}}{\sigma_{n}^{2}}\right). \tag{5}$$

Since the phase shifters are generally constrained to a limited number of bits, the size of ${\cal C}$ is finite. In the previous works [35, 3, 36], an exhaustive search is performed to find the optimal beam $\mathbf{f}^{\star}$ that maximizes the achievable rate, $R(\mathbf{h}_{m},\mathbf{f})$, according to the channel conditions.
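The exhaustive search can be sketched as follows: evaluate the rate in (5) for every beam in the codebook and keep the maximizer. The unit-modulus DFT codebook, the base-2 logarithm, and the toy channel are assumptions chosen for illustration; the source does not specify the codebook construction at this point.

```python
import numpy as np

def achievable_rate(H, f, snr):
    """Rate as in (5): sum over subcarriers of log2(1 + snr * |h[k]^T f|^2).

    H: (K, N) channel rows h[k]^T, f: (N,) beam, snr: P_s / sigma_n^2.
    """
    gains = np.abs(H @ f) ** 2
    return np.sum(np.log2(1.0 + snr * gains))

def best_beam(H, codebook, snr):
    """Exhaustive search over the rows of `codebook`; returns (index, all rates)."""
    rates = np.array([achievable_rate(H, f, snr) for f in codebook])
    return int(np.argmax(rates)), rates

def dft_codebook(n):
    """Hypothetical codebook: n unit-modulus DFT beams, one per row."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

N_m, K_m = 64, 16
codebook = dft_codebook(N_m)
# Toy channel that is perfectly matched to beam 3 on every subcarrier
H = np.tile(codebook[3].conj(), (K_m, 1))
idx, rates = best_beam(H, codebook, snr=1.0)
```

The cost grows linearly with the codebook size and requires the full mmWave channel, which is the overhead the prediction-based approach aims to avoid.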

(a) The measured PDP at 3.5GHz with 100MHz bandwidth in the corridor scenario.
(b) The measured PDP at 28GHz with 100MHz bandwidth in the corridor scenario.
Figure 2: Illustration of the congruency between the sub-6GHz channel and the mmWave channel in the same propagation environment.

II-D Direct Prediction from sub-6GHz Channel

A typical example is shown in Fig. 2, where the power delay profiles (PDPs) of the channels at the 28GHz and 3.5GHz bands with 100MHz bandwidth are measured in a corridor environment [37]. From the figure, although the PDP of the 28GHz channel is sparser than that of the 3.5GHz channel, the contours at both frequency bands are similar, which motivates exploiting the congruency of the channel vectors, as in [13, 15, 16]. Specifically, it is possible to predict the optimal mmWave downlink beam $\mathbf{f}^{\star}$ directly from the sub-6GHz uplink channel, as in [32, 33]. For instance, it has been proven in [32] that there exists a deterministic mapping from the sub-6GHz channel to the mmWave beams. In [33], the optimal mmWave beam is predicted by estimating the PDP of the sub-6GHz channel, where the PDP is considered as a fingerprint of the UE position and thus contains essential angular information for beam selection in a cell-specific manner within a given environment. The training dataset of the two frequency bands is generated using Wireless System Engineering (WiSE). Then, a DNN similar to that in [32] is trained to predict the best beam within the candidate set ${\cal C}$.

Although predicting the mmWave beam from the sub-6GHz CSI has been demonstrated to be effective both theoretically and experimentally [32, 33], there is still considerable room to improve its performance, especially when the SNR of the sub-6GHz channel is low. In fact, even without channel noise, the prediction accuracy is still below 85% and does not meet practical communication requirements, which motivates us to improve the accuracy by combining the features of the sub-6GHz channel with those of a few mmWave pilots.

III FusionNet for Improved Beam Prediction

In this section, we show how to merge a few mmWave pilots with the sub-6GHz CSI to significantly improve the accuracy of mmWave beam prediction.

Figure 3: Beam selection directly from the mmWave antennas requires $N_{m}$ training resources. As the prediction from the sub-6GHz channel already roughly points in the correct direction, only $\tilde{N}_{m}\ll N_{m}$ training resources are needed to calibrate it and obtain a better beam prediction.

III-A Why A Few MmWave Pilots?

The best beam for the mmWave downlink generally corresponds to the strongest path in the angular domain. Hence, the beam prediction problem is readily solved if the angular information of the mmWave channel is available. As discussed before, predicting directly from the sub-6GHz channel is not accurate enough due to the following factors:

1) With a limited number of sub-6GHz antennas, the BS cannot offer high resolution in the angular domain;

2) As pointed out in [38], the support of the mmWave channel in the angular domain is only a subset of that of the sub-6GHz channel under the same spatial grid quantization, and thus the angle of the strongest path of the sub-6GHz channel may not coincide with that of the mmWave channel;

3) The DL approximation of the channel mapping from sub-6GHz to mmWave is not accurate enough due to the limited number of training samples and the limited network size.

Although the beam direction predicted from the sub-6GHz channel usually deviates from the true one, it still preserves certain spatial information about the channel and can serve as a valid starting point for finding the best beam. Hence, a limited number of mmWave pilots is very helpful to "calibrate" such deviation and significantly enhance the prediction accuracy, as illustrated in Fig. 3. Note that since the mmWave pilots are mainly used for calibration rather than estimation, the number of required pilots is much smaller than that needed to estimate the complete mmWave channel, i.e., $\tilde{N}_{m}\ll N_{m}$.

This rationale is simple yet practical, and it applies equally to beam prediction, channel estimation, and data detection: after DL provides a coarse starting point, a few pilots can be used to refine it to the required precision. In the following, we present a detailed design of the neural network architecture that effectively merges the mmWave pilots and the sub-6GHz channel information to accurately predict the beam in the mmWave band.

III-B Network Architecture

Following [32], we adopt a multi-layer perceptron (MLP) to represent the mapping function from the sub-6GHz channel to the optimal beam $\mathbf{f}^{\star}\in{\cal C}$.

(a) The two vectors are directly concatenated before being fed to the network.
(b) The proposed FusionNet.
Figure 4: The architectures of the shallow model and the proposed FusionNet.

One straightforward way is to directly concatenate the sub-6GHz channel and the mmWave channel as the input of a DNN, as illustrated in Fig. 4(a), which is known as the shallow model [39]. However, as the correlation between the sub-6GHz and the mmWave channels is highly non-linear, it is hard for a neural network to learn their individual features effectively from a concatenated vector. The shallow model therefore still suffers some performance loss, especially when the sub-6GHz channel has low SNR.

Inspired by [40, 39], we design a dual-input network, called FusionNet, as shown in Fig. 4(b). The FusionNet first extracts the features from the sub-6GHz and mmWave channels separately, and then fuses them in a concatenation layer to generate a probability vector $\mathbf{p}$ whose $i$th entry, $p_{i}$, represents the probability that the $i$th beam in the given codebook ${\cal C}$ is the optimal one. We denote the inputs corresponding to the sub-6GHz and the mmWave channels as $\mathbf{x}_{s}$ and $\mathbf{x}_{m}$, respectively, which are the vectorized channels on all subcarriers estimated in Section II.

As in Fig. 4(b), the FusionNet comprises three sub-networks, i.e., the mmw-network, the sub6-network, and the classify-network. The mmw-network, with $L_{m}$ fully connected layers, extracts features such as the angular, delay, and path gain information from the mmWave channel input. The sub6-network, with $L_{s}$ layers, extracts information from the sub-6GHz input. The classify-network takes the concatenated feature as its input and is followed by a Softmax layer. Each fully connected layer in these sub-networks is followed by a BatchNorm layer, a ReLU layer, and a dropout layer.

Since the frequency discrepancy between the sub-6GHz and the mmWave channels is very large, using the sub-6GHz channel to predict the best beam in the mmWave band is intuitively expected to need more layers than using the mmWave channel itself. Hence, we set $L_{s}>L_{m}$ in the proposed design. Denote the numbers of neurons in the $l$th layer of the mmw-network, sub6-network, and classify-network as $n_{l}^{m}$, $n_{l}^{s}$, and $n_{l}^{c}$, respectively. The mmw-network and sub6-network extract the path information of the corresponding channels, while the classify-network yields the probability vector $\mathbf{p}$, whose length equals the width of the last classify-network layer, $n_{L_{c}}^{c}=|{\cal C}|$. The other parameters of the FusionNet are discussed in Section V.
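The dual-branch structure described above can be sketched in PyTorch as below. This is an illustrative reading of the text rather than the authors' implementation: the class and argument names are hypothetical, the final layer is assumed to be a plain linear layer whose outputs are passed to a softmax, and the example uses a reduced width and an assumed number of active mmWave antennas (8) to keep it light.

```python
import torch
import torch.nn as nn

def fc_block(n_in, n_out, p_drop):
    # Fully connected layer followed by BatchNorm, ReLU, and dropout
    return nn.Sequential(nn.Linear(n_in, n_out), nn.BatchNorm1d(n_out),
                         nn.ReLU(), nn.Dropout(p_drop))

class FusionNet(nn.Module):
    """Two feature extractors fused by concatenation (assumes L_c >= 2)."""

    def __init__(self, dim_sub6, dim_mmw, n_beams, width=2048, cls_width=64,
                 L_s=6, L_m=4, L_c=3, p_drop=0.4):
        super().__init__()
        self.sub6 = nn.Sequential(*[fc_block(dim_sub6 if i == 0 else width, width, p_drop)
                                    for i in range(L_s)])
        self.mmw = nn.Sequential(*[fc_block(dim_mmw if i == 0 else width, width, p_drop)
                                   for i in range(L_m)])
        hidden = [fc_block(2 * width if i == 0 else cls_width, cls_width, p_drop)
                  for i in range(L_c - 1)]
        # Assumption: the last layer outputs one logit per beam
        self.classify = nn.Sequential(*hidden, nn.Linear(cls_width, n_beams))

    def forward(self, x_s, x_m):
        z = torch.cat([self.sub6(x_s), self.mmw(x_m)], dim=1)
        return self.classify(z)  # logits; softmax over dim=1 gives p

# Input sizes: real and imaginary parts of K x N channel entries
model = FusionNet(dim_sub6=2 * 32 * 4, dim_mmw=2 * 512 * 8, n_beams=64, width=128).eval()
with torch.no_grad():
    p = torch.softmax(model(torch.randn(4, 256), torch.randn(4, 8192)), dim=1)
```

Keeping the two branches separate until the concatenation lets each branch use a depth suited to its input, which is how the choice $L_{s}>L_{m}$ is expressed here.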

III-C Training and Evaluation

In the training stage, a supervised learning approach is adopted, where the training label, denoted as $\mathbf{t}$, is a one-hot vector representing the best beam for the mmWave downlink transmission. The detailed calculation of $\mathbf{t}$ is presented in the next section. We adopt the cross-entropy loss as the loss function:

$$H_{p}(\mathbf{t})=-\sum_{c=1}^{|{\cal C}|}(\mathbf{t})_{c}\log((\mathbf{p})_{c}), \tag{6}$$

which is minimized by the Adam optimizer.
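For concreteness, the cross-entropy between a one-hot label and a softmax output can be computed as in the following NumPy sketch, written with the conventional negative sign so that smaller values mean better predictions. The logits, the small constant `eps`, and the function names are illustrative assumptions; in PyTorch the same quantity is typically obtained by applying `torch.nn.CrossEntropyLoss` to the logits.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(t, p, eps=1e-12):
    """Mean over the batch of -sum_c t_c * log(p_c); eps guards against log(0)."""
    return float(np.mean(-np.sum(t * np.log(p + eps), axis=-1)))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])   # hypothetical network outputs for 3 beams
p = softmax(logits)
t = np.eye(3)[[0, 2]]                  # one-hot labels: beam 0 and beam 2
loss = cross_entropy(t, p)
```

Only the probability assigned to the labeled beam contributes to the loss, so minimizing it pushes that probability towards one.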

In the evaluation stage, new sub-6GHz and mmWave channels are generated. After the data are pre-processed and fed to the trained FusionNet, the optimal beam can be predicted.

III-D Complexity Analysis

For the FusionNet, the total number of floating point operations (FLOPs) can be computed as:

$$\sum_{l=1}^{L_{m}-1}n_{l-1}^{m}n_{l}^{m}+\sum_{l=1}^{L_{s}-1}n_{l-1}^{s}n_{l}^{s}+\sum_{l=1}^{L_{c}-1}n_{l-1}^{c}n_{l}^{c}. \tag{7}$$

Compared with [32], which only needs to process the training data from the sub-6GHz channel, the first two sub-networks of the FusionNet take training data from the two frequency bands, resulting in slightly higher complexity. For example, when $L_{m}=4$, $L_{s}=6$, $L_{c}=3$ and $n_{l}^{m}=n_{l}^{s}=2048$, $n_{l}^{c}=64$, the total complexity is mainly determined by the first two sub-networks and is approximately twice the complexity in [32]. Nevertheless, the neural network is mainly trained offline and is deployed online at the BS, where the computing resources are assumed to be abundant.
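A rough operation count in the spirit of (7) can be obtained by summing the products of consecutive layer widths in each sub-network. The helper below is a sketch under assumptions: it counts one multiplication per weight in every layer (including the input and output layers, so its index range may differ slightly from (7)), and the input dimensions in the example are hypothetical.

```python
def mlp_mults(widths):
    """Sum of n_{l-1} * n_l over consecutive widths; widths[0] is the input size."""
    return sum(a * b for a, b in zip(widths[:-1], widths[1:]))

def fusionnet_mults(dim_sub6, dim_mmw, n_beams, width=2048, cls_width=64,
                    L_s=6, L_m=4, L_c=3):
    """Multiplications per forward pass of the three sub-networks."""
    sub6 = [dim_sub6] + [width] * L_s
    mmw = [dim_mmw] + [width] * L_m
    cls = [2 * width] + [cls_width] * (L_c - 1) + [n_beams]
    return mlp_mults(sub6) + mlp_mults(mmw) + mlp_mults(cls)

# Hypothetical input sizes: 2*K_s*N_s for sub-6GHz, 2*K_m*8 for 8 active mmWave antennas
total = fusionnet_mults(dim_sub6=2 * 32 * 4, dim_mmw=2 * 512 * 8, n_beams=64)
share_cls = mlp_mults([4096, 64, 64, 64]) / total
```

With the widths quoted above, the classify-network contributes well under one percent of the total, which is consistent with the statement that the first two sub-networks dominate the complexity.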

IV Dataset Generation and Data Preprocessing

In this section, we will first introduce how to generate the training data with the corresponding optimal beam labels. Then two novel data pre-processing methods will be designed to further improve the performance of the FusionNet.

IV-A Data Set Generation

The dataset for the FusionNet comprises the mmWave channel, the sub-6GHz channel estimated at different user positions, and the corresponding best beam index. We first generate the sub-6GHz and mmWave channels using the "O1 scenario" in the DeepMIMO dataset [41]. The parameters used in the data generation process are summarized in Table I.

The DeepMIMO dataset contains the parameters of the $R$ strongest rays for each user, represented by $(\psi_{r},\theta_{r},\alpha_{r},\tau_{r})$, where $\psi_{r}$ denotes the azimuth angle at the BS of the $r$th path, $\theta_{r}$ denotes the elevation angle, $\alpha_{r}$ is the complex gain, and $\tau_{r}$ is the delay. The mmWave channel is constructed using a geometric channel model whose channel vector at the $k$th subcarrier is

$$\mathbf{h}_{m}[k]=\sum_{r=1}^{R}\alpha_{r}e^{-j\frac{2\pi k\tau_{r}}{T_{s}}}\mathbf{a}(\psi_{r},\theta_{r}), \tag{8}$$

where $\mathbf{a}$ is the steering vector and $T_{s}$ is the reciprocal of the OFDM subcarrier interval. The sub-6GHz channel is also generated using (8), while the DeepMIMO dataset automatically adjusts the parameters according to the environment and frequency band. Therefore, the geometric channel model is capable of capturing the physical characteristics of the signal propagation process, including the dependence on the environment geometry, materials, frequency band, etc., which are of vital importance for DL-based beam prediction.
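A small sketch of the geometric model (8) is given below. It assumes a uniform linear array with half-wavelength spacing and ignores the elevation angle, which is a simplification not stated in the source; the path parameters and the value of `T_s` are made up for illustration and do not come from DeepMIMO.

```python
import numpy as np

def ula_steering(n_ant, psi, spacing=0.5):
    """Steering vector of a uniform linear array (assumed geometry); psi in radians."""
    n = np.arange(n_ant)
    return np.exp(1j * 2 * np.pi * spacing * n * np.sin(psi))

def geometric_channel(alpha, tau, psi, n_ant, n_sub, T_s):
    """h[k] = sum_r alpha_r * exp(-j 2 pi k tau_r / T_s) * a(psi_r).

    Returns an array of shape (n_sub, n_ant); row k-1 holds h[k]^T.
    """
    k = np.arange(1, n_sub + 1)[:, None]                    # subcarrier indices 1..K
    phase = np.exp(-2j * np.pi * k * tau[None, :] / T_s)    # (K, R)
    A = np.stack([ula_steering(n_ant, p) for p in psi])     # (R, N)
    return (phase * alpha[None, :]) @ A

# Two hypothetical paths: complex gains, delays in seconds, azimuth angles in radians
alpha = np.array([1.0 + 0.0j, 0.3 - 0.2j])
tau = np.array([0.0, 5e-9])
psi = np.array([0.3, -0.7])
H_m = geometric_channel(alpha, tau, psi, n_ant=64, n_sub=512, T_s=1e-6)
```

Each path contributes a rank-one term, so a channel with few paths is a sum of few steering vectors, which is the structure the later sparsity argument relies on.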

We place pilots on all subcarriers of the OFDM blocks at both frequency bands. With the received uplink signals, the BS estimates the channels $\mathbf{h}_{m}[k]$ and $\mathbf{h}_{s}[k]$ at the two frequency bands as described in Section II. Data normalization similar to that in [32] is then carried out for the input signals in both frequency bands.

Let $\mathbf{h}_{m}^{u}[k]$ be the mmWave channel vector at the $k$th subcarrier for the $u$th user from DeepMIMO and $(\mathbf{h}_{m}^{u}[k])_{n}$ be the mmWave channel at the $k$th subcarrier on the $n$th antenna. The channel vectors are normalized by a global normalization value $\Omega$, which is the largest absolute value in the whole dataset, i.e.,

$$\Omega=\max_{k,n,u}|(\mathbf{h}_{m}^{u}[k])_{n}|. \tag{9}$$

After normalization, the magnitudes of all channel elements lie between 0 and 1. We then split the normalized complex channel into its real and imaginary parts, which are stacked together. The final mmWave training dataset has size $U_{m}\times(2\times K_{m}\times N_{m})$, where $U_{m}$ is the total number of user positions at the mmWave band.
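The normalization in (9) and the real/imaginary stacking can be sketched as follows. The array layout (users, subcarriers, antennas), the ordering of real parts before imaginary parts, and the function name are assumptions made for the example.

```python
import numpy as np

def normalize_and_stack(H):
    """Globally normalize complex channels and stack real and imaginary parts.

    H: complex array of shape (U, K, N). Returns (X, omega) where X is a real
    array of shape (U, 2*K*N) and omega is the global maximum magnitude as in (9).
    """
    omega = np.abs(H).max()
    flat = (H / omega).reshape(H.shape[0], -1)
    return np.concatenate([flat.real, flat.imag], axis=1), omega

rng = np.random.default_rng(2)
H = rng.standard_normal((3, 4, 2)) + 1j * rng.standard_normal((3, 4, 2))  # toy data
X, omega = normalize_and_stack(H)
```

A single global scale, rather than a per-sample one, preserves the relative path gains between user positions, which may carry useful information for the network.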

Next, we present how to obtain the label $\mathbf{t}$ at each user position. The achievable rate for the mmWave downlink channel of the $u$th user with beam $\mathbf{f}_{c}$ can be computed as

$$R(\mathbf{h}_{m}^{u},\mathbf{f}_{c})=\sum_{k=1}^{K_{m}}\log_{2}\left(1+\text{SNR}\,|(\mathbf{h}_{m}^{u}[k])^{T}\mathbf{f}_{c}|^{2}\right),$$

for $c=1,2,\cdots,|{\cal C}|$, where SNR denotes the signal-to-noise ratio at the transmitter (user's side). Then the index of the best beam can be obtained by an offline search as

$$(c^{u})^{\star}=\arg\max_{c=1,2,\cdots,|{\cal C}|}R(\mathbf{h}_{m}^{u},\mathbf{f}_{c}). \tag{10}$$

A one-hot vector $\mathbf{t}^{u}$ can be obtained for each user to represent $(c^{u})^{\star}$, whose $(c^{u})^{\star}$th element is 1 while the other elements are 0. This vector serves as the label in the training and validation stages.

IV-B Utilizing Channel Sparsity

In Section II, the estimated sub-6GHz and mmWave channels, $\mathbf{h}_{s}[k]$ and $\mathbf{h}_{m}[k]$, are in the spatial-frequency domain. By stacking these channel vectors together, the channel matrices for the sub-6GHz band and the mmWave band can be obtained as

$$\mathbf{H}_{m}^{sf}=\left[\mathbf{h}_{m}[1],\mathbf{h}_{m}[2],\cdots,\mathbf{h}_{m}[K_{m}]\right],$$
$$\mathbf{H}_{s}^{sf}=\left[\mathbf{h}_{s}[1],\mathbf{h}_{s}[2],\cdots,\mathbf{h}_{s}[K_{s}]\right].$$

TABLE I: Parameters to generate the channel vectors

| Parameter | mmWave | Sub-6GHz |
| Carrier frequency | 28GHz | 3.5GHz |
| BS antennas | 64 | 4 |
| Antenna interval | 0.5 | 0.5 |
| OFDM bandwidth (MHz) | 0.5 | 0.02 |
| OFDM subcarriers | 512 | 32 |
| Paths | 5 | 15 |

Inspired by [38, 42, 43, 44], the FusionNet can improve the prediction performance by leveraging the sparsity of the channel matrices. Since the sparsity at the two frequency bands is similar, we only discuss the mmWave channel matrix as an example. A 2-D discrete Fourier transform (DFT) is performed on $\mathbf{H}_{m}^{sf}$ to find the new channel matrix in the angle-delay domain,

$$\mathbf{H}_{m}^{ad}=\mathbf{F}_{a}\mathbf{H}_{m}^{sf}\mathbf{F}_{d}^{H}, \tag{11}$$

where $\mathbf{F}_{a}$ and $\mathbf{F}_{d}$ are $N_{m}\times N_{m}$ and $K_{m}\times K_{m}$ normalized DFT matrices, respectively.

As illustrated in [24], the limited scattering of the channels as well as the large number of antennas at the BS (i.e., a large $N_{m}$) ensure the sparsity of $\mathbf{H}_{m}^{ad}$ in the angular-delay domain, i.e., only $N_{a}\ll N_{m}$ rows and $N_{d}\ll K_{m}$ columns of $\mathbf{H}_{m}^{ad}$ have significant values. Nevertheless, since there are a limited number of sub-6GHz antennas in the considered scenario and only $\tilde{N}_{m}\ll N_{m}$ mmWave antennas are used for channel estimation, the condition that a large number of antennas are employed at the BS is not satisfied. Thus, we only adopt a one-dimensional DFT to obtain the sparse representation, $\mathbf{H}_{m}^{sd}$, in the delay domain by

$$\mathbf{H}_{m}^{sd}=\mathbf{H}_{m}^{sf}\mathbf{F}_{d}^{H}. \tag{12}$$

The sparse representation of the sub-6GHz channel can be obtained in the same way following (12). Both sparse channels, $\mathbf{H}_{m}^{sd}$ and $\mathbf{H}_{s}^{sd}$, are used as the input data to train the FusionNet.
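Right-multiplying by the Hermitian of a normalized DFT matrix, as in (12), is the same as applying an orthonormal inverse FFT along the subcarrier axis. The sketch below shows this with NumPy and checks it on a toy single-tap channel; the matrix sizes, the tap position `d0`, and the function name are illustrative assumptions.

```python
import numpy as np

def to_delay_domain(H_sf):
    """Compute H_sf @ F_d^H for the normalized K x K DFT matrix F_d.

    H_sf: (N, K) spatial-frequency channel; the result is (N, K) in the
    spatial-delay domain. The orthonormal inverse FFT along axis 1 applies
    the same linear map as the explicit matrix product.
    """
    return np.fft.ifft(H_sf, axis=1, norm="ortho")

N, K, d0 = 8, 64, 5
rng = np.random.default_rng(4)
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # per-antenna gains
k = np.arange(K)
# Toy single-path channel whose delay falls exactly on delay bin d0
H_sf = a[:, None] * np.exp(-2j * np.pi * k * d0 / K)[None, :]
H_sd = to_delay_domain(H_sf)
power_per_delay = np.sum(np.abs(H_sd) ** 2, axis=0)
```

For this on-grid example all the energy lands in a single delay column; off-grid delays leak into neighbouring columns, but the representation typically remains concentrated in a few of them.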

IV-C Data Augmentation

The proposed FusionNet is mainly composed of fully connected layers with a large number of parameters. If trained with an insufficient amount of data, the neural network will most likely over-fit the training set and fail to generalize well to the test set. Many techniques, e.g., dropout and batch normalization, have been used to address the over-fitting problem. However, as the number of parameters increases, the flexibility of the network becomes extremely high and these techniques alone are no longer sufficient.

Data augmentation has been used in DL to generate additional training data and has achieved great success in speech recognition [45], image classification [46, 47, 48], and deep reinforcement learning [49]. Despite these applications in other areas, to the best of the authors' knowledge, data augmentation has not yet been used to generate extra data for wireless communications. We next introduce a novel data augmentation approach to generate new artificial samples as the input of the FusionNet.

A plausible data augmentation transformation should preserve label information. That is, one can perform any kind of transformation on the mmWave and the sub-6GHz channels as long as the transformed channels yield the same label as before. From the achieved rate objective (10), a simple observation is that if the mmWave channel vector, 𝐡mu[k]\mathbf{h}_{m}^{u}[k], is multiplied by any ej2πϕe^{-j2\pi\phi}, then both the achievable rate and the corresponding best beam index for each user remain the same. This observation inspires us to insert a random phase ϕ\phi into 𝐡m\mathbf{h}_{m} to augment the mmWave data. We generate ϕ\phi from the uniform distribution, i.e. ϕ𝒰(0,1)\phi\sim\mathcal{U}(0,1). Note that, different subcarriers for a specific user should share the same ϕ\phi while different users may have different ϕ\phi’s.

Data augmentation is also needed for the sub-6GHz channel. At the first look, it seems hard to directly tell what kind of transformation performed on the sub-6GHz channel will not affect the original label because the sub-6GHz channel does not directly determine the optimal beam (training label). Actually, the sub-6GHz channel provides underlying information of the mmWave channel, such as angular feature and frequency feature, about the propagation environment. Therefore, any kind of transformation that preserves the information in sub-6GHz channel is valid. Hence, multiplying a random value to the sub-6GHz channel vector is also plausible since this linear transformation will not cause any loss of channel information. For simplicity, we augment the sub-6GHz data by

$$\hat{\mathbf{h}}_{s}^{u}=\mathbf{h}_{s}^{u}e^{-j2\pi\chi}, \tag{13}$$

where $\chi$ is a random phase with $\chi\sim\mathcal{U}(0,1)$.
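The two phase rotations can be illustrated with a minimal sketch. It is not the authors' implementation: the DFT codebook, the function names (`dft_codebook`, `best_beam`, `rotate`, `augment`), the array sizes, and the use of the summed beamforming gain as a stand-in for the rate objective (10) are all assumptions made for illustration. The sketch applies one phase per user to all subcarriers and checks that the best beam index is unchanged.

```python
import cmath
import math
import random


def dft_codebook(num_antennas, num_beams):
    """Hypothetical DFT beam codebook: one unit-norm steering vector per beam."""
    return [
        [cmath.exp(-2j * math.pi * n * b / num_beams) / math.sqrt(num_antennas)
         for n in range(num_antennas)]
        for b in range(num_beams)
    ]


def best_beam(codebook, channel):
    """Index of the beam f maximizing sum_k |f^H h[k]|^2 over subcarriers k.

    The summed gain is an assumed stand-in for the rate objective (10).
    """
    gains = []
    for f in codebook:
        gain = 0.0
        for h_k in channel:  # one antenna-domain vector per subcarrier
            gain += abs(sum(f_n.conjugate() * h_n for f_n, h_n in zip(f, h_k))) ** 2
        gains.append(gain)
    return max(range(len(gains)), key=gains.__getitem__)


def rotate(channel, phase):
    """Multiply every entry on every subcarrier by exp(-j*2*pi*phase)."""
    factor = cmath.exp(-2j * math.pi * phase)
    return [[factor * h_n for h_n in h_k] for h_k in channel]


def augment(h_mm, h_sub6, rng):
    """Draw independent phases phi, chi ~ U(0, 1) for one user and rotate both channels."""
    phi, chi = rng.random(), rng.random()
    return rotate(h_mm, phi), rotate(h_sub6, chi)


# Toy user: 4 subcarriers, 8 mmWave antennas, 4 sub-6GHz antennas (arbitrary sizes).
rng = random.Random(0)
codebook = dft_codebook(num_antennas=8, num_beams=8)
h_mm = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)] for _ in range(4)]
h_sub6 = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)] for _ in range(4)]

h_mm_aug, h_sub6_aug = augment(h_mm, h_sub6, rng)
same_label = best_beam(codebook, h_mm_aug) == best_beam(codebook, h_mm)
print("label preserved:", same_label)
```

Because the rotation factor has unit modulus, every per-subcarrier gain is unchanged, so the label computed from the rotated mmWave channel matches the original one regardless of the drawn phase.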

Using the above data augmentation approaches, a large number of new training samples can be generated. However, as the number of training samples increases, the computational complexity increases accordingly and the computer memory may become insufficient. Most importantly, the underlying information remains the same even though the number of synthesized samples grows. As a result, the neural network’s performance does not improve further once the number of augmented samples reaches a certain value.

V Simulation Results

In simulation, the neural network is trained using the data with labels described in Section V. During the training phase, PyTorch 1.3.0 is adopted as the DL framework, running on a server with an RTX 2080 Ti GPU. The number of neurons in each fully connected layer of the mmw-network and sub6-network is 2048, while the number of neurons in each fully connected layer of the classify-network is 64. A varying learning rate is adopted, changing from the initial value, $10^{-3}$, to $10^{-4}$ after half of the total epochs, and further to $10^{-5}$ for the last $1/10$ of the epochs. The number of samples is approximately $1.08\times 10^{5}$, of which 70% are used for training and 30% for validation. The batch size is 512 and the total number of epochs is 60. To relieve the over-fitting problem, dropout layers are added after the fully connected layers with a dropout rate of 0.4. The performance of the proposed neural network is evaluated in terms of best beam selection accuracy, i.e., top-1 accuracy, and the corresponding achievable data rate (IV-A). (Note that many previous works [32, 33] also adopt the top-3 beamformer accuracy as an important criterion for beam prediction since their $Acc_{top1}$ is low. In our work, thanks to the essential calibration effect of the mmWave pilots, the top-3 accuracy nearly approaches 100% even at low SNR, as can be seen in the following simulations.) Specifically, the top-1 accuracy $Acc_{top1}$ is defined as
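The learning-rate schedule described above can be written as a small piecewise-constant function. This is only a sketch: the function name `learning_rate`, the 0-indexed epoch convention, and the exact switching epochs (30 and 54 for 60 epochs) are assumptions, since the text does not state on which epoch each change takes effect.

```python
def learning_rate(epoch, total_epochs=60):
    """Piecewise-constant learning rate for a 0-indexed epoch.

    1e-3 for the first half, 1e-4 afterwards, and 1e-5 for the last
    tenth of the epochs (assumed boundaries: total/2 and total - total/10).
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs)")
    last_tenth_start = total_epochs - total_epochs // 10
    if epoch >= last_tenth_start:
        return 1e-5
    if epoch >= total_epochs // 2:
        return 1e-4
    return 1e-3


# Epochs at which the rate changes for the 60-epoch setting.
schedule = [learning_rate(e) for e in range(60)]
switch_epochs = [e for e in range(1, 60) if schedule[e] != schedule[e - 1]]
print(switch_epochs)
```

In a PyTorch training loop, one option would be to divide this value by the initial rate and pass the result to a multiplicative scheduler such as `torch.optim.lr_scheduler.LambdaLR`.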

$$Acc_{top1}=\frac{1}{N_{test}}\sum_{k=1}^{N_{test}}\mathbb{I}_{\hat{c}_{k}=c^{\star}_{k}}, \tag{14}$$

where $N_{test}$ is the number of testing data, $\hat{c}_{k}$ is the predicted index of the beam (the index of the largest value in $\mathbf{p}$), $c^{\star}_{k}$ is the ground truth, and $\mathbb{I}$ denotes the indicator function. Note that the SNR of the mmWave downlink data transmission is set to 0 dB throughout the whole simulation, which may be different from the SNR of the uplink sub-6GHz signal or the mmWave pilots.
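As an illustration of (14), the following sketch computes the top-1 accuracy and, with the same routine, the top-3 accuracy mentioned earlier. The function name `top_k_accuracy` and the toy score vectors are hypothetical. Each row of `scores` plays the role of the network output $\mathbf{p}$ for one test sample.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true beam index is among the k largest scores.

    With k=1 this is the indicator average in (14).
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    hits = 0
    for p, c_star in zip(scores, labels):
        # Beam indices sorted by decreasing score; keep the k best.
        top_k = sorted(range(len(p)), key=p.__getitem__, reverse=True)[:k]
        hits += c_star in top_k
    return hits / len(labels)


# Toy example: 3 test samples, 3 candidate beams.
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
labels = [1, 1, 2]
acc_top1 = top_k_accuracy(scores, labels, k=1)
acc_top2 = top_k_accuracy(scores, labels, k=2)
```

In this toy case the second sample is missed by the top-1 prediction but recovered once the two best beams are considered, which is the effect that top-3 accuracy captures in beam prediction.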

V-A Prediction Performance

Figure 5: The $Acc_{top1}$ predicted using FusionNet, shallow model and DNN proposed in [32]. The top-3 accuracy of FusionNet is also shown in this figure.

Fig. 5 compares the prediction accuracy of the FusionNet with that of other neural network architectures, where the top-3 accuracy of FusionNet is also displayed. The baseline curve is the performance of the DNN in [32] that merely takes the sub-6GHz channel as input, while the “shallow model” curve stands for predicting directly from the concatenated sub-6GHz and mmWave channel as shown in Fig. 4(a). Both the “shallow model” network and the FusionNet are trained with $\tilde{N}_{m}=8$ active mmWave antennas and a pilot SNR of 20 dB. From Fig. 5, both the “shallow model” network and the FusionNet outperform the baseline in prediction accuracy. Since the shallow model does not fully exploit the individual features of the two channels, the “shallow model” curve is always below the FusionNet curve. Moreover, the top-3 accuracy of FusionNet reaches 100% even when the SNR of the sub-6GHz channel is merely 0 dB; hence, we omit the top-3 accuracy in the rest of the simulations.

Figure 6: Prediction performance with different number of active mmWave antennas: (a) $Acc_{top1}$; (b) achievable rate.

Fig. 6 displays the prediction accuracy and the corresponding achievable rate versus the SNR of the sub-6GHz channel estimation. The number of active antennas $\tilde{N}_{m}$ is 2, 4, 8, and 16, respectively, and the mmWave pilot SNR is 20 dB. From Fig. 6(a), the prediction accuracy of the FusionNet with any value of $\tilde{N}_{m}$ is always much better than that of the baseline method, especially in the low SNR region. An approximately 5 dB SNR gain in the sub-6GHz band is observed in terms of beam prediction accuracy even if we turn on only 2 mmWave antennas. Moreover, the beam prediction accuracy of the FusionNet significantly improves as the number of active mmWave antennas increases, while the improvement slows down beyond 8 active mmWave antennas. The achievable rate in Fig. 6(b) follows a similar trend. Another observation is that there is almost no rate loss in mmWave downlink data transmission when the SNR of the sub-6GHz signal is merely 5 dB and $\tilde{N}_{m}=4$ mmWave antennas are used for beam calibration. All these observations clearly demonstrate the effectiveness of the proposed FusionNet.

Figure 7: Prediction accuracy and achievable rate with the SNR of the mmWave channel: (a) $Acc_{top1}$; (b) achievable rate.

Fig. 7 depicts the prediction accuracy and the achievable rate versus the sub-6GHz channel SNR under different mmWave pilot SNRs with 8 active mmWave antennas. From the figure, the FusionNet significantly outperforms the baseline in most cases, except when the mmWave pilot SNR is extremely low and the sub-6GHz training SNR is very high. This is not unexpected, since in this case the calibration is not accurate enough and may drag down the performance predicted from the sub-6GHz channel. Nevertheless, calibration with low-SNR pilots is still beneficial when the sub-6GHz SNR is low. Therefore, the mmWave pilots would greatly help enhance the performance of a pure DNN in most practical scenarios. Moreover, further increasing the pilot SNR beyond 5 dB brings no additional gain, which is an essential property for mmWave transmission under severe path loss. Similar observations can also be made for the achievable rate in Fig. 7(b).

V-B Effects of the Number of OFDM Pilot Subcarriers on Both Frequency Bands

In the previous examples, the FusionNet is examined with fully loaded pilot subcarriers in both frequency bands. However, a practical protocol may assign only a limited number of pilot subcarriers in each OFDM block, and it is of interest to check whether the FusionNet still works in this case.

TABLE II: The prediction accuracy when using different number of mmWave subcarriers

| sub-6GHz SNR (dB) | -10 | -5 | 0 | 5 | 10 | 15 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $Acc_{top1}$ using all subcarriers | 0.601 | 0.777 | 0.902 | 0.942 | 0.956 | 0.956 | 0.960 |
| $Acc_{top1}$ using 1/8 subcarriers | 0.563 | 0.761 | 0.896 | 0.944 | 0.952 | 0.956 | 0.956 |
| $Acc_{top1}$ using 1/16 subcarriers | 0.535 | 0.756 | 0.896 | 0.939 | 0.951 | 0.951 | 0.954 |
| $Acc_{top1}$ using 1/32 subcarriers | 0.529 | 0.744 | 0.883 | 0.924 | 0.937 | 0.939 | 0.940 |

Table II demonstrates the FusionNet’s performance when a fraction of the OFDM subcarriers of the mmWave band are used as pilots, with $\tilde{N}_{m}=4$ active antennas. From the table, the prediction accuracy drops very little compared to the fully loaded pilots even if we use only one pilot. This is because the beam calibration process mainly uses the angular information of the mmWave channel, so additional OFDM subcarriers cannot provide further prediction improvement. Therefore, in practice, one may reduce the number of mmWave pilots to enhance the data throughput.

Table III shows the FusionNet’s performance using a fraction of the OFDM subcarriers of the sub-6GHz channel. All OFDM subcarriers on $\tilde{N}_{m}=4$ active mmWave antennas are used as pilots. Different from Table II, the prediction accuracy drops when the SNR of the sub-6GHz channel is low; however, the drop vanishes as the sub-6GHz SNR increases. This is because the sub6-network of FusionNet may easily fit to the noise in the low SNR region, and using more OFDM subcarriers for training helps reduce the effect of noise.

TABLE III: The prediction accuracy when using different number of sub-6GHz subcarriers

| sub-6GHz SNR (dB) | -10 | -5 | 0 | 5 | 10 | 15 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $Acc_{top1}$ using all subcarriers | 0.601 | 0.777 | 0.902 | 0.942 | 0.956 | 0.956 | 0.960 |
| $Acc_{top1}$ using 1/2 subcarriers | 0.498 | 0.682 | 0.856 | 0.929 | 0.956 | 0.956 | 0.960 |
| $Acc_{top1}$ using 1/4 subcarriers | 0.424 | 0.589 | 0.780 | 0.898 | 0.949 | 0.956 | 0.960 |
| $Acc_{top1}$ using 1/8 subcarriers | 0.389 | 0.495 | 0.676 | 0.830 | 0.923 | 0.954 | 0.959 |

V-C Utilizing Channel Sparsity and Data Augmentation

In this part, the two proposed data pre-processing approaches are evaluated. The augmentation rate, $R_{aug}$, is defined as the size of the augmented dataset divided by the size of the original dataset.

Figure 8: The prediction accuracy using sub-6GHz channel and the channel on 2 active mmWave antennas with different data pre-processing methods.

Fig. 8 shows the prediction performance when utilizing channel sparsity, data augmentation, and the combination of these two approaches. For this simulation, $\tilde{N}_{m}=2$, the mmWave pilot SNR is 20 dB, and the augmentation rate is 8. From the figure, the prediction performance is improved by exploiting the channel sparsity and by adopting the data augmentation approach, especially at low sub-6GHz SNR. It can be further improved by combining the two pre-processing approaches.

Figure 9: The prediction accuracy versus the data augmentation rate.

Fig. 9 displays the prediction accuracy versus the augmentation rate when the SNRs of the sub-6GHz and the mmWave channels are $-10$ dB and 20 dB, respectively, where the channel sparsity is also exploited. With 2 or 4 active antennas, the prediction accuracy improves rapidly at first and then slows down, showing that the synthesized data has fully exploited the underlying information beyond a certain rate. However, when $\tilde{N}_{m}$ increases to 16, the improvement brought by data augmentation is limited since the prediction accuracy with the original dataset is already good enough.

V-D Prediction Directly from Active MmWave Pilots

With the significant improvement brought by a few mmWave pilots, one natural question arises: is the enhancement because $\tilde{\mathbf{h}}_{m}[k]$ itself is good enough to predict the optimal downlink beam? To answer this question, Fig. 10 shows the prediction performance using the mmWave channel only, along with that of the baseline and the FusionNet, under a pilot SNR of 20 dB, where the FusionNet adopts 8 active mmWave antennas.

From the figure, the prediction accuracy based on the active mmWave antennas alone is not satisfactory: it is still below 85% even if half of the total mmWave antennas (i.e., $\tilde{N}_{m}=32$) are active. In brief, Fig. 10 clearly demonstrates the intriguingly novel aspect of the FusionNet, which merges the two “mediocre” approaches and yields an extremely precise prediction.

Figure 10: Prediction directly from mmWave channel

VI Conclusion

In this paper, we develop a deep learning based approach that uses the uplink sub-6GHz channel together with very few pilots in the mmWave band to greatly enhance the performance of mmWave downlink beam prediction. Specifically, we design a novel DNN architecture, the FusionNet, that takes both the sub-6GHz and the partial mmWave channels as inputs. By extracting the individual features from the two channels and then concatenating them, the FusionNet outperforms the current state-of-the-art method in both prediction accuracy and achievable rate. To further improve the prediction performance, we exploit the sparsity of the channels and introduce a data augmentation approach to prevent over-fitting when training the FusionNet. We show that even when the SNRs of the sub-6GHz and the mmWave channels are low, the proposed FusionNet is still able to predict the best beam with high fidelity using very few mmWave pilots, making it a promising candidate for future full spectrum wireless applications.

References

  • [1] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for 5g cellular: It will work!” IEEE Access, vol. 1, pp. 335–349, 2013.
  • [2] T. Bai and R. W. Heath Jr., “Coverage and rate analysis for millimeter wave cellular networks,” 2014.
  • [3] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas, and A. Ghosh, “Millimeter wave beamforming for wireless backhaul and access in small cell networks,” IEEE Trans. Commun., vol. 61, no. 10, pp. 4391–4403, 2013.
  • [4] S. Han, C. I, Z. Xu, and C. Rowell, “Large-scale antenna systems with hybrid analog and digital beamforming for millimeter wave 5g,” IEEE Commun. Mag., vol. 53, no. 1, pp. 186–194, 2015.
  • [5] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave mimo systems,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 436–453, 2016.
  • [6] A. Alkhateeb, G. Leus, and R. W. Heath, “Compressed sensing based multi-user millimeter wave systems: How many measurements are needed?” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2015, pp. 2909–2913.
  • [7] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, 2014.
  • [8] S. L. H. Nguyen and A. Ghrayeb, “Compressive sensing-based channel estimation for massive multiuser mimo systems,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2013, pp. 2890–2895.
  • [9] Z. Gao, L. Dai, and Z. Wang, “Channel estimation for mmwave massive mimo based access and backhaul in ultra-dense network,” in Proc. IEEE Int. Conf. Commun. (ICC), 2016, pp. 1–6.
  • [10] M. Wang, F. Gao, N. Shlezinger, M. F. Flanagan, and Y. C. Eldar, “A block sparsity based estimator for mmwave massive mimo channels with beam squint,” IEEE Trans. Signal Process., vol. 68, pp. 49–64, 2020.
  • [11] Y. Yang, F. Gao, Z. Zhong, B. Ai, and A. Alkhateeb, “Deep transfer learning based downlink channel prediction for fdd massive mimo systems,” 2019.
  • [12] N. Gonzalez-Prelcic, A. Ali, V. Va, and R. W. Heath, “Millimeter-wave communication with out-of-band information,” IEEE Commun. Mag., vol. 55, no. 12, pp. 140–146, 2017.
  • [13] A. Ali, N. González-Prelcic, and R. W. Heath, “Estimating millimeter wave channels using out-of-band measurements,” in Proc. Inf. Theory Appl. Workshop, 2016, pp. 1–6.
  • [14] T. Nitsche, A. B. Flores, E. W. Knightly, and J. Widmer, “Steering with eyes closed: Mm-wave beam steering without in-band measurement,” in Proc. IEEE INFOCOM, 2015, pp. 2416–2424.
  • [15] A. Decurninge, M. Guillaud, and D. T. M. Slock, “Channel covariance estimation in massive mimo frequency division duplex systems,” in Proc. IEEE GC Workshops, 2015, pp. 1–6.
  • [16] A. Ali, N. González-Prelcic, and R. W. Heath, “Millimeter wave beam-selection using out-of-band spatial information,” IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 1038–1052, 2018.
  • [17] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–44, 05 2015.
  • [18] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. on Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, 2017.
  • [19] M. Soltani, V. Pourahmadi, A. Mirzaei, and H. Sheikhzadeh, “Deep learning-based channel estimation,” IEEE Commun. Lett., vol. 23, no. 4, pp. 652–655, 2019.
  • [20] H. He, C. Wen, S. Jin, and G. Y. Li, “Deep learning-based channel estimation for beamspace mmwave massive mimo systems,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 852–855, 2018.
  • [21] H. Ye, G. Li, and B.-H. Juang, “Power of deep learning for channel estimation and signal detection in ofdm systems,” IEEE Wireless Commun. Lett., vol. PP, 08 2017.
  • [22] N. Samuel, T. Diskin, and A. Wiesel, “Deep mimo detection,” in Proc. IEEE Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), 2017, pp. 1–5.
  • [23] D. Erdogmus, D. Rende, J. C. Principe, and T. F. Wong, “Nonlinear channel equalization using multilayer perceptrons with information-theoretic criterion,” in Proc. Neural Netw. Signal Process. 200I (NNSP XI), 2001, pp. 443–451.
  • [24] C. Wen, W. Shih, and S. Jin, “Deep learning for massive mimo csi feedback,” IEEE Wireless Commun. Lett., vol. 7, no. 5, pp. 748–751, 2018.
  • [25] J. Guo, C. Wen, S. Jin, and G. Y. Li, “Convolutional neural network-based multiple-rate compressive sensing for massive mimo csi feedback: Design, simulation, and analysis,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2827–2840, 2020.
  • [26] Z. Qin, H. Ye, G. Y. Li, and B. F. Juang, “Deep learning in physical layer communications,” IEEE Wireless Commun., vol. 26, no. 2, pp. 93–99, 2019.
  • [27] F. A. Aoudia and J. Hoydis, “End-to-end learning of communications systems without a channel model,” in Proc. Asilomar Conf. Signals, Syst., Comput., 2018, pp. 298–303.
  • [28] H. Ye, L. Liang, G. Y. Li, and B. Juang, “Deep learning-based end-to-end wireless communication systems with conditional gans as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, 2020.
  • [29] Y. Wang, M. Narasimha, and R. W. Heath, “Mmwave beam prediction with situational awareness: A machine learning approach,” in Proc. IEEE Int. Workshops Signal Process. Adv. Wireless Commun. (SPAWC), 2018, pp. 1–5.
  • [30] A. Klautau, P. Batista, N. González-Prelcic, Y. Wang, and R. W. Heath, “5g mimo data for machine learning: Application to beam-selection using deep learning,” in Proc. Inf. Theory Appl. Workshop (ITA), 2018, pp. 1–9.
  • [31] W. Xu, F. Gao, S. Jin, and A. Alkhateeb, “3d scene based beam selection for mmwave communications,” IEEE Wireless Commun. Lett., pp. 1–1, 2020.
  • [32] M. Alrabeiah and A. Alkhateeb, “Deep learning for mmwave beam and blockage prediction using sub-6ghz channels,” IEEE Trans. Commun., pp. 1–1, 2020.
  • [33] M. S. Sim, Y. Lim, S. H. Park, L. Dai, and C. Chae, “Deep learning-based mmwave beam selection for 5g nr/6g with sub-6 ghz channel information: Algorithms and prototype validation,” IEEE Access, vol. 8, pp. 51 634–51 646, 2020.
  • [34] K. Ma, P. Zhao, and Z. Wang, “Deep learning assisted beam prediction using out-of-band information,” in Proc. IEEE Veh. Technol. Conf. (VTC2020-Spring), 2020, pp. 1–5.
  • [35] Y. M. Tsang, A. S. Y. Poon, and S. Addepalli, “Coding the beams: Improving beamforming training in mmwave communication system,” in Proc. IEEE GLOBECOM, 2011, pp. 1–6.
  • [36] Junyi Wang, Zhou Lan, Chang-woo Pyo, T. Baykas, Chin-sean Sum, M. A. Rahman, Jing Gao, R. Funada, F. Kojima, H. Harada, and S. Kato, “Beam codebook based beamforming protocol for multi-gbps millimeter-wave wpan systems,” IEEE J. Sel. Areas Commun., vol. 27, no. 8, pp. 1390–1399, 2009.
  • [37] T. Jiang, J. Zhang, M. Shafi, L. Tian, and P. Tang, “The comparative study of s-v model between 3.5 and 28 ghz in indoor and outdoor scenarios,” IEEE Trans. Veh. Technol., vol. 69, no. 3, pp. 2351–2364, 2020.
  • [38] B. Li, Z. Zhang, and Y. Chen, “Beam selection in multi-user millimeter wave system based on out-of-band spatial information,” in Proc. IEEE Int. Conf. Comput. Commun. (ICCC), 2019, pp. 871–876.
  • [39] J. Ngiam, A. Khosla, M. Kim, J. Nam, and A. Y. Ng, “Multimodal deep learning,” in Proc. Int. Conf. Mach. Learn., 2009.
  • [40] A. Ben Said, A. Mohamed, T. Elfouly, K. Harras, and Z. J. Wang, “Multimodal deep learning approach for joint eeg-emg data compression and classification,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2017, pp. 1–6.
  • [41] A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications,” in Proc. Inf. Theory Appl. Workshop (ITA), San Diego, CA, Feb 2019, pp. 1–8.
  • [42] H. Xie, F. Gao, and S. Jin, “An overview of low-rank channel estimation for massive mimo systems,” IEEE Access, vol. 4, pp. 7313–7321, 2016.
  • [43] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for fdd massive mimo,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6169–6183, 2015.
  • [44] B. Wang, F. Gao, S. Jin, H. Lin, and G. Y. Li, “Spatial- and frequency-wideband effects in millimeter-wave massive mimo systems,” IEEE Transactions on Signal Processing, vol. 66, no. 13, pp. 3393–3406, 2018.
  • [45] N. Rossenbach, A. Zeyer, R. Schlüter, and H. Ney, “Generating synthetic audio data for attention-based speech recognition systems,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), 2020, pp. 7069–7073.
  • [46] S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell, “Understanding data augmentation for classification: When to warp?” in Proc. Int. Conf. Digit. Image Comput.: Techn. Appl. (DICTA), 2016, pp. 1–6.
  • [47] J. Ding, B. Chen, H. Liu, and M. Huang, “Convolutional neural network with data augmentation for sar target recognition,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 3, pp. 364–368, 2016.
  • [48] A. Antoniou, A. Storkey, and H. Edwards, “Data augmentation generative adversarial networks,” 2017.
  • [49] V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves, M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, “Human-level control through deep reinforcement learning,” Nature, vol. 518, pp. 529–33, 02 2015.