RIS-Assisted MIMO Communication Systems: Model-based versus Autoencoder Approaches

Ha An Le^∗, Trinh Van Chien^ν, Van Duc Nguyen^†, and Wan Choi^∗
^∗Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea
^νSchool of Information and Communications Technology (SoICT), Hanoi University of Science and Technology, Vietnam
^†School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
Emails: [email protected], [email protected], [email protected], [email protected]
This work was supported by Institute of Information & communications Technology Planning & evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00809, Development on the disruptive technologies for beyond 5G mobile communications employing new resources).

Abstract

This paper considers reconfigurable intelligent surface (RIS)-assisted point-to-point multiple-input multiple-output (MIMO) communication systems, where a transmitter communicates with a receiver through an RIS. With the main goal of reducing the bit error rate (BER) and therefore enhancing the communication reliability, we study different model-based and data-driven (autoencoder) approaches. In particular, we consider a model-based approach that optimizes both active and passive optimization variables. We further propose a novel end-to-end data-driven framework, which leverages the recent advances in machine learning. The neural networks that replace the conventional signal processing modules are jointly trained with the channel effects to minimize the bit detection error. Numerical results demonstrate that the proposed data-driven approach can dynamically learn to encode the transmitted signal across different channel realizations. In addition, the data-driven approach not only offers a significant gain in the BER performance compared to the other state-of-the-art benchmarks but also maintains the performance when perfect channel information is unavailable.

Index Terms:

Reconfigurable Intelligent Surface, Multiple-Input Multiple-Output, Autoencoder, Bit Error Rate.

I Introduction

The reconfigurable intelligent surface (RIS) has recently emerged as a cost-effective paradigm that can tackle the complexity issues [1]. Each RIS device comprises a number of reflecting elements that can be controlled to manipulate the incoming signals in the desired manner. With a proper phase-shift design, the reflected signals can be added constructively to enhance the signal strength [1]. The authors in [2] proposed a joint active and passive beamforming design for a MISO multi-user system to minimize the total transmit power consumption. In [3], an alternating maximization approach was applied to jointly optimize the precoding matrix and the RIS phase-shifts in order to boost the energy efficiency of the system. Considering an RIS-assisted point-to-point MIMO system, the authors in [4] investigated the optimization of the RIS phase-shifts in an OFDM MIMO system to enhance the capacity of the cascaded channel. However, a precoding matrix at the transmitter was not considered in [4]. In [5], a low-complexity design of the precoding matrix and the RIS phase-shifts was proposed to maximize the spectral efficiency of a point-to-point MIMO system. Only a few works address the communication reliability. In [6], an RIS-based Vertical Bell Labs layered space-time (VBLAST) system was proposed to enhance the bit error rate (BER) performance of the MIMO system. These previous works suffer from high computational complexity since the algorithms are iterative. Moreover, the local solutions obtained by the model-based approaches might be much worse than the global optimum, which motivates a better design.

Various applications of machine learning in wireless communications such as resource management, channel estimation, and signal detection [7] have been proposed to address intractable non-convex optimization problems and high-complexity issues. For RIS systems, the idea of applying machine learning to the design of the RIS phase-shifts has been widely studied. In [8], the authors proposed a deep neural network (DNN) to learn the optimal RIS phase-shifts from the users’ positions in order to maximize the throughput of a multi-user MISO system. In [9], a framework comprising two convolutional neural networks (CNNs) is utilized to jointly optimize the RIS phase-shifts and the precoding matrix in an unsupervised fashion with the goal of maximizing the sum rate of all users. With the same optimization objective as [9], a reinforcement learning-based framework is proposed in [10] to predict the optimal precoding matrix and RIS phase-shifts for the given channel realizations. Although different machine learning approaches have been successfully applied in RIS systems, very limited work focuses on improving the signal detection performance and the reliability of the system. In [11], the authors considered a joint transmitter, receiver, and RIS phase-shift design. In this framework, an autoencoder approach is proposed to enhance the BER performance of an RIS-assisted SISO system. Furthermore, the authors in [12] proposed an autoencoder-based design for RIS-assisted MIMO systems to improve the data detection performance. However, the works in [11] and [12] are limited since the autoencoders are trained on one deterministic channel realization. As a result, the autoencoder works only for one specific channel and does not reflect the practical aspects of physical radio channels. [Footnote 1: The previous works [11, 12] assumed a sufficiently large coherence time, so their autoencoder architectures may not enable them to cope with the rapid changes of propagation channels in practice.] Motivated by this drawback, we propose an autoencoder approach that can encode data through different channel realizations to enhance the BER.

Paper contributions: In this paper, for the model-based approach, we first study the two different MIMO communication system models that optimize radio resources and smartly control the propagation environments with the BER minimization as the utility metric. For the data-driven approach, we propose an autoencoder design for RIS-assisted MIMO communication systems, which considers the practical conditions of fading channels and is therefore closer to real applications than previous works. We jointly design the RIS phase-shifts and the transceiver in order to reduce the BER at the receiver. Following the end-to-end framework, the transmitter, receiver, and the RIS device are modeled by individual deep neural networks with a unique loss function that minimizes the bit detection error on average. Our proposed data-driven approach establishes a unified framework to learn the entire system in order to enhance the communication reliability. Numerical results demonstrate the benefits of the data-driven approach, which provides much better BER performance than the model-based approaches. Moreover, we numerically show that the proposed framework can learn to encode data in a way that is robust to the fluctuation of propagation channels, and therefore retains a good BER performance under imperfect channel state information (CSI).

Notation: Bold upper-case and lower-case letters denote matrices and vectors, respectively. The superscripts $(\cdot)^{T}$ and $(\cdot)^{H}$ denote the regular and Hermitian transpose. $\mathbf{I}_{N}$ denotes an identity matrix of size $N\times N$ and $\mathrm{arg}(\cdot)$ is the argument of a complex number. $\|\cdot\|$ and $\|\cdot\|_{F}$ denote the Euclidean and Frobenius norm. The expectation of a random variable is denoted by $\mathbb{E}\{\cdot\}$, while $\mathcal{CN}(\cdot,\cdot)$ is a circularly symmetric complex Gaussian distribution. Finally, $\mathcal{O}(\cdot)$ is the big-$\mathcal{O}$ notation.

Figure 1: The considered RIS-assisted MIMO communication system model where the transceiver and the RIS are replaced by neural networks.

II Model-Based Optimization Approach

We consider a point-to-point MIMO system where a transmitter equipped with $N_{t}$ antennas transmits $N_{s}$ data streams to the receiver. The receiver has $N_{r}$ antennas to enhance the received signal strength. Furthermore, the system performance is boosted by the support of an RIS comprising $K$ passive reflecting elements as illustrated in Fig. 1. The reflection matrix $\boldsymbol{\Theta}\in\mathbb{C}^{K\times K}$ is formulated as

$$\boldsymbol{\Theta}=\mathrm{diag}(\beta_{1}e^{j\theta_{1}},\dots,\beta_{K}e^{j\theta_{K}}),\tag{1}$$

where $0\leq\beta_{k}\leq 1$ and $-\pi\leq\theta_{k}\leq\pi$ are the magnitude and phase produced by the $k$-th reflecting element. Following a popular assumption motivated by the recent advances towards lossless metasurfaces, we assume a unit signal reflection, i.e., $\beta_{k}=1,\forall k$. In this paper, we assume that the direct link between the transmitter and the receiver is blocked by large obstacles. The data is sent from the transmitter to the receiver through the RIS over the indirect link with the cascaded channels. The signal $\mathbf{s}$ is assumed to be $M$-QAM (quadrature amplitude modulation) with $\mathbb{E}\{\mathbf{s}\mathbf{s}^{H}\}=\mathbf{I}_{N_{s}}$. The modulated data message is precoded by a linear precoder $\mathbf{F}\in\mathbb{C}^{N_{t}\times N_{s}}$ with $\|\mathbf{F}\|^{2}_{F}=N_{s}$, and then passed to the receiver via the RIS. The received signal, $\mathbf{y}\in\mathbb{C}^{N_{r}}$, at the receiver is

$$\mathbf{y}=\sqrt{P/N_{s}}\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{F}\mathbf{s}+\mathbf{n},\tag{2}$$

where $P$ is the total transmitted power. The channel between the transmitter and the RIS is denoted by $\mathbf{G}\in\mathbb{C}^{K\times N_{t}}$ , while $\mathbf{H}\in\mathbb{C}^{K\times N_{r}}$ is the channel between the RIS and the receiver. Additionally, $\mathbf{n}\sim\mathcal{CN}(\mathbf{0},\sigma^{2}\mathbf{I}_{N_{r}})$ denotes additive white Gaussian noise (AWGN). We assume that all the channels are known to the transmitter with the help of feedback or channel reciprocity. From (2), the signal is then decoded as

$$\hat{\mathbf{s}}=\mathbf{Z}\mathbf{y}=\sqrt{P/N_{s}}\mathbf{Z}\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{F}\mathbf{s}+\mathbf{Z}\mathbf{n},\tag{3}$$

where $\mathbf{Z}\in\mathbb{C}^{N_{s}\times N_{r}}$ represents the equalizer matrix. The active beamforming matrices, i.e., the precoding matrix $\mathbf{F}$ and the equalizer matrix $\mathbf{Z}$, together with the passive reflection matrix $\boldsymbol{\Theta}$ can be optimized for a given utility metric and practical constraints. However, the strong coupling among these optimization variables makes it nontrivial to solve the resource allocation problems optimally. In this paper, we separately solve the subproblems for $\boldsymbol{\Theta}$ and $\mathbf{F}$ to obtain a good solution in polynomial time by first maximizing the channel capacity defined for (2). Then, based on (3), the channel impairment is compensated by a proper selection of the equalizer matrix $\mathbf{Z}$.
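As a minimal sketch of the signal model in (1) and (2), the NumPy snippet below draws one channel realization and forms the received vector. The helper names (`crandn`, `reflection_matrix`, `received_signal`), the i.i.d. Rayleigh fading, the random precoder, and the dimensions are illustrative assumptions and not the authors' simulation code.

```python
import numpy as np

rng = np.random.default_rng(0)

def crandn(*shape):
    """i.i.d. CN(0, 1) samples (assumed Rayleigh fading, for illustration only)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def reflection_matrix(theta):
    """Eq. (1) with unit-magnitude reflection, beta_k = 1."""
    return np.diag(np.exp(1j * theta))

def received_signal(H, Theta, G, F, s, P, noise_var):
    """Eq. (2): y = sqrt(P/Ns) H^H Theta G F s + n."""
    Ns = F.shape[1]
    Nr = H.shape[1]
    n = np.sqrt(noise_var) * crandn(Nr)
    return np.sqrt(P / Ns) * H.conj().T @ Theta @ G @ F @ s + n

# Illustrative dimensions: Nt = 4, Nr = 2, Ns = 2, K = 8.
Nt, Nr, Ns, K = 4, 2, 2, 8
G = crandn(K, Nt)                        # transmitter -> RIS
H = crandn(K, Nr)                        # RIS -> receiver
Theta = reflection_matrix(rng.uniform(-np.pi, np.pi, K))
F = crandn(Nt, Ns)
F *= np.sqrt(Ns) / np.linalg.norm(F)     # enforce ||F||_F^2 = Ns
s = rng.choice([-1.0, 1.0], size=Ns)     # BPSK symbols as a simple example
y = received_signal(H, Theta, G, F, s, P=4.0, noise_var=0.1)
```

The random precoder only serves to exercise the dimensions; the designs of $\boldsymbol{\Theta}$, $\mathbf{F}$, and $\mathbf{Z}$ are the subject of the following subsections.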

II-1 Design of the reflection matrix $\boldsymbol{\Theta}$

Let us define $\mathbf{f}_{0}(\boldsymbol{\Theta},\mathbf{Z},\mathbf{F})=(P/N_{s})\mathbf{Z}\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{F}\mathbf{F}^{H}\mathbf{G}^{H}\boldsymbol{\Theta}^{H}\mathbf{H}\mathbf{Z}^{H}$. Following the same methodology as [5, Proposition 1] with the given optimal solution to the precoding matrix $\mathbf{F}$ and the equalizer $\mathbf{Z}$, it holds that $\log_{2}\det|\mathbf{I}_{N_{s}}+\mathbf{f}_{0}(\boldsymbol{\Theta},\mathbf{Z},\mathbf{F})|\geq\log_{2}(1+(P/N_{s})\mathrm{tr}(\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{G}^{H}\boldsymbol{\Theta}^{H}\mathbf{H}))$. Consequently, a good feasible point for the channel capacity maximization is obtained from the phase-shift matrix $\boldsymbol{\Theta}^{\ast}$ that maximizes the total path gain as

$$\begin{aligned}\boldsymbol{\Theta}^{\ast}=\underset{\boldsymbol{\Theta}}{\mathrm{argmax}}\quad&\mathrm{tr}(\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{G}^{H}\boldsymbol{\Theta}^{H}\mathbf{H})\\ \text{subject to}\quad&-\pi\leq\theta_{k}\leq\pi,\ \beta_{k}=1,\ \forall k,\\ &\boldsymbol{\Theta}=\mathrm{diag}(e^{j\theta_{1}},\dots,e^{j\theta_{K}}).\end{aligned}\tag{4}$$

We observe that the problem (4) can be solved by the semi-definite relaxation technique [2] or the alternating direction method of multipliers (ADMM) [5] and the diagonal constraint can be relaxed following the steps in [5].
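To make the objective of (4) concrete, the sketch below evaluates the total path gain and improves it with element-wise coordinate ascent on the phases. This is a deliberately simple stand-in heuristic and not the SDR or ADMM solvers cited above; the function names `path_gain` and `coordinate_ascent_phases`, the number of sweeps, and the random channels are assumptions made for illustration.

```python
import numpy as np

def path_gain(H, G, theta):
    """Objective of (4): tr(H^H Theta G G^H Theta^H H) = ||H^H Theta G||_F^2."""
    Theta = np.diag(np.exp(1j * theta))
    return np.linalg.norm(H.conj().T @ Theta @ G) ** 2

def coordinate_ascent_phases(H, G, theta0, sweeps=20):
    """Element-wise phase updates (a stand-in heuristic, not SDR or ADMM)."""
    theta = theta0.copy()
    K = theta.size
    # A[k] = h_k g_k^T, so that H^H Theta G = sum_k exp(j theta_k) A[k].
    A = np.einsum("ik,kj->kij", H.conj().T, G)
    for _ in range(sweeps):
        for k in range(K):
            total = np.tensordot(np.exp(1j * theta), A, axes=1)
            rest = total - np.exp(1j * theta[k]) * A[k]
            # Aligning the k-th term with the remaining sum maximizes the norm in theta_k.
            theta[k] = np.angle(np.vdot(A[k], rest))
    return theta

rng = np.random.default_rng(1)
K, Nt, Nr = 8, 4, 2
G = (rng.standard_normal((K, Nt)) + 1j * rng.standard_normal((K, Nt))) / np.sqrt(2)
H = (rng.standard_normal((K, Nr)) + 1j * rng.standard_normal((K, Nr))) / np.sqrt(2)
theta0 = rng.uniform(-np.pi, np.pi, K)
theta_opt = coordinate_ascent_phases(H, G, theta0)
gain_before = path_gain(H, G, theta0)
gain_after = path_gain(H, G, theta_opt)
```

Each single-phase update is the exact maximizer with the other phases held fixed, so the path gain cannot decrease from one update to the next; like any local method, it carries no guarantee of reaching the global optimum of (4).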

II-2 Design of the precoding matrix $\mathbf{F}$

For the given solution to the $\boldsymbol{\Theta}$-subproblem, let us define the aggregated channel $\widetilde{\mathbf{H}}=\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}$ that captures all the features of the cascaded channels and the phase shifts. We write the singular value decomposition (SVD) of $\widetilde{\mathbf{H}}$ as $\widetilde{\mathbf{H}}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{H}$, where $\mathbf{U}\in\mathbb{C}^{N_{r}\times N_{t}}$ and $\mathbf{V}\in\mathbb{C}^{N_{t}\times N_{t}}$ satisfy $\mathbf{U}^{H}\mathbf{U}=\mathbf{I}_{N_{t}}$ and $\mathbf{V}^{H}\mathbf{V}=\mathbf{I}_{N_{t}}$. Besides, $\boldsymbol{\Sigma}=\mathrm{diag}(\lambda_{1},\ldots,\lambda_{N_{t}})$ contains the singular values $\lambda_{m},\forall m=1,\ldots,N_{t},$ with $\lambda_{1}\geq\ldots\geq\lambda_{N_{t}}$. The optimal solution to the precoding matrix is

$$\mathbf{F}^{\ast}=[\mathbf{V}]_{1:N_{s}}\mathbf{P}^{1/2}=[\mathbf{V}]_{1:N_{s}}\mathrm{diag}\left(\sqrt{p_{1}^{\ast}},\ldots,\sqrt{p_{N_{s}}^{\ast}}\right),\tag{5}$$

where $p_{m}^{\ast}$ denotes the optimal fraction of the transmit power assigned to the $m$-th data stream satisfying $\sum_{m=1}^{N_{s}}p^{\ast}_{m}=N_{s}$, and $[\mathbf{A}]_{1:N_{s}}$ denotes the first $N_{s}$ columns of matrix $\mathbf{A}$.
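The structure of (5) can be sketched with NumPy's SVD routine as follows. The function name `svd_precoder` is hypothetical, and the equal power split $p_{m}=1$ is an assumption used only to obtain a feasible example, since the power allocation rule is not spelled out here.

```python
import numpy as np

def svd_precoder(H_tilde, Ns, p=None):
    """Eq. (5): F = [V]_{1:Ns} P^{1/2}; equal power p_m = 1 is assumed if p is None."""
    U, sing, Vh = np.linalg.svd(H_tilde, full_matrices=False)
    V = Vh.conj().T
    p = np.ones(Ns) if p is None else np.asarray(p, dtype=float)
    F = V[:, :Ns] @ np.diag(np.sqrt(p))
    return F, U, sing

rng = np.random.default_rng(2)
Nr, Nt, Ns = 2, 4, 2
# Random stand-in for the aggregated channel H^H Theta G.
H_tilde = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
F, U, sing = svd_precoder(H_tilde, Ns)
```

Because the columns of $\mathbf{V}$ are orthonormal, any allocation with $\sum_{m}p_{m}=N_{s}$ automatically meets the constraint $\|\mathbf{F}\|_{F}^{2}=N_{s}$.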

II-3 Design of the equalizer matrix $\mathbf{Z}$

Conditioned on the solutions to the $\boldsymbol{\Theta}$- and $\mathbf{F}$-subproblems, the decoded signal in (3) can be reformulated by substituting (5) into (3) and performing some algebraic manipulations as

$$\hat{\mathbf{s}}\stackrel{(a)}{=}\sqrt{P/N_{s}}\mathbf{Z}\mathbf{U}\boldsymbol{\Sigma}\mathbf{P}^{1/2}\mathbf{s}+\tilde{\mathbf{n}}\stackrel{(b)}{=}\mathbf{s}+\tilde{\mathbf{n}},\tag{6}$$

where $\tilde{\mathbf{n}}=\mathbf{Z}\mathbf{n}$. In (6), $(a)$ is obtained by the SVD of the aggregated channel $\widetilde{\mathbf{H}}$ and the use of the optimal precoding matrix $\mathbf{F}^{\ast}$ in (5). In order to detect the transmitted signal effectively, $(b)$ is obtained by the following solution

$$\mathbf{Z}^{\ast}=(\boldsymbol{\Sigma}_{N_{s}}\mathbf{P}^{1/2})^{-1}[\mathbf{U}]_{1:N_{s}}^{H},\tag{7}$$

where $\boldsymbol{\Sigma}_{N_{s}}=\mathrm{diag}(\lambda_{1},\ldots,\lambda_{N_{s}})$. Notice that the complexity of the algorithm mainly comes from obtaining $\boldsymbol{\Theta}^{\ast}$, which is in the order of $\mathcal{O}(K^{3}+TK^{2})$, where $T$ is the number of iterations needed to reach convergence from an initial point. Furthermore, the receiver requires a computational complexity of $\mathcal{O}(N_{s}2^{M})$ to decode the modulated signal from $\hat{\mathbf{s}}$. Therefore, the total complexity incurred by this communication system is in the order of $\mathcal{O}(K^{3}+TK^{2}+N_{s}2^{M})$.
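The following sketch checks numerically that the equalizer in (7), combined with the precoder in (5), diagonalizes the aggregated channel into an identity mapping. The random aggregated channel and the equal power split are assumptions, and the scalar $\sqrt{P/N_{s}}$ is left out so that only the channel-dependent part is verified.

```python
import numpy as np

rng = np.random.default_rng(3)
Nr, Nt, Ns = 2, 4, 2
# Random stand-in for the aggregated channel H^H Theta G.
H_tilde = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)

U, sing, Vh = np.linalg.svd(H_tilde, full_matrices=False)
p = np.ones(Ns)                                   # assumed equal power split
F = Vh.conj().T[:, :Ns] @ np.diag(np.sqrt(p))     # precoder of (5)

# Eq. (7): Z = (Sigma_Ns P^{1/2})^{-1} [U]_{1:Ns}^H
Z = np.diag(1.0 / (sing[:Ns] * np.sqrt(p))) @ U[:, :Ns].conj().T

effective = Z @ H_tilde @ F                       # equals the Ns x Ns identity
```

Since $\mathbf{Z}^{\ast}$ divides by the singular values, a weak eigenmode amplifies the noise term $\tilde{\mathbf{n}}=\mathbf{Z}\mathbf{n}$ on the corresponding stream, which is one reason the BER of this design depends strongly on the phase-shift design.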

Remark 1.

Even though $\boldsymbol{\Theta}^{\ast}$, $\mathbf{Z}^{\ast}$, and $\mathbf{F}^{\ast}$ are not the optimal solution, they offer an initial mechanism to study the passive and active resource allocation optimization for networks under smart environment controls. From the equivalence between the sum channel capacity maximization and the minimum mean square error (MMSE) optimization [13], the proposed design is expected to attain a low BER as well. [Footnote 2: Following similar steps as in [13], we can prove that the sum channel capacity maximization and the mean square error (MSE) minimization share the same globally optimal solution to $\{\boldsymbol{\Theta},\mathbf{Z},\mathbf{F}\}$.]

III Autoencoder Optimization Approach

This section describes the RIS-aided autoencoder-based framework that jointly learns the features of all the active and passive devices as illustrated in Fig. 2, where the transceiver and the RIS device are replaced by neural networks.

III-A Preliminary

We consider the design that minimizes the BER between the decoded signal at the receiver and the transmitted signal at the transmitter. The feasible set of our framework is characterized by the fact that $0\leq\|\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{F}\|_{F}\stackrel{(a)}{=}\sqrt{\sum\nolimits_{n=1}^{N_{s}}\|\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{f}_{n}\|_{2}^{2}},$ where $\mathbf{f}_{n}\in\mathbb{C}^{N_{t}}$ denotes the $n$-th column of matrix $\mathbf{F}$ and $(a)$ follows from the definition of the Frobenius norm applied column by column to the matrix product. Let $\mathbf{o}_{m}^{T}$, with $\mathbf{o}_{m}\in\mathbb{C}^{N_{t}}$, denote the $m$-th row of matrix $\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}$, then it holds that

$$\begin{split}&\sum\nolimits_{n=1}^{N_{s}}\|\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{f}_{n}\|_{2}^{2}=\sum\nolimits_{n=1}^{N_{s}}\sum\nolimits_{m=1}^{N_{r}}\left|\mathbf{o}_{m}^{T}\mathbf{f}_{n}\right|^{2}\stackrel{(a)}{\leq}\\ &\sum\nolimits_{n=1}^{N_{s}}\sum\nolimits_{m=1}^{N_{r}}\|\mathbf{o}_{m}^{T}\|_{2}^{2}\|\mathbf{f}_{n}\|_{2}^{2}=N_{s}\sum\nolimits_{n=1}^{N_{t}}\lambda_{n}\stackrel{(b)}{<}\infty,\end{split}\tag{8}$$

where $(a)$ is obtained by using the Cauchy–Schwarz inequality and $(b)$ is due to the finite network dimensions and the law of conservation of energy. Consequently, we obtain

$$0\leq\|\mathbf{H}^{H}\boldsymbol{\Theta}\mathbf{G}\mathbf{F}\|_{F}\leq\sqrt{N_{s}\sum\nolimits_{n=1}^{N_{t}}\lambda_{n}}.\tag{9}$$

Besides, the limited power budget at the receiver ensures that $\|\mathbf{Z}\|_{F}$ is bounded, and therefore the feasible set of our framework is compact. The signal transmission in (2) and the signal detection in (3) can be expressed as a composition of continuous mappings. Since all the requirements of the universal approximation theorem [14] are fulfilled, there exist neural networks that can be trained to approximate our considered model.
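The Cauchy–Schwarz step $(a)$ in (8) can be checked numerically with a few lines of NumPy. The random matrices below merely stand in for the aggregated channel and the precoder and are assumptions for illustration; the check covers only the inequality between the entry-wise products and the products of row and column norms.

```python
import numpy as np

rng = np.random.default_rng(4)
Nr, Nt, Ns = 2, 4, 2
H_tilde = rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))
F = rng.standard_normal((Nt, Ns)) + 1j * rng.standard_normal((Nt, Ns))

lhs = np.linalg.norm(H_tilde @ F) ** 2
# Row-by-column Cauchy-Schwarz: |o_m^T f_n|^2 <= ||o_m||^2 ||f_n||^2.
row_norms = np.sum(np.abs(H_tilde) ** 2, axis=1)   # ||o_m||^2 for each row m
col_norms = np.sum(np.abs(F) ** 2, axis=0)         # ||f_n||^2 for each column n
rhs = np.sum(np.outer(row_norms, col_norms))
```

The right-hand side equals $\|\widetilde{\mathbf{H}}\|_{F}^{2}\|\mathbf{F}\|_{F}^{2}$, which is finite whenever the channel gains and the transmit power are finite.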

We refer to the deep neural networks that replace the transmitter, receiver, and the RIS device as the encoder, the decoder, and the RIS network, respectively. As shown in Fig. 1 and based on the continuous mappings, the data-driven approach trains the neural networks in a hierarchical fashion through the following procedures:

$i)$ The RIS network is a neural network that is trained to predict the desirable phase-shift values based on the channel state information. The detailed interpretation is presented in Sec. III-B.

$ii)$ The encoder is a neural network that is trained to encode the original data bit stream and predict the transmitted signals (after steering by the beamforming vectors) from both the phase-shift values and the channel information. The detailed interpretation is presented in Sec. III-C.

$iii)$ The decoder is a neural network that is trained to decode the transmitted data bit stream from the received signals. The detailed interpretation is presented in Sec. III-E.

Conditioned on the fact that the data bit stream is available for the training phase, the design of the data-driven approach is described in detail hereafter.

III-B RIS Network Design

The RIS network is a neural network that is trained to predict the phase shifts gathered in the reflection matrix $\boldsymbol{\Theta}$. We exploit the channels between the transmitter and the RIS device as well as between the RIS device and the receiver as the inputs of the neural network. [Footnote 3: Since the network training can be performed offline, we assume the availability of CSI. In practice, a classical channel estimation method can be applied to estimate the CSI via the pilot training phase.] In order to make the framework closer to practical systems, we assume that the perfect instantaneous channels are unavailable. Therefore, the imperfect instantaneous channels $\widehat{\mathbf{G}}\in\mathbb{C}^{K\times N_{t}}$ and $\widehat{\mathbf{H}}\in\mathbb{C}^{K\times N_{r}}$ are defined as follows

$$\widehat{\mathbf{H}}=\mathbf{H}+\mathbf{H}_{\mathrm{e}}\quad\text{and}\quad\widehat{\mathbf{G}}=\mathbf{G}+\mathbf{G}_{\mathrm{e}},\tag{10}$$

where $\mathbf{H}_{\mathrm{e}}\in\mathbb{C}^{K\times N_{r}}$ and $\mathbf{G}_{\mathrm{e}}\in\mathbb{C}^{K\times N_{t}}$ are the corresponding estimation errors, which are assumed to be uncorrelated with $\mathbf{H}$ and $\mathbf{G}$, respectively. The elements of $\mathbf{H}_{\mathrm{e}}$ and $\mathbf{G}_{\mathrm{e}}$ are independent and identically distributed according to a circularly symmetric complex Gaussian distribution [15] with zero mean and a variance that reflects the channel estimation quality. As shown in Fig. 2, each realization of the channel estimates $\widehat{\mathbf{H}}^{H}$ and $\widehat{\mathbf{G}}$ is first reshaped into a vector of length $2KN_{t}+2KN_{r}$ and then fed through a few fully connected layers with rectified linear unit (ReLU) activation functions. To prevent overfitting and enable efficient training [16], a batch normalization layer is inserted between each pair of fully connected layers. The predicted phase-shift vector is given as $\tilde{\boldsymbol{\theta}}^{\ast}=\{\tilde{\theta}_{1}^{\ast},\tilde{\theta}_{2}^{\ast},...,\tilde{\theta}_{K}^{\ast}\}$, followed by the predicted reflection matrix $\widetilde{\boldsymbol{\Theta}}^{\ast}=\mathrm{diag}(e^{j\tilde{\theta}_{1}^{\ast}},\dots,e^{j\tilde{\theta}_{K}^{\ast}})$. The parameter settings for the RIS network are given in Table I. Furthermore, the predicted reflection matrix $\widetilde{\boldsymbol{\Theta}}^{\ast}$ is utilized to formulate the estimated cascaded channel of the indirect link from the transmitter to the receiver through the RIS device as $\mathbf{H}_{\mathrm{eff}}=\widehat{\mathbf{H}}^{H}\widetilde{\boldsymbol{\Theta}}^{\ast}\widehat{\mathbf{G}}$.
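A minimal sketch of the imperfect CSI model in (10) and of the real-valued input vector of the RIS network is given below. The helper names, the ordering of the real and imaginary parts in the stacked vector, and the reading of $\sigma_{e}$ as the standard deviation of the estimation error are assumptions; random phases stand in for the output of the trained network.

```python
import numpy as np

rng = np.random.default_rng(5)

def crandn(shape, var=1.0):
    """i.i.d. circularly symmetric complex Gaussian samples with the given variance."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def ris_input_vector(H_hat, G_hat):
    """Stack real and imaginary parts of both channel estimates into one real vector."""
    z = np.concatenate([H_hat.conj().T.ravel(), G_hat.ravel()])
    return np.concatenate([z.real, z.imag])

K, Nt, Nr, sigma_e = 8, 4, 2, 0.1
H, G = crandn((K, Nr)), crandn((K, Nt))
H_hat = H + crandn((K, Nr), var=sigma_e ** 2)    # Eq. (10)
G_hat = G + crandn((K, Nt), var=sigma_e ** 2)

features = ris_input_vector(H_hat, G_hat)        # length 2*K*Nt + 2*K*Nr

# Stand-in for the network output: any K phases give a valid reflection matrix.
theta_pred = rng.uniform(-np.pi, np.pi, K)
H_eff = H_hat.conj().T @ np.diag(np.exp(1j * theta_pred)) @ G_hat
```

The resulting `H_eff` has size $N_{r}\times N_{t}$, which matches the $2N_{t}N_{r}$ real-valued channel entries at the encoder input.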

TABLE I: Parameter settings for the encoder, decoder, and RIS model.

Layers	Encoder	RIS model	Decoder
Input	$MN_{s}+2N_{t}N_{r}$	$2KN_{t}+2KN_{r}$	$2N_{r}$
1st fully connected layer	1024	256	512
BatchNorm1d	1024	256	512
Activation function	ReLU	ReLU	ReLU
2nd fully connected layer	1024	256	512
BatchNorm1d	1024	256	512
Activation function	ReLU	ReLU	ReLU
3rd fully connected layer	$2N_{t}$	256	$MN_{s}$
BatchNorm1d	-	256	-
Activation function	-	ReLU	-
4th fully connected layer	-	$K$	-
Activation function	-	Sigmoid	-
Normalization	$2N_{t}$	-	-
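The layer sequence of the RIS model in Table I can be sketched as a plain NumPy forward pass. The weights below are random and untrained, the batch normalization uses batch statistics without learnable scale and shift, and the affine mapping from the sigmoid output to a phase in $[-\pi,\pi]$ is an assumption, since the text does not state how the sigmoid output is converted to a phase.

```python
import numpy as np

rng = np.random.default_rng(6)

def batch_norm(x, eps=1e-5):
    """Batch normalization with batch statistics only (learnable scale/shift omitted)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def init_ris_network(in_dim, K, width=256):
    """Random (untrained) weights for the four fully connected layers of Table I."""
    dims = [in_dim, width, width, width, K]
    return [(rng.standard_normal((a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def ris_forward(params, x):
    """FC -> BatchNorm -> ReLU (three times), then FC -> Sigmoid."""
    for W, b in params[:-1]:
        x = np.maximum(batch_norm(x @ W + b), 0.0)
    W, b = params[-1]
    out = 1.0 / (1.0 + np.exp(-(x @ W + b)))      # sigmoid output in (0, 1)
    # Assumed mapping from the sigmoid range to phases between -pi and pi.
    return 2.0 * np.pi * out - np.pi

K, Nt, Nr, batch = 8, 4, 2, 32
in_dim = 2 * K * Nt + 2 * K * Nr
params = init_ris_network(in_dim, K)
theta = ris_forward(params, rng.standard_normal((batch, in_dim)))
```

In a deep learning framework the same structure would typically be expressed with linear, batch normalization, and activation layers; the NumPy version is used here only to keep the example free of external dependencies.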

III-C Transmitter Design

In our transmitter design based on a neural network, the estimated cascaded channel $\mathbf{H}_{\mathrm{eff}}$ is considered as the input of the autoencoder along with the data bit stream. More specifically, as shown in Fig. 2, the estimated cascaded channel is concatenated with the data bit stream $\mathbf{b}$ and fed to the autoencoder. By this mechanism, the autoencoder can exploit the channel state information to combat fading and greatly improve the system performance. In our framework, the data bit stream $\mathbf{b}$ is divided into $N_{s}$ streams of one-hot vectors, each representing one of the $M$ possible modulated data signals. The input is then fed through multiple fully connected layers with the ReLU activation function and batch normalization layers. Note that the last fully connected layer has a size of $2N_{t}$ corresponding to the real and imaginary parts of the modulated data symbols at the transmit antennas. The output of the encoder is then reshaped to generate the complex transmitted signal vector $\mathbf{x}\in\mathbb{C}^{N_{t}}$. Before transmitting the signal, we apply an average power constraint by using a normalization layer, which is a custom layer and can be considered as a neural layer without any trainable parameters. The output of the normalization layer is

$$\mathbf{x}=PB^{1/2}\Big(\sum\nolimits_{i=1}^{B}\|\mathbf{x}^{\prime}_{i}\|_{2}^{2}\Big)^{-1/2}\mathbf{x}^{\prime},\tag{11}$$

where $B$ is the mini-batch size and $\mathbf{x}^{\prime}\in\mathbb{C}^{N_{t}}$ is the output of the last fully connected layer. In (11), the predicted signal $\mathbf{x}$ is the transmitted signal scaled to the transmit power level.
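The normalization layer can be sketched as a parameter-free function acting on a whole mini-batch. The snippet assumes that the prefactor in (11) is read as $\sqrt{PB}$, so that the average transmit power per sample equals $P$; the function name `power_normalize` and the random encoder outputs are illustrative.

```python
import numpy as np

def power_normalize(x_raw, P):
    """Scale a mini-batch so that the average per-sample power equals P.

    Assumes the prefactor in (11) is read as sqrt(P * B).
    """
    B = x_raw.shape[0]
    total_energy = np.sum(np.abs(x_raw) ** 2)
    return np.sqrt(P * B / total_energy) * x_raw

rng = np.random.default_rng(7)
B, Nt, P = 1000, 4, 4.0
# Random numbers stand in for the output of the last fully connected layer.
x_raw = rng.standard_normal((B, Nt)) + 1j * rng.standard_normal((B, Nt))
x = power_normalize(x_raw, P)
avg_power = np.mean(np.sum(np.abs(x) ** 2, axis=1))
```

Because the constraint is enforced on the batch average rather than per sample, individual transmit vectors may carry more or less power than $P$, which leaves the encoder free to shape the constellation.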

III-D Channel Layers

To build an end-to-end framework, we design several custom layers that simulate the data propagation over the channels in the presence of the RIS device, as shown in Fig. 2. Similar to the normalization layer of the transmitter, these channel layers are custom layers without trainable parameters that perform the complex multiplication between signals and channels. Different from [11], in the training process, the channels change with every transmitted symbol. Therefore, the trained neural networks can encode the data bit streams with awareness of the channel condition, instead of encoding them only as a function of the bit information as in conventional modulation schemes.
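A channel layer of this kind can be sketched as a batched complex matrix-vector product in which every sample has its own channel realization. The function name `channel_layer`, the array layout (batch first), and the noise model are assumptions for illustration.

```python
import numpy as np

def channel_layer(H_eff, x, noise_var, rng):
    """Per-sample complex multiplication y_b = H_eff_b x_b + n_b (no trainable parameters)."""
    y = np.einsum("bij,bj->bi", H_eff, x)
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape))
    return y + noise

rng = np.random.default_rng(8)
B, Nr, Nt = 16, 2, 4
# One cascaded channel realization per transmitted symbol vector.
H_eff = rng.standard_normal((B, Nr, Nt)) + 1j * rng.standard_normal((B, Nr, Nt))
x = rng.standard_normal((B, Nt)) + 1j * rng.standard_normal((B, Nt))
y_clean = channel_layer(H_eff, x, noise_var=0.0, rng=rng)
```

In an automatic differentiation framework, the same operation would be written with differentiable tensor products so that gradients flow from the decoder loss back to the encoder and the RIS network.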

III-E Decoder Design

The decoder is a fully connected neural network whose input is the received signal $\mathbf{y}\in\mathbb{C}^{N_{r}}$. The complex-valued input data are first stacked into a vector including both the real and imaginary parts. After that, the stacked data go through multiple fully connected layers with the ReLU activation function and a batch normalization layer between each pair of fully connected layers. The output of the decoder network is separated to form the recovered data $\hat{\mathbf{b}}=[\hat{\mathbf{b}}_{1},\cdots,\hat{\mathbf{b}}_{N_{s}}]$ of the original one-hot data $\mathbf{b}=[\mathbf{b}_{1},\cdots,\mathbf{b}_{N_{s}}]$. We stress that separating the decoded signal yields a total of $M$ output classes in each data stream. Furthermore, the equalization step is done at the receiver without any channel knowledge. Table I shows the parameter settings of the decoder in detail.
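The input stacking and the output separation of the decoder can be sketched as follows. The function names, the ordering of the real and imaginary parts, and the stream-major layout of the $MN_{s}$ outputs are assumptions; random numbers stand in for the output of the trained decoder.

```python
import numpy as np

def stack_real_imag(y):
    """Map complex received vectors (B, Nr) to real decoder inputs (B, 2*Nr)."""
    return np.concatenate([y.real, y.imag], axis=1)

def split_and_detect(logits, Ns, M):
    """Split decoder outputs (B, M*Ns) into Ns streams and pick one of M classes each."""
    per_stream = logits.reshape(logits.shape[0], Ns, M)
    return per_stream.argmax(axis=2)

rng = np.random.default_rng(9)
B, Nr, Ns, M = 5, 2, 2, 2
y = rng.standard_normal((B, Nr)) + 1j * rng.standard_normal((B, Nr))
decoder_in = stack_real_imag(y)
# Random numbers stand in for the output of the trained decoder network.
symbols = split_and_detect(rng.standard_normal((B, M * Ns)), Ns, M)
```

Each entry of `symbols` is the index of the detected constellation point of one stream, which is then mapped back to bits.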

III-F Optimization Process

Following the main principle of an autoencoder, we jointly learn the parameterized encoder, decoder, and RIS network by minimizing the loss function for a given modulation scheme as

$$\begin{aligned}\underset{\{f,g,r\}}{\mathrm{minimize}}\quad&\mathcal{L}_{\mathrm{AE}}(\psi_{f},\psi_{g},\psi_{r})\\ \text{subject to}\quad&\mathbf{s}\in\mathcal{M},\end{aligned}\tag{12}$$

where $\mathcal{M}$ is the finite constellation set defined by the $M$-QAM in this paper, and $\psi_{f},\psi_{g},$ and $\psi_{r}$ are the parameters of the encoder $f$, the decoder $g$, and the RIS network $r$, respectively. Since the data bits are represented by one-hot vectors, the detection of these bits can be regarded as a typical classification problem. Therefore, the cross-entropy loss is used for the network optimization. The loss function $\mathcal{L}_{\mathrm{AE}}(\psi_{f},\psi_{g},\psi_{r})$ is formulated as

$$\mathcal{L}_{\mathrm{AE}}(\psi_{f},\psi_{g},\psi_{r})=\sum\nolimits_{i=1}^{N_{s}}\alpha_{i}\mathcal{L}_{i}(\psi_{f},\psi_{g},\psi_{r}),\tag{13}$$

where $\mathcal{L}_{i}(\psi_{f},\psi_{g},\psi_{r})$ is the loss function for the $i$ -th data stream, which is defined as

$$\mathcal{L}_{i}(\psi_{f},\psi_{g},\psi_{r})=-\frac{1}{B}\sum\nolimits_{m=1}^{B}\sum\nolimits_{n=0}^{M-1}\mathrm{log}\left(\frac{\mathrm{exp}\left([\mathbf{b}_{i}]_{n}\right)}{\mathrm{exp}\left(\sum\nolimits_{j=0}^{M-1}[\mathbf{b}_{i}]_{j}\right)}\right)p([\hat{\mathbf{b}}_{i}]_{n}),\tag{14}$$

where $[\mathbf{b}_{i}]_{n}$ denotes the $n$-th entry of the one-hot data $\mathbf{b}_{i}$, $p([\hat{\mathbf{b}}_{i}]_{n})$ is the output of the last fully connected layer in the decoder, which can be regarded as the probability of the $(n+1)$-th possible modulated signal for data $\mathbf{b}_{i}$, and $\mathrm{exp}(\cdot)$ is the exponential function. In (13), $\alpha_{i}\geq 0$ is a weight associated with the $i$-th loss function and satisfies $\sum_{i=1}^{N_{s}}\alpha_{i}=1$. Notice that an equal weight setting, i.e., $\alpha_{i}=1/N_{s},\forall i,$ may lead to unfair performance among the streams. To deal with this issue, we apply a dynamic scheme where the weights for the loss function are updated in the $t$-th mini-batch as follows

$$\alpha_{i}^{t}=\mathcal{L}_{i}^{t}(\psi_{f},\psi_{g},\psi_{r})/\mathcal{L}^{t}_{\mathrm{AE}}(\psi_{f},\psi_{g},\psi_{r}),\quad\forall t.\tag{15}$$

In this context, the autoencoder is trained to obtain a balanced loss among the data streams. We use stochastic gradient descent to train the autoencoder. Hence, the weights and biases are updated by solving (12) through backpropagation.
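The weighted loss in (13) and the dynamic weighting idea in (15) can be sketched as follows. The snippet uses the standard softmax cross-entropy as a stand-in for (14), and it normalizes the weights by the sum of the per-stream losses so that they add up to one, which is an assumed reading of (15). All function names and the synthetic logits are illustrative.

```python
import numpy as np

def stream_cross_entropy(logits, labels):
    """Mean softmax cross-entropy of one stream; logits (B, M), labels (B,) class indices."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_prob[np.arange(labels.size), labels])

def dynamic_weights(stream_losses):
    """Weights proportional to each stream's current loss, normalized to sum to one."""
    stream_losses = np.asarray(stream_losses, dtype=float)
    return stream_losses / stream_losses.sum()

def total_loss(stream_losses, alpha):
    """Eq. (13): weighted sum of the per-stream losses."""
    return float(np.dot(alpha, stream_losses))

rng = np.random.default_rng(10)
B, M, Ns = 1000, 2, 2
labels = rng.integers(0, M, size=(Ns, B))
# Stream 0 gets informative logits, stream 1 gets pure noise (illustrative data).
logits = rng.standard_normal((Ns, B, M))
logits[0, np.arange(B), labels[0]] += 3.0
losses = [stream_cross_entropy(logits[i], labels[i]) for i in range(Ns)]
alpha = dynamic_weights(losses)
loss_ae = total_loss(losses, alpha)
```

In this example the poorly detected stream receives the larger weight, so the next gradient step focuses on it, which is the balancing effect the dynamic scheme aims for.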

Remark 2.

The complexity of fully connected neural networks grows with the size of the input and output. Therefore, the complexity of the autoencoder, which comprises the encoder, decoder, and RIS model, is in the order of $\mathcal{O}(MN_{s}+N_{t}N_{r}+K(N_{t}+N_{r}))$. From these calculations, our model obtains much lower complexity than the model-based algorithm.

IV Numerical Results

TABLE II: Average running time of the model-based and autoencoder approaches.

$K$	Model-based	Autoencoder
16	3.480 ms	0.251 ms
32	7.131 ms	0.255 ms

We evaluate the performance of the considered frameworks on a $4\times 2$ MIMO system transmitting $N_{s}=2$ data streams with equal transmit power on each data stream, $P=4$ [W]. The RIS device is equipped with eight phase-shift elements. Binary phase-shift keying (BPSK) modulation and demodulation are used at the transmitter and the receiver, respectively. To train the deep neural networks, $200000$ different data symbols along with $200000$ channel realizations with $\sigma_{e}=0.1$ are used as training data. We define the SNR as the ratio between the transmit signal power and the noise power. The SNR is fixed at $5\,\mathrm{dB}$ during training, while it is varied in the testing phase. The Adam optimizer is selected to train the neural networks. The hyper-parameters are chosen as follows: the number of epochs is $10$, the mini-batch size is $1000$, and the learning rate is $0.0002$. Conditioned on the phase-shift design, the following benchmarks are considered for comparison: $i)$ RIS-assisted Joint Transmitter and Receiver Design is the model-based approach presented in Sec. II and is denoted as “Model-based” in the figures; $ii)$ RIS-assisted Autoencoder is the data-driven approach presented in Sec. III and is denoted as “Autoencoder” in the figures.

In Fig. 3(a), the cross-entropy losses of all data streams are plotted as a function of the training iteration. As can be seen, the loss functions of both data streams converge after a hundred training iterations. Thanks to the adaptive weights applied in the loss function, the losses of both data streams are close to each other in every iteration. To evaluate our proposed model, in Fig. 3(b), we plot the BER as a function of the SNR for all the considered benchmarks with two different numbers of phase shifts. We also plot the performance of the “Model-based” approach with random phase-shifts as a performance bound. As illustrated, thanks to the cooperation between the transmitter, receiver, and RIS, “Autoencoder” yields the lowest BER at all SNR values. Moreover, when the number of RIS reflecting elements increases, the performance of both models is greatly improved. This is intuitive since “Model-based” can achieve a higher channel capacity with more RIS reflecting elements. Even though a higher capacity does not guarantee the optimal BER, as mentioned in Remark 1, it is expected to yield a lower BER. For our model, given the increase in the number of phase shifts, the end-to-end framework can encode and decode data more flexibly to combat the noise effect and reduce the decoding error rate. Additionally, as listed in Table II, our model runs roughly 14 to 28 times faster than the model-based approach with 200 iterations, which shows its complexity efficiency.

To illustrate the robustness of our model to imperfect CSI, we test the proposed model with different channel error variance values for $K=16,32$, as shown in Fig. 3 $c$. Surprisingly, in both setups the performance of “Autoencoder” remains almost unchanged across the different channel error variances. These results show that “Autoencoder” is very robust to imperfect CSI. This robustness comes from the fact that “Autoencoder” is trained with imperfect CSI. Moreover, since various channel realizations are utilized in the training process, the encoder can learn to encode data in a way that is robust to various channel conditions. Hence, the effect of imperfect CSI can be reduced. From the results, we conclude that our proposed model can guarantee good performance when perfect CSI is not available at the transmitter.
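As a minimal sketch of how imperfect CSI can be generated for such a robustness test, the Python code below perturbs each true channel coefficient with an independent complex Gaussian error of a given variance. The additive model (estimate equals true channel plus error) is an assumption made here for illustration and may differ from the error model defined earlier in the paper. The names `complex_gaussian` and `perturb_channel` are hypothetical.

```python
import math
import random


def complex_gaussian(variance: float, rng: random.Random) -> complex:
    """Draw one circularly symmetric complex Gaussian sample CN(0, variance)."""
    std = math.sqrt(variance / 2.0)  # variance is split evenly over real and imaginary parts
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def perturb_channel(channel, error_variance, rng):
    """Return a noisy channel estimate of the same shape as `channel`.

    ASSUMPTION: additive error model, estimate = true coefficient + CN(0, error_variance).
    `channel` is a list of rows, each row a list of complex coefficients.
    """
    return [
        [h + complex_gaussian(error_variance, rng) for h in row]
        for row in channel
    ]
```

Feeding estimates generated with several values of `error_variance` to a trained model, while computing the received signal with the true channel, gives one way to produce a robustness curve of the kind shown in Fig. 3 $c$.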

V Conclusion

In this paper, an autoencoder approach for RIS-based MIMO systems has been presented to enhance the bit error rate performance of the system. We replaced the transceiver architecture and the RIS model with three FCNN models and trained them jointly with the objective of minimizing the BER of the estimated signal at the receiver. Using the BPSK modulation scheme for two independent data streams, the performance of the proposed framework has been compared with conventional RIS designs in terms of the bit error rate. Owing to the cooperation between the transmitter, the receiver, and the RIS, our framework showed superior performance in detecting signals at the receiver.

References

[1] Q. Wu, S. Zhang, B. Zheng, C. You, and R. Zhang, “Intelligent reflecting surface-aided wireless communications: A tutorial,” IEEE Transactions on Communications, vol. 69, no. 5, pp. 3313–3351, 2021.
[2] Q. Wu and R. Zhang, “Intelligent reflecting surface enhanced wireless network via joint active and passive beamforming,” IEEE Transactions on Wireless Communications, vol. 18, no. 11, pp. 5394–5409, 2019.
[3] C. Huang, A. Zappone, G. C. Alexandropoulos, M. Debbah, and C. Yuen, “Reconfigurable intelligent surfaces for energy efficiency in wireless communication,” IEEE Transactions on Wireless Communications, vol. 18, no. 8, pp. 4157–4170, 2019.
[4] S. Zhang and R. Zhang, “Capacity characterization for intelligent reflecting surface aided MIMO communication,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 8, pp. 1823–1838, 2020.
[5] B. Ning, Z. Chen, W. Chen, and J. Fang, “Beamforming optimization for intelligent reflecting surface assisted MIMO: A sum-path-gain maximization approach,” IEEE Wireless Communications Letters, vol. 9, no. 7, pp. 1105–1109, 2020.
[6] A. Khaleel and E. Basar, “Reconfigurable intelligent surface-empowered MIMO systems,” IEEE Systems Journal, vol. 15, no. 3, pp. 4358–4366, 2021.
[7] T. V. Chien, T. N. Canh, E. Björnson, and E. G. Larsson, “Power control in cellular massive MIMO with varying user activity: A deep learning solution,” IEEE Transactions on Wireless Communications, vol. 19, no. 9, pp. 5732–5748, 2020.
[8] C. Huang, G. C. Alexandropoulos, C. Yuen, and M. Debbah, “Indoor signal focusing with deep learning designed reconfigurable intelligent surfaces,” pp. 1–5, 2019.
[9] H. Song, M. Zhang, J. Gao, and C. Zhong, “Unsupervised learning-based joint active and passive beamforming design for reconfigurable intelligent surfaces aided wireless networks,” IEEE Communications Letters, vol. 25, no. 3, pp. 892–896, 2021.
[10] C. Huang, R. Mo, and C. Yuen, “Reconfigurable intelligent surface assisted multiuser MISO systems exploiting deep reinforcement learning,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 8, pp. 1839–1850, 2020.
[11] T. Erpek, Y. E. Sagduyu, A. Alkhateeb, and A. Yener, “Autoencoder-based communications with reconfigurable intelligent surfaces,” arXiv preprint arXiv:2112.04441, 2021.
[12] H. Jiang, L. Dai, M. Hao, and R. Mackenzie, “End-to-end learning for RIS-aided communication systems,” IEEE Transactions on Vehicular Technology, pp. 1–1, 2022.
[13] T. Van Chien, C. Mollén, and E. Björnson, “Large-scale-fading decoding in cellular massive MIMO systems with spatially correlated channels,” IEEE Transactions on Communications, vol. 67, no. 4, pp. 2746–2762, 2018.
[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.
[15] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links?” IEEE Transactions on Information Theory, vol. 49, no. 4, pp. 951–963, 2003.
[16] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.