RIS-enhanced Resilience in Cell-Free MIMO
Abstract
More and more applications that require high reliability and fault tolerance are realized with wireless network architectures and thus ultimately rely on wireless channels, which can be subject to impairments and blockages. Hence, these architectures require a backup plan in the physical layer in order to guarantee functionality, especially when safety-relevant aspects are involved. To this end, this work proposes to utilize the reconfigurable intelligent surface (RIS) as a resilience mechanism to counteract outages. The advantages of RISs for such a purpose derive from their inherent addition of alternative channel links in combination with their reconfigurability. The major benefits are investigated in a cell-free multiple-input multiple-output (MIMO) setting, in which the direct channel paths are subject to blockages. An optimization problem is formulated that includes rate allocation with beamforming and phase shift configuration and is solved with a resilience-aware alternating optimization approach. Numerical results show that deploying even a randomly configured RIS in a network reduces the performance degradation caused by blockages. This effect becomes more pronounced in the optimized case, in which the RIS can potentially counteract the performance degradation entirely. Interestingly, adding more reflecting elements to the system benefits the overall resilience, even for time-sensitive systems, because the RIS reflections contribute even when unoptimized.
Index Terms:
Resilience, reconfigurable intelligent surface (RIS), intelligent reflecting surface (IRS), outage, failure, cell-free MIMO, resource allocation, quality of service.
I Introduction
Fueled by the large-scale deployment of 5G systems, Internet of Things (IoT) technologies connect huge numbers of low-power, low-complexity, and battery-limited devices, each of which requires a specific data throughput [ericsson]. More and more enterprises migrate the IoT network architecture towards next-generation low-power wide-area access technologies, e.g., Cat-M and narrowband IoT, which are more energy efficient and reliable and enable higher capacities [ericsson]. Critical IoT (or mission-critical IoT) has gained considerable attention in recent years thanks to use cases such as factory automation, remote monitoring/interaction, UAV control, and vehicular communications [crit1], [crit2]. Critical IoT refers to IoT applications that require high reliability and low latency, which lay the foundation for a plethora of operations that depend on continuous data streams.
As a consequence, the role of the wireless communication network gains increased significance and bears a higher responsibility. However, this responsibility can become a problem, as the wireless channel is not only nondeterministic but also subject to shadowing and blockages, which might lead to outages because of failed packet deliveries. Depending on the situation, an outage can result in impacts of different kinds ranging from minor delays to safety-relevant aspects like harming the environment or on-site workers. Consequently, it becomes imperative that the wireless network has the ability to evaluate and react to outages, while maintaining an acceptable level of service [resiliencemetric, RobertRes].
One of the main challenges for such ultra-reliable low-latency communication (URLLC) systems stems from the inability to use legacy retransmission-based methods in the next transmission block to account for failed packet deliveries. Instead, the system needs to continuously monitor the network and should be able to quickly apply mechanisms that counteract potential outage scenarios, thus offering resilience against failures [resiliencemetric]. In the context of a wireless network, such resilience mechanisms can be implemented by reallocating the network’s resources accordingly [RobertRes]. In this work, we propose the reconfigurable intelligent surface (RIS), a metasurface comprising multiple tunable reflecting elements that can introduce a phase shift to incoming signals in real time [howitworks, RIStut], as a resilience mechanism. With this reconfigurability, an RIS is able to support the wireless transmission in different ways, e.g., by extending coverage, suppressing interference, or changing the channel statistics [SynBenefitsConf, SynBenefits, corrBj, BjonAtten]. As a resilience method, the advantages of the RIS are twofold. On the one hand, new RIS-assisted paths are introduced to the system, which can be utilized in case of a blockage in the direct paths [Basar1]. This extends the resilience scope of the system, as the addition of the RIS enables the recovery from situations (like blockages in all direct paths) that would previously result in failures. On the other hand, these new paths are customizable. Thus, the adaptation to disruptions that were already within the resilience scope before deploying the RIS can be improved by a smart configuration of the phase shifters.
To this end, this work investigates the fundamental concept of utilizing the RIS as a resilience method in a cell-free multiple-input multiple-output (MIMO) downlink system, in which a central processor (CP) serves single-antenna users through distributed access points (APs). In order to highlight the effects and influence of the RIS more prominently, we assume that the CP has perfect global instantaneous channel state information (CSI). Based on this information, an optimization problem is formulated that jointly allocates the rates of the users while designing the beamformers and phase shifters to minimize the network-wide adaption gap. To facilitate practical implementation, we propose a resilience-aware alternating optimization framework, which splits the given non-convex optimization problem into two convex sub-problems. Additionally, this framework can take the system-specific requirement of either a high-quality or a quick recovery into account by determining a solution that satisfies this specific quality-time trade-off.
II System Model
This paper considers the RIS-aided cell-free MIMO downlink system depicted in Fig. 1. More precisely, a set of single-antenna users is served by a set of multi-antenna APs. We consider the RIS to be a uniform planar array composed of passive reflecting elements. We assume that the RIS is positioned such that it can provide an alternative path for every user in case of blockages in their direct links. The APs, as well as the RIS, are connected to the CP via perfect orthogonal fronthaul links, facilitating central processing at the CP. In addition, each user has a quality-of-service (QoS) target represented by a desired data rate.
II-A Channel Model
The channel model considered in this paper assumes quasi-static block fading channels, where the channel coefficients remain constant within the coherence time, but may change independently among coherence blocks. We denote the direct channel link between AP and user as . The reflected-channel link provided by the RIS between AP and user is denoted by , where denotes the link between AP and the RIS, denotes the link between the RIS and user and denotes the reflection coefficient matrix with . Here, is the reconfigurable reflection coefficient at the -th reflecting element, which is composed of a phase shift . Further, we denote the aggregate direct channel vector of user as , the aggregate AP to RIS channel matrix as and the aggregate transmit signal vector as .
Utilizing the aggregate vectors, the received signal at user can be expressed as the sum of the direct and reflected channel vectors, namely
(1) |
where and is the additive white Gaussian noise (AWGN) sample.
Further, the symbols intended to be decoded by user are denoted by . We assume that these messages form an independent and identically distributed (i.i.d.) Gaussian codebook. These symbols for user are transmitted by the -th AP using the beamforming vector , which are both provided by the CP over an ideal fronthaul.
Hence, the overall transmit signal vector at the -th AP is given as , which is subject to the power constraint , or equivalently
(2) |
The received signal (1) at user is then given by
(3) |
where the first term is the desired signal at the user and the second term is the received interference from all other users. Thus, we formulate the signal-to-interference-plus-noise ratio (SINR) of user decoding its message as
(4) |
Using these definitions, the QoS demands of each user are satisfied if the following conditions are met:
(5) |
where denotes the transmission bandwidth and denotes the rate of user .
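To make the channel model of this subsection concrete, the following minimal Python sketch computes an effective channel (direct plus RIS-reflected link), the SINR of a user, and the rate condition. It is an illustration under stated assumptions, not the paper's exact formulation: the names `effective_channel`, `sinr`, and `qos_satisfied` are hypothetical, the reflection matrix is assumed to be diagonal, and the rate condition is assumed to take the usual form of rate not exceeding bandwidth times log2(1 + SINR).

```python
import math


def inner(a, b):
    """Hermitian inner product a^H b of two equal-length complex lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def effective_channel(h_k, G, g_k, v):
    """Direct channel plus RIS-reflected channel of one user.

    h_k: direct channel, one entry per AP antenna (all APs stacked)
    G:   AP-to-RIS channel, one row per AP antenna, one column per element
    g_k: RIS-to-user channel, one entry per reflecting element
    v:   reflection coefficients (assumed diagonal reflection matrix)
    """
    return [
        h + sum(row[m] * v[m] * g_k[m] for m in range(len(v)))
        for h, row in zip(h_k, G)
    ]


def sinr(k, H_eff, W, noise_power):
    """SINR of user k: desired power over interference plus noise."""
    signal = abs(inner(H_eff[k], W[k])) ** 2
    interference = sum(
        abs(inner(H_eff[k], W[j])) ** 2 for j in range(len(W)) if j != k
    )
    return signal / (interference + noise_power)


def qos_satisfied(rate, bandwidth, sinr_value):
    """Assumed rate condition: rate <= bandwidth * log2(1 + SINR)."""
    return rate <= bandwidth * math.log2(1.0 + sinr_value)
```

Here `H_eff[k]` and `W[k]` hold the aggregate effective channel and the aggregate beamformer of user `k`; a fully blocked direct link corresponds to a zero entry in `h_k`, in which case only the reflected term remains.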
II-B Resilience Metric
This paper considers a specific target throughput, e.g., determined by the QoS requirements of the network with being user ’s desired QoS requirement. Note that the QoS demands are assumed to remain constant within the observation interval. In contrast, the network’s sum throughput depends on the allocated resources at time , and is thus captured in , where is the allocated data rate for user at time . Regarding the time axis, there are two major cornerstones of the resilience behavior, namely , the initial time at which the degradation manifests, and , the time of recovery. With those aspects at hand, and considering the resilience metric proposed in [resiliencemetric, RobertRes], we define the network’s absorption, adaption, and time-to-recovery metrics as
(6) |
(7) |
respectively, where is the network operator’s desired recovery time, i.e., the time for which a functionality degradation is tolerable. A linear combination of the equations (6)-(7) yields the considered resilience metric
(8) |
with fixed weights , , denoting the network operator’s needs, e.g., emphasizing the robustness, the quality of adaption, or the recovery time. We also let , thus, the best-case value for the resilience is .
Note that the proposed resilience metric (8) can be temporally divided into anticipatory actions, i.e., , that take place before the outage occurs, and reactionary actions, i.e., and , which occur after the outage. Regarding the anticipatory actions, various works are concerned with designing wireless communication networks to be robust against adverse conditions, e.g., [scalableRS]. However, the literature on designing resilient wireless communication systems from the physical layer resource management perspective with a focus on adaption and time-to-recovery remains limited in breadth and depth. Thus, this paper mainly focuses on the reactionary actions, as they are sufficient to demonstrate the trade-off between the quality of a solution and the time necessary to obtain it. Consequently, we assume throughout this work.
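The weighted combination described in this subsection can be sketched in a few lines of Python. The concrete forms of the three components are assumptions made for illustration, since they cannot be read off the text directly: absorption and adaption are taken as the ratio of the allocated sum throughput to the desired sum throughput at the outage time and at the recovery time, respectively, and the time-to-recovery term is taken as 1 when recovery happens within the desired recovery time and as the ratio of desired to actual recovery time otherwise. The function name `resilience` and all argument names are hypothetical.

```python
def resilience(rates_t0, rates_tn, desired, t0, tn, T0, weights):
    """Weighted sum of absorption, adaption, and time-to-recovery.

    rates_t0: allocated user rates at the time the degradation manifests
    rates_tn: allocated user rates at the time of recovery
    desired:  desired (QoS) user rates
    t0, tn:   outage time and recovery time
    T0:       operator's desired recovery time
    weights:  three non-negative weights that sum to one
    """
    w_abs, w_ada, w_rec = weights
    if abs(w_abs + w_ada + w_rec - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")

    target = sum(desired)
    r_abs = sum(rates_t0) / target          # assumed form of absorption
    r_ada = sum(rates_tn) / target          # assumed form of adaption
    elapsed = tn - t0
    r_rec = 1.0 if elapsed <= T0 else T0 / elapsed  # assumed form

    return w_abs * r_abs + w_ada * r_ada + w_rec * r_rec
```

With the absorption weight set to zero, the value depends only on the reactionary components; for example, fully restored rates obtained in twice the desired recovery time with weights (0, 0.5, 0.5) give 0.75 under these assumptions.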
III Problem Formulation
In order to study the effect of the RIS on the resilience performance, we consider the problem of minimizing the constrained network-wide adaption gap, namely
(P1) | ||||
s.t. | ||||
(9) | ||||
(10) |
where is the stacked rate vector and (10) are the unit modulus constraints representing the phase shift constraints . It can be observed that the feasible set of the formulated problem is non-convex due to the non-convex nature of its constraints (9), (10) and the strong coupling of variables in (9). Therefore, we solve the problem by decoupling the variables with the alternating optimization approach proposed in [SynBenefits], where both emerging sub-problems are efficiently solved using the same successive convex approximation (SCA) framework. In addition, these sub-problems can also be considered as standalone resilience mechanisms (see Algorithm 1), facilitating a more in-depth study of the impact of the RIS on the system’s resilience.
III-A Beamforming Design
As a result of the alternating optimization approach, the phase shifters are assumed to be fixed for the duration of the beamforming design. Thus, problem (P1) can be rewritten as
(P2) | ||||
s.t. | ||||
(11) |
(12) | |||
(13) |
where the introduction of the slack variables convexifies the rate expressions and signifies an element-wise inequality. However, the constraints in (12) are still non-convex but can be convexified using the SCA approach. To this end, we rewrite (12) as
(14) |
and apply the first-order Taylor approximation around the point on the fractional term. Consequently, the following convex approximation of (14) can be derived, see also [SynBenefits],
(15) |
Thus, the approximation of problem (P2) can be written as
(P2.1) | ||||
s.t. |
Problem (P2.1) is convex and can be solved iteratively using the SCA method. More precisely, we define as a vector stacking the optimization variables of the beamforming design problem at iteration , where . Similarly and denote the optimal solutions and the point, around which the approximations are computed, respectively. Thus, with a given point , an optimal solution can be obtained by solving problem (P2.1).
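As a generic illustration of the SCA step used above, consider a quadratic-over-linear term, which is the form a fractional SINR term typically takes (an assumption here; the symbols $x$, $y$, $\tilde{x}$, $\tilde{y}$ are placeholders and not the paper's notation). Since $|x|^2/y$ is jointly convex for $y > 0$, its first-order Taylor expansion around $(\tilde{x}, \tilde{y})$ is a global lower bound:

```latex
\frac{|x|^2}{y}
\;\geq\;
\frac{|\tilde{x}|^2}{\tilde{y}}
+ \frac{2\,\mathrm{Re}\{\tilde{x}^{*}(x-\tilde{x})\}}{\tilde{y}}
- \frac{|\tilde{x}|^2}{\tilde{y}^{2}}\,(y-\tilde{y})
\;=\;
\frac{2\,\mathrm{Re}\{\tilde{x}^{*}x\}}{\tilde{y}}
- \frac{|\tilde{x}|^2}{\tilde{y}^{2}}\,y ,
\qquad y,\ \tilde{y} > 0 .
```

The bound is tight at $(x, y) = (\tilde{x}, \tilde{y})$ and affine in $(x, y)$, so replacing the fractional term by it yields a convex inner approximation of the constraint, and every solution of the approximated problem remains feasible for the original one.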
III-B Phase Shift Design
During the design process of the phase shifters at the RIS, the beamformers are assumed to be fixed due to the application of the alternating optimization approach. With the intent of utilizing a similar problem structure as in (P2), we denote , where and . With the above definitions, the SINR constraints can be written similar to (14) as
(16) |
At this point, the overall optimization problem for the phase shift design can be formulated as
(P3) | ||||
s.t. |
where the penalty method [PenaltyMethod] for the phase shift constraints (10) is adopted and is a large positive constant. Similar to (14), (16) can be approximated by calculating the first-order Taylor approximation of (16) on around the point , denoted by . Further, the objective function can be approximated around the point by first-order Taylor approximation of the penalty term , which is given by . Based on the above approximation methods, problem (P3) is approximated by the following convex problem:
(P3.1) | ||||
s.t. |
Due to the similarity of the problem formulation and the utilization of the same SCA framework, problem (P3.1) can be solved by defining and following the same iterative procedure as for solving problem (P2.1).
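The linearization of the penalty term can be illustrated with a small Python sketch. It assumes, as is common for penalty-based unit-modulus handling, that the penalty is built from the squared moduli of the reflection coefficients; the exact penalty expression of (P3) is not reproduced here, and the names `penalty` and `penalty_surrogate` are hypothetical.

```python
def linearized_sq_modulus(v, v_tilde):
    """First-order Taylor expansion of |v|^2 around v_tilde.

    |v|^2 >= 2 Re{conj(v_tilde) v} - |v_tilde|^2, with equality at
    v = v_tilde, because the difference equals |v - v_tilde|^2.
    """
    return 2.0 * (v_tilde.conjugate() * v).real - abs(v_tilde) ** 2


def penalty(v_vec):
    """Assumed penalty: sum of (|v_m|^2 - 1) over all elements."""
    return sum(abs(v) ** 2 - 1.0 for v in v_vec)


def penalty_surrogate(v_vec, v_tilde_vec):
    """Affine surrogate of the assumed penalty around v_tilde_vec."""
    return sum(
        linearized_sq_modulus(v, vt) - 1.0
        for v, vt in zip(v_vec, v_tilde_vec)
    )
```

Because the surrogate never exceeds the true penalty and matches it at the expansion point, maximizing an objective that rewards the surrogate cannot decrease the true penalized objective from one SCA iteration to the next.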
III-C Resilience-aware Alternating Optimization
In this section, we outline an alternating optimization procedure that is suitable for resilience applications. Usually, when considering an SCA approach in the literature, the goal is to retrieve the highest-quality solution for a given problem [SynBenefits, PenaltyMethod, SynBenefitsConf]. Hence, these works do not take the time needed to obtain these solutions into consideration and employ time-intensive outer and inner loops until some convergence criteria are met.
In the context of resilience, however, we are not necessarily interested in the highest-quality results. Instead, we aim to obtain a solution that satisfies a specific quality-time trade-off specified by the weights in (8). Consequently, the proposed resilience-aware alternating optimization works towards minimizing the network-wide gap and stops as soon as it lies below a certain threshold . Further, it dispenses with the convergence criteria of the inner loops when solving the sub-problems. Instead, a fixed number of iterations of solving the problems (P2.1) and (P3.1) is introduced. This is possible due to the utilization of the same framework for both sub-problems because it makes a feasible point for both problems without additional adaption. The advantage of using fixed iterations lies in the fact that the intermediate solutions and of these sub-problems are also suitable to be evaluated in order to improve the resilience metric. This enables the algorithm to react efficiently to different values of , i.e., the weight of the recovery time. Note that for and large , the proposed algorithm reduces to the conventional convergence-based algorithms and thus represents a generalization of them. The detailed steps of the resilience-aware alternating optimization are illustrated in Algorithm 1.
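The control flow described above can be sketched as follows. This is a structural illustration only and not a reproduction of Algorithm 1: the sub-problem solvers are passed in as hypothetical callables `solve_bf` and `solve_ps` (each performing one SCA iteration of the beamforming or phase shift problem), `gap` is a hypothetical function returning the network-wide gap of a solution, and `max_outer` is an added safeguard.

```python
import time


def resilience_aware_ao(solve_bf, solve_ps, gap, state,
                        threshold, inner_iters, max_outer=50):
    """Alternate between two sub-problems with a fixed inner budget.

    Every intermediate solution is evaluated, and the procedure stops
    as soon as the gap drops to or below the threshold, instead of
    waiting for the inner loops to converge.
    Returns the final state and a history of (elapsed time, gap) pairs.
    """
    start = time.perf_counter()
    history = [(0.0, gap(state))]

    for _ in range(max_outer):
        for solver in (solve_bf, solve_ps):   # beamforming first
            for _ in range(inner_iters):      # fixed number of iterations
                state = solver(state)
                current = gap(state)
                history.append((time.perf_counter() - start, current))
                if current <= threshold:      # early stop on any iterate
                    return state, history
    return state, history
```

Choosing a small `inner_iters` favors fast recovery, whereas a large `inner_iters` together with a very small `threshold` approaches the behaviour of a conventional convergence-based alternating optimization. Swapping the order of the two solvers in the tuple gives the variant that starts with the phase shift problem.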
IV Numerical Results
In this section, we numerically evaluate the performance of an RIS as a resilience method. To this end, we assume a cell-free MIMO system with APs, each of which is equipped with antennas. We assume the single-antenna users, as well as the APs, to be distributed randomly within an area of operation, which spans . The RIS is positioned in the center of this area and is assumed to be composed of reflecting elements, deployed in a square grid with spacing, where m is the wavelength. Hence, we employ the correlated channel model introduced in [corrBj], where the average attenuation intensity is modeled after [BjonAtten, Eq.(23)]. For the direct links, we assume Rayleigh fading channels with log-normal shadowing with a standard deviation of 8 dB. Further, we assume a bandwidth of , a noise power of dBm, a maximum transmit power of dBm per AP, and each user to require a QoS of Mbps. We define the occurrence of an outage as an event in which each individual direct link between AP and user has a 12% probability of being subject to a complete blockage. Note that the RIS has been positioned in a way that the RIS-assisted links are exempt from blockages. In addition, we assume that rate adaption as a resilience mechanism is utilized right after every outage occurs [RobertRes, M1]. To this end, Problem (P2.1) is immediately solved with fixed beamformers after the occurrence of any outage. The network operator’s desired recovery time is set to ms.
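The outage model of this setup, in which every direct AP-user link is blocked independently with a probability of 12%, can be sketched as follows. The function names and the use of scalar per-link gains are illustrative assumptions; the numbers of APs and users in the usage note are arbitrary placeholders and not the values of the simulation setup.

```python
import random


def draw_blockages(num_aps, num_users, p_block=0.12, seed=None):
    """Draw an independent blockage indicator for every direct link.

    Returns a num_aps x num_users nested list of booleans, where True
    marks a completely blocked AP-user link.
    """
    rng = random.Random(seed)
    return [
        [rng.random() < p_block for _ in range(num_users)]
        for _ in range(num_aps)
    ]


def apply_blockages(direct_gains, blocked):
    """Zero out the direct gains of blocked links (complete blockage).

    RIS-assisted links are not touched, since they are assumed to be
    exempt from blockages.
    """
    return [
        [0.0 if is_blocked else gain
         for gain, is_blocked in zip(gain_row, blocked_row)]
        for gain_row, blocked_row in zip(direct_gains, blocked)
    ]
```

For example, `draw_blockages(4, 6, seed=1)` yields one outage realization for a hypothetical system with four APs and six users, which can then be applied to the direct channel gains with `apply_blockages` before re-running the resource allocation.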
IV-A Convergence Behaviour
First, we study the effect of the number of fixed iterations employed in Algorithm 1. To this end, we define two approaches: 1) an alternating approach (alt), in which we set , and 2) a convergence-based approach (conv), in which each sub-problem is optimized until convergence. At this point, it also becomes important to decide which sub-problem is solved first after an outage occurs. Thus, we also compare the performance of both approaches above when initializing with either the beamforming problem (alt-BF, conv-BF) or the phase shifting problem (alt-PS, conv-PS) after an outage has occurred. Fig. LABEL:altVSconv depicts an outage, occurring at ms, and the changes in of the different approaches over the time required to solve the respective sub-problems. Note that the slope between any two points of a curve represents the quality-time trade-off between those points.
It becomes apparent that both red curves, representing the approaches starting with the beamforming sub-problem, perform better right after the outage has occurred. The rationale behind this behaviour is that the beamformer of any AP link affected by an outage should first be redirected to the RIS, before the RIS reflects any incoming signals to the users. Fig. LABEL:altVSconv also shows that the alternating approaches perform better with regard to the quality-time ratio between each iteration than the convergence-based ones. In addition, the alternating approach starting with the beamforming sub-problem (alt-BF) not only performs the best considering the quality-ti