Intent-driven Intelligent Control and Orchestration in O-RAN Via Hierarchical Reinforcement Learning
Abstract
rApps and xApps need to be controlled and orchestrated well in the open radio access network (O-RAN) so that they can deliver guaranteed network performance in a complex multi-vendor environment. This paper proposes a novel intent-driven intelligent control and orchestration scheme based on hierarchical reinforcement learning (HRL). The proposed scheme can orchestrate multiple rApps or xApps according to the operator's intent of optimizing certain key performance indicators (KPIs), such as throughput, energy efficiency, and latency. Specifically, we propose a bi-level architecture with a meta-controller and a controller. The meta-controller provides the target performance in terms of KPIs, while the controller performs xApp orchestration at the lower level. Our simulation results show that the proposed HRL-based intent-driven xApp orchestration mechanism increases the average system throughput with respect to two baselines, i.e., a single-xApp baseline and a non-machine-learning-based algorithm. A similar increase in energy efficiency is observed in comparison to the same baselines.
Index Terms:
O-RAN, rApps, xApp, hierarchical reinforcement learning, orchestration
I Introduction
Open radio access network (O-RAN) facilitates openness and intelligence to support diverse traffic types and their requirements in 5G and beyond networks [1], as well as multi-vendor RAN deployments. In a multi-vendor environment, rApps and xApps can be hosted in the non-real-time RAN intelligent controller (non-RT-RIC) and the near-real-time RAN intelligent controller (near-RT-RIC). In the literature, xApps and rApps have been studied for resource and power allocation, beamforming and beam management, cell sleeping, traffic steering, and so on [2, 3, 4]. Advanced reinforcement learning (RL) algorithms can be used to develop intelligent network functions in O-RAN. However, a multi-rApp or multi-xApp scenario with a variety of AI-enabled Apps requires intelligent control and orchestration among the Apps to avoid performance degradation.
Note that we focus on xApps as a case study, but our work generalizes to rApps as well. To elevate autonomy in O-RAN via xApp orchestration, intent-driven network optimization goals can play a pivotal role. An intent is defined as an optimization goal, a high-level command given by the operator usually in plain language, and it determines a key performance indicator (KPI) target that the network should meet, such as "increase throughput by a given percentage" or "increase energy efficiency by a given percentage" [5]. To better support autonomous orchestration of xApps in a multi-vendor environment, emphasis on operators' intents is crucial [6]. Intents aid in achieving agile, flexible, and simplified configuration of wireless networks with the minimum possible intervention. Furthermore, intelligent intent-driven management can constantly acquire knowledge and adjust to changing network conditions by utilizing extensive real-time network data. The inclusion of intent-driven goals for intelligent xApp control and orchestration is a promising yet highly complex task, since multiple vendors are involved with different network functions and intents may trigger conflicting optimization goals in sub-systems. There are a few works on conflict mitigation or xApp cohabitation in O-RAN. For instance, Zhang et al. propose a conflict mitigation scheme among multiple xApps using team learning [7], and Polese et al. propose a machine learning (ML)-based pipeline for the cohabitation of multiple xApps in an O-RAN environment [8]. The work outlined in [9] introduces a method for achieving automation throughout the entire life cycle of xApps, from the use case and requirements, through design and verification, to deployment in networks. However, the operator intent is not involved in these works.
To this end, we propose a hierarchical reinforcement learning (HRL) method for intent-driven xApp orchestration. Different from previous works, the proposed scheme has a bi-level architecture in which intents are passed to the top-level hierarchy and processed as optimization goals for the lower-level controller that controls and orchestrates xApps. Orchestration can avoid xApp conflicts and improve performance by combining xApps with similar performance objectives. The proposed method is compared with two baselines: a non-machine-learning (non-ML) solution and a single-xApp scenario. Our simulation results show that the proposed HRL-based intent-driven xApp orchestration mechanism increases both the average system throughput and the energy efficiency compared to the single-xApp and non-ML baselines.
The rest of the paper is organized as follows: Section II discusses the related work, followed by Section III, which presents the system model in detail. The proposed HRL-based xApp orchestration in O-RAN is covered in Section IV. Performance analysis and comparison of the proposed method with the baselines are presented in Section V. Lastly, we present our conclusions in Section VI.
II Related work
There are a few works that investigate ML-based xApps for RAN optimization and control. Polese et al. propose an ML pipeline for multiple xApps in an O-RAN environment [8]. Zhang et al. propose a conflict mitigation scheme among deep reinforcement learning (DRL)-based power allocation and resource allocation xApps [7]. D'Oro et al. propose the OrchestRAN scheme, in which network operators can specify high-level control objectives in the non-RT-RIC, which then determines the optimal set of data-driven algorithms to fulfill the provided intent [10]. While the work presented in [10] focuses on selecting the appropriate machine learning models and their execution locations for given operator inputs, it does not treat the network operator's goals as optimization objectives for selecting and orchestrating xApps.
An intent-driven orchestration of cognitive autonomous networks of RAN management is presented in [11], where the authors propose a generic design of intent-based management for controlling RAN parameters and KPIs. Zhang et al. propose an intent conflict resolution scheme to realize conflict avoidance in machine learning-based xApps [12]. A graph-based solution is proposed in [13] to determine the specific network function required to fulfill an intent.
Compared with the existing literature, the main contribution of this work is an HRL scheme for intent-driven orchestration of xApps. The HRL scheme fits well with the inherent O-RAN hierarchy of non-RT-RIC and near-RT-RIC, and intent-based orchestration enables higher flexibility in network control and management. The intents from the human operator are provided as goals for the system to achieve, which leads to the orchestration of xApps that can fulfill the provided goal.
III System Model
III-A System Model
We consider an O-RAN-based downlink orthogonal frequency division multiplexing cellular system in which multiple BSs serve users simultaneously, with multiple small cells lying within the range of a macro cell. Several classes of traffic are present in the system, and users are connected to multiple RATs via dual connectivity, where each RAT class represents a certain access technology (LTE, 5G, etc.). The wireless system model considered in this work is presented in Fig. 1. The RIC platforms in the figure (non-RT-RIC and near-RT-RIC) can host rApps and xApps, which are control and optimization applications operating at different time scales.
We design three xApps, namely the traffic steering, cell sleeping, and beamforming xApps. Each xApp applies deep reinforcement learning for its own optimization task, as introduced in the following.
III-A1 Traffic Steering xApp
The traffic steering xApp aims to achieve a simultaneous balance of QoS requirements for various traffic classes by introducing a traffic steering scheme based on Deep Q-Network (DQN) [14]. We design the reward and state functions to ensure satisfactory performance, focusing on two essential KPIs: network delay and average system throughput. Traffic can be steered to a certain BS based on load experienced, link quality, and traffic type. The details of this xApp can be found in [2].
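For concreteness, the sketch below shows, in Python with PyTorch, the generic DQN building blocks such an xApp relies on: a small Q-network over a state vector and the standard temporal-difference update. The state dimension, network width, and discount factor here are illustrative assumptions, not values taken from [2].

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP mapping a state vector (e.g., per-BS load, link quality,
    traffic type) to Q-values over candidate serving BSs."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)


def dqn_update(q_net, target_net, optimizer, batch, gamma: float = 0.9) -> float:
    """One DQN step on a replay-buffer batch: y = r + gamma * max_a' Q_target(s', a')."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        targets = rewards + gamma * (1.0 - dones) * target_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```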
III-A2 Cell Sleeping xApp
The cell sleeping xApp is designed to reduce power consumption in the system by turning off idle or less busy BSs. The xApp can perform cell sleeping based on traffic load ratios and queue length of each BS. The energy consumption model for the BS is:
P_{\text{BS}} =
\begin{cases}
P_0 + \Delta_p P_{\text{tx}}, & 0 < P_{\text{tx}} \le P_{\text{max}}, \\
P_{\text{sleep}}, & P_{\text{tx}} = 0,
\end{cases} \qquad (1)

where P_0 is the fixed power consumption, \Delta_p is the slope of the load-dependent power consumption, P_{\text{tx}} is the transmission power, P_{\text{max}} is the maximum transmission power, and P_{\text{sleep}} is the constant power consumption in sleep mode [15].
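As a concrete illustration of this power model, the following Python sketch evaluates eq. (1) for a single BS; the numeric values in the example call are illustrative placeholders rather than parameters reported in the paper.

```python
def bs_power(p_tx: float, p_max: float, p_fixed: float,
             slope: float, p_sleep: float) -> float:
    """Load-dependent BS power model of eq. (1): fixed power plus a term that
    scales with the transmit power when active, constant power when asleep."""
    if p_tx <= 0.0:
        return p_sleep                      # BS put to sleep by the cell sleeping xApp
    if p_tx > p_max:
        raise ValueError("transmit power exceeds the BS maximum")
    return p_fixed + slope * p_tx


# Example with illustrative values only: a BS transmitting at half of its maximum power.
print(bs_power(p_tx=10.0, p_max=20.0, p_fixed=130.0, slope=4.7, p_sleep=75.0))
```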
The goal of the cell sleeping xApp is to maximize energy efficiency without overloading the active BSs. The optimization goal is formulated as follows:
\max \; \frac{\sum_{u \in \mathcal{U}} T_u}{P_{\text{BS}}} - \kappa N_{\text{ol}}, \qquad (2)

where \mathcal{U} is the set of the user equipments (UEs) connected to a certain BS, T_u represents the throughput of UE u, P_{\text{BS}} is the BS power consumption from eq. (1), \kappa is the penalty factor to reduce overloading, and N_{\text{ol}} is the number of overloaded BSs. Turning off BSs can greatly decrease power consumption, but it reduces the number of active BSs serving the live network traffic, which poses a risk of overloading the remaining active BSs. Therefore, the penalty factor related to the number of overloaded BSs has been introduced to avoid excessive overloading.
To address the formulated problem, the following MDP is defined:
• State: the state consists of two elements, s = \{\lambda_b, q_b\}, where \lambda_b represents the traffic load ratio of BS b, and the second element, q_b, is the queue length of BS b, representing its load level.
• Action: turning the BSs on and off constitutes the action set for the DQN implementation, i.e., a \in \{\text{on}, \text{off}\} for each BS.
• Reward: the reward function is the same as eq. (2).
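As a minimal sketch of this MDP, assuming the reconstruction of eq. (2) above, the reward and state could be computed as follows in Python; the penalty weight and overload threshold used here are illustrative assumptions.

```python
import numpy as np


def sleeping_state(load_ratios, queue_lengths):
    """State vector of the cell sleeping xApp: per-BS traffic load ratio and queue length."""
    return np.concatenate([np.asarray(load_ratios), np.asarray(queue_lengths)])


def cell_sleeping_reward(ue_throughputs, bs_powers, load_ratios,
                         kappa=1.0, overload_threshold=0.8):
    """Reward following eq. (2): system energy efficiency (total throughput over
    total BS power) minus a penalty proportional to the number of overloaded BSs.
    kappa and overload_threshold are illustrative, not values from the paper."""
    energy_efficiency = np.sum(ue_throughputs) / np.sum(bs_powers)
    n_overloaded = int(np.sum(np.asarray(load_ratios) > overload_threshold))
    return energy_efficiency - kappa * n_overloaded


# Action: one on/off decision per BS, e.g. a = [1, 0, 1, 1, 1] keeps four of five BSs active.
```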
III-A3 Beamforming xApp
The third xApp is the beamforming xApp. We deploy band-switching BSs operating from 3.5 GHz to mmWave frequencies [16], which allows us to support high-throughput traffic such as enhanced mobile broadband (eMBB) via accurate intelligent beamforming. This xApp can also control power based on the location of the UE, using the minimum transmission power needed, which is energy efficient. The xApp employs analog beamforming, and a multi-antenna setup is adopted in which each BS deploys a uniform linear array (ULA) of antennas [17]. The beamforming weights of every beamforming vector are implemented using constant-modulus phase shifters. We also assume that there is a beam steering-based codebook, \mathcal{F}, from which every beamforming vector is selected [17].
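The following Python sketch shows one common way such a beam-steering codebook can be constructed for a ULA; the array size, number of beams, and half-wavelength antenna spacing are illustrative assumptions, not the exact configuration of [17].

```python
import numpy as np


def ula_steering_vector(theta_rad: float, num_antennas: int,
                        spacing_over_lambda: float = 0.5) -> np.ndarray:
    """Constant-modulus array response of a ULA steered towards angle theta."""
    n = np.arange(num_antennas)
    phases = 2.0 * np.pi * spacing_over_lambda * n * np.sin(theta_rad)
    return np.exp(1j * phases) / np.sqrt(num_antennas)


def beam_steering_codebook(num_beams: int, num_antennas: int) -> np.ndarray:
    """Codebook of steering vectors pointing at equally spaced angles in [-90, 90] degrees;
    the beamforming xApp selects one codebook index per action."""
    angles = np.linspace(-np.pi / 2.0, np.pi / 2.0, num_beams)
    return np.stack([ula_steering_vector(a, num_antennas) for a in angles])


codebook = beam_steering_codebook(num_beams=16, num_antennas=8)  # illustrative sizes
print(codebook.shape)  # (16, 8)
```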
Every BS has a transmit power P_{\text{tx}} \in \mathcal{P}, where \mathcal{P} is the set of candidate transmit powers. We want to optimize two metrics, throughput and energy efficiency, using this xApp. To achieve this goal, the following optimization problem is addressed:
\max \; w_1 \frac{T}{T_{\text{req}}^{c}} + w_2 \frac{EE}{EE_{\text{max}}}, \qquad (3)

where T is the throughput achieved by the system, T_{\text{req}}^{c} is the defined throughput requirement for traffic type c, EE represents the energy efficiency associated with the BS throughput and transmission power, EE_{\text{max}} is the maximum theoretical energy efficiency, and w_1 and w_2 are the weight factors.
To solve the formulated problem, the following MDP is defined.
• State: the UE coordinates are used as the state, s = (x_{\text{UE}}, y_{\text{UE}}).
• Action: the action set consists of two elements: the codebook index, which determines the steering angle \theta and the corresponding array steering vector in that direction, and the power level change \Delta P_{\text{tx}}.
• Reward: the reward function is the same as eq. (3) presented above.
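Assuming the reconstruction of eq. (3) above, a minimal reward computation for this xApp could look as follows; the weight values are illustrative, as the paper does not report them.

```python
def beamforming_reward(throughput: float, throughput_req: float,
                       energy_eff: float, energy_eff_max: float,
                       w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted objective of eq. (3): throughput normalized by the per-traffic
    requirement plus energy efficiency normalized by its theoretical maximum."""
    return w1 * (throughput / throughput_req) + w2 * (energy_eff / energy_eff_max)


print(beamforming_reward(throughput=120.0, throughput_req=100.0,
                         energy_eff=0.6, energy_eff_max=1.0))  # illustrative values
```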
IV Proposed HRL-based xApp orchestration Scheme
RL problems can be formulated as MDPs with a set of states S, a set of actions A, transition probabilities, and a reward function R. The RL agent in HRL consists of two controllers: a meta-controller and a controller [18]. The MDP for HRL has an added element, a set of goals G. Depending on the current state, the meta-controller is responsible for generating a high-level goal g \in G for the controller. This goal is then transformed into a high-level policy. The controller chooses low-level actions according to the high-level policy, and this process yields an intrinsic reward for the controller. Finally, an extrinsic reward is given to the meta-controller by the environment, and the meta-controller provides the controller with a new goal. This section discusses the xApp orchestration scheme via HRL.
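The interaction between the two levels can be summarized with the structural sketch below, written in Python with placeholder meta-controller, controller, and environment interfaces; these names and methods are assumptions for illustration, not an O-RAN or h-DQN API.

```python
def hdqn_episode(env, meta_controller, controller, max_steps_per_goal: int = 50):
    """Structural sketch of one h-DQN episode [18]: the meta-controller picks a
    goal from the current state; the controller then acts for several steps to
    reach that goal, learning from an intrinsic reward, while the meta-controller
    is credited with the extrinsic reward accumulated under the goal."""
    state = env.reset()
    while not env.done():
        goal = meta_controller.select_goal(state)            # e.g. a target throughput
        start_state, extrinsic_return = state, 0.0
        for _ in range(max_steps_per_goal):
            action = controller.select_action(state, goal)   # e.g. an xApp combination
            next_state, extrinsic_reward = env.step(action)
            intrinsic_reward = controller.intrinsic_reward(next_state, goal)
            controller.update(state, goal, action, intrinsic_reward, next_state)
            extrinsic_return += extrinsic_reward
            state = next_state
            if env.goal_reached(goal) or env.done():
                break
        meta_controller.update(start_state, goal, extrinsic_return, state)
```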
IV-A xApp Coordination Using HRL
The proposed O-RAN-based system architecture is presented in Fig. 2. The RIC platforms can host rApps and xApps, which are applications operating at different time scales. Three xApps have been defined in the previous sections. The rApp in the figure works as an input panel for the network operator and converts these inputs into goals to be optimized; it also works as the meta-controller in the non-RT-RIC.
Let us assume that \mathcal{X} is the set of xApps and \mathcal{X}' is a subset of \mathcal{X} having at least one element (an xApp in our case) that can optimize the network performance based on the operator input. Let \mathcal{K} be the set of candidate KPIs that an xApp can optimize and \mathcal{Q} be the set of QoS requirements the system has to satisfy. Under these assumptions, the xApp orchestration problem that we want to address can be formulated as follows:
\max_{\mathcal{X}' \subseteq \mathcal{X}} \; \sum_{x \in \mathcal{X}'} I(x, K_m)\, K_m - \beta N_v, \qquad (4)

where K_m \in \mathcal{K} is the performance metric the operator intends to improve, \beta is the penalty parameter for QoS requirement violation, and N_v is the number of UEs whose QoS requirements are violated. Lastly, I(x, K_m) is the proposition that "xApp x can improve performance metric K_m", which is either '0' or '1'.
As presented in Fig. 2, the rApp in the system is directly connected to the user panel through which the operator provides input to the system. The operator input is provided as a percentage of increase related to a certain KPI, for example a requested percentage increase in throughput or in energy efficiency, or any other intent stated in natural language. The rApp follows a hierarchical deep Q-learning (h-DQN) framework [18]. The meta-controller (in the non-RT-RIC) takes the increased amount of throughput or energy efficiency as a goal, observes the state of the environment, and provides both the goal and the state to the controller in the near-RT-RIC, which hosts the bundle of xApps. This data passing is done via the A1 interface, which connects the non-RT-RIC and the near-RT-RIC. The controller takes the action of choosing an xApp or a set of xApps based on the provided state and goal. In the following, we define the MDP for the meta-controller and the controller to address the xApp orchestration problem formulated in eq. (4).
• State: the set of states consists of the traffic flow types of the different users in the network. UEs having similar traffic types are grouped together, so the state elements stand for the five different traffic types in the system (voice, gaming, video, URLLC, and eMBB). Both the meta-controller and the controller share the same state.
• Action: the selection of an xApp or a combination of xApps is the action performed by the controller, defined as the set of all non-empty subsets of the three designed xApps.
• Intrinsic reward: the intrinsic reward function for the controller follows the objective of eq. (4).
• Goal for the controller: the increased throughput or increased energy efficiency level that can satisfy the operator intent is passed to the controller as the goal, i.e., a target throughput for throughput-increasing intents or a target energy efficiency for energy-efficiency-increasing intents. Note that these goals can be generalized to other KPIs; for simplicity, we target throughput and energy efficiency.
• Extrinsic reward: the meta-controller is responsible for the overall performance of the system. Therefore, we set the extrinsic reward function for the meta-controller to the objective of the problem formulation presented in eq. (4), accumulated as the summation of the intrinsic reward over the N steps executed under the current goal:

r_{\text{ex}} = \sum_{t=1}^{N} r_{\text{in},t}. \qquad (5)
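To make the intent-to-goal mapping and the extrinsic reward of eq. (5) concrete, a small Python sketch is given below; the intent string format and its parsing are illustrative assumptions, since the paper only states that intents are given in natural language.

```python
def goal_from_intent(intent: str, current_kpis: dict) -> tuple:
    """Turn a simple intent string, e.g. 'increase throughput by 10%', into a
    (kpi, target_value) goal for the controller."""
    words = intent.lower().split()
    kpi = "throughput" if "throughput" in words else "energy_efficiency"
    percent = float(next(w.rstrip("%") for w in words if w.endswith("%")))
    return kpi, current_kpis[kpi] * (1.0 + percent / 100.0)


def extrinsic_reward(intrinsic_rewards) -> float:
    """Eq. (5): the meta-controller's reward is the sum of the controller's
    intrinsic rewards over the steps executed under the current goal."""
    return float(sum(intrinsic_rewards))


print(goal_from_intent("increase throughput by 10%", {"throughput": 50.0}))
```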
The whole process of xApp orchestration can be summarized as follows:
• Step 1: the operator's intent is provided as input, specifying which performance metric is to be improved.
• Step 2: these performance targets are provided as goals to the controller in the near-RT-RIC by the meta-controller rApp in the non-RT-RIC.
• Step 3: the controller selects an xApp or a combination of xApps to approach the target performance as closely as possible. The system learns based on the reward it receives for each such xApp selection.
• Step 4: the selected xApps, with their own DRL-based functionalities, optimize the performance of the network in response to the intent of the operator.
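The controller's decision space in Step 3 and an eq. (4)-style reward can be sketched as follows; the xApp identifiers and the penalty weight are illustrative, and the KPI-gain computation is left abstract.

```python
from itertools import combinations

XAPPS = ("traffic_steering", "cell_sleeping", "beamforming")


def xapp_action_space():
    """All non-empty xApp combinations the controller may activate (Step 3)."""
    return [set(c) for r in range(1, len(XAPPS) + 1)
            for c in combinations(XAPPS, r)]


def orchestration_reward(kpi_gain: float, n_qos_violations: int, beta: float = 1.0) -> float:
    """Controller reward in the spirit of eq. (4): improvement of the intended KPI
    minus a penalty for every UE whose QoS requirement is violated."""
    return kpi_gain - beta * n_qos_violations


print(len(xapp_action_space()))  # 7 candidate actions for three xApps
```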
IV-B Baseline Algorithms
This subsection describes the two baselines. The first baseline simulates the same network scenario based on the system model presented so far, but without any intelligent DRL-based xApp to optimize the network; instead, non-ML algorithms are used. For comparing the throughput performance of the proposed HRL-based system, we use the threshold-based traffic steering scheme proposed in [19]. It uses a predefined threshold determined from the load at each BS, the channel condition, and the user service type: the mean of these metrics gives the threshold value, and a weighted summation of the same parameters forms a decision variable. Traffic is then steered to another BS by comparing the decision variable with the threshold. This baseline does not include cell sleeping, so the BSs are always on. In our second baseline, we consider single-xApp scenarios; for example, the proposed HRL-based xApp orchestration mechanism is compared with scenarios where only the traffic steering xApp or only the cell sleeping xApp is in action.
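A rough sketch of this baseline's steering rule, under the assumption of scalar metrics normalized to comparable ranges, is given below; the weight values are illustrative and the exact rule of [19] may differ in detail.

```python
import numpy as np


def steer_traffic(load: float, channel_quality: float, service_weight: float,
                  weights=(0.4, 0.4, 0.2)) -> bool:
    """Non-ML baseline rule: the threshold is the mean of the per-BS metrics,
    the decision variable is their weighted sum, and traffic is steered to
    another BS when the decision variable exceeds the threshold."""
    metrics = np.array([load, channel_quality, service_weight])
    threshold = float(metrics.mean())
    decision_variable = float(np.dot(weights, metrics))
    return decision_variable > threshold


print(steer_traffic(load=0.9, channel_quality=0.3, service_weight=0.5))  # illustrative inputs
```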
V Performance Evaluation
V-A Simulation setup
A MATLAB-based simulation environment has been developed with one eNB and four gNBs serving as one macro cell and four small cells. In total, we deploy 60 UEs with five different traffic types: voice, gaming, video, URLLC, and eMBB. The different traffic types in the system have different requirements in terms of KPIs. The QoS requirements of the traffic types are defined based on our previous work [20]; the eMBB and URLLC traffic types have been added here to test the system compatibility. For the eMBB traffic type, we consider the packet size, required data rate, and delay budget to be 1500 bytes, 100 Mbps, and 15 ms, respectively [21]. Lastly, the packet size and delay budget for the URLLC traffic are set to 32 bytes and 2.5 ms, respectively.
The simulation environment operates in 5G non-standalone (NSA) mode, in which different types of RAT (LTE and 5G NR) work together. We deploy an architecture based on [22]. The carrier frequency for LTE is set to 800 MHz. For the 5G NR small cells, band-switching BSs are deployed at 3.5 GHz and 30 GHz. The BS transmission power for LTE and 5G NR is set to 38 dBm and 43 dBm, respectively [23].
For the HRL implementation, the initial learning rate is set to 0.95. In order to maintain stable learning performance, we reduce the learning rate periodically after a certain number of episodes. Additionally, the discount factor is set to 0.3. The simulation is conducted 10 times using MATLAB, and the average outcomes are presented along with a 95% confidence interval.
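The paper does not specify the decay schedule in detail; the sketch below shows one simple step-decay interpretation, where the decay interval, decay factor, and floor are illustrative assumptions.

```python
def decayed_learning_rate(initial_lr: float, episode: int,
                          decay_every: int = 50, decay_factor: float = 0.5,
                          min_lr: float = 1e-3) -> float:
    """Periodically reduce the learning rate after a fixed number of episodes,
    starting from the initial value of 0.95 used in the HRL implementation."""
    lr = initial_lr * (decay_factor ** (episode // decay_every))
    return max(lr, min_lr)


print(decayed_learning_rate(0.95, episode=120))  # 0.95 * 0.5**2 = 0.2375
```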
V-B Simulation results
Before evaluating the performance of the proposed xApp orchestration scheme, we first show how the intent-oriented HRL-based orchestration scheme works. Fig. 3 shows that the operator intent of "increase throughput" leads to the selection of certain xApps. When there is a throughput increase intent from the operator, after a few time slots there is a sharp increase in throughput, because xApp1 (traffic steering xApp) has been invoked. When a 5% increase is again given as an input, a combination of xApp1 and xApp3 (intelligent beamforming xApp) is selected. When the operator then provides an intent to decrease power consumption, we can see from Fig. 3 that there is a sharp decrease in throughput, because xApp1 and xApp3 have been terminated at the 461st time slot and xApp2 (cell sleeping xApp) has been invoked.
Fig. 4 presents a similar graph, but this time it plots the energy efficiency over time. When there is an intent from the operator to increase energy efficiency, we can see that xApp2 is initiated at the 131st time slot; this xApp performs cell sleeping and saves energy. For the next energy efficiency increase intent given by the operator, it can be seen that xApp2 and xApp3 work together. The proposed HRL-based algorithm has successfully orchestrated these two xApps for the desired performance gain. Figs. 3 and 4 show the utility of the proposed system: not only can it take the operator intent as an optimization goal, but it can also orchestrate xApps to obtain the desired performance output by using the proper combination of xApps.
Fig. 5 shows the performance comparison between the proposed HRL-based xApp orchestration scheme and the baseline scenarios in terms of average system throughput. Results are obtained under a constant load of 6 Mbps. The proposed orchestration scheme achieves a higher average system throughput than both the non-ML algorithm and the single-xApp scenario (traffic steering xApp). This is because the orchestration mechanism involving multiple xApps triggers the optimal combination of xApps to reach better performance based on the operator intent.
Fig. 6 shows the performance comparison between the proposed HRL-based xApp orchestration scheme and the baseline scenarios in terms of average energy efficiency. The proposed orchestration scheme obtains a higher average energy efficiency than both the single-xApp scenario (cell sleeping xApp) and the non-ML baseline. As before, this is because the HRL-based orchestration mechanism incorporates multiple xApps to achieve better performance based on the operator intent. Also, note that we use the traffic steering xApp in the former figure and the cell sleeping xApp in this evaluation because they specifically optimize throughput and energy efficiency, respectively.
VI Conclusions
In this paper, we show that the HRL-based intent-driven orchestration mechanism is effective not only in optimizing KPIs but also in providing flexibility and control to the operator. We have introduced a novel HRL-based xApp orchestration mechanism that can perform xApp management and provide recommendations for the best combination of xApps given the operator's intent. The proposed xApp orchestration scheme leads to increases in both average system throughput and energy efficiency compared to single-xApp usage with no orchestration. In our future work, we plan to extend this orchestration to rApps and to other xApps with complex KPI interactions.
Acknowledgement
This work has been supported by MITACS and Ericsson Canada, and NSERC Canada Research Chairs and NSERC Collaborative Research and Training Experience Program (CREATE) under Grant 497981.
References
- [1] L. Bonati, S. D’Oro, M. Polese, S. Basagni, and T. Melodia, “Intelligence and Learning in O-RAN for Data-Driven NextG Cellular Networks,” IEEE Communications Magazine, vol. 59, no. 10, pp. 21–27, 2021.
- [2] M. A. Habib, H. Zhou, P. E. Iturria-Rivera, M. Elsayed, M. Bavand, R. Gaigalas, S. Furr, and M. Erol-Kantarci, “Traffic Steering for 5G Multi-RAT Deployments using Deep Reinforcement Learning,” in 2023 IEEE 20th Consumer Communications & Networking Conference (CCNC), 2023, pp. 164–169.
- [3] Y. Dantas, P. E. Iturria-Rivera, H. Zhou, M. Bavand, M. Elsayed, R. Gaigalas, and M. Erol-Kantarci, “Beam Selection for Energy-Efficient mmWave Network Using Advantage Actor Critic Learning,” 2023.
- [4] H. Zhou, L. Kong, M. Elsayed, M. Bavand, R. Gaigalas, S. Furr, and M. Erol-Kantarci, “Hierarchical Reinforcement Learning for RIS-Assisted Energy-Efficient RAN,” in GLOBECOM 2022 - 2022 IEEE Global Communications Conference, 2022, pp. 3326–3331.
- [5] K. Mehmood, K. Kralevska, and D. Palma, “Intent-Driven Autonomous Network and Service Management in Future Cellular Networks: A Structured Literature Review,” Computer Networks, vol. 220, p. 109477, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1389128622005114
- [6] A. Leivadeas and M. Falkner, “A Survey on Intent-Based Networking,” IEEE Communications Surveys & Tutorials, vol. 25, no. 1, pp. 625–655, 2023.
- [7] H. Zhang, H. Zhou, and M. Erol-Kantarci, “Team Learning-Based Resource Allocation for Open Radio Access Network (O-RAN),” in ICC 2022 - IEEE International Conference on Communications, 2022, pp. 4938–4943.
- [8] M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia, “ColO-RAN: Developing Machine Learning-based xApps for Open RAN Closed-loop Control on Programmable Experimental Platforms,” 2022.
- [9] A. Kliks, M. Dryjanski, V. Ram, L. Wong, and P. Harvey, “Towards Autonomous Open Radio Access Networks,” ITU Journal on Future and Evolving Technologies, vol. 4, no. 2, 2023.
- [10] S. D’Oro, L. Bonati, M. Polese, and T. Melodia, “OrchestRAN: Network Automation Through Orchestrated Intelligence in the Open RAN,” in IEEE INFOCOM 2022 - IEEE Conference on Computer Communications, 2022, pp. 270–279.
- [11] A. Banerjee, S. S. Mwanje, and G. Carle, “An Intent-Driven Orchestration of Cognitive Autonomous Networks for RAN Management,” in 2021 17th International Conference on Network and Service Management (CNSM), 2021, pp. 380–384.
- [12] J. Zhang, J. Guo, C. Yang, X. Mi, L. Jiao, X. Zhu, L. Cao, and R. Li, “A Conflict Resolution Scheme in Intent-Driven Network,” in 2021 IEEE/CIC International Conference on Communications in China (ICCC), 2021, pp. 23–28.
- [13] E. J. Scheid, C. C. Machado, M. F. Franco, R. L. dos Santos, R. P. Pfitscher, A. E. Schaeffer-Filho, and L. Z. Granville, “INSpire: Integrated NFV-based Intent Refinement Environment,” in 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), 2017, pp. 186–194.
- [14] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-Level Control Through Deep Reinforcement Learning,” nature, vol. 518, no. 7540, pp. 529–533, 2015.
- [15] P. Ren and M. Tao, “A Decentralized Sleep Mechanism in Heterogeneous Cellular Networks with QoS Constraints,” IEEE Wireless Communications Letters, vol. 3, no. 5, pp. 509–512, Oct. 2014.
- [16] F. B. Mismar, A. Alammouri, A. Alkhateeb, J. G. Andrews, and B. L. Evans, “Deep Learning Predictive Band Switching in Wireless Networks,” IEEE Transactions on Wireless Communications, vol. 20, no. 1, pp. 96–109, 2021.
- [17] F. B. Mismar, B. L. Evans, and A. Alkhateeb, “Deep Reinforcement Learning for 5G Networks: Joint Beamforming, Power Control, and Interference Coordination,” IEEE Transactions on Communications, vol. 68, no. 3, pp. 1581–1592, 2020.
- [18] T. D. Kulkarni, K. Narasimhan, A. Saeedi, and J. B. Tenenbaum, “Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation,” CoRR, vol. abs/1604.06057, 2016. [Online]. Available: http://arxiv.org/abs/1604.06057
- [19] M. Khaturia, P. Jha, and A. Karandikar, “5G-Flow: A unified Multi-RAT RAN architecture for beyond 5G networks,” Computer Networks, vol. 198, p. 108412, 2021.
- [20] M. A. Habib, H. Zhou, P. E. Iturria-Rivera, M. Elsayed, M. Bavand, R. Gaigalas, Y. Ozcan, and M. Erol-Kantarci, “Hierarchical Reinforcement Learning Based Traffic Steering in Multi-RAT 5G Deployments,” 2023.
- [21] A. Chagdali, S. E. Elayoubi, and A. M. Masucci, “Impact of Slice Function Placement on the Performance of URLLC with Redundant Coverage,” in 2020 16th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), 2020, pp. 1–6.
- [22] P. Frenger and R. Tano. (2019) A Technical Look at 5G Energy Consumption and Performance. [Online]. Available: https://www.ericsson.com/en/blog/2019/9/energy-consumption-5G-nr
- [23] E. Dahlman, S. Parkvall, and J. Skold, 4G, LTE-Advanced Pro and The Road to 5G, Third Edition, 3rd ed. USA: Academic Press, Inc., 2016.