
Attention-based SIC Ordering and Power Allocation for Non-orthogonal Multiple Access Networks

Liang Huang,  Bincheng Zhu, Runkai Nan, Kaikai Chi,  Yuan Wu L. Huang, B. Zhu, R. Nan, and K. Chi are with the School of Computer Science and Technology, Zhejiang University of Technology, China (e-mail: {lianghuang, bczhu, rknan, kkchi}@zjut.edu.cn).
Y. Wu is with the State Key Laboratory of Internet of Things for Smart City, University of Macau, Macau, China, and also with the Department of Computer and Information Science, University of Macau, Macau, China (e-mail: [email protected]).
Abstract

Non-orthogonal multiple access (NOMA) emerges as a superior technology for enhancing spectral efficiency, reducing latency, and improving connectivity compared to orthogonal multiple access. In NOMA networks, successive interference cancellation (SIC) plays a crucial role in decoding user signals sequentially. The challenge lies in the joint optimization of SIC ordering and power allocation, a task made complex by the factorial nature of ordering combinations. This study introduces an innovative solution, the Attention-based SIC Ordering and Power Allocation (ASOPA) framework, targeting an uplink NOMA network with dynamic SIC ordering. ASOPA aims to maximize weighted proportional fairness by employing deep reinforcement learning, strategically decomposing the problem into two manageable subproblems: SIC ordering optimization and optimal power allocation. Our approach utilizes an attention-based neural network, which processes instantaneous channel gains and user weights to determine the SIC decoding sequence for each user. A baseline network, serving as a mimic model, aids in the reinforcement learning process. Once the SIC ordering is established, the power allocation subproblem transforms into a convex optimization problem, enabling efficient calculation of optimal transmit power for all users. Extensive simulations validate ASOPA's efficacy, demonstrating performance closely paralleling the exhaustive method, with over 97% confidence in normalized network utility. Compared with the current state-of-the-art implementation, i.e., Tabu search, ASOPA achieves over 97.5% of the network utility of Tabu search. Furthermore, ASOPA's execution latency is two orders of magnitude lower than that of Tabu search when N=10, and three orders of magnitude lower when N=20. Notably, ASOPA maintains a low execution latency of approximately 50 milliseconds in a ten-user NOMA network, aligning with static SIC ordering algorithms. Furthermore, ASOPA demonstrates superior performance over baseline algorithms other than Tabu search in various NOMA network configurations, including scenarios with imperfect channel state information, multiple base stations, and multiple-antenna setups. Such results underscore ASOPA's robustness and effectiveness, highlighting its ability to excel across various NOMA network environments. The complete source code for ASOPA is accessible at https://github.com/Jil-Menzerna/ASOPA.

Index Terms:
non-orthogonal multiple access (NOMA), successive interference cancellation (SIC), deep reinforcement learning (DRL), resource allocation.

1 Introduction

With the rapid development of online gaming, augmented and virtual reality, 3-dimensional media, and the Internet of Things, wireless network traffic has increased significantly, which poses a challenge for orthogonal multiple access (OMA) schemes. To meet the growing demand, the next generation of wireless networks exploits advanced multiple access technologies [1, 2, 3], including non-orthogonal multiple access (NOMA) [4, 5] and rate-splitting multiple access (RSMA) [6]. Combining NOMA with other technologies, such as cognitive radios, unmanned aerial vehicles, mobile edge computing, and simultaneous wireless information and power transfer, can bring considerable advantages [7, 8]. Specifically, NOMA provides improved spectral efficiency, energy efficiency, higher data rates, massive connectivity, and a diversity of wireless services [9, 10].

For the uplink power-domain NOMA network, users simultaneously transmit their data over the same frequency resource, so there is inter-user interference due to the broadcast and superposition nature of the wireless medium [11, 12]. At the base station (BS) of uplink NOMA-based networks, the successive interference cancellation (SIC) technique decodes different users’ messages from the received signal. Using SIC, the BS sequentially decodes users’ messages according to a particular order. When decoding a user’s message, the remaining undecoded signal is treated as interference. After decoding, a user’s message will be subtracted from the received signal. The procedure continues until all users’ messages are decoded according to a specific SIC order. Although different SIC orderings generate the same sum throughput of the NOMA network with a single BS [13], they affect the throughput of each user [15, 17, 14, 16]. The earlier a user’s signal is decoded, the more substantial interference it suffers. When the quality of service of an individual user matters, it is necessary to optimize the SIC order for better performance metrics, e.g., outage probability [19, 18, 20], latency [21], and energy consumption [23, 22].

For a NOMA network, the joint optimization of SIC ordering and resource allocation is a Non-deterministic Polynomial (NP) hard problem [21]. Many researchers decompose the joint optimization into SIC ordering optimization and resource allocation. For SIC ordering, the total number of decoding orderings is the factorial of the number of users. The exhaustive method obtains the optimal SIC ordering by enumerating all possible SIC orderings and is limited to small-scale scenarios, i.e., with five users [24]. To trade off performance and computational complexity, two types of heuristic methods for SIC ordering appear in the literature, i.e., static heuristic methods and dynamic heuristic methods. Some researchers [25, 27, 28, 19, 17, 20, 29, 30, 31, 32, 18, 26] adopted a static SIC ordering with respect to a single metric and optimized the resource allocation when considering specific NOMA-based wireless networks. The execution latency of static SIC ordering methods is very low, close to that of the conventional SIC ordering algorithms, i.e., the descending or ascending order of channel quality. However, these static SIC ordering methods cannot cope with complex NOMA wireless scenarios, as finding the corresponding single metric is problematic. Some other works [21, 33, 34, 14, 3] tried to iteratively search for the SIC ordering, e.g., via greedy insertion and linear relaxation. While the dynamic heuristic methods apply to most NOMA wireless scenarios and achieve better performance than static SIC ordering algorithms, they still have high complexity in the joint optimization problem due to repeatedly solving the resource allocation.

Recently, deep learning has emerged as a promising approach for making near-optimal decisions. With a large labelled dataset, it can achieve substantial performance through supervised learning. However, in the joint optimization of SIC ordering and resource allocation, obtaining optimal solution labels is challenging, particularly in large-scale NOMA wireless networks. In this regard, deep reinforcement learning (DRL) is a suitable technique that can train a model using a dataset without labels. Some recent studies have utilized DRL techniques to efficiently solve computation-intensive resource allocation problems in wireless networks, such as Deep Q-Networks [35, 36], actor-critic algorithms [37, 38], deep deterministic policy gradient [40, 39], and proximal policy optimization [41, 42]. Recently, a pioneering deep reinforcement learning-based framework named DROO was introduced to address the hybrid integer-continuous challenge in mobile edge computing [43]. DROO ingeniously splits the primary optimization problem into two subproblems: a zero-one binary offloading decision and a continuous resource allocation task. These subproblems are then individually managed through a model-free learning module and a model-based optimization module, respectively [44]. Nevertheless, DROO and its subsequent iterations [45, 46] are constrained by their reliance on quantization modules that exclusively produce binary decisions. This limitation becomes particularly evident in their inability to permute the SIC ordering for NOMA networks. The permutation complexity of SIC orderings is factorial, presenting a significantly more challenging scenario than binary decisions. This complexity motivates us to develop a solution that adeptly handles the joint SIC ordering and power allocation problem, aiming to achieve near-optimal performance. This is particularly crucial in meeting the real-time requirements of NOMA networks, where efficiently managing the factorial complexity is essential for optimal system operation.

In this paper, we consider uplink NOMA with differently weighted users. Since the transmit power of the NOMA wireless network is typically shared in a best-effort fashion, we aim to ensure fairness across multiple users. We optimize the SIC ordering and users' transmit power to maximize the weighted proportional fairness function. To tackle this challenging problem, a novel Attention-based SIC Ordering and Power Allocation (ASOPA) framework is proposed, which leverages both DRL and optimization theory. To assess the effectiveness of ASOPA, we conduct comparative analyses against a range of baseline algorithms, focusing on two key metrics: the achieved network utility and the execution latency of ASOPA. Furthermore, to demonstrate the wide-ranging applicability and versatility of ASOPA, it is applied to various NOMA network configurations. This extension effectively highlights ASOPA's adaptability across diverse network structures within the NOMA framework, underlining its robustness and practical utility under diverse network conditions.

Our main contributions can be summarized as follows:

  • We formulate a joint optimization problem of SIC ordering and power allocation to maximize the weighted proportional fairness on an uplink NOMA wireless network. To solve this problem efficiently, it is decomposed into two subproblems: a SIC ordering subproblem and a power allocation subproblem under the given SIC ordering. This decomposition approach enables us to leverage DRL and convex optimization for optimal performance.

  • We propose the ASOPA framework to solve the joint optimization problem. This framework comprises three components: an attention-based actor network, a convex optimization module, and a baseline network. The actor network generates the SIC ordering, while the optimization module allocates users’ transmit power. The baseline network is used to train and reinforce the actor network. The ASOPA framework is designed to generate feasible solutions that meet all physical constraints, ensuring optimal performance.

  • We conduct comprehensive numerical experiments to validate the effectiveness of the ASOPA framework. The findings reveal that ASOPA attains near-optimal performance, fulfilling the real-time demands of NOMA networks. Notably, the normalized network utility achieved by ASOPA has a confidence interval exceeding 99%, closely mirroring the performance of exhaustive methods. In terms of execution latency, ASOPA is on par with static SIC ordering algorithms, with an approximate duration of 50 ms in a ten-user NOMA network. To highlight ASOPA’s broad applicability and versatility, we extend its application to include scenarios with imperfect channel state information (CSI), networks comprising multiple BSs, and systems with multiple-antenna setups. In each of these diverse environments, ASOPA consistently outperforms the baseline algorithms, showcasing its robustness and effectiveness across different NOMA network configurations.

The rest of this paper is organized as follows. Related work is introduced in Section 2. Section 3 gives the system model and problem formulation. Section 4 introduces ASOPA, and Section 5 presents its extension to other NOMA scenarios. The numerical results are given in Section 6. Finally, Section 7 concludes the paper.

2 Related Work

Most of the existing works on NOMA use a fixed SIC ordering according to channel conditions. Specifically, the descending order of channel quality is usually used in uplink NOMA [25, 26], and the ascending order of channel quality is generally used in downlink NOMA [27, 28]. However, in many scenarios, using the above fixed ascending or descending order for SIC might not be optimal. To achieve optimal performance, [24] used exhaustive search to find the optimal decoding order. However, the computational complexity of exhaustive search is at least $O(N!)$, and its usage is limited to small-scale NOMA wireless scenarios, i.e., fewer than five users. To balance performance and computational complexity, the following works further optimized the SIC ordering in a NOMA network with respect to different performance metrics, e.g., outage probability [19, 18, 20], throughput [32, 15, 17, 14, 16], latency [21], energy consumption [23, 22], and the data rate of a particular user [24]. After elaborately designing the SIC ordering algorithm, Qian et al. [21] showed that by optimizing the SIC ordering, the min-max execution latency could be reduced by ten times compared to the best comparison method. Also, [14] showed that optimizing the SIC ordering can yield a 48% improvement in the sum rate over a fixed SIC ordering. The above works on SIC ordering optimization can be classified into two types: static SIC ordering and dynamic SIC ordering.

2.1 Static SIC Ordering

In addition to wireless channel quality, some works have proposed using a static decoding order based on other performance metrics specific to certain problems and scenarios, for example, the descending order of received user signal power [19, 17, 20], the descending order of predicted user throughput [29], the decreasing order of channel gain normalized by noise and interference power [30, 31], the ascending order of the average channel gain from user to base station [32], and the ascending order of a user's maximum secrecy throughput [18]. For the uplink NOMA scenario, [18] adopted the ascending order of users' maximum secrecy throughput as the SIC ordering to achieve secrecy transmission in eavesdropper scenarios. [19] used the descending order of received user signal power as the SIC ordering and derived the closed-form expression of the outage probability for a single-base-station, three-user system. Building on this work, [20] considered a single base station with multiple active users in the case of imperfect channel state information, using the descending order of estimated instantaneous received signal power as the SIC ordering. Although these static SIC ordering methods can achieve excellent performance with small execution latency, they may suffer considerable performance degradation in complex NOMA networks due to the difficulty of finding a suitable metric for ordering.

2.2 Dynamic SIC Ordering

In addition to works that use a static SIC ordering based on specific problems and scenarios, a few studies have proposed iterative heuristic search algorithms to optimize the SIC ordering. For instance, [3] and [14] introduced binary variables to represent the SIC ordering, solving the problem via variable relaxation with compensation for performance degradation and as an integer linear programming problem, respectively. [33] adapted a permutation-based genetic algorithm to optimize the SIC ordering. [21] used the greedy meta-scheduling technique to develop a low-complexity and easy-to-implement SIC ordering algorithm. This algorithm sequentially inserts users into the existing ordering, tries every possible insertion position for each user, and chooses the position that provides the most significant benefit. [34] utilized a heuristic tabu search to optimize the SIC ordering in an iterative process. At each iteration, tabu search swaps the SIC ordering of any two users and selects the best one. To the best of our knowledge, [34] is the state-of-the-art algorithm for dynamic SIC ordering; it achieves near-optimal performance but suffers from high computational complexity due to the large number of iterations. While these dynamic SIC ordering methods apply to most NOMA wireless scenarios and achieve better performance than static SIC ordering, they all involve iterative updates and repeatedly solve resource allocation subproblems, resulting in high computational complexity.

3 System model and problem formulation

3.1 System Model

As shown in Fig. 1, we consider an uplink NOMA network with a centrally located BS and $N$ active users, denoted as a set $\mathcal{N}=\{1,2,\dots,N\}$, where each user has a single antenna. Users have a stable power supply, and all $N$ users simultaneously transmit their information to the BS by NOMA.

Refer to caption
Figure 1: An uplink NOMA network with one BS and $N$ users.

The system time is divided into consecutive slots of equal length, smaller than the channel coherence time. We assume that the wireless channel gain is constant in each time slot and may vary across different slots. Without loss of generality, the slot length is normalized for brevity.

The BS employs SIC in a successive order to decode users' signals. In our framework, we define a function $\xi(n)=i$ and its inverse $\pi(i)=n$, which establish a mapping between a user's index $n$ and its corresponding decoding order $i$. For instance, $\xi(3)=2$ indicates that user 3 is decoded second in the sequence. Conversely, $\pi(2)=3$ indicates that the second user to be decoded is user 3. These bi-directional mapping functions clearly delineate the relationship between the decoding sequence and the specific users in the network. Hence, the signal-to-interference-plus-noise ratio (SINR) of user $n$ can be expressed as

\phi_{n}=\frac{p_{n}g_{n}}{\sum\limits_{\xi(n^{\prime})>\xi(n),\forall n^{\prime}\in\mathcal{N}}p_{n^{\prime}}g_{n^{\prime}}+N_{0}}, (1)

where $N_{0}$ is the power of the additive Gaussian noise at the BS, $g_{n}$ denotes the wireless channel gain between user $n$ and the BS at a tagged time slot, and $p_{n}$ denotes the transmit power of user $n$ for sending its information. So the data rate of user $n$ can be expressed as

R_{n}=B\log_{2}(1+\phi_{n}), (2)

where $B$ denotes the communication bandwidth.
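For concreteness, the following minimal NumPy sketch evaluates (1) and (2) for a given decoding order; the noise power and bandwidth values used in the example are illustrative placeholders, not the simulation settings of this paper.

```python
import numpy as np

def user_rates(pi, p, g, N0=1e-9, B=1.0):
    """Per-user data rates under SIC for decoding order pi, following eqs. (1)-(2).

    pi : decoding order, pi[i] is the index of the i-th decoded user
    p  : transmit powers p_n;  g : channel gains g_n
    """
    p, g = np.asarray(p, float), np.asarray(g, float)
    R = np.zeros(len(pi))
    for i, n in enumerate(pi):
        later = list(pi[i + 1:])                      # users decoded after user n
        interference = np.sum(p[later] * g[later])    # undecoded signals act as interference
        sinr = p[n] * g[n] / (interference + N0)      # eq. (1)
        R[n] = B * np.log2(1.0 + sinr)                # eq. (2)
    return R

# Three users; user 1 is decoded first, then user 0, then user 2.
print(user_rates(pi=[1, 0, 2], p=[0.1, 0.2, 0.1], g=[1e-6, 2e-6, 5e-7]))
```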

We summarize essential notations used throughout this paper in Table I.

TABLE I: Notations
$N$ | The number of users
$\mathcal{N}$ | The set of users
$g_{n}$ | The wireless channel gain between user $n$ and the BS
$p_{n}$ | The transmit power of user $n$
$\mathbf{p}$ | The vector representation of the power allocation
$P_{n}^{max}$ | The maximum power that user $n$ can achieve
$\bm{\pi}$ | The SIC ordering of all users
$\bm{\pi}^{BL}$ | The SIC ordering generated by the baseline network
$\Pi$ | The set of all possible SIC orderings
$\xi(n)$ | The order in which user $n$ is decoded
$N_{0}$ | The power of the additive Gaussian noise at the BS
$\phi_{n}$ | The signal-to-interference-plus-noise ratio of user $n$
$B$ | The communication bandwidth
$R_{n}$ | The data rate of user $n$
$w_{n}$ | The weight of user $n$
$\mathbf{X}$ | The representation of all users' information
$\theta$ | The parameters of the actor network
$\theta^{\prime}$ | The parameters of the baseline network
$\mathbf{E}$ | The embedding of all users generated by the encoder
$\overline{\mathbf{e}}$ | The global information embedding, i.e., the mean of $\mathbf{e}_{n},\forall n\in\mathcal{N}$
$\bm{\ell}_{t}$ | The probability of users being selected at iteration $t$
$\mathbf{q},\mathbf{K},\mathbf{V}$ | The query, keys, and values
$d_{k}$ | The dimension of $\mathbf{q}$ and $\mathbf{k}$
$d_{e}$ | The dimension of each user's embedding $\mathbf{e}_{n}$
$S(\theta|\mathbf{X})$ | The expected objective value for input $\mathbf{X}$ under the network parameters $\theta$
$z_{\theta}(\bm{\pi}|\mathbf{X})$ | The probability of $\bm{\pi}$ generated by the actor network $\theta$ for the input $\mathbf{X}$
$R(\bm{\pi}|\mathbf{X})$ | The network utility under the given SIC ordering $\bm{\pi}$ for $\mathbf{X}$
$|\bm{\tau}|$ | The batch size
$\tau$ | The index of a sample in a training batch
$\bm{\tau}$ | The index set of the training batch

3.2 Problem Formulation

This paper aims to achieve weighted proportional fairness across multiple users. Proportional fairness can be achieved by maximizing individual rates with a logarithmic utility function [48]. In addition, considering users' different priorities, individual weights are assigned to each user to achieve weighted proportional fairness [49, 50, 51]. Thus, the aim is to maximize the weighted sum of the logarithmic throughput of different users by jointly optimizing the SIC ordering $\bm{\pi}$ and the power allocation $\mathbf{p}$, denoted as the network utility $R(\bm{\pi},\mathbf{p})$. This optimization problem can be expressed mathematically as:

\textbf{P0}: R(\bm{\pi},\mathbf{p})=\max\limits_{\bm{\pi},\mathbf{p}}\ \sum\limits_{n=1}^{N}w_{n}\ln R_{n} (3a)
s.t.\ 0<p_{n}\leq P_{n}^{max},\forall n\in\mathcal{N}, (3b)
\bm{\pi}\in\Pi, (3c)

where $w_{n}$ is the weight of user $n$, $\bm{\pi}=[\pi_{1},\pi_{2},\cdots,\pi_{N}]$ indicates the SIC order, where we denote $\pi_{i}=\pi(i)$ for brevity, $\Pi$ is the permutation set of all possible SIC orderings, whose size is $N!$, and $\mathbf{p}=[p_{1},p_{2},\dots,p_{N}]$ is the power allocation. Constraint (3b) is the power constraint for each user $n$, where $P_{n}^{max}$ is the maximum power that user $n$ can achieve, and (3c) is the constraint on $\bm{\pi}$.

The problem P0 involves combinatorial optimization and continuous numerical optimization, which is NP-hard. To effectively solve the problem P0, we decompose it into the SIC ordering optimization and the optimization of power allocation under the given SIC ordering, as shown in Fig. 2:

Refer to caption
Figure 2: The two-level optimization structure of solving problem P0.
  • SIC Ordering: It is computationally expensive to iteratively search for the optimal SIC ordering from $\Pi$, whose size grows as $N!$. We can tell that one SIC ordering outperforms another only by solving the power allocation problem P1 for each and comparing their utilities $R(\bm{\pi})$. However, classical comparison-based sorting algorithms cannot perform better than $O(n\log n)$ comparisons on average [52], which requires repeatedly solving P1 and results in long execution latency. In this paper, a deep reinforcement learning method is adopted to generate the SIC ordering before the power allocation.

  • Power Allocation: When the SIC ordering $\bm{\pi}$ is determined, we only need to solve for the power allocation $\mathbf{p}$, as follows:

    \textbf{P1}: R(\bm{\pi})=\max\limits_{\mathbf{p}}\ \sum\limits_{n=1}^{N}w_{n}\ln R_{n} (4a)
    s.t.\ 0<p_{n}\leq P_{n}^{max},\forall n\in\mathcal{N}. (4b)

    We can solve this power allocation sub-problem P1 by converting it into a convex problem and using the interior-point method.

In the next section, these two subproblems are solved by taking advantage of DRL and convex optimization, respectively.
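Before turning to ASOPA, the following sketch illustrates the two-level structure of Fig. 2 with a brute-force baseline: it enumerates all $N!$ orderings and, for each, numerically optimizes P1. It is only a reference point under stated assumptions; SciPy's SLSQP is used here in place of the convex reformulation and interior-point solver of Section 4.3, and the channel, weight, and power values are illustrative.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def utility(p, pi, g, w, N0=1e-9, B=1.0):
    """Weighted proportional fairness sum_n w_n * ln(R_n) for decoding order pi."""
    p = np.asarray(p, float)
    total = 0.0
    for i, n in enumerate(pi):
        later = list(pi[i + 1:])
        sinr = p[n] * g[n] / (np.sum(p[later] * g[later]) + N0)
        total += w[n] * np.log(B * np.log2(1.0 + sinr))
    return total

def solve_P1(pi, g, w, p_max, N0=1e-9):
    """Numerically optimize the transmit powers for a fixed SIC ordering (problem P1)."""
    g, w, p_max = (np.asarray(a, float) for a in (g, w, p_max))
    res = minimize(lambda p: -utility(p, pi, g, w, N0),
                   x0=0.5 * p_max,
                   bounds=[(1e-6, pm) for pm in p_max],   # 0 < p_n <= P_n^max
                   method="SLSQP")
    return -res.fun, res.x

def exhaustive_search(g, w, p_max):
    """Enumerate all N! orderings and keep the best; tractable only for small N."""
    best = (-np.inf, None, None)
    for pi in itertools.permutations(range(len(g))):
        value, p = solve_P1(pi, g, w, p_max)
        if value > best[0]:
            best = (value, pi, p)
    return best

print(exhaustive_search(g=[1e-6, 2e-6, 5e-7], w=[1.0, 1.0, 2.0], p_max=[0.1, 0.1, 0.1]))
```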

4 Algorithm Design

Refer to caption
Figure 3: The schematics of the proposed ASOPA framework.

4.1 Algorithm Overview

Fig. 3 shows the schematics of the proposed ASOPA framework. It uses an encoder-decoder-based actor network $\theta$ to generate the SIC ordering sequentially and uses optimization techniques to optimize all users' transmit power. The design of the actor network follows the pointer network [54, 53] for routing problems. In Fig. 3, the black solid lines depict the inference process of ASOPA, which requires real-time network parameters. These parameters include each user's weight $w_{n}$, maximum power $P_{n}^{max}$, and channel gain $g_{n}$. The complete set of users' information, denoted by $\mathbf{X}=\{(w_{i},P_{i}^{max},g_{i})\}_{i\in\mathcal{N}}$, is fed into the actor network to generate the SIC ordering $\bm{\pi}$. Following the determination of the SIC ordering $\bm{\pi}$, we solve the sub-problem P1 for the optimal power allocation $\mathbf{p}$ through convex optimization. Then, ASOPA outputs the network decision, represented as $(\bm{\pi},\mathbf{p})$, based on the instantaneous user information $\mathbf{X}$.

The red dotted lines in Fig. 3 illustrate the policy update process in ASOPA. We employ a replica of the actor network, referred to as the baseline network $\theta^{\prime}$, and train the actor network using the REINFORCE algorithm. The users' information $\mathbf{X}$ is fed into both the actor and baseline networks. This process yields the output SIC orderings $\bm{\pi}$ and the baseline SIC orderings $\bm{\pi}^{BL}$. Subsequently, $R(\bm{\pi}|\mathbf{X})$ and $R(\bm{\pi}^{BL}|\mathbf{X})$ are obtained by solving problem P1 and are used to compute the loss function. The backpropagation method is employed to update the parameters $\theta$ of the actor network. The procedures of ASOPA are detailed in the following subsections.

4.2 SIC Ordering

As shown in Fig. 3, the SIC ordering $\bm{\pi}$ is generated by the actor network, which is composed of an encoder and a decoder, as follows:

  1. The encoder takes users' information $\mathbf{X}=[\mathbf{x}_{1},\mathbf{x}_{2},\dots,\mathbf{x}_{N}]$ as input and outputs users' embedding $\mathbf{E}=[\mathbf{e}_{1},\mathbf{e}_{2},\dots,\mathbf{e}_{N}]$ using self-attention layers. The global embedding $\overline{\mathbf{e}}$ is then calculated as the average of $\mathbf{E}$.

  2. The decoder generates the SIC ordering in an iterative process. At each iteration $t$, utilizing the cross-attention layers, the decoder generates the selection probabilities of all users according to $[\overline{\mathbf{e}},\mathbf{e}_{\pi_{t-1}}]$ and $\mathbf{E}$. By masking the previously selected users, the probability of the remaining users being selected is calculated using the softmax function, allowing the decoder to determine the current user's index $\pi_{t}$. At the end of each iteration, the decoder updates the set of users selected for masking and takes $\mathbf{e}_{\pi_{t}}$ as input to the next iteration. It iterates $N$ times to obtain the complete SIC ordering $\bm{\pi}=[\pi_{1},\pi_{2},\dots,\pi_{N}]$.

The integration of an attention scheme in the encoder and an iterative decoding scheme in the decoder empowers ASOPA to effectively manage a varying number of users in NOMA networks. This approach ensures adaptability and responsiveness to user dynamics, maintaining optimal performance across diverse network scenarios.

4.2.1 Encoder

In this section, we describe how the encoder maps user information $\mathbf{X}$ through the neural network into users' embedding $\mathbf{E}$ suitable for subsequent processing. Users' information $\mathbf{X}=[\mathbf{x}_{1},\dots,\mathbf{x}_{N}]$ is first expanded into $d_{e}$ dimensions by a fully connected feed-forward ($\text{FF}_{1}$) layer, and then passes through a multi-head attention (MHA) layer and a feed-forward ($\text{FF}_{2}$) layer. Both the MHA and $\text{FF}_{2}$ layers have a residual connection and are followed by batch normalization, as in [53]. Hence, we have:

\hat{\mathbf{E}}=\text{BN}\Big(\text{FF}_{1}(\mathbf{X})+\text{MHA}\big(\text{FF}_{1}(\mathbf{X})\big)\Big),\quad \mathbf{E}=\text{BN}\Big(\hat{\mathbf{E}}+\text{FF}_{2}\big(\hat{\mathbf{E}}\big)\Big). (5)

The details of the multi-head attention mechanism are shown in Appendix A. The length of the users' embedding is adaptive to the variable number of users. Each term in the obtained users' embedding $\mathbf{E}=[\mathbf{e}_{1},\dots,\mathbf{e}_{N}]$ takes into account all the users' information.
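A minimal PyTorch sketch of this encoder is given below. The embedding dimension, number of attention heads, and the hidden width of $\text{FF}_{2}$ are illustrative assumptions rather than the exact hyperparameters of this paper.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Sketch of the attention-based encoder in eq. (5); d_e and n_heads are illustrative."""
    def __init__(self, d_in=3, d_e=128, n_heads=8):
        super().__init__()
        self.ff1 = nn.Linear(d_in, d_e)                       # FF1: lift x_n to d_e dims
        self.mha = nn.MultiheadAttention(d_e, n_heads, batch_first=True)
        self.bn1 = nn.BatchNorm1d(d_e)
        self.ff2 = nn.Sequential(nn.Linear(d_e, 4 * d_e), nn.ReLU(),
                                 nn.Linear(4 * d_e, d_e))     # FF2
        self.bn2 = nn.BatchNorm1d(d_e)

    def forward(self, X):                 # X: (batch, N, d_in), x_n = (w_n, P_n^max, g_n)
        h = self.ff1(X)
        a, _ = self.mha(h, h, h)          # self-attention over users
        e_hat = self.bn1((h + a).transpose(1, 2)).transpose(1, 2)          # residual + BN
        E = self.bn2((e_hat + self.ff2(e_hat)).transpose(1, 2)).transpose(1, 2)
        e_bar = E.mean(dim=1)             # global embedding: mean over users
        return E, e_bar

E, e_bar = Encoder()(torch.rand(4, 10, 3))   # batch of 4 instances, 10 users each
print(E.shape, e_bar.shape)                  # torch.Size([4, 10, 128]) torch.Size([4, 128])
```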

4.2.2 Decoder

The decoder iteratively generates the users' SIC ordering by utilizing the user embeddings obtained from the encoder. In the decoder, the user embedding $\mathbf{E}$ is initially passed through a linear layer to derive $\mathbf{K}$, $\mathbf{V}$, and $\mathbf{K}^{\prime}$ as follows:

\mathbf{K}=\mathbf{W}^{K}\mathbf{E},\quad\mathbf{V}=\mathbf{W}^{V}\mathbf{E},\quad\mathbf{K}^{\prime}=\mathbf{W}^{K^{\prime}}\mathbf{E}, (6)

where $\mathbf{W}^{K}$, $\mathbf{W}^{V}$, and $\mathbf{W}^{K^{\prime}}$ are matrices of learnable parameters. The global embedding $\overline{\mathbf{e}}=\frac{1}{N}\sum_{n=1}^{N}\mathbf{e}_{n}$ is computed to effectively capture the overall network state. The derived values of $\mathbf{K}$, $\mathbf{V}$, and $\mathbf{K}^{\prime}$, along with $\overline{\mathbf{e}}$, are then utilized in subsequent iterations of the decoder.

As shown in Fig. 3, the decoder operates in an iterative mode and generates an $N$-user SIC ordering $\bm{\pi}=[\pi_{1},\pi_{2},\dots,\pi_{N}]$. For each iteration $t\in[1,2,\dots,N]$, the decoder takes the last decoded user's index $\pi_{t-1}$ and decides the current decoded user's index $\pi_{t}$.

Firstly, the concatenation module forms an input vector $[\overline{\mathbf{e}},\mathbf{e}_{\pi_{t-1}}]$ that contains the global information and the information of the previously decoded user. $\mathbf{e}_{\pi_{t-1}}$ denotes the previously decoded user's embedding. The embedding $\mathbf{e}_{\pi_{t-1}}$ of the input vector changes with each iteration, which captures the user preferences on the SIC ordering. When decoding the first user, with no previously decoded user, $\mathbf{e}_{0}$ is set as a learnable parameter vector.

Secondly, the masked multi-head attention layer generates a feature vector $\mathbf{u}_{t}$ that involves the query, keys, and values initially introduced in [55]. The keys $\mathbf{K}$ and values $\mathbf{V}$ are obtained in (6), while the single query is a variable input computed as follows:

\mathbf{q}_{t}=\mathbf{W}^{Q}[\overline{\mathbf{e}},\mathbf{e}_{\pi_{t-1}}], (7)

where $\mathbf{W}^{Q}$ is a matrix of learnable parameters. The multi-head attention mechanism is described in Appendix A and omitted here for brevity. The output of the masked multi-head attention layer can then be calculated as

\mathbf{u}_{t}=\text{softmax}\left(\text{mask}\left(\tfrac{\mathbf{q}_{t}^{T}\mathbf{K}}{\sqrt{d_{k}}}\right)\right)\mathbf{V}, (8)

where $d_{k}$ is the dimension of $\mathbf{q}$, the softmax function is given in equation (4) of Appendix A, and the mask function can be expressed as:

\text{mask}\left(\tfrac{\mathbf{q}_{t}^{T}\mathbf{k}_{i}}{\sqrt{d_{k}}}\right)=\begin{cases}-\infty & \text{if }i\in\{\pi_{1},\pi_{2},\dots,\pi_{t-1}\},\\ \tfrac{\mathbf{q}_{t}^{T}\mathbf{k}_{i}}{\sqrt{d_{k}}} & \text{otherwise},\end{cases} (9)

where $\mathbf{k}_{i}$ denotes the $i$-th element of the keys $\mathbf{K}$. At each iteration $t$, the users selected in previous iterations are masked to guarantee that each user's index appears precisely once in $\bm{\pi}$.

Thirdly, the single-head attention layer is used to generate the probability $\bm{\ell}_{t}$ of users being selected at iteration $t$. $\bm{\ell}_{t}$ can also be considered the similarity between the single query and the keys of the single-head attention layer. The keys $\mathbf{K}^{\prime}$ are obtained in (6), and the single query $\mathbf{q}^{\prime}_{t}$ is computed as follows:

\mathbf{q}^{\prime}_{t}=\mathbf{W}^{Q^{\prime}}\mathbf{u}_{t}, (10)

where $\mathbf{W}^{Q^{\prime}}$ is a matrix of learnable parameters. Then, $\bm{\ell}_{t}$ can be derived as

\bm{\ell}_{t}=\text{softmax}\left(\text{mask}\left(\text{clip}\left(\tfrac{{\mathbf{q}^{\prime}_{t}}^{T}\mathbf{K}^{\prime}}{\sqrt{d_{k}}}\right)\right)\right), (11)

where the clip function clips the result within $[-10,10]$ using the tanh function to avoid the selection probability of any user being too large or too small [53].

Finally, the selection module selects the user to be decoded based on $\bm{\ell}_{t}$. In the inference phase, the selection module works in the greedy mode. Specifically, it greedily selects the user with the greatest probability as the $t$-th decoded user, as

\pi_{t}=\mathop{\arg\max}_{n}\ \ell_{t,n}. (12)

Up to now, the decoder operation at iteration $t$ is completed.

Upon repeating the aforementioned steps $N$ times in the decoder, we can compile all $\pi_{t}$ to form the SIC decoding order $\bm{\pi}$:

\bm{\pi}=[\pi_{1},\dots,\pi_{N}]. (13)
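The decoding loop of eqs. (6)-(12) can be sketched as follows. For brevity, this sketch uses a single attention head (so $d_{k}=d_{e}$) and the greedy selection of the inference phase; the dimensions and initialization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Single-head sketch of the decoding step in eqs. (6)-(12)."""
    def __init__(self, d_e=128):
        super().__init__()
        self.d_e = d_e
        self.WK = nn.Linear(d_e, d_e, bias=False)      # keys K
        self.WV = nn.Linear(d_e, d_e, bias=False)      # values V
        self.WKp = nn.Linear(d_e, d_e, bias=False)     # keys K' of the final layer
        self.WQ = nn.Linear(2 * d_e, d_e, bias=False)  # query from [e_bar, e_{pi_{t-1}}]
        self.WQp = nn.Linear(d_e, d_e, bias=False)     # query of the final layer
        self.e0 = nn.Parameter(torch.randn(d_e))       # learnable placeholder for t = 1

    def forward(self, E, e_bar):
        N, scale = E.shape[0], self.d_e ** 0.5
        K, V, Kp = self.WK(E), self.WV(E), self.WKp(E)            # eq. (6)
        mask, pi, e_prev = torch.zeros(N, dtype=torch.bool), [], self.e0
        for _ in range(N):
            q = self.WQ(torch.cat([e_bar, e_prev]))               # eq. (7)
            scores = (K @ q) / scale
            u = torch.softmax(scores.masked_fill(mask, -1e9), dim=0) @ V   # eqs. (8)-(9)
            logits = 10 * torch.tanh((Kp @ self.WQp(u)) / scale)  # eqs. (10)-(11), clip to [-10, 10]
            probs = torch.softmax(logits.masked_fill(mask, -1e9), dim=0)
            n = int(torch.argmax(probs))                          # eq. (12), greedy inference
            pi.append(n)
            mask[n], e_prev = True, E[n]
        return pi

E = torch.rand(5, 128)
print(Decoder()(E, E.mean(dim=0)))   # e.g. a permutation of [0, 1, 2, 3, 4]
```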

4.2.3 Step-by-Step SIC Ordering Example

Refer to caption
Figure 4: The process of iteratively determining the SIC ordering in ASOPA.

To illustrate the iterative generation of the SIC ordering, we provide an example involving 5 users as shown in Fig. 4.

The encoder takes the five users' information $\mathbf{X}=[\mathbf{x}_{1},\mathbf{x}_{2},\dots,\mathbf{x}_{5}]$ as input, and correspondingly calculates the users' embedding $\mathbf{E}=[\mathbf{e}_{1},\mathbf{e}_{2},\dots,\mathbf{e}_{5}]$ and the global embedding $\overline{\mathbf{e}}=\frac{1}{5}\sum_{n=1}^{5}\mathbf{e}_{n}$.

The decoder then iteratively generates the first three positions of the SIC ordering of the five users as follows:

  1. In the first iteration ($t=1$), the decoder inputs $[\overline{\mathbf{e}},\mathbf{e}_{0}]$, computes the probabilities $\bm{\ell}_{t}$ using Equation (11), and selects the user with the highest probability for the first decoding. In this example, user 2 has the highest probability ($\ell_{1,2}=0.4187$) and is selected as $\pi_{1}=2$.

  2. In the second iteration ($t=2$), the decoder takes the global embedding $\overline{\mathbf{e}}$ and the embedding of the first decoded user ($\mathbf{e}_{2}$) as input $[\overline{\mathbf{e}},\mathbf{e}_{2}]$, and recalculates the probabilities $\bm{\ell}_{t}$. As user 2 has already been selected, its probability is masked and set to zero ($\ell_{2,2}=0$). The highest probability in this iteration is $\ell_{2,1}=0.4728$, leading to the selection of user 1 as $\pi_{2}=1$.

  3. In the third iteration ($t=3$), the decoder inputs $[\overline{\mathbf{e}},\mathbf{e}_{1}]$ and recalculates the probabilities $\bm{\ell}_{t}$. With the probabilities of the previously selected users masked ($\ell_{3,2}=\ell_{3,1}=0$), user 5 has the highest probability ($\ell_{3,5}=0.7953$) in this iteration, and thus user 5 is selected as $\pi_{3}=5$.

This procedure is repeated for two more iterations until the SIC decoding order for all five users is determined. The specific probabilities $\ell_{t,n}$ of all users over the iterations are shown in Table II.

TABLE II: Case study - Inference of the decoder ($\ell_{t,n}$ is the selection probability of user $n$ at iteration $t$)
Iteration | Input | $\ell_{t,1}$ | $\ell_{t,2}$ | $\ell_{t,3}$ | $\ell_{t,4}$ | $\ell_{t,5}$ | Decoded user $\pi_{t}$
$t=1$ | $[\overline{\mathbf{e}},\mathbf{e}_{0}]$ | 0.2727 | 0.4187 | 0.0223 | 0.0295 | 0.2568 | $\pi_{1}=2$
$t=2$ | $[\overline{\mathbf{e}},\mathbf{e}_{2}]$ | 0.4728 | 0 (masked) | 0.0399 | 0.0528 | 0.4345 | $\pi_{2}=1$
$t=3$ | $[\overline{\mathbf{e}},\mathbf{e}_{1}]$ | 0 (masked) | 0 (masked) | 0.0893 | 0.1154 | 0.7953 | $\pi_{3}=5$
$t=4$ | $[\overline{\mathbf{e}},\mathbf{e}_{5}]$ | 0 (masked) | 0 (masked) | 0.4520 | 0.5480 | 0 (masked) | $\pi_{4}=4$
$t=5$ | $[\overline{\mathbf{e}},\mathbf{e}_{4}]$ | 0 (masked) | 0 (masked) | 1 | 0 (masked) | 0 (masked) | $\pi_{5}=3$

4.3 Power Allocation Under Given SIC Ordering

In this subsection, we design a convex transformation algorithm for the power allocation problem under a given SIC ordering. Referring to Fig. 3, the actor network of ASOPA only generates the SIC ordering without considering the transmit power allocation and the corresponding constraints. To obtain the corresponding power allocation and evaluate the SIC ordering, we solve the power allocation subproblem and obtain the achieved network utility as follows.

Upon determining the SIC ordering $\bm{\pi}$, the sub-problem P1 is addressed to identify the optimal power allocation satisfying the constraints and subsequently calculate the network utility $R(\bm{\pi}|\mathbf{X})$ for the specified SIC ordering. Given that the data rate $R_{n}$ for each user is non-convex, P1 is inherently a non-convex problem. To tackle this, we employ variable substitution to transform P1 into a convex problem. The details of this transformation and the proof of its convexity are provided in Appendix B. We solve the transformed version of P1 using the interior-point method [56] of the CVX solver. The solution of P1 yields the optimal power allocation $\mathbf{p}$ under the given SIC ordering $\bm{\pi}$ satisfying the constraints. Following this, ASOPA outputs the network decision $(\bm{\pi},\mathbf{p})$ based on the instantaneous user information $\mathbf{X}$.

Consequently, the network utility $R(\bm{\pi}|\mathbf{X})$ can be calculated. This calculated utility provides essential feedback on the effectiveness of the SIC ordering under the current policy. As such, the resource allocation module acts as a critic in training the actor network, playing a crucial role in evaluating the generated SIC orderings.

Notice that ASOPA can be migrated to other NOMA optimization problems by substituting the corresponding resource allocation problem. The extensibility of ASOPA is discussed in Section 5.

4.4 Policy Update

The red dotted lines in Fig. 3 represent the policy update process. For an input instance $\mathbf{X}$, our goal is to maximize the expected sum of weighted users' logarithmic throughput, as

\mathbb{E}_{\bm{\pi}\sim z_{\theta}(\cdot|\mathbf{X})}R(\bm{\pi}|\mathbf{X}), (14)

where $z_{\theta}(\bm{\pi}|\mathbf{X})=\prod_{t=1}^{N}\ell_{t,\pi_{t}}$ is a measure of the likelihood that the generated SIC decoding order $\bm{\pi}$ is the optimal sequence given the current network state $\mathbf{X}$.

To explore more SIC orderings in the training phase, the selection module of the decoder samples the user according to the probability $\bm{\ell}_{t}$ at iteration $t$ [53], as

\pi_{t}\sim\bm{\ell}_{t}. (15)

After iterating over $t\in[1,2,\dots,N]$, we obtain the SIC ordering $\bm{\pi}$. Then, the gradient of the actor network $\theta$ can be formulated by the REINFORCE algorithm [57] as

\mathbb{E}\Big(R(\bm{\pi}|\mathbf{X})\nabla_{\theta}\log z_{\theta}(\bm{\pi}|\mathbf{X})\Big). (16)

However, the REINFORCE algorithm may have high variance and thus produce slow learning [59]. To improve the performance of DRL, the REINFORCE algorithm with baseline [59] is adopted in this paper. As illustrated in Fig. 3, the baseline network $\theta^{\prime}$, which uses the actor network parameters from the previous epoch, serves as a baseline to generate SIC orderings $\bm{\pi}^{BL}$ and subsequently calculates the network utility $R(\bm{\pi}^{BL}|\mathbf{X})$. The difference between $R(\bm{\pi}|\mathbf{X})$ and $R(\bm{\pi}^{BL}|\mathbf{X})$ is utilized to train the current actor network $\theta$.

Consequently, the gradient of the REINFORCE algorithm with baseline can be expressed and then approximated by Monte Carlo sampling as

\mathbb{E}\Big(\big(R(\bm{\pi}|\mathbf{X})-R(\bm{\pi}^{BL}|\mathbf{X})\big)\nabla_{\theta}\log z_{\theta}(\bm{\pi}|\mathbf{X})\Big)\approx\frac{1}{|\bm{\tau}|}\sum_{i=1}^{|\bm{\tau}|}\Big(\big(R(\bm{\pi}_{i}|\mathbf{X}_{i})-R(\bm{\pi}_{i}^{BL}|\mathbf{X}_{i})\big)\nabla_{\theta}\log z_{\theta}(\bm{\pi}_{i}|\mathbf{X}_{i})\Big), (17)

where $|\bm{\tau}|$ is the batch size, $\mathbf{X}_{i}$ is the $i$-th input, $\bm{\pi}_{i}$ is the SIC ordering produced by the actor network based on (15), and $\bm{\pi}_{i}^{BL}$ is the $i$-th SIC ordering produced by the baseline network. In practice, the expectation in Equation (17) is approximated by averaging over a batch of uniformly sampled input instances $\{\mathbf{X}_{\tau}\}_{\tau\in\bm{\tau}}$, where $\bm{\tau}$ represents the index set of these sampled instances. After obtaining the gradient (17), Adam [58] is applied as the optimizer to update the actor network's parameters $\theta$.
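In an automatic-differentiation framework, the gradient in (17) is typically obtained by minimizing a surrogate loss. A minimal PyTorch sketch, assuming the per-sample log-probabilities $\log z_{\theta}(\bm{\pi}_{i}|\mathbf{X}_{i})$ and the utilities have already been computed, is:

```python
import torch

def reinforce_with_baseline_loss(log_probs, R, R_bl):
    """Surrogate loss whose gradient matches eq. (17).

    log_probs : (batch,) tensor, log z_theta(pi_i | X_i) = sum_t log l_{t, pi_t}
    R, R_bl   : (batch,) tensors, utilities of the actor's and baseline's orderings
                (obtained by solving P1; treated as constants w.r.t. theta)
    """
    advantage = (R - R_bl).detach()          # baseline term reduces gradient variance
    return -(advantage * log_probs).mean()   # minimizing this performs gradient ascent on (14)

# Usage sketch with an Adam optimizer over the actor parameters theta:
# loss = reinforce_with_baseline_loss(log_probs, R, R_bl)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```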

Our reinforcement learning algorithm for training the actor network is outlined in Algorithm 1. To facilitate this process, an empty memory with limited capacity is established to store past samples. As new samples arrive in each time slot, policy updates are executed infrequently. For every policy update, a random batch of samples is selected from this memory to train the actor network. The baseline network, on the other hand, is updated every epoch, which consists of $M$ policy updates. This update process involves copying the parameters from the actor network to the baseline network, denoted by $\theta^{\prime}=\theta$. This systematic approach ensures continuous adaptation and optimization of the actor network's performance based on the latest data.

input : Users' weights, maximum transmit powers, and channel gains at each time slot $s$, $\mathbf{X}_{s}=\{(w_{i},P_{i}^{max},g_{i})\}_{i\in\mathcal{N}}$; the training interval $\delta_{T}$ of the actor network; the update epoch $\delta_{E}$ of the baseline network;
output : SIC ordering $\bm{\pi}$ and power allocation $\mathbf{p}$;
1  Initialize the actor network's parameters $\theta$;
2  Initialize the baseline network's parameters $\theta^{\prime}\leftarrow\theta$;
3  for $epoch=1,\cdots,E$ do
4        Generate the SIC ordering $\bm{\pi}$ for each $\mathbf{X}$ of the epoch from the actor network based on (15); // Inference of ASOPA for each sample
5        Obtain $\mathbf{p}$ and $R(\bm{\pi}|\mathbf{X})$ for each $\mathbf{X}$ of the epoch by solving problem P1;
6        for $batch=1,\cdots,M$ do
7              Uniformly sample a batch of samples $\{\mathbf{X}_{\tau}\}_{\tau\in\bm{\tau}}$ from previous samples of the epoch; // Infrequent policy updates of ASOPA can be executed in parallel on different servers in practical applications
8              Generate the SIC orderings $\{\bm{\pi}_{\tau}\}_{\tau\in\bm{\tau}}$ from the actor network based on (15);
9              Generate the baseline SIC orderings $\{\bm{\pi}^{BL}_{\tau}\}_{\tau\in\bm{\tau}}$ from the baseline network based on (12);
10             Obtain $\{(R(\bm{\pi}_{\tau}|\mathbf{X}_{\tau}),R(\bm{\pi}_{\tau}^{BL}|\mathbf{X}_{\tau}))\}_{\tau\in\bm{\tau}}$ by solving problem P1;
11             Calculate the gradient $\nabla_{\theta}S(\theta)$ from (17) based on $\{(R(\bm{\pi}_{\tau}|\mathbf{X}_{\tau}),R(\bm{\pi}_{\tau}^{BL}|\mathbf{X}_{\tau}))\}_{\tau\in\bm{\tau}}$;
12             Update the actor network's parameters $\theta$ using the Adam optimizer based on the calculated gradient $\nabla_{\theta}$;
13       end for
14       Update the baseline network's parameters $\theta^{\prime}\leftarrow\theta$;
15 end for
Algorithm 1 Training ASOPA

4.5 Computation Complexity

ASOPA operates through two distinct processes: the inference process and the policy update process. During each time slot, ASOPA's inference process is activated to generate SIC orderings and power allocations. In contrast, the policy update process can be carried out less frequently and can also be executed in parallel on different servers in practical applications. Given the crucial role of inference delay in determining the feasibility of field deployment, the inference complexity of ASOPA is a key area of interest.

The inference process is detailed in lines 4-5 of Algorithm 1. Line 4 involves generating the SIC ordering from the actor network, where the computational complexity is primarily driven by matrix multiplications in the attention mechanism. The complexity of interactions between the query, key, and value is $O(HDN^{2})$, with $H$ representing the number of heads in the multi-head attention mechanism, $D$ the dimension of these components, and $N$ the number of users. Line 5 addresses the generation of power allocation by solving subproblem P1, which is reformulated into a convex problem P2 and solved using the cvxopt solver with the interior-point method. The computational complexity of this method is $O(N^{3.5})$ [60]. Therefore, the overall inference complexity of ASOPA is $O(N^{3.5})$. In Section 6.3, we numerically demonstrate that ASOPA meets the real-time requirements of NOMA networks.

5 Extension Scenarios

ASOPA can be easily extended to various NOMA scenarios. To migrate to a new scenario, only the inputs of the actor network and baseline network need to be modified, while the structure of the actor network used to determine the SIC ordering remains unchanged. Correspondingly, the resource allocation subproblems are adjusted and re-solved for specific NOMA scenarios, ensuring optimal performance and efficiency under different network conditions.

In this section, we evaluate the extension scenarios of ASOPA, including NOMA networks with imperfect CSI, NOMA networks with multiple-antenna setups, and NOMA networks with multiple-BS setups.

5.1 Multiple Antenna

Refer to caption
Figure 5: A NOMA network with multiple-antenna setup at the BS.

In this subsection, we evaluate ASOPA in NOMA networks with multiple-antenna setups, since multiple-antenna technology has advantages in improving spectrum and energy efficiency [61, 62]. The uplink multiple-antenna system consists of a BS with $M_{r}$ antennas and $N$ single-antenna users, as shown in Fig. 5. The received signal $\mathbf{Y}\in\mathbb{C}^{M_{r}\times 1}$ at the BS is

\mathbf{Y}=\mathbf{H}\mathbf{S}+\mathbf{n}, (18)

where $\mathbf{H}\in\mathbb{C}^{M_{r}\times N}$ denotes the channel matrix from the users to the receive antennas of the BS, $\mathbf{S}=[s_{1},s_{2},\dots,s_{N}]^{T}$ denotes the transmit signal vector, and $\mathbf{n}\sim\mathcal{CN}(\mathbf{0},\sigma^{2}\mathbf{I})$ represents the additive white Gaussian noise at the BS side. $\mathcal{CN}(0,\sigma^{2})$ denotes the complex Gaussian distribution with zero mean and variance $\sigma^{2}$. Linear equalization, such as zero-forcing (ZF) or minimum mean square error (MMSE), is used in multiple-antenna scenarios for symbol-by-symbol detection. According to the states of the multiple-antenna scenario, ASOPA correspondingly generates the SIC ordering and power allocation. The specific steps are as follows.

The equalization matrix $\mathbf{V}\in\mathbb{C}^{N\times M_{r}}$ for ZF [61] or MMSE [62] can be expressed as [63]

\mathbf{V}=\begin{cases}\mathbf{P}\mathbf{H}^{H}(\mathbf{H}\mathbf{P}\mathbf{H}^{H}+\sigma\mathbf{I})^{-1} & \text{when MMSE},\\ \mathbf{H}^{H}(\mathbf{H}\mathbf{H}^{H})^{-1} & \text{when ZF},\end{cases} (19)

where $\mathbf{P}$ denotes the diagonal matrix $\mathrm{diag}(p_{1},p_{2},\dots,p_{N})$.

Then the received estimated signal is

\hat{\mathbf{S}}=\mathbf{V}\mathbf{Y}=\mathbf{V}\mathbf{H}\mathbf{S}+\mathbf{V}\mathbf{n}, (20)

and the estimated value of user $n$ is

\hat{s}_{n}=\sqrt{p_{n}}\mathbf{v}_{n}\mathbf{h}_{n}s_{n}+\sum\limits_{n^{\prime}\neq n}\sqrt{p_{n^{\prime}}}\mathbf{v}_{n}\mathbf{h}_{n^{\prime}}s_{n^{\prime}}+\mathbf{v}_{n}\mathbf{n}, (21)

where $\mathbf{h}_{n}\in\mathbb{C}^{M_{r}\times 1}$ denotes the channel vector from user $n$ to the receive antennas of the BS, and $\mathbf{v}_{n}\in\mathbb{C}^{1\times M_{r}}$ denotes the $n$-th row of $\mathbf{V}$, which can be given by

\mathbf{v}_{n}=\begin{cases}p_{n}\mathbf{h}_{n}^{H}\Big(\sum\limits_{\xi(n^{\prime})\geq\xi(n),\forall n^{\prime}\in\mathcal{N}}p_{n^{\prime}}\mathbf{h}_{n^{\prime}}\mathbf{h}_{n^{\prime}}^{H}+\sigma\mathbf{I}\Big)^{-1} & \text{when MMSE},\\ \mathbf{h}_{n}^{H}(\mathbf{h}_{n}\mathbf{h}_{n}^{H})^{-1} & \text{when ZF}.\end{cases} (22)

According to the estimated value of user $n$, its achieved transmit rate can be calculated by

R_{n}=\log_{2}\left(1+\frac{|\mathbf{v}_{n}\mathbf{h}_{n}|^{2}p_{n}}{\sum\limits_{\xi(n^{\prime})>\xi(n),\forall n^{\prime}\in\mathcal{N}}|\mathbf{v}_{n}\mathbf{h}_{n^{\prime}}|^{2}p_{n^{\prime}}+|\mathbf{v}_{n}|^{2}\sigma^{2}}\right). (23)

When using the ZF method, the equalization matrix $\mathbf{V}$ in (22) is independent of $\mathbf{p}$, so the power allocation problem can be transformed into a convex problem, as shown in Appendix E, and solved by the CVX solver.
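A minimal NumPy sketch of the ZF branch is given below. It uses the Moore-Penrose pseudo-inverse, which coincides with the ZF expression in (19) whenever $\mathbf{H}\mathbf{H}^{H}$ is invertible; the channel realization, powers, and noise level are illustrative.

```python
import numpy as np

def zf_rates(H, p, pi, sigma2=1e-9):
    """ZF equalization and per-user rates following eqs. (19) and (23).

    H  : (M_r, N) complex channel matrix, column n = h_n
    p  : (N,) transmit powers;  pi : decoding order
    """
    # Pseudo-inverse; equals H^H (H H^H)^{-1} of eq. (19) when that inverse exists.
    V = np.linalg.pinv(H)
    R = np.zeros(H.shape[1])
    for i, n in enumerate(pi):
        later = list(pi[i + 1:])
        signal = np.abs(V[n] @ H[:, n]) ** 2 * p[n]
        interference = sum(np.abs(V[n] @ H[:, m]) ** 2 * p[m] for m in later)
        noise = np.linalg.norm(V[n]) ** 2 * sigma2
        R[n] = np.log2(1.0 + signal / (interference + noise))   # eq. (23)
    return R

rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))) / np.sqrt(2)
print(zf_rates(H, p=np.array([0.1, 0.2, 0.1]), pi=[2, 0, 1]))
```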

However, when using the MMSE method, the equalization matrix $\mathbf{V}$ depends on the variable $\mathbf{p}$, making the power allocation problem of MMSE intractable and non-convex. To address this difficulty, alternating optimization is employed to decompose the non-convex problem into two subproblems: one for $\mathbf{V}$ and one for $\mathbf{p}$. When $\mathbf{V}$ is fixed, the power allocation problem can be transformed into a convex problem, as shown in Appendix E. Therefore, the alternating optimization starts with an initial power allocation to calculate the corresponding equalization matrix. Using this equalization matrix, it then applies the convex method to determine a new power allocation. With this updated power allocation, the equalization matrix is recalculated. This process repeats until the difference between successive power allocations is smaller than a set threshold.
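The alternating procedure can be sketched as the following skeleton. The callback `solve_power_given_V` is a hypothetical placeholder for the convex power-allocation step with $\mathbf{V}$ held fixed (Appendix E), which is not reproduced here; the MMSE equalizer follows the first branch of (19).

```python
import numpy as np

def mmse_equalizer(H, p, sigma):
    """MMSE equalization matrix V = P H^H (H P H^H + sigma I)^{-1}, as in eq. (19)."""
    P = np.diag(p)
    return P @ H.conj().T @ np.linalg.inv(H @ P @ H.conj().T + sigma * np.eye(H.shape[0]))

def alternating_optimization(H, p_init, sigma, solve_power_given_V, tol=1e-4, max_iter=50):
    """Alternate between the equalizer V and the power allocation p until p converges.

    solve_power_given_V : placeholder for the convex power-allocation step with V fixed.
    """
    p = np.asarray(p_init, float)
    for _ in range(max_iter):
        V = mmse_equalizer(H, p, sigma)
        p_new = np.asarray(solve_power_given_V(V), float)   # convex sub-step, V held fixed
        if np.max(np.abs(p_new - p)) < tol:                 # stop when allocations agree
            return p_new, V
        p = p_new
    return p, V
```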

Different from the single-antenna scenario, the channel gain between the BS and a user $n$ is a complex vector, whose real and imaginary parts are denoted as $\bm{h}^{r}_{n}=\{h^{r}_{1,n},\dots,h^{r}_{M_{r},n}\}$ and $\bm{h}^{c}_{n}=\{h^{c}_{1,n},\dots,h^{c}_{M_{r},n}\}$, respectively. To tackle multiple-antenna scenarios, ASOPA modifies its input from $\mathbf{X}=\{(w_{n},P_{n}^{max},g_{n})\}_{n\in\mathcal{N}}$ to $\mathbf{X}=\{(w_{n},P_{n}^{max},\bm{h}^{r}_{n},\bm{h}^{c}_{n})\}_{n\in\mathcal{N}}$. The rest of the ASOPA structure remains the same as that illustrated in Fig. 3.

5.2 Imperfect Channel

In this subsection, we assess the impact of estimation errors on the performance of ASOPA. For the perfect CSI scenario, the channel gain is expressed as $g_{n}=\overline{g}_{n}|\alpha_{n}|^{2}$, where $\overline{g}_{n}=A_{d}\left(\frac{3\cdot 10^{8}}{4\pi f_{c}b_{n}}\right)^{b_{e}}$ and $\alpha_{n}\sim\mathcal{CN}(0,1)$ account for the path loss power gain and the Rayleigh fading channel coefficient between the BS and the $n$-th user, respectively. Since the path loss coefficient is a large-scale fading factor and varies slowly, we assume that the path loss coefficient $\overline{g}_{n}$ between the BS and each user can be estimated perfectly. However, in dynamic and complex wireless environments, accurately acquiring time-varying Rayleigh fading channel gains is challenging. Following the approaches in [64, 65], the Rayleigh fading channel coefficient is modeled as

\alpha_{n}=\hat{\alpha}_{n}+\epsilon_{n}, (24)

where $\alpha_{n}$ is the actual Rayleigh fading channel coefficient between the BS and the $n$-th user, $\hat{\alpha}_{n}\sim\mathcal{CN}(0,1-\sigma_{\epsilon}^{2})$ denotes the estimated channel coefficient, and $\epsilon_{n}\sim\mathcal{CN}(0,\sigma_{\epsilon}^{2})$ is the estimation error. Note that the parameter $\sigma_{\epsilon}^{2}$ indicates the quality of channel estimation and remains constant, as in [66, 67]. We assume that $\hat{\alpha}_{n}$ and $\epsilon_{n}$ are uncorrelated.
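A small NumPy sketch for drawing channel realizations according to (24) is given below; the random seed and the value of $\sigma_{\epsilon}^{2}$ in the example are illustrative.

```python
import numpy as np

def sample_imperfect_csi(N, sigma_eps2, rng=None):
    """Draw estimated Rayleigh coefficients and estimation errors as in eq. (24)."""
    rng = rng or np.random.default_rng(0)
    def cn(var, size):   # circularly symmetric complex Gaussian CN(0, var)
        return np.sqrt(var / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))
    alpha_hat = cn(1 - sigma_eps2, N)     # estimated coefficient, CN(0, 1 - sigma_eps^2)
    eps = cn(sigma_eps2, N)               # estimation error,      CN(0, sigma_eps^2)
    return alpha_hat, alpha_hat + eps     # estimate and true coefficient alpha_n

alpha_hat, alpha = sample_imperfect_csi(N=5, sigma_eps2=0.1)
print(np.abs(alpha_hat) ** 2, np.abs(alpha) ** 2)
```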

If the perfect CSI is known, the maximum achievable data rate between the BS and the $n$-th user can be written as

c_{n}=W\log_{2}(1+\phi_{n}), (25)

where

\phi_{n}=\frac{p_{n}|\alpha_{n}|^{2}\overline{g}_{n}}{\sum\limits_{\xi(n^{\prime})>\xi(n),\forall n^{\prime}\in\mathcal{N}}p_{n^{\prime}}|\alpha_{n^{\prime}}|^{2}\overline{g}_{n^{\prime}}+N_{0}}. (26)

In (26), $\phi_{n}$ denotes the SINR of user $n$. In practice, the BS can only obtain the estimated fading channel coefficient $\hat{\alpha}_{n}$. The scheduled data rate with imperfect CSI can be expressed as

r_{n}=W\log_{2}(1+\hat{\phi}_{n}), (27)

where

\hat{\phi}_{n}=\frac{p_{n}|\hat{\alpha}_{n}|^{2}\overline{g}_{n}}{\sum\limits_{\xi(n^{\prime})>\xi(n),\forall n^{\prime}\in\mathcal{N}}p_{n^{\prime}}|\hat{\alpha}_{n^{\prime}}|^{2}\overline{g}_{n^{\prime}}+N_{0}}. (28)

However, the scheduled data rate with imperfect CSI may exceed the maximum achievable data rate, i.e., $r_{n}>c_{n}$. To measure the performance in this case, we introduce the outage probability as a metric [68, 69]. Therefore, the weighted proportional fairness function with outage probability can be expressed as $\sum_{n=1}^{N}w_{n}\ln r_{n}\Pr[r_{n}\leq c_{n}|\hat{\alpha}_{n}]$, where $\Pr[r_{n}\leq c_{n}|\hat{\alpha}_{n}]$ denotes the probability that the scheduled data rate $r_{n}$ is less than or equal to the maximum achievable data rate $c_{n}$ under the estimated channel coefficient $\hat{\alpha}_{n}$. The optimization problem can be reformulated as

\max\limits_{\bm{\pi},\mathbf{p}}\ \sum\limits_{n=1}^{N}w_{n}\ln r_{n}\Pr[r_{n}\leq c_{n}|\hat{\alpha}_{n}] (29a)
s.t.\ \Pr[c_{n}<r_{n}|\hat{\alpha}_{n}]\leq\epsilon_{out},\forall n\in\mathcal{N}, (29b)
0<p_{n}\leq P_{n}^{max},\forall n\in\mathcal{N}, (29c)
\bm{\pi}\in\Pi, (29d)

where (29b) is introduced to satisfy the channel outage probability requirement $\epsilon_{out}$ for all users in the imperfect CSI scenario. Due to the probabilistic constraint (29b), problem (29) is non-convex and cannot easily be solved optimally in polynomial time [68]. To tackle this problem efficiently, we transform the probabilistic mixed problem into a non-probabilistic problem as

\max\limits_{\bm{\pi},\mathbf{p}}\ \sum\limits_{n=1}^{N}w_{n}\ln(1-\epsilon_{out})\tilde{r}_{n} (30a)
s.t.\ 0<p_{n}\leq P_{n}^{max},\forall n\in\mathcal{N}, (30b)
\bm{\pi}\in\Pi, (30c)

where $\tilde{r}_{n}=W\log_{2}(1+\tilde{\phi}_{n})$, and the transformed SINR $\tilde{\phi}_{n}$ can be expressed as

\tilde{\phi}_{n}=\frac{\epsilon_{out}F_{|g_{n}|^{2}}^{-1}(\epsilon_{out}/2)\,p_{n}}{\epsilon_{out}\sigma_{\epsilon}^{2}+\sum\limits_{\xi(n^{\prime})>\xi(n),\forall n^{\prime}\in\mathcal{N}}2(|\hat{g}_{n^{\prime}}|^{2}+\sigma_{\epsilon}^{2})p_{n^{\prime}}}, (31)

where $F_{|g_{n}|^{2}}^{-1}(\epsilon_{out}/2)$ denotes the inverse cumulative distribution function of a noncentral chi-square random variable with 2 degrees of freedom and non-centrality parameter $2|\hat{g}_{n}|^{2}/\sigma_{\epsilon}^{2}$. The details of the probabilistic mixed problem transformation are shown in Appendix D.
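For reference, the transformed SINR in (31) can be evaluated with SciPy's noncentral chi-square inverse CDF as sketched below; the numerical values in the example are illustrative.

```python
import numpy as np
from scipy.stats import ncx2

def transformed_sinr(pi, p, g_hat_sq, sigma_eps2, eps_out):
    """Transformed SINR of eq. (31) for each user under decoding order pi.

    g_hat_sq : squared magnitudes |g_hat_n|^2 of the estimated channel gains
    """
    phi = np.zeros(len(pi))
    for i, n in enumerate(pi):
        later = list(pi[i + 1:])
        # inverse CDF of a noncentral chi-square: 2 DOF, noncentrality 2|g_hat_n|^2 / sigma_eps^2
        F_inv = ncx2.ppf(eps_out / 2, 2, 2 * g_hat_sq[n] / sigma_eps2)
        numerator = eps_out * F_inv * p[n]
        denominator = eps_out * sigma_eps2 + sum(2 * (g_hat_sq[m] + sigma_eps2) * p[m] for m in later)
        phi[n] = numerator / denominator
    return phi

print(transformed_sinr(pi=[1, 0, 2], p=np.array([0.1, 0.2, 0.1]),
                       g_hat_sq=np.array([0.8, 1.2, 0.5]), sigma_eps2=0.1, eps_out=0.05))
```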

Notice that once the SIC ordering is determined, the power allocation problem in (30) can be transformed into a convex problem, as in Appendix B, and solved by the CVX solver. Thus, ASOPA can be applied to this scenario by simply modifying its input from {\bf{X}}=\{(w_n,P_n^{max},g_n)\}_{n\in\mathcal{N}} to {\bf{X}}=\{(w_n,P_n^{max},|\hat{\alpha}_n|^2\overline{g}_n)\}_{n\in\mathcal{N}}.
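As a concrete illustration, the sketch below assembles the modified per-user input features of ASOPA under imperfect CSI, replacing the exact channel gain g_n with the estimated gain |\hat{\alpha}_n|^2\overline{g}_n. The function name and feature layout are illustrative assumptions and need not match the released implementation.

```python
import numpy as np

def build_asopa_input(w, p_max, alpha_hat, g_bar):
    """Stack per-user features (w_n, P_n^max, |alpha_hat_n|^2 * g_bar_n).

    A minimal sketch; the exact feature layout inside ASOPA may differ.
    """
    effective_gain = np.abs(alpha_hat) ** 2 * g_bar       # estimated channel gain
    return np.stack([w, p_max, effective_gain], axis=-1)  # shape (N, 3)

# Example with N = 4 users (illustrative values only).
N = 4
w = np.array([1.0, 2.0, 4.0, 8.0])
p_max = np.ones(N)
alpha_hat = (np.random.randn(N) + 1j * np.random.randn(N)) / np.sqrt(2)
g_bar = 1e-8 * np.random.rand(N)
X = build_asopa_input(w, p_max, alpha_hat, g_bar)
print(X.shape)  # (4, 3)
```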

5.3 Multiple BS

In this subsection, we assess the performance of ASOPA in NOMA networks with multiple BSs, represented by the set 𝐁\mathbf{B}, each containing NbN_{b} users where b𝐁b\in\mathbf{B}. In scenarios involving multiple BSs, each user experiences inter-cell interference from users linked to other BSs. To quantify this interference, the channel gain from a user nn in BS bb to another BS bb^{\prime} is defined as hn,b(b)h^{({b})}_{n,b^{\prime}}. Specifically, the superscript (b)(b) indicates that user nn is associated with BS bb.

To accommodate the multiple-BS scenario, it is straightforward to adjust the input of ASOPA to 𝐗={(wn(b),Pn,max(b),gn(b),𝒉n(b))}n𝒩b,b𝐁{\bf{X}}=\{(w_{n}^{(b)},P_{n,max}^{(b)},g_{n}^{(b)},\bm{h}^{(b)}_{n})\}_{n\in\mathcal{N}_{b},b\in\mathbf{B}}, where 𝒉n(b)={hn,b(b)}b𝐁{b}\bm{h}^{(b)}_{n}=\{h^{(b)}_{n,b^{\prime}}\}_{b^{\prime}\in\mathbf{B}\setminus\{b\}}. The resource allocation problem is still convex and can be solved using the CVX solver. The detailed setup of the multiple-BS scenario is in Appendix E.

The addition of inter-cell interference, however, complicates the SIC ordering problem. In ASOPA, the next decoded user is iteratively chosen as the one with the highest probability from Equation (12), under the assumption that all users are served by the same BS. In scenarios with multiple BSs, the SIC decoding order is specific to each BS, and comparing the probabilities of users associated with different BSs is not meaningful. To overcome this issue, we enhance the decoding process by introducing an additional masking mechanism in the decoder, which generates appropriate SIC orderings for the users of all BSs. Specifically, the decoder generates the SIC ordering for the users associated with the current BS while masking the users of the other BSs. This approach determines the decoding order in multi-BS NOMA networks without modifying Equation (12); a code sketch of the masking is given after the case study below.

TABLE III: Case study - The mask mechanism in ASOPA in a dual-BS NOMA network
Input Output
t(1)\bm{\ell}^{(1)}_{t} t(2)\bm{\ell}^{(2)}_{t} Decoded user πt(b)\pi^{(b)}_{t}
user 1 user 2 user 3 user 1 user 2 user 3
t=1t=1 [𝐞¯,𝐞0][\overline{\mathbf{e}},\mathbf{e}_{0}] 0.3997 0.2890 0.3114 0(masked) 0 (masked) 0 (masked) π1(1)=1\pi^{(1)}_{1}=1
t=2t=2 [𝐞¯,𝐞1(1)][\overline{\mathbf{e}},\mathbf{e}^{(1)}_{1}] 0 (masked) 0.4839 0.5161 0 (masked) 0 (masked) 0 (masked) π2(1)=3\pi^{(1)}_{2}=3
t=3t=3 [𝐞¯,𝐞3(1)][\overline{\mathbf{e}},\mathbf{e}^{(1)}_{3}] 0 (masked) 1 0 (masked) 0 (masked) 0 (masked) 0 (masked) π3(1)=2\pi^{(1)}_{3}=2
t=4t=4 [𝐞¯,𝐞2(1)][\overline{\mathbf{e}},\mathbf{e}^{(1)}_{2}] 0 (masked) 0 (masked) 0 (masked) 0.0731 0.4698 0.4571 π1(2)=2\pi^{(2)}_{1}=2
t=5t=5 [𝐞¯,𝐞2(2)][\overline{\mathbf{e}},\mathbf{e}^{(2)}_{2}] 0 (masked) 0 (masked) 0 (masked) 0.0407 0 (masked) 0.9593 π2(2)=3\pi^{(2)}_{2}=3
t=6t=6 [𝐞¯,𝐞3(2)][\overline{\mathbf{e}},\mathbf{e}^{(2)}_{3}] 0 (masked) 0 (masked) 0 (masked) 1 0 (masked) 0 (masked) π3(2)=1\pi^{(2)}_{3}=1
Refer to caption
Figure 6: The system model of the dual-BS NOMA networks.

To demonstrate the effectiveness of the enhanced mask mechanism in ASOPA, let’s consider a case study outlined in Table III. As depicted in Fig. 6, we consider a dual-BS NOMA network, where each BS contains three users, as Nb=3N_{b}=3 for all b𝐁={1,2}b\in\mathbf{B}=\{1,2\}. Consequently, it takes six iterations for ASOPA to establish the SIC ordering for all users. Utilizing the mask mechanism, ASOPA initially decodes the users in BS 1 during iterations t=1t=1, 2, and 3, and then shifts to decoding users in BS 2 for iterations t=4t=4, 5, and 6. In the first iteration (t=1t=1), the algorithm calculates the probabilities 1(1)\bm{\ell}^{(1)}_{1} for users in BS 1 from Equation (12) and simultaneously masks the probabilities of users in BS 2 by setting 1(2)=0\bm{\ell}^{(2)}_{1}=0. Given that user 1 in BS 1 has the highest probability of 0.3997, it is selected as the first decoded user, π1(1)=1\pi^{(1)}_{1}=1. In the next two iterations, users in BS 2 remain masked, indicated by 2(2)=0\bm{\ell}^{(2)}_{2}=0 and 3(2)=0\bm{\ell}^{(2)}_{3}=0. Conversely, when decoding the SIC ordering for users in BS 2 during iterations t=4t=4, 5, and 6, the users in BS 1 are masked with t(1)=0\bm{\ell}^{(1)}_{t}=0. This mechanism enhances the efficiency and accuracy of ASOPA in multi-BS NOMA networks by systematically focusing on one BS at a time, thereby streamlining the decoding process.
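The per-BS masking illustrated in Table III can be sketched as follows. This is a simplified stand-in for one decoder step, assuming that the raw per-user scores from Equation (12) are available as a vector of logits; all variable names are illustrative.

```python
import numpy as np

def masked_probabilities(logits, bs_id, active_bs, decoded):
    """Zero out users outside the BS currently being ordered (and users that
    are already decoded) before normalizing, mirroring the mask in Table III.

    logits    : raw per-user scores from the decoder, shape (N_total,)
    bs_id     : BS index of each user, shape (N_total,)
    active_bs : the BS whose SIC ordering is currently being generated
    decoded   : boolean array, True for users already placed in the ordering
    """
    mask = (bs_id == active_bs) & ~decoded
    scores = np.where(mask, logits, -np.inf)   # masked users get zero probability
    exp = np.exp(scores - scores[mask].max())  # numerically stable softmax over the mask
    return exp / exp.sum()

# Dual-BS toy example: users 0-2 belong to BS 1, users 3-5 to BS 2.
logits = np.random.randn(6)
bs_id = np.array([1, 1, 1, 2, 2, 2])
decoded = np.zeros(6, dtype=bool)
print(masked_probabilities(logits, bs_id, active_bs=1, decoded=decoded))
```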

6 Numerical Results

In this section, we evaluate the proposed ASOPA algorithm through simulations in uplink NOMA networks. In these simulations, users are uniformly deployed within a 100-meter radius circle, with a BS at the center. The average channel gain, g¯n\overline{g}_{n}, adheres to the free-space path loss model, following g¯n=Ad(31084πfcbn)be\overline{g}_{n}=A_{d}\left(\frac{3\cdot 10^{8}}{4\pi f_{c}b_{n}}\right)^{b_{e}} [43], where Ad=4.11A_{d}=4.11 represents the antenna gain, fc=915f_{c}=915 MHz is the carrier frequency, bnb_{n} is the distance between each user and the BS, and be=2.8b_{e}=2.8 is the path loss exponent. Each user nn’s wireless channel gain, gng_{n}, is modeled as a Rayleigh fading channel, expressed as gn=g¯n|αn|2g_{n}=\overline{g}_{n}|\alpha_{n}|^{2}, with |αn|2|\alpha_{n}|^{2} being an independent random channel fading factor following an exponential distribution with unit mean. The system parameters include a bandwidth BB of 1 MHz and a noise power spectral density of 174-174 dBm/Hz. Each user’s maximum power is capped at Pnmax=1P_{n}^{max}=1 Watt, and the user weight wnw_{n} is chosen from the set {1,2,4,8,16,32}\{1,2,4,8,16,32\}.
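For reference, the channel gains used in these simulations can be generated as in the sketch below, which follows the free-space path loss model and Rayleigh fading described above. Sampling user positions uniformly over the disc area is our reading of the deployment, and the helper name is illustrative.

```python
import numpy as np

def channel_gains(n_users, radius=100.0, ad=4.11, fc=915e6, be=2.8, rng=None):
    """Return (g_bar, g): average path-loss gains and Rayleigh-faded gains, following
    g_bar_n = Ad * (3e8 / (4*pi*fc*b_n))**be and g_n = g_bar_n * |alpha_n|^2."""
    rng = np.random.default_rng() if rng is None else rng
    b = radius * np.sqrt(rng.uniform(size=n_users))    # uniform over the disc area
    g_bar = ad * (3e8 / (4 * np.pi * fc * b)) ** be
    fading = rng.exponential(scale=1.0, size=n_users)  # |alpha_n|^2, unit mean
    return g_bar, g_bar * fading

g_bar, g = channel_gains(10)
```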

For the neural network training, the samples X arrive in each time slot and are stored in a replay memory of size 1280. The number of users N in each sample varies uniformly between 5 and 10. The batch size per policy update is set to |\bm{\tau}|=64, and each training epoch consists of M=20 policy updates. After each training epoch, the baseline network updates its parameters \theta^{\prime}. The learning rate of the Adam optimizer is set to 1e-4, and the embedding dimension for users in the actor network is d_e=128. The simulations are carried out on a desktop with an Intel Core i7-10700 2.9 GHz CPU, 32 GB memory, and an NVIDIA GeForce RTX 3060 Ti GPU. The source code for ASOPA is accessible at https://github.com/Jil-Menzerna/ASOPA.

6.1 Convergence Performance

Refer to caption
Figure 7: Convergence performance of ASOPA under different algorithm parameters when N=10N=10.

In Fig. 7, we evaluate the effect of different parameters on the convergence performance of ASOPA, including different learning rates, batch sizes, and embedding dimensions.

Fig. 7(a) shows the effect of different learning rates. A large learning rate (1e-2 or 1e-3) causes the algorithm to converge to a local optimum, while a small learning rate (1e-5) results in slow convergence. Hence, the learning rate is set to 1e-4.

Fig. 7(b) shows the effect of different batch sizes. A small batch size (8 or 16) leads to high variance in the network utility. A larger batch size consumes more memory, and an overly large batch size may reduce the randomness of gradient descent and cause the training to settle at a local minimum. Hence, the batch size is set to 64.

Fig. 7(c) shows the effect of different embedding dimensions ded_{e}. A small embedding dimension (16) cannot adequately characterize features and thus degrades the performance and convergence speed. A large embedding dimension (256) may overfit the training set, resulting in unstable performance. Hence, the embedding dimension is set as de=128d_{e}=128.

Overall, the simulation results in Fig. 7 show that the proposed ASOPA can converge under the set parameters.

6.2 Network Utility Performance

To evaluate the SIC ordering generated by ASOPA, we compare it with five baseline algorithms:

  1. Exhaustive search [24]: This scheme calculates the network utility for all N! SIC orderings and returns the optimal one.

  2. Tabu search [34]: This scheme starts from an initial SIC ordering and searches by swapping users' positions. In each search iteration, it evaluates all pairwise swaps of the current ordering and selects the best resulting ordering as the starting point of the next iteration. To the best of our knowledge, the Tabu search algorithm in [34] is the state-of-the-art algorithm for dynamic SIC ordering, albeit at the cost of high computational complexity.

  3. Meta-scheduling [21]: This scheme builds the SIC ordering by inserting users one at a time. For each user, it tries every possible insertion position and greedily keeps the one with the greatest utility gain (a sketch of this greedy insertion is given below).

  4. Weight descending [23]: The static SIC ordering follows the descending order of users' weights.

  5. Channel descending [25]: The static SIC ordering follows the descending order of users' channel gains.

After those baseline algorithms determine the SIC ordering, the optimal transmit power is determined by the power allocation method proposed in Section 4.3.
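To make the greedy insertion of the Meta-scheduling baseline concrete, a minimal sketch is given below. It assumes a callable network_utility(order) that solves the power allocation of Section 4.3 for the (possibly partial) SIC ordering order and returns the resulting utility; this callable is an assumption of the sketch, not part of the original algorithm description.

```python
def meta_scheduling(users, network_utility):
    """Greedy insertion baseline: insert users one at a time at the position
    that maximizes the utility of the partial ordering built so far."""
    order = []
    for u in users:
        candidates = [order[:i] + [u] + order[i:] for i in range(len(order) + 1)]
        order = max(candidates, key=network_utility)
    return order
```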

Refer to caption
Figure 8: The network utility under different numbers of users.

Fig. 8 presents the network utility achieved by different algorithms for varying numbers of users N. Exhaustive search achieves the optimal performance for N=5 and N=8, but is omitted for N=10 due to the unacceptable running time of enumerating all 10! possible SIC orderings. For N=5 and N=8, with sufficient search iterations, Tabu search achieves 99.61% and 99.19% of the optimal performance obtained by exhaustive search, while ASOPA achieves 97.54% and 97.60%, which is close to the performance of Tabu search. For N=5, 8, and 10, ASOPA achieves over 10% higher network utility than the other three baseline algorithms besides Tabu search.

Refer to caption
(a) Convergence performance of ASOPA.
Refer to caption
(b) The network utility of different algorithms.
Figure 9: The performance of ASOPA in large-scale scenarios when NN is between 10 and 20.

Fig. 9 further evaluates ASOPA in large-scale scenarios where the number of users N varies from 10 to 20. In Fig. 9(a), ASOPA converges within around 50 epochs regardless of the specific value of N. Fig. 9(b) shows that ASOPA achieves, on average, 95% of the network utility of Tabu search and outperforms the other three baseline algorithms in these large-scale scenarios, consistent with the observations in Fig. 8. Interestingly, for N>16, the channel descending algorithm surpasses Meta-scheduling in network utility. The figure also shows that the network utility declines once the number of users in a NOMA system exceeds a certain threshold, particularly when N>16. This decline can be attributed to the concave logarithmic throughput function \ln R_n in the network utility (3): as the number of users increases, the sum rate \sum_{n=1}^{N}R_n tends to saturate, and the sum of logarithms \sum_{n=1}^{N}\ln R_n decreases due to Jensen's inequality. Overall, the results in Fig. 9 confirm ASOPA's effectiveness in large-scale scenarios and its advantage over the baseline algorithms other than Tabu search.

Refer to caption
(a) Boxplot of the normalized network utility for ASOPA and other baseline algorithms when N=5N=5. The central line indicates the median, while the bottom and top edges of the box indicate the 25th25th and 75th75th percentiles, respectively.
Refer to caption
(b) Stacked plot of the top ten best SIC ordering hits when N=5N=5.
Figure 10: The distribution of the network utility achieved by different algorithms.

In Fig. 10, we further compare ASOPA and the baseline algorithms over 1000 independent samples when N=5. Fig. 10(a) displays the mean, median, confidence interval, and outliers of the normalized network utility for each algorithm, where the normalized network utility is the ratio of the network utility achieved by an algorithm to the optimal network utility obtained by exhaustive search. The medians of Tabu search and ASOPA are close to 1, and their confidence intervals lie above 99% and 97% of the optimum, respectively. Although a few outliers lower the mean of ASOPA, it still outperforms all baseline algorithms besides Tabu search. Fig. 10(b) presents the hit rate of the top 10 maximum network utilities for ASOPA and the baseline algorithms, where the hit rate is defined as the percentage of samples for which an algorithm generates a SIC ordering that appears among the top 10 maximum network utilities obtained by exhaustive search. ASOPA achieves hit rates of over 55% and 70% for the top 5 and top 10 maximum network utilities, respectively. The results in Fig. 10 further confirm that ASOPA achieves near-optimal network utility.

Refer to caption
Figure 11: The network utility under different maximum distances to the BS when N=10N=10.
Refer to caption
Figure 12: The network utility under different noise power spectral densities when N=10N=10.

Fig. 11 shows the network utility under different maximum distances of users to the BS. The network utility achieved by all algorithms decreases slightly as the maximum distance increases. Within the distance range [100, 300] m, ASOPA achieves performance close to that of the Tabu search algorithm and outperforms the other three baseline algorithms by an average of 10%.

Fig. 12 shows how the network utility varies with the noise power spectral density. As the noise power spectral density increases, the network utility of all algorithms decreases, and the gap between ASOPA and the baseline algorithms narrows. At a noise power spectral density of -144 dBm/Hz, the gap between ASOPA and the channel descending algorithm is the smallest, at 4.27%. This result indicates that when the users' channel quality is poor, the channel state dominates the choice of SIC ordering.

6.3 Execution Latency

In order to meet the real-time requirement of NOMA networks, the execution time of the SIC ordering and power allocation algorithm needs to be much smaller than the slot duration, i.e., two seconds [43]. To evaluate the efficiency of ASOPA and the baseline algorithms, we measure the average execution time under different numbers of users; the results are shown in Fig. 13 and Table IV.

Refer to caption
Figure 13: The average execution time under different numbers of users.
TABLE IV: The average execution latency of different algorithms (ms)
Algorithms N=5 N=8 N=10 N=14 N=20
Exhaustive search 823 741933 / / /
Tabu search 252 1717 5643 45122 492677
Meta-scheduling 84 349 771 3025 14737
Weight des. 7 15 25 60 165
Channel des. 7 14 23 52 132
ASOPA 8 16 24 53 152

The execution time of ASOPA is close to that of the weight descending and channel descending algorithms, i.e., 24 ms, 25 ms, and 23 ms, respectively, for a ten-user NOMA network. The execution delay scales gracefully with the network size N and is acceptable for field deployment; ASOPA takes only 152 ms even for a twenty-user NOMA network. In contrast, the execution latency of Meta-scheduling and Tabu search increases significantly with N, reaching 771 ms and 5643 ms for a ten-user NOMA network, respectively. The execution time of exhaustive search grows exponentially with N, taking 823 ms and 6630 ms for five-user and six-user NOMA networks, respectively. According to Fig. 13 and Table IV, Tabu search fails to meet real-time execution when N>7, while Meta-scheduling fails when N>10. In contrast, ASOPA maintains the same latency as the static ordering algorithms for all numbers of users; in particular, its latency is three orders of magnitude lower than Tabu search and two orders of magnitude lower than Meta-scheduling when N=20.

ASOPA uses the actor network to generate the SIC ordering, whose time consumption is negligible. The primary time overhead of all algorithms comes from solving the power allocation problem with the interior-point method. Exhaustive search solves the power allocation problem N! times, Meta-scheduling solves it N(N+1)/2 times, and Tabu search solves it IN(N+1)/2 times, where I denotes the number of search iterations. ASOPA, the weight descending algorithm, and the channel descending algorithm solve it only once. Therefore, the proposed ASOPA executes as efficiently as the static SIC ordering algorithms while performing nearly as well as the exhaustive search algorithm.
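These counts translate into the following back-of-the-envelope comparison of solver invocations; the Tabu iteration count I is scenario-dependent, so the value used here is only a placeholder.

```python
from math import factorial

def solver_calls(n, tabu_iters=10):
    """Number of power-allocation problems each scheme solves for N = n users."""
    return {
        "exhaustive": factorial(n),
        "tabu": tabu_iters * n * (n + 1) // 2,
        "meta_scheduling": n * (n + 1) // 2,
        "static_or_asopa": 1,
    }

print(solver_calls(10))  # e.g., exhaustive: 3628800, meta_scheduling: 55
```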

Regarding the training latency, ASOPA’s policy update is conducted infrequently and in parallel with the inference process, as detailed in Algorithm 1. Extensive evaluations have shown that the duration of a single policy update is approximately one second when the number of users NN is 10 and training batch size is 64. On average, the duration of the policy update process is less than 20 ms for each sample. Therefore, the policy update process of ASOPA can feasibly be executed online for NOMA networks, ensuring that the system remains up-to-date and responsive to changing network conditions.

6.4 Extension Scenario

Refer to caption
Figure 14: The network utility under different numbers of users for multiple-antenna NOMA networks.

Fig. 14 presents the network utility achieved by different algorithms for varying numbers of users N in the multiple-antenna scenario. Specifically, we consider a NOMA network with two antennas at the BS, where all algorithms use minimum-mean-square-error (MMSE) [62] linear equalization to detect symbols. For N=5, the optimal network utility is calculated using exhaustive search. ASOPA achieves 90.94% of the optimal network utility, with only a 5% performance degradation compared to the Tabu search algorithm. Furthermore, ASOPA consistently outperforms the other three baseline algorithms across all settings, which agrees with the observations in Fig. 8. Due to the more complex user state in multiple-antenna scenarios, the performance of ASOPA can be further improved in future work.

Refer to caption
Figure 15: The network utility under different estimated errors when N=5N=5.

Fig. 15 shows how the network utility varies with the variance of the channel estimation error under imperfect CSI with five users. ASOPA consistently achieves over 98% of the optimal performance obtained through exhaustive search and 99% of the performance of Tabu search. Specifically, when \sigma_{\epsilon_n}^2=0.025, ASOPA's network utility is 6.75%, 12.49%, and 13.44% higher than that of Meta-scheduling, channel descending, and weight descending, respectively. These results demonstrate ASOPA's robustness and effectiveness under imperfect CSI, highlighting its ability to adapt to varying degrees of channel estimation error.

Refer to caption
Figure 16: The network utility under different numbers of users in a dual-BS NOMA network.

Fig. 16 presents the network utility achieved by various algorithms for different user pairs N_1-N_2 in dual-BS NOMA networks. It is evident that ASOPA performs comparably to the exhaustive search method and surpasses the other benchmark algorithms. Notably, Tabu search [34] and Meta-scheduling [21] are not applicable in this scenario and are thus not included in the comparison. In particular, when each BS has three or four users, ASOPA achieves 99.45% and 99.80% of the optimal performance determined by exhaustive search, respectively. The network utility peaks when each BS serves six users. This behavior can be attributed to the inter-cell interference, which grows with the number of users, and the logarithmic throughput function in the network utility: as the number of users increases, the sum throughput saturates, and some users' weighted logarithmic throughput even becomes negative (e.g., -10), which reduces the network utility. As the number of users per BS rises to ten, ASOPA's performance advantage becomes more pronounced, showing a 27.69% higher network utility compared to the channel descending algorithm. These results highlight ASOPA's effectiveness in adapting to varying user densities in multi-BS NOMA networks.

7 Conclusion

This paper focuses on maximizing the sum-weighted logarithmic throughput in uplink NOMA-based wireless networks by jointly optimizing the SIC ordering and the users' transmit powers. To tackle this problem, we propose the ASOPA framework, which combines DRL with optimization theory. Key to ASOPA is an attention-based actor network, trained via reinforcement learning, which derives a near-optimal SIC ordering; optimization techniques then allocate the optimal transmit power to the users. Simulation results show that ASOPA achieves near-optimal performance with low execution latency. A particularly noteworthy aspect of ASOPA is its extensibility: the framework can address a range of optimization problems involving dynamic SIC orderings in the NOMA context.

Looking ahead, we aim to extend ASOPA to more complex scenarios, including developing a distributed framework for NOMA networks with multiple base stations, tackling the challenge of imperfect SIC decoding, and integrating QoS constraints into our framework.

Appendix A Details of the Multi-head Attention Mechanism

The input to the encoder is denoted as 𝐗\mathbf{X}. It is first transformed into a ded_{e}-dimensional space by a fully connected feed-forward (FF1\text{FF}_{1}) layer. The output of the FF1\text{FF}_{1} layer is then fed into a multi-head attention (MHA) layer and a feed-forward (FF2\text{FF}_{2}) layer in sequence. Therefore, the encoding process can be expressed as

𝐞^n=BN(FF1(𝐱n)+MHA(FF1(𝐗1),,FF1(𝐗n)))𝐞n=BN(𝐞^n+FF2(𝐞^n)).\displaystyle\begin{split}&\hat{\mathbf{e}}_{n}={\text{BN}}\bigg{(}{\text{FF}}_{1}\big{(}\mathbf{x}_{n}\big{)}+{\text{MHA}}\Big{(}{\text{FF}}_{1}\big{(}\mathbf{X}_{1}\big{)},...,{\text{FF}}_{1}\big{(}\mathbf{X}_{n}\big{)}\Big{)}\bigg{)}\\ &\mathbf{e}_{n}=\text{BN}\bigg{(}\hat{\mathbf{e}}_{n}+\text{FF}_{2}\big{(}\hat{\mathbf{e}}_{n}\big{)}\bigg{)}.\end{split} (32)

Both the MHA layer and the FF2 layer employ residual connections (RC) and are followed by batch normalization (BN). The details of each layer are given as follows.

A.1 Attention Mechanism

We utilize the attention mechanism proposed in [55]. The attention mechanism computes a weighted sum of values, where the weight is determined by a compatibility function based on a query and a set of keys. The query, keys, and values are all embeddings. Specifically, we compute the query 𝐪n\mathbf{q}_{n}, the key 𝐤n\mathbf{k}_{n}, and the value 𝐯n\mathbf{v}_{n} for each user nn by multiplying their respective embedding 𝐞n\mathbf{e}_{n} with parameter matrices 𝐖Q\mathbf{W}^{Q}, 𝐖K\mathbf{W}^{K}, and 𝐖V\mathbf{W}^{V}. These parameter matrices have sizes dk×ded_{k}\times d_{e}, dk×ded_{k}\times d_{e}, and dv×ded_{v}\times d_{e}, respectively, as

𝐪n=𝐖Q𝐞n,𝐤n=𝐖K𝐞n,𝐯n=𝐖V𝐞n.\mathbf{q}_{n}=\mathbf{W}^{Q}\mathbf{e}_{n},\mathbf{k}_{n}=\mathbf{W}^{K}\mathbf{e}_{n},\mathbf{v}_{n}=\mathbf{W}^{V}\mathbf{e}_{n}. (33)

Then we compute the compatibility un,ju_{n,j} of user nn’s query 𝐪n\mathbf{q}_{n} with user jj’s key 𝐤j\mathbf{k}_{j}:

un,j=𝐪nT𝐤jdk.u_{n,j}=\frac{{\mathbf{q}_{n}}^{T}\mathbf{k}_{j}}{\sqrt{d_{k}}}. (34)

From un,ju_{n,j}, we can compute the attention weight an,ja_{n,j} by a softmax function:

an,j=exp(un,j)j=1Nexp(un,j).a_{n,j}=\frac{\exp\left(u_{n,j}\right)}{\sum_{j=1}^{N}\exp\left(u_{n,j}\right)}. (35)

Finally, we compute the weighted sum of the values to obtain the final message \mathbf{e}_{n}^{\prime}:

𝐞n=j=1Nan,j𝐯j.\mathbf{e}_{n}^{\prime}=\sum_{j=1}^{N}a_{n,j}\mathbf{v}_{j}. (36)

A.2 Multi-head Attention

Multi-head attention uses M groups of different parameters \mathbf{W}_m^Q, \mathbf{W}_m^K, and \mathbf{W}_m^V. We set M=8 and d_k=d_v=d_e/M=16. Each head m produces a message as in (36), denoted as \mathbf{e}_{n,m}^{\prime}, \forall m\in\{1,\dots,M\}. The per-head messages are then projected by the d_e\times d_v matrices \mathbf{W}_m^A and summed to form the final message:

MHAn(𝐞1,,𝐞N)=m=1M𝐖mA𝐞n,m.\text{MHA}_{n}\left(\mathbf{e}_{1},\dots,\mathbf{e}_{N}\right)=\sum_{m=1}^{M}\mathbf{W}_{m}^{A}\mathbf{e}_{n,m}^{\prime}. (37)

A.3 Feed Forward Layer

There are two feed-forward layers in the encoder. The first FF1\text{FF}_{1} is just a fully connected layer with learnable parameters 𝐖1\mathbf{W}_{1} and 𝐛1\mathbf{b}_{1}:

FF1(𝐱n)=𝐖1𝐱n+𝐛1.\text{FF}_{1}\left(\mathbf{x}_{n}\right)=\mathbf{W}_{1}\mathbf{x}_{n}+\mathbf{b}_{1}. (38)

The second feed-forward layer FF2 consists of two fully connected layers with a ReLU activation after the first:

FF2(𝐞^n)=𝐖2,2Relu(𝐖2,1𝐞^n+𝐛2,1)+𝐛2,2,\text{FF}_{2}(\hat{\mathbf{e}}_{n})=\mathbf{W}_{2,2}\text{Relu}\left(\mathbf{W}_{2,1}\hat{\mathbf{e}}_{n}+\mathbf{b}_{2,1}\right)+\mathbf{b}_{2,2}, (39)

where \hat{\mathbf{e}}_n is the input to FF2, \mathbf{W}_{2,1} and \mathbf{b}_{2,1} are the parameter matrix and bias of the first fully connected layer, respectively, and \mathbf{W}_{2,2} and \mathbf{b}_{2,2} are those of the second layer.

A.4 Batch Normalization

We use batch normalization shown in [53]:

BN(𝐞n)=𝐰bnBN¯(𝐞n)+𝐛bn,\text{BN}\left(\mathbf{e}_{n}\right)=\mathbf{w}^{\text{bn}}\odot\overline{\text{BN}}\left(\mathbf{e}_{n}\right)+\mathbf{b}^{\text{bn}}, (40)

where 𝐰bn\mathbf{w}^{\text{bn}} and 𝐛bn\mathbf{b}^{\text{bn}} are learnable ded_{e}-dimensional affine parameters, \odot denotes the element-wise product, and BN¯\overline{\text{BN}} refers to batch normalization without affine transformation.
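Putting (32) and (37)-(40) together, one encoder layer can be sketched in PyTorch as follows. The input dimension of 3 (one feature triple per user), the hidden width of 512 in FF2, and the use of torch.nn.MultiheadAttention in place of the hand-written heads are implementation assumptions of this sketch.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer as in (32): FF1 embedding, multi-head attention, and
    FF2, each followed by a residual connection and batch normalization."""

    def __init__(self, in_dim=3, d_e=128, n_heads=8):
        super().__init__()
        self.ff1 = nn.Linear(in_dim, d_e)                          # (38)
        self.mha = nn.MultiheadAttention(d_e, n_heads, batch_first=True)
        self.bn1 = nn.BatchNorm1d(d_e)                             # (40)
        self.ff2 = nn.Sequential(nn.Linear(d_e, 512), nn.ReLU(),
                                 nn.Linear(512, d_e))              # (39)
        self.bn2 = nn.BatchNorm1d(d_e)

    def forward(self, x):                       # x: (batch, N, in_dim)
        h = self.ff1(x)
        attn, _ = self.mha(h, h, h)             # messages from all users, (37)
        e_hat = self.bn1((h + attn).transpose(1, 2)).transpose(1, 2)
        e = self.bn2((e_hat + self.ff2(e_hat)).transpose(1, 2)).transpose(1, 2)
        return e                                # user embeddings e_n

print(EncoderLayer()(torch.randn(4, 10, 3)).shape)  # torch.Size([4, 10, 128])
```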

Appendix B Convex Transformation and Proof for P1

For user’s transmit power pn>0,n𝒩p_{n}>0,\forall n\in\mathcal{N}, let pn=eyn,n𝒩p_{n}=e^{y_{n}},\forall n\in\mathcal{N}. We introduce an auxiliary variable νn\nu_{n} for user nn and add a constraint to guarantee that the weighted logarithmic throughput of user nn is not less than νn{\nu_{n}}. Then the power allocation sub-problem P1 can be transformed into the following convex problem

P2:\displaystyle\textbf{P2}: max𝝂,𝒚n=1Nνn\displaystyle\max\limits_{\bm{\nu,y}}\ \sum\limits_{n=1}^{N}{\nu_{n}} (41a)
s.t.\displaystyle s.t.\ eynPnmax,\displaystyle e^{y_{n}}\leq P_{n}^{max}, (41b)
wnlnlog2(1+eyngnξ(n)>ξ(n),n𝒩eyngn+N0)νn,\displaystyle w_{n}\ln{\log_{2}\left(1+\frac{e^{y_{n}}g_{n}}{\sum\limits_{\xi\left(n^{\prime}\right)>\xi\left(n\right),\forall n^{\prime}\in\mathcal{N}}{e^{y_{n^{\prime}}}g_{n^{\prime}}}+N_{0}}\right)}\geq{\nu_{n}}, (41c)
n𝒩,\displaystyle\forall n\in\mathcal{N},

where \bm{\nu}=[\nu_1,\nu_2,...,\nu_N] and \bm{y}=[y_1,y_2,...,y_N]. It is easy to verify that (41a) and (41b) are convex. Next, we show that (41c) is also convex. First, we rewrite (41c) as

eyngnξ(n)>ξ(n),n𝒩eyngn+N02eνnwn1.\frac{e^{y_{n}}g_{n}}{\sum\limits_{\xi\left(n^{\prime}\right)>\xi\left(n\right),\forall n^{\prime}\in\mathcal{N}}{e^{y_{n^{\prime}}}g_{n^{\prime}}}+N_{0}}\geq 2^{e^{\frac{\nu_{n}}{w_{n}}}}-1. (42)

Taking the reciprocal of both sides and then the natural logarithm, we obtain

ln(ξ(n)>ξ(n),n𝒩eyngn+N0eyngn)+ln(2eνnwn1)0.\ln\left(\frac{\sum\limits_{\xi\left(n^{\prime}\right)>\xi\left(n\right),\forall n^{\prime}\in\mathcal{N}}{e^{y_{n^{\prime}}}g_{n^{\prime}}}+N_{0}}{e^{y_{n}}g_{n}}\right)+\ln\left(2^{e^{\frac{\nu_{n}}{w_{n}}}}-1\right)\leq 0. (43)

The first term in the left-hand-side (LHS) is a log-sum-exp function which is convex [52]. The second-order derivative of the second term in the LHS is

ln2wn2eνnwn2eνnwn(2eνnwnln2eνnwn1),\frac{\ln 2}{{w_{n}}^{2}}e^{\frac{\nu_{n}}{w_{n}}}2^{e^{\frac{\nu_{n}}{w_{n}}}}\left(2^{e^{\frac{\nu_{n}}{w_{n}}}}-\ln 2e^{\frac{\nu_{n}}{w_{n}}}-1\right), (44)

whose value is positive. Indeed, the first-order derivative of the term inside the brackets in (44) is \frac{\ln 2}{w_n}e^{\frac{\nu_n}{w_n}}\left(2^{e^{\frac{\nu_n}{w_n}}}-1\right), which is positive since \nu_n>0 and w_n>0; hence the bracketed term is increasing in \nu_n and bounded below by 1-\ln 2=\ln\frac{e}{2}>0. Consequently, (44) is positive, so the second term in the LHS is also convex. Therefore, (41c) is convex. Since (41a)-(41c) are all convex, problem P2 is convex. The proof is completed.
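Rather than implementing the exponential-cone reformulation of P2 directly, the sketch below solves the equivalent problem over y_n = ln p_n with a generic SciPy solver; it is a lightweight stand-in for the CVX/interior-point solution, and the default noise and bandwidth values are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def allocate_power(order, w, g, p_max, n0=1e-10, bandwidth=1e6):
    """Maximize sum_n w_n ln R_n for a fixed SIC ordering by optimizing
    y_n = ln p_n (so p_n > 0 holds automatically), cf. P1/P2.
    order[i] is the index of the i-th decoded user."""
    order = np.asarray(order)
    w, g, p_max = (np.asarray(a, dtype=float) for a in (w, g, p_max))
    pos = np.empty(len(order), dtype=int)
    pos[order] = np.arange(len(order))          # decoding position of each user

    def neg_utility(y):
        p = np.exp(y)
        util = 0.0
        for n in range(len(p)):
            later = pos > pos[n]                # users decoded after user n
            sinr = p[n] * g[n] / (p[later] @ g[later] + n0)
            util += w[n] * np.log(bandwidth * np.log2(1 + sinr))
        return -util

    res = minimize(neg_utility, x0=np.log(p_max) - 1.0,
                   bounds=[(None, np.log(pm)) for pm in p_max])
    return np.exp(res.x), -res.fun
```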

Appendix C Probabilistic Problem Transformation

In the imperfect CSI scenario, the outage probability requirement turns the problem into an intractable non-convex probabilistic mixed problem. Following [70, 69], we transform it into a non-probabilistic problem via approximations.

Firstly, we transform the outage probability requirement into another form of probabilistic constraint as follows. The maximum achievable data rate is rewritten as

cn\displaystyle c_{n} =Wlog2(1+ϕn)\displaystyle=W\log_{2}(1+{\phi}_{n})
=Wlog2(1+cnScnI),\displaystyle=W\log_{2}\left(1+\frac{c^{S}_{n}}{c^{I}_{n}}\right), (45)

where cnS=pn|αn|2g¯nc^{S}_{n}={{p_{n}}|\alpha_{n}|^{2}{{\overline{g}}_{n}}} and cnI=ξ(n)>ξ(n),n𝒩pn|αn|2g¯n+N0c^{I}_{n}=\!\!\!\!\!\!\!{\sum\limits_{\xi\left({n^{\prime}}\right)>\xi\left(n\right),\forall n^{\prime}\in{\cal N}}\!\!\!\!\!\!\!\!\!\!\!\!\!\!{{p_{n^{\prime}}}|\alpha_{n}|^{2}{{\overline{g}}_{n^{\prime}}}}+{N_{0}}}. The scheduled data rate can be rewritten as

rn\displaystyle r_{n} =Wlog2(1+ϕ^n)\displaystyle=W\log_{2}(1+{\hat{\phi}}_{n})
=Wlog2(1+bnSbnI),\displaystyle=W\log_{2}\left(1+\frac{{b}^{S}_{n}}{{b}^{I}_{n}}\right), (46)

and we have

ϕ^n=bnSbnI=2rnW1.\displaystyle{\hat{\phi}}_{n}=\frac{{b}^{S}_{n}}{{b}^{I}_{n}}=2^{\frac{r_{n}}{W}}-1. (47)

According to the above transformation and the total probability theorem, the outage probability constraints can be transformed as

Pr[cn<rn|α^n]=\displaystyle\Pr[c_{n}<r_{n}|{\hat{\alpha}}_{n}]= Pr[ϕn<ϕ^n|α^n]\displaystyle\Pr\left[{\phi}_{n}<{\hat{\phi}}_{n}|{\hat{\alpha}}_{n}\right]
=\displaystyle= Pr[cnScnI<2rnW1|α^n]\displaystyle\Pr\left[\frac{c^{S}_{n}}{c^{I}_{n}}<2^{\frac{r_{n}}{W}}-1|{\hat{\alpha}}_{n}\right]
=\displaystyle= Pr[E1]Pr[cnSbnS|α^n]\displaystyle\Pr[E1]\cdot\Pr\left[c_{n}^{S}\leq{b}^{S}_{n}|{\hat{\alpha}}_{n}\right]
+Pr[E2]Pr[cnS>bnS|α^n]ϵout,\displaystyle+\Pr[E2]\cdot\Pr\left[c_{n}^{S}>{b}^{S}_{n}|{\hat{\alpha}}_{n}\right]\leq\epsilon_{out}, (48)

where Pr[E1]=Pr[cnScnI<2rnW1|cnSbnS,α^n]\Pr[E_{1}]=\Pr\left[\frac{c^{S}_{n}}{c^{I}_{n}}<2^{\frac{r_{n}}{W}}-1|c_{n}^{S}\leq{b}^{S}_{n},{\hat{\alpha}}_{n}\right] and Pr[E2]=Pr[cnScnI<2rnW1|cnS>bnS,α^n]\Pr[E_{2}]=\Pr\left[\frac{c^{S}_{n}}{c^{I}_{n}}<2^{\frac{r_{n}}{W}}-1|c_{n}^{S}>{b}^{S}_{n},{\hat{\alpha}}_{n}\right]. Then, we have the following theorem.

Theorem 1

Following [69], the outage probability constraint (48) can be approximated as

Pr[cnIbnI|α^n]ϵout/2,\displaystyle\Pr\left[c_{n}^{I}\geq{b}^{I}_{n}|{\hat{\alpha}}_{n}\right]\leq\epsilon_{out}/2, (49)

and

Pr[cnSbnS|α^n]=ϵout/2.\displaystyle\Pr\left[c_{n}^{S}\leq{b}^{S}_{n}|{\hat{\alpha}}_{n}\right]=\epsilon_{out}/2. (50)
Proof 1

According to (47), we have

Pr[cnIbnI|α^n]\displaystyle\Pr\left[c_{n}^{I}\geq{b}^{I}_{n}|{\hat{\alpha}}_{n}\right]
=Pr[cnIbnS/(2rnW1)|α^n]\displaystyle\ \ \ \ \ \ \ \ \ \ =\Pr\left[c_{n}^{I}\geq{b}^{S}_{n}/(2^{\frac{r_{n}}{W}}-1)|{\hat{\alpha}}_{n}\right]
=Pr[bnScnI2rnW1|α^n]ϵout/2,\displaystyle\ \ \ \ \ \ \ \ \ \ =\Pr\left[\frac{{b}^{S}_{n}}{c_{n}^{I}}\leq 2^{\frac{r_{n}}{W}}-1|{\hat{\alpha}}_{n}\right]\leq\epsilon_{out}/2, (51)

and when cnS>bnSc_{n}^{S}>{b}^{S}_{n}, we can always have

Pr[E2]=Pr[cnScnI<2rnW1|α^n]ϵout/2.\displaystyle\Pr\left[E_{2}\right]=\Pr\left[\frac{c^{S}_{n}}{c^{I}_{n}}<2^{\frac{r_{n}}{W}}-1|{\hat{\alpha}}_{n}\right]\leq\epsilon_{out}/2. (52)

According to (50), we have

Pr[cnS>bnS|α^n]=1ϵout/2.\displaystyle\Pr\left[c_{n}^{S}>{b}^{S}_{n}|{\hat{\alpha}}_{n}\right]=1-\epsilon_{out}/2. (53)

Based on (52) and (53), the probabilistic constraint (48) satisfies the following approximations

Pr[cn<rn|α^n]\displaystyle\Pr[c_{n}<r_{n}|{\hat{\alpha}}_{n}]
=\displaystyle= Pr[E1]Pr[cnSbnS|α^n]+Pr[E2]Pr[cnS>bnS|α^n]\displaystyle\Pr[E1]\cdot\Pr\left[c_{n}^{S}\leq{b}^{S}_{n}|{\hat{\alpha}}_{n}\right]+\Pr[E2]\cdot\Pr\left[c_{n}^{S}>{b}^{S}_{n}|{\hat{\alpha}}_{n}\right]
\displaystyle\leq ϵout/2+(ϵout/2)(1ϵout/2)=ϵoutϵout2/4.\displaystyle\epsilon_{out}/2+(\epsilon_{out}/2)(1-\epsilon_{out}/2)=\epsilon_{out}-\epsilon_{out}^{2}/4. (54)

For ϵout1\epsilon_{out}\ll 1, we have ϵoutϵout2/4ϵout\epsilon_{out}-\epsilon_{out}^{2}/4\approx\epsilon_{out}. Therefore, the probabilistic constraint (48) can be approximated as (49) and (50). This completes the proof.

Secondly, based on the transformed probabilistic constraints (49) and (50) of Theorem 1, the probabilistic mixed problem can be further transformed to a non-probabilistic problem as follows.

According to the Markov inequality, the LHS of (49) can be bounded as follows [70]

Pr[cnIbnI|α^n]\displaystyle\Pr\left[c_{n}^{I}\geq{b}^{I}_{n}|{\hat{\alpha}}_{n}\right]
=Pr[ξ(n)>ξ(n),n𝒩pn|αn|2g¯n+N0bnI|α^n]\displaystyle\ \ \ \ \ \ =\Pr\left[{\sum\limits_{\xi\left({n^{\prime}}\right)>\xi\left(n\right),\forall n^{\prime}\in{\cal N}}\!\!\!\!\!\!\!\!\!\!\!\!\!\!{{p_{n^{\prime}}}|\alpha_{n}|^{2}{{\overline{g}}_{n^{\prime}}}}+{N_{0}}}\geq{b}^{I}_{n}|{\hat{\alpha}}_{n}\right]
E[ξ(n)>ξ(n),n𝒩pn|αn|2g¯n]bnIN0\displaystyle\ \ \ \ \ \ \leq\frac{E\left[{\sum\limits_{\xi\left({n^{\prime}}\right)>\xi\left(n\right),\forall n^{\prime}\in{\cal N}}{{p_{n^{\prime}}}|\alpha_{n}|^{2}{{\overline{g}}_{n^{\prime}}}}}\right]}{{b}^{I}_{n}-N_{0}}
=ξ(n)>ξ(n),n𝒩pn|αn|2g¯nbnIN0=ϵout/2,\displaystyle\ \ \ \ \ \ =\frac{{\sum\limits_{\xi\left({n^{\prime}}\right)>\xi\left(n\right),\forall n^{\prime}\in{\cal N}}{{p_{n^{\prime}}}|\alpha_{n}|^{2}{{\overline{g}}_{n^{\prime}}}}}}{{b}^{I}_{n}-N_{0}}=\epsilon_{out}/2, (55)

where the right side of the Markov inequality is set to ϵout/2\epsilon_{out}/2 according to (49).

Since |\alpha_n|^2 is a non-central chi-squared distributed random variable with two degrees of freedom, the LHS of (50) can be rewritten as

Pr[cnSbnS|α^n]\displaystyle\Pr\left[c_{n}^{S}\leq{b}^{S}_{n}|{\hat{\alpha}}_{n}\right]
=Pr[pn|αn|2g¯nbnS|α^n]\displaystyle\ \ \ \ \ =\Pr\left[{p_{n}}|\alpha_{n}|^{2}{{\overline{g}}_{n}}\leq{b}^{S}_{n}|{\hat{\alpha}}_{n}\right]
=Pr[|αn|2bnSpng¯n|α^n]\displaystyle\ \ \ \ \ =\Pr\left[{\left|\alpha_{n}\right|}^{2}\leq\frac{{b}^{S}_{n}}{p_{n}{\overline{g}}_{n}}|{\hat{\alpha}}_{n}\right]
=F|αn|2(bnSpng¯n)\displaystyle\ \ \ \ \ =F_{{\left|\alpha_{n}\right|}^{2}}\left(\frac{{b}^{S}_{n}}{p_{n}{\overline{g}}_{n}}\right)
=1Q1(2|α^n|2σϵ2,2σϵbnSpng¯n)\displaystyle\ \ \ \ \ =1-Q_{1}\left(\sqrt{\frac{2\left|{\hat{\alpha}}_{n}\right|^{2}}{\sigma_{\epsilon}^{2}}},\sqrt{\frac{2}{\sigma_{\epsilon}}\frac{{b}^{S}_{n}}{p_{n}{\overline{g}}_{n}}}\right) (56)

where F()F(\cdot) denotes a cumulative distribution function (cdf) of a non-central chi-square random variable with non-centrality parameter 2|α^n|2/σϵ2{2\left|{\hat{\alpha}}_{n}\right|^{2}}/{\sigma_{\epsilon}^{2}}, and Q1()Q_{1}(\cdot) is the first-order Marcum Q-function. Based on (50), (56) is equal to ϵout/2\epsilon_{out}/2, and bnS{b}^{S}_{n} can be expressed as

bnS=F|αn|21(ϵ/2)png¯n,\displaystyle{b}^{S}_{n}=F^{-1}_{{\left|\alpha_{n}\right|}^{2}}(\epsilon/2)\cdot p_{n}{\overline{g}}_{n}, (57)

where F1()F^{-1}(\cdot) is the inverse non-central chi-square cdf of F()F(\cdot). Based on (47), (57) and |αn|2=|α^n|2+σϵ2|\alpha_{n}|^{2}=|{\hat{\alpha}}_{n}|^{2}+\sigma_{\epsilon}^{2} , (55) can be further transformed into

ξ(n)>ξ(n),n𝒩pn|αn|2g¯nbnS/(2rnW1)N0\displaystyle\frac{{\sum\limits_{\xi\left({n^{\prime}}\right)>\xi\left(n\right),\forall n^{\prime}\in{\cal N}}{{p_{n^{\prime}}}|\alpha_{n}|^{2}{{\overline{g}}_{n^{\prime}}}}}}{b_{n}^{S}/(2^{\frac{r_{n}}{W}}-1)-N_{0}}
=ξ(n)>ξ(n),n𝒩pn(|α^n|2+σϵ2)g¯nF|αn|21(ϵout/2)png¯n2rnW1N0=ϵout2.\displaystyle\ \ \ \ \ =\frac{{\sum\limits_{\xi\left({n^{\prime}}\right)>\xi\left(n\right),\forall n^{\prime}\in{\cal N}}{{p_{n^{\prime}}}\left(|{\hat{\alpha}}_{n}|^{2}+\sigma^{2}_{\epsilon}\right){{\overline{g}}_{n^{\prime}}}}}}{\frac{F^{-1}_{{|\alpha_{n}|}^{2}}\left(\epsilon_{out}/2\right)\cdot p_{n}{\overline{g}}_{n}}{2^{\frac{r_{n}}{W}}-1}-N_{0}}=\frac{\epsilon_{out}}{2}. (58)

Therefore, considering the outage probability constraint, the approximated signal-to-interference-plus-noise ratio (SINR) ϕ~n{\tilde{\phi}}_{n} for the nn-th user can be derived as

ϕ~n=ϵoutF|αn|21(ϵout/2)png¯nϵoutN0+2ξ(n)>ξ(n),n𝒩pn(|α^n|2+σϵ2)g¯n,\displaystyle{\tilde{\phi}}_{n}=\frac{\epsilon_{out}{F^{-1}_{{|\alpha_{n}|}^{2}}\left(\epsilon_{out}/2\right)\cdot p_{n}{\overline{g}}_{n}}}{\epsilon_{out}N_{0}+2{\sum\limits_{\xi\left({n^{\prime}}\right)>\xi\left(n\right),\forall n^{\prime}\in{\cal N}}{{p_{n^{\prime}}}\left(|{\hat{\alpha}}_{n}|^{2}+\sigma^{2}_{\epsilon}\right){{\overline{g}}_{n^{\prime}}}}}}, (59)

and the corresponding data rate can be written as

r~n=Wlog2(1+ϕ~n).\displaystyle\tilde{r}_{n}=W\log_{2}(1+{\tilde{\phi}}_{n}). (60)

Finally, the weighted proportional fairness function with outage probability is transformed into the following non-probability optimization problem

max𝝅,p\displaystyle\max\limits_{\mathbf{\bm{\pi}},\textbf{p}}\ n=1Nwnln(1ϵout)r~n\displaystyle\sum\limits_{n=1}^{N}{w_{n}\ln(1-\epsilon_{out}){\tilde{r}}_{n}}
s.t.\displaystyle s.t.\ 0<pnPnmax,n𝒩,\displaystyle 0<p_{n}\leq P_{n}^{max},\forall n\in\mathcal{N},
𝝅Π.\displaystyle\bm{\pi}\in\Pi.

Appendix D Alternating Algorithm for the Multiple-Antenna Scenario with MMSE Equalization Matrices

Under the MMSE method, the equalization matrices depend on the transmit power variable {\bf p}, which turns the power allocation subproblem into an intractable non-convex problem. To tackle it, we employ an alternating algorithm that decomposes it into two subproblems: the calculation of {\bf V} for a given {\bf p}, and the optimization of {\bf p} for a given {\bf V}. The procedure of the alternating algorithm is as follows.

Firstly, we initialize the transmit power {\bf p}.

Secondly, according to its definition, the equalization matrix {\bf V} under the MMSE method can be calculated from the given {\bf p} as

𝐕=𝐏𝐇H(𝐇𝐏𝐇H+σ𝐈)1{\bf V}={\bf{P}}{{\bf{H}}^{H}}{{({{\bf{H}}}{\bf{P}}{\bf{H}}^{H}+\sigma{\bf{I}})}^{-1}} (62)

Thirdly, given {\bf V}, the power allocation subproblem can be formulated as

P1:R(𝝅)=maxp\displaystyle\textbf{P1}:R(\bm{\pi})=\max\limits_{\textbf{p}}\ n=1NwnlnRn\displaystyle\sum\limits_{n=1}^{N}{w_{n}\ln{R_{n}}} (63a)
s.t.\displaystyle s.t.\ 0<pnPnmax,n𝒩.\displaystyle 0<p_{n}\leq P_{n}^{max},\forall n\in\mathcal{N}. (63b)

where RnR_{n} is

Rn=log2(1+|𝐯n𝐡n|2pnξ(n)>ξ(n),n𝒩|𝐯n𝐡n|2pn+|𝐯n|2σ2).{R_{n}}\!=\!{\log_{2}}\!\left(\!\!{1\!+\!\frac{{{{\left|{{{\bf{v}}_{n}}{{\bf{h}}_{n}}}\right|}^{2}}{p_{n}}}}{{\sum\limits_{\xi\left(n^{\prime}\right)>\xi\left(n\right),\forall n^{\prime}\in\mathcal{N}}\!\!\!\!\!\!\!\!\!\!{{{\left|{{{\bf{v}}_{n}}{{\bf{h}}_{n^{\prime}}}}\right|}^{2}}{p_{n^{\prime}}}+}{{\left|{{{\bf{v}}_{n}}}\right|}^{2}}{\sigma^{2}}}}}\!\!\right)\!. (64)

{\bf v}_n denotes the n-th row of the obtained {\bf V}. Since {\bf v}_n is constant for a given {\bf V}, (63) can be transformed into a convex problem as in Appendix B and then solved by the CVX solver.

The alternating algorithm repeats the second and third steps above until the gap between the powers {\bf p} of two consecutive iterations falls below a threshold.
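A minimal sketch of this alternating procedure is given below; power_step(V) stands in for solving the convex problem (63) with the CVX solver for a fixed V, and the convergence threshold is a placeholder.

```python
import numpy as np

def mmse_equalizer(H, p, sigma2):
    """V = P H^H (H P H^H + sigma^2 I)^{-1} as in (62), with P = diag(p);
    H has shape (n_rx_antennas, n_users)."""
    P = np.diag(p)
    A = H @ P @ H.conj().T + sigma2 * np.eye(H.shape[0])
    return P @ H.conj().T @ np.linalg.inv(A)

def alternate(H, p0, sigma2, power_step, tol=1e-4, max_iter=50):
    """Alternate between the MMSE equalizer (62) and the power update until
    the change in p falls below tol."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        V = mmse_equalizer(H, p, sigma2)
        p_new = np.asarray(power_step(V), dtype=float)   # stands in for solving (63)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new, V
        p = p_new
    return p, V
```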

Appendix E Multiple-BS Scenario

We consider an uplink NOMA network consisting of a set of BSs \mathbf{B}, where each BS b is associated with N_b users. Each BS simultaneously receives signals from its associated users and from users in other cells, and iteratively decodes its users' signals according to a SIC ordering. In the SIC process, the remaining undecoded signals and the signals of users in other cells are treated as interference. Therefore, the SINR of user n at BS b can be expressed as

\displaystyle{\phi^{(b)}_{n}}=\frac{{p^{(b)}_{n}}{g^{(b)}_{n}}}{\sum\limits_{\xi\left({n^{\prime}}\right)>\xi\left(n\right),\forall n^{\prime}\in{\cal N}_{b}}{p^{(b)}_{n^{\prime}}}{g^{(b)}_{n^{\prime}}}+\sum\limits_{b^{\prime}\in\mathbf{B}\setminus b,\forall m\in{\cal N}_{b^{\prime}}}{p^{(b^{\prime})}_{m}}{h^{(b^{\prime})}_{m,b}}+{N_{0}}}.

Then, the data rate of user n associated with BS b is

Rn(b)=log2(1+ϕn(b)).\displaystyle{R^{(b)}_{n}}={\log_{2}}(1+\phi^{(b)}_{n}). (65)

Therefore, the joint SIC ordering and power allocation optimization problem for multiple-BS NOMA can be expressed as:

maxπ,𝐩b𝐁n=1Nbwn(b)lnRn(b)\displaystyle\mathop{\max}\limits_{{\bf{\pi}},{\bf{p}}}\sum\limits_{b\in{\bf{B}}}{\sum\limits_{n=1}^{N_{b}}{{w^{(b)}_{n}}\ln{R^{(b)}_{n}}}}
s.t.\displaystyle s.t.\ \ 0<pn(b)Pn,max(b),nNb,b𝐁\displaystyle 0<{p^{(b)}_{n}}\leq P_{n,max}^{{(b)}},\forall n\in N_{b},\forall b\in{\bf{B}} (66a)
πΠ𝐁\displaystyle\pi\in{\Pi_{\mathbf{B}}} (66b)

where wn(b){w^{(b)}_{n}} is the weight of user nn associated with BS bb and π=[{πn(b)}n𝒩b|b𝐁]{\bf{\pi}}=\left[\{\pi^{(b)}_{n}\}_{n\in\mathcal{N}_{b}}|b\in\bf{B}\right] indicates the SIC order. Here πi(b)=n{\pi^{(b)}_{i}}=n means that the ii-th decoded user in BS bb is user nn and Π𝐁{\Pi_{\mathbf{B}}} is the permutation set of all possible SIC orderings.
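For completeness, the SINR expression above (used in (65)) can be evaluated as in the following sketch; the flat array layout, with the users of all BSs stacked together, and the noise value are illustrative assumptions.

```python
import numpy as np

def sinr_multi_bs(n, b, p, g, h, pos, bs_of, n0=1e-10):
    """SINR of user n associated with BS b: intra-cell interference from users
    of BS b decoded after n, plus inter-cell interference from the users of
    all other BSs.

    p[m], g[m] : transmit power of user m and its gain to its serving BS
    h[m, b]    : cross-link gain from user m to BS b
    pos[m]     : SIC decoding position of user m within its own BS
    bs_of[m]   : index of the BS that user m is associated with
    """
    same = bs_of == b
    intra = np.sum(p[same & (pos > pos[n])] * g[same & (pos > pos[n])])
    inter = np.sum(p[~same] * h[~same, b])
    return p[n] * g[n] / (intra + inter + n0)
```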

References

  • [1] Z. Yang et al., ”AI-Driven UAV-NOMA-MEC in Next Generation Wireless Networks,” IEEE Wireless Commun., vol. 28, no. 5, pp. 66-73, Oct. 2021.
  • [2] Y. Liu, S. Zhang, X. Mu, Z. Ding, R. Schober, N. Al-Dhahir, E. Hossain, and X. Shen. “Evolution of NOMA toward next generation multiple access (NGMA) for 6G,” IEEE J. Sel. Areas Commun., vol. 40, no. 4, pp. 1037–1071, Apr. 2022.
  • [3] A. Zakeri, A. Khalili, M. R. Javan, N. Mokari, and E. Jorswieck, “Robust energy-efficient resource management, SIC ordering, and beamforming design for MC MISO-NOMA enabled 6G,” IEEE Trans. Signal Process., vol. 69, pp. 2481–2498, Mar. 2021.
  • [4] Z. Lin, M. Lin, J.B. Wang, T. De Cola and J. Wang, “Joint beamforming and power allocation for satellite-terrestrial integrated networks with non-orthogonal multiple access,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 3, pp.657-670, 2019.
  • [5] M. Diamanti, G. Fragkos, E.E. Tsiropoulou and S. Papavassiliou, “Unified user association and contract-theoretic resource orchestration in NOMA heterogeneous wireless networks,” IEEE Open J. Commun. Soc., 1, pp.1485-1502, 2020.
  • [6] Z. Lin, M. Lin, T. De Cola, J.B. Wang, W.P.  Zhu and J. Cheng, “Supporting IoT with rate-splitting multiple access in satellite and aerial-integrated networks,” IEEE Internet Things J., vol. 8, no. 14, pp. 11123-11134, 2021.
  • [7] A. Akbar, S. Jangsher, and F. A. Bhatti, “NOMA and 5G emerging technologies: A survey on issues and solution techniques,” Comput. Netw., vol. 190, pp. 107950, 2021.
  • [8] Y. Wu, Y. Song, T. Wang, L. Qian and T. Q. S. Quek, “Non-Orthogonal Multiple Access Assisted Federated Learning via Wireless Power Transfer: A Cost-Efficient Approach,” IEEE Trans. Commun, vol. 70, no. 4, pp. 2853–2869, Apr. 2022.
  • [9] O. Maraqa, A. S. Rajasekaran, S. Al-Ahmadi, H. Yanikomeroglu, and S. M. Sait, “A survey of rate-optimal power domain NOMA with enabling technologies of future wireless networks,” IEEE Commun. Surveys Tuts., vol. 22, no. 4, pp. 2192–2235, 4th Quart., 2020.
  • [10] M. Vaezi, R. Schober, Z. Ding, and H. V. Poor, “Non-orthogonal multiple access: Common myths and critical questions,” IEEE Wireless Commun., vol. 26, no. 5, pp. 174–180, Oct. 2019.
  • [11] S. M. R. Islam, N. Avazov, O. A. Dobre, and K. S. Kwak, “Power-domain non-orthogonal multiple access (NOMA) in 5G systems: Potentials and challenges,” IEEE Commun. Surveys Tuts., vol. 12, no. 2, pp. 721–742, 2nd Quart., 2016.
  • [12] D. Zhai, R. Zhang, L. Cai, B. Li and Y. Jiang, “Energy-Efficient User Scheduling and Power Allocation for NOMA-Based Wireless Networks With Massive IoT Devices,” IEEE Internet Things J., vol. 5, no. 3, pp. 1857–1868, Jun. 2018.
  • [13] K. Chi, Z. Chen, K. Zheng, Y.-H. Zhu, and J. Liu, “Energy provision minimization in wireless powered communication networks with network throughput demand: TDMA or NOMA?” IEEE Trans. Commun., vol. 67, no. 9, pp. 6401–6414, Sep. 2019.
  • [14] A. Zakeri, M. Moltafet, and N. Mokari, “Joint radio resource allocation and SIC ordering in NOMA-based networks using submodularity and matching theory,” IEEE Trans. Veh. Technol., vol. 68, no. 10, pp. 9761–9773, Oct. 2019.
  • [15] L. Huang et al, “Throughput Guarantees for Multi-Cell Wireless Powered Communication Networks with Non-Orthogonal Multiple Access,” IEEE Trans. Veh. Technol., vol. 7, no. 11, pp. 12104-12116, Nov. 2022.
  • [16] J. Cui, Z. Ding, and P. Fan, “The application of machine learning in mmWave-NOMA systems,” in Proc. IEEE 87th Veh. Technol. Conf., pp. 1–6., 2018.
  • [17] L. You and D. Yuan, “A note on decoding order in user grouping and power optimization for multi-cell NOMA with load coupling,” IEEE Trans. Wireless Commun., vol. 20, no. 1, pp. 495–505, Jan. 2021.
  • [18] K. Jiang, T. Jing, Y. Huo, F. Zhang, and Z. Li, “SIC-based secrecy performance in uplink NOMA multi-eavesdropper wiretap channels,” IEEE Access, vol. 6, pp. 19664–19680, Apr. 2018.
  • [19] Y. Gao, B. Xia, K. Xiao, Z. Chen, X. Li, and S. Zhang, “Theoretical analysis of the dynamic decode ordering SIC receiver for uplink NOMA systems,” IEEE Commun. Lett., vol. 21, no. 10, pp. 2246-2249, Oct. 2017.
  • [20] Y. Gao, B. Xia, Y. Liu, Y. Yao, K. Xiao, and G. Lu, “Analysis of the dynamic ordered decoding for uplink NOMA systems with imperfect CSI,” IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 6647–6651, Jul. 2018.
  • [21] L. Qian, A. Feng, Y. Huang, Y. Wu, B. Ji, and Z. Shi, “Optimal SIC ordering and computation resource allocation in MEC-aware NOMA NB-IoT networks,” IEEE Internet Things J., vol. 6, no. 2, pp. 2806–2816, Apr. 2019.
  • [22] L. Zhang, F. Fang, G. Huang, Y. Chen, H. Zhang, Y. Jiang, and W. Ma, “Energy-Efficient Non-Orthogonal Multiple Access for Downlink Communication in Mobile Edge Computing Systems,” IEEE Trans. Mobile Comput., vol. 21, no. 12, pp. 4310-4322, Dec. 2022.
  • [23] Z. Ding, R. Schober, and H. V. Poor, “ Unveiling the importance of SIC in NOMA systems—Part II: New results and future directions,” IEEE Commun. Lett., vol. 24, no. 11, pp. 2378–2382, Nov. 2020.
  • [24] D. Hu, Q. Zhang, Q. Li, and J. Qin, “Joint position, decoding order, and power allocation optimization in UAV-based NOMA downlink communications,” IEEE Syst. J., vol. 14, no. 2, pp. 2949–2960, Jun. 2020.
  • [25] R. Duan, J. Wang, C. Jiang, H. Yao, Y. Ren, and Y. Qian, “ Resource allocation for multi-UAV aided IoT NOMA uplink transmission systems,” IEEE Internet Things J., vol. 6, no. 4, pp. 7025–7037, Aug. 2019.
  • [26] W. Wang, N. Zhao, L. Chen, X. Liu, Y. Chen and D. Niyato, ”UAV-Assisted Time-Efficient Data Collection via Uplink NOMA,” IEEE Trans. Commun., vol. 69, no. 11, pp. 7851-7863, Nov. 2021.
  • [27] Y. Pan, C. Pan, Z. Yang, and M. Chen, “Resource allocation for D2D communication underlaying a NOMA-based cellular network,” IEEE Wireless Commun. Lett., vol. 7, no. 1, pp. 130–133, Feb. 2018.
  • [28] H. Sun, F. Zhou, R. Q. Hu, and L. Hanzo, “Robust beamforming design in a NOMA cognitive radio network relying on SWIPT,” IEEE J. Sel. Areas Commun., vol. 37, no. 1, pp. 142-155, Jan. 2019.
  • [29] X. Chen, A. Benjebbour, Y. Lan, A. Li, and H. Jiang, “Evaluation of downlink non-orthogonal multiple access (NOMA) combined with SU-MIMO,” in Proc. IEEE 25th Annu. Int. Symp. Pers., Indoor, Mobile Radio Commun., Washington, DC, USA, 2014, pp. 1887–1891.
  • [30] P. Lai, Q. He, G. Cui, F. Chen, J. Grundy, M. Abdelrazek, J. Hosking, and Y.Yang, “Cost-Effective User Allocation in 5G NOMA-Based Mobile Edge Computing Systems,” IEEE Trans. Mobile Comput., vol. 21, no. 12, pp. 4263-4278, Dec.  2022.
  • [31] G. Cui, Q. He, X. Xia, F. Chen, F. Dong, H. Jin, and Y. Yang, ”OL-EUA: Online User Allocation for NOMA-Based Mobile Edge Computing,” IEEE Trans. Mobile Comput., vol. 22, no. 4, pp. 2295-2306, Apr. 2023
  • [32] K. Yakou and K. Higuchi, “Downlink NOMA with SIC using unified user grouping for non-orthogonal user multiplexing and decoding order,” in Proc. Int. Symp. Intell. Signal Process. Commun. Syst. (ISPACS), Indonesia, Nusa Dua, 2015, pp. 508–513.
  • [33] Z. Liu, F. Yang, J. Song and Z. Han, “NOMA-Based MISO Visible Light Communication Systems With Optical Intelligent Reflecting Surface: Joint Active and Passive Beamforming Design,” IEEE Internet Things J., vol. 11, no. 10, pp. 18753-18767, 15 May15, 2024,
  • [34] L. P. Qian, X. Dong, M. Wu, Y. Wu and L. Zhao. “Long-Term Energy Consumption Minimization in NOMA-Enabled Vehicular Edge Computing Networks.” IEEE Trans. Intell. Transp. Syst., doi: 10.1109/TITS.2024.3404991.
  • [35] L. Huang, X. Feng, C. Zhang, L. P. Qian, and Y. Wu, “Deep Reinforcement Learning-based Joint Task Offloading and Bandwidth Allocation for Multi-User Mobile Edge Computing”, Digital Commun. Netw., vol. 5, no. 1, pp. 10-17, Feb. 2019.
  • [36] Y.-C. Wu, T. Q. Dinh, Y. Fu, C. Lin, and T. Q. S. Quek, “A hybrid DQN and optimization approach for strategy and resource allocation in MEC Networks,” IEEE Trans. Wirel. Commun., vol. 20, no. 7, pp. 4282-4295, Jul. 2021.
  • [37] X. Zhou, L. Huang, T. Ye and W. Sun, “Computation Bits Maximization in UAV-Assisted MEC Networks With Fairness Constraint,” IEEE Internet Things J., vol. 9, no. 21, pp. 20997-21009, Nov. 1, 2022.
  • [38] S. Tuli, S. Ilager, K. Ramamohanarao, and R. Buyya, “Dynamic scheduling for stochastic edge-cloud computing environments using A3C learning and residual recurrent neural networks,” IEEE Trans. Mobile Comput., vol. 21, no. 3, pp. 940–954, Aug. 2020.
  • [39] T. Zhao, F. Li and L. He, “DRL-based joint resource allocation and device orchestration for hierarchical federated learning in NOMA-enabled industrial IoT,” IEEE Trans. Ind. Informat.,early access, Apr. 27, 2022, doi: 10.1109/TII.2022.3170900.
  • [40] Y. Zou, Y. Liu, X. Mu, X. Zhang, Y. Liu and C. Yuen, “Machine Learning in RIS-assisted NOMA IoT Networks,” IEEE Internet Things J., early access, Feb. 16, 2022, doi: 10.1109/JIOT.2023.3245288.
  • [41] M. Samir, M. Elhattab, C. Assi, S. Sharafeddine, and A. Ghrayeb, “Optimizing age of information through aerial reconfigurable intelligent surfaces: A deep reinforcement learning approach,” IEEE Trans. Veh. Technol., vol. 70, no. 4, pp. 3978–3983, Apr. 2021.
  • [42] X. Li, Q. Wang, J. Liu, and W. Zhang, “Trajectory design and generalization for UAV enabled networks: A deep reinforcement learning approach,” Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Seoul, South Korea, pp. 1-6, 2020.
  • [43] L. Huang, S. Bi, and Y.-J. A. Zhang, “Deep reinforcement learning for online computation offloading in wireless powered mobile edge computing networks,” IEEE Trans. Mobile Comput., vol. 19, no. 11, pp. 2581-2593, Jul. 2020.
  • [44] Q. Zhang, “DROO: Integrated Learning and Optimization for Edge Computing Offloading,” Computer., vol. 54, no. 12, pp.4-6, Dec. 2021.
  • [45] S. Bi, L. Huang, H. Wang and Y. -J. A. Zhang, ”Lyapunov-Guided Deep Reinforcement Learning for Stable Online Computation Offloading in Mobile-Edge Computing Networks,” in IEEE Transactions on Wireless Communications, vol. 20, no. 11, pp. 7519-7537, Nov. 2021.
  • [46] X. Li, L. Huang, H. Wang, S. Bi , and Y. J. A. Zhang, “ An integrated optimization-learning framework for online combinatorial computation offloading in MEC networks,” IEEE Wireless Communications , vol. 29, no. 1, pp. 170-177, Jan. 2022.
  • [47] B. Zhu,K. Chi, J. Liu, K. Yu, and S. Mumtaz. “Efficient offloading for minimizing task computation delay of NOMA-based multiaccess edge computing.” IEEE Trans. Commun., vol. 70, no. 5, pp. 3186-3203, 2022.
  • [48] F. Kelly, A. Maulloo, and D. Tan, “Rate control for communication networks: Shadow prices, proportional fairness and stability,” J.Oper. Res. Soc., vol. 49, no. 3, pp. 237–252, 1998.
  • [49] W. Li, S. Wang, Y. Cui, X. Cheng, R. Xin, M. A. Al-Rodhaan and A. Al-Dhelaan, ”AP Association for Proportional Fairness in Multirate WLANs,” IEEE/ACM Trans. Netw., vol. 22, no. 1, pp. 191-202, Feb. 2014.
  • [50] L. Chen, Y. Feng, B. Li, and B. Li, ”Promenade: Proportionally Fair Multipath Rate Control in Datacenter Networks with Random Network Coding,” IEEE Trans. Parallel Distrib. Syst., vol. 30, no. 11, pp. 2536-2546, Nov. 2019.
  • [51] I. -H. Hou and P. Gupta, “Proportionally Fair Distributed Resource Allocation in Multiband Wireless Systems,” IEEE/ACM Trans. Netw., vol. 22, no. 6, pp. 1819-1830, Dec. 2014
  • [52] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to algorithms, MIT Press, 2022.
  • [53] W. Kool, H. Van Hoof, and M. Welling, “Attention, learn to solve routing problems!” 2018. [Online]. Available: arXiv: 1803.08475.
  • [54] O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer networks,” Proc. 28th Int. Conf. Neural Inf. Process. Syst., pp. 2692–2700, 2015.
  • [55] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez,Ł. Kaiser and I. Polosukhin, “Attention is all you need,” in Proc. 31st Int. Conf. Neural Inf. Process. Syst., 2017, pp. 6000-6010.
  • [56] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
  • [57] R. J. Williams, “Simple statistical gradient following algorithms for connectionist reinforcement learning,” Mach. Learn., vol. 8, pp. 229–256, 1992.
  • [58] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2014. [Online]. Available: arXiv: 1412.6980.
  • [59] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA, USA:MIT Press, 2018.
  • [60] S. Mehrotra, “On the implementation of a primal-dual interior point method,” SIAM J. Optim., vol. 2, no. 4, pp. 575–601, 1992
  • [61] X. Qi, M. Peng, and H. Zhang, “Joint mmWave Beamforming and Resource Allocation in NOMA-MEC Network for Internet of Things,” IEEE Trans. Veh. Technol., vol. 72, no. 4, pp. 4969-4980, Apr. 2023.
  • [62] J. Zhang, J. Fan, J. Zhang, D. W.K.Ng, Q. Sun and B. Ai, “Performance Analysis and Optimization of NOMA-Based Cell-Free Massive MIMO for IoT,” IEEE Internet Things J., vol. 9, no. 12, pp. 9625-9639, June. 2022.
  • [63] W., Christoph. “Detection and precoding for multiple input multiple output channels.” Ph.D. dissertation., Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany, 2004.
  • [64] Taesang Yoo and A. Goldsmith, “Capacity and power allocation for fading MIMO channels with channel estimation error”, IEEE Transactions on Information Theory, vol. 52, no. 5, pp. 2203-2214, May 2006.
  • [65] S. Han, S. Ahn, E. Oh and D. Hong, “Effect of Channel-Estimation Error on BER Performance in Cooperative Transmission,” IEEE Trans. Veh. Technol., vol. 58, no. 4, pp. 2083-2088, May 2009.
  • [66] Z. Yang, Z. Ding, P. Fan and G. K. Karagiannidis, “On the Performance of Non-orthogonal Multiple Access Systems With Partial Channel Information,” IEEE Trans. Commun., vol. 64, no. 2, pp. 654-667, Feb. 2016.
  • [67] Y. Gao, B. Xia, Y. Liu, Y. Yao, K. Xiao and G. Lu, “Analysis of the Dynamic Ordered Decoding for Uplink NOMA Systems With Imperfect CSI,” IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 6647-6651, July 2018.
  • [68] F. Fang, H. Zhang, J. Cheng, S. Roy, and V.C. Leung. “Joint user scheduling and power allocation optimization for energy-efficient NOMA systems with imperfect CSI,” IEEE J. Sel. Areas Commun., vol. 35, no. 12, pp. 2874-2885, 2017.
  • [69] Zhang, H., Zhang, J. and Long, K., “Energy efficiency optimization for NOMA UAV network with imperfect CSI,” IEEE J. Sel. Areas Commun., vol. 38, no. 12, pp. 2798-2809, 2020.
  • [70] D. W. K. Ng, E. S. Lo and R. Schober, “Energy-efficient resource allocation in OFDMA systems with large numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 11, no. 9, pp. 3292–3304, Sep. 2012.