Deep Learning-Assisted Parallel Interference Cancellation for Grant-Free NOMA in Machine-Type Communication

Yongjeong Oh, Jaehong Jo, Byonghyo Shim, and Yo-Seb Jeon

Yongjeong Oh, Jaehong Jo, and Yo-Seb Jeon are with the Department of Electrical Engineering, POSTECH, Pohang, Gyeongbuk 37673, Republic of Korea (e-mails: {yongjeongoh, jaehongjo, yoseb.jeon}@postech.ac.kr). Byonghyo Shim is with the Institute of New Media and Communications and the Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, Republic of Korea (e-mail: [email protected]).

Abstract

In this paper, we present a novel approach for joint activity detection (AD), channel estimation (CE), and data detection (DD) in uplink grant-free non-orthogonal multiple access (NOMA) systems. Our approach employs an iterative and parallel interference removal strategy inspired by parallel interference cancellation (PIC), enhanced with deep learning to jointly tackle the AD, CE, and DD problems. Based on this approach, we develop three PIC frameworks, each of which is designed for either the coherent or the non-coherent scheme. The first framework performs joint AD and CE using received pilot signals in the coherent scheme. Building upon this framework, the second framework utilizes both the received pilot and data signals for CE, further enhancing the performance of AD, CE, and DD in the coherent scheme. The third framework is designed to accommodate the non-coherent scheme involving a small number of data bits, and it performs AD and DD simultaneously. Through joint loss functions and interference cancellation modules, our approach supports end-to-end training, contributing to enhanced performance of AD, CE, and DD for both coherent and non-coherent schemes. Simulation results demonstrate the superiority of our approach over traditional techniques, exhibiting enhanced performance of AD, CE, and DD while maintaining lower computational complexity.

Index Terms:
Grant-free non-orthogonal multiple access, parallel interference cancellation, deep learning, end-to-end training, non-coherent scheme.

I Introduction

With the rapid growth of the Internet of Things (IoT), massive machine-type communication (mMTC) has been recognized as one of the core services for 5G and 6G [1, 2, 3, 4]. While there are a vast number of IoT devices in the network, only a fraction of them are activated at a given time, and they transmit small amounts of sensing, control, and command information. Traditional grant-based orthogonal multiple access will be clearly inefficient because it requires a complicated handshaking mechanism, even when a device transmits a small number of data bits [5, 6].

To overcome the problem, grant-free non-orthogonal multiple access (NOMA) has emerged as a promising solution for mMTC [7, 8, 6, 9, 10, 11, 12, 13]. Overall, grant-free NOMA systems can be classified into two classes: (i) the coherent scheme and (ii) the non-coherent scheme. In the coherent scheme, each active device transmits a unique spreading sequence as a pilot signal, followed by data signals, without engaging in a resource-intensive handshaking process [10]. Upon receiving the superimposed signal from the active devices, the base station (BS) identifies their activities and channels through activity detection (AD) and channel estimation (CE), leveraging the knowledge of their spreading sequences. After AD and CE, the BS then performs data detection (DD) based on the estimated activities and channels.

In the non-coherent scheme, each device is assigned a set of spreading sequences and, upon activation, transmits one of these sequences based on the data bits [14, 11, 12]. After receiving the superimposed signal, the BS performs the joint estimation of device activities and data bits by identifying the transmitted spreading sequences. Often, the choice between the coherent and non-coherent schemes depends on the number of data bits [11]. In particular, in mMTC scenarios where the number of data bits is small, the non-coherent scheme is preferred. In contrast, when the number of data bits is large, the coherent scheme is commonly used because the non-coherent scheme suffers from high computational complexity.

The aforementioned grant-free NOMA schemes do not require an explicit handshaking process, which leads to a significant reduction in the communication latency. Furthermore, by sharing the same time-frequency resources among multiple devices, better utilization of communication resources can be achieved. Despite these advantages, they also face several problems. For example, since the IoT devices outnumber the communication resource units, it is very difficult to assign orthogonal spreading sequences to each device, which causes collisions among devices and subsequent performance degradation [10, 15].

I-A Prior Works

In recent years, to address this challenge, a compressed sensing (CS) approach has been popularly used for AD and CE in the coherent scheme [16, 17, 18, 19] and for AD and DD in the non-coherent scheme [14, 11, 12]. This approach exploits the sparse activity patterns exhibited by numerous devices. It also provides performance guarantees for AD, CE, and DD in an asymptotic regime where the number of devices and the length of spreading sequences approach infinity while maintaining a fixed ratio between them [11]. Despite these advantages, CS-based techniques have certain limitations and practical issues to address. First, since the length of the spreading sequence is finite due to the limited coherence time, the performance guarantees obtained in the asymptotic regime cannot be applied. This challenge becomes particularly noticeable in critical or low-latency mMTC scenarios, where a short spreading sequence is essential to satisfy the latency requirements [20, 21]. Second, the restricted isometry property (RIP), which is a fundamental requirement for ensuring the accuracy of CS recovery techniques, might be violated when the number of active devices is large [22]. These inherent limitations of the CS approach can result in significant performance degradation for AD, CE, and DD, thereby constraining its broader applicability to diverse grant-free NOMA scenarios.

Successive interference cancellation (SIC) has been recognized as a promising alternative to tackle AD, CE, and DD problems in grant-free NOMA [23, 24, 25]. The main advantage of SIC lies in its ability to iteratively eliminate interference, thereby enhancing the overall system performance. However, in the absence of accurate knowledge about activity and channel gain, determining the proper cancellation order becomes challenging. This difficulty may result in significant error propagation, leading to a degradation in overall performance [25]. Parallel interference cancellation (PIC) can address this problem because it does not require prior knowledge of the channel gain order [26, 27, 28]. Furthermore, the PIC approach can achieve lower latency than the SIC approach [28]. Despite these benefits, most PIC techniques to date do not fully account for certain unique characteristics of grant-free NOMA systems, such as non-orthogonal spreading sequences and sparse activity patterns. Neglecting these characteristics can potentially lead to error propagation, thereby limiting the performance gains offered by PIC. Moreover, to the best of our knowledge, none of the existing works have investigated a PIC approach for AD, CE, and DD in grant-free NOMA, despite its potential to address the limitations of the CS and SIC approaches.

I-B Contributions

In this paper, we propose a deep learning (DL)-assisted PIC approach for grant-free NOMA systems. Specifically, we develop three PIC frameworks based on the proposed approach: (i) the pilot-only PIC, (ii) the data-aided PIC, and (iii) the non-coherent PIC. The key idea of these frameworks is to eliminate inter-user interference in an iterative and parallel manner inspired by PIC, while jointly optimizing the AD, CE, and DD performances via end-to-end training facilitated by DL. Notably, our frameworks surpass existing techniques in both performance and computational efficiency without requiring prior knowledge of channel gain orders and device activities. The major contributions of this paper can be summarized as follows:

  • We present a DL-assisted pilot-only PIC framework for the coherent scheme in grant-free NOMA. The proposed framework comprises multiple stages, each incorporating three modules: CE, IC, and AD modules. Each module performs its specified task, while simultaneously collaborating to enhance the overall performance of the grant-free NOMA system. Furthermore, the proposed framework supports end-to-end training, which can improve overall performance by training complex relationships and dependencies between different modules.

  • We extend the pilot-only PIC framework into the data-aided PIC framework. The most prominent feature of this framework is that it utilizes the received data signals as well as the received pilot signals for CE. To this end, in the data-aided framework, both CE and DD modules are included as a part of end-to-end learning, while PIC is applied to both the received pilot and data signals. By incorporating additional information for the CE module, this framework not only improves CE performance but also enhances AD and DD performances compared to the pilot-only PIC framework.

  • We extend the pilot-only PIC framework into the non-coherent PIC framework. The distinctive feature of this framework is its ability to simultaneously perform AD and DD using a single unified module. By doing so, the non-coherent PIC framework effectively addresses the inherent inefficiency of coherent schemes, which necessitate the transmission of pilot signals even when only a small number of data bits is transmitted.

  • Through extensive numerical evaluation, we demonstrate the superior performance of our frameworks compared to traditional techniques. Our simulation results show that, in a scenario with 20 devices, a length-18 coherence interval, and 2 data bits, the proposed pilot-only PIC framework achieves more than a 1.25-fold decrease in both AD and DD errors and a 3.3 dB decrease in CE error compared to the approximate message passing (AMP) technique. Furthermore, in the same scenario, the proposed data-aided PIC framework achieves more than a twofold decrease in both AD and DD errors and a 6.9 dB decrease in CE error compared to the AMP technique. Moreover, the proposed pilot-only and data-aided PIC frameworks achieve these improvements with lower computational complexity. The effectiveness of the proposed non-coherent PIC framework is also verified when the number of data bits is small. The computational complexities of the proposed frameworks are also analyzed in terms of simulation time.

Organization

The remainder of the paper is organized as follows. In Section II, we first introduce an uplink grant-free NOMA system with coherent and non-coherent schemes. We then summarize the key challenges inherent in this system. In Section III, we present the proposed pilot-only PIC framework that tackles the key challenges in grant-free NOMA. In Section IV, we discuss the extensions of the pilot-only PIC framework, focusing on the data-aided PIC and non-coherent PIC frameworks. In Section V, we provide simulation results that demonstrate the superiority of the proposed frameworks. Finally, in Section VI, we present our conclusions and future research directions.

Notation

Upper-case and lower-case boldface letters denote matrices and column vectors, respectively. $\mathbb{E}[\cdot]$ is the statistical expectation and $\mathbb{P}(\cdot)$ is the probability. ${\sf Re}\{\cdot\}$ and ${\sf Im}\{\cdot\}$ denote real and imaginary components, respectively. $\|{\bm{a}}\|_{2}$ and $\|{\bm{a}}\|_{1}$ are the Euclidean norm and Manhattan norm of a real vector ${\bm{a}}$, respectively. $\lfloor\cdot\rfloor$ is the floor function. ${\bm{0}}_{n}$ is the length-$n$ vector with all entries equal to zero. ${\bm{I}}_{N}$ is an $N$ by $N$ identity matrix.

II System Model

In this section, we first describe the uplink grant-free NOMA system with two communication schemes, and then discuss key challenges behind the practical deployment of these systems.

Figure 1: Illustration of the uplink grant-free NOMA system considered in our work.

II-A Uplink Grant-Free NOMA System

We consider an uplink grant-free NOMA system comprising $K$ devices and one BS, as illustrated in Fig. 1. Without loss of generality, we assume that both the devices and the BS are equipped with a single antenna. The channel between device $k$ and the BS is modeled as follows [11, 12]:

$$g_{k}=\sqrt{\beta_{k}}h_{k}, \tag{1}$$

where $\beta_{k}$ is a large-scale fading coefficient and $h_{k}\sim\mathcal{CN}(0,1)$ is a small-scale fading channel. The channel $g_{k}$ remains constant during the transmission of $\tau$ samples. Device $k$ becomes active with a probability of $\epsilon$ and aims to transmit $J$ data bits $[b_{1},b_{2},\ldots,b_{J}]^{\sf T}\in\{0,1\}^{J}$ to the BS. Under this system, two communication schemes can be considered [10, 11]: (i) the coherent scheme and (ii) the non-coherent scheme. The overall diagram of these two schemes is illustrated in Fig. 2, with detailed explanations provided below.

Figure 2: Overall diagram of two communication schemes in grant-free NOMA: (i) coherent scheme and (ii) non-coherent scheme.
  • Coherent scheme: In this scheme, each device $k$ has a unique spreading sequence ${\bm{s}}_{k}\in\mathbb{C}^{L}$, known to the BS. All spreading sequences have unit power, i.e., $\|{\bm{s}}_{k}\|_{2}=1,\forall k$, and their lengths are smaller than the number of devices, i.e., $L<K$. When a certain device $k$ is active, it modulates $J$ data bits to $D$ symbols. The $d$th symbol is represented as $x_{k,d}\in\mathcal{C}$, where $\mathcal{C}$ denotes a constellation set, and $d\in\{1,\ldots,D\}$. Subsequently, active device $k$ spreads the $d$th data symbol $x_{k,d}$ using the spreading sequence, generating ${\bm{s}}_{k}x_{k,d}$. During the coherent transmission, each active device $k$ first transmits its spreading sequence ${\bm{s}}_{k}$ as a pilot signal for AD and CE, and then transmits the $D$ sequences $\{{\bm{s}}_{k}x_{k,d}\}_{\forall d}$ as data signals to the BS. Under this transmission strategy, the sequence length $L$ should be less than $\tau/(D+1)$ due to the limited channel coherence time. The received pilot and data signals at the BS are given by

    $${\bm{y}}_{\rm p}=\sum_{k=1}^{K}\rho{\bm{s}}_{k}a_{k}g_{k}+{\bm{n}}_{\rm p}=\sum_{k=1}^{K}{\bm{s}}_{k}\gamma_{k}+{\bm{n}}_{\rm p}, \tag{2}$$

    and

    $${\bm{y}}_{{\rm d},d}=\sum_{k=1}^{K}\rho{\bm{s}}_{k}a_{k}g_{k}x_{k,d}+{\bm{n}}_{{\rm d},d}=\sum_{k=1}^{K}{\bm{s}}_{k}\gamma_{k}x_{k,d}+{\bm{n}}_{{\rm d},d}, \tag{3}$$

    respectively, where $\rho$ is the transmission power of the devices, $a_{k}\in\{0,1\}$ is the device activity indicator with $\mathbb{P}[a_{k}=1]=\epsilon$ and $\mathbb{P}[a_{k}=0]=1-\epsilon$, $\gamma_{k}=\rho a_{k}g_{k}$ is the effective channel of device $k$, and ${\bm{n}}_{\rm p}$ and ${\bm{n}}_{{\rm d},d}$ represent the additive white Gaussian noise (AWGN) at the BS. After receiving the superimposed signals in (2) and (3), the BS estimates the device activity indicators and the effective channels, subsequently detecting the data symbols $\{x_{k,d}\}_{\forall k,d}$.

  • Non-coherent scheme: In this scheme, each device $k$ has a set of spreading sequences $\mathcal{S}_{k}=\{{\bm{s}}_{k,1},{\bm{s}}_{k,2},\ldots,{\bm{s}}_{k,2^{J}}\}$, where each ${\bm{s}}_{k,j}\in\mathbb{C}^{L}$ is known to the BS and has unit power, i.e., $\|{\bm{s}}_{k,j}\|_{2}=1,\forall k,j$. When a device $k$ is active, it chooses one of these sequences based on the $J$ data bits. Then, the active device $k$ transmits this selected spreading sequence to the BS. Under the non-coherent transmission strategy, the sequence length $L$ should be less than $\tau$. The received signal at the BS is given by

    $$\begin{aligned}{\bm{y}}_{\rm nc}&=\sum_{k=1}^{K}\sum_{j=1}^{2^{J}}\rho{\bm{s}}_{k,j}a_{k,j}g_{k}+{\bm{n}}_{\rm nc}\\&=\sum_{k=1}^{K}\sum_{j=1}^{2^{J}}{\bm{s}}_{k,j}\gamma_{k,j}+{\bm{n}}_{\rm nc},\end{aligned} \tag{4}$$

    where ${\bm{n}}_{\rm nc}$ is the AWGN at the BS, and $a_{k,j}\in\{0,1\}$ indicates whether or not the $j$th sequence of device $k$ is transmitted and satisfies

    $$\sum_{j=1}^{2^{J}}a_{k,j}=\begin{cases}1,&\text{with probability }\epsilon,\\0,&\text{with probability }1-\epsilon.\end{cases} \tag{5}$$

    In this scheme, we use the notation $a_{k}=\sum_{j=1}^{2^{J}}a_{k,j}\in\{0,1\}$ to represent the device activity indicator, which aligns with the terminology used in the coherent scheme. After receiving the superimposed signal in (4), the BS estimates $\{a_{k,j}\}_{\forall k,j}$, and demodulates them to obtain data bits.

In the non-coherent scheme, the BS should take into account $2^{J}K$ sequences, which is far more than the $K$ sequences in the coherent scheme. Consequently, the non-coherent scheme exhibits high computational complexity, particularly when the number $J$ of data bits is large. Despite this computational overhead, the non-coherent scheme offers superior performance in scenarios where $J$ is small, as reported in [11].
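The two signal models in (2) and (4) can be sketched numerically as follows. This is only an illustrative simulation under assumptions that are not stated in the text: the spreading sequences are drawn as normalized complex Gaussian vectors, the noise variance `noise_var` is a free parameter, and the data bits are taken to be uniform so that the transmitted sequence index is uniform. The helper names are hypothetical.

```python
import numpy as np

def cn(rng, *shape):
    """Draw i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def unit_norm_sequences(rng, L, *dims):
    """Random complex sequences of length L (assumed design), each scaled to unit norm."""
    S = cn(rng, L, *dims)
    return S / np.linalg.norm(S, axis=0, keepdims=True)

def coherent_pilot(rng, S, eps, rho, beta, noise_var):
    """Eq. (2): y_p = sum_k s_k gamma_k + n_p with gamma_k = rho a_k g_k. S is (L, K)."""
    L, K = S.shape
    a = (rng.random(K) < eps).astype(int)       # activity indicators a_k
    g = np.sqrt(beta) * cn(rng, K)              # eq. (1)
    gamma = rho * a * g                         # effective channels
    y = S @ gamma + np.sqrt(noise_var) * cn(rng, L)
    return y, a, gamma

def noncoherent_signal(rng, S_nc, eps, rho, beta, noise_var):
    """Eq. (4): S_nc is (L, K, 2**J); each active device sends exactly one sequence."""
    L, K, M = S_nc.shape
    active = rng.random(K) < eps
    idx = rng.integers(0, M, size=K)            # sequence index selected by the J bits
    A = np.zeros((K, M), dtype=int)
    A[active, idx[active]] = 1                  # a_{k,j}; each row sums to 0 or 1, eq. (5)
    g = np.sqrt(beta) * cn(rng, K)
    gamma = rho * A * g[:, None]                # gamma_{k,j}
    y = np.einsum('lkm,km->l', S_nc, gamma) + np.sqrt(noise_var) * cn(rng, L)
    return y, A, gamma
```

Note that the non-coherent receiver has to consider all `K * 2**J` columns of `S_nc`, which is the source of the complexity growth in $J$ discussed above.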

Let $\hat{a}_{k}$, $\hat{\gamma}_{k}$, and $\hat{b}_{k,j}$ be the estimated activity indicator, effective channel, and $j$th data bit of device $k$, respectively. Then, the performance of AD is evaluated using the false alarm probability $P_{\rm fa}=\mathbb{P}[\hat{a}_{k}=1|a_{k}=0]$ and the miss detection probability $P_{\rm md}=\mathbb{P}[\hat{a}_{k}=0|a_{k}=1]$ [29]. The average error probability is given by

$$P_{\rm err}=(1-\epsilon)P_{\rm fa}+\epsilon P_{\rm md}. \tag{6}$$

If $P_{\rm fa}$ and $P_{\rm md}$ are equal, it follows directly that $P_{\rm err}=P_{\rm fa}=P_{\rm md}$. The performance of CE is evaluated by the normalized mean-squared error (NMSE) defined as

$${\rm NMSE}=\mathbb{E}\left[\frac{\sum_{k=1}^{K}|\hat{\gamma}_{k}-\gamma_{k}|^{2}}{\sum_{k=1}^{K}|\gamma_{k}|^{2}}\right]. \tag{7}$$

The performance of DD is assessed using the bit error rate (BER) defined as

$$\begin{aligned}{\rm BER}=1-\big(&\mathbb{P}[\hat{b}_{k,j}=0,\hat{a}_{k}=1|b_{k,j}=0,a_{k}=1]\\+\,&\mathbb{P}[\hat{b}_{k,j}=1,\hat{a}_{k}=1|b_{k,j}=1,a_{k}=1]\\+\,&\mathbb{P}[\hat{a}_{k}=0|a_{k}=0]\big),\end{aligned} \tag{8}$$

where, if $a_{k}=0$ and $\hat{a}_{k}=1$, $J$ bit errors are considered to occur for device $k$, and vice versa.
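As a rough illustration of how (6) and (7) might be evaluated empirically, the sketch below replaces the probabilities and the expectation by sample averages. The function names are hypothetical, and the NMSE helper assumes that every realization contains at least one active device so that the denominator is nonzero.

```python
import numpy as np

def ad_error(a_true, a_hat, eps):
    """Empirical version of eq. (6): P_err = (1 - eps) * P_fa + eps * P_md."""
    a_true = np.asarray(a_true).astype(bool)
    a_hat = np.asarray(a_hat).astype(bool)
    inactive, active = ~a_true, a_true
    p_fa = a_hat[inactive].mean() if inactive.any() else 0.0   # false alarms among inactive devices
    p_md = (~a_hat[active]).mean() if active.any() else 0.0    # misses among active devices
    return (1 - eps) * p_fa + eps * p_md, p_fa, p_md

def nmse(gamma_true, gamma_hat):
    """Empirical version of eq. (7); one row per channel realization."""
    gamma_true = np.atleast_2d(gamma_true)
    gamma_hat = np.atleast_2d(gamma_hat)
    num = np.sum(np.abs(gamma_hat - gamma_true) ** 2, axis=1)
    den = np.sum(np.abs(gamma_true) ** 2, axis=1)              # assumed nonzero
    return float(np.mean(num / den))
```

For reporting in decibels, `10 * np.log10(nmse(...))` converts the ratio to dB.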

II-B Key Challenges in Grant-Free NOMA

A major challenge in realizing the grant-free NOMA system in Sec. II-A is the limited length of the spreading sequence. This limitation is inevitable due to the finite channel coherence time and scarce radio resources available for data transmission. This challenge becomes even more crucial in scenarios involving critical or low-latency mMTC, where a small length of spreading sequences is necessary to meet stringent latency requirements. Due to this inherent limitation in spreading sequence length, it is impossible to guarantee the optimality of existing AD, CE, and DD techniques based on the CS approach. Another key challenge comes from the lack¹ of knowledge of channel gain orders. This can lead to significant error propagation of the SIC approach for AD, CE, and DD techniques. Although existing PIC-based CE techniques do not require prior knowledge about channel gain orders, they can still suffer from significant error propagation in grant-free NOMA due to non-orthogonal spreading sequences and the absence of accurate knowledge about device activities. To establish grant-free NOMA as a viable solution, it is crucial to devise a comprehensive framework overcoming the aforementioned challenges without compromising the performance of AD, CE, and DD.

¹ Typically, in grant-free NOMA, attaining the above information may require additional communication overhead between the BS and each device, which can lead to increased resource consumption and potential delays in the communication process.

Figure 3: Illustration of the pilot-only PIC framework with $T$ stages and $K$ AD modules.

III Proposed Pilot-Only PIC Framework

In this section, we present a pilot-only PIC framework based on DL, which enables the accurate and efficient joint estimation of device activities and their associated channels. Our key strategy is to eliminate inter-user interference from the received pilot signal ${\bm{y}}_{\rm p}$ in an iterative and parallel manner. This strategy also involves addressing the CE problem with the consideration of sparse device activity patterns in grant-free NOMA, while tackling the AD problem. In what follows, we first introduce the overall structure of the proposed framework and then present details of the training procedure.

III-A Overall Structure

The proposed pilot-only PIC framework includes $T$ stages, each consisting of $K$ CE modules and $K$ IC modules. Following these stages, we integrate $K$ AD modules that estimate the activity of each device in a parallel manner. The high-level illustration of the proposed framework is presented in Fig. 3, while detailed explanations of each component of our framework are provided below.

III-A1 Input and Output

The proposed framework aims to accurately estimate both the effective channel $\gamma_{k}$ and the device activity indicator $a_{k}$ from the received pilot signal ${\bm{y}}_{\rm p}$. To achieve this goal, we apply two preprocessing steps to $\gamma_{k}$ and ${\bm{y}}_{\rm p}$: (i) real-domain conversion and (ii) standardization. In the real-domain conversion step, we transform the effective channel $\gamma_{k}$ and the received signal ${\bm{y}}_{\rm p}$, originally represented in the complex domain, into the real domain: $\bar{\bm{\gamma}}_{k}=[\Re(\gamma_{k}),\Im(\gamma_{k})]^{\sf T}\in\mathbb{R}^{2}$ and $\bar{\bm{y}}_{\rm p}=[\Re({\bm{y}}_{\rm p}^{\sf T}),\Im({\bm{y}}_{\rm p}^{\sf T})]^{\sf T}\in\mathbb{R}^{2L}$. This conversion allows us to establish a relationship between $\bar{\bm{y}}_{\rm p}$ and $\{\bar{\bm{\gamma}}_{k}\}_{\forall k}$, as follows:

$$\bar{\bm{y}}_{\rm p}=\sum_{k=1}^{K}\begin{bmatrix}\Re({\bm{s}}_{k})&-\Im({\bm{s}}_{k})\\ \Im({\bm{s}}_{k})&\Re({\bm{s}}_{k})\end{bmatrix}\bar{\bm{\gamma}}_{k}+\begin{bmatrix}\Re({\bm{n}}_{\rm p})\\ \Im({\bm{n}}_{\rm p})\end{bmatrix}. \tag{9}$$

The next step involves standardization, a crucial process for improving the performance of the deep neural network (DNN). We first compute the standard deviations $\sigma_{\bar{\gamma}}$ and $\sigma_{\bar{y}_{\rm p}}$ for the entries of $\{\bar{\bm{\gamma}}_{k}\}_{\forall k}$ and $\bar{\bm{y}}_{\rm p}$, respectively, utilizing the training dataset stored at the BS. After obtaining the standard deviations, we standardize the input data as follows: $\tilde{\bm{\gamma}}_{k}=\bar{\bm{\gamma}}_{k}/\sigma_{\bar{\gamma}},\forall k$ and $\tilde{\bm{y}}_{\rm p}^{(1)}=\bar{\bm{y}}_{\rm p}/\sigma_{\bar{y}_{\rm p}}$. This process yields the standardized received signal given by

$$\tilde{\bm{y}}_{\rm p}^{(1)}=\sum_{k=1}^{K}\begin{bmatrix}\Re(\tilde{\bm{s}}_{k})&-\Im(\tilde{\bm{s}}_{k})\\ \Im(\tilde{\bm{s}}_{k})&\Re(\tilde{\bm{s}}_{k})\end{bmatrix}\tilde{\bm{\gamma}}_{k}+\frac{1}{\sigma_{\bar{y}_{\rm p}}}\begin{bmatrix}\Re({\bm{n}}_{\rm p})\\ \Im({\bm{n}}_{\rm p})\end{bmatrix}, \tag{10}$$

where $\tilde{\bm{s}}_{k}=(\sigma_{\bar{\gamma}}/\sigma_{\bar{y}_{\rm p}}){\bm{s}}_{k},\forall k$. Building upon the relationship in (10), our DNN takes $\tilde{\bm{y}}_{\rm p}^{(1)}$ as input and aims to predict both $\{\tilde{\bm{\gamma}}_{k}\}_{\forall k}$ and $\{a_{k}\}_{\forall k}$ with high accuracy.
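The real-domain relationships in (9) and (10) can be checked numerically with a short sketch. The example below is noise-free, uses randomly drawn sequences and channels, and uses placeholder values for the two standard deviations (in the text they are computed from the training dataset); all names are illustrative.

```python
import numpy as np

def to_real(v):
    """Stack real and imaginary parts: [Re(v)^T, Im(v)^T]^T."""
    v = np.atleast_1d(v)
    return np.concatenate([v.real, v.imag])

def real_block(s):
    """Real 2L x 2 matrix [[Re s, -Im s], [Im s, Re s]] appearing in eq. (9)."""
    re, im = s.real[:, None], s.imag[:, None]
    return np.block([[re, -im], [im, re]])

rng = np.random.default_rng(0)
L, K = 6, 4
S = rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))
S /= np.linalg.norm(S, axis=0)                       # unit-norm sequences
gamma = rng.standard_normal(K) + 1j * rng.standard_normal(K)
y = S @ gamma                                        # noise-free version of eq. (2)

# Eq. (9): real-domain model
y_bar = to_real(y)
gamma_bar = np.stack([gamma.real, gamma.imag], axis=1)        # shape (K, 2)
y_bar_model = sum(real_block(S[:, k]) @ gamma_bar[k] for k in range(K))

# Eq. (10): standardization with placeholder deviations (assumed values)
sigma_gamma, sigma_y = 0.7, 1.3
gamma_tilde = gamma_bar / sigma_gamma
y_tilde = y_bar / sigma_y
S_tilde = (sigma_gamma / sigma_y) * S
y_tilde_model = sum(real_block(S_tilde[:, k]) @ gamma_tilde[k] for k in range(K))

print(np.allclose(y_bar, y_bar_model), np.allclose(y_tilde, y_tilde_model))
```

The scaling of the sequences by $\sigma_{\bar{\gamma}}/\sigma_{\bar{y}_{\rm p}}$ is what keeps the linear model intact after both the channels and the received signal are standardized.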

III-A2 Stage ($K$ CE modules and $K$ IC modules)

The core process of conventional PIC-based CE techniques is to eliminate inter-user interference from the received signal in an iterative and parallel manner [28, 27, 26]. This approach offers a distinct advantage over SIC-based CE techniques, as it does not require prior knowledge of channel gain orders. However, conventional PIC techniques are not designed for grant-free NOMA, and their use in grant-free NOMA can introduce significant error propagation due to non-orthogonal spreading sequences and the absence of accurate knowledge about device activities. To overcome this problem and fully harness the advantages of PIC, we combine DL with PIC. This integrated approach enables us to solve complex and intractable problems in grant-free NOMA, including AD, CE, and DD, in a data-driven manner. Furthermore, we can improve both performance and computational efficiency by integrating communication knowledge related to PIC into DL, rather than treating DL as a black box. Since our framework consists of a sequence of $T$ identically structured stages, the remainder of this part focuses on the operations in stage $t\in\{1,\ldots,T\}$.

Let ${\bm{\theta}}_{\rm CE}^{(t,k)}$ be the parameter vector of the $k$th CE module in stage $t$. The mapping function of the CE module is defined as

$$\hat{\bm{\gamma}}_{k}^{(t)}=f_{{\bm{\theta}}_{\rm CE}^{(t,k)}}(\tilde{\bm{y}}_{{\rm p},k}^{(t)}). \tag{11}$$

Here, $\tilde{\bm{y}}_{{\rm p},k}^{(t)}$ represents an output of the IC module, defined as

$$\tilde{\bm{y}}_{{\rm p},k}^{(t)}=\tilde{\bm{y}}_{\rm p}^{(1)}-\sum_{i=1,i\neq k}^{K}\begin{bmatrix}\Re(\tilde{\bm{s}}_{i})&-\Im(\tilde{\bm{s}}_{i})\\ \Im(\tilde{\bm{s}}_{i})&\Re(\tilde{\bm{s}}_{i})\end{bmatrix}\hat{\bm{\gamma}}_{i}^{(t-1)}, \tag{12}$$

where $\hat{\bm{\gamma}}_{i}^{(0)}={\bm{0}}_{2},\forall i\in\{1,\ldots,K\}$. In our IC module, the interference reconstructed from the channels estimated in the previous stage is removed from the superimposed received signal in (10). This interference removal process is carried out in parallel, similar to conventional PIC, and is essential for improving CE accuracy. It is noteworthy that all CE modules in our framework allow back-propagation of the gradients and, therefore, can be jointly trained in an end-to-end manner.
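The parallel interference removal in (12) can be computed for all $K$ devices at once by reconstructing every device's contribution, subtracting the total, and adding each device's own contribution back. The NumPy sketch below shows only this IC step; the CE modules in (11) are DNNs and are not modeled here. The array layout and the function name are assumptions made for illustration.

```python
import numpy as np

def ic_module(y_tilde, B, gamma_prev):
    """
    Eq. (12) for all devices at once.
    y_tilde:    (2L,)      standardized received pilot signal
    B:          (K, 2L, 2) real-valued matrices built from the scaled sequences
    gamma_prev: (K, 2)     channel estimates from the previous stage (zeros before stage 1)
    Returns (K, 2L); row k has the estimated interference of every device i != k removed.
    """
    contrib = np.einsum('kij,kj->ki', B, gamma_prev)   # reconstructed signal of each device
    return y_tilde - contrib.sum(axis=0) + contrib     # subtract everyone, add own part back
```

With `gamma_prev` set to zeros, every row equals `y_tilde`, which matches the initialization $\hat{\bm{\gamma}}_{i}^{(0)}={\bm{0}}_{2}$ in the first stage. Because the step consists only of linear operations, gradients can pass through it when it is written in an automatic-differentiation framework.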

III-A3 AD Module

In the AD module, we employ a DNN, which is individually parameterized by ${\bm{\theta}}_{\rm AD}^{(k)}$ for each device $k$. Each AD module, associated with device $k$, estimates the device activity indicator $a_{k}$ from the received signal after interference removal processes across $T$ stages. This design facilitates efficient AD without the need for an excessively complex or over-parameterized model.

For the final layer of all $K$ AD modules, we utilize the sigmoid activation function. This choice makes our training process smoother as we use true device activity indicators represented by binary labels. The mapping function for each AD module is represented as follows:

$$p_{k}=f_{{\bm{\theta}}_{\rm AD}^{(k)}}(\tilde{\bm{y}}_{{\rm p},k}^{(T+1)}), \tag{13}$$

where $p_{k}$ is the activity score within the range of $(0,1)$ and $\tilde{\bm{y}}_{{\rm p},k}^{(T+1)}$ is the output from the last stage of the proposed framework. It should be noted that $\tilde{\bm{y}}_{{\rm p},k}^{(T+1)}$ represents the received signal after interference removal processes during the previous $T$ stages. This fact implies that the device activity can be more accurately estimated from $\tilde{\bm{y}}_{{\rm p},k}^{(T+1)}$ than from the original received signal $\tilde{\bm{y}}_{\rm p}^{(1)}$ without interference removal.

III-A4 Final Decision

In the grant-free NOMA system, the activity indicator should be either zero or one. To ensure this, we threshold $p_{k}$ to generate the estimated activity indicator $\hat{a}_{k}$ for device $k$. That is,

$$\hat{a}_{k}=\begin{cases}1,&\text{if }p_{k}\geq\tau,\\0,&\text{otherwise},\end{cases} \tag{14}$$

where $\tau$ is a pre-defined threshold that balances the trade-off between the false-alarm and miss-detection probabilities. Note that we treat $\tau$ as a hyper-parameter to provide flexibility across various grant-free NOMA scenarios. Meanwhile, the estimated channel should take on zero value when the device is determined to be inactive. To ensure this, we set an estimate of the channel based on the estimated activity indicator in (14) and the standardization in (10) as follows:

$$\hat{\gamma}_{k}=\hat{a}_{k}\sigma_{\bar{\gamma}}\Big(\hat{\gamma}_{k,1}^{(T)}+j\hat{\gamma}_{k,2}^{(T)}\Big), \tag{15}$$

where $\hat{\gamma}_{k,i}^{(T)}$ is the $i$th component of $\hat{\bm{\gamma}}_{k}^{(T)}$. After obtaining the estimated effective channel in (15), various detection methods, including minimum mean squared error, SIC, or maximal-ratio combining (MRC), can be employed to detect the symbols $\{x_{k,d}\}_{\forall k,d}$. In this work, we focus on the MRC [11] for simplicity:

$$\begin{aligned}\bar{{\bm{y}}}_{{\rm d},d,k}&=\frac{{\bm{s}}_{k}^{\sf H}}{\hat{\gamma}_{k}}{\bm{y}}_{{\rm d},d}\\&=\frac{\gamma_{k}}{\hat{\gamma}_{k}}x_{k,d}+\frac{1}{\hat{\gamma}_{k}}\sum_{i=1,i\neq k}^{K}{\bm{s}}_{k}^{\sf H}{\bm{s}}_{i}\gamma_{i}x_{i,d}+\frac{{\bm{s}}_{k}^{\sf H}}{\hat{\gamma}_{k}}{\bm{n}}_{{\rm d},d}.\end{aligned} \tag{16}$$

Note that MRC is applied to the devices with $\hat{a}_{k}=1$. Subsequently, $\hat{x}_{k,d}$ is determined from $\bar{{\bm{y}}}_{{\rm d},d,k}$ using the maximum likelihood detector.
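The decision steps in (14) to (16) can be sketched as follows. This is a simplified illustration rather than the exact detector of the paper: the symbol decision uses a nearest-constellation-point rule as a stand-in for the maximum likelihood detector, and the default threshold value 0.5 and all function names are assumptions.

```python
import numpy as np

def final_decision(p, gamma_hat_T, sigma_gamma, thr=0.5):
    """Eqs. (14)-(15): threshold the activity scores and undo the standardization."""
    a_hat = (p >= thr).astype(int)
    gamma_hat = a_hat * sigma_gamma * (gamma_hat_T[:, 0] + 1j * gamma_hat_T[:, 1])
    return a_hat, gamma_hat

def mrc_detect(S, gamma_hat, a_hat, y_d, constellation):
    """Eq. (16) followed by a nearest-point decision, only for devices with a_hat = 1."""
    K = S.shape[1]
    x_hat = np.zeros(K, dtype=complex)          # entries of inactive devices stay zero
    for k in np.flatnonzero(a_hat):
        z = (S[:, k].conj() @ y_d) / gamma_hat[k]
        x_hat[k] = constellation[np.argmin(np.abs(constellation - z))]
    return x_hat

# Toy usage with orthogonal sequences, a perfect channel estimate, and no noise.
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
S = np.eye(4, dtype=complex)[:, :2]             # L = 4, K = 2
gamma = np.array([1 + 1j, 0])                   # device 1 active, device 2 inactive
y_d = S @ (gamma * np.array([qpsk[2], qpsk[0]]))
a_hat, gamma_hat = final_decision(np.array([0.9, 0.1]),
                                  np.array([[1.0, 1.0], [0.3, 0.2]]), 1.0)
x_hat = mrc_detect(S, gamma_hat, a_hat, y_d, qpsk)
```

Restricting the loop to devices with $\hat{a}_{k}=1$ also avoids the division by $\hat{\gamma}_{k}=0$ that (15) would otherwise produce for inactive devices.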

III-B Training Procedure

In grant-free NOMA, the CE and AD problems are categorized as regression and classification problems, respectively. These two tasks are closely related, as both rely on the same received signal and have dependencies stemming from activity indicators. To accurately estimate both $\{\tilde{\bm{\gamma}}_{k}\}_{\forall k}$ and $\{a_{k}\}_{\forall k}$, we employ a joint loss function that can optimize the performance of both regression and classification tasks.

First, we use the following regression loss function to measure the discrepancy between the estimated and true channels:

$$\mathcal{L}_{\rm Reg}=\frac{1}{2K}\sum_{t=1}^{T}w_{t}\sum_{k=1}^{K}\|\hat{\bm{\gamma}}_{k}^{(t)}-\tilde{\bm{\gamma}}_{k}\|_{1}, \tag{17}$$

where $w_{t}$ is the weight associated with stage $t$. In the regression loss in (17), we leverage the mean absolute error (MAE) instead of the more commonly used mean squared error (MSE) in regression tasks [30, 31]. Our motivation behind this choice comes from the sparse characteristic of effective channels $\{\tilde{\bm{\gamma}}_{k}\}_{\forall k}$ in grant-free NOMA systems. Specifically, this characteristic implies that while the majority of devices exhibit zero channel gains, a small subset has much higher channel gains, resulting in outliers within the dataset. Traditionally, the MSE loss serves as a prevalent metric used to evaluate the performance of CE [30, 31]. However, in the presence of outliers in effective channels, the MSE loss may prioritize minimizing the errors for these outliers while potentially neglecting the majority of the data [32]. In contrast, MAE is known to be robust to outliers [32], aligning well with the characteristics of grant-free NOMA scenarios. Therefore, our decision to use MAE enhances the overall performance and robustness of our framework while mitigating error propagation.
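The different sensitivity of MSE and MAE to a few large values can be seen in a toy example that is independent of the proposed framework: fitting a single constant to ten targets, nine of which are zero (inactive devices) and one of which is large (an active device). The numbers are arbitrary and chosen only for illustration.

```python
import numpy as np

# Toy "sparse" targets: nine inactive devices (zero) and one active device (large value).
targets = np.array([0.0] * 9 + [10.0])

# Fit one constant c to all targets under each loss via a grid search.
grid = np.linspace(0.0, 10.0, 1001)
mse = ((targets[None, :] - grid[:, None]) ** 2).mean(axis=1)
mae = np.abs(targets[None, :] - grid[:, None]).mean(axis=1)

c_mse = grid[np.argmin(mse)]   # pulled toward the large value (the sample mean)
c_mae = grid[np.argmin(mae)]   # stays at the majority value (a sample median)
print(c_mse, c_mae)
```

The MSE-optimal constant is the sample mean, which is shifted by the single large target, whereas the MAE-optimal constant is a median and remains at zero, the value taken by the majority of the targets.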

Next, we present the classification loss function, which measures the difference between the activity scores and the true activity indicators:

$$\mathcal{L}_{\rm Class}=-\frac{1}{K}\sum_{k=1}^{K}\left[a_{k}\log p_{k}+(1-a_{k})\log(1-p_{k})\right]. \tag{18}$$

This loss function is commonly known as the binary cross-entropy (BCE) loss and is well suited to training with binary labels. By adopting the BCE loss, we can train the $K$ AD modules as well as the entire framework. Combining the loss functions in (17) and (18), we obtain the joint loss function:

$$\mathcal{L}=\lambda\mathcal{L}_{\rm Class}+(1-\lambda)\mathcal{L}_{\rm Reg}, \tag{19}$$

where $\lambda\in(0,1)$ is a hyper-parameter that sets the relative importance of AD performance (weighted by $\lambda$) and CE performance (weighted by $1-\lambda$).
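The loss functions in (17), (18), and (19) can be sketched in NumPy as follows. This is only an illustrative evaluation of the formulas, not the authors' training code; the array shapes, the function names, and the clipping constant `eps` are assumptions introduced here.

```python
import numpy as np

def regression_loss(gamma_hat, gamma_true, w):
    """MAE-based loss in (17).
    gamma_hat: (T, K, 2) per-stage estimates, gamma_true: (K, 2), w: (T,)."""
    K = gamma_true.shape[0]
    l1 = np.abs(gamma_hat - gamma_true[None]).sum(axis=2)  # (T, K) l1 norms
    return float((w[:, None] * l1).sum() / (2 * K))

def classification_loss(p, a, eps=1e-12):
    """BCE loss in (18). p: (K,) activity scores, a: (K,) binary labels."""
    p = np.clip(p, eps, 1 - eps)  # numerical guard (assumption, not part of (18))
    return float(-np.mean(a * np.log(p) + (1 - a) * np.log(1 - p)))

def joint_loss(gamma_hat, gamma_true, w, p, a, lam=0.5):
    """Joint loss in (19): lam weights the AD term, (1 - lam) the CE term."""
    return (lam * classification_loss(p, a)
            + (1 - lam) * regression_loss(gamma_hat, gamma_true, w))
```

Because the regression term sums over all stages $t$, every stage receives a direct training signal, whereas the classification term in (18) depends only on the activity scores.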

Our proposed framework supports end-to-end training through the IC module and the joint loss function in (19), enhancing performance by capturing complex inter-module relationships. It also mitigates the vanishing gradient problem by computing the MAE loss at every stage $t$, not just the final stage. Additionally, it trains the regression (CE) and classification (AD) tasks jointly, which not only streamlines the training pipeline but also reduces the size of the DNN compared to strategies that separate the two tasks.

IV Extension of the Pilot-Only PIC

In this section, we introduce two extensions of the pilot-only PIC framework. The first extension, referred to as data-aided PIC, is designed to extract the common channel information shared between the received pilot signal $\bm{y}_{\rm p}$ and the received data signals $\{\bm{y}_{{\rm d},d}\}_{\forall d}$. By jointly utilizing the received pilot and data signals, this framework enhances the overall performance in the coherent scheme. The second extension, referred to as non-coherent PIC, is designed to support the non-coherent scheme in grant-free NOMA scenarios, transmitting a small number of data bits without spreading sequences. Leveraging the benefits of the non-coherent scheme, this framework improves communication efficiency in grant-free NOMA, particularly when the number of data bits to be transmitted is small. In the remainder of this section, we first introduce the data-aided PIC framework and then present the non-coherent PIC framework.

Figure 4: Illustration of the $t$th stage in the data-aided PIC framework, consisting of $K$ CE modules, $K$ IC modules, and $K$ DD modules.

IV-A Data-Aided PIC

The proposed data-aided PIC framework includes $T$ uniform stages, each consisting of $K$ CE modules, $K$ IC modules, and $K$ DD modules. A notable distinction between the data-aided PIC framework and the pilot-only PIC framework lies in the inclusion of the DD module and the exclusion of the AD module. The DD module is responsible for estimating both the device activity indicator $a_{k}$ and the data symbols $\{x_{k,d}\}_{\forall d}$. Another notable distinction is the input of the CE module, which includes $\{\bm{y}_{{\rm d},d}\}_{\forall d}$ as well as $\bm{y}_{\rm p}$ to extract the common channel information between the received signals. A high-level illustration of the $t$th stage of the data-aided PIC framework is depicted in Fig. 4, with detailed explanations provided below.

IV-A1 Input and Output

The proposed data-aided PIC framework aims to accurately estimate the effective channels $\{\gamma_{k}\}_{\forall k}$, device activity indicators $\{a_{k}\}_{\forall k}$, and data symbols $\{x_{k,d}\}_{\forall k,d}$ from the received signals $\bm{y}_{\rm p}$ and $\{\bm{y}_{{\rm d},d}\}_{\forall d}$. To achieve this goal, real-domain conversion and standardization are applied to $\{\gamma_{k}\}_{\forall k}$, $\bm{y}_{\rm p}$, and $\{\bm{y}_{{\rm d},d}\}_{\forall d}$, as described in Sec. III-A. The resulting effective channels and received signals are denoted as $\{\tilde{\bm{\gamma}}_{k}\}_{\forall k}$, $\tilde{\bm{y}}_{\rm p}^{(1)}$, and $\{\tilde{\bm{y}}_{{\rm d},d}^{(1)}\}_{\forall d}$, respectively. Our DNN takes $\tilde{\bm{y}}_{\rm p}^{(1)}$ and $\{\tilde{\bm{y}}_{{\rm d},d}^{(1)}\}_{\forall d}$ as input and aims to predict $\{\tilde{\bm{\gamma}}_{k}\}_{\forall k}$, $\{a_{k}\}_{\forall k}$, and $\{x_{k,d}\}_{\forall k,d}$.

IV-A2 Stage ($K$ CE modules, $K$ IC modules, and $K$ DD modules)

In each stage of the data-aided PIC framework, the effective channel $\tilde{\bm{\gamma}}_{k}$ is estimated first, followed by the estimation of the data symbols $\{x_{k,d}\}_{\forall d}$ based on the estimated effective channel. After the channel and data symbols are estimated, the PIC is performed. Let $\bm{\theta}_{\rm CE}^{(t,k)}$ be the parameter vector of the $k$th CE module in stage $t$. The mapping function of the CE module is defined as

$$\hat{\bm{\gamma}}_{k}^{(t)}=f_{\bm{\theta}_{\rm CE}^{(t,k)}}\big(\tilde{\bm{y}}_{{\rm p},k}^{(t)},\{\tilde{\bm{y}}_{{\rm d},d,k}^{(t)}\}_{\forall d}\big), \tag{20}$$

where $\tilde{\bm{y}}_{{\rm p},k}^{(t)}$ and $\tilde{\bm{y}}_{{\rm d},d,k}^{(t)}$ are the outputs of the $k$th IC module in stage $(t-1)$, whose definitions will be provided later. Let $\bm{\theta}_{\rm DD}^{(t,k)}$ be the parameter vector of the $k$th DD module in stage $t$. The last layer of every DD module uses the sigmoid activation function. The mapping function of the DD module is represented as

$$\{\bm{p}_{{\rm d},d,k}^{(t)}\}_{\forall d}=f_{\bm{\theta}_{\rm DD}^{(t,k)}}\big(\{\tilde{\bm{y}}_{{\rm d},d,k}^{(t)}\}_{\forall d},\hat{\bm{\gamma}}_{k}^{(t)}\big), \tag{21}$$

where $\bm{p}_{{\rm d},d,k}^{(t)}=[p_{{\rm d},d,k,1}^{(t)},\ldots,p_{{\rm d},d,k,|\mathcal{C}|}^{(t)}]^{\sf T}\in(0,1)^{|\mathcal{C}|}$ and $p_{{\rm d},d,k,c}^{(t)}$ is the probability that $x_{k,d}$ is the $c$th symbol in the constellation $\mathcal{C}$. Given $\{\bm{p}_{{\rm d},d,k}^{(t)}\}_{\forall d}$ and $\hat{\bm{\gamma}}_{k}^{(t)}$, the IC module performs the following PIC processes:

$$\tilde{\bm{y}}_{{\rm p},k}^{(t+1)}=\tilde{\bm{y}}_{\rm p}^{(1)}-\sum_{i=1,i\neq k}^{K}\begin{bmatrix}\Re(\tilde{\bm{s}}_{i})&-\Im(\tilde{\bm{s}}_{i})\\ \Im(\tilde{\bm{s}}_{i})&\Re(\tilde{\bm{s}}_{i})\end{bmatrix}\hat{\bm{\gamma}}_{i}^{(t)}, \tag{22}$$

and

$$\tilde{\bm{y}}_{{\rm d},d,k}^{(t+1)}=\tilde{\bm{y}}_{{\rm d},d}^{(1)}-\sum_{i=1,i\neq k}^{K}\begin{bmatrix}\Re(\tilde{\bm{s}}_{i})&-\Im(\tilde{\bm{s}}_{i})\\ \Im(\tilde{\bm{s}}_{i})&\Re(\tilde{\bm{s}}_{i})\end{bmatrix}\begin{bmatrix}\hat{\gamma}_{i,1}^{(t)}&-\hat{\gamma}_{i,2}^{(t)}\\ \hat{\gamma}_{i,2}^{(t)}&\hat{\gamma}_{i,1}^{(t)}\end{bmatrix}\begin{bmatrix}\Re(\bm{x}_{\mathcal{C}}^{\sf T}\bm{p}_{{\rm d},d,k}^{(t)})\\ \Im(\bm{x}_{\mathcal{C}}^{\sf T}\bm{p}_{{\rm d},d,k}^{(t)})\end{bmatrix}, \tag{23}$$

where $\bm{x}_{\mathcal{C}}$ is a symbol vector comprising all symbols in $\mathcal{C}$. If the constellation set is binary phase-shift keying (BPSK), i.e., $\mathcal{C}=\{-1,1\}$, the PIC process in (23) can be simplified as follows:

$$\tilde{\bm{y}}_{{\rm d},d,k}^{(t+1)}=\tilde{\bm{y}}_{{\rm d},d}^{(1)}-\sum_{i=1,i\neq k}^{K}\begin{bmatrix}\Re(\tilde{\bm{s}}_{i})&-\Im(\tilde{\bm{s}}_{i})\\ \Im(\tilde{\bm{s}}_{i})&\Re(\tilde{\bm{s}}_{i})\end{bmatrix}\hat{\bm{\gamma}}_{i}^{(t)}\big(p_{{\rm d},d,k,1}^{(t)}-p_{{\rm d},d,k,2}^{(t)}\big), \tag{24}$$

where $p_{{\rm d},d,k,1}^{(t)}$ and $p_{{\rm d},d,k,2}^{(t)}$ are the first and second components of $\bm{p}_{{\rm d},d,k}^{(t)}$, representing $\mathbb{P}[x_{k,d}=1]$ and $\mathbb{P}[x_{k,d}=-1]$, respectively.
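The pilot-signal cancellation in (22) can be sketched as follows. The sketch treats each $\tilde{\bm{s}}_{i}$ as a given complex vector of length $L$ and each $\hat{\bm{\gamma}}_{i}^{(t)}$ as a real 2-vector of real and imaginary parts; the function names and array layouts are assumptions made for illustration. It forms the total reconstructed signal once and adds back each device's own contribution, which is equivalent to the sum over $i\neq k$.

```python
import numpy as np

def real_matrix(s):
    """Map a complex length-L vector s to the 2L x 2 real block
    [[Re(s), -Im(s)], [Im(s), Re(s)]] appearing in (22)."""
    return np.block([[s.real[:, None], -s.imag[:, None]],
                     [s.imag[:, None], s.real[:, None]]])

def pilot_ic(y_p, S, gamma_hat):
    """One PIC step for the pilot signal.
    y_p: (2L,) real-valued received pilot, S: (L, K) complex sequences,
    gamma_hat: (K, 2) current channel estimates. Returns a (K, 2L) array
    whose k-th row is the signal passed to the k-th modules at the next stage."""
    K = S.shape[1]
    # Reconstructed contribution of every device, shape (K, 2L).
    contrib = np.stack([real_matrix(S[:, i]) @ gamma_hat[i] for i in range(K)])
    total = contrib.sum(axis=0)
    # Subtract the contributions of all devices except device k.
    return y_p[None, :] - (total[None, :] - contrib)
```

With perfect channel estimates and no noise, row $k$ of the output contains only the contribution of device $k$, which is the ideal outcome of interference cancellation.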

It should be noted that, in the design of the CE module, various DNN structures, including a convolutional neural network, can be considered to extract the common channel information from the received pilot and data signals. In our framework, to reduce the overall complexity, we use a fully connected neural network (FCNN) whose input is structured as $[(\tilde{\bm{y}}_{{\rm p},k}^{(t)})^{\sf T},(\tilde{\bm{y}}_{{\rm d},1,k}^{(t)})^{\sf T},\ldots,(\tilde{\bm{y}}_{{\rm d},D,k}^{(t)})^{\sf T}]^{\sf T}$. Similarly, the DD module adopts an FCNN with its input structured as $[(\tilde{\bm{y}}_{{\rm d},1,k}^{(t)})^{\sf T},\ldots,(\tilde{\bm{y}}_{{\rm d},D,k}^{(t)})^{\sf T},(\hat{\bm{\gamma}}_{k}^{(t)})^{\sf T}]^{\sf T}$ and its output organized as $[(\bm{p}_{{\rm d},1,k}^{(t)})^{\sf T},\ldots,(\bm{p}_{{\rm d},D,k}^{(t)})^{\sf T}]^{\sf T}$.

IV-A3 Final Decision

The final procedure of the data-aided PIC framework is to estimate the device activity indicators $\{a_{k}\}_{\forall k}$ and the data bits $\{b_{k,d}\}_{\forall k,d}$. To accomplish this, we first compute the maximum value of $\bm{p}_{{\rm d},d,k}^{(T)}$ as

$$p_{{\rm max},d,k}=\max\Big\{p_{{\rm d},d,k,1}^{(T)},\ldots,p_{{\rm d},d,k,|\mathcal{C}|}^{(T)}\Big\}, \tag{25}$$

which is the maximum probability that a certain symbol in $\mathcal{C}$ is transmitted. Ideally, $p_{{\rm max},d,k}$ should be zero if the device does not transmit any symbol, i.e., the device is inactive. Our data-aided PIC framework can handle this case because the DD module with the sigmoid function can naturally produce low values of $\{p_{{\rm d},d,k,c}^{(T)}\}_{\forall c}$ if no symbol is transmitted. After obtaining $\{p_{{\rm max},d,k}\}_{\forall d}$, we compute their mean as

$$\mu_{k}=\frac{1}{D}\sum_{d=1}^{D}p_{{\rm max},d,k}\in(0,1), \tag{26}$$

which quantifies the probability of device $k$ being active and transmitting $D$ symbols. Subsequently, we determine an estimate of the activity indicator for device $k$ as

$$\hat{a}_{k}=\begin{cases}1,&\text{if }\mu_{k}\geq\tau,\\ 0,&\text{otherwise},\end{cases} \tag{27}$$

where $\tau$ is the predefined threshold discussed in Sec. III.

We note that the data-aided PIC framework does not explicitly use the AD module, a component of the pilot-only PIC framework in Sec. III. This choice not only reduces the complexity of the data-aided framework but also enhances its training efficiency compared to strategies that separate the AD and DD tasks. After obtaining $\hat{a}_{k}$, we finally determine an estimate of the symbol $x_{k,d}$ as

$$\hat{x}_{k,d}=\begin{cases}x_{\mathcal{C},c},&\text{if }\hat{a}_{k}=1\text{ and }p_{{\rm max},d,k}=p_{{\rm d},d,k,c}^{(T)},\\ 0,&\text{otherwise},\end{cases} \tag{28}$$

where $x_{\mathcal{C},c}$ is the $c$th component of $\bm{x}_{\mathcal{C}}$. Here, the second case in (28) implies that device $k$ is estimated to be inactive. Following this, an estimate of the data bit is readily determined through the demodulation process. Meanwhile, to evaluate the CE performance of the data-aided PIC framework, we can determine an estimate of the channel based on $\hat{a}_{k}$, $\hat{\bm{\gamma}}_{k}^{(T)}$, and the equation in (15).
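The decision rules in (25) to (28) can be sketched as follows. The array layout `P[k, d, c]`, the function name, and the threshold value used in any call are assumptions for illustration; ties in the maximum are resolved by taking the first maximizing symbol, a detail the text leaves open.

```python
import numpy as np

def data_aided_decision(P, x_c, tau):
    """P: (K, D, C) final-stage probabilities p_{d,d,k,c}^{(T)},
    x_c: (C,) constellation symbols, tau: activity threshold."""
    p_max = P.max(axis=2)                  # (25): shape (K, D)
    mu = p_max.mean(axis=1)                # (26): shape (K,)
    a_hat = (mu >= tau).astype(int)        # (27)
    # (28): most likely symbol for active devices, zero for inactive ones.
    x_hat = x_c[P.argmax(axis=2)] * a_hat[:, None]
    return a_hat, x_hat
```

For BPSK with the ordering used in (24), `x_c` would be `np.array([1.0, -1.0])`, so that the first component corresponds to $x_{k,d}=1$.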

IV-A4 Training Procedure

The overall training strategy of the data-aided PIC framework is similar to that of the pilot-only PIC framework. We first categorize the CE and DD problems as regression and classification problems. These two tasks are closely related, as both tasks rely on the same received data signals and have dependencies stemming from activity indicators. To address this interdependence, we employ a joint loss function that can optimize the performance of both regression and classification tasks. This comprehensive training approach facilitates the effective learning and coordination of the two tasks within the data-aided PIC framework.

First, we use the MAE-based loss function in (17) as the regression loss, which quantifies the disparity between the estimated and true channels. The rationale for adopting this function is given in Sec. III-B. By modifying the loss function in (18), we also define the classification loss function that measures the performance of data detection:

$$\mathcal{L}_{\rm ClassDA}=\frac{1}{KD|\mathcal{C}|}\sum_{t=1}^{T}\sum_{k=1}^{K}\sum_{d=1}^{D}\sum_{c=1}^{|\mathcal{C}|}\mathcal{L}_{\rm ClassDA}^{(t,d,k,c)}, \tag{29}$$

and

$$\mathcal{L}_{\rm ClassDA}^{(t,d,k,c)}=-e_{k,d,c}\log p_{{\rm d},d,k,c}^{(t)}-(1-e_{k,d,c})\log\big(1-p_{{\rm d},d,k,c}^{(t)}\big), \tag{30}$$

where $e_{k,d,c}\in\{0,1\}$ indicates whether or not device $k$ transmits the $c$th symbol in $\mathcal{C}$ at the $d$th data transmission time. Similar to (19), combining the loss functions in (17) and (29), we obtain the joint loss function for training the data-aided PIC framework. By minimizing this joint loss function, collaborative training for all CE and DD modules can be achieved.
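A minimal sketch of the labels $e_{k,d,c}$ and the loss in (29) and (30) is given below. It assumes that an inactive device has all-zero labels, in line with the earlier remark that the DD outputs should be low when no symbol is transmitted; the function names, array shapes, and the clipping constant are illustrative assumptions.

```python
import numpy as np

def make_labels(a, sym_idx, C):
    """e[k, d, c] = 1 if active device k sends the c-th symbol at time d.
    a: (K,) activity indicators, sym_idx: (K, D) symbol indices in [0, C)."""
    one_hot = np.eye(C)[sym_idx]          # (K, D, C)
    return one_hot * a[:, None, None]     # rows of inactive devices become zero

def class_da_loss(P, e, eps=1e-12):
    """Loss in (29)-(30). P: (T, K, D, C) stage-wise probabilities, e: (K, D, C)."""
    P = np.clip(P, eps, 1 - eps)          # numerical guard (assumption)
    bce = -(e[None] * np.log(P) + (1 - e[None]) * np.log(1 - P))
    _, K, D, C = P.shape
    return float(bce.sum() / (K * D * C))  # summed over stages, as in (29)
```

Because the same sigmoid outputs encode both activity and symbol decisions, these labels let the DD modules learn AD without a separate AD loss term.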

Figure 5: Illustration of the $t$th stage in the non-coherent PIC framework, consisting of $K$ CE modules, $K$ IC modules, and $K$ DD modules.

IV-B Non-Coherent PIC

The proposed non-coherent PIC framework includes $T$ uniform stages, each consisting of $K$ CE modules, $K$ IC modules, and $K$ DD modules. Unlike the pilot-only and data-aided frameworks, this framework is specifically designed for the non-coherent scheme introduced in Sec. II-A. A high-level illustration of the $t$th stage of the non-coherent PIC framework is depicted in Fig. 5.

IV-B1 Input and Output

The proposed non-coherent PIC framework aims to accurately estimate $\{\gamma_{k}\}_{\forall k}$, $\{a_{k,j}\}_{\forall k,j}$, and $\{a_{k}\}_{\forall k}$. To achieve this goal, we apply real-domain conversion and standardization to $\{\gamma_{k,j}\}_{\forall k,j}$ and $\bm{y}_{\rm nc}$, as described in Sec. III-A. The resulting effective channels and received signal are denoted as $\{\tilde{\bm{\gamma}}_{k,j}\}_{\forall k,j}$ and $\tilde{\bm{y}}_{\rm nc}^{(1)}$, respectively. Our DNN takes $\tilde{\bm{y}}_{\rm nc}^{(1)}$ as input and aims to predict $\{\tilde{\bm{\gamma}}_{k}\}_{\forall k}$, $\{a_{k,j}\}_{\forall k,j}$, and $\{a_{k}\}_{\forall k}$.

IV-B2 Stage ($K$ CE modules, $K$ IC modules, and $K$ DD modules)

In each stage of the non-coherent PIC framework, the effective channel $\tilde{\bm{\gamma}}_{k}\triangleq\sum_{j=1}^{2^{J}}\tilde{\bm{\gamma}}_{k,j}$ is estimated first, followed by the estimation of $\{a_{k,j}\}_{\forall j}$ based on the estimated effective channel. Subsequently, the PIC is performed using the IC module. Let $\bm{\theta}_{\rm CE}^{(t,k)}$ be the parameter vector of the $k$th CE module in stage $t$. The mapping function of the CE module is defined as

$$\hat{\bm{\gamma}}_{k}^{(t)}\triangleq\sum_{j=1}^{2^{J}}\hat{\bm{\gamma}}_{k,j}^{(t)}=f_{\bm{\theta}_{\rm CE}^{(t,k)}}\big(\tilde{\bm{y}}_{{\rm nc},k}^{(t)}\big), \tag{31}$$

where $\tilde{\bm{y}}_{{\rm nc},k}^{(t)}$ is the output of the $k$th IC module in stage $(t-1)$, and its definition will be provided later. Note that the true effective channel $\tilde{\bm{\gamma}}_{k}=\sum_{j=1}^{2^{J}}\tilde{\bm{\gamma}}_{k,j}$ becomes $\bm{0}_{2}$ when device $k$ is inactive. Conversely, the true effective channel takes on a non-zero value $\tilde{\bm{\gamma}}_{k,i}$ when device $k$ is active with $a_{k,i}=1$. In this regard, the CE module in (31) estimates the sum of $\hat{\bm{\gamma}}_{k,1}^{(t)},\ldots,\hat{\bm{\gamma}}_{k,2^{J}}^{(t)}$, rather than estimating each $\hat{\bm{\gamma}}_{k,j}^{(t)}$ individually. This strategy reduces the overall complexity of the proposed framework while facilitating the training process. Next, let $\bm{\theta}_{\rm DD}^{(t,k)}$ be the parameter vector of the $k$th DD module in stage $t$. The last layer of every DD module uses the sigmoid activation function. The mapping function of the DD module is represented as follows:

$$\bm{p}_{k}^{(t)}=f_{\bm{\theta}_{\rm DD}^{(t,k)}}\big(\tilde{\bm{y}}_{{\rm nc},k}^{(t)},\hat{\bm{\gamma}}_{k}^{(t)}\big), \tag{32}$$

where $\bm{p}_{k}^{(t)}=[p_{k,1}^{(t)},\ldots,p_{k,2^{J}}^{(t)}]^{\sf T}\in(0,1)^{2^{J}}$ and $p_{k,j}^{(t)}$ is the probability that $a_{k,j}=1$. Subsequently, the IC module performs the following PIC process:

$$\tilde{\bm{y}}_{{\rm nc},k}^{(t+1)}=\tilde{\bm{y}}_{\rm nc}^{(1)}-\sum_{i=1,i\neq k}^{K}\sum_{j=1}^{2^{J}}\begin{bmatrix}\Re(\tilde{\bm{s}}_{i})&-\Im(\tilde{\bm{s}}_{i})\\ \Im(\tilde{\bm{s}}_{i})&\Re(\tilde{\bm{s}}_{i})\end{bmatrix}\hat{\bm{\gamma}}_{i}^{(t)}p_{i,j}^{(t)}, \tag{33}$$

where $\hat{\bm{\gamma}}_{i}^{(t)}p_{i,j}^{(t)}$ represents the estimate $\hat{\bm{\gamma}}_{i,j}^{(t)}$ for $i\in\{1,\ldots,K\}$. Aligned with the design principle of the data-aided PIC framework, the DD module of the non-coherent PIC framework is configured as an FCNN, with its input structured as $[(\tilde{\bm{y}}_{{\rm nc},k}^{(t)})^{\sf T},(\hat{\bm{\gamma}}_{k}^{(t)})^{\sf T}]^{\sf T}$.

IV-B3 Final Decision

The final procedure of the non-coherent PIC framework is to estimate $\{a_{k}\}_{\forall k}$ and $\{a_{k,j}\}_{\forall k,j}$. To do so, we first compute the maximum value of $\bm{p}_{k}^{(T)}$ as

$$p_{{\rm max},k}=\max\Big\{p_{k,1}^{(T)},\ldots,p_{k,2^{J}}^{(T)}\Big\}. \tag{34}$$

Subsequently, we determine an estimate of the activity indicator for device $k$ as

$$\hat{a}_{k}=\begin{cases}1,&\text{if }p_{{\rm max},k}\geq\tau,\\ 0,&\text{otherwise},\end{cases} \tag{35}$$

where $\tau$ is the predefined threshold discussed in Sec. III. We finally determine an estimate of $a_{k,j}$ as

$$\hat{a}_{k,j}=\begin{cases}1,&\text{if }\hat{a}_{k}=1\text{ and }p_{{\rm max},k}=p_{k,j}^{(T)},\\ 0,&\text{otherwise}.\end{cases} \tag{36}$$

It should be noted that the decision procedure in (36) satisfies

$$\sum_{j=1}^{2^{J}}\hat{a}_{k,j}=\begin{cases}1,&\text{if }\hat{a}_{k}=1,\\ 0,&\text{if }\hat{a}_{k}=0.\end{cases} \tag{37}$$

This property reflects the constraint of the non-coherent scheme in (5), under which at most one $\hat{a}_{k,j}$ equals one when device $k$ is determined to be active.
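The decision rules in (34) to (36) and the property in (37) can be checked with a short sketch. The array layout `P[k, j]` and the function name are assumptions for illustration, and ties in the maximum are resolved by taking the first maximizing index.

```python
import numpy as np

def noncoherent_decision(P, tau):
    """P: (K, 2**J) final-stage probabilities p_{k,j}^{(T)}, tau: threshold."""
    p_max = P.max(axis=1)                        # (34)
    a_hat = (p_max >= tau).astype(int)           # (35)
    a_hat_kj = np.zeros(P.shape, dtype=int)
    # (36): mark only the most likely sequence, and only for active devices.
    a_hat_kj[np.arange(P.shape[0]), P.argmax(axis=1)] = a_hat
    return a_hat, a_hat_kj
```

By construction, each row of `a_hat_kj` sums to the corresponding entry of `a_hat`, which is exactly the property stated in (37).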

Figure 6: Comparison of miss detection and false alarm probabilities of different grant-free NOMA frameworks with the coherent scheme for various values of $\epsilon$ when $L=6$. Panels: (a) $\epsilon=0.1$, (b) $\epsilon=0.2$, (c) $\epsilon=0.3$, (d) $\epsilon=0.4$.

IV-B4 Training Procedure

The overall training strategy of the non-coherent PIC framework is similar to that of the pilot-only and data-aided PIC frameworks. We use the MAE-based loss function in (17) as the regression loss, which quantifies the disparity between the estimated and true channels. Subsequently, we introduce the classification loss function, which assesses the accuracy of $\{\bm{p}_{k}^{(t)}\}_{\forall k,t}$, given by

$$\mathcal{L}_{\rm ClassNC}=\frac{1}{K2^{J}}\sum_{t=1}^{T}\sum_{k=1}^{K}\sum_{j=1}^{2^{J}}\mathcal{L}_{\rm ClassNC}^{(t,k,j)}, \tag{38}$$

where

$$\mathcal{L}_{\rm ClassNC}^{(t,k,j)}=-a_{k,j}\log p_{k,j}^{(t)}-(1-a_{k,j})\log\big(1-p_{k,j}^{(t)}\big). \tag{39}$$

Similar to (19), the combination of the loss functions in (17) and (38) results in the joint loss function used for training the non-coherent PIC framework. By minimizing this joint loss function, collaborative training for all CE and DD modules is achieved, leading to improved performance in CE, DD, and AD.

V Simulation Results and Analysis

In this section, we demonstrate the superiority of the proposed frameworks over existing AD, CE, and DD frameworks for the uplink grant-free NOMA system, using simulations.

V-A Simulation Settings

In our simulations, we consider an uplink grant-free NOMA system where $20$ devices are distributed uniformly within a cell, with their locations ranging from $50$ m to $500$ m from the center of the cell$^{2}$. The devices transmit spreading sequences with a power of $\rho=20$ dBm and a bandwidth of $100$ kHz [34]. For data transmission, the devices modulate the data bits using BPSK. The power spectral density of the AWGN at the BS is $-169$ dBm/Hz, and $\beta_{k}$ follows the path-loss model $-128.1-36.7\log_{10}d_{k}$ dB, where $d_{k}$ is the distance between device $k$ and the BS [19, 12]. We generate each spreading sequence $\bm{s}_{k}$ or $\bm{s}_{k,j}$ from a Gaussian distribution $\mathcal{CN}(\bm{0},\frac{1}{L}\bm{I})$ and subsequently normalize it to ensure $\|\bm{s}_{k}\|_{2}=1$ or $\|\bm{s}_{k,j}\|_{2}=1$. It should be noted that, unlike CS algorithms, the proposed frameworks do not impose any specific structure constraints on the spreading sequences.

$^{2}$ In mMTC scenarios with an extremely large number of devices, these devices can be accommodated efficiently by forming groups and allocating different communication resources to each group, as discussed in [33]. This strategy is especially effective in critical or low-latency mMTC scenarios where the sequence length $L$ should be sufficiently small [20, 21].
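Parts of this setup can be reproduced with a few lines of NumPy. The sketch below draws unit-norm spreading sequences and evaluates the noise power and the path-loss model. The function names are hypothetical, and the distance unit of the path-loss model is assumed here to be kilometers, which the text does not state.

```python
import numpy as np

def spreading_sequences(L, K, rng):
    """Draw K sequences from CN(0, I/L) and normalize each to unit l2 norm."""
    S = (rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))) / np.sqrt(2 * L)
    return S / np.linalg.norm(S, axis=0, keepdims=True)

def noise_power_dbm(psd_dbm_per_hz=-169.0, bandwidth_hz=100e3):
    """Noise power over the bandwidth: PSD (dBm/Hz) + 10 log10(bandwidth)."""
    return psd_dbm_per_hz + 10 * np.log10(bandwidth_hz)

def large_scale_gain_db(d_km):
    """beta_k in dB; the kilometer unit for d_km is an assumption."""
    return -128.1 - 36.7 * np.log10(d_km)

S = spreading_sequences(L=6, K=20, rng=np.random.default_rng(0))
print(noise_power_dbm())  # noise power in dBm for a 100 kHz bandwidth
```

With the stated values, the noise power is $-169+10\log_{10}(10^{5})=-119$ dBm.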

For training, we first generate training data samples with $\epsilon=0.25$. Then, we perform post-processing on both the training and test datasets to ensure that every data sample contains at least one active device, which is essential for evaluating the NMSE in (7). We train our model with a batch size of $1024$ for $100$ epochs using the Adam optimizer in [35] with a learning rate of $0.001$, $\lambda=0.5$, and $w_{t}=1,\forall t$. All the AD, CE, and DD modules in our frameworks consist of two hidden layers, each comprising $64$ neurons with rectified linear unit (ReLU) activation functions.
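To give a sense of the module size stated above, the following sketch builds a fully connected network with two hidden layers of 64 ReLU neurons and counts its parameters. The input and output dimensions, the weight initialization, and the helper names `init_fcnn`, `forward`, and `num_params` are assumptions for illustration; they are not specified in the text.

```python
import numpy as np

def init_fcnn(n_in, n_out, hidden=(64, 64), seed=0):
    """Random weights for an FCNN with two hidden layers of 64 units.
    The scaled-Gaussian initialization is an assumption, not from the paper."""
    rng = np.random.default_rng(seed)
    sizes = (n_in, *hidden, n_out)
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, sigmoid_out=False):
    """ReLU on the hidden layers; optional sigmoid on the output layer."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    z = x @ W + b
    return 1.0 / (1.0 + np.exp(-z)) if sigmoid_out else z

def num_params(params):
    return sum(W.size + b.size for W, b in params)

# Hypothetical sizes: a real-valued input of length 2L with L = 6, 2-dim output.
ce_module = init_fcnn(n_in=12, n_out=2)
print(num_params(ce_module))
```

Passing `sigmoid_out=True` mimics the sigmoid output layer that the DD modules use, while a CE-style module would leave the output linear.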

For comparison, in the coherent scheme, we consider the approximate message passing (AMP) in [18], the least absolute shrinkage and selection operator (LASSO) in [36], and an FCNN. In the LASSO framework, the objective is to minimize the following loss function:

$$\mathcal{L}_{\rm L}=\frac{1}{2}\|\bm{y}_{\rm p}-\bm{S}\bm{\gamma}\|_{2}+\nu\|\bm{\gamma}\|_{1}, \tag{40}$$

where $\bm{S}=[\bm{s}_{1},\ldots,\bm{s}_{K}]$, and $\nu$ is set to $0.05$. In the AMP and LASSO frameworks, device activity is determined based on whether the estimated channel gain is larger than a predefined threshold. The FCNN framework involves two distinct DNNs that perform the CE and AD tasks, respectively. These networks comprise $10$ hidden layers with $2048$ neurons per layer and ReLU activation functions. Other training parameters are the same as those of our framework. In the coherent scheme, all frameworks employ the MRC method in Sec. III-A4 to detect the symbols $\{x_{k,d}\}_{\forall k,d}$. For comparison, in the non-coherent scheme, we modify the AMP and LASSO frameworks to satisfy the constraint in (5), similar to [11, 12]. We refer to these modified frameworks as Non-coherent AMP and Non-coherent LASSO, respectively.

Figure 7: Performance comparison of different grant-free NOMA frameworks with the coherent scheme for various values of $\tau$. Panels: (a) Average error probability, (b) NMSE, (c) BER.

V-B Evaluation with Coherent Scheme

In Fig. 6, we compare the false alarm probability $P_{\rm fa}$ and miss detection probability $P_{\rm md}$ of different grant-free NOMA frameworks with the coherent scheme for various values of $\epsilon$. For the proposed frameworks, we set $L=6$, $T=4$ for the pilot-only PIC, and $T=2$ for the data-aided PIC. Their simulation times are comparable to each other and significantly lower than those of the other frameworks. Fig. 6 shows that the proposed pilot-only and data-aided PIC frameworks outperform all existing frameworks regardless of $\epsilon$, even though they were trained with a fixed $\epsilon=0.25$. In particular, the performance gap between the proposed and existing frameworks increases as $\epsilon$ decreases. This result not only highlights the superiority of the proposed frameworks but also demonstrates their applicability to diverse grant-free NOMA scenarios. Fig. 6 also shows that the proposed data-aided PIC framework achieves the lowest error rates for all $\epsilon$. This result suggests that the data-aided PIC framework effectively extracts the common channel information shared between the received pilot and data signals. Meanwhile, our frameworks exhibit higher AD accuracy than the LASSO framework. This improvement can be attributed to our optimization strategy, which relies on the joint loss function in (19) rather than the loss function in (40).

In Fig. 7, we compare the average error probability $P_{\rm err}$, NMSE, and BER of different grant-free NOMA frameworks with the coherent scheme for various values of $\tau$. In this simulation, we set $\epsilon=0.1$, $J=2$, $L=\tau/3$, and determine appropriate thresholds to ensure the condition $P_{\rm err}=P_{\rm fa}=P_{\rm md}$. The BER is then evaluated under this condition. Fig. 7 shows that $P_{\rm err}$, NMSE, and BER of all frameworks decrease as $\tau$ increases. Fig. 7 also demonstrates that the proposed pilot-only and data-aided PIC frameworks outperform other frameworks across all performance metrics. The performance gaps between the AMP and proposed frameworks increase significantly as $\tau$ decreases. Regarding this result, we note that the CS-based AMP technique requires the RIP condition to be satisfied to ensure its recovery performance. Meanwhile, the performance gaps between the FCNN and the proposed frameworks indicate that incorporating communication knowledge from the PIC into DL is more effective than treating DL as a black box. In particular, the NMSE results demonstrate that it is difficult to optimize a naive FCNN without knowledge of the communication system.
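One simple way to pick a threshold that balances the two error types is a grid search, sketched below. This is not necessarily the procedure used by the authors. The definitions of $P_{\rm fa}$ and $P_{\rm md}$ used here (fraction of inactive devices declared active, and fraction of active devices declared inactive) are assumed, as are the function names and the grid.

```python
import numpy as np

def error_rates(scores, labels, thr):
    """Assumed definitions: P_fa over inactive devices, P_md over active ones.
    scores: (N,) activity scores, labels: (N,) true indicators in {0, 1}."""
    declared = scores >= thr
    p_fa = float(np.mean(declared[labels == 0]))
    p_md = float(np.mean(~declared[labels == 1]))
    return p_fa, p_md

def equal_error_threshold(scores, labels, grid):
    """Return the grid threshold with the smallest |P_fa - P_md|."""
    gaps = [abs(fa - md)
            for fa, md in (error_rates(scores, labels, t) for t in grid)]
    return float(grid[int(np.argmin(gaps))])
```

On a finite test set the two probabilities may not match exactly, so the sketch returns the closest grid point rather than an exact solution.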

Figure 8: Performance comparison of different grant-free NOMA frameworks with the non-coherent scheme for various values of $\epsilon$. Panels: (a) Average error probability, (b) BER.

Figure 9: Performance comparison of different grant-free NOMA frameworks with the non-coherent scheme for various values of $\rho$. Panels: (a) Average error probability, (b) BER.

V-C Evaluation with Non-Coherent Scheme

In Fig. 8, we compare $P_{\rm err}$ and BER of different grant-free NOMA frameworks with the non-coherent scheme for various values of $\epsilon$. In this simulation, we set $\tau=10$, $K=10$, $T=4$, $J=2$, and $\rho=10$ dBm. Fig. 8 shows that the proposed non-coherent PIC framework outperforms the existing frameworks. In particular, the performance gap in terms of both $P_{\rm err}$ and BER between the proposed and existing frameworks increases as $\epsilon$ decreases, even though the proposed framework was trained with a fixed $\epsilon=0.25$.

In Fig. 9, we compare $P_{\rm err}$ and BER of different grant-free NOMA frameworks with the non-coherent scheme for various values of the transmission power $\rho$. In this simulation, we set $\epsilon=0.2$, $\tau=10$, $K=10$, $T=4$, and $J=2$. Fig. 9 shows that the proposed framework achieves the lowest $P_{\rm err}$ and BER across varying $\rho$. This result suggests the robustness of the proposed framework to various values of $\rho$.

TABLE I: BER and $P_{\rm err}$ of different grant-free NOMA frameworks for various values of $J$ (the number of bits).

Framework          | BER ($J=1$) | BER ($J=2$) | $P_{\rm err}$ ($J=1$) | $P_{\rm err}$ ($J=2$)
Pilot-only PIC     | 0.123 | 0.263 | 0.101 | 0.230
Data-aided PIC     | 0.063 | 0.120 | 0.062 | 0.114
LASSO              | 0.167 | 0.294 | 0.150 | 0.274
AMP                | 0.178 | 0.324 | 0.162 | 0.304
FCNN               | 0.242 | 0.327 | 0.147 | 0.242
Non-coherent PIC   | 0.055 | 0.089 | 0.053 | 0.084
Non-coherent LASSO | 0.067 | 0.096 | 0.064 | 0.092
Non-coherent AMP   | 0.068 | 0.110 | 0.066 | 0.104

V-D Comparison between Coherent and Non-Coherent Schemes

In Table I, we compare the coherent and non-coherent schemes using different grant-free NOMA frameworks. In this simulation, we set $\epsilon=0.2$, $\tau=10$, $K=10$, and $\rho=15$ dBm. For the proposed frameworks, we set $T=7$ for the pilot-only PIC, $T=3$ for the data-aided PIC, and $T=4$ for the non-coherent PIC. Their simulation times are comparable to each other and significantly lower than those of the other frameworks. It should be noted that the sequence length is determined differently between the coherent and non-coherent schemes under the fixed $\tau=10$. Specifically, for the coherent scheme, it is calculated as $L=\lfloor\tau/(J+1)\rfloor$, and for the non-coherent scheme, it is set to $L=\tau=10$. Table I shows that the grant-free NOMA frameworks with the non-coherent scheme achieve lower BER than those with the coherent scheme. In particular, the proposed non-coherent PIC framework achieves the lowest error rates. This result suggests that the non-coherent scheme is particularly useful for scenarios transmitting a small number of data bits [11].
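The sequence lengths implied by these rules for the settings of Table I can be worked out directly. In the sketch below, the function names are hypothetical. One possible reading, stated here as an assumption, is that in the coherent scheme the $\tau$ channel uses are shared by one pilot sequence and $J$ BPSK data symbols, each spread over $L$ channel uses.

```python
def coherent_length(tau, J):
    """L = floor(tau / (J + 1)) for the coherent scheme."""
    return tau // (J + 1)

def noncoherent_length(tau):
    """L = tau for the non-coherent scheme."""
    return tau

for J in (1, 2):
    print(J, coherent_length(10, J), noncoherent_length(10))
```

For $\tau=10$, the coherent scheme uses $L=5$ when $J=1$ and $L=3$ when $J=2$, whereas the non-coherent scheme keeps $L=10$ in both cases, so the non-coherent sequences are two to three times longer.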

TABLE II: Performance of different grant-free NOMA frameworks with $50000$ test data samples when $\epsilon=0.1$ and $\tau=10$. $P_{\rm err}$ and BER are listed once per framework.

Framework            | $P_{\rm err}$ | BER   | Simulation time (s)
Pilot-only PIC-I     | 0.179 | 0.194 | 0.018
Pilot-only PIC-II    |       |       | 0.750
Pilot-only PIC-III   |       |       | 264.176
Data-aided PIC-I     | 0.091 | 0.093 | 0.016
Data-aided PIC-II    |       |       | 0.703
Data-aided PIC-III   |       |       | 258.518
FCNN-I               | 0.203 | 0.264 | 0.006
FCNN-II              |       |       | 12.988
FCNN-III             |       |       | 389.118
LASSO                | 0.217 | 0.224 | 848.751
AMP                  | 0.248 | 0.255 | 394.774
Non-coherent PIC-I   | 0.106 | 0.108 | 0.021
Non-coherent PIC-II  |       |       | 1.103
Non-coherent PIC-III |       |       | 272.600
Non-coherent LASSO   | 0.121 | 0.124 | 981.684
Non-coherent AMP     | 0.140 | 0.143 | 397.015

In Table II, we evaluate $P_{\rm err}$, BER, and simulation time$^{3}$ of different grant-free NOMA frameworks, using $50{,}000$ test data samples. In this simulation, we set $\epsilon=0.1$, $\tau=10$, $K=10$, and $\rho=10$ dBm. For both the FCNN and proposed frameworks, we consider three data sample processing scenarios: (i) employing a GPU to process all data samples at once, (ii) utilizing a CPU to process all data samples at once, and (iii) using a CPU to process each data sample individually. For each scenario, we use the notations FCNN-$j$ and Proposed-$j$, where $j\in\{{\rm I},{\rm II},{\rm III}\}$, for notational simplicity. For conventional frameworks, we consider the third scenario due to the iterative nature of their algorithms. Table II demonstrates that the proposed frameworks achieve the lowest error rates while maintaining short simulation times. Although FCNN-I requires less simulation time than PIC-I, the proposed frameworks outperform the FCNN framework in both $P_{\rm err}$ and BER by considerable margins. Meanwhile, simulation times for the first and second scenarios are significantly lower than for the third scenario in the proposed frameworks. This result implies that the proposed frameworks can be utilized effectively when the BS needs to perform AD and DD from multiple received signals stored in the buffer, a common scenario in real-world applications.

$^{3}$ Simulation time is measured using Intel(R) Core(TM) i7-10700K CPU @ 3.80 GHz, NVIDIA GeForce RTX 3070, and 32.0 GB RAM.

VI Conclusion

In this paper, we proposed a novel DL-assisted PIC approach for joint AD, CE, and DD in uplink grant-free NOMA systems. By leveraging the advantages of PIC and DL, we tackled inherent challenges of existing grant-free NOMA frameworks, including performance degradation due to a limited spreading sequence length, non-orthogonal spreading sequences, and a lack of knowledge of channel gain orders and device activities. Specifically, we developed three PIC-based frameworks: the pilot-only PIC, the data-aided PIC, and the non-coherent PIC frameworks. To enhance training efficiency and mitigate the vanishing gradient problem, we designed an end-to-end training strategy facilitated by a joint loss function and IC modules. This strategy not only enables joint optimization of the AD, CE, and DD tasks but also allows the relationships between different modules to be learned, leading to improved performance. An important direction for future research is to extend our framework to incorporate spreading sequence design, with the aim of preserving orthogonality and alleviating interference.

References

  • [1] L. Chettri and R. Bera, “A comprehensive survey on Internet of Things (IoT) toward 5G wireless systems,” IEEE Internet Things J., vol. 7, no. 1, pp. 16–32, Jan. 2020.
  • [2] D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, J. Li, D. Niyato, O. Dobre, and H. V. Poor, “6G Internet of Things: A comprehensive survey,” IEEE Internet Things J., vol. 9, no. 1, pp. 359–383, Jan. 2022.
  • [3] M. B. Shahab, R. Abbas, M. Shirvanimoghaddam, and S. J. Johnson, “Grant-free non-orthogonal multiple access for IoT: A survey,” IEEE Commun. Surveys Tuts., vol. 22, no. 3, pp. 1805–1838, thirdquarter 2020.
  • [4] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 615–637, Feb. 2021.
  • [5] W. Shin, M. Vaezi, B. Lee, D. J. Love, J. Lee, and H. V. Poor, “Non-orthogonal multiple access in multi-cell networks: Theory, performance, and practical challenges,” IEEE Commun. Mag., vol. 55, no. 10, pp. 176–183, Oct. 2017.
  • [6] Y. Yuan, S. Wang, Y. Wu, H. V. Poor, Z. Ding, X. You, and L. Hanzo, “NOMA for next-generation massive IoT: Performance potential and technology directions,” IEEE Commun. Mag., vol. 59, no. 7, pp. 115–121, Jul. 2021.
  • [7] R. Abbas, M. Shirvanimoghaddam, Y. Li, and B. Vucetic, “A novel analytical framework for massive grant-free NOMA,” IEEE Trans. Commun., vol. 67, no. 3, pp. 2436–2449, Mar. 2019.
  • [8] J.-C. Jiang and H.-M. Wang, “Massive random access with sporadic short packets: Joint active user detection and channel estimation via sequential message passing,” IEEE Trans. Wireless Commun., vol. 20, no. 7, pp. 4541–4555, Jul. 2021.
  • [9] W. Kim, Y. Ahn, and B. Shim, “Deep neural network-based active user detection for grant-free NOMA systems,” IEEE Trans. Commun., vol. 68, no. 4, pp. 2143–2155, Apr. 2020.
  • [10] L. Liu, E. G. Larsson, W. Yu, P. Popovski, C. Stefanovic, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the Internet of Things,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88–99, Sep. 2018.
  • [11] K. Senel and E. G. Larsson, “Grant-free massive MTC-enabled massive MIMO: A compressive sensing approach,” IEEE Trans. Commun., vol. 66, no. 12, pp. 6164–6175, Dec. 2018.
  • [12] Z. Ma, W. Wu, F. Gao, and X. Shen, “Model-driven deep learning for non-coherent massive machine-type communications,” IEEE Trans. Wireless Commun., early access, Jul. 24, 2023.
  • [13] Y. Ahn, W. Kim, and B. Shim, “Active user detection and channel estimation for massive machine-type communication: Deep learning approach,” IEEE Internet Things J., vol. 9, no. 14, pp. 11 904–11 917, Jul. 2022.
  • [14] K. Senel and E. G. Larsson, “Device activity and embedded information bit detection using AMP in massive MIMO,” in Proc. IEEE Glob. Commun. Conf. (GLOBECOM), Singapore, Dec. 2017, pp. 1–6.
  • [15] A. T. Abebe and C. G. Kang, “MIMO-based reliable grant-free massive access with QoS differentiation for 5G and beyond,” IEEE J. Sel. Areas Commun., vol. 39, no. 3, pp. 773–787, Mar. 2021.
  • [16] Y. Du, C. Cheng, B. Dong, Z. Chen, X. Wang, J. Fang, and S. Li, “Block-sparsity-based multiuser detection for uplink grant-free NOMA,” IEEE Trans. Wireless Commun., vol. 17, no. 12, pp. 7894–7909, Dec. 2018.
  • [17] M. Borgerding, P. Schniter, and S. Rangan, “AMP-inspired deep networks for sparse linear inverse problems,” IEEE Trans. Signal Process., vol. 65, no. 16, pp. 4293–4308, Aug. 2017.
  • [18] D. L. Donoho, A. Maleki, and A. Montanari, “The noise-sensitivity phase transition in compressed sensing,” IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6920–6941, Oct. 2011.
  • [19] L. Liu and W. Yu, “Massive connectivity with massive MIMO–Part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, Jun. 2018.
  • [20] Z. Chen, F. Sohrabi, Y.-F. Liu, and W. Yu, “Phase transition analysis for covariance-based massive random access with massive MIMO,” IEEE Trans. Inf. Theory, vol. 68, no. 3, pp. 1696–1715, Mar. 2022.
  • [21] S. R. Pokhrel, J. Ding, J. Park, O.-S. Park, and J. Choi, “Towards enabling critical mMTC: A review of URLLC within mMTC,” IEEE Access, vol. 8, pp. 131 796–131 813, Jul. 2020.
  • [22] U. K. Ganesan, E. Björnson, and E. G. Larsson, “Clustering-based activity detection algorithms for grant-free random access in cell-free massive MIMO,” IEEE Trans. Commun., vol. 69, no. 11, pp. 7520–7530, Nov. 2021.
  • [23] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith, “Channel estimation and user activity identification in massive grant-free multiple-access,” IEEE Open J. Veh. Technol., vol. 1, pp. 296–316, Aug. 2020.
  • [24] X. Bian, Y. Mao, and J. Zhang, “Supporting more active users for massive access via data-assisted activity detection,” in Proc. IEEE Int. Conf. Commun. (ICC), Montreal, QC, Canada, Jun. 2021, pp. 1–6.
  • [25] J. Ahn, B. Shim, and K. B. Lee, “Sparsity-aware ordered successive interference cancellation for massive machine-type communications,” IEEE Wireless Commun. Lett., vol. 7, no. 1, pp. 134–137, Feb. 2018.
  • [26] X. Song and K. Li, “Improved channel estimation algorithm based on parallel interference cancellation,” in Proc. Int. Conf. Neural Netw. Signal Process. (ICNNSP), Zhenjiang, China, Jun. 2008, pp. 466–469.
  • [27] N. Ye, X. Li, H. Yu, L. Zhao, W. Liu, and X. Hou, “DeepNOMA: A unified framework for NOMA using deep multi-task learning,” IEEE Trans. Wireless Commun., vol. 19, no. 4, pp. 2208–2225, Sep. 2020.
  • [28] J. Andrews, “Interference cancellation for cellular systems: A contemporary overview,” IEEE Wireless Commun., vol. 12, no. 2, pp. 19–29, Apr. 2005.
  • [29] H. Yu, Z. Fei, Z. Zheng, N. Ye, and Z. Han, “Deep learning-based user activity detection and channel estimation in grant-free NOMA,” IEEE Trans. Wireless Commun., vol. 22, no. 4, pp. 2202–2214, Apr. 2023.
  • [30] Y. Bai, W. Chen, B. Ai, Z. Zhong, and I. J. Wassell, “Prior information aided deep learning method for grant-free NOMA in mMTC,” IEEE J. Sel. Areas Commun., vol. 40, no. 1, pp. 112–126, Jan. 2022.
  • [31] Z. Zhang, Y. Li, C. Huang, Q. Guo, C. Yuen, and Y. L. Guan, “DNN-aided block sparse Bayesian learning for user activity detection and channel estimation in grant-free non-orthogonal random access,” IEEE Trans. Veh. Technol., vol. 68, no. 12, pp. 12 000–12 012, Dec. 2019.
  • [32] S. P. Boyd and L. Vandenberghe, Convex Optimization.   Cambridge, U.K.: Cambridge Univ. Press, 2004.
  • [33] H. S. Jang, B. C. Jung, T. Q. S. Quek, and D. K. Sung, “Resource-hopping-based grant-free multiple access for 6G-enabled massive IoT networks,” IEEE Internet of Things Journal, vol. 8, no. 20, pp. 15 349–15 360, Oct. 2021.
  • [34] D. A. Tubiana, J. Farhat, G. Brante, and R. D. Souza, “Q-learning NOMA random access for IoT-satellite terrestrial relay networks,” IEEE Wireless Commun. Lett., vol. 11, no. 8, pp. 1619–1623, Aug. 2022.
  • [35] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, May 2015, pp. 1–13.
  • [36] A. Maleki, L. Anitori, Z. Yang, and R. G. Baraniuk, “Asymptotic analysis of complex LASSO via complex approximate message passing (CAMP),” IEEE Trans. Inf. Theory, vol. 59, no. 7, pp. 4290–4308, Jul. 2013.