An Efficient Active Set Algorithm for Covariance Based Joint Data and Activity Detection for Massive Random Access with Massive MIMO

Abstract

This paper proposes a computationally efficient algorithm to solve the joint data and activity detection problem for massive random access with massive multiple-input multiple-output (MIMO). The BS acquires the active devices and their data by detecting the transmitted preassigned nonorthogonal signature sequences. This paper employs a covariance based approach that formulates the detection problem as a maximum likelihood estimation (MLE) problem. To efficiently solve the problem, this paper designs a novel iterative algorithm with low complexity in the regime where the device activity pattern is sparse – a key feature that existing algorithmic designs have not previously exploited for reducing complexity. Specifically, at each iteration, the proposed algorithm focuses on only a small subset of all potential sequences, namely the active set, which contains a few most likely active sequences (i.e., transmitted sequences by all active devices), and performs the detection for the sequences in the active set. The active set is carefully selected at each iteration based on the current detection result and the first-order optimality condition of the MLE problem. Simulation results show that the proposed active set algorithm enjoys significantly better computational efficiency (in terms of the CPU time) than the state-of-the-art algorithms.

Index Terms— Active set, joint data and active detection, massive MIMO, massive random access.

1 Introduction

Massive machine-type communication (mMTC) is a main use case in the fifth generation (5G) cellular systems [1]. A challenging task in mMTC is the uncoordinated random access, in which a large number of sporadically active devices wish to send small data to the base-station (BS) in the uplink [2]. To meet the low-latency requirement in mMTC, the grant-free random access scheme could be a promising solution [3, 4], in which each device is preassigned multiple signature sequences from a large set of nonorthogonal sequences, and the active device selects one sequence from the assigned sequences to transmit. The data and device identification are embedded in the sequence selection. The BS then detects the active devices and decodes their data by detecting which sequences are transmitted.

By exploiting the sporadic device traffic, the joint data and activity detection problem has been formulated as a compressed sensing problem [4], in which the data and the device activity are recovered along with the instantaneous channel state information (CSI) via the approximate message passing (AMP) algorithm. Similar methods have also been used for the scenario where each device is associated with only one sequence for the purpose of device activity detection [5, 6, 7]. However, if the CSI is not needed, it is actually possible to recover the data and activities without recovering the channel coefficients using a convariance based approach [8, 9], which outperforms the AMP method, especially in the massive multiple-input multiple-output (MIMO) systems. In the covariance based method, the detection problem is formulated as a maximum likelihood estimation (MLE) problem, in which the channel coefficients are treated as random samples and averaged out in computing the covariance. This covariance based method is first suggested in [10] for device activity detection, and it has also been used for a few related data/activity detection problems, e.g., data decoding for unsourced random access [11], cooperative activity detection in cell-free systems [12], and activity detection with interference [13].

The coordinate descent (CD) algorithm that iteratively updates the sequence selection for each device is commonly used in solving the detection problem in the covariance based approach, which achieves excellent detection performance; see [8, 10, 13] for more details. The possible reason for the popularity of the CD algorithm is that its subproblem (i.e., the original problem with respect to only one variable) admits a nice closed-form solution [10], which makes it easily implementable. To further speed up the convergence of the CD algorithm, a new coordinate sampling strategy is proposed in [14]. Other algorithms for solving the detection problem include the expectation maximization/minimization (EM) algorithm (i.e., sparse Bayesian learning) [15] and the SPICE algorithm [16]. However, none of the above mentioned solutions take advantage of the sparsity of the true solution of the detection to lower their algorithmic complexities, thus becoming less computationally efficient when the problem’s dimension is huge, which is the case in mMTC.

In this paper, we propose a computationally efficient algorithm that carefully exploits the sparsity of the true solution to solve the joint data and activity detection problem in the covariance based approach. Specifically, we propose an iterative algorithm that attacks the original large-scale problem by solving a sequence of small-size problems. We focus on only a small subset of all sequences at each iteration, termed as the active set, which contains only the most likely active sequences¹¹1Active sequences in this paper refer to transmitted sequences by all active devices in the joint data and activity detection problem. and can be seen as an approximation of the set of active sequences. We perform joint data and activity detection for only the sequences in the active set using a low-complexity spectral projected gradient (PG) algorithm [17]. We carefully update the active set at each iteration based on the current detection result and the first-order optimality condition of the joint detection problem. We also establish the convergence of the proposed active set algorithm. Simulation results show that as compared to the commonly used CD algorithm in the covariance based approach, the proposed active set algorithm has much higher computational efficiency (in terms of the CPU time).

2 System Model and Problem Formulation

2.1 System Model

Consider an uplink single-cell system where there are one BS equipped with $M\gg 1$ antennas and $N$ devices each equipped with a single antenna. Assume a quasi-static narrow-band channel model, where the wireless channels remain unchanged within each transmission block but may vary over different blocks. Let $\sqrt{g_{n}}\mathbf{h}_{n}\in\mathbb{C}^{M\times 1}$ denote the channel vector from device $n$ to the BS, where $g_{n}\geq 0$ is the large-scale fading component (depending on the device’s location), and $\mathbf{h}_{n}\in\mathbb{C}^{M\times 1}$ is the Rayleigh fading component following the i.i.d. complex Gaussian distribution.

In each coherence block, only $K\ll N$ devices are active (due to the sporadic traffic), and each active device wishes to transmit $J$ bits of data to the BS, where $J$ is a small value in the mMTC scenario. Assume that each device $n$ has a unique signature sequence set $\mathcal{S}_{n}=\left\{\mathbf{s}_{n,1},\mathbf{s}_{n,2},\ldots,\mathbf{s}_{n,Q}\right\},$ where $\mathbf{s}_{n,q}\in\mathbb{C}^{L\times 1},1\leq q\leq Q\triangleq 2^{J},$ and $L$ is the signature sequence length. When device $n$ is active and needs to send $J$ bits of data, it selects one sequence from $\mathcal{S}_{n}$ to transmit. Finally, let $\chi_{n,q}\in\{0,1\}$ indicate whether or not sequence $q$ of device $n$ (i.e., $\mathbf{s}_{n,q}$ ) is transmitted. Notice that at most one sequence is selected by each device, then it follows that $\chi_{n,q}$ satisfies $\sum_{q=1}^{Q}\chi_{n,q}\in\{0,1\}$ , where $\sum_{q=1}^{Q}\chi_{n,q}=0$ indicates that device $n$ is inactive, and $\sum_{q=1}^{Q}\chi_{n,q}=1$ indicates that device $n$ is active.

Assume that the sequences transmitted by active devices are perfectly synchronized. Then the received signal $\mathbf{Y}\in\mathbb{C}^{L\times M}$ at the BS, which is a superposition of the transmitted signals from all active devices, can be expressed as

\displaystyle\mathbf{Y}

\displaystyle=\sum_{n=1}^{N}\sum_{q=1}^{Q}\chi_{n,q}\mathbf{s}_{n,q}\sqrt{g_{n}}\mathbf{h}_{n}^{T}+\mathbf{W},

(1)

where $\mathbf{W}\in\mathbb{C}^{L\times M}$ is the effective i.i.d. Gaussian noise whose variance $\sigma_{w}^{2}$ is the background noise power normalized by the device transmit power.

To obtain a more compact expression of the received signal in (1), we define $\mathbf{S}_{n}=[\mathbf{s}_{n,1},\ldots,\mathbf{s}_{n,Q}]\in\mathbb{C}^{L\times Q}$ , $\mathbf{D}_{n}=\sqrt{g_{n}}\operatorname{diag}\{\chi_{n,1},\ldots,\chi_{n,Q}\}\in\mathbb{C}^{Q\times Q},$ $\mathbf{H}_{n}=[\mathbf{h}_{n},\ldots,\mathbf{h}_{n}]^{T}\in\mathbb{C}^{Q\times M}$ for all $n.$ Based on them, we further define $\mathbf{S}=[\mathbf{S}_{1},\ldots,\mathbf{S}_{N}]\in\mathbb{C}^{L\times NQ}$ , $\bm{\Gamma}^{{1}/{2}}=\operatorname{diag}\{\mathbf{D}_{1},\ldots,\mathbf{D}_{N}\}\in\mathbb{C}^{NQ\times NQ},$ and $\mathbf{H}=[\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{N}^{T}]^{T}\in\mathbb{C}^{NQ\times M}$ . Then, the received signal in (1) can be compactly expressed as

\displaystyle\displaystyle\mathbf{Y}=\mathbf{S}\bm{\Gamma}^{{1}/{2}}\mathbf{H}+\mathbf{W}.

(2)

Let $\bm{\gamma}\in\mathbb{C}^{NQ\times 1}$ denote the diagonal entries of $\bm{\Gamma}$ , i.e., $\bm{\gamma}=[\bm{\gamma}_{1}^{T},\ldots,\bm{\gamma}_{N}^{T}]^{T}$ , where $\bm{\gamma}_{n}=[\gamma_{n,1},\ldots,\gamma_{n,Q}]^{T}\in\mathbb{C}^{Q\times 1}$ with $\gamma_{n,q}=g_{n}\chi_{n,q}$ . In the following, we will use $\bm{\gamma}$ and $\bm{\Gamma}$ interchangeably.

2.2 Problem Formulation

The joint activity and data detection problem is to detect the variables $\gamma_{n,q}$ ’s, which indicate both the activity of device $n$ and its data (if it is active) from the received signal $\mathbf{Y}$ based on the knowledge of the signature sequence matrix $\mathbf{S}.$ Specifically, if $\gamma_{n,q}>0,$ then device $n$ is active and it transmits sequence $\mathbf{s}_{n,q};$ otherwise device $n$ is inactive.

As shown in [10, 8], the above joint activity and data detection problem can be mathematically formulated as the MLE problem. Specifically, it can be observed from (2) that given $\bm{\gamma}$ , each column of $\mathbf{Y}$ , denoted as $\mathbf{y}_{m}\in\mathbb{C}^{L\times 1},1\leq m\leq M$ , can be seen as independent samples from a complex Gaussian distribution as

\displaystyle\mathbf{y}_{m}\sim\mathcal{CN}\left(\mathbf{0},\mathbf{S}\bm{\Gamma}^{1/2}\bm{\Lambda}\bm{\Gamma}^{1/2}\mathbf{S}^{H}+\sigma_{w}^{2}\mathbf{I}\right),

(3)

where the covariance matrix is obtained by computing $\mathbb{E}[\mathbf{y}_{m}\mathbf{y}_{m}^{H}]$ based on (2), $\bm{\Lambda}$ is a block diagonal matrix with each block being the all-one matrix $\mathbf{E}\in\mathbb{R}^{Q\times Q}$ , and $\mathbf{I}$ is an identity matrix. Since there is at most one non-zero entry in each diagonal block $\mathbf{D}_{n}$ in $\bm{\Gamma}^{1/2},$ the covariance matrix in (3) can be simplified as

\mathbf{S}\bm{\Gamma}^{1/2}\bm{\Lambda}\bm{\Gamma}^{1/2}\mathbf{S}^{H}+\sigma_{w}^{2}\mathbf{I}=\mathbf{S}\bm{\Gamma}\mathbf{S}^{H}+\sigma_{w}^{2}\mathbf{I}.

Given $\bm{\gamma},$ we have $p(\mathbf{Y}|\bm{\gamma})=\Pi_{m=1}^{M}p(\mathbf{y}_{m}|\bm{\gamma}).$ Based on this and (3), the minimization of $-\tfrac{1}{M}\log p(\mathbf{Y}|\bm{\gamma})$ , equivalent to the maximization of $p(\mathbf{Y}|\bm{\gamma}),$ can be formulated as


$\displaystyle\underset{\bm{\gamma}}{\operatorname{min}}$	$\displaystyle\log\left\|\mathbf{S}\bm{\Gamma}\mathbf{S}^{H}+\sigma_{w}^{2}\mathbf{I}\right\|+\operatorname{Tr}\left(\left(\mathbf{S}\bm{\Gamma}\mathbf{S}^{H}+\sigma_{w}^{2}\mathbf{I}\right)^{-1}\hat{\bm{\Sigma}}\right)$	(4a)
$\displaystyle\operatorname{s.t.}$	$\displaystyle\bm{\gamma}\geq\mathbf{0},$	(4b)

where $\hat{\bm{\Sigma}}=\mathbf{Y}\mathbf{Y}^{H}/M$ is the sample covariance matrix computed by averaging over different antennas, and $\bm{\gamma}\geq\mathbf{0}$ is due to the fact that $\gamma_{n,q}=g_{n}\chi_{n,q}\geq 0$ for all $n$ and $q.$ Since the objective function in problem (4) depends on $\mathbf{Y}$ only through the sample covariance matrix $\hat{\bm{\Sigma}},$ the approach of estimating activity and associated data based on solving problem (4) is called the covariance based approach. It is worthwhile mentioning that problem (4) reduces to the activity detection problem in [10] if each device has only a single signature sequence (i.e., $J=0$ and thus $Q=1$ ).

Let $f(\bm{\gamma})$ denote the objective function of problem (4). Then, for any $q=1,2,\ldots,Q,n=1,2,\ldots,N,$ the gradient of $f(\bm{\gamma})$ with respect to $\gamma_{n,q}$ is

\left[\nabla f(\bm{\gamma})\right]_{n,q}=\textbf{{s}}_{n,q}^{H}\bm{\Sigma}^{-1}\textbf{{s}}_{n,q}-\textbf{{s}}_{n,q}^{H}\bm{\Sigma}^{-1}\bm{\hat{\Sigma}}\bm{\Sigma}^{-1}\textbf{{s}}_{n,q}.

The first-order (necessary) optimality condition of problem (4) is

\displaystyle\left[\nabla f(\bm{\gamma})\right]_{n,q}\begin{cases}=0,&\operatorname{if}~{}\gamma_{n,q}>0;\\ \geq 0,&\operatorname{if}~{}\gamma_{n,q}=0,\end{cases}\ \forall~{}q,n,

(5)

which is equivalent to

[\bm{\gamma}-\nabla f(\bm{\gamma})]_{+}-\bm{\gamma}=\textbf{0},

where $[\cdot]_{+}$ denotes the projection operator onto the nonnegative orthant. It can be checked that computing $\nabla f(\bm{\gamma})$ has a complexity of ${\cal{O}}(NQL^{2}).$

3 Proposed Active Set Algorithm

The basic idea of the proposed active set algorithm for solving problem (4) is to fully exploit the sparsity of its true solution in the algorithmic design, which is in sharp contrast to all existing algorithms such as EM [15], CD [10, 8], and SPICE [16]. More specifically, at each iteration, the active set algorithm first judiciously selects an active set then solves the subproblem defined over the variables in the active set with all the other variables fixed being zero. Since the true solution of problem (4) is sparse, it is expected that the cardinality of the carefully selected active set, i.e., the dimension of the subproblem, will be significantly less than the total number of variables of the original problem (4). Therefore, solving the subproblem defined over the variables in the active set will be much more computationally efficient than directly solving the original problem (4) (over all variables).

Selecting the active set. In principle, a desirable active set should contain the indices of active sequences in order to correctly detect the active users and associated data; on the other hand, its cardinality should be as small as possible in order to avoid unnecessary computation and improve the computational efficiency. Our selection strategy of the active set $\mathcal{A}^{k}$ at a given feasible point $\bm{\gamma}^{k}$ is based on (i) the engineering insight of the joint activity and data detection problem and (ii) the first-order necessary optimality condition (5) of the joint detection problem. In particular, for any given feasible point $\bm{\gamma}^{k},$ the selected active set $\mathcal{A}^{k}$ contains the indices whose corresponding values of $\bm{\gamma}^{k}$ are positive and large (based on (i)) and the indices whose corresponding values of $\nabla f(\bm{\gamma}^{k})$ are negative and small (due to (ii)). Mathematically, the proposed selection strategy of the active set $\mathcal{A}^{k}$ is

\mathcal{A}^{k}=\Big{\{}(i,q)\mid\gamma^{k}_{i,q}>\omega_{k}\ \textup{or}\ [\nabla f(\bm{\gamma}^{k})]_{i,q}<-\nu_{k}\Big{\}},

(6)

where $\omega_{k}$ and $\nu_{k}$ are two positive parameters. The choices of the parameters $\omega_{k}$ and $\nu_{k}$ provide a trade-off between reducing the cardinality of the active set and not missing the active sequences. In general, the smaller these two parameters, the larger the cardinality of the selected active set and the lower probability of missing the active sequences. To make sure of not missing the active sequences, we let $\omega_{k}\downarrow 0$ and $\nu_{k}\downarrow 0$ in (6), which means that $\omega_{k}$ and $\nu_{k}$ monotonically decrease and converge to zero.

Solving the subproblem. At the $k$ -th iteration, once the active set $\mathcal{A}^{k}$ is selected, we solve the following subproblem


$\displaystyle\min$	$\displaystyle\quad\hat{f}(\bm{\gamma}_{\mathcal{A}^{k}})$	(7a)
$\displaystyle\operatorname{s.t.}$	$\displaystyle\quad\bm{\gamma}_{\mathcal{A}^{k}}\geq\mathbf{0},$	(7b)

where $\bm{\gamma}_{\mathcal{A}^{k}}$ is the subvector of $\bm{\gamma}$ indexed by $\mathcal{A}^{k}$ and $\hat{f}(\bm{\gamma}_{\mathcal{A}^{k}})$ is $f(\bm{\gamma})$ defined over $\bm{\gamma}_{\mathcal{A}^{k}}$ with all the other variables fixed being zero. Obviously, problem (7) is different from problem (4). For instance, problem (7) is defined over $\bm{\gamma}_{\mathcal{A}^{k}}$ , whereas problem (4) is defined over $\bm{\gamma}$ . Therefore, the dimension of problem (7) is potentially much smaller than that of problem (4) (if the set $\mathcal{A}^{k}$ in (7) is properly chosen).

We apply the spectral PG algorithm [17] to solve the subproblem in (7) until $\bm{\gamma}_{\mathcal{A}^{k}}^{k+1}$ satisfying

\left\|\left[[\bm{\gamma}_{\mathcal{A}^{k}}^{k+1}-\nabla\hat{f}(\bm{\gamma}_{\mathcal{A}^{k}}^{k+1})]_{+}-\bm{\gamma}_{\mathcal{A}^{k}}^{k+1}\right]\right\|<\varepsilon_{k}

(8)

is found, where $\varepsilon_{k}>0$ is the solution tolerance at the $k$ -th iteration. The spectral PG algorithm [17] is an PG algorithm with the spectral stepsizes, also called the Barzilai-Borwein (BB) stepsizes [18], which approximately solves the Quasi-Newton equation. In the PG algorithm, we need to compute the gradient of the objective function $\hat{f}(\bm{\gamma}_{\mathcal{A}^{k}})$ (equivalent to computing the partial gradient of function $f(\bm{\gamma})$ with respect to the variables in $\mathcal{A}^{k}$ ), which has a complexity of ${\cal{O}}(\left|{\mathcal{A}^{k}}\right|L^{2}).$ Two distinctive advantages of the spectral PG algorithm [17] in the context of solving problem (7) are as follows. First, the non-negative constraint is easy to project onto, and thus the algorithm can be easily implemented to solve problem (7). Second, the algorithm enjoys a quite good numerical performance due to the use of the alternating BB stepsizes [18, 19].

Now, we are ready to present the proposed active set PG algorithm for solving problem (4). The pseudocodes of the proposed algorithm are given in Algorithm 1.

Algorithm 1 Proposed active set PG algorithm for solving problem (4)

1: Initialize:

\bm{\gamma}^{0}=\mathbf{0}

k=0

\left\{\omega_{k},\nu_{k},\varepsilon_{k}\right\}_{k\geq 0},

and

\varepsilon>0;

2: repeat

3: Select the active set

\mathcal{A}^{k}

according to (6);

4: Apply the spectral PG algorithm [17] to solve the subproblem (7) until (8) is satisfied;

5: Set

k\leftarrow k+1;

6: until

\|[\bm{\gamma}^{k}-\nabla f(\bm{\gamma}^{k})]_{+}-\bm{\gamma}^{k}\|<\varepsilon

7: Output:

\bm{\gamma}^{k}

Next, we present some convergence properties of the proposed active set PG Algorithm 1 (without rigorous proofs due to the space limitation). Note that a not careful selection of the active set (and parameters in it) might lead to oscillation (and divergence) of the corresponding active set algorithm among different active sets. The following finite termination property is mainly because of the activity set selection strategy in (6) (and careful choices of parameters $\omega_{k}$ and $\nu_{k}$ ) and the convergence property of the spectral PG algorithm.

Theorem 1

For any given tolerance $\varepsilon>0,$ suppose that the parameters $\omega_{k}$ and $\nu_{k}$ in (6) satisfy $\omega_{k}\downarrow 0$ and $\nu_{k}\downarrow 0$ and the parameter $\varepsilon_{k}$ in (8) satisfy $\lim\limits_{k\rightarrow\infty}\varepsilon_{k}<\varepsilon,$ then the active set PG Algorithm 1 will terminate within a finite number of iterations.

4 Simulation Results

In this section, we present some simulation results to show the efficiency of the proposed active set PG algorithm for solving the joint data and activity detection problem (4). We generate the same parameters as in [8] in our numerical simulations. More specifically, we consider a single cell of radius 1000m and consider the worst-case scenario where all devices are located in the cell edge such that the large-scale fading components $g_{n}$ ’s are the same for all devices. The power spectrum density of the background noise is -169dBm/Hz over 10 MHz and the transmit power of each device is set as 25dBm. The number of BS antennas, the length of the signature sequence, and the bits of the data are set to be $M=256,$ $L=150,$ and $J=1$ (and thus $Q=2$ ), respectively. We generate all signature sequences from i.i.d. complex Gaussian distribution with zero mean and unit variance. We set $K/N=0.1,$ which means that $10\%$ of the total devices are active, and compare the performance of different algorithms as $N$ increases. The parameters in the proposed Algorithm 1 are

	$\displaystyle\omega_{k}$	$\displaystyle={10^{-6-k}},\varepsilon_{k}=\max\left\{10^{-k},0.8*10^{-3}\right\},$
	$\displaystyle\nu_{k}$	$\displaystyle=\min\left\{10^{4-k},0.5\left\|\min_{n,q}\left\{\left[\nabla f(\bm{\gamma}^{k})\right]_{n,q}\right\}\right\|\right\},$

and $\varepsilon=10^{-3}.$ All simulation results in this section are obtained by averaging over $500$ Monte-Carlo runs.

Refer to caption — Fig. 1: Performance of the proposed active set PG algorithm.

The upper subfigure in Fig. 1 plots the average ratio of the cardinality of the selected active sets during all iterations of the proposed Algorithm 1 and the number of active sequences (i.e., $K$ ). In the ideal case where the set of active sequences is always selected as the active set in Algorithm 1, the corresponding ratio is $1$ ; in the worst case where the whole set is always selected as the active set in Algorithm 1, the corresponding ratio is $QN/K=20$ . This ratio measures the efficiency of the corresponding active set selection strategy and the smaller the ratio the better the active set selection strategy. We can observe from the upper subfigure in Fig. 1 that the ratio is in the interval $[1.5,2.5]$ (and in fact very close to $2$ for different $N$ ’s), which clearly shows that the proposed active set selection strategy (6) is very efficient. The lower subfigure in Fig. 1 plots the average number of iterations that the proposed algorithm needs to terminate. This subfigure shows that Algorithm 1 will generally terminate within 4–7 iterations, which validates the finite termination result in Theorem 1.

Next, we compare the proposed active set PG Algorithm 1 with the following three benchmark algorithms:

•

Random CD [8, 10]: To the best of our knowledge, random CD is the state-of-the-art algorithm for solving the covariance based detection problem (4), which is much faster than EM [15] and SPICE [16]. Among various variants, one of the most efficient ones is the so-called random permuted CD [8], which randomly permutes the indices of all variables at each iteration and then updates the variables one by one according to the order in the permutation in closed form (see line 5 of [10, Algorithm 1]).
•

Ideal CD: This refers to the algorithm which applies the random permuted CD algorithm to solve problem (4) defined over the variables in the (ideal) set of active sequences. Since this ideal set is not known at the BS in practice, ideal CD is not a practical algorithm. Ideal CD is compared here for the purpose of characterizing the best possible performance of the CD types of algorithms.
•

Ideal PG: This refers to the algorithm which applies the PG algorithm [17] to solve problem (4) defined over the variables in the (ideal) set of active sequences. This algorithm is also not practical and is only theoretically interesting for characterizing the best possible performance of the PG algorithm.

We have observed in the simulations that random CD and active set PG algorithms always find the same solution of problem (4), and thus below we focus on their CPU time comparison. Fig. 2 plots the CPU time comparison of the proposed algorithm with the above three benchmark algorithms. It can be clearly observed from Fig. 2 that the proposed active set PG algorithm significantly outperforms the state-of-the-art random CD algorithm [10, 8] in terms of the CPU time. The proposed algorithm even achieves slightly better computational efficiency than the ideal CD algorithm. This shows the importance of exploiting the sparsity of the true solution in order to efficiently solve problem (4). Note that we fix $K/N=0.1$ in Fig. 2. It is expected that the proposed active set PG algorithm will become more efficient than the random CD algorithm as the ratio $K/N$ becomes smaller (i.e., the solution of problem (4) becomes more sparse).

We have also observed that directly applying the PG algorithm [17] to solve problem (4) is much slower than random CD. Fig. 2 shows that ideal PG is more efficient than ideal CD. These observations are consistent with our optimization practice that it is better to coordinately update all variables together instead of individually (unless for very large-scale optimization problems where it might be computationally expensive to update all variables together).

In summary, the high computational efficiency of the proposed active set PG algorithm is mainly attributed to the following two factors. First, the active set selection strategy (6) is efficient, which is able to substantially reduce the dimension of the subproblems (compared to the original problem). Second, it is important to choose an appropriate algorithm for solving the subproblems defined over the variables in the active set and the PG algorithm [17] turns out to be a good option (which is obviously much better than the state-of-the-art CD algorithm [8, 10]).

5 Conclusions

Scalable and efficient joint data and activity detection is essential for massive random access in mMTC. In this paper, we propose a novel active set PG algorithm that carefully exploits the sporadic nature of the device traffic. The proposed algorithm is much more efficient than the existing state-of-the-art algorithms (in terms of the CPU time). We have observed from simulation results that several first-order algorithms can find the same (global) solution of the nonconvex joint detection problem. It will be interesting to obtain some theoretical guarantees for this observation.

References

[1] C. Bockelmann, N. Pratas, H. Nikopour, K. Au, T. Svensson, C. Stefanovic, P. Popovski, and A. Dekorsy, “Massive machine-type communications in 5G: Physical and MAC-layer solutions,” IEEE Commun. Mag., vol. 54, no. 9, pp. 59–65, Sept. 2016.
[2] X. Chen, D. W. K. Ng, W. Yu, E. G. Larsson, N. Al-Dhahir, and R. Schober, “Massive access for 5G and beyond,” IEEE J. Sel. Areas Commun. (to appear), 2020.
[3] L. Liu, E. G. Larsson, W. Yu, P. Popovski, Č. Stefanović, and E. de Carvalho, “Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the internet of things,” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 88–99, Sept. 2018.
[4] K. Senel and E. G. Larsson, “Grant-free massive MTC-enabled massive MIMO: A compressive sensing approach,” IEEE Trans. Commun., vol. 66, no. 12, pp. 6164–6175, Dec. 2018.
[5] L. Liu and W. Yu, “Massive connectivity with massive MIMO —Part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, June 2018.
[6] Z. Chen, F. Sohrabi, and W. Yu, “Sparse activity detection for massive connectivity,” IEEE Trans. Signal Process., vol. 66, no. 7, pp. 1890–1904, Apr. 2018.
[7] L. Liu and Y.-F. Liu, “An efficient algorithm for device detection and channel estimation in asynchronous IoT systems,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. (ICASSP), Toronto, Canada, 2021. [Online]. Available: https://arxiv.org/abs/2010.09979
[8] Z. Chen, F. Sohrabi, Y.-F. Liu, and W. Yu, “Covariance based joint activity and data detection for massive random access with massive MIMO,” in Proc. IEEE Int. Conf. Commun. (ICC), Shanghai, China, May 2019, pp. 1–6.
[9] Z. Chen, F. Sohrabi, Y.-F. Liu, and W. Yu, “Phase transition analysis for covariance based massive random access with massive MIMO,” 2020. [Online]. Available: https://arxiv.org/abs/2003.04175
[10] S. Haghighatshoar, P. Jung, and G. Caire, “Improved scaling law for activity detection in massive MIMO systems,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, USA, June 2018, pp. 381–385.
[11] A. Fengler, G. Caire, P. Jung, and S. Haghighatshoar, “Massive MIMO unsourced random access,” 2019. [Online]. Available: http://arxiv.org/abs/1901.00828
[12] X. Shao, X. Chen, D. W. K. Ng, C. Zhong, and Z. Zhang, “Covariance-based cooperative activity detection for massive grant-free random access,” 2020. [Online]. Available: https://arxiv.org/abs/2008.10155
[13] D. Jiang and Y. Cui, “ML estimation and MAP estimation for device activities in grant-free random access with interference,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), 2020, pp. 1–6.
[14] J. Dong, J. Zhang, Y. Shi, and J. H. Wang, “Faster activity and data detection in massive random access: A multi-armed bandit approach,” 2020. [Online]. Available: https://arxiv.org/abs/2001.10237
[15] D. P. Wipf and B. D. Rao, “An empirical Bayesian strategy for solving the simultaneous sparse approximation problem,” IEEE Trans. Signal Process., vol. 55, no. 7, pp. 3704–3716, July 2007.
[16] Z. Yang, J. Li, P. Stoica, and L. Xie, “Sparse methods for direction-of-arrival estimation,” in Academic Press Library in Signal Processing. Elsevier, 2018, vol. 7, pp. 509–581.
[17] E. G. Birgin, J. M. Martínez, and M. Raydan, “Nonmonotone spectral projected gradient methods on convex sets,” SIAM J. Optim., vol. 10, no. 4, pp. 1196–1211, 2000.
[18] J. Barzilai and J. M. Borwein, “Two-point step size gradient methods,” IMA J. Numer. Anal., vol. 8, no. 1, pp. 141–148, 1988.
[19] Y.-H. Dai and R. Fletcher, “Projected Barzilai-Borwein methods for large-scale box-constrained quadratic programming,” Numer. Math., vol. 100, no. 1, pp. 21–47, 2005.