
Improved Recursive Algorithms for V-BLAST to Save Computations and Memories

Hufei Zhu, Yanyang Liang, Fuqin Deng*, Genquan Chen and Jiaming Zhong
H. Zhu, Y. Liang, J. Zhong, F. Deng and G. Chen are with the Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, Guangdong, China. This paper was presented in part at the IEEE International Conference on Communications.
Abstract

For the vertical Bell Laboratories layered space-time architecture (V-BLAST), the original fast recursive algorithm was proposed, and then Improvements I-IV were introduced to further reduce the computational complexity. The existing recursive algorithm with speed advantage and that with memory saving incorporate Improvements I-IV and only Improvements III-IV into the original algorithm, respectively, and as far as we know, they require the least computations and memories, respectively, among the existing recursive algorithms. In this paper, we propose Improvements V and VI to replace Improvements I and II, respectively. Instead of the lemma for inversion of a partitioned matrix applied in Improvement I, Improvement V uses another lemma to speed up the matrix inversion step by a factor of 1.67. Then the formulas adopted in our Improvement V are applied to deduce Improvement VI for interference cancellation, which saves memories without sacrificing speed compared to Improvement II. In the existing algorithm with speed advantage, the proposed algorithm I with speed advantage replaces Improvement I with Improvement V, while the proposed algorithm II with both speed advantage and memory saving replaces Improvements I and II with Improvements V and VI, respectively. Both proposed algorithms speed up the existing algorithm with speed advantage by a factor of 1.3, while the proposed algorithm II achieves a speedup of 1.86 and saves about half of the memories, compared to the existing algorithm with memory saving.

Index Terms:
Recursive algorithm, V-BLAST, memory saving, inversion of partitioned matrix, interference cancellation.

I Introduction

Spatial-multiplexing (SM) multiple-input multiple-output (MIMO) wireless communication systems [1] transmit multiple substreams concurrently from multiple antennas, to achieve very high data rate and spectral efficiency in rich multi-path environments. As the number of transmit antennas increases, the maximum-likelihood (ML) MIMO detectors [2, 3, 4] with the optimum performance quickly require excessive computational complexity. Vertical Bell Laboratories Layered Space-Time (V-BLAST) systems [5] are SM MIMO schemes with good performance-complexity tradeoff, where usually ordered successive interference cancellation (OSIC) detectors based on hard decisions are adopted to detect the substream symbols iteratively with the optimal ordering based on signal-to-noise ratio (SNR). In each iteration, one of the undetected symbols is detected through a linear zero-forcing (ZF) or minimum mean-square error (MMSE) filter, and then the hard decision of the detected symbol is utilized to cancel the interference in the received symbol vector. OSIC detectors choose the undetected symbol with the highest SNR in each iteration to perform detection and interference cancellation, since the unreliable hard decision causes detrimental error propagation (EP) [6].

EP due to imperfect hard decisions can severely limit the performance of OSIC detectors [6, 7]. Then the ordering schemes based on the log-likelihood ratio (LLR) [7, 8] and maximum a posteriori probability (MAP) [9] exploit the instantaneous noise and the resulting symbol decision error, respectively, to achieve a performance gain over the conventional SNR ordering scheme. On the other hand, iterative soft interference cancellation (ISIC) MIMO detectors were proposed to reduce the EP effects by replacing hard decisions with soft decisions in interference cancellation [10, 11, 12, 13], which are extended to iterative detection-decoding (IDD) schemes with soft-decision feedback for coded systems, and can achieve near ML performance with controllable complexity [14, 15]. Since the direct-sequence code-division multiple-access (DS-CDMA) system can be modeled as a MIMO channel [14], the ISIC MIMO detectors are also obtained in [16] and [14] by applying the ISIC detector in [17] for the coded DS-CDMA system. For the case with the non-linear or unknown channel model, a data-driven implementation of the ISIC MIMO detector was proposed in [15], where simple dedicated deep neural networks (DNNs) replace the channel model-based building blocks of the ISIC MIMO detector in [10].

The calculation of OSIC and ISIC detectors for MIMO requires quite high complexity. Several efficient implementations of OSIC were proposed, which mainly include the recursive algorithms [18, 19, 20, 21] and the square-root algorithms [22, 23, 24, 25] based on the detection error covariance matrix and its square-root matrix, respectively. On the other hand, several efficient implementations of ISIC were proposed in [26, 27].

In addition to the above-mentioned EP effect reductions and efficient implementations, there has been much other recent research related to the OSIC MIMO detector. It was applied in linear dispersion (LD) based perfect space-time coded MIMO systems [28], MIMO orthogonal frequency division multiplexing (OFDM) with index modulation (IM) [29], and the power-domain non-orthogonal multiple access (NOMA) systems with MIMO [30, 31, 32]. For MIMO filter bank multicarrier (FBMC) systems, neighbourhood detection based OSIC detectors with QR and sorted QR decompositions were proposed in [33]. For 5G mobile networks with massive MIMO, the OSIC detector was combined with the K-correction algorithm based on the ML criterion and the partial tree search detection in [34] and [35], respectively. In [36], an exact closed-form expression was given for the average outage probability of zero-forcing (ZF) OSIC V-BLAST with 2 transmitters and $r\geq 2$ receivers. Moreover, comprehensive surveys of MIMO detectors (including OSIC detectors) have been given in [37] and [38].

In this paper, we focus on the recursive OSIC detectors [18, 19, 20, 21]. The original recursive algorithm proposed in [18] was improved in [19] and [20] to reduce the computational complexity, and these improvements were then incorporated in [21] to give the “fastest known algorithm” before [21]. The contributions of [21] also include one recursive algorithm with speed advantage that adopts a further improvement for the “fastest known algorithm”, with a claimed speedup of 1.3, and another recursive algorithm with memory saving that is slower than the “fastest known algorithm”. We further improve the matrix inversion applied in [19] and the interference cancellation scheme proposed in [20], both of which are part of the recursive algorithm with speed advantage in [21].

Firstly, the lemma for inversion of a partitioned matrix [39, Ch. 14.12] applied in [19] is replaced with another one [40, Equ. 8], to accelerate the computation of the initial detection error covariance matrix. Then the formulas adopted by us for matrix inversion are utilized to deduce an improved interference cancellation scheme with memory saving, which uses the entries of the error covariance matrix instead of those of its inverse. Finally, we use the above two improvements to propose a recursive algorithm with both speed advantage and memory saving, which is the fastest and requires the least memories among the existing recursive algorithms. In future work, the proposed recursive OSIC detector with both speed advantage and memory saving will be extended to a recursive ISIC detector, which can reduce the required computations and memories compared to the low-complexity ISIC detector proposed recently in [27].

The V-BLAST system model is overviewed in Section II, followed by the description of the recursive algorithms for V-BLAST in [18, 20, 19, 21] in Section III. Two recursive algorithms for V-BLAST are proposed in Section IV to improve the recursive algorithm with speed advantage in [21]. Then the presented recursive algorithms for V-BLAST are compared in Section V. Finally, conclusions are drawn in Section VI.

In what follows, uppercase and lowercase bold face letters represent matrices and column vectors, respectively. $(\bullet)^{T}$, $(\bullet)^{*}$, $(\bullet)^{H}$, and $(\bullet)^{-1}$ denote transposition, conjugate, conjugate transposition, and inversion of matrices, respectively. ${\bf{I}}_{M}$ is the identity matrix of size $M$. Moreover, the MATLAB standard is followed in the notations for some variables.

II System Model

The considered V-BLAST system consists of $M$ transmit antennas and $N$ ($\geq M$) receive antennas in a rich-scattering and flat-fading wireless channel. At the transmitter, the data stream is de-multiplexed into $M$ sub-streams, and each sub-stream is encoded and fed to its respective transmit antenna. Denote the vector of transmit symbols from the $M$ antennas as

${\bf{s}}_{M}=[s_{1},s_{2},\cdots,s_{M}]^{T},$ (1)

and denote the $N\times M$ complex channel matrix ${\bf{H}}$ as

${\bf{H}}_{M}=[{\bf{h}}_{:1},{\bf{h}}_{:2},\cdots,{\bf{h}}_{:M}],$ (2)

where ${\bf{h}}_{:m}$ ($m=1,2,\cdots,M$) is the $m^{th}$ column of ${\bf{H}}$. Then the symbols received at the $N$ antennas can be written as

${\bf{x}}^{(M)}={\bf{H}}_{M}\cdot{\bf{s}}_{M}+{\bf{n}},$ (3)

where ${\bf{n}}$ is the $N\times 1$ complex Gaussian noise vector with zero mean and covariance $\sigma_{n}^{2}{\bf{I}}_{N}$.

The conventional V-BLAST scheme detects the $M$ components of ${\bf{s}}_{M}$ by $M$ recursions with the optimal ordering based on SNR. In each recursion, the component with the highest post-detection SNR among all the undetected components is detected by a linear filter and then its effect is subtracted from the received signal vector [5, 22, 18]. Assume that in the $i^{th}$ ($i=1,2,\cdots,M$) recursion, the undetected $m=M-i+1$ transmit symbols and the corresponding channel matrix are the first $m$ entries and columns of the permuted ${\bf{s}}_{M}$ and ${\bf{H}}_{M}$, respectively, i.e., ${\bf{s}}_{m}$ and ${\bf{H}}_{m}$ defined by (1) and (2). Then we obtain the reduced-order problem (3) with $M$ replaced by any $m(<M)$. Accordingly, the linear MMSE estimation of the undetected $m$ symbols (i.e., ${\bf{s}}_{m}$) is

${\bf{\hat{s}}}_{m}=\left({\bf{H}}_{m}^{H}{\bf{H}}_{m}+\alpha{\bf{I}}_{m}\right)^{-1}{\bf{H}}_{m}^{H}{\bf{x}}^{(m)},$ (4)

where $\alpha=\frac{\sigma_{n}^{2}}{\sigma_{s}^{2}}$, and $\sigma_{s}^{2}$ is the power of each symbol in ${\bf{s}}_{M}$.
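As a concrete illustration of (3) and (4), the following minimal sketch (not the authors' code) simulates one channel use and computes the linear MMSE estimate of all $M$ symbols; the QPSK constellation, antenna numbers and noise power are assumed values chosen only for the example.

```python
import numpy as np

M, N = 4, 6                      # transmit / receive antennas (assumed for illustration)
sigma_s2, sigma_n2 = 1.0, 0.1    # symbol and noise powers (assumed)
alpha = sigma_n2 / sigma_s2

rng = np.random.default_rng(0)
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
s = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), M)  # QPSK symbols
n = np.sqrt(sigma_n2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
x = H @ s + n                    # received vector, eq. (3)

# Linear MMSE estimate of the M undetected symbols, eq. (4):
s_hat = np.linalg.solve(H.conj().T @ H + alpha * np.eye(M), H.conj().T @ x)
```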

III The Recursive V-BLAST Algorithms

III-A The Original Recursive V-BLAST Algorithm

The original recursive V-BLAST algorithm [18] is based on

${\bf{R}}_{m}={\bf{H}}_{m}^{H}{\bf{H}}_{m}+\alpha{\bf{I}}_{m}$ (5a)
${\bf{Q}}_{m}={\bf{R}}_{m}^{-1}=\left({\bf{H}}_{m}^{H}{\bf{H}}_{m}+\alpha{\bf{I}}_{m}\right)^{-1}.$ (5b)

The above ${\bf{Q}}_{m}$ is the covariance matrix [22, 18] of the detection error ${\bf{e}}_{m}={\bf{s}}_{m}-{\bf{\hat{s}}}_{m}$, i.e., $E\{{\bf{e}}_{m}{\bf{e}}_{m}^{H}\}=\sigma_{n}^{2}{\bf{Q}}_{m}$. Then in each recursion, the symbol with the highest post-detection SNR among all the undetected $m$ symbols corresponds to the minimal diagonal entry of ${\bf{Q}}_{m}$.

In [18], the Sherman-Morrison formula is applied to deduce

${\bf{Q}}_{[n]}={\bf{Q}}_{[n-1]}-\frac{{\bf{Q}}_{[n-1]}{\bf{h}}_{n:}{\bf{h}}_{n:}^{H}{\bf{Q}}_{[n-1]}}{1+{\bf{h}}_{n:}^{H}{\bf{Q}}_{[n-1]}{\bf{h}}_{n:}},$ (6)

where ${\bf{h}}_{n:}^{H}$ is the $n^{th}$ row of ${\bf{H}}$. Accordingly, the initial ${\bf{Q}}={\bf{Q}}_{[N]}$ and ${\bf{R}}={\bf{R}}_{[N]}$ are obtained by computing (6) and

${\bf{R}}_{[n]}=\sum_{l=1}^{n}{\bf{h}}_{l:}{\bf{h}}_{l:}^{H}+\alpha{\bf{I}}_{M}={\bf{R}}_{[n-1]}+{\bf{h}}_{n:}{\bf{h}}_{n:}^{H}$ (7)

iteratively for $n=1,2,\cdots,N$, where ${\bf{Q}}_{[0]}=\frac{1}{\alpha}{\bf{I}}_{M}$ and ${\bf{R}}_{[0]}=\alpha{\bf{I}}_{M}$. Then set the initial ${\bf{R}}_{M}={\bf{R}}$, ${\bf{Q}}_{M}={\bf{Q}}$, ${\bf{x}}^{(M)}={\bf{x}}$ and ${\bf{p}}=[1,2,\cdots,M]^{T}$ for the recursion phase.
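A minimal sketch of this initialization, i.e., one rank-1 update of (7) and one Sherman-Morrison update of (6) per receive antenna, is given below; numpy and a channel matrix H as in the earlier example are assumed, and the function name is ours, not from [18].

```python
import numpy as np

def init_Q_R(H, alpha):
    """Initial Q = (H^H H + alpha I)^{-1} and R = H^H H + alpha I via (6)-(7)."""
    N, M = H.shape
    Q = np.eye(M, dtype=complex) / alpha      # Q_[0] = (1/alpha) I_M
    R = alpha * np.eye(M, dtype=complex)      # R_[0] = alpha I_M
    for n in range(N):
        h = H[n, :].conj()                    # h_{n:}, since h_{n:}^H is the n-th row of H
        R += np.outer(h, h.conj())            # eq. (7)
        Qh = Q @ h
        Q -= np.outer(Qh, Qh.conj()) / (1 + h.conj() @ Qh)   # eq. (6), Sherman-Morrison
    return Q, R

# Sanity check: Q should equal the direct inverse of R, e.g.
# Q, R = init_Q_R(H, alpha); assert np.allclose(Q, np.linalg.inv(R))
```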

In the recursion with $m$ ($m=M,M-1,\cdots,2$) undetected symbols, ${\bf{p}}$ is permuted so that the $p_{m}$-th (i.e., ${\bf{p}}(m)$-th) symbol has the highest SNR among the undetected symbols, while ${\bf{H}}_{m}$, ${\bf{R}}_{m}$ and ${\bf{Q}}_{m}$ are permuted accordingly. Then the estimation of the $p_{m}$-th symbol is computed by

$\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{H}}_{m}^{H}{\bf{x}}^{(m)}$ (8)

with ${\bf{q}}_{m}$ being the $m^{th}$ column of ${\bf{Q}}_{m}$, and $\hat{s}_{p_{m}}$ is quantized to $s_{p_{m}}$ according to the constellation in use. For the next recursion, the interference of $s_{p_{m}}$ is cancelled in ${\bf{x}}^{(m)}$ to get

${\bf{x}}^{(m-1)}={\bf{x}}^{(m)}-s_{p_{m}}{\bf{h}}_{:m},$ (9)

${\bf{R}}_{m-1}$ is determined from ${\bf{R}}_{m}$ by

${\bf{R}}_{m}=\left[\begin{array}{cc}{\bf{R}}_{m-1}&{\bf{\bar{r}}}_{m}\\ {\bf{\bar{r}}}_{m}^{H}&\gamma_{m}\end{array}\right],$ (10)

and ${\bf{Q}}_{m}$ is deflated to ${\bf{Q}}_{m-1}$ by

${\bf{Q}}_{m-1}={\bf{\bar{Q}}}_{m-1}-\frac{{\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}{\bf{\bar{r}}}_{m}^{H}{\bf{\bar{Q}}}_{m-1}}{\gamma_{m}+{\bf{\bar{r}}}_{m}^{H}{\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}},$ (11)

where ${\bf{\bar{Q}}}_{m-1}$ is ${\bf{Q}}_{m}$ with the last row and column removed, as described by [21, Eq. (13)]

${\bf{Q}}_{m}=\left[\begin{array}{cc}{\bf{\bar{Q}}}_{m-1}&{\bf{\bar{q}}}_{m}\\ {\bf{\bar{q}}}_{m}^{H}&\omega_{m}\end{array}\right].$ (12)

In (10)-(12), ${\bf{\bar{r}}}_{m}$ and ${\bf{\bar{q}}}_{m}$ are ${\bf{r}}_{m}$ and ${\bf{q}}_{m}$ (i.e., the $m^{th}$ columns of ${\bf{R}}_{m}$ and ${\bf{Q}}_{m}$) with the last entry removed, respectively.
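The following minimal sketch (assuming QPSK symbols and numpy; function and helper names are ours) puts these steps together for one full detection pass of the original algorithm: SNR ordering on the diagonal of ${\bf{Q}}_{m}$, the linear estimate (8), interference cancellation (9) and the deflation (11). The initial Q and R may be obtained, e.g., as in the earlier initialization sketch.

```python
import numpy as np

def qpsk_slice(s):
    # hard decision onto an assumed unit-energy QPSK constellation
    return (np.sign(s.real) + 1j * np.sign(s.imag)) / np.sqrt(2)

def osic_original(H, x, Q, R):
    N, M = H.shape
    H, Q, R, x = H.copy(), Q.copy(), R.copy(), x.copy()
    p = np.arange(M)                                  # records the detection permutation
    s_det = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        l = np.argmin(np.real(np.diag(Q[:m, :m])))    # symbol with highest post-detection SNR
        p[[l, m - 1]] = p[[m - 1, l]]                 # permute p, H_m, R_m and Q_m accordingly
        H[:, [l, m - 1]] = H[:, [m - 1, l]]
        for A in (Q, R):
            A[[l, m - 1], :m] = A[[m - 1, l], :m]
            A[:m, [l, m - 1]] = A[:m, [m - 1, l]]
        q = Q[:m, m - 1]                              # q_m, the m-th column of Q_m
        s_det[p[m - 1]] = qpsk_slice(q.conj() @ (H[:, :m].conj().T @ x))     # eq. (8)
        if m == 1:
            break
        x = x - s_det[p[m - 1]] * H[:, m - 1]         # eq. (9)
        r_bar, gamma = R[:m - 1, m - 1], R[m - 1, m - 1].real
        Qb = Q[:m - 1, :m - 1]                        # \bar{Q}_{m-1} from (12)
        Qr = Qb @ r_bar
        Q[:m - 1, :m - 1] = Qb - np.outer(Qr, Qr.conj()) / (gamma + r_bar.conj() @ Qr)  # eq. (11)
    return s_det
```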

III-B Existing Improvements

In the original recursive algorithm, the dominant complexity in the order of $O(M^{3}+M^{2}N)$ comes from the initialization of ${\bf{Q}}$ and ${\bf{R}}$, the estimation of $s_{p_{m}}$ and the deflation of ${\bf{Q}}_{m}$ (where $2\leq m\leq M$) by (6), (7), (8) and (11), respectively. To reduce the computational complexity of the original recursive algorithm, the following Improvements I-IV have been proposed in [19], [20] and [21], respectively.

Improvement I: To speed up the computation of the initial ${\bf{Q}}$, the Sherman-Morrison formula (6) applied in [18] is replaced with the lemma for inversion of a partitioned matrix [39, Ch. 14.12] in [19] to compute the inverse of ${\bf{R}}_{m}$ partitioned by (10), which is ${\bf{Q}}_{m}$ partitioned by (12) where

${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}+\frac{{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}}{\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}}$ (13a)
${\bf{\bar{q}}}_{m}=-\gamma_{m}^{-1}{\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}$ (13b)
$\omega_{m}=\gamma_{m}^{-1}+\gamma_{m}^{-2}{\bf{\bar{r}}}_{m}^{H}{\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}.$ (13c)

In [19], the above (13c) and (12) are applied to compute ${\bf{Q}}_{m}={\bf{R}}_{m}^{-1}$ from ${\bf{Q}}_{m-1}$ iteratively for $m=2,3,\cdots,M$, to get the initial ${\bf{Q}}={\bf{Q}}_{M}$ from

${\bf{Q}}_{1}=1/{\bf{R}}_{1}.$ (14)

Improvement II: In [20], (8) has been simplified to

$\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{z}}_{m}$ (15)

with ${\bf{z}}_{m}$ defined by

${\bf{z}}_{m}={\bf{H}}_{m}^{H}{\bf{x}}^{(m)}.$ (16)

Only the initial ${\bf{z}}_{M}$ is computed by (16) with $m=M$. Then for $m=M,M-1,\cdots,2$, the interference of $s_{p_{m}}$ is cancelled in the permuted ${\bf{z}}_{m}$ to update ${\bf{z}}_{m}$ into ${\bf{z}}_{m-1}$ efficiently by

${\bf{z}}_{m-1}={\bf{\bar{z}}}_{m}-s_{p_{m}}{\bf{\bar{r}}}_{m},$ (17)

where ${\bf{\bar{z}}}_{m}$ is ${\bf{z}}_{m}$ with the last entry removed, like ${\bf{\bar{r}}}_{m}$ and ${\bf{\bar{q}}}_{m}$.

Improvement III: In [19, Algorithm II] and [20], the structure of Hermitian matrices is exploited to simplify the initialization and deflation of ${\bf{Q}}$ by (6) and (11), respectively.

Improvement IV: In [21], the deflation of ${\bf{Q}}_{m}$ was accelerated by replacing (11) with

${\bf{Q}}_{m-1}={\bf{\bar{Q}}}_{m-1}-\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H},$ (18)

where ${\bf{\bar{Q}}}_{m-1}$, ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ are blocks of ${\bf{Q}}_{m}$, as shown in (12).

In [21], the above Improvements I-III were incorporated into the original algorithm to give the “fastest known algorithm” before [21], and then the above Improvement IV was proposed to improve the “fastest known algorithm” into the algorithm with speed advantage, which is summarized in Algorithm 1.

Algorithm 1 The Algorithm with Speed Advantage in [21]
Initialization: Set ${\bf{p}}=[1,2,\cdots,M]^{T}$ and compute ${\bf{z}}_{M}={\bf{H}}^{H}{\bf{x}}$; compute (7) iteratively for $n=1,2,\cdots,N$ to obtain the initial ${\bf{R}}_{M}={\bf{R}}_{[N]}$; compute ${\bf{Q}}_{1}$ by (14), and then compute (13c) and (12) iteratively for $m=2,3,\cdots,M$, to obtain the initial ${\bf{Q}}_{M}$;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}$ is the $j^{th}$ diagonal entry of ${\bf{Q}}_{m}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$ and ${\bf{z}}_{m}$; permute rows and columns $l_{m}$ and $m$ in ${\bf{R}}_{m}$ and ${\bf{Q}}_{m}$;
2: Compute $\hat{s}_{p_{m}}$ by (15), which is quantized to $s_{p_{m}}$;
3: Cancel the effect of $s_{p_{m}}$ in ${\bf{z}}_{m}$ to obtain ${\bf{z}}_{m-1}$ by (17), and deflate ${\bf{Q}}_{m}$ to ${\bf{Q}}_{m-1}$ by (18);
When $m=1$, only run step 2 to get $s_{p_{1}}$;
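A minimal sketch of the recursion phase of Algorithm 1 is given below, assuming QPSK symbols; for brevity the initial ${\bf{Q}}_{M}$ is obtained here by a direct inverse, standing in for the recursive initialization by (7), (14), (13c) and (12), and the function names are ours.

```python
import numpy as np

def qpsk_slice(s):
    return (np.sign(s.real) + 1j * np.sign(s.imag)) / np.sqrt(2)

def osic_algorithm1(H, x, alpha):
    N, M = H.shape
    R = H.conj().T @ H + alpha * np.eye(M)          # eq. (5a)
    Q = np.linalg.inv(R)                            # stand-in for the recursive initialization
    z = H.conj().T @ x                              # z_M, eq. (16) with m = M
    p = np.arange(M)
    s_det = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        l = np.argmin(np.real(np.diag(Q[:m, :m])))  # step 1: SNR ordering
        p[[l, m - 1]] = p[[m - 1, l]]
        z[[l, m - 1]] = z[[m - 1, l]]
        for A in (Q, R):
            A[[l, m - 1], :m] = A[[m - 1, l], :m]
            A[:m, [l, m - 1]] = A[:m, [m - 1, l]]
        q_bar, omega = Q[:m - 1, m - 1], Q[m - 1, m - 1].real
        s_det[p[m - 1]] = qpsk_slice(Q[:m, m - 1].conj() @ z[:m])            # step 2: eq. (15)
        if m == 1:
            break
        z[:m - 1] -= s_det[p[m - 1]] * R[:m - 1, m - 1]                      # step 3: eq. (17)
        Q[:m - 1, :m - 1] -= np.outer(q_bar, q_bar.conj()) / omega           # step 3: eq. (18)
    return s_det
```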

On the other hand, the algorithm with memory saving proposed in [21] does not calculate ${\bf{R}}$, to avoid the overhead of storing and permuting ${\bf{R}}$. It only incorporates Improvements III and IV into the original algorithm, and does not incorporate Improvements I and II, which utilize the entries of ${\bf{R}}$. Moreover, it no longer computes ${\bf{R}}_{M}$ by (7) in the initialization phase, since the deflation of ${\bf{Q}}_{m}$ in the recursion phase uses the entries of ${\bf{Q}}$ instead of those of ${\bf{R}}$ after Improvement IV is incorporated. The algorithm with memory saving in [21] is summarized in Algorithm 2.

Algorithm 2 The Algorithm with Memory Saving in [21]
Initialization: Set ${\bf{p}}=[1,2,\cdots,M]^{T}$ and ${\bf{H}}_{M}={\bf{H}}$; compute (6) iteratively for $n=1,2,\cdots,N$ to obtain the initial ${\bf{Q}}_{M}={\bf{Q}}_{[N]}$;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}$ is the $j^{th}$ diagonal entry of ${\bf{Q}}_{m}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$; permute columns $l_{m}$ and $m$ in ${\bf{H}}_{m}$; permute rows and columns $l_{m}$ and $m$ in ${\bf{Q}}_{m}$;
2: Compute $\hat{s}_{p_{m}}$ by (8), which is quantized to $s_{p_{m}}$;
3: Cancel the effect of $s_{p_{m}}$ in ${\bf{x}}^{(m)}$ to obtain ${\bf{x}}^{(m-1)}$ by (9); deflate ${\bf{Q}}_{m}$ to ${\bf{Q}}_{m-1}$ by (18); remove the last column of ${\bf{H}}_{m}$ to obtain ${\bf{H}}_{m-1}$;
When $m=1$, only run step 2 to get $s_{p_{1}}$;

To the best of our knowledge, the algorithm with speed advantage (i.e., Algorithm 1) and the algorithm with memory saving (i.e., Algorithm 2) in [21] require the least computations and memories, respectively, among the existing recursive V-BLAST algorithms including those in [18, 20, 19, 21].

IV Proposed Two Recursive V-BLAST Algorithms

In this section, we propose Improvements V and VI to replace the above Improvements I and II, respectively. (Only Improvement V was presented in part at the IEEE International Conference on Communications (ICC); notice that Improvement V is obtained by applying [40, Equ. 8] in this paper, while it was re-derived without citing any literature at ICC'09.) Improvement V accelerates Improvement I that inverts a partitioned matrix. Then the formulas adopted in Improvement V are applied to deduce Improvement VI, which saves memories without sacrificing speed compared to Improvement II, by using ${\bf{Q}}$ instead of ${\bf{R}}$ for interference cancellation in the recursion phase.

IV-A Improvement V with Speed Advantage for Proposed Recursive Algorithm I to Invert a Partitioned Matrix

Instead of the lemma for inversion of a partitioned matrix [39, Ch. 14.12] applied in [19], the proposed Improvement V uses the equation inverting a partitioned matrix [40, Equ. 8] (this equation was not first proposed in [40], where several works published between 1917 and 1978 are listed near equation (8) to discuss its first explicit presentation), to compute the inverse of ${\bf{R}}_{m}$ partitioned by (10), which is ${\bf{Q}}_{m}$ partitioned by (12) where

$\omega_{m}=\left(\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}\right)^{-1}$ (19a)
${\bf{\bar{q}}}_{m}=-\omega_{m}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}$ (19b)
${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}.$ (19c)

From equation (8) in [40], we deduce (19c) in Appendix A, which significantly simplifies (13c) from Improvement I. We also write (19c) as

${\bf{\tilde{q}}}_{m}={\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}$ (20a)
$\omega_{m}=\left(\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{\tilde{q}}}_{m}\right)^{-1}$ (20b)
${\bf{\bar{q}}}_{m}=-\omega_{m}{\bf{\tilde{q}}}_{m}$ (20c)
${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H},$ (20d)

to avoid the unnecessary division to compute $\omega_{m}^{-1}$ in (19c). In Algorithm 1 (i.e., the improved algorithm with speed advantage in [21]), we compute the initial ${\bf{Q}}$ by (20d) from Improvement V instead of (13c) from Improvement I, to obtain the proposed recursive algorithm I with speed advantage.
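As an illustration, the following minimal sketch (function name ours) builds ${\bf{Q}}_{M}={\bf{R}}_{M}^{-1}$ iteratively from (14) and (20a)-(20d) with the partitioning (12), assuming a Hermitian positive definite ${\bf{R}}_{M}$ stored as a numpy array.

```python
import numpy as np

def invert_by_improvement_v(R):
    """Grow Q_m from Q_{m-1} and column m of R_m, for m = 2, ..., M (0-based indices below)."""
    M = R.shape[0]
    Q = np.zeros_like(R, dtype=complex)
    Q[0, 0] = 1.0 / R[0, 0]                                     # eq. (14)
    for m in range(2, M + 1):
        r_bar, gamma = R[:m - 1, m - 1], R[m - 1, m - 1]
        q_tilde = Q[:m - 1, :m - 1] @ r_bar                     # eq. (20a)
        omega = 1.0 / (gamma - r_bar.conj() @ q_tilde)          # eq. (20b)
        Q[:m - 1, m - 1] = -omega * q_tilde                     # eq. (20c), i.e. \bar{q}_m
        Q[m - 1, :m - 1] = Q[:m - 1, m - 1].conj()
        Q[m - 1, m - 1] = omega
        Q[:m - 1, :m - 1] += omega * np.outer(q_tilde, q_tilde.conj())   # eq. (20d)
    return Q

# Sanity check against a direct inverse, e.g.:
# A = np.random.randn(5, 5) + 1j * np.random.randn(5, 5)
# R = A.conj().T @ A + 0.1 * np.eye(5)
# assert np.allclose(invert_by_improvement_v(R), np.linalg.inv(R))
```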

IV-B Improvement VI with Memory Saving for Interference Cancellation

In the proposed Improvement VI, the above (20d) from Improvement V is applied to deduce an improved interference cancellation scheme with memory saving that avoids using ${\bf{R}}_{M}$ in the recursion phase. For simplicity, the detection order is assumed to be $M,M-1,\cdots,1$ in this subsection.

Obviously, we need to avoid using ${\bf{\bar{r}}}_{m}$ in ${\bf{R}}_{m}$ to update ${\bf{z}}_{m}$ into ${\bf{z}}_{m-1}$ by (17). Then in the recursion phase, the initial ${\bf{z}}_{M}$ will remain unchanged, which is applied to define the vector

${\bf{d}}_{m}={\bf{Q}}_{m}\left({\bf{z}}_{M}(1:m)-{\bf{z}}_{m}\right),$ (21)

where ${\bf{z}}_{M}(1:m)$ denotes the first $m$ entries of ${\bf{z}}_{M}$. ${\bf{d}}_{m}$ defined by (21), which becomes

${\bf{d}}_{M}={\bf{0}}_{M}$ (22)

when $m=M$, is applied to compute the estimation of $s_{p_{m}}$ by

$\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-{\bf{d}}_{m}(m),$ (23)

and can be updated into ${\bf{d}}_{m-1}$ efficiently by

${\bf{d}}_{m-1}={\bf{\bar{d}}}_{m}-\left(s_{p_{m}}+{\bf{d}}_{m}(m)\right){\bf{\bar{q}}}_{m}/\omega_{m},$ (24)

where ${\bf{d}}_{m}(m)$ is the last entry of ${\bf{d}}_{m}$, and ${\bf{\bar{d}}}_{m}$ denotes ${\bf{d}}_{m}$ with the last entry removed, i.e.,

${\bf{d}}_{m}=\left[\begin{array}{cc}{\bf{\bar{d}}}_{m}^{T}&{\bf{d}}_{m}(m)\end{array}\right]^{T}.$ (25)

To deduce the above (23), write (21) as the column vector ${\bf{Q}}_{m}{\bf{z}}_{m}={\bf{Q}}_{m}{\bf{z}}_{M}(1:m)-{\bf{d}}_{m}$, whose last entry is

${\bf{q}}_{m}^{H}{\bf{z}}_{m}={\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-{\bf{d}}_{m}(m),$ (26)

and then substitute (26) into (15). Moreover, the above (24), which cancels the interference of $s_{p_{m}}$ in ${\bf{d}}_{m}$ to update ${\bf{d}}_{m}$ into ${\bf{d}}_{m-1}$ efficiently, will be deduced in the rest of this subsection.

Firstly, we verify that ${\bf{\bar{d}}}_{m}$, ${\bf{d}}_{m}(m)$ and ${\bf{d}}_{m-1}$ in (24) satisfy

${\bf{\bar{d}}}_{m}=\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H}&-\omega_{m}{\bf{\tilde{q}}}_{m}\end{array}\right]\left({\bf{z}}_{M}(1:m)-{\bf{z}}_{m}\right)$ (27d)
${\bf{d}}_{m}(m)=\left[\begin{array}{cc}-\omega_{m}{\bf{\tilde{q}}}_{m}^{H}&\omega_{m}\end{array}\right]\left({\bf{z}}_{M}(1:m)-{\bf{z}}_{m}\right)$ (27f)
${\bf{d}}_{m-1}={\bf{Q}}_{m-1}\left({\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\right)+s_{p_{m}}{\bf{\tilde{q}}}_{m}.$ (27g)

We substitute (20d) and (20c) into (12) to obtain

${\bf{Q}}_{m}=\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H}&-\omega_{m}{\bf{\tilde{q}}}_{m}\\ -\omega_{m}{\bf{\tilde{q}}}_{m}^{H}&\omega_{m}\end{array}\right],$ (28)

and then substitute (28) and (25) into (21) to obtain (27d) and (27f). To verify (27g), substitute (17) into (21) with $m$ replaced by $m-1$ to write ${\bf{d}}_{m-1}$ as

${\bf{d}}_{m-1}={\bf{Q}}_{m-1}\left({\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}+s_{p_{m}}{\bf{\bar{r}}}_{m}\right)={\bf{Q}}_{m-1}\left({\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\right)+s_{p_{m}}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m},$

into which we substitute (20a).

The above (27g) is applied to deduce (24) finally. Substitute (27g) and (27d) into ${\bf{d}}_{m-1}-{\bf{\bar{d}}}_{m}$ to obtain

${\bf{d}}_{m-1}-{\bf{\bar{d}}}_{m}={\bf{Q}}_{m-1}\left({\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\right)+s_{p_{m}}{\bf{\tilde{q}}}_{m}-\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H}&-\omega_{m}{\bf{\tilde{q}}}_{m}\end{array}\right]\left[\begin{array}{c}{\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\\ {\bf{z}}_{M}(m)-{\bf{z}}_{m}(m)\end{array}\right],$

which can be simplified into

${\bf{d}}_{m-1}-{\bf{\bar{d}}}_{m}=s_{p_{m}}{\bf{\tilde{q}}}_{m}+{\bf{\tilde{q}}}_{m}\left[\begin{array}{cc}-\omega_{m}{\bf{\tilde{q}}}_{m}^{H}&\omega_{m}\end{array}\right]\left[\begin{array}{c}{\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\\ {\bf{z}}_{M}(m)-{\bf{z}}_{m}(m)\end{array}\right]$ (29)

since the sum of all the terms containing ${\bf{Q}}_{m-1}$ is zero. Finally, we substitute (27f) and ${\bf{\tilde{q}}}_{m}=-{\bf{\bar{q}}}_{m}/\omega_{m}$ (deduced from (20c)) into (29) to obtain ${\bf{d}}_{m-1}-{\bf{\bar{d}}}_{m}=-s_{p_{m}}\frac{{\bf{\bar{q}}}_{m}}{\omega_{m}}-\frac{{\bf{\bar{q}}}_{m}}{\omega_{m}}{\bf{d}}_{m}(m)$, i.e., (24). Notice that in the above derivation, ${\bf{z}}_{M}(1:m)-{\bf{z}}_{m}$ in (27d) and (27f) needs to be written as $\left[\begin{array}{c}{\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\\ {\bf{z}}_{M}(m)-{\bf{z}}_{m}(m)\end{array}\right]$.

${\bf{d}}_{m}$ defined by (21) can be replaced with

${\bf{t}}_{m}={\bf{Q}}_{m}{\bf{z}}_{m},$ (30)

which is substituted into (21) to show that ${\bf{d}}_{m}$ and ${\bf{t}}_{m}$ satisfy

${\bf{d}}_{m}={\bf{Q}}_{m}{\bf{z}}_{M}(1:m)-{\bf{t}}_{m}.$ (31)

Accordingly, (23) and (24) are also replaced with

$\hat{s}_{p_{m}}={\bf{t}}_{m}(m)$ (32)

and

${\bf{t}}_{m-1}={\bf{\bar{t}}}_{m}+\left(s_{p_{m}}-{\bf{t}}_{m}(m)\right){\bf{\bar{q}}}_{m}/\omega_{m}$ (33)

(with ${\bf{\bar{t}}}_{m}$ obtained by removing the last entry of ${\bf{t}}_{m}$), respectively, which are deduced in Appendix B. However, slightly more computations are required when (32) and (33) are utilized instead of (23) and (24), respectively, since the initial ${\bf{t}}_{M}$ is computed by (30) with $M^{2}$ multiplications and additions, while the initial ${\bf{d}}_{M}$ is set by (22) without any computations, whereas (32) only saves $\sum_{m=1}^{M}m\approx\frac{M^{2}}{2}$ multiplications and additions in total compared to (23). Therefore, the proposed recursive algorithm with ${\bf{t}}_{m}$ will be ignored in the remainder of this paper.
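For clarity, the minimal sketch below (function and argument names ours) isolates steps 2-3 of the recursion under Improvement VI: given the permuted $m^{th}$ column of ${\bf{Q}}_{m}$ (split into ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$), the stored ${\bf{z}}_{M}(1:m)$ and the running vector ${\bf{d}}_{m}$, it evaluates (23) and (24) without touching ${\bf{R}}$.

```python
import numpy as np

def improvement_vi_step(q_bar, omega, z_M_head, d_m, quantize):
    """q_bar: first m-1 entries of q_m; omega: its last entry; z_M_head = z_M(1:m);
    d_m: current length-m vector defined by (21).  Returns (s_pm, d_{m-1})."""
    q_m = np.concatenate([q_bar, [omega]])
    s_hat = q_m.conj() @ z_M_head - d_m[-1]                 # eq. (23)
    s_pm = quantize(s_hat)                                  # hard decision
    d_next = d_m[:-1] - (s_pm + d_m[-1]) * q_bar / omega    # eq. (24)
    return s_pm, d_next
```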

IV-C Proposed Recursive Algorithm II with Both Speed Advantage and Memory Saving

In the proposed recursive algorithm I with speed advantage, i.e., Algorithm 1 with (13c) replaced by (20d), we can replace (15) and (17) (from Improvement II) with (23) and (24) (from Improvement VI), respectively. Then ${\bf{\bar{r}}}_{m}$ in (17) is replaced with ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ in (24) (i.e., ${\bf{q}}_{m}$ in (23)), to avoid using ${\bf{R}}_{M}$ in the recursion phase. Accordingly, we can cover ${\bf{R}}_{M}$ with ${\bf{Q}}_{M}$ in the initialization phase to save memories.

When we compute ${\bf{Q}}_{M}$ from ${\bf{R}}_{M}$ by the iterations of (20d) and (12), ${\bf{Q}}_{i}$ is computed from ${\bf{Q}}_{i-1}$ and column $i$ of ${\bf{R}}_{i}$ (i.e., ${\bf{\bar{r}}}_{i}$ and $\gamma_{i}$) in the $i^{th}$ ($2\leq i\leq M$) iteration, i.e., only columns $i+1$ to $M$ in the upper triangular part of ${\bf{R}}_{M}$ are required in the next iterations $i+1$ to $M$. Accordingly, the subsequent computations will not be affected if we cover the submatrix ${\bf{R}}_{i}$ in ${\bf{R}}_{M}$ with ${\bf{Q}}_{i}$, by writing (20d) from Improvement V as

${\bf{\tilde{q}}}={\bf{R}}(1:i-1,1:i-1){\bf{R}}(1:i-1,i)$ (34a)
${\bf{R}}(i,i)=1/\left({\bf{R}}(i,i)-{\bf{R}}(1:i-1,i)^{H}{\bf{\tilde{q}}}\right)$ (34b)
${\bf{R}}(1:i-1,i)=-{\bf{R}}(i,i)\cdot{\bf{\tilde{q}}}$ (34c)
${\bf{R}}(i,1:i-1)={\bf{R}}(1:i-1,i)^{H}$ (34d)
${\bf{R}}(1:i-1,1:i-1)={\bf{R}}(1:i-1,1:i-1)-{\bf{\tilde{q}}}\times{\bf{R}}(i,1:i-1).$ (34g)

We write (20a), (20b) and (20c) as the above (34a), (34b) and (34c), respectively. In (34d), we use the conjugate transposition of column $i$ in the upper triangular part of ${\bf{Q}}_{i}$ to cover row $i$ in its lower triangular part. Moreover, we use (20c) to simplify (20d) into ${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}-{\bf{\tilde{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}$, which is written as (34g).

To cover ${\bf{R}}_{M}$ with ${\bf{Q}}_{M}$, we compute (34g) iteratively for $i=2,3,\cdots,M$, after ${\bf{R}}_{1}$ is first covered with ${\bf{Q}}_{1}$ by

${\bf{R}}(1,1)=1/{\bf{R}}(1,1),$ (35)

which is deduced from (14). Notice that in (34g), only column $i$ in the upper triangular part of ${\bf{R}}_{i}$ is utilized to compute ${\bf{Q}}_{i}$, while the entire matrix ${\bf{R}}_{i}$ is covered with ${\bf{Q}}_{i}$.

In the initialization phase, we also cover ${\bf{H}}_{M}$ with ${\bf{R}}_{M}$ to save memories, since ${\bf{H}}_{M}$ is no longer needed in the recursion phase. Actually, we choose the implementation which covers the upper triangular part of a square submatrix in

${\bf{\tilde{H}}}={\bf{H}}_{M}^{H}$ (36)

with that of ${\bf{R}}_{M}$. Accordingly, we substitute (36) into (5a) to get ${\bf{R}}_{M}={\bf{\tilde{H}}}{\bf{\tilde{H}}}^{H}+\alpha{\bf{I}}_{M}$, and apply it to cover row $i$ in the upper triangular part of the square submatrix ${\bf{\tilde{H}}}(:,1:M)$ with that of ${\bf{R}}_{M}$, by

${\bf{\tilde{H}}}(i,i:M)={\bf{\tilde{H}}}(i,:){\bf{\tilde{H}}}(i:M,:)^{H}+\left[\alpha\enspace{\bf{0}}_{M-i}^{T}\right].$ (37)

By computing (37) iteratively for $i=1,2,\cdots,M$, we cover the upper triangular part of the square submatrix ${\bf{\tilde{H}}}(:,1:M)$ with that of ${\bf{R}}_{M}$. The above reuse of memories will not affect subsequent calculations, since only rows $i$ to $M$ of ${\bf{\tilde{H}}}$ are utilized in (37) to compute row $i$ in the upper triangular part of ${\bf{R}}_{M}$, i.e., row $i$ of ${\bf{\tilde{H}}}$ will not be utilized in the subsequent $(i+1)^{th}$ to $M^{th}$ iterations of (37). Moreover, we set the initial ${\bf{d}}_{M}$ by (22), and substitute (36) into (16) with $m=M$ to compute ${\bf{z}}_{M}$ from ${\bf{\tilde{H}}}$ by

${\bf{z}}_{M}={\bf{\tilde{H}}}{\bf{x}}^{(M)}.$ (38)

In the recursion phase, we still permute ${\bf{Q}}_{m}$ according to the SNR ordering, which also permutes ${\bf{q}}_{m}$ (i.e., ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$) in ${\bf{Q}}_{m}$. Then it can be seen from (23) and (24) that we need to permute entries $l_{m}$ and $m$ in ${\bf{z}}_{M}$ and ${\bf{d}}_{m}$.

We summarize the proposed recursive V-BLAST algorithm II with both speed advantage and memory saving in Algorithm 3, which revises Algorithm 1 (i.e., the improved algorithm with speed advantage in [21]) by replacing Improvements I and II with Improvements V and VI, respectively. In Algorithm 3, we utilize (34g) from Improvement V, (23) and (24) from Improvement VI, and the initialization equations (22) and (35)-(38).

Algorithm 3 The Proposed Recursive Algorithm with Both Speed Advantage and Memory Saving
Initialization: Set ${\bf{p}}=[1,2,\cdots,M]^{T}$ and ${\bf{d}}_{M}={\bf{0}}_{M}$; store ${\bf{\tilde{H}}}={\bf{H}}^{H}$ and compute ${\bf{z}}_{M}={\bf{\tilde{H}}}{\bf{x}}$; compute (37) iteratively for $i=1,2,\cdots,M$ to cover the upper triangular part of ${\bf{\tilde{H}}}(:,1:M)$ with that of ${\bf{R}}_{M}$; compute (35), and then compute (34g) iteratively for $i=2,3,\cdots,M$, to cover the entire matrix ${\bf{R}}_{M}$ (i.e., the entire square submatrix ${\bf{\tilde{H}}}(:,1:M)$) with ${\bf{Q}}_{M}$;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}$ is the $j^{th}$ diagonal entry of ${\bf{Q}}_{m}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$, ${\bf{z}}_{M}$ and ${\bf{d}}_{m}$; permute rows and columns $l_{m}$ and $m$ in ${\bf{Q}}_{m}$;
2: Compute $\hat{s}_{p_{m}}$ by (23), which is quantized to $s_{p_{m}}$;
3: Deflate ${\bf{d}}_{m}$ to ${\bf{d}}_{m-1}$ by (24), and deflate ${\bf{Q}}_{m}$ to ${\bf{Q}}_{m-1}$ by (18);
When $m=1$, only run step 2 to get $s_{p_{1}}$;
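The following minimal sketch of Algorithm 3 (assuming QPSK symbols; function names and 0-based numpy indexing are ours) overwrites ${\bf{\tilde{H}}}$ in place first with ${\bf{R}}_{M}$ by (37) and then with ${\bf{Q}}_{M}$ by (35) and (34a)-(34g), and runs the recursion phase with only ${\bf{Q}}$, ${\bf{z}}_{M}$ and ${\bf{d}}$.

```python
import numpy as np

def qpsk_slice(s):
    return (np.sign(s.real) + 1j * np.sign(s.imag)) / np.sqrt(2)

def proposed_algorithm_ii(H, x, alpha):
    N, M = H.shape
    Ht = H.conj().T.copy()                          # \tilde{H}, eq. (36); reused as workspace
    z = Ht @ x                                      # z_M, eq. (38)
    d = np.zeros(M, dtype=complex)                  # d_M, eq. (22)
    p = np.arange(M)
    # Cover the upper triangular part of \tilde{H}(:,1:M) with that of R_M, eq. (37):
    for i in range(M):
        Ht[i, i:M] = Ht[i, :] @ Ht[i:M, :].conj().T + np.r_[alpha, np.zeros(M - 1 - i)]
    # Cover R_M with Q_M, eqs. (35) and (34):
    Ht[0, 0] = 1.0 / Ht[0, 0]
    for i in range(1, M):
        r_bar, gamma = Ht[:i, i].copy(), Ht[i, i]
        q_tilde = Ht[:i, :i] @ r_bar                              # (34a)
        omega = 1.0 / (gamma - r_bar.conj() @ q_tilde)            # (34b)
        Ht[:i, i] = -omega * q_tilde                              # (34c)
        Ht[i, :i] = Ht[:i, i].conj()                              # (34d)
        Ht[i, i] = omega
        Ht[:i, :i] -= np.outer(q_tilde, Ht[i, :i])                # (34g)
    Q = Ht[:, :M]                                   # the square submatrix now holds Q_M
    # Recursion phase:
    s_det = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        l = np.argmin(np.real(np.diag(Q[:m, :m])))
        for v in (p, z, d):
            v[[l, m - 1]] = v[[m - 1, l]]
        Q[[l, m - 1], :m] = Q[[m - 1, l], :m]
        Q[:m, [l, m - 1]] = Q[:m, [m - 1, l]]
        q_bar, omega = Q[:m - 1, m - 1].copy(), Q[m - 1, m - 1].real
        s_det[p[m - 1]] = qpsk_slice(Q[:m, m - 1].conj() @ z[:m] - d[m - 1])   # eq. (23)
        if m == 1:
            break
        d[:m - 1] -= (s_det[p[m - 1]] + d[m - 1]) * q_bar / omega              # eq. (24)
        Q[:m - 1, :m - 1] -= np.outer(q_bar, q_bar.conj()) / omega             # eq. (18)
    return s_det
```

In exact arithmetic this sketch produces the same hard decisions as the Algorithm 1 sketch given earlier, since Improvements V and VI only change how the same quantities are computed and stored.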

To further save memories and permutation operations, we can avoid permuting ${\bf{z}}_{M}$, ${\bf{d}}_{m}$ and ${\bf{Q}}_{m}$ in the recursion phase of Algorithm 3, and store only the upper triangular part of the Hermitian ${\bf{Q}}_{i}$ ($i=1,2,\cdots,M$). The corresponding versions of the proposed recursive algorithm II with both speed advantage and memory saving are introduced in Appendix C.

V Performance Analysis and Numerical Results

TABLE I: Complexities of the Presented Recursive V-BLAST Algorithms
The original algorithm in [18]: ($3M^{2}N+\frac{2}{3}M^{3}$, $\frac{5}{2}M^{2}N+\frac{1}{2}M^{3}$)
The algorithm with memory saving in [21]: $2M^{2}N+\frac{1}{6}M^{3}$
The “fastest known algorithm” before [21]: $\frac{1}{2}M^{2}N+\frac{4}{3}M^{3}$
The algorithm with speed advantage in [21]: $\frac{1}{2}M^{2}N+M^{3}$
The proposed two algorithms: $\frac{1}{2}M^{2}N+\frac{2}{3}M^{3}$

The dominant complexity of (20d) is $\frac{1}{2}M^{3}$ complex multiplications and additions, which is dedicated to computing ${\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}$ and ${\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H}$ for $m=2,3,\cdots,M$. The complexity of (13c) should be $\frac{5}{6}M^{3}$ instead of the $\frac{1}{2}M^{3}$ claimed in [21], which is dedicated to computing ${\bf{g}}_{m-1}={\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}$, ${\bf{Q}}_{m-1}+{\bf{g}}_{m-1}{\bf{g}}_{m-1}^{H}/(\cdots)$ and ${\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}$. Compared to the inversion step in [19, 21] to compute the initial ${\bf{Q}}$ by (13c) and (12), the proposed inversion step by (20d) and (12) achieves a speedup of $(\frac{5}{6})/(\frac{1}{2})=1.67$, and requires only $\frac{1}{3}$ of the divisions.

The complexity claimed in [21] for the above-mentioned inversion step to compute the initial ${\bf{Q}}$ should increase by $\frac{5}{6}M^{3}-\frac{1}{2}M^{3}=\frac{1}{3}M^{3}$. Then actually the “fastest known algorithm” before [21] and the recursive algorithm with speed advantage in [21] require the complexities of $\frac{1}{2}M^{2}N+\frac{4}{3}M^{3}$ and $\frac{1}{2}M^{2}N+M^{3}$, respectively, instead of the $\frac{1}{2}M^{2}N+M^{3}$ and $\frac{1}{2}M^{2}N+\frac{2}{3}M^{3}$ claimed in [21]. Coincidentally, the above $\frac{1}{2}M^{2}N+\frac{2}{3}M^{3}$ (claimed in [21]) is exactly the dominant complexity of the proposed recursive algorithm I with speed advantage, and is also that of the proposed recursive algorithm II with both speed advantage and memory saving, since no dominant complexity is required to compute (15), (17), (23) and (24) from Improvements II and VI. On the other hand, the recursive algorithm with memory saving in [21] requires $2M^{2}N+\frac{1}{6}M^{3}$ multiplications instead of the $\frac{5}{2}M^{2}N+\frac{1}{6}M^{3}$ multiplications claimed in [21, Equ. 27], since it adopts Improvement III, where the computation of the initial ${\bf{Q}}$ by (6) costs $\frac{3}{2}M^{2}N$ multiplications [20, near Equ. 12] instead of the $2M^{2}N$ multiplications claimed in [21, Equ. 27].

The dominant complexities of the presented recursive V-BLAST algorithms are compared in Table I, where an item $j$ denotes $j$ multiplications and additions, and $(j,k)$ denotes $j$ multiplications and $k$ additions. It can be seen from Table I that when $M=N$, the actual speedup of the algorithm with speed advantage in [21] over the previous “fastest known algorithm” should be $(\frac{11}{6})/(\frac{3}{2})=1.22$ instead of the 1.3 claimed in [21], while the speedups of either proposed recursive algorithm over the algorithm in [21] with speed advantage and that with memory saving are $(\frac{3}{2})/(\frac{7}{6})\approx 1.3$ and $(\frac{13}{6})/(\frac{7}{6})\approx 1.86$, respectively.
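A quick arithmetic check of these ratios for the $M=N$ case, using the dominant terms of Table I in units of $M^{3}$, is sketched below.

```python
from fractions import Fraction as F

fastest_known   = F(1, 2) + F(4, 3)   # 11/6: "fastest known algorithm" before [21]
speed_advantage = F(1, 2) + 1         # 3/2:  algorithm with speed advantage in [21]
memory_saving   = 2 + F(1, 6)         # 13/6: algorithm with memory saving in [21]
proposed        = F(1, 2) + F(2, 3)   # 7/6:  the proposed two algorithms

print(float(fastest_known / speed_advantage))   # 1.222..., quoted above as 1.22
print(float(speed_advantage / proposed))        # 1.2857..., quoted above as about 1.3
print(float(memory_saving / proposed))          # 1.857..., quoted above as about 1.86
```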

Assuming $N=M$, we carried out numerical experiments to count the average floating-point operations (flops) of the presented recursive V-BLAST algorithms, including the original algorithm [18], the algorithm with memory saving in [21], the “fastest known algorithm” before [21], the algorithm with speed advantage in [21], the proposed algorithm with speed advantage, and the proposed algorithm with both speed advantage and memory saving. All results are shown in Fig. 1. It can be seen that they are consistent with the theoretical flop counts.

Figure 1: Comparison of computational complexities among the original algorithm [18], the algorithm with memory saving in [21], the “fastest known algorithm” before [21], the algorithm with speed advantage in [21], the proposed algorithm with speed advantage, and the proposed algorithm with both speed advantage and memory saving.

With respect to the “fastest known algorithm” before [21], the algorithm with memory saving in [21] (i.e., Algorithm 2) costs more computations to save the memories for storing ${\bf{R}}$, and then only needs to store ${\bf{H}}$ and ${\bf{Q}}$. As a comparison (we only compare the memories for storing matrices, which are at least $2\times 2$), the proposed recursive algorithm II with both speed advantage and memory saving only stores ${\bf{Q}}$ in the recursion phase, and only uses the memories for storing ${\bf{H}}$ in the initialization phase, since it covers ${\bf{H}}$ with ${\bf{R}}$ and ${\bf{Q}}$ successively. Accordingly, it can be concluded that the proposed algorithm II saves about half of the memories with respect to the algorithm with memory saving in [21], and thus requires the least memories among the existing recursive V-BLAST algorithms.

VI Conclusion

Firstly, we presented the original recursive V-BLAST algorithm and its existing Improvements I-IV to reduce the computational complexity. Moreover, we also described the existing recursive V-BLAST algorithm with speed advantage and that with memory saving, which incorporate Improvements I-IV and only Improvements III-IV into the original algorithm, respectively. Then we proposed Improvements V and VI to replace Improvements I and II, respectively. Instead of the lemma for inversion of a partitioned matrix applied in Improvement I, the proposed Improvement V uses another lemma to compute the inverse matrix with less complexity. On the other hand, the formulas adopted in the proposed Improvement V are applied to deduce Improvement VI with the improved interference cancellation scheme, which saves memories without sacrificing speed compared to Improvement II.

In the existing recursive V-BLAST algorithm with speed advantage, the proposed recursive algorithm I with speed advantage replaces Improvement I with Improvement V, while the proposed recursive algorithm II with both speed advantage and memory saving replaces Improvements I and II with Improvements V and VI, respectively. With respect to the existing algorithm with speed advantage, both proposed algorithms achieve a speedup of 1.3, and speed up the matrix inversion step by a factor of 1.67. On the other hand, the proposed algorithm II achieves a speedup of 1.86 and saves about half of the memories compared to the existing recursive V-BLAST algorithm with memory saving, and thus requires the least memories among the existing recursive algorithms.

Throughout this paper, we focus only on OSIC detectors. In future work, we will show that the reordered description based on the equivalent channel matrix for the ISIC detector can be regarded as an extension of the OSIC detector. Accordingly, we will extend the recursive OSIC detector with both speed advantage and memory saving proposed in this paper to a recursive ISIC detector, which can reduce the required computations and memories compared to the recently proposed low-complexity ISIC detector.

Appendix A The Proof of (19c)

Equation (8) in [40] can be written as

$\left[\begin{array}{cc}{\bf{A}}&{\bf{U}}\\ {\bf{V}}&{\bf{D}}\end{array}\right]^{-1}=\left[\begin{array}{cc}{\bf{\tilde{A}}}&{\bf{\tilde{U}}}\\ {\bf{\tilde{V}}}&{\bf{\tilde{D}}}\end{array}\right]$ (39)

with

${\bf{\tilde{A}}}={\bf{A}}^{-1}+({\bf{A}}^{-1}{\bf{U}})({\bf{D}}-{\bf{V}}{\bf{A}}^{-1}{\bf{U}})^{-1}({\bf{V}}{\bf{A}}^{-1})$ (40a)
${\bf{\tilde{U}}}=-({\bf{A}}^{-1}{\bf{U}})({\bf{D}}-{\bf{V}}{\bf{A}}^{-1}{\bf{U}})^{-1}$ (40b)
${\bf{\tilde{V}}}=-({\bf{D}}-{\bf{V}}{\bf{A}}^{-1}{\bf{U}})^{-1}({\bf{V}}{\bf{A}}^{-1})$ (40c)
${\bf{\tilde{D}}}=({\bf{D}}-{\bf{V}}{\bf{A}}^{-1}{\bf{U}})^{-1},$ (40d)

which inverts a partitioned matrix. By comparing (39) with (10) and (12), we can conclude that ${\bf{A}}$, ${\bf{U}}$, ${\bf{V}}$, ${\bf{D}}$, ${\bf{\tilde{A}}}$, ${\bf{\tilde{U}}}$ and ${\bf{\tilde{D}}}$ can be replaced with ${\bf{R}}_{m-1}$, ${\bf{\bar{r}}}_{m}$, ${\bf{\bar{r}}}_{m}^{H}$, $\gamma_{m}$, ${\bf{\bar{Q}}}_{m-1}$, ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$, respectively, and ${\bf{A}}^{-1}$ can be replaced with ${\bf{Q}}_{m-1}={\bf{R}}_{m-1}^{-1}$. Accordingly, we can write (40a), (40b) and (40d) as

${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}+({\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m})(\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m})^{-1}({\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}),$ (41)
${\bf{\bar{q}}}_{m}=-({\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m})(\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m})^{-1}$ (42)

and (19a), respectively. Finally, we substitute (19a) into (42) to obtain (19b), and substitute (19a) and (19b) into (41) to obtain (19c).

Appendix B The Proof of (32) and (33)

It can be seen from (31) that

${\bf{\bar{d}}}_{m}={\bf{Q}}_{m}(1:m-1,:){\bf{z}}_{M}(1:m)-{\bf{\bar{t}}}_{m}$ (43a)
${\bf{d}}_{m}(m)={\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-{\bf{t}}_{m}(m),$ (43b)

and it can be seen from (12) that

${\bf{Q}}_{m}(1:m-1,:)=\left[\begin{array}{cc}{\bf{\bar{Q}}}_{m-1}&{\bf{\bar{q}}}_{m}\end{array}\right]$ (44a)
${\bf{q}}_{m}^{H}=\left[\begin{array}{cc}{\bf{\bar{q}}}_{m}^{H}&\omega_{m}\end{array}\right].$ (44b)

Then substitute (19c) into (44a) to obtain ${\bf{Q}}_{m}(1:m-1,:)=\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}&{\bf{\bar{q}}}_{m}\end{array}\right]$, which is substituted into (43a) to obtain

${\bf{\bar{d}}}_{m}=\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}&{\bf{\bar{q}}}_{m}\end{array}\right]{\bf{z}}_{M}(1:m)-{\bf{\bar{t}}}_{m}=\left({\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}\right){\bf{z}}_{M}(1:m-1)+{\bf{z}}_{M}(m){\bf{\bar{q}}}_{m}-{\bf{\bar{t}}}_{m}.$ (45)

On the other hand, substitute (44b) into (43b) to obtain

${\bf{d}}_{m}(m)=\left[\begin{array}{cc}{\bf{\bar{q}}}_{m}^{H}&\omega_{m}\end{array}\right]{\bf{z}}_{M}(1:m)-{\bf{t}}_{m}(m)={\bf{\bar{q}}}_{m}^{H}{\bf{z}}_{M}(1:m-1)+\omega_{m}{\bf{z}}_{M}(m)-{\bf{t}}_{m}(m).$ (48)

To deduce (32), we only need to substitute (43b) into (23) to obtain $\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-\left({\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-{\bf{t}}_{m}(m)\right)$. To deduce (33), we need to substitute (31) with $m$ replaced by $m-1$, (45) and (48) into (24) to obtain

${\bf{Q}}_{m-1}{\bf{z}}_{M}(1:m-1)-{\bf{t}}_{m-1}=\left({\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}\right){\bf{z}}_{M}(1:m-1)+{\bf{z}}_{M}(m){\bf{\bar{q}}}_{m}-{\bf{\bar{t}}}_{m}-\left(s_{p_{m}}+\left({\bf{\bar{q}}}_{m}^{H}{\bf{z}}_{M}(1:m-1)+\omega_{m}{\bf{z}}_{M}(m)-{\bf{t}}_{m}(m)\right)\right){\bf{\bar{q}}}_{m}/\omega_{m},$

which can be easily simplified into (33).

Algorithm 4 The Proposed Version with Memory Saving and Fewer Permutation Operations
Initialization: Same as the initialization in Algorithm 3;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}={\bf{Q}}(p_{j},p_{j})$ is the $p_{j}^{th}$ diagonal entry of ${\bf{Q}}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$; obtain $\omega_{m}={\bf{Q}}(p_{m},p_{m})$ and ${\bf{\bar{q}}}_{m}={\bf{Q}}({\bf{p}}(1:m-1),p_{m})$ in ${\bf{Q}}$, which form ${\bf{q}}_{m}=[{\bf{\bar{q}}}_{m}^{T}\enspace\omega_{m}]^{T}$;
2: Compute $\hat{s}_{p_{m}}$ by (49), and then quantize $\hat{s}_{p_{m}}$ to $s_{p_{m}}$;
3: Use ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ to update ${\bf{d}}({\bf{p}}(1:m-1))$ by (50), and update ${\bf{Q}}({\bf{p}}(1:m-1),{\bf{p}}(1:m-1))$ by (51);
When $m=1$, only run step 2 to get $s_{p_{1}}$;

Appendix C Several Versions of Proposed Recursive V-BLAST Algorithm II to Further Save Memories and Permutation Operations

In this appendix, we introduce several versions of the proposed recursive V-BLAST algorithm II with both speed advantage and memory saving. Firstly, we avoid the permutation operations for ${\bf{z}}_{M}$, ${\bf{d}}_{m}$ and ${\bf{Q}}_{m}$ in step 1 of the recursion phase in Algorithm 3. Then we give the implementations that store only the upper triangular part of the Hermitian ${\bf{Q}}_{i}$ ($i=1,2,\cdots,M$).

Algorithm 5 The Proposed Version Storing Only the Upper Triangular Part of ${\bf{Q}}$
Initialization: Set ${\bf{p}}=[1,2,\cdots,M]^{T}$ and ${\bf{d}}_{M}={\bf{0}}_{M}$; store ${\bf{\tilde{H}}}={\bf{H}}^{H}$ and compute ${\bf{z}}_{M}={\bf{\tilde{H}}}{\bf{x}}$; compute (37) for $i=1,2,\cdots,M$ to cover the upper triangular part of ${\bf{\tilde{H}}}(:,1:M)$ with that of ${\bf{R}}_{M}$; compute ${\bf{Q}}_{1}$ by (35), and then compute (55), (34b), (34c) and (34g) iteratively for $i=2,3,\cdots,M$, to cover the upper triangular part of ${\bf{R}}_{M}$ (i.e., ${\bf{\tilde{H}}}(:,1:M)$) with that of ${\bf{Q}}_{M}$;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}$ is the $j^{th}$ diagonal entry of ${\bf{Q}}_{m}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$, ${\bf{z}}_{M}$ and ${\bf{d}}_{m}$; exchange ${\bf{Q}}_{m}(1:l_{m}-1,l_{m})$ and ${\bf{Q}}_{m}(1:l_{m}-1,m)$, exchange ${\bf{Q}}_{m}(l_{m}+1:m-1,m)$ and ${\bf{Q}}_{m}(l_{m},l_{m}+1:m-1)^{H}$, exchange ${\bf{Q}}_{m}(l_{m},l_{m})$ and ${\bf{Q}}_{m}(m,m)$, and let ${\bf{Q}}_{m}(l_{m},m)={\bf{Q}}_{m}(l_{m},m)^{*}$;
2: Compute $\hat{s}_{p_{m}}$ by (23), which is quantized to $s_{p_{m}}$;
3: Deflate ${\bf{d}}_{m}$ to ${\bf{d}}_{m-1}$ by (24), and compute only the upper triangular part of ${\bf{Q}}_{m-1}$ by (18);
When $m=1$, only run step 2 to get $s_{p_{1}}$;

Algorithm 6 The Proposed Version Storing the Upper Triangular Part of ${\bf{Q}}$ with Fewer Permutation Operations
Initialization: Same as the initialization in Algorithm 5;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}={\bf{Q}}(p_{j},p_{j})$ is the $p_{j}^{th}$ diagonal entry of ${\bf{Q}}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$; in ${\bf{Q}}$, get $\omega_{m}={\bf{Q}}(p_{m},p_{m})$, and get ${\bf{\bar{q}}}_{m}={\bf{Q}}({\bf{p}}(1:m-1),p_{m})$, where ${\bf{Q}}({\bf{p}}(i),p_{m})={\bf{Q}}(p_{m},{\bf{p}}(i))^{*}$ ($i=1,2,\cdots,m-1$) if ${\bf{p}}(i)>p_{m}$;
2: Compute $\hat{s}_{p_{m}}$ by (49) with ${\bf{q}}_{m}$ formed by ${\bf{q}}_{m}=[{\bf{\bar{q}}}_{m}^{T}\enspace\omega_{m}]^{T}$, and then quantize $\hat{s}_{p_{m}}$ to $s_{p_{m}}$;
3: Use ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ to update ${\bf{d}}({\bf{p}}(1:m-1))$ by (50), and compute only the upper-triangular entries of ${\bf{Q}}({\bf{p}}(1:m-1),{\bf{p}}(1:m-1))$ by (51);
When $m=1$, only run step 2 to get $s_{p_{1}}$;

C-A The Version to Avoid Permutation Operations

We can avoid the permutation operations for ${\bf{z}}_{M}$, ${\bf{d}}_{m}$ and ${\bf{Q}}_{m}$ in step 1 of the recursion phase in Algorithm 3. Accordingly, we need to replace (23), (24) and (18) with

$\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{z}}_{M}({\bf{p}}(1:m))-{\bf{d}}(p_{m}),$ (49)
${\bf{d}}\left({\bf{p}}(1:m-1)\right)={\bf{d}}\left({\bf{p}}(1:m-1)\right)-\left(s_{p_{m}}+{\bf{d}}(p_{m})\right){\bf{\bar{q}}}_{m}/\omega_{m}$ (50)

and

${\bf{Q}}\left({\bf{p}}(1:m-1),{\bf{p}}(1:m-1)\right)={\bf{Q}}\left({\bf{p}}(1:m-1),{\bf{p}}(1:m-1)\right)-\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H},$ (51)

respectively. In the above (50) and (51), ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ can be obtained in ${\bf{Q}}$ by

${\bf{\bar{q}}}_{m}={\bf{Q}}\left({\bf{p}}(1:m-1),p_{m}\right)$ (52)

and

$\omega_{m}={\bf{Q}}\left(p_{m},p_{m}\right),$ (53)

respectively, which form ${\bf{q}}_{m}={\bf{Q}}\left({\bf{p}}(1:m),p_{m}\right)$ for (49) by

${\bf{q}}_{m}=\left[\begin{array}{cc}{\bf{\bar{q}}}_{m}^{T}&\omega_{m}\end{array}\right]^{T}.$ (54)

The proposed version with memory saving and fewer permutation operations is summarized in Algorithm 4.

C-B The Versions to Store Only the Triangular Part of the Hermitian Matrix

To further save memories, we can store only the upper triangular part of the Hermitian ${\bf{Q}}_{i}$, where $i=2,3,\cdots,M$. Now, instead of computing ${\bf{\tilde{q}}}$ by (34a), we obtain the $j^{th}$ entry of ${\bf{\tilde{q}}}$ by

${\bf{\tilde{q}}}(j)=\left[\begin{array}{cc}{\bf{R}}(1:j-1,j)^{H}&{\bf{R}}(j,j:i-1)\end{array}\right]{\bf{R}}(1:i-1,i)$ (55)

for $j=1,2,\cdots,i-1$, since only the upper triangular part of ${\bf{R}}(1:i-1,1:i-1)={\bf{Q}}_{i-1}$ is available in (34a). Then we modify Algorithm 3 and Algorithm 4 into Algorithm 5 and Algorithm 6, respectively.
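A minimal sketch of (55) (function name and 0-based numpy indexing ours): it forms ${\bf{\tilde{q}}}={\bf{Q}}_{i-1}{\bf{\bar{r}}}_{i}$ when only the upper triangular part of the Hermitian block ${\bf{R}}(1:i-1,1:i-1)$ holds valid entries of ${\bf{Q}}_{i-1}$.

```python
import numpy as np

def q_tilde_from_upper(R, i):
    """R: workspace whose upper triangle of R[:i, :i] stores the Hermitian Q block and whose
    column R[:i, i] stores r_bar (0-based block size i).  Returns the product Q @ r_bar."""
    r_bar = R[:i, i]
    q_tilde = np.empty(i, dtype=complex)
    for j in range(i):
        # Row j of the Hermitian block: conjugated column entries above the diagonal
        # for the left part, and the stored upper-triangular row entries for the right part.
        row_j = np.concatenate([R[:j, j].conj(), R[j, j:i]])
        q_tilde[j] = row_j @ r_bar                   # eq. (55)
    return q_tilde
```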

References

  • [1] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Commun., pp. 311-335, Mar. 1998.
  • [2] B. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389-399, Mar. 2003.
  • [3] K.-J. Yang and S.-H. Tsai, “Maximum Likelihood and Soft Input Soft Output MIMO Detection at a Reduced Complexity,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12389-12393, Dec. 2018.
  • [4] V. Elvira and I. Santamaria, “Multiple Importance Sampling for Symbol Error Rate Estimation of Maximum-Likelihood Detectors in MIMO Channels,” IEEE Trans. Signal Process., vol. 69, pp. 1200-1212, 2021.
  • [5] P. W. Wolniansky, G. J. Foschini, G. D. Golden and R. A. Valenzuela, “V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel,” Proc. Int. Symp. Signals, Syst., Electron. (ISSSE 98), Pisa, Italy, Sept. 1998, pp. 295-300.
  • [6] K. Liu and A. M. Sayeed, “An Iterative Extension of BLAST Decoding Algorithm for Layered Space-Time Signals,” IEEE Transactions on Communications, vol. 53, no. 9, pp. 1595-1595, Sept. 2005.
  • [7] S. W. Kim and K. P. Kim, “Log-likelihood-ratio-based detection ordering in V-BLAST,” IEEE Trans. Commun., vol. 54, no. 2, pp. 302-307, Feb. 2006.
  • [8] S. R. Lee, S. H. Park, S. W. Kim, and I. Lee, “Enhanced detection with new ordering schemes for V-BLAST systems,” IEEE Trans. Wireless Commun., vol. 57, no. 6, pp. 1648-1651, Jun. 2009.
  • [9] Y. Yapıcı, İ. Güvenç and Y. Kakishima, “A MAP-Based Layered Detection Algorithm and Outage Analysis Over MIMO Channels,” IEEE Trans. Wireless Commun., vol. 17, no. 7, pp. 4256-4269, Jul. 2018.
  • [10] W.-J. Choi, K.-W. Cheong, and J. M. Cioffi, “Iterative soft interference cancellation for multiple antenna systems,” Proc. IEEE Wireless Commun. Netw. Conf. Rec., Chicago, IL, USA, Sep. 2000, pp. 304-309.
  • [11] X. Li, H. Huang, G. J. Foschini and R. A. Valenzuela, “Effects of iterative detection and decoding on the performance of BLAST,” Globecom ’00 - IEEE. Global Telecommunications Conference, San Francisco, CA, USA, 2000, pp. 1061-1066 vol.2.
  • [12] J. W. Choi, A. C. Singer, J. Lee and N. I. Cho, “Improved linear soft-input soft-output detection via soft feedback successive interference cancellation,” IEEE Transactions on Communications, vol. 58, no. 3, pp. 986-996, March 2010.
  • [13] P. Xiao, W. Yin, C. Cowan and V. Fusco, “VBLAST Detection Algorithms Utilizing Soft Symbol Estimate and Noncircular CAI,” IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2441-2447, May 2011.
  • [14] Y. -C. Liang, E. Y. Cheu, L. Bai and G. Pan, “On the Relationship Between MMSE-SIC and BI-GDFE Receivers for Large Multiple-Input Multiple-Output Channels,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3627-3637, Aug. 2008.
  • [15] N. Shlezinger, R. Fu and Y. C. Eldar, “DeepSIC: Deep Soft Interference Cancellation for Multiuser MIMO Detection,” IEEE Transactions on Wireless Communications, vol. 20, no. 2, pp. 1349-1362, Feb. 2021.
  • [16] F. Cao, J. Li and J. Yang, “On the Relation Between PDA and MMSE-ISDIC,” IEEE Signal Processing Letters, vol. 14, no. 9, pp. 597-600, Sept. 2007.
  • [17] Xiaodong Wang and H. V. Poor, “Iterative (turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Transactions on Communications, vol. 47, no. 7, pp. 1046-1061, July 1999.
  • [18] J. Benesty, Y. Huang and J. Chen, “A fast recursive algorithm for optimum sequential signal detection in a BLAST system,” IEEE Trans. Signal Process., pp. 1722-1730, Jul. 2003.
  • [19] L. Szczeciński and D. Massicotte, “Low complexity adaptation of MIMO MMSE receivers, implementation aspects,” Proc. Global Commun. Conf. (Globecom’05), St. Louis, MO, USA, Nov., 2005.
  • [20] H. Zhu, Z. Lei, F.P.S. Chin, “An improved recursive algorithm for BLAST,” Signal Process., vol. 87, no. 6, pp. 1408-1411, Jun. 2007.
  • [21] Y. Shang and X. G. Xia, “On fast recursive algorithms for V-BLAST with optimal ordered SIC detection,” IEEE Trans. Wireless Commun., vol. 8, pp. 2860-2865, Jun. 2009.
  • [22] B. Hassibi, “An efficient square-root algorithm for BLAST,” Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP ’00), pp. 737-740, Jun. 2000.
  • [23] H. Zhu, Z. Lei, and F. Chin, “An improved square-root algorithm for BLAST,” IEEE Signal Process. Lett., vol. 11, no. 9, pp. 772-775, 2004.
  • [24] H. Zhu, W. Chen, B. Li, and F. Gao, “An improved square-root algorithm for V-BLAST based on efficient inverse Cholesky factorization,” IEEE Trans. Wireless Commun., vol. 10, no. 1, pp. 43-48, 2011.
  • [25] K. Pham and K. Lee, “Low-Complexity SIC Detection Algorithms for Multiple-Input Multiple-Output Systems,” IEEE Trans. Signal Process., pp. 4625-4633, vol. 63, no. 17, Sept. 2015.
  • [26] C. Studer, S. Fateh and D. Seethaler, “ASIC Implementation of Soft-Input Soft-Output MIMO Detection Using MMSE Parallel Interference Cancellation,” IEEE Journal of Solid-State Circuits, vol. 46, no. 7, pp. 1754-1765, July 2011.
  • [27] S. Park, “Low-Complexity LMMSE-Based Iterative Soft Interference Cancellation for MIMO Systems,” IEEE Trans. on Signal Processing, vol. 70, pp. 1890-1899, 2022.
  • [28] M. J. Grabner, X. Li and S. Fu, “An Adaptive BLAST Successive Interference Cancellation Method for High Data Rate Perfect Space-Time Coded MIMO Systems,” IEEE Trans. Veh. Technol., vol. 69, no. 2, pp. 1542-1553, Feb. 2020.
  • [29] E. Basar, “On Multiple-Input Multiple-Output OFDM with Index Modulation for Next Generation Wireless Networks,” IEEE Trans. Signal Process., vol. 64, no. 15, pp. 3868-3878, Aug. 2016.
  • [30] K. A. Alnajjar, M. El-Tarhuni, “A C-V-BLAST spread spectrum massive MIMO NOMA scheme for 5G systems with channel imperfections,” Physical Commun., vol. 35, 2019.
  • [31] S. Özyurt, E. P. Simon and J. Farah, “NOMA With Zero-Forcing V-BLAST,” IEEE Commun. Letters, vol. 24, no. 9, pp. 2070-2074, Sept. 2020.
  • [32] J. Choi, “On the power allocation for MIMO-NOMA systems with layered transmissions,” IEEE Trans. Wireless Commun., vol. 15, no. 5, pp. 3226-3237, May 2016.
  • [33] P. Singh, B. U. Rani, H. B. Mishra and K. Vasudevan, “Neighbourhood Detection-based ZF-V-BLAST Architecture for MIMO-FBMC-OQAM Systems,” Proc. Global Commun. Conf. (Globecom’18), Abu Dhabi, United Arab Emirates, 2018, pp. 1-6.
  • [34] H.-Y. Lu, L.-P. Chang and H.-S. Hung, “Partial Tree Search Assisted Symbol Detection for Massive MIMO Systems,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 13319-13327, Nov. 2020.
  • [35] S. Adnan, Y. Fu, B. J. Ahmed, M. F. Tahir, F. Banoori, “Modified ordered successive interference cancellation MIMO detection using low complexity constellation search,” AEU - Int. J. of Electron. and Commun., vol. 121, 2020.
  • [36] S. Özyurt and M. Torlak, “Exact Outage Probability Analysis of Dual-Transmit-Antenna V-BLAST With Optimum Ordering,” IEEE Trans. Veh. Technol., vol. 68, no. 1, pp. 977-982, Jan. 2019.
  • [37] S. Yang and L. Hanzo, “Fifty Years of MIMO Detection: The Road to Large-Scale MIMOs,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 1941-1988, Fourthquarter 2015.
  • [38] C. Xu, S. Sugiura, S. X. Ng, P. Zhang, L. Wang and L. Hanzo, “Two Decades of MIMO Design Tradeoffs and Reduced-Complexity MIMO Detection in Near-Capacity Systems,” IEEE Access, vol. 5, pp. 18564-18632, 2017.
  • [39] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, 2000.
  • [40] H. V. Henderson and S. R. Searle, “On Deriving the Inverse of a Sum of Matrices,” SIAM Review, vol. 23, no. 1, Jan. 1981.