
Improved Recursive Algorithms for V-BLAST to Save Computations and Memories

Hufei Zhu, Yanyang Liang, Fuqin Deng*, Genquan Chen and Jiaming Zhong
H. Zhu, Y. Liang, J. Zhong, F. Deng and G. Chen are with the Faculty of Intelligent Manufacturing, Wuyi University, Jiangmen 529020, Guangdong, China. This paper was presented in part at the IEEE International Conference on Communications.
Abstract

For the vertical Bell Laboratories layered space-time architecture (V-BLAST), the original fast recursive algorithm was proposed, and then Improvements I-IV were introduced to further reduce the computational complexity. The existing recursive algorithm with speed advantage and that with memory saving incorporate Improvements I-IV and only Improvements III-IV into the original algorithm, respectively, and as far as we know, they require the least computations and memories, respectively, among the existing recursive algorithms. In this paper, we propose Improvements V and VI to replace Improvements I and II, respectively. Instead of the lemma for inversion of a partitioned matrix applied in Improvement I, Improvement V uses another lemma to speed up the matrix inversion step by a factor of 1.67. Then the formulas adopted in our Improvement V are applied to deduce Improvement VI for interference cancellation, which saves memories without sacrificing speed compared to Improvement II. In the existing algorithm with speed advantage, the proposed algorithm I with speed advantage replaces Improvement I with Improvement V, while the proposed algorithm II with both speed advantage and memory saving replaces Improvements I and II with Improvements V and VI, respectively. Both proposed algorithms speed up the existing algorithm with speed advantage by a factor of 1.3, while the proposed algorithm II achieves a speedup of 1.86 and saves about half of the memories, compared to the existing algorithm with memory saving.

Index Terms:
Recursive algorithm, V-BLAST, memory saving, inversion of partitioned matrix, interference cancellation.

I Introduction

Spatial-multiplexing (SM) multiple-input multiple-output (MIMO) wireless communication systems [1] transmit multiple substreams concurrently from multiple antennas, to achieve very high data rate and spectral efficiency in rich multi-path environments. As the number of transmit antennas increases, the maximum-likelihood (ML) MIMO detectors [2, 3, 4] with the optimum performance quickly require excessive computational complexity. Vertical Bell Laboratories Layered Space-Time (V-BLAST) systems [5] are SM MIMO schemes with good performance-complexity tradeoff, where usually ordered successive interference cancellation (OSIC) detectors based on hard decisions are adopted to detect the substream symbols iteratively with the optimal ordering based on signal-to-noise ratio (SNR). In each iteration, one of the undetected symbols is detected through a linear zero-forcing (ZF) or minimum mean-square error (MMSE) filter, and then the hard decision of the detected symbol is utilized to cancel the interference in the received symbol vector. OSIC detectors choose the undetected symbol with the highest SNR in each iteration to perform detection and interference cancellation, since the unreliable hard decision causes detrimental error propagation (EP) [6].

EP due to imperfect hard decisions can severely limit the performance of OSIC detectors [6, 7]. Then the ordering schemes based on the log-likelihood ratio (LLR) [7, 8] and maximum a posteriori probability (MAP) [9] exploit the instantaneous noise and the resulting symbol decision error, respectively, to achieve a performance gain over the conventional SNR ordering scheme. On the other hand, iterative soft interference cancellation (ISIC) MIMO detectors were proposed to reduce the EP effects by replacing hard decisions with soft decisions in interference cancellation [10, 11, 12, 13], which are extended to iterative detection-decoding (IDD) schemes with soft-decision feedback for coded systems, and can achieve near ML performance with controllable complexity [14, 15]. Since the direct-sequence code-division multiple-access (DS-CDMA) system can be modeled as a MIMO channel [14], the ISIC MIMO detectors are also obtained in [16] and [14] by applying the ISIC detector in [17] for the coded DS-CDMA system. For the case with the non-linear or unknown channel model, a data-driven implementation of the ISIC MIMO detector was proposed in [15], where simple dedicated deep neural networks (DNNs) replace the channel model-based building blocks of the ISIC MIMO detector in [10].

The calculation of OSIC and ISIC detectors for MIMO requires quite high complexity. Several efficient implementations of OSIC were proposed, which mainly include the recursive algorithms [18, 19, 20, 21] and the square-root algorithms [22, 23, 24, 25] based on the detection error covariance matrix and its square-root matrix, respectively. On the other hand, several efficient implementations of ISIC were proposed in [26, 27].

In addition to the above-mentioned EP effect reductions and efficient implementations, there has been much other recent research related to the OSIC MIMO detector. It was applied in linear dispersion (LD) based perfect space-time coded MIMO systems [28], MIMO orthogonal frequency division multiplexing (OFDM) with index modulation (IM) [29], and the power-domain non-orthogonal multiple access (NOMA) systems with MIMO [30, 31, 32]. For MIMO filter bank multicarrier (FBMC) systems, neighbourhood detection based OSIC detectors with QR and sorted QR decompositions were proposed in [33]. For 5G mobile networks with massive MIMO, the OSIC detector was combined with the K-correction algorithm based on the ML criterion and the partial tree search detection in [34] and [35], respectively. In [36], an exact closed-form expression was given for the average outage probability of zero-forcing (ZF) OSIC V-BLAST with 2 transmitters and $r\geq 2$ receivers. Moreover, comprehensive surveys of MIMO detectors (including OSIC detectors) have been given in [37] and [38].

In this paper, we focus on the recursive OSIC detectors [18, 19, 20, 21]. The original recursive algorithm proposed in [18] was improved in [19] and [20] to reduce the computational complexity, and these improvements were then incorporated in [21] to give the “fastest known algorithm” before [21]. The contributions of [21] also include one recursive algorithm with speed advantage that adopts a further improvement for the “fastest known algorithm”, with a claimed speedup of 1.3, and another recursive algorithm with memory saving that is slower than the “fastest known algorithm”. We further improve the matrix inversion applied in [19] and the interference cancellation scheme proposed in [20], both of which are part of the recursive algorithm with speed advantage in [21].

Firstly, the lemma for inversion of a partitioned matrix [39, Ch. 14.12] applied in [19] is replaced with another one [40, Equ. 8], to accelerate the computation of the initial detection error covariance matrix. Then the formulas adopted by us for matrix inversion are utilized to deduce an improved interference cancellation scheme with memory saving, which uses the entries of the error covariance matrix instead of those of its inverse. Finally, we use the above two improvements to propose a recursive algorithm with both speed advantage and memory saving, which is the fastest and requires the least memories among the existing recursive algorithms. In future work, the proposed recursive OSIC detector with both speed advantage and memory saving will be extended to a recursive ISIC detector, which can reduce the required computations and memories compared to the low-complexity ISIC detector proposed recently in [27].

The V-BLAST system model is overviewed in Section II, followed by the description of the recursive algorithms for V-BLAST in [18, 20, 19, 21] in Section III. Two recursive algorithms for V-BLAST are proposed in Section IV to improve the recursive algorithm with speed advantage in [21]. Then the presented recursive algorithms for V-BLAST are compared in Section V. Finally, conclusions are drawn in Section VI.

In what follows, uppercase and lowercase bold face letters represent matrices and column vectors, respectively. $(\bullet)^{T}$, $(\bullet)^{*}$, $(\bullet)^{H}$, and $(\bullet)^{-1}$ denote transposition, conjugate, conjugate transposition, and inversion of matrices, respectively. ${\bf{I}}_{M}$ is the identity matrix of size $M$. Moreover, the MATLAB standard is followed in the notations for some variables.

II System Model

The considered V-BLAST system consists of $M$ transmit antennas and $N$ ($\geq M$) receive antennas in a rich-scattering and flat-fading wireless channel. At the transmitter, the data stream is de-multiplexed into $M$ sub-streams, and each sub-stream is encoded and fed to its respective transmit antenna. Denote the vector of transmit symbols from the $M$ antennas as

${\bf{s}}_{M}=[s_{1},s_{2},\cdots,s_{M}]^{T},$ (1)

and denote the $N\times M$ complex channel matrix ${\bf{H}}$ as

${\bf{H}}_{M}=[{\bf{h}}_{:1},{\bf{h}}_{:2},\cdots,{\bf{h}}_{:M}],$ (2)

where ${\bf{h}}_{:m}$ ($m=1,2,\cdots,M$) is the $m^{th}$ column of ${\bf{H}}$. Then the symbols received at the $N$ antennas can be written as

${\bf{x}}^{(M)}={\bf{H}}_{M}\cdot{\bf{s}}_{M}+{\bf{n}},$ (3)

where ${\bf{n}}$ is the $N\times 1$ complex Gaussian noise vector with zero mean and covariance $\sigma_{n}^{2}{\bf{I}}_{N}$.

The conventional V-BLAST scheme detects the $M$ components of ${\bf{s}}_{M}$ by $M$ recursions with the optimal ordering based on SNR. In each recursion, the component with the highest post-detection SNR among all the undetected components is detected by a linear filter and then its effect is subtracted from the received signal vector [5, 22, 18]. Assume that in the $i^{th}$ ($i=1,2,\cdots,M$) recursion, the undetected $m=M-i+1$ transmit symbols and the corresponding channel matrix are the first $m$ entries and columns of the permuted ${\bf{s}}_{M}$ and ${\bf{H}}_{M}$, respectively, i.e., ${\bf{s}}_{m}$ and ${\bf{H}}_{m}$ defined by (1) and (2). Then we obtain the reduced-order problem (3) with $M$ replaced by any $m(<M)$. Accordingly, the linear MMSE estimation of the undetected $m$ symbols (i.e., ${\bf{s}}_{m}$) is

${\bf{\hat{s}}}_{m}=\left({\bf{H}}_{m}^{H}{\bf{H}}_{m}+\alpha{\bf{I}}_{m}\right)^{-1}{\bf{H}}_{m}^{H}{\bf{x}}^{(m)},$ (4)

where $\alpha=\frac{\sigma_{n}^{2}}{\sigma_{s}^{2}}$, and $\sigma_{s}^{2}$ is the power of each symbol in ${\bf{s}}_{M}$.
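As a concrete illustration of (3) and (4), the following minimal sketch (not the authors' code) simulates one channel use and computes the linear MMSE estimate of all $M$ symbols; the QPSK constellation, antenna numbers and noise power are assumed values chosen only for the example.

```python
import numpy as np

M, N = 4, 6                      # transmit / receive antennas (assumed for illustration)
sigma_s2, sigma_n2 = 1.0, 0.1    # symbol and noise powers (assumed)
alpha = sigma_n2 / sigma_s2

rng = np.random.default_rng(0)
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
s = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), M)  # QPSK symbols
n = np.sqrt(sigma_n2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
x = H @ s + n                    # received vector, eq. (3)

# Linear MMSE estimate of the M undetected symbols, eq. (4):
s_hat = np.linalg.solve(H.conj().T @ H + alpha * np.eye(M), H.conj().T @ x)
```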

III The Recursive V-BLAST Algorithms

III-A The Original Recursive V-BLAST Algorithm

The original recursive V-BLAST algorithm [18] is based on

${\bf{R}}_{m}={\bf{H}}_{m}^{H}{\bf{H}}_{m}+\alpha{\bf{I}}_{m}$ (5a)
${\bf{Q}}_{m}={\bf{R}}_{m}^{-1}=\left({\bf{H}}_{m}^{H}{\bf{H}}_{m}+\alpha{\bf{I}}_{m}\right)^{-1}.$ (5b)

The above ${\bf{Q}}_{m}$ is the covariance matrix [22, 18] of the detection error ${\bf{e}}_{m}={\bf{s}}_{m}-{\bf{\hat{s}}}_{m}$, i.e., $E\{{\bf{e}}_{m}{\bf{e}}_{m}^{H}\}=\sigma_{n}^{2}{\bf{Q}}_{m}$. Then in each recursion, the symbol with the highest post-detection SNR among all the undetected $m$ symbols corresponds to the minimal diagonal entry of ${\bf{Q}}_{m}$.

In [18], the Sherman-Morrison formula is applied to deduce

${\bf{Q}}_{[n]}={\bf{Q}}_{[n-1]}-\frac{{\bf{Q}}_{[n-1]}{\bf{h}}_{n:}{\bf{h}}_{n:}^{H}{\bf{Q}}_{[n-1]}}{1+{\bf{h}}_{n:}^{H}{\bf{Q}}_{[n-1]}{\bf{h}}_{n:}},$ (6)

where ${\bf{h}}_{n:}^{H}$ is the $n^{th}$ row of ${\bf{H}}$. Accordingly, the initial ${\bf{Q}}={\bf{Q}}_{[N]}$ and ${\bf{R}}={\bf{R}}_{[N]}$ are obtained by computing (6) and

${\bf{R}}_{[n]}=\sum_{l=1}^{n}{\bf{h}}_{l:}{\bf{h}}_{l:}^{H}+\alpha{\bf{I}}_{M}={\bf{R}}_{[n-1]}+{\bf{h}}_{n:}{\bf{h}}_{n:}^{H}$ (7)

iteratively for $n=1,2,\cdots,N$, where ${\bf{Q}}_{[0]}=\frac{1}{\alpha}{\bf{I}}_{M}$ and ${\bf{R}}_{[0]}=\alpha{\bf{I}}_{M}$. Then set the initial ${\bf{R}}_{M}={\bf{R}}$, ${\bf{Q}}_{M}={\bf{Q}}$, ${\bf{x}}^{(M)}={\bf{x}}$ and ${\bf{p}}=[1,2,\cdots,M]^{T}$ for the recursion phase.
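A minimal sketch of this initialization, i.e., one rank-1 update of (7) and one Sherman-Morrison update of (6) per receive antenna, is given below; numpy and a channel matrix H as in the earlier example are assumed, and the function name is ours, not from [18].

```python
import numpy as np

def init_Q_R(H, alpha):
    """Initial Q = (H^H H + alpha I)^{-1} and R = H^H H + alpha I via (6)-(7)."""
    N, M = H.shape
    Q = np.eye(M, dtype=complex) / alpha      # Q_[0] = (1/alpha) I_M
    R = alpha * np.eye(M, dtype=complex)      # R_[0] = alpha I_M
    for n in range(N):
        h = H[n, :].conj()                    # h_{n:}, since h_{n:}^H is the n-th row of H
        R += np.outer(h, h.conj())            # eq. (7)
        Qh = Q @ h
        Q -= np.outer(Qh, Qh.conj()) / (1 + h.conj() @ Qh)   # eq. (6), Sherman-Morrison
    return Q, R

# Sanity check: Q should equal the direct inverse of R, e.g.
# Q, R = init_Q_R(H, alpha); assert np.allclose(Q, np.linalg.inv(R))
```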

In the recursion with $m$ ($m=M,M-1,\cdots,2$) undetected symbols, ${\bf{p}}$ is permuted so that the $p_{m}$-th (i.e., ${\bf{p}}(m)$-th) symbol has the highest SNR among the undetected symbols, while ${\bf{H}}_{m}$, ${\bf{R}}_{m}$ and ${\bf{Q}}_{m}$ are permuted accordingly. Then the estimation of the $p_{m}$-th symbol is computed by

$\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{H}}_{m}^{H}{\bf{x}}^{(m)}$ (8)

with ${\bf{q}}_{m}$ being the $m^{th}$ column of ${\bf{Q}}_{m}$, and $\hat{s}_{p_{m}}$ is quantized to $s_{p_{m}}$ according to the constellation in use. For the next recursion, the interference of $s_{p_{m}}$ is cancelled in ${\bf{x}}^{(m)}$ to get

${\bf{x}}^{(m-1)}={\bf{x}}^{(m)}-s_{p_{m}}{\bf{h}}_{:m},$ (9)

${\bf{R}}_{m-1}$ is determined from ${\bf{R}}_{m}$ by

${\bf{R}}_{m}=\left[\begin{array}{cc}{\bf{R}}_{m-1}&{\bf{\bar{r}}}_{m}\\ {\bf{\bar{r}}}_{m}^{H}&\gamma_{m}\end{array}\right],$ (10)

and ${\bf{Q}}_{m}$ is deflated to ${\bf{Q}}_{m-1}$ by

${\bf{Q}}_{m-1}={\bf{\bar{Q}}}_{m-1}-\frac{{\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}{\bf{\bar{r}}}_{m}^{H}{\bf{\bar{Q}}}_{m-1}}{\gamma_{m}+{\bf{\bar{r}}}_{m}^{H}{\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}},$ (11)

where ${\bf{\bar{Q}}}_{m-1}$ is ${\bf{Q}}_{m}$ with the last row and column removed, as described by [21, Eq. (13)]

${\bf{Q}}_{m}=\left[\begin{array}{cc}{\bf{\bar{Q}}}_{m-1}&{\bf{\bar{q}}}_{m}\\ {\bf{\bar{q}}}_{m}^{H}&\omega_{m}\end{array}\right].$ (12)

In (10)-(12), ${\bf{\bar{r}}}_{m}$ and ${\bf{\bar{q}}}_{m}$ are ${\bf{r}}_{m}$ and ${\bf{q}}_{m}$ (i.e., the $m^{th}$ columns of ${\bf{R}}_{m}$ and ${\bf{Q}}_{m}$) with the last entry removed, respectively.
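The following minimal sketch (assuming QPSK symbols and numpy; function and helper names are ours) puts these steps together for one full detection pass of the original algorithm: SNR ordering on the diagonal of ${\bf{Q}}_{m}$, the linear estimate (8), interference cancellation (9) and the deflation (11). The initial Q and R may be obtained, e.g., as in the earlier initialization sketch.

```python
import numpy as np

def qpsk_slice(s):
    # hard decision onto an assumed unit-energy QPSK constellation
    return (np.sign(s.real) + 1j * np.sign(s.imag)) / np.sqrt(2)

def osic_original(H, x, Q, R):
    N, M = H.shape
    H, Q, R, x = H.copy(), Q.copy(), R.copy(), x.copy()
    p = np.arange(M)                                  # records the detection permutation
    s_det = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        l = np.argmin(np.real(np.diag(Q[:m, :m])))    # symbol with highest post-detection SNR
        p[[l, m - 1]] = p[[m - 1, l]]                 # permute p, H_m, R_m and Q_m accordingly
        H[:, [l, m - 1]] = H[:, [m - 1, l]]
        for A in (Q, R):
            A[[l, m - 1], :m] = A[[m - 1, l], :m]
            A[:m, [l, m - 1]] = A[:m, [m - 1, l]]
        q = Q[:m, m - 1]                              # q_m, the m-th column of Q_m
        s_det[p[m - 1]] = qpsk_slice(q.conj() @ (H[:, :m].conj().T @ x))     # eq. (8)
        if m == 1:
            break
        x = x - s_det[p[m - 1]] * H[:, m - 1]         # eq. (9)
        r_bar, gamma = R[:m - 1, m - 1], R[m - 1, m - 1].real
        Qb = Q[:m - 1, :m - 1]                        # \bar{Q}_{m-1} from (12)
        Qr = Qb @ r_bar
        Q[:m - 1, :m - 1] = Qb - np.outer(Qr, Qr.conj()) / (gamma + r_bar.conj() @ Qr)  # eq. (11)
    return s_det
```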

III-B Existing Improvements

In the original recursive algorithm, the dominant complexity in the order of $O(M^{3}+M^{2}N)$ comes from the initialization of ${\bf{Q}}$ and ${\bf{R}}$, the estimation of $s_{p_{m}}$ and the deflation of ${\bf{Q}}_{m}$ (where $2\leq m\leq M$) by (6), (7), (8) and (11), respectively. To reduce the computational complexity of the original recursive algorithm, the following Improvements I-IV have been proposed in [19], [20] and [21], respectively.

Improvement I: To speed up the computation of the initial ${\bf{Q}}$, the Sherman-Morrison formula (6) applied in [18] is replaced with the lemma for inversion of a partitioned matrix [39, Ch. 14.12] in [19] to compute the inverse of ${\bf{R}}_{m}$ partitioned by (10), which is ${\bf{Q}}_{m}$ partitioned by (12) where

${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}+\frac{{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}}{\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}}$ (13a)
${\bf{\bar{q}}}_{m}=-\gamma_{m}^{-1}{\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}$ (13b)
$\omega_{m}=\gamma_{m}^{-1}+\gamma_{m}^{-2}{\bf{\bar{r}}}_{m}^{H}{\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}.$ (13c)

In [19], the above (13c) and (12) are applied to compute ${\bf{Q}}_{m}={\bf{R}}_{m}^{-1}$ from ${\bf{Q}}_{m-1}$ iteratively for $m=2,3,\cdots,M$, to get the initial ${\bf{Q}}={\bf{Q}}_{M}$ from

${\bf{Q}}_{1}=1/{\bf{R}}_{1}.$ (14)

Improvement II: In [20], (8) has been simplified to

$\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{z}}_{m}$ (15)

with ${\bf{z}}_{m}$ defined by

${\bf{z}}_{m}={\bf{H}}_{m}^{H}{\bf{x}}^{(m)}.$ (16)

Only the initial ${\bf{z}}_{M}$ is computed by (16) with $m=M$. Then for $m=M,M-1,\cdots,2$, the interference of $s_{p_{m}}$ is cancelled in the permuted ${\bf{z}}_{m}$ to update ${\bf{z}}_{m}$ into ${\bf{z}}_{m-1}$ efficiently by

${\bf{z}}_{m-1}={\bf{\bar{z}}}_{m}-s_{p_{m}}{\bf{\bar{r}}}_{m},$ (17)

where ${\bf{\bar{z}}}_{m}$ is ${\bf{z}}_{m}$ with the last entry removed, like ${\bf{\bar{r}}}_{m}$ and ${\bf{\bar{q}}}_{m}$.

Improvement III: In [19, Algorithm II] and [20], the structure of Hermitian matrices is exploited to simplify the initialization and deflation of ${\bf{Q}}$ by (6) and (11), respectively.

Improvement IV: In [21], the deflation of ${\bf{Q}}_{m}$ was accelerated by replacing (11) with

${\bf{Q}}_{m-1}={\bf{\bar{Q}}}_{m-1}-\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H},$ (18)

where ${\bf{\bar{Q}}}_{m-1}$, ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ are blocks of ${\bf{Q}}_{m}$, as shown in (12).

In [21], the above Improvements I-III were incorporated into the original algorithm to give the “fastest known algorithm” before [21], and then the above Improvement IV was proposed to improve the “fastest known algorithm” into the algorithm with speed advantage, which is summarized in Algorithm 1.

Algorithm 1 The Algorithm with Speed Advantage in [21]
Initialization: Set ${\bf{p}}=[1,2,\cdots,M]^{T}$ and compute ${\bf{z}}_{M}={\bf{H}}^{H}{\bf{x}}$; compute (7) iteratively for $n=1,2,\cdots,N$ to obtain the initial ${\bf{R}}_{M}={\bf{R}}_{[N]}$; compute ${\bf{Q}}_{1}$ by (14), and then compute (13c) and (12) iteratively for $m=2,3,\cdots,M$, to obtain the initial ${\bf{Q}}_{M}$;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}$ is the $j^{th}$ diagonal entry of ${\bf{Q}}_{m}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$ and ${\bf{z}}_{m}$; permute rows and columns $l_{m}$ and $m$ in ${\bf{R}}_{m}$ and ${\bf{Q}}_{m}$;
2: Compute $\hat{s}_{p_{m}}$ by (15), which is quantized to $s_{p_{m}}$;
3: Cancel the effect of $s_{p_{m}}$ in ${\bf{z}}_{m}$ to obtain ${\bf{z}}_{m-1}$ by (17), and deflate ${\bf{Q}}_{m}$ to ${\bf{Q}}_{m-1}$ by (18);
When $m=1$, only run step 2 to get $s_{p_{1}}$;
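A minimal sketch of the recursion phase of Algorithm 1 is given below, assuming QPSK symbols; for brevity the initial ${\bf{Q}}_{M}$ is obtained here by a direct inverse, standing in for the recursive initialization by (7), (14), (13c) and (12), and the function names are ours.

```python
import numpy as np

def qpsk_slice(s):
    return (np.sign(s.real) + 1j * np.sign(s.imag)) / np.sqrt(2)

def osic_algorithm1(H, x, alpha):
    N, M = H.shape
    R = H.conj().T @ H + alpha * np.eye(M)          # eq. (5a)
    Q = np.linalg.inv(R)                            # stand-in for the recursive initialization
    z = H.conj().T @ x                              # z_M, eq. (16) with m = M
    p = np.arange(M)
    s_det = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        l = np.argmin(np.real(np.diag(Q[:m, :m])))  # step 1: SNR ordering
        p[[l, m - 1]] = p[[m - 1, l]]
        z[[l, m - 1]] = z[[m - 1, l]]
        for A in (Q, R):
            A[[l, m - 1], :m] = A[[m - 1, l], :m]
            A[:m, [l, m - 1]] = A[:m, [m - 1, l]]
        q_bar, omega = Q[:m - 1, m - 1], Q[m - 1, m - 1].real
        s_det[p[m - 1]] = qpsk_slice(Q[:m, m - 1].conj() @ z[:m])            # step 2: eq. (15)
        if m == 1:
            break
        z[:m - 1] -= s_det[p[m - 1]] * R[:m - 1, m - 1]                      # step 3: eq. (17)
        Q[:m - 1, :m - 1] -= np.outer(q_bar, q_bar.conj()) / omega           # step 3: eq. (18)
    return s_det
```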

On the other hand, the algorithm with memory saving proposed in [21] does not calculate ${\bf{R}}$, to avoid the overhead of storing and permuting ${\bf{R}}$. It only incorporates Improvements III and IV into the original algorithm, and does not incorporate Improvements I and II, which utilize the entries of ${\bf{R}}$. Moreover, it no longer computes ${\bf{R}}_{M}$ by (7) in the initialization phase, since the deflation of ${\bf{Q}}_{m}$ in the recursion phase uses the entries of ${\bf{Q}}$ instead of those of ${\bf{R}}$ after Improvement IV is incorporated. The algorithm with memory saving in [21] is summarized in Algorithm 2.

Algorithm 2 The Algorithm with Memory Saving in [21]
Initialization: Set ${\bf{p}}=[1,2,\cdots,M]^{T}$ and ${\bf{H}}_{M}={\bf{H}}$; compute (6) iteratively for $n=1,2,\cdots,N$ to obtain the initial ${\bf{Q}}_{M}={\bf{Q}}_{[N]}$;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}$ is the $j^{th}$ diagonal entry of ${\bf{Q}}_{m}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$; permute columns $l_{m}$ and $m$ in ${\bf{H}}_{m}$; permute rows and columns $l_{m}$ and $m$ in ${\bf{Q}}_{m}$;
2: Compute $\hat{s}_{p_{m}}$ by (8), which is quantized to $s_{p_{m}}$;
3: Cancel the effect of $s_{p_{m}}$ in ${\bf{x}}^{(m)}$ to obtain ${\bf{x}}^{(m-1)}$ by (9); deflate ${\bf{Q}}_{m}$ to ${\bf{Q}}_{m-1}$ by (18); remove the last column of ${\bf{H}}_{m}$ to obtain ${\bf{H}}_{m-1}$;
When $m=1$, only run step 2 to get $s_{p_{1}}$;

To the best of our knowledge, the algorithm with speed advantage (i.e., Algorithm 1) and the algorithm with memory saving (i.e., Algorithm 2) in [21] require the least computations and memories, respectively, among the existing recursive V-BLAST algorithms including those in [18, 20, 19, 21].

IV Proposed Two Recursive V-BLAST Algorithms

In this section, we propose Improvements V and VI to replace the above Improvements I and II, respectively. (Only Improvement V was presented in part at the IEEE International Conference on Communications (ICC); notice that Improvement V is obtained by applying [40, Equ. 8] in this paper, while it was re-derived without citing any literature at ICC'09.) Improvement V accelerates Improvement I that inverts a partitioned matrix. Then the formulas adopted in Improvement V are applied to deduce Improvement VI, which saves memories without sacrificing speed compared to Improvement II, by using ${\bf{Q}}$ instead of ${\bf{R}}$ for interference cancellation in the recursion phase.

IV-A Improvement V with Speed Advantage for Proposed Recursive Algorithm I to Invert a Partitioned Matrix

Instead of the lemma for inversion of a partitioned matrix [39, Ch. 14.12] applied in [19], the proposed Improvement V uses the equation inverting a partitioned matrix [40, Equ. 8] (this equation was not first proposed in [40], where several works published between 1917 and 1978 are listed near equation (8) to discuss its first explicit presentation), to compute the inverse of ${\bf{R}}_{m}$ partitioned by (10), which is ${\bf{Q}}_{m}$ partitioned by (12) where

$\omega_{m}=\left(\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}\right)^{-1}$ (19a)
${\bf{\bar{q}}}_{m}=-\omega_{m}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}$ (19b)
${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}.$ (19c)

From equation (8) in [40], we deduce (19c) in Appendix A, which significantly simplifies (13c) from Improvement I. We also write (19c) as

${\bf{\tilde{q}}}_{m}={\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}$ (20a)
$\omega_{m}=\left(\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{\tilde{q}}}_{m}\right)^{-1}$ (20b)
${\bf{\bar{q}}}_{m}=-\omega_{m}{\bf{\tilde{q}}}_{m}$ (20c)
${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H},$ (20d)

to avoid the unnecessary division to compute $\omega_{m}^{-1}$ in (19c). In Algorithm 1 (i.e., the improved algorithm with speed advantage in [21]), we compute the initial ${\bf{Q}}$ by (20d) from Improvement V instead of (13c) from Improvement I, to obtain the proposed recursive algorithm I with speed advantage.
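As an illustration, the following minimal sketch (function name ours) builds ${\bf{Q}}_{M}={\bf{R}}_{M}^{-1}$ iteratively from (14) and (20a)-(20d) with the partitioning (12), assuming a Hermitian positive definite ${\bf{R}}_{M}$ stored as a numpy array.

```python
import numpy as np

def invert_by_improvement_v(R):
    """Grow Q_m from Q_{m-1} and column m of R_m, for m = 2, ..., M (0-based indices below)."""
    M = R.shape[0]
    Q = np.zeros_like(R, dtype=complex)
    Q[0, 0] = 1.0 / R[0, 0]                                     # eq. (14)
    for m in range(2, M + 1):
        r_bar, gamma = R[:m - 1, m - 1], R[m - 1, m - 1]
        q_tilde = Q[:m - 1, :m - 1] @ r_bar                     # eq. (20a)
        omega = 1.0 / (gamma - r_bar.conj() @ q_tilde)          # eq. (20b)
        Q[:m - 1, m - 1] = -omega * q_tilde                     # eq. (20c), i.e. \bar{q}_m
        Q[m - 1, :m - 1] = Q[:m - 1, m - 1].conj()
        Q[m - 1, m - 1] = omega
        Q[:m - 1, :m - 1] += omega * np.outer(q_tilde, q_tilde.conj())   # eq. (20d)
    return Q

# Sanity check against a direct inverse, e.g.:
# A = np.random.randn(5, 5) + 1j * np.random.randn(5, 5)
# R = A.conj().T @ A + 0.1 * np.eye(5)
# assert np.allclose(invert_by_improvement_v(R), np.linalg.inv(R))
```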

IV-B Improvement VI with Memory Saving for Interference Cancellation

In the proposed Improvement VI, the above (20d) from Improvement V is applied to deduce an improved interference cancellation scheme with memory saving that avoids using ${\bf{R}}_{M}$ in the recursion phase. For simplicity, the detection order is assumed to be $M,M-1,\cdots,1$ in this subsection.

Obviously, we need to avoid using ${\bf{\bar{r}}}_{m}$ in ${\bf{R}}_{m}$ to update ${\bf{z}}_{m}$ into ${\bf{z}}_{m-1}$ by (17). Then in the recursion phase, the initial ${\bf{z}}_{M}$ will remain unchanged, which is applied to define the vector

${\bf{d}}_{m}={\bf{Q}}_{m}\left({\bf{z}}_{M}(1:m)-{\bf{z}}_{m}\right),$ (21)

where ${\bf{z}}_{M}(1:m)$ denotes the first $m$ entries of ${\bf{z}}_{M}$. ${\bf{d}}_{m}$ defined by (21), which becomes

${\bf{d}}_{M}={\bf{0}}_{M}$ (22)

when $m=M$, is applied to compute the estimation of $s_{p_{m}}$ by

$\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-{\bf{d}}_{m}(m),$ (23)

and can be updated into ${\bf{d}}_{m-1}$ efficiently by

${\bf{d}}_{m-1}={\bf{\bar{d}}}_{m}-\left(s_{p_{m}}+{\bf{d}}_{m}(m)\right){\bf{\bar{q}}}_{m}/\omega_{m},$ (24)

where ${\bf{d}}_{m}(m)$ is the last entry of ${\bf{d}}_{m}$, and ${\bf{\bar{d}}}_{m}$ denotes ${\bf{d}}_{m}$ with the last entry removed, i.e.,

${\bf{d}}_{m}=\left[\begin{array}{cc}{\bf{\bar{d}}}_{m}^{T}&{\bf{d}}_{m}(m)\end{array}\right]^{T}.$ (25)

To deduce the above (23), write (21) as the column vector ${\bf{Q}}_{m}{\bf{z}}_{m}={\bf{Q}}_{m}{\bf{z}}_{M}(1:m)-{\bf{d}}_{m}$, whose last entry is

${\bf{q}}_{m}^{H}{\bf{z}}_{m}={\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-{\bf{d}}_{m}(m),$ (26)

and then substitute (26) into (15). Moreover, the above (24), which cancels the interference of $s_{p_{m}}$ in ${\bf{d}}_{m}$ to update ${\bf{d}}_{m}$ into ${\bf{d}}_{m-1}$ efficiently, will be deduced in the rest of this subsection.

Firstly, we verify that ${\bf{\bar{d}}}_{m}$, ${\bf{d}}_{m}(m)$ and ${\bf{d}}_{m-1}$ in (24) satisfy

${\bf{\bar{d}}}_{m}=\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H}&-\omega_{m}{\bf{\tilde{q}}}_{m}\end{array}\right]\left({\bf{z}}_{M}(1:m)-{\bf{z}}_{m}\right)$ (27d)
${\bf{d}}_{m}(m)=\left[\begin{array}{cc}-\omega_{m}{\bf{\tilde{q}}}_{m}^{H}&\omega_{m}\end{array}\right]\left({\bf{z}}_{M}(1:m)-{\bf{z}}_{m}\right)$ (27f)
${\bf{d}}_{m-1}={\bf{Q}}_{m-1}\left({\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\right)+s_{p_{m}}{\bf{\tilde{q}}}_{m}.$ (27g)

We substitute (20d) and (20c) into (12) to obtain

${\bf{Q}}_{m}=\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H}&-\omega_{m}{\bf{\tilde{q}}}_{m}\\ -\omega_{m}{\bf{\tilde{q}}}_{m}^{H}&\omega_{m}\end{array}\right],$ (28)

and then substitute (28) and (25) into (21) to obtain (27d) and (27f). To verify (27g), substitute (17) into (21) with $m$ replaced by $m-1$ to write ${\bf{d}}_{m-1}$ as

${\bf{d}}_{m-1}={\bf{Q}}_{m-1}\left({\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}+s_{p_{m}}{\bf{\bar{r}}}_{m}\right)={\bf{Q}}_{m-1}\left({\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\right)+s_{p_{m}}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m},$

into which we substitute (20a).

The above (27g) is applied to deduce (24) finally. Substitute (27g) and (27d) into ${\bf{d}}_{m-1}-{\bf{\bar{d}}}_{m}$ to obtain

${\bf{d}}_{m-1}-{\bf{\bar{d}}}_{m}={\bf{Q}}_{m-1}\left({\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\right)+s_{p_{m}}{\bf{\tilde{q}}}_{m}-\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H}&-\omega_{m}{\bf{\tilde{q}}}_{m}\end{array}\right]\left[\begin{array}{c}{\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\\ {\bf{z}}_{M}(m)-{\bf{z}}_{m}(m)\end{array}\right],$

which can be simplified into

${\bf{d}}_{m-1}-{\bf{\bar{d}}}_{m}=s_{p_{m}}{\bf{\tilde{q}}}_{m}+{\bf{\tilde{q}}}_{m}\left[\begin{array}{cc}-\omega_{m}{\bf{\tilde{q}}}_{m}^{H}&\omega_{m}\end{array}\right]\left[\begin{array}{c}{\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\\ {\bf{z}}_{M}(m)-{\bf{z}}_{m}(m)\end{array}\right]$ (29)

since the sum of all the terms containing ${\bf{Q}}_{m-1}$ is zero. Finally, we substitute (27f) and ${\bf{\tilde{q}}}_{m}=-{\bf{\bar{q}}}_{m}/\omega_{m}$ (deduced from (20c)) into (29) to obtain ${\bf{d}}_{m-1}-{\bf{\bar{d}}}_{m}=-s_{p_{m}}\frac{{\bf{\bar{q}}}_{m}}{\omega_{m}}-\frac{{\bf{\bar{q}}}_{m}}{\omega_{m}}{\bf{d}}_{m}(m)$, i.e., (24). Notice that in the above derivation, ${\bf{z}}_{M}(1:m)-{\bf{z}}_{m}$ in (27d) and (27f) needs to be written as $\left[\begin{array}{c}{\bf{z}}_{M}(1:m-1)-{\bf{\bar{z}}}_{m}\\ {\bf{z}}_{M}(m)-{\bf{z}}_{m}(m)\end{array}\right]$.

${\bf{d}}_{m}$ defined by (21) can be replaced with

${\bf{t}}_{m}={\bf{Q}}_{m}{\bf{z}}_{m},$ (30)

which is substituted into (21) to show that ${\bf{d}}_{m}$ and ${\bf{t}}_{m}$ satisfy

${\bf{d}}_{m}={\bf{Q}}_{m}{\bf{z}}_{M}(1:m)-{\bf{t}}_{m}.$ (31)

Accordingly, (23) and (24) are also replaced with

$\hat{s}_{p_{m}}={\bf{t}}_{m}(m)$ (32)

and

${\bf{t}}_{m-1}={\bf{\bar{t}}}_{m}+\left(s_{p_{m}}-{\bf{t}}_{m}(m)\right){\bf{\bar{q}}}_{m}/\omega_{m}$ (33)

(with ${\bf{\bar{t}}}_{m}$ obtained by removing the last entry of ${\bf{t}}_{m}$), respectively, which are deduced in Appendix B. However, slightly more computations are required when (32) and (33) are utilized instead of (23) and (24), respectively, since the initial ${\bf{t}}_{M}$ is computed by (30) with $M^{2}$ multiplications and additions, while the initial ${\bf{d}}_{M}$ is set by (22) without any computations, whereas (32) only saves $\sum_{m=1}^{M}m\approx\frac{M^{2}}{2}$ multiplications and additions in total compared to (23). Therefore, the proposed recursive algorithm with ${\bf{t}}_{m}$ will be ignored in the remainder of this paper.
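For clarity, the minimal sketch below (function and argument names ours) isolates steps 2-3 of the recursion under Improvement VI: given the permuted $m^{th}$ column of ${\bf{Q}}_{m}$ (split into ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$), the stored ${\bf{z}}_{M}(1:m)$ and the running vector ${\bf{d}}_{m}$, it evaluates (23) and (24) without touching ${\bf{R}}$.

```python
import numpy as np

def improvement_vi_step(q_bar, omega, z_M_head, d_m, quantize):
    """q_bar: first m-1 entries of q_m; omega: its last entry; z_M_head = z_M(1:m);
    d_m: current length-m vector defined by (21).  Returns (s_pm, d_{m-1})."""
    q_m = np.concatenate([q_bar, [omega]])
    s_hat = q_m.conj() @ z_M_head - d_m[-1]                 # eq. (23)
    s_pm = quantize(s_hat)                                  # hard decision
    d_next = d_m[:-1] - (s_pm + d_m[-1]) * q_bar / omega    # eq. (24)
    return s_pm, d_next
```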

IV-C Proposed Recursive Algorithm II with Both Speed Advantage and Memory Saving

In the proposed recursive algorithm I with speed advantage, i.e., Algorithm 1 with (13c) replaced by (20d), we can replace (15) and (17) (from Improvement II) with (23) and (24) (from Improvement VI), respectively. Then ${\bf{\bar{r}}}_{m}$ in (17) is replaced with ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ in (24) (i.e., ${\bf{q}}_{m}$ in (23)), to avoid using ${\bf{R}}_{M}$ in the recursion phase. Accordingly, we can cover ${\bf{R}}_{M}$ with ${\bf{Q}}_{M}$ in the initialization phase to save memories.

When we compute ${\bf{Q}}_{M}$ from ${\bf{R}}_{M}$ by the iterations of (20d) and (12), ${\bf{Q}}_{i}$ is computed from ${\bf{Q}}_{i-1}$ and column $i$ of ${\bf{R}}_{i}$ (i.e., ${\bf{\bar{r}}}_{i}$ and $\gamma_{i}$) in the $i^{th}$ ($2\leq i\leq M$) iteration, i.e., only columns $i+1$ to $M$ in the upper triangular part of ${\bf{R}}_{M}$ are required in the next iterations $i+1$ to $M$. Accordingly, the subsequent computations will not be affected if we cover the submatrix ${\bf{R}}_{i}$ in ${\bf{R}}_{M}$ with ${\bf{Q}}_{i}$, by writing (20d) from Improvement V as

${\bf{\tilde{q}}}={\bf{R}}(1:i-1,1:i-1){\bf{R}}(1:i-1,i)$ (34a)
${\bf{R}}(i,i)=1/\left({\bf{R}}(i,i)-{\bf{R}}(1:i-1,i)^{H}{\bf{\tilde{q}}}\right)$ (34b)
${\bf{R}}(1:i-1,i)=-{\bf{R}}(i,i)\cdot{\bf{\tilde{q}}}$ (34c)
${\bf{R}}(i,1:i-1)={\bf{R}}(1:i-1,i)^{H}$ (34d)
${\bf{R}}(1:i-1,1:i-1)={\bf{R}}(1:i-1,1:i-1)-{\bf{\tilde{q}}}\times{\bf{R}}(i,1:i-1).$ (34g)

We write (20a), (20b) and (20c) as the above (34a), (34b) and (34c), respectively. In (34d), we use the conjugate transposition of column $i$ in the upper triangular part of ${\bf{Q}}_{i}$ to cover row $i$ in its lower triangular part. Moreover, we use (20c) to simplify (20d) into ${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}-{\bf{\tilde{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}$, which is written as (34g).

To cover ${\bf{R}}_{M}$ with ${\bf{Q}}_{M}$, we compute (34g) iteratively for $i=2,3,\cdots,M$, after ${\bf{R}}_{1}$ is first covered with ${\bf{Q}}_{1}$ by

${\bf{R}}(1,1)=1/{\bf{R}}(1,1),$ (35)

which is deduced from (14). Notice that in (34g), only column $i$ in the upper triangular part of ${\bf{R}}_{i}$ is utilized to compute ${\bf{Q}}_{i}$, while the entire matrix ${\bf{R}}_{i}$ is covered with ${\bf{Q}}_{i}$.

In the initialization phase, we also cover ${\bf{H}}_{M}$ with ${\bf{R}}_{M}$ to save memories, since ${\bf{H}}_{M}$ is no longer needed in the recursion phase. Actually, we choose the implementation which covers the upper triangular part of a square submatrix in

${\bf{\tilde{H}}}={\bf{H}}_{M}^{H}$ (36)

with that of ${\bf{R}}_{M}$. Accordingly, we substitute (36) into (5a) to get ${\bf{R}}_{M}={\bf{\tilde{H}}}{\bf{\tilde{H}}}^{H}+\alpha{\bf{I}}_{M}$, and apply it to cover row $i$ in the upper triangular part of the square submatrix ${\bf{\tilde{H}}}(:,1:M)$ with that of ${\bf{R}}_{M}$, by

${\bf{\tilde{H}}}(i,i:M)={\bf{\tilde{H}}}(i,:){\bf{\tilde{H}}}(i:M,:)^{H}+\left[\alpha\enspace{\bf{0}}_{M-i}^{T}\right].$ (37)

By computing (37) iteratively for $i=1,2,\cdots,M$, we cover the upper triangular part of the square submatrix ${\bf{\tilde{H}}}(:,1:M)$ with that of ${\bf{R}}_{M}$. The above reuse of memories will not affect subsequent calculations, since only rows $i$ to $M$ of ${\bf{\tilde{H}}}$ are utilized in (37) to compute row $i$ in the upper triangular part of ${\bf{R}}_{M}$, i.e., row $i$ of ${\bf{\tilde{H}}}$ will not be utilized in the subsequent $(i+1)^{th}$ to $M^{th}$ iterations of (37). Moreover, we set the initial ${\bf{d}}_{M}$ by (22), and substitute (36) into (16) with $m=M$ to compute ${\bf{z}}_{M}$ from ${\bf{\tilde{H}}}$ by

${\bf{z}}_{M}={\bf{\tilde{H}}}{\bf{x}}^{(M)}.$ (38)

In the recursion phase, we still permute ${\bf{Q}}_{m}$ according to the SNR ordering, which also permutes ${\bf{q}}_{m}$ (i.e., ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$) in ${\bf{Q}}_{m}$. Then it can be seen from (23) and (24) that we need to permute entries $l_{m}$ and $m$ in ${\bf{z}}_{M}$ and ${\bf{d}}_{m}$.

We summarize the proposed recursive V-BLAST algorithm II with both speed advantage and memory saving in Algorithm 3, which revises Algorithm 1 (i.e., the improved algorithm with speed advantage in [21]) by replacing Improvements I and II with Improvements V and VI, respectively. In Algorithm 3, we utilize (34g) from Improvement V, (23) and (24) from Improvement VI, and the initialization equations (22) and (35)-(38).

Algorithm 3 The Proposed Recursive Algorithm with Both Speed Advantage and Memory Saving
Initialization: Set ${\bf{p}}=[1,2,\cdots,M]^{T}$ and ${\bf{d}}_{M}={\bf{0}}_{M}$; store ${\bf{\tilde{H}}}={\bf{H}}^{H}$ and compute ${\bf{z}}_{M}={\bf{\tilde{H}}}{\bf{x}}$; compute (37) iteratively for $i=1,2,\cdots,M$ to cover the upper triangular part of ${\bf{\tilde{H}}}(:,1:M)$ with that of ${\bf{R}}_{M}$; compute (35), and then compute (34g) iteratively for $i=2,3,\cdots,M$, to cover the entire matrix ${\bf{R}}_{M}$ (i.e., the entire square submatrix ${\bf{\tilde{H}}}(:,1:M)$) with ${\bf{Q}}_{M}$;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}$ is the $j^{th}$ diagonal entry of ${\bf{Q}}_{m}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$, ${\bf{z}}_{M}$ and ${\bf{d}}_{m}$; permute rows and columns $l_{m}$ and $m$ in ${\bf{Q}}_{m}$;
2: Compute $\hat{s}_{p_{m}}$ by (23), which is quantized to $s_{p_{m}}$;
3: Deflate ${\bf{d}}_{m}$ to ${\bf{d}}_{m-1}$ by (24), and deflate ${\bf{Q}}_{m}$ to ${\bf{Q}}_{m-1}$ by (18);
When $m=1$, only run step 2 to get $s_{p_{1}}$;
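The following minimal sketch of Algorithm 3 (assuming QPSK symbols; function names and 0-based numpy indexing are ours) overwrites ${\bf{\tilde{H}}}$ in place first with ${\bf{R}}_{M}$ by (37) and then with ${\bf{Q}}_{M}$ by (35) and (34a)-(34g), and runs the recursion phase with only ${\bf{Q}}$, ${\bf{z}}_{M}$ and ${\bf{d}}$.

```python
import numpy as np

def qpsk_slice(s):
    return (np.sign(s.real) + 1j * np.sign(s.imag)) / np.sqrt(2)

def proposed_algorithm_ii(H, x, alpha):
    N, M = H.shape
    Ht = H.conj().T.copy()                          # \tilde{H}, eq. (36); reused as workspace
    z = Ht @ x                                      # z_M, eq. (38)
    d = np.zeros(M, dtype=complex)                  # d_M, eq. (22)
    p = np.arange(M)
    # Cover the upper triangular part of \tilde{H}(:,1:M) with that of R_M, eq. (37):
    for i in range(M):
        Ht[i, i:M] = Ht[i, :] @ Ht[i:M, :].conj().T + np.r_[alpha, np.zeros(M - 1 - i)]
    # Cover R_M with Q_M, eqs. (35) and (34):
    Ht[0, 0] = 1.0 / Ht[0, 0]
    for i in range(1, M):
        r_bar, gamma = Ht[:i, i].copy(), Ht[i, i]
        q_tilde = Ht[:i, :i] @ r_bar                              # (34a)
        omega = 1.0 / (gamma - r_bar.conj() @ q_tilde)            # (34b)
        Ht[:i, i] = -omega * q_tilde                              # (34c)
        Ht[i, :i] = Ht[:i, i].conj()                              # (34d)
        Ht[i, i] = omega
        Ht[:i, :i] -= np.outer(q_tilde, Ht[i, :i])                # (34g)
    Q = Ht[:, :M]                                   # the square submatrix now holds Q_M
    # Recursion phase:
    s_det = np.zeros(M, dtype=complex)
    for m in range(M, 0, -1):
        l = np.argmin(np.real(np.diag(Q[:m, :m])))
        for v in (p, z, d):
            v[[l, m - 1]] = v[[m - 1, l]]
        Q[[l, m - 1], :m] = Q[[m - 1, l], :m]
        Q[:m, [l, m - 1]] = Q[:m, [m - 1, l]]
        q_bar, omega = Q[:m - 1, m - 1].copy(), Q[m - 1, m - 1].real
        s_det[p[m - 1]] = qpsk_slice(Q[:m, m - 1].conj() @ z[:m] - d[m - 1])   # eq. (23)
        if m == 1:
            break
        d[:m - 1] -= (s_det[p[m - 1]] + d[m - 1]) * q_bar / omega              # eq. (24)
        Q[:m - 1, :m - 1] -= np.outer(q_bar, q_bar.conj()) / omega             # eq. (18)
    return s_det
```

In exact arithmetic this sketch produces the same hard decisions as the Algorithm 1 sketch given earlier, since Improvements V and VI only change how the same quantities are computed and stored.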

To further save memories and permutation operations, we can avoid permuting ${\bf{z}}_{M}$, ${\bf{d}}_{m}$ and ${\bf{Q}}_{m}$ in the recursion phase of Algorithm 3, and store only the upper triangular part of the Hermitian ${\bf{Q}}_{i}$ ($i=1,2,\cdots,M$). The corresponding versions of the proposed recursive algorithm II with both speed advantage and memory saving are introduced in Appendix C.

V Performance Analysis and Numerical Results

TABLE I: Complexities of the Presented Recursive V-BLAST Algorithms
The original algorithm in [18]: ($3M^{2}N+\frac{2}{3}M^{3}$, $\frac{5}{2}M^{2}N+\frac{1}{2}M^{3}$)
The algorithm with memory saving in [21]: $2M^{2}N+\frac{1}{6}M^{3}$
The “fastest known algorithm” before [21]: $\frac{1}{2}M^{2}N+\frac{4}{3}M^{3}$
The algorithm with speed advantage in [21]: $\frac{1}{2}M^{2}N+M^{3}$
The proposed two algorithms: $\frac{1}{2}M^{2}N+\frac{2}{3}M^{3}$

The dominant complexity of (20d) is $\frac{1}{2}M^{3}$ complex multiplications and additions, which is dedicated to computing ${\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}$ and ${\bf{Q}}_{m-1}+\omega_{m}{\bf{\tilde{q}}}_{m}{\bf{\tilde{q}}}_{m}^{H}$ for $m=2,3,\cdots,M$. The complexity of (13c) should be $\frac{5}{6}M^{3}$ instead of the $\frac{1}{2}M^{3}$ claimed in [21], which is dedicated to computing ${\bf{g}}_{m-1}={\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m}$, ${\bf{Q}}_{m-1}+{\bf{g}}_{m-1}{\bf{g}}_{m-1}^{H}/(\cdots)$ and ${\bf{\bar{Q}}}_{m-1}{\bf{\bar{r}}}_{m}$. Compared to the inversion step in [19, 21] to compute the initial ${\bf{Q}}$ by (13c) and (12), the proposed inversion step by (20d) and (12) achieves a speedup of $(\frac{5}{6})/(\frac{1}{2})=1.67$, and requires only $\frac{1}{3}$ of the divisions.

The complexity claimed in [21] for the above-mentioned inversion step to compute the initial ${\bf{Q}}$ should increase by $\frac{5}{6}M^{3}-\frac{1}{2}M^{3}=\frac{1}{3}M^{3}$. Then actually the “fastest known algorithm” before [21] and the recursive algorithm with speed advantage in [21] require the complexities of $\frac{1}{2}M^{2}N+\frac{4}{3}M^{3}$ and $\frac{1}{2}M^{2}N+M^{3}$, respectively, instead of the $\frac{1}{2}M^{2}N+M^{3}$ and $\frac{1}{2}M^{2}N+\frac{2}{3}M^{3}$ claimed in [21]. Coincidentally, the above $\frac{1}{2}M^{2}N+\frac{2}{3}M^{3}$ (claimed in [21]) is exactly the dominant complexity of the proposed recursive algorithm I with speed advantage, and is also that of the proposed recursive algorithm II with both speed advantage and memory saving, since no dominant complexity is required to compute (15), (17), (23) and (24) from Improvements II and VI. On the other hand, the recursive algorithm with memory saving in [21] requires $2M^{2}N+\frac{1}{6}M^{3}$ multiplications instead of the $\frac{5}{2}M^{2}N+\frac{1}{6}M^{3}$ multiplications claimed in [21, Equ. 27], since it adopts Improvement III, where the computation of the initial ${\bf{Q}}$ by (6) costs $\frac{3}{2}M^{2}N$ multiplications [20, near Equ. 12] instead of the $2M^{2}N$ multiplications claimed in [21, Equ. 27].

The dominant complexities of the presented recursive V-BLAST algorithms are compared in Table I, where an item $j$ denotes $j$ multiplications and additions, and $(j,k)$ denotes $j$ multiplications and $k$ additions. It can be seen from Table I that when $M=N$, the actual speedup of the algorithm with speed advantage in [21] over the previous “fastest known algorithm” should be $(\frac{11}{6})/(\frac{3}{2})=1.22$ instead of the 1.3 claimed in [21], while the speedups of either proposed recursive algorithm over the algorithm in [21] with speed advantage and that with memory saving are $(\frac{3}{2})/(\frac{7}{6})\approx 1.3$ and $(\frac{13}{6})/(\frac{7}{6})\approx 1.86$, respectively.
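A quick arithmetic check of these ratios for the $M=N$ case, using the dominant terms of Table I in units of $M^{3}$, is sketched below.

```python
from fractions import Fraction as F

fastest_known   = F(1, 2) + F(4, 3)   # 11/6: "fastest known algorithm" before [21]
speed_advantage = F(1, 2) + 1         # 3/2:  algorithm with speed advantage in [21]
memory_saving   = 2 + F(1, 6)         # 13/6: algorithm with memory saving in [21]
proposed        = F(1, 2) + F(2, 3)   # 7/6:  the proposed two algorithms

print(float(fastest_known / speed_advantage))   # 1.222..., quoted above as 1.22
print(float(speed_advantage / proposed))        # 1.2857..., quoted above as about 1.3
print(float(memory_saving / proposed))          # 1.857..., quoted above as about 1.86
```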

Assuming $N=M$, we carried out numerical experiments to count the average floating-point operations (flops) of the presented recursive V-BLAST algorithms, including the original algorithm [18], the algorithm with memory saving in [21], the “fastest known algorithm” before [21], the algorithm with speed advantage in [21], the proposed algorithm with speed advantage, and the proposed algorithm with both speed advantage and memory saving. All results are shown in Fig. 1. It can be seen that they are consistent with the theoretical flop counts.

Figure 1: Comparison of computational complexities among the original algorithm [18], the algorithm with memory saving in [21], the “fastest known algorithm” before [21], the algorithm with speed advantage in [21], the proposed algorithm with speed advantage, and the proposed algorithm with both speed advantage and memory saving.

With respect to the “fastest known algorithm” before [21], the algorithm with memory saving in [21] (i.e., Algorithm 2) costs more computations to save the memories for storing ${\bf{R}}$, and then only needs to store ${\bf{H}}$ and ${\bf{Q}}$. As a comparison (we only compare the memories for storing matrices, which are at least $2\times 2$), the proposed recursive algorithm II with both speed advantage and memory saving only stores ${\bf{Q}}$ in the recursion phase, and only uses the memories for storing ${\bf{H}}$ in the initialization phase, since it covers ${\bf{H}}$ with ${\bf{R}}$ and ${\bf{Q}}$ successively. Accordingly, it can be concluded that the proposed algorithm II saves about half of the memories with respect to the algorithm with memory saving in [21], and thus requires the least memories among the existing recursive V-BLAST algorithms.

VI Conclusion

Firstly, we presented the original recursive V-BLAST algorithm and its existing Improvements I-IV to reduce the computational complexity. Moreover, we also described the existing recursive V-BLAST algorithm with speed advantage and that with memory saving, which incorporate Improvements I-IV and only Improvements III-IV into the original algorithm, respectively. Then we proposed Improvements V and VI to replace Improvements I and II, respectively. Instead of the lemma for inversion of a partitioned matrix applied in Improvement I, the proposed Improvement V uses another lemma to compute the inverse matrix with less complexity. On the other hand, the formulas adopted in the proposed Improvement V are applied to deduce Improvement VI with the improved interference cancellation scheme, which saves memories without sacrificing speed compared to Improvement II.

In the existing recursive V-BLAST algorithm with speed advantage, the proposed recursive algorithm I with speed advantage replaces Improvement I with Improvement V, while the proposed recursive algorithm II with both speed advantage and memory saving replaces Improvements I and II with Improvements V and VI, respectively. With respect to the existing algorithm with speed advantage, both proposed algorithms achieve a speedup of 1.3, and speed up the matrix inversion step by a factor of 1.67. On the other hand, the proposed algorithm II achieves a speedup of 1.86 and saves about half of the memories compared to the existing recursive V-BLAST algorithm with memory saving, and thus requires the least memories among the existing recursive algorithms.

Throughout this paper, we focus only on OSIC detectors. In future work, we will show that the reordered description based on the equivalent channel matrix for the ISIC detector can be regarded as an extension of the OSIC detector. Accordingly, we will extend the recursive OSIC detector with both speed advantage and memory saving proposed in this paper to a recursive ISIC detector, which can reduce the required computations and memories compared to the recently proposed low-complexity ISIC detector.

Appendix A The Proof of (19c)

Equation (8) in [40] can be written as

$\left[\begin{array}{cc}{\bf{A}}&{\bf{U}}\\ {\bf{V}}&{\bf{D}}\end{array}\right]^{-1}=\left[\begin{array}{cc}{\bf{\tilde{A}}}&{\bf{\tilde{U}}}\\ {\bf{\tilde{V}}}&{\bf{\tilde{D}}}\end{array}\right]$ (39)

with

${\bf{\tilde{A}}}={\bf{A}}^{-1}+({\bf{A}}^{-1}{\bf{U}})({\bf{D}}-{\bf{V}}{\bf{A}}^{-1}{\bf{U}})^{-1}({\bf{V}}{\bf{A}}^{-1})$ (40a)
${\bf{\tilde{U}}}=-({\bf{A}}^{-1}{\bf{U}})({\bf{D}}-{\bf{V}}{\bf{A}}^{-1}{\bf{U}})^{-1}$ (40b)
${\bf{\tilde{V}}}=-({\bf{D}}-{\bf{V}}{\bf{A}}^{-1}{\bf{U}})^{-1}({\bf{V}}{\bf{A}}^{-1})$ (40c)
${\bf{\tilde{D}}}=({\bf{D}}-{\bf{V}}{\bf{A}}^{-1}{\bf{U}})^{-1},$ (40d)

which inverts a partitioned matrix. By comparing (39) with (10) and (12), we can conclude that ${\bf{A}}$, ${\bf{U}}$, ${\bf{V}}$, ${\bf{D}}$, ${\bf{\tilde{A}}}$, ${\bf{\tilde{U}}}$ and ${\bf{\tilde{D}}}$ can be replaced with ${\bf{R}}_{m-1}$, ${\bf{\bar{r}}}_{m}$, ${\bf{\bar{r}}}_{m}^{H}$, $\gamma_{m}$, ${\bf{\bar{Q}}}_{m-1}$, ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$, respectively, and ${\bf{A}}^{-1}$ can be replaced with ${\bf{Q}}_{m-1}={\bf{R}}_{m-1}^{-1}$. Accordingly, we can write (40a), (40b) and (40d) as

${\bf{\bar{Q}}}_{m-1}={\bf{Q}}_{m-1}+({\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m})(\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m})^{-1}({\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}),$ (41)
${\bf{\bar{q}}}_{m}=-({\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m})(\gamma_{m}-{\bf{\bar{r}}}_{m}^{H}{\bf{Q}}_{m-1}{\bf{\bar{r}}}_{m})^{-1}$ (42)

and (19a), respectively. Finally, we substitute (19a) into (42) to obtain (19b), and substitute (19a) and (19b) into (41) to obtain (19c).

Appendix B The Proof of (32) and (33)

It can be seen from (31) that

${\bf{\bar{d}}}_{m}={\bf{Q}}_{m}(1:m-1,:){\bf{z}}_{M}(1:m)-{\bf{\bar{t}}}_{m}$ (43a)
${\bf{d}}_{m}(m)={\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-{\bf{t}}_{m}(m),$ (43b)

and it can be seen from (12) that

${\bf{Q}}_{m}(1:m-1,:)=\left[\begin{array}{cc}{\bf{\bar{Q}}}_{m-1}&{\bf{\bar{q}}}_{m}\end{array}\right]$ (44a)
${\bf{q}}_{m}^{H}=\left[\begin{array}{cc}{\bf{\bar{q}}}_{m}^{H}&\omega_{m}\end{array}\right].$ (44b)

Then substitute (19c) into (44a) to obtain ${\bf{Q}}_{m}(1:m-1,:)=\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}&{\bf{\bar{q}}}_{m}\end{array}\right]$, which is substituted into (43a) to obtain

${\bf{\bar{d}}}_{m}=\left[\begin{array}{cc}{\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}&{\bf{\bar{q}}}_{m}\end{array}\right]{\bf{z}}_{M}(1:m)-{\bf{\bar{t}}}_{m}=\left({\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}\right){\bf{z}}_{M}(1:m-1)+{\bf{z}}_{M}(m){\bf{\bar{q}}}_{m}-{\bf{\bar{t}}}_{m}.$ (45)

On the other hand, substitute (44b) into (43b) to obtain

${\bf{d}}_{m}(m)=\left[\begin{array}{cc}{\bf{\bar{q}}}_{m}^{H}&\omega_{m}\end{array}\right]{\bf{z}}_{M}(1:m)-{\bf{t}}_{m}(m)={\bf{\bar{q}}}_{m}^{H}{\bf{z}}_{M}(1:m-1)+\omega_{m}{\bf{z}}_{M}(m)-{\bf{t}}_{m}(m).$ (48)

To deduce (32), we only need to substitute (43b) into (23) to obtain $\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-\left({\bf{q}}_{m}^{H}{\bf{z}}_{M}(1:m)-{\bf{t}}_{m}(m)\right)$. To deduce (33), we need to substitute (31) with $m$ replaced by $m-1$, (45) and (48) into (24) to obtain

${\bf{Q}}_{m-1}{\bf{z}}_{M}(1:m-1)-{\bf{t}}_{m-1}=\left({\bf{Q}}_{m-1}+\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H}\right){\bf{z}}_{M}(1:m-1)+{\bf{z}}_{M}(m){\bf{\bar{q}}}_{m}-{\bf{\bar{t}}}_{m}-\left(s_{p_{m}}+\left({\bf{\bar{q}}}_{m}^{H}{\bf{z}}_{M}(1:m-1)+\omega_{m}{\bf{z}}_{M}(m)-{\bf{t}}_{m}(m)\right)\right){\bf{\bar{q}}}_{m}/\omega_{m},$

which can be easily simplified into (33).

Algorithm 4 The Proposed Version with Memory Saving and Fewer Permutation Operations
Initialization: Same as the initialization in Algorithm 3;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}={\bf{Q}}(p_{j},p_{j})$ is the $p_{j}^{th}$ diagonal entry of ${\bf{Q}}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$; obtain $\omega_{m}={\bf{Q}}(p_{m},p_{m})$ and ${\bf{\bar{q}}}_{m}={\bf{Q}}({\bf{p}}(1:m-1),p_{m})$ in ${\bf{Q}}$, which form ${\bf{q}}_{m}=[{\bf{\bar{q}}}_{m}^{T}\enspace\omega_{m}]^{T}$;
2: Compute $\hat{s}_{p_{m}}$ by (49), and then quantize $\hat{s}_{p_{m}}$ to $s_{p_{m}}$;
3: Use ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ to update ${\bf{d}}({\bf{p}}(1:m-1))$ by (50), and update ${\bf{Q}}({\bf{p}}(1:m-1),{\bf{p}}(1:m-1))$ by (51);
When $m=1$, only run step 2 to get $s_{p_{1}}$;

Appendix C Several Versions of Proposed Recursive V-BLAST Algorithm II to Further Save Memories and Permutation Operations

In this appendix, we introduce several versions of the proposed recursive V-BLAST algorithm II with both speed advantage and memory saving. Firstly, we avoid the permutation operations for ${\bf{z}}_{M}$, ${\bf{d}}_{m}$ and ${\bf{Q}}_{m}$ in step 1 of the recursion phase in Algorithm 3. Then we give the implementations that store only the upper triangular part of the Hermitian ${\bf{Q}}_{i}$ ($i=1,2,\cdots,M$).

Algorithm 5 The Proposed Version Storing Only the Upper Triangular Part of ${\bf{Q}}$
Initialization: Set ${\bf{p}}=[1,2,\cdots,M]^{T}$ and ${\bf{d}}_{M}={\bf{0}}_{M}$; store ${\bf{\tilde{H}}}={\bf{H}}^{H}$ and compute ${\bf{z}}_{M}={\bf{\tilde{H}}}{\bf{x}}$; compute (37) for $i=1,2,\cdots,M$ to cover the upper triangular part of ${\bf{\tilde{H}}}(:,1:M)$ with that of ${\bf{R}}_{M}$; compute ${\bf{Q}}_{1}$ by (35), and then compute (55), (34b), (34c) and (34g) iteratively for $i=2,3,\cdots,M$, to cover the upper triangular part of ${\bf{R}}_{M}$ (i.e., ${\bf{\tilde{H}}}(:,1:M)$) with that of ${\bf{Q}}_{M}$;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}$ is the $j^{th}$ diagonal entry of ${\bf{Q}}_{m}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$, ${\bf{z}}_{M}$ and ${\bf{d}}_{m}$; exchange ${\bf{Q}}_{m}(1:l_{m}-1,l_{m})$ and ${\bf{Q}}_{m}(1:l_{m}-1,m)$, exchange ${\bf{Q}}_{m}(l_{m}+1:m-1,m)$ and ${\bf{Q}}_{m}(l_{m},l_{m}+1:m-1)^{H}$, exchange ${\bf{Q}}_{m}(l_{m},l_{m})$ and ${\bf{Q}}_{m}(m,m)$, and let ${\bf{Q}}_{m}(l_{m},m)={\bf{Q}}_{m}(l_{m},m)^{*}$;
2: Compute $\hat{s}_{p_{m}}$ by (23), which is quantized to $s_{p_{m}}$;
3: Deflate ${\bf{d}}_{m}$ to ${\bf{d}}_{m-1}$ by (24), and compute only the upper triangular part of ${\bf{Q}}_{m-1}$ by (18);
When $m=1$, only run step 2 to get $s_{p_{1}}$;

Algorithm 6 The Proposed Version Storing the Upper Triangular Part of ${\bf{Q}}$ with Fewer Permutation Operations
Initialization: Same as the initialization in Algorithm 5;
Recursion: For $m=M,M-1,\cdots,2$:
1: Find $l_{m}=\arg\min_{j=1,2,\cdots,m}q_{jj}$, where $q_{jj}={\bf{Q}}(p_{j},p_{j})$ is the $p_{j}^{th}$ diagonal entry of ${\bf{Q}}$; permute entries $l_{m}$ and $m$ in ${\bf{p}}$; in ${\bf{Q}}$, get $\omega_{m}={\bf{Q}}(p_{m},p_{m})$, and get ${\bf{\bar{q}}}_{m}={\bf{Q}}({\bf{p}}(1:m-1),p_{m})$, where ${\bf{Q}}({\bf{p}}(i),p_{m})={\bf{Q}}(p_{m},{\bf{p}}(i))^{*}$ ($i=1,2,\cdots,m-1$) if ${\bf{p}}(i)>p_{m}$;
2: Compute $\hat{s}_{p_{m}}$ by (49) with ${\bf{q}}_{m}$ formed by ${\bf{q}}_{m}=[{\bf{\bar{q}}}_{m}^{T}\enspace\omega_{m}]^{T}$, and then quantize $\hat{s}_{p_{m}}$ to $s_{p_{m}}$;
3: Use ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ to update ${\bf{d}}({\bf{p}}(1:m-1))$ by (50), and compute only the upper-triangular entries of ${\bf{Q}}({\bf{p}}(1:m-1),{\bf{p}}(1:m-1))$ by (51);
When $m=1$, only run step 2 to get $s_{p_{1}}$;

C-A The Version to Avoid Permutation Operations

We can avoid the permutation operations for ${\bf{z}}_{M}$, ${\bf{d}}_{m}$ and ${\bf{Q}}_{m}$ in step 1 of the recursion phase in Algorithm 3. Accordingly, we need to replace (23), (24) and (18) with

$\hat{s}_{p_{m}}={\bf{q}}_{m}^{H}{\bf{z}}_{M}({\bf{p}}(1:m))-{\bf{d}}(p_{m}),$ (49)
${\bf{d}}\left({\bf{p}}(1:m-1)\right)={\bf{d}}\left({\bf{p}}(1:m-1)\right)-\left(s_{p_{m}}+{\bf{d}}(p_{m})\right){\bf{\bar{q}}}_{m}/\omega_{m}$ (50)

and

${\bf{Q}}\left({\bf{p}}(1:m-1),{\bf{p}}(1:m-1)\right)={\bf{Q}}\left({\bf{p}}(1:m-1),{\bf{p}}(1:m-1)\right)-\omega_{m}^{-1}{\bf{\bar{q}}}_{m}{\bf{\bar{q}}}_{m}^{H},$ (51)

respectively. In the above (50) and (51), ${\bf{\bar{q}}}_{m}$ and $\omega_{m}$ can be obtained in ${\bf{Q}}$ by

${\bf{\bar{q}}}_{m}={\bf{Q}}\left({\bf{p}}(1:m-1),p_{m}\right)$ (52)

and

$\omega_{m}={\bf{Q}}\left(p_{m},p_{m}\right),$ (53)

respectively, which form ${\bf{q}}_{m}={\bf{Q}}\left({\bf{p}}(1:m),p_{m}\right)$ for (49) by

${\bf{q}}_{m}=\left[\begin{array}{cc}{\bf{\bar{q}}}_{m}^{T}&\omega_{m}\end{array}\right]^{T}.$ (54)

The proposed version with memory saving and fewer permutation operations is summarized in Algorithm 4.

C-B The Versions to Store Only the Triangular Part of the Hermitian Matrix

To further save memories, we can store only the upper triangular part of the Hermitian ${\bf{Q}}_{i}$, where $i=2,3,\cdots,M$. Now, instead of computing ${\bf{\tilde{q}}}$ by (34a), we obtain the $j^{th}$ entry of ${\bf{\tilde{q}}}$ by

${\bf{\tilde{q}}}(j)=\left[\begin{array}{cc}{\bf{R}}(1:j-1,j)^{H}&{\bf{R}}(j,j:i-1)\end{array}\right]{\bf{R}}(1:i-1,i)$ (55)

for $j=1,2,\cdots,i-1$, since only the upper triangular part of ${\bf{R}}(1:i-1,1:i-1)={\bf{Q}}_{i-1}$ is available in (34a). Then we modify Algorithm 3 and Algorithm 4 into Algorithm 5 and Algorithm 6, respectively.
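A minimal sketch of (55) (function name and 0-based numpy indexing ours): it forms ${\bf{\tilde{q}}}={\bf{Q}}_{i-1}{\bf{\bar{r}}}_{i}$ when only the upper triangular part of the Hermitian block ${\bf{R}}(1:i-1,1:i-1)$ holds valid entries of ${\bf{Q}}_{i-1}$.

```python
import numpy as np

def q_tilde_from_upper(R, i):
    """R: workspace whose upper triangle of R[:i, :i] stores the Hermitian Q block and whose
    column R[:i, i] stores r_bar (0-based block size i).  Returns the product Q @ r_bar."""
    r_bar = R[:i, i]
    q_tilde = np.empty(i, dtype=complex)
    for j in range(i):
        # Row j of the Hermitian block: conjugated column entries above the diagonal
        # for the left part, and the stored upper-triangular row entries for the right part.
        row_j = np.concatenate([R[:j, j].conj(), R[j, j:i]])
        q_tilde[j] = row_j @ r_bar                   # eq. (55)
    return q_tilde
```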

References

  • [1] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Commun., pp. 311-335, Mar. 1998.
  • [2] B. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389-399, Mar. 2003.
  • [3] K.-J. Yang and S.-H. Tsai, “Maximum Likelihood and Soft Input Soft Output MIMO Detection at a Reduced Complexity,” IEEE Trans. Veh. Technol., vol. 67, no. 12, pp. 12389-12393, Dec. 2018.
  • [4] V. Elvira and I. Santamaria, “Multiple Importance Sampling for Symbol Error Rate Estimation of Maximum-Likelihood Detectors in MIMO Channels,” IEEE Trans. Signal Process., vol. 69, pp. 1200-1212, 2021.
  • [5] P. W. Wolniansky, G. J. Foschini, G. D. Golden and R. A. Valenzuela, “V-BLAST: an architecture for realizing very high data rates over the rich-scattering wireless channel,” Proc. Int. Symp. Signals, Syst., Electron. (ISSSE 98), Pisa, Italy, Sept. 1998, pp. 295-300.
  • [6] K. Liu and A. M. Sayeed, “An Iterative Extension of BLAST Decoding Algorithm for Layered Space-Time Signals,” IEEE Transactions on Communications, vol. 53, no. 9, pp. 1595-1595, Sept. 2005.
  • [7] S. W. Kim and K. P. Kim, “Log-likelihood-ratio-based detection ordering in V-BLAST,” IEEE Trans. Commun., vol. 54, no. 2, pp. 302-307, Feb. 2006.
  • [8] S. R. Lee, S. H. Park, S. W. Kim, and I. Lee, “Enhanced detection with new ordering schemes for V-BLAST systems,” IEEE Trans. Wireless Commun., vol. 57, no. 6, pp. 1648-1651, Jun. 2009.
  • [9] Y. Yapıcı, İ. Güvenç and Y. Kakishima, “A MAP-Based Layered Detection Algorithm and Outage Analysis Over MIMO Channels,” IEEE Trans. Wireless Commun., vol. 17, no. 7, pp. 4256-4269, Jul. 2018.
  • [10] W.-J. Choi, K.-W. Cheong, and J. M. Cioffi, “Iterative soft interference cancellation for multiple antenna systems,” Proc. IEEE Wireless Commun. Netw. Conf. Rec., Chicago, IL, USA, Sep. 2000, pp. 304-309.
  • [11] X. Li, H. Huang, G. J. Foschini and R. A. Valenzuela, “Effects of iterative detection and decoding on the performance of BLAST,” Globecom ’00 - IEEE. Global Telecommunications Conference, San Francisco, CA, USA, 2000, pp. 1061-1066 vol.2.
  • [12] J. W. Choi, A. C. Singer, J. Lee and N. I. Cho, “Improved linear soft-input soft-output detection via soft feedback successive interference cancellation,” IEEE Transactions on Communications, vol. 58, no. 3, pp. 986-996, March 2010.
  • [13] P. Xiao, W. Yin, C. Cowan and V. Fusco, “VBLAST Detection Algorithms Utilizing Soft Symbol Estimate and Noncircular CAI,” IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2441-2447, May 2011.
  • [14] Y. -C. Liang, E. Y. Cheu, L. Bai and G. Pan, “On the Relationship Between MMSE-SIC and BI-GDFE Receivers for Large Multiple-Input Multiple-Output Channels,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3627-3637, Aug. 2008.
  • [15] N. Shlezinger, R. Fu and Y. C. Eldar, “DeepSIC: Deep Soft Interference Cancellation for Multiuser MIMO Detection,” IEEE Transactions on Wireless Communications, vol. 20, no. 2, pp. 1349-1362, Feb. 2021.
  • [16] F. Cao, J. Li and J. Yang, “On the Relation Between PDA and MMSE-ISDIC,” IEEE Signal Processing Letters, vol. 14, no. 9, pp. 597-600, Sept. 2007.
  • [17] Xiaodong Wang and H. V. Poor, “Iterative (turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Transactions on Communications, vol. 47, no. 7, pp. 1046-1061, July 1999.
  • [18] J. Benesty, Y. Huang and J. Chen, “A fast recursive algorithm for optimum sequential signal detection in a BLAST system,” IEEE Trans. Signal Process., pp. 1722-1730, Jul. 2003.
  • [19] L. Szczeciński and D. Massicotte, “Low complexity adaptation of MIMO MMSE receivers, implementation aspects,” Proc. Global Commun. Conf. (Globecom’05), St. Louis, MO, USA, Nov., 2005.
  • [20] H. Zhu, Z. Lei, F.P.S. Chin, “An improved recursive algorithm for BLAST,” Signal Process., vol. 87, no. 6, pp. 1408-1411, Jun. 2007.
  • [21] Y. Shang and X. G. Xia, “On fast recursive algorithms for V-BLAST with optimal ordered SIC detection,” IEEE Trans. Wireless Commun., vol. 8, pp. 2860-2865, Jun. 2009.
  • [22] B. Hassibi, “An efficient square-root algorithm for BLAST,” Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP ’00), pp. 737-740, Jun. 2000.
  • [23] H. Zhu, Z. Lei, and F. Chin, “An improved square-root algorithm for BLAST,” IEEE Signal Process. Lett., vol. 11, no. 9, pp. 772-775, 2004.
  • [24] H. Zhu, W. Chen, B. Li, and F. Gao, “An improved square-root algorithm for V-BLAST based on efficient inverse Cholesky factorization,” IEEE Trans. Wireless Commun., vol. 10, no. 1, pp. 43-48, 2011.
  • [25] K. Pham and K. Lee, “Low-Complexity SIC Detection Algorithms for Multiple-Input Multiple-Output Systems,” IEEE Trans. Signal Process., pp. 4625-4633, vol. 63, no. 17, Sept. 2015.
  • [26] C. Studer, S. Fateh and D. Seethaler, “ASIC Implementation of Soft-Input Soft-Output MIMO Detection Using MMSE Parallel Interference Cancellation,” IEEE Journal of Solid-State Circuits, vol. 46, no. 7, pp. 1754-1765, July 2011.
  • [27] S. Park, “Low-Complexity LMMSE-Based Iterative Soft Interference Cancellation for MIMO Systems,” IEEE Trans. on Signal Processing, vol. 70, pp. 1890-1899, 2022.
  • [28] M. J. Grabner, X. Li and S. Fu, “An Adaptive BLAST Successive Interference Cancellation Method for High Data Rate Perfect Space-Time Coded MIMO Systems,” IEEE Trans. Veh. Technol., vol. 69, no. 2, pp. 1542-1553, Feb. 2020.
  • [29] E. Basar, “On Multiple-Input Multiple-Output OFDM with Index Modulation for Next Generation Wireless Networks,” IEEE Trans. Signal Process., vol. 64, no. 15, pp. 3868-3878, Aug. 2016.
  • [30] K. A. Alnajjar, M. El-Tarhuni, “A C-V-BLAST spread spectrum massive MIMO NOMA scheme for 5G systems with channel imperfections,” Physical Commun., vol. 35, 2019.
  • [31] S. Özyurt, E. P. Simon and J. Farah, “NOMA With Zero-Forcing V-BLAST,” IEEE Commun. Letters, vol. 24, no. 9, pp. 2070-2074, Sept. 2020.
  • [32] J. Choi, “On the power allocation for MIMO-NOMA systems with layered transmissions,” IEEE Trans. Wireless Commun., vol. 15, no. 5, pp. 3226-3237, May 2016.
  • [33] P. Singh, B. U. Rani, H. B. Mishra and K. Vasudevan, “Neighbourhood Detection-based ZF-V-BLAST Architecture for MIMO-FBMC-OQAM Systems,” Proc. Global Commun. Conf. (Globecom’18), Abu Dhabi, United Arab Emirates, 2018, pp. 1-6.
  • [34] H.-Y. Lu, L.-P. Chang and H.-S. Hung, “Partial Tree Search Assisted Symbol Detection for Massive MIMO Systems,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 13319-13327, Nov. 2020.
  • [35] S. Adnan, Y. Fu, B. J. Ahmed, M. F. Tahir, F. Banoori, “Modified ordered successive interference cancellation MIMO detection using low complexity constellation search,” AEU - Int. J. of Electron. and Commun., vol. 121, 2020.
  • [36] S. Özyurt and M. Torlak, “Exact Outage Probability Analysis of Dual-Transmit-Antenna V-BLAST With Optimum Ordering,” IEEE Trans. Veh. Technol., vol. 68, no. 1, pp. 977-982, Jan. 2019.
  • [37] S. Yang and L. Hanzo, “Fifty Years of MIMO Detection: The Road to Large-Scale MIMOs,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 1941-1988, Fourthquarter 2015.
  • [38] C. Xu, S. Sugiura, S. X. Ng, P. Zhang, L. Wang and L. Hanzo, “Two Decades of MIMO Design Tradeoffs and Reduced-Complexity MIMO Detection in Near-Capacity Systems,” IEEE Access, vol. 5, pp. 18564-18632, 2017.
  • [39] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing, Prentice Hall, 2000.
  • [40] H. V. Henderson and S. R. Searle, “On Deriving the Inverse of a Sum of Matrices,” SIAM Review, vol. 23, no. 1, Jan. 1981.