
EM and SAGE algorithms for DOA estimation in the presence of unknown uniform noise

Ming-yan Gong and Bin Lyu M. Gong is with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]).Bin Lyu is with the Key Laboratory of Ministry of Education in Broadband Wireless Communication and Sensor Network Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China (e-mail: [email protected]).
Abstract

The expectation-maximization (EM) and space-alternating generalized EM (SAGE) algorithms have been applied to direction of arrival (DOA) estimation in known noise. In this work, the two algorithms are proposed for DOA estimation in unknown uniform noise. Both the deterministic and stochastic signal models are considered. Moreover, a modified EM (MEM) algorithm applicable to this noise assumption is also proposed. The proposed algorithms are then improved to ensure stability when the powers of the sources are unequal. Numerical results illustrate that, after this improvement, the EM algorithm has convergence behavior similar to that of the MEM algorithm, and the SAGE algorithm outperforms both the EM and MEM algorithms for the deterministic signal model. Furthermore, numerical results show that when processing the same samples from the stochastic signal model, the SAGE algorithm for the deterministic signal model requires the fewest iterations.

Index Terms:
Array signal processing, DOA estimation, EM algorithm, Maximum likelihood, Statistical signal processing.

I Introduction

Direction of arrival (DOA) estimation is an important part of array signal processing, and a number of high-resolution estimation techniques have been proposed. In particular, the maximum-likelihood (ML) technique plays a critical role due to its superior performance. However, ML direction finding problems are non-convex and their closed-form solutions are difficult to obtain, which has led to various iterative solution methods [1], [2].

One computationally efficient method for computing ML estimators is the expectation-maximization (EM) algorithm in [3]. [4] and [5] have employed the EM algorithm to solve ML direction finding problems. The EM algorithm consists of two sequential steps at every iteration [4], [5]: an expectation step (E-step), which estimates the complete-data sufficient statistics by finding their conditional expectations, and a maximization step (M-step), which estimates the signal parameters through parallel maximizations. However, the EM algorithm updates all of the parameter estimates simultaneously, which results in slow convergence. To speed up the convergence of the EM algorithm, [6] proposes the space-alternating generalized EM (SAGE) algorithm. [7] and [8] show that the SAGE algorithm does yield faster convergence in DOA estimation.

The EM and SAGE algorithms in ML direction finding are usually derived under known noise [4], [5], [7], [8]. Known noise follows a known statistical model and contains no unknown parameters, which may be unrealistic in certain applications. In fact, many seminal works in ML direction finding consider the so-called unknown uniform noise model [1], [2], [9]–[11]. The covariance matrix of unknown uniform noise can be expressed as σ𝐈N\sigma\mathbf{I}_{N}, where σ\sigma is an unknown common variance and 𝐈N\mathbf{I}_{N} is the NN-order identity matrix. Under this noise assumption, [9] presents a computationally attractive alternating projection algorithm for computing the deterministic ML estimator, [10] investigates the statistical performance of this ML estimator and derives the Cramér-Rao lower bound, and [11] compares some statistical properties of the deterministic and stochastic ML estimators. In addition to uniform noise, nonuniform noise has also attracted increasing attention. Nonuniform noise has an arbitrary diagonal covariance matrix and thus makes the associated ML direction finding problems more complex. To efficiently compute the deterministic and stochastic ML estimators in unknown nonuniform noise, [12] and [13] have presented two alternating optimization algorithms, respectively.

In this work, we develop the EM and SAGE algorithms in ML direction finding for unknown uniform noise. Theoretical analyses indicate that σ\sigma has little effect on the two algorithms for the deterministic signal model. However, when σ\sigma is unknown, the M-step in the EM algorithm for the stochastic signal model can no longer be easily simplified into parallel subproblems. Hence, we divide the M-step into two conditional maximization steps (CM-steps) based on the expectation-CM (ECM) algorithm [14]. In addition, we propose a modified EM (MEM) algorithm applicable to unknown uniform noise. Note that although the EM algorithm in [15] is similar to the MEM algorithm, it is incorrectly derived.

Existing simulations of the EM and SAGE algorithms typically adopt sources of equal power [4], [5], [7], [8], [15]. However, we find that when the powers of the sources are unequal, the DOA estimates obtained by the EM, MEM, and SAGE algorithms tend to coincide with the true DOA of the source with the largest power. Hence, we improve these algorithms. After this improvement, numerical results illustrate that 1) the EM algorithm has convergence similar to that of the MEM algorithm, 2) the SAGE algorithm outperforms the EM and MEM algorithms for the deterministic signal model, i.e., the SAGE algorithm converges faster and is more efficient at avoiding convergence to an unwanted stationary point of the log-likelihood function, and 3) the SAGE algorithm does not always outperform the EM and MEM algorithms for the stochastic signal model.

The EM, MEM, and SAGE algorithms for the deterministic signal model can process samples from the stochastic signal model. Hence, we compare, via simulation, the convergence of the EM and SAGE algorithms for both models. Numerical results illustrate that under the same samples, initial DOA estimates, and stopping criterion, the SAGE algorithm for the deterministic signal model requires the fewest iterations.

The contributions of this work are summarized as follows:

  • We develop the EM and SAGE algorithms in ML direction finding for unknown uniform noise. In particular, we derive the SAGE algorithm for the stochastic signal model, which is not discussed in [6], [7], and [15].

  • We propose an MEM algorithm applicable to the unknown uniform noise assumption.

  • We improve the EM, MEM, and SAGE algorithms to ensure stability when the powers of the sources are unequal.

  • Via simulation, we show that the EM algorithm has convergence similar to that of the MEM algorithm and that the SAGE algorithm outperforms the EM and MEM algorithms for the deterministic signal model. However, the SAGE algorithm does not always outperform the EM and MEM algorithms for the stochastic signal model.

  • Via simulation, we show that when processing the same samples from the stochastic signal model, the SAGE algorithm for the deterministic signal model requires the fewest iterations.

Notations: 𝐚T\mathbf{a}^{T}, 𝐚H\mathbf{a}^{H}, and 𝐚\|\mathbf{a}\| are the transposition, conjugate transposition, and Euclidean norm of a vector 𝐚\mathbf{a}, respectively. 𝟎=[00]T\mathbf{0}=[0~{}\cdots~{}0]^{T} and 𝟏=[11]T\mathbf{1}=[1~{}\cdots~{}1]^{T}. 𝐀1\mathbf{A}^{-1}, |𝐀||\mathbf{A}|, and Tr{𝐀}\mathrm{Tr}\{\mathbf{A}\} are the inversion, determinant, and trace of a square matrix 𝐀\mathbf{A}, respectively. 𝟎N\mathbf{0}_{N} is the NN-order zero matrix. 𝐀𝟎N\mathbf{A}\geq\mathbf{0}_{N} and 𝐀>𝟎N\mathbf{A}>\mathbf{0}_{N} denote that the NN-order square matrix 𝐀\mathbf{A} is positive semi-definite and definite, respectively. 𝔼{}\mathbb{E}\{\cdot\} and 𝔻{}\mathbb{D}\{\cdot\} denote expectation and variance, respectively. ȷ\jmath is the imaginary unit.

II Data Model and Problem Formulation

We consider an array composed of NN isotropic sensors receiving the signals emitted by MM narrowband far-field sources with the same known center wavelength λ\lambda. The array geometry is arbitrary but known in a Cartesian coordinate system, and the nnth sensor is located at a known position 𝐩n=[xnynzn]T\mathbf{p}_{n}=[x_{n}~{}y_{n}~{}z_{n}]^{T} close to the origin. Spherical coordinates are used to describe the DOAs of the MM sources. The elevation and azimuth angles of the mmth source are denoted by φm[0,π]\varphi_{m}\in[0,\pi] and ϕm[0,2π)\phi_{m}\in[0,2\pi), respectively. Thus, its unit directional vector can be expressed as 𝐪m=[sin(φm)cos(ϕm)sin(φm)sin(ϕm)cos(φm)]T\mathbf{q}_{m}=[\sin(\varphi_{m})\cos(\phi_{m})~{}\sin(\varphi_{m})\sin(\phi_{m})~{}\cos(\varphi_{m})]^{T} in the Cartesian coordinate system.

We use the origin as the reference point of the array. Then, for the mmth source, the phase difference between the signals received at the origin and at the nnth sensor can be approximated as ψn,m=2πλ𝐩nT𝐪m\psi_{n,m}=-\frac{2\pi}{\lambda}\mathbf{p}_{n}^{T}\mathbf{q}_{m}. After being down-converted to baseband, the received signal vector of the array can be characterized by [1], [2]

𝐲(t)=m=1M𝐚(θm)sm(t)+𝐰(t),\displaystyle\mathbf{y}(t)={\sum}_{m=1}^{M}\mathbf{a}(\theta_{m})s_{m}(t)+\mathbf{w}(t), (1)

where θm=[φmϕm]TΩ\theta_{m}=[\varphi_{m}~{}\phi_{m}]^{T}\in\Omega, 𝐚(θm)=[eȷψ1,meȷψN,m]T\mathbf{a}(\theta_{m})=[e^{\jmath\psi_{1,m}}~{}\cdots~{}e^{\jmath\psi_{N,m}}]^{T}, sm(t)s_{m}(t) is the mmth source baseband signal received at the origin and its power is PmP_{m}, 𝐰(t)𝒞𝒩(0,σ𝐈N)\mathbf{w}(t)\sim\mathcal{CN}(\textbf{0},\sigma\mathbf{I}_{N}) represents a white Gaussian noise vector with a common noise variance σ\sigma.
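To make the model concrete, the following numpy sketch generates synthetic snapshots according to (1). The half-wavelength uniform linear array, the two azimuths, and the power and noise values are illustrative choices taken from the simulation section; the helper steering_vector is a hypothetical name, not part of any reference implementation.

```python
import numpy as np

def steering_vector(p, theta, lam):
    """a(theta_m) for sensor positions p (N x 3); theta = [elevation, azimuth] in radians."""
    el, az = theta
    q = np.array([np.sin(el) * np.cos(az),
                  np.sin(el) * np.sin(az),
                  np.cos(el)])                    # unit directional vector q_m
    psi = -2.0 * np.pi / lam * (p @ q)            # phase differences psi_{n,m}
    return np.exp(1j * psi)

# Illustrative setup: half-wavelength ULA, N = 10 sensors, T = 20 snapshots,
# two sources at azimuths 20 and 80 degrees (elevation 90 degrees).
lam, N, T = 1.0, 10, 20
p = np.c_[lam / 2 * np.arange(N), np.zeros(N), np.zeros(N)]
doas = [np.deg2rad([90.0, 20.0]), np.deg2rad([90.0, 80.0])]
P = np.array([10 ** (-2 / 10), 10 ** (4 / 10)])   # source powers (-2 dB and 4 dB)
sigma = 10 ** (4 / 10)                            # common noise variance (4 dB), unknown in practice

A = np.column_stack([steering_vector(p, th, lam) for th in doas])          # A(theta)
S = np.sqrt(P / 2)[:, None] * (np.random.randn(2, T) + 1j * np.random.randn(2, T))
W = np.sqrt(sigma / 2) * (np.random.randn(N, T) + 1j * np.random.randn(N, T))
Y = A @ S + W                                     # snapshots y(1), ..., y(T) as columns
```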

EM-type algorithms require the definition of the underlying complete data and their associated log-likelihood functions. According to the EM paradigm in [4] and [5], after sampling we design the samples (“snapshots”) or incomplete data as

𝐲(t)\displaystyle\mathbf{y}(t) =\displaystyle= m=1M[𝐚(θm)sm(t)+𝐰m(t)]\displaystyle{\sum}_{m=1}^{M}\big{[}\mathbf{a}(\theta_{m})s_{m}(t)+\mathbf{w}_{m}(t)\big{]} (2)
=\displaystyle= m=1M𝐱m(t),t=1,2,,T,\displaystyle{\sum}_{m=1}^{M}\mathbf{x}_{m}(t),t=1,2,\dots,T,

where 𝐰m(t)𝒞𝒩(0,αmσ𝐈N)\mathbf{w}_{m}(t)\sim\mathcal{CN}(\textbf{0},\alpha_{m}\sigma\mathbf{I}_{N}) and the 𝐰m(t)\mathbf{w}_{m}(t)’s are mutually uncorrelated, 𝜶=[α1αM]T𝟎\boldsymbol{\alpha}=[\alpha_{1}~{}\cdots~{}\alpha_{M}]^{T}\geq\mathbf{0} and 𝟏T𝜶=1\mathbf{1}^{T}\boldsymbol{\alpha}=1, the 𝐱m(t)\mathbf{x}_{m}(t)’s are the complete data, TT is the number of samples.

Since the joint probability density functions (PDFs) of the incomplete and complete data depend also on the statistical model of the sm(t)s_{m}(t)’s, we consider the deterministic and stochastic signal models separately.

II-A Deterministic Signal Model

The deterministic signal model requires the deterministic, arbitrary, and unknown sm(t)s_{m}(t)’s [4], [5], [9][11], i.e., 𝐱m(t)𝒞𝒩(𝐚(θm)sm(t),αmσ𝐈N)\mathbf{x}_{m}(t)\sim\mathcal{CN}(\mathbf{a}(\theta_{m})s_{m}(t),\alpha_{m}\sigma\mathbf{I}_{N}), 𝐲(t)𝒞𝒩(𝐀(𝜽)𝐬(t),σ𝐈N)\mathbf{y}(t)\sim\mathcal{CN}(\mathbf{A}(\boldsymbol{\theta})\mathbf{s}(t),\sigma\mathbf{I}_{N}) with 𝐀(𝜽)=[𝐚(θ1)𝐚(θM)]\mathbf{A}(\boldsymbol{\theta})=[\mathbf{a}(\theta_{1})~{}\cdots~{}\mathbf{a}(\theta_{M})], 𝜽=[θ1θM]𝛀(𝛀=ΩM)\boldsymbol{\theta}=[\theta_{1}~{}\cdots~{}\theta_{M}]\in\boldsymbol{\Omega}~{}(\boldsymbol{\Omega}=\Omega^{M}), and 𝐬(t)=[s1(t)sM(t)]T\mathbf{s}(t)=[s_{1}(t)~{}\cdots~{}s_{M}(t)]^{T}. Then, the incomplete- and complete-data log-likelihood functions are, respectively, expressed as

(𝚯,σ)=lnp(𝐘;𝜽,𝐒,σ)\displaystyle\mathcal{L}(\boldsymbol{\Theta},\sigma)=\ln p(\mathbf{Y};\boldsymbol{\theta},\mathbf{S},\sigma)
=\displaystyle= TNln(πσ)1σt=1T𝐲(t)𝐀(𝜽)𝐬(t)2,\displaystyle-TN\ln(\pi\sigma)-\frac{1}{\sigma}\sum_{t=1}^{T}\|\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta})\mathbf{s}(t)\|^{2}, (3a)
𝒬(𝐗1,,𝐗M,𝚯,σ)=lnp(𝐗1,,𝐗M;𝜽,𝐒,σ)\displaystyle\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\sigma)=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{S},\sigma)
=\displaystyle= TMNln(πσ)TNm=1Mln(αm)\displaystyle-TMN\ln(\pi\sigma)-TN\sum_{m=1}^{M}\ln(\alpha_{m})- (3b)
1σt=1Tm=1M1αm𝐱m(t)𝐚(θm)sm(t)2,\displaystyle\frac{1}{\sigma}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\|\mathbf{x}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2},

where 𝐒=[𝐬(1)𝐬(T)]\mathbf{S}=[\mathbf{s}(1)~{}\cdots~{}\mathbf{s}(T)], 𝐘=[𝐲(1)𝐲(T)]\mathbf{Y}=[\mathbf{y}(1)~{}\cdots~{}\mathbf{y}(T)], 𝐗m=[𝐱m(1)𝐱m(T)]\mathbf{X}_{m}=[\mathbf{x}_{m}(1)~{}\cdots~{}\mathbf{x}_{m}(T)], and 𝚯=(𝜽,𝐒)\boldsymbol{\Theta}=(\boldsymbol{\theta},\mathbf{S}) denotes the signal parameters while σ\sigma is the only noise parameter. Thus, the ML estimation problem, i.e., max𝜽𝛀,𝐒,σ>0(𝚯,σ)\max_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{S},\sigma>0}\mathcal{L}(\boldsymbol{\Theta},\sigma), can be simplified to

min𝜽𝛀,𝐒,σ>0Nln(σ)+1σTt=1T𝐲(t)𝐀(𝜽)𝐬(t)2.\displaystyle\mathop{\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{S},\sigma>0}}N\ln(\sigma)+\frac{1}{\sigma T}\sum_{t=1}^{T}\big{\|}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta})\mathbf{s}(t)\big{\|}^{2}. (4)

II-B Stochastic Signal Model

In the stochastic signal model, sm(t)𝒞𝒩(0,Pm)s_{m}(t)\sim\mathcal{CN}(0,P_{m}). For simplicity, all of the sm(t)s_{m}(t)’s and 𝐰m(t)\mathbf{w}_{m}(t)’s are assumed to be mutually uncorrelated [4], [5], i.e., 𝐱m(t)𝒞𝒩(0,𝐂m)\mathbf{x}_{m}(t)\sim\mathcal{CN}(\textbf{0},\mathbf{C}_{m}) with 𝐂m=Pm𝐚(θm)𝐚H(θm)+αmσ𝐈N\mathbf{C}_{m}=P_{m}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m})+\alpha_{m}\sigma\mathbf{I}_{N}, and 𝐲(t)𝒞𝒩(𝟎,𝐂y)\mathbf{y}(t)\sim\mathcal{CN}(\mathbf{0},\mathbf{C}_{y}) with 𝐂y=m=1M𝐂m\mathbf{C}_{y}=\sum_{m=1}^{M}\mathbf{C}_{m}. Then, the incomplete- and complete-data log-likelihood functions are, respectively, expressed as

(𝚯,σ)=lnp(𝐘;𝜽,𝐏,σ)\displaystyle\mathcal{L}(\boldsymbol{\Theta},\sigma)=\ln p(\mathbf{Y};\boldsymbol{\theta},\mathbf{P},\sigma)
=\displaystyle= TNln(π)Tln|𝐂y|t=1T𝐲H(t)𝐂y1𝐲(t),\displaystyle-TN\ln(\pi)-T\ln|\mathbf{C}_{y}|-\sum_{t=1}^{T}\mathbf{y}^{H}(t)\mathbf{C}_{y}^{-1}\mathbf{y}(t), (5a)
𝒬(𝐗1,,𝐗M,𝚯,σ)=lnp(𝐗1,,𝐗M;𝜽,𝐏,σ)\displaystyle\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\sigma)=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{P},\sigma)
=\displaystyle= TMNln(π)Tm=1Mln|𝐂m|\displaystyle-TMN\ln(\pi)-T\sum_{m=1}^{M}\ln|\mathbf{C}_{m}|- (5b)
t=1Tm=1M𝐱mH(t)𝐂m1𝐱m(t),\displaystyle\sum_{t=1}^{T}\sum_{m=1}^{M}\mathbf{x}^{H}_{m}(t)\mathbf{C}_{m}^{-1}\mathbf{x}_{m}(t),

where 𝐏=[P1PM]T\mathbf{P}=[P_{1}~{}\cdots~{}P_{M}]^{T} and 𝚯=(𝜽,𝐏)\boldsymbol{\Theta}=(\boldsymbol{\theta},\mathbf{P}). Finally, the ML estimation problem can be simplified to

min𝜽𝛀,𝐏𝟎,σ>0ln|𝐂y|+Tr{𝐂y1𝐑^y},\displaystyle\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{P}\geq\mathbf{0},\sigma>0}\ln|\mathbf{C}_{y}|+\mathrm{Tr}\big{\{}\mathbf{C}_{y}^{-1}\hat{\mathbf{R}}_{y}\big{\}}, (6)

where 𝐑^y=1Tt=1T𝐲(t)𝐲H(t)𝟎N\hat{\mathbf{R}}_{y}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{y}(t)\mathbf{y}^{H}(t)\geq\mathbf{0}_{N} is the sample covariance matrix.
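As a quick sanity check, the concentrated stochastic objective in (6) can be evaluated directly from the samples. The sketch below is a minimal numpy version, assuming the steering matrix has already been built for a candidate DOA vector; the function name is illustrative.

```python
import numpy as np

def stochastic_ml_objective(Y, A, P, sigma):
    """Evaluate ln|C_y| + Tr{C_y^{-1} R_hat_y} of (6) for a candidate (theta via A, P, sigma)."""
    N, T = Y.shape
    R_y = (Y @ Y.conj().T) / T                       # sample covariance R_hat_y
    C_y = (A * P) @ A.conj().T + sigma * np.eye(N)   # sum_m P_m a_m a_m^H + sigma I_N
    _, logdet = np.linalg.slogdet(C_y)
    return np.real(logdet + np.trace(np.linalg.solve(C_y, R_y)))
```

A direct ML solver would minimize this objective jointly over the DOAs, powers, and noise variance; the EM-type algorithms derived below avoid that joint minimization.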

III EM Algorithm

In this section, we derive the EM algorithm for solving problems (4) and (6). For convenience, we define the following notations:

  • 1)

    ()(k)(\cdot)^{(k)} denotes an iterative value obtained during the kkth iteration and ()(0)(\cdot)^{(0)} denotes an initial value.

  • 2)

    𝐑^m=1Tt=1T𝐱m(t)𝐱mH(t)𝟎N\hat{\mathbf{R}}_{m}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}_{m}(t)\mathbf{x}^{H}_{m}(t)\geq\mathbf{0}_{N}.

  • 3)

    𝚷m=1N𝐚(θm)𝐚H(θm)\boldsymbol{\Pi}_{m}=\frac{1}{N}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m}).

  • 4)

    em(k)=Tr{𝚷m(k)𝐑^m(k)}0e^{(k)}_{m}=\mathrm{Tr}\big{\{}\boldsymbol{\Pi}^{(k)}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\geq 0.

  • 5)

    dm(k)=Tr{(𝐈N𝚷m(k))𝐑^m(k)}=Tr{𝐑^m(k)}em(k)0d^{(k)}_{m}=\mathrm{Tr}\big{\{}\big{(}\mathbf{I}_{N}-\boldsymbol{\Pi}^{(k)}_{m}\big{)}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}=\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}-e^{(k)}_{m}\geq 0.

III-A Deterministic Signal Model

At the kkth E-step, the conditional expectation of (3b) is computed by

𝔼{𝒬(𝐗1,,𝐗M,𝚯,σ)|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{E}\left\{\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\sigma)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= T{MNln(πσ)+Nm=1Mln(αm)+1σ[c(k)+\displaystyle-T\Big{\{}MN\ln(\pi\sigma)+N\sum_{m=1}^{M}\ln(\alpha_{m})+\frac{1}{\sigma}\big{[}c^{(k)}+ (7)
1Tt=1Tm=1M1αm𝐱m(k)(t)𝐚(θm)sm(t)2]},\displaystyle\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\|\mathbf{x}^{(k)}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}\big{]}\Big{\}},

where the conditional PDF of 𝐱m(t)\mathbf{x}_{m}(t) can be derived from [17] and

𝐱m(k)(t)=\displaystyle\mathbf{x}^{(k)}_{m}(t)= 𝔼{𝐱m(t)|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{E}\left\{\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= αm[𝐲(t)𝐀(𝜽(k1))𝐬(k1)(t)]\displaystyle\alpha_{m}\big{[}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta}^{(k-1)})\mathbf{s}^{(k-1)}(t)\big{]} (8a)
+𝐚(θm(k1))sm(k1)(t),m,t,\displaystyle+\mathbf{a}(\theta^{(k-1)}_{m})s^{(k-1)}_{m}(t),\forall m,t,
c(k)\displaystyle c^{(k)} =\displaystyle= 1Tt=1Tm=1M1αmTr{𝔻{𝐱m(t)|𝐘;𝚯(k1),σ(k1)}}\displaystyle\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\mathrm{Tr}\left\{\mathbb{D}\big{\{}\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\big{\}}\right\} (8b)
=\displaystyle= 1Tt=1Tm=1M1αmTr{αm(1αm)σ(k1)𝐈N}\displaystyle\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\mathrm{Tr}\left\{\alpha_{m}(1-\alpha_{m})\sigma^{(k-1)}\mathbf{I}_{N}\right\}
=\displaystyle= N(M1)σ(k1).\displaystyle N(M-1)\sigma^{(k-1)}.

At the kkth M-step, based on (7) the EM algorithm updates the estimates of 𝚯\boldsymbol{\Theta} and σ\sigma by

min𝜽𝛀,𝐒,σ>0\displaystyle\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{S},\sigma>0} MNln(σ)+1σ[c(k)+\displaystyle MN\ln(\sigma)+\frac{1}{\sigma}\big{[}c^{(k)}+ (9)
1Tt=1Tm=1M1αm𝐱m(k)(t)𝐚(θm)sm(t)2],\displaystyle\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\|\mathbf{x}^{(k)}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}\big{]},

which can be solved easily in a separable fashion. Then, we have the parameter estimates:

θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m,\displaystyle\theta^{(k)}_{m}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (10a)
sm(k)(t)=1N𝐚H(θm(k))𝐱m(k)(t),m,t,\displaystyle s^{(k)}_{m}(t)=\frac{1}{N}\mathbf{a}^{H}(\theta^{(k)}_{m})\mathbf{x}^{(k)}_{m}(t),\forall m,t, (10b)
σ(k)=(11M)σ(k1)+1MNm=1Mdm(k)/αm,\displaystyle\sigma^{(k)}=(1-\frac{1}{M})\sigma^{(k-1)}+\frac{1}{MN}\sum_{m=1}^{M}d^{(k)}_{m}/\alpha_{m}, (10c)

where 𝐑^m(k)=1Tt=1T𝐱m(k)(t)[𝐱m(k)(t)]H\hat{\mathbf{R}}^{(k)}_{m}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}^{(k)}_{m}(t)[\mathbf{x}^{(k)}_{m}(t)]^{H} and σ(k)>0\sigma^{(k)}>0 if σ(k1)>0\sigma^{(k-1)}>0.

Remark 1.

Eliminating σ\sigma, problem (4) can be simplified to

min𝜽𝛀,𝐒1Tt=1T𝐲(t)𝐀(𝜽)𝐬(t)2,\displaystyle\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{S}}\frac{1}{T}\sum_{t=1}^{T}\big{\|}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta})\mathbf{s}(t)\big{\|}^{2}, (11)

which indicates that for the deterministic signal model, the ML estimator of 𝚯\boldsymbol{\Theta} is unrelated to σ\sigma. Additionally, note that the 𝐱m(k)(t)\mathbf{x}^{(k)}_{m}(t)’s in (8a), the θm(k)\theta^{(k)}_{m}’s in (10a), and the sm(k)(t)s^{(k)}_{m}(t)’s in (10b) are unrelated to σ(k1)\sigma^{(k-1)} and respectively identical to those in the EM algorithm under known σ\sigma [5], [8]. We can therefore conclude that the EM algorithm under unknown σ\sigma is equivalent to that under known σ\sigma when σ\sigma is not estimated. Accordingly, (10c) can be omitted when the algorithm does not consider the nuisance parameter σ\sigma.
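To summarize the two steps, the numpy sketch below assembles one complete EM iteration for the deterministic model: the E-step (8a) followed by the M-step (10a)–(10c). For brevity, the continuous DOA search in (10a) is replaced here by a coarse grid search over azimuth (a ULA with known elevation is assumed), and all function and variable names are illustrative.

```python
import numpy as np

def em_step_det(Y, A_fun, phi_grid, phi_prev, S_prev, sigma_prev, alpha):
    """One EM iteration (deterministic model): E-step (8a) and M-step (10a)-(10c).
    A_fun(phi) returns the N x 1 steering vector for azimuth phi (radians)."""
    N, T = Y.shape
    M = len(phi_prev)
    A_prev = np.column_stack([A_fun(ph) for ph in phi_prev])
    resid = Y - A_prev @ S_prev                          # y(t) - A(theta) s(t)

    phi_new = np.empty(M)
    S_new = np.empty((M, T), dtype=complex)
    d = np.empty(M)
    for m in range(M):
        X_m = alpha[m] * resid + np.outer(A_prev[:, m], S_prev[m])   # E-step (8a)
        R_m = (X_m @ X_m.conj().T) / T                               # complete-data covariance
        # M-step (10a): maximize Tr{Pi_m R_m} = a(phi)^H R_m a(phi) / N over the grid
        vals = [np.real(A_fun(ph).conj() @ R_m @ A_fun(ph)) / N for ph in phi_grid]
        phi_new[m] = phi_grid[int(np.argmax(vals))]
        a_m = A_fun(phi_new[m])
        S_new[m] = a_m.conj() @ X_m / N                              # (10b)
        e_m = np.real(a_m.conj() @ R_m @ a_m) / N
        d[m] = np.real(np.trace(R_m)) - e_m                          # d_m^(k)
    sigma_new = (1 - 1 / M) * sigma_prev + np.sum(d / alpha) / (M * N)   # (10c)
    return phi_new, S_new, sigma_new
```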

III-B Stochastic Signal Model

Note that 𝐑^m\hat{\mathbf{R}}_{m} is a complete-data sufficient statistic for 𝐂m\mathbf{C}_{m} [5], [16], which contains θm\theta_{m}, PmP_{m}, and σ\sigma. At the kkth E-step, the EM algorithm estimates the 𝐑^m\hat{\mathbf{R}}_{m}’s by finding their conditional expectations [3]:

𝐑^m(k)=\displaystyle\hat{\mathbf{R}}^{(k)}_{m}= 𝔼{𝐑^m|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{E}\left\{\hat{\mathbf{R}}_{m}|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= 𝐂m(k1)[𝐂y(k1)]1𝐑^y[𝐂y(k1)]1𝐂m(k1)+\displaystyle\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}+ (12)
{𝐂m(k1)𝐂m(k1)[𝐂y(k1)]1𝐂m(k1)},m,\displaystyle\left\{\mathbf{C}^{(k-1)}_{m}-\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\right\},\forall m,

where

𝔼{𝐱m(t)|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{E}\left\{\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= 𝐂m(k1)[𝐂y(k1)]1𝐲(t),m,t,\displaystyle\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{y}(t),\forall m,t,
𝔻{𝐱m(t)|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{D}\left\{\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= 𝐂m(k1)𝐂m(k1)[𝐂y(k1)]1𝐂m(k1)𝟎N,m,t.\displaystyle\mathbf{C}^{(k-1)}_{m}-\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\geq\mathbf{0}_{N},\forall m,t.

At the kkth M-step, based on (5b) and (12) the EM algorithm updates the estimates of 𝚯\boldsymbol{\Theta} and σ\sigma by

min𝜽𝛀,𝐫𝟎,σ>0\displaystyle\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{r}\geq\mathbf{0},\sigma>0} MNln(σ)+m=1M[ln|𝐃m|\displaystyle MN\ln(\sigma)+\sum_{m=1}^{M}\big{[}\ln|\mathbf{D}_{m}| (13)
+1σTr{𝐃m1𝐑^m(k)}],\displaystyle+\frac{1}{\sigma}\mathrm{Tr}\big{\{}\mathbf{D}_{m}^{-1}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\big{]},

where 𝐂m=σ𝐃m\mathbf{C}_{m}=\sigma\mathbf{D}_{m} with 𝐃m=rm𝐚(θm)𝐚H(θm)+αm𝐈N\mathbf{D}_{m}=r_{m}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m})+\alpha_{m}\mathbf{I}_{N} and 𝐫=[r1rM]T\mathbf{r}=[r_{1}~{}\cdots~{}r_{M}]^{T} with rm=Pmσr_{m}=\frac{P_{m}}{\sigma} being the signal to noise ratio of the mmth source.

Because of σ\sigma, it appears very difficult to simplify problem (13) into parallel subproblems. To keep the M-step simple, we divide it into two CM-steps based on the ECM algorithm [14].

  • First CM-step: estimate 𝚯\boldsymbol{\Theta} but hold σ=σ(k1)\sigma=\sigma^{(k-1)} fixed. Then, (13) can be simplified to the MM parallel subproblems

    minθmΩ,rm0ln|𝐃m|+1σ(k1)Tr{𝐃m1𝐑^m(k)},m,\displaystyle\min_{\theta_{m}\in\Omega,r_{m}\geq 0}\ln|\mathbf{D}_{m}|+\frac{1}{\sigma^{(k-1)}}\mathrm{Tr}\big{\{}\mathbf{D}_{m}^{-1}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (14)

    which can be solved using the method in [5]. Accordingly, the estimate of 𝚯\boldsymbol{\Theta} is updated by

    θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m,\displaystyle\theta^{(k)}_{m}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (15a)
    Pm(k)=\displaystyle P^{(k)}_{m}= rm(k)σ(k1)\displaystyle r^{(k)}_{m}\sigma^{(k-1)}
    =\displaystyle= max{1N(em(k)αmσ(k1)),0},m,\displaystyle\max\left\{\frac{1}{N}\big{(}e^{(k)}_{m}-\alpha_{m}\sigma^{(k-1)}\big{)},0\right\},\forall m, (15b)

    where θm(k)\theta^{(k)}_{m} is indeterminate if Pm(k)=0P^{(k)}_{m}=0.

  • Second CM-step: estimate σ\sigma but hold 𝚯=𝚯(k)\boldsymbol{\Theta}=\boldsymbol{\Theta}^{(k)} fixed. Then, (13) can be simplified to

    minσ>0MNln(σ)+1σm=1MTr{[𝐃m(k)]1𝐑^m(k)}.\displaystyle\min_{\sigma>0}MN\ln(\sigma)+\frac{1}{\sigma}\sum_{m=1}^{M}\mathrm{Tr}\left\{[\mathbf{D}^{(k)}_{m}]^{-1}\hat{\mathbf{R}}^{(k)}_{m}\right\}. (16)

    Thus, the estimate of σ\sigma is updated by

    σ(k)\displaystyle\sigma^{(k)} =\displaystyle= 1MNm=1MTr{[𝐃m(k)]1𝐑^m(k)}\displaystyle\frac{1}{MN}\sum_{m=1}^{M}\mathrm{Tr}\left\{[\mathbf{D}^{(k)}_{m}]^{-1}\hat{\mathbf{R}}^{(k)}_{m}\right\} (17)
    =\displaystyle= 1Nσ(k1)+1MNm=1Mdm(k)/αm,\displaystyle\frac{1}{N}\sigma^{(k-1)}+\frac{1}{MN}\sum_{m=1}^{M}d^{(k)}_{m}/\alpha_{m},

    where σ(k)>0\sigma^{(k)}>0 if σ(k1)>0\sigma^{(k-1)}>0.

Remark 2.

Although the M-step (13) of the EM algorithm is divided into the two sequential CM-steps (14) and (16), the monotonicity of the algorithm still holds [14], i.e.,

(𝚯(k),σ(k))(𝚯(k1),σ(k1)).\displaystyle\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)})\geq\mathcal{L}(\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}).

Strictly speaking, this algorithm is only a generalized EM (GEM) algorithm [3], but for convenience we still call it the EM algorithm.
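The corresponding iteration for the stochastic model, i.e., the E-step (12) followed by the two CM-steps (15) and (17), can be sketched in numpy as follows; the grid search again stands in for the continuous maximization, and the names are illustrative.

```python
import numpy as np

def ecm_step_sto(R_y, A_fun, phi_grid, phi_prev, P_prev, sigma_prev, alpha):
    """One EM (ECM) iteration, stochastic model: E-step (12) and CM-steps (15), (17)."""
    N = R_y.shape[0]
    M = len(phi_prev)
    A_prev = np.column_stack([A_fun(ph) for ph in phi_prev])
    C_m_prev = [P_prev[m] * np.outer(A_prev[:, m], A_prev[:, m].conj())
                + alpha[m] * sigma_prev * np.eye(N) for m in range(M)]
    C_y_prev = sum(C_m_prev)

    phi_new, P_new, tr_terms = np.empty(M), np.empty(M), np.empty(M)
    for m in range(M):
        G = np.linalg.solve(C_y_prev, C_m_prev[m])                   # C_y^{-1} C_m
        R_m = G.conj().T @ R_y @ G + (C_m_prev[m] - C_m_prev[m] @ G)  # E-step (12)
        vals = [np.real(A_fun(ph).conj() @ R_m @ A_fun(ph)) / N for ph in phi_grid]
        phi_new[m] = phi_grid[int(np.argmax(vals))]                  # first CM-step (15a)
        a_m = A_fun(phi_new[m])
        e_m = np.real(a_m.conj() @ R_m @ a_m) / N
        P_new[m] = max((e_m - alpha[m] * sigma_prev) / N, 0.0)       # (15b)
        D_m = (P_new[m] / sigma_prev) * np.outer(a_m, a_m.conj()) + alpha[m] * np.eye(N)
        tr_terms[m] = np.real(np.trace(np.linalg.solve(D_m, R_m)))
    sigma_new = np.sum(tr_terms) / (M * N)                           # second CM-step (17)
    return phi_new, P_new, sigma_new
```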

IV MEM Algorithm

In the previous section, 𝜶\boldsymbol{\alpha} is fixed and known. In this section, we regard 𝜶\boldsymbol{\alpha} as a parameter to be estimated and thus propose an MEM algorithm applicable to the unknown uniform noise assumption.

In order to estimate σ\sigma and 𝜶\boldsymbol{\alpha} in the MEM algorithm easily, we introduce σm=αmσ\sigma_{m}=\alpha_{m}\sigma as the common noise variance of the mmth source, i.e.,

𝐰m(t)𝒞𝒩(𝟎,σm𝐈N).\displaystyle\mathbf{w}_{m}(t)\sim\mathcal{CN}(\mathbf{0},\sigma_{m}\mathbf{I}_{N}). (18)

Then, σ\sigma and 𝜶\boldsymbol{\alpha} can be estimated by σ=m=1Mσm\sigma=\sum_{m=1}^{M}\sigma_{m} and αm=σm/σ\alpha_{m}=\sigma_{m}/\sigma after estimating 𝝈=[σ1σM]T\boldsymbol{\sigma}=[\sigma_{1}~{}\cdots~{}\sigma_{M}]^{T} at the M-step of the MEM algorithm. In addition, we first assume 𝝈(k)>𝟎\boldsymbol{\sigma}^{(k)}>\mathbf{0} for k0k\geq 0.

IV-A Deterministic Signal Model

According to (18), (3b) is rewritten as

𝒬(𝐗1,,𝐗M,𝚯,𝝈)=lnp(𝐗1,,𝐗M;𝜽,𝐒,𝝈)\displaystyle\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\boldsymbol{\sigma})=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{S},\boldsymbol{\sigma})
=\displaystyle= TMNln(π)TNm=1Mln(σm)\displaystyle-TMN\ln(\pi)-TN\sum_{m=1}^{M}\ln(\sigma_{m})- (19)
t=1Tm=1M1σm𝐱m(t)𝐚(θm)sm(t)2.\displaystyle\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\sigma_{m}}\|\mathbf{x}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}.

With the help of (19), at the kkth E-step the MEM algorithm computes the conditional expectation:

𝔼{𝒬(𝐗1,,𝐗M,𝚯,𝝈)|𝐘;𝚯(k1),𝝈(k1)}\displaystyle\mathbb{E}\left\{\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\boldsymbol{\sigma})|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\boldsymbol{\sigma}^{(k-1)}\right\}
=\displaystyle= TMNln(π)Tm=1M{Nln(σm)+1σm[cm(k)\displaystyle-TMN\ln(\pi)-T\sum_{m=1}^{M}\Big{\{}N\ln(\sigma_{m})+\frac{1}{\sigma_{m}}\big{[}c^{(k)}_{m} (20)
+1Tt=1T𝐱m(k)(t)𝐚(θm)sm(t)2]},\displaystyle+\frac{1}{T}\sum_{t=1}^{T}\|\mathbf{x}^{(k)}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}\big{]}\Big{\}},

where

𝐱m(k)(t)=\displaystyle\mathbf{x}^{(k)}_{m}(t)= 𝔼{𝐱m(t)|𝐘;𝚯(k1),𝝈(k1)}\displaystyle\mathbb{E}\left\{\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\boldsymbol{\sigma}^{(k-1)}\right\}
=\displaystyle= σm(k1)/σ(k1)[𝐲(t)𝐀(𝜽(k1))𝐬(k1)(t)]\displaystyle\sigma^{(k-1)}_{m}/\sigma^{(k-1)}\big{[}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta}^{(k-1)})\mathbf{s}^{(k-1)}(t)\big{]} (21a)
+𝐚(θm(k1))sm(k1)(t),m,t,\displaystyle+\mathbf{a}(\theta^{(k-1)}_{m})s^{(k-1)}_{m}(t),\forall m,t,
cm(k)\displaystyle c^{(k)}_{m} =\displaystyle= 1Tt=1TTr{𝔻{𝐱m(t)|𝐘;𝚯(k1),𝝈(k1)}}\displaystyle\frac{1}{T}\sum_{t=1}^{T}\mathrm{Tr}\left\{\mathbb{D}\big{\{}\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\boldsymbol{\sigma}^{(k-1)}\big{\}}\right\} (21b)
=\displaystyle= 1Tt=1TTr{σm(k1)(1σm(k1)/σ(k1))𝐈N}\displaystyle\frac{1}{T}\sum_{t=1}^{T}\mathrm{Tr}\left\{\sigma^{(k-1)}_{m}\big{(}1-\sigma^{(k-1)}_{m}/\sigma^{(k-1)}\big{)}\mathbf{I}_{N}\right\}
=\displaystyle= Nσm(k1)(1σm(k1)/σ(k1)),m.\displaystyle N\sigma^{(k-1)}_{m}\big{(}1-\sigma^{(k-1)}_{m}/\sigma^{(k-1)}\big{)},\forall m.

At the kkth M-step, based on (20) the MEM algorithm updates the estimates of 𝚯\boldsymbol{\Theta} and 𝝈\boldsymbol{\sigma} by the following MM parallel subproblems:

minθmΩ,𝐬m,σm>0\displaystyle\min_{\theta_{m}\in\Omega,\mathbf{s}_{m},\sigma_{m}>0} Nln(σm)+1σm[cm(k)+\displaystyle N\ln(\sigma_{m})+\frac{1}{\sigma_{m}}\big{[}c^{(k)}_{m}+ (22)
1Tt=1T𝐱m(k)(t)𝐚(θm)sm(t)2],m,\displaystyle\frac{1}{T}\sum_{t=1}^{T}\|\mathbf{x}^{(k)}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}\big{]},\forall m,

where 𝐬m=[sm(1)sm(T)]\mathbf{s}_{m}=[s_{m}(1)~{}\cdots~{}s_{m}(T)]. Then, we can obtain the parameter estimates:

θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m,\displaystyle\theta^{(k)}_{m}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (23a)
sm(k)(t)=1N𝐚H(θm(k))𝐱m(k)(t),m,t,\displaystyle s^{(k)}_{m}(t)=\frac{1}{N}\mathbf{a}^{H}(\theta^{(k)}_{m})\mathbf{x}^{(k)}_{m}(t),\forall m,t, (23b)
σm(k)=σm(k1)(1σm(k1)/σ(k1))+dm(k)/N,m,\displaystyle\sigma^{(k)}_{m}=\sigma^{(k-1)}_{m}\big{(}1-\sigma^{(k-1)}_{m}/\sigma^{(k-1)}\big{)}+d^{(k)}_{m}/N,\forall m, (23c)

where 𝐑^m(k)=1Tt=1T𝐱m(k)(t)[𝐱m(k)(t)]H\hat{\mathbf{R}}^{(k)}_{m}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}^{(k)}_{m}(t)[\mathbf{x}^{(k)}_{m}(t)]^{H}. Note that when 𝝈(k1)>𝟎\boldsymbol{\sigma}^{(k-1)}>\mathbf{0}, based on (23c) we have 1σm(k1)/σ(k1)>0,m1-\sigma^{(k-1)}_{m}/\sigma^{(k-1)}>0,\forall m, and σm(k)>0,m\sigma^{(k)}_{m}>0,\forall m, i.e., 𝝈(k)>𝟎\boldsymbol{\sigma}^{(k)}>\mathbf{0}.

Remark 3.

The 𝐱m(k)(t)\mathbf{x}^{(k)}_{m}(t)’s in (21a) are related to 𝛔(k1)\boldsymbol{\sigma}^{(k-1)}, so the θm(k)\theta^{(k)}_{m}’s and sm(k)(t)s^{(k)}_{m}(t)’s in (23) are related to 𝛔(k1)\boldsymbol{\sigma}^{(k-1)}, i.e., iterative knowledge associated with σ\sigma is utilized to estimate 𝚯\boldsymbol{\Theta} in the MEM algorithm for the deterministic signal model.
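Compared with the EM iteration sketched in Section III-A, the deterministic-model MEM iteration differs only in the E-step weights and in the per-source noise update, as the minimal sketch below illustrates (the grid search and names are again illustrative).

```python
import numpy as np

def mem_step_det(Y, A_fun, phi_grid, phi_prev, S_prev, sig_vec_prev):
    """One MEM iteration (deterministic model): the fixed weights alpha_m of the EM
    E-step are replaced by sigma_m^(k-1)/sigma^(k-1) in (21a), and each sigma_m is
    updated separately via (23c)."""
    N, T = Y.shape
    M = len(phi_prev)
    sigma_prev = np.sum(sig_vec_prev)                    # sigma = sum_m sigma_m
    A_prev = np.column_stack([A_fun(ph) for ph in phi_prev])
    resid = Y - A_prev @ S_prev

    phi_new = np.empty(M)
    S_new = np.empty((M, T), dtype=complex)
    sig_vec_new = np.empty(M)
    for m in range(M):
        w_m = sig_vec_prev[m] / sigma_prev
        X_m = w_m * resid + np.outer(A_prev[:, m], S_prev[m])        # E-step (21a)
        R_m = (X_m @ X_m.conj().T) / T
        vals = [np.real(A_fun(ph).conj() @ R_m @ A_fun(ph)) / N for ph in phi_grid]
        phi_new[m] = phi_grid[int(np.argmax(vals))]                  # (23a)
        a_m = A_fun(phi_new[m])
        S_new[m] = a_m.conj() @ X_m / N                              # (23b)
        e_m = np.real(a_m.conj() @ R_m @ a_m) / N
        d_m = np.real(np.trace(R_m)) - e_m
        sig_vec_new[m] = sig_vec_prev[m] * (1 - w_m) + d_m / N       # (23c)
    return phi_new, S_new, sig_vec_new
```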

IV-B Stochastic Signal Model

According to (18), 𝐱m(t)𝒞𝒩(𝟎,𝐂m)\mathbf{x}_{m}(t)\sim\mathcal{CN}(\mathbf{0},\mathbf{C}_{m}) with 𝐂m=Pm𝐚(θm)𝐚H(θm)+σm𝐈N\mathbf{C}_{m}=P_{m}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m})+\sigma_{m}\mathbf{I}_{N} and (5b) is rewritten as

𝒬(𝐗1,,𝐗M,𝚯,𝝈)=lnp(𝐗1,,𝐗M;𝜽,𝐏,𝝈)\displaystyle\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\boldsymbol{\sigma})=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{P},\boldsymbol{\sigma})
=\displaystyle= Tm=1M[Nln(π)+ln|𝐂m|+Tr{𝐂m1𝐑^m}].\displaystyle-T\sum_{m=1}^{M}\big{[}N\ln(\pi)+\ln|\mathbf{C}_{m}|+\mathrm{Tr}\big{\{}\mathbf{C}_{m}^{-1}\hat{\mathbf{R}}_{m}\big{\}}\big{]}. (24)

With the help of (24), at the kkth E-step the MEM algorithm estimates the 𝐑^m\hat{\mathbf{R}}_{m}’s by finding their conditional expectations:

𝐑^m(k)=\displaystyle\hat{\mathbf{R}}^{(k)}_{m}= 𝔼{𝐑^m|𝐘;𝚯(k1),𝝈(k1)}\displaystyle\mathbb{E}\left\{\hat{\mathbf{R}}_{m}|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\boldsymbol{\sigma}^{(k-1)}\right\}
=\displaystyle= 𝐂m(k1)[𝐂y(k1)]1𝐑^y[𝐂y(k1)]1𝐂m(k1)+\displaystyle\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}+ (25)
{𝐂m(k1)𝐂m(k1)[𝐂y(k1)]1𝐂m(k1)},m,\displaystyle\left\{\mathbf{C}^{(k-1)}_{m}-\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\right\},\forall m,

where

𝐂m(k1)[𝐂y(k1)]1𝐑^y[𝐂y(k1)]1𝐂m(k1)𝟎N,m,\displaystyle\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\geq\mathbf{0}_{N},\forall m,
𝐂m(k1)𝐂m(k1)[𝐂y(k1)]1𝐂m(k1)𝟎N,m.\displaystyle\mathbf{C}^{(k-1)}_{m}-\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\geq\mathbf{0}_{N},\forall m.

At the kkth M-step, based on (24) and (25) the MEM algorithm estimates 𝚯\boldsymbol{\Theta} and 𝝈\boldsymbol{\sigma} by the following MM parallel subproblems (unlike the M-step (13) in the EM algorithm for the stochastic signal model, the M-step in the MEM algorithm for the stochastic signal model can easily be simplified into parallel subproblems):

minθmΩ,rm0,σm>0\displaystyle\min_{\theta_{m}\in\Omega,r_{m}\geq 0,\sigma_{m}>0} Nln(σm)+ln|𝐃m|+\displaystyle N\ln(\sigma_{m})+\ln|\mathbf{D}_{m}|+ (26)
1σmTr{𝐃m1𝐑^m(k)},m,\displaystyle\frac{1}{\sigma_{m}}\mathrm{Tr}\big{\{}\mathbf{D}_{m}^{-1}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m,

where 𝐂m=σm𝐃m\mathbf{C}_{m}=\sigma_{m}\mathbf{D}_{m} with 𝐃m=rm𝐚(θm)𝐚H(θm)+𝐈N\mathbf{D}_{m}=r_{m}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m})+\mathbf{I}_{N} and rm=Pm/σmr_{m}=P_{m}/\sigma_{m}. Since |𝐃m|=Nrm+1|\mathbf{D}_{m}|=Nr_{m}+1 and 𝐃m1=𝐈NNrmNrm+1𝚷m\mathbf{D}_{m}^{-1}=\mathbf{I}_{N}-\frac{Nr_{m}}{Nr_{m}+1}\boldsymbol{\Pi}_{m}, subproblems (26) can be rewritten as

minθmΩ,rm0,σm>0Nln(σm)+ln(Nrm+1)+\displaystyle\min_{\theta_{m}\in\Omega,r_{m}\geq 0,\sigma_{m}>0}N\ln(\sigma_{m})+\ln(Nr_{m}+1)+
1σmTr{𝐑^m(k)}Nrmσm(Nrm+1)Tr{𝚷m𝐑^m(k)},m.\displaystyle\frac{1}{\sigma_{m}}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}-\frac{Nr_{m}}{\sigma_{m}(Nr_{m}+1)}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m. (27)

We first simplify subproblems (27) by eliminating 𝐫\mathbf{r} [18], [19]. Hence, after estimating 𝜽\boldsymbol{\theta} and 𝝈\boldsymbol{\sigma}, 𝐫\mathbf{r} and 𝐏\mathbf{P} are estimated by

rm(k)=max{1N(em(k)/σm(k)1),0},m,\displaystyle r^{(k)}_{m}=\max\left\{\frac{1}{N}\big{(}e^{(k)}_{m}/\sigma^{(k)}_{m}-1\big{)},0\right\},\forall m,
Pm(k)=rm(k)σm(k)=max{1N(em(k)σm(k)),0},m,\displaystyle P^{(k)}_{m}=r^{(k)}_{m}\sigma^{(k)}_{m}=\max\left\{\frac{1}{N}\big{(}e^{(k)}_{m}-\sigma^{(k)}_{m}\big{)},0\right\},\forall m,

which imply that if rm(k)=0r^{(k)}_{m}=0, θm(k)\theta^{(k)}_{m} will be indeterminate and σm(k)=1NTr{𝐑^m(k)}\sigma^{(k)}_{m}=\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}} by (27). However, to estimate 𝜽\boldsymbol{\theta} and 𝝈\boldsymbol{\sigma}, we must assume 𝐫(k)>𝟎\mathbf{r}^{(k)}>\mathbf{0}. After eliminating 𝐫\mathbf{r}, subproblems (27) are simplified to

minθmΩ,σm>0\displaystyle\min_{\theta_{m}\in\Omega,\sigma_{m}>0} (N1)ln(σm)+ln(Tr{𝚷m𝐑^m(k)})+\displaystyle(N-1)\ln(\sigma_{m})+\ln\left(\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\right)+ (28)
1σmTr{(𝐈N𝚷m)𝐑^m(k)},m.\displaystyle\frac{1}{\sigma_{m}}\mathrm{Tr}\left\{\big{(}\mathbf{I}_{N}-\boldsymbol{\Pi}_{m}\big{)}\hat{\mathbf{R}}^{(k)}_{m}\right\},\forall m.

Next, we simplify subproblems (28) by eliminating 𝝈\boldsymbol{\sigma}. Thus, after estimating 𝜽\boldsymbol{\theta}, 𝝈\boldsymbol{\sigma} is estimated by

σm(k)=dm(k)/(N1),m.\displaystyle\sigma^{(k)}_{m}=d^{(k)}_{m}/(N-1),\forall m.

After eliminating 𝝈\boldsymbol{\sigma}, subproblems (28) are simplified to

minθmΩ\displaystyle\min_{\theta_{m}\in\Omega} (N1)ln(Tr{𝐑^m(k)}Tr{𝚷m𝐑^m(k)})\displaystyle(N-1)\ln\left(\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}-\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\right) (29)
+ln(Tr{𝚷m𝐑^m(k)}),m,\displaystyle+\ln\left(\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\right),\forall m,

where θmδm(k)={θmΩ|Tr{𝚷m𝐑^m(k)}>1NTr{𝐑^m(k)}}\theta_{m}\in\delta^{(k)}_{m}=\big{\{}\theta_{m}\in\Omega|\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\big{\}} due to the fact that when θm(k)δm(k)\theta^{(k)}_{m}\in\delta^{(k)}_{m}, em(k)>1NTr{𝐑^m(k)}e^{(k)}_{m}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}} and

rm(k)\displaystyle r^{(k)}_{m} =\displaystyle= max{1N(em(k)/σm(k)1),0}\displaystyle\max\left\{\frac{1}{N}\big{(}e^{(k)}_{m}/\sigma^{(k)}_{m}-1\big{)},0\right\}
=\displaystyle= max{1N[(N1)em(k)/dm(k)1],0}\displaystyle\max\left\{\frac{1}{N}\big{[}(N-1)e^{(k)}_{m}/d^{(k)}_{m}-1\big{]},0\right\}
=\displaystyle= (em(k)1NTr{𝐑^m(k)})/dm(k)>0,m.\displaystyle\big{(}e^{(k)}_{m}-\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\big{)}/d^{(k)}_{m}>0,\forall m.

Since (N1)ln(Tr{𝐑^m(k)}x)+ln(x)(N-1)\ln\big{(}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}-x\big{)}+\ln(x) is a monotonically decreasing function of xx for x1NTr{𝐑^m(k)}x\geq\frac{1}{N}\mathrm{Tr}\{\hat{\mathbf{R}}^{(k)}_{m}\}, subproblems (29) are equivalent to

maxθmδm(k)Tr{𝚷m𝐑^m(k)},m.\displaystyle\max_{\theta_{m}\in\delta^{(k)}_{m}}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m. (30)

Based on the above analysis, the estimates of 𝚯\boldsymbol{\Theta} and 𝝈\boldsymbol{\sigma} are updated by

θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m,\displaystyle\theta^{(k)}_{m}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (31a)
σm(k)={dm(k)/(N1)em(k)>1NTr{𝐑^m(k)},1NTr{𝐑^m(k)}em(k)1NTr{𝐑^m(k)},m,\displaystyle\sigma^{(k)}_{m}=\left\{\begin{array}[]{ll}{d^{(k)}_{m}/(N-1)}&{e^{(k)}_{m}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}},\\ {\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}}&{e^{(k)}_{m}\leq\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}},\\ \end{array}\right.\forall m, (31d)
Pm(k)=max{1N1(em(k)1NTr{𝐑^m(k)}),0},m.\displaystyle P^{(k)}_{m}=\max\left\{\frac{1}{N-1}\big{(}e^{(k)}_{m}-\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\big{)},0\right\},\forall m. (31e)
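Given the conditional expectation from (25), the per-source M-step (31) is available in closed form. The sketch below is one possible implementation, with a grid search replacing the continuous maximization in (31a); the function name is illustrative.

```python
import numpy as np

def mem_mstep_sto(R_m, A_fun, phi_grid):
    """Closed-form per-source M-step (31) of the MEM algorithm (stochastic model),
    given the conditional expectation R_m from the E-step (25)."""
    N = R_m.shape[0]
    vals = [np.real(A_fun(ph).conj() @ R_m @ A_fun(ph)) / N for ph in phi_grid]
    phi_m = phi_grid[int(np.argmax(vals))]                # (31a)
    a_m = A_fun(phi_m)
    e_m = np.real(a_m.conj() @ R_m @ a_m) / N             # Tr{Pi_m R_m}
    tr_m = np.real(np.trace(R_m))
    if e_m > tr_m / N:
        sigma_m = (tr_m - e_m) / (N - 1)                  # d_m / (N - 1), first case of (31)
        P_m = (e_m - tr_m / N) / (N - 1)                  # (31) power update
    else:
        sigma_m = tr_m / N                                # second case of (31)
        P_m = 0.0
    return phi_m, sigma_m, P_m
```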

Finally, we give the following remark.

Remark 4.

In the MEM algorithm for the stochastic signal model, 𝛔(k)>𝟎\boldsymbol{\sigma}^{(k)}>\mathbf{0} for k1k\geq 1 if 𝛔(0)>𝟎\boldsymbol{\sigma}^{(0)}>\mathbf{0}.

Proof.

We utilize a proof by contradiction. Without loss of generality, assume 𝝈(k)>𝟎\boldsymbol{\sigma}^{(k)}>\mathbf{0} for 0kK10\leq k\leq K-1 (K1)(K\geq 1) and σi(K)=0\sigma^{(K)}_{i}=0. Obviously, we have

𝐂m(K1)=\displaystyle\mathbf{C}^{(K-1)}_{m}= Pm(K1)𝐚(θm(K1))𝐚H(θm(K1))\displaystyle P^{(K-1)}_{m}\mathbf{a}(\theta^{(K-1)}_{m})\mathbf{a}^{H}(\theta^{(K-1)}_{m}) (32)
+σm(K1)𝐈N>𝟎N,m.\displaystyle+\sigma^{(K-1)}_{m}\mathbf{I}_{N}>\mathbf{0}_{N},\forall m.

Based on (31b) and σi(K)=0\sigma^{(K)}_{i}=0, we first consider σi(K)=1NTr{𝐑^i(K)}=0\sigma^{(K)}_{i}=\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(K)}_{i}\big{\}}=0. Then, 𝐑^i(K)=𝟎N\hat{\mathbf{R}}^{(K)}_{i}=\mathbf{0}_{N} since 𝐑^i(K)𝟎N\hat{\mathbf{R}}^{(K)}_{i}\geq\mathbf{0}_{N}, which implies that ei(K)=0e^{(K)}_{i}=0, and by (25) we have

𝐂i(K1)[𝐂y(K1)]1𝐑^y[𝐂y(K1)]1𝐂i(K1)\displaystyle\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i} =\displaystyle= 𝟎N,\displaystyle\mathbf{0}_{N},
𝐂i(K1)𝐂i(K1)[𝐂y(K1)]1𝐂i(K1)\displaystyle\mathbf{C}^{(K-1)}_{i}-\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i} =\displaystyle= 𝟎N,\displaystyle\mathbf{0}_{N},

which indicate 𝐑^y=𝟎N\hat{\mathbf{R}}_{y}=\mathbf{0}_{N} and 𝐂i(K1)=𝐂y(K1)=m=1M𝐂m(K1)\mathbf{C}^{(K-1)}_{i}=\mathbf{C}^{(K-1)}_{y}=\sum_{m=1}^{M}\mathbf{C}^{(K-1)}_{m}. Thus, mi𝐂m(K1)=𝟎N\sum_{m\neq i}\mathbf{C}^{(K-1)}_{m}=\mathbf{0}_{N}, which contradicts mi𝐂m(K1)>𝟎N\sum_{m\neq i}\mathbf{C}^{(K-1)}_{m}>\mathbf{0}_{N} by (32).

Next, we consider σi(K)=di(K)/(N1)=0\sigma^{(K)}_{i}=d^{(K)}_{i}/(N-1)=0, i.e., 𝐑^i(K)=fi(K1)𝚷i\hat{\mathbf{R}}^{(K)}_{i}=f^{(K-1)}_{i}\boldsymbol{\Pi}_{i} with fi(K1)>0f^{(K-1)}_{i}>0, since ei(K)>1NTr{𝐑^i(K)}e^{(K)}_{i}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(K)}_{i}\big{\}}. Then, by (25) we have

𝐂i(K1)[𝐂y(K1)]1𝐑^y[𝐂y(K1)]1𝐂i(K1)\displaystyle\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i} =\displaystyle= hi(K1)𝚷i,\displaystyle h^{(K-1)}_{i}\boldsymbol{\Pi}_{i},
𝐂i(K1)𝐂i(K1)[𝐂y(K1)]1𝐂i(K1)\displaystyle\mathbf{C}^{(K-1)}_{i}-\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i} =\displaystyle= vi(K1)𝚷i,\displaystyle v^{(K-1)}_{i}\boldsymbol{\Pi}_{i},

where hi(K1)0h^{(K-1)}_{i}\geq 0 and vi(K1)>0v^{(K-1)}_{i}>0, resulting in

|𝐂i(K1)𝐂i(K1)[𝐂y(K1)]1𝐂i(K1)|=0,\displaystyle\left|\mathbf{C}^{(K-1)}_{i}-\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i}\right|=0,

and

|𝐂y(K1)𝐂i(K1)|=|mi𝐂m(K1)|=0,\displaystyle\left|\mathbf{C}^{(K-1)}_{y}-\mathbf{C}^{(K-1)}_{i}\right|=\left|{\sum}_{m\neq i}\mathbf{C}^{(K-1)}_{m}\right|=0,

which contradicts mi𝐂m(K1)>𝟎N\sum_{m\neq i}\mathbf{C}^{(K-1)}_{m}>\mathbf{0}_{N} by (32). The proof is completed. ∎

V SAGE Algorithm

In this section, the SAGE algorithm is proposed. The SAGE algorithm is designed to update the DOA estimates in ascending order of source index at each iteration, and one iteration finishes when all the parameter estimates have been updated. Besides, let ()(k,i)(\cdot)^{(k,i)} denote an iterative value obtained while updating the estimate of θi\theta_{i} at the kkth iteration, with ()(k1)=()(k1,M)=()(k,0)(\cdot)^{(k-1)}=(\cdot)^{(k-1,M)}=(\cdot)^{(k,0)}.

The SAGE algorithm updates the parameter estimates by the E- and M-steps of the EM or MEM algorithm. When updating the estimate of θi\theta_{i} at the kkth iteration, the SAGE algorithm first associates all of the noise with the iith source signal component [6], [7]:

αm={1m=i,0mi.\alpha_{m}=\left\{\begin{array}[]{ll}{1}&{m=i},\\ {0}&{m\neq i}.\\ \end{array}\right. (33)

V-A Deterministic Signal Model

According to (33), 𝐱i(t)𝒞𝒩(𝐚(θi)si(t),σ𝐈N)\mathbf{x}_{i}(t)\sim\mathcal{CN}(\mathbf{a}(\theta_{i})s_{i}(t),\sigma\mathbf{I}_{N}), 𝐱m(t)=𝐚(θm)sm(t)\mathbf{x}_{m}(t)=\mathbf{a}(\theta_{m})s_{m}(t) for mim\neq i, and (3b) is rewritten as

𝒬(𝐗i,θi,𝐬i,σ)=lnp(𝐗1,,𝐗M;𝜽,𝐒,σ)\displaystyle\mathcal{Q}(\mathbf{X}_{i},\theta_{i},\mathbf{s}_{i},\sigma)=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{S},\sigma)
=\displaystyle= TNln(πσ)1σt=1T𝐱i(t)𝐚(θi)si(t)2.\displaystyle-TN\ln(\pi\sigma)-\frac{1}{\sigma}\sum_{t=1}^{T}\|\mathbf{x}_{i}(t)-\mathbf{a}(\theta_{i})s_{i}(t)\|^{2}. (34)

At the E-step, the SAGE algorithm computes the conditional expectation of 𝒬(𝐗i,θi,𝐬i,σ)\mathcal{Q}(\mathbf{X}_{i},\theta_{i},\mathbf{s}_{i},\sigma) by

𝔼{𝒬(𝐗i,θi,𝐬i,σ)|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\mathbb{E}\left\{\mathcal{Q}(\mathbf{X}_{i},\theta_{i},\mathbf{s}_{i},\sigma)|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= TNln(πσ)1σt=1T𝐱i(k)(t)𝐚(θi)si(t)2,\displaystyle-TN\ln(\pi\sigma)-\frac{1}{\sigma}\sum_{t=1}^{T}\|\mathbf{x}^{(k)}_{i}(t)-\mathbf{a}(\theta_{i})s_{i}(t)\|^{2}, (35)

where 𝚯=(𝜽,𝐒)\boldsymbol{\Theta}=(\boldsymbol{\theta},\mathbf{S}) and

𝐱i(k)(t)=\displaystyle\mathbf{x}_{i}^{(k)}(t)= 𝐱i(k,i)(t)=𝔼{𝐱i(t)|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\mathbf{x}_{i}^{(k,i)}(t)=\mathbb{E}\left\{\mathbf{x}_{i}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= [𝐲(t)𝐀(𝜽(k,i1))𝐬(k,i1)(t)]\displaystyle\big{[}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta}^{(k,i-1)})\mathbf{s}^{(k,i-1)}(t)\big{]} (36a)
+𝐚(θi(k,i1))si(k,i1)(t),t,\displaystyle+\mathbf{a}(\theta_{i}^{(k,i-1)})s_{i}^{(k,i-1)}(t),\forall t,
𝔻{𝐱i(t)|𝐘;𝚯(k,i1),σ(k,i1)}=𝟎N,t.\displaystyle\mathbb{D}\left\{\mathbf{x}_{i}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}=\mathbf{0}_{N},\forall t. (36b)

At the M-step, based on (35) the SAGE algorithm estimates θi\theta_{i}, 𝐬i\mathbf{s}_{i}, and σ\sigma by

minθiΩ,𝐬i,σ>0Nln(σ)+1σTt=1T𝐱i(k)(t)𝐚(θi)si(t)2.\displaystyle\min_{\theta_{i}\in\Omega,\mathbf{s}_{i},\sigma>0}N\ln(\sigma)+\frac{1}{\sigma T}\sum_{t=1}^{T}\|\mathbf{x}_{i}^{(k)}(t)-\mathbf{a}(\theta_{i})s_{i}(t)\|^{2}. (37)

Thus, the estimates of θi\theta_{i}, 𝐬i\mathbf{s}_{i}, and σ\sigma are updated by

θi(k)=θi(k,i)=argmaxθiΩTr{𝚷i𝐑^i(k)},\displaystyle\theta^{(k)}_{i}=\theta^{(k,i)}_{i}=\arg\max_{\theta_{i}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{i}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (38a)
si(k)(t)=si(k,i)(t)=1N𝐚H(θi(k))𝐱i(k)(t),t,\displaystyle s^{(k)}_{i}(t)=s^{(k,i)}_{i}(t)=\frac{1}{N}\mathbf{a}^{H}(\theta^{(k)}_{i})\mathbf{x}^{(k)}_{i}(t),\forall t, (38b)
σ(k,i)=di(k)/N,\displaystyle\sigma^{(k,i)}=d^{(k)}_{i}/N, (38c)

where 𝐑^i(k)=1Tt=1T𝐱i(k)(t)[𝐱i(k)(t)]H\hat{\mathbf{R}}^{(k)}_{i}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}^{(k)}_{i}(t)[\mathbf{x}^{(k)}_{i}(t)]^{H}.

Finally, the other parameter estimates are not updated and their iterative values are

θm(k,i)=θm(k,i1),mi,\displaystyle\theta^{(k,i)}_{m}=\theta^{(k,i-1)}_{m},\forall m\neq i, (39a)
sm(k,i)(t)=sm(k,i1)(t),mi,t.\displaystyle s^{(k,i)}_{m}(t)=s^{(k,i-1)}_{m}(t),\forall m\neq i,t. (39b)

At each iteration of the SAGE algorithm, the E- and M-steps are repeated MM times, so that 𝐗1,,𝐗M\mathbf{X}_{1},\dots,\mathbf{X}_{M} and all elements in 𝚯\boldsymbol{\Theta} are estimated once while σ\sigma is estimated MM times.

Remark 5.

Notice that the 𝐱i(k)(t)\mathbf{x}^{(k)}_{i}(t)’s in (36a), θi(k)\theta^{(k)}_{i} in (38a), and the si(k)(t)s^{(k)}_{i}(t)’s in (38b) are unrelated to σ(k,i1)\sigma^{(k,i-1)} and respectively identical to those in the SAGE algorithm under known σ\sigma [8], i.e., the SAGE algorithm under unknown σ\sigma is equivalent to that under known σ\sigma when σ\sigma is not estimated. Accordingly, (38c) can be omitted when the algorithm does not consider the nuisance parameter σ\sigma.
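The following sketch carries out one SAGE iteration for the deterministic model, cycling through the sources and applying (36a) and (38a)–(38c); as before, the grid search and the function names are illustrative.

```python
import numpy as np

def sage_sweep_det(Y, A_fun, phi_grid, phi, S):
    """One SAGE iteration (deterministic model).  phi and S are updated source by
    source, so each update already uses the refreshed estimates of earlier sources."""
    N, T = Y.shape
    M = len(phi)
    phi, S = phi.copy(), S.copy()
    for i in range(M):
        A_cur = np.column_stack([A_fun(ph) for ph in phi])
        X_i = Y - A_cur @ S + np.outer(A_cur[:, i], S[i])            # E-step (36a)
        R_i = (X_i @ X_i.conj().T) / T
        vals = [np.real(A_fun(ph).conj() @ R_i @ A_fun(ph)) / N for ph in phi_grid]
        phi[i] = phi_grid[int(np.argmax(vals))]                      # (38a)
        a_i = A_fun(phi[i])
        S[i] = a_i.conj() @ X_i / N                                  # (38b)
        e_i = np.real(a_i.conj() @ R_i @ a_i) / N
        sigma = (np.real(np.trace(R_i)) - e_i) / N                   # (38c): d_i / N
    return phi, S, sigma
```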

V-B Stochastic Signal Model

According to (33), 𝐱i(t)𝒞𝒩(0,𝐂i)\mathbf{x}_{i}(t)\sim\mathcal{CN}(\textbf{0},\mathbf{C}_{i}) with 𝐂i=Pi𝐚(θi)𝐚H(θi)+σ𝐈N\mathbf{C}_{i}=P_{i}\mathbf{a}(\theta_{i})\mathbf{a}^{H}(\theta_{i})+\sigma\mathbf{I}_{N} and for mim\neq i, the statistical model of 𝐱m(t)\mathbf{x}_{m}(t) depends only on sm(t)s_{m}(t) due to 𝐱m(t)=𝐚(θm)sm(t)\mathbf{x}_{m}(t)=\mathbf{a}(\theta_{m})s_{m}(t). Thus, (5b) is rewritten as

𝒬(𝐒i,𝐗i,θi,𝐏,σ)=lnp(𝐗1,,𝐗M;𝜽,𝐏,σ)\displaystyle\mathcal{Q}(\mathbf{S}_{i},\mathbf{X}_{i},\theta_{i},\mathbf{P},\sigma)=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{P},\sigma)
=\displaystyle= T(M1)ln(π)Tmi[ln(Pm)+P^m/Pm]\displaystyle-T(M-1)\ln(\pi)-T\sum_{m\neq i}\big{[}\ln(P_{m})+\hat{P}_{m}/P_{m}\big{]} (40)
TNln(π)T[ln|𝐂i|+Tr{𝐂i1𝐑^i}],\displaystyle-TN\ln(\pi)-T\big{[}\ln|\mathbf{C}_{i}|+\mathrm{Tr}\big{\{}\mathbf{C}_{i}^{-1}\hat{\mathbf{R}}_{i}\big{\}}\big{]},

where |a||a| denotes the modulus of a complex number aa, P^m\hat{P}_{m} is a sufficient statistic for PmP_{m} [16],

𝐒i\displaystyle\mathbf{S}_{i} =\displaystyle= [𝐬i2T𝐬i1T𝐬i+1T𝐬i+2T]T,\displaystyle[\cdots~{}\mathbf{s}_{i-2}^{T}~{}\mathbf{s}_{i-1}^{T}~{}\mathbf{s}_{i+1}^{T}~{}\mathbf{s}_{i+2}^{T}~{}\cdots]^{T},
P^m\displaystyle\hat{P}_{m} =\displaystyle= 1Tt=1T|sm(t)|2.\displaystyle\frac{1}{T}{\sum}_{t=1}^{T}|s_{m}(t)|^{2}.

At the E-step, the SAGE algorithm computes the conditional expectation of 𝒬(𝐒i,𝐗i,θi,𝐏,σ)\mathcal{Q}(\mathbf{S}_{i},\mathbf{X}_{i},\theta_{i},\mathbf{P},\sigma) by

𝔼{𝒬(𝐒i,𝐗i,θi,𝐏,σ)|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\mathbb{E}\left\{\mathcal{Q}(\mathbf{S}_{i},\mathbf{X}_{i},\theta_{i},\mathbf{P},\sigma)|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= T(M1)ln(π)TNln(π)\displaystyle-T(M-1)\ln(\pi)-TN\ln(\pi) (41)
Tmi[ln(Pm)+P^m(k,i)/Pm]\displaystyle-T\sum_{m\neq i}\big{[}\ln(P_{m})+\hat{P}^{(k,i)}_{m}/P_{m}\big{]}
T[ln|𝐂i|+Tr{𝐂i1𝐑^i(k)}],\displaystyle-T\big{[}\ln|\mathbf{C}_{i}|+\mathrm{Tr}\big{\{}\mathbf{C}_{i}^{-1}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}\big{]},

where 𝚯=(𝜽,𝐏)\boldsymbol{\Theta}=(\boldsymbol{\theta},\mathbf{P}),

P^m(k,i)=\displaystyle\hat{P}^{(k,i)}_{m}= 𝔼{P^m|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\mathbb{E}\left\{\hat{P}_{m}|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= Pm(k,i1)[1𝐚H(θm(k,i1))𝐛m(k,i1)]\displaystyle P^{(k,i-1)}_{m}\big{[}1-\mathbf{a}^{H}(\theta^{(k,i-1)}_{m})\mathbf{b}^{(k,i-1)}_{m}\big{]} (42)
+[𝐛m(k,i1)]H𝐑^y𝐛m(k,i1)0\displaystyle+[\mathbf{b}^{(k,i-1)}_{m}]^{H}\hat{\mathbf{R}}_{y}\mathbf{b}^{(k,i-1)}_{m}\geq 0

with 𝐛m(k,i1)=[𝐂y(k,i1)]1𝐚(θm(k,i1))Pm(k,i1)\mathbf{b}^{(k,i-1)}_{m}=[\mathbf{C}^{(k,i-1)}_{y}]^{-1}\mathbf{a}(\theta^{(k,i-1)}_{m})P^{(k,i-1)}_{m}, and

𝐑^i(k)=\displaystyle\hat{\mathbf{R}}^{(k)}_{i}= 𝐑^i(k,i)=𝔼{𝐑^i|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\hat{\mathbf{R}}^{(k,i)}_{i}=\mathbb{E}\left\{\hat{\mathbf{R}}_{i}|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= 𝐂i(k,i1)[𝐂y(k,i1)]1𝐑^y[𝐂y(k,i1)]1𝐂i(k,i1)+\displaystyle\mathbf{C}^{(k,i-1)}_{i}[\mathbf{C}^{(k,i-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(k,i-1)}_{y}]^{-1}\mathbf{C}^{(k,i-1)}_{i}+ (43)
{𝐂i(k,i1)𝐂i(k,i1)[𝐂y(k,i1)]1𝐂i(k,i1)}.\displaystyle\left\{\mathbf{C}^{(k,i-1)}_{i}-\mathbf{C}^{(k,i-1)}_{i}[\mathbf{C}^{(k,i-1)}_{y}]^{-1}\mathbf{C}^{(k,i-1)}_{i}\right\}.

At the M-step, based on (41) the SAGE algorithm updates the estimates of θi\theta_{i}, 𝐏\mathbf{P}, and σ\sigma by

minθiΩ,𝐏𝟎,σ>0\displaystyle\min_{\theta_{i}\in\Omega,\mathbf{P}\geq\mathbf{0},\sigma>0} mi[ln(Pm)+P^m(k,i)/Pm]\displaystyle\sum_{m\neq i}\big{[}\ln(P_{m})+\hat{P}^{(k,i)}_{m}/P_{m}\big{]} (44)
+[ln|𝐂i|+Tr{𝐂i1𝐑^i(k)}],\displaystyle+\big{[}\ln|\mathbf{C}_{i}|+\mathrm{Tr}\big{\{}\mathbf{C}_{i}^{-1}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}\big{]},

which directly yields

Pm(k,i)=P^m(k,i),mi,\displaystyle P^{(k,i)}_{m}=\hat{P}^{(k,i)}_{m},\forall m\neq i, (45)

while the estimates of θi\theta_{i}, PiP_{i}, and σ\sigma are updated by

minθiΩ,ri0,σ>0Nln(σ)+ln|𝐃i|+1σTr{𝐃i1𝐑^i(k)},\displaystyle\min_{\theta_{i}\in\Omega,r_{i}\geq 0,\sigma>0}N\ln(\sigma)+\ln|\mathbf{D}_{i}|+\frac{1}{\sigma}\mathrm{Tr}\big{\{}\mathbf{D}_{i}^{-1}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (46)

where 𝐂i=σ𝐃i\mathbf{C}_{i}=\sigma\mathbf{D}_{i} with 𝐃i=ri𝐚(θi)𝐚H(θi)+𝐈N\mathbf{D}_{i}=r_{i}\mathbf{a}(\theta_{i})\mathbf{a}^{H}(\theta_{i})+\mathbf{I}_{N} and the solution can be obtained from the solutions of subproblems (26). Following (31), θi\theta_{i}, PiP_{i}, and σ\sigma can be estimated by

θi(k)=θi(k,i)=argmaxθiΩTr{𝚷i𝐑^i(k)},\displaystyle\theta^{(k)}_{i}=\theta^{(k,i)}_{i}=\arg\max_{\theta_{i}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{i}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (47a)
σ(k,i)={di(k)/(N1)ei(k)>1NTr{𝐑^i(k)},1NTr{𝐑^i(k)}ei(k)1NTr{𝐑^i(k)},\displaystyle\sigma^{(k,i)}=\left\{\begin{array}[]{ll}{d^{(k)}_{i}/(N-1)}&{e^{(k)}_{i}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}},\\ {\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}}&{e^{(k)}_{i}\leq\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}},\\ \end{array}\right. (47d)
Pi(k,i)=max{1N1(ei(k)1NTr{𝐑^i(k)}),0},\displaystyle P^{(k,i)}_{i}=\max\left\{\frac{1}{N-1}\big{(}e^{(k)}_{i}-\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}\big{)},0\right\}, (47e)

where σ(k,i)=0\sigma^{(k,i)}=0 is possible although its probability is very low. For example, if M=2M=2, T=1T=1 (a single snapshot), and P1(1,1)=0P^{(1,1)}_{1}=0 after updating the estimate of θ1\theta_{1}, we will have

𝐂2(1,1)=𝐂y(1,1)=P2(1,1)𝐚(θ2(1,1))𝐚H(θ2(1,1))+σ(1,1)𝐈N>𝟎N\displaystyle\mathbf{C}^{(1,1)}_{2}=\mathbf{C}^{(1,1)}_{y}=P^{(1,1)}_{2}\mathbf{a}(\theta^{(1,1)}_{2})\mathbf{a}^{H}(\theta^{(1,1)}_{2})+\sigma^{(1,1)}\mathbf{I}_{N}>\mathbf{0}_{N}

and 𝐑^2(1)=𝐲(1)𝐲H(1)\hat{\mathbf{R}}^{(1)}_{2}=\mathbf{y}(1)\mathbf{y}^{H}(1) by (43) when updating the estimate of θ2\theta_{2}. Furthermore, if 𝐲(1)=u𝐚(θ¯)\mathbf{y}(1)=u\mathbf{a}(\bar{\theta}) (u0)(u\neq 0), we will obtain θ2(1)=θ¯\theta^{(1)}_{2}=\bar{\theta} by (47a) and σ(1,2)=d2(1)/(N1)=0\sigma^{(1,2)}=d^{(1)}_{2}/(N-1)=0 by (47b).

To avoid σ(k,i)=0\sigma^{(k,i)}=0, we use two CM-steps based on the ECM algorithm to reestimate θi\theta_{i}, PiP_{i}, and σ\sigma by problem (46) if σ(k,i)=0\sigma^{(k,i)}=0 in (47b).

  • First CM-step: estimate θi\theta_{i} and PiP_{i} but hold σ=σ(k,i1)\sigma=\sigma^{(k,i-1)} fixed. Then, problem (46) is simplified to

    minθiΩ,ri0ln|𝐃i|+1σ(k,i1)Tr{𝐃i1𝐑^i(k)},\displaystyle\min_{\theta_{i}\in\Omega,r_{i}\geq 0}\ln|\mathbf{D}_{i}|+\frac{1}{\sigma^{(k,i-1)}}\mathrm{Tr}\big{\{}\mathbf{D}_{i}^{-1}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (48)

    which can be solved by referring to (15). Thus, the estimates of θi\theta_{i} and PiP_{i} are updated by

    θi(k)=θi(k,i)=argmaxθiΩTr{𝚷i𝐑^i(k)},\displaystyle\theta^{(k)}_{i}=\theta^{(k,i)}_{i}=\arg\max_{\theta_{i}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{i}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (49a)
    Pi(k,i)=\displaystyle P^{(k,i)}_{i}= ri(k,i)σ(k,i1)\displaystyle r^{(k,i)}_{i}\sigma^{(k,i-1)}
    =\displaystyle= max{1N(ei(k)σ(k,i1)),0},\displaystyle\max\left\{\frac{1}{N}\big{(}e^{(k)}_{i}-\sigma^{(k,i-1)}\big{)},0\right\}, (49b)

    where θi(k)\theta^{(k)}_{i} is indeterminate if Pi(k,i)=0P^{(k,i)}_{i}=0.

  • Second CM-step: estimate σ\sigma but hold θi=θi(k)\theta_{i}=\theta^{(k)}_{i} and ri=ri(k,i)r_{i}=r^{(k,i)}_{i} fixed. Then, problem (46) is simplified to

    minσ>0Nln(σ)+1σTr{[𝐃i(k)]1𝐑^i(k)},\displaystyle\min_{\sigma>0}N\ln(\sigma)+\frac{1}{\sigma}\mathrm{Tr}\left\{[\mathbf{D}^{(k)}_{i}]^{-1}\hat{\mathbf{R}}^{(k)}_{i}\right\}, (50)

    where 𝐃i(k)=ri(k,i)𝐚(θi(k))𝐚H(θi(k))+𝐈N\mathbf{D}^{(k)}_{i}=r^{(k,i)}_{i}\mathbf{a}(\theta^{(k)}_{i})\mathbf{a}^{H}(\theta^{(k)}_{i})+\mathbf{I}_{N}. Thus, the estimate of σ\sigma is updated by

    σ(k,i)\displaystyle\sigma^{(k,i)} =\displaystyle= 1NTr{[𝐃i(k)]1𝐑^i(k)}\displaystyle\frac{1}{N}\mathrm{Tr}\left\{[\mathbf{D}^{(k)}_{i}]^{-1}\hat{\mathbf{R}}^{(k)}_{i}\right\} (51)
    =\displaystyle= (σ(k,i1)+di(k))/N,\displaystyle\big{(}\sigma^{(k,i-1)}+d^{(k)}_{i}\big{)}/N,

    which implies σ(k,i)>0\sigma^{(k,i)}>0 if σ(k,i1)>0\sigma^{(k,i-1)}>0.

Finally, the remaining DOA estimates are not updated, and their iterative values are

θm(k,i)=θm(k,i1),mi.\displaystyle\theta^{(k,i)}_{m}=\theta^{(k,i-1)}_{m},\forall m\neq i. (52)

The E- and M-steps are repeated MM times at each iteration of the SAGE algorithm, so 𝐑^1,,𝐑^M\hat{\mathbf{R}}_{1},\dots,\hat{\mathbf{R}}_{M} and all elements in 𝜽\boldsymbol{\theta} are estimated once, σ\sigma and all elements in 𝐏\mathbf{P} are estimated MM times, and each element of 𝐏^=[P^1P^M]T\hat{\mathbf{P}}=[\hat{P}_{1}~{}\cdots~{}\hat{P}_{M}]^{T} is estimated M1M-1 times (or once when M=2M=2).
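The update of a single source i within one SAGE iteration for the stochastic model can be sketched as follows, following (42), (43), (45), and (47); the ECM safeguard (48)–(51) for the rare event of a zero noise estimate is omitted for brevity, and the names are illustrative.

```python
import numpy as np

def sage_update_sto(R_y, A_fun, phi_grid, i, phi, P, sigma):
    """SAGE update of source i (stochastic model): E-step (42)-(43), M-step (45), (47)."""
    N = R_y.shape[0]
    M = len(phi)
    phi, P = phi.copy(), P.copy()
    A_cur = np.column_stack([A_fun(ph) for ph in phi])
    C_sig = [P[m] * np.outer(A_cur[:, m], A_cur[:, m].conj()) for m in range(M)]
    C_i = C_sig[i] + sigma * np.eye(N)              # all noise assigned to source i, (33)
    C_y = sum(C_sig) + sigma * np.eye(N)

    # E-step (42) and M-step (45): sufficient statistics / updates for the other powers
    for m in range(M):
        if m == i:
            continue
        b_m = np.linalg.solve(C_y, A_cur[:, m]) * P[m]
        P[m] = np.real(P[m] * (1 - A_cur[:, m].conj() @ b_m) + b_m.conj() @ R_y @ b_m)

    # E-step (43): conditional expectation of R_i
    G = np.linalg.solve(C_y, C_i)
    R_i = G.conj().T @ R_y @ G + (C_i - C_i @ G)

    # M-step (47)
    vals = [np.real(A_fun(ph).conj() @ R_i @ A_fun(ph)) / N for ph in phi_grid]
    phi[i] = phi_grid[int(np.argmax(vals))]
    a_i = A_fun(phi[i])
    e_i = np.real(a_i.conj() @ R_i @ a_i) / N
    tr_i = np.real(np.trace(R_i))
    if e_i > tr_i / N:
        sigma = (tr_i - e_i) / (N - 1)
        P[i] = (e_i - tr_i / N) / (N - 1)
    else:
        sigma = tr_i / N
        P[i] = 0.0
    return phi, P, sigma
```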

VI Properties of the Proposed EM, MEM, and SAGE Algorithms

VI-A Convergence Point

It is well known that under known σ\sigma, the EM and SAGE algorithms satisfy standard regularity conditions [4], [6], [20] and converge to stationary points (possibly different ones) of (𝚯)\mathcal{L}(\boldsymbol{\Theta}), which is (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma) with σ\sigma fixed, implying that (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma) is a well-behaved objective function. Thus, the proposed algorithms also satisfy the regularity conditions and converge to stationary points (possibly different ones) of (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma).

Of course, the convergence points of the proposed algorithms depend on their initial points. Given a poor initial point, the proposed algorithms may never converge to the maximum point of (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma). To generate an appropriate initial point, the effective initialization procedure in [9] can be adopted using the deterministic signal model.

VI-B Complexity and Stability

Note that the computational complexities of the EM, MEM, and SAGE algorithms are dominated by searching the θm(k)\theta_{m}^{(k)}’s:

θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m.\displaystyle\theta_{m}^{(k)}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}_{m}^{(k)}\big{\}},\forall m. (53)

Hence, if we adopt a brute-force search for θm(k)\theta_{m}^{(k)}, i.e., evaluate the objective function Tr{𝚷m𝐑^m(k)}\mathrm{Tr}\{\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}_{m}^{(k)}\} on a coarse grid to locate a grid point close to its maximum point, take this grid point as the initial point of a gradient algorithm, and then apply the gradient algorithm to find the maximum point as θm(k)\theta_{m}^{(k)} [4], these algorithms will have almost the same computational complexity at each iteration [7].

However, when the powers of the sources are unequal, we have found via simulation that the DOA estimates of multiple sources, updated by (53), tend to coincide with the true DOA of the source with the largest power, and the algorithms may become unstable.

To address this issue, we can choose θm(k1)\theta_{m}^{(k-1)} as the initial point of a gradient algorithm and then apply this gradient algorithm to find a local maximum point of Tr{𝚷m𝐑^m(k)}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}_{m}^{(k)}\big{\}} as θm(k)\theta_{m}^{(k)}; for example, Algorithm 2 in the next section has produced excellent numerical results with this choice. Under this choice, we still have

Tr{𝚷m(k)𝐑^m(k)}Tr{𝚷m(k1)𝐑^m(k)},m,\displaystyle\mathrm{Tr}\big{\{}\boldsymbol{\Pi}^{(k)}_{m}\hat{\mathbf{R}}_{m}^{(k)}\big{\}}\geq\mathrm{Tr}\big{\{}\boldsymbol{\Pi}^{(k-1)}_{m}\hat{\mathbf{R}}_{m}^{(k)}\big{\}},\forall m, (54)

which actually guarantee the monotonicity of the algorithms but only meet the requirement of GEM algorithms [3]. For convenience, we do not change the names of the algorithms under this choice.

VII Numerical Results

In this section, numerical results are provided to illustrate the convergence performance of the proposed algorithms. To ensure that a two-dimensional scatter plot can reflect the DOAs of the MM sources, the array is assumed to be a uniform linear array with 𝐩n=[λ2(n1)00]T\mathbf{p}_{n}=[\frac{\lambda}{2}(n-1)~{}0~{}0]^{T}, N=10N=10, and T=20T=20; then M=2M=2 and φ1=φ2=90°\varphi_{1}=\varphi_{2}=90\degree are known while ϕ1\phi_{1} and ϕ2\phi_{2} are to be estimated. SAGE (in this section, the EM, MEM, and SAGE algorithms are simply written as EM, MEM, and SAGE, respectively) for the stochastic signal model is given in Algorithm 1, and the other algorithms in this section can be obtained by referring to Algorithm 1. For comparison, all tolerances are set to ϵ=0.001\epsilon=0.001, and 𝐒\mathbf{S} in the deterministic signal model is also generated from the independent random numbers sm(t)𝒞𝒩(0,Pm)s_{m}(t)\sim\mathcal{CN}(0,P_{m}).

Algorithm 1 SAGE for the Stochastic Signal Model
1:  Initialize the parameter set ξ(0)={𝜽(0),𝐏(0),σ(0)}\xi^{(0)}=\{\boldsymbol{\theta}^{(0)},\mathbf{P}^{(0)},\sigma^{(0)}\} with 𝜽=[ϕ1ϕ2]T\boldsymbol{\theta}=[\phi_{1}~{}\phi_{2}]^{T}, k=1k=1, and i=1i=1.
2:  while iMi\leq M do
3:     Find 𝐑^i(k)\hat{\mathbf{R}}^{(k)}_{i} and P^m(k,i)\hat{P}^{(k,i)}_{m} for mim\neq i at the E-step and obtain ξ(k,i)\xi^{(k,i)} at the M-step, then i=i+1i=i+1.
4:  end while
5:  while 𝜽(k)𝜽(k1)>ϵ(degree)\|\boldsymbol{\theta}^{(k)}-\boldsymbol{\theta}^{(k-1)}\|>\epsilon~{}(\mathrm{degree}) do
6:     k=k+1k=k+1 and i=1i=1.
7:     while iMi\leq M do
8:        Find 𝐑^i(k)\hat{\mathbf{R}}^{(k)}_{i} and P^m(k,i)\hat{P}^{(k,i)}_{m} for mim\neq i at the E-step and obtain ξ(k,i)\xi^{(k,i)} at the M-step, then i=i+1i=i+1.
9:     end while
10:  end while
11:  Output 𝜽(k)\boldsymbol{\theta}^{(k)}.
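
The following Python sketch mirrors only the control flow of Algorithm 1, namely the source-wise sweep and the stopping criterion on 𝜽\boldsymbol{\theta}; the E-step and M-step bodies are hypothetical placeholders rather than the update formulas derived earlier, so the sketch illustrates the iteration structure, not the estimator itself.

```python
import numpy as np

# Control-flow skeleton of Algorithm 1: one SAGE cycle updates the hidden-data
# space of one source at a time, and the outer loop stops once theta changes by
# less than epsilon. The e_step / m_step bodies are placeholders, not the
# paper's update formulas; they only make the skeleton runnable.

M, eps = 2, 1e-3

def e_step(i, xi):
    """Placeholder E-step: would compute R_hat_i^(k) and P_hat_m^(k,i) for m != i."""
    return None

def m_step(i, xi, estep_out):
    """Placeholder M-step: merely nudges theta_i toward a fixed stand-in target."""
    target = np.array([20.0, 80.0])
    xi = dict(xi)
    theta = xi["theta"].copy()
    theta[i] = 0.5 * (theta[i] + target[i])
    xi["theta"] = theta
    return xi

xi = {"theta": np.array([24.0, 84.0]), "P": np.ones(M), "sigma": 1.0}   # xi^(0)
theta_prev = np.full(M, np.inf)                                          # force a first sweep
while np.linalg.norm(xi["theta"] - theta_prev) > eps:
    theta_prev = xi["theta"].copy()
    for i in range(M):                  # update source i only (steps 2-4 / 7-9)
        xi = m_step(i, xi, e_step(i, xi))
print(xi["theta"])
```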

Furthermore, 𝜶=[0.50.5]T\boldsymbol{\alpha}=[0.5~{}0.5]^{T} is used in the EM algorithm, and the gradient ascent method with backtracking line search [21], given in Algorithm 2, is adopted to search for the ϕm(k)\phi^{(k)}_{m}’s satisfying (54). The simulation parameters in Algorithm 2 are set to ρ=0.1\rho=0.1, η=0.3\eta=0.3, and γ=0.5\gamma=0.5.

Algorithm 2 Gradient Ascent Based Angle Estimation
1:  Define g(ϕm)=N×Tr{𝚷m𝐑^m(k)}g(\phi_{m})=N\times\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}, initialize ϕm=ϕm(k1)(0,π)(radian)\phi_{m}=\phi_{m}^{(k-1)}\in(0,\pi)~{}(\mathrm{radian}), ρ(0,1)\rho\in(0,1), η(0,0.5)\eta\in(0,0.5), and γ(0,1)\gamma\in(0,1).
2:  while |g(ϕm)|>ϵ|g^{\prime}(\phi_{m})|>\epsilon do
3:     t={ρ×πϕmg(ϕm),g(ϕm)>0,ρ×ϕmg(ϕm),g(ϕm)<0.t=\left\{\begin{array}[]{ll}\rho\times\frac{\pi-\phi_{m}}{g^{\prime}(\phi_{m})},&g^{\prime}(\phi_{m})>0,\\ \rho\times\frac{-\phi_{m}}{g^{\prime}(\phi_{m})},&g^{\prime}(\phi_{m})<0.\end{array}\right.
4:     while g(ϕm+tg(ϕm))<g(ϕm)+ηt|g(ϕm)|2g\big{(}\phi_{m}+tg^{\prime}(\phi_{m})\big{)}<g(\phi_{m})+\eta t|g^{\prime}(\phi_{m})|^{2} do
5:        t=γtt=\gamma t.
6:     end while
7:     ϕm=ϕm+tg(ϕm)\phi_{m}=\phi_{m}+tg^{\prime}(\phi_{m}).
8:  end while
9:  ϕm(k)=ϕm\phi_{m}^{(k)}=\phi_{m}.
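
A self-contained Python sketch of Algorithm 2 is given below, using the parameters ρ=0.1, η=0.3, and γ=0.5 mentioned above. It assumes 𝚷m\boldsymbol{\Pi}_{m} projects onto the ULA steering vector, so that g(ϕm) equals the quadratic form a(ϕm)ᴴ𝐑^m(k)a(ϕm), and the analytic derivative g′ is replaced by a central difference for brevity; both are illustrative simplifications, not the paper's exact expressions.

```python
import numpy as np

# Sketch of Algorithm 2 with rho = 0.1, eta = 0.3, gamma = 0.5, assuming
# g(phi) = N * Tr{Pi_m R_hat_m} = a(phi)^H R_hat_m a(phi) for a half-wavelength
# ULA; the analytic derivative is replaced by a central difference.

N = 10

def steering(phi):
    """Half-wavelength ULA steering vector; phi in radians."""
    return np.exp(1j * np.pi * np.arange(N) * np.cos(phi))

def make_g(R_hat_m):
    def g(phi):
        a = steering(phi)
        return np.real(a.conj() @ R_hat_m @ a)
    return g

def grad_ascent_backtracking(g, phi0, eps=1e-3, rho=0.1, eta=0.3, gamma=0.5):
    """Gradient ascent on (0, pi) with a domain-aware initial step and backtracking."""
    dg = lambda p: (g(p + 1e-6) - g(p - 1e-6)) / 2e-6   # numerical derivative
    phi = phi0
    while abs(dg(phi)) > eps:
        d = dg(phi)
        # initial step keeps the trial point strictly inside (0, pi)
        t = rho * (np.pi - phi) / d if d > 0 else rho * (-phi) / d
        # backtracking line search: enforce the sufficient-increase condition
        while g(phi + t * d) < g(phi) + eta * t * d ** 2:
            t *= gamma
        phi = phi + t * d
    return phi

# Toy usage: R_hat_m from a single source at 70 degrees, initial point 65 degrees.
a_true = steering(np.deg2rad(70.0))
R_hat_m = np.outer(a_true, a_true.conj()) + 0.1 * np.eye(N)
print(np.rad2deg(grad_ascent_backtracking(make_g(R_hat_m), np.deg2rad(65.0))))  # near 70
```

The domain-aware initial step in line 3 of Algorithm 2 guarantees that the first trial point stays inside (0, π), and the backtracking loop then shrinks the step until the objective actually increases, which is exactly what (54) requires.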

VII-A Deterministic Signal Model

For comparing the convergence of EM, MEM, and SAGE, Fig. 1 plots their (𝚯(k),σ(k))\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)})’s, ϕ1(k)\phi^{(k)}_{1}’s, and ϕ2(k)\phi^{(k)}_{2}’s as functions of kk under one realization. We observe that given a good initial point, the algorithms converge to consistent DOA estimates; EM has convergence similar to MEM, while SAGE converges faster than EM and MEM.

Figure 1: (𝚯(k),σ(k))\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)}), ϕ1(k)\phi^{(k)}_{1}, and ϕ2(k)\phi^{(k)}_{2} comparison of EM, MEM, and SAGE for the deterministic signal model under one realization versus kk with ϕ1=20°\phi_{1}=20\degree, ϕ2=80°\phi_{2}=80\degree, P1=2dBP_{1}=-2~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=24°\phi^{(0)}_{1}=24\degree, ϕ2(0)=84°\phi^{(0)}_{2}=84\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}, and σ(0)=1\sigma^{(0)}=1.

Figs. 2 and 3 show two scatter plots of the DOA estimates obtained by the algorithms under 200 independent realizations. The same samples of each realization are processed by the algorithms. In Fig. 2, the total numbers of wanted points from EM, MEM, and SAGE are 68, 72, and 179, respectively. In Fig. 3, the total numbers of wanted points from EM, MEM, and SAGE are 159, 157, and 190, respectively. Figs. 2 and 3 imply that given a poor initial point, SAGE is more effective than EM and MEM at avoiding convergence to an unwanted stationary point of (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma).

Note that the two sources in Fig. 2 are not closely spaced, so it is very difficult to mix them up, and the wanted points in Fig. 2 are centered around the true position (25°,75°)(25\degree,75\degree). However, the two sources in Fig. 3 are closely spaced and the wanted points are centered around (78°,70°)(78\degree,70\degree) or (70°,78°)(70\degree,78\degree), i.e., the algorithms are likely to mix up closely spaced sources. Fortunately, only the DOAs of the sources are of interest in DOA estimation, so mixing up the sources has no effect on the result.

Figure 2: Scatter plot of the DOA estimates obtained from EM, MEM, and SAGE for the deterministic signal model under 200 independent realizations with ϕ1=25°\phi_{1}=25\degree, ϕ2=75°\phi_{2}=75\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=2dBP_{2}=2~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=40°\phi^{(0)}_{1}=40\degree, ϕ2(0)=60°\phi^{(0)}_{2}=60\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}.
Figure 3: Scatter plot of the DOA estimates obtained from EM, MEM, and SAGE for the deterministic signal model under 200 independent realizations with ϕ1=70°\phi_{1}=70\degree, ϕ2=78°\phi_{2}=78\degree, P1=2dBP_{1}=-2~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=50°\phi^{(0)}_{1}=50\degree, ϕ2(0)=58°\phi^{(0)}_{2}=58\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}.

According to these simulations, we can conclude that for the deterministic signal model, 1) EM has convergence similar to MEM, and 2) SAGE outperforms EM and MEM.

VII-B Stochastic Signal Model

For comparing the convergence of EM, MEM, and SAGE, Fig. 4 plots their (𝚯(k),σ(k))\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)})’s, ϕ1(k)\phi^{(k)}_{1}’s, and ϕ2(k)\phi^{(k)}_{2}’s as functions of kk under one realization. We can observe that EM has convergence similar to MEM, while SAGE converges faster than EM and MEM.

Figure 4: (𝚯(k),σ(k))\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)}), ϕ1(k)\phi^{(k)}_{1}, and ϕ2(k)\phi^{(k)}_{2} comparison of EM, MEM, and SAGE for the stochastic signal model under one realization versus kk with ϕ1=20°\phi_{1}=20\degree, ϕ2=80°\phi_{2}=80\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=24°\phi^{(0)}_{1}=24\degree, ϕ2(0)=84°\phi^{(0)}_{2}=84\degree, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}, and σ(0)=1\sigma^{(0)}=1.

Figs. 5 and 6 show two scatter plots of the DOA estimates obtained by the algorithms under 200 independent realizations. The same samples of each realization are processed by the algorithms. In Fig. 5, the total numbers of wanted points from EM, MEM, and SAGE are 185, 186, and 175, respectively. In Fig. 6, the total numbers of wanted points from EM, MEM, and SAGE are 161, 161, and 172, respectively. Figs. 5 and 6 indicate that EM has convergence similar to MEM, whereas compared with EM and MEM, SAGE is less effective at avoiding convergence to an unwanted stationary point of (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma) in Fig. 5 but more effective in Fig. 6. As before, the algorithms are likely to mix up closely spaced sources, so the wanted points in Fig. 5 are centered around the true position (25°,75°)(25\degree,75\degree) while the wanted points in Fig. 6 are centered around (78°,70°)(78\degree,70\degree) or (70°,78°)(70\degree,78\degree).

From Figs. 4–6, we can conclude that for the stochastic signal model, 1) EM has convergence similar to MEM, and 2) SAGE does not always outperform EM and MEM.

Figure 5: Scatter plot of the DOA estimates obtained from EM, MEM, and SAGE for the stochastic signal model under 200 independent realizations with ϕ1=25°\phi_{1}=25\degree, ϕ2=75°\phi_{2}=75\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=2dBP_{2}=2~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=40°\phi^{(0)}_{1}=40\degree, ϕ2(0)=60°\phi^{(0)}_{2}=60\degree, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}, and σ(0)=1\sigma^{(0)}=1.
Figure 6: Scatter plot of the DOA estimates obtained from EM, MEM, and SAGE for the stochastic signal model under 200 independent realizations with ϕ1=70°\phi_{1}=70\degree, ϕ2=78°\phi_{2}=78\degree, P1=2dBP_{1}=-2~{}\mathrm{dB}, P2=1dBP_{2}=-1~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=55°\phi^{(0)}_{1}=55\degree, ϕ2(0)=63°\phi^{(0)}_{2}=63\degree, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}, and σ(0)=1\sigma^{(0)}=1.

VII-C Deterministic and Stochastic Signal Models

The EM, MEM, and SAGE algorithms for the deterministic signal model can also process samples from the stochastic signal model, which means that the algorithms for the stochastic signal model can be compared with those for the deterministic signal model. The above numerical results have shown that EM has convergence similar to MEM, so for simplicity we only compare EM and SAGE for both models in this subsection.

Since both models have the same DOA parameter 𝜽\boldsymbol{\theta}, the stopping criterion 𝜽(k)𝜽(k1)ϵ\|\boldsymbol{\theta}^{(k)}-\boldsymbol{\theta}^{(k-1)}\|\leq\epsilon is suitable. Fig. 7 shows a scatter plot of the DOA estimates obtained by EM and SAGE under 50 independent realizations. The same samples of each realization are processed by both algorithms for both models. From Fig. 7, we can observe that both algorithms for both models obtain consistent DOA estimates.

Based on Fig. 7, Fig. 8 compares the numbers of iterations. We can observe that EM for the deterministic signal model generally requires more iterations than EM for the stochastic signal model, whereas SAGE for the deterministic signal model generally requires fewer iterations than SAGE for the stochastic signal model. More importantly, SAGE for the deterministic signal model requires the smallest number of iterations in every realization. Thus, we can conclude that SAGE for the deterministic signal model is superior to the other algorithms in terms of computational cost.

Figure 7: Scatter plot of the DOA estimates obtained from EM and SAGE under 50 independent realizations with ϕ1=50°\phi_{1}=50\degree, ϕ2=100°\phi_{2}=100\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=55°\phi^{(0)}_{1}=55\degree, ϕ2(0)=95°\phi^{(0)}_{2}=95\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, and σ(0)=1\sigma^{(0)}=1.
Figure 8: Numbers of iterations required by EM and SAGE under 50 independent realizations with ϕ1=50°\phi_{1}=50\degree, ϕ2=100°\phi_{2}=100\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=55°\phi^{(0)}_{1}=55\degree, ϕ2(0)=95°\phi^{(0)}_{2}=95\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, and σ(0)=1\sigma^{(0)}=1.

VIII Conclusion

We have developed the EM and SAGE algorithms for DOA estimation in unknown uniform noise and proposed an MEM algorithm applicable to this noise assumption. We have also improved the EM, MEM, and SAGE algorithms to ensure their stability when the powers of the sources are unequal. With these improvements, numerical results illustrate that the EM algorithm has convergence similar to the MEM algorithm, the SAGE algorithm outperforms the EM and MEM algorithms for the deterministic signal model, and the SAGE algorithm converges faster than the EM and MEM algorithms for the stochastic signal model. In addition, numerical results indicate that when these algorithms process the same samples from the stochastic signal model, the SAGE algorithm for the deterministic signal model requires the fewest iterations.

References

  • [1] H. Krim and M. Viberg, “Two decades of array signal processing research: the parametric approach,” IEEE Signal Processing Magazine, vol. 13, no. 4, pp. 67–94, Jul. 1996.
  • [2] L. C. Godara, “Application of antenna arrays to mobile communications. II. Beam-forming and direction-of-arrival considerations,” Proceedings of the IEEE, vol. 85, no. 8, pp. 1195–1245, Aug. 1997.
  • [3] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
  • [4] M. Feder and E. Weinstein, “Parameter estimation of superimposed signals using the EM algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 4, pp. 477–489, Apr. 1988.
  • [5] M. I. Miller and D. R. Fuhrmann, “Maximum-likelihood narrow-band direction finding and the EM algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 9, pp. 1560–1577, Sep. 1990.
  • [6] J. A. Fessler and A. O. Hero, “Space-alternating generalized expectation-maximization algorithm,” IEEE Transactions on Signal Processing, vol. 42, no. 10, pp. 2664–2677, Oct. 1994.
  • [7] P. Chung and J. F. Bohme, “Comparative convergence analysis of EM and SAGE algorithms in DOA estimation,” IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 2940–2949, Dec. 2001.
  • [8] M. Gong and B. Lyu, “Alternating maximization and the EM algorithm in maximum-likelihood direction finding,” IEEE Transactions on Vehicular Technology, vol. 70, no. 10, pp. 9634–9645, Oct. 2021.
  • [9] I. Ziskind and M. Wax, “Maximum likelihood localization of multiple sources by alternating projection,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 10, pp. 1553–1560, Oct. 1988.
  • [10] P. Stoica and A. Nehorai, “MUSIC, maximum likelihood, and Cramer-Rao bound,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 5, pp. 720–741, May 1989.
  • [11] P. Stoica and A. Nehorai, “Performance study of conditional and unconditional direction-of-arrival estimation,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 10, pp. 1783–1795, Oct. 1990.
  • [12] M. Pesavento and A. B. Gershman, “Maximum-likelihood direction-of-arrival estimation in the presence of unknown nonuniform noise,” IEEE Transactions on Signal Processing, vol. 49, no. 7, pp. 1310–1324, Jul. 2001.
  • [13] C. E. Chen, F. Lorenzelli, R. E. Hudson, and K. Yao, “Stochastic maximum-likelihood DOA estimation in the presence of unknown nonuniform noise,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3038–3044, Jul. 2008.
  • [14] X. Meng and D. B. Rubin, “Maximum likelihood estimation via the ECM algorithm: A general framework,” Biometrika, vol. 80, no. 2, pp. 267–278, Jun. 1993.
  • [15] P. Chung and J. F. Bohme, “DOA estimation using fast EM and SAGE algorithms,” Signal Processing, vol. 82, no. 11, pp. 1753–1762, Nov. 2002.
  • [16] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, USA: Prentice-Hall PTR, 1993.
  • [17] I. B. Rhodes, “A tutorial introduction to estimation and filtering,” IEEE Transactions on Automatic Control, vol. 16, no. 6, pp. 688–706, Dec. 1971.
  • [18] A. G. Jaffer, “Maximum likelihood direction finding of stochastic sources: a separable solution,” in Proc. ICASSP, New York, USA, Apr. 1988.
  • [19] P. Stoica and A. Nehorai, “On the concentrated stochastic likelihood function in array signal processing,” Circuits, Systems Signal Processing, vol. 14, no. 5, pp. 669–674, Sep. 1995.
  • [20] C. F. Jeff Wu, “On the convergence properties of the EM algorithm,” Annals of Statistics, vol. 11, no. 1, pp. 95–103, Mar. 1983.
  • [21] S. Boyd and L. Vandenberghe, Convex Optimization. New York, USA: Cambridge University Press, 2004.
Ming-yan Gong received the B.Eng. degree in metal material engineering from the Jiangsu University of Science and Technology, Zhenjiang, China, in 2016 and the M.Eng. degree in signal and information processing from the Nanjing University of Posts and Telecommunications, Nanjing, China, in 2019. He is currently working toward the Ph.D. degree with the Beijing Institute of Technology, Beijing, China. His research interests include array signal processing and MIMO communications.
Bin Lyu received the B.E. and Ph.D. degrees from the Nanjing University of Posts and Telecommunications (NJUPT), Nanjing, China, in 2013 and 2018, respectively. He is currently an Associate Professor with NJUPT. His research interests include wireless communications and signal processing.